Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an interface to add conditions in ScanAll #889

Merged
merged 5 commits into from
Jun 7, 2023

Conversation

jnmt
Copy link
Contributor

@jnmt jnmt commented May 24, 2023

This PR adds an interface to add arbitrary conditions in ScanAll, i.e., to support a relational scan for JDBC databases. We decided to add them to the existing ScanAll instead of adding a new RelationalScan to keep the backward compatibility and simplicity of the current class inheritance. Actual relational scan processing will be introduced by another PR.

By this PR, users can add a kind of "where()" clause after calling "all()" in a Scan builder. We currently support only a disjunctive normal form (DNF) or a conjunctive normal form (CNF), like the following patterns.

  • .where( conditionalExpression )
  • .where( conditionalExpression | andConditionSet ) [ .or( conditionalExpression | andConditionSet ) ] ...
  • .where( conditionalExpression | orConditionSet ) [ .and( conditionalExpression | orConditionSet ) ] ...

Users can use ConditionSetBuilder to build AndConditionSet and OrCondtionSet, which is a set of conjunctive (a && b && ...) or disjunctive (a || b || ...) conditions. Note that users need to call clearConditions() before updating a where clause in an existing ScallAll object.

See also inline comments for details.

@jnmt jnmt added the enhancement New feature or request label May 24, 2023
@jnmt jnmt self-assigned this May 24, 2023
@@ -34,6 +37,7 @@ public class Scan extends Selection {
private Optional<Key> endClusteringKey;
private boolean endInclusive;
private int limit;
private final Set<Conjunction> conjunctions;
Copy link
Contributor Author

@jnmt jnmt May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally, we use a disjunctive set of conjunctions (i.e., as a DNF) to manage whole conditions.

Comment on lines 1105 to 1108
scan.withConjunctions(
Sets.cartesianProduct(disjunctions).stream()
.map(Scan.Conjunction::of)
.collect(Collectors.toSet()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When building a where clause with ".and().and()...", the builder holds a (conjunctive) set of disjunctions, and its conjunction should be empty. In this case, we convert CNF to DNF according to the distributive law.

.map(Scan.Conjunction::of)
.collect(Collectors.toSet()));
} else {
scan.withConjunctions(conjunctions);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In contrast, when building a where clause with ".or().or()...", the builder naturally holds conditions as a DNF. So, it just adds them.

*/
@Deprecated
@Override
public ScanAll withOrdering(Ordering ordering) {
throw new UnsupportedOperationException();
return (ScanAll) super.withOrdering(ordering);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike a current ScanAll, a relational scan accept orderings. So, now we need to implement this, although we would like to use this internally for the builder and users do not have to use this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not guaranteed that super.withOrdering() returns ScanAll in terms of the type signature and it seems an unsafe downcast to me. I'm wondering if any type check is needed (or applying Generics type parameter, but it takes effort...) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was safe because the caller object of super.withOrdering() here is always ScanAll (and we are already doing the same in other similar methods).
@brfrn169 What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, it is not safe. If we change the withOrdering() method of the superclass Scan so that it does not return this, a ClassCastException may be thrown. However, at present, Scan.withOrdering() does return this, and since we have control over the Scan class, I am comfortable with the downcast.

@komamitsu What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it seems there is no reasonable solution to it. I think we can go with the implementation as is.

I wish Java had a special generics type THIS or something which represents the concrete class type itself so that we could write the method like public THIS withOrdering()...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brfrn169 @komamitsu Thank you! I left this as is.

*/
@Deprecated
@Override
public ScanAll withOrdering(Ordering ordering) {
throw new UnsupportedOperationException();
return (ScanAll) super.withOrdering(ordering);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not guaranteed that super.withOrdering() returns ScanAll in terms of the type signature and it seems an unsafe downcast to me. I'm wondering if any type check is needed (or applying Generics type parameter, but it takes effort...) ?

}

@Immutable
public static class AndConditionSet {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, Conjunction and AndConditionSet sound similar to me. Could you add a comment to clarify what they are ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Essentially, both are the same. But I intentionally used the different classes for the internal representation (Conjunction) and user API (AndConditionSet) for ease of understanding since the conjunction would be unfamiliar for many users. From the developer's perspective, I just prefer the term "conjunction" since it's short and technically correct to use many places internal modules, including gRPC proto. Do you think "internal use" or something like that is enough here? Or should we use AndConditionSet also for the internal representation?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe we should move AndConditionSet and OrConditionSet and their builder classes into ScanBuilder?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks for the explanation. Either option, adding a comment that describes the relationship between Conjunction and AndConditionSet and the usages or using AndConditionSet for internal use, sounds fine with me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@komamitsu @brfrn169 Thanks for the comments. I will move both sets to the builder and describe the relationship and usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d3bb194.

@komamitsu komamitsu self-requested a review May 25, 2023 03:02
Copy link
Collaborator

@brfrn169 brfrn169 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed it briefly and left some comments. As we should examine it thoroughly, let me review it again later.

Comment on lines +300 to +303
Scan withConjunctions(Collection<Conjunction> conjunctions) {
this.conjunctions.addAll(conjunctions);
return this;
}
Copy link
Collaborator

@brfrn169 brfrn169 May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we are going to make Scan an immutable class, please don't add a mutation method or make it deprecated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstand something, but this method is package private. Is "deprecated" still necessary? I imagined current public withFooBar() mutation methods would be changed to package private ones like this function in v5.0.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brfrn169 Just a confirmation. Will those mutation methods themselves be deleted in v5.0.0? In addition, for example, Scan will have ScanBuilder, and the builder can directly access the private member of Scan without using withFooBar(). That's why we should make it deprecated even if a package private method, is my guess correct?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. Since the method in question is package-private and not public, there's no need to remove it or mark it as deprecated. Let's keep it as it is. If further changes or considerations are needed in the future, we can revisit this decision.

* @return set of {@code ConditionalExpression} which this conjunction is composed of
*/
public Set<ConditionalExpression> getConditions() {
return ImmutableSet.copyOf(conditions);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the private constructor always gets ImmutableSet from the static factory methods, we don't need to copy the conditions here.

Suggested change
return ImmutableSet.copyOf(conditions);
return conditions;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing it out. Fixed in d3bb194.


@Immutable
public static class Conjunction {
private final Set<ConditionalExpression> conditions;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the private constructor always gets ImmutableSet from the static factory methods, we might be able to make it ImmutableSet to make it more explicit.

Suggested change
private final Set<ConditionalExpression> conditions;
private final ImmutableSet<ConditionalExpression> conditions;

@komamitsu What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. It's a good improvement in terms of type consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing it out. Fixed in d3bb194.


private final Set<ConditionalExpression> conditions;

AndOrConditionSetBuilder(Set<ConditionalExpression> conditions) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, we can make it private:

Suggested change
AndOrConditionSetBuilder(Set<ConditionalExpression> conditions) {
private AndOrConditionSetBuilder(Set<ConditionalExpression> conditions) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thank you! Fixed in d3bb194.

*/
@Deprecated
@Override
public ScanAll withOrdering(Ordering ordering) {
throw new UnsupportedOperationException();
return (ScanAll) super.withOrdering(ordering);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, it is not safe. If we change the withOrdering() method of the superclass Scan so that it does not return this, a ClassCastException may be thrown. However, at present, Scan.withOrdering() does return this, and since we have control over the Scan class, I am comfortable with the downcast.

@komamitsu What do you think?

Comment on lines 113 to 127
// private Set<Conjunction> prepareConjunctions() {
// return ImmutableSet.of(
// RelationalScan.Conjunction.conjunction(
// ConditionBuilder.column(ANY_NAME_1).isEqualToText(ANY_TEXT_1)),
// RelationalScan.Conjunction.conjunction(
// ConditionBuilder.column(ANY_NAME_1).isEqualToText(ANY_TEXT_2)));
// }
//
// private Set<Conjunction> prepareConjunctionsWithDifferentOrder() {
// return ImmutableSet.of(
// RelationalScan.Conjunction.conjunction(
// ConditionBuilder.column(ANY_NAME_1).isEqualToText(ANY_TEXT_2)),
// RelationalScan.Conjunction.conjunction(
// ConditionBuilder.column(ANY_NAME_1).isEqualToText(ANY_TEXT_1)));
// }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove this commented out code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry. Fixed in d3bb194.

Comment on lines 475 to 486
private BuildableScanAllWithWhere(String namespaceName, String tableName) {
this.namespaceName = namespaceName;
this.tableName = tableName;
this.condition = null;
}

private BuildableScanAllWithWhere(
String namespaceName, String tableName, ConditionalExpression condition) {
this.namespaceName = namespaceName;
this.tableName = tableName;
this.condition = condition;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private BuildableScanAllWithWhere(String namespaceName, String tableName) {
this.namespaceName = namespaceName;
this.tableName = tableName;
this.condition = null;
}
private BuildableScanAllWithWhere(
String namespaceName, String tableName, ConditionalExpression condition) {
this.namespaceName = namespaceName;
this.tableName = tableName;
this.condition = condition;
}
private BuildableScanAllWithWhere(String namespaceName, String tableName) {
this(namespaceName, tableName, null);
}
private BuildableScanAllWithWhere(
String namespaceName, String tableName, @Nullable ConditionalExpression condition) {
this.namespaceName = namespaceName;
this.tableName = tableName;
this.condition = condition;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the advice. Fixed in d3bb194.

Comment on lines 811 to 812
throw new UnsupportedOperationException(
"This operation is supported only when no conditions are specified at all.");
Copy link
Collaborator

@brfrn169 brfrn169 May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of UnsupportedOperationException, I think IllegalStateException is proper in this case. And I think we can suggest using clearConditions() to clear the conditions in the message:

Suggested change
throw new UnsupportedOperationException(
"This operation is supported only when no conditions are specified at all.");
throw new IllegalStateException(
"This operation is supported only when no conditions are specified at all. "
+ "If you want to modify the condition, please use clearConditions() to remove all existing conditions first.");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. Fixed in d3bb194.

}

@Immutable
public static class AndConditionSet {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe we should move AndConditionSet and OrConditionSet and their builder classes into ScanBuilder?

@jnmt
Copy link
Contributor Author

jnmt commented May 29, 2023

@brfrn169 @komamitsu Thank you for the valuable feedback. I fixed all other than this. PTAL again when you get a chance.

@jnmt jnmt requested a review from brfrn169 May 29, 2023 05:27
Comment on lines 468 to 469
protected final Set<Scan.Conjunction> conjunctions = new HashSet<>();
protected final List<Set<ConditionalExpression>> disjunctions = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for this delayed question. Why the types of conjunctions and disjunctions are different (disjunctions needs an order? disjunctions should be an internal class as well as conjunctions?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the question. Actually, disjunctions don't need the order information, but Sets.cartesianProduct(), which is used for converting a CNF, requires a list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. If the reason of having different types for conjunctions and disjunctions is only for Sets.cartesianProduct(), we can write like this?

  Sets.cartesianProduct(
      disjunctions.stream().map(Disjunction::getConditions).collect(Collectors.toList()))
      .stream()
      .map(Conjunction::of)
      .collect(Collectors.toSet()));

It would be great if the types of conjunctions and disjunctions are the same-level abstraction because I guess the readers of this code expect that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's better for readers. Thank you. I think we can do just like the following. BTW, I'm not going to introduce Scan.Disjunction since the CNF and disjunctions are only used in the ScanBuilder internally at least at this moment. Even so, is it weird and should we have it?

      scan.withConjunctions(
          Sets.cartesianProduct(new ArrayList<>(disjunctions)).stream()
              .map(Scan.Conjunction::of)
              .collect(Collectors.toSet()));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I'm not going to introduce Scan.Disjunction since the CNF and disjunctions are only used in the ScanBuilder internally at least at this moment. Even so, is it weird and should we have it?

@komamitsu Or, how about using Set<Set<ConditionalExpression>> for conjunctions when building and converting them at the last, similar to the disjunctions? What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about using Set<Set> for conjunctions

It means like this?

    protected final Set<Set<ConditionalExpression>> conjunctions = new HashSet<>();
    protected final Set<Set<ConditionalExpression>> disjunctions = new HashSet<>();

If so, completely agreed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's what I meant. Fixed in a64862d. PTAL when you get a chance.

@jnmt jnmt requested a review from komamitsu May 30, 2023 08:25
@jnmt
Copy link
Contributor Author

jnmt commented May 30, 2023

Dear all,
Sorry, I missed some cases for specifying orderings without conditions, updating a namespace and table after the where clause, and clearing the namespace after the where clause. Fixed in 2c305c9. PTAL, including those.

Copy link
Collaborator

@brfrn169 brfrn169 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

Copy link
Contributor

@Torch3333 Torch3333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

scan.withConjunctions(ImmutableSet.of(Scan.Conjunction.of(condition)));
} else if (conjunctions.isEmpty()) {
scan.withConjunctions(
Sets.cartesianProduct(new ArrayList<>(disjunctions)).stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Sorry for the delayed comment. I thought I put the following comment here, but it's gone...)

This seems to generate O x A conditions (O sets of A conditions), where O is the number of ORs and A is the number of ANDs. I'm not sure how these sets of conditions would be used in next PRs, but I guess ScalarDB fetches all entries and then filter the fetched records using these conditions set.

(This would be a silly question..., but) it's not possible to pushdown original conditions basically as-is to underlying RDBMS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the question. It will be used in something like the following way. We don't filter results using the conditions (at least for JDBC storage) in the ScalarDB layer.

  private String relationalConditionSqlString() {
   (snip)
    List<String> conjunctionList = new ArrayList<>();
    conjunctions.forEach(
        conjunction -> {
          List<String> conditions = new ArrayList<>();
          conjunction
              .getConditions()
              .forEach(
                  condition ->
                      conditions.add(condition.getColumn().getName() + convert(condition.getOperator())));
          conjunctionList.add(String.join(" AND ", conditions));
        });
    return " WHERE " + String.join(" OR ", conjunctionList);
  }

(This would be a silly question..., but) it's not possible to pushdown original conditions basically as-is to underlying RDBMS?

Do you mean pushing down both DNF and CNF as is? It's technically possible, and it sometimes would perform better, but I think it trades off a development effort and maintainability for two internal representations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the answer. It sounds good!

What I meant was...
converting the following Scan

    Scan.newBuilder()
        .namespace("my_ns")
        .table("my_tbl")
        .all()
        .where(ConditionBuilder.column("c1").isEqualToInt(10))
        .and(ConditionBuilder.column("c2").isEqualToBigInt(20))
        .and(
            ConditionSetBuilder.condition(
                ConditionBuilder.column("c3").isEqualToInt(30))
                .or(ConditionBuilder.column("c4").isEqualToInt(40))
                .build())
           :
        .build();

into the following SQL

SELECT ...
FROM my_ns.my_tbl
WHERE c1 = 10 AND c2 = 20 AND (c3 = 30 OR c4 = 40)

although I'm not sure it works well 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That's pushing down a CNF as is. Yeah, as I said, technically, it works if we make Scan always have either conjunctions or disjunctions. But my concern is that we probably need to do the same for the gRPC proto and take care of which one should be used in many places. Anyway, this is kind of an internal part; we can revisit this in the future. Any additional discussions are welcome. Thanks for the comment.

Copy link
Contributor

@komamitsu komamitsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

Copy link
Contributor

@feeblefakie feeblefakie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!
(Very sorry for the late review 🙇 )

Copy link
Contributor

@feeblefakie feeblefakie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!
(Very sorry for the late review 🙇 )

@feeblefakie feeblefakie merged commit 1bd01a5 into master Jun 7, 2023
12 checks passed
@feeblefakie feeblefakie deleted the add-condition-interface-for-scanall branch June 7, 2023 02:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants