Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add like operator #1046

Merged
merged 8 commits into from
Sep 8, 2023
Merged

Add like operator #1046

merged 8 commits into from
Sep 8, 2023

Conversation

jnmt
Copy link
Contributor

@jnmt jnmt commented Aug 30, 2023

This PR adds the like operator for the relational scan. In the LIKE conditional expression, you can specify an SQL-LIKE-like pattern and an escape character (optional). The following specification of wild cards and escape characters is basically the same as PostgreSQL and MySQL.

  • _ matches any single character
  • % matches any sequence of zero or more characters
  • \ works as the escape character by default
  • Escape character can be disabled by specifying an empty escape character

*
* @return true if the LIKE pattern matches a given {@code String}
*/
public boolean isMatchedWith(String value) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need our own matcher in the ScalarDB layer to avoid scan-after-write properly in Snapshot. The matcher uses regular expressions converted from "like" patterns.

Comment on lines +318 to +319
if (escape.isEmpty()) {
// MySQL accepts an empty escape character to disable the escape function, but MariaDB ignores
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a tricky conversion for MySQL and MariaDB to handle both in the same manner since we don't want to distinguish them at this moment. But, in the future, we probably support the MariaDB driver and introduce RdbEngineMariadb or something like that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just an idea, but how about giving up disabling escape on ScalarDB side like MariaDB does? In this way, users would need to use the default escape or specify a single escape character, but can't specify an empty string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the idea. That's one of the options I thought. However, all other databases other than MariaDB support disabling the escape, and it's probably convenient for users. So, can we go with it as is?

Comment on lines +285 to +286
if (escape.isEmpty()) {
// Even if users do not want to use escape character in ScalarDB (i.e., specifying "" for
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need special care for SQL Server due to its dialect.

@jnmt jnmt self-assigned this Aug 31, 2023
@jnmt jnmt added the enhancement New feature or request label Aug 31, 2023
LikeExpression(TextColumn column, Operator operator, String escape) {
super(column, operator);
if (operator != Operator.LIKE && operator != Operator.NOT_LIKE) {
throw new IllegalArgumentException("Operator must be like or not-like");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] How about adding the received unexpected operator in the error message to make debugging easier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9654df0.

* @param escape an escape character.
* @return the equivalent Java regular expression of the given pattern
*/
private String convertRegexPatternFrom(String likePattern, Character escape) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private String convertRegexPatternFrom(String likePattern, Character escape) {
private String convertRegexPatternFrom(String likePattern, @Nullable Character escape) {

escape can be null, right?
Probably an additional import for this is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9654df0, but removed when fixing others in 0c14b83.

@Immutable
public class LikeExpression extends ConditionalExpression {
private static final String DEFAULT_ESCAPE_CHAR = "\\";
@Nonnull private final String escape;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[trivial] IIUC, instance variables are regarded as non-null by default and we can remove this @Nonnull...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9654df0.

* @param operator an operator used to compare the target column
* @param escape an escape character for the like operator
*/
LikeExpression(TextColumn column, Operator operator, String escape) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems escape is basically used as a single length string (or empty.) How about simply treating it as a nullable character? It would work as same as empty string if it's null.

  @Nullable private final Character escape;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I also considered using Character as you suggested. But, in that case, users need to pass null to disable the escape character, right? (e.g., .isLikeText("%xyz", null)) This is because we want to use the default escape char \ for .isLikeText("%xyz"). I prefer passing an empty string "" rather than null since its semantics is the same as SQL. What do you think?

Copy link
Contributor

@komamitsu komamitsu Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

users need to pass null to disable the escape character, right? (e.g., .isLikeText("%xyz", null))
Yes.

I understand your point to use consistent semantics with SQL. I think I was confused a bit because com.scalar.db.storage.jdbc.RdbEngineStrategy#getEscape returns null while com.scalar.db.api.LikeExpression#getEscape returns an empty. How about making RdbEngineStrategy#getEscape return an empty as well as LikeExpression#getEscape?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading your comment #1046 (comment), I think my suggestion above was wrong... Please ignore it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer passing an empty string "" rather than null since its semantics is the same as SQL.
👍

Maybe #1046 (comment) is also related. It would be great if you take care of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe #1046 (comment) is also related. It would be great if you take care of it.

Sure. I will revise it. Thanks for the feedback.

*
* @param column a target column used to compare
* @param operator an operator used to compare the target column
* @param escape an escape character for the like operator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing empty string looks to disable escapes. How about describing the behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9654df0.

Comment on lines +318 to +319
if (escape.isEmpty()) {
// MySQL accepts an empty escape character to disable the escape function, but MariaDB ignores
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just an idea, but how about giving up disabling escape on ScalarDB side like MariaDB does? In this way, users would need to use the default escape or specify a single escape character, but can't specify an empty string.

@Override
public String getEscape(LikeExpression likeExpression) {
String escape = likeExpression.getEscape();
return escape.isEmpty() ? null : escape;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this also needed in RdbEnginePostgresql?

Moving this logic to com.scalar.db.storage.jdbc.RdbEngineStrategy#getEscape might be an option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we always need the escape clause for PostgreSQL (if it's empty, we need to return the empty string as is). But for Oracle and SQLite, we need to remove the escape clause to disable the escape character.

Copy link
Collaborator

@brfrn169 brfrn169 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left several comments. Please take a look when you have time!

* @param escape an escape character.
* @return the equivalent Java regular expression of the given pattern
*/
private String convertRegexPatternFrom(String likePattern, @Nullable Character escape) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this logic into Snapshot since it's only used there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regular expression is only used in Snapshot, so technically, yes. But I think we should check if a specified pattern is an expected one because throwing an exception with "pattern must not ..." from Snapshot is weird. So, the choices are 1) keeping this as is and 2) only adding the check here. IMHO, 1) is not so bad since the check and conversion are tightly coupled with. What do you think?
p.s. isMatchedWith can be moved to Snapshot in any case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should choose the 2) option. We should separate the regular expression thing from LikeExpression because the responsibility of the regular expression thing is not for LikeExpression but Consensus Commit (e.g., JdbcTransaction doesn't use the regular expression). What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the responsibility of the regular expression thing is not for LikeExpression but Consensus Commit (e.g., JdbcTransaction doesn't use the regular expression).

Right. Totally agreed. Thanks for pointing it out!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0c14b83. Note that I tried to simply check it with String.matched(regex), but it's hard to describe the expected one, and it will be error-prone for some corner cases when escaping the specified escape character (e.g., "++text" with the escape "+" is a possible pattern but "+++text" is not). So, I added a similar check to the original one, although it seems a bit exaggerated.

*
* @return true if the LIKE pattern matches a given {@code String}
*/
public boolean isMatchedWith(String value) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto. Can we move this logic into Snapshot since it's only used there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0c14b83.

@@ -133,6 +134,19 @@ public void visit(BlobColumn column) {
}
}

public void bindLikeClause(LikeExpression likeExpression) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we can't use the visitor pattern only for the LIKE operation. Honestly, I don't find it ideal, but I'm out of alternatives, and maybe it's acceptable...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we have the same feelings...

if ((condition.getOperator().equals(Operator.LIKE)
|| condition.getOperator().equals(Operator.NOT_LIKE))) {
binder.bindLikeClause((LikeExpression) condition);
} else if (!condition.getColumn().hasNullValue()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't directly related to this PR, but do we need to bind a NULL value when condition.getColumn().hasNullValue() returns true, similar to what PreparedStatementBinder does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need bindings for the columns with null values in conjunctions because the string IS (NOT) NULL is used for Operator.IS(_NOT)_NULL. But I'm not sure we should also use bind markers for them. Do you have any idea about that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense. We don't need to bind null values in conjunctions. Thanks.

Copy link
Contributor

@komamitsu komamitsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

Copy link
Collaborator

@brfrn169 brfrn169 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left several minor suggestions, but LGTM! Thank you!

@@ -37,6 +37,9 @@ protected Map<String, String> getCreationOptions() {
protected List<OperatorAndDataType> getOperatorAndDataTypeListForTest() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a minor suggestion. I think we can modify this in the following way, ensuring we won't need any adjustments for similar changes in the future:

  @Override
  protected List<OperatorAndDataType> getOperatorAndDataTypeListForTest() {
    return super.getOperatorAndDataTypeListForTest().stream()
        .filter(
            operatorAndDataType -> {
              // Cosmos DB only supports the 'equal' and 'not equal' and 'is null' and 'is not null'
              // conditions for BLOB type
              if (operatorAndDataType.getDataType() == DataType.BLOB) {
                return operatorAndDataType.getOperator() == Operator.EQ
                    || operatorAndDataType.getOperator() == Operator.NE
                    || operatorAndDataType.getOperator() == Operator.IS_NULL
                    || operatorAndDataType.getOperator() == Operator.IS_NOT_NULL;
              }

              return true;
            })
        .collect(Collectors.toList());
  }

@@ -27,6 +27,9 @@ protected Map<String, String> getCreationOptions() {
protected List<OperatorAndDataType> getOperatorAndDataTypeListForTest() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

  @Override
  protected List<OperatorAndDataType> getOperatorAndDataTypeListForTest() {
    return super.getOperatorAndDataTypeListForTest().stream()
        .filter(
            operatorAndDataType -> {
              // DynamoDB only supports the 'equal' and 'not equal' and 'is null' and 'is not null'
              // conditions for BOOLEAN type
              if (operatorAndDataType.getDataType() == DataType.BOOLEAN) {
                return operatorAndDataType.getOperator() == Operator.EQ
                    || operatorAndDataType.getOperator() == Operator.NE
                    || operatorAndDataType.getOperator() == Operator.IS_NULL
                    || operatorAndDataType.getOperator() == Operator.IS_NOT_NULL;
              }

              return true;
            })
        .collect(Collectors.toList());
  }

Comment on lines 560 to 562
if (likePattern == null) {
throw new IllegalArgumentException("LIKE pattern must not be null");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the validation for likePattern is already done here, I think using assertions is more appropriate:

Suggested change
if (likePattern == null) {
throw new IllegalArgumentException("LIKE pattern must not be null");
}
assert likePattern != null : "LIKE pattern must not be null";

} else if (nextChar == escape) {
out.append(Pattern.quote(Character.toString(nextChar)));
} else {
throw new IllegalArgumentException("LIKE pattern must not include only escape character");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Suggested change
throw new IllegalArgumentException("LIKE pattern must not include only escape character");
throw new AssertionError("LIKE pattern must not include only escape character");

throw new IllegalArgumentException("LIKE pattern must not include only escape character");
}
} else if (escape != null && c == escape) {
throw new IllegalArgumentException("LIKE pattern must not end with escape character");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw new IllegalArgumentException("LIKE pattern must not end with escape character");
throw new AssertionError("LIKE pattern must not end with escape character");

@jnmt
Copy link
Contributor Author

jnmt commented Sep 5, 2023

@brfrn169 Thank you for the additional feedback! I fixed all in 4aabb08.

Copy link
Contributor

@feeblefakie feeblefakie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

Copy link
Contributor

@Torch3333 Torch3333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides a minor suggestion, LGTM, thank you!

@jnmt jnmt merged commit e7ae27e into master Sep 8, 2023
13 checks passed
@jnmt jnmt deleted the add-like-operator branch September 8, 2023 01:20
brfrn169 pushed a commit that referenced this pull request Oct 26, 2023
Co-authored-by: Jun Nemoto <jun.nemoto@gmail.com>
Co-authored-by: Vincent Guilpain <vincent.guilpain@scalar-labs.com>
brfrn169 pushed a commit that referenced this pull request Oct 26, 2023
Co-authored-by: Jun Nemoto <jun.nemoto@gmail.com>
Co-authored-by: Vincent Guilpain <vincent.guilpain@scalar-labs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants