-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add like operator #1046
Add like operator #1046
Conversation
* | ||
* @return true if the LIKE pattern matches a given {@code String} | ||
*/ | ||
public boolean isMatchedWith(String value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need our own matcher in the ScalarDB layer to avoid scan-after-write properly in Snapshot
. The matcher uses regular expressions converted from "like" patterns.
if (escape.isEmpty()) { | ||
// MySQL accepts an empty escape character to disable the escape function, but MariaDB ignores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a tricky conversion for MySQL and MariaDB to handle both in the same manner since we don't want to distinguish them at this moment. But, in the future, we probably support the MariaDB driver and introduce RdbEngineMariadb
or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just an idea, but how about giving up disabling escape on ScalarDB side like MariaDB does? In this way, users would need to use the default escape or specify a single escape character, but can't specify an empty string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the idea. That's one of the options I thought. However, all other databases other than MariaDB support disabling the escape, and it's probably convenient for users. So, can we go with it as is?
if (escape.isEmpty()) { | ||
// Even if users do not want to use escape character in ScalarDB (i.e., specifying "" for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need special care for SQL Server due to its dialect.
LikeExpression(TextColumn column, Operator operator, String escape) { | ||
super(column, operator); | ||
if (operator != Operator.LIKE && operator != Operator.NOT_LIKE) { | ||
throw new IllegalArgumentException("Operator must be like or not-like"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] How about adding the received unexpected operator in the error message to make debugging easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 9654df0.
* @param escape an escape character. | ||
* @return the equivalent Java regular expression of the given pattern | ||
*/ | ||
private String convertRegexPatternFrom(String likePattern, Character escape) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private String convertRegexPatternFrom(String likePattern, Character escape) { | |
private String convertRegexPatternFrom(String likePattern, @Nullable Character escape) { |
escape
can be null, right?
Probably an additional import
for this is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Immutable | ||
public class LikeExpression extends ConditionalExpression { | ||
private static final String DEFAULT_ESCAPE_CHAR = "\\"; | ||
@Nonnull private final String escape; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[trivial] IIUC, instance variables are regarded as non-null by default and we can remove this @Nonnull
...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 9654df0.
* @param operator an operator used to compare the target column | ||
* @param escape an escape character for the like operator | ||
*/ | ||
LikeExpression(TextColumn column, Operator operator, String escape) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems escape
is basically used as a single length string (or empty.) How about simply treating it as a nullable character? It would work as same as empty string if it's null.
@Nullable private final Character escape;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I also considered using Character
as you suggested. But, in that case, users need to pass null
to disable the escape character, right? (e.g., .isLikeText("%xyz", null)
) This is because we want to use the default escape char \
for .isLikeText("%xyz")
. I prefer passing an empty string "" rather than null since its semantics is the same as SQL. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
users need to pass null to disable the escape character, right? (e.g., .isLikeText("%xyz", null))
Yes.
I understand your point to use consistent semantics with SQL. I think I was confused a bit because com.scalar.db.storage.jdbc.RdbEngineStrategy#getEscape
returns null while com.scalar.db.api.LikeExpression#getEscape
returns an empty. How about making RdbEngineStrategy#getEscape
return an empty as well as LikeExpression#getEscape
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading your comment #1046 (comment), I think my suggestion above was wrong... Please ignore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer passing an empty string "" rather than null since its semantics is the same as SQL.
👍
Maybe #1046 (comment) is also related. It would be great if you take care of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe #1046 (comment) is also related. It would be great if you take care of it.
Sure. I will revise it. Thanks for the feedback.
* | ||
* @param column a target column used to compare | ||
* @param operator an operator used to compare the target column | ||
* @param escape an escape character for the like operator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing empty string looks to disable escapes. How about describing the behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 9654df0.
if (escape.isEmpty()) { | ||
// MySQL accepts an empty escape character to disable the escape function, but MariaDB ignores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just an idea, but how about giving up disabling escape on ScalarDB side like MariaDB does? In this way, users would need to use the default escape or specify a single escape character, but can't specify an empty string.
@Override | ||
public String getEscape(LikeExpression likeExpression) { | ||
String escape = likeExpression.getEscape(); | ||
return escape.isEmpty() ? null : escape; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this also needed in RdbEnginePostgresql
?
Moving this logic to com.scalar.db.storage.jdbc.RdbEngineStrategy#getEscape
might be an option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we always need the escape clause for PostgreSQL (if it's empty, we need to return the empty string as is). But for Oracle and SQLite, we need to remove the escape clause to disable the escape character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left several comments. Please take a look when you have time!
* @param escape an escape character. | ||
* @return the equivalent Java regular expression of the given pattern | ||
*/ | ||
private String convertRegexPatternFrom(String likePattern, @Nullable Character escape) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this logic into Snapshot
since it's only used there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regular expression is only used in Snapshot
, so technically, yes. But I think we should check if a specified pattern is an expected one because throwing an exception with "pattern must not ..." from Snapshot
is weird. So, the choices are 1) keeping this as is and 2) only adding the check here. IMHO, 1) is not so bad since the check and conversion are tightly coupled with. What do you think?
p.s. isMatchedWith
can be moved to Snapshot
in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should choose the 2)
option. We should separate the regular expression thing from LikeExpression
because the responsibility of the regular expression thing is not for LikeExpression
but Consensus Commit (e.g., JdbcTransaction
doesn't use the regular expression). What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the responsibility of the regular expression thing is not for LikeExpression but Consensus Commit (e.g., JdbcTransaction doesn't use the regular expression).
Right. Totally agreed. Thanks for pointing it out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 0c14b83. Note that I tried to simply check it with String.matched(regex)
, but it's hard to describe the expected one, and it will be error-prone for some corner cases when escaping the specified escape character (e.g., "++text" with the escape "+" is a possible pattern but "+++text" is not). So, I added a similar check to the original one, although it seems a bit exaggerated.
* | ||
* @return true if the LIKE pattern matches a given {@code String} | ||
*/ | ||
public boolean isMatchedWith(String value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto. Can we move this logic into Snapshot
since it's only used there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 0c14b83.
@@ -133,6 +134,19 @@ public void visit(BlobColumn column) { | |||
} | |||
} | |||
|
|||
public void bindLikeClause(LikeExpression likeExpression) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we can't use the visitor pattern only for the LIKE operation. Honestly, I don't find it ideal, but I'm out of alternatives, and maybe it's acceptable...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we have the same feelings...
if ((condition.getOperator().equals(Operator.LIKE) | ||
|| condition.getOperator().equals(Operator.NOT_LIKE))) { | ||
binder.bindLikeClause((LikeExpression) condition); | ||
} else if (!condition.getColumn().hasNullValue()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't directly related to this PR, but do we need to bind a NULL value when condition.getColumn().hasNullValue()
returns true, similar to what PreparedStatementBinder
does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need bindings for the columns with null values in conjunctions because the string IS (NOT) NULL
is used for Operator.IS(_NOT)_NULL
. But I'm not sure we should also use bind markers for them. Do you have any idea about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that makes sense. We don't need to bind null values in conjunctions. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left several minor suggestions, but LGTM! Thank you!
@@ -37,6 +37,9 @@ protected Map<String, String> getCreationOptions() { | |||
protected List<OperatorAndDataType> getOperatorAndDataTypeListForTest() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a minor suggestion. I think we can modify this in the following way, ensuring we won't need any adjustments for similar changes in the future:
@Override
protected List<OperatorAndDataType> getOperatorAndDataTypeListForTest() {
return super.getOperatorAndDataTypeListForTest().stream()
.filter(
operatorAndDataType -> {
// Cosmos DB only supports the 'equal' and 'not equal' and 'is null' and 'is not null'
// conditions for BLOB type
if (operatorAndDataType.getDataType() == DataType.BLOB) {
return operatorAndDataType.getOperator() == Operator.EQ
|| operatorAndDataType.getOperator() == Operator.NE
|| operatorAndDataType.getOperator() == Operator.IS_NULL
|| operatorAndDataType.getOperator() == Operator.IS_NOT_NULL;
}
return true;
})
.collect(Collectors.toList());
}
@@ -27,6 +27,9 @@ protected Map<String, String> getCreationOptions() { | |||
protected List<OperatorAndDataType> getOperatorAndDataTypeListForTest() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
@Override
protected List<OperatorAndDataType> getOperatorAndDataTypeListForTest() {
return super.getOperatorAndDataTypeListForTest().stream()
.filter(
operatorAndDataType -> {
// DynamoDB only supports the 'equal' and 'not equal' and 'is null' and 'is not null'
// conditions for BOOLEAN type
if (operatorAndDataType.getDataType() == DataType.BOOLEAN) {
return operatorAndDataType.getOperator() == Operator.EQ
|| operatorAndDataType.getOperator() == Operator.NE
|| operatorAndDataType.getOperator() == Operator.IS_NULL
|| operatorAndDataType.getOperator() == Operator.IS_NOT_NULL;
}
return true;
})
.collect(Collectors.toList());
}
if (likePattern == null) { | ||
throw new IllegalArgumentException("LIKE pattern must not be null"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the validation for likePattern is already done here, I think using assertions is more appropriate:
if (likePattern == null) { | |
throw new IllegalArgumentException("LIKE pattern must not be null"); | |
} | |
assert likePattern != null : "LIKE pattern must not be null"; |
} else if (nextChar == escape) { | ||
out.append(Pattern.quote(Character.toString(nextChar))); | ||
} else { | ||
throw new IllegalArgumentException("LIKE pattern must not include only escape character"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
throw new IllegalArgumentException("LIKE pattern must not include only escape character"); | |
throw new AssertionError("LIKE pattern must not include only escape character"); |
throw new IllegalArgumentException("LIKE pattern must not include only escape character"); | ||
} | ||
} else if (escape != null && c == escape) { | ||
throw new IllegalArgumentException("LIKE pattern must not end with escape character"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw new IllegalArgumentException("LIKE pattern must not end with escape character"); | |
throw new AssertionError("LIKE pattern must not end with escape character"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides a minor suggestion, LGTM, thank you!
core/src/test/java/com/scalar/db/storage/jdbc/query/QueryBuilderTest.java
Outdated
Show resolved
Hide resolved
…erTest.java Co-authored-by: Vincent Guilpain <vincent.guilpain@scalar-labs.com>
Co-authored-by: Jun Nemoto <jun.nemoto@gmail.com> Co-authored-by: Vincent Guilpain <vincent.guilpain@scalar-labs.com>
Co-authored-by: Jun Nemoto <jun.nemoto@gmail.com> Co-authored-by: Vincent Guilpain <vincent.guilpain@scalar-labs.com>
This PR adds the like operator for the relational scan. In the LIKE conditional expression, you can specify an SQL-LIKE-like pattern and an escape character (optional). The following specification of wild cards and escape characters is basically the same as PostgreSQL and MySQL.
_
matches any single character%
matches any sequence of zero or more characters\
works as the escape character by default