Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Is_Empty, Not_Empty, Like and Not_Like to Filter_Condition #3775

Merged
merged 25 commits into from
Oct 10, 2022

Conversation

radeusgd
Copy link
Member

@radeusgd radeusgd commented Oct 6, 2022

Pull Request Description

Implements https://www.pivotaltracker.com/story/show/183389890

Important Notes

Checklist

  • Ensure that the SQL-like -> Regex conversion happens only once in the in-memory vectorized operation when comparing with a scalar (currently it is recomputed on each element which is very wasteful).
  • The documentation has been updated if necessary.
  • All code conforms to the
    Scala,
    Java,
    and
    Rust
    style guides.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed: Enso GUI was tested when built using BOTH
      ./run ide build and ./run ide watch.

Base automatically changed from wip/radeusgd/base-types-filter-condition-183389901 to develop October 7, 2022 04:02
@radeusgd radeusgd force-pushed the wip/radeusgd/filter-condition-empty-like-183389890 branch from a4f9ae9 to 3c16613 Compare October 7, 2022 08:40
@radeusgd radeusgd marked this pull request as ready for review October 7, 2022 09:45
@radeusgd radeusgd force-pushed the wip/radeusgd/filter-condition-empty-like-183389890 branch from 54111d3 to 62ab69e Compare October 7, 2022 09:52
@@ -2039,7 +2039,7 @@ buildEngineDistribution := {
log.info(s"Engine package created at $root")
}

val stdBitsProjects = List("Base", "Database", "Google_Api", "Image", "Table")
val stdBitsProjects = List("Base", "Database", "Google_Api", "Image", "Table", "All")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly it is not possible to skip the suffix so buildStdLibAll looks good!

Returns a column with boolean values indicating whether values of this
column fit between the lower and upper bounds (both ends inclusive).
between : (Column | Any) -> (Column | Any) -> Column
between self lower upper =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: maybe we could think of adding a parameter if it shouldn't be inclusive by default

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SQL BETWEEN this is more or less supposed to emulate is always inclusive. So if we added an option, we'd have to make it unsupported on Database backend or emulate it using regular comparison operators.

But we already can always just apply two filters: Greater+Less to get an exclusive between.

So I'm not sure if it is necessary to add it, I guess it's mostly a question to @jdunkerley who designed the original set of Filter_Conditions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was aiming to keep BETWEEN simple and just match SQLs behavior with a view that is what our users would expect.

@radeusgd radeusgd force-pushed the wip/radeusgd/filter-condition-empty-like-183389890 branch 2 times, most recently from c6010aa to 38b9b21 Compare October 7, 2022 11:45
Copy link
Member

@jdunkerley jdunkerley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

One thing to check on the like conversion to regex to ensure it matches only the whole string, possible it does and I read the code wrongly.

* <p>Special regex characters present in the input pattern are quoted to match them literally
* according to the SQL-like format.
*/
public static String sql_like_pattern_to_regex(String sql_pattern) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think the pattern needs to include ^ and $ to indicate the start and end of the value.

hello_ ==> ^hello_$ as SQL like patterns mean that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As explained below, I already test that the LIKE matches only the whole string.

I prefer to use the specific 'match whole string' regex function and keep the expression itself free of ^$ because they can have different meaning (sometimes they can also mean match beginning/end of line which may give us slightly different semantics.

Returns a column with boolean values indicating whether values of this
column fit between the lower and upper bounds (both ends inclusive).
between : (Column | Any) -> (Column | Any) -> Column
between self lower upper =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was aiming to keep BETWEEN simple and just match SQLs behavior with a view that is what our users would expect.

@radeusgd radeusgd self-assigned this Oct 10, 2022
@@ -165,9 +168,21 @@ spec = Test.group "Vectors" <|
txtvec.filter (Filter_Condition.Between "b" "c") . should_equal ["bbb", "baaa"]
Test.expect_panic_with (txtvec.filter (Filter_Condition.Starts_With 42)) Unsupported_Argument_Types_Data

["", Nothing, " ", "a"].filter (Filter_Condition.Is_Empty) . should_equal ["", Nothing]
["", Nothing, " ", "a"].filter (Filter_Condition.Not_Empty) . should_equal [" ", "a"]
["abab", "aaabaaaa", "ba"].filter (Filter_Condition.Like "ba") . should_equal ["ba"]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdunkerley this is the test that verifies that our LIKE matches the whole string if no wildcards are added. We don't need ^$ because we do the regex matches check which itself is meant to check if the whole text matches the pattern. We'd need ^$ if we used something like search instead

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great - thanks just wanted to check.

@radeusgd radeusgd force-pushed the wip/radeusgd/filter-condition-empty-like-183389890 branch from 4bb1a02 to fbce8ad Compare October 10, 2022 11:23
@radeusgd radeusgd force-pushed the wip/radeusgd/filter-condition-empty-like-183389890 branch from 6c45b90 to 0cbf8d1 Compare October 10, 2022 16:13
@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label Oct 10, 2022
@mergify mergify bot merged commit 592a851 into develop Oct 10, 2022
@mergify mergify bot deleted the wip/radeusgd/filter-condition-empty-like-183389890 branch October 10, 2022 23:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants