Add Is_Empty, Not_Empty, Like and Not_Like to Filter_Condition #3775

Merged
merged 25 commits into from
Oct 10, 2022
25 commits
9906bbf
Preliminary implementation for Base types
radeusgd Oct 6, 2022
b082949
Fix like, move to Java in anticipation of vectorized ops
radeusgd Oct 6, 2022
6ea5061
tests, initial impl of like, empty, update between
radeusgd Oct 6, 2022
6ab3f08
Add empty, like tests for Table
radeusgd Oct 6, 2022
9d5e271
shortcut to rebuild all stdlib quickly
radeusgd Oct 6, 2022
eeb02d3
update previously skipped packages too
radeusgd Oct 6, 2022
28f0542
Add is_empty and like to Database
radeusgd Oct 6, 2022
d3ec514
fix for postgres
radeusgd Oct 6, 2022
a28b63e
Ensure SQL BETWEEN is used for the Between filter in Database
radeusgd Oct 6, 2022
72458e0
fix after rebase
radeusgd Oct 6, 2022
777fa28
Precompile pattern only once
radeusgd Oct 7, 2022
50b7828
changelog
radeusgd Oct 7, 2022
1bbece0
add a test
radeusgd Oct 7, 2022
4a184a5
formatting
radeusgd Oct 7, 2022
593ef02
Fix after rebase
radeusgd Oct 7, 2022
6711066
Add test for newlines in wildcards
radeusgd Oct 10, 2022
78906f0
More tests, fix regex in Table
radeusgd Oct 10, 2022
07be26a
Fix Like in Base types
radeusgd Oct 10, 2022
0cbf8d1
Add some Unicode tests, some of them pending due to a bug
radeusgd Oct 10, 2022
7388c49
Merge branch 'develop' into wip/radeusgd/filter-condition-empty-like-…
mergify[bot] Oct 10, 2022
f0a82e0
Add comments on the known bug, make the fix better
radeusgd Oct 10, 2022
7a91fd1
Merge branch 'develop' into wip/radeusgd/filter-condition-empty-like-…
mergify[bot] Oct 10, 2022
18b9bdd
Merge branch 'develop' into wip/radeusgd/filter-condition-empty-like-…
mergify[bot] Oct 10, 2022
fa3e2c1
Merge branch 'develop' into wip/radeusgd/filter-condition-empty-like-…
mergify[bot] Oct 10, 2022
4264689
Merge branch 'develop' into wip/radeusgd/filter-condition-empty-like-…
mergify[bot] Oct 10, 2022
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -207,6 +207,8 @@
- [Added `Date_Period.Week` to `start_of` and `end_of` methods.][3733]
- [Replaced `Table.where` with a new API relying on `Table.filter`.][3750]
- [Added `Filter_Condition` to `Vector`, `Range` and `List`.][3770]
- [Extended `Filter_Condition` with `Is_Empty`, `Not_Empty`, `Like` and
`Not_Like`.][3775]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -332,6 +334,7 @@
[3749]: https://github.com/enso-org/enso/pull/3749
[3750]: https://github.com/enso-org/enso/pull/3750
[3770]: https://github.com/enso-org/enso/pull/3770
[3775]: https://github.com/enso-org/enso/pull/3775

#### Enso Compiler

30 changes: 21 additions & 9 deletions build.sbt
@@ -2039,7 +2039,7 @@ buildEngineDistribution := {
log.info(s"Engine package created at $root")
}

val stdBitsProjects = List("Base", "Database", "Google_Api", "Image", "Table")
val stdBitsProjects = List("Base", "Database", "Google_Api", "Image", "Table", "All")
Review comment (Contributor): Sadly, it is not possible to skip the suffix, so `buildStdLibAll` looks good!

val allStdBits: Parser[String] =
stdBitsProjects.map(v => v: Parser[String]).reduce(_ | _)

@@ -2057,7 +2057,7 @@ buildStdLib := Def.inputTaskDyn {
}.evaluated

lazy val pkgStdLibInternal = inputKey[Unit]("Use `buildStdLib`")
pkgStdLibInternal := Def.inputTaskDyn {
pkgStdLibInternal := Def.inputTask {
val cmd = allStdBits.parsed
val root = engineDistributionRoot.value
val log: sbt.Logger = streams.value.log
@@ -2073,15 +2073,27 @@ pkgStdLibInternal := Def.inputTaskDyn {
(`std-image` / Compile / packageBin).value
case "Table" =>
(`std-table` / Compile / packageBin).value
case "All" =>
(`std-base` / Compile / packageBin).value
(`std-table` / Compile / packageBin).value
(`std-database` / Compile / packageBin).value
(`std-image` / Compile / packageBin).value
(`std-google-api` / Compile / packageBin).value
case _ =>
}
StdBits.buildStdLibPackage(
cmd,
root,
cacheFactory,
log,
defaultDevEnsoVersion
)
val libs = if (cmd != "All") Seq(cmd) else {
val prefix = "Standard."
Editions.standardLibraries.filter(_.startsWith(prefix)).map(_.stripPrefix(prefix))
}
libs.foreach { lib =>
StdBits.buildStdLibPackage(
lib,
root,
cacheFactory,
log,
defaultDevEnsoVersion
)
}
}.evaluated

lazy val buildLauncherDistribution =
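The library selection added to `pkgStdLibInternal` above can be illustrated with a small Python sketch. It is not the sbt code itself: the function name `libraries_to_build` and the sample library list are hypothetical stand-ins for `Editions.standardLibraries`, and the sketch only mirrors the prefix filtering and stripping shown in the diff.

```python
def libraries_to_build(cmd, standard_libraries):
    """Return the short library names to package for a given command.

    Any command other than "All" selects exactly that library. "All" expands
    to every library whose full name starts with "Standard.", with the prefix
    stripped (mirroring the filter/stripPrefix step in the diff).
    """
    if cmd != "All":
        return [cmd]
    prefix = "Standard."
    return [
        name[len(prefix):]
        for name in standard_libraries
        if name.startswith(prefix)
    ]


# Hypothetical sample input; the real list comes from the build definition.
sample = ["Standard.Base", "Standard.Table", "Other.Thing"]
print(libraries_to_build("Table", sample))
print(libraries_to_build("All", sample))
```

Keeping the expansion in one place means each selected library goes through the same packaging call, which is what the `libs.foreach` loop in the diff does.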
@@ -2,6 +2,8 @@ from Standard.Base import all

from Standard.Base.Data.Filter_Condition.Filter_Condition import all

polyglot java import org.enso.base.Regex_Utils

type Filter_Condition
## Is less than a value (or another column, in case of Table operations)?
Less than:Any
@@ -57,6 +59,52 @@ type Filter_Condition
## Is the value equal to False (Boolean only)?
Is_False

## Is equal to "" or Nothing (Text only)?
Is_Empty

## Is not equal to "" and Nothing (Text only)?
Not_Empty

## Does the value match the SQL pattern (Text only)?

It accepts a Text value representing the matching pattern. In case of
Table operations, it can accept another column - then the corresponding
values from the source column and the provided column are checked.

The pattern is interpreted according to the standard SQL convention:
- the `%` character matches any sequence of characters,
- the `_` character matches any single character,
- any other character is matched literally.

! Known Bugs
There is a known bug in Java Regex where escape characters are not
handled properly in Unicode-normalized matching mode. Due to this
limitation, Unicode normalization has been disabled for this function,
so beware that some equivalent graphemes like 'ś' and 's\u0301' will
not be matched.
See https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8032926
Like pattern:Text

## Does the value not match the SQL pattern (Text only)?

It accepts a Text value representing the matching pattern. In case of
Table operations, it can accept another column - then the corresponding
values from the source column and the provided column are checked.

The pattern is interpreted according to the standard SQL convention:
- the `%` character matches any sequence of characters,
- the `_` character matches any single character,
- any other character is matched literally.

! Known Bugs
There is a known bug in Java Regex where escape characters are not
handled properly in Unicode-normalized matching mode. Due to this
limitation, Unicode normalization has been disabled for this function,
so beware that some equivalent graphemes like 'ś' and 's\u0301' will
not be matched.
See https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8032926
Not_Like pattern:Text

## Converts a `Filter_Condition` condition into a predicate taking an
element and returning a value indicating whether the element should be
accepted by the filter.
@@ -80,3 +128,25 @@ type Filter_Condition
_ -> True
Is_True -> ==True
Is_False -> ==False
Is_Empty -> elem -> case elem of
Nothing -> True
"" -> True
_ -> False
Not_Empty -> elem -> case elem of
Nothing -> False
"" -> False
_ -> True
Like sql_pattern ->
regex = sql_like_to_regex sql_pattern
regex.matches
Not_Like sql_pattern ->
regex = sql_like_to_regex sql_pattern
elem -> regex.matches elem . not

## PRIVATE
sql_like_to_regex sql_pattern =
regex_pattern = Regex_Utils.sql_like_pattern_to_regex sql_pattern
## There is a bug with Java Regex in Unicode normalized mode (CANON_EQ) with quoting.
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8032926
Once that bug is fixed, `match_ascii` may be set back to `False`.
Regex.compile regex_pattern dot_matches_newline=True match_ascii=True
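As a rough illustration of the behavior documented for `Like`, `Not_Like`, `Is_Empty` and `Not_Empty` above, here is a minimal Python sketch. It is not the Java `Regex_Utils.sql_like_pattern_to_regex` implementation, whose source is not shown here; the function names are hypothetical. It assumes that the whole value must match the pattern, that `%` and `_` may match newlines (as `dot_matches_newline=True` suggests), and that no Unicode normalization is applied.

```python
import re


def sql_like_to_regex(sql_pattern):
    """Translate an SQL LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in sql_pattern:
        if ch == "%":
            parts.append(".*")           # any sequence of characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    # DOTALL lets the wildcards match newline characters too.
    return re.compile("".join(parts), re.DOTALL)


def like(value, sql_pattern):
    """True if the entire value matches the SQL LIKE pattern."""
    return sql_like_to_regex(sql_pattern).fullmatch(value) is not None


def not_like(value, sql_pattern):
    return not like(value, sql_pattern)


def is_empty(value):
    """True for None (standing in for Nothing) and for the empty string."""
    return value is None or value == ""


def not_empty(value):
    return not is_empty(value)
```

Because no normalization is applied, canonically equivalent strings such as `'\u015b'` and `'s\u0301'` do not match each other in this sketch, which is the same limitation the documentation comment describes. When one pattern is applied to many values, compiling it once and reusing the compiled object avoids repeated work.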
75 changes: 57 additions & 18 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Data/Column.enso
@@ -113,6 +113,34 @@ type Column
to_sql : SQL_Statement
to_sql self = self.to_table.to_sql

## PRIVATE
Sets up an operation of arbitrary arity.

Arguments:
- op_kind: The kind of the operation
- operands: A vector of additional operation arguments (the column itself
is always passed as the first argument).
- new_type: The type of the SQL column that results from applying the
operator. If not specified, the type of this column is used.
- operand_types: The SQL types of the additional arguments. They are used
if additional arguments are constants (and if not provided, the type of
this column is used). If the other argument is a column, its type is
used.
make_op self op_kind operands new_type=Nothing operand_types=Nothing =
prepare_operand operand operand_type = case operand of
other_column : Column ->
if Helpers.check_integrity self other_column then other_column.expression else
Error.throw <| Unsupported_Database_Operation_Error "Cannot use columns coming from different contexts in one expression without a join."
constant ->
actual_operand_type = operand_type.if_nothing self.sql_type
Expression.Constant actual_operand_type constant
actual_operand_types = operand_types.if_nothing (Vector.fill operands.length Nothing)
expressions = operands.zip actual_operand_types prepare_operand

actual_new_type = new_type.if_nothing self.sql_type
new_expr = Expression.Operation op_kind ([self.expression] + expressions)
Column.Value self.name self.connection actual_new_type new_expr self.context

## PRIVATE

Creates a binary operation with given kind and operand.
@@ -129,20 +157,7 @@ type Column
defaults to the current type if not provided.
make_binary_op : Text -> Text -> (Column | Any) -> (SQL_Type | Nothing) -> (SQL_Type | Nothing) -> Column
make_binary_op self op_kind operand new_type=Nothing operand_type=Nothing =
actual_new_type = new_type.if_nothing self.sql_type
case operand of
Column.Value _ _ _ other_expr _ ->
case Helpers.check_integrity self operand of
False ->
Error.throw <| Unsupported_Database_Operation_Error "Cannot compare columns coming from different contexts. Only columns of a single table can be compared."
True ->
new_expr = Expression.Operation op_kind [self.expression, other_expr]
Column.Value self.name self.connection actual_new_type new_expr self.context
_ ->
actual_operand_type = operand_type.if_nothing self.sql_type
other = Expression.Constant actual_operand_type operand
new_expr = Expression.Operation op_kind [self.expression, other]
Column.Value self.name self.connection actual_new_type new_expr self.context
self.make_op op_kind [operand] new_type [operand_type]

## PRIVATE

@@ -153,10 +168,7 @@ type Column
- new_type: The type of the SQL column that results from applying the
operator.
make_unary_op : Text -> Text -> (SQL_Type | Nothing) -> Column
make_unary_op self op_kind new_type=Nothing =
actual_new_type = new_type.if_nothing self.sql_type
new_expr = Expression.Operation op_kind [self.expression]
Column.Value self.name self.connection actual_new_type new_expr self.context
make_unary_op self op_kind new_type=Nothing = self.make_op op_kind [] new_type

## UNSTABLE

@@ -314,6 +326,22 @@ type Column
< : Column | Any -> Column
< self other = self.make_binary_op "<" other new_type=SQL_Type.boolean

## Element-wise inclusive bounds check.

Arguments:
- lower: The lower bound to compare elements of `self` against. If
`lower` is a column, the comparison is performed pairwise between
corresponding elements of `self` and `lower`.
- upper: The upper bound to compare elements of `self` against. If
`upper` is a column, the comparison is performed pairwise between
corresponding elements of `self` and `upper`.

Returns a column with boolean values indicating whether values of this
column fit between the lower and upper bounds (both ends inclusive).
between : (Column | Any) -> (Column | Any) -> Column
between self lower upper =
Review comment (Contributor): Suggestion: maybe we could consider adding a parameter for cases where the check shouldn't be inclusive.

Reply (Member Author): The SQL BETWEEN that this is more or less supposed to emulate is always inclusive. So if we added an option, we'd have to either make it unsupported on the Database backend or emulate it using regular comparison operators.

But we can already apply two filters, Greater and Less, to get an exclusive between.

So I'm not sure it is necessary to add it. I guess it's mostly a question for @jdunkerley, who designed the original set of Filter_Conditions.

Reply (Member): I was aiming to keep BETWEEN simple and just match SQL's behavior, on the view that this is what our users would expect.

self.make_op "BETWEEN" [lower, upper] new_type=SQL_Type.boolean

## UNSTABLE

Element-wise addition.
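The point raised in the review thread on `between`, that an exclusive range check can be obtained by chaining a greater-than filter and a less-than filter, can be sketched in Python as follows. This is an illustration on plain lists, not the Enso `Filter_Condition` API; the function names are hypothetical.

```python
def filter_between(values, lower, upper):
    """Inclusive range filter, matching SQL BETWEEN semantics."""
    return [v for v in values if lower <= v <= upper]


def filter_exclusive_between(values, lower, upper):
    """Exclusive range filter built from two successive filters."""
    greater = [v for v in values if v > lower]  # first filter: Greater
    return [v for v in greater if v < upper]    # second filter: Less


data = [1, 2, 3, 4, 5]
print(filter_between(data, 2, 4))
print(filter_exclusive_between(data, 2, 4))
```

Composing two simple filters keeps the inclusive check aligned with SQL while still covering the exclusive case without an extra parameter.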
@@ -407,6 +435,12 @@ type Column
is_missing : Column
is_missing self = self.make_unary_op "ISNULL" new_type=SQL_Type.boolean

## PRIVATE
Returns a column of booleans, with `True` items at the positions where
this column contains an empty string or `Nothing`.
is_empty : Column
is_empty self = self.make_unary_op "ISEMPTY" new_type=SQL_Type.boolean

## UNSTABLE

Returns a new column where missing values have been replaced with the
@@ -517,6 +551,11 @@ type Column
contains : Column | Text -> Column
contains self other = self.make_binary_op "contains" other new_type=SQL_Type.boolean

## PRIVATE
Checks for each element of the column if it matches an SQL-like pattern.
like : Column | Text -> Column
like self other = self.make_binary_op "LIKE" other new_type=SQL_Type.boolean

## PRIVATE
as_internal : Internal_Column
as_internal self = Internal_Column.Value self.name self.sql_type self.expression
@@ -168,15 +168,39 @@ base_dialect =
bin = name -> [name, make_binary_op name]
unary = name -> [name, make_unary_op name]
fun = name -> [name, make_function name]

arith = [bin "+", bin "-", bin "*", bin "/"]
logic = [bin "AND", bin "OR", unary "NOT"]
compare = [bin "=", bin "!=", bin "<", bin ">", bin "<=", bin ">="]
compare = [bin "=", bin "!=", bin "<", bin ">", bin "<=", bin ">=", ["BETWEEN", make_between]]
agg = [fun "MAX", fun "MIN", fun "AVG", fun "SUM"]
counts = [fun "COUNT", ["COUNT_ROWS", make_constant "COUNT(*)"]]
text = [["ISEMPTY", make_is_empty], bin "LIKE"]
nulls = [["ISNULL", make_right_unary_op "IS NULL"], ["FILLNULL", make_function "COALESCE"]]
base_map = Map.from_vector (arith + logic + compare + agg + nulls + counts)
base_map = Map.from_vector (arith + logic + compare + agg + counts + text + nulls)
Internal_Dialect.Value base_map wrap_in_quotes

## PRIVATE
make_is_empty : Vector Builder -> Builder
make_is_empty arguments = case arguments.length of
1 ->
arg = arguments.at 0
is_null = (arg ++ " IS NULL").paren
is_empty = (arg ++ " = ''").paren
(is_null ++ " OR " ++ is_empty).paren
_ ->
Error.throw <| Illegal_State_Error_Data ("Invalid amount of arguments for operation ISEMPTY")

## PRIVATE
make_between : Vector Builder -> Builder
make_between arguments = case arguments.length of
3 ->
expr = arguments.at 0
lower = arguments.at 1
upper = arguments.at 2
(expr ++ " BETWEEN " ++ lower ++ " AND " ++ upper).paren
_ ->
Error.throw <| Illegal_State_Error_Data ("Invalid amount of arguments for operation BETWEEN")

## PRIVATE

Builds code for an expression.
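To make the SQL produced by `make_is_empty` and `make_between` above easier to picture, here is a minimal Python sketch that builds the same fragments from plain strings. It is only an approximation: the real code works on `Builder` values rather than strings, and the helper `paren` plus the use of `ValueError` are assumptions made for this illustration.

```python
def paren(sql):
    """Wrap an SQL fragment in parentheses."""
    return "(" + sql + ")"


def make_is_empty(arguments):
    """Build `((x IS NULL) OR (x = ''))` for a single argument."""
    if len(arguments) != 1:
        raise ValueError("Invalid number of arguments for operation ISEMPTY")
    arg = arguments[0]
    is_null = paren(arg + " IS NULL")
    is_empty = paren(arg + " = ''")
    return paren(is_null + " OR " + is_empty)


def make_between(arguments):
    """Build `(x BETWEEN lower AND upper)` for exactly three arguments."""
    if len(arguments) != 3:
        raise ValueError("Invalid number of arguments for operation BETWEEN")
    expr, lower, upper = arguments
    return paren(expr + " BETWEEN " + lower + " AND " + upper)


print(make_is_empty(['"name"']))
print(make_between(['"price"', "1", "10"]))
```

The extra parentheses around each fragment keep operator precedence unambiguous when the result is embedded in a larger expression, for example under `NOT` or combined with `AND`.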
27 changes: 27 additions & 0 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Data/Column.enso
@@ -247,6 +247,21 @@ type Column
< : Column | Any -> Column
< self other = run_vectorized_binary_op self "<" (<) other

## Element-wise inclusive bounds check.

Arguments:
- lower: The lower bound to compare elements of `self` against. If
`lower` is a column, the comparison is performed pairwise between
corresponding elements of `self` and `lower`.
- upper: The upper bound to compare elements of `self` against. If
`upper` is a column, the comparison is performed pairwise between
corresponding elements of `self` and `upper`.

Returns a column with boolean values indicating whether values of this
column fit between the lower and upper bounds (both ends inclusive).
between : (Column | Any) -> (Column | Any) -> Column
between self lower upper = (self >= lower) && (self <= upper)

## ALIAS Add Columns

Element-wise addition.
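The in-memory `between` above is defined as `(self >= lower) && (self <= upper)`, where each bound may be a single value or another column compared pairwise. A minimal Python sketch of that element-wise behavior on plain lists follows; the names `broadcast` and `between` are hypothetical, and the handling of missing values in the real implementation is not modeled here.

```python
def broadcast(operand, length):
    """Turn a scalar bound into a list; pass a list bound through unchanged."""
    if isinstance(operand, list):
        if len(operand) != length:
            raise ValueError("Bound column must have the same length")
        return operand
    return [operand] * length


def between(column, lower, upper):
    """Element-wise inclusive check: (value >= lower) and (value <= upper)."""
    lowers = broadcast(lower, len(column))
    uppers = broadcast(upper, len(column))
    return [lo <= v <= hi for v, lo, hi in zip(column, lowers, uppers)]


print(between([1, 5, 10], 2, 10))
print(between([1, 5, 10], [0, 6, 10], [1, 7, 10]))
```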
@@ -444,6 +459,12 @@ type Column
is_missing : Column
is_missing self = run_vectorized_unary_op self "is_missing" (== Nothing)

## PRIVATE
Returns a column of booleans, with `True` items at the positions where
this column contains an empty string or `Nothing`.
is_empty : Column
is_empty self = run_vectorized_unary_op self "is_empty" Filter_Condition.Is_Empty.to_predicate

## Returns a column of booleans, with `True` items at the positions where
this column does not contain a `Nothing`.

@@ -564,6 +585,12 @@ type Column
contains self other =
run_vectorized_binary_op self "contains" (a -> b -> a.contains b) other

## PRIVATE
Checks for each element of the column if it matches an SQL-like pattern.
like : Column | Text -> Column
like self other =
run_vectorized_binary_op self "like" (_ -> _ -> Error.throw (Illegal_State_Error "The `Like` operation should only be used on Text columns.")) other

## ALIAS Transform Column

Applies `function` to each item in this column and returns the column