Add simple parts of Table.take and Table.drop functions to Database table #7615

Merged
merged 35 commits into from
Aug 31, 2023
Changes from 28 commits
Commits
8a64f97
filter arn works
GregoryTravis Aug 15, 2023
4a5de55
bools but still window problem
GregoryTravis Aug 15, 2023
26c2fe3
moved into arn
GregoryTravis Aug 16, 2023
0e8fc52
wip
GregoryTravis Aug 16, 2023
ca237ed
Merge branch 'develop' into wip/gmt/5131-db-take-drop
GregoryTravis Aug 17, 2023
92d5b78
tests pass
GregoryTravis Aug 17, 2023
160a1d8
wip
GregoryTravis Aug 18, 2023
dfdff75
pg pass
GregoryTravis Aug 18, 2023
52609c4
cleanup
GregoryTravis Aug 18, 2023
41b66f8
no order test
GregoryTravis Aug 18, 2023
33a6cdb
no order test in-mem
GregoryTravis Aug 18, 2023
716efef
collect_ranges
GregoryTravis Aug 18, 2023
ebca4c6
should_equals
GregoryTravis Aug 18, 2023
2bd8427
rm boolean
GregoryTravis Aug 18, 2023
24a98c8
with_temp col
GregoryTravis Aug 18, 2023
98a1bbc
docs
GregoryTravis Aug 18, 2023
7ae1cb0
filter_on_predicate_column
GregoryTravis Aug 18, 2023
6d4cf4c
docs
GregoryTravis Aug 18, 2023
22ff497
cleanup, changelog
GregoryTravis Aug 18, 2023
ea602c3
asdf
GregoryTravis Aug 18, 2023
8aab397
review
GregoryTravis Aug 21, 2023
36e63e8
reiew
GregoryTravis Aug 21, 2023
0418e2a
review
GregoryTravis Aug 21, 2023
dce8a11
review
GregoryTravis Aug 21, 2023
8e97f7a
Merge branch 'develop' into wip/gmt/5131-db-take-drop
GregoryTravis Aug 21, 2023
6e6a8fa
take 0
GregoryTravis Aug 22, 2023
d799cbb
after agg test
GregoryTravis Aug 22, 2023
c812477
cleanup
GregoryTravis Aug 22, 2023
cc74bc4
Update distribution/lib/Standard/Database/0.0.0-dev/src/Data/Take_Dro…
GregoryTravis Aug 22, 2023
14d69d6
Update distribution/lib/Standard/Database/0.0.0-dev/src/Data/Table.enso
GregoryTravis Aug 22, 2023
394ef74
merge
GregoryTravis Aug 22, 2023
8697bfd
merge
GregoryTravis Aug 23, 2023
92c788f
Merge branch 'develop' into wip/gmt/5131-db-take-drop
GregoryTravis Aug 29, 2023
4b00414
Merge branch 'develop' into wip/gmt/5131-db-take-drop
GregoryTravis Aug 30, 2023
640c63a
merge
GregoryTravis Aug 31, 2023
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -553,6 +553,7 @@
- [Retire `Column_Selector` and allow regex based selection of columns.][7295]
- [`Text.parse_to_table` can take a `Regex`.][7297]
- [Expose `Text.normalize`.][7425]
- [Added `take` and `drop` to database tables.][7615]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -786,6 +787,7 @@
[7295]: https://github.com/enso-org/enso/pull/7295
[7297]: https://github.com/enso-org/enso/pull/7297
[7425]: https://github.com/enso-org/enso/pull/7425
[7615]: https://github.com/enso-org/enso/pull/7615

#### Enso Compiler

Original file line number Diff line number Diff line change
@@ -1009,10 +1009,7 @@ type Column
- range: The selection of rows from the table to return.
@range Index_Sub_Range.default_widget
take : (Index_Sub_Range | Range | Integer) -> Column
take self range=(First 1) =
_ = range
msg = "`Column.take` is not yet implemented."
Error.throw (Unsupported_Database_Operation.Error msg)
take self range=(First 1) = self.to_table.take range . at 0

## Creates a new Column from the input with the specified range of rows
removed.
@@ -1021,10 +1018,7 @@ type Column
- range: The selection of rows from the table to remove.
@range Index_Sub_Range.default_widget
drop : (Index_Sub_Range | Range | Integer) -> Column
drop self range=(First 1) =
_ = range
msg = "`Column.drop` is not yet implemented."
Error.throw (Unsupported_Database_Operation.Error msg)
drop self range=(First 1) = self.to_table.drop range . at 0

## Checks for each element of the column if it starts with `other`.

Original file line number Diff line number Diff line change
@@ -40,6 +40,7 @@ import project.Data.Column.Column
import project.Data.SQL_Query.SQL_Query
import project.Data.SQL_Statement.SQL_Statement
import project.Data.SQL_Type.SQL_Type
import project.Data.Take_Drop_Helpers
import project.Internal.Aggregate_Helper
import project.Internal.Base_Generator
import project.Internal.Common.Database_Join_Helper
@@ -53,6 +54,7 @@ import project.Internal.IR.Query.Query
import project.Internal.IR.SQL_Expression.SQL_Expression
import project.Internal.IR.SQL_Join_Kind.SQL_Join_Kind
import project.Internal.SQL_Type_Reference.SQL_Type_Reference
from project.Data.Take_Drop_Helpers import Take_Drop
from project.Errors import Integrity_Error, Table_Not_Found, Unsupported_Database_Operation

polyglot java import java.sql.JDBCType
@@ -596,25 +598,63 @@ type Table

Arguments:
- range: The selection of rows from the table to return.

For the purposes of the `Index_Sub_Range.While` predicate a single
"element" of the table is represented by the `Row` type.

? Supported Range Types

Database backends support all range types except `While` and `Sample`.

In-memory tables support all range types.

> Example
Take first 10 rows of the table.

table.take (First 10)

> Example
Take rows from the top of the table as long as their values sum to 10.

table.take (While row-> row.to_vector.compute Statistic.Sum == 10)
@range Index_Sub_Range.default_widget
take : (Index_Sub_Range | Range | Integer) -> Table
take self range=(First 1) =
_ = range
msg = "`Table.take` is not yet implemented."
Error.throw (Unsupported_Database_Operation.Error msg)
Take_Drop_Helpers.take_drop_helper Take_Drop.Take self range

## Creates a new Table from the input with the specified range of rows
removed.


Arguments:
- range: The selection of rows from the table to remove.

For the purposes of the `Index_Sub_Range.While` predicate a single
"element" of the table is represented by the `Row` type.

? Supported Range Types

Database backends support all range types except `While` and `Sample`.

In-memory tables support all range types.

> Example
Drop first 10 rows of the table.

table.drop (First 10)

> Example
Drop rows from the top of the table as long as their values sum to 10.

table.drop (While row-> row.to_vector.compute Statistic.Sum == 10)
@range Index_Sub_Range.default_widget
drop : (Index_Sub_Range | Range | Integer) -> Table
drop self range=(First 1) =
_ = range
msg = "`Table.drop` is not yet implemented."
Error.throw (Unsupported_Database_Operation.Error msg)
Take_Drop_Helpers.take_drop_helper Take_Drop.Drop self range

## PRIVATE
Filter out all rows.
remove_all_rows : Table
remove_all_rows self = self.filter_by_expression "0=1"

## ALIAS rank, record id, add index column
Adds a new column to the table enumerating the rows.
@@ -684,7 +724,8 @@ type Table
rebuild_table columns =
self.updated_columns (columns.map .as_internal)
renamed_table = Add_Row_Number.rename_columns_if_needed self name on_problems rebuild_table
renamed_table.updated_columns (renamed_table.internal_columns + [new_column])
updated_table = renamed_table.updated_columns (renamed_table.internal_columns + [new_column])
updated_table.as_subquery


## UNSTABLE
@@ -838,6 +879,28 @@ type Table
new_type_ref = SQL_Type_Reference.from_constant sql_type
Column.Value ("Constant_" + UUID.randomUUID.to_text) self.connection new_type_ref expr self.context

## PRIVATE
Create a unique temporary column name.
make_temp_column_name : Text
make_temp_column_name self = self.column_naming_helper.make_temp_column_name self.column_names

## PRIVATE
Run a table transformer with a temporary column added.
with_temporary_column : Column -> (Text -> Table -> Table) -> Table
with_temporary_column self new_column:Column f:(Text -> Table -> Table) =
new_column_name = self.make_temp_column_name
with_new_column = self.set new_column new_column_name set_mode=Set_Mode.Add
modified_table = f new_column_name with_new_column
modified_table.remove_columns new_column_name

## PRIVATE
Filter a table on a boolean column. The column does not have to be part
of the table, but it must be derived from it and share a context.
filter_on_predicate_column : Column -> Table
filter_on_predicate_column self predicate_column =
self.with_temporary_column predicate_column name-> table->
table.filter name Filter_Condition.Is_True
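
The temporary-column pattern used by `with_temporary_column` and `filter_on_predicate_column` can be modeled outside Enso. The following is a minimal Python sketch, not the library's implementation: the dictionary-based `Table` stand-in and the `temp_N` naming scheme are assumptions made for illustration only.

```python
from typing import Callable, Dict, List

# Stand-in for a table: column name -> list of values (an assumption of this sketch).
Table = Dict[str, List]


def with_temporary_column(table: Table, values: List,
                          f: Callable[[str, Table], Table]) -> Table:
    # Pick a name that does not clash with existing columns.
    # The "temp_N" suffix scheme is hypothetical.
    name = "temp"
    n = 0
    while name in table:
        n += 1
        name = f"temp_{n}"
    widened = {**table, name: values}   # add the temporary column
    result = f(name, widened)           # run the transformer on the widened table
    # Remove the temporary column before returning.
    return {k: v for k, v in result.items() if k != name}


def filter_on_predicate_column(table: Table, predicate: List[bool]) -> Table:
    def keep_true(name: str, t: Table) -> Table:
        mask = t[name]
        return {k: [x for x, m in zip(col, mask) if m is True]
                for k, col in t.items()}
    return with_temporary_column(table, predicate, keep_true)


result = filter_on_predicate_column(
    {"X": [1, 2, 3], "temp": ["a", "b", "c"]},
    [True, False, True],
)
```

In this model the caller never sees the temporary column, even when the input already has a column named `temp`.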

## Returns the vector of columns contained in this table.
columns : Vector Column
columns self = Vector.from_polyglot_array <|
@@ -2026,6 +2089,12 @@ type Table
False ->
Table.Value self.name self.connection internal_columns ctx

## PRIVATE
Nests a table as a subquery, using `updated_context_and_columns`, which
causes its columns to be referenced as names rather than expressions.
as_subquery : Table
as_subquery self = self.updated_context_and_columns self.context self.internal_columns subquery=True

## PRIVATE
Checks if this table is a 'trivial query'.

Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
from Standard.Base import all

import Standard.Base.Data.Index_Sub_Range as Index_Sub_Range_Module
import Standard.Base.Errors.Illegal_Argument.Illegal_Argument
import Standard.Base.Errors.Illegal_State.Illegal_State
from Standard.Base.Data.Index_Sub_Range import normalize_ranges, resolve_ranges, sort_and_merge_ranges

from Standard.Table import Set_Mode

import project.Data.Column.Column
import project.Data.Table.Table
from project.Errors import Unsupported_Database_Operation

type Take_Drop
    Take
    Drop

## PRIVATE
   Apply `take` or `drop` to a table, returning the rows specified by the
   selector.
take_drop_helper : Take_Drop -> Table -> (Index_Sub_Range | Range | Integer) -> Table
take_drop_helper take_drop table selector =
    check_supported selector <|
        length = table.row_count
        ranges = cleanup_ranges (collect_ranges take_drop length selector)

        if ranges.is_empty then table.remove_all_rows else
            # Filter on row column. Add the row column at the start, remove it at the end.
            row_column_name = table.make_temp_column_name
            table_with_row_number = table.add_row_number name=row_column_name from=0

            subqueries = ranges.map range->
                generate_subquery table_with_row_number row_column_name range
            combined = subqueries.reduce (a-> b-> a.union b)
            combined.remove_columns row_column_name
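
The overall flow of `take_drop_helper` (number the rows, filter once per range, combine the results, drop the row number) can be modeled as follows. This is a hedged Python sketch over plain lists, not a description of the generated SQL: `take_rows` is a hypothetical name, and concatenating in range order is an assumption that says nothing about the row order a database returns from a union.

```python
from typing import List, Sequence


def take_rows(rows: Sequence, ranges: List[range]) -> list:
    # Drop empty ranges, mirroring cleanup_ranges.
    ranges = [r for r in ranges if r.stop > r.start]
    if not ranges:
        return []  # corresponds to remove_all_rows
    # Attach a 0-based row number, like add_row_number with from=0.
    numbered = list(enumerate(rows))
    combined = []
    for r in ranges:
        # One filtered "subquery" per range, appended to the running union.
        combined.extend((i, row) for i, row in numbered if i in r)
    # Remove the row number again before returning.
    return [row for _, row in combined]


selected = take_rows(list("abcdef"), [range(0, 2), range(4, 6)])
```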

## PRIVATE
   Turn the selector into a vector of ranges
collect_ranges : Take_Drop -> Integer -> (Index_Sub_Range | Range | Integer) -> Vector Range
collect_ranges take_drop length selector =
    at _ = Panic.throw (Illegal_State.Error "Impossible: at called in Database take/drop. This is a bug in the Database library.")
    single_slice s e = [Range.new s e]
    slice_ranges selectors =
        slice_range selector = case selector of
            i : Integer -> Range.new i i+1
            r : Range -> r
        selectors.map slice_range
    helper = case take_drop of
        Take_Drop.Take -> Index_Sub_Range_Module.take_helper
        Take_Drop.Drop -> Index_Sub_Range_Module.drop_helper
    helper length at single_slice slice_ranges selector
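
To illustrate what "turning a selector into ranges" means, here is a simplified Python model covering only `First` and `Last` selectors. It does not reproduce `take_helper` or `drop_helper`. The string-based `mode` and `kind` parameters are hypothetical, and clamping the count to the table length is an assumption of this sketch.

```python
from typing import List


def collect_ranges(mode: str, length: int, kind: str, count: int) -> List[range]:
    # Clamp the count to the table length (assumed behaviour).
    count = max(0, min(count, length))
    if kind not in ("first", "last"):
        raise ValueError(f"unsupported selector kind: {kind}")
    if mode == "take":
        # Keep the selected rows.
        return [range(0, count)] if kind == "first" else [range(length - count, length)]
    if mode == "drop":
        # Keep everything except the selected rows.
        return [range(count, length)] if kind == "first" else [range(0, length - count)]
    raise ValueError(f"unsupported mode: {mode}")


take_first_two = collect_ranges("take", 5, "first", 2)
drop_last_two = collect_ranges("drop", 5, "last", 2)
```

A selector that covers no rows yields an empty range here, which is the case `cleanup_ranges` filters out.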

## PRIVATE
   Throw Unsupported_Database_Operation for selectors that are not supported by database backends.
check_supported : (Index_Sub_Range | Range | Integer) -> Any -> Any | Unsupported_Database_Operation
check_supported selector ~cont =
    err =
        msg = selector.to_display_text + " is not supported for database backends"
        Error.throw (Unsupported_Database_Operation.Error msg)

    case selector of
        Index_Sub_Range.While _ -> err
        Index_Sub_Range.Sample _ _ -> err
        _ -> cont

## PRIVATE
   Remove empty ranges.
cleanup_ranges : Vector Range -> Vector Range
cleanup_ranges ranges:(Vector Range) =
    ranges.filter (range-> range.end > range.start)

## PRIVATE
   Filter a table with a single range. Returns only those rows whose row
   column value falls within the range.
generate_subquery : Table -> Text -> Range -> Table
generate_subquery table row_column_name range =
    case range.step of
        1 ->
            filter_condition = Filter_Condition.Between range.start range.end-1
            table.filter row_column_name filter_condition
        _ ->
            table.filter_on_predicate_column ((((table.at row_column_name - range.start) % range.step) == 0) && (table.at row_column_name < range.end))
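
The membership test for a stepped range can be checked with a small Python model. The function name `in_stepped_range` is hypothetical. The sketch spells out the lower bound `row >= start` explicitly, which is an addition here: the expression in the diff tests only the modulo condition and the upper bound.

```python
def in_stepped_range(row: int, start: int, end: int, step: int) -> bool:
    # A row belongs to the range when it is at or after the start,
    # a whole number of steps away from it, and strictly before the end.
    return row >= start and (row - start) % step == 0 and row < end


stepped = [i for i in range(10) if in_stepped_range(i, 3, 9, 2)]
# With step 1 the test reduces to an inclusive "between start and end - 1".
contiguous = [i for i in range(10) if in_stepped_range(i, 2, 5, 1)]
```

Without the lower bound, a row such as 1 would satisfy `(1 - 3) % 2 == 0` in this model, which is why the sketch includes it.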
22 changes: 22 additions & 0 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Data/Table.enso
Original file line number Diff line number Diff line change
@@ -1192,6 +1192,12 @@ type Table
For the purposes of the `Index_Sub_Range.While` predicate a single
"element" of the table is represented by the `Row` type.

? Supported Range Types

Database backends support all range types except `While` and `Sample`.

In-memory tables support all range types.

> Example
Take first 10 rows of the table.

@@ -1215,6 +1221,12 @@ type Table
For the purposes of the `Index_Sub_Range.While` predicate a single
"element" of the table is represented by the `Row` type.

? Supported Range Types

Database backends support all range types except `While` and `Sample`.

In-memory tables support all range types.

> Example
Drop first 10 rows of the table.

@@ -1229,6 +1241,11 @@ type Table
drop self range=(First 1) =
Index_Sub_Range_Module.drop_helper self.row_count self.rows.at self.slice (slice_ranges self) range

## PRIVATE
Filter out all rows.
remove_all_rows : Table
remove_all_rows self = self.take 0

## ALIAS rank, record id, add index column
Adds a new column to the table enumerating the rows.

@@ -1372,6 +1389,11 @@ type Table
if Table_Helpers.is_column value then Error.throw (Illegal_Argument.Error "A constant value may only be created from a scalar, not a Column") else
Column.from_vector_repeated ("Constant_" + UUID.randomUUID.to_text) [value] self.row_count

## PRIVATE
Create a unique temporary column name.
make_temp_column_name : Text
make_temp_column_name self = self.column_naming_helper.make_temp_column_name self.column_names

## Returns the vector of columns contained in this table.

> Examples
Original file line number Diff line number Diff line change
@@ -169,3 +169,11 @@ type Column_Naming_Helper
in_memory : Column_Naming_Helper
in_memory =
Column_Naming_Helper.Value Unlimited_Naming_Properties.Instance

## PRIVATE
Create a column name based on "temp", renamed if needed to avoid clashing with the existing column names.
make_temp_column_name : Vector Text -> Text
make_temp_column_name self existing_column_names =
renamer = self.create_unique_name_strategy
renamer.mark_used existing_column_names
renamer.make_unique "temp"
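
The idea behind `make_temp_column_name` (mark the existing names as used, then ask for a unique variant of "temp") can be sketched in Python. This is an illustrative model only: the numeric suffix format is an assumption and may not match what the unique-name strategy in the library produces.

```python
from typing import Iterable


def make_temp_column_name(existing_column_names: Iterable[str], base: str = "temp") -> str:
    used = set(existing_column_names)  # mark existing names as used
    if base not in used:
        return base
    # Append an increasing counter until the name is free (suffix format is assumed).
    n = 1
    while f"{base} {n}" in used:
        n += 1
    return f"{base} {n}"


fresh = make_temp_column_name(["X", "Y"])
renamed = make_temp_column_name(["temp", "temp 1"])
```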
Original file line number Diff line number Diff line change
@@ -1,14 +1,15 @@
from Standard.Base import all

import Standard.Database.Extensions.Upload_Database_Table
import Standard.Database.Extensions.Upload_In_Memory_Table

from Standard.Table import Sort_Column
from Standard.Table.Data.Aggregate_Column.Aggregate_Column import Group_By, Sum
from Standard.Table.Errors import Missing_Input_Columns, Duplicate_Output_Column_Names, Floating_Point_Equality

from Standard.Test import Test, Problems
import Standard.Test.Extensions

import Standard.Database.Extensions.Upload_Database_Table
import Standard.Database.Extensions.Upload_In_Memory_Table

import project.Database.Helpers.Name_Generator
from project.Common_Table_Operations.Util import run_default_backend

@@ -94,6 +95,15 @@ spec setup =
t2.at "Y" . to_vector . should_equal [10, 20, 30, 40]
t2.at "Row" . to_vector . should_equal [1, 2, 3, 4]

Test.specify "Should work correctly after aggregation" <|
t0 = table_builder [["X", ["a", "b", "a", "c"]], ["Y", [1, 2, 4, 8]]]
t1 = t0.aggregate [Group_By "X", Sum "Y"]

t2 = t1.order_by "X" . add_row_number
t2.at "X" . to_vector . should_equal ['a', 'b', 'c']
t2.at "Sum Y" . to_vector . should_equal [5.0, 2.0, 8.0]
t2.at "Row" . to_vector . should_equal [1, 2, 3]

if setup.is_database.not then Test.group prefix+"Table.add_row_number (in-memory specific)" <|
Test.specify "should add a row numbering column" <|
t = table_builder [["X", ['a', 'b', 'a', 'a', 'c']]]
5 changes: 3 additions & 2 deletions test/Table_Tests/src/Common_Table_Operations/Main.enso
Original file line number Diff line number Diff line change
@@ -23,6 +23,7 @@ import project.Common_Table_Operations.Missing_Values_Spec
import project.Common_Table_Operations.Order_By_Spec
import project.Common_Table_Operations.Select_Columns_Spec
import project.Common_Table_Operations.Take_Drop_Spec
import project.Common_Table_Operations.Temp_Column_Spec
import project.Common_Table_Operations.Transpose_Spec

from project.Common_Table_Operations.Util import run_default_backend
@@ -77,7 +78,6 @@ type Test_Selection
- order_by_unicode_normalization_by_default: Specifies if the backend
supports unicode normalization in its default ordering.
- case_insensitive_ascii_only:
- take_drop: Specifies if the backend supports take/drop operations.
- allows_mixed_type_comparisons: Specifies if mixed operations comparing
mixed types are allowed by a given backend. Some backends will allow
such comparisons, when mixed type storage is allowed or by coercing to
@@ -105,7 +105,7 @@ type Test_Selection
columns.
- supported_replace_params: Specifies the possible values of
Replace_Params that a backend supports.
Config supports_case_sensitive_columns=True order_by=True natural_ordering=False case_insensitive_ordering=True order_by_unicode_normalization_by_default=False case_insensitive_ascii_only=False take_drop=True allows_mixed_type_comparisons=True supports_unicode_normalization=False is_nan_and_nothing_distinct=True distinct_returns_first_row_from_group_if_ordered=True date_time=True fixed_length_text_columns=False supports_decimal_type=False supports_time_duration=False supports_nanoseconds_in_time=False supports_mixed_columns=False supported_replace_params=Nothing
Config supports_case_sensitive_columns=True order_by=True natural_ordering=False case_insensitive_ordering=True order_by_unicode_normalization_by_default=False case_insensitive_ascii_only=False allows_mixed_type_comparisons=True supports_unicode_normalization=False is_nan_and_nothing_distinct=True distinct_returns_first_row_from_group_if_ordered=True date_time=True fixed_length_text_columns=False supports_decimal_type=False supports_time_duration=False supports_nanoseconds_in_time=False supports_mixed_columns=False supported_replace_params=Nothing

spec setup =
Core_Spec.spec setup
@@ -130,5 +130,6 @@ spec setup =
Transpose_Spec.spec setup
Add_Row_Number_Spec.spec setup
Integration_Tests.spec setup
Temp_Column_Spec.spec setup

main = run_default_backend spec