Add simple parts of Table.take and Table.drop functions to Database table (#7615)

Implements database Table and Column take/drop, except While and Sample.

Additional features and optimizations are in #7614.
GregoryTravis committed Aug 31, 2023
1 parent 24f263b commit 061876e
Showing 14 changed files with 395 additions and 118 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -560,6 +560,7 @@
- [Expose `Text.normalize`.][7425]
- [Implemented new value types (various sizes of `Integer` type, fixed-length
and length-limited `Char` type) for the in-memory `Table` backend.][7557]
- [Added `take` and `drop` to database tables.][7615]
- [Added ability to specify expected value type in `Column.from_vector`,
`Column.map` and `Column.zip`.][7637]

@@ -796,6 +797,7 @@
[7297]: https://github.com/enso-org/enso/pull/7297
[7425]: https://github.com/enso-org/enso/pull/7425
[7557]: https://github.com/enso-org/enso/pull/7557
[7615]: https://github.com/enso-org/enso/pull/7615
[7637]: https://github.com/enso-org/enso/pull/7637

#### Enso Compiler
Original file line number Diff line number Diff line change
@@ -1095,10 +1095,7 @@ type Column
- range: The selection of rows from the table to return.
@range Index_Sub_Range.default_widget
take : (Index_Sub_Range | Range | Integer) -> Column
- take self range=(First 1) =
- _ = range
- msg = "`Column.take` is not yet implemented."
- Error.throw (Unsupported_Database_Operation.Error msg)
+ take self range=(First 1) = self.to_table.take range . at 0

## GROUP Standard.Base.Selections
Creates a new Column from the input with the specified range of rows
@@ -1108,10 +1105,7 @@ type Column
- range: The selection of rows from the table to remove.
@range Index_Sub_Range.default_widget
drop : (Index_Sub_Range | Range | Integer) -> Column
- drop self range=(First 1) =
- _ = range
- msg = "`Column.drop` is not yet implemented."
- Error.throw (Unsupported_Database_Operation.Error msg)
+ drop self range=(First 1) = self.to_table.drop range . at 0

## GROUP Standard.Base.Text
Checks for each element of the column if it starts with `other`.
85 changes: 77 additions & 8 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Data/Table.enso
Original file line number Diff line number Diff line change
@@ -40,6 +40,7 @@ import project.Data.Column.Column
import project.Data.SQL_Query.SQL_Query
import project.Data.SQL_Statement.SQL_Statement
import project.Data.SQL_Type.SQL_Type
import project.Data.Take_Drop_Helpers
import project.Internal.Aggregate_Helper
import project.Internal.Base_Generator
import project.Internal.Common.Database_Join_Helper
@@ -53,6 +54,7 @@ import project.Internal.IR.Query.Query
import project.Internal.IR.SQL_Expression.SQL_Expression
import project.Internal.IR.SQL_Join_Kind.SQL_Join_Kind
import project.Internal.SQL_Type_Reference.SQL_Type_Reference
from project.Data.Take_Drop_Helpers import Take_Drop
from project.Errors import Integrity_Error, Table_Not_Found, Unsupported_Database_Operation

polyglot java import java.sql.JDBCType
@@ -610,26 +612,64 @@ type Table

Arguments:
- range: The selection of rows from the table to return.

For the purposes of the `Index_Sub_Range.While` predicate a single
"element" of the table is represented by the `Row` type.

? Supported Range Types

Database backends support all range types except `While` and `Sample`

In-memory tables support all range types.

> Example
Take first 10 rows of the table.

table.take (First 10)

> Example
Take rows from the top of the table as long as their values sum to 10.

table.take (While row-> row.to_vector.compute Statistic.Sum == 10)
@range Index_Sub_Range.default_widget
take : (Index_Sub_Range | Range | Integer) -> Table
take self range=(First 1) =
- _ = range
- msg = "`Table.take` is not yet implemented."
- Error.throw (Unsupported_Database_Operation.Error msg)
+ Take_Drop_Helpers.take_drop_helper Take_Drop.Take self range

## GROUP Standard.Base.Selections
Creates a new Table from the input with the specified range of rows
removed.


Arguments:
- range: The selection of rows from the table to remove.

For the purposes of the `Index_Sub_Range.While` predicate a single
"element" of the table is represented by the `Row` type.

? Supported Range Types

Database backends support all range types except `While` and `Sample`.

In-memory tables support all range types.

> Example
Drop first 10 rows of the table.

table.drop (First 10)

> Example
Drop rows from the top of the table as long as their values sum to 10.

table.drop (While row-> row.to_vector.compute Statistic.Sum == 10)
@range Index_Sub_Range.default_widget
drop : (Index_Sub_Range | Range | Integer) -> Table
drop self range=(First 1) =
- _ = range
- msg = "`Table.drop` is not yet implemented."
- Error.throw (Unsupported_Database_Operation.Error msg)
+ Take_Drop_Helpers.take_drop_helper Take_Drop.Drop self range

## PRIVATE
Filter out all rows.
remove_all_rows : Table
remove_all_rows self = self.filter_by_expression "0==1"

## ALIAS add index column, rank, record id
GROUP Standard.Base.Values
@@ -700,7 +740,8 @@ type Table
rebuild_table columns =
self.updated_columns (columns.map .as_internal)
renamed_table = Add_Row_Number.rename_columns_if_needed self name on_problems rebuild_table
- renamed_table.updated_columns (renamed_table.internal_columns + [new_column])
+ updated_table = renamed_table.updated_columns (renamed_table.internal_columns + [new_column])
+ updated_table.as_subquery


## UNSTABLE
@@ -856,6 +897,28 @@ type Table
new_type_ref = SQL_Type_Reference.from_constant sql_type
Column.Value ("Constant_" + UUID.randomUUID.to_text) self.connection new_type_ref expr self.context

## PRIVATE
Create a unique temporary column name.
make_temp_column_name : Text
make_temp_column_name self = self.column_naming_helper.make_temp_column_name self.column_names

## PRIVATE
Run a table transformer with a temporary column added.
with_temporary_column : Column -> (Text -> Table -> Table) -> Table
with_temporary_column self new_column:Column f:(Text -> Table -> Table) =
new_column_name = self.make_temp_column_name
with_new_column = self.set new_column new_column_name set_mode=Set_Mode.Add
modified_table = f new_column_name with_new_column
modified_table.remove_columns new_column_name

## PRIVATE
Filter a table on a boolean column. The column does not have to be part
of the table, but it must be derived from it and share a context.
filter_on_predicate_column : Column -> Table
filter_on_predicate_column self predicate_column =
self.with_temporary_column predicate_column name-> table->
table.filter name Filter_Condition.Is_True

## Returns the vector of columns contained in this table.
columns : Vector Column
columns self = Vector.from_polyglot_array <|
@@ -2067,6 +2130,12 @@ type Table
False ->
Table.Value self.name self.connection internal_columns ctx

## PRIVATE
Nests a table as a subquery, using `updated_context_and_columns`, which
causes its columns to be referenced as names rather than expressions.
as_subquery : Table
as_subquery self = self.updated_context_and_columns self.context self.internal_columns subquery=True

## PRIVATE
Checks if this table is a 'trivial query'.

Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
from Standard.Base import all

import Standard.Base.Data.Index_Sub_Range as Index_Sub_Range_Module
import Standard.Base.Errors.Illegal_Argument.Illegal_Argument
import Standard.Base.Errors.Illegal_State.Illegal_State
from Standard.Base.Data.Index_Sub_Range import normalize_ranges, resolve_ranges, sort_and_merge_ranges

from Standard.Table import Set_Mode

import project.Data.Column.Column
import project.Data.Table.Table
from project.Errors import Unsupported_Database_Operation

## PRIVATE
type Take_Drop
    ## PRIVATE
    Take

    ## PRIVATE
    Drop

## PRIVATE
   Apply `take` or `drop` to a table, returning the rows specified by the selector.
take_drop_helper : Take_Drop -> Table -> (Index_Sub_Range | Range | Integer) -> Table
take_drop_helper take_drop table selector =
    check_supported selector <|
        length = table.row_count
        ranges = cleanup_ranges (collect_ranges take_drop length selector)

        if ranges.is_empty then table.remove_all_rows else
            # Filter on row column. Add the row column at the start, remove it at the end.
            row_column_name = table.make_temp_column_name
            table_with_row_number = table.add_row_number name=row_column_name from=0

            subqueries = ranges.map range->
                generate_subquery table_with_row_number row_column_name range
            combined = subqueries.reduce (a-> b-> a.union b)
            combined.remove_columns row_column_name

## PRIVATE
   Turn the selector into a vector of ranges.
collect_ranges : Take_Drop -> Integer -> (Index_Sub_Range | Range | Integer) -> Vector Range
collect_ranges take_drop length selector =
    at _ = Panic.throw (Illegal_State.Error "Impossible: at called in Database take/drop. This is a bug in the Database library.")
    single_slice s e = [Range.new s e]
    slice_ranges selectors =
        slice_range selector = case selector of
            i : Integer -> Range.new i i+1
            r : Range -> r
        selectors.map slice_range
    helper = case take_drop of
        Take_Drop.Take -> Index_Sub_Range_Module.take_helper
        Take_Drop.Drop -> Index_Sub_Range_Module.drop_helper
    helper length at single_slice slice_ranges selector

## PRIVATE
   Throw Unsupported_Database_Operation for selectors that are not supported by database backends.
check_supported : (Index_Sub_Range | Range | Integer) -> Any -> Any | Unsupported_Database_Operation
check_supported selector ~cont =
    err =
        msg = selector.to_display_text + " is not supported for database backends"
        Error.throw (Unsupported_Database_Operation.Error msg)

    case selector of
        Index_Sub_Range.While _ -> err
        Index_Sub_Range.Sample _ _ -> err
        _ -> cont

## PRIVATE
   Remove empty ranges.
cleanup_ranges : Vector Range -> Vector Range
cleanup_ranges ranges:(Vector Range) =
    ranges.filter (range-> range.end > range.start)

## PRIVATE
   Filter a table with a single range. Returns only those rows whose row number falls within the range.
generate_subquery : Table -> Text -> Range -> Table
generate_subquery table row_column_name range =
    case range.step of
        1 ->
            filter_condition = Filter_Condition.Between range.start range.end-1
            table.filter row_column_name filter_condition
        _ ->
            table.filter_on_predicate_column ((((table.at row_column_name - range.start) % range.step) == 0) && (table.at row_column_name < range.end))
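
The helper above numbers the rows from 0, filters once per range, and unions the results. A rough model of that flow, written in Python rather than Enso so it can be run standalone, might look like the sketch below. The names `row_matches` and `take_by_ranges` are hypothetical, a plain list stands in for the database table, list concatenation stands in for `union`, and the explicit lower-bound check in the stepped case is an addition made for this sketch rather than something spelled out in the predicate above.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cleanup_ranges(ranges: Sequence[range]) -> List[range]:
    """Drop empty ranges, mirroring the `range.end > range.start` check."""
    return [r for r in ranges if r.stop > r.start]


def row_matches(row_number: int, r: range) -> bool:
    """Decide whether a 0-based row number falls inside a single range."""
    if r.step == 1:
        # Step 1: an inclusive "between" on [start, end - 1].
        return r.start <= row_number <= r.stop - 1
    # Other steps: keep rows aligned with the step and below the end.
    # The explicit lower bound is an assumption added for this sketch.
    return (
        row_number >= r.start
        and (row_number - r.start) % r.step == 0
        and row_number < r.stop
    )


def take_by_ranges(rows: Sequence[T], ranges: Sequence[range]) -> List[T]:
    """Number the rows from 0, filter once per range, and concatenate."""
    kept = cleanup_ranges(ranges)
    if not kept:
        # Corresponds to the `remove_all_rows` branch.
        return []
    numbered = list(enumerate(rows))
    result: List[T] = []
    for r in kept:
        result.extend(value for number, value in numbered if row_matches(number, r))
    return result
```

For instance, `take_by_ranges(list("abcdefgh"), [range(1, 8, 3)])` keeps rows 1, 4 and 7. The sketch only models positive steps and does not attempt to reproduce how the database orders the unioned subqueries.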
22 changes: 22 additions & 0 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Data/Table.enso
Original file line number Diff line number Diff line change
@@ -1215,6 +1215,12 @@ type Table
For the purposes of the `Index_Sub_Range.While` predicate a single
"element" of the table is represented by the `Row` type.

? Supported Range Types

Database backends support all range types except `While` and `Sample`.

In-memory tables support all range types.

> Example
Take first 10 rows of the table.

@@ -1239,6 +1245,12 @@ type Table
For the purposes of the `Index_Sub_Range.While` predicate a single
"element" of the table is represented by the `Row` type.

? Supported Range Types

Database backends support all range types except `While` and `Sample`.

In-memory tables support all range types.

> Example
Drop first 10 rows of the table.

@@ -1253,6 +1265,11 @@ type Table
drop self range=(First 1) =
Index_Sub_Range_Module.drop_helper self.row_count self.rows.at self.slice (slice_ranges self) range

## PRIVATE
Filter out all rows.
remove_all_rows : Table
remove_all_rows self = self.take 0

## ALIAS add index column, rank, record id
GROUP Standard.Base.Values
Adds a new column to the table enumerating the rows.
@@ -1398,6 +1415,11 @@ type Table
if Table_Helpers.is_column value then Error.throw (Illegal_Argument.Error "A constant value may only be created from a scalar, not a Column") else
Column.from_vector_repeated ("Constant_" + UUID.randomUUID.to_text) [value] self.row_count

## PRIVATE
Create a unique temporary column name.
make_temp_column_name : Text
make_temp_column_name self = self.column_naming_helper.make_temp_column_name self.column_names

## Returns the vector of columns contained in this table.

> Examples
Original file line number Diff line number Diff line change
@@ -169,3 +169,11 @@ type Column_Naming_Helper
in_memory : Column_Naming_Helper
in_memory =
Column_Naming_Helper.Value Unlimited_Naming_Properties.Instance

## PRIVATE
Create a column name based on "temp", renamed if needed to avoid clashing with the existing column names.
make_temp_column_name : Vector Text -> Text
make_temp_column_name self existing_column_names =
renamer = self.create_unique_name_strategy
renamer.mark_used existing_column_names
renamer.make_unique "temp"
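
The idea behind `make_temp_column_name` (mark the existing names as used, then ask for a unique variant of "temp") can be illustrated with a small Python sketch. The numeric-suffix scheme below is an assumption chosen for illustration; the names the library's unique-name strategy actually produces may differ.

```python
from typing import Iterable


def make_temp_column_name(existing_column_names: Iterable[str], base: str = "temp") -> str:
    """Return `base`, or a suffixed variant that is not already in use.

    The "<base> <n>" suffix format is hypothetical and only serves the sketch.
    """
    used = set(existing_column_names)
    if base not in used:
        return base
    suffix = 1
    while f"{base} {suffix}" in used:
        suffix += 1
    return f"{base} {suffix}"
```

Because the result never collides with an existing column, the temporary row-number column can be added and later removed without touching user data.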
Original file line number Diff line number Diff line change
@@ -1,14 +1,15 @@
from Standard.Base import all

import Standard.Database.Extensions.Upload_Database_Table
import Standard.Database.Extensions.Upload_In_Memory_Table

from Standard.Table import Sort_Column
from Standard.Table.Data.Aggregate_Column.Aggregate_Column import Group_By, Sum
from Standard.Table.Errors import Missing_Input_Columns, Duplicate_Output_Column_Names, Floating_Point_Equality

from Standard.Test import Test, Problems
import Standard.Test.Extensions

import Standard.Database.Extensions.Upload_Database_Table
import Standard.Database.Extensions.Upload_In_Memory_Table

import project.Database.Helpers.Name_Generator
from project.Common_Table_Operations.Util import run_default_backend

@@ -94,6 +95,15 @@ spec setup =
t2.at "Y" . to_vector . should_equal [10, 20, 30, 40]
t2.at "Row" . to_vector . should_equal [1, 2, 3, 4]

Test.specify "Should work correctly after aggregation" <|
t0 = table_builder [["X", ["a", "b", "a", "c"]], ["Y", [1, 2, 4, 8]]]
t1 = t0.aggregate [Group_By "X", Sum "Y"]

t2 = t1.order_by "X" . add_row_number
t2.at "X" . to_vector . should_equal ['a', 'b', 'c']
t2.at "Sum Y" . to_vector . should_equal [5.0, 2.0, 8.0]
t2.at "Row" . to_vector . should_equal [1, 2, 3]

if setup.is_database.not then Test.group prefix+"Table.add_row_number (in-memory specific)" <|
Test.specify "should add a row numbering column" <|
t = table_builder [["X", ['a', 'b', 'a', 'a', 'c']]]
5 changes: 3 additions & 2 deletions test/Table_Tests/src/Common_Table_Operations/Main.enso
Original file line number Diff line number Diff line change
@@ -23,6 +23,7 @@ import project.Common_Table_Operations.Missing_Values_Spec
import project.Common_Table_Operations.Order_By_Spec
import project.Common_Table_Operations.Select_Columns_Spec
import project.Common_Table_Operations.Take_Drop_Spec
import project.Common_Table_Operations.Temp_Column_Spec
import project.Common_Table_Operations.Transpose_Spec

from project.Common_Table_Operations.Util import run_default_backend
@@ -77,7 +78,6 @@ type Test_Selection
- order_by_unicode_normalization_by_default: Specifies if the backend
supports unicode normalization in its default ordering.
- case_insensitive_ascii_only:
- - take_drop: Specifies if the backend supports take/drop operations.
- allows_mixed_type_comparisons: Specifies if mixed operations comparing
mixed types are allowed by a given backend. Some backends will allow
such comparisons, when mixed type storage is allowed or by coercing to
@@ -109,7 +109,7 @@ type Test_Selection
columns.
- supported_replace_params: Specifies the possible values of
Replace_Params that a backend supports.
- Config supports_case_sensitive_columns=True order_by=True natural_ordering=False case_insensitive_ordering=True order_by_unicode_normalization_by_default=False case_insensitive_ascii_only=False take_drop=True allows_mixed_type_comparisons=True supports_unicode_normalization=False is_nan_and_nothing_distinct=True distinct_returns_first_row_from_group_if_ordered=True date_time=True fixed_length_text_columns=False different_size_integer_types=True supports_8bit_integer=False supports_decimal_type=False supports_time_duration=False supports_nanoseconds_in_time=False supports_mixed_columns=False supported_replace_params=Nothing
+ Config supports_case_sensitive_columns=True order_by=True natural_ordering=False case_insensitive_ordering=True order_by_unicode_normalization_by_default=False case_insensitive_ascii_only=False allows_mixed_type_comparisons=True supports_unicode_normalization=False is_nan_and_nothing_distinct=True distinct_returns_first_row_from_group_if_ordered=True date_time=True fixed_length_text_columns=False different_size_integer_types=True supports_8bit_integer=False supports_decimal_type=False supports_time_duration=False supports_nanoseconds_in_time=False supports_mixed_columns=False supported_replace_params=Nothing

spec setup =
Core_Spec.spec setup
@@ -134,5 +134,6 @@ spec setup =
Transpose_Spec.spec setup
Add_Row_Number_Spec.spec setup
Integration_Tests.spec setup
Temp_Column_Spec.spec setup

main = run_default_backend spec