Table.group_by #3305

jdunkerley · 2022-02-25T17:34:55Z

Pull Request Description

A functioning group_by based on the Enso Map.

Important Notes

This is an initial version which will be used to establish the API.
The grouping map will need to be moved to Java code for performance.
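The map-based grouping described above can be sketched as follows. This is written in Python rather than Enso and is not the PR's code. Rows are bucketed in a dict keyed by a tuple of the key-column values, and each aggregate is then applied to every group. The names `group_by`, `key_columns` and `aggregates` are hypothetical. With an empty key list, every row lands in one group. `None`, standing in for Enso's `Nothing`, is a valid key component.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def group_by(rows: Sequence[Row],
             key_columns: Sequence[str],
             aggregates: Dict[str, Callable[[List[Row]], Any]]) -> List[Row]:
    """Hash-map group_by (hypothetical illustration, not the Enso implementation)."""
    # dict preserves insertion order, so groups come out in first-seen order.
    groups: Dict[Tuple[Any, ...], List[Row]] = {}
    for row in rows:
        # With no key columns the key is the empty tuple, so all rows share one group.
        key = tuple(row[c] for c in key_columns)
        groups.setdefault(key, []).append(row)

    result: List[Row] = []
    for key, members in groups.items():
        # Start each output row with the key values, then add one entry per aggregate.
        out: Row = dict(zip(key_columns, key))
        for name, agg in aggregates.items():
            out[name] = agg(members)
        result.append(out)
    return result
```

With no input rows this sketch returns an empty list, whether or not there are key columns. One caveat for Boolean keys is that Python treats `True == 1`, and the two hash identically. A column mixing booleans and integers would therefore merge those keys, so an implementation that supports Boolean keys has to keep them distinct.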

Checklist

Please include the following checklist in your PR:

The documentation has been updated if necessary.
All code conforms to the Scala, Java, and Rust style guides.
All code has been tested:
- Unit tests have been written where possible.
- If the GUI codebase was changed: Enso GUI was tested when built using BOTH ./run dist and ./run watch.

radeusgd

Looks good to me.
Some style comments and questions, but given that it's a prototype, most are just suggestions for the future.

The important thing is Group_By_Key: IMO it should not re-implement existing functions when it can delegate to them, unless that is absolutely necessary.
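Here is a minimal sketch of the kind of delegation suggested here, in Python rather than Enso. `GroupByKey` is a hypothetical stand-in for the PR's `Group_By_Key`. Instead of hand-writing element-by-element comparison and hashing, the key stores its values as a tuple and delegates `__eq__` and `__hash__` to the tuple's existing implementations.

```python
from typing import Any, Sequence


class GroupByKey:
    """Composite grouping key (hypothetical stand-in for Group_By_Key)."""

    __slots__ = ("values",)

    def __init__(self, values: Sequence[Any]) -> None:
        # Store as a tuple so equality and hashing can be delegated to it.
        self.values = tuple(values)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, GroupByKey):
            return NotImplemented
        # Delegate to tuple equality instead of re-implementing the element loop.
        return self.values == other.values

    def __hash__(self) -> int:
        # Delegate to tuple hashing, which is consistent with tuple equality.
        return hash(self.values)

    def __repr__(self) -> str:
        return f"GroupByKey({self.values!r})"
```

Delegating keeps `__eq__` and `__hash__` consistent by construction, which is the property a map-based group_by relies on.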


radeusgd · 2022-02-28T13:55:20Z

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Aggregate_Column.enso

+       - population argument specifies if group is a sample or the population
+    type Standard_Deviation (column:Column|Text|Integer) (name:Text|Nothing=Nothing) (population:Boolean=False)
+
+    ## Creates a new column with the values concatenated together. NULL values will become an empty string.


Btw, why is NULL treated as "" here instead of being ignored as in other cases?
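The two behaviours in this hunk can be made concrete with a short sketch, written in Python rather than Enso. It shows a standard deviation with a `population` flag that defaults to the sample form, as in the signature above. It also shows the two candidate NULL policies for Concatenate. The function names and the `nulls_as_empty` parameter are hypothetical. `None` stands in for Enso's `Nothing`, and skipping `None` in the standard deviation is an assumption based on "ignored like in other cases".

```python
import math
from typing import Any, Optional, Sequence


def standard_deviation(values: Sequence[Optional[float]],
                       population: bool = False) -> Optional[float]:
    # Ignore missing values (assumed to match the other aggregates).
    xs = [v for v in values if v is not None]
    n = len(xs)
    # Population divides by n; sample divides by n - 1 (Bessel's correction).
    denom = n if population else n - 1
    if denom <= 0:
        # Undefined: no values, or a single value for the sample form.
        return None
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / denom)


def concatenate(values: Sequence[Any], separator: str = "",
                nulls_as_empty: bool = True) -> str:
    if nulls_as_empty:
        # Behaviour described in the doc comment: NULL becomes "".
        parts = ["" if v is None else str(v) for v in values]
    else:
        # Alternative raised in the review: skip NULLs like other aggregates.
        parts = [str(v) for v in values if v is not None]
    return separator.join(parts)
```

With a separator the two policies give visibly different results. `concatenate(["a", None, "b"], ",")` yields `"a,,b"`, but with `nulls_as_empty=False` it yields `"a,b"`.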


radeusgd · 2022-02-28T14:35:38Z

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Table.enso

+                array = new_table.at (j + key_length) . at 1 . to_array
+                current = array . at row_index
+                new = aggregator current i
+                array . set_at row_index new


This may stop working when we fix the "to_array is leaking mutability" issue.

As this prototype may be shorter-lived than that issue, you can probably ignore this comment.

But for the future: maybe we could create an Expandable_Array that wraps a regular, explicitly mutable array, adds an append operation, and handles the reallocations needed to make append O(1) amortized. I don't like that we are abusing the Vector_Builder a bit when we could create a data structure meant for such use cases. Since this is a prototype, though, I'm only leaving this as a suggestion for the future.
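Here is a minimal sketch of the suggested Expandable_Array, written in Python for illustration. `ExpandableArray` and its methods are hypothetical and are not an existing Enso API. The class wraps a fixed-capacity, explicitly mutable backing store and doubles the capacity when it is full. Across all resizes each element is copied only a constant number of times on average, so `append` is O(1) amortized. Python's `list` already grows this way, so the fixed-size store is simulated with a preallocated list that the class never grows in place.

```python
from typing import Any, Generic, List, TypeVar

T = TypeVar("T")


class ExpandableArray(Generic[T]):
    """Growable array over a fixed-size backing store (hypothetical sketch)."""

    def __init__(self, initial_capacity: int = 4) -> None:
        if initial_capacity < 1:
            raise ValueError("initial_capacity must be at least 1")
        # Preallocated slots simulate a fixed-size mutable array.
        self._data: List[Any] = [None] * initial_capacity
        self._size = 0

    def __len__(self) -> int:
        return self._size

    def capacity(self) -> int:
        return len(self._data)

    def append(self, item: T) -> None:
        if self._size == len(self._data):
            # Full: allocate double the capacity and copy (O(n) now, O(1) amortized).
            new_data: List[Any] = [None] * (2 * len(self._data))
            new_data[: self._size] = self._data
            self._data = new_data
        self._data[self._size] = item
        self._size += 1

    def at(self, index: int) -> T:
        if not 0 <= index < self._size:
            raise IndexError(index)
        return self._data[index]

    def set_at(self, index: int, item: T) -> None:
        # Explicit in-place update, like `array . set_at row_index new` in the diff.
        if not 0 <= index < self._size:
            raise IndexError(index)
        self._data[index] = item

    def to_list(self) -> List[T]:
        # Return a copy so callers cannot mutate the internal store.
        return list(self._data[: self._size])
```

Mutation goes only through `set_at`, and `to_list` returns a copy. This keeps writes explicit rather than relying on a converted array that happens to share storage.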

Handle no key

Use first/second Expanding tests

Sum of all Nothing == Nothing StDev calculation fixes

Support for Booleans and Nothing in the keys


radeusgd

Looks good to me

radeusgd approved these changes Feb 28, 2022


jdunkerley force-pushed the wip/jd/group-by-181268783 branch from 81746c2 to 2f74a42 March 1, 2022 11:17

jdunkerley added 19 commits March 1, 2022 15:04

- Initial Group By idea (7631a51)
- Prototype for Group_By (3c6d6ee)
- More Group_By work (ecb81cb)
- Getting prototype together (7c59d6c)
- Compiles now... (332f87f)
- Some tests and making it work (98bed28)
- Nearly there I think (d029164)
- Functional... (7eb7c7d)
- Move to design spec (253719f)
- PR work (56480a4)
- working again and some more PR comments (596e28b)
- Handle no rows (3b94cc6)
- Handle no key
- StDev Sample (23ab7cd)
- Use first/second Expanding tests
- Median implementation (8e72d0c)
- Sum of all Nothing == Nothing StDev calculation fixes
- Median empty fix (3151b16)
- First and Last (64e56fd)
- Concatenate (e528081)
- Count Distinct (675b939)
- Support for Booleans and Nothing in the keys
- ChangeLog update (9f99739)

jdunkerley force-pushed the wip/jd/group-by-181268783 branch from 434a021 to 9f99739 March 1, 2022 15:05

jdunkerley requested a review from radeusgd March 1, 2022 15:06

jdunkerley marked this pull request as ready for review March 1, 2022 15:06

jdunkerley requested a review from 4e6 as a code owner March 1, 2022 15:06

radeusgd reviewed Mar 1, 2022


distribution/lib/Standard/Table/0.0.0-dev/src/Data/Aggregate_Column.enso (outdated, resolved)

radeusgd reviewed Mar 1, 2022


distribution/lib/Standard/Table/0.0.0-dev/src/Data/Group_By.enso (outdated, resolved)

radeusgd approved these changes Mar 1, 2022


Revert 2 changes (b1f6cba)

jdunkerley added the "CI: Ready to merge" label (this PR is eligible for automatic merge) Mar 1, 2022

mergify bot merged commit 738a691 into develop Mar 1, 2022

mergify bot deleted the wip/jd/group-by-181268783 branch March 1, 2022 16:18
