Refactor Aggregate Column #3349

jdunkerley · 2022-03-18T20:14:10Z

Pull Request Description

- Make the computations easier to understand.
- Fix an issue with First.
- Improve quote handling in Concatenate.
- Add validation and warnings for input.

Checklist

Please include the following checklist in your PR:

The documentation has been updated if necessary.
All code conforms to the Scala, Java, and Rust style guides.
All code has been tested:
- Unit tests have been written where possible.
- If GUI codebase was changed: Enso GUI was tested when built using BOTH ./run dist and ./run watch.

Still to support Order By on First/Last. Need to refactor tests and Table calls.

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Aggregate_Column.enso

distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Aggregate_Column_Helper.enso

radeusgd

Looks good to me, some thoughts on warning-handling.

I think we should replace create_closure with resolve_columns, if we choose to go that way, as it would be good to be consistent in how we do it across functions and sub-libraries.

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Aggregate_Column.enso

distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Aggregate_Column_Aggregator.enso

radeusgd · 2022-03-22T12:25:51Z

distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Aggregate_Column_Aggregator.enso

+## Given a Table and a Column create an aggregator
+new : Table->Aggregate_Column->Aggregate_Column_Aggregator
+new table column =
+    create_closure c function:(Column->Any->Integer->Any) = function (column.resolve_column table c)


Hmmmmm, I don't like this name - it doesn't really say what it does.

More importantly - this function is not necessary anymore, at least once we also merge the resolve_columns.

Maybe you could borrow the resolve_columns from my PR to have it here already. I think we both agreed that the pattern to use here is to call aggregate_column.resolve_columns first, so that all the other helper functions can assume they only receive already-resolved aggregates. The create_closure function would then be obsolete, because it would boil down to function c.
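To illustrate the resolve-first pattern, here is a minimal sketch in Python rather than Enso. Every name in it (Table.resolve, Sum, resolve_columns, make_aggregator) is hypothetical and only mirrors the idea discussed in this comment; it is not the code from either PR.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union


@dataclass(frozen=True)
class Column:
    name: str
    values: Tuple[int, ...]


@dataclass(frozen=True)
class Table:
    columns: Tuple[Column, ...]

    def resolve(self, selector):
        """Turn a name, an index, or a Column into a Column (hypothetical helper)."""
        if isinstance(selector, Column):
            return selector
        if isinstance(selector, int):
            return self.columns[selector]
        for column in self.columns:
            if column.name == selector:
                return column
        raise KeyError(selector)


@dataclass(frozen=True)
class Sum:
    # Before resolution this may be a name or an index; afterwards it is a Column.
    column: Union[str, int, Column]


def resolve_columns(table, aggregates):
    """Step 1: resolve every selector once, up front."""
    return [replace(a, column=table.resolve(a.column)) for a in aggregates]


def make_aggregator(aggregate):
    """Step 2: helpers assume the aggregate is already resolved,
    so no closure that re-resolves the column is needed."""
    if isinstance(aggregate, Sum):
        return lambda: sum(aggregate.column.values)
    raise TypeError(f"unsupported aggregate: {aggregate!r}")
```

With this shape, the helper never sees the table at all: it is called as make_aggregator(resolved_aggregate), which is the "boils down to function c" case.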

radeusgd · 2022-03-22T12:32:26Z

test/Table_Tests/src/Aggregate_Spec.enso

+        Test.specify "should raise warnings when an invalid column index and no valid output" <|
+            action = table.aggregate [Group_By -3] on_problems=_
+            problems = [Column_Indexes_Out_Of_Range [-3], No_Output_Columns]
+            tester = expect_column_names []
+            Problems.test_problem_handling action problems tester
+
+        Test.specify "should raise a warning when an invalid output name" <|
+            action = table.aggregate [Group_By "Index" ""] on_problems=_
+            problems = [Invalid_Output_Column_Names [""]]
+            tester = expect_column_names ["Column"]
+            Problems.test_problem_handling action problems tester
+
+        Test.specify "should raise a warning when a duplicate column name" <|
+            action = table.aggregate [Group_By "Index", Group_By 0] on_problems=_
+            problems = [Duplicate_Output_Column_Names ["Index"]]
+            tester = expect_column_names ["Index", "Index_1"]
+            Problems.test_problem_handling action problems tester
+
+        Test.specify "should raise a warning when a duplicate column name and rename default names first" <|
+            action = table.aggregate [Group_By "Value", Group_By "Index" "Value"] on_problems=_
+            problems = [Duplicate_Output_Column_Names ["Value"]]
+            tester = expect_column_names ["Value_1", "Value"]
+            Problems.test_problem_handling action problems tester


There seem to be no tests for aggregates other than Group_By, which is a special case and a bit different from the others, so I think it would be worth making some tests use Sum or Count.

More importantly - what do we do in situations like [Count_Distinct "Foo" ignore_nothing=True, Count_Distinct "Foo" ignore_nothing=False]? They get the same name (fair, makes sense) but compute conceptually different things - so the warning may be confusing to users.
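To make the naming question concrete, here is a rough Python sketch of a renaming rule that is consistent with the expected names in the tests above: explicitly given names are claimed first, default names second, and collisions get a numeric suffix. It is reverse-engineered from those expectations only; the function name and the (default_name, explicit_name) input shape are assumptions, not the PR's implementation.

```python
def unique_output_names(requests):
    """requests: list of (default_name, explicit_name_or_None) pairs.

    Returns (output_names, duplicated_names)."""
    used = set()
    duplicates = []
    result = [None] * len(requests)

    def claim(base):
        # First use of a name keeps it unchanged.
        if base not in used:
            used.add(base)
            return base
        # Later uses are reported once and get the first free suffix.
        if base not in duplicates:
            duplicates.append(base)
        suffix = 1
        while f"{base}_{suffix}" in used:
            suffix += 1
        name = f"{base}_{suffix}"
        used.add(name)
        return name

    # Pass 1: explicitly named columns keep their names where possible.
    for i, (_, explicit) in enumerate(requests):
        if explicit is not None:
            result[i] = claim(explicit)

    # Pass 2: default names are assigned afterwards, so they are the ones renamed.
    for i, (default, explicit) in enumerate(requests):
        if explicit is None:
            result[i] = claim(default)

    return result, duplicates
```

Under a rule like this, two Count_Distinct "Foo" aggregates that differ only in ignore_nothing share a default name, so the second is silently suffixed and the user sees a duplicate-name warning even though the two columns compute different things.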

jdunkerley added 6 commits March 21, 2022 15:20

Restructure the Aggregate Column

911ace3

Still to support Order By on First/Last. Need to refactor tests and Table calls.

Some repairs...

d637e16

Passing first set of tests now

baac0fb

Separate out the aggregator

d12933e

Fix the table aggregate call

03e1ee1

First pass with warnings...

761aff6

jdunkerley force-pushed the wip/jd/refactor-aggregate-column branch from b2726c6 to 761aff6 March 21, 2022 16:14

Wrong name

569f467

radeusgd reviewed Mar 21, 2022

distribution/lib/Standard/Table/0.0.0-dev/src/Data/Aggregate_Column.enso (outdated, resolved)

radeusgd reviewed Mar 21, 2022

distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Aggregate_Column_Helper.enso (outdated, resolved)

jdunkerley added 3 commits March 22, 2022 10:56

Fix issues

95dba43

Some unhappy path fixes

2431d8e

Warning tests

720e19e

jdunkerley marked this pull request as ready for review March 22, 2022 12:03

jdunkerley requested a review from 4e6 as a code owner March 22, 2022 12:03

jdunkerley and others added 2 commits March 22, 2022 12:03

Merge branch 'develop' into wip/jd/refactor-aggregate-column

95c2657

Change log

8c62c2f

radeusgd reviewed Mar 22, 2022

radeusgd approved these changes Mar 22, 2022

Fix build issues

6fca227

jdunkerley added the "CI: Ready to merge" label Mar 22, 2022

mergify bot merged commit 02bcfbb into develop Mar 22, 2022

mergify bot deleted the wip/jd/refactor-aggregate-column branch March 22, 2022 18:18
