Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Aggregate Column #3349

Merged
merged 13 commits into from
Mar 22, 2022
Merged

Refactor Aggregate Column #3349

merged 13 commits into from
Mar 22, 2022

Conversation

jdunkerley
Copy link
Member

@jdunkerley jdunkerley commented Mar 18, 2022

Pull Request Description

  • Make it easier to understand the computations.
  • Fix issue with First.
  • Improve quote handling in Concatenate
  • Added validation and warnings to input

Checklist

Please include the following checklist in your PR:

  • The documentation has been updated if necessary.
  • All code conforms to the Scala, Java, and Rust style guides.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed: Enso GUI was tested when built using BOTH ./run dist and ./run watch.

@jdunkerley jdunkerley force-pushed the wip/jd/refactor-aggregate-column branch from b2726c6 to 761aff6 Compare March 21, 2022 16:14
@jdunkerley jdunkerley marked this pull request as ready for review March 22, 2022 12:03
@jdunkerley jdunkerley requested a review from 4e6 as a code owner March 22, 2022 12:03
Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, some thoughts on warning-handling.

I think we should replace create_closure for resolve_columns, if we chose to go that way - as would be good to be consistent in how we do it across functions/sub-libs.

## Given a Table and a Column create an aggregator
new : Table->Aggregate_Column->Aggregate_Column_Aggregator
new table column =
create_closure c function:(Column->Any->Integer->Any) = function (column.resolve_column table c)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmm, I don't like this name - it doesn't really tell what it does.

More importantly - this function is not necessary anymore, at least once we also merge the resolve_columns.

Maybe you could borrow the resolve_columns from my PR to have it here already. Because I think we both agreed that the pattern to use here is to first do aggregate_column.resolve_columns and then all the other helper functions can assume that they only get these already-resolved aggregates - and thus the create_closure function would be obsolete because it would boil down to function c.

Comment on lines +569 to +591
Test.specify "should raise warnings when an invalid column index and no valid output" <|
action = table.aggregate [Group_By -3] on_problems=_
problems = [Column_Indexes_Out_Of_Range [-3], No_Output_Columns]
tester = expect_column_names []
Problems.test_problem_handling action problems tester

Test.specify "should raise a warning when an invalid output name" <|
action = table.aggregate [Group_By "Index" ""] on_problems=_
problems = [Invalid_Output_Column_Names [""]]
tester = expect_column_names ["Column"]
Problems.test_problem_handling action problems tester

Test.specify "should raise a warning when a duplicate column name" <|
action = table.aggregate [Group_By "Index", Group_By 0] on_problems=_
problems = [Duplicate_Output_Column_Names ["Index"]]
tester = expect_column_names ["Index", "Index_1"]
Problems.test_problem_handling action problems tester

Test.specify "should raise a warning when a duplicate column name and rename default names first" <|
action = table.aggregate [Group_By "Value", Group_By "Index" "Value"] on_problems=_
problems = [Duplicate_Output_Column_Names ["Value"]]
tester = expect_column_names ["Value_1", "Value"]
Problems.test_problem_handling action problems tester
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seem to be no tests for aggregates besides the special one Group_By and it is a bit different from others, so I think would be worth to make some tests use Sum or Count.

More importantly - what do we do in situations like [Count_Distinct "Foo" ignore_nothing=True, Count_Distinct "Foo" ignore_nothing=False]? They get the same name (fair, makes sense) but compute conceptually different things - so the warning may be confusing to users.

@jdunkerley jdunkerley added the CI: Ready to merge This PR is eligible for automatic merge label Mar 22, 2022
@mergify mergify bot merged commit 02bcfbb into develop Mar 22, 2022
@mergify mergify bot deleted the wip/jd/refactor-aggregate-column branch March 22, 2022 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants