Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed aggregation on sparse grouping columns #1068

Merged
merged 5 commits into from
Nov 22, 2023
Merged

Conversation

JohanMabille
Copy link
Collaborator

Reference Issues/PRs

Second part of #1007
Fixes #528

What does this implement or fix?

  • Fixes aggregations when grouping on a sparse column
  • Adds C++ tests for this use case

Any other comments?

  • Limitation: this requires to call mark_absent_rows on the grouping column, which is fragile. More generally, Column constructors should force the user to pass the size of the column for sparse columns.

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

Copy link
Collaborator

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @JohanMabille.

LGTM modulo a few comment.

Here are some comments. I think I would wait for someone else's approval before pursuing, because I do not have yet an entire comprehensive knowledge of Column's internals.

Also, is there a non-regression test we could craft for #528?

cpp/arcticdb/processing/test/test_clause.cpp Outdated Show resolved Hide resolved
cpp/arcticdb/processing/clause.cpp Outdated Show resolved Hide resolved
cpp/arcticdb/processing/clause.cpp Outdated Show resolved Hide resolved
cpp/arcticdb/processing/clause.cpp Show resolved Hide resolved
cpp/arcticdb/processing/clause.cpp Show resolved Hide resolved
cpp/arcticdb/processing/test/test_clause.cpp Outdated Show resolved Hide resolved
cpp/arcticdb/processing/test/test_clause.cpp Outdated Show resolved Hide resolved
cpp/arcticdb/processing/test/test_clause.cpp Outdated Show resolved Hide resolved
cpp/arcticdb/processing/test/test_clause.cpp Show resolved Hide resolved
{
iter = std::make_optional(input_data.bit_vector()->first());
// We use 0 for the missing value group id
next_group_id++;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, but I cannot tell if this can invalidate some potential implicit invariant the aggregation clause or its tests might have been relying one.

@JohanMabille JohanMabille force-pushed the sparse_aggregation branch 3 times, most recently from d5bc06b to ae94957 Compare November 20, 2023 18:38
@@ -304,6 +304,20 @@ Composite<ProcessingUnit> AggregationClause::process(std::shared_ptr<Store> stor
// 11.01 seconds with caching
// Not worth worrying about right now
robin_hood::unordered_flat_map<RawType, size_t> offset_to_group;

bool is_sparse = col.column_->is_sparse();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be const, although I suspect it won't make a lot of difference

@JohanMabille JohanMabille merged commit 774483a into master Nov 22, 2023
131 checks passed
@JohanMabille JohanMabille deleted the sparse_aggregation branch November 22, 2023 11:46
@JohanMabille
Copy link
Collaborator Author

Sparsy Sparsy Sparsy me
I'm a lucky kind of Column

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GroupBy and aggregation will not work with sparse aggregation columns
3 participants