Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add missing aggregation part 2 #2149

Merged
merged 2 commits into from
Aug 31, 2023
Merged

add missing aggregation part 2 #2149

merged 2 commits into from
Aug 31, 2023

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented Aug 18, 2023

Add missing support for:

  • Mixed types columns
  • Key of type string on numerical fields

The special aggregation is slower than the integrated one in TermsAggregation and therefore not
chosen by default, although it can cover all use cases.

closes #1789

@codecov-commenter
Copy link

Codecov Report

Merging #2149 (3a93887) into main (52d9e6f) will increase coverage by 0.12%.
Report is 3 commits behind head on main.
The diff coverage is 98.79%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##             main    #2149      +/-   ##
==========================================
+ Coverage   94.32%   94.45%   +0.12%     
==========================================
  Files         320      321       +1     
  Lines       62032    62424     +392     
==========================================
+ Hits        58514    58961     +447     
+ Misses       3518     3463      -55     
Files Changed Coverage Δ
columnar/src/dynamic_column.rs 70.93% <ø> (+0.40%) ⬆️
src/aggregation/bucket/mod.rs 96.15% <ø> (ø)
src/fastfield/readers.rs 89.25% <ø> (+1.11%) ⬆️
src/query/boost_query.rs 73.17% <ø> (ø)
src/aggregation/intermediate_agg_result.rs 93.85% <25.00%> (-0.82%) ⬇️
columnar/src/column/dictionary_encoded.rs 79.36% <100.00%> (+2.17%) ⬆️
src/aggregation/agg_req_with_accessor.rs 98.11% <100.00%> (+0.36%) ⬆️
src/aggregation/bucket/term_agg.rs 99.07% <100.00%> (+4.17%) ⬆️
src/aggregation/bucket/term_missing_agg.rs 100.00% <100.00%> (ø)
src/aggregation/segment_agg_result.rs 95.58% <100.00%> (+0.20%) ⬆️

... and 4 files with indirect coverage changes

@@ -54,38 +59,68 @@ impl AggregationWithAccessor {
reader: &SegmentReader,
limits: AggregationLimits,
) -> crate::Result<Vec<AggregationWithAccessor>> {
let mut missing_value_term_agg = None;
let mut str_dict_column = None;
let get_agg_with_accessor = |aggs: &mut Vec<AggregationWithAccessor>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this have to be inlined? can it be a proper funciton?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick:
I suggest we try to stick to a rule for argument lists: the mut stuff at the end of the list of arguments.
This is arbitrary. (that's the rule enforced at Google)

Copy link
Collaborator

@fulmicoton fulmicoton Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like it could have been a function with a clearer prototype:

Fn(.., .., .., .., ..) -> crate::Result<Option<AggregationWithAccessor>> {}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The closure has access to the parameters and is mainly there to remove code duplication, moving it to its own function has too many parameters imo

fn add_agg_with_accessor(
    aggs: &mut Vec<AggregationWithAccessor>,
    accessor: Column<u64>,
    column_type: ColumnType,
    str_dict_column: Option<StrColumn>,
    missing_value_term_agg: Option<Key>,
    agg: &Aggregation,
    sub_aggregation: &Aggregations,
    reader: &SegmentReader,
    limits: AggregationLimits,
) -> crate::Result<()> {

I refactored it a little bit and added types to the closure instead.

nitpick:
I suggest we try to stick to a rule for argument lists: the mut stuff at the end of the list of arguments.
This is arbitrary. (that's the rule enforced at Google)

I'm not really in favor of that, since this would cause churn when refactoring something to mut for very little gain (at least for how I read code).
I changed it in this case though.

@PSeitz PSeitz requested a review from fulmicoton August 29, 2023 10:33
Add missing support for:
- Mixed types columns
- Key of type string on numerical fields

The special aggregation is slower than the integrated one in TermsAggregation and therefore not
chosen by default, although it can cover all use cases.
@PSeitz PSeitz merged commit b1d8b07 into main Aug 31, 2023
5 checks passed
@PSeitz PSeitz deleted the missing_agg branch August 31, 2023 05:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add missing parameter on relevant aggregations
3 participants