Port remaining parallelizable aggregate functions to off-heap data structures #4120

puzpuzpuz · 2024-01-11T12:35:32Z

Is your feature request related to a problem?

#4097 ported min(str), max(str), as well as count_distinct() for long, int, and IPv4 types to parallel GROUP BY, but some functions remain unported. Namely:

count_distinct(uuid): requires a new long128 hash set, similar to the GroupByLongHashSet one
count_distinct(long256): requires a new long256 hash set, similar to the GroupByLongHashSet one
approx_percentile(double): this one is tricky as we'll have to port HdrHistogram to become off-heap and flyweight
string_agg(str): we already have GroupByCharSink which we can use here
all first/last and first_not_null/last_not_null functions: to port them, we'll have to access and store row ids in the group by map
isOrdered(IPv4)/isOrdered(long) functions: again, we need to track row ids
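
To illustrate the long128 item above, here is a minimal sketch of an open-addressing hash set for 128-bit keys backed by off-heap memory. It is not QuestDB code: the class name `Long128HashSetSketch`, the use of a direct `ByteBuffer` instead of the project's own allocator, the 0.5 load factor, and the treatment of `(0, 0)` as the empty-slot marker are all assumptions made for the example. A long256 variant would presumably follow the same shape with four longs per slot.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not QuestDB's implementation.
final class Long128HashSetSketch {
    private static final int SLOT_BYTES = 16; // two longs per key
    private ByteBuffer buf;      // direct (off-heap) buffer, zero-filled on allocation
    private int capacity;        // number of slots, always a power of two
    private int size;            // number of stored non-zero keys
    private boolean hasZeroKey;  // (0, 0) marks an empty slot, so it is tracked separately

    Long128HashSetSketch(int initialCapacity) {
        // Round up to a power of two so that "hash & (capacity - 1)" picks a slot.
        capacity = Integer.highestOneBit(Math.max(4, initialCapacity) - 1) << 1;
        buf = allocate(capacity);
    }

    private static ByteBuffer allocate(int slots) {
        return ByteBuffer.allocateDirect(slots * SLOT_BYTES).order(ByteOrder.nativeOrder());
    }

    /** Adds a key; returns true if it was not present before. */
    boolean add(long lo, long hi) {
        if (lo == 0 && hi == 0) {
            boolean added = !hasZeroKey;
            hasZeroKey = true;
            return added;
        }
        if ((size + 1) * 2 > capacity) { // keep the load factor at or below 0.5
            rehash();
        }
        if (insert(buf, capacity, lo, hi)) {
            size++;
            return true;
        }
        return false;
    }

    /** Merges another set into this one, as a parallel GROUP BY merge step would. */
    void addAll(Long128HashSetSketch other) {
        if (other.hasZeroKey) {
            add(0, 0);
        }
        for (int i = 0; i < other.capacity; i++) {
            long lo = other.buf.getLong(i * SLOT_BYTES);
            long hi = other.buf.getLong(i * SLOT_BYTES + 8);
            if (lo != 0 || hi != 0) {
                add(lo, hi);
            }
        }
    }

    int size() {
        return size + (hasZeroKey ? 1 : 0);
    }

    // Linear probing; terminates because at least half of the slots stay empty.
    private static boolean insert(ByteBuffer b, int cap, long lo, long hi) {
        int mask = cap - 1;
        int slot = hash(lo, hi) & mask;
        while (true) {
            int offset = slot * SLOT_BYTES;
            long curLo = b.getLong(offset);
            long curHi = b.getLong(offset + 8);
            if (curLo == 0 && curHi == 0) {
                b.putLong(offset, lo);
                b.putLong(offset + 8, hi);
                return true;
            }
            if (curLo == lo && curHi == hi) {
                return false;
            }
            slot = (slot + 1) & mask;
        }
    }

    private static int hash(long lo, long hi) {
        long h = (lo * 0x9E3779B97F4A7C15L) ^ hi;
        h *= 0x9E3779B97F4A7C15L;
        return (int) (h ^ (h >>> 32));
    }

    private void rehash() {
        int newCapacity = capacity * 2;
        ByteBuffer newBuf = allocate(newCapacity);
        for (int i = 0; i < capacity; i++) {
            long lo = buf.getLong(i * SLOT_BYTES);
            long hi = buf.getLong(i * SLOT_BYTES + 8);
            if (lo != 0 || hi != 0) {
                insert(newBuf, newCapacity, lo, hi);
            }
        }
        buf = newBuf;
        capacity = newCapacity;
    }
}
```

In this sketch each worker would fill its own set and the owner would call `addAll` on the per-worker sets before reading `size()` as the distinct count.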

There is also count_distinct(symbol), but we have early exit logic in that function (see #3974), so we don't want to port it, at least for now.
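
For the first/last items in the list above, the reason row ids matter can be sketched as follows: when workers scan different row ranges, the merged result has to come from the smallest row id seen by any worker, not from whichever worker finished first. The class `FirstDoubleState` and its fields are hypothetical and live on the heap for brevity; in the actual port the row id and value would presumably be stored in the group by map value, as the issue describes.

```java
// Hypothetical sketch of a mergeable first(double) state, not QuestDB code.
final class FirstDoubleState {
    long rowId = Long.MAX_VALUE;  // sentinel: no row seen yet
    double value = Double.NaN;

    // Called by a worker for each row of its own slice.
    void accept(long rowId, double value) {
        if (rowId < this.rowId) {
            this.rowId = rowId;
            this.value = value;
        }
    }

    // Called when combining per-worker states; an empty state is a no-op
    // because its sentinel row id never compares lower.
    void merge(FirstDoubleState other) {
        accept(other.rowId, other.value);
    }
}
```

A last() state would be the mirror image: start from a row id below any valid one and keep the larger row id on both `accept` and `merge`.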

Describe the solution you'd like.

No response

Describe alternatives you've considered.

No response

Full Name:

Andrei Pechkurov

Affiliation:

QuestDB

Additional context

No response


nwoolmer · 2024-01-16T12:41:12Z

I would like to pick up the long128/long256 tasks, thank you!

puzpuzpuz · 2024-01-16T12:52:18Z

@nwoolmer that's awesome, thank you! Look forward to your contribution.

puzpuzpuz added the Enhancement, Help wanted, SQL, and Performance labels on Jan 11, 2024

nwoolmer mentioned this issue Jan 17, 2024: perf(sql): support uuid and long256 in parallel GROUP BY #4140 (Merged)

nwoolmer mentioned this issue Mar 15, 2024: AssertionError with WHERE clause containing 4 or more AND clauses with =, >= and <= operators *Bug* #4187 (Closed, 1 task)
