
Port remaining parallelizable aggregate functions to off-heap data structures #4120

Open
3 of 6 tasks
puzpuzpuz opened this issue Jan 11, 2024 · 2 comments
Labels: Enhancement, Help wanted, Performance, SQL

Comments

@puzpuzpuz
Contributor

puzpuzpuz commented Jan 11, 2024

Is your feature request related to a problem?

#4097 ported min(str), max(str), as well as count_distinct() for long, int, and IPv4 types to parallel GROUP BY, but some functions remain unported. Namely:

  • count_distinct(uuid): requires a new long128 hash set, similar to the GroupByLongHashSet one
  • count_distinct(long256): requires a new long256 hash set, similar to the GroupByLongHashSet one
  • approx_percentile(double): this one is tricky, as we'll have to port HdrHistogram to an off-heap, flyweight implementation
  • string_agg(str): we already have GroupByCharSink which we can use here
  • all first/last and first_not_null/last_not_null functions: to port them, we'll have to access and store row ids in the group by map
  • isOrdered(IPv4)/isOrdered(long) functions: again, we need to track row ids
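The long128 hash set from the first bullet can be sketched as an open-addressing set with linear probing over off-heap memory. This is a minimal sketch, not QuestDB code: the class name `Long128HashSetSketch` is hypothetical, it uses a direct `ByteBuffer` rather than whatever allocator `GroupByLongHashSet` relies on, and it assumes the pair (`Long.MIN_VALUE`, `Long.MIN_VALUE`) can be reserved as the empty-slot marker.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical sketch of an off-heap hash set for 128-bit keys (e.g. a UUID
 * stored as two longs). Not QuestDB's actual implementation.
 */
final class Long128HashSetSketch {
    // Assumption: this (lo, hi) pair is reserved as the empty-slot marker.
    private static final long EMPTY = Long.MIN_VALUE;
    private static final int SLOT_BYTES = 16; // lo (8 bytes) + hi (8 bytes)
    private static final double LOAD_FACTOR = 0.5;

    private ByteBuffer mem; // direct buffer, i.e. memory outside the Java heap
    private int capacity;   // number of slots, always a power of two
    private int size;

    Long128HashSetSketch(int initialCapacity) {
        int cap = 4;
        while (cap < initialCapacity) {
            cap <<= 1;
        }
        allocate(cap);
    }

    private void allocate(int cap) {
        capacity = cap;
        mem = ByteBuffer.allocateDirect(cap * SLOT_BYTES).order(ByteOrder.nativeOrder());
        for (int i = 0; i < cap; i++) {
            mem.putLong(i * SLOT_BYTES, EMPTY);
            mem.putLong(i * SLOT_BYTES + 8, EMPTY);
        }
    }

    // Mixes both halves so keys differing only in one half spread across slots.
    private static int hash(long lo, long hi) {
        long h = lo * 0x9E3779B97F4A7C15L ^ hi;
        h ^= h >>> 32;
        h *= 0x9E3779B97F4A7C15L;
        return (int) (h ^ (h >>> 29));
    }

    /** Returns true if the key was newly added, false if it was already present. */
    boolean add(long lo, long hi) {
        if (lo == EMPTY && hi == EMPTY) {
            throw new IllegalArgumentException("key is reserved as the empty marker");
        }
        if (size + 1 > capacity * LOAD_FACTOR) {
            rehash(capacity << 1);
        }
        int mask = capacity - 1;
        int slot = hash(lo, hi) & mask;
        while (true) {
            int off = slot * SLOT_BYTES;
            long sLo = mem.getLong(off);
            long sHi = mem.getLong(off + 8);
            if (sLo == EMPTY && sHi == EMPTY) {
                mem.putLong(off, lo);
                mem.putLong(off + 8, hi);
                size++;
                return true;
            }
            if (sLo == lo && sHi == hi) {
                return false;
            }
            slot = (slot + 1) & mask; // linear probing
        }
    }

    // Doubles the table and reinserts every occupied slot.
    private void rehash(int newCapacity) {
        ByteBuffer old = mem;
        int oldCapacity = capacity;
        allocate(newCapacity);
        size = 0;
        for (int i = 0; i < oldCapacity; i++) {
            long lo = old.getLong(i * SLOT_BYTES);
            long hi = old.getLong(i * SLOT_BYTES + 8);
            if (!(lo == EMPTY && hi == EMPTY)) {
                add(lo, hi);
            }
        }
    }

    int size() {
        return size;
    }
}
```

For count_distinct(uuid), each worker could add the two 64-bit halves of every UUID to its own set, and the merge step could add one worker's keys into another's; the distinct count is then `size()`. A long256 variant would follow the same pattern with 32-byte slots. Note that memory behind a direct `ByteBuffer` is only released when the buffer is garbage-collected, so real off-heap code would manage and free it explicitly.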

There is also count_distinct(symbol), but we have early exit logic in that function (see #3974), so we don't want to port it, at least for now.
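The row-id tracking mentioned for first/last in the list above can be illustrated with a small per-group state: each worker keeps the value together with the row id it came from, and merging two partial results keeps the smaller row id for first() (the larger one would win for last()). This is a hypothetical sketch, not QuestDB's `GroupByFunction` API; the class `FirstDoubleState` is illustrative only, and it assumes row ids reflect the table's scan order.

```java
/**
 * Hypothetical per-group state for a parallel first(double): the value plus
 * the row id it was read from. Not QuestDB's actual API.
 */
final class FirstDoubleState {
    static final long NO_ROW = Long.MAX_VALUE; // marks "no row seen yet"

    long rowId = NO_ROW;
    double value = Double.NaN;

    // Called by a worker for each row it scans; keeps the earliest row seen.
    void computeNext(long rowId, double value) {
        if (rowId < this.rowId) {
            this.rowId = rowId;
            this.value = value;
        }
    }

    // Merges another worker's partial state: the smaller row id comes first in the table.
    void merge(FirstDoubleState other) {
        if (other.rowId < this.rowId) {
            this.rowId = other.rowId;
            this.value = other.value;
        }
    }
}
```

Without the stored row id, the merge step could not tell which worker's partial result came first, which is why these functions need row ids in the map.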

Describe the solution you'd like.

No response

Describe alternatives you've considered.

No response

Full Name:

Andrei Pechkurov

Affiliation:

QuestDB

Additional context

No response

puzpuzpuz added the Enhancement, Help wanted, SQL, and Performance labels on Jan 11, 2024
@nwoolmer
Contributor

I would like to pick up the long128/long256 tasks, thank you!

@puzpuzpuz
Contributor Author

@nwoolmer that's awesome, thank you! Look forward to your contribution.
