Port remaining parallelizable aggregate functions to off-heap data structures #4120
Open
3 of 6 tasks
Labels
Enhancement
Enhance existing functionality
Help wanted
Assistance or additional information is wanted
Performance
Performance improvements
SQL
Issues or changes relating to SQL execution
Is your feature request related to a problem?
#4097 ported `min(str)`, `max(str)`, as well as `count_distinct()` for long, int, and IPv4 types to parallel GROUP BY, but some functions remain unported. Namely:
- `count_distinct(uuid)`: requires a new long128 hash set, similar to the `GroupByLongHashSet` one
- `count_distinct(long256)`: requires a new long256 hash set, similar to the `GroupByLongHashSet` one
- `approx_percentile(double)`: this one is tricky as we'll have to port HdrHistogram to become off-heap and flyweight
- `string_agg(str)`: we already have `GroupByCharSink` which we can use here
- `first`/`last` and `first_not_null`/`last_not_null` functions: to port them, we'll have to access and store row ids in the group by map
- `isOrdered(IPv4)`/`isOrdered(long)` functions: again, we need to track row ids

There is also `count_distinct(symbol)`, but we have early exit logic in that function (see #3974), so we don't want to port it, at least for now.
Describe the solution you'd like.
No response
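For the `count_distinct(uuid)` task, here is a rough sketch of what a 128-bit open-addressing hash set could look like. It is not QuestDB code: the class name `Long128HashSetSketch`, its methods, the hash function, and the load factor are all hypothetical, and a plain `long[]` stands in for the off-heap memory the real structure would use. It only illustrates the flat two-longs-per-slot layout and the merge step that a parallel GROUP BY would presumably need.

```java
// Hypothetical sketch, not QuestDB's implementation.
// A plain long[] stands in for an off-heap memory region.
final class Long128HashSetSketch {
    private long[] slots;       // slot i keeps lo at 2*i and hi at 2*i + 1; (0, 0) marks an empty slot
    private int capacity;       // number of slots, always a power of two
    private int size;           // number of stored non-zero keys
    private boolean hasZeroKey; // (0, 0) is tracked separately because it doubles as the empty marker

    Long128HashSetSketch(int initialCapacity) {
        // Round up to a power of two, with a minimum of 4 slots.
        capacity = Math.max(4, Integer.highestOneBit(Math.max(1, initialCapacity - 1)) << 1);
        slots = new long[2 * capacity];
    }

    /** Adds a 128-bit key; returns true if the key was not present before. */
    boolean add(long lo, long hi) {
        if (lo == 0 && hi == 0) {
            boolean added = !hasZeroKey;
            hasZeroKey = true;
            return added;
        }
        if ((size + 1) * 2 > capacity) { // keep the load factor at or below 0.5
            rehash();
        }
        return insert(lo, hi);
    }

    /** Merges another worker's set into this one (assumed merge step of parallel GROUP BY). */
    void addAll(Long128HashSetSketch other) {
        if (other.hasZeroKey) {
            hasZeroKey = true;
        }
        for (int i = 0; i < other.capacity; i++) {
            long lo = other.slots[2 * i];
            long hi = other.slots[2 * i + 1];
            if (lo != 0 || hi != 0) {
                add(lo, hi);
            }
        }
    }

    /** Number of distinct keys, i.e. the count_distinct result for this group. */
    int size() {
        return size + (hasZeroKey ? 1 : 0);
    }

    // Linear probing over the flat array; the caller guarantees a free slot exists.
    private boolean insert(long lo, long hi) {
        int mask = capacity - 1;
        int index = hash(lo, hi) & mask;
        while (true) {
            long slotLo = slots[2 * index];
            long slotHi = slots[2 * index + 1];
            if (slotLo == 0 && slotHi == 0) {
                slots[2 * index] = lo;
                slots[2 * index + 1] = hi;
                size++;
                return true;
            }
            if (slotLo == lo && slotHi == hi) {
                return false;
            }
            index = (index + 1) & mask;
        }
    }

    // Doubles the capacity and reinserts every stored key.
    private void rehash() {
        long[] oldSlots = slots;
        int oldCapacity = capacity;
        capacity = oldCapacity * 2;
        slots = new long[2 * capacity];
        size = 0;
        for (int i = 0; i < oldCapacity; i++) {
            long lo = oldSlots[2 * i];
            long hi = oldSlots[2 * i + 1];
            if (lo != 0 || hi != 0) {
                insert(lo, hi);
            }
        }
    }

    // Arbitrary mixing function chosen for the sketch only.
    private static int hash(long lo, long hi) {
        long h = lo * 0x9E3779B97F4A7C15L ^ Long.rotateLeft(hi * 0xC2B2AE3D27D4EB4FL, 31);
        return (int) (h ^ (h >>> 32));
    }
}
```

A long256 variant would presumably follow the same pattern with four longs per slot instead of two.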
Describe alternatives you've considered.
No response
Full Name:
Andrei Pechkurov
Affiliation:
QuestDB
Additional context
No response
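To illustrate why `first`/`last` need row ids once aggregation runs in parallel, here is a small sketch. The class `FirstValueStateSketch` and its methods are hypothetical and do not reflect how QuestDB lays out values in the group by map. The idea it shows: each worker keeps the value together with the row id it came from, and merging two partial states keeps the one with the smaller row id.

```java
// Hypothetical sketch of a per-group state for first(double) under parallel execution.
final class FirstValueStateSketch {
    long rowId = Long.MAX_VALUE; // Long.MAX_VALUE means "no row seen yet"
    double value = Double.NaN;

    /** Called by a worker for each row of the group, in whatever order the worker sees rows. */
    void accept(long candidateRowId, double candidateValue) {
        if (candidateRowId < rowId) {
            rowId = candidateRowId;
            value = candidateValue;
        }
    }

    /** Combines another worker's partial state; the smaller row id wins. */
    void merge(FirstValueStateSketch other) {
        accept(other.rowId, other.value);
    }

    public static void main(String[] args) {
        FirstValueStateSketch workerA = new FirstValueStateSketch();
        FirstValueStateSketch workerB = new FirstValueStateSketch();
        workerA.accept(5, 50.0);   // worker A sees rows 5 and 9
        workerA.accept(9, 90.0);
        workerB.accept(2, 20.0);   // worker B sees rows 2 and 7
        workerB.accept(7, 70.0);
        workerA.merge(workerB);
        System.out.println(workerA.rowId + " -> " + workerA.value); // prints "2 -> 20.0"
    }
}
```

A `last` state would mirror this with the larger row id winning and an initial row id of -1.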