Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support grouping multiple rows of first match in an interval #347

Closed
metalmatze opened this issue Feb 15, 2023 · 4 comments
Closed

Support grouping multiple rows of first match in an interval #347

metalmatze opened this issue Feb 15, 2023 · 4 comments
Labels
enhancement New feature or request Stale

Comments

@metalmatze
Copy link
Member

Right now Parca queries FrostDB for all the timestamps and on the server side chooses to ignore metric samples when a sample has been found in the step's bucket.

https://github.com/parca-dev/parca/blob/ed90dbeb684186e9cdb295bc0f62c723ed3c5a9f/pkg/parcacol/querier.go#L571-L579

This isn't ideal since the data is still queried...

FrostDB should add support to find all rows for a bucket of interval and only then start aggregating.

The row data would look like this:

timestamp stacktrace value
1 stack1 3
1 stack2 5
1 stack3 8
2 stack1 2
2 stack2 3

What we want. We want to only return the sum(value) for the first timestamp we find in the bucket that ranges from 0-9.
So what we want as a result is:

timestamp sum(value)
1 16

What we currently get is however the very first row that falls within that bucket (1, stack1, 3) and then the sum of it's value, which is basically a noop:

timestamp sum(value)
1 3
@metalmatze metalmatze added the enhancement New feature or request label Feb 15, 2023
@metalmatze metalmatze changed the title Support Support grouping multiple rows of first match in an interval Feb 15, 2023
@metalmatze
Copy link
Member Author

This issue is especially bad for Parca itself. On our cloud with the aggregate_view table, it shouldn't be such a bad problem for now.

@albertlockett
Copy link
Contributor

albertlockett commented Aug 28, 2023

@metalmatze As far as I an tell the Duration aggregation does what we want. I added a test to demonstrate.
https://github.com/polarsignals/frostdb/pull/503/files

That said, I think this is different than what we do here in parca, where we take the first result in the interval.
parca-dev/parca#2598

I'm wondering if maybe what we want is a First aggregation instead like this (or something like it)?
#504

Copy link

github-actions bot commented Jan 4, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jan 4, 2024
@metalmatze metalmatze removed the Stale label Jan 9, 2024
Copy link

github-actions bot commented Feb 9, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Feb 9, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Stale
Projects
None yet
Development

No branches or pull requests

2 participants