Skip to content
This repository has been archived by the owner on Jul 19, 2023. It is now read-only.

Sort Profiles across row groups when flushing blocks #803

Merged
merged 8 commits into from
Jul 4, 2023

Conversation

cyriltovena
Copy link
Collaborator

@cyriltovena cyriltovena commented Jun 27, 2023

Fixes #800

This k-way merge rowgroups on disk when we flush to a full block. Which means now a single series is expected to appear in only on rowgroup.

The only caveat if we only cut the final file rowgroup per rowcount but I think we can live with this since we never cut by size.

Base automatically changed from improve-memory-layout-2 to main June 27, 2023 20:28
@cyriltovena
Copy link
Collaborator Author

This is already running in dev everything looks good
image

Seems like finding values in using row.Range is the new CPU additions. Nothing problematic but I want to see if we can do better.

@cyriltovena
Copy link
Collaborator Author

I was able to speed this up quite significantly

❯ benchstat before.txt after.txt
name            old time/op    new time/op    delta
ProfileRows-16     310ns ± 3%      31ns ± 5%  -90.05%  (p=0.008 n=5+5)

name            old alloc/op   new alloc/op   delta
ProfileRows-16     0.00B          0.00B          ~     (all equal)

name            old allocs/op  new allocs/op  delta
ProfileRows-16      0.00           0.00          ~     (all equal)

Copy link
Contributor

@kolesnikovae kolesnikovae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines 109 to 116
n, err := r.reader.ReadRows(r.buff)
if n == 0 {
return false
}
if err != nil && err != io.EOF {
r.err = err
return false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should check error before n == 0 to not miss it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah good point

@cyriltovena cyriltovena enabled auto-merge (squash) July 4, 2023 11:22
@cyriltovena cyriltovena merged commit 466ffdd into main Jul 4, 2023
17 checks passed
@cyriltovena cyriltovena deleted the feat/sort-profiles branch July 4, 2023 11:31
simonswine pushed a commit to simonswine/pyroscope that referenced this pull request Jul 18, 2023
)

* Sort Profiles across row groups when flushing blocks

* Refactor and properly tests parquet reader and writer

* moar tests

* Fixes bad uint32 to int32 conversion causing maxval to be min

* add working tests back

* add missing  tests back

* improve less function for profile rows

* check error first
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sort Rows by Series when flushing to disk
2 participants