Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGFIX] tsdb/wlog.Checkpoint: Fix counting of histogram samples in stats. #13776

Merged
merged 1 commit into from Mar 26, 2024

Conversation

aknuds1
Copy link
Contributor

@aknuds1 aknuds1 commented Mar 15, 2024

I realized that histogram samples are wrongly counted in tsdb/wlog.Checkpoint, in that float samples are counted against stats.TotalSamples and stats.DroppedSamples respectively instead.

This PR rectifies the bug, and augments TestCheckpoint to verify that stats.TotalSamples and stats.DroppedSamples are correctly updated.

That said, I don't think any callers of Checkpoint use the stats return value. Should it be dropped?

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
@aknuds1 aknuds1 marked this pull request as ready for review March 15, 2024 09:56
@machine424
Copy link
Collaborator

Good catch!
It wasn't used even in the PR that introduced it prometheus-junkyard/tsdb#332, maybe it was meant for debugging/dev?
Maybe some external clients could be relying on it, it has been around for a long time.

@aknuds1
Copy link
Contributor Author

aknuds1 commented Mar 15, 2024

@machine424 I was also thinking it's usedmeant for debugging/dev. Do you know if it's an omission that Checkpoint doesn't check record.FloatHistogramSamples? Looks to me as if it should, but I don't know what the consequences are.

@machine424
Copy link
Collaborator

Good question, I'm not familiar with that part of the code, I'll need to take a look.

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @jesusvazquez can help with your questions?

@beorn7
Copy link
Member

beorn7 commented Mar 19, 2024

I approve the fix, but let's hold back merging this until we have an idea if the stats can be ripped out entirely.

@bboreham
Copy link
Member

Big picture, I want more stats about checkpoint. Quite often when I look at big memory usage, especially in Agent mode, it seems associated with the TSDB checkpoint, but it's hard to see what happened. Checkpoints only happen every 2 hours, and usually it takes days for the data to build up.

I guess I should try to write a PR that adds the information I would find useful.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Hello from the bug scrub meeting!)

LGTM, and as per question if this should be removed, I think we have cases to utilize this more in the next PRs, so let's merge. Thanks! (also we need float histograms indeed)

@bwplotka bwplotka changed the title tsdb/wlog.Checkpoint: Fix counting of histogram samples [BUGFIX] tsdb/wlog.Checkpoint: Fix counting of histogram samples in stats. Mar 26, 2024
@bwplotka bwplotka merged commit de05d5e into prometheus:main Mar 26, 2024
24 checks passed
@aknuds1 aknuds1 deleted the arve/checkpoint-histogram-samples branch March 26, 2024 11:33
@aknuds1
Copy link
Contributor Author

aknuds1 commented Mar 26, 2024

also we need float histograms indeed

@bwplotka I can open another PR to count also float histograms then.

@bwplotka
Copy link
Member

Totally

@aknuds1
Copy link
Contributor Author

aknuds1 commented Mar 26, 2024

Made PR to make Checkpoint handle float histograms @bwplotka.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants