Port isolation from old TSDB PR #6841
Conversation
Fixes #1893.
Force-pushed from 63805f3 to 07b3db1.
Note: The recently merged #6777 didn't trigger any formal merge conflicts, but it still breaks the tests in here. I'll rebase at my next convenience. As far as I can see, this should not affect anything other than the tests, so please go ahead with the review.
Great to see this finally back on the table.
Force-pushed from 07b3db1 to 2767d66.
Just dropping a note that, once this is ready to merge, I would like to prombench it against master.
We should prombench as soon as it compiles. I wonder how well my memory estimates from 3 years ago hold up.
Is this documentation-worthy? In tsdb/docs?
/benchmark master
/prombench master
Something locks up, and then the instances OOM.
/prombench cancel
Benchmark cancel is in progress.
This pull request has been worked on by multiple people over the last years. That alone shows how needed this feature is and also how complex this issue is. I want to drop a note here that, as this is an important and risky change, this pull request is under my watch as release shepherd, and if we do not settle on it (merge it) before the 5th of March, I will put my veto on it for the 2.17 release. That said, we are still 2 weeks away from this date, and the Prometheus and TSDB maintainers can merge it in the time between without my intervention.
@beorn7 the prombench run has Loki logs, so maybe that will help with the debugging.
I had a look at the logs; that's where my conclusions are coming from. My next goal is to reproduce the crash locally so that I don't have to run prombench for eight hours. So far no luck.
@roidelapluie yes, sure. It's your call when the time has come. Currently, we don't even know if this will work at all.
Looking at this dashboard, the hypothesis is the following: For some reason, the low watermark got stuck around 02:45 UTC. The first crash happened a while after the head truncation at 05:00, which apparently ran into trouble (head chunks didn't drop). Things didn't look too bad after that, but then the server went into a tight crash loop. I'll focus my investigation on possible reasons why the low watermark didn't get updated. (Wild guess: A scrape got canceled halfway through.) Still, hints from people who know this better than I do are highly appreciated.
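For readers following along, here is a minimal, self-contained Go sketch (names and structure are assumptions for illustration, not the actual Prometheus code) of why an appender that is never committed or rolled back, e.g. from a scrape cancelled halfway through, pins the isolation low watermark:

```go
package main

import "fmt"

// isolation is a hypothetical stand-in: the low watermark is simply the
// smallest appendID that is still open, so one abandoned appender keeps it
// from ever advancing.
type isolation struct {
	lastAppendID uint64
	open         map[uint64]struct{} // appendIDs of in-flight appenders
}

func newIsolation() *isolation {
	return &isolation{open: map[uint64]struct{}{}}
}

func (i *isolation) newAppendID() uint64 {
	i.lastAppendID++
	i.open[i.lastAppendID] = struct{}{}
	return i.lastAppendID
}

func (i *isolation) closeAppend(id uint64) { delete(i.open, id) }

// lowWatermark returns the smallest open appendID, or the last assigned one
// if nothing is in flight.
func (i *isolation) lowWatermark() uint64 {
	lw := i.lastAppendID
	for id := range i.open {
		if id < lw {
			lw = id
		}
	}
	return lw
}

func main() {
	iso := newIsolation()
	stuck := iso.newAppendID() // e.g. a scrape that got cancelled mid-way
	for n := 0; n < 3; n++ {
		id := iso.newAppendID()
		iso.closeAppend(id) // normal scrapes commit and close
	}
	fmt.Println(iso.lowWatermark() == stuck) // true: the watermark cannot advance
}
```

If the low watermark cannot advance, state that is kept only for the benefit of still-open appends can never be cleaned up, which would be consistent with the head chunks not dropping at truncation time.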
/prombench master
I removed the downsizing code. Prombench has now run for ~20 hours without issues. RAM usage has increased a bit, but not more than before, when the downsizing code was still included. Maybe the downsizing code has a bug; that's plausible, but not proven yet. Today I'm busy with other things, so I'll let Prombench run for a few days.
Benchmark tests have been running for 3 days! If this is intended, ignore this message; otherwise you can cancel it by commenting:
Oh my. I think I see it, but...
Making ingestion of all the samples from a single scrape atomic would solve the problem
Do we know how much overhead this simple solution would add? Would it really be too slow? Do we have data?
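For concreteness, here is a rough, self-contained Go sketch of what "make ingestion of all samples from a single scrape atomic" means at the appender level; all types and names are invented for illustration and are not the actual Prometheus interfaces:

```go
package main

import "fmt"

// Sample and Appender are simplified stand-ins for illustration only, not the
// real Prometheus storage types.
type Sample struct {
	Metric string
	T      int64
	V      float64
}

type Appender interface {
	Add(metric string, t int64, v float64) error
	Commit() error   // makes everything appended so far visible atomically
	Rollback() error // discards everything appended since the last Commit
}

// ingestScrape pushes a whole scrape through a single appender. With
// isolation, queries either see all of these samples or none of them.
func ingestScrape(app Appender, scrape []Sample) error {
	for _, s := range scrape {
		if err := app.Add(s.Metric, s.T, s.V); err != nil {
			app.Rollback() // abort the whole scrape
			return err
		}
	}
	return app.Commit()
}

// memAppender is a trivial in-memory implementation so the sketch runs.
type memAppender struct{ pending, committed []Sample }

func (a *memAppender) Add(m string, t int64, v float64) error {
	a.pending = append(a.pending, Sample{m, t, v})
	return nil
}

func (a *memAppender) Commit() error {
	a.committed = append(a.committed, a.pending...)
	a.pending = nil
	return nil
}

func (a *memAppender) Rollback() error {
	a.pending = nil
	return nil
}

func main() {
	app := &memAppender{}
	_ = ingestScrape(app, []Sample{{"metric_a", 1000, 1}, {"metric_b", 1000, 42}})
	fmt.Println(len(app.committed)) // 2
}
```

The overhead question above is presumably about the bookkeeping needed to make such a commit atomic under concurrent reads (tracking which appends a querier is allowed to see), not about the ingestion loop itself.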
It looks good. I reviewed most of it, but I still can't wrap my head around the writeID and isolation logic; I'll continue the review later. Also, can we remove/clarify the commented-out code? It does not help in review 😄
Some suggestions for now.
/prombench cancel
Benchmark cancel is in progress.
The benchmark looks OK, but we have to benchmark the new changes anyway, so I'm canceling the current run.
We do. I have added it, including amending a test so that it now exposes the bug.
/prombench master
Prombench results:
This is now much higher than before, but I believe that's because, with the buggy "clean up everything while no reads are in progress" approach we had before, we weren't really doing the full work.
I think those values are still within expectations, based on what @brian-brazil said above. This will still, I guess, be noticed painfully by users with high-load, tight-resource setups.
Signed-off-by: beorn7 <beorn@grafana.com>
This has been ported from prometheus-junkyard/tsdb#306. Original implementation by @brian-brazil, explained in detail in the 2nd half of this talk: https://promcon.io/2017-munich/talks/staleness-in-prometheus-2-0/ The implementation was then processed by @gouthamve into the PR linked above. Relevant slide deck: https://docs.google.com/presentation/d/1-ICg7PEmDHYcITykD2SR2xwg56Tzf4gr8zfz1OerY5Y/edit?usp=drivesdk

Signed-off-by: beorn7 <beorn@grafana.com>
Co-authored-by: Brian Brazil <brian.brazil@robustperception.io>
Co-authored-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Force-pushed from b9e2af6 to 7f30b09.
Rebased and squashed. Last call for objections, otherwise I'll merge in about an hour or so…
/prombench cancel
Benchmark cancel is in progress.
Hooray 🎉🎉🎉🎉
@beorn7 @brian-brazil can we agree on a sentence to put at the beginning of the release notes about this? Especially something about the memory increase.
I'm also running this now on a moderately loaded production server for comparison (500k series, 30k samples/s, very low query volume). The increase in RAM consumption is even more pronounced here: the peak value of … I'll do a bit of heap analysis at my next convenience to see if there is low-hanging fruit.
This should definitely come with a big warning in the release notes. It will be a hard sell anyway, given that only very few users will have ever noticed the problems with isolation. I'd also keep reverting this on the agenda. We need to give it some thought…
Peak
I think I have found the bug: When we replay the WAL, we dutifully append every sample with appendID 0. Those are all cleaned up after the next commit, but the ring buffer has then already grown to accommodate all samples in the WAL, and it will never shrink again. The solution should be easy: appendID==0 should never be recorded. PR in preparation…
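A hedged illustration of the fix described above (names are invented for the sketch, not the actual code in this PR): samples appended with appendID 0, as during WAL replay, are simply never recorded, so the per-series ring cannot balloon to the size of the whole WAL:

```go
package main

import "fmt"

// txRing here is only a stand-in slice; see the fuller ring sketch further below.
type txRing struct{ ids []uint64 }

// record notes an in-flight append so isolation can track it. appendID 0 is
// reserved for appends that do not need isolation (notably WAL replay) and is
// never recorded.
func (r *txRing) record(appendID uint64) {
	if appendID == 0 {
		return
	}
	r.ids = append(r.ids, appendID)
}

func main() {
	var r txRing
	for i := 0; i < 1_000_000; i++ {
		r.record(0) // replaying a large WAL
	}
	fmt.Println(len(r.ids)) // 0: replay no longer grows the ring
}
```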
I still think it might be worthwhile to have the downsizing code again.
Let's see how it turns out in practice. Downsizing could easily lead to oscillating behavior, which would be worse than no downsizing (spikier memory usage, more allocations, slower appends).
I think this could fix #4580 as well.
The original PR was prometheus-junkyard/tsdb#306.
I tried to carefully adjust it to the new world order, but please give this a very careful review, especially around iterator reuse (marked with a TODO).
On the bright side, I definitely found and fixed a bug in txRing.
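For reference, a self-contained sketch of what a per-series transaction ring along the lines of txRing could look like (layout and names are assumptions, not the code in this PR): a circular buffer of the appendIDs of not-yet-committed appends that doubles when full and is trimmed from the front once the isolation low watermark passes its oldest IDs.

```go
package main

import "fmt"

type txRing struct {
	txIDs     []uint64 // ring buffer of appendIDs still relevant for isolation
	txIDFirst int      // position of the oldest ID in the ring
	txIDCount int      // how many IDs are currently in the ring
}

func newTxRing(capacity int) *txRing {
	return &txRing{txIDs: make([]uint64, capacity)}
}

func (r *txRing) add(appendID uint64) {
	if r.txIDCount == len(r.txIDs) {
		// Ring is full: double it, copying the existing IDs in order.
		// (There is deliberately no downsizing; see the discussion above.)
		bigger := make([]uint64, 2*len(r.txIDs))
		for i := 0; i < r.txIDCount; i++ {
			bigger[i] = r.txIDs[(r.txIDFirst+i)%len(r.txIDs)]
		}
		r.txIDs, r.txIDFirst = bigger, 0
	}
	r.txIDs[(r.txIDFirst+r.txIDCount)%len(r.txIDs)] = appendID
	r.txIDCount++
}

// cleanupBelow drops all IDs below bound (the isolation low watermark) from
// the front of the ring. IDs are added in increasing order, so the oldest
// ones are always at the front.
func (r *txRing) cleanupBelow(bound uint64) {
	for r.txIDCount > 0 && r.txIDs[r.txIDFirst] < bound {
		r.txIDFirst = (r.txIDFirst + 1) % len(r.txIDs)
		r.txIDCount--
	}
}

func main() {
	r := newTxRing(4)
	for id := uint64(1); id <= 6; id++ {
		r.add(id) // the ring doubles from 4 to 8 slots on the fifth add
	}
	r.cleanupBelow(5)                      // IDs 1-4 are below the low watermark
	fmt.Println(r.txIDCount, len(r.txIDs)) // 2 8
}
```

The grow-only behavior in this sketch is also why the WAL-replay issue discussed above matters: once such a ring has grown, the memory is never given back unless downsizing is reintroduced, which is exactly the trade-off weighed in the last few comments.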