Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tsdb: turn off transaction isolation for head compaction #11317

Merged
merged 4 commits into from
Sep 27, 2022

Conversation

bboreham
Copy link
Member

This saves a lot of work tracking appends done while compaction is ongoing.

head.compactable() ensures that the end of the compaction range is more than chunkRange/2 back from now, and head.appendableMinValidTime() ensures that no new appends can start within the compaction range.

We do need to wait for any overlapping appenders that started previously to finish.

Fixes #9253.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
So that we can see when appending has moved past a certain point in time.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This will be used when for head compaction.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
@bboreham
Copy link
Member Author

CI failing on Windows only:

=== RUN   TestDeletedSamplesAndSeriesStillInWALAfterCheckpoint
    head_test.go:1028: 
        	Error Trace:	D:\a\prometheus\prometheus\tsdb\head_test.go:1028
        	Error:      	Not equal: 
        	            	expected: 9999
        	            	actual  : 13851
        	Test:       	TestDeletedSamplesAndSeriesStillInWALAfterCheckpoint
--- FAIL: TestDeletedSamplesAndSeriesStillInWALAfterCheckpoint (0.31s)

It seems to have read more data than was written?

This saves a lot of work tracking appends done while compaction is ongoing.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
@bboreham
Copy link
Member Author

I did a git commit --amend with no change and the test passes now, so I'll call that one a flake.

Copy link
Member

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will have a very big impact on some of our instances that fall over into a memory death spiral.

@beorn7
Copy link
Member

beorn7 commented Sep 20, 2022

🎉

I think this should be looked at by @codesome as the most qualified person. I'll have a detailed look ASAP, too, but if there is anything wrong with this, @codesome is much more likely to spot it.

@beorn7 beorn7 mentioned this pull request Sep 27, 2022
8 tasks
@codesome codesome merged commit ff00dee into prometheus:main Sep 27, 2022
@bboreham
Copy link
Member Author

bboreham commented Nov 1, 2022

This was the result on one of the production systems at my work: memory usage dropped by half.
image

https://twitter.com/_codesome/status/1577264183208398848

roidelapluie pushed a commit to roidelapluie/prometheus that referenced this pull request Nov 23, 2022
…11317)

* tsdb: add a basic test for read/write isolation

* tsdb: store the min time with isolationAppender
So that we can see when appending has moved past a certain point in time.

* tsdb: allow RangeHead to have isolation disabled
This will be used when for head compaction.

* tsdb: do head compaction with isolation disabled
This saves a lot of work tracking appends done while compaction is ongoing.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
@bboreham bboreham deleted the disisolate-compaction branch August 30, 2023 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory Leak in v2.27.1: tsdb.txRing consumes much heap memory
4 participants