Investigate RAM consumption during crash recovery #2139

Closed
beorn7 opened this issue Oct 31, 2016 · 4 comments
@beorn7
Member

beorn7 commented Oct 31, 2016

We have received occasional reports of servers OOMing during crash recovery.

Obviously, the checkpoint has to be loaded in its entirety. If anything beyond that is loaded from disk, it could explain the OOMing, because neither series maintenance nor chunk eviction is running during crash recovery. After a quick check, I could only see chunk descs being loaded. In extreme cases, even the relatively small chunk descs might cause an OOM, so unloading chunk descs will definitely be a way to reduce RAM usage during crash recovery.

But there might be other code paths that load chunks. This has to be investigated more thoroughly.

Obviously, having #447 in place would come in handy.

@matthiasr as discussed earlier today.

@beorn7
Member Author

beorn7 commented Nov 24, 2016

Random observation: A beefy Prometheus server seemed to ramp up its RAM usage while rebuilding the metrics index (xxx metrics queued for indexing).

@beorn7
Member Author

beorn7 commented Nov 24, 2016

Wild guess: If LevelDB gets a lot of updates, it might run into trouble cleaning up and hog too much RAM.

@beorn7
Member Author

beorn7 commented Apr 3, 2017

I have decided not to tackle the LevelDB issues. That would be hairy at best, and LevelDB is going away in v2.0 anyway.
Evicting chunk descs is, however, low-hanging fruit. I'll create a PR shortly (for the 1.6 release).
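
To illustrate the idea of evicting chunk descs while recovery is still running, here is a minimal Go sketch. It is not the actual Prometheus storage code: `chunkDesc`, `recoveredSeries`, `evictOldDescs`, `recoverSeries`, and the `maxDescs` limit are all hypothetical names made up for this example, and a chunk desc is reduced to just a time range. The assumption is that recovery appends descs per series in chronological order and that it is safe to drop the oldest ones inline, since no background maintenance would do it.

```go
package main

import "fmt"

// chunkDesc is a hypothetical, simplified stand-in for a chunk descriptor:
// only the time range a chunk covers, not the chunk data itself.
type chunkDesc struct {
	firstTime, lastTime int64
}

// recoveredSeries is a hypothetical in-memory series that keeps its chunk
// descs in chronological order. descOffset counts the older descs that
// are no longer held in memory (they remain on disk only).
type recoveredSeries struct {
	descs      []chunkDesc
	descOffset int
}

// evictOldDescs drops the oldest descs so that at most keep remain in
// memory. It returns the number of descs evicted.
func (s *recoveredSeries) evictOldDescs(keep int) int {
	if keep < 0 {
		keep = 0
	}
	n := len(s.descs) - keep
	if n <= 0 {
		return 0
	}
	// Copy into a fresh slice so the evicted descs can be garbage-collected.
	s.descs = append([]chunkDesc(nil), s.descs[n:]...)
	s.descOffset += n
	return n
}

// recoverSeries simulates crash recovery for a single series. Eviction runs
// inline after every append because, in this scenario, no background series
// maintenance is running yet to do it.
func recoverSeries(onDisk []chunkDesc, maxDescs int) *recoveredSeries {
	s := &recoveredSeries{}
	for _, cd := range onDisk {
		s.descs = append(s.descs, cd)
		s.evictOldDescs(maxDescs)
	}
	return s
}

func main() {
	onDisk := make([]chunkDesc, 10)
	for i := range onDisk {
		onDisk[i] = chunkDesc{firstTime: int64(i * 100), lastTime: int64(i*100 + 99)}
	}
	s := recoverSeries(onDisk, 3)
	// Descs in memory, descs evicted, first timestamp still in memory.
	fmt.Println(len(s.descs), s.descOffset, s.descs[0].firstTime) // prints "3 7 700"
}
```

In this sketch the per-series desc count never exceeds `maxDescs + 1`, so the memory held by descs during recovery is bounded by the number of series rather than by the total number of chunks on disk.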

@lock

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 23, 2019