This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Harden index load against failures on startup #2038

Merged (4 commits) on Aug 11, 2022


shanson7 (Collaborator)

After a Cassandra crash, invalid index entries caused Metrictank (MT) to crash repeatedly. There were also spurious timeouts, and Metrictank could not start if any partition failed to load. Even a low per-partition failure chance adds up: at 10% per partition with 16 partitions to load, all of them succeed only 0.9^16 ≈ 18.5% of the time, so Metrictank is likely to fail to start several times in a row. This PR adds retries so that it is much more likely that all partitions load successfully.
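The startup arithmetic above, and why per-partition retries help, can be checked with a short sketch. It assumes partition loads fail independently with the same probability, and `load` stands in for a hypothetical per-partition index loader. This is not Metrictank's actual code (Metrictank is written in Go, and the merged retry logic may differ); Python is used only for brevity.

```python
import time
from typing import Callable, Iterable, Optional


def prob_all_succeed(p_fail: float, partitions: int, attempts: int = 1) -> float:
    """Chance that every partition loads, assuming independent failures.

    With retries, a partition only fails if all `attempts` tries fail,
    which happens with probability p_fail ** attempts.
    """
    per_partition_fail = p_fail ** attempts
    return (1.0 - per_partition_fail) ** partitions


def load_with_retry(partition: int, load: Callable[[int], None],
                    attempts: int = 3, backoff_s: float = 0.0) -> None:
    """Call the hypothetical `load(partition)` up to `attempts` times.

    Sleeps backoff_s, 2*backoff_s, 4*backoff_s, ... between tries and
    raises RuntimeError (chained to the last error) if every attempt fails.
    """
    last_err: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            load(partition)
            return
        except Exception as err:  # e.g. a spurious timeout
            last_err = err
            if attempt + 1 < attempts:
                time.sleep(backoff_s * (2 ** attempt))
    raise RuntimeError(
        f"partition {partition} failed after {attempts} attempts") from last_err


def load_all(partitions: Iterable[int], load: Callable[[int], None],
             attempts: int = 3) -> None:
    """Startup only proceeds once every partition has loaded."""
    for p in partitions:
        load_with_retry(p, load, attempts)


if __name__ == "__main__":
    # Without retries: 0.9**16, so roughly one start in five succeeds.
    print(round(prob_all_succeed(0.1, 16), 3))  # 0.185
    # With 3 attempts per partition: 0.999**16.
    print(round(prob_all_succeed(0.1, 16, attempts=3), 3))  # 0.984
```

Retrying each partition individually, rather than restarting the whole process, means a single transient failure costs one extra load attempt for that partition instead of reloading all 16.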

@shanson7 (Collaborator, Author)

Note: we've been running this in production for a long time now and have seen a large reduction in startup failures.

@robert-milan merged commit 6599d81 into grafana:master on Aug 11, 2022.