
Consider allowing configurable throttling for snapshot writing to reserve disk IO for log appends #9623

Open
banks opened this issue Jan 22, 2021 · 0 comments
Labels
theme/performance Performance benchmarking or potential improvement theme/reliability

banks commented Jan 22, 2021

Background

This is a follow up to several incidents where the failure described in #9609 was the root cause.

Currently raft writes out complete state snapshots periodically. For large state stores (e.g. 1 GB or more) this can result in large amounts of disk IO. Since Consul only has a single configurable "data dir", the snapshot file is almost always written to the same physical device as the raft log store, which means large snapshot writes that require a lot of IO operations often interfere with raft append times as they contend for disk IO. Slowing log appends slows down commit times across the cluster, especially when multiple servers are snapshotting at the same time, which becomes increasingly likely at high write rates.

While raft attempts to mitigate slow log appends from impacting heartbeat processing, we've observed that disk IO issues do still often cause cluster instability. Even if we find out why that is and fix it, snapshotting can negatively affect overall write throughput, as has been demonstrated in some public Consul KV benchmarks in the past.

This proposal should be investigated as a possible "easy win" to reduce the impact snapshot writing has on log appends. Some time-boxed research into whether a very basic implementation reduces appendEntries tail latency while writing snapshots is important before we commit to all the extra plumbing work etc. We might combine the investigation with #9620 since the test environment and code paths to change are essentially the same.

Proposal

Introduce a configurable rate limit for writing the snapshot. I imagine we'd need to choose a "chunk size" and then have the FileSnapshot return an io.WriteCloser that wraps the existing buffered file in another type that checks the write rate and sleeps to avoid writing to the underlying file faster than that rate.

It's not clear how well this would work, and it will certainly need tuning by operators, so investigating whether it has an obvious effect on append latency or overall throughput when the snapshot is large is important. If we do it, we should also document some guidance on tuning the parameters.

If we decide this is worth adding to improve performance or just as a way to get control over a cluster in overload situations, we should try to make the throttling configuration hot-reloadable to avoid needing restarts when the cluster is already in an unhealthy state.
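One way hot-reloading could work is to keep the limit in an atomic value that the throttling loop reads before each chunk, so a reload takes effect mid-snapshot without restarting the server. This is a hypothetical sketch, not an existing Consul config key:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// snapshotRateLimit holds the current write-rate limit in bytes/sec.
// The name and reload mechanism are illustrative assumptions.
var snapshotRateLimit atomic.Int64

// currentLimit is what the snapshot write loop would consult per chunk.
func currentLimit() int64 { return snapshotRateLimit.Load() }

// applyReload is what a config hot-reload handler would call.
func applyReload(newBytesPerSec int64) { snapshotRateLimit.Store(newBytesPerSec) }

func main() {
	applyReload(4 << 20)        // initial config: 4 MiB/s
	fmt.Println(currentLimit()) // prints: 4194304
	applyReload(1 << 20)        // operator lowers the limit during overload
	fmt.Println(currentLimit()) // prints: 1048576
}
```

Reading the limit per chunk means an operator can tighten throttling on an already-unhealthy cluster and see the effect immediately, which is the point of making it hot-reloadable.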
