Add operational guide on tuning availability, consistency, and durability + make ignoring corrupt commitlogs on bootstrap default in all sample YAMLs #1491
Codecov Report
@@ Coverage Diff @@
## master #1491 +/- ##
=========================================
- Coverage 70.9% 43.5% -27.4%
=========================================
Files 842 829 -13
Lines 71918 70192 -1726
=========================================
- Hits 51021 30570 -20451
- Misses 17561 36741 +19180
+ Partials 3336 2881 -455
this is really great - thanks for writing this up! a few requests and a bunch of nits (some super pedantic) - feel free to take or leave most of them.
this is looking great! thanks for the updates - a couple of super minor nits and then i think this is good to go.
### Client Write and Read Consistency

The possible configuration values for write and read consistency are discussed in more detail [in this section](../m3db/architecture/consistencylevels.md) of the documentation, but in short M3DB behaves similarly to other H.A systems with configurable consistency such as Cassandra that allow the caller to control the consistency level of writes and reads from the client.
The possible configuration values for write and read consistency are discussed in more detail in [the Consistency Levels section](../m3db/architecture/consistencylevels.md). In short, M3DB behaves similarly to other HA systems with configurable consistency such as Cassandra that allow the caller to control the consistency level of writes and reads from the client.
nit:
with configurable consistency, such as Cassandra, which allow [...]
### Commitlog Configuration

By default M3DB runs with an asynchronous commitlog such that writes will be acknowledged as successful by the client even though the data may not have been physically flushed to the commitlog on disk yet. M3DB supports changing this default behavior to run the commitlog synchronously, but this is not currently exposed to users in the YAML configuration and generally leads to a massive performance degradation.
By default M3DB runs with an asynchronous commitlog such that writes will be reported as successful by the client, though the data may not have been flushed to disk yet.
nit:
By default, M3DB [...]
### Commitlog Configuration

By default M3DB runs with an asynchronous commitlog such that writes will be reported as successful by the client, though the data may not have been flushed to disk yet.
M3DB supports changing this default behavior to run the commitlog synchronously, but this is not currently exposed to users in the YAML configuration and generally leads to a massive performance degradation.
We recommend running M3DB with an asynchronous commitlog.
it might be worth explaining this a little more concretely. there is a config snippet below, but it's not immediately clear how exactly that controls the (a)synchronicity of the commitlog.
+1
This instructs M3DB to handle writes for new timeseries (for a given time block) asynchronously. Creating a new timeseries in memory is much more expensive than simply appending a new write to an existing series, so the default configuration of creating them asynchronously improves M3DB's write throughput significantly when many new series are being created all at once.

However, since new time series are created asynchronously, it's possible that there may be a brief delay between when a write is acknowledged by the client and when that series becomes available for subsequent reads.
i suspect that that's the case on average simply because
mean(client->server unidirectional read latency) ≤ mean(flush interval + write() + fsync() latency).
correct? (this is getting super pedantic, sorry - i don't think it's necessarily critical to call the distinction out here, this is more for my own edification.)
1. 524288 or more bytes have been written since the last time M3DB flushed the commitlog.
2. One or more seconds have elapsed since the last time M3DB flushed the commitlog.

In addition, the configuration states that M3DB should allow up to `2097152` writes to be buffered in the commitlog queue before the database node will begin rejecting incoming writes so it can attempt to drain the queue and catch up. Increasing the size of this queue can often increase the write throughput of an M3DB node at the cost of potentially losing more data if the node experiences a sudden failure like a hard crash or power loss.
drain the queue and catch up
i don't think (but am potentially just not seeing) that we've documented/diagrammed the architectural details to explain how the queue works (maybe that'd be overkill), but if we ever add that, this should probably link to it.
Yeah unfortunately we don't have that right now
i didn't think so - just a note for the future, then. :)
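Since the queue behavior discussed in this thread is not diagrammed anywhere yet, here is a rough Python sketch of the trade-off the paragraph describes: writes are accepted into a bounded in-memory queue, rejected once the queue is full, and anything still queued is lost on a hard crash. The class and method names (`CommitlogQueueSketch`, `write`, `flush`, `crash`) are hypothetical and only model the description in the docs; this is not M3DB's implementation.

```python
from collections import deque


class CommitlogQueueSketch:
    """Toy model of a bounded, asynchronously flushed write queue (hypothetical, not M3DB code)."""

    def __init__(self, queue_size):
        self.queue_size = queue_size
        self.pending = deque()  # accepted writes that have not been flushed yet
        self.flushed = []       # stands in for data that has reached disk

    def write(self, entry):
        """Accept the write if there is room; return False to signal rejection."""
        if len(self.pending) >= self.queue_size:
            return False
        self.pending.append(entry)
        return True

    def flush(self):
        """Move every pending write to 'disk' and return how many were flushed."""
        count = len(self.pending)
        self.flushed.extend(self.pending)
        self.pending.clear()
        return count

    def crash(self):
        """Simulate a hard crash: pending writes are lost; return how many."""
        lost = len(self.pending)
        self.pending.clear()
        return lost
```

In this model a larger `queue_size` lets more writes be accepted between flushes, and also raises the upper bound on what `crash()` can lose, which is the trade-off the documentation paragraph calls out.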
```
writeNewSeriesLimitPerSecond: 1048576
```

This value can be set much lower than the default value for workloads in which a significant increase in cardinality usually indicates an abusive caller.
abusive caller
i'd recommend changing this to "misbehaving", as abuse implies malicious intent. more likely, it's a caller that doesn't understand the goals or limitations of the system.
don't want to get too tied up in semantics, but if we use words like "abusive", "misbehaving", etc, we might want to very clearly call out or link to expectations (e.g. wrt dimensionality, cardinality, etc).
Misbehaving in this context would depend on your setup and workload. E.g., at Uber it might be someone emitting UUIDs in a Spark job, but for someone else's setup that might be the intended use case.
understood - still think we should switch to "misbehaving" instead of "abusive" unless you disagree with the semantics there.
yep i did
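To make the effect of `writeNewSeriesLimitPerSecond` more concrete, here is a minimal Python sketch of a per-second cap on new series creation. It assumes a simple fixed one-second window; the thread does not say how M3DB enforces the limit or what happens to creations over it, so the class name `NewSeriesLimiterSketch`, the `allow` method, and the windowing scheme are all hypothetical.

```python
class NewSeriesLimiterSketch:
    """Fixed-window cap on new series per second (hypothetical, not M3DB code)."""

    def __init__(self, limit_per_second):
        self.limit = limit_per_second
        self.window = None  # the whole second currently being counted
        self.count = 0      # new series admitted in that second

    def allow(self, now_seconds):
        """Return True if a new series may be created at time now_seconds."""
        window = int(now_seconds)
        if window != self.window:
            # A new second has started, so reset the counter.
            self.window = window
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True
```

With a low limit, a caller that suddenly emits many previously unseen series (for example, IDs containing UUIDs) hits the cap quickly, while appends to existing series are not counted by this sketch at all.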
### Client Write and Read Consistency

We recommend running the client with `writeConsistencyLevel` set to `majority` and `readConsistencyLevel` set to `unstrict_majority`.
This means that all writes must be acknowledged by a quorum of nodes in order to be considered successful, and that reads will attempt to achieve quorum but will return the data from a single node if they are unable to achieve quorum.
Maybe give a brief example?
added an extra sentence
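As a possible brief example of the recommendation above, the following Python sketch spells out the two rules as stated in the docs: a `majority` write needs acknowledgements from a quorum of replicas, and an `unstrict_majority` read prefers a quorum but falls back to a single node's response. The function names and the `(ok, achieved_quorum)` return shape are hypothetical, and quorum is assumed to mean more than half of the replicas.

```python
def majority(replication_factor):
    """Smallest number of replicas that is more than half of them."""
    return replication_factor // 2 + 1


def write_succeeds(acks, replication_factor):
    """A majority write succeeds only if a quorum of replicas acknowledged it."""
    return acks >= majority(replication_factor)


def unstrict_majority_read(responses, replication_factor):
    """Return (ok, achieved_quorum) for a read that got `responses` node responses.

    The read aims for quorum but still succeeds with a single response.
    """
    if responses >= majority(replication_factor):
        return (True, True)
    if responses >= 1:
        return (True, False)
    return (False, False)
```

For a replication factor of 3, `majority(3)` is 2: a write acknowledged by only one node fails, while a read answered by only one node still returns data, just without the quorum guarantee.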
### Commitlog Configuration

By default M3DB runs with an asynchronous commitlog such that writes will be reported as successful by the client, though the data may not have been flushed to disk yet.
M3DB supports changing this default behavior to run the commitlog synchronously, but this is not currently exposed to users in the YAML configuration and generally leads to a massive performance degradation.
We recommend running M3DB with an asynchronous commitlog.
+1
This configuration states that the commitlog should be flushed whenever either of the following is true:

1. 524288 or more bytes have been written since the last time M3DB flushed the commitlog.
Why `524288` and `2097152`?
it's just the default we've always used. Presumably @robskillington did some benchmarking?
they're 2^19 and 2^21, respectively, if that helps.
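For anyone who wants to check the arithmetic and the flush rule from this thread, here is a small Python sketch. The constant and function names are hypothetical; only the numbers (524288 bytes, one second, 2097152 queued writes) come from the documentation under review.

```python
# Values quoted in the docs under review; the names are illustrative only.
FLUSH_SIZE_BYTES = 524288      # flush once this many bytes have been written
FLUSH_INTERVAL_SECONDS = 1.0   # or once this much time has passed
QUEUE_SIZE = 2097152           # writes buffered before new writes are rejected


def should_flush(bytes_since_flush, seconds_since_flush):
    """Flush when either the size threshold or the time threshold is reached."""
    return (
        bytes_since_flush >= FLUSH_SIZE_BYTES
        or seconds_since_flush >= FLUSH_INTERVAL_SECONDS
    )


# Both defaults are powers of two, as noted above.
assert FLUSH_SIZE_BYTES == 2 ** 19
assert QUEUE_SIZE == 2 ** 21
```

Either condition alone is enough to trigger a flush, so a quiet node still flushes roughly once per second and a busy node flushes as soon as 512 KiB has accumulated.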
### Ignoring Corrupt Commitlogs on Bootstrap

As described in the "Tuning for Performance and Availability" section, we recommend configuring M3DB to ignore corrupt commitlog files on bootstrap. However, if you want to avoid any amount of inconsistency or data loss, no matter how minor, then you should configure M3DB to return unfulfilled when the commitlog bootstrapper encounters corrupt commitlog files. You can do so by modifying your configuration to look like this:
Hmm do we need both this section and `### Ignoring Corrupt Commitlogs on Bootstrap`?
Not totally following. I have the subheading twice, once under availability section and once under consistency
This looks awesome so far, LGTM once the other discussions are resolved.
3e7121b to 351879b
🥇
No description provided.