Add operational guide on tuning availability, consistency, and durability + make ignoring corrupt commitlogs on bootstrap default in all sample YAMLs #1491

Merged
merged 10 commits into from
Mar 26, 2019

richardartoul
Contributor

No description provided.

@codecov

codecov bot commented Mar 23, 2019

Codecov Report

Merging #1491 into master will decrease coverage by 27.3%.
The diff coverage is n/a.

```diff
@@            Coverage Diff            @@
##           master   #1491      +/-   ##
=========================================
- Coverage    70.9%   43.5%   -27.4%
=========================================
  Files         842     829      -13
  Lines       71918   70192    -1726
=========================================
- Hits        51021   30570   -20451
- Misses      17561   36741   +19180
+ Partials     3336    2881     -455
```

| Flag | Coverage Δ |
|---|---|
| #aggregator | 58.3% <0%> (-24.1%) ⬇️ |
| #cluster | 30.2% <0%> (-55.7%) ⬇️ |
| #collector | 39.1% <0%> (-24.6%) ⬇️ |
| #dbnode | 68.6% <0%> (-12.2%) ⬇️ |
| #m3em | 44.1% <0%> (-29.1%) ⬇️ |
| #m3ninx | 48.9% <0%> (-25.4%) ⬇️ |
| #m3nsch | 100% <0%> (+48.8%) ⬆️ |
| #metrics | 17.5% <0%> (ø) ⬆️ |
| #msg | 74.9% <0%> (ø) ⬆️ |
| #query | 1.5% <0%> (-64.6%) ⬇️ |
| #x | 42.3% <0%> (-34.5%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update caff9d4...6b5836f.

Collaborator

@mway mway left a comment

this is really great - thanks for writing this up! a few requests and a bunch of nits (some super pedantic) - feel free to take or leave most of them.

Collaborator

@mway mway left a comment

this is looking great! thanks for the updates - a couple of super minor nits and then i think this is good to go.


### Client Write and Read Consistency

The possible configuration values for write and read consistency are discussed in more detail [in this section](../m3db/architecture/consistencylevels.md) of the documentation, but in short M3DB behaves similarly to other H.A systems with configurable consistency such as Cassandra that allow the caller to control the consistency level of writes and reads from the client.
The possible configuration values for write and read consistency are discussed in more detail in [the Consistency Levels section](../m3db/architecture/consistencylevels.md). In short, M3DB behaves similarly to other HA systems with configurable consistency such as Cassandra that allow the caller to control the consistency level of writes and reads from the client.
Collaborator

nit:

with configurable consistency, such as Cassandra, which allow [...]


### Commitlog Configuration

By default M3DB runs with an asynchronous commitlog such that writes will be acknowledged as successful by the client even though the data may not have been physically flushed to the commitlog on disk yet. M3DB supports changing this default behavior to run the commitlog synchronously, but this is not currently exposed to users in the YAML configuration and generally leads to a massive performance degradation.
By default M3DB runs with an asynchronous commitlog such that writes will be reported as successful by the client, though the data may not have been flushed to disk yet.
Collaborator

nit:

By default, M3DB [...]


### Commitlog Configuration

By default M3DB runs with an asynchronous commitlog such that writes will be reported as successful by the client, though the data may not have been flushed to disk yet.
M3DB supports changing this default behavior to run the commitlog synchronously, but this is not currently exposed to users in the YAML configuration and generally leads to a massive performance degradation.
We recommend running M3DB with an asynchronous commitlog.
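To make the difference concrete, here is a toy model in Python. It is hypothetical: the class and method names are invented for illustration and are not M3DB APIs. In synchronous mode every write is flushed before it is acknowledged; in asynchronous mode writes are acknowledged immediately and only become durable on a later flush, so a hard crash loses whatever is still buffered.

```python
class ToyCommitLog:
    """Toy model of a commitlog (hypothetical, not M3DB's implementation)."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.buffer = []   # acknowledged writes not yet flushed to disk
        self.durable = []  # writes that survived a (simulated) flush

    def write(self, record) -> bool:
        self.buffer.append(record)
        if self.synchronous:
            self.flush()   # pay the disk round trip before acknowledging
        return True        # acknowledged to the client in both modes

    def flush(self) -> None:
        """Simulate flushing buffered writes to disk."""
        self.durable.extend(self.buffer)
        self.buffer.clear()

    def crash(self) -> int:
        """Simulate a hard crash; return how many acknowledged writes were lost."""
        lost = len(self.buffer)
        self.buffer.clear()
        return lost


async_log = ToyCommitLog(synchronous=False)
async_log.write("a")
async_log.write("b")
print(async_log.crash())  # 2: both writes were acknowledged but never flushed

sync_log = ToyCommitLog(synchronous=True)
sync_log.write("a")
print(sync_log.crash())   # 0: every acknowledged write was already durable
```

The synchronous path does a flush per write, which in a real system means a disk sync per write; that per-write cost is the performance degradation the text refers to.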
Collaborator

it might be worth explaining this a little more concretely. there is a config snippet below, but it's not immediately clear how exactly that controls the (a)synchronicity of the commitlog.

Collaborator

+1


This instructs M3DB to handle writes for new timeseries (for a given time block) asynchronously. Creating a new timeseries in memory is much more expensive than simply appending a new write to an existing series, so the default configuration of creating them asynchronously improves M3DB's write throughput significantly when many new series are being created all at once.

However, since new time series are created asynchronously, there may be a brief delay between when a write is acknowledged by the client and when that series becomes available for subsequent reads.
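A toy model of that behavior, assuming a single in-memory shard with a queue of pending series creations that a background step drains. All names are hypothetical and this is not M3DB code; it only shows why a write can be acknowledged before its series is readable.

```python
from collections import deque


class ToyShard:
    """Toy model of asynchronous new-series inserts (hypothetical, not M3DB code)."""

    def __init__(self):
        self.series = {}        # series ID -> list of (timestamp, value)
        self.pending = deque()  # writes for series that do not exist yet

    def write(self, series_id, ts, value) -> bool:
        if series_id in self.series:
            self.series[series_id].append((ts, value))   # cheap: append to existing series
        else:
            self.pending.append((series_id, ts, value))  # expensive: defer series creation
        return True  # acknowledged immediately on both paths

    def drain_pending(self) -> None:
        """Background step: create queued series and apply their writes in order."""
        while self.pending:
            series_id, ts, value = self.pending.popleft()
            self.series.setdefault(series_id, []).append((ts, value))

    def read(self, series_id):
        return list(self.series.get(series_id, []))


shard = ToyShard()
shard.write("cpu.host1", 1, 0.5)
print(shard.read("cpu.host1"))  # []: acknowledged but not yet readable
shard.drain_pending()
print(shard.read("cpu.host1"))  # [(1, 0.5)]
```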
Collaborator

i suspect that that's the case on average simply because

mean(client->server unidirectional read latency) ≤ mean(flush interval + write() + fsync() latency).

correct? (this is getting super pedantic, sorry - i don't think it's necessarily critical to call the distinction out here, this is more for my own edification.)

1. 524288 or more bytes have been written since the last time M3DB flushed the commitlog.
2. One or more seconds have elapsed since the last time M3DB flushed the commitlog.
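Read as code, the two flush conditions combine with a logical OR. A minimal sketch; the constant and function names are hypothetical and are not M3DB configuration keys:

```python
FLUSH_MAX_BYTES = 524288    # flush once this many bytes are buffered
FLUSH_EVERY_SECONDS = 1.0   # ...or once this much time has passed


def should_flush(bytes_since_flush: int, seconds_since_flush: float) -> bool:
    """True when either flush condition from the list above is met."""
    return (bytes_since_flush >= FLUSH_MAX_BYTES
            or seconds_since_flush >= FLUSH_EVERY_SECONDS)


print(should_flush(600_000, 0.2))  # True: byte threshold reached first
print(should_flush(10_000, 0.2))   # False: neither threshold reached
print(should_flush(10_000, 1.5))   # True: time threshold reached
```

Under heavy write load the byte threshold tends to fire first; under light load the time threshold bounds how long an acknowledged write can sit unflushed.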

In addition, the configuration also states that M3DB should allow up to `2097152` writes to be buffered in the commitlog queue before the database node will begin rejecting incoming writes so it can attempt to drain the queue and catch up. Increasing the size of this queue can often increase the write throughput of an M3DB node at the cost of potentially losing more data if the node experiences a sudden failure like a hard crash or power loss.
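The trade-off can be illustrated with a toy bounded queue, assuming writes are rejected outright once the queue is full. This is a hypothetical sketch, not M3DB's commitlog queue: a larger capacity absorbs bigger bursts before rejecting, but also raises the number of queued writes that a crash would lose.

```python
from collections import deque


class BoundedWriteQueue:
    """Toy bounded commitlog queue (hypothetical, not M3DB's implementation)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = deque()

    def enqueue(self, write) -> bool:
        """Accept a write, or reject it when the queue is full."""
        if len(self.entries) >= self.capacity:
            return False  # caller sees an error; the writer keeps draining
        self.entries.append(write)
        return True

    def drain(self, max_entries: int) -> int:
        """Simulate the commitlog writer persisting up to max_entries writes."""
        n = min(max_entries, len(self.entries))
        for _ in range(n):
            self.entries.popleft()
        return n

    def at_risk(self) -> int:
        """Writes that would be lost if the node crashed right now."""
        return len(self.entries)


q = BoundedWriteQueue(capacity=2)
print([q.enqueue(w) for w in ("w1", "w2", "w3")])  # [True, True, False]
q.drain(1)
print(q.enqueue("w3"), q.at_risk())  # True 2
```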
Collaborator

drain the queue and catch up

i don't think (but am potentially just not seeing) that we've documented/diagrammed the architectural details to explain how the queue works (maybe that'd be overkill), but if we ever add that, this should probably link to it.

Contributor Author

Yeah unfortunately we don't have that right now

Collaborator

i didn't think so - just a note for the future, then. :)

```yaml
writeNewSeriesLimitPerSecond: 1048576
```

This value can be set much lower than the default value for workloads in which a significant increase in cardinality usually indicates an abusive caller.
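The text does not say how this limit is enforced. Purely for illustration, the sketch below assumes a fixed one-second window that counts only new-series creations (writes to existing series are not modeled); the class name is hypothetical and this is not M3DB's implementation.

```python
class NewSeriesLimiter:
    """Fixed-window limiter on new-series creations per second (hypothetical)."""

    def __init__(self, limit_per_second: int):
        self.limit = limit_per_second
        self.window = None  # integer second of the current window
        self.count = 0      # new series admitted in the current window

    def allow_new_series(self, now_seconds: float) -> bool:
        window = int(now_seconds)
        if window != self.window:  # a new one-second window resets the count
            self.window = window
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


limiter = NewSeriesLimiter(limit_per_second=2)
print([limiter.allow_new_series(t) for t in (0.1, 0.2, 0.3, 1.05)])
# [True, True, False, True]
```

With a low limit, a sudden cardinality spike is capped at the configured rate instead of forcing the node to create every new series at once.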
Collaborator

abusive caller

i'd recommend changing this to "misbehaving", as abuse implies malicious intent. more likely, it's a caller that doesn't understand the goals or limitations of the system.

don't want to get too tied up in semantics, but if we use words like "abusive", "misbehaving", etc, we might want to very clearly call out or link to expectations (e.g. wrt dimensionality, cardinality, etc).

Contributor Author

Misbehaving in this context would depend on your setup and workload. For example, at Uber it might be someone emitting UUIDs in a Spark job, but for someone else's setup that might be the intended use case.

Collaborator

understood - still think we should switch to "misbehaving" instead of "abusive" unless you disagree with the semantics there.

Contributor Author

yep i did

@mway mway added the area:documentation All issues pertaining to usability and documentation label Mar 24, 2019
### Client Write and Read consistency

We recommend running the client with `writeConsistencyLevel` set to `majority` and `readConsistencyLevel` set to `unstrict_majority`.
This means that all writes must be acknowledged by a quorum of nodes in order to be considered successful, and that reads will attempt to achieve quorum but will return the data from a single node if they are unable to achieve quorum.
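For example, with a replication factor of 3 a quorum is 2 nodes: a write acknowledged by only one replica fails, while a read that gets only one response still returns that node's data. The sketch below models just that counting logic; the function names are hypothetical and it is not M3DB's client code.

```python
def majority(replication_factor: int) -> int:
    """Smallest number of replicas that forms a quorum (strict majority)."""
    return replication_factor // 2 + 1


def write_succeeds_majority(acks: int, replication_factor: int) -> bool:
    """writeConsistencyLevel=majority: the write counts only if a quorum acked it."""
    return acks >= majority(replication_factor)


def read_outcome_unstrict_majority(responses: int, replication_factor: int) -> str:
    """readConsistencyLevel=unstrict_majority: prefer a quorum, fall back to one node."""
    if responses >= majority(replication_factor):
        return "quorum"
    if responses >= 1:
        return "single-node"  # fewer than a quorum answered; data may be stale
    return "error"            # hypothetical label: no replica answered at all


print(majority(3))                                                  # 2
print(write_succeeds_majority(acks=1, replication_factor=3))        # False
print(read_outcome_unstrict_majority(responses=1, replication_factor=3))  # single-node
```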
Collaborator

Maybe give a brief example?

Contributor Author

added an extra sentence


### Commitlog Configuration

By default M3DB runs with an asynchronous commitlog such that writes will be reported as successful by the client, though the data may not have been flushed to disk yet.
M3DB supports changing this default behavior to run the commitlog synchronously, but this is not currently exposed to users in the YAML configuration and generally leads to a massive performance degradation.
We recommend running M3DB with an asynchronous commitlog.
Collaborator

+1


This configuration states that the commitlog should be flushed whenever either of the following is true:

1. 524288 or more bytes have been written since the last time M3DB flushed the commitlog.
Collaborator

Why 524288 and 2097152?

Contributor Author

it's just the default we've always used. Presumably @robskillington did some benchmarking?

Collaborator

they're 2^19 and 2^21, respectively, if that helps.
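For reference, the arithmetic (the first value is a byte count per the flush condition, the second a count of queued writes; the `writeNewSeriesLimitPerSecond` default of 1048576 is likewise a power of two):

```math
2^{19} = 524{,}288 \text{ bytes} = 512\ \text{KiB}, \qquad 2^{21} = 2{,}097{,}152 \text{ writes}, \qquad 2^{20} = 1{,}048{,}576
```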


### Ignoring Corrupt Commitlogs on Bootstrap

As described in the "Tuning for Performance and Availability" section, we recommend configuring M3DB to ignore corrupt commitlog files on bootstrap. However, if you want to avoid any amount of inconsistency or data loss, no matter how minor, then you should configure M3DB to return unfulfilled when the commitlog bootstrapper encounters corrupt commitlog files. You can do so by modifying your configuration to look like this:
Collaborator

Hmm do we need both this section and ### Ignoring Corrupt Commitlogs on Bootstrap?

Contributor Author

Not totally following. I have the subheading twice, once under availability section and once under consistency

Collaborator

@schallert schallert left a comment

This looks awesome so far, LGTM once the other discussions are resolved.

@richardartoul richardartoul force-pushed the ra/availability-consistency-durability branch from 3e7121b to 351879b on March 25, 2019 19:03
Collaborator

@mway mway left a comment

🥇

@richardartoul richardartoul merged commit d7d3559 into master Mar 26, 2019
@richardartoul richardartoul deleted the ra/availability-consistency-durability branch March 26, 2019 15:24
Labels
area:documentation All issues pertaining to usability and documentation

4 participants