active shards 1 crate #384

Merged: 4 commits from active-shards-1-crate into master on Dec 1, 2020

Conversation

@chicco785 (Contributor)

  • By enabling "write.wait_for_active_shards = 1", writes are possible even when table health is not 100%.
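
For illustration, a rough sketch of what this boils down to at the CrateDB table level; the table name doc.etsensor and its columns are made up, and in ngsi-timeseries-api the actual value is driven by configuration (the environment variable discussed below) rather than being hard-coded:

```sql
-- Minimal sketch (not the exact DDL the code issues): create a table that only
-- waits for the primary shard copy to be active before accepting writes.
CREATE TABLE IF NOT EXISTS doc.etsensor (
    entity_id TEXT,
    time_index TIMESTAMP WITH TIME ZONE,
    temperature DOUBLE PRECISION
) WITH ("write.wait_for_active_shards" = 1);
```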

@c0c0n3 (Member) commented Oct 27, 2020

As we discussed this morning, this has the potential downside of data loss, which isn't a big deal in my opinion: losing a few data points in a series of thousands isn't a tragedy. So we're trading resiliency for better throughput, since writes only wait for the primary shard, not for the replicas. Makes sense to me.

But going over the docs again, my understanding is that wait_for_active_shards = 1 could make this scenario more likely:

  1. A node N1 holds a primary shard S with records r[1] to r[m + n].
  2. Another node N2 holds S's replica shard, R, with records r[1] to r[m], i.e. n records haven't been replicated yet.
  3. N1 goes down.
  4. Crate won't promote R on N2 to primary, since it knows R is stale with respect to S.

The only way out of the impasse would be to manually force replica promotion.

I can't be 100% sure of this since the docs are ambiguous, to say the least. But perhaps we should test to see if this is a likely scenario, better safe than sorry kind of thing...
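
If the scenario is real, my understanding is that forcing the promotion would look something like the statement below; the table name, shard id and node name are all placeholders, and accept_data_loss is exactly what it sounds like:

```sql
-- Hedged sketch: force CrateDB to promote a (possibly stale) replica shard to
-- primary. Shard id 0 and node name 'N2' are made-up values.
ALTER TABLE doc.etsensor
    REROUTE PROMOTE REPLICA SHARD 0 ON 'N2'
    WITH (accept_data_loss = TRUE);
```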

@chicco785 changed the title from "WIP: active shards 1 crate" to "active shards 1 crate" on Nov 19, 2020
@github-actions (bot) commented Nov 20, 2020

CLA Assistant Lite bot: All contributors have signed the CLA ✍️

@chicco785 (Contributor, Author)

I have read the CLA Document and I hereby sign the CLA

@chicco785 (Contributor, Author)

recheckcla

…tive_shards = 1" writes are possible also when table health is not 100%

* use variable to set values for wait_for_active_shards
@amotl commented Nov 20, 2020

Dear @chicco785 and @c0c0n3,

thanks for exposing the wait_for_active_shards setting to users of ngsi-timeseries-api. May I humbly suggest updating the wording from CRATE_WAIT_ACTIV_SHARDS to CRATE_WAIT_ACTIVE_SHARDS? There's an "e" missing.

Thanks in advance!

With kind regards,
Andreas.

@chicco785 (Contributor, Author)

> thanks for exposing the wait_for_active_shards setting to users of ngsi-timeseries-api. May I humbly suggest updating the wording from CRATE_WAIT_ACTIV_SHARDS to CRATE_WAIT_ACTIVE_SHARDS? There's an "e" missing.

good catch!

@c0c0n3 (Member) commented Nov 20, 2020

@amotl while you're at it, do you reckon the scenario detailed in the above comment is possible/likely? Or is it just me not understanding how Crate actually works under the bonnet? Thanks!! :-)

@amotl commented Nov 24, 2020

Dear @c0c0n3,

thanks for asking, we appreciate it. However, I have to admit I am personally not that deep into the specifics of replication settings yet, so I'd like to humbly ask @seut or @mfussenegger to review this and clarify any ambiguities.

In general, I believe @chicco785 is right: using wait_for_active_shards = 1 instead of wait_for_active_shards = all requires only one shard copy to be active for write operations to proceed [1]. Otherwise, the write operation blocks, waiting and retrying for up to 30s before eventually timing out. CrateDB introduced that option with version 2.0.1 [2].
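
For example, the option can be set per table; the table name below is just a placeholder:

```sql
-- Only the primary shard copy has to be active for writes to proceed.
ALTER TABLE doc.etsensor SET ("write.wait_for_active_shards" = 1);

-- All shard copies (primary plus replicas) have to be active; otherwise the
-- write waits and eventually times out.
ALTER TABLE doc.etsensor SET ("write.wait_for_active_shards" = 'all');
```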

Please also note that, in order to improve the out-of-the-box experience by allowing a subset of nodes to become unavailable without blocking write operations, CrateDB uses wait_for_active_shards = 1 as the default setting beginning with CrateDB 4.1.0 [3]. This decreases write resiliency in favor of write availability and was changed through crate/crate#8876.

The problem we and our customers faced is that with the former default of wait_for_active_shards = all, restarting a single node basically blocks writes until the whole cluster has recovered again. We believed the extra safety of all is a bad trade-off here: with inserts blocking until the cluster has recovered (and eventually being dropped if that takes too long), you are actually more likely to lose data.

The procedure to restart a cluster, for example in upgrade scenarios, also ties in with the cluster.graceful_stop.min_availability setting, which defaults to primaries [4]. While we state in our documentation that

> By default, when the CrateDB process stops it simply shuts down, possibly making some shards unavailable which leads to a red cluster state and lets some queries fail that required the now unavailable shards. In order to safely shutdown a CrateDB node, the graceful stop procedure can be used.

apparently not many users are aware of that and will just restart the nodes without using the graceful stop procedure. This problem manifests even more in scenarios where CrateDB runs on Kubernetes and users might just kill pods when performing a rolling upgrade.

So, overall, this topic comes down to the balancing act of trading performance vs. safety vs. high availability.

That said, the premium option of cluster.graceful_stop.min_availability = full will always keep the database cluster in a green state, at the cost of more replica resynchronization and, in turn, more inter-node traffic. Also, when using the graceful stop procedure, a rolling upgrade takes longer because all active operations are completed before the node instance shuts down; this impact obviously grows with the amount of data that has to be shuffled around. However, this is the designated "paranoia" option and, as said, will always keep the whole cluster in a green state.
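
As a rough sketch, and with the caveat that the exact statements vary between CrateDB versions and the node name below is made up, this is roughly how the graceful stop machinery is driven:

```sql
-- Hedged sketch: tighten the graceful stop settings at runtime so a stopping
-- node only goes down once all its shards are fully available elsewhere.
SET GLOBAL PERSISTENT "cluster.graceful_stop.min_availability" = 'full';
SET GLOBAL PERSISTENT "cluster.graceful_stop.timeout" = '2h';

-- Then, instead of killing the process, decommission the node so its shards
-- are handed over first ('node-1' is a placeholder node name).
ALTER CLUSTER DECOMMISSION 'node-1';
```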

I hope @seut or @mfussenegger can elaborate on this topic a bit more and fill in some gaps I might have skipped or correct any false claims.

With kind regards,
Andreas.

[1] https://crate.io/docs/crate/reference/en/4.3/sql/statements/create-table.html#write-wait-for-active-shards
[2] https://crate.io/docs/crate/reference/en/4.3/appendices/release-notes/2.0.1.html#changes
[3] https://crate.io/docs/crate/reference/en/4.3/appendices/release-notes/4.1.0.html#others
[4] https://crate.io/docs/crate/reference/en/4.3/config/cluster.html#graceful-stop

@c0c0n3 (Member) commented Nov 25, 2020

@amotl, wow, thanks so much for the detailed answer!

wait_for_active_shards: resiliency vs throughput

> wait_for_active_shards ... trade-off ... with inserts blocking until the cluster has recovered (and eventually being dropped if that takes too long), you are actually more likely to lose data.

Yep, that's pretty much what we experienced in our prod clusters. For what it's worth, we also think it's a reasonable trade-off; see the comments above.

@seut, @mfussenegger one thing we'd like to understand though is whether wait_for_active_shards = 1 could result in a situation where Crate won't accept any further writes to a table and the only way out is to manually alter the table settings. Here's the scenario:

  1. A node N1 holds a primary shard S with records r[1] to r[m + n].
  2. Another node N2 holds S's replica shard, R, with records r[1] to r[m], i.e. n records haven't been replicated yet.
  3. N1 goes down.
  4. Crate won't promote R on N2 to primary, since it knows R is stale with respect to S.

The only way out of the impasse would be to manually force replica promotion.

Is this scenario possible, or is it just me not understanding how Crate actually works under the bonnet? If it is possible, is wait_for_active_shards = 1 going to make it more likely when you have lots of nodes, some of which crash regularly (e.g. an AWS node going down)? Also, in that case, is there any easy way to get notified of a table stuck in read-only mode?

The reason I'm asking is that we'd like to keep wait_for_active_shards = 1 in prod, but if a table gets stuck we need a way to find out about it as it happens: some of our tables are pretty busy, and even a few hours of downtime could mean losing several thousand data points coming in from our clients' devices.
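
What we have in mind is polling something along the lines of the query below from our monitoring, assuming sys.health exposes enough detail for this:

```sql
-- Hedged sketch of a monitoring probe: list tables (or partitions) whose
-- health has degraded, e.g. because a primary shard is missing.
SELECT table_schema, table_name, partition_ident, health,
       missing_shards, underreplicated_shards
FROM sys.health
WHERE health <> 'GREEN'
ORDER BY severity DESC;
```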

Graceful shutdown

> cluster.graceful_stop.min_availability ... apparently not many users are aware of that and will just restart the nodes without using the graceful stop procedure.

Count me in, I wasn't aware of that either. Thank you sooo much for bringing it up. I'd imagine this could also be a good mitigation for minimising the likelihood of a table transitioning to read-only, assuming the scenario I talked about earlier is actually possible. Either way, we'll definitely play around with this setting to make sure nodes shut down gracefully as much as is feasible within the constraints of our deployment.

> This problem manifests even more in scenarios where CrateDB runs on Kubernetes and users might just kill pods when performing a rolling upgrade.

Guilty as charged! This is exactly what we do. Your documentation is actually pretty clear about how to upgrade; it's just that we haven't come up with an easy way to automate that in a K8s deployment yet.

> premium option of cluster.graceful_stop.min_availability = full will always keep the database cluster in a green state

I wish that came for free :-) Just joking, I appreciate you guys have to make a living too :-)

@amotl commented Nov 25, 2020

Dear @c0c0n3,

> premium option of cluster.graceful_stop.min_availability = full will always keep the database cluster in a green state

> I wish that came for free :-) Just joking, I appreciate you guys have to make a living too :-)

Sorry about my wording here. I meant it to be called "the safest option". This feature is provided by the community edition as well, AFAIK. Thanks for giving me the chance to clarify.

With kind regards,
Andreas.

@c0c0n3 (Member) commented Nov 25, 2020

@amotl

> This feature is provided by the community edition as well, AFAIK

That's great news, thanks for clarifying, we'll definitely look into that!

@chicco785 merged commit 970898e into master on Dec 1, 2020
@chicco785 deleted the active-shards-1-crate branch on December 1, 2020 at 22:53
@github-actions bot locked and limited the conversation to collaborators on Dec 1, 2020