
Gossip defective remoteGeneration (https://issues.apache.org/jira/browse/CASSANDRA-10969) #5164

Closed
tomer-sandler opened this issue Oct 10, 2019 · 1 comment

@tomer-sandler (Contributor) commented Oct 10, 2019

Installation details
Scylla version (or git commit hash): 2.0.2-0.20171130.07b039f-0ubuntu1~trusty
Cluster size: N/A
OS (RHEL/CentOS/Ubuntu/AWS AMI): Ubuntu 14.04

Description from Customer:
Last weekend we identified a node down in our older activities cluster. When we attempted to start the node, it failed with log spam such as:

Sep 28 19:12:24 scylla-activities-prd-1-1 scylla:  [shard 0] gossip - received an invalid gossip generation for peer 10.10.1.38; local generation = 1529611572, received generation = 1569697739

This seems related to this bug report against Cassandra: https://issues.apache.org/jira/browse/CASSANDRA-10969.
A full cluster restart was required to work around it.

I know that this cluster is on an older version of Scylla (2.0.2-0.20171130.07b039f-0ubuntu1~trusty), but I wanted to verify this issue is fixed in recent versions of Scylla, because it would be pretty bad to have to restart some of our more critical clusters if we ran into this bug.

More details:
The issue is that it's rejecting the gossip generation of the restarted node - this generally happens before Cassandra (or in this case Scylla) even begins to discuss the relative schemas.
Anyway, I'm fairly confident Scylla is still affected by the issue, comparing the code in Scylla OSS master vs the patch-set to Cassandra:
https://github.com/scylladb/scylla/blob/master/gms/gossiper.cc#L489
jkni/cassandra@8ebb314
You can see that it still has the defective remoteGeneration > localGeneration + MAX_GENERATION_DIFFERENCE check, which was since changed to remoteGeneration > localTime + MAX_GENERATION_DIFFERENCE. That said, it makes sense that this issue would persist: if the cluster is untouched for a year (nodes don't join or leave, and the schema isn't changed), the gossip generation never really needs to be updated (from my understanding of this mechanism).
So the node leaves the cluster, the remote generation is updated to the current timestamp, the node tries to rejoin the cluster, and the join is rejected because the remote generation leaped too far forward. The patch to C* simply asks whether the remote generation is within some sane range of the local time: "is the remote generation some point in the very distant future, such that a bit could have flipped?"
Unless Scylla resets its local gossip generation to the current time when a node reboots - but I don't actually think that's a sane thing to do, so I wouldn't expect Scylla to do that.
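For illustration, here is a minimal C++ sketch of the two checks being contrasted; the identifiers and the constant's value are assumptions for the example, not the actual Scylla code linked above.

```cpp
#include <chrono>
#include <cstdint>

// Assumed to be roughly one year in seconds, mirroring Cassandra's
// MAX_GENERATION_DIFFERENCE; the real constant lives in the gossip code.
constexpr int64_t MAX_GENERATION_DIFFERENCE = 86400LL * 365;

// Current wall-clock time in seconds; a freshly restarted node also derives
// its new gossip generation from this value.
static int64_t now_seconds() {
    using namespace std::chrono;
    return duration_cast<seconds>(system_clock::now().time_since_epoch()).count();
}

// Defective check: drift is measured against the *stored* local generation,
// which never advances while the cluster sits idle.
bool reject_update_defective(int64_t remote_generation, int64_t local_generation) {
    return remote_generation > local_generation + MAX_GENERATION_DIFFERENCE;
}

// CASSANDRA-10969 check: drift is measured against the generation this node
// would pick if it restarted right now, i.e. the current time.
bool reject_update_fixed(int64_t remote_generation) {
    return remote_generation > now_seconds() + MAX_GENERATION_DIFFERENCE;
}
```

With the defective form, any peer whose stored generation is more than a year old will reject a freshly restarted node, since the restarted node's time-based generation is necessarily more than MAX_GENERATION_DIFFERENCE ahead of that stale value.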

@asias confirmed we do not have the fix for https://issues.apache.org/jira/browse/CASSANDRA-10969 and asked for this ticket to be opened.

asias added a commit to asias/scylla that referenced this issue Oct 21, 2019
Assume n1 and n2 are in a cluster with generation numbers g1 and g2. The
cluster runs for more than 1 year (MAX_GENERATION_DIFFERENCE). When n1
reboots with a time-based generation g1', n2 will see
g1' > g2 + MAX_GENERATION_DIFFERENCE and reject n1's gossip update.

To fix, check the generation drift against the generation value this node
would get if it were restarted.

This is a backport of CASSANDRA-10969.

Fixes scylladb#5164
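Plugging the generations from the log line in this report into both checks shows the rejection; this is a hedged sketch, with MAX_GENERATION_DIFFERENCE assumed to be one year in seconds and the peer's local time an assumed value close to the received generation.

```cpp
#include <cstdint>
#include <iostream>

int main() {
    constexpr int64_t max_generation_difference = 86400LL * 365;  // ~31,536,000 s, assumed
    constexpr int64_t local_generation    = 1529611572;  // peer's stored generation (June 2018)
    constexpr int64_t received_generation = 1569697739;  // restarted node's generation (Sept 2019)

    // Old check: drift measured against the stale stored generation.
    // 40,086,167 s > 31,536,000 s, so the update is rejected.
    std::cout << "old check rejects: " << std::boolalpha
              << (received_generation > local_generation + max_generation_difference) << '\n';

    // Fixed check: drift measured against the peer's current time (assumed here
    // to be a few seconds after the received generation). A sane restarted node
    // is never a year ahead of the wall clock, so the update is accepted.
    constexpr int64_t local_time = 1569697800;
    std::cout << "fixed check rejects: "
              << (received_generation > local_time + max_generation_difference) << '\n';
}
```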
avikivity pushed a commit that referenced this issue Nov 20, 2019

(cherry picked from commit 0a52ecb)
avikivity pushed a commit that referenced this issue Nov 20, 2019

(cherry picked from commit 0a52ecb)
@avikivity (Member) commented Nov 20, 2019

Backported to 3.0, 3.1; 3.2 already contains the commit.

asias added a commit to asias/scylla that referenced this issue Mar 26, 2020
Consider 3 nodes in the cluster, n1, n2, n3, with gossip generation
numbers g1, g2, g3.

n1, n2, n3 are running a Scylla version without commit
0a52ecb (gossip: Fix max generation
drift measure).

One year later, the user wants to upgrade n1, n2, n3 to a new version.

When n3 does a rolling restart with the new version, n3 will use a
time-based generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE
and g3' - g1 > MAX_GENERATION_DIFFERENCE, n1 and n2 will reject n3's
gossip update and mark n3 as down.

Such unnecessary marking of nodes as down can cause availability issues.
For example:

DC1: n1, n2
DC2: n3, n4

When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which
causes the whole DC2 to be unavailable.

To fix, we can allow the restarted node to start with a gossip generation
within MAX_GENERATION_DIFFERENCE of the existing nodes' generations.

Once all the nodes run the version with commit 0a52ecb, the option is no
longer needed.

Fixes scylladb#5164
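A minimal sketch of the workaround described above, assuming the operator can supply a startup generation; the helper names and values are illustrative, not the actual option added to config.hh.

```cpp
#include <cstdint>
#include <iostream>

constexpr int64_t MAX_GENERATION_DIFFERENCE = 86400LL * 365;  // assumed: one year in seconds

// The check applied by peers that lack commit 0a52ecb: they compare the
// incoming generation against their own *stored* generation.
bool old_peer_rejects(int64_t remote_generation, int64_t peer_generation) {
    return remote_generation > peer_generation + MAX_GENERATION_DIFFERENCE;
}

// On restart into the new version, use the operator-forced generation if one
// was configured; otherwise fall back to the usual time-based generation.
int64_t choose_startup_generation(int64_t now_seconds, int64_t forced_generation /* 0 = unset */) {
    return forced_generation > 0 ? forced_generation : now_seconds;
}

int main() {
    const int64_t g1  = 1529611572;   // peers' stored generations, set over a year ago
    const int64_t now = 1569697739;   // n3's wall clock at restart

    const int64_t default_gen = choose_startup_generation(now, 0);
    const int64_t forced_gen  = choose_startup_generation(now, g1 + 1);

    std::cout << std::boolalpha
              << "default (time-based) generation rejected: " << old_peer_rejects(default_gen, g1) << '\n'
              << "forced generation rejected: "               << old_peer_rejects(forced_gen, g1)  << '\n';
}
```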
tgrabiec pushed a commit that referenced this issue Mar 27, 2020
tgrabiec pushed a commit that referenced this issue Mar 27, 2020

(cherry picked from commit 743b529)
tgrabiec pushed a commit that referenced this issue Mar 27, 2020

(cherry picked from commit 743b529)
tgrabiec pushed a commit that referenced this issue Mar 27, 2020

(cherry picked from commit 743b529)

[tgrabiec: resolved major conflicts in config.hh]
tgrabiec pushed a commit that referenced this issue Mar 30, 2020

(cherry picked from commit 743b529)
avikivity pushed a commit to avikivity/scylla that referenced this issue Apr 28, 2020

(cherry picked from commit 743b529)

[tgrabiec: resolved major conflicts in config.hh]

(cherry picked from commit 9b46b9f)