
Allow cluster health check to be optional #35

Closed
lukaszgielecinski opened this issue Aug 23, 2016 · 6 comments

@lukaszgielecinski

At the moment cqlmigrate requires that all nodes in all datacentres are up before it will run. We would like to make this check optional so that it still runs when a node is down and the migration is attempted with the supplied consistency levels, which we plan to set to LOCAL_QUORUM for reads and EACH_QUORUM for writes for our migrations.
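
For context, here is a minimal sketch of the consistency levels we have in mind, using the DataStax Java driver 3.x directly rather than cqlmigrate's own configuration API; the contact point, keyspace and statement are placeholders:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

// Default consistency for reads: LOCAL_QUORUM.
Cluster cluster = Cluster.builder()
        .addContactPoint("10.0.0.1")   // placeholder contact point
        .withQueryOptions(new QueryOptions()
                .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
        .build();
Session session = cluster.connect("my_keyspace"); // placeholder keyspace

// Migration writes overridden to EACH_QUORUM on a per-statement basis.
Statement write = new SimpleStatement(
        "INSERT INTO example_table (id, value) VALUES (1, 'migrated')") // placeholder statement
        .setConsistencyLevel(ConsistencyLevel.EACH_QUORUM);
session.execute(write);

cluster.close();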

@adamdougal
Contributor

Can this now be closed? Did #37 fix this?

@sebbonnet

sebbonnet commented Oct 27, 2016

I don't think the fix in #37 addresses this issue when schema upgrades are required.

When a node is down, schema upgrades can still be performed, although Cassandra will issue a warning stating that a "schema version mismatch was detected". When the node comes back up, the schema upgrade is applied to it and the schema version is synced.
Here is the message I got when updating the schema with one node down and a replication factor of 3 in a 4-node cluster. I wonder if doing the same via the CQL driver would throw a specific exception for this.

cirrus_admin@cqlsh:platform_tests> alter table mycqlmigratetable add colc text;

Warning: schema version mismatch detected, which might be caused by DOWN nodes; if this is not the case, check the schema versions of your nodes in system.local and system.peers.
OperationTimedOut: errors={}, last_host=10.50.98.145
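
Presumably the Java driver would surface this as a client-side timeout too. A minimal sketch (driver 3.x, unverified against this exact scenario, placeholder contact point), catching OperationTimedOutException or NoHostAvailableException:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.NoHostAvailableException;
import com.datastax.driver.core.exceptions.OperationTimedOutException;

Cluster cluster = Cluster.builder()
        .addContactPoint("10.50.98.145")   // placeholder contact point
        .build();
try {
    Session session = cluster.connect("platform_tests");
    session.execute("ALTER TABLE mycqlmigratetable ADD colc text");
} catch (OperationTimedOutException | NoHostAvailableException e) {
    // Client-side timeout or no host able to answer; the ALTER may still have been
    // applied by the reachable nodes, as the nodetool output below shows.
    System.err.println("Schema change did not complete cleanly: " + e.getMessage());
} finally {
    cluster.close();
}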

New schema version applied to the 3 nodes that are up:

ubuntu@ip-10-50-99-79:~$ nodetool describecluster
Cluster Information:
    Name: sandbox
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        1ef20cc1-bed4-32d3-a323-36b4b0a99e3e: [10.50.100.67, 10.50.98.145, 10.50.99.79]

        UNREACHABLE: [10.50.102.20]

Schema version synced after restarting the down node:

ubuntu@ip-10-50-99-79:~$ nodetool describecluster
Cluster Information:
    Name: sandbox
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        1ef20cc1-bed4-32d3-a323-36b4b0a99e3e: [10.50.100.67, 10.50.98.145, 10.50.102.20, 10.50.99.79]
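
For what it's worth, the driver also exposes a programmatic equivalent of this nodetool check; a small sketch (Java driver 3.x, placeholder contact point):

import com.datastax.driver.core.Cluster;

Cluster cluster = Cluster.builder()
        .addContactPoint("10.50.99.79")   // placeholder contact point
        .build();
// True only when every reachable node reports the same schema version,
// i.e. the single-UUID case in the `nodetool describecluster` output above.
boolean inAgreement = cluster.getMetadata().checkSchemaAgreement();
System.out.println("Schema in agreement: " + inAgreement);
cluster.close();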

@adamdougal
Contributor

@sebbonnet - in a previous comment @chbatey said "I am hesitant to allow schema changes when nodes are down. It really isn't a good idea and it'll allow people to shoot themselves in the foot."

PR #37 means we now don't check cluster health if there are no schema changes, which is a definite improvement.

Unless we disagree with @chbatey's comment, presumably any further changes towards this issue are inadvisable? A rough illustration of the behaviour we now have follows below.
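
To make sure we're talking about the same behaviour, here is a rough illustration of what #37 gives us. This is not cqlmigrate's actual code; pendingMigrations, allNodesUp and applyMigrations are hypothetical helpers used only to show the flow:

// Hypothetical helpers, for illustration of the flow after #37 only.
List<String> pending = pendingMigrations(session, migrationsDirectory);
if (pending.isEmpty()) {
    // Nothing to apply, so the cluster health check is skipped entirely.
    return;
}
// Schema changes are needed: insist on a fully healthy cluster before applying them.
if (!allNodesUp(cluster)) {
    throw new IllegalStateException("Cluster unhealthy: refusing to apply schema changes");
}
applyMigrations(session, pending);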

@sebbonnet

@adamdougal sounds like we need to discuss this further with @chbatey and find out his reasons for being "hesitant". I searched the web but could not find anything that would prove or disprove whether it's OK, so I've asked the question on the IRC channel. Granted, the test I've done above is not realistic, so I'd like to try others on a cluster that's under load and bring up a node that is on a previous schema version.
#37 is definitely an improvement as it means most app deployments/restarts will not be impacted by a node being down.

@lukaszgielecinski
Author

From our perspective it can be closed; we approached the problem with a different solution.

@sebbonnet

I've asked the question a couple of times on IRC, but had no response... I've discussed this with @chbatey, who said there used to be quite a few bugs, reported and unreported, in this area where node schemas would not be in agreement, so if we were to try this we would need to make sure nodes can recover from schema disagreement too.
So doing a schema upgrade with a node down is probably not worth the risk at this stage, especially given that the majority of deployments will not be impacted following #37.
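
If we ever did go down that route, here is a minimal sketch of the kind of safeguard I'd want first: waiting for schema agreement via the Java driver before and after each change. The class name and timeout parameters are illustrative, not cqlmigrate settings:

import com.datastax.driver.core.Cluster;

final class SchemaAgreementWaiter {
    // Poll until all reachable nodes report the same schema version, or the deadline passes.
    static boolean waitForSchemaAgreement(Cluster cluster, long maxWaitMillis, long pollIntervalMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        while (System.currentTimeMillis() < deadline) {
            if (cluster.getMetadata().checkSchemaAgreement()) {
                return true;
            }
            Thread.sleep(pollIntervalMillis);
        }
        return false;
    }
}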
