Allow multiple nodes to cooperate #159

Open
6 tasks
nsoft opened this issue Jul 15, 2020 · 1 comment
@nsoft
Owner

nsoft commented Jul 15, 2020

This is a placeholder/parent ticket for the key feature of 2.0 when we get there. This will generally include:

  • Cluster formation so that nodes can access a unified Cassandra cluster.
  • A means to pass documents among nodes (JavaSpace, Cassandra or otherwise)
  • A means for newly started nodes to detect existing nodes and join
  • A means for nodes to leave gracefully
  • Handling of ungraceful node loss
  • Loading and unloading of Plans without stopping the cluster.
@nsoft nsoft added this to the 2.0 milestone Jul 15, 2020
@nsoft
Owner Author

nsoft commented Jul 15, 2020

Creating this ticket so I can note one difficulty we will face with Cassandra in the first of those items (cluster formation). Here's a conversation from the ASF Cassandra Slack:

Ztyx 8:11 AM
Hello! We have an application that executed a CREATE TABLE IF NOT EXIST ... on boot. A couple of months ago we hit a node schema disagreement (and the table already existed) and our suspicion was that it had to do with that query. Anyone else hit this?

Jeff Jirsa 8:22 AM
Strictly not safe in current versions of cassandra to have multiple processes execute that command at the same time
8:23
It is, unfortunately, something that’s known, poorly documented, and has horrible horrible side effects, including potential data loss months later when you restart the instance
8:24
@ztyx if you must have the app make tables, use external locking - like zookeeper or something

gus 8:49 AM
@jeff Jirsa is this only a problem when the table didn't exist and 2 start up or is there a potential problem regardless of whether the table exists?
8:54
Is this it: https://issues.apache.org/jira/browse/CASSANDRA-15844 ?

ASF JIRA BridgeAPP 8:54 AM
CASSANDRA-15844: Create table Asynchronously or creating table contact the same node from many client threads at same time may causing data loss

Jeff Jirsa 9:32 AM
The failure modes I know about involve diverging cfid so id expect it to be mostly around create
9:33
Wouldn’t be surprised if alter statements also cause problems, but it’d be like migration task storms and GC pressure not data loss
9:34
15844 describes one shape of what I mentioned is possible yes
9:35
The race can result in like a dozen different states (different permutations of the race). One involves the cfid in schema table not matching the cfid in the table path on disk, that’s the one where If you bounce you end up losing that data because cassandra makes the “right” empty data directory on startup
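
For reference, the "external locking" suggestion above could look roughly like the sketch below. This is only an illustration under assumptions, not a design decision: `GuardedSchemaSetup`, `SchemaClient` and `FakeSchemaClient` are hypothetical names, `SchemaClient` stands in for whatever ends up talking to Cassandra (it is not a driver API), and the `Lock` passed in would have to be a cross-process lock (ZooKeeper-backed or similar) for this to help across nodes. A plain in-JVM lock only serializes threads in one process.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;

/** Hypothetical helper: serializes schema creation behind an externally supplied lock. */
final class GuardedSchemaSetup {

    /** Stand-in for the code that talks to Cassandra; NOT a real driver API. */
    interface SchemaClient {
        boolean tableExists(String table);
        void createTable(String table);
    }

    // Assumption: in a real cluster this is a cross-process lock, e.g. ZooKeeper-backed.
    private final Lock externalLock;
    private final SchemaClient client;

    GuardedSchemaSetup(Lock externalLock, SchemaClient client) {
        this.externalLock = externalLock;
        this.client = client;
    }

    /** Returns true if this caller issued the CREATE, false if the table already existed. */
    boolean ensureTable(String table) {
        externalLock.lock();
        try {
            // Check while holding the lock so only one holder ever issues the CREATE.
            if (client.tableExists(table)) {
                return false;
            }
            client.createTable(table);
            // A real implementation would probably also wait for schema agreement
            // before releasing the lock; that step is omitted from this sketch.
            return true;
        } finally {
            externalLock.unlock();
        }
    }
}

/** In-memory fake, used only to exercise the sketch. */
final class FakeSchemaClient implements GuardedSchemaSetup.SchemaClient {
    final Set<String> tables = ConcurrentHashMap.newKeySet();
    final AtomicInteger creates = new AtomicInteger();

    @Override
    public boolean tableExists(String table) {
        return tables.contains(table);
    }

    @Override
    public void createTable(String table) {
        creates.incrementAndGet();
        tables.add(table);
    }
}
```

The point of checking existence inside the lock is that a second node never sends the CREATE statement at all, rather than relying on IF NOT EXISTS to make concurrent creates safe.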
