Allow relaxing isolation guarantees on per-table basis #2379
Open
tgrabiec opened this issue May 12, 2017 · 16 comments


@tgrabiec
Contributor

Currently we aim at providing snapshot isolation for partition reads.

For very large partitions maintaining this is challenging and may impact performance and stability. See for example #1938.

There may be many workloads which have large partitions but do not need per-partition read isolation. Those would benefit from relaxed guarantees.

We could allow relaxing this to row-level isolation using a table property recorded in the schema.
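
A minimal sketch of what such a per-table knob could look like. The property name `read_isolation` and its value are purely hypothetical, chosen only to illustrate the proposal; no such option exists today, and the table definition is made up:

```cql
-- Hypothetical sketch only: 'read_isolation' is not an existing table
-- property; the name and value are placeholders for the proposed knob.
CREATE TABLE ks.events (
    pk    int,
    ck    int,
    value text,
    PRIMARY KEY (pk, ck)
) WITH read_isolation = 'row';   -- relax partition-level snapshot isolation
                                 -- to row-level isolation for reads
```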

@duarten
Contributor

duarten commented May 12, 2017

Is there even row isolation? If two concurrent updates have the same timestamp, the values picked for the row cells could be a mix from the two updates.
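
A minimal sketch of that case, assuming an illustrative table `ks.t (pk int PRIMARY KEY, a int, b int)`: with equal timestamps, conflicts are resolved per cell (live cells are compared by value), so the surviving row can combine cells from both updates:

```cql
-- Two clients write the same row with the same explicit timestamp.
UPDATE ks.t USING TIMESTAMP 1494547200000000 SET a = 1, b = 2 WHERE pk = 0;
UPDATE ks.t USING TIMESTAMP 1494547200000000 SET a = 2, b = 1 WHERE pk = 0;

-- Per-cell tie-breaking can yield a = 2 (from the second update) and
-- b = 2 (from the first): a row neither client actually wrote.
SELECT a, b FROM ks.t WHERE pk = 0;
```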

@tgrabiec
Contributor Author

@duarten That's one of the cases which may violate this. There is also the case of repair breaking isolation: http://stackoverflow.com/a/41683109/246755

We inherited this from Cassandra. It's been discussed, and I think the conclusion was that we're not confident enough to give up on this entirely, so we try to keep the isolation for cases when those known issues are not encountered.

@duarten
Contributor

duarten commented May 12, 2017

Maybe we should revisit the discussion. I mean, we can't provide row isolation in the presence of concurrent updates, which is the exact scenario where users would potentially rely on the feature.

@tgrabiec
Contributor Author

Note that the problem of providing read isolation exists even without concurrent updates to the same row: it also shows up when a single update touches multiple rows, e.g. via a batch. The reader could execute partly before the update and partly after it, and could see a partial update if it pauses between the updated rows.
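
For example (an illustrative sketch; table and column names are made up), a single-partition batch touching two clustering rows can be observed half-applied by a reader that pauses between them:

```cql
-- Writer: updates two rows of partition pk = 0 in one batch.
BEGIN BATCH
    UPDATE ks.t SET value = 'new' WHERE pk = 0 AND ck = 1;
    UPDATE ks.t SET value = 'new' WHERE pk = 0 AND ck = 2;
APPLY BATCH;

-- Reader: if it yields between ck = 1 and ck = 2, it may return the old
-- value for one row and the new value for the other.
SELECT ck, value FROM ks.t WHERE pk = 0;
```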

As for the problem of timestamp aliasing, not every workload with concurrent updates can trigger it. If the conflicting updates are relayed through a single client using client-side timestamps, the conflict wouldn't happen. There could be a failover to another client after a timeout, but that timeout would offset the timestamps far enough into the future to avoid a collision.

@duarten
Contributor

duarten commented May 12, 2017

Dirty reads can only be solved if dirty writes are too (if a write operation runs concurrently with a logged batch statement, then it may well conflict with a row and cause a dirty write).

I also wonder what happens to batches in case of failures. If they are partially applied locally (let's say two statements update different tables), then when the node recovers it may allow dirty reads.
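
For instance (a sketch with made-up table names), a logged batch spanning two tables that is only partially applied locally before a failure could expose one table's update but not the other's until the batch is replayed:

```cql
BEGIN BATCH
    UPDATE ks.accounts SET balance = 90      WHERE id = 1;
    UPDATE ks.audit    SET last_op = 'debit' WHERE id = 1;
APPLY BATCH;
-- If the node fails after applying only the first statement locally,
-- readers may temporarily see the accounts update without the audit one.
```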

Here it says:

Although an atomic batch guarantees that if any part of the batch succeeds, all of it will, no other transactional enforcement is done at the batch level. For example, there is no batch isolation. Clients are able to read the first updated rows from the batch, while other rows are still being updated on the server. However, transactional row updates within a partition key are isolated: clients cannot read a partial update.

I really wonder if we should strive to ensure isolation when it can be broken by something as fundamental as read repair, like you pointed out. The guidance for these scenarios should be to use LWT.

Finally, note you don't even need concurrent updates: a write can complete with a timestamp in the past such that it conflicts with an already completed one.

@avikivity
Member

My lawyers inform me that we can drop the guarantee for any query that has paging enabled.

@tgrabiec
Contributor Author

@avikivity Why is that? Isolation doesn't have to hold across pages, but don't we have to ensure that each page sees a snapshot of the partition?

@avikivity
Member

It's legal for a page to be one row long, regardless of the page size requested. In fact, we will return single-row pages if the rows are large.

@tgrabiec
Contributor Author

So your point here is that since the consumer doesn't know how large the page will be, he can't make any sensible use of the guarantee? There may be something to it.

@avikivity
Member

Right. According to the lawyers, it's still possible that the consumer is using their own driver, or speaking the CQL binary protocol directly, and thus treats pages specially. But realistically the pages are hidden from the consumer by the driver; to them it's one long response. Each page boundary breaks isolation, but the page breaks themselves are hidden.
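
A cqlsh sketch of the same point (the PAGING command is cqlsh-specific; table and column names are illustrative): with paging on, every page is a separate read, so isolation only holds within a page, and the server may make pages as small as a single row:

```cql
-- In cqlsh, request one row per page (PAGING is a cqlsh command).
PAGING 1

-- Each page (here, each single row) is a separate read and may reflect
-- a different point in time.
SELECT ck, value FROM ks.t WHERE pk = 0;
```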

@slivne
Contributor

slivne commented May 13, 2017 via email

@gleb-cloudius
Contributor

gleb-cloudius commented May 16, 2017 via email

@avikivity
Member

https://issues.apache.org/jira/browse/CASSANDRA-10701

Changes the logged batch guarantees from "atomic" to "eventually". With this, there's no point in providing snapshot isolation.

@avikivity
Member

But: it still has

# All updates in a @BATCH@ belonging to a given partition key are performed in isolation.

@tgrabiec
Contributor Author

@avikivity I think this ticket only tries to reduce confusion caused by the docs by avoiding use of the word "atomic", which has an overloaded meaning in the context of "atomic batch", without changing any guarantees.

This [1] old blog post already tried to clarify that:

Note that we mean "atomic" in the database sense that if any part of the batch succeeds, all of it will. No other guarantees are implied; in particular, there is no isolation; other clients will be able to read the first updated rows from the batch, while others are in progress. However, updates within a single row are isolated.)

[1] http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2

@avikivity
Member
