
Make table.filter.update atomic #3992

Open
AtnNn opened this issue Mar 30, 2015 · 27 comments

@AtnNn
Member

AtnNn commented Mar 30, 2015

Queries such as r.table(t).filter(f).update(u) seem atomic because there is no error when the non_atomic flag is not set. However, it is possible for u to be applied when f doesn't hold (see rethinkdb/docs#679 (comment)).

A possible work-around is r.table(t).filter(f).update((row) -> r.branch(f(row), u(row), {}))

This is a proposal to add server support for making entire queries or subqueries atomic: either guarantee to the user that queries like this one are atomic, or let the user specify that they should be, without needing to duplicate any logic in the update function.
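A toy sketch of the race (plain JavaScript with an in-memory array, not server code; the two-pass model and all names here are illustrative, not how the server is actually implemented) and of how the branch work-around re-checks the predicate inside the atomic per-document update:

```javascript
const f = (row) => row.x === 1;          // filter predicate
const u = (row) => ({ x: row.x + 10 }); // update function

// filter(f).update(u): the filter pass and the update pass are separate,
// so a concurrent write can land in between.
function filterThenUpdate(docs, interleavedWrite) {
  const matched = docs.filter(f);   // pass 1: evaluate f
  interleavedWrite();               // concurrent write invalidates f
  for (const row of matched) Object.assign(row, u(row)); // pass 2: apply u anyway
}

// Work-around: re-evaluate f inside the (atomic) per-document update,
// mirroring update((row) -> r.branch(f(row), u(row), {})).
function filterThenBranchUpdate(docs, interleavedWrite) {
  const matched = docs.filter(f);
  interleavedWrite();
  for (const row of matched) Object.assign(row, f(row) ? u(row) : {});
}

const a = [{ id: 1, x: 1 }];
filterThenUpdate(a, () => { a[0].x = 2; });
console.log(a[0].x); // 12 — u applied even though f no longer held

const b = [{ id: 1, x: 1 }];
filterThenBranchUpdate(b, () => { b[0].x = 2; });
console.log(b[0].x); // 2 — u skipped because f was re-checked
```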

@mlucy mlucy added this to the 2.1 milestone Mar 30, 2015
@mlucy
Member

mlucy commented Mar 30, 2015

I think we should rewrite r.table(t).filter(f).update(u) to r.table(t).update(r.branch(f, u, nil)). It will be faster and will provide an atomicity guarantee.

@mlucy
Member

mlucy commented Mar 30, 2015

Er, ignore that comment, I forgot for a second that we don't have range writes even for whole tables. I think we should rewrite it to r.table(t).filter(f).update(r.branch(f, u, nil)), which will be slightly slower than r.table(t).filter(f).update(u) but will provide the atomicity guarantee.

@AtnNn
Member Author

AtnNn commented Mar 31, 2015

@mlucy could we also provide or do we already have an atomicity guarantee for getAll.update, between.update, has_fields.update, get_intersecting.update, filter.replace, filter.delete, other similar commands, and combinations thereof?

@danielmewes danielmewes modified the milestones: 2.1-polish, 2.1 Mar 31, 2015
@mlucy
Member

mlucy commented Mar 31, 2015

has_fields is rewritten to a filter, get_all is rewritten to a between, and replace/update/delete are the same operation underneath. So we'd have to implement it twice: once for ranges and once for predicates. I'm not sure about get_intersecting; that might be a third implementation.

@danielmewes danielmewes modified the milestones: 2.1-polish, subsequent Apr 20, 2015
@mlucy mlucy modified the milestones: 2.1, subsequent May 20, 2015
@mlucy
Member

mlucy commented May 20, 2015

Moving this to 2.1 after talking with @coffeemug about it. He pointed out that this could be perceived as "compare and set" failing, which would be bad to still have in the upcoming release. (@danielmewes, does that seem fine?)

@timmaxw
Member

timmaxw commented May 20, 2015

One disadvantage of this proposal is that it's less obvious which operations are atomic. Right now, there's a simple rule: .update() and .replace() without non_atomic=True are atomic, and nothing else is.

@coffeemug
Contributor

One disadvantage of this proposal is that it's less obvious which operations are atomic.

I think the opposite is true. If I run r.table('foo').filter({x: 1}).update({x: r.row('x').add(1)}), RethinkDB doesn't give me an error. So according to our rule, I assume the query is executed atomically (which it isn't!). So we should either error and force the user to decide if they want to pass non_atomic=True, or change the implementation to actually operate atomically.

I think the latter option (changing the implementation in these cases) is much better, because if we error there will be a huge class of queries that will break (and passing non_atomic everywhere or rewriting them will be very annoying for users).

@danielmewes
Member

I think this proposal is still a bit half baked.

If we make filter.update atomic on a per-document basis, we should do the same for getAll.update, and getIntersecting.update.

Unfortunately, we cannot do this for all commands that return a selection, since for some of them, orderBy.limit for example, determining whether a document matches the predicate depends on other documents.

The filter case is by far the most common, but I agree with @timmaxw that if we special case only filter, that will make our guarantees more confusing.
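The orderBy.limit point above can be sketched in plain JavaScript (a toy in-memory model, not server code; the documents and scores are made up): the selection is defined over the whole set, so there is no per-row predicate that could be re-checked inside an atomic per-document update.

```javascript
// Whether a document is in the "top n" depends on every other document,
// so there is no per-row predicate f(row) to re-evaluate per document.
const docs = [{ id: 1, score: 5 }, { id: 2, score: 9 }, { id: 3, score: 7 }];

// "orderBy('score').limit(2)" as a set-level selection:
function topN(rows, n) {
  return [...rows].sort((a, b) => b.score - a.score).slice(0, n);
}

const before = topN(docs, 2).map((d) => d.id);
docs[0].score = 8;                 // change a *different* document...
const after = topN(docs, 2).map((d) => d.id);
console.log(before, after); // [ 2, 3 ] [ 2, 1 ] — id 3 fell out of the selection
```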

@danielmewes
Member

So basically I think we should have a pretty clear rule when we provide this guarantee and when we do not.

An alternative might be to document the fact that filter doesn't guarantee atomicity in the documentation.

@danielmewes
Member

Maybe we could add a new atomic_selection ReQL type?

We could even error when applying update to a non-atomic selection unless a special flag is set, though that might be annoying.

@danielmewes
Member

Another option would be to not do this rewrite, but to instead generate an error whenever a non-delete write is applied to a selection other than a singleSelection.
We would require the non_atomic flag (or a similar one) for such operations. Again, I think this could become quite annoying though.

(Edit: @coffeemug had already mentioned this, and I agree that it's not a great solution.)

@timmaxw
Member

timmaxw commented May 20, 2015

If I run r.table('foo').filter({x: 1}).update({x: r.row('x').add(1)}), RethinkDB doesn't give me an error. So according to our rule, I assume the query is executed atomically (which it isn't!)

The query isn't executed atomically; the update is executed atomically. The conjunction of the update and filter is not executed atomically. I think that if we find the right way to phrase this, we could make it super simple for users to understand. Maybe we could say "the only thing that's executed atomically is the function you pass to a .update() call", or something.

The problem with the alternative is that there isn't a clear rule. Suppose we make .filter().update(), .getAll().update(), and .getIntersecting().update() atomic. So what about .orderBy().limit().update() or .nth().update()? Or .filter(r.row('x').eq(subquery)).update()? It feels like if we add special support to make .filter().update() and similar queries atomic, we're just putting off the problem.

@coffeemug
Contributor

Ok, thinking about this a bit more, this looks pretty subtle to sneak into 2.1. I wouldn't be opposed to moving it into subsequent (and in fact, think we should do that), and fixing it later after we think it through more carefully. We should probably document this better -- I think that can go a long way.

@danielmewes
Member

👍 on documenting this better and coming up with a consistent solution later. I'll open separate documentation issues tomorrow.

@danielmewes danielmewes modified the milestones: 2.2, 2.1 May 20, 2015
@mlucy
Member

mlucy commented May 20, 2015

I would rather we do this in 2.1 if we could. I think the current behavior is both bad and unintuitive.

IMO we should make it so every term that can easily be made atomic is. Even if we leave the docs the way they are so that the official position is that .filter.update isn't guaranteed to be atomic, it will still eliminate 99% of cases where people don't realize something's non-atomic and screw themselves as a result. This is doubly true because writing .filter.update doesn't produce an error; it's just silently non-atomic and then your app breaks in production.

I think it's OK to officially not guarantee something, but to make the database as safe as possible by default for people who don't think to read the official guarantees. (Think e.g. how GCC turns off strict aliasing by default.)

I'd be happy to put off arguing about whether or not to guarantee certain things are atomic, and whether or not to introduce an atomic selection type, until 2.2, but I really don't see a reason not to make the default behavior safe for all selections where it's easy.

@coffeemug
Contributor

I really don't see a reason not to make the default behavior safe for all selections where it's easy.

👍 for fixing places where it's easy for now, and adding atomic_selection (or something similar) once we have the time to properly debate it. (Also, if/when we do add atomic_selection I'd make update error on selections that aren't atomic, unless the user passes non_atomic=True)

@danielmewes
Member

I think it's OK to officially not guarantee something, but to make the database as safe as possible by default for people who don't think to read the official guarantees.

I'm not a fan of this since it seems like it will make it harder for users to find out what the actual guarantees are during their testing / evaluation phase, but if you think strongly that we should do this I will not veto it.

We should make sure that our additional guarantee isn't leaked though.
Specifically consider queries like these:

table.filter(r.table("other").get(r.row("reference"))).update({counter: r.row(counter).add(1)})

The filter in this query is not atomic, but the update (incrementing the counter) is. This is a totally legitimate and probably not uncommon query.

If we now apply the proposed rewrite rule, this query will fail because the update function is no longer atomic. That would leak the fact that we're doing something magical underneath and the error would probably be very confusing.
So we should only apply the rewrite if we can prove that the filter function is deterministic.

This, by the way, is also a general problem with the proposal. Since an update function can only be either fully deterministic or fully non-deterministic, we would need a way to disable this kind of rewriting. Otherwise a large set of legitimate queries will become impossible to run.
Consider the example query with the counter increment again: The rewrite would make the update function non-deterministic, so the user would need to pass in nonAtomic: true. However that would now also make the counter increment non-atomic, which in some applications can be dramatically worse than what the query currently does without the rewrite.
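The "leak" described above can be sketched with a toy determinism check (my own illustrative model in plain JavaScript; the term representation and the rule for what counts as deterministic are invented for the example, not the server's actual analysis): wrapping the filter predicate into the update function makes the update inherit the predicate's non-determinism, so a previously legal query starts demanding the non-atomic flag.

```javascript
// Toy rule: a term is non-deterministic if it (or any subterm) reads a
// table, as r.table("other").get(...) does.
function isDeterministic(term) {
  if (term.op === 'table') return false;
  return (term.args || []).every(isDeterministic);
}

// The server only allows atomic updates with deterministic update functions.
function updateAllowed(updateTerm, nonAtomic) {
  return nonAtomic || isDeterministic(updateTerm);
}

// The user's update function just increments a counter: deterministic.
const u = { op: 'add', args: [{ op: 'row', args: [] }] };
// The filter predicate reads another table: non-deterministic.
const f = { op: 'get', args: [{ op: 'table', args: [] }] };
// The proposed rewrite folds the predicate into the update function.
const rewritten = { op: 'branch', args: [f, u, { op: 'literal', args: [] }] };

console.log(updateAllowed(u, false));         // true  — original query runs
console.log(updateAllowed(rewritten, false)); // false — the rewrite now errors
```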

@thelinuxlich

I think that you should provide atomicity guarantees wherever possible, even if the API is not 100% consistent; just document the use cases that are atomic. Users that care about this will understand it is a work in progress. I for one would benefit a lot from this, and from documentation listing what can be atomic and what can't.

@mlucy
Member

mlucy commented May 21, 2015

I'm not a fan of this since it seems like it will make it harder for users to find out what the actual guarantees are during their testing / evaluation phase...

I usually worry about that a lot as well, but in this case I think it doesn't apply, because most people writing .filter.update won't encounter non-atomic behavior during testing unless they simulate lots of concurrent operations (which most people don't). I think that making more things atomic by default will actually reduce the number of breakages when going from testing to production.

I think we should leave questions about what to do with .filter(NON_ATOMIC).update(ATOMIC) for the later discussion and just make .filter(ATOMIC).update(ATOMIC) (and others like it) atomic for 2.1.

...but if you think strongly that we should do this I will not veto it.

I don't feel strongly enough that this is the right thing to do that I want to push ahead unilaterally. @coffeemug, what do you think?

@mlucy mlucy modified the milestones: 2.1, 2.2 May 21, 2015
@mlucy
Member

mlucy commented May 21, 2015

(Moving back to 2.1 since it looks like we might do some work on this for that release and I don't want the issue to get lost.)

@coffeemug
Contributor

I don't have a strong opinion. I can see both points of view, and they both make sense to me. I wouldn't be opposed to making a subset of our queries behave under tighter semantics under the hood for now, and I wouldn't be opposed to holding off until we can get a more robust solution in place.

Sorry, I'm not a good tie-breaker here. I suppose the fact that I don't feel strongly that we should tighten up the code now means I'm in favor of doing nothing (but I'm not opposed to tightening up the code either).

@danielmewes
Member

I still don't like even just doing .filter(ATOMIC).update(ATOMIC).

If we document this, it will become extremely dangerous for users since once they use anything that's non-atomic in the filter (currently using a geo predicate is enough), the atomicity guarantee will silently disappear.

If on the other hand we don't document it and don't tell anyone about it, I think it would be better to just explain properly which exact operations are atomic and which are not. I think we can do a lot here with improving documentation. We would mention it in the filter API documentation, in the replace and update API documentation and in the FAQ as well as wherever else we mention atomicity. Basically whenever we say "single document updates are atomic", we would instead say something along the lines of "single document updates can be made atomic by using an update function (see ... for details)".

@danielmewes danielmewes modified the milestones: 2.1-polish, 2.1 Jul 15, 2015
@mlucy mlucy modified the milestones: 2.2, 2.1-polish Aug 4, 2015
@mlucy
Member

mlucy commented Aug 4, 2015

Bumping this to 2.2 since it was apparently never settled; we should try to settle it during the next discussion period.

@danielmewes danielmewes modified the milestones: 2.2, 2.2-polish, subsequent Aug 27, 2015
@zappjones

Hi all -

I just saw a link in the consistency documentation (https://www.rethinkdb.com/docs/consistency/) pointing to this thread and was wondering if there was any new information regarding making the filter.update process atomic?

Thanks!

@mlucy
Member

mlucy commented Dec 18, 2015

@danielmewes -- I made this mistake myself just recently when answering a stackoverflow question. I really think we should try to do something better here.

@AtnNn
Member Author

AtnNn commented Feb 12, 2016

I just ran into this again. Could we change non_atomic: true to atomic: false, and then have atomic: 'document_update', which would have the current behaviour, and atomic: 'document_query', which would fail for all queries that touch anything other than the document id?

@danielmewes
Member

@AtnNn I like that proposal.
We do need to figure out what document_query means exactly though (does it only work for filter, or also for predicates such as between, getAll or getIntersecting?).

We also need to be somewhat careful in how we name and describe this, since a document_query query isn't actually atomic at all.
If RethinkDB crashes in the middle of a multi-document update for example, it will have updated some documents but not others.
If you compare the set of documents updated by such a query with the sets of documents in the table that fulfil the filter predicate at any given time, it's also possible that the updated set is not equal to any matching set that existed in the table during the query execution (it will usually be a subset of some matching set, but not always, I think).

We need to make it clear that there's still not going to be any atomicity across documents, and that the only thing it does is that it turns the filter predicate into a CAS update.
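The crash caveat above can be sketched in plain JavaScript (a toy in-memory model with an invented crash point, not server code): even if each document gets a CAS-style re-check that is atomic per document, the multi-document update as a whole is not atomic, and a crash part-way through leaves a partial result behind.

```javascript
const docs = [{ id: 1, x: 1 }, { id: 2, x: 1 }, { id: 3, x: 1 }];
const f = (row) => row.x === 1;         // filter predicate
const u = (row) => ({ x: row.x + 1 });  // update function

function casUpdate(rows, crashAfter) {
  let written = 0;
  for (const row of rows) {
    if (written === crashAfter) throw new Error('simulated server crash');
    if (f(row)) {                       // per-document re-check (atomic)
      Object.assign(row, u(row));
      written += 1;
    }
  }
}

try {
  casUpdate(docs, 2);                   // "server" dies after two writes
} catch (e) { /* crash mid-update */ }

console.log(docs.map((d) => d.x)); // [ 2, 2, 1 ] — some updated, some not
```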


7 participants