
Make table.filter.update atomic #3992

Open
AtnNn opened this issue Mar 30, 2015 · 27 comments

@AtnNn
Member

AtnNn commented Mar 30, 2015

Queries such as r.table(t).filter(f).update(u) seem atomic because there is no error when the non_atomic flag is not set. However, it is possible for u to be applied when f doesn't hold (see rethinkdb/docs#679 (comment)).

A possible work-around is r.table(t).filter(f).update((row) -> r.branch(f(row), u(row), {}))

This is a proposal to add server support for making entire queries or subqueries atomic: either guarantee to the user that queries like this one are atomic, or let the user specify that they should be, without needing to duplicate any logic in the update function.
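A toy sketch of the race (plain JavaScript with an in-memory array, not server code; the two-pass model and all names here are illustrative, not how the server is actually implemented) and of how the branch work-around re-checks the predicate inside the atomic per-document update:

```javascript
const f = (row) => row.x === 1;          // filter predicate
const u = (row) => ({ x: row.x + 10 }); // update function

// filter(f).update(u): the filter pass and the update pass are separate,
// so a concurrent write can land in between.
function filterThenUpdate(docs, interleavedWrite) {
  const matched = docs.filter(f);   // pass 1: evaluate f
  interleavedWrite();               // concurrent write invalidates f
  for (const row of matched) Object.assign(row, u(row)); // pass 2: apply u anyway
}

// Work-around: re-evaluate f inside the (atomic) per-document update,
// mirroring update((row) -> r.branch(f(row), u(row), {})).
function filterThenBranchUpdate(docs, interleavedWrite) {
  const matched = docs.filter(f);
  interleavedWrite();
  for (const row of matched) Object.assign(row, f(row) ? u(row) : {});
}

const a = [{ id: 1, x: 1 }];
filterThenUpdate(a, () => { a[0].x = 2; });
console.log(a[0].x); // 12 — u applied even though f no longer held

const b = [{ id: 1, x: 1 }];
filterThenBranchUpdate(b, () => { b[0].x = 2; });
console.log(b[0].x); // 2 — u skipped because f was re-checked
```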

@mlucy mlucy added this to the 2.1 milestone Mar 30, 2015
@mlucy
Member

mlucy commented Mar 30, 2015

I think we should rewrite r.table(t).filter(f).update(u) to r.table(t).update(r.branch(f, u, nil)). It will be faster and will provide an atomicity guarantee.

@mlucy
Member

mlucy commented Mar 30, 2015

Er, ignore that comment, I forgot for a second that we don't have range writes even for whole tables. I think we should rewrite it to r.table(t).filter(f).update(r.branch(f, u, nil)), which will be slightly slower than r.table(t).filter(f).update(u) but will provide the atomicity guarantee.

@AtnNn
Member Author

AtnNn commented Mar 31, 2015

@mlucy could we also provide or do we already have an atomicity guarantee for getAll.update, between.update, has_fields.update, get_intersecting.update, filter.replace, filter.delete, other similar commands, and combinations thereof?

@danielmewes danielmewes modified the milestones: 2.1-polish, 2.1 Mar 31, 2015
@mlucy
Member

mlucy commented Mar 31, 2015

has_fields is rewritten to a filter, get_all is rewritten to a between, and replace/update/delete are the same operation underneath. So we'd have to implement it twice: once for ranges and once for predicates. I'm not sure about get_intersecting; that might be a third implementation.

@danielmewes danielmewes modified the milestones: 2.1-polish, subsequent Apr 20, 2015
@mlucy mlucy modified the milestones: 2.1, subsequent May 20, 2015
@mlucy
Member

mlucy commented May 20, 2015

Moving this to 2.1 after talking with @coffeemug about it. He pointed out that this could be perceived as "compare and set" failing, which would be bad to still have in the upcoming release. (@danielmewes, does that seem fine?)

@timmaxw
Member

timmaxw commented May 20, 2015

One disadvantage of this proposal is that it's less obvious which operations are atomic. Right now, there's a simple rule: .update() and .replace() without non_atomic=True are atomic, and nothing else is.

@coffeemug
Contributor

One disadvantage of this proposal is that it's less obvious which operations are atomic.

I think the opposite is true. If I run r.table('foo').filter({x: 1}).update({x: r.row('x').add(1)}), RethinkDB doesn't give me an error. So according to our rule, I assume the query is executed atomically (which it isn't!). So we should either error and force the user to decide if they want to pass non_atomic=True, or change the implementation to actually operate atomically.

I think the latter option (changing the implementation in these cases) is much better, because if we error there will be a huge class of queries that will break (and passing non_atomic everywhere or rewriting them will be very annoying for users).

@danielmewes
Member

I think this proposal is still a bit half baked.

If we make filter.update atomic on a per-document basis, we should do the same for getAll.update, and getIntersecting.update.

Unfortunately, we cannot do this for all commands that return a selection, since for some of them, orderBy.limit for example, determining whether a document matches the predicate depends on other documents.

The filter case is by far the most common, but I agree with @timmaxw that if we special case only filter, that will make our guarantees more confusing.
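The orderBy.limit point above can be sketched in plain JavaScript (a toy in-memory model, not server code; the documents and scores are made up): the selection is defined over the whole set, so there is no per-row predicate that could be re-checked inside an atomic per-document update.

```javascript
// Whether a document is in the "top n" depends on every other document,
// so there is no per-row predicate f(row) to re-evaluate per document.
const docs = [{ id: 1, score: 5 }, { id: 2, score: 9 }, { id: 3, score: 7 }];

// "orderBy('score').limit(2)" as a set-level selection:
function topN(rows, n) {
  return [...rows].sort((a, b) => b.score - a.score).slice(0, n);
}

const before = topN(docs, 2).map((d) => d.id);
docs[0].score = 8;                 // change a *different* document...
const after = topN(docs, 2).map((d) => d.id);
console.log(before, after); // [ 2, 3 ] [ 2, 1 ] — id 3 fell out of the selection
```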

@danielmewes
Member

So basically I think we should have a pretty clear rule when we provide this guarantee and when we do not.

An alternative might be to document the fact that filter doesn't guarantee atomicity in the documentation.

@danielmewes
Member

Maybe we could add a new atomic_selection ReQL type?

We could even error when applying update to a non-atomic selection unless a special flag is set, though that might be annoying.

@danielmewes
Member

Another option would be to not do this rewrite, but to instead generate an error whenever a non-delete write is applied to a selection other than a singleSelection.
We would require the non_atomic flag (or a similar one) for such operations. Again, I think this could become quite annoying though.

(Edit: @coffeemug had already mentioned this, and I agree that it's not a great solution.)

@timmaxw
Member

timmaxw commented May 20, 2015

If I run r.table('foo').filter({x: 1}).update({x: r.row('x').add(1)}), RethinkDB doesn't give me an error. So according to our rule, I assume the query is executed atomically (which it isn't!)

The query isn't executed atomically; the update is executed atomically. The conjunction of the update and filter is not executed atomically. I think that if we find the right way to phrase this, we could make it super simple for users to understand. Maybe we could say "the only thing that's executed atomically is the function you pass to a .update() call", or something.

The problem with the alternative is that there isn't a clear rule. Suppose we make .filter().update(), .getAll().update(), and .getIntersecting().update() atomic. So what about .orderBy().limit().update() or .nth().update()? Or .filter(r.row('x').eq(subquery)).update()? It feels like if we add special support to make .filter().update() and similar queries atomic, we're just putting off the problem.

@coffeemug
Contributor

Ok, thinking about this a bit more, this looks pretty subtle to sneak into 2.1. I wouldn't be opposed to moving it into subsequent (and in fact, think we should do that), and fixing it later after we think it through more carefully. We should probably document this better -- I think that can go a long way.

@danielmewes
Member

👍 on documenting this better and coming up with a consistent solution later. I'll open separate documentation issues tomorrow.

@danielmewes danielmewes modified the milestones: 2.2, 2.1 May 20, 2015
@mlucy
Member

mlucy commented May 20, 2015

I would rather we do this in 2.1 if we could. I think the current behavior is both bad and unintuitive.

IMO we should make it so every term that can easily be made atomic is. Even if we leave the docs the way they are so that the official position is that .filter.update isn't guaranteed to be atomic, it will still eliminate 99% of cases where people don't realize something's non-atomic and screw themselves as a result. This is doubly true because writing .filter.update doesn't produce an error; it's just silently non-atomic and then your app breaks in production.

I think it's OK to officially not guarantee something, but to make the database as safe as possible by default for people who don't think to read the official guarantees. (Think e.g. how GCC turns off strict aliasing by default.)

I'd be happy to put off arguing about whether or not to guarantee certain things are atomic, and whether or not to introduce an atomic selection type, until 2.2, but I really don't see a reason not to make the default behavior safe for all selections where it's easy.

@coffeemug
Contributor

I really don't see a reason not to make the default behavior safe for all selections where it's easy.

👍 for fixing places where it's easy for now, and adding atomic_selection (or something similar) once we have the time to properly debate it. (Also, if/when we do add atomic_selection I'd make update error on selections that aren't atomic, unless the user passes non_atomic=True)

@danielmewes
Member

I think it's OK to officially not guarantee something, but to make the database as safe as possible by default for people who don't think to read the official guarantees.

I'm not a fan of this since it seems like it will make it harder for users to find out what the actual guarantees are during their testing / evaluation phase, but if you think strongly that we should do this I will not veto it.

We should make sure that our additional guarantee isn't leaked though.
Specifically consider queries like these:

table.filter(r.table("other").get(r.row("reference"))).update({counter: r.row(counter).add(1)})

The filter in this query is not atomic, but the update (incrementing the counter) is. This is a totally legitimate and probably not uncommon query.

If we now apply the proposed rewrite rule, this query will fail because the update function is no longer atomic. That would leak the fact that we're doing something magical underneath and the error would probably be very confusing.
So we should only apply the rewrite if we can prove that the filter function is deterministic.

This, by the way, is also a general problem with the proposal. Since an update function can only be either fully deterministic or fully non-deterministic, we would need a way to disable this kind of rewriting. Otherwise a large set of legitimate queries will become impossible to run.
Consider the example query with the counter increment again: The rewrite would make the update function non-deterministic, so the user would need to pass in nonAtomic: true. However that would now also make the counter increment non-atomic, which in some applications can be dramatically worse than what the query currently does without the rewrite.
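The "leak" described above can be sketched with a toy determinism check (my own illustrative model in plain JavaScript; the term representation and the rule for what counts as deterministic are invented for the example, not the server's actual analysis): wrapping the filter predicate into the update function makes the update inherit the predicate's non-determinism, so a previously legal query starts demanding the non-atomic flag.

```javascript
// Toy rule: a term is non-deterministic if it (or any subterm) reads a
// table, as r.table("other").get(...) does.
function isDeterministic(term) {
  if (term.op === 'table') return false;
  return (term.args || []).every(isDeterministic);
}

// The server only allows atomic updates with deterministic update functions.
function updateAllowed(updateTerm, nonAtomic) {
  return nonAtomic || isDeterministic(updateTerm);
}

// The user's update function just increments a counter: deterministic.
const u = { op: 'add', args: [{ op: 'row', args: [] }] };
// The filter predicate reads another table: non-deterministic.
const f = { op: 'get', args: [{ op: 'table', args: [] }] };
// The proposed rewrite folds the predicate into the update function.
const rewritten = { op: 'branch', args: [f, u, { op: 'literal', args: [] }] };

console.log(updateAllowed(u, false));         // true  — original query runs
console.log(updateAllowed(rewritten, false)); // false — the rewrite now errors
```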

@thelinuxlich

I think that you should provide atomicity guarantees wherever possible, even if the API is not 100% consistent; just document the use cases that are atomic. Users that care about this will understand it is a work in progress. I for one would benefit a lot from this, and from documentation listing what can be atomic and what can't.

@mlucy
Member

mlucy commented May 21, 2015

I'm not a fan of this since it seems like it will make it harder for users to find out what the actual guarantees are during their testing / evaluation phase...

I usually worry about that a lot as well, but in this case I think it doesn't apply, because most people writing .filter.update won't encounter non-atomic behavior during testing unless they simulate lots of concurrent operations (which most people don't). I think that making more things atomic by default will actually reduce the number of breakages when going from testing to production.

I think we should leave questions about what to do with .filter(NON_ATOMIC).update(ATOMIC) for the later discussion and just make .filter(ATOMIC).update(ATOMIC) (and others like it) atomic for 2.1.

...but if you think strongly that we should do this I will not veto it.

I don't feel strongly enough that this is the right thing to do that I want to push ahead unilaterally. @coffeemug, what do you think?

@mlucy mlucy modified the milestones: 2.1, 2.2 May 21, 2015
@mlucy
Member

mlucy commented May 21, 2015

(Moving back to 2.1 since it looks like we might do some work on this for that release and I don't want the issue to get lost.)

@coffeemug
Contributor

I don't have a strong opinion. I can see both points of view, and they both make sense to me. I wouldn't be opposed to making a subset of our queries behave under tighter semantics under the hood for now, and I wouldn't be opposed to holding off until we can get a more robust solution in place.

Sorry, I'm not a good tie-breaker here. I suppose the fact that I don't feel strongly that we should tighten up the code now means I'm in favor of doing nothing (but I'm not opposed to tightening up the code either).

@danielmewes
Member

I still don't like even just doing .filter(ATOMIC).update(ATOMIC).

If we document this, it will become extremely dangerous for users since once they use anything that's non-atomic in the filter (currently using a geo predicate is enough), the atomicity guarantee will silently disappear.

If on the other hand we don't document it and don't tell anyone about it, I think it would be better to just explain properly which exact operations are atomic and which are not. I think we can do a lot here with improving documentation. We would mention it in the filter API documentation, in the replace and update API documentation and in the FAQ as well as wherever else we mention atomicity. Basically whenever we say "single document updates are atomic", we would instead say something along the lines of "single document updates can be made atomic by using an update function (see ... for details)".

@danielmewes danielmewes modified the milestones: 2.1-polish, 2.1 Jul 15, 2015
@mlucy mlucy modified the milestones: 2.2, 2.1-polish Aug 4, 2015
@mlucy
Member

mlucy commented Aug 4, 2015

Bumping this to 2.2 since it was apparently never settled; we should try to settle it during the next discussion period.

@danielmewes danielmewes modified the milestones: 2.2, 2.2-polish, subsequent Aug 27, 2015
@zappjones

Hi all -

I just saw a link in the consistency documentation (https://www.rethinkdb.com/docs/consistency/) pointing to this thread and was wondering if there was any new information regarding making the filter.update process atomic?

Thanks!

@mlucy
Member

mlucy commented Dec 18, 2015

@danielmewes -- I made this mistake myself just recently when answering a stackoverflow question. I really think we should try to do something better here.

@AtnNn
Member Author

AtnNn commented Feb 12, 2016

I just ran into this again. Could we change non_atomic: true to atomic: false, and then have atomic: 'document_update', which would have the current behaviour, and atomic: 'document_query', which would fail for all queries that touch anything other than the document id?

@danielmewes
Member

@AtnNn I like that proposal.
We do need to figure out what document_query means exactly though (does it only work for filter, or also for predicates such as between, getAll or getIntersecting?).

We also need to be somewhat careful in how we name and describe this, since a document_query query isn't actually atomic at all.
If RethinkDB crashes in the middle of a multi-document update for example, it will have updated some documents but not others.
If you compare the set of documents updated by such a query with the sets of documents in the table that fulfil the filter predicate at any given time, it's also possible that the updated set is not equal to any matching set that existed in the table during the query execution (it will usually be a subset of some matching set, but not always, I think).

We need to make it clear that there's still not going to be any atomicity across documents, and that the only thing it does is that it turns the filter predicate into a CAS update.
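The crash caveat above can be sketched in plain JavaScript (a toy in-memory model with an invented crash point, not server code): even if each document gets a CAS-style re-check that is atomic per document, the multi-document update as a whole is not atomic, and a crash part-way through leaves a partial result behind.

```javascript
const docs = [{ id: 1, x: 1 }, { id: 2, x: 1 }, { id: 3, x: 1 }];
const f = (row) => row.x === 1;         // filter predicate
const u = (row) => ({ x: row.x + 1 });  // update function

function casUpdate(rows, crashAfter) {
  let written = 0;
  for (const row of rows) {
    if (written === crashAfter) throw new Error('simulated server crash');
    if (f(row)) {                       // per-document re-check (atomic)
      Object.assign(row, u(row));
      written += 1;
    }
  }
}

try {
  casUpdate(docs, 2);                   // "server" dies after two writes
} catch (e) { /* crash mid-update */ }

console.log(docs.map((d) => d.x)); // [ 2, 2, 1 ] — some updated, some not
```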


7 participants