get_or_create should be atomic #478

Closed
mitar opened this Issue Apr 5, 2012 · 9 comments

Projects

None yet

3 participants

@mitar

Current implementation of get_or_create is not atomic. It should probably use upsert:

http://stackoverflow.com/questions/6703330/how-do-i-do-a-get-or-create-in-pymongo-python-mongodb

@rozza rozza referenced this issue in MongoEngine/mongoengine May 9, 2012
@rozza rozza Added note that get_or_create contains a race condition
Refs #478
eb54037
@rozza rozza referenced this issue in MongoEngine/mongoengine May 9, 2012
@rozza rozza Made note stronger re: race condition
Refs #478
5c4b33e
@rozza

It cant be atomic - because of the syntax and it determines if the document existed prior to it possibly being created.

Semantically, this is different to an upsert, you might create a new instance but not save it disk because this is part of a bigger operation.

However, this does create a race condition and other approaches might be more applicable to your use case.

I have updated the docs to highlight this.

@rozza rozza closed this May 9, 2012
@mitar

OK, this is really strange. I would never know from documentation that document is not really saved. For me, created implies saving. Anyway, this is what happens in Django. So if you are targeting semantic equality to Django, then it is really strange that it would not work like this. Anyway, auto_save is True. So for this default case, when auto_save is True, upsert could be used. Please reopen this ticket for that case.

@rozza

Hi,

Well people wanted to do a fetch / create then save as part of their flow - so auto_save was introduced.

The semantics aren't the same as an upsert - infact doing an upsert could result in the loss of data, as you'd have to query against the whole document otherwise an upsert would overwrite the existing documents. For that reason, I'm not reopening.

Also, the same race condition exists in django - see: https://github.com/django/django/blob/master/django/db/models/query.py#L432-465 There are two separate operations:

1) To check for existence
2) Save an new instance if it fails.

Cheers,

Ross

@mitar

I know that Django's version also has race condition, but this is because it it hard to do it right when you need to support multiple database backends. For mongodb we could find a better solution.

@rozza

@mitar but there isn't a better solution for many scenarios as semantically this is the only way to achieve what some people want. You can always use an upsert if required but that is different and has different usecases.

Wont fix

@jorgebastida

@mitar @rozza afaik there is not a race condition in django if the fields you are using to query are unique=True:
Look: https://github.com/django/django/blob/master/django/db/models/query.py#L463

In case of an IntegrityError it rollback and then (line 467) it returns the result of a simple .get()

We've just face a race condition using get_or_create and been honest I don't think what could be the workaround...
We'll probably start doing a .filter().first() ...

@rozza

So unique=True stops the race condition - not true, there is a catch for the exception and code handles that race condition, if you fall into that catgeory. I'm happy to add the error catch here, but it still is leaky - depends on your write concerns as to if you get an index duplicate error back from the server. You should just use an upsert if you can - its what they are there for, otherwise version and locking patterns can be used.

There is a note on the method describing the race condition, so it is use at your own risk.

@rozza rozza referenced this issue in MongoEngine/mongoengine Jul 13, 2012
Closed

Add exception handling and rollback to get_or_create #35

@rozza

Marked the method for deprecation - but thinking about it we can do it better.

The race condition doesnt exist where unique=True - exactly same as django (indexes that enforce uniqueness will stop multiple items being added) we should enforce a safe write and catch an index error.

We could also code a rollback if created - add an extra query to match all items and delete all but the first ordered by obj id (creation time) - those would be items created in a race and then return the results of a get with created = False.

@rozza

Added new ticket: MongoEngine/mongoengine#35

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment