Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_or_create should be atomic #478

Closed
mitar opened this issue Apr 5, 2012 · 9 comments
Closed

get_or_create should be atomic #478

mitar opened this issue Apr 5, 2012 · 9 comments

Comments

@mitar
Copy link

mitar commented Apr 5, 2012

Current implementation of get_or_create is not atomic. It should probably use upsert:

http://stackoverflow.com/questions/6703330/how-do-i-do-a-get-or-create-in-pymongo-python-mongodb

rozza referenced this issue in MongoEngine/mongoengine May 9, 2012
rozza referenced this issue in MongoEngine/mongoengine May 9, 2012
@rozza
Copy link

rozza commented May 9, 2012

It cant be atomic - because of the syntax and it determines if the document existed prior to it possibly being created.

Semantically, this is different to an upsert, you might create a new instance but not save it disk because this is part of a bigger operation.

However, this does create a race condition and other approaches might be more applicable to your use case.

I have updated the docs to highlight this.

@rozza rozza closed this as completed May 9, 2012
@mitar
Copy link
Author

mitar commented May 9, 2012

OK, this is really strange. I would never know from documentation that document is not really saved. For me, created implies saving. Anyway, this is what happens in Django. So if you are targeting semantic equality to Django, then it is really strange that it would not work like this. Anyway, auto_save is True. So for this default case, when auto_save is True, upsert could be used. Please reopen this ticket for that case.

@rozza
Copy link

rozza commented May 9, 2012

Hi,

Well people wanted to do a fetch / create then save as part of their flow - so auto_save was introduced.

The semantics aren't the same as an upsert - infact doing an upsert could result in the loss of data, as you'd have to query against the whole document otherwise an upsert would overwrite the existing documents. For that reason, I'm not reopening.

Also, the same race condition exists in django - see: https://github.com/django/django/blob/master/django/db/models/query.py#L432-465 There are two separate operations:

  1. To check for existence
  2. Save an new instance if it fails.

Cheers,

Ross

@mitar
Copy link
Author

mitar commented May 10, 2012

I know that Django's version also has race condition, but this is because it it hard to do it right when you need to support multiple database backends. For mongodb we could find a better solution.

@rozza
Copy link

rozza commented May 14, 2012

@mitar but there isn't a better solution for many scenarios as semantically this is the only way to achieve what some people want. You can always use an upsert if required but that is different and has different usecases.

Wont fix

@jorgebastida
Copy link

@mitar @rozza afaik there is not a race condition in django if the fields you are using to query are unique=True:
Look: https://github.com/django/django/blob/master/django/db/models/query.py#L463

In case of an IntegrityError it rollback and then (line 467) it returns the result of a simple .get()

We've just face a race condition using get_or_create and been honest I don't think what could be the workaround...
We'll probably start doing a .filter().first() ...

@rozza
Copy link

rozza commented Jul 13, 2012

So unique=True stops the race condition - not true, there is a catch for the exception and code handles that race condition, if you fall into that catgeory. I'm happy to add the error catch here, but it still is leaky - depends on your write concerns as to if you get an index duplicate error back from the server. You should just use an upsert if you can - its what they are there for, otherwise version and locking patterns can be used.

There is a note on the method describing the race condition, so it is use at your own risk.

@rozza
Copy link

rozza commented Jul 13, 2012

Marked the method for deprecation - but thinking about it we can do it better.

The race condition doesnt exist where unique=True - exactly same as django (indexes that enforce uniqueness will stop multiple items being added) we should enforce a safe write and catch an index error.

We could also code a rollback if created - add an extra query to match all items and delete all but the first ordered by obj id (creation time) - those would be items created in a race and then return the results of a get with created = False.

@rozza
Copy link

rozza commented Jul 13, 2012

Added new ticket: MongoEngine#35

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants