Support firestore transactions/batches #36

Open
gresnick opened this issue Feb 28, 2022 · 9 comments

Comments

@gresnick

https://firebase.google.com/docs/firestore/manage-data/transactions

Without this, cloud functions that share a trigger invariably enter a race condition.

@gresnick
Author

I would be happy to contribute this with some initial guidance

@gmega

gmega commented Oct 27, 2022

Indeed. This is a great project as you can get support for application-side schemas in Firestore while getting everything Pydantic has to offer (e.g. unlike other Firebase "ORMs" which have built their own stuff for schema definition), but without support for transactions we simply cannot adopt it.

@antont
Contributor

antont commented Oct 4, 2023

Any new thoughts here? I'm also considering Firedantic for our project, where until now (just a few weeks) we have used a simple home-grown db util that lets us use Pydantic with Firebase quite nicely. It lacks a lot, though.

I'm just worried that we might hit a wall somewhere with Firedantic.

I'd guess it's always possible to just use the firebase python sdk client etc. directly, bypassing Firedantic, e.g. for a batch op?

@antont
Contributor

antont commented Oct 4, 2023

I'd guess it's always possible to just use the firebase python sdk client etc. directly,

Just to answer my own question: yes, it seems trivial to fall back to using the Client directly. I'm using it for more complex queries now, and I guess running batch updates etc. would work too.
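For illustration, a minimal sketch of such a fallback for a batch update. It assumes `client` is a `google.cloud.firestore.Client` obtained outside Firedantic; the function name `batch_update` and the shape of `updates` are made up for this example.

```python
def batch_update(client, collection_name, updates):
    """Apply several document updates atomically with a Firestore write batch.

    Assumptions: `client` is a google.cloud.firestore.Client, and `updates`
    maps document IDs to dicts of the fields to change.
    """
    batch = client.batch()
    collection = client.collection(collection_name)
    for doc_id, fields in updates.items():
        # Queue the update; nothing is sent until commit().
        batch.update(collection.document(doc_id), fields)
    # All queued writes succeed or fail together.
    return batch.commit()
```

A call might look like `batch_update(client, "users", {"u1": {"active": True}})`, where the collection name and fields are placeholders.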

@lietu
Contributor

lietu commented Oct 5, 2023

It might not be too much work to accept an optional transaction argument in the method parameters, so that you could use @firestore.transactional around Firedantic calls yourself and pass in the transaction. If this seems valuable to you, a PR would be interesting to see.
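A rough sketch of what such an optional argument could look like. This is not Firedantic's actual API: the `Subscriber` class, its `save` signature, and the `doc_ref` parameter are all hypothetical. It relies only on `Transaction.set(reference, data)` and `DocumentReference.set(data)` from the Firestore Python client.

```python
from typing import Any, Dict, Optional


class Subscriber:
    """Stand-in for a Pydantic/Firedantic model (hypothetical, not the real API)."""

    def __init__(self, email: str) -> None:
        self.email = email

    def to_dict(self) -> Dict[str, Any]:
        return {"email": self.email}

    def save(self, doc_ref: Any, transaction: Optional[Any] = None) -> None:
        data = self.to_dict()
        if transaction is not None:
            # Queue the write on the caller's transaction; it is applied
            # only when that transaction commits.
            transaction.set(doc_ref, data)
        else:
            # No transaction given: write immediately.
            doc_ref.set(data)
```

A caller could then decorate its own function with `@firestore.transactional` (from `google.cloud.firestore`), receive the transaction as the first argument, and pass it on as `transaction=` to each `save` call.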

@antont
Contributor

antont commented Oct 10, 2023

Without this, cloud functions that share a trigger invariably enter a race condition.

What do you actually mean by this, BTW? I guess two functions that get triggered by the same thing, e.g. they both listen for documents created in the same collection. I haven't happened to write such functions yet, just a single kind of handler per event, but I guess that can happen easily.

@lietu
Contributor

lietu commented Oct 10, 2023

Say you have cloud functions handling people submitting a form to add you to a newsletter list.

The cloud function both

  1. adds you to a collection of newsletter subscribers
    and
  2. updates statistics on subscribers per region, by extracting the list of subscribers, and counting their totals per region based on e.g. the email address domain, then saving the numbers to a collection containing the statistics

Now if your database can't perform these two actions as one atomic operation, there's a decent chance that some day there will be a rare occurrence (how rare depends heavily on the popularity of your service) where two people add themselves to the newsletter list at very nearly the same time.

Now your two cloud functions will spin up, not knowing about each other and not synchronizing their work, and both will

  1. Add the user to the collection of newsletter subscribers
  2. Extract the data
  3. Calculate updated statistics
  4. Store statistics

Now if we name these two users A and B, their requests might be processed in linear infinitely divisible time in this order:

  • A1
  • B1
  • A2
  • A3
  • A4
  • B2
  • B3
  • B4

.. so both entries were added to the list first, then they both calculated the statistics and updated the data - no problem.

But if the order instead is:

  • A1
  • A2
  • B1
  • A3
  • B2
  • B3
  • B4
  • A4

The end result will be wrong. A2 read the list before B1 added user B to it. Request B's calculation included both users and was saved in B4, but A4 then overwrote it with A's stale numbers. This is a race condition: it happens due to the inherent unpredictability of the ordering of simultaneous actions, and can be made a bit more interesting by the unpredictability of the speed at which they end up being executed.
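The two orderings above can be replayed deterministically with a small in-memory sketch. Nothing here touches Firestore: the lists and dicts stand in for the two collections, and the user emails are made up.

```python
from collections import Counter

# Hypothetical users; both share a domain so the expected count is obvious.
USERS = {"A": "alice@example.com", "B": "bob@example.com"}


def run_schedule(schedule):
    """Replay steps such as "A1" or "B3" in order; return (subscribers, stats)."""
    subscribers = []              # stands in for the subscribers collection
    stats = {}                    # stands in for the statistics collection
    scratch = {"A": {}, "B": {}}  # each request's private in-memory state

    for step in schedule:
        request, number = step[0], int(step[1])
        mine = scratch[request]
        if number == 1:    # add the user to the subscribers collection
            subscribers.append(USERS[request])
        elif number == 2:  # extract the data (a snapshot that may go stale)
            mine["snapshot"] = list(subscribers)
        elif number == 3:  # calculate statistics from the snapshot
            domains = (email.split("@")[1] for email in mine["snapshot"])
            mine["counts"] = dict(Counter(domains))
        elif number == 4:  # store statistics, overwriting what was there
            stats = dict(mine["counts"])
    return subscribers, stats


safe = ["A1", "B1", "A2", "A3", "A4", "B2", "B3", "B4"]
racy = ["A1", "A2", "B1", "A3", "B2", "B3", "B4", "A4"]

print(run_schedule(safe)[1])  # {'example.com': 2}
print(run_schedule(racy)[1])  # {'example.com': 1}, although two users subscribed
```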

How you'd work around this is either 1) transactions, or 2) locks

Locks:

  • Request A comes in, it acquires an exclusive lock to the database
  • A1
  • A2
  • Request B comes in, it asks for the lock, but fails to get it and either errors, or for this example waits for it
  • A3
  • A4
  • Request A completes, and releases the lock
  • Request B acquires the lock
  • B1
  • B2
  • B3
  • B4
  • Request B releases the lock

Final result is predictable and good.
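As a sketch of the lock approach, the four steps can be wrapped in one critical section. This uses `threading.Lock`, which only coordinates threads inside a single process; separate cloud function instances would need some shared, distributed lock instead, which is not shown here. The in-memory `subscribers` and `stats` are stand-ins for the collections.

```python
import threading
from collections import Counter

subscribers = []            # stands in for the subscribers collection
stats = {}                  # stands in for the statistics collection
db_lock = threading.Lock()  # stands in for an exclusive database lock


def handle_signup(email):
    # Steps 1-4 run as one critical section; other requests wait here.
    with db_lock:
        subscribers.append(email)                            # step 1
        snapshot = list(subscribers)                         # step 2
        counts = Counter(e.split("@")[1] for e in snapshot)  # step 3
        stats.clear()                                        # step 4
        stats.update(counts)


threads = [
    threading.Thread(target=handle_signup, args=(f"user{i}@example.com",))
    for i in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(stats)  # {'example.com': 20}
```

Because every request queues behind the same lock, throughput is limited to one request at a time, which is the scalability cost of this approach.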

Transactions are a bit more like:

  • Request A comes in, and starts a transaction, and inside the transaction performs these actions
  • A1
  • A2
  • Request B comes in, and starts a transaction ..
  • B1
  • A3
  • B2
  • A4
  • B3
  • B4 - the database returns an error saying you're trying to update something that has changed state since you started your transaction. The change is rejected, and your transaction logic restarts.
  • B1
  • B2
  • B3
  • B4

This might not be exactly faithful to how it works out in practice, but this is roughly what race conditions are in general, and how these two methods of solving them work.
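The retry behaviour in the transaction walkthrough can be sketched with a toy in-memory store that rejects a commit when the data changed after the transaction started. The `Store`, `Conflict`, and `subscribe` names are hypothetical, writes are buffered until commit as a simplification, and this mirrors the walkthrough rather than how Firestore resolves contention internally.

```python
from collections import Counter


class Conflict(Exception):
    """Raised when the data changed after the transaction started."""


class Store:
    """Toy in-memory store with a single version counter (hypothetical)."""

    def __init__(self):
        self.subscribers = []
        self.stats = {}
        self.version = 0

    def snapshot(self):
        # A transaction remembers the version it started from.
        return self.version, list(self.subscribers)

    def commit(self, started_at, email, counts):
        if started_at != self.version:
            raise Conflict("state changed since the transaction started")
        self.subscribers.append(email)
        self.stats = dict(counts)
        self.version += 1


def subscribe(store, email, max_attempts=5, before_commit=None):
    """Run the signup logic, restarting it whenever the commit conflicts."""
    for attempt in range(1, max_attempts + 1):
        started_at, current = store.snapshot()
        counts = Counter(e.split("@")[1] for e in current + [email])
        if before_commit is not None:
            before_commit(attempt)  # demo hook: lets another request cut in
        try:
            store.commit(started_at, email, counts)
            return attempt
        except Conflict:
            continue
    raise RuntimeError("gave up after repeated conflicts")


store = Store()


def request_a_cuts_in(attempt):
    # Request A commits while request B's first attempt is still in flight.
    if attempt == 1:
        subscribe(store, "alice@example.com")


attempts_for_b = subscribe(store, "bob@example.com", before_commit=request_a_cuts_in)
print(attempts_for_b, store.stats)  # 2 {'example.com': 2}
```

Request B's first attempt is rejected because request A committed in between; the second attempt sees both users and stores the correct count.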

@lietu
Contributor

lietu commented Oct 10, 2023

Also to add, locks are generally speaking a simpler thing to implement and comprehend, but come with their own scalability issues, which is partially why transactions are often preferred.

@antont
Contributor

antont commented Oct 17, 2023

Say you have cloud functions handling people submitting a form to add you to a newsletter list.

Right-o, thanks for spelling it out. I think we currently avoid this by having such statistics-like things triggered by scheduled cloud functions, so that only one task runs at a time for the whole service. Functions triggered by user activity only touch their own data. I will check our ops with this in mind anyway, and keep an eye on it for later.

I may also have some time to add support for this, even before we need it, just to be prepared once the need hits. I'm curious whether @gresnick or @gmega have ideas about how it would look, or if you write something I can at least test it etc.
