Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

5300 Avoid race conditions in Spree::StockItem #5406

Conversation

kristinalim
Copy link
Member

@kristinalim kristinalim commented May 12, 2020

What? Why?

I've looked into #5300, exploring if the negative inventory levels could be related to a missing callback for guest checkout, subscriptions, and the now removed backorders, but this doesn't seem to be the case.

The issue seems to be that stock levels in Spree::StockItem, which we check for stock levels before completing an order, could be updated by another order before the stock level is updated by the current order.

This PR introduces two changes:

  • Optimistic locking for Spree::StockItem via lock_version.
  • Require count_on_hand in Spree::StockItem to be positive if not backorderable.

What should we test?

Scenario 1: Stock restrictions for customer checking out should still work. Open two browser sessions, one logged in as admin (Session A), another logged in as customer (Session B).

  1. Session A: Set the stock level of a product to limited stock with e.g. 4.
  2. Session B: Add less than total stock left to cart, e.g. 3.
  3. Session A: Update the stock level of the product to be less than the quantity in cart, e.g. 2.
  4. Session B: Attempt to check out. This should warn of insufficient stock.
  5. Session B: Reduce quantity in cart, e.g. 2.
  6. Session B: Checkout should succeed.

Scenario 2: Stock restrictions for admin setting up an order should still work. Open two browser sessions, both logged in as admin.

  1. Session A: Set the stock level of a product to limited stock with e.g. 4.
  2. Session B: Create an order, Add less than total stock left to cart, e.g. 3.
  3. Session A: Update the stock level of the product to be less than the quantity in cart, e.g. 2.
  4. Session B: Attempt to complete the order. This should succeed but reduce the quantity in the order to the max available.

Release notes

Address race condition resulting in orders exceeding available stock.

Changelog Category: Fixed

@kristinalim kristinalim self-assigned this May 12, 2020
@kristinalim kristinalim changed the title Add optimistic locking to Spree::StockItem 5300 Add optimistic locking to Spree::StockItem May 12, 2020
@kristinalim kristinalim changed the title 5300 Add optimistic locking to Spree::StockItem 5300 [WIP] Add optimistic locking to Spree::StockItem May 12, 2020
@kristinalim kristinalim changed the title 5300 [WIP] Add optimistic locking to Spree::StockItem 5300 [WIP] Avoid race conditions in Spree::StockItem May 12, 2020
@kristinalim kristinalim force-pushed the fix/5300-optimistic_locking_in_stock_items branch from b958358 to 37b8d48 Compare May 12, 2020 07:59
@kristinalim kristinalim force-pushed the fix/5300-optimistic_locking_in_stock_items branch 2 times, most recently from 9d2a8e1 to 089dab4 Compare May 14, 2020 08:56
@kristinalim kristinalim changed the title 5300 [WIP] Avoid race conditions in Spree::StockItem 5300 Avoid race conditions in Spree::StockItem May 14, 2020
Copy link
Contributor

@luisramos0 luisramos0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one @kristinalim
From what I read, with lock_version there will be some ActiveRecord::StaleObjectError exceptions be thrown. Do you know where and how will this affect the user journey?
Will this be a generic checkout error shown to the user?

I wonder if this can be tested in a concurrency spec like the checkout_controller_concurrency_spec.

Copy link
Member

@mkllnk mkllnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some good changes in here but I'm not sure I understood where the root cause is. Are you assuming that an admin is updating the stock level while somebody checks out?

I also don't see any code dealing with the stale object errors. Is the plan to observe them in Bugsnag first to know when they actually happen?

app/models/spree/stock_item.rb Outdated Show resolved Hide resolved
spec/models/spree/stock_item_spec.rb Show resolved Hide resolved
@@ -10,6 +10,7 @@ class StockItem < ActiveRecord::Base
# rubocop:disable Rails/UniqueValidationWithoutIndex
validates :variant_id, uniqueness: { scope: :stock_location_id }
# rubocop:enable Rails/UniqueValidationWithoutIndex
validates :count_on_hand, numericality: { greater_than_or_equal_to: 0, unless: :backorderable? }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make the checkout fail. Did you test what the user would see in this scenario?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as I can see, this should never happen, if it happens and if it blows up we will see it in bugsnag.

@luisramos0
Copy link
Contributor

@kristinalim please move back to Code Review when you think this is ready to be re-reviewed. Thanks!

@luisramos0 luisramos0 force-pushed the fix/5300-optimistic_locking_in_stock_items branch from c03a678 to e4b346d Compare June 22, 2020 16:54
@luisramos0 luisramos0 self-assigned this Jun 22, 2020
@luisramos0
Copy link
Contributor

I am having a look at this.
I thought of creating PRs on top of Kristina's branch but Kristina may not be available in the next few days so I decided to work directly on Kristinas branch. I started by rebasing the PR and update stock_item to spree 2.1 (rails 4 version).

@luisramos0 luisramos0 force-pushed the fix/5300-optimistic_locking_in_stock_items branch from b855fa2 to e4e037f Compare June 22, 2020 17:36
@luisramos0
Copy link
Contributor

I have found that checkout controller already contains this:
rescue ActiveRecord::StaleObjectError
So we will be ok on that front.

With this last version I tried reproducing the problem by placing two orders at the same time. I only get the correct behavior, one of the sessions gets this:
image

I am now trying to write a spec that simulates two users checking out at the same time with insufficient stock for both.
I am working at controller level extending spec/controllers/checkout_controller_concurrency_spec.rb

@mkllnk
Copy link
Member

mkllnk commented Jun 23, 2020

With this last version I tried reproducing the problem by placing two orders at the same time. I only get the correct behavior

And with master? Can you actually re-produce the error? I still don't know why the synchronisation doesn't work in this case.

@luisramos0
Copy link
Contributor

luisramos0 commented Jun 24, 2020

I was trying to extend https://github.com/openfoodfoundation/openfoodnetwork/blob/9e730e6445231c16396de2da7c33bff26c0aed74/spec/controllers/checkout_controller_concurrency_spec.rb

With a spec that would test for different orders with the same stock limited variant.
I can create a lot of deadlocks but they are not consistent...

I realize that the existing spec still passes without this:
image

And the second thread doesnt even get to the load_order call back.

Which is exactly the same problem in my attempt here.
I dont understand why the second thread doesnt even get to load_order... I cant underastand what's going on.

I have spent too much time on this already without success. So I am leaving it.
I suggest we just go ahead with this PR as is. Thoughts?

@luisramos0
Copy link
Contributor

rebased to resolve conflicts.

@luisramos0 luisramos0 force-pushed the fix/5300-optimistic_locking_in_stock_items branch from e4e037f to ba50491 Compare June 24, 2020 15:17
@sauloperez
Copy link
Contributor

sauloperez commented Jul 16, 2020

This looks pretty good but I noticed this while reading https://api.rubyonrails.org/classes/ActiveRecord/Locking/Optimistic.html:

This locking mechanism will function inside a single Ruby process. To make it work across all web requests, the recommended approach is to add lock_version as a hidden field to your form.

We have two workers on our infra so this will be needed. It could come on a separate PR but the issue won't be fully fixed until we address this.

@luisramos0
Copy link
Contributor

yeah, this PR is not fixing 5300, just improving the situation.

@sauloperez can you clarify what exactly is preventing you from approving this PR? Is it the question to @mkllnk ?

@sauloperez
Copy link
Contributor

sauloperez commented Jul 16, 2020

I didn't notice this isn't closing #5300 . If this is about introducing optimistic locking while bringing in the necessary models, this is Test ready IMO.

@filipefurtad0 filipefurtad0 self-assigned this Jul 17, 2020
@filipefurtad0 filipefurtad0 added the pr-staged-fr staging.coopcircuits.fr label Jul 17, 2020
@filipefurtad0
Copy link
Contributor

filipefurtad0 commented Jul 17, 2020

Hi @kristinalim ,

I tried to reproduce the issue as reported in #5300, but without success. I thank you for the detailed test cases! I went through them:

  • Scenario 1: Opened two browser sessions, one logged in as admin (Session A), another logged in as customer (Session B).

Session A: I've set the stock level of a product to the limited stock of 4.
Session B: Added 3 items
Session A: Updated the stock level of the product to 2.
Session B: Attempted to check out. This warned of insufficient stock, and redirected to the /cart page, as shown below:

image

Session B: Reduced quantity in cart to 2; While in the /cart page, pressing Update triggered the warning in the red tab appear again, although the quantity in car was already compatible with the available stock -> a "memory-effect" was observed while testing another PR (#5766). I will investigate this issue - system retaining previous state and showing previously shown warnings - and report later. <- After pressing Update a second time and making the warning disappear redirection to the checkout page followed.
Session B: Checkout succeed.

  • Scenario 2: Opened two browser sessions, both logged in as admin.

Session A: I've set the stock level of a product to limited stock, in this case, 4 items.
Session B: Created an order, adding less than total stock left to cart: 3 items.
Session A: Updated the stock level of the product to be less than the quantity in cart: I've set it to 2 items.
Session B: Attempting to complete the order displays the message "Error saving payment":

image

and triggers Bugnsnag:
https://app.bugsnag.com/open-food-france/open-food-france-prod/errors/5f11cd5d9f0dd90017a69e3d?event_id=5f11cd5d005b6e2ca49f0000&i=sk&m=nw

  • I have reproduced Scenario 2 with the same values in master-branch: the order is successfully created (Session B) and I verified that negative stock values are allowed:

image

Scenario 2 doesn't seem to be working as expected. Perhaps this PR could be improved? I'm moving it into In Dev for now. The other option would be to merge and open an issue on this concern. Please let me know if you find this beneficial in this case.

@filipefurtad0 filipefurtad0 removed the pr-staged-fr staging.coopcircuits.fr label Jul 17, 2020
@RachL
Copy link
Contributor

RachL commented Jul 17, 2020

@filipefurtad0 I don't think Kristina will be able to pick this up as she has no time to contribute at the moment. As it looks to me that it is already improving the situation, without making what we have in prod worse, I would merge and open a dedicated issue. What do you think?

@filipefurtad0
Copy link
Contributor

filipefurtad0 commented Jul 17, 2020

Thank you for your feedback @RachL ,

I agree, the situation is improved: as Scenario 2 shows, after this PR it's not possible to reach negative stock levels, through backoffice orders - which is the second change introduced by the PR.

Opened issue #5784 about the error.

Thanks again!

@luisramos0
Copy link
Contributor

It's great you can replicate the problem Filipe! I thought we would not be able to replicate it.

I agree with the approach but I think 5784 is a sub-issue of 5300 and thus still an S2 to be worked on. Does that make sense?
If we can replicate 5784 locally I think it will be straight forward to add some defensive code to let the user know the product is out of stock.

@filipefurtad0
Copy link
Contributor

Cheers to @kristinalim and the wonderful testing notes! 🎉
Upgraded to S2, and moved to dev-ready.

Thanks @luisramos0 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants