Duplicate uuid/version events prevent further updates #170

Closed
UncertaintyP opened this issue Nov 19, 2020 · 27 comments

@UncertaintyP

First I want to thank you guys for your awesome packages and the hard work you put into them. ❤️

I was trying out this package in local development (Nginx + PHP-FPM, MySQL). After some time this system goes dormant and the next request takes some time to be processed. In this state I (accidentally) made 2 concurrent requests to update the same aggregate. The corresponding controller function:

    public function update(UserUpdateRequest $request, User $user)
    {
        UserAggregateRoot::retrieve($user->id)
            ->updateUser($request->validated() + ['id' => $user->id])
            ->persist();

        return UserResource::make($user->refresh());
    }

This resulted in a duplicate entry in the database, see:

[screenshot: stored_events rows showing two events with the same aggregate_uuid and aggregate_version]

Note: The UserAggregateRoot does NOT have protected static bool $allowConcurrency = true;

Now every update request to that specific aggregate (user) results in the following error:

[2020-11-19 17:39:58] local.ERROR: Could not persist aggregate UserAggregateRoot (uuid: 8be7ace5-8c98-3823-b858-11cdff2974e7) because it seems to be changed by another process after it was retrieved in the current process. Expect to persist events after version 2, but version 1 was already persisted. {"userId":"9e33aa4f-6efa-3505-ab49-9d667e5b2c39","exception":"[object] (Spatie\\EventSourcing\\Exceptions\\CouldNotPersistAggregate(code: 0): Could not persist aggregate UserAggregateRoot (uuid: 8be7ace5-8c98-3823-b858-11cdff2974e7) because it seems to be changed by another process after it was retrieved in the current process. Expect to persist events after version 2, but version 1 was already persisted. at base/vendor/spatie/laravel-event-sourcing/src/Exceptions/CouldNotPersistAggregate.php:18)
[stacktrace]
#0 base/vendor/spatie/laravel-event-sourcing/src/AggregateRoots/AggregateRoot.php(189): Spatie\\EventSourcing\\Exceptions\\CouldNotPersistAggregate::unexpectedVersionAlreadyPersisted()
#1 base/vendor/spatie/laravel-event-sourcing/src/AggregateRoots/AggregateRoot.php(90): Spatie\\EventSourcing\\AggregateRoots\\AggregateRoot->ensureNoOtherEventsHaveBeenPersisted()
#2 base/vendor/spatie/laravel-event-sourcing/src/AggregateRoots/AggregateRoot.php(79): Spatie\\EventSourcing\\AggregateRoots\\AggregateRoot->persistWithoutApplyingToEventHandlers()
#3 base/Modules/User/Http/Controllers/UserController.php(91): Spatie\\EventSourcing\\AggregateRoots\\AggregateRoot->persist()
  1. Should this be handled by the package automatically (is this a bug) or should this be prevented in another part of the system?
  2. Should I enforce unique constraints in the database on [aggregate_uuid, aggregate_version]? If so, should this be highlighted in the docs?
  3. FYI: adding allowConcurrency afterwards to the root allows for updating and inserts a new event with version 3.

Best regards,
Marcel
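As a sketch of the constraint raised in question 2, a classic Laravel migration along these lines would enforce uniqueness on the pair. The table and column names assume the default stored_events schema of spatie/laravel-event-sourcing, the class name is invented for this illustration, and any existing duplicate rows would have to be cleaned up before the index can be created:

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

class AddUniqueIndexToStoredEventsTable extends Migration
{
    public function up()
    {
        Schema::table('stored_events', function (Blueprint $table) {
            // One event per (aggregate, version): a concurrent second
            // writer now gets a query exception instead of silently
            // inserting a duplicate row.
            $table->unique(['aggregate_uuid', 'aggregate_version']);
        });
    }

    public function down()
    {
        Schema::table('stored_events', function (Blueprint $table) {
            $table->dropUnique(['aggregate_uuid', 'aggregate_version']);
        });
    }
}
```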

@dilab (Contributor) commented Dec 29, 2020

We had the same issue and we had to manually delete one of the duplicated events.

Wonder why this happens too.

@dilab (Contributor) commented Jan 15, 2021

@UncertaintyP it turns out this is a design issue. We as developers should design the functions to be idempotent to protect against this; it is a valid exception for the aggregate to throw.

@UncertaintyP would love to discuss your use case. you can reach me via my profile.

@UncertaintyP (Author)

@dilab Sorry I don't see how to contact you.

My use case is pretty simple: I want to update a User model. I was relying on the documentation, which states: If that highest version id differs from the highest one that the aggregate remembers, an exception will be thrown. This can happen when there are two concurrent requests that try to update the same aggregate. Evidently that protection did not hold, and I struggle to understand why, or how to guard against it.

@dilab (Contributor) commented Jan 19, 2021

@UncertaintyP
Basically we have to guard against this scenario and prevent it from happening. Things like eventual consistency and sagas come into play.

@UncertaintyP (Author)

  1. The docs already say that this is protected by the package, as stated above (which is apparently not the case).
  2. This is an atomic transaction - only a single event has been recorded - nothing you can order or manage. I fail to see how this is related to eventual consistency or sagas.
  3. What do you suggest in this case specifically - a record of a single event? Again, if this should be enforced in some way by the user (locking tables, enforcing constraints, ...), I feel the docs should mention this and maybe even provide a solution, as this is the most basic usage of an aggregate.

@dilab (Contributor) commented Jan 26, 2021

@freekmurze is it possible to add a locking mechanism to this package to prevent concurrent requests?

@booni3 commented Feb 2, 2021

I also had this same issue. My fix, as I was up against it at the time, was to manually delete the event in the stored events table, which allowed my system to unlock. My duplicate event appeared within the events table only, so the aggregate root was out of sync with my projections (although this time it was the aggregate root at fault).

I do have protected static bool $allowConcurrency = true enabled at the moment, although this was a single duplicated event, not two events firing at the same time, so I do not think this is relevant.

[screenshot: stored_events rows showing 4 lines where there should only be 2]

Where you see 4 lines, there should only be 2. Only 2 events fired as reflected in my projections.

@dilab (Contributor) commented Feb 9, 2021

@UncertaintyP are you using a transaction somewhere in this particular request?

@UncertaintyP (Author) commented Feb 9, 2021

@dilab No. This is simply recording the event, and the projector uses Laravel's User::update function. Also no queues. I will probably go with a unique constraint on both columns because I don't need concurrency in this project.

@booni3 commented Mar 6, 2021

@UncertaintyP @dilab This is the same in my case too. I am seeing concurrent requests generated via a standard button click through to a controller. I do have queues running too, but have isolated certain cases to actions where no queues are running. I do not really understand how a concurrent request can happen in these cases. I am not using transactions either.

I am wondering whether adding an atomic lock prior to calling any methods on the aggregate root will fix this, or whether something is going on past this first call, i.e.

    Cache::lock('MyAction')->get(function () {
        InventoryAggregateRoot::retrieve($this->uuid)->allocateOrderItem($this->item)->persist();
    });

I am trying this next.

@booni3 commented Mar 6, 2021

I have found another instance where the aggregate has 2 events under the same aggregate_version, but they were created 20 mins apart. So these are certainly not the same process and an atomic lock will not help.

What could cause the same aggregate version to be used across this length of time?

Same aggregate_uuid, same aggregate_version, created at 21:05 & 21:26 respectively

[screenshot: two events with the same aggregate_uuid and aggregate_version, created 21 minutes apart]

@erikgaal (Contributor)

@booni3 are you using (non-sync) queues?

@booni3 commented Mar 13, 2021

@erikgaal I have seen the issue in 2 scenarios:

Same aggregate version, same timestamp:
In these cases, one event had the user saved in the metadata and the other event did not. This is being run via a Livewire button without queues, but there may be something going on with Livewire that is causing a double call of the method. To remove this as an option I have added atomic locks to the action class that calls the AR method.

Same aggregate version, different timestamps:
This one was likely run within an async job; however, it runs on a queue that only allows a single process at a time (config below), so I am unsure how it could have created a duplicate event, especially this far apart.

    'supervisor-allocation' => [
        'connection' => 'redis-long-running',
        'queue' => ['allocation-high', 'allocation'],
        'balance' => 'false', // process queues in order
        'processes' => 1,
        'tries' => 1,
        'timeout' => 300, // Timeout after 5 minutes
    ]

@dilab (Contributor) commented Apr 9, 2021

@booni3 what is the size of the aggregate? I mean, how many events are recorded for that particular UUID?

@dilab (Contributor) commented Apr 11, 2021

@freekmurze I did some research over the weekend. This package is unable to deal with the race condition: two events with the same aggregate ID and the same version can be saved at nearly the same time. This can be simulated using Apache Bench, for example ab -n 20 -c 10.

One way to solve this is to use a composite primary key consisting of the aggregate ID and the version number; however, this package does not save a unique version number per aggregate.

So my suggestion is to do:

  • modify the package to save a unique version number per aggregate when persisting, at:

[screenshot: the persist code in AggregateRoot where the version number is written]

  • then we can make use of the database composite primary key to solve the concurrency issue.

What do you think? If that sounds okay, I will send a PR.
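The effect of such a composite key can be demonstrated outside the package with plain PDO and an in-memory SQLite database. This is a standalone illustration, not package code; the table and column names merely mirror the stored_events table:

```php
<?php
// Illustration: a composite unique index turns the duplicate-version
// race into a catchable constraint violation instead of a silent
// duplicate row.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE stored_events (
    aggregate_uuid TEXT,
    aggregate_version INTEGER,
    event_class TEXT,
    UNIQUE (aggregate_uuid, aggregate_version)
)');

$insert = $pdo->prepare(
    'INSERT INTO stored_events (aggregate_uuid, aggregate_version, event_class) VALUES (?, ?, ?)'
);
$insert->execute(['8be7ace5', 2, 'UserUpdated']); // first writer wins

try {
    $insert->execute(['8be7ace5', 2, 'UserUpdated']); // concurrent duplicate
} catch (PDOException $e) {
    // The second writer fails loudly; the event stream stays consistent.
    echo "duplicate rejected\n";
}
```

The same principle applies on MySQL: the losing request gets a query exception that the application can surface or retry.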

@brendt (Collaborator) commented Apr 16, 2021

Hi all, sorry for the late reply. I was reading through all your comments, and I agree a composite key would be the best solution: let the database deal with the race conditions in order to prevent invalid state. This means the developer still has to handle those cases if they occur.

@dilab thanks for the thorough investigation, yes feel free to send a PR!

@brendt (Collaborator) commented Apr 20, 2021

FYI I'm working on a PR myself.

@brendt (Collaborator) commented Apr 20, 2021

Status update: unfortunately, adding a unique constraint breaks aggregate roots with $allowConcurrency enabled. I wasn't working on this package when $allowConcurrency was added, but it does exactly the opposite of what we want to achieve, and I believe it's a feature that should be removed in the next major update.

There are two options in dealing with this breaking change:

  • Only fix this issue in the v5 branch, but it'll take another month or two before that is released.
  • Deprecate $allowConcurrency and warn in the changelog: if you want to prevent race conditions, you'll have to manually run a migration to add the unique constraint. If you do so, all ARs with $allowConcurrency have the potential to break.

I prefer option 2, but would like some input.

@riasvdv (Member) commented Apr 20, 2021

I think the deprecation and optional migration is a good solution for people who need the fix right away.

We can then remove the concurrency functionality in the v5 branch.

@brendt (Collaborator) commented Apr 20, 2021

PR is here: #212

@rovansteen

@brendt awesome, thanks!

Any plans to add a retry mechanism to this package? Before we switched to this package we had a custom event sourcing implementation that caught the unique constraint violation and retried X times before giving up, allowing you to still have some concurrency but without giving up the integrity of the aggregate.
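The retry idea described above can be sketched in plain PHP. retryOnConflict is a name invented for this illustration, and RuntimeException stands in for whatever constraint-violation exception the persistence layer throws; in a real handler the aggregate would be re-retrieved before each new attempt:

```php
<?php
// Sketch: retry an operation a bounded number of times when it fails
// with a conflict, rethrowing once the attempts are exhausted.
function retryOnConflict(callable $operation, int $attempts = 3)
{
    for ($try = 1; $try <= $attempts; $try++) {
        try {
            return $operation();
        } catch (RuntimeException $e) {
            if ($try === $attempts) {
                throw $e; // give up after the last attempt
            }
            // In a real handler, re-retrieve the aggregate here so the
            // next attempt starts from the latest persisted version.
        }
    }
}
```

The important detail is the refresh between attempts: without re-retrieving, every retry would replay against the same stale version and fail again.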

@brendt (Collaborator) commented Apr 20, 2021

@rovansteen We could add a function that retries X times, but it sounds to me like that would only solve part of the problem.

My preference would be a mechanism that ensures only one persist can happen at a time for specific operations that you know might run concurrently, and that automatically refreshes the aggregate root as well. We already have an excellent queuing system built into Laravel, so I'd prefer to look in that direction first.

@brendt (Collaborator) commented Apr 20, 2021

As a sidenote: we were already planning on adding a command bus in v5, which would be one way of solving the issue: only allow one concurrent command at a time, running in a job.
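The serialization idea behind such a command bus can be sketched as a toy, single-process dispatcher. PerAggregateBus is a name invented here; a real implementation would sit on top of Laravel's queue rather than an in-memory array:

```php
<?php
// Toy sketch: commands for one aggregate uuid run strictly one after
// another, so two commands can never interleave their retrieve/persist
// steps for the same aggregate.
final class PerAggregateBus
{
    /** @var array<string, array<int, callable>> pending commands per uuid */
    private array $pending = [];

    /** @var array<string, bool> uuids with a command currently executing */
    private array $busy = [];

    public function dispatch(string $uuid, callable $command): void
    {
        $this->pending[$uuid][] = $command;

        if ($this->busy[$uuid] ?? false) {
            return; // queued; the command already running will drain it
        }

        $this->busy[$uuid] = true;
        while ($next = array_shift($this->pending[$uuid])) {
            $next(); // may dispatch follow-up commands for the same uuid
        }
        $this->busy[$uuid] = false;
    }
}
```

A follow-up command dispatched from inside a running command is queued and only executes once the current one finishes, which is the guarantee a job-based command bus would give across processes.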

@dilab (Contributor) commented Apr 21, 2021

@brendt great, any timeline to release this?

@dilab (Contributor) commented Apr 21, 2021

@rovansteen we are using the laravel built-in queue retry mechanism to handle the case you have mentioned.

@brendt closed this as completed Apr 21, 2021
@brendt (Collaborator) commented Apr 21, 2021

Here's the followup discussion about better handling concurrency: #214

7 participants