Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrency errors #101

Closed
kresa3333 opened this issue Jul 5, 2020 · 7 comments
Closed

Concurrency errors #101

kresa3333 opened this issue Jul 5, 2020 · 7 comments

Comments

@kresa3333
Copy link

kresa3333 commented Jul 5, 2020

Hello,

Recently I started working with multiple concurrent requests and a part of their flow is to access the CQRSlite logic and save the data to my mongoDB read and write(event-store) databases.

I am using .net core with its native IoC as follow -


            //Add Cqrs services
            services.AddSingleton<Router>(new Router());
            services.AddSingleton<ICommandSender>(y => y.GetService<Router>());
            services.AddSingleton<IEventPublisher>(y => y.GetService<Router>());
            services.AddSingleton<IHandlerRegistrar>(y => y.GetService<Router>());
            services.AddSingleton<IEventStore, DocsEventStore>();

            services.AddSingleton<ISnapshotStore, SnapshotsEventStore>();
            services.AddSingleton<ISnapshotStrategy, SnapshotingStrategy>();
            services.AddSingleton<IEventStoreSnapshotsRepository, MongoDbEventStoreSnapshotsRepository>();

            services.AddSingleton<ICache, MemoryCache>();
            services.AddTransient<IRepository>(y => new CacheRepository(new SnapshotRepository(y.GetService<ISnapshotStore>(), y.GetService<ISnapshotStrategy>(),
                new Repository(y.GetService<IEventStore>()), y.GetService<IEventStore>()),
                y.GetService<IEventStore>(), y.GetService<ICache>()));           
            services.AddTransient<ISession, Session>();

            //Scan for commandhandlers and eventhandlers
            services.Scan(scan => scan
                .FromAssemblies(typeof(OrderCommandHandler).GetTypeInfo().Assembly)
                .AddClasses(classes => classes.Where(x =>
                {
                    var allInterfaces = x.GetInterfaces();
                    return
                        allInterfaces.Any(y => y.GetTypeInfo().IsGenericType && y.GetTypeInfo().GetGenericTypeDefinition() == typeof(IHandler<>)) ||
                        allInterfaces.Any(y => y.GetTypeInfo().IsGenericType && y.GetTypeInfo().GetGenericTypeDefinition() == typeof(ICancellableHandler<>));
                }))
                .AsSelf()
                .WithTransientLifetime()
            );

I was wondering if there is something wrong with my Ioc setup?
Or maybe there is something special which needs to be done to support concurrency traffic?
I am asking because I have been getting the following errors :

frequent error:

System.AggregateException: One or more errors occurred. (Collection was modified; enumeration operation may not execute.) ---> System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
   at Microsoft.Extensions.Caching.Memory.CacheEntryExtensions.SetOptions(ICacheEntry entry, MemoryCacheEntryOptions options)
   at Microsoft.Extensions.Caching.Memory.CacheExtensions.Set[TItem](IMemoryCache cache, Object key, TItem value, MemoryCacheEntryOptions options)
   at CQRSlite.Caching.MemoryCache.Set(Guid id, AggregateRoot aggregate)
   at CQRSlite.Caching.CacheRepository.Save[T](T aggregate, Nullable`1 expectedVersion, CancellationToken cancellationToken)
   at CQRSlite.Caching.CacheRepository.Save[T](T aggregate, Nullable`1 expectedVersion, CancellationToken cancellationToken)
   at CQRSlite.Domain.Session.Commit(CancellationToken cancellationToken)

Less frequent error:

System.AggregateException: One or more errors occurred. (A different version than expected was found in aggregate cc2ac022-e04b-4398-a0b7-076e37caceec) ---> CQRSlite.Domain.Exception.ConcurrencyException: A different version than expected was found in aggregate cc2ac022-e04b-4398-a0b7-076e37caceec\r\n at CQRSlite.Domain.Session.Get[T](Guid id, Nullable1 expectedVersion, CancellationToken cancellationToken)\r\n `

@gautema
Copy link
Owner

gautema commented Jul 5, 2020

Hi.

I'm looking into this, but can you help me with providing a bit more info about your system? Do you have very high load? The less frequent error should come from time to time as it's part of the optimistic concurrency check. You are using expected version from the ui? If so, this is something that need to be planned for and either given a message to the user or retrying or handling in some other way.

The frequent error is worse, and I have to look into why that happens. What version of CQRSlite are you using and what .NET Core version?

@kresa3333
Copy link
Author

Hello and thank you for the response,

We are running on Windows Servers with mostly IIS sites.  
We are using .net core 2.2 and Cqrslite 0.27.
This is a pretty old version and we are planning to update it in the next days.

In terms of load, the majority of the load is from our B2B channel. We receive files and at some point Http request is being created and sent to the Cqrs logic - per file.
The thing is that at the moment it is a very controlled environment and what I mean by that is that even if we got tens of thousands of files waiting to be processed we are doing it in small batches of about 4+ files each time.
So the Cqrs part handles on average about ~4 concurrent requests.

A bit of context -
It worked well until we started using concurrent requests. Now while it still works we are getting those 2 error types,
The first error causes for many request failures but the second error sometimes actually also creates a gap in the DB (It is possible that this specific issue is related to our implementation around CQRSlite).

A bit more details about the issue -
Sometimes we get the second error which results in a failure of the request and more rarely it also results in a situation where the read model document stays on version X while the write model (event-Store) is on version X+1 (1 new event object is added to it)

@gautema
Copy link
Owner

gautema commented Jul 5, 2020

Even if it's a semi old version, there's not that many changes since then. I think upgrading cache version is the only one that can make a difference. I would try to use the the framework without the cacherepository to see if it works or make your own cache using something other than memorycache.

If theres something added to the readmodel but the write fails, there might be an error to your implementation of eventstore since the writemodel shouldn't update without sending a message to update the readmodel at some time.

I'm sorry I can't be of more help, but I'll try to see a bit more if I can find out whats wrong, but it's hard without being able to debug or fully understand the system where it's used.

@sebslsch
Copy link
Contributor

I encountered this error as well, using CQRSlite 0.29 on dotnet core 3.1.301:

System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
   at Microsoft.Extensions.Caching.Memory.CacheEntryExtensions.SetOptions(ICacheEntry entry, MemoryCacheEntryOptions options)
   at Microsoft.Extensions.Caching.Memory.CacheExtensions.Set[TItem](IMemoryCache cache, Object key, TItem value, MemoryCacheEntryOptions options)
   at CQRSlite.Caching.MemoryCache.Set(Guid id, AggregateRoot aggregate)
   at CQRSlite.Caching.CacheRepository.Save[T](T aggregate, Nullable`1 expectedVersion, CancellationToken cancellationToken)
   at CQRSlite.Caching.CacheRepository.Save[T](T aggregate, Nullable`1 expectedVersion, CancellationToken cancellationToken)
   at CQRSlite.Domain.Session.Commit(CancellationToken cancellationToken)

I've written a test that seems to reproduce the exception reliably, at least in my environment.
It occurs when an eviction callback is registered with the memory cache, while at the same time a cache entry is set.
Since the callback is registered in the constructor of CQRSlite.Caching.CacheRepository, this is more likely to happen when many CacheRepositories are created.

Taking a closer look at the code of Microsoft.Extensions.Caching.Abstractions, setting the entry in the MemoryCache will result in the entryOptions being copied to the new instance of CacheEntry, which is not done in a thread safe manner.
https://github.com/dotnet/runtime/blob/687177bba23b00df733bbfec8668f83aa6add20e/src/libraries/Microsoft.Extensions.Caching.Abstractions/src/CacheEntryExtensions.cs#L187

Using a lock around all usages of _cacheOptions in CQRSlite.Caching.MemoryCache appears to resolve the problem.
However, I also noticed that the number of registered callbacks on the shared instance of _cacheOptions grows over time, since there is effectively no way to unregister them.
It seems to me that instances of MemoryCacheEntryOptions are not really meant to be shared.

From what I've seen, in the NET452 implementation of CQRSlite.Caching.MemoryCache a newly registered eviction callback will overwrite the previous one, so only one eviction callback can be registered at the same time. I have modified the .NET Core implementation to work the same way.

@gautema Would you like me to open a PR with my changes, or do you think there might be a better solution?

@gautema
Copy link
Owner

gautema commented Jul 21, 2020 via email

@sebslsch
Copy link
Contributor

Glad to help. Have a nice vacation, and let me know what you think when you are back.

@gautema
Copy link
Owner

gautema commented Jul 29, 2020

Closing this since its patched and done. Thanks for the help!

@gautema gautema closed this as completed Jul 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants