Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.InvalidOperationException: Can't replace active reader. #1154

Closed
adamhathcock opened this issue Mar 25, 2022 · 32 comments
Closed

System.InvalidOperationException: Can't replace active reader. #1154

adamhathcock opened this issue Mar 25, 2022 · 32 comments

Comments

@adamhathcock
Copy link

Software versions
MySqlConnector version: 2.1.8
Server type (MySQL, MariaDB, Aurora, etc.) and version: Aurora (MySQL 5.7)
.NET version: .NET 6
(Optional) ORM NuGet packages and versions: Pomelo 6.0.1

Describe the bug
After some usage my application, I start getting pseudo-random exceptions. It doesn't happen on each call but it happens around 50% of web calls after some time. Restarting the application seems to reset things but they again start to go wrong after a while.

This only happened after the latest update. I changed from version 2.1.6 to 2.1.8. I updated from .NET and EFCore 6.0.2 to 6.0.3. Pomelo remained the same.

Admittingly there are other code changes but nothing big nor that should do this as I don't use Readers directly. Only with EFCore. I don't see connection exhaustion or anything looking at metrics in AWS.

I'm not sure if I'm doing something wrong or where to look for to how to solve this problem. Can't find anything similar. I'm going to downgrade to 2.1.6 to see if the issue reoccurs. I'm hoping to get some pointers about how to figure out what's going on.

Exception

System.InvalidOperationException: Can't replace active reader.
   at MySqlConnector.MySqlConnection.SetActiveReader(MySqlDataReader dataReader) in /_/src/MySqlConnector/MySqlConnection.cs:line 874
   at MySqlConnector.MySqlDataReader.CreateAsync(CommandListPosition commandListPosition, ICommandPayloadCreator payloadCreator, IDictionary`2 cachedProcedures, IMySqlCommand command, CommandBehavior behavior, Activity activity, IOBehavior ioBehavior, CancellationToken cancellationToken) in /_/src/MySqlConnector/MySqlDataReader.cs:line 460
   at MySqlConnector.Core.CommandExecutor.ExecuteReaderAsync(IReadOnlyList`1 commands, ICommandPayloadCreator payloadCreator, CommandBehavior behavior, Activity activity, IOBehavior ioBehavior, CancellationToken cancellationToken) in /_/src/MySqlConnector/Core/CommandExecutor.cs:line 56
   at MySqlConnector.MySqlCommand.ExecuteReaderAsync(CommandBehavior behavior, IOBehavior ioBehavior, CancellationToken cancellationToken) in /_/src/MySqlConnector/MySqlCommand.cs:line 344
   at MySqlConnector.MySqlCommand.ExecuteDbDataReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken) in /_/src/MySqlConnector/MySqlCommand.cs:line 337
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(AsyncEnumerator enumerator, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.<>c__DisplayClass33_0`2.<<ExecuteAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementationAsync[TState,TResult](Func`4 operation, Func`4 verifySucceeded, TState state, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementationAsync[TState,TResult](Func`4 operation, Func`4 verifySucceeded, TState state, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.MoveNextAsync()
   at Microsoft.EntityFrameworkCore.Query.ShapedQueryCompilingExpressionVisitor.SingleOrDefaultAsync[TSource](IAsyncEnumerable`1 asyncEnumerable, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Query.ShapedQueryCompilingExpressionVisitor.SingleOrDefaultAsync[TSource](IAsyncEnumerable`1 asyncEnumerable, CancellationToken cancellationToken)
@bgrainger
Copy link
Member

I don't think I've seen this error reported before.

It might be useful to test EF Core 6.0.3 with MySqlConnector 2.1.6 and EF Core 6.0.2 with MySqlConnector 2.1.8 to see if changing just one of those libraries has introduced the problem.

@adamhathcock
Copy link
Author

Seemingly, reverting to 2.1.6 stops the problem.

It only happens in one of the services after some period of use. It's like I'm leaking something is one specific piece of code but I don't know where to begin as I don't believe I'm doing anything different in one area versus another.

I've had several hours with no errors now when it only took 1-2 at most (this service is relatively low usage).

@adamhathcock
Copy link
Author

Actually, got the problem overnight it seems. I'll try downgrading EFCore soon.

Definitely a weird problem. Just wish I knew what i was doing 😬

@adamhathcock
Copy link
Author

Actually, I now have a theory.

I discovered some bad code that created a runaway query. The command timed out which caused on exception at some point. However, the reader used on the query is somehow still active. Later the pool tries to use it causing this exception.

Not sure if that's reality but I definitely found my problem query happening. Then a series of sequent errors above.

I've fixed the code and kept my downgrades. We'll see if errors reoccur. If not I'll upgrade again and wait more.

@adamhathcock
Copy link
Author

I've now re-upgraded to everything and it's fine. Definitely my own bug but the error was so unfamiliar that I wanted to report it.

Happy to close this unless you want to do something

@bgrainger
Copy link
Member

Hmm, MySqlConnector should generate a better exception if a connection/reader is being reused (rather than returning an in-use object to the pool).

But unless you have a consistent repro for this problem I don't know if I can solve it based solely on the description, so you can close this case.

@adamhathcock
Copy link
Author

It's going to be hard to reproduce as my guess it's like:

  1. execute long query on connection
  2. command times out (magic here is that the connection is still in use)
  3. pool finds the same reader that thinks its rightly or wrongly in use
  4. boom exception

happy to close....maybe someone else will google in the future and find this helpful

@jcfore21
Copy link

jcfore21 commented Aug 19, 2022

@bgrainger Can we reopen this issue? We recently upgraded to .net 6 and MySqlConnector 2.1.11 and I'm seeing the exact same error. Its sporadic just like the original issue mentions. I'm unable to determine what is causing this and its causing us data loss in our application.

Connection timeouts shouldn't cause this to happen - the connection pool shouldn't reuse a connection that is already active. There seems to be something fundamentally wrong here.

@bgrainger
Copy link
Member

@jcfore21 What was the previous version of MySqlConnector you upgraded from? Were there any application changes prior to this exception happening, or just a PackageReference update? Is there any more information you can provide that might help reproduce this?

@bgrainger bgrainger reopened this Aug 19, 2022
@jcfore21
Copy link

@bgrainger

We upgraded from .net core 3.1 to .net 6 and upgraded our packages to support that:

MySqlConnector 2.1.9 -> 2.1.11
Microsoft.EntityFrameworkCore 5.0.10 -> 6.0.7
Microsoft.EntityFrameworkCore.Design 5.0.10 -> 6.0.7
Microsoft.EntityFrameworkCore.Tools 5.0.6 -> 6.0.7
Pomelo.EntityFrameworkCore.MySql 5.0.4 -> 6.0.2
Pomelo.EntityFrameworkCore.MySql.Json.Newtonsoft 5.0.4 -> 6.0.2

These were the relevant packages that we updated.

Also, we are using Aurora - Mysql 5.7.

@jcfore21
Copy link

On our connection string we also have a ConnectionLifeTime=1800 and MaximumPoolSize = 250

This has been this way for quite some time before our upgrade - but mentioning it just in case that has something to do with it.

Also we are using AddDbContextPool with the following UseMysql (in our Startup.cs)

 options.UseMySql(conn, ServerVersion.Parse("5.7.32", ServerType.MySql), options =>
  {
      options.UseNewtonsoftJson(MySqlCommonJsonChangeTrackingOptions.FullHierarchyOptimizedFast)
          .UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery)
          .EnableIndexOptimizedBooleanColumns(true);
  })
  .AddInterceptors(new List<IInterceptor>() {new RemoveLastOrderByInterceptor()});

@jcfore21
Copy link

@bgrainger We have restarted our service and the issue has subsided but as @adamhathcock had experienced, im sure it will come back in a few hours.

I'm wondering if the issue is really a MySqlConnector issue or a change in EF Core. While I understand the DBContext Pooling is different from the MySqlConnector connection pooling, could there be something related there?

I noticed that the EF Core 6 documentation says that the DBContext pool size has been defaulted to 1024 - while it use to be 128:
https://docs.microsoft.com/en-us/ef/core/performance/advanced-performance-topics?tabs=with-di%2Cwith-constant#dbcontext-pooling

image

Not sure how that would cause connections to still be active though and back in the pool..

@jcfore21
Copy link

@bgrainger Yep issue persists. It came back in exactly one hour on the dot.

@jcfore21
Copy link

We are trying to set the ConnectionLifeTime back to its default of 0.

@jcfore21
Copy link

@bgrainger We are still seeing the same issues even after changing the ConnectionLifeTime back to 0.

I'm really at a loss on what to try now. The behavior I'm seeing doesn't make sense. We have several kubernetes pods and only 2 seem to be getting the error.

@bgrainger
Copy link
Member

I'll have to dig in and try to see if I can figure out what state would cause this on the MySqlConnector side.

@jcfore21
Copy link

jcfore21 commented Aug 24, 2022

@bgrainger after some deep digging into our logs, I found some very strange things. When the issue starts it will throw an exception for every table in the database with a "Failed executing DbCommand (0ms).." and a Select all as the query for each table.

Then it will log one last error message:

Microsoft.EntityFrameworkCore.DbUpdateException: An error occurred while saving the entity changes. See the inner exception for details.
---> MySqlConnector.MySqlException (0x80004005): Duplicate entry '3d51076c-e7b0-4548-8ba8-5e2e3ade22da' for key 'UC_JobId'
at MySqlConnector.Core.ResultSet.ReadResultSetHeaderAsync(IOBehavior ioBehavior) in /_/src/MySqlConnector/Core/ResultSet.cs:line 43
at MySqlConnector.MySqlDataReader.ActivateResultSet(CancellationToken cancellationToken) in /_/src/MySqlConnector/MySqlDataReader.cs:line 132
at MySqlConnector.MySqlDataReader.NextResultAsync(IOBehavior ioBehavior, CancellationToken cancellationToken) in /_/src/MySqlConnector/MySqlDataReader.cs:line 89
at MySqlConnector.MySqlDataReader.DisposeAsync(IOBehavior ioBehavior, CancellationToken cancellationToken) in /_/src/MySqlConnector/MySqlDataReader.cs:line 609
at MySqlConnector.MySqlDataReader.Close() in /_/src/MySqlConnector/MySqlDataReader.cs:line 341
at System.Data.Common.DbDataReader.CloseAsync()
--- End of stack trace from previous location ---
at Microsoft.EntityFrameworkCore.Storage.RelationalDataReader.DisposeAsync()
at Microsoft.EntityFrameworkCore.Storage.RelationalDataReader.DisposeAsync()
at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.ExecuteAsync(IRelationalConnection connection, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.ExecuteAsync(IRelationalConnection connection, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(IEnumerable`1 commandBatches, IRelationalConnection connection, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(IEnumerable`1 commandBatches, IRelationalConnection connection, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(IEnumerable`1 commandBatches, IRelationalConnection connection, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(IList`1 entriesToSave, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(StateManager stateManager, Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)
at Pomelo.EntityFrameworkCore.MySql.Storage.Internal.MySqlExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.DbContext.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.DbContext.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken) ...

It seems here that on SaveChangesAsync we are failing to Dispose the reader? How could that be happening because of a Duplicate entry error?

@bgrainger
Copy link
Member

MySqlConnector is throwing an exception within Dispose (specifically from MySqlDataReader.NextResultAsync), which is generally a Bad Thing to do. (If the call stack is already being unwound for another exception, throwing a new one can mask the original problem.)

That's probably worth fixing on its own, but I'm not sure how much it impacts the original issue.

Just to help gather some more information, can you post a full call stack from the "Can't replace active reader." exception?

@bgrainger
Copy link
Member

It may be an indicator that application (or EF) code is "reading" a MySqlDataReader by disposing it, instead of correctly consuming it by calling NextResultAsync with try/catch blocks around that code as necessary.

@bgrainger
Copy link
Member

bgrainger commented Aug 25, 2022

We upgraded from .net core 3.1 to .net 6 and upgraded our packages to support that:
MySqlConnector 2.1.9 -> 2.1.11
Microsoft.EntityFrameworkCore 5.0.10 -> 6.0.7

At this point I'm leaning towards the theory that some change in EF.Core 6.0 or Pomelo 6.0 is triggering this new (undesirable) behaviour, but I still don't know what it would have been. The OP said the change was just EF.Core 6.0.2 to 6.0.3 so it's possible that looking at that diff would provide some clues?

@jcfore21 If released a 2.1.13-beta.1 (or rc.1) version that just changed MySqlDataReader.DisposeAsync to swallow and log exceptions, would you be able to run that in production and see if the errors change? It would also be valuable to see if the new logging is triggered and what it logs.

@jcfore21
Copy link

jcfore21 commented Aug 25, 2022

@bgrainger The code mainly seems to start erroring in a BackgroundService that runs along side our APIs. It contains a foreach loop all within a scoped DbContext:

  await using var scope = _scopeFactory.CreateAsyncScope();
  var dbContext = scope.ServiceProvider.GetRequiredService<TDbContext>();
  foreach (SqsMessage sqsMessage in sqsMessages)
  {
        try 
        {

               // for brevity here is where we are  calling into our code that eventually calls SaveChangesAsync
               //   that fails trying to Dispose

          }
          catch (Exception ex)
          {
              _singleRunError = ex;
              _logger.LogError(ex, "Error processing event: {eventName}, {messageId}, {receiveCount}",
                  messageType,
                  messageId,
                  sqsMessage.Attributes.ContainsKey(ApproximateReceiveCount)
                      ? sqsMessage.Attributes[ApproximateReceiveCount]
                      : "Undefined");
          }
  }

I think there might be an issue here because it catches the exception but the next iteration would still have the scope that failed to dispose correctly? It seems to always happen in a SaveChangesAsync that is failing. My thought was to change our code and move the CreateAsyncScope and GetRequiredService() calls into each iteration of the foreach. Just thinking maybe that would help it Dispose of things correctly.

Also we tried reverting back to what we had originally (.net core 3.1), and we got the errors again BUT they seem to only happen in short spurts and self heal. Meaning Id see them happen less frequently and then go away for quite some time. So this issue may have been there the whole time, we just recovered from it and it didn't seem to affect anything until now.

We plan to release the code change tomorrow and see if that helps. We can also try out the 2.1.13-beta.1 as you mentioned - that would be great.

@bgrainger
Copy link
Member

I don't know if it's save to reuse dbContext after an exception has been thrown. Perhaps changing that might help?

2.1.13-beta.1 is now available: https://www.nuget.org/packages/MySqlConnector/2.1.13-beta.1. As mentioned, very interested to see what (if any) warnings are logged.

@jcfore21
Copy link

jcfore21 commented Aug 25, 2022

@bgrainger Well we deployed my change and the issue resurfaced after an hour. Here is the initial stack trace from the Can't replace active reader exception:

at MySqlConnector.MySqlConnection.SetActiveReader(MySqlDataReader dataReader) in /_/src/MySqlConnector/MySqlConnection.cs:line 874
at MySqlConnector.MySqlDataReader.CreateAsync(CommandListPosition commandListPosition, ICommandPayloadCreator payloadCreator, IDictionary`2 cachedProcedures, IMySqlCommand command, CommandBehavior behavior, Activity activity, IOBehavior ioBehavior, CancellationToken cancellationToken) in /_/src/MySqlConnector/MySqlDataReader.cs:line 460
at MySqlConnector.Core.CommandExecutor.ExecuteReaderAsync(IReadOnlyList`1 commands, ICommandPayloadCreator payloadCreator, CommandBehavior behavior, Activity activity, IOBehavior ioBehavior, CancellationToken cancellationToken) in /_/src/MySqlConnector/Core/CommandExecutor.cs:line 56
at MySqlConnector.MySqlCommand.ExecuteReaderAsync(CommandBehavior behavior, IOBehavior ioBehavior, CancellationToken cancellationToken) in /_/src/MySqlConnector/MySqlCommand.cs:line 345
at MySqlConnector.MySqlCommand.ExecuteDbDataReader(CommandBehavior behavior) in /_/src/MySqlConnector/MySqlCommand.cs:line 278
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReader(RelationalCommandParameterObject parameterObject)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.InitializeReader(Enumerator enumerator)at Pomelo.EntityFrameworkCore.MySql.Storage.Internal.MySqlExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.MoveNext()

My plan is to get the 2.1.13-beta.1 out and try that next.

@jcfore21
Copy link

I've updated our services with 2.1.13-beta.1 and waiting deploy to our production environment. I'll let you know how that goes.

@jcfore21
Copy link

@bgrainger After deploying that fix, we have gone over an hour without issue. We even had the exception occur which normally seemed to be the start of the "Can't replace active reader" issues. However, it didn't appear to start those messages anymore. I wasn't able to see any warnings logged either.

I think the issue is may be resolved but going to keep monitoring.

@jcfore21
Copy link

@bgrainger This has gone all night without issue. Sounds resolved to me. Will the plan be to move this into an official version?

@bgrainger
Copy link
Member

Thanks for testing! Were any warnings logged? I would expect them to be, if the modified code was being executed.

Yes, I can ship 2.1.13 with this fix.

@jcfore21
Copy link

We may not have the logging turned down enough in prod :( I was going to take a deeper look into that. But I believe since you are eating the exception it now gets further on and cleans up the data reader in the FinishQuerying(). So thats why its working fine now? That's what I understood from your commit :-)

@bgrainger
Copy link
Member

Fixed in 2.1.13 by f26ae9b.

@Kation

This comment was marked as off-topic.

@Kation

This comment was marked as off-topic.

@bgrainger
Copy link
Member

@Kation Please open a new issue instead of adding comments to a closed one. Let's move the discussion here: #1469.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants