Unexpected System.AccessViolationException when using Microsoft.Data.Sqlite #430
Comments
Hmm... using the span overload of |
I'll try to dig into this tomorrow. A couple months ago, based on discussion from somebody dealing with a crash, I came to suspect, but never confirmed, that there may be cases where I construct a Span that might outlive the pointer it came from. |
... but this issue does not appear to be what I was thinking about when I wrote that. |
The apparent code involved from
in cases where the underlying C function didn't consume the entire string, the tail returns a Span made from a slice of the Span that was given:
and all that happens inside the fixed block, so the pointers should still be valid, and no pointers escape out to the caller. The call from Microsoft.Data.Sqlite appears to be in
and the call itself:
which I'm guessing is safe. I mean, tail is going to end up as a slice of the span returned by sql.AsSpan(), which itself is not held in a local, but that should be okay? And the sql.AsSpan() does start from So at the moment I don't see anything obvious, which makes me suspect that either (1) I am missing something obvious, or (2) an off-by-one error somewhere, which is the next thing I'll look at. |
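The tail-as-slice idea discussed above can be sketched without any native code. This is only an illustration, not the SQLitePCLRaw implementation: `TailSliceSketch`, `ConsumedLength`, `PrepareFirst`, and `TailOf` are hypothetical names, and a scan for the first `;` stands in for the native parser reporting how much input it consumed. The point is that the tail is a view over the caller's own buffer, so no pointer has to outlive a `fixed` block.

```csharp
using System;
using System.Text;

public static class TailSliceSketch
{
    // Hypothetical stand-in for the native parser: pretend it consumes
    // everything up to and including the first ';' (or the whole input).
    public static int ConsumedLength(ReadOnlySpan<byte> sql)
    {
        int i = sql.IndexOf((byte)';');
        return i < 0 ? sql.Length : i + 1;
    }

    // The tail is a slice of the span the caller passed in, not a new span
    // built from a raw pointer, so its lifetime is tied to the caller's buffer.
    public static void PrepareFirst(ReadOnlySpan<byte> sql, out ReadOnlySpan<byte> tail)
    {
        int consumed = ConsumedLength(sql);
        tail = sql.Slice(consumed);
    }

    // Convenience wrapper so the result can be inspected as a string.
    public static string TailOf(string sql)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(sql);
        PrepareFirst(utf8, out ReadOnlySpan<byte> tail);
        return Encoding.UTF8.GetString(tail);
    }

    public static void Main()
    {
        // Prints the unconsumed remainder of a two-statement string.
        Console.WriteLine(TailOf("SELECT 1; SELECT 2;"));
    }
}
```

Because `tail` is derived from `sql` with `Slice`, it stays valid for as long as the caller's buffer does, which matches the reasoning in the comment above.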
I'm seeing this error a lot more consistently in my pooling branch (dotnet/efcore#25018). Like you, I don't see anything obviously wrong with the Span code... |
No real progress yet, but here are some findings:
The test code posted by @EYHN does reproduce the problem for me on Windows.
I don't see any significant off-by-one errors of the kind I mentioned in a previous comment. In general, the span tends to arrive with an extra byte, the zero terminator, which we don't really need, but that seems to cause no problems. I tried lying to the native code by passing Length - 1 (thus removing that extra byte) and it made no difference.
The test case string has a lot of whitespace before and after, but removing it seems to make no difference. Something in M.D.Sqlite is probably doing a Trim, since
Whether the function returns a tail span or not seems to make no difference. I hacked the code to always return ReadOnlySpan.Empty for the tail, and the crash still happens.
Also, all of the above was tested against a current build of the tree, not against 2.0.4. One major difference here is that the current tree uses DllImport for .NET Core instead of the dynamic provider. So that difference is yet another thing that makes no difference. |
The "other" overload (the one that uses utf8z instead of ReadOnlySpan) of Hacking the span overload to pass -1 (ignoring the length given in the span, since the caller seems to always be providing a zero terminator anyway) also seems to have no effect. |
…430 repro to use ugly instead of Microsoft.Data.Sqlite.
I ported the repro program to use SQLitePCLRaw.ugly instead of Microsoft.Data.Sqlite, and it no longer crashes. I think my version is equivalent. The code is in test_nupkgs/bug430. |
@ajcvickers is there anyone we can pull in from the runtime team to help investigate? |
@Pilchie Any ideas on who might be able to help here? |
@stephentoub? Any ideas? |
Here's the PR where the change went in: dotnet/efcore#24331 Are the |
@bricelam I did glare at those loops with a measure of suspicion, but I couldn't convince myself they were problematic. Still might be interesting to move the |
Oh wow, we're still seeing the exception after switching back to the string overload. I think Span was a red herring. |
Maybe a pinning issue in SQLitePCLRaw? |
Sorry, I missed yesterday that I was tagged. From the most recent comments it sounds like this isn't actually due to the cited PR but is actually longer standing and not related to the new span usage? |
@bricelam Hmmm. That's interesting. And it gives me a new angle of investigation. |
Thinking out loud: Okay, let's say Span was a red herring. It was still a good guess, because of the timing, and because of the recent change in Microsoft.Data.Sqlite to use the Span overload. If not Span, then did something else change? How long has this bug been around? And if the answer is "a very long time", why are we just finding it now? I wish we had a repro that does not involve Microsoft.Data.Sqlite. But due to the nature of the problem, it still seems likely that the bug, whatever it is, is somewhere in SQLitePCLRaw. |
How confident are we that the error seen WITH the Span overload is the same one seen after "switching back to the string overload"? |
Clarification: There are 6 overloads of Those 3 overloads are for Previously, Microsoft.Data.Sqlite was using the The string overload calls the Span overload. It does not call the utf8z overload. So AFAICT, despite something I said above, it has not been using the |
…Data.Sqlite counterpart. still can't make it crash.
…ses Microsoft.Data.Sqlite. bug430_ugly is my attempt to repro the same bug without Microsoft.Data.Sqlite.
The stack trace is exactly the same going into sqlite3_prepare_v2 where the exception is thrown. Both throw AccessViolationException. |
Since merging, the CI fail rate increased significantly from ericsink/SQLitePCL.raw#430 Fixes dotnet#16202, unresolves dotnet#13837
|
So yeah, those stack traces look the same. Both code paths end up in the Span overload anyway. FWIW, I wanted to completely rule out the suspected problems with the utf8z overload, so I have a local build where I completely removed it. Running that with Microsoft.Data.Sqlite repro, I get no MissingMethod exceptions or anything like that, so it doesn't seem to be getting accidentally used anywhere. Clearly, the |
@stephentoub Can we start blaming the JIT or GC? 😏 |
"Can we start blaming the JIT or GC?" LOL. More thinking out loud: I still can't get the crash to happen in my repro program without Microsoft.Data.Sqlite. I have made a number of changes to make that repro similar to the way Microsoft.Data.Sqlite is doing things, but no luck. Microsoft.Data.Sqlite is doing something different, and that something either is the bug or it is triggering the bug. But we don't know what that something is. We do know that the changes in your pooling branch make the bug happen more often. Stab in the dark: Is there any way threading could be an issue here? My bug430_ugly repro is definitely not creating any threads. Is the same thing true for the original repro program from the OP? |
Playing with the repro... Here's the most minimal I can make it with M.D.Sqlite. I'll try using SQLitePCLRaw directly now.
using Microsoft.Data.Sqlite;
using var connection = new SqliteConnection("Data Source=:memory:");
connection.Open();
while (true)
{
// NB: No using
var command = connection.CreateCommand();
command.CommandText = "SELECT 1";
command.ExecuteScalar();
} |
I confirm that your minimal M.D.Sqlite repro works (throws) for me as well. Wow. |
Could it be SQLITE_OPEN_NOMUTEX? We started passing that in 6.0 |
Mystery solved. Adding SQLITE_OPEN_NOMUTEX to 5.0 makes it throw. Does SQLitePCLRaw use threads? |
(1) Wait, I thought you had previously said the minimal repro did throw on 5.0? (2) Well, I don't think there are any threads anywhere. I'll double-check. (3) Adding SQLITE_OPEN_NOMUTEX to my non-M.D.Sqlite repro does not seem to make it throw. |
I thought 5.0 was throwing, but when I did a full rebuild, it started working. I suspect the concurrent access is happening on some GC thread--that's why we see it more without a But maybe it has nothing to do with concurrent access and there's just a lifetime bug somewhere inside SQLite itself when SQLITE_OPEN_NOMUTEX is specified. |
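The suspicion that work can happen "on some GC thread" can be checked in isolation. The sketch below is a standalone probe, not code from either library: `FinalizerProbe` and its members are hypothetical names. It records which managed thread runs a finalizer and compares that with the calling thread.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

public sealed class FinalizerProbe
{
    // Managed thread ID observed inside the finalizer; -1 until it has run.
    private static int s_finalizerThreadId = -1;

    ~FinalizerProbe()
    {
        Volatile.Write(ref s_finalizerThreadId, Environment.CurrentManagedThreadId);
    }

    // Allocate in a separate, non-inlined method so no local in the caller
    // keeps the object alive when the collection is forced.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void AllocateAndDrop()
    {
        _ = new FinalizerProbe();
    }

    // Returns true when the finalizer ran on a thread other than the caller's.
    public static bool FinalizerRanOnOtherThread()
    {
        AllocateAndDrop();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        int id = Volatile.Read(ref s_finalizerThreadId);
        return id != -1 && id != Environment.CurrentManagedThreadId;
    }

    public static void Main()
    {
        Console.WriteLine(FinalizerRanOnOtherThread());
    }
}
```

If a finalizer ends up calling into native SQLite while the main thread is also inside SQLite on the same connection, that is concurrent access even though the application never created a thread itself.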
Small/partial confirmation: I built a SQLitePCLRaw which strips out SQLITE_OPEN_NOMUTEX even if the caller specifies it. The minimal M.D.Sqlite repro no longer throws. I'm using 6.0.0-preview.6.21352.1 |
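The experiment described above amounts to masking one bit out of the open flags. A minimal sketch of that masking follows, assuming the flag values from `sqlite3.h`; the class `OpenFlagsSketch` and the method `StripNoMutex` are hypothetical and are not part of SQLitePCLRaw.

```csharp
using System;

public static class OpenFlagsSketch
{
    // Values copied from sqlite3.h.
    public const int SQLITE_OPEN_READWRITE = 0x00000002;
    public const int SQLITE_OPEN_CREATE    = 0x00000004;
    public const int SQLITE_OPEN_NOMUTEX   = 0x00008000;

    // Hypothetical helper: clear SQLITE_OPEN_NOMUTEX even if the caller asked
    // for it, leaving every other flag untouched.
    public static int StripNoMutex(int flags)
    {
        return flags & ~SQLITE_OPEN_NOMUTEX;
    }

    public static void Main()
    {
        int requested = SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_NOMUTEX;
        int effective = StripNoMutex(requested);
        Console.WriteLine($"requested=0x{requested:X} effective=0x{effective:X}");
    }
}
```

With the bit cleared, SQLite falls back to its configured default threading mode for the connection instead of skipping the connection mutex.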
"Does SQLitePCLRaw use threads?" No. I was pretty sure, as the idea that I would spin up a thread in SQLitePCLRaw is kind of unthinkable, but this kind of investigation is the time to be 100% sure. I looked everywhere I could think of, and I never create a thread, nor (AFAICT) do I call anything that would. |
So is removing SQLITE_OPEN_NOMUTEX a viable fix for you folks? |
Yep. It was really just added on a whim to see if we could get any perf gains from it. |
…. these overloads were not memory safe, as they returned a span made from a pointer obtained from a fixed block. technically, this is a breaking change, but it is likely that nothing was using those overloads. my own test suite was not, and Microsoft.Data.Sqlite was not. there was discussion of the problem with these overloads in #321. additional detective work on this happened while trying to figure out #430.
Thanks for all your help on this, @ericsink! |
…re_v2/v3. these overloads were not memory safe, as they returned a span made from a pointer obtained from a fixed block. technically, this is a breaking change, but it is likely that nothing was using those overloads. my own test suite was not, and Microsoft.Data.Sqlite was not. there was discussion of the problem with these overloads in #321. additional detective work on this happened while trying to figure out #430." This reverts commit 264be3a.
Hi, I just randomly stumbled across this issue. I think it hasn't been noted yet, but to me it seems clear why exceptions like Notice that the code creates a new
protected override void Dispose(bool disposing)
{
DisposePreparedStatements(disposing);
// ...
Note it calls
private void DisposePreparedStatements(bool disposing = true)
{
if (disposing
&& DataReader != null)
{
DataReader.Dispose();
DataReader = null;
}
if (_preparedStatements != null)
{
foreach (var stmt in _preparedStatements)
{
stmt.Dispose();
}
_preparedStatements.Clear();
}
_prepared = false;
}
We can see that in the second
Therefore, it can happen that the finalizer thread calls the native (Note: Even if the Thanks! |
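For comparison with the minimal repro earlier in the thread (which deliberately omits `using` on the command), here is a sketch of the same loop with deterministic disposal, so that statement cleanup happens on the calling thread rather than being left to the finalizer. It assumes the Microsoft.Data.Sqlite package is referenced; `DisposeSketch` and `RunQueries` are hypothetical names, and the loop is bounded only so the example terminates. The thread does not confirm that this alone avoids the crash on the affected versions.

```csharp
using System;
using Microsoft.Data.Sqlite;

public static class DisposeSketch
{
    // Runs "SELECT 1" the given number of times and returns the sum of results.
    public static long RunQueries(int iterations)
    {
        using var connection = new SqliteConnection("Data Source=:memory:");
        connection.Open();

        long sum = 0;
        for (int i = 0; i < iterations; i++)
        {
            // 'using' disposes the command (and its prepared statements)
            // at the end of each iteration, on this thread.
            using var command = connection.CreateCommand();
            command.CommandText = "SELECT 1";
            sum += Convert.ToInt64(command.ExecuteScalar());
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(RunQueries(10_000));
    }
}
```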
I'm still seeing this issue with microsoft.data.sqlite.core v 6.0.0-preview.7.21378.4 and sqlitepclraw.bundle_e_sqlite3 v2.0.6...
this gives:
|
@jhgbrt I can repro this crash using the versions you specified. But I just tried it with RC 1 (released today), and the crash did not happen. |
I can confirm it's fixed with RC1 of Microsoft.Data.Sqlite.Core |
Yes, this was deliberately fixed on our side for RC1. |
Hi, I'm using this library in my personal project, and when executing a query, a System.AccessViolationException occasionally occurs.
I'm sorry that I haven't been able to find the cause of the problem, so I came here for some help.
The error log is as follows:
I have stable reproduction code as follows (runs usually crash within 30 seconds):
The package versions I used:
In my attempts, the same problem appeared on both a Windows PC and a MacBook.
I'll upload the database file I'm using, and this file seems to be the only one that's having problems.
database.zip