Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected System.AccessViolationException when using Microsoft.Data.Sqlite #430

Closed
EYHN opened this issue Jul 28, 2021 · 42 comments
Closed

Comments

@EYHN
Copy link

EYHN commented Jul 28, 2021

Hi, I'm using this library in my personal project and when executing a query, there is a probability that a System.AccessViolationException will occur.

I am sorry that I cannot find the cause of the problem. I came here for some help

The error log is as follows:

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at SQLitePCL.SQLite3Provider_dynamic_cdecl.SQLitePCL.ISQLite3Provider.sqlite3_prepare_v2(SQLitePCL.sqlite3, System.ReadOnlySpan`1<Byte>, IntPtr ByRef, System.ReadOnlySpan`1<Byte> ByRef)
   at Microsoft.Data.Sqlite.SqliteCommand+<PrepareAndEnumerateStatements>d__64.MoveNext()
   at Microsoft.Data.Sqlite.SqliteCommand+<GetStatements>d__54.MoveNext()
   at Microsoft.Data.Sqlite.SqliteDataReader.NextResult()
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader(System.Data.CommandBehavior)
   at ConsoleApp1.Program.Main(System.String[])

I have stable reproduction code as follows: (Usually runs will crash within 30 seconds)

using System;
using System.Data;
using Microsoft.Data.Sqlite;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var connection =
                new SqliteConnection(
                    @"Data Source=C:\Users\eyhn\Projects\Anything\core\Anything\bin\Debug\net5.0\cache\tracker.db"))
            {
                connection.Open();

                while (true)
                {
                    using (connection.BeginTransaction())
                    {
                        var queryCommand = connection.CreateCommand();
                        queryCommand.CommandText = $@"
                        SELECT Id, Url, Parent, IsDirectory, IdentifierTag, ContentTag FROM FileTracker WHERE Url=?1;
                        ";
                        queryCommand.Parameters.Add(new SqliteParameter("?1", "file://local/"));

                        var reader = queryCommand.ExecuteReader();

                        var result = HandleReaderSingleDataRow(reader);

                        Console.WriteLine(result?.Id + "");
                    }
                }
            }
        }

        private static DataRow HandleReaderSingleDataRow(IDataReader reader)
        {
            if (!reader.Read())
            {
                return null;
            }

            return new DataRow(
                reader.GetInt64(0),
                reader.GetString(1),
                reader.GetValue(2) as long?,
                reader.GetBoolean(3),
                reader.GetValue(4) as string,
                reader.GetValue(5) as string);
        }

        public record DataRow(long Id, string Url, long? Parent, bool IsDirectory, string IdentifierTag,
            string ContentTag);
    }
}

The package version i used:

> dotnet list package --include-transitive
Project 'ConsoleApp1' has the following package references
   [net5.0]: 
   Top-level Package            Requested                 Resolved
   > Microsoft.Data.Sqlite      6.0.0-preview.6.21352.1   6.0.0-preview.6.21352.1

   Transitive Package                         Resolved
   > Microsoft.Data.Sqlite.Core               6.0.0-preview.6.21352.1
   > SQLitePCLRaw.bundle_e_sqlite3            2.0.5-pre20210521085756
   > SQLitePCLRaw.core                        2.0.5-pre20210521085756
   > SQLitePCLRaw.lib.e_sqlite3               2.0.5-pre20210521085756
   > SQLitePCLRaw.provider.dynamic_cdecl      2.0.5-pre20210521085756
   > System.Memory                            4.5.3

After my attempts, the same problem appeared on windows pc and macbook.

I'll upload the database file I'm using, and this file seems to be the only one that's having problems.

database.zip

@bricelam
Copy link
Contributor

Hmm... using the span overload of sqlite3_prepare_v2 is new in Microsoft.Data.Sqlite 6.0.0.

@ericsink
Copy link
Owner

I'll try to dig into this tomorrow.

A couple months ago, based on discussion from somebody dealing with a crash, I came to suspect, but never confirmed, that there may be cases where I construct a Span that might outlive the pointer it came from.

@ericsink
Copy link
Owner

where I construct a Span that might outlive the pointer it came from

... but this issue does not appear to be what I was thinking about when I wrote that.

@ericsink
Copy link
Owner

The apparent code involved from provider.tt is in this function:

unsafe int ISQLite3Provider.sqlite3_prepare_v2(sqlite3 db, ReadOnlySpan<byte> sql, out IntPtr stm, out ReadOnlySpan<byte> tail)

in cases where the underlying C function didn't consume the entire string, the tail returns a Span made from a slice of the Span that was given:

                if (len_remain > 0)
                {
                    tail = sql.Slice(len_consumed, len_remain);
                }

and all that happens inside the fixed block, so the pointers should still be valid, and no pointers escape out to the caller.

The call from Microsoft.Data.Sqlite appears to be in SqliteCommand.cs. The source of the string is:

            var sql = new byte[byteCount + 1];
            Encoding.UTF8.GetBytes(_commandText, 0, _commandText.Length, sql, 0);

and the call itself:

                ReadOnlySpan<byte> tail;
                while (IsBusy(rc = sqlite3_prepare_v2(_connection!.Handle, sql.AsSpan(start), out stmt, out tail)))

which I'm guessing is safe. I mean, tail is going to end up as a slice of the span returned by sql.AsSpan(), which itself is not held in a local, but that should be okay?

And the sql.AsSpan() does start from start, which reflects whatever the tail gave, so...

So at the moment I don't see anything obvious, which makes me suspect that either (1) I am missing something obvious, or (2) an off-by-one error somewhere, which is the next thing I'll look at.

@bricelam
Copy link
Contributor

I'm seeing this error a lot more consistently in my pooling branch (dotnet/efcore#25018). Like you, I don't see anything obviously wrong with the Span code...

@ericsink
Copy link
Owner

ericsink commented Aug 11, 2021

No real progress yet, but here are some findings:

The test code posted by @EYHN does reproduce the problem for me on Windows.

I don't see any significant off-by-one errors that I mentioned in a previous comment.

In general, the span tends to arrive with an extra byte, the zero terminator, which we don't really need, but that seems to cause no problems. I tried lying to the native code by passing Length -1 (thus removing that extra byte) and it made no difference.

The test case string has a lot of whitespace before and after, but removing it seems to make no difference. Something in M.D.Sqlite is probably doing a Trim, since sqlite3_prepare_v2 doesn't actually get called a second time with all that whitespace.

Whether the function returns a tail span or not seems to make no difference. I hacked the code to always return ReadOnlySpan.Empty for the tail, and the crash still happens.

And also, all of the above was tested against a current build of the tree, not against 2.0.4. One major difference here is that the current tree uses DllImport for .NET Core instead of the dynamic provider. So that difference is yet another thing that makes no difference.

@ericsink
Copy link
Owner

The "other" overload (the one that uses utf8z instead of ReadOnlySpan) of sqlite3_prepare_v2 passes -1 for the length, which means the native code will determine the length by looking for the zero terminator.

Hacking the span overload to pass -1 (ignoring the length given in the span, since the caller seems to always be providing a zero terminator anyway) also seems to have no effect.

ericsink added a commit that referenced this issue Aug 11, 2021
ericsink added a commit that referenced this issue Aug 11, 2021
…430 repro to use ugly instead of Microsoft.Data.Sqlite.
@ericsink
Copy link
Owner

I ported the repro program to use SQLitePCLRaw.ugly instead of Microsoft.Data.Sqlite, and it no longer crashes. I think my version is equivalent. The code is in test_nupkgs/bug430.

@bricelam
Copy link
Contributor

bricelam commented Aug 11, 2021

@ajcvickers is there anyone we can pull in from the runtime team to help investigate?

@ajcvickers
Copy link

@Pilchie Any ideas on who might be able to help here?

@Pilchie
Copy link

Pilchie commented Aug 12, 2021

@stephentoub? Any ideas?

@bricelam
Copy link
Contributor

bricelam commented Aug 13, 2021

Here's the PR where the change went in: dotnet/efcore#24331

Are the while and do..while loops an issue? Or the yield return?

@ericsink
Copy link
Owner

@bricelam I did glare at those loops with a measure of suspicion, but I couldn't convince myself they were problematic.

Still might be interesting to move the sql.AsSpan(start) outside the while(IsBusy()) and see if the problem still repros. AFAICT, the AsSpan doesn't need to be inside the loop anyway?

@bricelam
Copy link
Contributor

bricelam commented Aug 13, 2021

Oh wow, we're still seeing the exception after switching back to the string overload. I think Span was a red herring.

@bricelam
Copy link
Contributor

Maybe a pinning issue in SQLitePCLRaw?

@stephentoub
Copy link

Sorry, I missed yesterday that I was tagged. From the most recent comments it sounds like this isn't actually due to the cited PR but is actually longer standing and not related to the new span usage?

@ericsink
Copy link
Owner

@bricelam Hmmm. That's interesting. And it gives me a new angle of investigation.

@ericsink
Copy link
Owner

Thinking out loud:

Okay, let's say Span was red herring. It was still a good guess, because of the timing, and because of the recent changed in Microsoft.Data.Sqlite to use the Span overload.

If not Span, then did something else change?

How long has this bug been around?

And if the answer is "a very long time", why are we just finding it now?

I wish we had a repro that does not involve Microsoft.Data.Sqlite. But due to the nature of the problem, it still seems likely that the bug, whatever it is, is somewhere in SQLitePCLRaw.

@ericsink
Copy link
Owner

How confident are we that the error seen WITH the Span overload is the same one seen after "switching back to the string overload" ?

@ericsink
Copy link
Owner

ericsink commented Aug 16, 2021

Clarification:

There are 6 overloads of sqlite3_prepare_v2 in raw.cs, 3 that return a tail, and 3 that do not. We're talking about the ones that return a tail.

Those 3 overloads are for string, ReadOnlySpan<byte>, and utf8z.

Previously, Microsoft.Data.Sqlite was using the string overload. Then it changed to use the Span overload.

The string overload calls the Span overload. It does not call the utf8z overload.

So AFAICT, despite something I said above, it has not been using the utf8z overload. And that's good, because the more review that code, the less I trust it.

ericsink added a commit that referenced this issue Aug 16, 2021
…Data.Sqlite counterpart. still can't make it crash.
@ericsink
Copy link
Owner

The search for this bug should probably include taking a fresh look at #409 and #321

ericsink added a commit that referenced this issue Aug 16, 2021
…ses Microsoft.Data.Sqlite. bug430_ugly is my attempt to repro the same bug without Microsoft.Data.Sqlite.
@bricelam
Copy link
Contributor

bricelam commented Aug 16, 2021

How confident are we that the error seen WITH the Span overload is the same one seen after "switching back to the string overload" ?

The stack trace is exactly the same going into sqlite3_prepare_v2 where the exception is thrown. Both throw AccessViolationException.

bricelam added a commit to bricelam/efcore that referenced this issue Aug 16, 2021
Since merging, the CI fail rate increased significantly from ericsink/SQLitePCL.raw#430

Fixes dotnet#16202, unresolves dotnet#13837
@bricelam
Copy link
Contributor

AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at SQLitePCL.SQLite3Provider_dynamic_cdecl.SQLitePCL.ISQLite3Provider.sqlite3_prepare_v2(SQLitePCL.sqlite3, System.ReadOnlySpan`1<Byte>, IntPtr ByRef, System.ReadOnlySpan`1<Byte> ByRef)
AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at SQLitePCL.SQLite3Provider_dynamic_cdecl.SQLitePCL.ISQLite3Provider.sqlite3_prepare_v2(SQLitePCL.sqlite3, System.ReadOnlySpan`1<Byte>, IntPtr ByRef, System.ReadOnlySpan`1<Byte> ByRef)
   at SQLitePCL.raw.sqlite3_prepare_v2(SQLitePCL.sqlite3, System.ReadOnlySpan`1<Byte>, SQLitePCL.sqlite3_stmt ByRef, System.ReadOnlySpan`1<Byte> ByRef)
   at SQLitePCL.raw.sqlite3_prepare_v2(SQLitePCL.sqlite3, System.String, SQLitePCL.sqlite3_stmt ByRef, System.String ByRef)

@ericsink
Copy link
Owner

So yeah, those stack traces look the same. Both code paths end up in the Span overload anyway.

FWIW, I wanted to completely rule out the suspected problems with the utf8z overload, so I have a local build where I completely removed it. Running that with Microsoft.Data.Sqlite repro, I get no MissingMethod exceptions or anything like that, so it doesn't seem to be getting accidentally used anywhere.

Clearly, the sqlite3_prepare_v2 span overload is where the symptom happens, but I'm not sure the actual memory corruption is happening there.

@bricelam
Copy link
Contributor

@stephentoub Can we start blaming the JIT or GC? 😏

@ericsink
Copy link
Owner

"Can we start blaming the JIT or GC?"

LOL.

More thinking out loud:

I still can't get the crash to happen in my repo program without Microsoft.Data.Sqlite. I have made a number of changes to make that repro similar to the way Microsoft.Data.Sqlite is doing things, but no luck.

Microsoft.Data.Sqlite is doing something different, and that something either is the bug or it is triggering the bug. But we don't know what that something is.

We do know that the changes in your pooling branch make the bug happen more often.

Stab in the dark: Is there any way threading could be an issue here? My bug430_ugly repro is definitely not creating any threads. Is that same thing true for the original repro program from the OP?

bricelam added a commit to bricelam/efcore that referenced this issue Aug 16, 2021
Since merging, the CI fail rate increased significantly from ericsink/SQLitePCL.raw#430

Fixes dotnet#16202, unresolves dotnet#13837
@bricelam
Copy link
Contributor

bricelam commented Aug 16, 2021

Playing with the repro... Here's the most minimal I can make it with M.D.Sqlite. I'll try using SQLitePCLRaw directly now.

using Microsoft.Data.Sqlite;

using var connection = new SqliteConnection("Data Source=:memory:");
connection.Open();

while (true)
{
    // NB: No using
    var command = connection.CreateCommand();
    command.CommandText = "SELECT 1";

    command.ExecuteScalar();
}

@ericsink
Copy link
Owner

ericsink commented Aug 16, 2021

I confirm that your minimal M.D.Sqlite repro works (throws) for me as well. Wow.

@bricelam
Copy link
Contributor

Could it be SQLITE_OPEN_NOMUTEX? We started passing that in 6.0

@bricelam
Copy link
Contributor

Mystery solved. Adding SQLITE_OPEN_NOMUTEX to 5.0 makes it throw. Does SQLitePCLRaw use threads?

@ericsink
Copy link
Owner

ericsink commented Aug 16, 2021

(1) Wait, I thought you had previously said the minimal repro did throw on 5.0 ?

(2) Well, I don't think there are any threads anywhere. I'll double-check.

(3) Adding SQLITE_OPEN_NOMUTEX to my non-M.D.Sqlite repro does not seem to make it throw.

@bricelam
Copy link
Contributor

I thought 5.0 was throwing, but when I did a full rebuild, it started working.

I suspect the concurrent access is happening on some GC thread--that's why we see it more without a using.

But maybe it has nothing to do with concurrent access and there's just a lifetime bug somewhere inside SQLite itself when SQLITE_OPEN_NOMUTEX is specified.

@ericsink
Copy link
Owner

Small/partial confirmation: I built a SQLitePCLRaw which strips out SQLITE_OPEN_NOMUTEX even if the caller specifies it. The minimal M.D.Sqlite repro no longer throws.

I'm using 6.0.0-preview.6.21352.1

@ericsink
Copy link
Owner

"Does SQLitePCLRaw use threads?"

No.

I was pretty sure, as the idea that I would spin up a thread in SQLitePCLRaw is kind of unthinkable, but this kind of investigation is the time to be 100% sure. I looked everywhere I could think of, and I never create a thread, nor (AFAICT) do I call anything that would.

@ericsink
Copy link
Owner

So is removing SQLITE_OPEN_NOMUTEX a viable fix for you folks?

@bricelam
Copy link
Contributor

So is removing SQLITE_OPEN_NOMUTEX a viable fix for you folks?

Yep. It was really just added on a whim to see if we could get any perf gains from it.

ericsink added a commit that referenced this issue Aug 16, 2021
…. these overloads were not memory safe, as they returned a span made from a pointer obtained from a fixed block. technically, this is a breaking change, but it is likely that nothing was using those overloads. my own test suite was not, and Microsoft.Data.Sqlite was not. there was discussion of the problem with these overloads in #321.  additional detective work on this happened while trying to figure out #430.
@ajcvickers
Copy link

Thanks for all your help on this, @ericsink!

ericsink added a commit that referenced this issue Aug 17, 2021
…re_v2/v3. these overloads were not memory safe, as they returned a span made from a pointer obtained from a fixed block. technically, this is a breaking change, but it is likely that nothing was using those overloads. my own test suite was not, and Microsoft.Data.Sqlite was not. there was discussion of the problem with these overloads in #321.  additional detective work on this happened while trying to figure out #430."

This reverts commit 264be3a.
@kpreisser
Copy link

kpreisser commented Aug 17, 2021

Hi,

I just randomly stumbled across this issue. I think it hasn't been noted yet, but to me it seems clear why exceptions like AccessViolationException can happen using the repro code in #430 (comment), when SQLITE_OPEN_NOMUTEX is specified for the connection (which is not a bug in SQLite or the GC/CLR):

Notice that the code creates a new SqliteCommand command in a loop without a using statement (or calling .Dispose() on it). This means that some time after the SqliteCommand is no longer referenced, the GC will eventually call the finalizer (in a separate thread) of this object, which calls Dispose(false) (since it hasn't been disposed yet).

SqliteCommand.Dispose(bool) looks like this:

        protected override void Dispose(bool disposing)
        {
            DisposePreparedStatements(disposing);
            // ...

Note it calls DisposePreparedStatements, passing the disposing parameter. Looking at this method:

        private void DisposePreparedStatements(bool disposing = true)
        {
            if (disposing
                && DataReader != null)
            {
                DataReader.Dispose();
                DataReader = null;
            }

            if (_preparedStatements != null)
            {
                foreach (var stmt in _preparedStatements)
                {
                    stmt.Dispose();
                }

                _preparedStatements.Clear();
            }

            _prepared = false;
        }

We can see that in the second if block, it iterates through the _preparedStatements and calls .Dispose() on them, regardless of whether disposing is true or false. (AFAIK this is not a correct implementation of the Dispose pattern, because when the finalizer is executed, object references on the current object might no longer be valid, and therefore the _preparedStatements shouldn't be accessed when disposing is false).

sqlite3_stmt.Dispose() in turn calls its ReleaseHandle(), which calls the SQLite native function sqlite3_finalize() in order to finalize the prepared statement.

Therefore, it can happen that the finalizer thread calls the native sqlite3_finalize() function for a specific native DB connection at the same time when the main thread calls other SQLite functions like sqlite3_prepare_v2() (within CreateCommand()/ExecuteScalar()) for the same native DB connection, which is not allowed when using SQLITE_OPEN_NOMUTEX, because then each DB connection may only be accessed by a single thread at the same time.

(Note: Even if the SqliteCommand.DisposePreparedStatements() is changed to only iterate over _preparedStatements when disposing is true, the access violation will probably still occur because then the GC will eventually finalize the sqlite3_stmt separately.)

Thanks!

@jhgbrt
Copy link

jhgbrt commented Sep 13, 2021

I'm still seeing this issue with microsoft.data.sqlite.core v 6.0.0-preview.7.21378.4 and sqlitepclraw.bundle_e_sqlite3 v2.0.6...

var connection = SqliteFactory.Instance.CreateConnection();
connection.ConnectionString = "Data Source=:memory:";
connection.Open();

while (true)
{
    var command = connection.CreateCommand();
    command.CommandText = "SELECT 1";
    command.CommandType = System.Data.CommandType.Text;
    command.ExecuteNonQuery();
}

this gives:

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Repeat 2 times:
--------------------------------
   at SQLitePCL.SQLite3Provider_e_sqlite3+NativeMethods.sqlite3_prepare_v2(SQLitePCL.sqlite3, Byte*, Int32, IntPtr ByRef, Byte* ByRef)
--------------------------------
   at SQLitePCL.SQLite3Provider_e_sqlite3.SQLitePCL.ISQLite3Provider.sqlite3_prepare_v2(SQLitePCL.sqlite3, System.ReadOnlySpan`1<Byte>, IntPtr ByRef, System.ReadOnlySpan`1<Byte> ByRef)
   at Microsoft.Data.Sqlite.SqliteCommand+<PrepareAndEnumerateStatements>d__64.MoveNext()
   at Microsoft.Data.Sqlite.SqliteCommand+<GetStatements>d__54.MoveNext()
   at Microsoft.Data.Sqlite.SqliteDataReader.NextResult()
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader(System.Data.CommandBehavior)
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader()
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteNonQuery()
   at <Program>$.<Main>$(System.String[])

@ericsink
Copy link
Owner

@jhgbrt I can repro this crash using the versions you specified.

But I just tried it with RC 1 (released today), and the crash did not happen.

@jhgbrt
Copy link

jhgbrt commented Sep 14, 2021

I can confirm it's fixed with rc1 of Microsoft.Data.SqLite.Core

@bricelam
Copy link
Contributor

Yes, this was deliberately fixed on our side for RC1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants