Skip to content

SQLite3Adapter: Ensure fork-safety#52931

Closed
flavorjones wants to merge 3 commits into
rails:mainfrom
flavorjones:flavorjones-sqlite3-fork-safety
Closed

SQLite3Adapter: Ensure fork-safety#52931
flavorjones wants to merge 3 commits into
rails:mainfrom
flavorjones:flavorjones-sqlite3-fork-safety

Conversation

@flavorjones
Copy link
Copy Markdown
Member

Motivation / Background

Currently, SQLite3Adapter is not fork-safe and can result in database file corruption when a Rails process forks under certain circumstances.

rails/solid_queue#324 described sqlite database corruption happening while dealing with solid_queue job management. An investigation demonstrated that a likely cause of that corruption is sqlite's lack of fork-safety, a situation which was ironically made worse when the sqlite3-ruby driver improved its memory management with sparklemotion/sqlite3-ruby#392 in v2.0.0.

Any SQLite3::Database objects that are inherited from the parent process are unsafe to properly close and may impact either the parent or the child and potentially lead to database corruption.

You can read more about sqlite's lack of fork-safety in How To Corrupt An SQLite Database File §2.6, and reproduce corruption yourself with https://github.com/flavorjones/2024-09-13-sqlite-corruption.

Detail

In this commit, a pre-fork callback prepare_to_fork! is introduced for the connection pools and adapters to take any necessary actions before forking. The callback is only implemented by SQLite3Adapter which now invokes disconnect! to close all open prepared statements and database connections.

This pull request has a lot going on. The commits, in order, do the following:

  1. Prefactor activesupport/test/fork_tracker_test.rb.
  2. Introduce ActiveSupport::ForkTracker.before_fork for registering callbacks before a Rails process forks.
  3. Introduce the prepare_to_fork! callback and SQLite3Adapter's specialized version of that method.

Additional information

As a side effect, the parent process will end up having to re-open database connections if it continues to do work, which may be a small performance overhead for some use cases, but is necessary in order to prevent database corruption.

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Unrelated changes should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

@byroot
Copy link
Copy Markdown
Member

byroot commented Sep 14, 2024

This looks good to me (but I'd like to give it a closer look on Monday).

end
end

ActiveSupport::ForkTracker.before_fork { ActiveRecord::ConnectionAdapters::PoolConfig.prepare_to_fork! }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue here, is that after_fork is simple, because in the child, only the thread that forked survived.

But before_fork is much trickier, because nothing guarantees no thread was spawned, nor that any of these threads isn't using a connection right now.

So we potentially have a big thread safety on our hands here.

#
# More on sqlite fork-safety: https://www.sqlite.org/howtocorrupt.html section 2.6
# More on Rails db corruption: https://github.com/rails/solid_queue/issues/324
disconnect!
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this is called concurrently while another thread is using that connection? Is it synchronized?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, actually, since sqlite3-ruby doesn't release the GVL during queries I guess we're fine. But given people are asking for the GVL to be released...

@byroot
Copy link
Copy Markdown
Member

byroot commented Sep 14, 2024

No idea if related or just bad luck but...

<OBJ_INFO:gc_mark_ptr@gc.c:7072> 0x00007f3b854e00d0 [0 M    ] T_NONE
/rails/activerecord/lib/active_record/connection_adapters/abstract/schema_definitions.rb:306: [BUG] try to mark T_NONE object
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-linux]
-- Control frame information -----------------------------------------------
c:0061 p:0017 s:0336 e:000329 METHOD /rails/activerecord/lib/active_record/connection_adapters/abstract/schema_definitions.rb:306
...
/usr/local/lib/libruby.so.3.3(rb_bug) /usr/src/ruby/error.c:1052
/usr/local/lib/libruby.so.3.3(gc_mark_ptr+0x1ad) [0x7f3ba1ea3f4d] /usr/src/ruby/gc.c:7073
/usr/local/bundle/gems/sqlite3-2.0.4-x86_64-linux-gnu/lib/sqlite3/3.3/sqlite3_native.so(0x7f3b852a55d6) [0x7f3b852a55d6]
/usr/local/lib/libruby.so.3.3(gc_mark_stacked_objects+0x78) [0x7f3ba1ea7408] /usr/src/ruby/gc.c:7565
/usr/local/lib/libruby.so.3.3(gc_mark_stacked_objects_all) /usr/src/ruby/gc.c:7603
/usr/local/lib/libruby.so.3.3(gc_marks_rest) /usr/src/ruby/gc.c:8798
/usr/local/lib/libruby.so.3.3(gc_marking_exit+0x0) [0x7f3ba1ea8943] /usr/src/ruby/gc.c:8856
/usr/local/lib/libruby.so.3.3(gc_marks) /usr/src/ruby/gc.c:8867
/usr/local/lib/libruby.so.3.3(gc_start) /usr/src/ruby/gc.c:9609
/usr/local/lib/libruby.so.3.3(heap_prepare+0x2c) [0x7f3ba1ea9073] /usr/src/ruby/gc.c:2517
/usr/local/lib/libruby.so.3.3(heap_next_free_page) /usr/src/ruby/gc.c:2725
/usr/local/lib/libruby.so.3.3(newobj_alloc) /usr/src/ruby/gc.c:2827
/usr/local/lib/libruby.so.3.3(newobj_init+0x0) [0x7f3ba1ea9ea2] /usr/src/ruby/gc.c:2930
/usr/local/lib/libruby.so.3.3(newobj_of0) /usr/src/ruby/gc.c:2931
/usr/local/lib/libruby.so.3.3(newobj_of) /usr/src/ruby/gc.c:2947
/usr/local/lib/libruby.so.3.3(rb_wb_protected_newobj_of) /usr/src/ruby/gc.c:2962

and backfill coverage for deregistering callbacks.
Also add ForkTracker.unregister_before_fork and rename
ForkTracker.unregister to unregister_after_fork for consistency.
Sqlite itself is not fork-safe, as documented in
https://www.sqlite.org/howtocorrupt.html section 2.6. Any databases
that are inherited from the parent process are unsafe to properly
close and may impact either the parent or the child and potentially
lead to database corruption.

In this commit, a pre-fork callback `prepare_to_fork!` is introduced
for the connection pools and adapters to take any necessary actions
before forking. The callback is only implemented by the SQLite3Adapter
which now invokes `disconnect!` to close all open prepared statements
and database connections.

As a side effect, the parent process will end up having to re-open
database connections if it continues to do work, which may be a small
performance overhead for some use cases, but is necessary in order to
prevent database corruption.

See rails/solid_queue#324 for examples of
the type of corruption that can occur.
@flavorjones flavorjones force-pushed the flavorjones-sqlite3-fork-safety branch from eadbfb7 to c3e02fb Compare September 15, 2024 02:08
@flavorjones
Copy link
Copy Markdown
Member Author

No idea if related or just bad luck but...

I believe this issue was fixed by sparklemotion/sqlite3-ruby#556 and will be in a new release shortly.

@matthewd
Copy link
Copy Markdown
Member

matthewd commented Sep 15, 2024

This is not the right way to fix this -- forking should not affect the parent process's connections.

We should be doing this:

It is safe to close the underlying file descriptor

in discard!, as we do for the other adapters.

Edit: Thanks for identifying this! We (I) had obviously assumed it was safe to do regular closes across forks while teaching discard! to the other adapters.

@flavorjones
Copy link
Copy Markdown
Member Author

@matthewd Thanks for taking a look.

This is not the right way to fix this

I appreciate this response, I do know this is an unexpected approach and I'm not entirely happy with it myself. I will take another look at closing the file descriptors in the child, however I'll note that I did explore this already and did not find a method supported by the sqlite C API to access the file descriptors (and there may be multiple per database, e.g. WAL mode) nor a way to close them.

But, like I said, I'll take another look.

@byroot
Copy link
Copy Markdown
Member

byroot commented Sep 15, 2024

Yeah, I'm afraid SQLite doesn't allow to just discard connections in the child:

2.6. Carrying an open database connection across a fork()

Do not open an SQLite database connection, then fork(), then try to use that database connection in the child process. All kinds of locking problems will result and you can easily end up with a corrupt database. SQLite is not designed to support that kind of behavior. Any database connection that is used in a child process must be opened in the child process, not inherited from the parent.

Do not even call sqlite3_close() on a database connection from a child process if the connection was opened in the parent. It is safe to close the underlying file descriptor, but the sqlite3_close() interface might invoke cleanup activities that will delete content out from under the parent, leading to errors and perhaps even database corruption.

https://www.sqlite.org/howtocorrupt.html

Of course there's always the option to leak the connection, but...

That said, perhaps this fork safety protection would be better located in the sqlite gem?

@byroot
Copy link
Copy Markdown
Member

byroot commented Sep 15, 2024

Hum:

It is safe to close the underlying file descriptor,

It's unclear how to get the file descriptor though. Perhaps:

@flavorjones
Copy link
Copy Markdown
Member Author

perhaps this fork safety protection would be better located in the sqlite gem

If I can figure out how to close the file descriptors (and can determine that's safe) then yes, I'd prefer to put it in the sqlite3-ruby gem.

It's unclear how to get the file descriptor though. Perhaps ...

Yes, thanks, I'm looking into how to use sqlite3_file_control now, I did not see that in my earlier exploration.

@byroot
Copy link
Copy Markdown
Member

byroot commented Sep 15, 2024

Sounds good. There's still a lot of unknowns though. Even if we close the FDs, we still need to free the allocated memory, and it's unclear how. Perhaps it's safe to call sqlite3_close on a connection for which all FDs have been closed, but that need to be investigated and might come bite us in the future.

@flavorjones
Copy link
Copy Markdown
Member Author

Even if we close the FDs, we still need to free the allocated memory

I can fix this in the sqlite3-ruby gem, it will be GCed and deallocated but won't call sqlite3_close if the FDs have been discarded. I think I've got this working now, actually. Will take a day to kick the tires and really test it but I think this is the way.

@byroot
Copy link
Copy Markdown
Member

byroot commented Sep 15, 2024

I can fix this in the sqlite3-ruby gem, it will be GCed and deallocated but won't call sqlite3_close

How do you know for sure sqlite doesn't need to free some other structs? Is everything kept in a single memory block?

@flavorjones
Copy link
Copy Markdown
Member Author

How do you know for sure sqlite doesn't need to free some other structs

I'm going to take some time to look into it. Trust me! ♥

@byroot
Copy link
Copy Markdown
Member

byroot commented Sep 15, 2024

Oh, I trust you ❤️ , I'm just curious.

@flavorjones
Copy link
Copy Markdown
Member Author

flavorjones commented Sep 15, 2024

How do you know for sure sqlite doesn't need to free some other structs

It looks like sqlite3_close only frees data that's been explicitly associated with the database via sqlite3_set_clientdata which the sqlite3-ruby gem does not support. I'll run it in a loop to look for memory leaks to make sure, though.

Edit: I should be able to test the new Database#discard method without forking, which means we can also rely on ruby-memcheck to spot leaks.

@flavorjones
Copy link
Copy Markdown
Member Author

flavorjones commented Sep 15, 2024

I'm going to close this since the file descriptor gambit seems to be working. I will drop a pointer to the sqlite3-ruby pull request when I create it (likely tomorrow).

And then I'll open a new PR that will implement SQLite3Adapter#discard! as @matthewd suggested (which will try to be backwards-compatible with previous sqlite3-ruby versions).

Edit: work in progress is sparklemotion/sqlite3-ruby#558

@flavorjones
Copy link
Copy Markdown
Member Author

@matthewd @byroot OK, so simply closing just the file descriptors (which is described as "safe") leaks quite a bit of memory from the improperly-closed connection inherited from the parent. So "safe" doesn't mean "optimal" or "recommended".

This forum thread from last year deals with this question (in the context of a Lua app):

https://sqlite.org/forum/forumpost/1fa07728204567a0a136f442cb1c59e3117da96898b7fa3290b0063ae7f6f012

and the advice seems to be, "either close all connections before the fork or be OK with memory leaks".

Any advice or thoughts would be greatly appreciated.

@byroot
Copy link
Copy Markdown
Member

byroot commented Sep 15, 2024

leaks quite a bit of memory from the improperly-closed connection inherited from the parent.

What happens if you sqlite3_close such connection after having pre-closed the FDs? I assume you tried that.

IMO this PR wasn't too bad, closing before fork is way more annoying that after, but doable. The real issue is to ensure there isn't any active thread using the connections you are trying to close.

@matthewd
Copy link
Copy Markdown
Member

I think closing the FDs and accepting the possibility of leakage puts us on par with the other adapters: from memory I think our usage may not be officially supported by pg / mysql2 either.

The real issue is to ensure there isn't any active thread using the connections you are trying to close.

Setting aside the more-possible :memory: factor, and the implausible-but-technical "still-open fd to now-unlinked file", I will concede that recreating fresh SQLite connections is less bad than for other network-based adapters. But even with all that, I think the above is a showstopper: closing every possibly-mid-transaction connection in the process just because one thread has decided to fork seems thoroughly unacceptable.

It's built-in rather than being something the user must configure, but that makes this a worse version of #31241 (which at least had the merit of only affecting a particular class of forks).

quite a bit of memory

How much memory are we talking?


As a much more heavy-handed approach, under the argument that a well-behaved C-ext will ideally defer to Ruby for allocations, perhaps there's an extreme option involving SQLITE_CONFIG_MALLOC (first to give Ruby control over mallocs, and then more adventurously allowing the ext enough knowledge of what's been allocated that it might be able to just outright free everything in a fork-child -- without giving the library any opportunity to have an opinion)?

@byroot
Copy link
Copy Markdown
Member

byroot commented Sep 16, 2024

accepting the possibility of leakage puts us on par with the other adapters: from memory I think our usage may not be officially supported by pg / mysql2 either.

Not really, with pg/mysql we simply call close. Since the IO is a socket and not a file, there is no concens about locks and such on the file descriptor.

perhaps there's an extreme option involving SQLITE_CONFIG_MALLOC

Seems way too extreme to me, and likely won't work if it does some malloc at library init time etc.

@flavorjones
Copy link
Copy Markdown
Member Author

flavorjones commented Sep 16, 2024

What happens if you sqlite3_close such connection after having pre-closed the FDs?

Segfault.

How much memory are we talking?

On my Linux machine, it's 152,384 bytes leaked per connection (edit: calling sqlite3_db_release_memory reduces this to ~69kb). That's a floor, more memory will be leaked if virtual tables are being used, and it seems like actively used connections will have additional small (~500 byte) blocks leaked. I can do more research here if it's interesting.

perhaps there's an extreme option involving SQLITE_CONFIG_MALLOC

Maybe? I do appreciate the value of purpose-built arenas. But that feels risky and would take some time to determine if it can be done safely. Meanwhile we still have databases getting corrupted.

I propose:

WDYT?

@flavorjones
Copy link
Copy Markdown
Member Author

flavorjones commented Sep 16, 2024

I've written some of this up in a doc that, if we agree this is the best path forward, would be added to the sqlite3-ruby docs:

sparklemotion/sqlite3-ruby#558

I'd love feedback.

@byroot
Copy link
Copy Markdown
Member

byroot commented Sep 16, 2024

  • implement SQLite3Adapter#discard! (potentially emitting a warning telling people they're leaking?)

I think a warning is preferable yes. For the overwhelming majority of cases, that will only happen once because you generally just fork once, (or perhaps twice with Puma fork_worker), but for people using Pitchfork, or doing more exotic stuff, a warning would be appreciated as a way to say: figure yourself how to safely close all sqlite connections.

do some research into the "free raw memory pointers" gambit

Honestly that sounds way too risky to me.

I'd love feedback.

Your doc sound fair to me. I haven't check the implementation though.

@flavorjones
Copy link
Copy Markdown
Member Author

@matthewd Thanks for pushing back on this, I think we ended up with a much cleaner and more universal solution in sparklemotion/sqlite3-ruby#558.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants