Allow SQLite3 `busy_handler` to be configured with simple max number of retries #49352

fractaledmind · 2023-09-22T10:11:11Z

Motivation / Background

SQLite's busy_timeout is a specific implementation of the lower-level busy_handler function which simply retries busy connections with "exponential backoff" (not truly exponential, the backoff steps are [1, 2, 5, 10, 15, 20, 25, 25, 25, 50, 50, 100] ms and then 100 ms each step thereafter) up to the timeout number of milliseconds. Exposing access to this function via the timeout option in the database.yml file is useful, but insufficient.

It is growingly common for production Rails applications to use SQLite as their database engine. A recommended performance optimization is to immediately retry busy connections, instead of waiting for backoffs.

Detail

This pull request adds support for a new option in the database.yml called retries, which is used in a simple busy_handler function to retry busy connections without exponential backoff up to the max number of retries .

Additional information

Checklist

Before submitting the PR make sure the following are checked:

This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
[] Tests are added or updated if you fix a bug or add a feature.
- no tests added as the timeout option isn't tested, and testing either the busy_timeout or busy_handler function is difficult relative to its value
CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

…of `retries` Retrying busy connections without delay is a preferred practice for performance-sensitive applications. Add support for a `database.yml` `retries` integer, which is used in a simple `busy_handler` function to retry busy connections without exponential backoff up to the max number of `retries` .

suwyn · 2023-10-23T17:36:38Z

@fractaledmind Where is the backoff implemented? That's new to me and I haven't been able to find the source or another reference to it. Thanks.

fractaledmind · 2023-10-23T18:26:03Z

@suwyn See

https://www.sqlite.org/c3ref/busy_handler.html

https://github.com/sparklemotion/sqlite3-ruby/blob/6775c84975b69cd7d2284b3a115a7714fcdf1559/ext/sqlite3/database.c#L192

suwyn · 2023-10-23T19:17:27Z

Got it thanks. The default busy handler what I was looking for.

Noting for reference that the current implementation of timeout still leads to busy exceptions (i.e. it doesn't work) so one needs to implement their own busy handler.

This retries implementation in this PR is a good one, or one could use the monkey patch which reapplies the timeout setting without any backoff.

fractaledmind · 2023-10-23T21:48:13Z

timeout indeed isn't useful for multi-threading in the way described in that issue; i.e. allowing other threads to progress while waiting for the backoff. I think retries is a more solid foundation for handling connection contention. Can you confirm that it releases the GIL for other threads?

suwyn · 2023-10-23T23:45:38Z

Yes, can confirm the retries busy handler does not have the issue with the GIL.

The challenge I have with retries is I have no idea what the right setting is, or even if there is a right setting. Maybe I am thinking about it wrong though.

For example a quick test locally with RAILS_MAX_THREADS=5 (the default), in a test with 17 concurrent requests (15 GET/reading, 2 POST/writing), required me to set retries more than 1 million to avoid any busy errors.

I really don't care about how many retries are needed though, what I care about is the latency. With a timeout I can be explicit as to how I want the system to behave in regards to busy errors and how long to wait before throwing them.

Incidentally, in my rudimentary testing (simple rails app, apache benchmark), using the Timeout monkey patch but with a much smaller sleep (1 picosecond) was actually faster than the Retries handler.

oldmoe · 2023-10-24T13:40:32Z

At Litestack, what we do is simply either sleep for a small amount of time (e.g. 100 microseconds) or just switch the current fiber in case of a fiber based environment. The retries approach as implemented might result in needing a very high number of retries in order to ensure no timing out very often, it will also result in executing many blocks of Ruby code from within C control frames, not the best recipe for performance sensitive situations.

Maybe a simpler approach, that would still honor the timeout SLA and work for both a Fiber and Thread based environment would be to do the following:

Only accept the timeout configuration option, as it was the case
Don't set the busy_timeout though, rather set a busy_handler
The busy_handler sleeps for a little while (e.g. 1ms) and checks whether or not the timeout has passed
If it ever exceeds the timeout then false is returned, triggering a BusyException from SQLite3

some sample code

db.busy_handler do |count|
  timed_out = false
  # capture the start time of this blocked write
  @start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC) if count == 0
  # keep track of elapsed time every x iterations (to lower load)
   if count % 100 == 0
    @elapsed_time = Process.clock_gettime(Process::CLOCK_MONOTONIC) - @start_time  
    # fail if we exceed the timeout value (captured from the timeout config option, converted to seconds)
    timed_out = @elapsed_time > @timeout 
  end  
  if timed_out
     false # this will cause the BusyException to be raised
  else
     sleep 0.001 # sleep 1 millisecond (or whatever)
  end
end

This, or something similar keeps the current familiar configuration and is easy to reason about and understand, while internally it allows for other threads/fibers to take control while the current context is blocked on a write lock

suwyn · 2023-11-07T16:15:19Z

The retries approach as implemented might result in needing a very high number of retries in order to ensure no timing out very often, it will also result in executing many blocks of Ruby code from within C control frames, not the best recipe for performance sensitive situations.

That's what we have observed and why this retries implementation can actually be slower than sleeping.

Also want to note that the lower level handler busy handler is much much (that's a scientific expression) faster than specifying a busy_handler here. Would be wonderful if underlining issue with the busy timeout could be solved.

oldmoe · 2023-11-11T16:29:51Z

A C level handler will be naturally much faster than. Ruby one, but in order to allow other Ruby threads/fibers to run, you need to tap into the Ruby C API anyway and return control to the VM. I think the approach outlined above presents a good enough middle ground between fast-but-blocking and really-slow-but-concurrent. If you want to make it even more efficient, the forgo the time calculations and just bail out after a certain number of retries (while still sleeping a bit on every invocation)

fractaledmind · 2023-12-06T13:05:42Z

@oldmoe @suwyn: I have opened a PR to bring Mohammed's suggested approach into the active record-enhancedsqlite3-adapter gem that I maintain. We can test it in some applications using this gem for a bit before working to upstream this into Rails.

$fractaledmind$

rails-bot bot added the activerecord label Sep 22, 2023

guilleiguaran approved these changes Sep 24, 2023

View reviewed changes

guilleiguaran merged commit 7c07787 into rails:main Sep 24, 2023
4 checks passed

$@fractaledmind$ fractaledmind deleted the ar-sqlite-retries branch September 24, 2023 18:30

$@fractaledmind$ fractaledmind mentioned this pull request Sep 26, 2023

[skip ci] Add 7.1 release notes for SQLite3 adapter improvements #49387

Merged

$@fractaledmind$ fractaledmind mentioned this pull request Dec 6, 2023

Setup a better busy_handler that respects a timeout fractaledmind/activerecord-enhancedsqlite3-adapter#3

Merged

$@fractaledmind$ fractaledmind mentioned this pull request Dec 18, 2023

Improve concurrency support for SQLite3 #50370

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow SQLite3 `busy_handler` to be configured with simple max number of retries #49352

Allow SQLite3 `busy_handler` to be configured with simple max number of retries #49352

$@fractaledmind$ fractaledmind commented Sep 22, 2023

suwyn commented Oct 23, 2023

fractaledmind commented Oct 23, 2023

suwyn commented Oct 23, 2023

fractaledmind commented Oct 23, 2023

suwyn commented Oct 23, 2023 •

edited

oldmoe commented Oct 24, 2023

suwyn commented Nov 7, 2023

oldmoe commented Nov 11, 2023

fractaledmind commented Dec 6, 2023

Allow SQLite3 busy_handler to be configured with simple max number of retries #49352

Allow SQLite3 busy_handler to be configured with simple max number of retries #49352

Conversation

fractaledmind commented Sep 22, 2023

Motivation / Background

Detail

Additional information

Checklist

suwyn commented Oct 23, 2023

fractaledmind commented Oct 23, 2023

suwyn commented Oct 23, 2023

fractaledmind commented Oct 23, 2023

suwyn commented Oct 23, 2023 • edited

oldmoe commented Oct 24, 2023

suwyn commented Nov 7, 2023

oldmoe commented Nov 11, 2023

fractaledmind commented Dec 6, 2023

Allow SQLite3 `busy_handler` to be configured with simple max number of retries #49352

Allow SQLite3 `busy_handler` to be configured with simple max number of retries #49352

$@fractaledmind$ fractaledmind commented Sep 22, 2023

suwyn commented Oct 23, 2023 •

edited