Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow SQLite3 busy_handler to be configured with simple max number of retries #49352

Merged
merged 1 commit into from Sep 24, 2023

Conversation

fractaledmind
Copy link
Contributor

Motivation / Background

SQLite's busy_timeout is a specific implementation of the lower-level busy_handler function which simply retries busy connections with "exponential backoff" (not truly exponential, the backoff steps are [1, 2, 5, 10, 15, 20, 25, 25, 25, 50, 50, 100] ms and then 100 ms each step thereafter) up to the timeout number of milliseconds. Exposing access to this function via the timeout option in the database.yml file is useful, but insufficient.

It is growingly common for production Rails applications to use SQLite as their database engine. A recommended performance optimization is to immediately retry busy connections, instead of waiting for backoffs.

Detail

This pull request adds support for a new option in the database.yml called retries, which is used in a simple busy_handler function to retry busy connections without exponential backoff up to the max number of retries .

Additional information

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • [] Tests are added or updated if you fix a bug or add a feature.
    • no tests added as the timeout option isn't tested, and testing either the busy_timeout or busy_handler function is difficult relative to its value
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

…of `retries`

Retrying busy connections without delay is a preferred practice for performance-sensitive applications. Add support for a `database.yml` `retries` integer, which is used in a simple `busy_handler` function to retry busy connections without exponential backoff up to the max number of `retries`
.
@guilleiguaran guilleiguaran merged commit 7c07787 into rails:main Sep 24, 2023
4 checks passed
@fractaledmind fractaledmind deleted the ar-sqlite-retries branch September 24, 2023 18:30
@suwyn
Copy link

suwyn commented Oct 23, 2023

@fractaledmind Where is the backoff implemented? That's new to me and I haven't been able to find the source or another reference to it. Thanks.

@suwyn
Copy link

suwyn commented Oct 23, 2023

Got it thanks. The default busy handler what I was looking for.

Noting for reference that the current implementation of timeout still leads to busy exceptions (i.e. it doesn't work) so one needs to implement their own busy handler.

This retries implementation in this PR is a good one, or one could use the monkey patch which reapplies the timeout setting without any backoff.

@fractaledmind
Copy link
Contributor Author

timeout indeed isn't useful for multi-threading in the way described in that issue; i.e. allowing other threads to progress while waiting for the backoff. I think retries is a more solid foundation for handling connection contention. Can you confirm that it releases the GIL for other threads?

@suwyn
Copy link

suwyn commented Oct 23, 2023

Yes, can confirm the retries busy handler does not have the issue with the GIL.

The challenge I have with retries is I have no idea what the right setting is, or even if there is a right setting. Maybe I am thinking about it wrong though.

For example a quick test locally with RAILS_MAX_THREADS=5 (the default), in a test with 17 concurrent requests (15 GET/reading, 2 POST/writing), required me to set retries more than 1 million to avoid any busy errors.

I really don't care about how many retries are needed though, what I care about is the latency. With a timeout I can be explicit as to how I want the system to behave in regards to busy errors and how long to wait before throwing them.

Incidentally, in my rudimentary testing (simple rails app, apache benchmark), using the Timeout monkey patch but with a much smaller sleep (1 picosecond) was actually faster than the Retries handler.

@oldmoe
Copy link

oldmoe commented Oct 24, 2023

At Litestack, what we do is simply either sleep for a small amount of time (e.g. 100 microseconds) or just switch the current fiber in case of a fiber based environment. The retries approach as implemented might result in needing a very high number of retries in order to ensure no timing out very often, it will also result in executing many blocks of Ruby code from within C control frames, not the best recipe for performance sensitive situations.

Maybe a simpler approach, that would still honor the timeout SLA and work for both a Fiber and Thread based environment would be to do the following:

  1. Only accept the timeout configuration option, as it was the case
  2. Don't set the busy_timeout though, rather set a busy_handler
  3. The busy_handler sleeps for a little while (e.g. 1ms) and checks whether or not the timeout has passed
  4. If it ever exceeds the timeout then false is returned, triggering a BusyException from SQLite3

some sample code

db.busy_handler do |count|
  timed_out = false
  # capture the start time of this blocked write
  @start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC) if count == 0
  # keep track of elapsed time every x iterations (to lower load)
   if count % 100 == 0
    @elapsed_time = Process.clock_gettime(Process::CLOCK_MONOTONIC) - @start_time  
    # fail if we exceed the timeout value (captured from the timeout config option, converted to seconds)
    timed_out = @elapsed_time > @timeout 
  end  
  if timed_out
     false # this will cause the BusyException to be raised
  else
     sleep 0.001 # sleep 1 millisecond (or whatever)
  end
end

This, or something similar keeps the current familiar configuration and is easy to reason about and understand, while internally it allows for other threads/fibers to take control while the current context is blocked on a write lock

@suwyn
Copy link

suwyn commented Nov 7, 2023

The retries approach as implemented might result in needing a very high number of retries in order to ensure no timing out very often, it will also result in executing many blocks of Ruby code from within C control frames, not the best recipe for performance sensitive situations.

That's what we have observed and why this retries implementation can actually be slower than sleeping.

Also want to note that the lower level handler busy handler is much much (that's a scientific expression) faster than specifying a busy_handler here. Would be wonderful if underlining issue with the busy timeout could be solved.

@oldmoe
Copy link

oldmoe commented Nov 11, 2023

A C level handler will be naturally much faster than. Ruby one, but in order to allow other Ruby threads/fibers to run, you need to tap into the Ruby C API anyway and return control to the VM. I think the approach outlined above presents a good enough middle ground between fast-but-blocking and really-slow-but-concurrent. If you want to make it even more efficient, the forgo the time calculations and just bail out after a certain number of retries (while still sleeping a bit on every invocation)

@fractaledmind
Copy link
Contributor Author

@oldmoe @suwyn: I have opened a PR to bring Mohammed's suggested approach into the active record-enhancedsqlite3-adapter gem that I maintain. We can test it in some applications using this gem for a bit before working to upstream this into Rails.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants