ActiveRecord::ConnectionFailed not caught by the failsafe (terminated connection surfaces as an error) #307

@navidemad

Summary

SolidCache::Store::Failsafe::TRANSIENT_ACTIVE_RECORD_ERRORS rescues ActiveRecord::ConnectionNotEstablished but not ActiveRecord::ConnectionFailed. When the cache database connection is terminated mid-transaction, Solid Cache raises ConnectionFailed instead of degrading to a cache miss, so the error propagates to the caller (a 500 on a web request, for example).

How we hit it

We run Solid Cache on a dedicated PostgreSQL database and set idle_in_transaction_session_timeout on that connection to bound row-lock contention on solid_cache_entries. Entry.lock_and_write holds the row lock on a hot key_hash while the Ruby block runs, and under load the holding thread can stay idle-in-transaction long enough for PostgreSQL to terminate the backend:

PG::ConnectionBad: PQconsumeInput() FATAL: terminating connection due to idle-in-transaction timeout

The write that follows the SELECT ... FOR UPDATE inside lock_and_write then raises ActiveRecord::ConnectionFailed. That class, added in Rails 7.1, descends from ActiveRecord::StatementInvalid (through ActiveRecord::QueryAborted), not from ActiveRecord::ConnectionNotEstablished, so the current failsafe list does not catch it.
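To illustrate why the class check misses, here is a minimal plain-Ruby sketch. The StandInAR module and the transient? helper are hypothetical stand-ins, not Rails or Solid Cache code. The hierarchy is simplified (intermediate ancestors omitted) and the list is reduced to the single entry that matters here.

```ruby
# Stand-in classes mirroring a simplified slice of the Active Record error
# hierarchy. These are illustrative only, not the real Rails constants.
module StandInAR
  class ActiveRecordError < StandardError; end
  class ConnectionNotEstablished < ActiveRecordError; end
  class StatementInvalid < ActiveRecordError; end
  # ConnectionFailed sits under StatementInvalid, not ConnectionNotEstablished.
  class ConnectionFailed < StatementInvalid; end
end

# The failsafe list before the proposed change, reduced to one entry.
LIST_BEFORE = [StandInAR::ConnectionNotEstablished].freeze
# The same list with the proposed addition.
LIST_AFTER = (LIST_BEFORE + [StandInAR::ConnectionFailed]).freeze

# Hypothetical helper: the same is_a? check used in the reproduction.
def transient?(error, list)
  list.any? { |klass| error.is_a?(klass) }
end

error = StandInAR::ConnectionFailed.new(
  "terminating connection due to idle-in-transaction timeout"
)

caught_before = transient?(error, LIST_BEFORE)
caught_after  = transient?(error, LIST_AFTER)

puts "caught before change: #{caught_before}"
puts "caught after change:  #{caught_after}"
```

Because is_a? only walks up the ancestor chain, a sibling branch of the hierarchy never matches, which is why the error has to be listed explicitly.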

Reproduction

conn = SolidCache::Record.connection
conn.execute("SET idle_in_transaction_session_timeout = '300ms'")

error = nil
begin
  SolidCache::Entry.lock_and_write("probe") do |_value|
    sleep 1   # transaction stays idle > 300ms -> PostgreSQL kills the session
    "v"
  end
rescue ActiveRecord::ConnectionFailed => e
  error = e   # keep the exception so it can be checked against the failsafe list
end
# error => ActiveRecord::ConnectionFailed: ... terminating connection due to idle-in-transaction timeout

SolidCache::Store::Failsafe::TRANSIENT_ACTIVE_RECORD_ERRORS.any? { |k| error.is_a?(k) }
# => false

Proposal

Add ActiveRecord::ConnectionFailed to TRANSIENT_ACTIVE_RECORD_ERRORS. A terminated connection is the kind of transient failure the cache should degrade on, consistent with the existing ConnectionNotEstablished entry.

TRANSIENT_ACTIVE_RECORD_ERRORS = [
  ActiveRecord::AdapterTimeout,
  ActiveRecord::ConnectionFailed,         # added
  ActiveRecord::ConnectionNotEstablished,
  ActiveRecord::Deadlocked,
  ActiveRecord::LockWaitTimeout,
  ActiveRecord::QueryCanceled,
  ActiveRecord::StatementTimeout
]

ActiveRecord::ConnectionFailed exists in Rails 7.1+, which is within Solid Cache's supported range, so referencing it directly is safe.
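As a rough sketch of the degradation this would enable, the plain-Ruby example below rescues a list of transient error classes and returns a fallback instead of raising. The StandIn classes and the with_failsafe helper are hypothetical and only illustrate the pattern. They are not Solid Cache's actual failsafe implementation.

```ruby
# Stand-in error classes so the sketch runs without Rails (illustrative only).
module StandIn
  class ConnectionNotEstablished < StandardError; end
  class ConnectionFailed < StandardError; end
end

# A reduced transient-error list that includes the proposed addition.
TRANSIENT_ERRORS = [
  StandIn::ConnectionFailed,
  StandIn::ConnectionNotEstablished
].freeze

# Hypothetical helper: run the block and return `fallback` when a transient
# error is raised. Any other exception propagates to the caller.
def with_failsafe(fallback:, errors: TRANSIENT_ERRORS)
  yield
rescue *errors => e
  warn "cache failsafe: #{e.class}: #{e.message}"
  fallback
end

# A terminated connection now degrades to a cache miss (nil).
read_result = with_failsafe(fallback: nil) do
  raise StandIn::ConnectionFailed,
        "terminating connection due to idle-in-transaction timeout"
end

# A healthy call passes its value through untouched.
hit_result = with_failsafe(fallback: nil) { "cached value" }

p read_result
p hit_result
```

Only the listed classes are rescued, so genuine bugs (an ArgumentError, for instance) still surface rather than being silently treated as cache misses.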

Happy to send a PR with a test if this looks right.

Environment: solid_cache 1.0.10, Rails 8.1, PostgreSQL 17.
