Defer verification of database connections #44576

matthewd · 2022-02-28T16:08:38Z

Instead of verifying upon checkout, wait until we run a query.

If the query we need to run is safely retryable (including schema inspection, and most notably BEGIN), we don't need to explicitly verify, and can just recover if the query fails.

By extension, we can apply the same connection-recovery logic to every retryable query we run -- so e.g. the beginning of any top-level transaction is a safe transparent DB-reconnection point.

matthewd · 2022-03-08T16:11:03Z

Still draft because tests are very much needed, but I think this is at least in a state for us to decide whether we're comfortable with the API changes I'm making, etc.

cc @yahonda as I can only request reviews from committers

byroot

Some minor "tactical" comments.

The overall approach seem good to me so far.

activerecord/lib/active_record/connection_adapters/abstract_adapter.rb

activerecord/lib/active_record/connection_adapters/postgresql/quoting.rb

activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb

activerecord/lib/active_record/connection_adapters/sqlite3_adapter.rb

eileencodes

Left some comments, but this mostly looks good and correct to me. I am wondering if some of the refactoring changes can be pulled out into separate PRs so it's easier to read some of the bigger changes.

eileencodes · 2022-03-10T20:45:34Z

activerecord/lib/active_record/connection_adapters/abstract_adapter.rb

@@ -145,6 +146,10 @@ def use_metadata_table?
        @config.fetch(:use_metadata_table, true)
      end

+      def connection_retries
+        (@config[:connection_retries] || 3).to_i


Is 3 a number you chose or was it previously a default somewhere? If previously retries were 0, is it surprising behavior or potentially bad to retry if that's unexpected?

My main concern is that this is defaulting to 3 and if someone wants to turn it off they need to know it exists AND that it's set to 3 by default.

Arbitrary choice.

On reflection I think I'll make it lower (1, maybe 2), but I do think it should be greater than zero: the whole idea is that this is a new better default behaviour, which is not believed to have any downsides for existing users. (There is no "review your code for X pattern before turning this on" sort of recommendation, so I don't think there's any value in applying it as a versioned default.)

Notably, we only retry on queries we're absolutely sure are safe (like "BEGIN"), or [in future] if we're absolutely sure the query didn't make it to the database server -- we'll never automatically retry something that has any chance of having executed on the server.

The ability to know the option exists and set it to zero is mostly just as a last-ditch safety mechanism, ensuring there is a workaround if some subset of users discover an unanticipated flaw in the logic that claims this is universally safe.

eileencodes · 2022-03-28T15:19:11Z

activerecord/lib/active_record/connection_adapters/abstract_adapter.rb

+        # In the latter case, we really ought to guarantee the inner
+        # call will not reconnect (which would interfere with the
+        # still-yielded connection in the outer block), but we currently
+        # provide no special enforcement there.


This commentary reads like it's unsure what will really happen or that Rails should be doing more and isn't. Can you clarify this paragraph?

Yeah, it's an (internal, implementation-scoped) comment noting a limitation of the current implementation. My intention was basically to acknowledge there's an obscure corner case that we're not handling; if a future reader is investigating an issue of that shape [because it's more likely than I expected, or because of other implementation changes], it's a helpful breadcrumb that they're on the right trail. I'll try to rephrase.

eileencodes · 2022-03-28T15:25:36Z

activerecord/lib/active_record/connection_adapters/abstract_adapter.rb

+            else
+              # We don't know whether the connection is okay, but it also doesn't matter:
+              # we wouldn't be able to reconnect anyway. We're just going to run our query
+              # and hope for the best.


I think you did this so it's easy to comment but having 2 empty conditions feels odd and I think some of the comments aren't necessary. I think this can be simplified to the following:

if reconnectable unless allow_retry verify! end end

Am I missing something?

Yes, it's verify! unless @verified || !reconnectable || allow_retry... it just seemed important to me to spell out the three very different reasons we might choose not to verify.

eileencodes · 2022-03-28T15:28:22Z

activerecord/lib/active_record/connection_adapters/abstract_adapter.rb

+        # Returns a raw connection for internal use with methods that are known
+        # to both be thread-safe and not rely upon actual server communication.
+        # This is useful for e.g. string escaping methods.
+        def any_raw_connection


I don't love this name because it feels like "get anything, literally any raw connection" but it's a specific raw connection right? I don't have a better idea other than continuing to use @raw_connection instead.

This is partly a bleed-forward from #44591, where it becomes @raw_connection || connect. Adding it here, with its more defined cousins valid_raw_connection and with_raw_connection, helps in auditing existing users of @raw_connection: most of them should be [and in this PR, are] using with_raw_connection, but completing the set forces us to review each one, and where appropriate, explicitly opt in to this "I'll take absolutely anything you can offer" behaviour.

Contrasted with "valid", I think "anything, literally any" is a pretty fair representation of how this qualifies what it's returning. "A connection you already know is invalid" is rather extreme... I did briefly consider calling this even_an_invalid_raw_connection, but that felt a bit too pessimistic. (It's still an instance method, and IMO that scoping is sufficient to convey that it's still only our raw connection, just in any possible state.)

eileencodes · 2022-03-28T15:30:29Z

activerecord/lib/active_record/connection_adapters/mysql/database_statements.rb

@@ -109,7 +112,7 @@ def default_insert_value(column)
          end

          def last_inserted_id(result)
-            @raw_connection.last_id
+            @raw_connection&.last_id


So before we had a guaranteed raw_connection but now we might not have one? Does this change behavior of last_inserted_id?

I think this is technically a tiny bleed-over of #44591 too -- this change alone doesn't introduce any occasions where @raw_connection can be nil. This is a notable occasion where we do want to still use the raw ivar, though, and not [the upcoming definition of] any_raw_connection: if we don't currently have a connection, there's no point establishing one just to ask what it last inserted, because the answer will definitely be nil.

This is not a behaviour change: a connection that hasn't yet performed an insert will have a nil last_id.

eileencodes · 2022-03-28T15:32:14Z

activerecord/lib/active_record/connection_adapters/sqlite3/database_statements.rb

          mark_transaction_written_if_write(sql)

          type_casted_binds = type_casted_binds(binds)

          log(sql, name, binds, type_casted_binds, async: async) do
-            ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
+            with_raw_connection do |conn|


I like this new method, much easier to understand what's going on in the block.

matthewd · 2022-03-31T12:44:20Z

I am wondering if some of the refactoring changes can be pulled out into separate PRs

I think I already pulled out the more cleanly separable changes into preceding PRs (#44526 #44527 #44530 ~~#44531~~ #44570)... I believe the remainder are all dependent on the key behaviour of with_raw_connection. Which did you have in mind?

So far this is just infrastructure: no errors are considered retryable, and no real queries are marked retryable. Even in this state, we can drop connection verification from checkout, and defer it until the first query is ready. But the more interesting within-reach win will be when 'BEGIN' is retryable, and we can recognise connectivity-related exceptions.

We're still ultimately accessing the raw connection directly, but this is necessary to keep the statement pool working across full reconnects.

We may be restoring a connection after #with_raw_connection has already passed #materialize_transactions, so it's important we eagerly return things to exactly the state they had before.

Connectivity issues need a reconnect to be retried, but there are also in-server errors that can also be retried (lock timeouts and deadlocks). To handle this better, we move translation into #with_raw_connection: it's a much better representation of the boundary between the underlying adapter library and Active Record's abstractions anyway. (It doesn't know about the query being run, though, so we let #log enrich that into the exception later.)

Building on the work done in rails#44576 and rails#44591, we extend the logic that automatically reconnects broken db connections to take into account a timeout limit. This ensures that retries + reconnects are slow-query aware, and that we don't retry queries if a given amount of time has already passed since the query was first tried. This value will default to 5 seconds, but can be adjusted via the `connection_retry_timeout` config.

This works with either `adapter.materialize_transactions` before the savepoint creation or by skipping the savepoint entirely and reverting the migration by raising `ActiveRecord::Rollback` Close #179 Ref: rails/rails@57bc28f rails/rails#44576

Support rails/rails#44576

xiankaing · 2023-10-27T23:45:06Z

thank you!

@lock

We can't call @raw_connection right away anymore, because it is nil until it is used after the changes in rails/rails#44576. Also we need to have @lock established in order to be able to use `with_raw_connection`. In the abstract_adapter implementation, that happens after the arel_visitor is set, but in oracle enhanced, we need to use `with_raw_connection` to check the database version in order to pick with visitor to use. Calling lock_thread = nil before super means @lock isi available to set the arel_visitor. PR#44576 also requires an internal_exec_query method that is called by the abstract adpater's exec_query. Naively renaming the oracle-enhanced's exec_query to internal_exec_query makes the test pass, but there is probably more to be done here.

rails/rails#44576 adds `with_raw_connection`. Using that in database_statements instead of @raw_connection mostly works, except it broke 2 tests: * ["should reconnect and execute query if connection is lost and auto retry is enabled"](https://github.com/rsim/oracle-enhanced/blob/deb214decc3799608c8be386e91c6c7531c59793/spec/active_record/connection_adapters/oracle_enhanced/connection_spec.rb#L493) * ["among other things will reconnect and retry if there is a connection error.](https://github.com/rsim/oracle-enhanced/blob/deb214decc3799608c8be386e91c6c7531c59793/spec/active_record/connection_adapters/oracle_enhanced/connection_spec.rb#L500) `with_raw_connection` has a retry and reconnect built into it, but it was catching the oracle connection error. Adjust `retryable_connection_error?` so that both is aware of the OCI errors and reflects the `should_retry?` logic of the `with_retry` that database_statements used before.

@lock

We can't call @raw_connection right away anymore, because it is nil until it is used after the changes in rails/rails#44576. Also we need to have @lock established in order to be able to use `with_raw_connection`. In the abstract_adapter implementation, that happens after the arel_visitor is set, but in oracle enhanced, we need to use `with_raw_connection` to check the database version in order to pick with visitor to use. Calling lock_thread = nil before super means @lock isi available to set the arel_visitor. Rails rails/rails#48188 also requires an internal_exec_query method that is called by the abstract adpater's exec_query. Naively renaming the oracle-enhanced's exec_query to internal_exec_query makes the test run, but there is probably more to be done here.

@lock

We can't call @raw_connection right away anymore, because it is nil until it is used after the changes in rails/rails#44576 and rails/rails#44591. Also we need to have @lock established in order to be able to use `with_raw_connection`. In the abstract_adapter implementation, that happens after the arel_visitor is set, but in oracle enhanced, we need to use `with_raw_connection` to check the database version in order to pick with visitor to use. Calling lock_thread = nil before super means @lock isi available to set the arel_visitor. Rails rails/rails#48188 also requires an internal_exec_query method that is called by the abstract adpater's exec_query. Naively renaming the oracle-enhanced's exec_query to internal_exec_query makes the oracle enhanced tests run. I also saw errors due to `active?` using @raw_connection. `valid_raw_connection` ends up calling `active?` via `with_raw_connection` and `verify` so that can't be used as a replacement for @raw_connection here. Borrowing from the postgres adapter implementation to get around this.

@lock

We can't call @raw_connection right away anymore, because it is nil until it is used after the changes in rails/rails#44576 and rails/rails#44591. Also we need to have @lock established in order to be able to use `with_raw_connection`. In the abstract_adapter implementation, that happens after the arel_visitor is set, but in oracle enhanced, we need to use `with_raw_connection` to check the database version in order to pick with visitor to use. Calling lock_thread = nil before super means @lock isi available to set the arel_visitor. Rails rails/rails#48229 also requires an internal_exec_query method that is called by the abstract adpater's exec_query. Naively renaming the oracle-enhanced's exec_query to internal_exec_query makes the oracle enhanced tests run. I also saw errors due to `active?` using @raw_connection. `valid_raw_connection` ends up calling `active?` via `with_raw_connection` and `verify` so that can't be used as a replacement for @raw_connection here. Borrowing from the postgres adapter implementation to get around this.

@lock

We can't call @raw_connection right away anymore, because it is nil until it is used after the changes in rails/rails#44576. Also we need to have @lock established in order to be able to use `with_raw_connection`. In the abstract_adapter implementation, that happens after the arel_visitor is set, but in oracle enhanced, we need to use `with_raw_connection` to check the database version in order to pick with visitor to use. Calling lock_thread = nil before super means @lock isi available to set the arel_visitor. PR#44576 also requires an internal_exec_query method that is called by the abstract adpater's exec_query. Naively renaming the oracle-enhanced's exec_query to internal_exec_query makes the test pass, but there is probably more to be done here.

rails/rails#44576 adds `with_raw_connection`. Using that in database_statements instead of @raw_connection mostly works, except it broke 2 tests: * ["should reconnect and execute query if connection is lost and auto retry is enabled"](https://github.com/rsim/oracle-enhanced/blob/deb214decc3799608c8be386e91c6c7531c59793/spec/active_record/connection_adapters/oracle_enhanced/connection_spec.rb#L493) * ["among other things will reconnect and retry if there is a connection error.](https://github.com/rsim/oracle-enhanced/blob/deb214decc3799608c8be386e91c6c7531c59793/spec/active_record/connection_adapters/oracle_enhanced/connection_spec.rb#L500) `with_raw_connection` has a retry and reconnect built into it, but it was catching the oracle connection error. Adjust `retryable_connection_error?` so that both is aware of the OCI errors and reflects the `should_retry?` logic of the `with_retry` that database_statements used before.

rails-bot bot added the activerecord label Feb 28, 2022

matthewd force-pushed the defer-db-verify branch from 43e448f to 80c7ca8 Compare March 1, 2022 18:37

matthewd mentioned this pull request Mar 1, 2022

Simplify adapter construction; defer connect until first use #44591

Merged

Base automatically changed from restore-transactions to main March 3, 2022 17:10

matthewd force-pushed the defer-db-verify branch 2 times, most recently from 4a45007 to e02e06e Compare March 8, 2022 15:45

matthewd requested review from byroot, eileencodes, jhawthorn and kamipo March 8, 2022 16:08

byroot reviewed Mar 8, 2022

View reviewed changes

matthewd force-pushed the defer-db-verify branch 2 times, most recently from 910340d to 8ff11be Compare March 23, 2022 13:07

matthewd marked this pull request as ready for review March 23, 2022 13:27

eileencodes reviewed Mar 28, 2022

View reviewed changes

matthewd force-pushed the defer-db-verify branch from 8ff11be to 1c7f5dc Compare March 31, 2022 13:33

matthewd mentioned this pull request Apr 26, 2022

Savepoints are not used anymore on main #44955

Closed

matthewd force-pushed the defer-db-verify branch from 1c7f5dc to e09e1f7 Compare June 2, 2022 11:56

matthewd mentioned this pull request Jun 15, 2022

Perform a full reconnect on PG::UnableToSend during reconnect! #45022

Open

matthewd force-pushed the defer-db-verify branch 4 times, most recently from bc93b3a to dfd9f4e Compare July 15, 2022 20:21

matthewd added 5 commits July 26, 2022 22:07

Switch Postgres statement pool to use the AR connection

35d3ae3

We're still ultimately accessing the raw connection directly, but this is necessary to keep the statement pool working across full reconnects.

Default most internal/schema queries to retryable

129bfa1

Restore transactions in materialized state where appropriate

b9f44e6

We may be restoring a connection after #with_raw_connection has already passed #materialize_transactions, so it's important we eagerly return things to exactly the state they had before.

This was referenced Sep 22, 2022

Reconnect and retry on connection errors in #quote_string adrianna-chang-shopify/rails#1

Closed

Use #with_raw_connection in #quote_string to retry connection errors #46108

Merged

tagliala mentioned this pull request Oct 1, 2022

Add Rails 7.1 edge support ifad/chronomodel#179

Closed

yahonda added a commit to yahonda/oracle-enhanced that referenced this pull request Mar 20, 2023

Use rails/rails#44576 tentatively

d28329b

yahonda added a commit to yahonda/oracle-enhanced that referenced this pull request Apr 11, 2023

Use rails/rails#44576 tentatively

5467f22

yahonda added a commit to yahonda/oracle-enhanced that referenced this pull request May 2, 2023

Use rails/rails#44576 tentatively

edccc3c

yahonda mentioned this pull request May 2, 2023

Support rails/rails#44576 rsim/oracle-enhanced#2318

Merged

yahonda added a commit to yahonda/oracle-enhanced that referenced this pull request May 8, 2023

Use Rails main branch as of rails/rails#44576

f05d1de

yahonda added a commit to rsim/oracle-enhanced that referenced this pull request May 8, 2023

Merge pull request #2318 from yahonda/diag2306

3e2ea32

Support rails/rails#44576

matthewd mentioned this pull request Nov 7, 2023

Prevent marking broken connections as verified #49811

Merged

4 tasks

zvkemp mentioned this pull request Jan 3, 2024

Incorrect notification order in ActiveRecord transactions #50562

Open

davinlagerroos mentioned this pull request Feb 14, 2024

Updates to run tests with activerecord 7.1 rsim/oracle-enhanced#2359

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Defer verification of database connections #44576

Defer verification of database connections #44576

matthewd commented Feb 28, 2022

matthewd commented Mar 8, 2022

byroot left a comment

eileencodes left a comment

eileencodes Mar 10, 2022

matthewd Mar 31, 2022

eileencodes Mar 28, 2022

matthewd Mar 31, 2022

eileencodes Mar 28, 2022

matthewd Mar 31, 2022

eileencodes Mar 28, 2022

matthewd Mar 31, 2022

eileencodes Mar 28, 2022

matthewd Mar 31, 2022

eileencodes Mar 28, 2022

matthewd commented Mar 31, 2022

xiankaing commented Oct 27, 2023

Defer verification of database connections #44576

Defer verification of database connections #44576

Conversation

matthewd commented Feb 28, 2022

matthewd commented Mar 8, 2022

byroot left a comment

Choose a reason for hiding this comment

eileencodes left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

matthewd commented Mar 31, 2022

xiankaing commented Oct 27, 2023