Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defer verification of database connections #44576

Merged
merged 9 commits into from
Jul 29, 2022
Merged

Defer verification of database connections #44576

merged 9 commits into from
Jul 29, 2022

Conversation

matthewd
Copy link
Member

Instead of verifying upon checkout, wait until we run a query.

If the query we need to run is safely retryable (including schema inspection, and most notably BEGIN), we don't need to explicitly verify, and can just recover if the query fails.

By extension, we can apply the same connection-recovery logic to every retryable query we run -- so e.g. the beginning of any top-level transaction is a safe transparent DB-reconnection point.

Base automatically changed from restore-transactions to main March 3, 2022 17:10
@matthewd matthewd force-pushed the defer-db-verify branch 2 times, most recently from 4a45007 to e02e06e Compare March 8, 2022 15:45
@matthewd
Copy link
Member Author

matthewd commented Mar 8, 2022

Still draft because tests are very much needed, but I think this is at least in a state for us to decide whether we're comfortable with the API changes I'm making, etc.

cc @yahonda as I can only request reviews from committers

Copy link
Member

@byroot byroot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor "tactical" comments.

The overall approach seem good to me so far.

@matthewd matthewd force-pushed the defer-db-verify branch 2 times, most recently from 910340d to 8ff11be Compare March 23, 2022 13:07
@matthewd matthewd marked this pull request as ready for review March 23, 2022 13:27
Copy link
Member

@eileencodes eileencodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, but this mostly looks good and correct to me. I am wondering if some of the refactoring changes can be pulled out into separate PRs so it's easier to read some of the bigger changes.

@@ -145,6 +146,10 @@ def use_metadata_table?
@config.fetch(:use_metadata_table, true)
end

def connection_retries
(@config[:connection_retries] || 3).to_i
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 3 a number you chose or was it previously a default somewhere? If previously retries were 0, is it surprising behavior or potentially bad to retry if that's unexpected?

My main concern is that this is defaulting to 3 and if someone wants to turn it off they need to know it exists AND that it's set to 3 by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arbitrary choice.

On reflection I think I'll make it lower (1, maybe 2), but I do think it should be greater than zero: the whole idea is that this is a new better default behaviour, which is not believed to have any downsides for existing users. (There is no "review your code for X pattern before turning this on" sort of recommendation, so I don't think there's any value in applying it as a versioned default.)

Notably, we only retry on queries we're absolutely sure are safe (like "BEGIN"), or [in future] if we're absolutely sure the query didn't make it to the database server -- we'll never automatically retry something that has any chance of having executed on the server.

The ability to know the option exists and set it to zero is mostly just as a last-ditch safety mechanism, ensuring there is a workaround if some subset of users discover an unanticipated flaw in the logic that claims this is universally safe.

# In the latter case, we really ought to guarantee the inner
# call will not reconnect (which would interfere with the
# still-yielded connection in the outer block), but we currently
# provide no special enforcement there.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commentary reads like it's unsure what will really happen or that Rails should be doing more and isn't. Can you clarify this paragraph?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's an (internal, implementation-scoped) comment noting a limitation of the current implementation. My intention was basically to acknowledge there's an obscure corner case that we're not handling; if a future reader is investigating an issue of that shape [because it's more likely than I expected, or because of other implementation changes], it's a helpful breadcrumb that they're on the right trail. I'll try to rephrase.

else
# We don't know whether the connection is okay, but it also doesn't matter:
# we wouldn't be able to reconnect anyway. We're just going to run our query
# and hope for the best.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you did this so it's easy to comment but having 2 empty conditions feels odd and I think some of the comments aren't necessary. I think this can be simplified to the following:

if reconnectable
  unless allow_retry
     verify!
  end
end

Am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's verify! unless @verified || !reconnectable || allow_retry... it just seemed important to me to spell out the three very different reasons we might choose not to verify.

# Returns a raw connection for internal use with methods that are known
# to both be thread-safe and not rely upon actual server communication.
# This is useful for e.g. string escaping methods.
def any_raw_connection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this name because it feels like "get anything, literally any raw connection" but it's a specific raw connection right? I don't have a better idea other than continuing to use @raw_connection instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is partly a bleed-forward from #44591, where it becomes @raw_connection || connect. Adding it here, with its more defined cousins valid_raw_connection and with_raw_connection, helps in auditing existing users of @raw_connection: most of them should be [and in this PR, are] using with_raw_connection, but completing the set forces us to review each one, and where appropriate, explicitly opt in to this "I'll take absolutely anything you can offer" behaviour.

Contrasted with "valid", I think "anything, literally any" is a pretty fair representation of how this qualifies what it's returning. "A connection you already know is invalid" is rather extreme... I did briefly consider calling this even_an_invalid_raw_connection, but that felt a bit too pessimistic. (It's still an instance method, and IMO that scoping is sufficient to convey that it's still only our raw connection, just in any possible state.)

@@ -109,7 +112,7 @@ def default_insert_value(column)
end

def last_inserted_id(result)
@raw_connection.last_id
@raw_connection&.last_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So before we had a guaranteed raw_connection but now we might not have one? Does this change behavior of last_inserted_id?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is technically a tiny bleed-over of #44591 too -- this change alone doesn't introduce any occasions where @raw_connection can be nil. This is a notable occasion where we do want to still use the raw ivar, though, and not [the upcoming definition of] any_raw_connection: if we don't currently have a connection, there's no point establishing one just to ask what it last inserted, because the answer will definitely be nil.

This is not a behaviour change: a connection that hasn't yet performed an insert will have a nil last_id.

mark_transaction_written_if_write(sql)

type_casted_binds = type_casted_binds(binds)

log(sql, name, binds, type_casted_binds, async: async) do
ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
with_raw_connection do |conn|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this new method, much easier to understand what's going on in the block.

@matthewd
Copy link
Member Author

I am wondering if some of the refactoring changes can be pulled out into separate PRs

I think I already pulled out the more cleanly separable changes into preceding PRs (#44526 #44527 #44530 #44531 #44570)... I believe the remainder are all dependent on the key behaviour of with_raw_connection. Which did you have in mind?

So far this is just infrastructure: no errors are considered retryable,
and no real queries are marked retryable.

Even in this state, we can drop connection verification from checkout,
and defer it until the first query is ready. But the more interesting
within-reach win will be when 'BEGIN' is retryable, and we can recognise
connectivity-related exceptions.
We're still ultimately accessing the raw connection directly, but this
is necessary to keep the statement pool working across full reconnects.
We may be restoring a connection after #with_raw_connection has already
passed #materialize_transactions, so it's important we eagerly return
things to exactly the state they had before.
Connectivity issues need a reconnect to be retried, but there are also
in-server errors that can also be retried (lock timeouts and deadlocks).

To handle this better, we move translation into #with_raw_connection:
it's a much better representation of the boundary between the underlying
adapter library and Active Record's abstractions anyway. (It doesn't
know about the query being run, though, so we let #log enrich that into
the exception later.)
adrianna-chang-shopify added a commit to adrianna-chang-shopify/rails that referenced this pull request Sep 22, 2022
Building on the work done in rails#44576 and rails#44591, we extend the logic that automatically
reconnects broken db connections to take into account a timeout limit. This ensures
that retries + reconnects are slow-query aware, and that we don't retry queries
if a given amount of time has already passed since the query was first tried.

This value will default to 5 seconds, but can be adjusted via the `connection_retry_timeout`
config.
tagliala added a commit to ifad/chronomodel that referenced this pull request Oct 1, 2022
This works with either `adapter.materialize_transactions` before
the savepoint creation or by skipping the savepoint entirely and
reverting the migration by raising `ActiveRecord::Rollback`

Close #179

Ref: rails/rails@57bc28f rails/rails#44576
tagliala added a commit to ifad/chronomodel that referenced this pull request Oct 2, 2022
This works with either `adapter.materialize_transactions` before
the savepoint creation or by skipping the savepoint entirely and
reverting the migration by raising `ActiveRecord::Rollback`

Close #179

Ref: rails/rails@57bc28f rails/rails#44576
tagliala added a commit to ifad/chronomodel that referenced this pull request Oct 2, 2022
This works with either `adapter.materialize_transactions` before
the savepoint creation or by skipping the savepoint entirely and
reverting the migration by raising `ActiveRecord::Rollback`

Close #179

Ref: rails/rails@57bc28f rails/rails#44576
tagliala added a commit to ifad/chronomodel that referenced this pull request Oct 2, 2022
This works with either `adapter.materialize_transactions` before
the savepoint creation or by skipping the savepoint entirely and
reverting the migration by raising `ActiveRecord::Rollback`

Close #179

Ref: rails/rails@57bc28f rails/rails#44576
tagliala added a commit to ifad/chronomodel that referenced this pull request Oct 3, 2022
This works with either `adapter.materialize_transactions` before
the savepoint creation or by skipping the savepoint entirely and
reverting the migration by raising `ActiveRecord::Rollback`

Close #179

Ref: rails/rails@57bc28f rails/rails#44576
tagliala added a commit to ifad/chronomodel that referenced this pull request Oct 8, 2022
This works with either `adapter.materialize_transactions` before
the savepoint creation or by skipping the savepoint entirely and
reverting the migration by raising `ActiveRecord::Rollback`

Close #179

Ref: rails/rails@57bc28f rails/rails#44576
tagliala added a commit to ifad/chronomodel that referenced this pull request Oct 8, 2022
This works with either `adapter.materialize_transactions` before
the savepoint creation or by skipping the savepoint entirely and
reverting the migration by raising `ActiveRecord::Rollback`

Close #179

Ref: rails/rails@57bc28f rails/rails#44576
tagliala added a commit to ifad/chronomodel that referenced this pull request Oct 8, 2022
This works with either `adapter.materialize_transactions` before
the savepoint creation or by skipping the savepoint entirely and
reverting the migration by raising `ActiveRecord::Rollback`

Close #179

Ref: rails/rails@57bc28f rails/rails#44576
yahonda added a commit to yahonda/oracle-enhanced that referenced this pull request Mar 20, 2023
yahonda added a commit to yahonda/oracle-enhanced that referenced this pull request Apr 11, 2023
yahonda added a commit to yahonda/oracle-enhanced that referenced this pull request May 2, 2023
yahonda added a commit to yahonda/oracle-enhanced that referenced this pull request May 8, 2023
yahonda added a commit to rsim/oracle-enhanced that referenced this pull request May 8, 2023
@xiankaing
Copy link

thank you!

davinlagerroos pushed a commit to umn-asr/oracle-enhanced that referenced this pull request Feb 14, 2024
We can't call @raw_connection right away anymore, because it is nil until
it is used after the changes in rails/rails#44576.

Also we need to have @lock established in order to be able to use
`with_raw_connection`. In the abstract_adapter implementation, that happens
after the arel_visitor is set, but in oracle enhanced, we need to use
`with_raw_connection` to check the database version in order to pick with
visitor to use. Calling lock_thread = nil before super means @lock isi
available to set the arel_visitor.

PR#44576 also requires an internal_exec_query method that is called by the
abstract adpater's exec_query. Naively renaming the oracle-enhanced's
exec_query to internal_exec_query makes the test pass, but there is probably
more to be done here.
davinlagerroos pushed a commit to umn-asr/oracle-enhanced that referenced this pull request Feb 14, 2024
rails/rails#44576 adds `with_raw_connection`. Using
that in database_statements instead of @raw_connection mostly works, except
 it broke 2 tests:
* ["should reconnect and execute query if connection is lost and auto retry is enabled"](https://github.com/rsim/oracle-enhanced/blob/deb214decc3799608c8be386e91c6c7531c59793/spec/active_record/connection_adapters/oracle_enhanced/connection_spec.rb#L493)
* ["among other things will reconnect and retry if there is a connection error.](https://github.com/rsim/oracle-enhanced/blob/deb214decc3799608c8be386e91c6c7531c59793/spec/active_record/connection_adapters/oracle_enhanced/connection_spec.rb#L500)

`with_raw_connection` has a retry and reconnect built into it, but it was
catching the oracle connection error. Adjust `retryable_connection_error?`
so that both is aware of the OCI errors and reflects the `should_retry?`
logic of the `with_retry` that database_statements used before.
davinlagerroos pushed a commit to umn-asr/oracle-enhanced that referenced this pull request Feb 14, 2024
We can't call @raw_connection right away anymore, because it is nil until
it is used after the changes in rails/rails#44576.

Also we need to have @lock established in order to be able to use
`with_raw_connection`. In the abstract_adapter implementation, that happens
after the arel_visitor is set, but in oracle enhanced, we need to use
`with_raw_connection` to check the database version in order to pick with
visitor to use. Calling lock_thread = nil before super means @lock isi
available to set the arel_visitor.

Rails rails/rails#48188 also requires an internal_exec_query
method that is called by the abstract adpater's exec_query. Naively renaming
the oracle-enhanced's exec_query to internal_exec_query makes the test run,
but there is probably more to be done here.
davinlagerroos pushed a commit to umn-asr/oracle-enhanced that referenced this pull request Feb 14, 2024
We can't call @raw_connection right away anymore, because it is nil until
it is used after the changes in rails/rails#44576
and rails/rails#44591.

Also we need to have @lock established in order to be able to use
`with_raw_connection`. In the abstract_adapter implementation, that happens
after the arel_visitor is set, but in oracle enhanced, we need to use
`with_raw_connection` to check the database version in order to pick with
visitor to use. Calling lock_thread = nil before super means @lock isi
available to set the arel_visitor.

Rails rails/rails#48188 also requires an internal_exec_query
method that is called by the abstract adpater's exec_query. Naively renaming
the oracle-enhanced's exec_query to internal_exec_query makes the oracle
enhanced tests run.

I also saw errors due to `active?` using @raw_connection. `valid_raw_connection`
ends up calling `active?` via `with_raw_connection` and `verify` so that
can't be used as a replacement for @raw_connection here. Borrowing from
the postgres adapter implementation to get around this.
davinlagerroos pushed a commit to umn-asr/oracle-enhanced that referenced this pull request Feb 14, 2024
We can't call @raw_connection right away anymore, because it is nil until
it is used after the changes in rails/rails#44576
and rails/rails#44591.

Also we need to have @lock established in order to be able to use
`with_raw_connection`. In the abstract_adapter implementation, that happens
after the arel_visitor is set, but in oracle enhanced, we need to use
`with_raw_connection` to check the database version in order to pick with
visitor to use. Calling lock_thread = nil before super means @lock isi
available to set the arel_visitor.

Rails rails/rails#48229 also requires an internal_exec_query
method that is called by the abstract adpater's exec_query. Naively renaming
the oracle-enhanced's exec_query to internal_exec_query makes the oracle
enhanced tests run.

I also saw errors due to `active?` using @raw_connection. `valid_raw_connection`
ends up calling `active?` via `with_raw_connection` and `verify` so that
can't be used as a replacement for @raw_connection here. Borrowing from
the postgres adapter implementation to get around this.
davinlagerroos pushed a commit to umn-asr/oracle-enhanced that referenced this pull request Feb 21, 2024
We can't call @raw_connection right away anymore, because it is nil until
it is used after the changes in rails/rails#44576.

Also we need to have @lock established in order to be able to use
`with_raw_connection`. In the abstract_adapter implementation, that happens
after the arel_visitor is set, but in oracle enhanced, we need to use
`with_raw_connection` to check the database version in order to pick with
visitor to use. Calling lock_thread = nil before super means @lock isi
available to set the arel_visitor.

PR#44576 also requires an internal_exec_query method that is called by the
abstract adpater's exec_query. Naively renaming the oracle-enhanced's
exec_query to internal_exec_query makes the test pass, but there is probably
more to be done here.
davinlagerroos pushed a commit to umn-asr/oracle-enhanced that referenced this pull request Feb 21, 2024
rails/rails#44576 adds `with_raw_connection`. Using
that in database_statements instead of @raw_connection mostly works, except
 it broke 2 tests:
* ["should reconnect and execute query if connection is lost and auto retry is enabled"](https://github.com/rsim/oracle-enhanced/blob/deb214decc3799608c8be386e91c6c7531c59793/spec/active_record/connection_adapters/oracle_enhanced/connection_spec.rb#L493)
* ["among other things will reconnect and retry if there is a connection error.](https://github.com/rsim/oracle-enhanced/blob/deb214decc3799608c8be386e91c6c7531c59793/spec/active_record/connection_adapters/oracle_enhanced/connection_spec.rb#L500)

`with_raw_connection` has a retry and reconnect built into it, but it was
catching the oracle connection error. Adjust `retryable_connection_error?`
so that both is aware of the OCI errors and reflects the `should_retry?`
logic of the `with_retry` that database_statements used before.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants