New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry known idempotent SELECT queries on connection-related exceptions #51336
Retry known idempotent SELECT queries on connection-related exceptions #51336
Conversation
39a2400
to
4323c84
Compare
Test failures seem unrelated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach. I'm not sure if the by not retrying queries that have SQL literals we will have any practical effect with this PR.
Did you investigate how many queries Rails generates with SQLLiteral nodes that we know are retriable?
This looks fantastic, thank you! It'd be nice to have some test coverage at the Arel level, where we could just focus on "does visiting this node make us un-retryable?", and consequently be exhaustive without all the work of actually constructing queries and observing real reconnect/retries [in addition to having some of that full-stack coverage, as you do now]... but given we don't seem to have so-shaped tests for I have read through
I do hope that we can independently figure out anything that's making Trilogy connections more prone to errors that would benefit from retrying (#51335?) -- this improves our ability to retry when necessary, but that still feels more like an improved workaround than a proper solution to the problem you're describing in the changelog. Scrolling through, I'm spotting a couple of places where |
I think there are a few places we internally use |
4323c84
to
c5164f6
Compare
Done!
Agree, this mitigates the issue slightly but is ultimately a bandaid solution. It seems like the root of the issue is that it's easy for Trilogy to end up with a closed connection -- when an error (e.g. timeout) is encountered, in the case of infra blips or deploys, etc. We've generally been seeing connection errors pop up in long running background jobs, where Rails' ability to reconnect between requests / units of work doesn't help us. Mysql2 had an auto reconnect feature (though this appears to be deprecated for MySQL 8?), which was possibly why this wasn't a problem for us before.
Agree, I think a lot of the internal SQL literals that are generated can be marked as "known idempotent", but this likely requires introducing a new Arel node or a safe SQL literal variant. Still, I think a good portion of queries will benefit from the changes in this PR, so this feels like a good first step. We might also want to make updates / deletes retryable eventually.. |
7d9c15d
to
98de07f
Compare
Actually, let me whip up a safe-SqlLiteral variant as part of this PR. Should be straightforward. |
56915c3
to
f2a8bf9
Compare
Okay, I've added f2a8bf9. If we're okay merging this as a first pass, we can expand what we consider to be safe in future PRs. |
Test are failing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updates and deletes are not safely retryable, because if we don't know whether the first one occurred, we can't trust that e.g. we aren't deleting a row that someone else inserted in between.
@@ -93,7 +93,7 @@ def strict_loading? | |||
def append_constraints(connection, join, constraints) | |||
if join.is_a?(Arel::Nodes::StringJoin) | |||
join_string = Arel::Nodes::And.new(constraints.unshift join.left) | |||
join.left = Arel.sql(connection.visitor.compile(join_string)) | |||
join.left = Arel.sql(connection.visitor.compile(join_string), retryable: true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is join_string
never app-supplied?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT no, these are only constructed internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI when join.is_a?(Arel::Nodes::StringJoin)
, the join is a user supplied string.
e.g.
has_many :taggings_with_no_tag, -> { joins("LEFT OUTER JOIN tags ON tags.id = taggings.tag_id").where("tags.id": nil) }, as: :taggable, class_name: "Tagging" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks @kamipo , I was struggling to find which codepath would test this, and settled for trying to walk through the code to deduce whether this could come from the app. I'll cut a quick PR to fix :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -446,7 +446,7 @@ def aggregate_column(column_name) | |||
return column_name if Arel::Expressions === column_name | |||
|
|||
arel_column(column_name.to_s) do |name| | |||
Arel.sql(column_name == :all ? "*" : name) | |||
Arel.sql(column_name == :all ? "*" : name, retryable: true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure name
here is an arbitrary string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arf, yeah you're right, I thought we always resolved arel_column
from the columns hash on the class, but there's an else
branch where we just yield the field directly:
rails/activerecord/lib/active_record/relation/query_methods.rb
Lines 1863 to 1877 in 7c68c52
def arel_column(field) | |
field = klass.attribute_aliases[field] || field | |
from = from_clause.name || from_clause.value | |
if klass.columns_hash.key?(field) && (!from || table_name_matches?(from)) | |
table[field] | |
elsif field.match?(/\A\w+\.\w+\z/) | |
table, column = field.split(".") | |
predicate_builder.resolve_arel_attribute(table, column) do | |
lookup_table_klass_from_join_dependencies(table) | |
end | |
else | |
yield field | |
end | |
end |
@@ -3,7 +3,7 @@ | |||
module Arel # :nodoc: all | |||
module Collectors | |||
class Composite | |||
attr_accessor :preparable | |||
attr_accessor :preparable, :retryable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this (and preparable
) should propagate to the composed left
and right
collectors when assigned 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (retryable
only, don't want to mix changes to preparable
into this PR 😅 )
83d129b
to
7327f20
Compare
@matthewd thanks for flagging a couple codepaths where user-generated SQL could be passed through -- I've removed the retryable option from any such nodes. Let me know if you think this is missing anything else. |
This commit makes two types of queries retry-able by opting into our `allow_retry` flag: 1) SELECT queries we construct by walking the Arel tree via `#to_sql_and_binds`. We use a new `retryable` attribute on collector classes, which defaults to true for most node types, but will be set to false for non-idempotent node types (functions, SQL literals, etc). The `retryable` value is returned from `#to_sql_and_binds` and used by `#select_all` and passed down the call stack, eventually reaching the adapter's `#internal_exec_query` method. Internally-generated SQL literals are marked as retryable via a new `retryable` attribute on `Arel::Nodes::SqlLiteral`. 2) `#find` and `#find_by` queries with known attributes. We set `allow_retry: true` in `#cached_find_by`, and pass this down to `#find_by_sql` and `#_query_by_sql`. These changes ensure that queries we know are safe to retry can be retried automatically.
7327f20
to
eabcff2
Compare
This is great! Thanks again for pushing through the much more complicated approach! ❤️ It'll be interesting to see what portion of real-world queries this makes "naturally retryable". |
Motivation / Background
Take 2 of #51166, but rather than assuming that any SQL coming from the
#select
methods is safe to retry, we retry only queries we have constructed and thus know to be idempotent.Detail
This PR makes two types of queries retry-able by opting into our
allow_retry
flag:SELECT queries we construct by walking the Arel tree via
#to_sql_and_binds
. We use a newretryable
attribute on collector classes, which defaults to true for most node types, but will be set to false for non-idempotent node types (functions, SQL literals, update / delete / insert statements, etc). Theretryable
value is returned from#to_sql_and_binds
and used by#select_all
and passed down the call stack, eventually reaching the adapter's#internal_exec_query
method.#find
and#find_by
queries with known attributes. We setallow_retry: true
in#cached_find_by
, and pass this down to#find_by_sql
and#_query_by_sql
.These changes ensure that queries we know are safe to retry can be retried automatically.
Additional information
To verify:
BEGIN
.NamedFunction
andSqlLiteral
nodes to be non-idempotent, which likely excludes a lot of potential queries that would be retry-able, but this is the safest option for now.cached_find_by
always guaranteed to produce an idempotent query? This is the only way to ensurePost.find
/Post.find_by(title: "foo")
etc. are retryable, and I definitely think we should make these types of queries retryable.find_by
with a SQL fragment supplied as args will be flagged as non-retryable.cc @matthewd
Checklist
Before submitting the PR make sure the following are checked:
[Fix #issue-number]