New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid coercing all SELECTs with the same column name to the same value #36186
Conversation
50e3d69
to
030bf80
Compare
030bf80
to
c9c6ec4
Compare
Just some suggestions after quick review:
|
49f1f16
to
4c98921
Compare
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
4c98921
to
9fd7b62
Compare
9fd7b62
to
7e0e269
Compare
df61f33
to
7270b6c
Compare
7270b6c
to
3f05186
Compare
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
3f05186
to
af637b2
Compare
@kamipo, could I get 👀 on this PR? I know you've got a strong familiarity with ActiveRecord and Postgres. I'm not sure if mentioning you is the right protocol — sorry if it's not 😬. (I'm not quite sure how to promote my PR and solicit a review.) 🙏 Thanks! |
@@ -34,13 +34,13 @@ module ActiveRecord | |||
class Result | |||
include Enumerable | |||
|
|||
attr_reader :columns, :rows, :column_types | |||
attr_reader :columns, :rows, :types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a strong reason to rename the interface/instance variable to types
. This can break something that's not covered by tests, e.g. the initialize_copy
is now broken because it only copies @column_type
but not @types
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Good callout re: initialize_copy
! I'll audit uses of the instance variable in this class.
I wouldn't expect the instance variable to be part of this classes interface, though, and I did implement column_types
(below) to maintain the classes interface.
My concern with simply reusing @column_types
for the new third argument to initialize
is that the type of that instance variable has changed (from a dictionary of types correlated to columns by name to an array of types correlated to columns by position).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initialize_copy
appears to be the only place where @column_types
was referenced — I made that change! 👍 ✅
Fixes rails#36042 Rails had been mapping types to columns in a Result Set by indexing types on columns' names. When two values in a Result Set have the same column name, they are forced to share a type. There are several scenarios where multiple values may share the same column name. 1. One (where two tables happen to have columns with the same name but the column's values differ) is outlined in rails#36042. 2. Another is when using functions in a `.select` or `.pluck`. In the absence of an Alias clause, Postgres names values with the last function called, so if two values in the same query happen to use `COALESCE`, they'll both be named `"coalesce"`. ```ruby ActiveRecord::Base.pluck( Arel.sql("COALESCE(NULL, 'four')"), Arel.sql("COALESCE(NULL, 4)") ) ``` This commit matches `types` with `columns` and `values` (in a row) by array position (which is an assumption already made in `PostgreSQL::DatabaseStatements#exec_query`). It adds `column_types` as a public method on `ActiveRecord::Result` to preserve the class's public API. Does Rails expect client code to be constructing `ActiveRecord::Result` objects? Do we need to check whether its third argument is a `Hash`, give — maybe — a deprecation warning and convert it to an `Array`?
15fe337
to
20f3f0e
Compare
|
||
def column_types | ||
if @types.present? | ||
Hash[@columns.zip(@types)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is sometimes called inside a loop:
calculated_data.column_types.fetch(aliaz, Type.default_value) |
The result should be memoised so that it will only be calculated once, as it was before this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it's worth roflscaling this a bit, similar to hash_rows
below, since this implementation allocates more intermediate objects than before (a two-element array for each column, and an array to hold them all).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since @st0012 was suggesting it earlier, I roflscaled this by assigning @column_types
in initialize
fields = result.fields | ||
fields.each_with_index do |fname, i| | ||
ftype = result.ftype i | ||
fmod = result.fmod i | ||
types[fname] = get_oid_type(ftype, fmod, fname) | ||
types << get_oid_type(ftype, fmod, fname) | ||
end | ||
build_result(columns: fields, rows: result.values, column_types: types) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this column_types:
parameter should be renamed to types:
.
The default value is now of the wrong type too:
def build_result(columns:, rows:, column_types: {}) |
Actually this doesn't fix #36042, I'm not excited to change the |
916d46b
to
f4630d8
Compare
I know this issue affects only for PostgreSQL adapter (originally works for other MySQL, SQLite3, Oracle, and SQL Server adapters), I'm still not excited extra allocation for all cases to address such a minor case. Like as a workaround for #36042, is it not enough to give a column alias? relation.pluck(Arel.sql("COALESCE(format, 'unspecified') a, COALESCE(nullable_status, 0) b")) |
Yep! Using an alias is a totally viable workaround! (When I first made this PR, my team also went that route: boblail/pluck_map#14) So if you want to decline this PR, we do have an acceptable path forward. I just think that the behavior is surprising — especially since you get different results from MySQL and Sqlite3. For the developer troubleshooting the issue, the root cause is not immediately obvious!! Only a developer who has spelunked in Rails' source would be able to tell that Rails keys type information on name and Postgres returns more-generic names than the other two databases:
Do you think it would make sense for Rails to raise an exception if an OR — the extra allocation you're referring to — is that the Hash of (Thanks for the feedback!! 😄) |
As @boblail mentioned (and I agree) ActiveRecord::Base.pluck(
Arel.sql("COALESCE(NULL, 'four')"),
Arel.sql("COALESCE(NULL, 4)")
) # => [[0, 4]] is not expected result. Anyway I agree with @kamipo this is big change to handle this (probably minor) problem. It would be nice to at least mention this in documentation or raise on unsupported (conflicting) input to not surprise developers (and keep the Rails experience nice). |
`column_types` is empty except PostgreSQL adapter, and `attribute_types.each_key { |k| column_types.delete k }` is also empty even if PostgreSQL adapter almost all case, so that code is quite useless. This improves performance for `find_by_sql` to avoid that useless loop as much as possible. ```ruby ActiveRecord::Schema.define do create_table :active_storage_blobs do |t| t.string :key, null: false t.string :filename, null: false t.string :content_type t.text :metadata t.string :service_name, null: false t.bigint :byte_size, null: false t.string :checksum, null: false t.datetime :created_at, null: false t.index [ :key ], unique: true end end class ActiveStorageBlob < ActiveRecord::Base end Benchmark.ips do |x| x.report("find_by") { ActiveStorageBlob.find_by(id: 1) } end ``` Before: ``` Warming up -------------------------------------- find_by 1.256k i/100ms Calculating ------------------------------------- find_by 12.595k (± 3.4%) i/s - 64.056k in 5.091599s ``` After: ``` Warming up -------------------------------------- find_by 1.341k i/100ms Calculating ------------------------------------- find_by 13.170k (± 3.5%) i/s - 67.050k in 5.097439s ``` To avoid column types loop for PostgreSQL adapter, this skips returning additional column types if a column has already been type casted by pg decoders. Fortunately this fixes rails#36186 partly for common types.
Overview of Issue
Rails had been mapping types to columns in a Result Set by indexing types on columns' names. When two values in a Result Set have the same column name, they are forced to share a type.
There are several scenarios where multiple values may share the same column name.
One involves
One scenario (where two tables happen to have columns with the same name but the column's values differ) is explained in Plucking associated data from an enum column returns nil #36042. This PR does not resolve the issue in Plucking associated data from an enum column returns nil #36042 but proved part of a solution to that issue.
Another is when using functions in a
.select
or.pluck
. In the absence of an Alias clause, Postgres names values with the last function called, so if two values in the same query happen to useCOALESCE
, they'll both be named"coalesce"
.I was able to reproduce this on Rails 4.2.11.1, 5.0.7.2, 5.1.7, 5.2.3, and master.
Solution
This commit matches
types
withcolumns
andvalues
(in a row) by array position (which is an assumption already made inPostgreSQL::DatabaseStatements#exec_query
). It addscolumn_types
as a public method onActiveRecord::Result
to preserve the class's public API.Concerns
Does Rails expect client code to be constructing
ActiveRecord::Result
objects? Do we need to check whether its third argument is aHash
, give — maybe — a deprecation warning and convert it to anArray
?