Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid coercing all SELECTs with the same column name to the same value #36186

Closed

Conversation

boblail
Copy link
Contributor

@boblail boblail commented May 5, 2019

Overview of Issue

Rails had been mapping types to columns in a Result Set by indexing types on columns' names. When two values in a Result Set have the same column name, they are forced to share a type.

There are several scenarios where multiple values may share the same column name.

One involves

  1. One scenario (where two tables happen to have columns with the same name but the column's values differ) is explained in Plucking associated data from an enum column returns nil #36042. This PR does not resolve the issue in Plucking associated data from an enum column returns nil #36042 but proved part of a solution to that issue.

  2. Another is when using functions in a .select or .pluck. In the absence of an Alias clause, Postgres names values with the last function called, so if two values in the same query happen to use COALESCE, they'll both be named "coalesce".

    psql> SELECT COALESCE(NULL, 'four'), COALESCE(NULL, 4);
     coalesce | coalesce
    ----------+----------
     four     |        4
    (1 row)
    
    ActiveRecord::Base.pluck(
      Arel.sql("COALESCE(NULL, 'four')"),
      Arel.sql("COALESCE(NULL, 4)")
    ) # => [[0, 4]]

I was able to reproduce this on Rails 4.2.11.1, 5.0.7.2, 5.1.7, 5.2.3, and master.

Solution

This commit matches types with columns and values (in a row) by array position (which is an assumption already made in PostgreSQL::DatabaseStatements#exec_query). It adds column_types as a public method on ActiveRecord::Result to preserve the class's public API.

Concerns

Does Rails expect client code to be constructing ActiveRecord::Result objects? Do we need to check whether its third argument is a Hash, give — maybe — a deprecation warning and convert it to an Array?

@boblail boblail force-pushed the solve-key-collision-with-pg-types branch 2 times, most recently from 50e3d69 to 030bf80 Compare May 6, 2019 01:23
@boblail boblail force-pushed the solve-key-collision-with-pg-types branch from 030bf80 to c9c6ec4 Compare July 26, 2019 04:04
@st0012
Copy link
Contributor

st0012 commented Jul 28, 2019

Just some suggestions after quick review:

@boblail boblail force-pushed the solve-key-collision-with-pg-types branch 2 times, most recently from 49f1f16 to 4c98921 Compare September 8, 2019 21:07
@rails-bot
Copy link

rails-bot bot commented Dec 17, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot
Copy link

rails-bot bot commented Apr 11, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot rails-bot bot added the stale label Apr 11, 2020
@boblail boblail force-pushed the solve-key-collision-with-pg-types branch from 3f05186 to af637b2 Compare April 18, 2020 14:34
@rails-bot rails-bot bot removed the stale label Apr 18, 2020
@boblail
Copy link
Contributor Author

boblail commented Apr 18, 2020

@st0012, rebased. I think the specs were failing b/c of a flaky test. Rebuilt and it looks OK now.

I added two test cases for this. The second — test_support_different_types_for_columns_with_the_same_name_but_different_tables is the same scenario as in #36042.

@boblail
Copy link
Contributor Author

boblail commented Apr 18, 2020

@kamipo, could I get 👀 on this PR? I know you've got a strong familiarity with ActiveRecord and Postgres. I'm not sure if mentioning you is the right protocol — sorry if it's not 😬. (I'm not quite sure how to promote my PR and solicit a review.)

🙏 Thanks!

activerecord/lib/active_record/result.rb Outdated Show resolved Hide resolved
@@ -34,13 +34,13 @@ module ActiveRecord
class Result
include Enumerable

attr_reader :columns, :rows, :column_types
attr_reader :columns, :rows, :types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a strong reason to rename the interface/instance variable to types. This can break something that's not covered by tests, e.g. the initialize_copy is now broken because it only copies @column_type but not @types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Good callout re: initialize_copy! I'll audit uses of the instance variable in this class.

I wouldn't expect the instance variable to be part of this classes interface, though, and I did implement column_types (below) to maintain the classes interface.

My concern with simply reusing @column_types for the new third argument to initialize is that the type of that instance variable has changed (from a dictionary of types correlated to columns by name to an array of types correlated to columns by position).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initialize_copy appears to be the only place where @column_types was referenced — I made that change! 👍 ✅

activerecord/lib/active_record/result.rb Outdated Show resolved Hide resolved
Fixes rails#36042

Rails had been mapping types to columns in a Result Set by indexing types on columns' names. When two values in a Result Set have the same column name, they are forced to share a type.

There are several scenarios where multiple values may share the same column name.

1. One (where two tables happen to have columns with the same name but the column's values differ) is outlined in rails#36042.

2. Another is when using functions in a `.select` or `.pluck`. In the absence of an Alias clause, Postgres names values with the last function called, so if two values in the same query happen to use `COALESCE`, they'll both be named `"coalesce"`.

   ```ruby
   ActiveRecord::Base.pluck(
     Arel.sql("COALESCE(NULL, 'four')"),
     Arel.sql("COALESCE(NULL, 4)")
   )
   ```

This commit matches `types` with `columns` and `values` (in a row) by array position (which is an assumption already made in `PostgreSQL::DatabaseStatements#exec_query`). It adds `column_types` as a public method on `ActiveRecord::Result` to preserve the class's public API.

Does Rails expect client code to be constructing `ActiveRecord::Result` objects? Do we need to check whether its third argument is a `Hash`, give — maybe — a deprecation warning and convert it to an `Array`?
@boblail boblail force-pushed the solve-key-collision-with-pg-types branch from 15fe337 to 20f3f0e Compare April 22, 2020 01:24

def column_types
if @types.present?
Hash[@columns.zip(@types)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is sometimes called inside a loop:

calculated_data.column_types.fetch(aliaz, Type.default_value)

The result should be memoised so that it will only be calculated once, as it was before this change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's worth roflscaling this a bit, similar to hash_rows below, since this implementation allocates more intermediate objects than before (a two-element array for each column, and an array to hold them all).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since @st0012 was suggesting it earlier, I roflscaled this by assigning @column_types in initialize

activerecord/lib/active_record/result.rb Outdated Show resolved Hide resolved
fields = result.fields
fields.each_with_index do |fname, i|
ftype = result.ftype i
fmod = result.fmod i
types[fname] = get_oid_type(ftype, fmod, fname)
types << get_oid_type(ftype, fmod, fname)
end
build_result(columns: fields, rows: result.values, column_types: types)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this column_types: parameter should be renamed to types:.

The default value is now of the wrong type too:

def build_result(columns:, rows:, column_types: {})

@kamipo
Copy link
Member

kamipo commented Apr 22, 2020

Actually this doesn't fix #36042, I'm not excited to change the cast_values' behavior which is private API unless it does fix public API (e.g. pluck etc).

@boblail boblail force-pushed the solve-key-collision-with-pg-types branch from 916d46b to f4630d8 Compare April 22, 2020 21:49
@boblail
Copy link
Contributor Author

boblail commented Apr 22, 2020

@kamipo, great callout! This does not fix #36042. We'd have to handle type_overrides differently here.

I added a test, though, to demonstrate a bug with pluck that this PR does still fix: 6b57bf9

When run in master, the new test fails like this:
Screen Shot 2020-04-22 at 4 35 28 PM

@kamipo
Copy link
Member

kamipo commented Apr 22, 2020

I know this issue affects only for PostgreSQL adapter (originally works for other MySQL, SQLite3, Oracle, and SQL Server adapters), I'm still not excited extra allocation for all cases to address such a minor case.

Like as a workaround for #36042, is it not enough to give a column alias?

relation.pluck(Arel.sql("COALESCE(format, 'unspecified') a, COALESCE(nullable_status, 0) b"))

@boblail
Copy link
Contributor Author

boblail commented Apr 23, 2020

@kamipo,

Yep! Using an alias is a totally viable workaround! (When I first made this PR, my team also went that route: boblail/pluck_map#14)

So if you want to decline this PR, we do have an acceptable path forward.

I just think that the behavior is surprising — especially since you get different results from MySQL and Sqlite3. For the developer troubleshooting the issue, the root cause is not immediately obvious!!

Only a developer who has spelunked in Rails' source would be able to tell that Rails keys type information on name and Postgres returns more-generic names than the other two databases:

Here's the output from each database in its REPL:

SQLite
sqlite> select COALESCE(NULL, 'four'), COALESCE(NULL, 4);
COALESCE(NULL, 'four')  COALESCE(NULL, 4)
----------------------  -----------------
four                    4
MySQL
mysql> select COALESCE(NULL, 'four'), COALESCE(NULL, 4);
+------------------------+-------------------+
| COALESCE(NULL, 'four') | COALESCE(NULL, 4) |
+------------------------+-------------------+
| four                   |                 4 |
+------------------------+-------------------+
1 row in set (0.00 sec)
PostgreSQL
boblail=# select COALESCE(NULL, 'four'), COALESCE(NULL, 4);
 coalesce | coalesce
----------+----------
 four     |        4
(1 row)

Do you think it would make sense for Rails to raise an exception if an ActiveRecord::Result uses the same name more than once???

OR — the extra allocation you're referring to — is that the Hash of @column_types in addition to the Array of @types? Would this PR be acceptable if we did that allocation only on-demand or could do without it?

(Thanks for the feedback!! 😄)

@simi
Copy link
Contributor

simi commented Apr 23, 2020

As @boblail mentioned (and I agree)

ActiveRecord::Base.pluck(
  Arel.sql("COALESCE(NULL, 'four')"),
  Arel.sql("COALESCE(NULL, 4)")
) # => [[0, 4]]

is not expected result. Anyway I agree with @kamipo this is big change to handle this (probably minor) problem. It would be nice to at least mention this in documentation or raise on unsupported (conflicting) input to not surprise developers (and keep the Rails experience nice).

kamipo added a commit to kamipo/rails that referenced this pull request May 1, 2020
`column_types` is empty except PostgreSQL adapter, and
`attribute_types.each_key { |k| column_types.delete k }` is also empty
even if PostgreSQL adapter almost all case, so that code is quite
useless. This improves performance for `find_by_sql` to avoid that
useless loop as much as possible.

```ruby
ActiveRecord::Schema.define do
  create_table :active_storage_blobs do |t|
    t.string   :key,          null: false
    t.string   :filename,     null: false
    t.string   :content_type
    t.text     :metadata
    t.string   :service_name, null: false
    t.bigint   :byte_size,    null: false
    t.string   :checksum,     null: false
    t.datetime :created_at,   null: false

    t.index [ :key ], unique: true
  end
end

class ActiveStorageBlob < ActiveRecord::Base
end

Benchmark.ips do |x|
  x.report("find_by") { ActiveStorageBlob.find_by(id: 1) }
end
```

Before:

```
Warming up --------------------------------------
             find_by     1.256k i/100ms
Calculating -------------------------------------
             find_by     12.595k (± 3.4%) i/s -     64.056k in   5.091599s
```

After:

```
Warming up --------------------------------------
             find_by     1.341k i/100ms
Calculating -------------------------------------
             find_by     13.170k (± 3.5%) i/s -     67.050k in   5.097439s
```

To avoid column types loop for PostgreSQL adapter, this skips returning
additional column types if a column has already been type casted by pg
decoders. Fortunately this fixes rails#36186 partly for common types.
@kamipo kamipo closed this in #39097 May 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants