New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Avoid coercing all SELECTs with the same column name to the same value #36186

Closed

boblail wants to merge 4 commits into rails:master from boblail:solve-key-collision-with-pg-types

Contributor

boblail commented May 5, 2019 •

edited

Overview of Issue

Rails had been mapping types to columns in a Result Set by indexing types on columns' names. When two values in a Result Set have the same column name, they are forced to share a type.

There are several scenarios where multiple values may share the same column name.

One involves

One scenario (where two tables happen to have columns with the same name but the column's values differ) is explained in Plucking associated data from an enum column returns nil #36042. This PR does not resolve the issue in Plucking associated data from an enum column returns nil #36042 but proved part of a solution to that issue.

Another is when using functions in a .select or .pluck. In the absence of an Alias clause, Postgres names values with the last function called, so if two values in the same query happen to use COALESCE, they'll both be named "coalesce".

psql> SELECT COALESCE(NULL, 'four'), COALESCE(NULL, 4);
 coalesce | coalesce
----------+----------
 four     |        4
(1 row)

ActiveRecord::Base.pluck(
  Arel.sql("COALESCE(NULL, 'four')"),
  Arel.sql("COALESCE(NULL, 4)")
) # => [[0, 4]]

I was able to reproduce this on Rails 4.2.11.1, 5.0.7.2, 5.1.7, 5.2.3, and master.

Solution

This commit matches types with columns and values (in a row) by array position (which is an assumption already made in PostgreSQL::DatabaseStatements#exec_query). It adds column_types as a public method on ActiveRecord::Result to preserve the class's public API.

Concerns

Does Rails expect client code to be constructing ActiveRecord::Result objects? Do we need to check whether its third argument is a Hash, give — maybe — a deprecation warning and convert it to an Array?

rails-bot bot added the activerecord label

boblail mentioned this pull request

Restored alias to single, non-symbol attribute selects boblail/pluck_map#14

Merged

boblail force-pushed the solve-key-collision-with-pg-types branch 2 times, most recently from 50e3d69 to 030bf80 Compare

May 6, 2019 01:23

boblail force-pushed the solve-key-collision-with-pg-types branch from 030bf80 to c9c6ec4 Compare

July 26, 2019 04:04

Contributor

st0012 commented Jul 28, 2019

Just some suggestions after quick review:

I think the solution makes sense, great job 😄
Can you fix the codeclimate issue? So it doesn't look broken on the PR list page.
Since you this PR fixes Plucking associated data from an enum column returns nil #36042, can you add some test cases similar to the ones in that issue?

boblail force-pushed the solve-key-collision-with-pg-types branch 2 times, most recently from 49f1f16 to 4c98921 Compare

September 8, 2019 21:07

rails-bot bot commented Dec 17, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

rails-bot bot added the stale label

boblail force-pushed the solve-key-collision-with-pg-types branch from 4c98921 to 9fd7b62 Compare

December 19, 2019 02:06

rails-bot bot removed the stale label

boblail force-pushed the solve-key-collision-with-pg-types branch from 9fd7b62 to 7e0e269 Compare

December 21, 2019 14:46

boblail force-pushed the solve-key-collision-with-pg-types branch 2 times, most recently from df61f33 to 7270b6c Compare

January 12, 2020 20:31

boblail requested a review from rafaelfranca as a code owner

January 12, 2020 20:31

rails-bot bot added actioncable actionmailbox actionmailer labels

boblail force-pushed the solve-key-collision-with-pg-types branch from 7270b6c to 3f05186 Compare

January 12, 2020 20:32

eugeneius removed actioncable actionmailbox actionmailer labels

eugeneius removed the request for review from rafaelfranca

January 12, 2020 23:43

rails-bot bot commented Apr 11, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

rails-bot bot added the stale label

boblail force-pushed the solve-key-collision-with-pg-types branch from 3f05186 to af637b2 Compare

April 18, 2020 14:34

rails-bot bot removed the stale label

Contributor Author

boblail commented Apr 18, 2020

@st0012, rebased. I think the specs were failing b/c of a flaky test. Rebuilt and it looks OK now.

I added two test cases for this. The second — test_support_different_types_for_columns_with_the_same_name_but_different_tables is the same scenario as in #36042.

Contributor Author

boblail commented Apr 18, 2020

@kamipo, could I get 👀 on this PR? I know you've got a strong familiarity with ActiveRecord and Postgres. I'm not sure if mentioning you is the right protocol — sorry if it's not 😬. (I'm not quite sure how to promote my PR and solicit a review.)

🙏 Thanks!

st0012 suggested changes

View reviewed changes

activerecord/lib/active_record/result.rb Outdated Show resolved Hide resolved

activerecord/lib/active_record/result.rb Outdated

    
            @@ -34,13 +34,13 @@ module ActiveRecord
          
                class Result

                  include Enumerable

                  attr_reader :columns, :rows, :column_types

                  attr_reader :columns, :rows, :types

Contributor

st0012 Apr 19, 2020

I don't see a strong reason to rename the interface/instance variable to types. This can break something that's not covered by tests, e.g. the initialize_copy is now broken because it only copies @column_type but not @types.

Contributor Author

boblail Apr 22, 2020

🤔 Good callout re: initialize_copy! I'll audit uses of the instance variable in this class.

I wouldn't expect the instance variable to be part of this classes interface, though, and I did implement column_types (below) to maintain the classes interface.

My concern with simply reusing @column_types for the new third argument to initialize is that the type of that instance variable has changed (from a dictionary of types correlated to columns by name to an array of types correlated to columns by position).

Contributor Author

boblail Apr 22, 2020

initialize_copy appears to be the only place where @column_types was referenced — I made that change! 👍 ✅

activerecord/lib/active_record/result.rb Outdated Show resolved Hide resolved


          Avoid coercing all SELECTs with the same column name to the same value

20f3f0e

Fixes rails#36042

Rails had been mapping types to columns in a Result Set by indexing types on columns' names. When two values in a Result Set have the same column name, they are forced to share a type.

There are several scenarios where multiple values may share the same column name.

1. One (where two tables happen to have columns with the same name but the column's values differ) is outlined in rails#36042.

2. Another is when using functions in a `.select` or `.pluck`. In the absence of an Alias clause, Postgres names values with the last function called, so if two values in the same query happen to use `COALESCE`, they'll both be named `"coalesce"`.

   ```ruby
   ActiveRecord::Base.pluck(
     Arel.sql("COALESCE(NULL, 'four')"),
     Arel.sql("COALESCE(NULL, 4)")
   )
   ```

This commit matches `types` with `columns` and `values` (in a row) by array position (which is an assumption already made in `PostgreSQL::DatabaseStatements#exec_query`). It adds `column_types` as a public method on `ActiveRecord::Result` to preserve the class's public API.

Does Rails expect client code to be constructing `ActiveRecord::Result` objects? Do we need to check whether its third argument is a `Hash`, give — maybe — a deprecation warning and convert it to an `Array`?

boblail force-pushed the solve-key-collision-with-pg-types branch from 15fe337 to 20f3f0e Compare

April 22, 2020 01:24

eugeneius reviewed

View reviewed changes

activerecord/lib/active_record/result.rb Outdated

+                  def column_types
+                    if @types.present?
+                      Hash[@columns.zip(@types)]

Member

eugeneius Apr 22, 2020

This method is sometimes called inside a loop:

rails/activerecord/lib/active_record/relation/calculations.rb

Line 373 in a3f62cb

calculated_data.column_types.fetch(aliaz, Type.default_value)

The result should be memoised so that it will only be calculated once, as it was before this change.

Member

eugeneius Apr 22, 2020

I wonder if it's worth roflscaling this a bit, similar to hash_rows below, since this implementation allocates more intermediate objects than before (a two-element array for each column, and an array to hold them all).

Contributor Author

boblail Apr 22, 2020

Since @st0012 was suggesting it earlier, I roflscaled this by assigning @column_types in initialize

activerecord/lib/active_record/result.rb Outdated Show resolved Hide resolved

activerecord/lib/active_record/connection_adapters/postgresql/database_statements.rb Outdated

                           fields = result.fields
                           fields.each_with_index do |fname, i|
                             ftype = result.ftype i
                             fmod  = result.fmod i
-                            types[fname] = get_oid_type(ftype, fmod, fname)
+                            types << get_oid_type(ftype, fmod, fname)
                           end
                           build_result(columns: fields, rows: result.values, column_types: types)

Member

eugeneius Apr 22, 2020

I think this column_types: parameter should be renamed to types:.

The default value is now of the wrong type too:

rails/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb

Line 746 in a3f62cb

def build_result(columns:, rows:, column_types: {})

Member

kamipo commented Apr 22, 2020

Actually this doesn't fix #36042, I'm not excited to change the cast_values' behavior which is private API unless it does fix public API (e.g. pluck etc).

boblail added 3 commits

April 22, 2020 16:25


          fixup! renamed :column_types to :types in build_result

cb0dc11


          fixup! add test for pluck that shows what public behavior this change…

6b57bf9

… fixes


          fixup! precalculated @column_types

f4630d8

boblail force-pushed the solve-key-collision-with-pg-types branch from 916d46b to f4630d8 Compare

April 22, 2020 21:49

Contributor Author

boblail commented Apr 22, 2020 •

edited

@kamipo, great callout! This does not fix #36042. We'd have to handle type_overrides differently here.

I added a test, though, to demonstrate a bug with pluck that this PR does still fix: 6b57bf9

When run in master, the new test fails like this:

Member

kamipo commented Apr 22, 2020

I know this issue affects only for PostgreSQL adapter (originally works for other MySQL, SQLite3, Oracle, and SQL Server adapters), I'm still not excited extra allocation for all cases to address such a minor case.

Like as a workaround for #36042, is it not enough to give a column alias?

relation.pluck(Arel.sql("COALESCE(format, 'unspecified') a, COALESCE(nullable_status, 0) b"))

Contributor Author

boblail commented Apr 23, 2020

Yep! Using an alias is a totally viable workaround! (When I first made this PR, my team also went that route: boblail/pluck_map#14)

So if you want to decline this PR, we do have an acceptable path forward.

I just think that the behavior is surprising — especially since you get different results from MySQL and Sqlite3. For the developer troubleshooting the issue, the root cause is not immediately obvious!!

Only a developer who has spelunked in Rails' source would be able to tell that Rails keys type information on name and Postgres returns more-generic names than the other two databases:

Here's the output from each database in its REPL:

SQLite

sqlite> select COALESCE(NULL, 'four'), COALESCE(NULL, 4);
COALESCE(NULL, 'four')  COALESCE(NULL, 4)
----------------------  -----------------
four                    4

MySQL

mysql> select COALESCE(NULL, 'four'), COALESCE(NULL, 4);
+------------------------+-------------------+
| COALESCE(NULL, 'four') | COALESCE(NULL, 4) |
+------------------------+-------------------+
| four                   |                 4 |
+------------------------+-------------------+
1 row in set (0.00 sec)

PostgreSQL

boblail=# select COALESCE(NULL, 'four'), COALESCE(NULL, 4);
 coalesce | coalesce
----------+----------
 four     |        4
(1 row)

Do you think it would make sense for Rails to raise an exception if an ActiveRecord::Result uses the same name more than once???

OR — the extra allocation you're referring to — is that the Hash of @column_types in addition to the Array of @types? Would this PR be acceptable if we did that allocation only on-demand or could do without it?

(Thanks for the feedback!! 😄)

Contributor

simi commented Apr 23, 2020

As @boblail mentioned (and I agree)

ActiveRecord::Base.pluck(
  Arel.sql("COALESCE(NULL, 'four')"),
  Arel.sql("COALESCE(NULL, 4)")
) # => [[0, 4]]

is not expected result. Anyway I agree with @kamipo this is big change to handle this (probably minor) problem. It would be nice to at least mention this in documentation or raise on unsupported (conflicting) input to not surprise developers (and keep the Rails experience nice).

kamipo added a commit to kamipo/rails that referenced this pull request


          Avoid quite useless loop in find_by_sql

71e4558

`column_types` is empty except PostgreSQL adapter, and
`attribute_types.each_key { |k| column_types.delete k }` is also empty
even if PostgreSQL adapter almost all case, so that code is quite
useless. This improves performance for `find_by_sql` to avoid that
useless loop as much as possible.

```ruby
ActiveRecord::Schema.define do
  create_table :active_storage_blobs do |t|
    t.string   :key,          null: false
    t.string   :filename,     null: false
    t.string   :content_type
    t.text     :metadata
    t.string   :service_name, null: false
    t.bigint   :byte_size,    null: false
    t.string   :checksum,     null: false
    t.datetime :created_at,   null: false

    t.index [ :key ], unique: true
  end
end

class ActiveStorageBlob < ActiveRecord::Base
end

Benchmark.ips do |x|
  x.report("find_by") { ActiveStorageBlob.find_by(id: 1) }
end
```

Before:

```
Warming up --------------------------------------
             find_by     1.256k i/100ms
Calculating -------------------------------------
             find_by     12.595k (± 3.4%) i/s -     64.056k in   5.091599s
```

After:

```
Warming up --------------------------------------
             find_by     1.341k i/100ms
Calculating -------------------------------------
             find_by     13.170k (± 3.5%) i/s -     67.050k in   5.097439s
```

To avoid column types loop for PostgreSQL adapter, this skips returning
additional column types if a column has already been type casted by pg
decoders. Fortunately this fixes rails#36186 partly for common types.

kamipo mentioned this pull request

Avoid quite useless loop in find_by_sql #39097

Merged

kamipo closed this in #39097

ghiculescu mentioned this pull request

Don't skip some columns in column_types on Postgres #42933

Merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment