Simplify and optimize Postgres query for primary_keys() #27961
Conversation
Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @schneems (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. This repository is being automatically checked for code quality issues using Code Climate. You can see results for this analysis in the PR status below. Newly introduced issues should be fixed before a Pull Request is considered ready to review. Please see the contribution instructions for more information. |
JOIN pg_attribute a | ||
ON a.attrelid=i.indrelid | ||
AND a.attnum=i.indkey[k] | ||
WHERE i.indrelid=#{quote(quote_table_name(table_name))}::regclass |
kamipo
Feb 10, 2017
Member
I confirmed that generate_subscripts
behaves as expected.
psql -E activerecord_unittest
psql (9.5.4)
Type "help" for help.
activerecord_unittest=# create table comp3 (a int, b int, c int, primary key (a,c, b));
CREATE TABLE
activerecord_unittest=# select a.attname from pg_index i cross join unnest(i.indkey) as k join pg_attribute a on a.attrelid = i.indrelid and a.attnum = k where indrelid='comp3'::regclass and i.indisprimary;
attname
---------
a
b
c
(3 rows)
activerecord_unittest=# select a.attname from pg_index i cross join generate_subscripts(i.indkey, 1) k join pg_attribute a on a.attrelid = i.indrelid and a.attnum = i.indkey[k] where indrelid='comp3'::regclass and i.indisprimary;
attname
---------
a
c
b
(3 rows)
activerecord_unittest=#
I confirmed that generate_subscripts
behaves as expected.
psql -E activerecord_unittest
psql (9.5.4)
Type "help" for help.
activerecord_unittest=# create table comp3 (a int, b int, c int, primary key (a,c, b));
CREATE TABLE
activerecord_unittest=# select a.attname from pg_index i cross join unnest(i.indkey) as k join pg_attribute a on a.attrelid = i.indrelid and a.attnum = k where indrelid='comp3'::regclass and i.indisprimary;
attname
---------
a
b
c
(3 rows)
activerecord_unittest=# select a.attname from pg_index i cross join generate_subscripts(i.indkey, 1) k join pg_attribute a on a.attrelid = i.indrelid and a.attnum = i.indkey[k] where indrelid='comp3'::regclass and i.indisprimary;
attname
---------
a
c
b
(3 rows)
activerecord_unittest=#
kamipo
Feb 10, 2017
Member
[nit] I prefer space around equals even in SQL like Style/SpaceAroundOperators
.
Lines 66 to 67
in
55d66e2
We have few lines missing space around equals.
% git grep -n "\w=[\w#']" lib/active_record/connection_adapters
lib/active_record/connection_adapters/abstract_mysql_adapter.rb:556: raw_table_options.sub!(/(ENGINE=\w+)(?: AUTO_INCREMENT=\d+)/, '\1')
lib/active_record/connection_adapters/abstract_mysql_adapter.rb:561: if raw_table_options.sub!(/ COMMENT='.+'/, "")
lib/active_record/connection_adapters/abstract_mysql_adapter.rb:845: # to work with MySQL 5.7.6 which sets optimizer_switch='derived_merge=on'
lib/active_record/connection_adapters/sqlite3_adapter.rb:348: WHERE name=#{quote(row['name'])} AND type='index'
lib/active_record/connection_adapters/sqlite3_adapter.rb:352: WHERE name=#{quote(row['name'])} AND type='index'
% git grep -n "}=[\w#']" lib/active_record/connection_adapters
lib/active_record/connection_adapters/abstract_mysql_adapter.rb:486: execute("UPDATE #{quote_table_name(table_name)} SET #{quote_column_name(column_name)}=#{quote(default)} WHERE #{quote_column_name(column_name)} IS NULL")
lib/active_record/connection_adapters/postgresql/schema_statements.rb:528: execute("UPDATE #{quote_table_name(table_name)} SET #{quote_column_name(column_name)}=#{quote_default_expression(default, column)} WHERE #{quote_column_name(column_name)} IS NULL") if column
lib/active_record/connection_adapters/sqlite3_adapter.rb:418: exec_query("UPDATE #{quote_table_name(table_name)} SET #{quote_column_name(column_name)}=#{quote(default)} WHERE #{quote_column_name(column_name)} IS NULL")
[nit] I prefer space around equals even in SQL like Style/SpaceAroundOperators
.
Lines 66 to 67 in 55d66e2
We have few lines missing space around equals.
% git grep -n "\w=[\w#']" lib/active_record/connection_adapters
lib/active_record/connection_adapters/abstract_mysql_adapter.rb:556: raw_table_options.sub!(/(ENGINE=\w+)(?: AUTO_INCREMENT=\d+)/, '\1')
lib/active_record/connection_adapters/abstract_mysql_adapter.rb:561: if raw_table_options.sub!(/ COMMENT='.+'/, "")
lib/active_record/connection_adapters/abstract_mysql_adapter.rb:845: # to work with MySQL 5.7.6 which sets optimizer_switch='derived_merge=on'
lib/active_record/connection_adapters/sqlite3_adapter.rb:348: WHERE name=#{quote(row['name'])} AND type='index'
lib/active_record/connection_adapters/sqlite3_adapter.rb:352: WHERE name=#{quote(row['name'])} AND type='index'
% git grep -n "}=[\w#']" lib/active_record/connection_adapters
lib/active_record/connection_adapters/abstract_mysql_adapter.rb:486: execute("UPDATE #{quote_table_name(table_name)} SET #{quote_column_name(column_name)}=#{quote(default)} WHERE #{quote_column_name(column_name)} IS NULL")
lib/active_record/connection_adapters/postgresql/schema_statements.rb:528: execute("UPDATE #{quote_table_name(table_name)} SET #{quote_column_name(column_name)}=#{quote_default_expression(default, column)} WHERE #{quote_column_name(column_name)} IS NULL") if column
lib/active_record/connection_adapters/sqlite3_adapter.rb:418: exec_query("UPDATE #{quote_table_name(table_name)} SET #{quote_column_name(column_name)}=#{quote(default)} WHERE #{quote_column_name(column_name)} IS NULL")
jordanlewis
Feb 10, 2017
Author
Contributor
Added the spaces.
Added the spaces.
While we're touching these, can we just refactor it to use SELECT column_name FROM information_schema.key_column_usage
WHERE constraint_name IN (
SELECT table_constraints.constraint_name FROM information_schema.table_constraints WHERE constraint_type = 'PRIMARY KEY'
) AND table_name = $1 AND table_schema = $2 (If anyone wanted to refactor all of our schema lookup code to use |
@sgrif seems like a pretty good idea. The query can't be optimized quite as well this way, but I suppose the gain in DRY in the Rails codebase could be of greater importance. What do you think? I'm happy to update my patch as you suggest, but I think I'll stick to just modifying The query diff (before refactoring to the rest of the adapters) would be:
And there's the
|
Seems fine to me. |
I'm a fan of leaning on information_schema by default. If the motivation is to speed these up, though, IS perf will often disappoint. |
Do we have an actual use case where this is a hot spot? Ultimately |
@sgrif Good point. My motivation here had more to do with simplifying the query than optimizing it, to improve compatibility for Postgres backends that might not support common table expressions, a feature which is only used in this query - the latter was just a side-effect. I've updated the patch to modify this query to use I spent a little time seeing what it would take to move the |
Yeah, the definition of "schema" varies by backend (and of the in-tree adapters is only relevant to postgres), so you'll probably want to have the abstract version ask the adapter to split the string into schema and table name (with a default impl returning |
primary_keys(table) needs to query various metadata tables in Postgres to determine the primary key for the table. Previously, it did so using a complex common table expression against pg_constraint and pg_attribute. This patch simplifies the query by using information_schema tables. This simplifies the logic, making the query far easier to understand, and additionally avoids an expensive unnest, window function query, and common table expression.
b8e3af7
Cool - build's green now. Thanks for the tips, @sgrif! |
Fixes CI failure caused by rails#27961. https://travis-ci.org/rails/rails/jobs/201472146#L2390-L2409
Take two on #27949, which was flawed because
unnest
doesn't always return in a consistent order. Now we usegenerate_subscripts
which should always return rows in a consistent order. It is available since Postgres 8.4.In summary, this pull request simplifies the query used to determine the primary keys of a table in Postgres to achieve a ~%66 speedup per table primary key query during application startup.
cc @kamipo (thanks for your patience!)
Here's the
EXPLAIN
data with the new and old queries on a sample table:Before:
After: