Skip to content

Conversation

jordanlewis
Copy link
Contributor

column_definitions() for Postgres needs to fetch the collation for every column, if present. Previously, it did so using a correlated subquery - a subquery that references results from the outer scope.

This patch updates the query to remove the subquery in favor of a simpler and more efficient JOIN clause.

Running the two queries through EXPLAIN against Postgres additionally shows that the original form with a correlated subquery requires a Nested Loop Left Join, while the new form with a simple JOIN can use a more efficient Merge Left Join.

Old plan:

jordan=# explain SELECT a.attname,
  format_type(a.atttypid, a.atttypmod),
  pg_get_expr(d.adbin, d.adrelid),
  a.attnotnull,
  a.atttypid,
  a.atttypmod,
  (SELECT c.collname
   FROM pg_collation c, pg_type t
   WHERE c.oid = a.attcollation
   AND t.oid = a.atttypid
   AND a.attcollation <> t.typcollation),
   col_description(a.attrelid, a.attnum) AS comment
FROM pg_attribute a LEFT JOIN pg_attrdef d
ON a.attrelid = d.adrelid AND a.attnum = d.adnum
WHERE a.attrelid = 'a'::regclass
AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=4.46..74.93 rows=3 width=119)
   Join Filter: ((a.attrelid = d.adrelid) AND (a.attnum = d.adnum))
   ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.28..12.26 rows=3 width=83)
         Index Cond: ((attrelid = '3525321'::oid) AND (attnum > 0))
         Filter: (NOT attisdropped)
   ->  Materialize  (cost=4.18..12.66 rows=4 width=38)
         ->  Bitmap Heap Scan on pg_attrdef d  (cost=4.18..12.64 rows=4 width=38)
               Recheck Cond: (adrelid = '3525321'::oid)
               ->  Bitmap Index Scan on pg_attrdef_adrelid_adnum_index  (cost=0.00..4.18 rows=4 width=0)
                     Index Cond: (adrelid = '3525321'::oid)
   SubPlan 1
     ->  Nested Loop  (cost=0.29..16.34 rows=1 width=64)
           ->  Index Scan using pg_collation_oid_index on pg_collation c  (cost=0.15..8.17 rows=1 width=64)
                 Index Cond: (oid = a.attcollation)
           ->  Index Scan using pg_type_oid_index on pg_type t  (cost=0.15..8.17 rows=1 width=0)
                 Index Cond: (oid = a.atttypid)
                 Filter: (a.attcollation <> typcollation)
(17 rows)

New plan:

jordan=# explain SELECT a.attname,
  format_type(a.atttypid, a.atttypmod),
  pg_get_expr(d.adbin, d.adrelid),
  a.attnotnull,
  a.atttypid,
  a.atttypmod,
  c.collname,
  col_description(a.attrelid, a.attnum) AS comment
FROM pg_attribute a
LEFT JOIN pg_attrdef d
ON a.attrelid = d.adrelid AND a.attnum = d.adnum
LEFT JOIN pg_collation c
ON c.oid = a.attcollation
LEFT JOIN pg_type t
ON t.oid = a.atttypid AND a.attcollation <> t.typcollation
WHERE a.attrelid = 'collation_test'::regclass
AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=38.41..39.23 rows=3 width=179)
   Merge Cond: (a.attnum = d.adnum)
   Join Filter: (a.attrelid = d.adrelid)
   ->  Sort  (cost=25.73..25.73 rows=3 width=143)
         Sort Key: a.attnum
         ->  Hash Right Join  (cost=12.30..25.70 rows=3 width=143)
               Hash Cond: (c.oid = a.attcollation)
               ->  Seq Scan on pg_collation c  (cost=0.00..11.69 rows=269 width=68)
               ->  Hash  (cost=12.26..12.26 rows=3 width=83)
                     ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.28..12.26 rows=3 width=83)
                           Index Cond: ((attrelid = '3525342'::oid) AND (attnum > 0))
                           Filter: (NOT attisdropped)
   ->  Sort  (cost=12.68..12.69 rows=4 width=38)
         Sort Key: d.adnum
         ->  Bitmap Heap Scan on pg_attrdef d  (cost=4.18..12.64 rows=4 width=38)
               Recheck Cond: (adrelid = '3525342'::oid)
               ->  Bitmap Index Scan on pg_attrdef_adrelid_adnum_index  (cost=0.00..4.18 rows=4 width=0)
                     Index Cond: (adrelid = '3525342'::oid)
(18 rows)

@rails-bot
Copy link

Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @senny (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

This repository is being automatically checked for code quality issues using Code Climate. You can see results for this analysis in the PR status below. Newly introduced issues should be fixed before a Pull Request is considered ready to review.

Please see the contribution instructions for more information.

FROM pg_attribute a
LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
LEFT JOIN pg_collation c ON a.attcollation = c.oid
LEFT JOIN pg_type t on a.atttypid = t.oid AND a.attcollation <> t.typcollation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only this line's on is lower case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jordanlewis jordanlewis force-pushed the simplify-postgres-column-definitions branch from ed7db0e to 7e2e3f2 Compare January 19, 2017 21:11
@jordanlewis
Copy link
Contributor Author

Thanks for your review, @kamipo! What can I do to help get this PR reviewed and merged?

@jordanlewis
Copy link
Contributor Author

Hi! Is there anything else I can do to help get an eye or two on this one? @senny

Thanks!

@rafaelfranca
Copy link
Member

Could you squash your commits?

column_definitions() needs to fetch the collation for every column, if
present. Previously, it did so using a correlated subquery - a subquery
that references results from the outer scope.

This patch updates the query to remove the subquery in favor of a
simpler and more efficient JOIN clause.

Running the two queries through EXPLAIN against Postgres additionally
shows that the original form with a correlated subquery requires a
Nested Loop Left Join, while the new form with a simple JOIN can use a
more efficient Merge Left Join.
@jordanlewis jordanlewis force-pushed the simplify-postgres-column-definitions branch from f7c6cfa to f9c6cbd Compare January 30, 2017 16:28
@jordanlewis
Copy link
Contributor Author

Squashed and rebased.

@rafaelfranca rafaelfranca merged commit eda7f26 into rails:master Jan 30, 2017
@jordanlewis
Copy link
Contributor Author

Thank you, @rafaelfranca!

@jordanlewis jordanlewis deleted the simplify-postgres-column-definitions branch February 8, 2017 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants