Use single query to retrieve indexes in PostgreSQL #45381

fatkodima · 2022-06-16T14:56:32Z

Previously, retrieving all indexes for a PostgreSQL database took (# of tables + # of indexes) SQL queries. Now it takes only # of tables SQL queries.

For example, GitLab's codebase, which has 500 tables and 2000 indexes, previously needed 2500 queries; now it needs 500.

This PR greatly reduces the time needed to generate a schema.rb file, for example, and it helps other tools that need to retrieve all the indexes from the database. For example, I was able to speed up active_record_doctor by 3x (from 126 seconds to 39 seconds) using schema caching (gregnavis/active_record_doctor#101). With this PR, execution time drops by another 8 seconds.
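A minimal sketch of the schema-caching idea mentioned above, assuming a hypothetical `IndexCache` class and a stand-in fetcher block in place of a real database call. It is not the code from gregnavis/active_record_doctor#101; it only shows why repeated lookups for the same table stop costing extra queries.

```ruby
# Hypothetical cache: `fetcher` stands in for whatever actually queries the database.
class IndexCache
  attr_reader :lookups

  def initialize(&fetcher)
    @fetcher = fetcher
    @cache = {}
    @lookups = 0 # counts how many times the fetcher was really called
  end

  # Returns the cached index list, calling the fetcher only on the first request.
  def indexes(table_name)
    @cache.fetch(table_name) do
      @lookups += 1
      @cache[table_name] = @fetcher.call(table_name)
    end
  end
end

# Illustrative fetcher that fabricates one index name per table.
cache = IndexCache.new { |table| ["index_#{table}_on_id"] }
3.times { cache.indexes("users") }
cache.indexes("posts")
puts cache.lookups
```

Four calls to `indexes` reach the fetcher only twice here, once per distinct table.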

matthewd · 2022-06-16T15:08:48Z

activerecord/lib/active_record/connection_adapters/postgresql/schema_statements.rb

+                            pg_catalog.obj_description(i.oid, 'pg_class') AS comment, d.indisvalid,
+                            ARRAY(
+                              SELECT pg_get_indexdef(d.indexrelid, k + 1, true)
+                              FROM generate_subscripts(d.indkey, 1) AS k


I'm 30% sure this needs an ORDER BY k to be Technically Correct 🤔

This also feels like a behaviour change for non-column-reference expressions. Is that true?

I think the ORDER BY k is not needed. indkey returns a vector (in Postgres terminology) of column numbers from the table. So for table users(id, name, email, created_at) and an index on (created_at, email), it returns [4, 3].
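To make the [4, 3] example concrete, here is a small sketch that maps such a vector back to column names. The helper name `index_column_names` is hypothetical, and the sketch assumes attribute numbers line up with the declared column positions (that is, no dropped columns).

```ruby
# Column order as declared in the example table users(id, name, email, created_at).
TABLE_COLUMNS = %w[id name email created_at].freeze

# indkey holds 1-based column numbers, listed in index-key order.
# Hypothetical helper: translates those numbers into column names.
def index_column_names(indkey, table_columns)
  indkey.map { |attnum| table_columns.fetch(attnum - 1) }
end

names = index_column_names([4, 3], TABLE_COLUMNS)
puts names.inspect
```

The order of the result follows the order of the vector, which is why the order in which its elements are read matters.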

This also feels like a behaviour change for non-column-reference expressions.

Can you provide an example?

My concern was for an index that mixed columns and non-column expressions, but per Discord conversation, I now see that the changed codepath (around line 125 below) is only taken for indexes that consist exclusively of table columns: as soon as any expression is present, we hit if indkey.include?(0), and things diverge.

(It separately seems less-than-ideal that we skip over the "richer" behaviour for those index columns that are non-expression references to table columns, but that's not relevant to this change.)
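As a rough illustration of the branch described above: in pg_index.indkey, a zero marks an index attribute that is an expression rather than a plain column reference. The helper name and the sample vectors below are made up for illustration; only the `include?(0)` check comes from the discussion.

```ruby
# Hypothetical helper mirroring the `indkey.include?(0)` check quoted above.
def expression_index?(indkey)
  indkey.include?(0)
end

plain_index = [4, 3] # created_at, email: column references only
mixed_index = [3, 0] # email, then an expression such as lower(name)

puts expression_index?(plain_index)
puts expression_index?(mixed_index)
```

A single zero is enough to send the whole index down the expression path, including its plain column references.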

(I'm also happy to assume that generate_subscripts produces rows in a defined order within the immediate subquery, even without an explicit ORDER BY, just on the basis that set returning functions are weird.)

Added ORDER BY k to be sure. I saw it used in other examples on the internet, but I did not understand why it is needed, because it seems the generate_subscripts result should already be sorted.
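A sketch of what the explicit ordering guarantees, simulated in Ruby rather than SQL. The rows are invented and deliberately shuffled; `k` is taken to be a 0-based subscript, an assumption based on the `k + 1` in the diff. Sorting by `k` restores index-key order however the rows arrive.

```ruby
# Invented rows standing in for the subquery output, deliberately out of order.
rows = [
  { k: 1, definition: "email" },
  { k: 0, definition: "created_at" },
]

# Equivalent in spirit to ORDER BY k: sort by subscript before collecting.
ordered = rows.sort_by { |row| row[:k] }.map { |row| row[:definition] }
puts ordered.inspect
```

If the rows already arrive in subscript order, the sort changes nothing, so it is a cheap safeguard.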

(It separately seems less-than-ideal that we skip over the "richer" behaviour for those index columns that are non-expression references to table columns, but that's not relevant to this change.)

Probably I should go to sleep, but I did not get what the problem is here. 🤔 I would appreciate it if you could provide a concrete example where the new approach won't work.

simi · 2022-06-16T15:25:45Z

Is getting all indexes the most common usage? Can't we make it into 1 query and iterate over the result safely (in batches)?
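A hedged sketch of what the single-query idea could look like on the Ruby side: one result set covering every table, grouped by table name afterwards. The row shape and the index names are hypothetical, and this is not what the PR implements.

```ruby
# Hypothetical rows, as if one query had returned indexes for all tables.
rows = [
  { table: "users", index: "index_users_on_email" },
  { table: "posts", index: "index_posts_on_user_id" },
  { table: "users", index: "index_users_on_created_at" },
]

# Group the flat result by table, keeping only the index names.
indexes_by_table = rows
  .group_by { |row| row[:table] }
  .transform_values { |group| group.map { |row| row[:index] } }

puts indexes_by_table.inspect
```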

fatkodima · 2022-06-16T15:29:09Z

Is getting all indexes the most common usage? Can't we make it into 1 query and iterate over the result safely (in batches)?

I didn't get it. We do not get all indexes in a single query; we get all of them over the course of the program's execution (as in active_record_doctor, mentioned in the PR description). We get the indexes for one table at a time.

simi · 2022-06-16T15:30:06Z

Is for example active_record_doctor getting info for all tables?

fatkodima · 2022-06-16T15:32:05Z

You can disable specific tables for specific checks, but usually all tables in the db are checked.

fatkodima · 2022-08-17T11:06:30Z

@yahonda Can you please take a look at this PR?

yahonda · 2022-08-23T14:35:21Z

I'd like https://github.com/rails/rails/pull/45381/files#r899190812 discussion to be resolved between @fatkodima and @matthewd .

rails-bot bot added the activerecord label Jun 16, 2022

fatkodima mentioned this pull request Jun 16, 2022

Reduce the number of queries by caching schema info gregnavis/active_record_doctor#101

Open

matthewd reviewed Jun 16, 2022

fatkodima requested a review from matthewd June 18, 2022 20:03

fatkodima mentioned this pull request Jul 11, 2022

tag for v6.1.6 does not seem to live in this repository #45560

Closed

Use single query to retrieve indexes in PostgreSQL

758ad99

fatkodima force-pushed the pg-indexes-single-query branch from 0f961b2 to 758ad99 Compare September 10, 2022 21:31

matthewd Jun 16, 2022

fatkodima Jun 16, 2022

matthewd Sep 10, 2022

matthewd Sep 10, 2022

fatkodima Sep 10, 2022
