Skip to content

Conversation

aokiji
Copy link
Contributor

@aokiji aokiji commented Sep 28, 2025

Summary

Tables info can be queried faster by calling pg_total_relation_size only once per row, pg_total_relation_size is a volatile function, meaning that it would be called twice per row as volatile functions are not guaranteed the same result per table lookup, but that is not meaningful in this context

Additionally the group by expression can be removed entirely since only one row per oid can be produced by the join conditions

the speed up can be noticed specially in databases with large number of tables, to test this out I will set up a test database and measure the improvement

Set up the environment

We will create a postgres 12 database and populate it with 50000 tables with primary key

Using the following init.sql script:

-- init.sql
CREATE TABLE IF NOT EXISTS public.client_types (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255)
);

-- create multiple tables with foreign keys in public
DO $$
DECLARE
    table_prefix TEXT := 'client_catalog_';
    number_of_tables_to_create INT := 50000;
BEGIN

    -- create tables
    FOR i IN 1..number_of_tables_to_create LOOP
        EXECUTE format('
            CREATE TABLE IF NOT EXISTS public.%I ( LIKE public.client_types )' , table_prefix || i::text);

    END LOOP;
END $$;

Let's start a postgres server that initializes with init.sql

❯ docker run --rm -p 15432:5432 -e POSTGRES_USER=example_user -e POSTGRES_PASSWORD=example_password -e POSTGRES_DB=example_db -v $PWD/init.sql:/docker-entrypoint-initdb.d/init.sql:ro --name pg_sql_bench postgres:12-alpine -c shared_buffers=512MB -c max_locks_per_transaction=1000

Validation

Using the two versions of the query:

current_query
-- current_query.sql
select
  c.oid :: int8 as "id!",
  nc.nspname as schema,
  c.relname as name,
  c.relkind as table_kind,
  c.relrowsecurity as rls_enabled,
  c.relforcerowsecurity as rls_forced,
  case
    when c.relreplident = 'd' then 'DEFAULT'
    when c.relreplident = 'i' then 'INDEX'
    when c.relreplident = 'f' then 'FULL'
    else 'NOTHING'
  end as "replica_identity!",
  pg_total_relation_size(format('%I.%I', nc.nspname, c.relname)) :: int8 as "bytes!",
  pg_size_pretty(
    pg_total_relation_size(format('%I.%I', nc.nspname, c.relname))
  ) as "size!",
  pg_stat_get_live_tuples(c.oid) as "live_rows_estimate!",
  pg_stat_get_dead_tuples(c.oid) as "dead_rows_estimate!",
  obj_description(c.oid) as comment
from
  pg_namespace nc
  join pg_class c on nc.oid = c.relnamespace
where
  c.relkind in ('r', 'p', 'v', 'm')
  and not pg_is_other_temp_schema(nc.oid)
  and (
    pg_has_role(c.relowner, 'USAGE')
    or has_table_privilege(
      c.oid,
      'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'
    )
    or has_any_column_privilege(c.oid, 'SELECT, INSERT, UPDATE, REFERENCES')
  )
group by
  c.oid,
  c.relname,
  c.relkind,
  c.relrowsecurity,
  c.relforcerowsecurity,
  c.relreplident,
  nc.nspname;
new_query
-- new_query.sql
select
  c.oid :: int8 as "id!",
  nc.nspname as schema,
  c.relname as name,
  c.relkind as table_kind,
  c.relrowsecurity as rls_enabled,
  c.relforcerowsecurity as rls_forced,
  case
    when c.relreplident = 'd' then 'DEFAULT'
    when c.relreplident = 'i' then 'INDEX'
    when c.relreplident = 'f' then 'FULL'
    else 'NOTHING'
  end as "replica_identity!",
  relation_size:: int8 as "bytes!",
  pg_size_pretty(relation_size) as "size!",
  pg_stat_get_live_tuples(c.oid) as "live_rows_estimate!",
  pg_stat_get_dead_tuples(c.oid) as "dead_rows_estimate!",
  obj_description(c.oid) as comment
from
  pg_namespace nc
  join pg_class c on nc.oid = c.relnamespace
  cross join lateral pg_total_relation_size(c.oid) relation_size
where
  c.relkind in ('r', 'p', 'v', 'm')
  and not pg_is_other_temp_schema(nc.oid)
  and (
    pg_has_role(c.relowner, 'USAGE')
    or has_table_privilege(
      c.oid,
      'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'
    )
    or has_any_column_privilege(c.oid, 'SELECT, INSERT, UPDATE, REFERENCES')
  )

The first step is to test that both queries produce the same results

❯ psql postgres://example_user:example_password@localhost:15432/example_db -t -A < current_query.sql | sort > current_query_results
❯ psql postgres://example_user:example_password@localhost:15432/example_db -t -A < new_query.sql | sort > new_query_results
❯ diff -s current_query_results new_query_results
Files current_query_results and new_query_results are identical

Performance analysis

Knowing both queries produce the same result lets check the performance

Prior to each hyperfine run we must restart the database to ensure queries run on equal context

❯ hyperfine 'psql postgres://example_user:example_password@localhost:15432/example_db -t -A < current_query.sql'
Benchmark 1: psql postgres://example_user:example_password@localhost:15432/example_db -t -A < current_query.sql
  Time (mean ± σ):      1.670 s ±  0.379 s    [User: 0.050 s, System: 0.041 s]
  Range (min … max):    1.542 s …  2.748 s    10 runs

❯ hyperfine 'psql postgres://example_user:example_password@localhost:15432/example_db -t -A < new_query.sql'
Benchmark 1: psql postgres://example_user:example_password@localhost:15432/example_db -t -A < new_query.sql
  Time (mean ± σ):      1.049 s ±  0.359 s    [User: 0.046 s, System: 0.039 s]
  Range (min … max):    0.929 s …  2.071 s    10 runs

Mean and max time differ on 700ms

tables info can be queried faster by calling pg_total_relation_size only
once per row
Copy link
Collaborator

@psteinroe psteinroe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the verbose PR!

@psteinroe psteinroe merged commit f61d5a1 into supabase-community:main Sep 29, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants