Conversation

@miketheman (Member):
When a journal entry changes, the `update_project_last_serial` trigger
fires, which runs `maintain_project_last_serial()`.

This function looks for the `max(id)` for a given `name`, meaning it
first has to find all records for `name`, and then find the highest
`id` (at the "end" of the records).

A compound index with `name` first and `id` second keeps the most
recent `id` at the **top** of the index for each name, so this lookup
should return much faster, speeding up trigger/function executions,
which are sadly somewhat opaque to our instrumentation.
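
For reference, a minimal sketch of the index being added (the name `journals_name_id_idx` matches the query plans below; the actual migration may differ, e.g. production would likely build it with `CONCURRENTLY`):

```sql
-- Compound index: equality lookup on name, with id ordered within each name.
-- A btree can be scanned backwards, so max(id) is simply the last entry
-- for a given name -- no need to visit every row for that name.
CREATE INDEX journals_name_id_idx ON journals (name, id);
```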

Signed-off-by: Mike Fiedler <miketheman@gmail.com>
@miketheman miketheman requested a review from a team as a code owner November 14, 2025 18:37
@miketheman (Member, Author):

Here are some comparisons of the query plans.

The effective function query is:

```sql
SELECT max(id) as last_serial
    FROM journals
    WHERE journals.name = 'foo';
```

**production (without this new index)**

Ignore the costs and timings; they are polluted by my earlier test queries. What's important is the plan structure.

```
warehouse=> explain analyze SELECT max(id) as last_serial
                    FROM journals
                    WHERE journals.name = 'foo';
                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1590.48..1590.49 rows=1 width=4) (actual time=1.925..1.926 rows=1 loops=1)
   ->  Bitmap Heap Scan on journals  (cost=7.55..1589.48 rows=402 width=4) (actual time=1.921..1.922 rows=3 loops=1)
         Recheck Cond: (name = 'foo'::text)
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on journals_name_idx  (cost=0.00..7.45 rows=402 width=0) (actual time=1.214..1.215 rows=3 loops=1)
               Index Cond: (name = 'foo'::text)
 Planning Time: 0.070 ms
 Execution Time: 1.947 ms
```

**development**

_without the new index_

```
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=93.21..93.22 rows=1 width=4) (actual time=0.908..0.910 rows=1 loops=1)
   ->  Bitmap Heap Scan on journals  (cost=4.60..93.15 rows=23 width=4) (actual time=0.893..0.895 rows=4 loops=1)
         Recheck Cond: (name = 'foo'::text)
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on journals_name_idx  (cost=0.00..4.60 rows=23 width=0) (actual time=0.400..0.401 rows=4 loops=1)
               Index Cond: (name = 'foo'::text)
 Planning Time: 1.248 ms
 Execution Time: 0.962 ms
```

Same structural approach; the costs and timings are not important since the data and resources are vastly different.
But there's no combined index today that covers both `name` and `id`, so there's a recheck condition.
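
To confirm what's already in place, the existing indexes on the table can be listed from the catalog (only `journals_name_idx` appears in the plans above; any other names shown would be environment-specific):

```sql
-- List every index currently defined on journals, with its definition.
SELECT indexname, indexdef
  FROM pg_indexes
 WHERE tablename = 'journals';
```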

_with the new index_

```
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.79..0.80 rows=1 width=4) (actual time=0.847..0.850 rows=1 loops=1)
   InitPlan 1
     ->  Limit  (cost=0.42..0.79 rows=1 width=4) (actual time=0.831..0.833 rows=1 loops=1)
           ->  Index Only Scan using journals_name_id_idx on journals  (cost=0.42..8.83 rows=23 width=4) (actual time=0.827..0.828 rows=1 loops=1)
                 Index Cond: (name = 'foo'::text)
                 Heap Fetches: 0
 Planning Time: 0.940 ms
 Execution Time: 0.918 ms
```

So we can see that with the new index applied, a single index-only scan satisfies the query. We'll still hit the function N times for every journal update via the trigger, but at least now each lookup should be faster.
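
The `InitPlan 1` / `Limit` structure in the plan shows the planner transforming the aggregate into an ordered index probe, roughly equivalent to:

```sql
-- max(id) becomes "read the last (name, id) index entry for this name":
SELECT id
  FROM journals
 WHERE name = 'foo'
 ORDER BY id DESC
 LIMIT 1;
```

That's why only one index entry needs to be read per call, and `Heap Fetches: 0` means the table itself is never touched.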

@ewdurbin ewdurbin merged commit 661c06f into pypi:main Nov 14, 2025
21 checks passed
@ewdurbin (Member):

```
warehouse=> explain analyze SELECT max(id) as last_serial
                    FROM journals
                    WHERE journals.name = 'foo';
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=1.89..1.90 rows=1 width=4) (actual time=0.015..0.016 rows=1 loops=1)
   InitPlan 1
     ->  Limit  (cost=0.56..1.89 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)
           ->  Index Only Scan using journals_name_id_idx on journals  (cost=0.56..559.95 rows=422 width=4) (actual time=0.013..0.013 rows=1 loops=1)
                 Index Cond: (name = 'foo'::text)
                 Heap Fetches: 0
 Planning Time: 0.081 ms
 Execution Time: 0.024 ms
(8 rows)
```

Not too bad!

@miketheman miketheman deleted the miketheman/19033-follow-up-index branch November 14, 2025 19:39
