Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ANY instead of IN for SQL query with variable number of placeholders #4713

Merged
merged 1 commit into from
Jun 21, 2022

Conversation

kraih
Copy link
Member

@kraih kraih commented Jun 21, 2022

Since enabling pg_stat_statements we are seeing lots of queries like this
on OSD with a variable number of placeholders at the top of the list:

SELECT me.job_id, me.key, me.value FROM job_settings me
WHERE ( job_id IN ( $1, $2, $3, $4 ...many more

This patch replaces IN with ANY, which uses an array to pass the values
and therefore only requires one prepared SQL query like:

SELECT me.job_id, me.key, me.value FROM job_settings me
WHERE ( job_id = ANY(?) );

Under ideal circumstances both should actually result in the exact same
query plan. But under less than ideal circumstances the ANY variant
should be a little more reliable, since there is only one version for
the query planner to keep track of. And of course it won't clutter up
our pg_stat_statements statistics.

@kraih
Copy link
Member Author

kraih commented Jun 21, 2022

Before:

openqa=# EXPLAIN ANALYZE SELECT me.job_id, me.key, me.value FROM job_settings me WHERE ( job_id IN (80000, 99764, 99926, 99927, 99928, 99936, 99937, 99938, 99939, 99940, 99946, 99947, 99961, 99963, 99981));
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on job_settings me  (cost=29460.75..567343.02 rows=2643885 width=58) (actual time=0.043..0.044 rows=0 loops=1)
   Recheck Cond: (job_id = ANY ('{80000,99764,99926,99927,99928,99936,99937,99938,99939,99940,99946,99947,99961,99963,99981}'::bigint[]))
   ->  Bitmap Index Scan on job_settings_idx_job_id  (cost=0.00..28799.74 rows=2643885 width=0) (actual time=0.041..0.041 rows=0 loops=1)
         Index Cond: (job_id = ANY ('{80000,99764,99926,99927,99928,99936,99937,99938,99939,99940,99946,99947,99961,99963,99981}'::bigint[]))
 Planning Time: 0.194 ms
 JIT:
   Functions: 4
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.758 ms, Inlining 0.000 ms, Optimization 0.000 ms, Emission 0.000 ms, Total 0.758 ms
 Execution Time: 0.907 ms
(10 rows)

After:

openqa=# EXPLAIN ANALYZE SELECT me.job_id, me.key, me.value FROM job_settings me WHERE ( job_id = ANY('{"80000","99764","99926","99927","99928","99936","99937","99938","99939","99940","99946","99947","99961","99963","99981"}') );
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on job_settings me  (cost=29460.75..567343.02 rows=2643885 width=58) (actual time=0.048..0.049 rows=0 loops=1)
   Recheck Cond: (job_id = ANY ('{80000,99764,99926,99927,99928,99936,99937,99938,99939,99940,99946,99947,99961,99963,99981}'::bigint[]))
   ->  Bitmap Index Scan on job_settings_idx_job_id  (cost=0.00..28799.74 rows=2643885 width=0) (actual time=0.045..0.045 rows=0 loops=1)
         Index Cond: (job_id = ANY ('{80000,99764,99926,99927,99928,99936,99937,99938,99939,99940,99946,99947,99961,99963,99981}'::bigint[]))
 Planning Time: 0.135 ms
 JIT:
   Functions: 4
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.702 ms, Inlining 0.000 ms, Optimization 0.000 ms, Emission 0.000 ms, Total 0.702 ms
 Execution Time: 0.868 ms
(10 rows)

Since enabling pg_stat_statements we are seeing lots of queries like
this on OSD with a variable number of placeholders at the top of the
list:

```
SELECT me.job_id, me.key, me.value FROM job_settings me
WHERE ( job_id IN ( $1, $2, $3, $4 ...many more
```

This patch replaces IN with ANY, which uses an array to pass the values
and therefore only requires one prepared SQL query like:

```
SELECT me.job_id, me.key, me.value FROM job_settings me
WHERE ( job_id = ANY(?) );
```

Under ideal circumstances both should actually result in the exact same
query plan. But under less than ideal circumstances the ANY variant
should be a little more reliable, since there is only one version for
the query planner to keep track of. And of course it won't clutter up
our pg_stat_statements statistics.
@kraih kraih marked this pull request as ready for review June 21, 2022 14:29
Copy link
Contributor

@Martchus Martchus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you know what you're doing (syntax-wise) and our tests should catch regressions.

@kraih
Copy link
Member Author

kraih commented Jun 21, 2022

The queries have good test coverage and i ran DBI_TRACE=SQL openqa-test t/api/04-jobs.t a lot to check the exact results.

@codecov
Copy link

codecov bot commented Jun 21, 2022

Codecov Report

Merging #4713 (9d0a39c) into master (7831729) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #4713   +/-   ##
=======================================
  Coverage   98.07%   98.07%           
=======================================
  Files         374      374           
  Lines       34674    34674           
=======================================
  Hits        34008    34008           
  Misses        666      666           
Impacted Files Coverage Δ
lib/OpenQA/WebAPI/Controller/API/V1/Job.pm 95.67% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7831729...9d0a39c. Read the comment docs.

@mergify mergify bot merged commit d850257 into master Jun 21, 2022
@Martchus Martchus deleted the k/less_placeholders branch June 21, 2022 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants