Simplify query in bootstrap_query method#310
Simplify query in bootstrap_query method#310jordanlewis wants to merge 1 commit intoelixir-ecto:masterfrom
Conversation
Previously, bootstrap_query constructed a pretty complicated `SELECT` statement that used a subquery, array and generate_series to exclude rows whose oids were part of a predefined set. The query looked like this: ``` SELECT ... WHERE t.oid NOT IN (SELECT (ARRAY[integer, literals, ...])[i] FROM generate_series(1, length_of_above_array)) ``` This subquery is now replaced by a simpler list of integer literals, which should be more performant and easier to understand. Now, the query looks like this: ``` SELECT ... WHERE t.oid NOT IN (integer, literals, ...) ```
|
Hello, @jordanlewis! This is your first Pull Request that will be reviewed by Ebert, an automatic Code Review service. It will leave comments on this diff with potential issues and style violations found in the code as you push new commits. You can also see all the issues found on this Pull Request on its review page. Please check our documentation for more information. |
|
Can you check the query plan for this on postgres < 9.6, i believe
postgresql will make a suboptimal decision causing O(n^2) complexity.
…On 4 Apr 2017 7:53 pm, "Jordan Lewis" ***@***.***> wrote:
Previously, bootstrap_query constructed a pretty complicated SELECT
statement that used a subquery, array and generate_series to exclude
rows whose oids were part of a predefined set. The query looked like
this:
SELECT ... WHERE t.oid NOT IN
(SELECT (ARRAY[integer, literals, ...])[i]
FROM generate_series(1, length_of_above_array))
This subquery is now replaced by a simpler list of integer literals,
which should be more performant and easier to understand.
Now, the query looks like this:
SELECT ... WHERE t.oid NOT IN (integer, literals, ...)
This should work in all versions of Postgres and Redshift, and works
around a typing issue in CockroachDB: cockroachdb/cockroach#14554
<cockroachdb/cockroach#14554>
------------------------------
You can view, comment on, or merge this pull request online at:
#310
Commit Summary
- Simplify query in bootstrap_query method
File Changes
- *M* lib/postgrex/types.ex
<https://github.com/elixir-ecto/postgrex/pull/310/files#diff-0> (7)
Patch Links:
- https://github.com/elixir-ecto/postgrex/pull/310.patch
- https://github.com/elixir-ecto/postgrex/pull/310.diff
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#310>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AB6JTfEIAkgEsPLiYf6_UNO0I1daVW8Fks5rspGIgaJpZM4MzRx1>
.
|
|
It seems like 9.5 and 9.6 both claim to plan the simpler query as a simple filter, which does sound like it would be I think it's probably not a big deal either way, and the simpler query has the benefit of being much easier to understand. On 9.5: On 9.6: |
|
Also, now that I think about it, even if the more complex query is in fact hashing the array as it claims with "Hashed SubPlan 1", wouldn't the overall query still be an Regardless, I'm still of the opinion that the efficiency difference here is negligible because of the small size of the data involved. |
|
Unfortunately the efficiency difference is significant and a function scan can be orders of magnitude slower for some users. The hashed anti join is O(nlog(n)) IIRC for the expensive part. |
|
I don't believe that the more complicated query uses a hashed anti-join, though, since it doesn't show up in the EXPLAIN output. When a plan actually uses a hashed anti-join it shows up prominently in EXPLAIN. Can you give an example of an instance of an order of magnitude slowdown between these two queries? |
|
On a local database this branch takes approximately 37 seconds and master takes 100 milliseconds. |
|
Wow! That's a huge difference. 37 seconds/100ms to do what though? I'd love to reproduce it locally. |
|
That is the execution time when I ran {:ok, pid} = Postgrex.start_link([])
%{types: types} = Postgrex.prepare!(pid, "SELECT 42", [])
statement = Postgrex.Types.bootstrap_query({9, 5, 5}, types)
%{rows: rows} = Postgrex.query!(pid, ["EXPLAIN ANALYSE " | statement], [], [timeout: :infinity])
IO.puts Enum.join(rows, "\n") |
|
Thank you! Clearly this PR won't work. |
|
I was hoping you could figure out a query that has similar efficiency and works in cockroach and redshift :( |
|
Another option is to use a |
|
Hmm, I'm wondering what the purpose of this query is in general. It seems like |
|
I'm not sure that redshift supports that query.
…On 5 Apr 2017 2:53 am, "Jordan Lewis" ***@***.***> wrote:
Another option is to use a VALUES clause instead of the array subquery.
This seems to also produce a hashed subplan in Postgres, which I assume
will get hash anti-joined just the same as the other version.
jordan=# explain analyze select typname from pg_type where oid not in (values(26),(27),(28),(29),(30));
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on pg_type (cost=0.07..13.56 rows=180 width=64) (actual time=0.037..0.180 rows=365 loops=1)
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 5
SubPlan 1
-> Values Scan on "*VALUES*" (cost=0.00..0.06 rows=5 width=4) (actual time=0.004..0.004 rows=5 loops=1)
Planning time: 0.075 ms
Execution time: 0.241 ms
(7 rows)
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#310 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AB6JTSyiBJuKjR09uvUdrqHucCrTPguQks5rsvQsgaJpZM4MzRx1>
.
|
On connect we want to fetch all rows we don't currently know about. |
|
redshift does not support |
Previously, bootstrap_query constructed a pretty complicated
SELECTstatement that used a subquery, array and generate_series to exclude
rows whose oids were part of a predefined set. The query looked like
this:
This subquery is now replaced by a simpler list of integer literals,
which should be more performant and easier to understand.
Now, the query looks like this:
This should work in all versions of Postgres and Redshift, and works around a typing issue in CockroachDB: cockroachdb/cockroach#14554