
Use a query that postgresql optimises better for get_events_around #906

Merged
merged 5 commits into develop from markjh/faster_events_around Jul 5, 2016

Contributor

NegativeMjark commented Jul 4, 2016

No description provided.

Owner

erikjohnston commented Jul 4, 2016

LGTM

Contributor

NegativeMjark commented Jul 4, 2016

@erikjohnston Having looked at how postgres handles the UNION ALL version, I'm less sure that it's actually doing what we want here. It turns out that postgres really wants you to write things as (a,b) < (x,y) to use multi-column indexes.

I've switched everything to use lower_bound/upper_bound from streams.py, and made those functions switch on the database engine to decide which form of SQL to use :(
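A minimal sketch of what such engine-switching bound helpers might look like (the names, the token shape, and the exact SQL fragments here are assumptions for illustration, not the actual synapse/storage/stream.py code):

```python
from collections import namedtuple

# Hypothetical stand-in for the stream token used in pagination.
RoomStreamToken = namedtuple("RoomStreamToken", ["topological", "stream"])

def lower_bound(token, supports_tuple_comparison, inclusive=False):
    """Build a WHERE-clause fragment selecting rows before `token`
    (or at it, when `inclusive` is True).

    Only a boolean flag and the integer token fields are interpolated,
    so no caller-supplied text ever reaches the SQL string.
    """
    op = "<=" if inclusive else "<"
    if supports_tuple_comparison:
        # Postgres plans (a,b) < (x,y) as a single range scan over a
        # multi-column index.
        return "(topological_ordering, stream_ordering) %s (%d, %d)" % (
            op, token.topological, token.stream
        )
    # SQLite does better with the expanded disjunction (or a UNION ALL
    # of the two disjuncts).
    return (
        "(topological_ordering < %d"
        " OR (topological_ordering = %d AND stream_ordering %s %d))" % (
            token.topological, token.topological, op, token.stream
        )
    )
```

For example, `lower_bound(RoomStreamToken(5, 3), True)` yields the tuple-comparison form, while passing `False` yields the expanded OR form for SQLite.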

Owner

erikjohnston commented Jul 5, 2016

Having looked at how postgres handles the UNION all version I'm less sure that it's actually doing what we want here.

Can you elaborate?

@erikjohnston erikjohnston commented on the diff Jul 5, 2016

synapse/storage/stream.py
token.topological, "topological_ordering",
token.topological, "topological_ordering",
- token.stream, "stream_ordering",
+ token.stream, inclusive, "stream_ordering",
@erikjohnston

erikjohnston Jul 5, 2016

Owner

Is this not vulnerable to SQL injection?

@NegativeMjark

NegativeMjark Jul 5, 2016

Contributor

It would be if inclusive was ever specified by the user.

@NegativeMjark

NegativeMjark Jul 5, 2016

Contributor

I've made inclusive a boolean to make it a bit clearer. d44d11d
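For illustration, the boolean-flag approach might look like this (a hypothetical helper, not the actual change in d44d11d):

```python
def bound(column, value, inclusive):
    """Render a comparison bound for SQL.

    `inclusive` must be a bool; it is mapped to one of two fixed
    operator strings, so nothing caller-supplied is ever spliced into
    the SQL text, which removes the injection concern above.
    """
    assert isinstance(inclusive, bool)
    op = "<=" if inclusive else "<"
    return "%s %s %d" % (column, op, value)
```

The key design point is that the flag selects between two hard-coded strings rather than being interpolated itself.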

Contributor

NegativeMjark commented Jul 5, 2016 edited

Can you elaborate?

When I ran that query in a test harness it resulted in a scan over half the index.

DROP TABLE IF EXISTS t;
CREATE TABLE t (c INT, a INT, b INT);

INSERT INTO t (c,a,b) SELECT 0, data.*, data.* FROM
    (SELECT * FROM generate_series(1,1000000)) AS data;

CREATE INDEX t_c_a_b ON t USING btree (c,a,b);

ANALYZE t;

EXPLAIN ANALYSE
        SELECT a,b FROM t WHERE c = 0 AND a < 500000
    UNION ALL
        SELECT a,b FROM t WHERE c = 0 AND a = 500000 AND b < 100
    ORDER BY a DESC, b DESC LIMIT 1;

Resulted in:

                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=24416.59..24416.59 rows=1 width=8) (actual time=123.093..123.094 rows=1 loops=1)
   ->  Sort  (cost=24416.59..25663.78 rows=498876 width=8) (actual time=123.093..123.093 rows=1 loops=1)
         Sort Key: t.a DESC, t.b DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Append  (cost=0.00..21922.21 rows=498876 width=8) (actual time=0.009..91.581 rows=499999 loops=1)
               ->  Seq Scan on t  (cost=0.00..16925.00 rows=498875 width=8) (actual time=0.009..71.667 rows=499999 loops=1)
                     Filter: (a < 500000)
                     Rows Removed by Filter: 500001
               ->  Index Only Scan using t_a_b on t t_1  (cost=0.42..8.45 rows=1 width=8) (actual time=0.009..0.009 rows=0 loops=1)
                     Index Cond: ((a = 500000) AND (b < 100))
                     Heap Fetches: 0
 Planning time: 0.050 ms
 Execution time: 123.109 ms
(13 rows)
Use true/false for boolean parameter inclusive to avoid potential for sqli, and possibly make the code clearer
Owner

erikjohnston commented Jul 5, 2016

This presumably still has terrible performance on SQLite?

Contributor

NegativeMjark commented Jul 5, 2016

This presumably still has terrible performance on SQLite?

Yes. Interestingly, SQLite does a reasonable job of optimising the UNION ALL version of the query.

sqlite> EXPLAIN query plan  
   ...> SELECT a,b FROM t WHERE c = 0 AND (a < 50000 OR (a = 50000 AND b < 100))
   ...> ORDER BY a DESC, b DESC LIMIT 1;
0|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=?)
sqlite> EXPLAIN query plan
   ...>         SELECT a,b FROM t WHERE c = 0 AND a < 50000
   ...>     UNION ALL
   ...>         SELECT a,b FROM t WHERE c = 0 AND a = 50000 AND b < 100
   ...>     ORDER BY a DESC, b DESC LIMIT 1;
1|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=? AND a<?)
2|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=? AND a=? AND b<?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)
Contributor

NegativeMjark commented Jul 5, 2016

So for a sqlite3 database, only the UNION ALL form has the desired performance :(

sqlite> EXPLAIN query plan SELECT a,b FROM t WHERE c = 0 AND (a < 500000 OR (a = 500000 AND b < 100)) ORDER BY a DESC, b DESC LIMIT 1;
0|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=?)

sqlite> EXPLAIN query plan SELECT a,b FROM t WHERE (c = 0 AND a < 500000) OR (c = 0 AND a = 500000 AND b < 100) ORDER BY a DESC, b DESC LIMIT 1; 
0|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=? AND a<?)
0|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=? AND a=? AND b<?)
0|0|0|USE TEMP B-TREE FOR ORDER BY

sqlite> EXPLAIN query plan SELECT a,b FROM t WHERE (c = 0 AND a < 500000) UNION ALL SELECT a,b FROM t WHERE (c = 0 AND a = 500000 AND b < 100) ORDER BY a DESC, b DESC LIMIT 1;
1|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=? AND a<?)
2|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=? AND a=? AND b<?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)
In [47]: %time f("SELECT a,b FROM t WHERE c = 0 AND (a < 500000 OR (a = 500000 AND b < 100)) ORDER BY a DESC, b DESC LIMIT 1;", ())
CPU times: user 61.6 ms, sys: 7 ms, total: 68.6 ms
Wall time: 68 ms
Out[47]: [(499999, 499999)]

In [48]: %time f("SELECT a,b FROM t WHERE (c = 0 AND a < 500000) OR (c = 0 AND a = 500000 AND b < 100) ORDER BY a DESC, b DESC LIMIT 1;", ())
CPU times: user 279 ms, sys: 8.64 ms, total: 287 ms
Wall time: 287 ms
Out[48]: [(499999, 499999)]

In [49]: %time f("SELECT a,b FROM t WHERE (c = 0 AND a < 500000) UNION ALL SELECT a,b FROM t WHERE (c = 0 AND a = 500000 AND b < 100) ORDER BY a DESC, b DESC LIMIT 1;", ())
CPU times: user 368 us, sys: 75 us, total: 443 us
Wall time: 268 us
Out[49]: [(499999, 499999)]
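The SQLite plans above can be reproduced from Python's sqlite3 module (a quick sketch; the exact plan text varies slightly between SQLite versions, but both arms of the UNION ALL should show up as covering-index searches):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c INT, a INT, b INT)")
conn.execute("CREATE INDEX t_c_a_b ON t (c, a, b)")

# Ask the planner how it would run the UNION ALL form of the query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT a,b FROM t WHERE c = 0 AND a < 500000 "
    "UNION ALL "
    "SELECT a,b FROM t WHERE c = 0 AND a = 500000 AND b < 100 "
    "ORDER BY a DESC, b DESC LIMIT 1"
).fetchall()

# The detail string is the last column of each plan row.
details = " ".join(row[-1] for row in plan)
print(details)
```

Each subquery resolves to a search on the t_c_a_b covering index, matching the interactive sqlite> output above.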
Contributor

NegativeMjark commented Jul 5, 2016

Complete test harness for postgresql, FWIW:

btree_desc.sql.txt

Contributor

NegativeMjark commented Jul 5, 2016

Owner

erikjohnston commented Jul 5, 2016

LGTM

@NegativeMjark NegativeMjark merged commit 04dee11 into develop Jul 5, 2016

10 checks passed

Flake8 + Packaging (Commit): Build #1097 origin/markjh/faster_events_around succeeded in 34 sec
Flake8 + Packaging (Merged PR): Build finished.
Sytest Dendron (Commit): Build #213 origin/markjh/faster_events_around succeeded in 6 min 47 sec
Sytest Dendron (Merged PR): Build finished.
Sytest Postgres (Commit): Build #1053 origin/markjh/faster_events_around succeeded in 6 min 6 sec
Sytest Postgres (Merged PR): Build finished.
Sytest SQLite (Commit): Build #1069 origin/markjh/faster_events_around succeeded in 5 min 26 sec
Sytest SQLite (Merged PR): Build finished.
Unit Tests (Commit): Build #1136 origin/markjh/faster_events_around succeeded in 1 min 25 sec
Unit Tests (Merged PR): Build finished.

@richvdh richvdh deleted the markjh/faster_events_around branch Dec 1, 2016
