Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

The state_group_edges table doesn't have a unique index; can severely impact WITH RECURSIVE state(state_group) AS ... query performance if duplicates are somehow introduced #11779

Closed
reivilibre opened this issue Jan 20, 2022 · 19 comments · Fixed by #12687
Assignees
Labels
A-Performance Performance, both client-facing and admin-facing S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@reivilibre
Copy link
Contributor

This is similar to, but not the same as, #10301.

The state_group_edges table doesn't have a unique index, so it's 'easy' to introduce duplicate rows e.g. by restoring a backup twice (or potentially even due to pure bugs in Synapse).

This query as part of _get_state_groups_from_groups:

WITH RECURSIVE state(state_group) AS ( VALUES(3405820::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state ) ORDER BY type, state_key, state_group DESC

would be seriously slowed down by the presence of duplicate rows (I seem to think someone mentioned it would return the square of the number of rows it would usually, but I haven't verified that and I can't find that again to reference it).

It may be worth adding a unique index or perhaps having an SQL snippet that administrators can run to recover from this (though I would favour the former — make it hard to do in the first place).

As an aside: state_group_edges can potentially contain loops but that shouldn't arise from recovering backups (as far as I know) and it's not so easy to do something about — but worth bearing in mind that it would cause the query to go forever.

@richvdh
Copy link
Member

richvdh commented Jan 20, 2022

I seem to think someone mentioned it would return the square of the number of rows it would usually, but I haven't verified that and I can't find that again to reference it

This was #5064 (comment). Suppose you've got 4 state groups (A,B,C,D), each linked with edges, and each with (say) one state event. Correct behaviour says you should get 4 events back.

If you double the edges, then you end up checking SG A once, B twice, C four times, D 8 times, for a total of 15 state groups (and hence 15 events). Hence, square.

@richvdh
Copy link
Member

richvdh commented Jan 20, 2022

see also #7772 for my first discovery of this issue.

@reivilibre
Copy link
Contributor Author

That sounds more exponential (2^n - 1) to me (i.e. worse) — if you add an E in the same configuration you'll check it 16 times for this (perhaps artificial) example.

@richvdh
Copy link
Member

richvdh commented Jan 20, 2022

errr good point.

@richvdh
Copy link
Member

richvdh commented Jan 20, 2022

(and given state group chains often run into a hundred links... that's a lot of rows)

@anoadragon453 anoadragon453 added S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels Jan 24, 2022
@Half-Shot
Copy link
Collaborator

Half-Shot commented Feb 8, 2022

This has severely impacted my performance fwiw.

Just from tailing logs in postgres:

2022-02-08 10:25:39 UTC  synapse 00000 SELECTLOG:  duration: 10906.144 ms  statement: WITH RECURSIVE state(state_group) AS ( VALUES(10173932::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state )  AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@Half-Shot:half-shot.uk')) ORDER BY type, state_key, state_group DESC
2022-02-08 10:25:47 UTC  synapse 00000 SELECTLOG:  duration: 11543.318 ms  statement: WITH RECURSIVE state(state_group) AS ( VALUES(10174020::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state )  AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@Half-Shot:half-shot.uk')) ORDER BY type, state_key, state_group DESC
2022-02-08 10:25:49 UTC  synapse 00000 SELECTLOG:  duration: 9987.646 ms  statement: WITH RECURSIVE state(state_group) AS ( VALUES(10173619::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state )  AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@Half-Shot:half-shot.uk')) ORDER BY type, state_key, state_group DESC
2022-02-08 10:25:57 UTC  synapse 00000 SELECTLOG:  duration: 9929.305 ms  statement: WITH RECURSIVE state(state_group) AS ( VALUES(10170127::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state )  AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@Half-Shot:half-shot.uk')) ORDER BY type, state_key, state_group DESC
2022-02-08 10:25:59 UTC  synapse 00000 SELECTLOG:  duration: 10540.440 ms  statement: WITH RECURSIVE state(state_group) AS ( VALUES(10173620::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state )  AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@Half-Shot:half-shot.uk')) ORDER BY type, state_key, state_group DESC
2022-02-08 10:26:09 UTC  synapse 00000 SELECTLOG:  duration: 10176.697 ms  statement: WITH RECURSIVE state(state_group) AS ( VALUES(10168566::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state )  AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@Half-Shot:half-shot.uk')) ORDER BY type, state_key, state_group DESC
2022-02-08 10:26:10 UTC  synapse 00000 SELECTLOG:  duration: 12951.172 ms  statement: WITH RECURSIVE state(state_group) AS ( VALUES(10173904::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state )  AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@Half-Shot:half-shot.uk')) ORDER BY type, state_key, state_group DESC
2022-02-08 10:26:11 UTC  synapse 00000 SELECTLOG:  duration: 14658.650 ms  statement: WITH RECURSIVE state(state_group) AS ( VALUES(10174633::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state )  AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@Half-Shot:half-shot.uk')) ORDER BY type, state_key, state_group DESC
2022-02-08 10:26:13 UTC  synapse 00000 SELECTLOG:  duration: 16470.107 ms  statement: WITH RECURSIVE state(state_group) AS ( VALUES(10174634::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state )  AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@Half-Shot:half-shot.uk')) ORDER BY type, state_key, state_group DESC
2022-02-08 10:26:20 UTC  synapse 00000 SELECTLOG:  duration: 10754.803 ms  statement: WITH RECURSIVE state(state_group) AS ( VALUES(10173945::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state )  AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@Half-Shot:half-shot.uk')) ORDER BY type, state_key, state_group DESC
2022-02-08 10:26:23 UTC  synapse 00000 SELECTLOG:  duration: 13181.290 ms  statement: WITH RECURSIVE state(state_group) AS ( VALUES(10174044::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state )  AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@Half-Shot:half-shot.uk')) ORDER BY type, state_key, state_group DESC
2022-02-08 10:26:25 UTC  synapse 00000 SELECTLOG:  duration: 11143.152 ms  statement: WITH RECURSIVE state(state_group) AS ( VALUES(10174016::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state )  AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@Half-Shot:half-shot.uk')) ORDER BY type, state_key, state_group DESC
2022-02-08 10:26:34 UTC  synapse 00000 SELECTLOG:  duration: 10926.478 ms  statement: WITH RECURSIVE state(state_group) AS ( VALUES(10173991::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state )  AND ((type = 'm.room.history_visibility' AND state_key = '') OR (type = 'm.room.member' AND state_key = '@Half-Shot:half-shot.uk')) ORDER BY type, state_key, state_group DESC
 Unique  (cost=11984.20..11984.21 rows=1 width=86)
   CTE state
     ->  Recursive Union  (cost=0.00..850.03 rows=101 width=8)
           ->  Result  (cost=0.00..0.01 rows=1 width=8)
           ->  Nested Loop  (cost=0.43..84.80 rows=10 width=8)
                 ->  WorkTable Scan on state s  (cost=0.00..0.20 rows=10 width=8)
                 ->  Index Scan using state_group_edges_idx on state_group_edges e  (cost=0.43..8.45 rows=1 width=16)
                       Index Cond: (state_group = s.state_group)
   ->  Sort  (cost=11134.17..11134.18 rows=1 width=86)
         Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group DESC
         ->  Nested Loop  (cost=108.45..11134.16 rows=1 width=86)
               ->  HashAggregate  (cost=2.27..3.28 rows=101 width=8)
                     Group Key: state.state_group
                     ->  CTE Scan on state  (cost=0.00..2.02 rows=101 width=8)
               ->  Bitmap Heap Scan on state_groups_state  (cost=106.17..110.20 rows=1 width=86)
                     Recheck Cond: ((state_group = state.state_group) AND (((state_key = ''::text) AND (type = 'm.room.history_visibility'::text)) OR ((state_key = '@Half-Shot:half-shot.uk'::text) AND (type = 'm.room.member'::text))))
                     ->  BitmapAnd  (cost=106.17..106.17 rows=1 width=0)
                           ->  Bitmap Index Scan on state_groups_state_state_group_type_state_key_idx  (cost=0.00..5.08 rows=50 width=0)
                                 Index Cond: (state_group = state.state_group)
                           ->  BitmapOr  (cost=100.85..100.85 rows=6423 width=0)
                                 ->  Bitmap Index Scan on state_groups_state_state_key_type  (cost=0.00..11.00 rows=630 width=0)
                                       Index Cond: ((state_key = ''::text) AND (type = 'm.room.history_visibility'::text))
                                 ->  Bitmap Index Scan on state_groups_state_state_key_type  (cost=0.00..86.63 rows=5793 width=0)
                                       Index Cond: ((state_key = '@Half-Shot:half-shot.uk'::text) AND (type = 'm.room.member'::text))
(24 rows)

@reivilibre
Copy link
Contributor Author

reivilibre commented Feb 8, 2022

@Half-Shot : to confirm it's caused by duplicate rows, would you mind running this?:

SELECT COUNT(1) from ( SELECT state_group, prev_state_group FROM state_group_edges GROUP BY state_group, prev_state_group HAVING COUNT(ctid) > 1 ) AS dupes;

This ran fairly quickly for me (less than a second for 295k rows in the table) but I can't guarantee whether it will or not for you, but it would be nice to confirm that this is the cause if it works easily.

@Half-Shot
Copy link
Collaborator

synapse=> SELECT COUNT(1) from ( SELECT state_group, prev_state_group FROM state_group_edges GROUP BY state_group, prev_state_group HAVING COUNT(ctid) > 1 ) AS dupes;

 count
-------
     0
(1 row)

Bah, looks like I don't have dupes, just regular slowness.

@reivilibre
Copy link
Contributor Author

A community member in the Synapse Admins room today came with an issue where this query was being slow.

Luckily I knew of this issue (and the query I posted two messages above came back with a non-zero result to confirm my hypothesis!).

What went wrong is that they cancelled (^C?) the import of their backup half-way through and later did it again. It's not that difficult a mistake to make.

'Luckily' their Synapse was so completely borked by the slow queries that they came looking for help and the suggested solution was to delete the database and freshly import it again.
(Theoretically we could write a query to fix the issue for this table, but there might be other cases in the database that we're not aware of and so restoring from fresh is the ideal course of action, considering there wasn't a risk of data loss since the homeserver was too broken to use anyway).

However in general it would be nice to prevent others from falling into this trap. As evidenced today: it does happen!

@DMRobertson
Copy link
Contributor

What went wrong is that they cancelled (^C?) the import of their backup half-way through and later did it again. It's not that difficult a mistake to make.

Can we (do we not?) wrap the import in a transaction to prevent this kind of thing?

(That said, I'd still rather have uniqueness requirements enforced by the schema)

@reivilibre
Copy link
Contributor Author

Can we (do we not?) wrap the import in a transaction to prevent this kind of thing?

That depends on what pgdump does. I can't say I know off the top of my head :/.

@DMRobertson
Copy link
Contributor

Can we (do we not?) wrap the import in a transaction to prevent this kind of thing?

That depends on what pgdump does. I can't say I know off the top of my head :/.

Looks like we'd have to opt-in to it with pg_restore --single-transaction. That prevents you from using --jobs to parallelise the slow bits though.

@artemislena
Copy link

artemislena commented Feb 9, 2022

fc17: That was me… and I used psql, not pg_restore

@chagai95
Copy link
Contributor

chagai95 commented Mar 5, 2022

I think I have the same issue, tried the dupes command:
Screenshot from 2022-03-05 14-36-31

I tried a few times to log in and every time it kind of made my server so slow until I restarted it sometimes I had to restart twice until it got back to working normally... I have some bridges configured but I really don't think I have that much trafic on my server...

@chagai95
Copy link
Contributor

chagai95 commented Mar 5, 2022

Maybe using workers will help this?

@richvdh
Copy link
Member

richvdh commented Mar 7, 2022

@chagai95 this issue is about performance problems caused by duplicate entries. As your query shows, you don't have any. Workers are not the right solution to duplicate rows.

@chagai95
Copy link
Contributor

chagai95 commented Mar 7, 2022

Ok thx, so I'll create s new ticket, thx!

@richvdh
Copy link
Member

richvdh commented May 4, 2022

I think the work here is to convert the existing index on state_group_edges.state_group to a unique index (presumably by creating a new index, and then deleting the old one).

(A technically preferable solution might be to replace the state_group_edges table with a nullable prev_state_group column on state_groups - but that is a pain in the butt to orchestrate.)

This should be a fairly simple thing to do via background updates (the register_background_index_update helper exists for this purpose).

@reivilibre reivilibre self-assigned this May 10, 2022
reivilibre added a commit that referenced this issue May 18, 2022
Should this be in state store instead...?
@reivilibre
Copy link
Contributor Author

Useful Queries

Postgres

Diagnosing the condition

SELECT COUNT(1) from ( SELECT state_group, prev_state_group FROM state_group_edges GROUP BY state_group, prev_state_group HAVING COUNT(ctid) > 1 ) AS dupes;

This will return 0 if the table is fine, or a bigger number if there are duplicate rows.

Solving the condition

BEGIN;
DELETE FROM state_group_edges WHERE (ctid, state_group, prev_state_group) IN (
  SELECT row_id, state_group, prev_state_group
  FROM (
    SELECT
      ctid AS row_id,
      MIN(ctid) OVER (PARTITION BY state_group, prev_state_group) AS min_row_id,
      state_group,
      prev_state_group
    FROM state_group_edges
  ) AS t1
  WHERE row_id <> min_row_id
);
COMMIT;

If you want some peace of mind, check that psql says DELETE xyz where xyz
matches the number from the first query, before entering COMMIT.

SQLite

Diagnosing the condition

SELECT COUNT(1) from ( SELECT state_group, prev_state_group FROM state_group_edges GROUP BY state_group, prev_state_group HAVING COUNT(rowid) > 1 ) AS dupes;

This will return 0 if the table is fine, or a bigger number if there are duplicate rows.

Solving the condition

Using the sqlite3 command-line tool:

BEGIN;
DELETE FROM state_group_edges WHERE (rowid, state_group, prev_state_group) IN (
  SELECT row_id, state_group, prev_state_group
  FROM (
    SELECT
      rowid AS row_id,
      MIN(rowid) OVER (PARTITION BY state_group, prev_state_group) AS min_row_id,
      state_group,
      prev_state_group
    FROM state_group_edges
  )
  WHERE row_id <> min_row_id
);
COMMIT;

If you want some peace of mind, enter .changes on first and before you
COMMIT, check that the number of changes matches the number obtained from
the first query.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Performance Performance, both client-facing and admin-facing S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
8 participants