This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Very slow deletes from device_inbox during /sync requests #16479

Closed
richvdh opened this issue Oct 12, 2023 · 12 comments

@richvdh
Member

richvdh commented Oct 12, 2023

Description

Synchrotron workers are maxing out db connections, resulting in multi-minute scheduling time:

[screenshot: graph of db connection usage and scheduling time]

Inspection shows that these are operations of the shape

                DELETE FROM device_inbox WHERE ctid IN (
                  SELECT ctid FROM device_inbox
                  WHERE user_id = '@<user_id>' AND device_id = 'LTHNOYWSMJ' AND stream_id <= 7526430
                  LIMIT 100
                )

Some queries have LIMIT 1000 rather than LIMIT 100.

Steps to reproduce

  • have lots of messages in device_inbox
  • /sync

Homeserver

EMS host

Synapse Version

1.94.0

Installation Method

Docker (matrixdotorg/synapse)

Database

postgres 11

Workers

Multiple workers

Platform

k8s

Configuration

No response

Relevant log output

...

Anything else that would be useful to know?

No response

@richvdh
Member Author

richvdh commented Oct 12, 2023

an EXPLAIN on the queries in question shows that they are sequence-scanning the table to do the delete:
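
(Presumably obtained with something along these lines, with placeholder values:)

EXPLAIN DELETE FROM device_inbox WHERE ctid IN (
  SELECT ctid FROM device_inbox
  WHERE user_id = '<user_id>' AND device_id = 'SAJAWKWGHO' AND stream_id <= 7535635
  LIMIT 100
);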

Delete on device_inbox  (cost=8.60..2559182.13 rows=1 width=36)
  ->  Nested Loop  (cost=8.60..2559182.13 rows=1 width=36)
        Join Filter: (device_inbox.ctid = "ANY_subquery".ctid)
        ->  Unique  (cost=8.60..8.61 rows=1 width=36)
              ->  Sort  (cost=8.60..8.61 rows=1 width=36)
                    Sort Key: "ANY_subquery".ctid
                    ->  Subquery Scan on "ANY_subquery"  (cost=0.56..8.59 rows=1 width=36)
                          ->  Limit  (cost=0.56..8.58 rows=1 width=6)
                                ->  Index Scan using device_inbox_user_stream_id on device_inbox device_inbox_1  (cost=0.56..8.58 rows=1 width=6)
                                      Index Cond: ((user_id = '<user_id>'::text) AND (device_id = 'SAJAWKWGHO'::text) AND (stream_id <= 7535635))
        ->  Seq Scan on device_inbox  (cost=0.00..2376344.90 rows=14626290 width=6)

Apparently the query optimiser isn't smart enough to know that it would do better searching for each of the 100 ctids individually than scanning the whole table.

@richvdh
Member Author

richvdh commented Oct 12, 2023

This code changed recently in #16240 and again in #16312.

A couple of thoughts:

  • I'm surprised it was necessary to change the code in the SyncHandler at all. We know that will be deleting at most 100 rows, because they are exactly the rows that were served to the client on the previous /sync. AFAICT there is no need to LIMIT the delete, and therefore no need to mess about with ctid and no need for the fallback-to-async-task (a sketch follows this list).
  • For the cleanup when a device is deleted, we should probably pick a stream_id to delete up to for each batch, rather than messing with ctid.
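
For the first point, a minimal sketch of what the /sync-path delete could reduce to (placeholder parameters; not the actual Synapse code):

-- The upper bound is the stream_id of the last message already served to
-- the client on the previous /sync, so this matches at most ~100 rows and
-- needs neither a LIMIT nor ctid:
DELETE FROM device_inbox
WHERE user_id = ? AND device_id = ? AND stream_id <= ?;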

@MatMaul MatMaul self-assigned this Oct 13, 2023
@MatMaul
Contributor

MatMaul commented Oct 13, 2023

Apparently the query optimiser isn't smart enough to know that it would do better searching for each of the 100 ctids individually than scanning the whole table.

Sadness...
https://postgrespro.com/list/thread-id/2446912

For the cleanup when a device is deleted, we should probably pick a stream_id to delete up to for each batch, rather than messing with ctid.

Something along these lines?

DELETE FROM device_inbox WHERE user_id = ? AND device_id = ? AND stream_id <= (
  SELECT MAX(stream_id) FROM device_inbox
  WHERE user_id = '@<user_id>' AND device_id = 'LTHNOYWSMJ' AND stream_id <= 7526430
  LIMIT 100
)

I'm surprised it was necessary to change the code in the SyncHandler at all. We know that will be deleting at most 100 rows, because they are exactly the rows that were served to the client on the previous /sync. AFAICT there is no need to LIMIT the delete, and therefore no need to mess about with ctid and no need for the fallback-to-async-task.

🤦 Indeed, I just didn't think about it.

@kegsay
Member

kegsay commented Oct 13, 2023

The query planner woes sound similar to matrix-org/sliding-sync@fa67467, except in that case we wanted a seq scan.

@richvdh
Member Author

richvdh commented Oct 13, 2023

Apparently the query optimiser isn't smart enough to know that it would do better searching for each of the 100 ctids individually than scanning the whole table.

Sadness... https://postgrespro.com/list/thread-id/2446912

For the cleanup when a device is deleted, we should probably pick a stream_id to delete up to for each batch, rather than messing with ctid.

Something along these lines?

DELETE FROM device_inbox WHERE user_id = ? AND device_id = ? AND stream_id <= (
  SELECT MAX(stream_id) FROM device_inbox
  WHERE user_id = '@<user_id>' AND device_id = 'LTHNOYWSMJ' AND stream_id <= 7526430
  LIMIT 100
)

Yes. Except you need an ORDER BY stream_id in the inner query.

And I don't think you need stream_id <= 7526430 for device deletions (we need to delete all messages for the given device, irrespective of stream_id). It's harmless, but redundant.

@MatMaul
Contributor

MatMaul commented Oct 13, 2023

And I don't think you need stream_id <= 7526430 for device deletions (we need to delete all messages for the given device, irrespective of stream_id). It's harmless, but redundant.

Someone can create a device with the same ID, and without the check it would then delete the messages intended for the new device.

@richvdh
Member Author

richvdh commented Oct 13, 2023

well, that would be a stupid thing to do :)

But fair enough. Stick a comment in, though - that's quite subtle!
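
Putting the thread's corrections together, the device-deletion batch query presumably ends up shaped something like this sketch (placeholder parameters; not necessarily the exact query in #16492):

DELETE FROM device_inbox
WHERE user_id = ? AND device_id = ? AND stream_id <= (
  SELECT MAX(stream_id) FROM (
    SELECT stream_id FROM device_inbox
    -- The stream_id bound looks redundant when the device has been deleted,
    -- but a device with the same ID may be created later; without the bound
    -- we could delete messages intended for the new device.
    WHERE user_id = ? AND device_id = ? AND stream_id <= ?
    ORDER BY stream_id
    LIMIT 100
  ) AS batch
);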

@MatMaul
Contributor

MatMaul commented Oct 14, 2023

I don't get the same plan with my deploy on pg15:

 Delete on device_inbox  (cost=1801.45..1962.75 rows=0 width=0)
   ->  Nested Loop  (cost=1801.45..1962.75 rows=40 width=36)
         ->  HashAggregate  (cost=1801.45..1801.85 rows=40 width=36)
               Group Key: "ANY_subquery".ctid
               ->  Subquery Scan on "ANY_subquery"  (cost=1641.08..1801.35 rows=40 width=36)
                     ->  Limit  (cost=1641.08..1800.95 rows=40 width=6)
                           ->  Bitmap Heap Scan on device_inbox device_inbox_1  (cost=1641.08..1800.95 rows=40 width=6)
                                 Recheck Cond: ((user_id = '@mxid'::text) AND (device_id = 'DEVICEID'::text))
                                 ->  Bitmap Index Scan on device_inbox_user_stream_id  (cost=0.00..1641.07 rows=40 width=0)
                                       Index Cond: ((user_id = '@mxid'::text) AND (device_id = 'DEVICEID'::text))
         ->  Tid Scan on device_inbox  (cost=0.00..4.01 rows=1 width=6)
               TID Cond: (ctid = "ANY_subquery".ctid)
(12 rows)

With #16492, I get:

 Delete on device_inbox  (cost=2178.99..2231.08 rows=0 width=0)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=1646.15..1646.16 rows=1 width=8)
           ->  Limit  (cost=0.67..1645.65 rows=40 width=8)
                 ->  Index Only Scan using device_inbox_user_stream_id on device_inbox device_inbox_1  (cost=0.67..1645.65 rows=40 width=8)
                       Index Cond: ((user_id = '@mat:tout.im'::text) AND (device_id = 'CCUJZYMJLP'::text))
   ->  Bitmap Heap Scan on device_inbox  (cost=532.83..584.92 rows=13 width=6)
         Recheck Cond: ((user_id = '@mat:tout.im'::text) AND (device_id = 'CCUJZYMJLP'::text) AND (stream_id <= $0))
         ->  Bitmap Index Scan on device_inbox_user_stream_id  (cost=0.00..532.83 rows=13 width=0)
               Index Cond: ((user_id = '@mat:tout.im'::text) AND (device_id = 'CCUJZYMJLP'::text) AND (stream_id <= $0))
(10 rows)

So it's probably a tad slower on more modern PG, but at least we will (?) not get a seqscan on older ones.

@richvdh could you do a comparison with your test DB to check the new query is ok on PG11?

Also note that my new query deletes 10 rows and not 12; I need to check why.

@MatMaul
Contributor

MatMaul commented Oct 14, 2023

The indicated rows is the number of lines in the plan output, not the number of deleted records 😭.

I have executed the two inner SELECTs and they do return the same number of rows for my test case (125 rows).
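
(Side note: plain EXPLAIN only shows the planner's estimates; EXPLAIN ANALYZE reports actual per-node row counts, though it really executes the statement, so a DELETE should be wrapped in a transaction and rolled back. A sketch with placeholder values:)

BEGIN;
EXPLAIN ANALYZE
DELETE FROM device_inbox
WHERE user_id = '@mxid' AND device_id = 'DEVICEID' AND stream_id <= 7526430;
ROLLBACK;  -- EXPLAIN ANALYZE executes the DELETE for real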

@clokep
Contributor

clokep commented Oct 17, 2023

I think this was fixed by #16491?

@clokep clokep closed this as completed Oct 17, 2023
@richvdh
Member Author

richvdh commented Oct 18, 2023

#16491 will only have fixed half the problem.

To-device message deletions happen in two places: 1) after they are successfully sent to the target device via /sync; 2) when the target device is deleted. Both were using delete_messages_for_device_txn, which uses a very inefficient query plan on Postgres 11.

#16491 changes the /sync case to no longer use delete_messages_for_device_txn and therefore fixes case 1, but does not change the query plan in that function so does not fix case 2.

@richvdh richvdh reopened this Oct 18, 2023
@richvdh
Member Author

richvdh commented Oct 18, 2023

oh, in fairness, the subject of this issue is just about the /sync requests, so yes, I guess it is fixed. Sorry.

For completeness, the device-deletion side looks to be covered in #16492.
