
Database transaction get_unread_event_push_actions_by_room churns database CPU #11887

Closed
jaywink opened this issue Feb 2, 2022 · 11 comments · Fixed by #13005
Labels
S-Minor (Blocks non-critical functionality, workarounds exist.), T-Defect (Bugs, crashes, hangs, security vulnerabilities, or other reported issues.)

Comments

@jaywink (Member) commented Feb 2, 2022

Description

We have a host in EMS with a room that is receiving approximately 2 events per second. A few users sitting in the room are churning database CPU through constant syncing, which causes the synchrotron process to spend a large amount of its transaction time in get_unread_event_push_actions_by_room.

[Grafana screenshots: synchrotron transaction time dominated by get_unread_event_push_actions_by_room]

The database query is:

SELECT COUNT(CASE WHEN notif = ? THEN ? END), COUNT(CASE WHEN highlight = ? THEN ? END), COUNT(CASE WHEN unread = ? THEN ? END) FROM event_push_actions ea WHERE user_id = ? AND room_id = ? AND stream_ordering > ?

Two of the users in the room each account for roughly half of the database load.

Database metrics showing the query being mostly CPU time.


More context in https://matrix.to/#/!ixSKkmjfDhyFWsKSEY:federator.dev/$ZFJ7IDlk0gt_l8ht9hEK3DZVg5mhucbaEtM4qCKPiQk?via=matrix.org&via=vector.modular.im

Version information

  • Homeserver: EMS, customer in context link

  • Version: v1.51.0

  • Install method: Docker, official images

  • Platform: EMS

@clokep (Contributor) commented Feb 2, 2022

We did some investigation into this today: it was due to a room with a high rate of events combined with users with very old read receipts. This caused the query in _get_unread_counts_by_pos_txn (mentioned above) to do a table scan instead of using the indexes.

We pretty much did a:

SELECT count(stream_ordering), user_id FROM event_push_actions WHERE room_id = '...' GROUP by user_id;

This showed a few users with ~100k pending entries, while most users had far fewer. This is because those few users never read that room (and thus never updated their read receipts).

A short-term fix was to delete the rows for those users, which will cause incorrect unread message counts for those rooms.
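A minimal sketch of that mitigation, assuming direct database access (the exact statement used is not shown in the thread; the room and user IDs are placeholders):

-- Hypothetical one-off cleanup: drop the backlog of push actions for an
-- affected user in the busy room so the count query has far fewer rows to
-- scan. That user's unread count for the room will be wrong until it is
-- rebuilt by a later receipt or rotation.
DELETE FROM event_push_actions
 WHERE room_id = '!<room>'
   AND user_id = '@<user>';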

A long-term fix is probably to make the notification count calculation incremental instead of querying across all messages. That should make it robust against users who never read a room.

@clokep (Contributor) commented Feb 2, 2022

#11893 was also found while investigating this, but we do not believe it is directly causing this issue.

@richvdh (Member) commented Feb 2, 2022

vaguely related: #5569

clokep added the S-Minor and T-Defect labels on Feb 14, 2022
@erikjohnston (Member)

My understanding was that we moved push actions to a summary table after 24 hours? Though that doesn't happen for "highlight" push actions IIRC.

Next time this goes off we should figure out why the query takes ages to run: are there lots of push actions in the last 24 hours, or are they highlights? Sounds like last time the table was filled with push actions for the affected user and so it got table-scanned.

@erikjohnston (Member)

Maybe a solution is to aggregate more aggressively if there are lots of push actions, e.g. move rows into the push actions summary table once there are more than 100. The reason not to do that is that the counts won't be as accurate, but that hardly matters when the count is in the hundreds.
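Roughly, aggregating more aggressively would mean folding a user's older rows for a busy room into the summary table and deleting them from event_push_actions. A sketch, assuming the existing event_push_summary columns; the :cutoff threshold and parameter names are illustrative, not from the thread:

-- Illustrative only: roll older notif rows into event_push_summary, leaving
-- only the most recent rows per user/room in event_push_actions.
UPDATE event_push_summary
   SET notif_count = notif_count + (
           SELECT COUNT(*) FROM event_push_actions
            WHERE user_id = :user_id AND room_id = :room_id
              AND stream_ordering <= :cutoff AND notif = 1
       ),
       stream_ordering = :cutoff
 WHERE user_id = :user_id AND room_id = :room_id;

DELETE FROM event_push_actions
 WHERE user_id = :user_id AND room_id = :room_id
   AND stream_ordering <= :cutoff;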

@erikjohnston (Member)

Also, it looks like we will rotate at a maximum of ~3333 events/s (a limit of 10000 rows per iteration followed by a delay of 3s).

@richvdh (Member) commented Feb 23, 2022

Next time this goes off we should figure out why the query takes ages to run: are there lots of push actions in the last 24 hours, or are they highlights? Sounds like last time the table was filled with push actions for the affected user and so it got table-scanned.

Just lots of regular unread messages.
The query-plan looks like:

explain SELECT COUNT(CASE WHEN notif = 1 THEN 1 END), COUNT(CASE WHEN highlight = 1 THEN 1 END), COUNT(CASE WHEN unread = 1 THEN 1 END) FROM event_push_actions ea WHERE user_id = '@<user>' AND room_id = '!<room>' AND stream_ordering > 13329084;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=22045.65..22045.66 rows=1 width=24)
   ->  Bitmap Heap Scan on event_push_actions ea  (cost=2043.15..21616.92 rows=28582 width=6)
         Recheck Cond: ((room_id = '!<room>'::text) AND (user_id = '@<user>'::text))
         Filter: (stream_ordering > 13329084)
         ->  Bitmap Index Scan on event_push_actions_room_id_user_id  (cost=0.00..2036.00 rows=28745 width=0)
               Index Cond: ((room_id = '!<room>'::text) AND (user_id = '@<user>'::text))
(6 rows)

@erikjohnston (Member)

Sounds like we need to aggregate more aggressively then.

@erikjohnston (Member)

This is where we'd need to add the additional rotation logic:

logger.info("Rotating notifications")
caught_up = await self.db_pool.runInteraction(
    "_rotate_notifs", self._rotate_notifs_txn
)
if caught_up:
    break
await self.hs.get_clock().sleep(self._rotate_delay)

Right now, it works by rotating all push actions whose stream ordering is older than one day, processing at most 10000 rows per iteration.

I think we can add a second loop where we check whether there are lots of push actions for a given room, and rotate old rows until at most e.g. 100 rows are left in the event_push_actions table. This should be safe if we change the bounds of how we fetch unprocessed counts at:

WHERE ? <= stream_ordering AND stream_ordering < ?

to instead use the old stream ordering that we store in event_push_summary.
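In other words, the lower bound of the count over event_push_actions would come from the per-user/room stream ordering already recorded in event_push_summary, something like the following sketch (not the final query; the exact shape is an assumption):

-- Sketch: only count rows newer than the summarised position; the stored
-- summary counts are then added on top.
SELECT COUNT(CASE WHEN notif = 1 THEN 1 END)
  FROM event_push_actions ea
 WHERE ea.user_id = ? AND ea.room_id = ?
   AND ea.stream_ordering > (
           SELECT stream_ordering FROM event_push_summary
            WHERE user_id = ea.user_id AND room_id = ea.room_id
       );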

erikjohnston self-assigned this on Jun 7, 2022
@erikjohnston (Member)

Ok, so my suggestion in the previous comment doesn't work, as it results in table scanning event_push_actions, which is not tenable for larger hosts.

@erikjohnston (Member)

New plan:

Have a new table called event_push_counts, which is like event_push_summary but keeps track of the full count (rather than just the count up to one day ago). This will allow fetching the unread counts to work the same way as it does today, except that the range we need to scan in event_push_actions is much smaller.

To keep it updated:

  1. On a new event (see the SQL sketch after this list) we:
    1. Fetch all rows from event_push_actions for the room/user from the stream_ordering stored in event_push_counts (or, if no row exists there, from event_push_summary).
    2. Add that count to event_push_counts and update its stream_ordering.
  2. When we get a receipt from a user in a room, recalculate event_push_counts by counting the remaining matching rows in event_push_actions.
  3. On rotation of push actions into event_push_summary, no action needs to be taken, as it should be a no-op.
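A rough sketch of step 1 in SQL; event_push_counts is the proposed table, so its columns and the parameter names here are assumptions:

-- Sketch of step 1: on a new event, fold any rows newer than the last counted
-- position into event_push_counts and advance its stream_ordering.
UPDATE event_push_counts
   SET notif_count = notif_count + (
           SELECT COUNT(*) FROM event_push_actions
            WHERE user_id = :user_id AND room_id = :room_id
              AND stream_ordering > event_push_counts.stream_ordering
              AND notif = 1
       ),
       stream_ordering = :new_stream_ordering
 WHERE user_id = :user_id AND room_id = :room_id;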

erikjohnston added a commit that referenced this issue Jun 15, 2022
Fixes #11887 hopefully.

The core change here is that `event_push_summary` now holds a summary of counts up until a much more recent point, meaning that the range of rows we need to count in `event_push_actions` is much smaller.

This needs two major changes:
1. When we get a receipt we need to recalculate `event_push_summary` rather than just delete it
2. The logic for deleting `event_push_actions` is now divorced from calculating `event_push_summary`.

In future it would be good to calculate `event_push_summary` while we persist a new event (it should just be a case of adding one to the relevant rows in `event_push_summary`), as that will further simplify the get counts logic and remove the need for us to periodically update `event_push_summary` in a background job.
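For change 1, the receipt handler would re-count the outstanding rows past the receipt and write that back, rather than deleting the summary row. A sketch, assuming a unique index on (user_id, room_id) in `event_push_summary`; the PR's actual query may differ:

-- Sketch of change 1: on receipt, recompute the summary from the receipt's
-- stream ordering onwards instead of deleting the row.
INSERT INTO event_push_summary (user_id, room_id, notif_count, stream_ordering)
VALUES (
    :user_id,
    :room_id,
    (SELECT COUNT(*) FROM event_push_actions
      WHERE user_id = :user_id AND room_id = :room_id
        AND stream_ordering > :receipt_stream_ordering AND notif = 1),
    :receipt_stream_ordering
)
ON CONFLICT (user_id, room_id) DO UPDATE
   SET notif_count = EXCLUDED.notif_count,
       stream_ordering = EXCLUDED.stream_ordering;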