This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

AS scheduler may drop events during a restart #11637

Open
Fizzadar opened this issue Dec 23, 2021 · 1 comment
Labels: A-Application-Service (Related to AS support), T-Enhancement (New features, changes in functionality, improvements in performance, or user-facing enhancements.)

Comments

@Fizzadar
Contributor

Description

The AS scheduler keeps track of events to send to each AS in memory as they come in, only pushing these into the database (as an AS transaction) once any in-flight requests have been completed:

```python
async def _send_request(self, service: ApplicationService) -> None:
    # sanity-check: we shouldn't get here if this service already has a sender
    # running.
    assert service.id not in self.requests_in_flight
    self.requests_in_flight.add(service.id)
    try:
        while True:
            all_events = self.queued_events.get(service.id, [])
            events = all_events[:MAX_PERSISTENT_EVENTS_PER_TRANSACTION]
            del all_events[:MAX_PERSISTENT_EVENTS_PER_TRANSACTION]
            all_events_ephemeral = self.queued_ephemeral.get(service.id, [])
            ephemeral = all_events_ephemeral[:MAX_EPHEMERAL_EVENTS_PER_TRANSACTION]
            del all_events_ephemeral[:MAX_EPHEMERAL_EVENTS_PER_TRANSACTION]
            if not events and not ephemeral:
                return
            try:
                await self.txn_ctrl.send(service, events, ephemeral)
            except Exception:
                logger.exception("AS request failed")
    finally:
        self.requests_in_flight.discard(service.id)
```

Steps to reproduce

This means AS events could be lost in the following sequence:

  • events come in and an AS request begins
  • more events come in while the AS is processing the request; these only exist in memory
  • the Synapse or AS pusher process restarts, crashes, etc.: the events queued in the previous step are lost forever
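The sequence above can be reproduced with a toy model. This is a minimal sketch, not Synapse code: `ToyScheduler`, `database` and `events_that_survive` are hypothetical stand-ins, and it assumes, per the description above, that queued events are only written to the database as an AS transaction when a sender picks them up, while anything arriving during an in-flight request lives only in `queued_events`.

```python
from collections import defaultdict


class ToyScheduler:
    """Hypothetical model of the AS scheduler's in-memory queue (not Synapse code)."""

    def __init__(self, database: list) -> None:
        # `database` survives a restart; everything else on this object does not.
        self.database = database
        self.queued_events: dict = defaultdict(list)

    def submit(self, service_id: str, event_id: str) -> None:
        # New events are only appended to the in-memory queue.
        self.queued_events[service_id].append(event_id)

    def start_request(self, service_id: str) -> list:
        # Take the queued batch and persist it as an "AS transaction"
        # before the (not modelled) HTTP request to the AS is made.
        batch = self.queued_events.pop(service_id, [])
        self.database.append((service_id, batch))
        return batch


def events_that_survive(database: list) -> list:
    # After a restart, only persisted transactions can be retried.
    return [event for _, batch in database for event in batch]


database: list = []
scheduler = ToyScheduler(database)
scheduler.submit("bridge", "$event1")
scheduler.submit("bridge", "$event2")
scheduler.start_request("bridge")      # request in flight for $event1, $event2
scheduler.submit("bridge", "$event3")  # arrives mid-request: memory only
del scheduler                          # crash/restart: in-memory state is gone

print(events_that_survive(database))   # prints ['$event1', '$event2']
```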

Possible solution

Would it be possible to set up some kind of exit handling (`atexit` from the stdlib?) that dumps any in-memory events into a new txn in the database before the process exits? This would prevent any loss of AS events.
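A rough sketch of the exit-handler idea, under loud assumptions: `queued_events` only mirrors the shape of the scheduler's in-memory dict, and `persist_txn` is a hypothetical stand-in for writing a new AS transaction to the database (here it just appends to a list). `atexit.register` is the real stdlib API; everything else is illustrative.

```python
import atexit

# Hypothetical in-memory queue, shaped like the scheduler's `queued_events`.
queued_events: dict = {"bridge": ["$event3"]}

# Hypothetical stand-in for the AS transaction table.
persisted_txns: list = []


def persist_txn(service_id: str, events: list) -> None:
    # Assumption: a real handler would need a synchronous database write here,
    # since the event loop/reactor has most likely stopped when atexit handlers run.
    persisted_txns.append((service_id, list(events)))


def flush_queued_events() -> None:
    """Dump every service's still-queued events into a new transaction."""
    for service_id, events in queued_events.items():
        if events:
            persist_txn(service_id, events)
            events.clear()


atexit.register(flush_queued_events)
```

In a Twisted application, `reactor.addSystemEventTrigger("before", "shutdown", ...)` is an alternative hook that runs while the reactor can still wait on a returned Deferred, which would allow an asynchronous database write.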

@reivilibre
Contributor

> Would it be possible to setup some kind of exit handling (atexit from stdlib?) that dumps any in-memory events into a new txn in the database before the process exits, this would prevent any loss of AS events.

This sounds like it won't really solve the problem if the loss is caused by a crash, power cut, etc., since an exit handler only runs when the process shuts down cleanly.
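The limitation can be checked directly: Python skips `atexit` handlers when a process ends abruptly. This sketch runs a child interpreter that registers a handler and then either exits normally or calls `os._exit()`, which here stands in for a crash (a `SIGKILL` or power cut is similarly abrupt). The handler message is illustrative only.

```python
import subprocess
import sys

# Child script: registers an exit handler, then exits cleanly or abruptly.
CHILD = """
import atexit, os, sys
atexit.register(lambda: print("flushed queued events", flush=True))
if sys.argv[1] == "crash":
    os._exit(1)  # abrupt exit: atexit handlers are skipped
"""


def run_child(mode: str) -> str:
    # With `python -c`, sys.argv[1] in the child is `mode`.
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
    )
    return result.stdout


print(repr(run_child("clean")))  # prints 'flushed queued events\n'
print(repr(run_child("crash")))  # prints ''
```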

The way I imagine this should work is that the AS scheduler shouldn't advance its stream position (in whatever source the events come from in the first place) until it has actually handled the events.
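A minimal sketch of that idea, assuming events carry a monotonically increasing stream position and that the last fully handled position is stored durably. `EventSource`, `PositionStore` and `StreamScheduler` are hypothetical names, not Synapse classes. Because the position only moves after `send` succeeds, a crash mid-batch means the batch is re-read on restart (at-least-once delivery) instead of being dropped.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    stream_id: int
    event_id: str


class EventSource:
    """Hypothetical event stream: stands in for wherever events originate."""

    def __init__(self, events: List[Event]) -> None:
        self.events = sorted(events, key=lambda e: e.stream_id)

    def get_events_after(self, position: int, limit: int) -> List[Event]:
        return [e for e in self.events if e.stream_id > position][:limit]


class PositionStore:
    """Hypothetical durable storage for the last fully handled position."""

    def __init__(self) -> None:
        self.position = 0


class StreamScheduler:
    def __init__(self, source: EventSource, store: PositionStore,
                 send: Callable[[List[Event]], None]) -> None:
        self.source = source
        self.store = store
        self.send = send

    def process_once(self, limit: int = 100) -> bool:
        """Send one batch; return False when there is nothing left."""
        batch = self.source.get_events_after(self.store.position, limit)
        if not batch:
            return False
        self.send(batch)  # if this raises (or the process dies), position is unchanged
        self.store.position = batch[-1].stream_id  # advance only after success
        return True


source = EventSource([Event(1, "$a"), Event(2, "$b"), Event(3, "$c")])
store = PositionStore()
delivered: List[str] = []


def crashing_send(batch: List[Event]) -> None:
    raise RuntimeError("process died mid-request")


try:
    StreamScheduler(source, store, crashing_send).process_once()
except RuntimeError:
    pass

# "Restart": a fresh scheduler reading the same durable position.
restarted = StreamScheduler(source, store, lambda b: delivered.extend(e.event_id for e in b))
while restarted.process_once(limit=2):
    pass
print(delivered)  # prints ['$a', '$b', '$c']
```

The trade-off is possible duplicate delivery after a crash, so the receiving side would need to tolerate replays.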

I still need to dig through the code and see what's going on. It's not an area I'm very familiar with, but I can always use an excuse to get more familiar with it. :)

@H-Shay added the A-Application-Service and T-Enhancement labels on Jan 14, 2022