Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Fix appservices being backlogged and not receiving new events due to a bug in notify_interested_services #2631

Merged
merged 3 commits into from Nov 8, 2017

Conversation

xyzz
Copy link
Contributor

@xyzz xyzz commented Nov 2, 2017

The bug is:

  1. get_new_events_for_appservice returns max(event_id) for all retrieved rows
  2. get_new_events_for_appservice accepts upper_bound and all retrieved rows have event_id <= upper_bound
  3. since notify_interested_services sets upper_bound to whatever get_new_events_for_appservice returned, what happens is the while True loop at line 75 is executed only once when your server has more than 100 events: first, it gets 100 events, the next iteration it will always get 0
  4. so if your server has a lot of unprocessed events, it will only process 100 at a time (e.g. when a message is sent) and it looks as if appservices are not getting events submitted to them

I saw this happen on my homeserver.

Changing it to current_max ensures that it always gets some events, until the backlog is clear.

The second change is removing the len(events) < limit check - I'm not sure why, but sometimes it returned 99 events for me, even while there were a lot of events backlogged. Since there's another check: if not events, this should not make things infloop.

Signed-off-by: Ilya Zhuravlev whatever@xyz.is

@matrixbot
Copy link
Member

Can one of the admins verify this patch?

1 similar comment
@matrixbot
Copy link
Member

Can one of the admins verify this patch?

@xyzz
Copy link
Contributor Author

xyzz commented Nov 2, 2017

the test infloops because it's made to always return a single event

edit: I fixed it in the commit below

@xyzz xyzz force-pushed the fix_appservice_event_backlog branch from cc2b1e3 to 8a4a0dd Compare November 2, 2017 20:20
@ara4n
Copy link
Member

ara4n commented Nov 7, 2017

@xyzz - thanks for this; it looks like a pretty plausible bug & fix. @erikjohnston can you review it (when back from hol) please?

@erikjohnston
Copy link
Member

@matrixbot ok to test

@erikjohnston
Copy link
Member

Woo, thanks.

The second change is removing the len(events) < limit check - I'm not sure why, but sometimes it returned 99 events for me, even while there were a lot of events backlogged. Since there's another check: if not events, this should not make things infloop.

I imagine this happens when notify_interested_services is called again, i.e. there are new events, while the original notify_interested_services is processing the original set of events. There are two possible solutions:

  1. Do what you did and remove the len(events) < limjt check, or
  2. Check if current_max has changed during the loop, and always recheck if it has

@xyzz any thoughts?

@@ -74,7 +74,7 @@ def notify_interested_services(self, current_id):
limit = 100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would help avoid confusion if upper_bound = self.current_max above this line was removed

@xyzz
Copy link
Contributor Author

xyzz commented Nov 7, 2017

I like current solution better because there's less code to maintain. Is there any benefit to checking len of returned events?

I removed the assignment as you suggested.

@erikjohnston
Copy link
Member

Cool, thanks! :)

@erikjohnston erikjohnston merged commit d305987 into matrix-org:develop Nov 8, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants