Fix a bug where servers could be marked as up when they were failing #16506

clokep · 2023-10-16T17:54:27Z

As described in #15226, the RetryDestinationLimiter seems to mark servers as up even when the connection fails. This looks like a regression from #12500; see #12500 (comment).

I think this will wake up the notifier and generate additional replication traffic. I think that will in turn cause federation traffic, via a new transaction being sent to the unreachable server?

clokep · 2023-10-16T17:56:54Z

changelog.d/16506.bugfix

@@ -0,0 +1 @@
+Fix a bug introduced in Synapse 1.59.0 where servers would be incorrectly marked as available when a request resulted in an error.


I'm not sure if there's a more visible symptom? Perhaps it would cause things to be retried too often?

clokep · 2023-10-16T18:02:15Z

synapse/util/retryutils.py

-                    # the notifier.
-                    self.replication_client.send_remote_server_up(self.destination)
+                # If the server was previously failing, but is no longer.
+                if previously_failing:


@erikjohnston this might need some thoughts from you as the original author of #12500 -- was this done on purpose and I'm missing some understanding?

Actually, I think this is still not quite right: it will end up calling this code if we were previously failing and are still failing. I think?

It should be OK now. 👍

The logic could be simplified to check only not currently_failing, but that would depend on the earlier return to avoid kicking off the background process. I find it a bit clearer to check both previously_failing and not currently_failing. 🤷
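To make the condition discussed above concrete, here is a minimal sketch of the transition check. It is not the code from synapse/util/retryutils.py: the class `LimiterSketch`, its `record_result` method, and the `notified_up` list are hypothetical names for illustration, and the list stands in for whatever notifier or replication call would actually fire. Only `previously_failing` and `currently_failing` come from the discussion.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LimiterSketch:
    """Hypothetical stand-in for a retry limiter; not Synapse's real class."""

    destination: str
    failure_ts: Optional[int] = None  # set while the destination is failing
    notified_up: List[str] = field(default_factory=list)  # stand-in for notifications

    def record_result(self, success: bool, now_ms: int) -> None:
        # Whether requests to the destination had been failing before this one.
        previously_failing = self.failure_ts is not None

        if success:
            if not previously_failing:
                # Nothing to clear and nothing to announce: the "earlier return".
                return
            self.failure_ts = None
        elif self.failure_ts is None:
            # First failure: remember when the outage started.
            self.failure_ts = now_ms

        # Whether the destination is failing after this request.
        currently_failing = self.failure_ts is not None

        # Announce "server up" only on a failing -> healthy transition.
        if previously_failing and not currently_failing:
            self.notified_up.append(self.destination)
```

With this sketch, two failures followed by a success record exactly one "up" notification, and a further success records none. Because of the early return, checking only `not currently_failing` would behave the same here; keeping both flags just states the intended transition explicitly.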

tests/util/test_retryutils.py

erikjohnston

Seems legit to me!

clokep added 3 commits October 16, 2023 13:34

Clarify comments.

e46f13b

Do not mark servers as up for failures.

327a2c9

Newsfragment

7a25e70

This was referenced Oct 16, 2023

Immediately retry any requests that have backed off when a server comes back online. #12500

Merged

Clarify retry notification code #15226

Closed

clokep linked an issue Oct 16, 2023 that may be closed by this pull request

Clarify retry notification code #15226

Closed

Additional fixes.

2da1f39

Typo fix.

022c359

clokep marked this pull request as ready for review October 16, 2023 19:40

clokep requested a review from a team as a code owner October 16, 2023 19:40

erikjohnston approved these changes Oct 17, 2023

clokep merged commit 77dfc1f into develop Oct 17, 2023
36 of 38 checks passed

clokep deleted the clokep/server-up branch October 17, 2023 11:32
