Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't do long retries when calling the key notary server. #5334

Merged

Conversation

Projects
None yet
2 participants
@richvdh
Copy link
Member

commented Jun 3, 2019

It takes at least 20 minutes to work through the long_retries schedule (11
attempts, each with a 60 second timeout, and 60 seconds between each request),
so if the notary server isn't returning within the timeout, we'll just end up
blocking whatever request is happening for 20 minutes.

Ain't nobody got time for that.

Don't do long retries when calling the key notary server.
It takes at least 20 minutes to work through the long_retries schedule (11
attempts, each with a 60 second timeout, and 60 seconds between each request),
so if the notary server isn't returning within the timeout, we'll just end up
blocking whatever request is happening for 20 minutes.

Ain't nobody got time for that.

@richvdh richvdh requested a review from matrix-org/synapse-core Jun 3, 2019

@codecov

This comment has been minimized.

Copy link

commented Jun 3, 2019

Codecov Report

Merging #5334 into develop will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff            @@
##           develop    #5334   +/-   ##
========================================
  Coverage    63.02%   63.02%           
========================================
  Files          341      341           
  Lines        35620    35620           
  Branches      5830     5830           
========================================
  Hits         22449    22449           
  Misses       11603    11603           
  Partials      1568     1568
@erikjohnston
Copy link
Member

left a comment

LGTM, though would prefer if you added the explicit long_retries=False

@@ -638,7 +638,6 @@ def get_server_verify_key_v2_indirect(
for server_name, server_keys in keys_to_fetch.items()
}
},
long_retries=True,

This comment has been minimized.

Copy link
@erikjohnston

erikjohnston Jun 4, 2019

Member

I'd prefer to explicitly set long_retries=False rather than relying on the default tbh.

This comment has been minimized.

Copy link
@richvdh

richvdh Jun 4, 2019

Author Member

as discussed in #synapse-dev

  • we don't set long_retries=False anywhere else
  • given how long long_retries takes, you don't want to set it unless you really want it. Accordingly I think it makes sense for the default to be False and for everything to rely on that being the default.

@richvdh richvdh merged commit 4d08b8f into develop Jun 4, 2019

24 checks passed

buildkite/synapse Build #1779 passed (18 minutes, 42 seconds)
Details
buildkite/synapse/check-sample-config Passed (1 minute, 17 seconds)
Details
buildkite/synapse/isort Passed (18 seconds)
Details
buildkite/synapse/newspaper-newsfile Passed (16 seconds)
Details
buildkite/synapse/packaging Passed (19 seconds)
Details
buildkite/synapse/pep-8 Passed (57 seconds)
Details
buildkite/synapse/pipeline Passed (2 seconds)
Details
buildkite/synapse/python-2-dot-7-slash-postgres-9-dot-4 Passed (16 minutes, 3 seconds)
Details
buildkite/synapse/python-2-dot-7-slash-postgres-9-dot-5 Passed (16 minutes, 26 seconds)
Details
buildkite/synapse/python-2-dot-7-slash-sqlite Passed (7 minutes, 34 seconds)
Details
buildkite/synapse/python-2-dot-7-slash-sqlite-slash-old-deps Passed (9 minutes, 41 seconds)
Details
buildkite/synapse/python-3-dot-5-slash-postgres-9-dot-4 Passed (17 minutes, 27 seconds)
Details
buildkite/synapse/python-3-dot-5-slash-postgres-9-dot-5 Passed (17 minutes, 23 seconds)
Details
buildkite/synapse/python-3-dot-5-slash-sqlite Passed (8 minutes, 44 seconds)
Details
buildkite/synapse/python-3-dot-6-slash-sqlite Passed (8 minutes, 41 seconds)
Details
buildkite/synapse/python-3-dot-7-slash-postgres-11 Passed (17 minutes, 21 seconds)
Details
buildkite/synapse/python-3-dot-7-slash-postgres-9-dot-5 Passed (17 minutes, 29 seconds)
Details
buildkite/synapse/python-3-dot-7-slash-sqlite Passed (8 minutes, 24 seconds)
Details
ci/circleci: sytestpy2merged Your tests passed on CircleCI!
Details
ci/circleci: sytestpy2postgresmerged Your tests passed on CircleCI!
Details
ci/circleci: sytestpy3merged Your tests passed on CircleCI!
Details
ci/circleci: sytestpy3postgresmerged Your tests passed on CircleCI!
Details
codecov/patch Coverage not affected when comparing fec2dcb...2f30565
Details
codecov/project 63.02% (target 0%)
Details

@richvdh richvdh deleted the rav/server_keys/10_no_stop_for_20_minutes_during_join branch Jun 4, 2019

neilisfragile added a commit that referenced this pull request Jun 7, 2019

Merge tag 'v1.0.0rc1' into develop
Synapse 1.0.0rc1 (2019-06-07)
=============================

Features
--------

- Synapse now more efficiently collates room statistics. ([\#4338](#4338), [\#5260](#5260), [\#5324](#5324))
- Add experimental support for relations (aka reactions and edits). ([\#5220](#5220))
- Ability to configure default room version. ([\#5223](#5223), [\#5249](#5249))
- Allow configuring a range for the account validity startup job. ([\#5276](#5276))
- CAS login will now hit the r0 API, not the deprecated v1 one. ([\#5286](#5286))
- Validate federation server TLS certificates by default (implements [MSC1711](https://github.com/matrix-org/matrix-doc/blob/master/proposals/1711-x509-for-federation.md)). ([\#5359](#5359))
- Update /_matrix/client/versions to reference support for r0.5.0. ([\#5360](#5360))
- Add a script to generate new signing-key files. ([\#5361](#5361))
- Update upgrade and installation guides ahead of 1.0. ([\#5371](#5371))
- Replace the `perspectives` configuration section with `trusted_key_servers`, and make validating the signatures on responses optional (since TLS will do this job for us). ([\#5374](#5374))
- Add ability to perform password reset via email without trusting the identity server. ([\#5377](#5377))
- Set default room version to v4. ([\#5379](#5379))

Bugfixes
--------

- Fixes client-server API not sending "m.heroes" to lazy-load /sync requests when a rooms name or its canonical alias are empty. Thanks to @dnaf for this work! ([\#5089](#5089))
- Prevent federation device list updates breaking when processing multiple updates at once. ([\#5156](#5156))
- Fix worker registration bug caused by ClientReaderSlavedStore being unable to see get_profileinfo. ([\#5200](#5200))
- Fix race when backfilling in rooms with worker mode. ([\#5221](#5221))
- Fix appservice timestamp massaging. ([\#5233](#5233))
- Ensure that server_keys fetched via a notary server are correctly signed. ([\#5251](#5251))
- Show the correct error when logging out and access token is missing. ([\#5256](#5256))
- Fix error code when there is an invalid parameter on /_matrix/client/r0/publicRooms ([\#5257](#5257))
- Fix error when downloading thumbnail with missing width/height parameter. ([\#5258](#5258))
- Fix schema update for account validity. ([\#5268](#5268))
- Fix bug where we leaked extremities when we soft failed events, leading to performance degradation. ([\#5274](#5274), [\#5278](#5278), [\#5291](#5291))
- Fix "db txn 'update_presence' from sentinel context" log messages. ([\#5275](#5275))
- Fix dropped logcontexts during high outbound traffic. ([\#5277](#5277))
- Fix a bug where it is not possible to get events in the federation format with the request `GET /_matrix/client/r0/rooms/{roomId}/messages`. ([\#5293](#5293))
- Fix performance problems with the rooms stats background update. ([\#5294](#5294))
- Fix noisy 'no key for server' logs. ([\#5300](#5300))
- Fix bug where a notary server would sometimes forget old keys. ([\#5307](#5307))
- Prevent users from setting huge displaynames and avatar URLs. ([\#5309](#5309))
- Fix handling of failures when processing incoming events where calling `/event_auth` on remote server fails. ([\#5317](#5317))
- Ensure that we have an up-to-date copy of the signing key when validating incoming federation requests. ([\#5321](#5321))
- Fix various problems which made the signing-key notary server time out for some requests. ([\#5333](#5333))
- Fix bug which would make certain operations (such as room joins) block for 20 minutes while attemoting to fetch verification keys. ([\#5334](#5334))
- Fix a bug where we could rapidly mark a server as unreachable even though it was only down for a few minutes. ([\#5335](#5335), [\#5340](#5340))
- Fix a bug where account validity renewal emails could only be sent when email notifs were enabled. ([\#5341](#5341))
- Fix failure when fetching batches of events during backfill, etc. ([\#5342](#5342))
- Add a new room version where the timestamps on events are checked against the validity periods on signing keys. ([\#5348](#5348), [\#5354](#5354))
- Fix room stats and presence background updates to correctly handle missing events. ([\#5352](#5352))
- Include left members in room summaries' heroes. ([\#5355](#5355))
- Fix `federation_custom_ca_list` configuration option. ([\#5362](#5362))
- Fix missing logcontext warnings on shutdown. ([\#5369](#5369))

Improved Documentation
----------------------

- Fix docs on resetting the user directory. ([\#5282](#5282))
- Fix notes about ACME in the MSC1711 faq. ([\#5357](#5357))

Internal Changes
----------------

- Synapse will now serve the experimental "room complexity" API endpoint. ([\#5216](#5216))
- The base classes for the v1 and v2_alpha REST APIs have been unified. ([\#5226](#5226), [\#5328](#5328))
- Simplifications and comments in do_auth. ([\#5227](#5227))
- Remove urllib3 pin as requests 2.22.0 has been released supporting urllib3 1.25.2. ([\#5230](#5230))
- Preparatory work for key-validity features. ([\#5232](#5232), [\#5234](#5234), [\#5235](#5235), [\#5236](#5236), [\#5237](#5237), [\#5244](#5244), [\#5250](#5250), [\#5296](#5296), [\#5299](#5299), [\#5343](#5343), [\#5347](#5347), [\#5356](#5356))
- Specify the type of reCAPTCHA key to use. ([\#5283](#5283))
- Improve sample config for monthly active user blocking. ([\#5284](#5284))
- Remove spurious debug from MatrixFederationHttpClient.get_json. ([\#5287](#5287))
- Improve logging for logcontext leaks. ([\#5288](#5288))
- Clarify that the admin change password API logs the user out. ([\#5303](#5303))
- New installs will now use the v54 full schema, rather than the full schema v14 and applying incremental updates to v54. ([\#5320](#5320))
- Improve docstrings on MatrixFederationClient. ([\#5332](#5332))
- Clean up FederationClient.get_events for clarity. ([\#5344](#5344))
- Various improvements to debug logging. ([\#5353](#5353))
- Don't run CI build checks until sample config check has passed. ([\#5370](#5370))
- Automatically retry buildkite builds (max twice) when an agent is lost. ([\#5380](#5380))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.