This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

dropping requests to .well-known leads to slow media federation #7231

Open
deepbluev7 opened this issue Apr 6, 2020 · 2 comments
Labels
A-Federation A-Media-Repository Uploading, downloading images and video, thumbnailing O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. z-bug (Deprecated Label) z-p2 (Deprecated Label)

Comments

@deepbluev7
Contributor

deepbluev7 commented Apr 6, 2020

Description

When a server has its firewall set to drop incoming requests to ports it doesn't use (the pf default for the block rule) and doesn't use .well-known (so it drops requests to 443, specifically https://server:443/.well-known/matrix/server), every media request has to wait for the .well-known request to time out before it actually starts fetching the media. As a result, every image in a shared room takes around 30 seconds to load over federation.

Steps to reproduce

  • set up server A so that it only listens on port 8448
  • set the firewall of server A to drop requests on port 443 (use SRV or the default port 8448 for the federation setup)
  • set up server B
  • share a room between servers A and B
  • have a user on server A send an image

Server B will now wait 30 seconds for the .well-known request to time out before it sends the actual media request. This happens for every media item!

Expected behaviour

Server B caches the .well-known result and only waits 30 seconds on the first timeout. Maybe it is just the media worker that doesn't do that.
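The expected behaviour could be sketched roughly as follows. This is only an illustration of caching failed lookups, not Synapse's actual implementation: the class name WellKnownLookupCache, the fetch callable, and both TTL values are hypothetical and chosen for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class WellKnownLookupCache:
    """Hypothetical TTL cache that remembers failed lookups as well as successes."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[str]],
        success_ttl: float = 3600.0,  # assumed value, for illustration only
        failure_ttl: float = 300.0,   # assumed value, for illustration only
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._success_ttl = success_ttl
        self._failure_ttl = failure_ttl
        self._clock = clock
        # server name -> (expiry time, delegated server or None on failure)
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}

    def lookup(self, server_name: str) -> Optional[str]:
        now = self._clock()
        entry = self._entries.get(server_name)
        if entry is not None and entry[0] > now:
            # Fresh entry: return it without touching the network, even if
            # it records an earlier failure (None).
            return entry[1]
        try:
            result = self._fetch(server_name)
        except Exception:
            # A timeout or connection error is stored as "no delegation".
            result = None
        ttl = self._success_ttl if result is not None else self._failure_ttl
        self._entries[server_name] = (now + ttl, result)
        return result
```

With a cache like this, only the first media request to a server whose firewall drops port 443 would pay the timeout; later requests within failure_ttl would go straight to SRV or port 8448.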

relevant logs:

2020-04-05 21:39:01,516 - synapse.http.matrixfederationclient - 408 - INFO - GET-11022 - {GET-O-345} [pink.packageloss.eu] Sending request: GET matrix://pink.packageloss.eu/_matrix/media/v1/download/pink.packageloss.eu/3d1a80ed7d2b89457d96e5212f2378a958b7560c?allow_remote=false; timeout 60.000000s
2020-04-05 21:39:01,517 - synapse.http.federation.well_known_resolver - 234 - INFO - GET-11022 - Fetching https://pink.packageloss.eu/.well-known/matrix/server
2020-04-05 21:39:03,892 - synapse.access.http.8085 - 302 - INFO - GET-11023 - 127.0.0.1 - 8085 - {None} Processed request: 0.003sec/-0.000sec (0.002sec, 0.000sec) (0.001sec/0.001sec/1) 6756B 200 "GET /_matrix/media/r0/download/pink.packageloss.eu/b195fb40cab5095adf1551b876d99610bad24f60 HTTP/1.0" "mtxclient v0.3.0" [0 dbevts]
2020-04-05 21:39:31,825 - synapse.http.federation.well_known_resolver - 250 - INFO - GET-11022 - Error fetching https://pink.packageloss.eu/.well-known/matrix/server: User timeout caused connection failure.
2020-04-05 21:39:31,828 - synapse.http.federation.matrix_federation_agent - 242 - INFO - GET-11022 - Connecting to pink.packageloss.eu:8448
2020-04-05 21:39:32,211 - synapse.http.matrixfederationclient - 442 - INFO - GET-11022 - {GET-O-345} [pink.packageloss.eu] Got response headers: 200 OK
2020-04-05 21:39:32,212 - synapse.http.matrixfederationclient - 909 - INFO - GET-11022 - {GET-O-345} [pink.packageloss.eu] Completed: 200 OK [14490 bytes]
2020-04-05 21:39:32,212 - synapse.rest.media.v1.media_repository - 406 - INFO - GET-11022 - Stored remote media in file '/var/lib/synapse/media_store/remote_content/pink.packageloss.eu/td/yY/oLPpEPNArfeZksvNvGWv'
2020-04-05 21:39:32,356 - synapse.access.http.8085 - 302 - INFO - GET-11022 - ::1 - 8085 - {None} Processed request: 30.857sec/-0.000sec (0.047sec, 0.001sec) (0.003sec/0.110sec/7) 14490B 200 "GET /_matrix/media/r0/download/pink.packageloss.eu/3d1a80ed7d2b89457d96e5212f2378a958b7560c HTTP/1.0" "mtxclient v0.3.0" [0 dbevts]
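To see how much of the request time is spent on the .well-known lookup, the timestamps of the "Fetching" and "Error fetching" lines above can be subtracted. A small sketch, where the helper name seconds_between is made up for this example:

```python
from datetime import datetime

# Timestamp layout used at the start of each log line above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


# "Fetching https://pink.packageloss.eu/.well-known/matrix/server"
fetch_started = "2020-04-05 21:39:01,517"
# "Error fetching ...: User timeout caused connection failure."
fetch_failed = "2020-04-05 21:39:31,825"

well_known_wait = seconds_between(fetch_started, fetch_failed)
# Total time reported for GET-11022 in the access log line.
total_request_time = 30.857

print(f"well-known wait: {well_known_wait:.3f}s")
print(f"share of request: {well_known_wait / total_request_time:.1%}")
```

The lookup accounts for 30.308 of the 30.857 seconds; the media download itself completes in well under a second once the connection to port 8448 is made.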

Version information

  • Homeserver: pink.packageloss.eu (server A) and neko.dev (server B)

If not matrix.org:

  • Version: 10.x and 12.3

  • Install method: ports/ebuild

  • Platform: FreeBSD/Gentoo
@anoadragon453 anoadragon453 added z-bug (Deprecated Label) A-Media-Repository Uploading, downloading images and video, thumbnailing z-p2 (Deprecated Label) labels Apr 8, 2020
@anoadragon453
Member

Looks like the media repository simply uses MatrixFederationHttpClient to download media from remote servers:

length, headers = await self.client.get_file(

Which itself spawns a MatrixFederationAgent:

self.agent = MatrixFederationAgent(self.reactor, tls_client_options_factory)

Which spawns a WellKnownResolver:

_well_known_resolver = WellKnownResolver(
    self._reactor,
    agent=Agent(
        self._reactor,
        pool=self._pool,
        contextFactory=tls_client_options_factory,
    ),
)

Which should have a TTLCache built in by default:

if well_known_cache is None:
    well_known_cache = _well_known_cache

So at first glance it's not entirely clear why well-known lookups wouldn't be cached. Needs more investigation.

@deepbluev7
Contributor Author

deepbluev7 commented Apr 8, 2020

Are .well-known requests cached when the request fails because of a timeout? And why does the media repo even request the .well-known when the servers are currently federating normally? Is the .well-known cache separate for every worker?

(We have also changed the pink.packageloss.eu server so that it no longer drops the requests but rejects them immediately, which makes it fast enough.)

@reivilibre reivilibre added A-Federation T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. S-Major Major functionality / product severely impaired, no satisfactory workaround. O-Uncommon Most users are unlikely to come across this or unexpected workflow labels May 23, 2023