dropping requests to .well-known leads to slow media federation #7231

deepbluev7 · 2020-04-06T20:44:43Z

Description

When a server's firewall is set to drop incoming requests to ports it doesn't use (the pf default for the block rule) and the server doesn't use .well-known (so it drops requests to 443, specifically https://server:443/.well-known/matrix/server), every media request has to wait for the .well-known request to time out before it actually starts fetching the media. As a result, every image in a shared room takes around 30 seconds to load over federation.

Steps to reproduce

Set up server A so that it only listens on port 8448.
Set the firewall of server A to drop requests on port 443 (use SRV or the default 8448 for the federation setup).
Set up server B.
Share a room between server A and server B.
Let server A send an image.

Server B will now wait 30 seconds for the .well-known request to time out before it sends the actual media request. This happens for every media request!

Expected behaviour

Server B caches the .well-known result and only waits 30 seconds for the first timeout. Maybe it is just the media worker that doesn't do that.
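To make the expected behaviour concrete, here is a minimal sketch of a cache that remembers failed lookups as well as successful ones. It is not Synapse's resolver: `NegativeTTLCache`, `lookup`, the `fetch` callback and both TTL values are hypothetical names and numbers chosen only for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class NegativeTTLCache:
    """Hypothetical cache that stores failed lookups too (not Synapse code)."""

    def __init__(
        self,
        ok_ttl: float = 3600.0,   # assumed lifetime of a successful result
        fail_ttl: float = 300.0,  # assumed lifetime of a remembered failure
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ok_ttl = ok_ttl
        self._fail_ttl = fail_ttl
        self._clock = clock
        # server name -> (result or None for "lookup failed", expiry time)
        self._entries: Dict[str, Tuple[Optional[str], float]] = {}

    def lookup(self, server_name: str, fetch: Callable[[str], str]) -> Optional[str]:
        now = self._clock()
        entry = self._entries.get(server_name)
        if entry is not None and entry[1] > now:
            # Still fresh: answer from the cache, even if the answer is "failed".
            return entry[0]
        try:
            result: Optional[str] = fetch(server_name)
            ttl = self._ok_ttl
        except OSError:
            # Timeouts and refused connections are both OSError subclasses.
            result = None
            ttl = self._fail_ttl
        self._entries[server_name] = (result, self._clock() + ttl)
        return result
```

With a cache like this, only the first media request to a given server pays for the timeout; later requests skip the lookup until the failure entry expires.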

relevant logs:

2020-04-05 21:39:01,516 - synapse.http.matrixfederationclient - 408 - INFO - GET-11022 - {GET-O-345} [pink.packageloss.eu] Sending request: GET matrix://pink.packageloss.eu/_matrix/media/v1/download/pink.packageloss.eu/3d1a80ed7d2b89457d96e5212f2378a958b7560c?allow_remote=false; timeout 60.000000s
2020-04-05 21:39:01,517 - synapse.http.federation.well_known_resolver - 234 - INFO - GET-11022 - Fetching https://pink.packageloss.eu/.well-known/matrix/server
2020-04-05 21:39:03,892 - synapse.access.http.8085 - 302 - INFO - GET-11023 - 127.0.0.1 - 8085 - {None} Processed request: 0.003sec/-0.000sec (0.002sec, 0.000sec) (0.001sec/0.001sec/1) 6756B 200 "GET /_matrix/media/r0/download/pink.packageloss.eu/b195fb40cab5095adf1551b876d99610bad24f60 HTTP/1.0" "mtxclient v0.3.0" [0 dbevts]
2020-04-05 21:39:31,825 - synapse.http.federation.well_known_resolver - 250 - INFO - GET-11022 - Error fetching https://pink.packageloss.eu/.well-known/matrix/server: User timeout caused connection failure.
2020-04-05 21:39:31,828 - synapse.http.federation.matrix_federation_agent - 242 - INFO - GET-11022 - Connecting to pink.packageloss.eu:8448
2020-04-05 21:39:32,211 - synapse.http.matrixfederationclient - 442 - INFO - GET-11022 - {GET-O-345} [pink.packageloss.eu] Got response headers: 200 OK
2020-04-05 21:39:32,212 - synapse.http.matrixfederationclient - 909 - INFO - GET-11022 - {GET-O-345} [pink.packageloss.eu] Completed: 200 OK [14490 bytes]
2020-04-05 21:39:32,212 - synapse.rest.media.v1.media_repository - 406 - INFO - GET-11022 - Stored remote media in file '/var/lib/synapse/media_store/remote_content/pink.packageloss.eu/td/yY/oLPpEPNArfeZksvNvGWv'
2020-04-05 21:39:32,356 - synapse.access.http.8085 - 302 - INFO - GET-11022 - ::1 - 8085 - {None} Processed request: 30.857sec/-0.000sec (0.047sec, 0.001sec) (0.003sec/0.110sec/7) 14490B 200 "GET /_matrix/media/r0/download/pink.packageloss.eu/3d1a80ed7d2b89457d96e5212f2378a958b7560c HTTP/1.0" "mtxclient v0.3.0" [0 dbevts]
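The split between waiting and actually fetching can be read off the timestamps in the log above. A small sketch that does the arithmetic, where `seconds_between` is a helper name made up for this example:

```python
from datetime import datetime

# Timestamp layout used by the log lines above (comma before the milliseconds).
FMT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# "Fetching .well-known" until "Error fetching .well-known"
well_known_wait = seconds_between("2020-04-05 21:39:01,517", "2020-04-05 21:39:31,825")

# "Connecting to pink.packageloss.eu:8448" until "Completed: 200 OK"
media_fetch = seconds_between("2020-04-05 21:39:31,828", "2020-04-05 21:39:32,212")

print(f"well-known wait: {well_known_wait:.3f}s, media fetch: {media_fetch:.3f}s")
```

So roughly 30.3 of the 30.857 seconds reported for the request are spent waiting on .well-known, and the media download itself takes under half a second.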

Version information

Homeserver: pink.packageloss.eu (server A) and neko.dev (server B)

If not matrix.org:

Version: 10.x and 12.3
Install method: ports/ebuild

Platform: FreeBSD/Gentoo


anoadragon453 · 2020-04-08T13:23:51Z

Looks like the media repository simply uses MatrixFederationHttpClient to download media from remote servers:

synapse/synapse/rest/media/v1/media_repository.py

Line 346 in caec7d4

length, headers = await self.client.get_file(

Which itself spawns a MatrixFederationAgent:

synapse/synapse/http/matrixfederationclient.py

Line 201 in fcfb591

self.agent = MatrixFederationAgent(self.reactor, tls_client_options_factory)

Which spawns a WellKnownResolver:

synapse/synapse/http/federation/matrix_federation_agent.py

Lines 83 to 90 in c37db02

_well_known_resolver = WellKnownResolver(
    self._reactor,
    agent=Agent(
        self._reactor,
        pool=self._pool,
        contextFactory=tls_client_options_factory,
    ),
)

Which should have a TTLCache built in by default:

synapse/synapse/http/federation/well_known_resolver.py

Lines 86 to 87 in 4548d1f

if well_known_cache is None:
    well_known_cache = _well_known_cache

So at first glance it's not entirely clear why well-known lookups wouldn't be cached. Needs more investigation.
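For reference, the "fall back to a module-level cache" pattern in that last snippet behaves roughly like the sketch below. This uses a plain dict instead of a TTLCache, and `Resolver`, `remember` and `recall` are hypothetical names, so treat it as an illustration of the pattern rather than of Synapse's actual classes.

```python
from typing import Dict, Optional

# Module-level default, created once per Python process.
_default_cache: Dict[str, str] = {}


class Resolver:
    """Hypothetical resolver that falls back to the shared module-level cache."""

    def __init__(self, cache: Optional[Dict[str, str]] = None) -> None:
        if cache is None:
            cache = _default_cache
        self._cache = cache

    def remember(self, server_name: str, delegated: str) -> None:
        self._cache[server_name] = delegated

    def recall(self, server_name: str) -> Optional[str]:
        return self._cache.get(server_name)


first = Resolver()
second = Resolver()
first.remember("example.org", "matrix.example.org:443")

# Both instances use the same dict, so the second one sees the entry.
shared = second.recall("example.org")

# An instance given its own cache does not.
isolated = Resolver(cache={}).recall("example.org")
```

In general, module-level state in Python exists once per process, so instances in the same process share it while a separate process starts with its own empty copy. Whether that has anything to do with the behaviour reported here is not established.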

deepbluev7 · 2020-04-08T13:50:22Z

Are .well-known requests cached when the request fails because of a timeout? And why does the media repo even request the .well-known when the servers are currently federating normally? Is the .well-known cache separate for every worker?

(We have also changed the pink.packageloss.eu server to reject these requests immediately instead of dropping them, which makes it fast enough.)
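The difference between dropping and rejecting is visible from the client side: a rejected connection fails right away, while a dropped one leaves the client waiting until its own timeout fires. A minimal sketch of the rejected case, using a closed port on localhost as a stand-in (the helper names `time_connect` and `closed_local_port` are made up for this example; the dropped case needs a real firewall and is not reproduced here):

```python
import socket
import time
from typing import Tuple


def time_connect(host: str, port: int, timeout: float) -> Tuple[str, float]:
    """Try a TCP connect and report the outcome and how long it took."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        # The peer answered with a reset: this is the "reject" case.
        outcome = "refused"
    except (socket.timeout, TimeoutError):
        # Nothing came back at all: this is what a "drop" rule looks like.
        outcome = "timed out"
    return outcome, time.monotonic() - start


def closed_local_port() -> int:
    """Pick a local port that was just free, so nothing is listening on it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


outcome, elapsed = time_connect("127.0.0.1", closed_local_port(), timeout=5.0)
print(f"{outcome} after {elapsed:.3f}s")
```

Against a host that silently drops packets, the same call would be expected to come back as "timed out" only after the full timeout has elapsed.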

anoadragon453 added the z-bug (Deprecated Label), A-Media-Repository (Uploading, downloading images and video, thumbnailing), and z-p2 (Deprecated Label) labels on Apr 8, 2020

reivilibre added the A-Federation, T-Defect (Bugs, crashes, hangs, security vulnerabilities, or other reported issues), S-Major (Major functionality / product severely impaired, no satisfactory workaround), and O-Uncommon (Most users are unlikely to come across this or unexpected workflow) labels on May 23, 2023

matrixbot mentioned this issue Dec 21, 2023

dropping requests to .well-known leads to slow media federation element-hq/synapse#7231

Open
