New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support MSC3916 by adding a federation /download
endpoint
#17172
base: develop
Are you sure you want to change the base?
Conversation
…classes when fetching media for federation download
Does this require more tests? If so, what should be tested? |
complement run failing with |
I think that error is a red herring. In the Synapse logs, I'm seeing the following exception:
The error about conflicting containers is certainly unhelpful though... |
Weird! If it's an attribute error why did only one of the three complement tests fail? Don't they use the same image? I'm gonna dig deeper but I'm surprised... Edit: looks like this was due to workers trying to instantiate the federation download servlet (which relies on the media repo) when the media repo has been disabled via config. I wondered why this wasn't picked up by the Sytest worker runs but I suspect that it is due to the fact that sytest doesn't appear to use the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A fantastic start - a few questions below. In addition, this functionality should be gated before an experimental config option.
@@ -55,13 +57,21 @@ async def store_file(self, path: str, file_info: FileInfo) -> None: | |||
""" | |||
|
|||
@abc.abstractmethod | |||
async def fetch(self, path: str, file_info: FileInfo) -> Optional[Responder]: | |||
async def fetch( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this changing an external API? https://element-hq.github.io/synapse/latest/usage/configuration/config_documentation.html#media_storage_providers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is somewhat unclear to me - I couldn't tell from the config documentation what defines a media_storage_provider
- looking at the code that parses the config it uses the load_module
function which loads a Synapse module, and there is a Synapse module FileStorageProvderBackend
that exists in Synapse. Nothing in the docs/module docs mentioned anything about third-party storage providers so I figured it was not an external thing, but I could be wrong? Any insight is appreciated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, indeed the docs don't provide much insight here.
https://github.com/matrix-org/synapse-s3-storage-provider/ is an external storage provider that is widely used (including on matrix.org), and does implement this method: https://github.com/matrix-org/synapse-s3-storage-provider/blob/2c46a764f700e6439afa11c00db827ddf21a9e89/s3_storage_provider.py#L136-L148. This change would cause a TypeError
in that library.
What we've done in the past for the Module API in this situation is to either:
- Define a new function with the new arguments, i.e.
fetch_with_media_info
. - Check the parameter count of the module's implementation of the function, and add arguments accordingly. Module API example, another example. Note that you don't need to define an
async_wrapper
- that was backwards-compatibility from the time before we hadasync
functions everywhere.
2 is probably cleaner from an external perspective.
We'll should also add a note to the changelog signalling to media storage provider module authors that they can take advantage of the enhanced metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created an issue to document this at #17193
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that complicates things - the federation
parameter that I have added does change the form of the response from fetch
, which is responsible for streaming the file to the consumer. In the code I've added, if the federation
parameter is True
, fetch will stream the multipart response, rather than just the file as it did previously. So I don't think?? that we can just drop the parameter if the provider doesn't support it, otherwise the response could be incorrect. I guess in that case adding a new function would be the best way forward, although AFAICT that means that third-party storage modules will be incompatible with MSC3916 until they implement the new function supporting a multipart response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How have things like this been handled in the past, as it seems like we need to change the behavior of an external entity, since there are external storage providers and the storage provider is responsible for streaming the response?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the more I think about this the more I am concerned that this is a significant problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is too hacky, but we could add a wrapper around the reader returned by fetch
.
In the case of a federation request, we return the multipart response headers and JSON part, then stream the file from the wrapped reader for the second part. Essentially you'd have one IO reader reading from the one returned by fetch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nearly there! Just a few small things now.
@@ -55,13 +57,21 @@ async def store_file(self, path: str, file_info: FileInfo) -> None: | |||
""" | |||
|
|||
@abc.abstractmethod | |||
async def fetch(self, path: str, file_info: FileInfo) -> Optional[Responder]: | |||
async def fetch( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, indeed the docs don't provide much insight here.
https://github.com/matrix-org/synapse-s3-storage-provider/ is an external storage provider that is widely used (including on matrix.org), and does implement this method: https://github.com/matrix-org/synapse-s3-storage-provider/blob/2c46a764f700e6439afa11c00db827ddf21a9e89/s3_storage_provider.py#L136-L148. This change would cause a TypeError
in that library.
What we've done in the past for the Module API in this situation is to either:
- Define a new function with the new arguments, i.e.
fetch_with_media_info
. - Check the parameter count of the module's implementation of the function, and add arguments accordingly. Module API example, another example. Note that you don't need to define an
async_wrapper
- that was backwards-compatibility from the time before we hadasync
functions everywhere.
2 is probably cleaner from an external perspective.
We'll should also add a note to the changelog signalling to media storage provider module authors that they can take advantage of the enhanced metadata.
@@ -55,13 +57,21 @@ async def store_file(self, path: str, file_info: FileInfo) -> None: | |||
""" | |||
|
|||
@abc.abstractmethod | |||
async def fetch(self, path: str, file_info: FileInfo) -> Optional[Responder]: | |||
async def fetch( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created an issue to document this at #17193
It seems very unlikely to me that the sytest failure is a result of the last round of changes as I only added a test, newlines, and a comment... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems very unlikely to me that the sytest failure is a result of the last round of changes as I only added a test, newlines, and a comment...
Looks like a flake. I've re-ran the tests.
@@ -55,13 +57,21 @@ async def store_file(self, path: str, file_info: FileInfo) -> None: | |||
""" | |||
|
|||
@abc.abstractmethod | |||
async def fetch(self, path: str, file_info: FileInfo) -> Optional[Responder]: | |||
async def fetch( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is too hacky, but we could add a wrapper around the reader returned by fetch
.
In the case of a federation request, we return the multipart response headers and JSON part, then stream the file from the wrapped reader for the second part. Essentially you'd have one IO reader reading from the one returned by fetch
.
if media_info.media_length is not None: | ||
request.setHeader(b"Content-Length", b"%d" % (media_info.media_length,)) | ||
request.setHeader( | ||
b"Content-Type", b"multipart/form-data; boundary=%s" % responder.boundary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should multipart/mixed
I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same question :) #17172 (comment)
It's in the process of being changed. @H-Shay it would reduce reviewer confusion to get that bit updated in the MSC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, looks like the MSC thread isn't clear - it should be multipart/mixed
for now.
#17213) [MSC3916](https://github.com/matrix-org/matrix-spec-proposals/blob/rav/authentication-for-media/proposals/3916-authentication-for-media.md) adds new media endpoints under `_matrix/client`. This PR adds the `/preview_url`, `/config`, and `/thumbnail` endpoints. `/download` will be added in a follow-up PR once the work for the federation `/download` endpoint is complete (see #17172). Should be reviewable commit-by-commit.
MSC3916 outlines the addition of a new federation endpoint to serve media downloads to other servers. This PR implements that endpoint and the new
multipart/form-data
response format that it returns. Future PRs will implement the rest of MSC3916, including altering the new client-server/download
endpoint to use the new federation endpoint for remote download requests.Should be reviewable commit-by-commit