Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC2246: Asynchronous media uploads #2246

Merged
merged 30 commits into from Apr 25, 2023
Merged
Changes from all commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
b439277
Proposal for asynchronous media uploads
tulir Aug 24, 2019
a83c79c
Add security consideration and mention possible /create request body
tulir Aug 24, 2019
9a395ed
Expand on security consideration
tulir Aug 25, 2019
29e3463
Change error code for existing media PUT
tulir Aug 26, 2019
7cf22be
Add spam prevention error, example response and auth requirements
tulir Aug 26, 2019
658aac8
Add query parameters to control downloading
tulir Aug 1, 2020
bbd7d08
Rename max_stall to max_stall_ms
tulir Aug 1, 2020
0bffcb7
Merge branch 'master' into asynchronous_uploads
tulir Aug 1, 2020
4d009a9
Specify /create request body content
tulir Aug 1, 2020
1cbc04e
Update links, versions and prefixes
tulir Mar 14, 2022
c65f2bf
Add recommended maximum delay for starting upload
tulir Mar 14, 2022
63cef50
Explicitly deny uploading to other users' media IDs
tulir Mar 14, 2022
8ccf85f
Prevent spam by ratelimiting instead of limiting number of IDs
tulir Mar 18, 2022
12e907b
Remove streaming requirement
tulir Mar 18, 2022
d582bb3
Add unstable prefixes for new error codes
tulir Mar 18, 2022
173edf3
Add missing words
tulir Mar 18, 2022
f438754
Change /create endpoint to use v1
sumnerevans Jul 8, 2022
725675c
Reorganize /upload spec and integrate feedback
sumnerevans Jul 8, 2022
d55f1f9
Rename max_stall_ms -> timeout_ms
sumnerevans Jul 8, 2022
955177b
Mention that maximum value for timeout_ms should be imposed by the se…
sumnerevans Jul 8, 2022
823fcca
Mention that the timeout_ms can be ignored if the media exists already
sumnerevans Jul 8, 2022
3b00026
Change M_NOT_YET_UPLOADED to use 504 status code
sumnerevans Jul 8, 2022
9627af2
Remove retry_after_ms optional field
sumnerevans Jul 8, 2022
045c21e
Make unused_expires_at the deadline for the upload to complete rather…
sumnerevans Mar 30, 2023
011031b
Add notes about suggested rate-limiting techniques
sumnerevans Mar 30, 2023
6cb7e31
Recommend 24 hours instead of 1 minute
sumnerevans Mar 30, 2023
fedc697
Merge pull request #3 from beeper/async-uploads-rate-limiting-improve…
tulir Mar 31, 2023
7652f59
Clarify that rate limiting can apply on /create and /upload
sumnerevans Apr 20, 2023
098dd90
Clarify that unused_expires_at is a POSIX millisecond timestamp
sumnerevans Apr 20, 2023
9559ab0
Merge pull request #4 from beeper/asynchronous_uploads
tulir Apr 20, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
152 changes: 152 additions & 0 deletions proposals/2246-asynchronous-uploads.md
@@ -0,0 +1,152 @@
# MSC2246: Asynchronous media uploads
Sending media to Matrix currently requires that clients first upload the media
to the content repository and then send the event. This is a problem for some
use cases, such as bridges that want to preserve message order, as reuploading
a large file would block all messages.

## Proposal
This proposal proposes a way to send the event containing media before actually
uploading the media, which would make the aformentioned bridge message order
preservation possible without blocking all other messages behind a long upload.

In the future, this new functionality could be used for streaming file
transfers, as requested in [matrix-spec#432].

### Content repository behavior
The proposal adds two new endpoints to the content repository API and modifies
the download and thumbnail endpoints.

#### `POST /_matrix/media/v1/create`
Create a new MXC URI without content. Like `/upload`, this endpoint requires
auth, can be rate limited, and returns the `content_uri` that can be used in
events.

The request body should be an empty JSON object. In the future, the body could
be used for metadata about the file, such as the mime type or access control
settings (related: [MSC701]).

The server may optionally enforce a maximum age for unused media IDs to delete
media IDs when the client doesn't start the upload in time, or when the upload
was interrupted and not resumed in time. The server should include the maximum
POSIX millisecond timestamp to complete the upload in the `unused_expires_at`
field in the response JSON. The recommended default expiration is 24 hours which
should be enough time to accommodate users on poor connection who find a better
connection to complete the upload.

##### Rate Limiting

The server should rate limit requests to create media.

The server should limit the number of concurrent *pending media uploads* a given
user can have. A pending media upload is a created MXC URI that (a) is not
expired (the `unused_expires_at` timestamp has not passed) and (b) the media has
not yet been uploaded for.

In both cases, the server should respond with `M_LIMIT_EXCEEDED` optionally
providing details in the `error` field, but servers may wish to obscure the
exact limits that are used and not provide such details.

##### Example response
```json
{
"content_uri": "mxc://example.com/AQwafuaFswefuhsfAFAgsw",
"unused_expires_at": 1647257217083
}
```

#### `PUT /_matrix/media/v3/upload/{serverName}/{mediaId}`
tulir marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention that this endpoint support the content-type header and filename query parameter, as https://spec.matrix.org/v1.6/client-server-api/#post_matrixmediav3upload does? It looks like the sample implementation at https://github.com/turt2live/matrix-media-repo/pull/364/files#diff-3ddbc505e50723b8440369eef4dc1fa055026668151b1dbecbd1725fc6765727 does, but this isn't mentioned in the MSC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. In the future those details could be moved to the /create call request body, but that wasn't defined here

Upload content to a MXC URI that was created earlier. This endpoint requires
auth. If the upload is successful, an empty JSON object and status code 200 is
returned. Rate limiting additionally can apply here.

If the endpoint is called with a media ID that already has content, the request
should be rejected with the error code `M_CANNOT_OVERWRITE_MEDIA` and HTTP
status code 409.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the upload fails but the bridge has already sent the message containing the MXC URI? This might need some more detail on exactly how multiple calls to the endpoint work, although it's been touched on in other comments.

To throw a few ideas out:

  • Include a content length & hash in the create endpoint (although of course this would rule out the client generating the media during upload, eg. live streaming video)
  • HTTP chunking on uploads
  • Other things that have already been mentioned like requiring content-length in the upload call, but being explicit that the server would consider the upload 'finalised' once it had the advertised number of bytes.

For concurrent uploads, could leave it up to the HS to decide how to handle sensibly, possibly advising that later uploads kick off any started earlier?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, as the person currently holding this MSC up, I'm not too concerned with this case. The two most likely scenarios are:

  1. The http client can't hit the server for some reason. If the call is interrupted or otherwise has a connection error then the upload can/should be cancelled/invalidated. Similarly, most http clients buffer a good portion of the request (and when they don't, http servers tend to do a lot of buffering at the application level): this buffering means that even if the client doesn't see the response the upload is still likely to be successful.

  2. The server encounters an error during the processing of the upload. In this case, the server should reject the http request with a 500 or similar error and not consider any potentially-partial content as uploaded.

There's certainly an argument for improved error handling though - I'll leave that to @dbkr & others to decide if it warrants an @mscbot concern comment :)


If the upload request comes from a user other than the one who created the media
ID, the request should be rejected with an `M_FORBIDDEN` error.

If the serverName/mediaId combination is not known, not local, or expired, an
`M_NOT_FOUND` error is returned.

If the MXC's `unused_expires_at` is reached before the upload completes, the
server may either respond immediately with `M_NOT_FOUND` or allow the upload to
continue.

For other errors, such as file size, file type or user quota errors, the normal
`/upload` rules apply.

#### Changes to the `/download` and `/thumbnail` endpoints
A new query parameter, `timeout_ms` is added to the endpoints that can
download media. It's an integer that specifies the maximum number of
milliseconds that the client is willing to wait to start receiving data.
turt2live marked this conversation as resolved.
Show resolved Hide resolved
The default value is 20000 (20 seconds). The content repository can and should
impose a maximum value for this parameter. The content repository can also
choose to respond before the timeout if it desires.

If the media is available immediately (for example in the case of a
non-asynchronous upload), the content repository should ignore this parameter.

If the MXC has expired, the content repository should respond with `M_NOT_FOUND`
and a HTTP 404 status code.

If the data is not available when the server chooses to respond, the content
repository returns a `M_NOT_YET_UPLOADED` error with a HTTP 504 status code.

For the `/download` endpoint, the server could also stream data directly as it
is being uploaded. However, streaming creates several implementation and spec
complications (e.g. how to stream if the media repo has multiple workers, what
to do if the upload is interrupted), so specifying exactly how streaming works
is left for another MSC.
Comment on lines +96 to +100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, this is sort of an implied feature from most sane media repo implementations: it's quite rare that a server ever wants to buffer the media into memory so will stream it from a cache, disk, or external service as it receives the data. By extension, they already will have had to figure out what stream interruption, multiple workers, etc means. However, clients obviously shouldn't rely on this functionality existing, but I also don't think it necessarily needs an MSC either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this paragraph has some hidden history, this is a change from the original requirement (include streaming) after I commented about some security concerns here.

TLDR: it's not as much about being able to stream, but that the file is mutable and uncommitted while it's streamed, that there's an edge case where; if a client closes the upload stream, starts again, but then uploads a wholly different file, how should everything then fail/proceed gracefully to accommodate this new situation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds like a thing we should fix as part of this MSC regardless, to be honest. Does a partial failed upload count as uploading the media? (ie: does the server mark the media as "uploaded" at the start or the request handling or at the end?) what happens if the client tries to upload the media twice (concurrently)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also poked at the "partial uploads" problem here; imo, there shouldn't be partial uploads, and the client should send content-length in the request body to have the server make sure it received everything before it commits the file.

Concurrent uploading is something I hadn't considered.


## Potential issues
Other clients may time out the download if the sender takes too long to upload
media.

## Alternatives

## Security considerations
turt2live marked this conversation as resolved.
Show resolved Hide resolved

The primary attack vector that must be prevented is a malicious user creating a
large number of MXC URIs and sending them to a room without uploading the
corresponding media. Clients in that room would then attempt to download the
media, holding open connections to the server and potentially exhausting the
number of available connections.

This attack vector is stopped in multiple ways:

1. Limits on `/create` prevent users from creating MXC URIs too quickly and also
require them to finish uploading files (or let some of their MXCs expire)
before creating new MXC URIs.

2. Servers are free to respond to `/download` and `/thumbnail` requests before
the `timeout_ms` has been reached and respond with `M_NOT_YET_UPLOADED`. For
example, if the server is under connection count pressure, it can choose to
respond to waiting download connections with `M_NOT_YET_UPLOADED` to free
connections in the pool.

3. Once the media is expired, servers can respond immediately to `/download` and
`/thumbnail` requests with `M_NOT_FOUND`.

## Future work

Future MSCs might wish to address large file uploads. One approach would be to
add metadata to the `/create` call via a query parameter (for example
`?large_file_upload=true`. Servers would have the ability to impose restrictions
on how many such "large file" uploads a user can have concurrently. For such a
situation, the server would likely send a more generous `unused_expires_at`
timestamp to allow for a long-running upload.

## Unstable prefix
While this MSC is not in a released version of the spec, implementations should
use `fi.mau.msc2246` as a prefix and as an `unstable_features` flag in the
`/versions` endpoint.

* `POST /_matrix/media/unstable/fi.mau.msc2246/create`
* `PUT /_matrix/media/unstable/fi.mau.msc2246/upload/{serverName}/{mediaId}`
* `?fi.mau.msc2246.timeout_ms`
* `FI.MAU.MSC2246_NOT_YET_UPLOADED`
* `FI.MAU.MSC2246_CANNOT_OVERWRITE_MEDIA`

[matrix-spec#432]: https://github.com/matrix-org/matrix-spec/issues/432
[MSC701]: https://github.com/matrix-org/matrix-doc/issues/701