MSC2703: Media ID grammar #2703
## Proposal

Media IDs must be treated and represented as opaque identifiers. Only characters in the
[RFC 2986 Unreserved Characters](https://tools.ietf.org/html/rfc3986#section-2.3) set can be used.
It would be nice if media IDs were not allowed to be `.` or `..`. Those can cause problems for local client media caches. While a client should never rely on a path being safe, I think those two should be explicitly disallowed.
how would this be a problem for client caches?
If you cache the files on a traditional file system, media IDs like `.` and `..` are reserved filenames on many systems, mapping to the current directory and the parent directory respectively. Since media IDs in your proposal can't contain `/`, these are of limited use for escaping a directory and overwriting other files, but you still can't create files with those names. So you would have to work around them being valid IDs and find alternative mappings. I think all other character combinations should be fine, though. That way you can just cache files as `<server part>/<media id>` in the file system, which makes client-side caches much simpler.
It's a super bad idea to cache the media IDs as-is on a traditional file system regardless of this MSC - don't do that.
Because it's effectively untrusted user input and file storage 101: don't use the name given to you when storing files. Collisions, directory issues, etc are all possible which can cause problems for people.
If you validate that the ID contains only ALPHA / DIGIT / "-" / "." / "_" / "~" and is not one of the special filenames, you are left with only collision issues. If two media IDs collide, you already have bigger problems, so I'm not sure that applies here.
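A minimal sketch of the check described above, assuming the grammar is exactly the RFC 3986 unreserved set and that `.` and `..` are the only special names rejected. The function name `is_safe_media_id` and the blocklist are hypothetical, not taken from the MSC or from any client.

```python
import re

# RFC 3986 section 2.3 unreserved characters:
# ALPHA / DIGIT / "-" / "." / "_" / "~" (ASCII only, at least one character)
_UNRESERVED = re.compile(r"[A-Za-z0-9._~-]+")

# Assumption for this sketch: only the two directory aliases are rejected.
_SPECIAL_NAMES = {".", ".."}


def is_safe_media_id(media_id: str) -> bool:
    """Return True if media_id uses only unreserved characters
    and is not a reserved directory name."""
    if media_id in _SPECIAL_NAMES:
        return False
    return _UNRESERVED.fullmatch(media_id) is not None
```

This only covers the character set and the two directory aliases; it does nothing about collisions or about names that are special on a particular platform.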
You'd hopefully not be storing just the bare media ID and would instead be accompanying it with a domain, which can theoretically have the same problem: we don't actually specify that the origin needs to be a server, just heavily imply it.
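One way to avoid using either value as a filename is to derive the cache path from a hash of the origin and media ID together. This is only a sketch under that assumption; `cache_path` and the two-level directory layout are hypothetical and not something this MSC or any implementation prescribes.

```python
import hashlib
from pathlib import Path


def cache_path(cache_root: Path, origin: str, media_id: str) -> Path:
    """Map an (origin, media ID) pair to a path that contains no
    attacker-controlled characters."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from hashing alike.
    key = f"{origin}\0{media_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Split the hex digest so that no single directory grows too large.
    return cache_root / digest[:2] / digest[2:]
```

Because the path components are hex digits only, values such as `..` in either the origin or the media ID never reach the file system.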
> It's a super bad idea to cache the media IDs as-is on a traditional file system regardless of this MSC - don't do that.
Yeah. Synapse definitely wouldn't do that 🤦
Suggested change:
- [RFC 2986 Unreserved Characters](https://tools.ietf.org/html/rfc3986#section-2.3) set can be used.
+ [RFC 3986 Unreserved Characters](https://tools.ietf.org/html/rfc3986#section-2.3) set can be used.
Also, maybe relax the list to pchar instead?

this looks like an attempt at fixing matrix-org/matrix-spec#503. An alternative proposal was made in MSC1597 (https://github.com/matrix-org/matrix-spec-proposals/blob/rav/proposals/id_grammar/proposals/1597-id-grammar.md#opaque-ids), though I think it says exactly the same thing.

I haven't checked, but I can believe that. From memory, the plan here was to try and get something through a bit faster than the general case, but considering that clearly hasn't happened, let's just close this one.

personally I'd say the reason that MSC1597 hasn't gone anywhere is because it is trying to fix everything at once. Splitting it up seems like a good idea to me...

fair :D We can always resurrect this MSC or make a new one - I don't really have bandwidth in any case to properly give attention to this one (and honestly forgot it existed...)