Grammar for completely opaque IDs (SPEC-388) #174

Open
matrixbot opened this issue Apr 19, 2016 · 5 comments · May be fixed by matrix-org/matrix-spec-proposals#1597
Labels
feature Suggestion for a significant extension which needs considerable consideration

Comments

@matrixbot
Member

matrixbot commented Apr 19, 2016

"Grammar" might be too strong a word, but we should probably make explicit that the following IDs are entirely implementation-specific byte sequences. The originators are allowed to create them however they like, and the recipient has to send them back as they arrived.

(Imported from https://matrix.org/jira/browse/SPEC-388)

(Reported by @richvdh)

@matrixbot
Member Author

Jira watchers: @richvdh

@matrixbot
Member Author

matrixbot commented Apr 19, 2016

Links exported from Jira:

relates to SPEC-1

@matrixbot
Member Author

Hrm; there are encoding difficulties here.

Some of these IDs end up in JSON strings, which means they must be interpreted as sequences of Unicode characters; they are not just byte sequences. Likewise, because our URIs are %-encoded UTF-8, putting opaque byte sequences in them would require parsing part of a URI as UTF-8 and part as 8-bit data, which most URI parsers would not handle.

As I see it there are two options here:

  • Allow any Unicode characters in these IDs, which puts the onus on recipients to handle Unicode correctly: for instance, a client would need to parse UTF-16 \uXXXX escape sequences in the JSON response to POST /user/$id/filter, and then re-encode the ID as %-encoded UTF-8 in subsequent URI parameters.
  • Restrict them to a common subset of ASCII, which puts the onus on originators to make sure they don't generate any other characters.
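Under the first option, a recipient has to decode whatever the JSON contained and re-encode it for the URI. A minimal Python sketch, assuming the ID arrives in a hypothetical `filter_id` field and is then used as a single URI path segment (the field name and helper names are illustrative, not from the spec; `quote` treats `~` as unreserved from Python 3.7 onward):

```python
import json
from urllib.parse import quote, unquote


def id_from_json(body: str, key: str) -> str:
    # json.loads turns \uXXXX escapes (including UTF-16 surrogate
    # pairs) into ordinary Unicode code points in a Python str.
    return json.loads(body)[key]


def id_to_path_segment(opaque_id: str) -> str:
    # quote() encodes the str as UTF-8 and %-escapes every byte that is
    # not an RFC 3986 unreserved character; safe="" also escapes "/".
    return quote(opaque_id, safe="")


# Hypothetical response: "é" and U+1F600 sent as JSON escapes.
body = '{"filter_id": "f\\u00e9\\ud83d\\ude00/1"}'
filter_id = id_from_json(body, "filter_id")
segment = id_to_path_segment(filter_id)
print(segment)  # f%C3%A9%F0%9F%98%80%2F1

# Round trip: the server can recover exactly the ID it issued.
assert unquote(segment) == filter_id
```

The surrogate pair in the JSON becomes one code point, which then becomes four UTF-8 bytes in the URI. That is exactly the translation every recipient would have to get right.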

Postel's law (be conservative in what you send, liberal in what you accept) should guide us here. My inclination is to restrict these IDs to unreserved URI characters (i.e. [A-Za-z0-9._~-]; see RFC 3986), but also to recommend that anyone receiving such an ID parse it as a Unicode string and re-encode it correctly when sending it on. This has the advantage that someone writing a hacky bash script doesn't need to worry about escaping at all, while those creating IDs can still use base64 to encode whatever they want.

-- @richvdh
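A sketch of the restricted-ASCII approach in Python, assuming the originator wants to embed arbitrary bytes: it base64-encodes them with the URL-safe alphabet (which stays inside the unreserved set once the "=" padding is dropped), and anyone can validate an ID against [A-Za-z0-9._~-]. The helper names and the requirement that IDs be non-empty are assumptions for illustration.

```python
import base64
import os
import re
from urllib.parse import quote

# RFC 3986 "unreserved" characters, as proposed in the comment above.
# Assumption: empty IDs are not allowed, hence "+".
OPAQUE_ID = re.compile(r"[A-Za-z0-9._~-]+")


def is_valid_opaque_id(candidate: str) -> bool:
    return OPAQUE_ID.fullmatch(candidate) is not None


def encode_opaque_id(payload: bytes) -> str:
    # URL-safe base64 uses A-Z a-z 0-9 "-" "_" plus "=" padding.
    # "=" is not unreserved, so strip the padding.
    return base64.urlsafe_b64encode(payload).rstrip(b"=").decode("ascii")


def decode_opaque_id(opaque_id: str) -> bytes:
    # Only the originator needs this: restore the padding, then decode.
    padded = opaque_id + "=" * (-len(opaque_id) % 4)
    return base64.urlsafe_b64decode(padded)


raw = os.urandom(16)
new_id = encode_opaque_id(raw)
assert is_valid_opaque_id(new_id)
assert quote(new_id, safe="") == new_id  # nothing to escape
assert decode_opaque_id(new_id) == raw
```

Because every character is unreserved, percent-encoding leaves such an ID unchanged, so a script can paste it into a URI without any escaping logic.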

@matrixbot
Member Author

The asterisk (*) is used as a wildcard for device IDs, so it must be forbidden as a device ID.

-- @richvdh
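The wildcard rule can be illustrated with a short Python sketch. It assumes a target map keyed by device ID in which "*" stands for every device a user has (as in the send-to-device messaging API); the function names are hypothetical. Note that "*" already falls outside the unreserved set proposed earlier in this thread, so that restriction would exclude it automatically; the explicit check just makes the reason visible.

```python
from typing import Iterable, List

# "*" addresses every device a user has, so it cannot also be a real device ID.
DEVICE_WILDCARD = "*"


def validate_device_id(device_id: str) -> None:
    # Reject the wildcard when a device is registered.
    if device_id == DEVICE_WILDCARD:
        raise ValueError('"*" is reserved as the all-devices wildcard')


def resolve_targets(requested: Iterable[str], known_devices: List[str]) -> List[str]:
    # Hypothetical helper: expand the wildcard, otherwise keep only
    # device IDs the user actually has, in request order.
    requested = list(requested)
    if DEVICE_WILDCARD in requested:
        return list(known_devices)
    known = set(known_devices)
    return [d for d in requested if d in known]
```

If a device could legitimately be called "*", resolve_targets would have no way to tell "this one device" from "all devices", which is the ambiguity the comment rules out.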

@matrixbot matrixbot changed the title from "Grammar for completely opaque APIs" to "Grammar for completely opaque APIs (SPEC-388)" Oct 31, 2016
@matrixbot matrixbot added the feature Suggestion for a significant extension which needs considerable consideration label Nov 7, 2016
@richvdh richvdh changed the title from "Grammar for completely opaque APIs (SPEC-388)" to "Grammar for completely opaque IDs (SPEC-388)" Jul 26, 2018
@turt2live turt2live mentioned this issue Mar 1, 2022
@turt2live turt2live self-assigned this Sep 6, 2018
@turt2live turt2live removed their assignment Sep 14, 2018
@richvdh
Member

richvdh commented Jan 20, 2021

Since the links are hard to find above:

Proposals:

  • MSC1597 contains our current best proposal for this in general
  • MSC2746 includes proposals for call IDs

Other tracking issues:


3 participants