Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC1607: Proposal for room alias grammar #1607

Draft
wants to merge 2 commits into
base: old_master
Choose a base branch
from
Draft

Conversation

richvdh
Copy link
Member

@richvdh richvdh commented Aug 29, 2018

User IDs are no longer considered human-readable, so we can get rid of a bunch
of stuff around that.

On the other hand, I think there's a bunch more stuff we need to consider for
aliases.
@richvdh
Copy link
Member Author

richvdh commented Aug 29, 2018

This is very much WIP, in case that wasn't clear. That said, if anyone considers themselves knowledgeable in this field and could help out, I'd certainly appreciate it.

@richvdh richvdh changed the title Proposal for room alias grammar MSC1608: Proposal for room alias grammar Aug 29, 2018
@turt2live turt2live added this to In review (just the PRs) in August 2018 r0 via automation Aug 30, 2018
@turt2live turt2live removed this from In review (just the PRs) in August 2018 r0 Aug 31, 2018
@turt2live turt2live added this to In review (just the PRs) in August 2018 r0 Sep 4, 2018
As with other identifiers using the common identifier format, the ``domain`` is
a `server name`_ - in this case, the server hosting this alias which may be
contacted to resolve the alias to a room ID. The ``domain`` may be an
internationalized domain name, encoded using `punycode`_. When displaying the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to allow unicode domain names in aliases? It feels a bit weird to say that domains are fairly well defined everywhere else, except here.

contain characters from >1 language, defined by the `exemplar characters`_
on http://cldr.unicode.org/
.. _punycode: https://tools.ietf.org/html/rfc3492
.. _RFC3490: https://tools.ietf.org/html/rfc3490
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC3490 isn't referenced

@ara4n
Copy link
Member

ara4n commented Sep 5, 2018

As per RL chat whilst triaging r0 bugs: i was wondering if the whole problem of handling obscure alias formats could just be punted to the dir server, rather than trying to define a canonical format for use on the line.

In other words, something like case sensitivity or even utf8 canonicalisation could be handled by letting the protocol go on about #Foo:matrix.org and #foo:matrix.org equally... but then the directory server which actually resolves the alias to a room ID could be the one to apply whatever logic it likes to resolve it to an unambiguous room ID.

@turt2live
Copy link
Member

I think punting the alias to be a directory server problem is fine in general (where it can decide if #Foo and #foo are the same), however some sort of standard for what is allowed in an alias needs to be defined. Clients are expected to display aliases and should at least have an idea of what to expect. Servers to a lesser degree should also be aware of what is allowed for transmission purposes, but may ultimately decide that all of its aliases must be ASCII or whatever (and therefore a subset of unicode).

So with that being said, I think we can take out all the encoding and decoding text proposed here, particularly the NFC requirement (unless there's something I'm missing about having it here?).

In terms of going through the list of potential things to forbid, I'm including some replies here rather than starting however many threads on github, muddying the document. Sorry.

Invalid utf-8 is probably a given in terms of being disallowed. Saying it must be valid feels a bit like saying "this must be URL-encoded".

Preventing the 107 blacklisted characters ( http://kb.mozillazine.org/Network.IDN.blacklist_chars ) seems sensible. Same with NAMEPREP as it feels in the same sort of line. Preventing multiple languages in the same string seems sensible as well.

RFC5829 is a bit complicated, and the disallowed characters generally look sane minus the emoji stuff - a more fine tooth comb is probably needed to go through it. I think the CONTEXTO and CONTEXTJ stuff should be safe to ban.

I'm not overly convinced that we should allow anything more than a plain sever name in an alias. Being able to find the server to throw the byte sequence at is more important in my opinion than forcing servers to distill the server name out of an alias.

Unrelated: what happened with the tests? Apparently there are none?

@turt2live
Copy link
Member

Also, whatever ends up being accepted here should probably be translated to the event validation spec as allowed unicode because aliases show up in events.

@ara4n
Copy link
Member

ara4n commented Sep 11, 2018

Something generally feels wrong with this whole approach - listing every possible ambiguous utf8 feature feels like a nightmare for servers to implement and always a bit of a lost cause. What about combining diacritics, for instance, which can be used for homoglyphs (especially if the client applies a bounding box to the IDs, as Riot and friends do to stop zalgos from overflowing everywhere).

Perhaps we should be considering a whitelist rather than a blacklist?

Or finding some social way to mitigate the risks of phishing attacks by malicious aliases? (e.g. reputational scores to see whether the people in the room are the people are plausible and trustworthy or not)?

@neilisfragile neilisfragile moved this from To Do to In Progress: Planned Project Work in Superceded by https://github.com/orgs/matrix-org/projects/8 Sep 21, 2018
@richvdh richvdh moved this from In Progress: Planned Project Work to To Do S2S r0 in Superceded by https://github.com/orgs/matrix-org/projects/8 Oct 12, 2018
@turt2live
Copy link
Member

@richvdh you have some conflicts on this. Also, it is considered a build failure because the branch is too old for the build system.

@anoadragon453 anoadragon453 added proposal A matrix spec change proposal and removed T-Core labels Jan 4, 2019

User ID Localparts:
- MUST NOT contain a ``:`` or start with a ``@`` or ``.``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably keep at least the : restriction.

@turt2live turt2live removed the r0 P2 label Jun 3, 2019
@richvdh richvdh removed the proposal A matrix spec change proposal label Oct 10, 2019
@turt2live turt2live changed the title MSC1608: Proposal for room alias grammar MSC1607: Proposal for room alias grammar Apr 6, 2021
@turt2live turt2live added client-server Client-Server API and removed proposal-pr labels Apr 6, 2021
@turt2live turt2live added kind:maintenance MSC which clarifies/updates existing spec proposal A matrix spec change proposal labels Apr 6, 2021
@turt2live turt2live marked this pull request as draft April 6, 2021 02:31
@turt2live
Copy link
Member

FYI I've rolled the number over to the PR number given this is the only remaining example of an ancient process. There's no actual effect to the process.

@turt2live turt2live added the needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. label Jun 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client-server Client-Server API kind:maintenance MSC which clarifies/updates existing spec needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. proposal A matrix spec change proposal
Projects
No open projects
August 2018 r0
  
In review (just the PRs)
Development

Successfully merging this pull request may close these issues.

None yet

5 participants