Proposal for human ID rules. #3
Includes handling of namespaces for bots, handling of capitalisation, spoof checks and escape sequences.
Clarify position on capitalisation.
Moar clarify.
Mention case canonicalisation on registration.
Thoughts:
I agree.
This can be left to individual HS policy. If there are particular sigils that are likely to be confusing then we should recommend that homeservers ban them. We could recommend that homeservers ban "@!#$:" and any other characters that have special meaning for IDs.
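As a rough illustration of the recommendation above, a homeserver policy check for reserved sigils might look like the following. This is only a sketch: the character set is the one quoted in the comment, and `RESERVED_SIGILS`, `find_reserved_sigils` and `is_allowed_localpart` are hypothetical names, not part of any spec or of Synapse.

```python
from typing import List

# Assumption: the reserved set is exactly the characters quoted in the comment.
RESERVED_SIGILS = frozenset("@!#$:")


def find_reserved_sigils(localpart: str) -> List[str]:
    """Return the reserved characters present in localpart, in order of first appearance."""
    seen: List[str] = []
    for ch in localpart:
        if ch in RESERVED_SIGILS and ch not in seen:
            seen.append(ch)
    return seen


def is_allowed_localpart(localpart: str) -> bool:
    """Hypothetical policy hook: reject empty localparts and any containing a sigil."""
    return bool(localpart) and not find_reserved_sigils(localpart)
```

Returning the offending characters, rather than just a boolean, would let the homeserver explain in its error response which characters were rejected.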
We want to stop people creating IDs or aliases that differ only by case. We apply the checks when the room or ID is created. We have used SHOULD because the protocol as currently written will continue to function if the homeserver disobeys the recommendation. It is the responsibility of the homeserver to manage the IDs it owns and we are making recommendations about how it does so.
We should definitely apply Unicode canonicalisation. (I think DNS uses NFC?)
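To make the two points above concrete, here is a minimal sketch of a creation-time check that rejects IDs differing only by case or by Unicode composition. It assumes NFC plus Python's `str.casefold()` as the canonical form, which is one possible choice and not something the proposal has settled; `canonical_key` and `Registry` are hypothetical names.

```python
import unicodedata


def canonical_key(identifier: str) -> str:
    """Comparison key: NFC-normalise, case-fold, then NFC-normalise again."""
    folded = unicodedata.normalize("NFC", identifier).casefold()
    return unicodedata.normalize("NFC", folded)


class Registry:
    """Hypothetical in-memory store of the IDs owned by one homeserver."""

    def __init__(self) -> None:
        self._by_key = {}

    def register(self, identifier: str) -> bool:
        """Store identifier unless an equivalent one already exists."""
        key = canonical_key(identifier)
        if key in self._by_key:
            return False
        # Keep the original spelling for display; compare on the key only.
        self._by_key[key] = identifier
        return True
```

Because the check runs only at registration, existing IDs are unaffected, and routing questions such as whether `@foo:bar` and `@Foo:bar` receive the same messages are left open.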
I think the options that work from a technical perspective are:
I'm not sure how we would determine a suitable alternative for the room aliases.
I'm not sure about bridging Arathorn into @.matthew:matrix.org. If we are going to have decent crypto and security it seems wrong to encourage insecure protocols to masquerade as you. (Then again if you trust freenode, connect over SSL and have registered your nick it might not be too bad). I think we need to be able to bridge IRC users that don't have existing Matrix accounts. I'd like it if we had a way to namespace those accounts. I guess dots match how we namespace event types. I personally don't mind using "-" like we currently do in the IRC bridge.
In that case, we'd really need to use OAuth2 to provide limited access for a given third party to access the Matrix account. Without that, it's obviously a huge security risk if any bridge can act freely on your behalf without your permission. I think we should be providing OAuth2 anyway, given "acting on behalf of a client" is a recurring problem. The limited access token would, say, only be able to send m.room.message in rooms you're in, and would not be able to invite / join / leave rooms.
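The kind of limited token described above could be modelled roughly as follows. This sketch covers only the permission check, not the OAuth2 grant flow itself, and every name in it (`ScopedToken`, `may_send`, `may_change_membership`) is hypothetical rather than an existing Matrix or OAuth2 API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class ScopedToken:
    """Hypothetical limited-access token issued to a bridge."""

    user_id: str
    allowed_event_types: FrozenSet[str] = frozenset({"m.room.message"})
    can_change_membership: bool = False


def may_send(token: ScopedToken, event_type: str, room_id: str,
             joined_rooms: Set[str]) -> bool:
    """Allow sending only permitted event types, and only into rooms the user is already in."""
    return event_type in token.allowed_event_types and room_id in joined_rooms


def may_change_membership(token: ScopedToken) -> bool:
    """Invite / join / leave are denied unless explicitly granted."""
    return token.can_change_membership
```

The defaults deny everything except sending `m.room.message`, so a bridge holding such a token could speak for the user without being able to move them between rooms.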
I'm pretty sure we want to be able to link random services (and apps) to our main account, otherwise we're going to end up with a billion fragmented accounts, which could be a terrible experience (unless we fix portability of accounts, or make multi-account clients really painless). I'm not sure that OAuth2 is good enough for this. See https://matrix.org/jira/browse/SPEC-79 for more discussion on this problem.
In what way is OAuth2 deficient? You are free to choose an arbitrary set of "scope" permissions which can be granted via OAuth2, e.g. this is Google's: https://www.googleapis.com/discovery/v1/apis/oauth2/v2/rest?fields=auth(oauth2(scopes)) A good writeup on OAuth scopes: https://brandur.org/oauth-scope I would prefer we didn't re-invent the wheel on this...
Am I hallucinating that we made some progress in finally locking this down? I'd much rather that we got this one landed and implemented in Synapse than adding more bells & whistles like #93...
I don't think we've made any more progress on this. I'll try to push forward a revised edition which incorporates the past 10 months ❗ of work. This is a tricky one to do correctly because it can get tangled up with "should messages to @ foo:bar go to @ Foo:bar" - I will attempt to avoid this conflict as much as possible to increase the odds of us getting consensus. This does mean that the revised edition will not cover all bases, but that means it shouldn't be blocked on things like the "business card lookup API" or "how do we resolve case-mappings", etc.
Done. With respect to the exemplar characters on the CLDR datasets, there are Python bindings for the library International Components for Unicode (ICU) which exposes the CLDR datasets. https://pypi.python.org/pypi/PyICU/ ICU itself exposes functions to do mappings to and from punycode: http://icu-project.org/apiref/icu4c/uidna_8h.html There's also a handy spoofing library: http://icu-project.org/apiref/icu4c/uspoof_8h.html which has the handy test of |
These checks are:

User ID Localparts:
- MUST NOT contain a ``:`` or start with a ``@`` or ``.``
Why can't they start with @s or .s?
Oh, the @ is described below (an arbitrary choice I guess) - I still don't know about the .
The rationale for forbidding a . prefix was because at one point we were
going to namespace gateway'd user IDs as
@.irc.whatever.nickname:foo.com. This has been lost as we namespace
bridges any old way nowadays. Personally I still think it'd be useful
to reserve some prefixes as secondary sigils (twigils) if needed.
On 19/10/2015 16:14, Daniel Wagner-Hall wrote:
> In drafts/human-id-rules.rst, #3 (comment):
>
> -- Error message MAY go into further information about which characters were
> -  rejected and why.
> -- Error message SHOULD contain a ``failed_keys`` key which contains an array
> -  of strings which represent the keys which failed the check e.g::
> -
> -    failed_keys: [ user_id, room_alias ]
> -
> -Other considerations
> ---------------------
> -- Basic security: Informational key on the event attached by HS to say "unsafe
> +User IDs and Room Aliases MUST be Unicode as UTF-8. Checks are performed on
> +these IDs by homeservers to protect users from phishing/spoofing attacks.
> +These checks are:
> +
> +User ID Localparts:
> + - MUST NOT contain a ``:`` or start with a ``@`` or ``.``
>
> Oh, the @ is described below (an arbitrary choice I guess) - I still don't know about the .
I 100% agree with Ara on this: some future-proofing by reserving prefixes we may want in the future is a Good Thing imo.
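For reference, the reserved `.` prefix discussed in this thread would let a homeserver recognise gateway'd IDs such as `@.irc.whatever.nickname:foo.com` mechanically. The sketch below assumes the last dot-separated component is the remote nickname and everything before it is the gateway namespace; that split is a guess at the intended scheme (it would mis-parse nicknames containing dots), and `split_user_id` and `gateway_namespace` are hypothetical helpers.

```python
from typing import Optional, Tuple


def split_user_id(user_id: str) -> Tuple[str, str]:
    """Split '@localpart:domain' into (localpart, domain)."""
    if not user_id.startswith("@") or ":" not in user_id:
        raise ValueError("not a user ID: %r" % (user_id,))
    localpart, domain = user_id[1:].split(":", 1)
    return localpart, domain


def gateway_namespace(user_id: str) -> Optional[Tuple[Tuple[str, ...], str]]:
    """Return (namespace parts, nickname) for '.'-prefixed localparts, else None."""
    localpart, _domain = split_user_id(user_id)
    if not localpart.startswith("."):
        return None  # ordinary, non-gateway user
    parts = localpart[1:].split(".")
    # Assumption: the final component is the nickname on the remote network.
    return tuple(parts[:-1]), parts[-1]
```

Reserving the prefix at registration time is what makes this unambiguous: if ordinary users could also register localparts starting with `.`, the check could not tell them apart from bridged users.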
After a discussion on #matrix:matrix.org about SPEC-1 I've gone and read through the latest proposal here. In general it feels good - but I have some concerns:
I'm wondering whether avoiding homograph attacks is a bit of a fool's quest, especially given simple ambiguities between I's and l's and 1's etc, and instead Kegan's suggestion of basically copying the IDN behaviour that Chrome uses is good enough for basic use of room aliases and user IDs. Meanwhile we'd rely more heavily in future on a reputation score to differentiate the real Slim Shady from SIim Shady. Another consideration is that user IDs, room aliases and user display names / room names all have subtly different uses:
The reason to try harder to disambiguate user IDs & aliases is that they MUST be unique, and they may be used out-of-band, and so a homograph attack which subverts that uniqueness is slightly more malicious than one applied to display names.

For display names, we can hopefully rely on social means to generally keep people honest and avoid trying to impersonate one another within a room. If "fiona" is already in the room and a "ﬁona" (with a "ﬁ" Unicode ligature) joins with the same avatar and starts speaking, hopefully someone in the room will realise there's a doppelgänger going on and check user IDs and call foul. Meanwhile, invites would always disambiguate using both reputation and user ID to avoid phishing attacks from an ambiguous display name & avatar. (We can also disambiguate within a room by using the date the user joined the room - "Matthew (joined Tue)" v. "Matthew (joined Feb)" etc. This disambiguation would be rendered clientside, the server just putting an advisory 'ambiguous' flag on the member so the client knows to warn somehow.)

If this sounds sane, we should add a section about display names & disambiguation to the spec - presumably in the same place as vdH's rules on how to calculate display names.

A silly suggestion that came up was to render IDs in a bunch of different fonts and check for sufficient structural dissimilarity to other IDs in the system, but this is pretty daft.
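A server-side pass that sets the advisory 'ambiguous' flag could be sketched like this. It assumes NFKC normalisation plus case folding as the comparison key, which catches compatibility forms such as the ligature example but not look-alikes like I, l and 1; `display_name_key` and `ambiguous_members` are hypothetical names.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Set


def display_name_key(name: str) -> str:
    """Key under which display names are considered visually equivalent (approximation)."""
    return unicodedata.normalize("NFKC", name).casefold().strip()


def ambiguous_members(members: Dict[str, str]) -> Set[str]:
    """Given user ID -> display name, return the user IDs whose names collide."""
    by_key = defaultdict(list)
    for user_id, name in members.items():
        by_key[display_name_key(name)].append(user_id)
    return {uid for ids in by_key.values() if len(ids) > 1 for uid in ids}
```

The client would then decide how to render the warning, for example by appending the join date or the user ID to each flagged name.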
I'd come to much the same conclusion (viz: that trying to prevent homograph attacks is a fool's errand; and that attempting to do so is likely to result in a situation where people mistakenly trust our ability to do so and then get caught out by the cases when we can't). As such, it's better to provide alternative means for users to establish trust where it matters: visual hashes, reputation scores, etc. I'd also come to the conclusion that one size does not necessarily fit all.
Display names
hrm. If I were the second fiona, I'd join quietly and let my presence go unnoticed for a couple of days, and then start speaking. The chances of someone noticing this are minimal. So any attempt to disambiguate is only going to be a courtesy, where we have two non-malicious Matthews in a room and you'd like to keep track of which is which. I'm not even sure how effective that will be, particularly if people come and go from rooms, and Matthew in room X is different from Matthew in room Y. I'm tempted to give up any technological attempt to disambiguate and rely on some social means ("can one of you change your displayname?"); though of course that risks an "I was Matthew first!"/"I'm always Matthew" scenario. So practically speaking, as a service to the user, we should try to identify visually-similar display names and provide a means of disambiguation. What exactly that means is somewhat open to debate. The Unicode Consortium has some recommendations in this area: http://www.unicode.org/reports/tr36/#Visual_Spoofing_Recommendations.
IDs
Are there any actionable items from your discussion? I'm seeing lots of wandering through the woods but nothing concrete which I can add to the proposal. How do we feel about the mechanics that have been outlined in the proposal? That is:
The debate then revolves around what the recommendations are, and how far we go wrt homograph attacks, case mappings, etc.
Sorry, this ended up as a bit of a stream of consciousness from me, regarding things which are only tangentially related to the PR. I suspect Matthew did the same.
I think these are excellent principles. As you say, the debate is around exactly what the restrictions/identity mappings are.
drafts/human-id-rules.rst
- MUST NOT contain a ``:`` or start with a ``@`` or ``.``
- MUST NOT contain one of the 107 blacklisted characters on this list:
  http://kb.mozillazine.org/Network.IDN.blacklist_chars
- After stripping " 0-9, +, -, [, ], _, and the space character it MUST NOT
I think this is the wrong test - or at least misworded. 'A' is in the exemplar characters for both English and French, for example...
I think what you're after is that it only contains characters from one script, after ignoring Common and Inherited script characters, as per http://unicode.org/reports/tr39/#Mixed_Script_Detection. That may be overly restrictive, but it's probably easier to have something restrictive and relax it later, than vice versa.
(ditto for room aliases)
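A single-script test along the lines suggested above might be sketched as follows. Python's standard library does not expose the Unicode Script property, so this version approximates it with the first word of each letter's character name (for example LATIN or CYRILLIC) and treats every non-letter as Common/Inherited. That is a loud assumption: it is not the UTS #39 algorithm, and it will misclassify some characters (fullwidth forms, for instance), so a real implementation would want ICU or proper script data. `approximate_scripts` and `is_single_script` are hypothetical names.

```python
import unicodedata
from typing import Set


def approximate_scripts(text: str) -> Set[str]:
    """Collect an approximate script label for each letter in text."""
    scripts: Set[str] = set()
    for ch in text:
        if not unicodedata.category(ch).startswith("L"):
            continue  # digits, punctuation, marks: treated as Common/Inherited
        name = unicodedata.name(ch, "")
        if name:
            # Approximation: first word of the character name stands in for the script.
            scripts.add(name.split(" ")[0])
    return scripts


def is_single_script(text: str) -> bool:
    """True if all letters in text appear to come from at most one script."""
    return len(approximate_scripts(text)) <= 1
```

As the comment notes, starting restrictive and relaxing later is easier than the reverse; a check like this would reject a localpart mixing Latin letters with a Cyrillic look-alike while still allowing digits and punctuation alongside a single script.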
After 3 years, it's sadly pretty clear that we're not going to progress this as it currently stands. I think the |
hang on... rich: I thought this was obsoleted generally by the mxid formatting stuff you landed?
I'd love to know how github decides whether or not it's going to email me when someone comments on a PR I'm subscribed to. Anyway:
There's more here than mxids: it also addresses room aliases. Though yes, on looking at it, much of it does seem to be obsolete now. Maybe we should just get rid of this file, then.
Includes handling of capitalisation, spoof checks and escape sequences.
Rendered: https://github.com/matrix-org/matrix-doc/blob/human-id-rules/drafts/human-id-rules.rst