MSC2265: Proposal for mandating lowercasing when processing e-mail address localparts #2265

Open
wants to merge 8 commits into
base: master
from

Conversation

@babolivier
Member

commented Aug 30, 2019

Rendered

@babolivier babolivier changed the title Proposal for mandating lowercasing when processing e-mail address localparts MSC2265: Proposal for mandating lowercasing when processing e-mail address localparts Aug 30, 2019

@turt2live turt2live self-requested a review Aug 30, 2019

@turt2live
Member

left a comment

seems sane to me

This proposal suggests changing the specification of the e-mail 3PID type in
[the Matrix spec appendices](https://matrix.org/docs/spec/appendices#pid-types)
to mandate that any e-mail address must be entirely converted to lowercase
before any processing, instead of only its domain.

@KitsuneRal

KitsuneRal Aug 31, 2019

Member

I wonder how much of a complication it would be to mandate lower-case processing (such as lookup and hashing) while preserving case when storing addresses.

@turt2live

turt2live Aug 31, 2019

Member

we ultimately can't tell implementations how to store their data (if they want to spend the extra time converting things to uppercase they can), but the requirement for lookups being lowercase is a fairly strong argument imo
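
As a rough illustration of the split discussed here (normalise for lookups, store whatever the user entered), a minimal Python sketch. `ThreepidStore` and its methods are hypothetical names for illustration only, not taken from any Matrix implementation:

```python
class ThreepidStore:
    """Hypothetical in-memory store: lower-cased lookup key, case-preserved value."""

    def __init__(self):
        self._by_key = {}

    @staticmethod
    def _key(address):
        # Lower-case the whole address, as the proposal suggests at this point.
        return address.lower()

    def bind(self, address, mxid):
        # Keep the address exactly as entered alongside the Matrix ID.
        self._by_key[self._key(address)] = (address, mxid)

    def lookup(self, address):
        # Returns (address_as_entered, mxid), or None if nothing is bound.
        return self._by_key.get(self._key(address))


store = ThreepidStore()
store.bind("Alice@Example.com", "@alice:example.com")
found = store.lookup("ALICE@example.COM")
```

Here the lookup succeeds regardless of the case used in the query, while the stored address keeps its original capitalisation.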

babolivier and others added 6 commits Sep 2, 2019
Wording
Co-Authored-By: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Wording
Co-Authored-By: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Typo
Co-Authored-By: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Wording
Co-Authored-By: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
@anoadragon453

Member

commented Sep 3, 2019

Seems like general consensus overall.

@mscbot fcp merge

@mscbot

Collaborator

commented Sep 3, 2019

Team member @anoadragon453 has proposed to merge this. The next step is review by the rest of the tagged people:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@richvdh
Member

left a comment

So some questions on this of the sort that arise whenever case mapping comes up:

Are you sure that lower-casing, as opposed to casefolding, is the right thing to do? Examples of the difference:

  • ß (German lower-case sharp 's', upper-case equivalent 'SS') case-folds to 'ss', so that 'hans.voß' matches 'HANS.VOSS'. (On the other hand, it's not entirely obvious that they should be treated the same.)
  • ς (Greek lower-case sigma, when used at the end of a word) case-folds to 'σ' (regular lower-case sigma), so that 'ΣΊΣΥΦΟΣ' matches 'σίσυφος'.
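
To make the difference concrete, here is a small sketch using Python's built-in `str.lower` and `str.casefold`. It only shows how one language's standard library behaves, under the assumption that the strings use precomposed code points; it is not a statement about what the spec should require:

```python
# Escapes are used so the exact code points are unambiguous.
sharp_s = "hans.vo\u00df"                                # 'hans.voß'
upper_ss = "HANS.VOSS"
upper_sigma = "\u03a3\u038a\u03a3\u03a5\u03a6\u039f\u03a3"  # 'ΣΊΣΥΦΟΣ'
final_sigma = "\u03c3\u03af\u03c3\u03c5\u03c6\u03bf\u03c2"  # 'σίσυφος'

# Lower-casing leaves 'ß' alone, so the two spellings stay different.
lower_match = sharp_s.lower() == upper_ss.lower()

# Case folding maps 'ß' to 'ss', so the two spellings compare equal.
fold_match = sharp_s.casefold() == upper_ss.casefold()

# Case folding also maps final sigma 'ς' to 'σ'.
sigma_fold_match = upper_sigma.casefold() == final_sigma.casefold()

print(lower_match, fold_match, sigma_fold_match)
```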

Relatedly: should we consider unicode normalisation, so that (for example) 'ê' (U+00EA, e with circumflex) is treated the same as 'ê' (U+0065 U+0302, e followed by circumflex combining character)?
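
A minimal sketch of the normalisation question, using Python's standard `unicodedata` module. NFC is picked here purely as an example; which form (if any) the spec should mandate is an open question:

```python
import unicodedata

composed = "\u00ea"     # 'ê' as a single code point
decomposed = "e\u0302"  # 'e' followed by a combining circumflex

# Without normalisation the two strings are different code point sequences.
raw_equal = composed == decomposed

# After normalising both to NFC they compare equal.
nfc_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(raw_equal, nfc_equal)
```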

Neither of the above solves the 'French problem' where (traditionally) accents are omitted on upper-case characters, so 'COTE' should be equivalent to 'côté'...

@babolivier

Member Author

commented Sep 10, 2019

> Relatedly: should we consider unicode normalisation, so that (for example) 'ê' (U+00EA, e with circumflex) is treated the same as 'ê' (U+0065 U+0302, e followed by circumflex combining character)?

I guess it depends on whether common email providers treat both characters as the same. I'll do some investigation around that.

> Neither of the above solves the 'French problem' where (traditionally) accents are omitted on upper-case characters, so 'COTE' should be equivalent to 'côté'...

Should it, though, keeping in mind we're only looking at email addresses here? I just checked on both Gmail and Hotmail, and neither of them considers bréndan.abolivier@... to be the same as brendan.abolivier@..., and I'm not aware of any provider that does.

Otherwise, yes, casefold is probably the way to go; I'll update the proposal to reflect that.

@richvdh

Member

commented Sep 10, 2019

tbh I wasn't aware that gmail let you use non-ascii localparts at all. Certainly being guided by the behaviour of common providers seems like a sensible idea. (also: sorry for not starting a thread.)


## Proposal

This proposal suggests changing the specification of the e-mail 3PID type in

@richvdh

richvdh Sep 10, 2019

Member

What should be the behaviour of (a) an ID server when I try to /bind an email address that contains un-casefolded characters, and (b) a HS when I call an endpoint that proxies such a request?

@babolivier

babolivier Sep 10, 2019

Author Member

Always lowercase/casefold in both cases, which is what I meant by "before any processing". I'll try to reword this expression to make it less vague.
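
A minimal sketch of what "before any processing" could look like, in Python. Both function names are hypothetical, and the digest format is made up for illustration; it is not the exact format defined by the identity service spec. The point is only that the same canonicalisation runs first on every path (bind, lookup, hashing):

```python
import hashlib


def canonicalise_email(address: str) -> str:
    """Casefold the whole address, localpart and domain alike."""
    return address.casefold()


def lookup_hash(address: str, pepper: str) -> str:
    # Canonicalise first, then hash. The "<address> email <pepper>" layout
    # is an assumption for this sketch, not the spec's exact format.
    canonical = canonicalise_email(address)
    return hashlib.sha256(f"{canonical} email {pepper}".encode("utf-8")).hexdigest()


# Two spellings of the same address yield the same digest.
h1 = lookup_hash("Hans.Vo\u00df@Example.com", "pepper")  # 'Hans.Voß@Example.com'
h2 = lookup_hash("HANS.VOSS@example.COM", "pepper")
```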

@babolivier

Member Author

commented Sep 10, 2019

> tbh I wasn't aware that gmail let you use non-ascii localparts at all

Neither was I. For full context, I've tried Thunderbird (+OVH), Roundcube (+OVH), Hotmail and Gmail, and only the last one accepted a non-ASCII localpart in the recipient's address.

@neilisfragile neilisfragile moved this from In progress to Review in Homeserver Task Board Sep 12, 2019
