MSC2265: Proposal for mandating lowercasing when processing e-mail address localparts #2265

Open
wants to merge 9 commits into
base: master
from
@@ -0,0 +1,65 @@
# Proposal for mandating lowercasing when processing e-mail address localparts

[RFC822](https://tools.ietf.org/html/rfc822#section-3.4.7) mandates that
localparts in e-mail addresses must be processed with the original case
preserved. [The Matrix spec](https://matrix.org/docs/spec/appendices#pid-types)
doesn't mandate anything about processing e-mail addresses, other than the fact
that the domain part must be converted to lowercase, as domain names are case
insensitive.

On the other hand, most major e-mail providers nowadays process the localparts
of e-mail addresses as case insensitive. Therefore, most users expect localparts
to be treated case insensitively, and get confused when they are not. Some users,
for example, get confused over the fact that registering a 3PID association for
`john.doe@example.com` doesn't mean that the association is valid for
`John.Doe@example.com`, and don't expect to be expected to remember the exact
This conversation was marked as resolved by babolivier

@anoadragon453

anoadragon453 Sep 2, 2019

Member
Suggested change
- `John.Doe@example.com`, and don't expect to be expected to remember the exact
+ `John.Doe@example.com`, and don't expect to have to remember the exact
case they used to initially register the association (and sometimes get locked
out of their account because of that). So far we've seen that confusion occur
and lead to troubles of various degrees over several deployments of Synapse and
Sydent.

## Proposal

This proposal suggests changing the specification of the e-mail 3PID type in
This conversation was marked as resolved by babolivier

@richvdh

richvdh Sep 10, 2019

Member

What should be the behaviour of (a) an identity server when I try to /bind an email address that contains un-casefolded characters, and (b) a homeserver when I call an endpoint that proxies such a request?

@babolivier

babolivier Sep 10, 2019

Author Member

Always lowercase/casefold in both cases, which is what I meant by "before any processing". I'll try to reword this expression to make it less vague.

[the Matrix spec appendices](https://matrix.org/docs/spec/appendices#pid-types)
to mandate that any e-mail address must be entirely converted to lowercase
before any processing, instead of only its domain.

@KitsuneRal

KitsuneRal Aug 31, 2019

Member

I wonder how much of a complication it would be to mandate lower-case processing (such as lookup and hashing) while storing addresses with their case preserved.

@turt2live

turt2live Aug 31, 2019

Member

we ultimately can't tell implementations how to store their data (if they want to spend the extra time converting things to uppercase they can), but the requirement for lookups being lowercase is a fairly strong argument imo
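
The "lowercase before any processing" rule discussed in this thread could be sketched roughly as follows. This is only an illustration: the helper name `normalise_email` is hypothetical, and it uses plain `str.lower()` because the proposal text says "lowercase"; an implementation that follows the "casefold" wording used earlier in the review would call `str.casefold()` instead.

```python
def normalise_email(address: str) -> str:
    """Lowercase a whole e-mail address (localpart and domain).

    Hypothetical helper: meant to be called before any lookup, hashing
    or bind, so that every code path sees the same form of the address.
    """
    return address.lower()


# Both spellings collapse to the same key, so a lookup for either matches.
registered = normalise_email("john.doe@example.com")
queried = normalise_email("John.Doe@Example.COM")
print(registered == queried)  # prints True
```

How the address is stored afterwards is left to the implementation, as noted above; only the value used for processing needs to be normalised.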


## Other considered solutions

A first look at this issue concluded that there was no need to add such a
mention to the spec, and that it can be considered as an implementation detail.
This conversation was marked as resolved by babolivier

@anoadragon453

anoadragon453 Sep 2, 2019

Member
Suggested change
- mention to the spec, and that it can be considered as an implementation detail.
+ mention to the spec, and that it can be considered an implementation detail.
However, [MSC2134](https://github.com/matrix-org/matrix-doc/pull/2134) changes
this: because hashing functions are case sensitive, we need both clients and
identity servers to follow the same policy regarding case sensitivity.
This conversation was marked as resolved by anoadragon453

@anoadragon453

anoadragon453 Sep 2, 2019

Member

Additionally, we're looking to make everything hash-based in the future, which will make this proposal even more important.
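
To illustrate why hashed lookups need a shared case policy, here is a minimal sketch. The input format (`address medium pepper`, SHA-256, unpadded URL-safe base64) is an assumption modelled on MSC2134 and should be checked against that proposal; the function name `lookup_hash` and the pepper value are hypothetical.

```python
import base64
import hashlib


def lookup_hash(address: str, medium: str, pepper: str) -> str:
    """Hash "address medium pepper" (assumed format) for a hashed lookup."""
    digest = hashlib.sha256(f"{address} {medium} {pepper}".encode("utf-8")).digest()
    # URL-safe base64 with the trailing "=" padding removed.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


pepper = "examplepepper"  # hypothetical pepper value
mixed = lookup_hash("John.Doe@example.com", "email", pepper)
lower = lookup_hash("john.doe@example.com", "email", pepper)

# Same mailbox, different case: the hashes differ, so the lookup misses.
print(mixed == lower)  # prints False

# Lowercasing on both sides before hashing makes the hashes agree.
print(lookup_hash("John.Doe@example.com".lower(), "email", pepper) == lower)  # prints True
```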


## Tradeoffs

Implementing this MSC in identity servers would require updating a large part
of the databases of existing identity servers, converting the email addresses of
existing associations to lowercase in order to avoid conflicts.
However, most of this update can usually be done by a single database query (or
a background job running at startup), so the UX improvement outweights this
This conversation was marked as resolved by babolivier

@anoadragon453

anoadragon453 Sep 2, 2019

Member
Suggested change
- a background job running at startup), so the UX improvement outweights this
+ a background job running at startup), so the UX improvement outweighs this
trouble.
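
As a rough sketch of the "single database query" mentioned above, the following uses an in-memory SQLite database. The `associations` table and its columns are hypothetical and do not reflect the schema of Sydent or any other identity server. Note also that SQLite's built-in `lower()` only folds ASCII characters by default, so addresses with non-ASCII localparts would need to be handled in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, for illustration only.
conn.execute(
    "CREATE TABLE associations (medium TEXT, address TEXT, mxid TEXT, ts INTEGER)"
)
conn.executemany(
    "INSERT INTO associations VALUES (?, ?, ?, ?)",
    [
        ("email", "John.Doe@example.com", "@john:example.com", 1),
        ("email", "alice@example.com", "@alice:example.com", 2),
        ("msisdn", "447700900000", "@bob:example.com", 3),
    ],
)

# One query lowercases every stored e-mail address; other media are untouched.
conn.execute("UPDATE associations SET address = lower(address) WHERE medium = 'email'")
conn.commit()

addresses = [row[0] for row in conn.execute("SELECT address FROM associations ORDER BY ts")]
print(addresses)
```

If the real table enforces uniqueness on the address, conflicting rows would have to be resolved before such an update can succeed.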

## Potential issues
This conversation was marked as resolved by anoadragon453

@anoadragon453

anoadragon453 Sep 2, 2019

Member

What about the case where two email addresses john.doe@example.com and John.Doe@example.com are registered to two separate user accounts on a homeserver? Is this a problem in the current homeserver landscape or can it just be a "tough luck and ignore"?

@babolivier

babolivier Sep 2, 2019

Author Member

Oooh good catch. I think it's going to be as much of an issue for homeservers as for identity servers, and remediation solutions would look the same. I'll spell it out in the MSC.

@babolivier

babolivier Sep 2, 2019

Author Member

Done in 520c76a

Some users might already have two different accounts associated with the same
e-mail address but with different cases. This appears to happen in a small
number of cases, however, and can be dealt by the identity server's maintainer.
This conversation was marked as resolved by babolivier

@anoadragon453

anoadragon453 Sep 2, 2019

Member
Suggested change
- number of cases, however, and can be dealt by the identity server's maintainer.
+ number of cases, however, and can be dealt with by the identity server's maintainer.

For example, with Sydent, the process of dealing with such cases could look
like:

1. list all MXIDs associated with a variant of the email address, and the
timestamp of that association
2. delete all associations except for the most recent one [0]
3. inform the user of the deletion by sending them an email notice to the email
address
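
The three steps above could be sketched along these lines. This is not Sydent code: the `Association` type and `resolve_conflicts` function are hypothetical, and the sketch only decides which associations to keep and which to delete, leaving the actual deletion and the e-mail notice to the caller.

```python
from collections import defaultdict
from typing import NamedTuple


class Association(NamedTuple):
    """Hypothetical record of one e-mail to MXID association."""
    address: str
    mxid: str
    ts: int  # timestamp of the association


def resolve_conflicts(associations):
    """Return (kept, deleted) after grouping addresses case-insensitively."""
    # Step 1: group every association under the lowercased address.
    by_address = defaultdict(list)
    for assoc in associations:
        by_address[assoc.address.lower()].append(assoc)

    kept, deleted = [], []
    for variants in by_address.values():
        # Step 2: keep only the most recent association for each address.
        variants.sort(key=lambda a: a.ts)
        kept.append(variants[-1])
        deleted.extend(variants[:-1])
    # Step 3 is left to the caller: notify the owner of each deleted entry.
    return kept, deleted


rows = [
    Association("john.doe@example.com", "@john:example.com", 100),
    Association("John.Doe@example.com", "@johnny:example.com", 200),
    Association("alice@example.com", "@alice:example.com", 150),
]
kept, deleted = resolve_conflicts(rows)
print([a.mxid for a in kept])     # prints ['@johnny:example.com', '@alice:example.com']
print([a.mxid for a in deleted])  # prints ['@john:example.com']
```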

## Footnotes

[0]: This is specific to Sydent because of a bug it has where v1 lookups are
already processed case insensitively, which means it will return the most recent
association for any case of the given email address. Keeping only this
association therefore won't change the result of v1 lookups.