MSC2265: Proposal for mandating case folding when processing e-mail address localparts #2265

Merged
merged 10 commits into from Jun 7, 2020
65 changes: 65 additions & 0 deletions proposals/2265-email-lowercase.md
@@ -0,0 +1,65 @@
# Proposal for mandating lowercasing when processing e-mail address localparts

[RFC822](https://tools.ietf.org/html/rfc822#section-3.4.7) mandates that
localparts in e-mail addresses must be processed with the original case
preserved. [The Matrix spec](https://matrix.org/docs/spec/appendices#pid-types)
doesn't mandate anything about processing e-mail addresses, other than the fact
that the domain part must be converted to lowercase, as domain names are case
insensitive.

On the other hand, most major e-mail providers nowadays process the localparts
of e-mail addresses as case insensitive. Therefore, most users expect localparts
to be treated case insensitively, and get confused when they are not. Some
users, for example, get confused over the fact that registering a 3PID
association for `john.doe@example.com` doesn't mean that the association is
valid for `John.Doe@example.com`, and don't expect to have to remember the exact
case they used to initially register the association (and sometimes get locked
out of their account because of that). So far we've seen that confusion occur
and lead to problems of varying severity across several deployments of Synapse
and Sydent.

## Proposal

This proposal suggests changing the specification of the e-mail 3PID type in
[the Matrix spec appendices](https://matrix.org/docs/spec/appendices#pid-types)
to mandate that any e-mail address must be entirely converted to lowercase
before any processing, instead of only its domain.
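
As a minimal sketch of what this rule means for an implementation, the helper below lowercases the whole address before anything else is done with it. The function name `normalise_email` is hypothetical, and using Python's `str.lower()` as the lowercasing operation is an assumption rather than something this proposal specifies.

```python
def normalise_email(address: str) -> str:
    """Lowercase the entire address (localpart and domain) before processing.

    Assumption: str.lower() is an acceptable lowercasing operation here.
    """
    return address.lower()


# Both spellings now map to the same 3PID address.
print(normalise_email("John.Doe@Example.COM"))   # john.doe@example.com
print(normalise_email("john.doe@example.com"))   # john.doe@example.com
```

Applying the function twice gives the same result as applying it once, so it should be safe to call at every entry point (binding, lookup, hashing).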

Member

I wonder how much of a complication it would be to mandate lower-case processing (such as lookup and hashing) while still storing addresses with their case preserved.

Member

we ultimately can't tell implementations how to store their data (if they want to spend the extra time converting things to uppercase they can), but the requirement for lookups being lowercase is a fairly strong argument imo


## Other considered solutions

A first look at this issue concluded that there was no need to add such a
mention to the spec, and that it can be considered as an implementation detail.
However, [MSC2134](https://github.com/matrix-org/matrix-doc/pull/2134) changes
this: because hashing functions are case sensitive, we need both clients and
identity servers to follow the same policy regarding case sensitivity.
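
To illustrate why hashing forces a shared policy, the sketch below hashes two spellings of the same address. It assumes the lookup hash is SHA-256 over the string `address medium pepper`, encoded as unpadded URL-safe base64. Treat that format as an assumption and check MSC2134 for the authoritative definition. The function name `lookup_hash` and the pepper value are hypothetical.

```python
import base64
import hashlib


def lookup_hash(address: str, medium: str, pepper: str) -> str:
    # Assumed input format: "<address> <medium> <pepper>".
    digest = hashlib.sha256(f"{address} {medium} {pepper}".encode("utf-8")).digest()
    # Unpadded URL-safe base64 of the raw digest.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


pepper = "examplepepper"  # hypothetical value
stored = lookup_hash("john.doe@example.com", "email", pepper)

# A client that hashes the address as typed gets a different hash.
print(lookup_hash("John.Doe@example.com", "email", pepper) == stored)          # False
# Lowercasing on both sides makes the hashes match.
print(lookup_hash("John.Doe@example.com".lower(), "email", pepper) == stored)  # True
```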

## Tradeoffs

Implementing this MSC in identity servers would require a large part of the
databases of existing identity servers to be updated, converting the email
addresses of existing associations to lowercase in order to avoid conflicts.
However, most of this update can usually be done by a single database query (or
a background job running at startup), so the UX improvement outweighs this
trouble.
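
The single-query update could look roughly like the following sketch, which uses an in-memory SQLite database. The table name `associations` and its columns are hypothetical and do not reflect the schema of Sydent or any other identity server. Note also that SQLite's built-in `lower()` only converts ASCII characters by default.

```python
import sqlite3


def lowercase_addresses(conn: sqlite3.Connection) -> int:
    """Lowercase every stored address; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE associations SET address = lower(address) "
        "WHERE address != lower(address)"
    )
    conn.commit()
    return cur.rowcount


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE associations (address TEXT, mxid TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO associations VALUES (?, ?, ?)",
    [
        ("John.Doe@example.com", "@john:example.com", 1000),
        ("alice@example.com", "@alice:example.com", 2000),
    ],
)

changed = lowercase_addresses(conn)
addresses = [row[0] for row in conn.execute("SELECT address FROM associations ORDER BY ts")]
print(changed, addresses)  # 1 ['john.doe@example.com', 'alice@example.com']
```

A query like this does not by itself resolve rows that collide once lowercased, so any uniqueness constraint on the address would need those conflicts handled first.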

## Potential issues

Some users might already have two different accounts associated with the same
e-mail address but with different cases. This appears to happen in a small
number of cases, however, and can be dealt with by the identity server's maintainer.

For example, with Sydent, the process of dealing with such cases could look
like:

1. list all MXIDs associated with a variant of the email address, and the
timestamp of that association
2. delete all associations except for the most recent one [0]
3. inform the user of the deletion by sending them an email notice to the email
address
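
The first two steps above can be sketched as follows. This is an illustration only: the `Association` type and the `plan_cleanup` function are hypothetical, not Sydent code, and sending the notice email (step 3) is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Association(NamedTuple):
    address: str  # email address as originally stored
    mxid: str     # Matrix user ID bound to the address
    ts: int       # timestamp of the association


def plan_cleanup(
    associations: Iterable[Association],
) -> Dict[str, Tuple[Association, List[Association]]]:
    """Map each conflicting lowercased address to (association to keep, ones to delete)."""
    # Step 1: group all case variants of the same address.
    by_address: Dict[str, List[Association]] = defaultdict(list)
    for assoc in associations:
        by_address[assoc.address.lower()].append(assoc)

    plan = {}
    for address, variants in by_address.items():
        if len(variants) < 2:
            continue  # no conflict for this address
        # Step 2: keep the most recent association, mark the rest for deletion.
        newest_first = sorted(variants, key=lambda a: a.ts, reverse=True)
        plan[address] = (newest_first[0], newest_first[1:])
    return plan


plan = plan_cleanup([
    Association("john.doe@example.com", "@john:example.com", 1000),
    Association("John.Doe@example.com", "@johnny:example.com", 2000),
    Association("alice@example.com", "@alice:example.com", 1500),
])
keep, delete = plan["john.doe@example.com"]
print(keep.mxid, [a.mxid for a in delete])  # @johnny:example.com ['@john:example.com']
```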

## Footnotes

[0]: This is specific to Sydent because of a bug it has where v1 lookups are
already processed case insensitively, which means it will return the most recent
association for any case of the given email address. Keeping only this
association therefore won't change the result of v1 lookups.