MSC2211: Identity Servers Storing Threepid Hashes at Rest #2211

Open
wants to merge 3 commits into
base: old_master
105 changes: 105 additions & 0 deletions proposals/2211-store-3pids-hashed-at-rest.md
@@ -0,0 +1,105 @@
# Identity Servers Storing Threepid Hashes at Rest

The purpose of an identity server is to store mappings between third-party
identities (3PIDs) and Matrix User IDs. This allows users to associate an
email or a phone number with their Matrix account, for the purpose of letting
people who already know their phone number/email address find them on Matrix.

Since the inception of identity servers, 3PIDs have always been stored as
plaintext addresses. Due to protocol endpoints requiring plaintext addresses,
major implementations have always stored 3PID data as plaintext at rest. An
example is the [GET
/_matrix/identity/api/v1/3pid/getValidated3pid](https://matrix.org/docs/spec/identity_service/unstable#get-matrix-identity-api-v1-3pid-getvalidated3pid)

> Review comment (Member): stick this endpoint in a code tag to prevent italics for the rest of the doc please. This is a theme throughout the proposal.

endpoint, which accepts lookups by users sending over plaintext mediums and
addresses. The identity server thus needs to store those plaintext values in
order to compare them.

Plaintext 3PIDs are a massive liability. If the database of the identity
server is ever compromised, 3PID addresses and mediums, as well as the Matrix
IDs they are associated with, are immediately compromised. If 3PIDs were
stored as hashes, attackers would need to first build a rainbow table to
reverse them, thus increasing the expense of compromising users' personal
information.
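
As a rough illustration of what hashing at rest could look like, the sketch below derives a storage key from a medium and address. The proposal does not specify a hash construction. SHA-256 over the space-separated `address medium pepper` string, encoded as unpadded URL-safe base64, is an assumption here, modelled on the lookup hashing discussed in MSC2134. `hash_threepid` and the pepper value are hypothetical.

```python
import base64
import hashlib


def hash_threepid(medium: str, address: str, pepper: str) -> str:
    """Return an unpadded URL-safe base64 SHA-256 digest of a 3PID.

    Assumption: the hashed input is "address medium pepper", space-separated.
    """
    material = f"{address} {medium} {pepper}".encode("utf-8")
    digest = hashlib.sha256(material).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


# The stored mapping holds only the hash and the Matrix ID, never the address.
stored = {
    hash_threepid("email", "alice@example.com", "matrixrocks"): "@alice:example.com"
}
```

With this shape, a database dump exposes Matrix IDs and digests, and an attacker has to enumerate candidate addresses to recover the 3PIDs.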

Storing 3PIDs as hashes at rest can be accomplished with a few protocol
changes. As recently done with [GET

> Review comment (Member): A link to the MSC would be great here.

/_matrix/identity/api/v1/lookup](https://matrix.org/docs/spec/identity_service/unstable#get-matrix-identity-api-v1-lookup),
endpoints can be modified to only accept hashes.

## Proposal

The following endpoints would need to be modified for identity servers to be
able to store 3PID hashes at rest:

* [POST /_matrix/identity/api/v1/validate/email/requestToken](https://matrix.org/docs/spec/identity_service/unstable#post-matrix-identity-api-v1-validate-email-requesttoken)

  This endpoint needs a plaintext 3PID to send an email, but while waiting for
  validation it can store the address hashed.

* [GET /_matrix/identity/api/v1/3pid/getValidated3pid](https://matrix.org/docs/spec/identity_service/unstable#get-matrix-identity-api-v1-3pid-getvalidated3pid)

This endpoint needs to be changed to return a hash instead of `medium` and

> Review comment (Member): Do we need to specify how this is hashed? Do callers of this endpoint depend on it in order to retrieve the 3pid info?

`address` parameters.

* [POST /_matrix/identity/api/v1/3pid/unbind](https://matrix.org/docs/spec/identity_service/unstable#post-matrix-identity-api-v1-3pid-unbind)

This endpoint needs to be changed to have `threepid` be a hash instead.

> Review comment (Member): This certainly needs the hash method to be specified, so that the client knows how to hash the threepid to send the correct information. Though I'm not sure that it does need to be changed, since the client can send the information in the clear, and the server can just apply the hash.


* [POST /_matrix/identity/api/v1/store-invite](https://matrix.org/docs/spec/identity_service/unstable#post-matrix-identity-api-v1-store-invite)

  This endpoint needs to be changed to remove the `medium` and `address`
  parameters and instead have a new field containing a hash value.

> Review comment (Member): I don't think that we can replace medium and address with a hash here, because this endpoint is supposed to send an email to the invited address, and if the address is hashed, then the server doesn't have access to the address.
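
One way to reconcile the endpoints listed above with hashed storage, as the review comments suggest, is for the server to accept the plaintext address, use it only for the duration of the request, and persist just the hash. The sketch below is a minimal illustration under that assumption. `request_token`, `send_email`, `sessions`, and `hash_address` are hypothetical names, and the hash construction is not specified by the proposal.

```python
import hashlib
import secrets
from typing import Callable, Dict


def hash_address(medium: str, address: str, salt: str) -> str:
    # Assumed construction; the proposal leaves the hash method unspecified.
    return hashlib.sha256(f"{address} {medium} {salt}".encode("utf-8")).hexdigest()


def request_token(
    address: str,
    salt: str,
    sessions: Dict[str, dict],
    send_email: Callable[[str, str], None],
) -> str:
    """Email a validation token, then keep only the hashed address."""
    sid = secrets.token_hex(8)
    token = secrets.token_urlsafe(16)
    send_email(address, token)  # plaintext is needed only for delivery
    sessions[sid] = {
        "threepid_hash": hash_address("email", address, salt),
        "token": token,
        "validated": False,
    }
    return sid
```

The same pattern would apply to an invite or unbind request that arrives in the clear: the handler hashes the address before anything is written to storage.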


Each of these endpoints will need to be changed to `v2`, and at the same time
we should drop the `/api/` part, since it is redundant. This lines up with
what was done for `/_matrix/identity/v2/lookup` in
[MSC2134](https://github.com/matrix-org/matrix-doc/pull/2134).

Thus, the new endpoints should be:

> Review comment (Member): while we're here, can we fix the casing to match up with our style doc?
>
> * request_token
> * store_invite
> * /3pid/validated


* POST /_matrix/identity/v2/validate/email/requestToken
* POST /_matrix/identity/v2/store-invite
* POST /_matrix/identity/v2/3pid/unbind
* GET /_matrix/identity/v2/3pid/getValidated3pid

It could probably be argued that `.../getValidated3pid` should just be `GET
/_matrix/identity/v2/3pid/getValidated` instead.
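
The renaming described above is mechanical: the `/api/v1` segment becomes `/v2` and the rest of the path is unchanged. A small sketch, assuming a hypothetical helper `to_v2_path` and keeping the endpoint names exactly as listed:

```python
V1_PREFIX = "/_matrix/identity/api/v1/"
V2_PREFIX = "/_matrix/identity/v2/"


def to_v2_path(v1_path: str) -> str:
    """Map a v1 identity service path to its proposed v2 equivalent."""
    if not v1_path.startswith(V1_PREFIX):
        raise ValueError(f"not a v1 identity service path: {v1_path}")
    return V2_PREFIX + v1_path[len(V1_PREFIX):]


for path in (
    "/_matrix/identity/api/v1/validate/email/requestToken",
    "/_matrix/identity/api/v1/store-invite",
    "/_matrix/identity/api/v1/3pid/unbind",
    "/_matrix/identity/api/v1/3pid/getValidated3pid",
):
    print(to_v2_path(path))
```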

The `v1` versions of these endpoints should continue to work but be
deprecated, and eventually removed once clients/identity servers have
sufficiently implemented the `v2` versions.

Endpoints that would already work in this new hash-filled world are:

* [GET/POST /_matrix/identity/api/v1/validate/(email|phone)/submitToken](https://matrix.org/docs/spec/identity_service/unstable#post-matrix-identity-api-v1-validate-email-submittoken)
* [POST /_matrix/identity/api/v1/3pid/bind](https://matrix.org/docs/spec/identity_service/unstable#post-matrix-identity-api-v1-3pid-bind)

These endpoints just take token/session information, so no changes are
needed. All other endpoints would not need to be changed.

## Tradeoffs

> Review comment (Member): As mentioned in the other PR, how do we rotate hashes?

> Reply (Member, Author): That should be an implementation detail, no? I can mention it in the doc as such.

> Reply (Member): I don't see how an IS can rotate hashes if it doesn't have the original 3PID, so I think this would require some funky stuff to happen to make it possible (or to just drop all existing values).

> Reply (Member, Author): Ah right. This isn't spelled out anywhere but I think the idea was that the server would have a lifetime salt (probably just generated on first-run), and then rotated peppers for lookups.
>
> So clients would have to hash with a salt then with a pepper.
>
> I think the only benefit there would be in the case of MITM, or for storing invites where the sent hash is stored temporarily.

> Reply (Member): I'm more thinking of the case where we store hashes as SHA-1 then discover that it's vulnerable and we need to use SHA-256 instead. What do we do?
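
The lifetime-salt-plus-rotating-pepper idea floated in this thread can be sketched as a two-stage hash: the server stores only the salted inner hash and derives the peppered lookup hash from it, so rotating the pepper does not require the original 3PID. This is an illustration under assumed constructions (SHA-256, hex encoding, hypothetical function names), not something the proposal specifies. It also does not answer the algorithm-migration question raised last: replacing the inner hash function would still need the plaintext.

```python
import hashlib


def inner_hash(medium: str, address: str, salt: str) -> str:
    """Stage 1: salted hash, computed once and stored at rest."""
    return hashlib.sha256(f"{address} {medium} {salt}".encode("utf-8")).hexdigest()


def lookup_hash(stored_inner: str, pepper: str) -> str:
    """Stage 2: peppered hash, recomputable from the stored value alone."""
    return hashlib.sha256(f"{stored_inner} {pepper}".encode("utf-8")).hexdigest()


def rotate_pepper(stored: list, new_pepper: str) -> dict:
    """Rebuild the lookup table for a new pepper without any plaintext."""
    return {lookup_hash(inner, new_pepper): inner for inner in stored}
```

A client that knows the salt and the current pepper would compute `lookup_hash(inner_hash(medium, address, salt), pepper)` and send that value.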


There's still the GDPR concern that if an identity server does get
compromised, the administrators are obligated to notify everyone that hashes
were taken. Either Matrix can be used as the communication medium (does the
law disallow this?) or identity servers could send a message to homeservers,
which do have the plaintext 3PIDs, that they should send an email (this could
be horribly abused by an evil IS though, and not all homeservers have email
settings configured).

## Potential issues

Another sticking point to consider is that identity servers which hook into
third-party data sources, such as LDAP, may have trouble answering requests

> Review comment (Member): I feel like supporting LDAP is a hard requirement for us to be able to achieve this. I'm not sure how much of this API LDAP-enabled identity servers actually use though - I can imagine they'd just abuse /lookup.

that only feature a hash value. This may be solvable in implementation but
requires further thought.

## Conclusion

With a few endpoint changes, we can enable identity servers to store user
contact information in a hashed format, thereby reducing the impact of a

> Review comment (Member): There should be a mention somewhere early in this proposal that we want to encourage identity servers to store hashed identifiers, but cannot enforce it. We can't define how implementations store data, but we sure can make it obvious how we want them to work.

compromised database.

While it can be argued that plaintext 3PIDs could be recovered from these
hashes, doing so is more effort for an attacker than simply gleaning a large
database of plaintext addresses.