MSC2134: Identity Hash Lookups #2134

Open
wants to merge 43 commits into
base: master
from

@Half-Shot
Contributor

commented Jun 15, 2019

To fix #2130

Rendered


The room to discuss this MSC is #hashing-msc:amorgan.xyz.


FCP: #2134 (comment) ~ Travis


We're aware that this MSC has quite a few comments under its belt. New readers should only be concerned with the current state of the proposal (i.e. what's under the Rendered link) and the unresolved comments below. ~ anoa

@Half-Shot Half-Shot added the proposal label Jun 15, 2019

@turt2live turt2live changed the title MSC 2134: Identity Hash Lookups MSC2134: Identity Hash Lookups Jun 15, 2019

Half-Shot added some commits Jun 15, 2019

@ara4n

Member

commented Jun 15, 2019

thanks for the proposal - looks generally sane.

we need to consider salts; i’d have thought we should include a random salt in the hash and alongside the hash to make rainbow table lookups slightly harder?


```python
import hashlib

address = "willh@matrix.org"
# Raw 32-byte SHA-256 digest of the plain address (no salt or pepper).
digest = hashlib.sha256(address.encode()).digest()
```

@ara4n

ara4n Jun 15, 2019

Member

this should prolly be {H(email + salt), salt} to make rainbow table attacks slightly harder

@ara4n

ara4n Jun 15, 2019

Member

(although salt would have to be constant for everyone, so it doesn’t buy much)
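A minimal sketch of the `{H(email + salt), salt}` idea in Python, assuming SHA-256 over the address with the salt simply appended (the salt value and the concatenation are illustrative, not taken from the proposal). It shows why a salt that has to be constant for everyone buys little: an attacker builds one table for that salt and reuses it for every lookup.

```python
import hashlib

# Hypothetical constant salt: for lookups to match stored values,
# every client would have to use the same one.
SPEC_SALT = "hypothetical-matrix-salt"


def salted_lookup_hash(address: str, salt: str = SPEC_SALT) -> dict:
    """Return the {H(address + salt), salt} pair a client would send."""
    digest = hashlib.sha256((address + salt).encode("utf-8")).hexdigest()
    return {"hash": digest, "salt": salt}


# Because the salt never changes, one precomputed table covers every lookup.
candidates = ["alice@example.com", "bob@example.org", "willh@matrix.org"]
table = {salted_lookup_hash(a)["hash"]: a for a in candidates}

request = salted_lookup_hash("willh@matrix.org")
print(table.get(request["hash"]))  # willh@matrix.org
```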

@Half-Shot

Half-Shot Jun 15, 2019

Author Contributor

Would this require us to do a negotiation step where the server sends its unique salt, and then hash it using that salt? Bit more involved but would ensure uniqueness?

(Alternatively, use the server's server_name as the salt?)

@ara4n

ara4n Jun 15, 2019

Member

the question is how bound 3pids are stored. if they are stored as hashes, then we have no choice but to use a predictable salt in the hash, which doesn’t achieve much beyond stopping arbitrary pregenerated hash tables of mail addresses from working. instead, you’d have to pregenerate H(email + “foo”) tables specifically for matrix, where foo is defined in this MSC. We can’t use server_name, as when you’re doing a lookup you don’t know any server_names for the target.

if we store the 3pids as plain text (or equivalent) in the IS, then the salt could be random per lookup, as the server can calculate its own hashes to compare against the uploaded ones. that would mean an attacker would have to calculate hash tables for every salt (i.e. per lookup item) to crack the hash. however, this would be pretty inefficient, plus we probably don’t want to be storing unhashed 3pids.
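A sketch of the random per-lookup salt variant, assuming the IS keeps 3PIDs in plain text and that a lookup is a single `(digest, salt)` pair; `client_request`, `server_lookup` and the addresses are hypothetical. It makes the inefficiency concrete: the server cannot index anything and must rehash every stored 3PID for each incoming lookup.

```python
import hashlib
import secrets
from typing import Optional


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def client_request(address: str) -> tuple:
    # A fresh random salt per lookup item, so no precomputed table can be reused.
    salt = secrets.token_hex(16)
    return sha256_hex(address + salt), salt


def server_lookup(plaintext_store: dict, digest: str, salt: str) -> Optional[str]:
    # The IS has to rehash every stored 3PID with this salt: O(n) work per lookup,
    # and it is only possible because the 3PIDs are kept in plain text.
    for address, mxid in plaintext_store.items():
        if sha256_hex(address + salt) == digest:
            return mxid
    return None


store = {"alice@example.com": "@alice:example.com", "bob@example.org": "@bob:example.org"}
digest, salt = client_request("bob@example.org")
print(server_lookup(store, digest, salt))  # @bob:example.org
```

Since the salt changes per request, a precomputed table is useless to an attacker; the linear scan on the server is what makes this impractical at scale.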

@turt2live

turt2live Jun 17, 2019

Member

Specifying a salt in the spec does at least drive away lazy attackers who want to quickly use their existing tables. It's a bit of a weak argument, but if we can force people to do even a little bit of Matrix-specific logic to exploit a vulnerability then we are in a slightly safer position.

We can also specify a SHOULD for servers to hash and salt the hashes of 3PIDs when persisting them, making rainbow tables much harder to use at the persistence level. It does look a bit dirty (sha256(sha256(3pid), salt)), but it might be enough to ward off attacks on the persisted values.
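One way the SHOULD could look, sketched in Python under the assumption that clients send `sha256(3pid)` as hex and the server concatenates a private salt before hashing again; `SaltedStore` and its methods are hypothetical. Lookups still work because the server can re-derive the persisted key from the client's hash, but a dump of the stored keys is useless without the salt.

```python
import hashlib
import secrets
from typing import Optional


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class SaltedStore:
    """Hypothetical IS storage that only persists sha256(sha256(3pid) + salt)."""

    def __init__(self) -> None:
        # Secret, server-side salt: never sent to clients, so tables built
        # for the client-visible hash don't work against the persisted keys.
        self._salt = secrets.token_hex(16)
        self._rows = {}

    def _persisted_key(self, client_hash: str) -> str:
        return sha256_hex(client_hash + self._salt)

    def bind(self, address: str, mxid: str) -> None:
        # Assumes the IS sees the plain address at bind time (it has just validated it).
        self._rows[self._persisted_key(sha256_hex(address))] = mxid

    def lookup(self, client_hash: str) -> Optional[str]:
        # Clients send sha256(3pid); the server re-salts it before querying storage.
        return self._rows.get(self._persisted_key(client_hash))


store = SaltedStore()
store.bind("alice@example.com", "@alice:example.com")
print(store.lookup(sha256_hex("alice@example.com")))  # @alice:example.com
```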

@Half-Shot

Contributor Author

commented Jun 16, 2019

After much thought (and @KitsuneRal's well-made point), I don't think this MSC is going to add any substantial security or privacy to the current endpoints. The more I look at this issue, the more I realize that lookups are inherently flawed, as they will always leak your contacts.

There is possibly some small gain from hashing strings with more variance, such as email addresses or social handles, but zero gain for a phone number. I would argue that for private deployments this feature is surplus to requirements, as you would be expected to trust your identity server.
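To illustrate the point about phone numbers, a sketch assuming unsalted SHA-256 (as in the `hashlib` example quoted in this thread) and an attacker who already knows the country code and the first few digits; the number and prefix are made up. The remaining space is small enough to enumerate in a plain loop.

```python
import hashlib
from typing import Optional


def lookup_hash(msisdn: str) -> str:
    # Plain, unsalted SHA-256 of the number.
    return hashlib.sha256(msisdn.encode("utf-8")).hexdigest()


def reverse_msisdn_hash(target: str, prefix: str, unknown_digits: int) -> Optional[str]:
    # Try every completion of the known prefix until one hashes to the target.
    for n in range(10 ** unknown_digits):
        candidate = f"{prefix}{n:0{unknown_digits}d}"
        if lookup_hash(candidate) == target:
            return candidate
    return None


leaked = lookup_hash("447700900123")
print(reverse_msisdn_hash(leaked, "4477009", 5))  # 447700900123
```

A salt or pepper the attacker knows doesn't change this; it only means the loop has to include it.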

So in short, I'm not sure this MSC is the right way to fix the problem. If anything, this looks more like a problem for app developers and the transparency of their UX.

proposals/2134-identity-hash-lookup.md
- `/_matrix/identity/api/v2/bulk_lookup`

The parameters will remain the same, but `address` should no longer be in a plain-text
format. `address` will now take a SHA-256 format hash value, and the resulting digest should

@anoadragon453

anoadragon453 Jun 17, 2019

Member

Why not a truncated hash for extra security + smaller network load?

@anoadragon453

anoadragon453 Jun 18, 2019

Member

I would do this, but I'm not sure what's safe to truncate by...
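A sketch of what truncation would involve, assuming the truncated value is just the first `nbytes` of the SHA-256 digest; `truncated_hash` and `expected_collisions` are hypothetical helpers, and the collision estimate is the standard birthday approximation, not a figure from the proposal. It may help frame what is "safe" to truncate by.

```python
import hashlib


def truncated_hash(address: str, nbytes: int) -> bytes:
    # Keep only the first `nbytes` bytes of the SHA-256 digest.
    return hashlib.sha256(address.encode("utf-8")).digest()[:nbytes]


def expected_collisions(n_addresses: int, nbytes: int) -> float:
    # Birthday approximation: expected number of colliding pairs
    # among n values drawn uniformly from a 2^(8 * nbytes) space.
    space = 2 ** (8 * nbytes)
    return n_addresses * (n_addresses - 1) / 2 / space


for nbytes in (4, 8, 16):
    print(nbytes, expected_collisions(10_000_000, nbytes))
```

Roughly: truncation only adds privacy once it is short enough that many addresses share each truncated value, at which point the IS has to return several candidates per query and the client has to disambiguate them. Keeping more bytes mostly saves bandwidth without making low-entropy addresses any harder to reverse.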

@anoadragon453

Member

commented Jun 17, 2019

I still think this proposal has much value for non-msisdn 3PIDs, and a little value for phone numbers. Imagine if you had a service where users could only use passwords that were 6 chars long. It would still be valuable not to store those passwords in plain text.

I do wonder though, could bcrypt not be used for this? Does it provide any extra security?
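bcrypt isn't in Python's standard library, so this sketch swaps in PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac`) as an illustrative slow hash; the pepper and iteration count are assumptions. Note the salt has to be fixed and shared: a random per-call salt, as bcrypt normally uses, would stop the IS from matching the value.

```python
import hashlib
import time

# Hypothetical fixed salt: it has to be the same for client and IS,
# otherwise the IS can't match the hash against anything.
PEPPER = b"hypothetical-pepper"


def fast_hash(address: str) -> bytes:
    return hashlib.sha256(address.encode("utf-8") + PEPPER).digest()


def slow_hash(address: str, iterations: int = 100_000) -> bytes:
    # PBKDF2-HMAC-SHA256 as a stdlib stand-in for bcrypt: cost scales with `iterations`.
    return hashlib.pbkdf2_hmac("sha256", address.encode("utf-8"), PEPPER, iterations)


contacts = [f"user{i}@example.com" for i in range(10)]
for fn in (fast_hash, slow_hash):
    start = time.perf_counter()
    for contact in contacts:
        fn(contact)
    print(f"{fn.__name__}: {time.perf_counter() - start:.4f}s for {len(contacts)} contacts")
```

The extra cost falls on every honest client hashing its whole address book as well as on an attacker, which is the trade-off the proposal text discusses when it asks for a fast hash on mobile.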

Half-Shot and others added some commits Jun 17, 2019

Update proposals/2134-identity-hash-lookup.md
Co-Authored-By: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
@anoadragon453

Member

commented Jun 18, 2019

I have taken over this MSC and added a salt/pepper and other details.

anoadragon453 added some commits Jun 18, 2019

@anoadragon453

Member

commented Jun 19, 2019

Adding a mechanism for identity servers to use their own salt by suggestion of @mathijs:matrix.vgorcum.com.
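A sketch of a per-server pepper, assuming the IS advertises it in a small JSON-like response (the `hash_details` dict and its field names are hypothetical here) and that, as the proposal text says, the medium and pepper are appended to the address before hashing. The space separator and the unpadded URL-safe base64 encoding are likewise assumptions made for the example.

```python
import base64
import hashlib

# Hypothetical response from the identity server advertising its own pepper.
hash_details = {"algorithms": ["sha256"], "lookup_pepper": "matrixrocks"}


def hash_3pid(address: str, medium: str, pepper: str) -> str:
    # Append medium and pepper so the plain-text strings don't share a common prefix.
    plaintext = f"{address} {medium} {pepper}"
    digest = hashlib.sha256(plaintext.encode("utf-8")).digest()
    # Unpadded URL-safe base64 keeps the value compact and safe in JSON and URLs.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


contacts = [("alice@example.com", "email"), ("447700900123", "msisdn")]
pepper = hash_details["lookup_pepper"]
lookup_hashes = [hash_3pid(address, medium, pepper) for address, medium in contacts]
print(lookup_hashes)
```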

@mvgorcum


commented Jun 19, 2019

modulo my small comments lgtm.

@anoadragon453

Member

commented Jul 3, 2019

As the community seems largely happy with this in its current state, and in the interest of bringing more Spec Core Team eyes on it:

@mscbot fcp merge

/me realises we've already called for FCP on this but it's been a very long time since.

changes addressed

@anoadragon453 anoadragon453 requested a review from matrix-org/spec-core-team Jul 3, 2019

@turt2live turt2live self-requested a review Jul 3, 2019

@richvdh
Member

left a comment

generally seems very sensible. I have some quibbling about error cases though.

proposals/2134-identity-hash-lookup.md
`v1` versions of these endpoints may be disabled at the discretion of the
implementation, and should return a HTTP 404 if so.

If an identity server is too old and a HTTP 404, 405 or 501 is received when

@richvdh

richvdh Jul 5, 2019

Member

https://matrix.org/docs/spec/server_server/unstable#put-matrix-federation-v2-invite-roomid-eventid says "400 or 404".

501 means "I don't understand POST at all", which is different to "I don't support POST at this endpoint".

405 means "I support this endpoint for other methods (eg GET), but you can't POST to it".

Neither seems relevant here.
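A sketch of the client-side fallback if only 400 and 404 count as "this server is too old for v2"; the injected `post` function, the request bodies and the `allow_plaintext` consent flag are hypothetical. The v2 path is the one in the quoted diff, and the v1 path is assumed to mirror it. Any other error, including 405 or 501, is treated as a plain failure rather than a reason to send 3PIDs in plain text.

```python
from typing import Callable, Optional, Tuple

# 400 or 404 is read as "no v2 endpoint here"; 405/501 mean something else.
V2_MISSING_STATUSES = {400, 404}

V2_PATH = "/_matrix/identity/api/v2/bulk_lookup"
V1_PATH = "/_matrix/identity/api/v1/bulk_lookup"  # assumed to mirror the v2 path

Post = Callable[[str, dict], Tuple[int, dict]]


def bulk_lookup(
    post: Post, hashed_body: dict, plaintext_body: dict, allow_plaintext: bool
) -> Optional[dict]:
    status, body = post(V2_PATH, hashed_body)
    if status == 200:
        return body
    if status in V2_MISSING_STATUSES and allow_plaintext:
        # Old server: only the plain-text v1 API exists.
        status, body = post(V1_PATH, plaintext_body)
        return body if status == 200 else None
    return None


def old_server(path: str, body: dict) -> Tuple[int, dict]:
    # Fake server that only knows the v1 endpoint.
    return (404, {}) if "/v2/" in path else (200, {"threepids": []})


print(bulk_lookup(old_server, {}, {}, allow_plaintext=True))  # {'threepids': []}
```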

@turt2live
Member

left a comment

regarding the code block comments: although it renders in a code block, 3/4 of the diff is italicized because the identifiers aren't wrapped in backticks. Please use backticks so the diff is legible.

proposals/2134-identity-hash-lookup.md
## Conclusion
This proposal outlines a simple method to stop bulk collection of user's
contact lists and their social graphs without any disastrous side effects. All

@turt2live

turt2live Jul 5, 2019

Member

This point of the conclusion needs to be much higher in the proposal. After reading the entire thing I was left with "but why are we doing this? The IS would still be storing 3PIDs in plain text, so what's the point in hashing only lookups?" - this sentence answers this question: because the IS may not know about a particular address and could be maliciously collecting it. This does subsequently raise the question of why people are sending their address books to untrusted sources though, and how that fits into the security model for Matrix.

So I raise the question again: what's the point of this MSC? What practical problems does it solve?

@mvgorcum

mvgorcum Jul 5, 2019

The current form of this MSC, as far as I can see, does two things:

  • It signals that the identity server is not meant to be used to collect the full content of all address books upon a bulk lookup. Unfortunately that's not possible without trusting that the identity server doesn't try to collect all data, even with this proposal. The only real value it has in this regard is in the signal it sends.
  • It raises the bar for obtaining the full list of mxids registered at an identity server, by requiring the attacker to build a custom rainbow table for that IS. This is still not adequate protection against such data extraction, but may help deter lazy attackers.
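To make the second point concrete, a sketch assuming the hash input is `address medium pepper` with a per-server pepper (the separator and pepper values are made up). It uses a plain precomputed lookup table, a simpler cousin of a rainbow table, but the effect is the same: a table built against one IS's pepper does not help against another, so the attacker has to redo the work per server.

```python
import hashlib


def peppered_hash(address: str, medium: str, pepper: str) -> str:
    return hashlib.sha256(f"{address} {medium} {pepper}".encode("utf-8")).hexdigest()


def build_table(candidates, medium: str, pepper: str) -> dict:
    # Reverse-lookup table valid for one (medium, pepper) pair only.
    return {peppered_hash(c, medium, pepper): c for c in candidates}


candidates = [f"447700900{n:03d}" for n in range(1000)]
leaked = peppered_hash("447700900123", "msisdn", "pepper-of-server-b")

table_a = build_table(candidates, "msisdn", "pepper-of-server-a")
print(leaked in table_a)  # False: a table built for server A is useless against B

table_b = build_table(candidates, "msisdn", "pepper-of-server-b")
print(table_b[leaked])  # 447700900123
```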
@turt2live

Member

commented Jul 5, 2019

@mscbot concern Overly restrictive hash algorithm negotiation

@anoadragon453 anoadragon453 requested review from turt2live and richvdh Jul 8, 2019

@anoadragon453

Member

commented Jul 8, 2019

@mscbot concern Overly restrictive hash algorithm negotiation

Hopefully this is now solved?

While this reduces the resources necessary to generate a rainbow table for
attackers, a fast hash is necessary if particularly slow mobile clients are
going to be hashing thousands of contact details. Other algorithms can be
negotiated by the client and server at their discretion.
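A sketch of the negotiation, assuming the IS advertises a list of supported algorithm names and the client picks the first one in its own preference order; the list format, `CLIENT_PREFERENCE` and `choose_algorithm` are hypothetical. If nothing matches, the client gets `None` and has to decide what to do, for example not performing the lookup at all.

```python
from typing import Optional, Sequence

# Hypothetical client preference order; "sha256" is the fast hash discussed above.
CLIENT_PREFERENCE = ("sha256",)


def choose_algorithm(
    server_algorithms: Sequence[str], preference: Sequence[str] = CLIENT_PREFERENCE
) -> Optional[str]:
    # Take the first algorithm, in the client's order, that the server also advertises.
    for algorithm in preference:
        if algorithm in server_algorithms:
            return algorithm
    return None


print(choose_algorithm(["none", "sha256"]))  # sha256
print(choose_algorithm(["none"]))  # None
```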

@turt2live

turt2live Jul 9, 2019

Member

I realize my comments are getting extremely nitpicky, but: it's not at their discretion. I'd just take off the last 3 words of the sentence and leave it at that.

@@ -212,6 +215,10 @@ Phone numbers (with their relatively short possible address space of 12
numbers), short email addresses, and addresses of both type that have been
leaked in database dumps are more susceptible to hash reversal.

Mediums and peppers are appended to the address as to prevent a common prefix
for each plain-text string, which prevents attackers from pre-computing bits

@turt2live

turt2live Jul 9, 2019

Member

This still poses a problem: now attackers can determine the approximate length of a 3PID, because patterns are really easy to find in strings (the pepper/type being in the middle/end helps figure out where the boundary is).

@uhoreg

uhoreg Jul 10, 2019

Member

I don't see how an attacker can determine the length of the 3PID given only the hash.
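A quick check supporting this, assuming the attacker only sees the digest: SHA-256 output is always 32 bytes, however long the `address medium pepper` input is, so the pepper's position doesn't show up in the hash itself. The separator and pepper are illustrative. This says nothing about what an attacker learns once they start brute-forcing candidates.

```python
import hashlib

addresses = ("a@b.co", "willh@matrix.org", "a.very.long.local.part@subdomain.example.com")
digest_lengths = set()
for address in addresses:
    plaintext = f"{address} email somepepper"
    digest = hashlib.sha256(plaintext.encode("utf-8")).digest()
    # Input length varies; digest length does not.
    digest_lengths.add(len(digest))
print(digest_lengths)  # {32}
```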

@turt2live

Member

commented Jul 9, 2019

@mscbot resolve Overly restrictive hash algorithm negotiation
@mscbot concern I have overarching concerns with this proposal

(I also need something to block FCP because I can't uncheck my damn box)

@turt2live

Member

commented Jul 10, 2019

@mscbot concern Transparent identity servers are no longer possible with this MSC.

(please can we fix the bot so I can stop doing this?)

@anoadragon453 anoadragon453 requested a review from turt2live Jul 12, 2019

@turt2live

Member

commented Jul 12, 2019

@mscbot resolve Transparent identity servers are no longer possible with this MSC.

I still have concerns with the overall complexity of this MSC being introduced to solve a problem we don't really have. The complexity outweighs the gain, given the limited scope of the problem (which is still super vague in this proposal - this also needs fixing).

@anoadragon453

Member

commented Jul 16, 2019

@turt2live Really, all this helps with is that a percentage of the non-Matrix contacts sent to the IS will be obscured, reducing the excess data that the IS, or a MITM attacker, can gain.

It's not a huge step, but the implementation required is fairly simple, and it's a small but noticeable step in the right direction. I think it's worthwhile.

I attended a conference this weekend, and one of the panels was arguing over the rules passed in the UK on children viewing unsightly things on the Internet. They were fishing from the audience for solutions to the problem of verifying someone's age online without causing too much invasion of privacy.

During the talk, it clicked that the solution here really shouldn't be a technical one, which kids would be incentivized to skirt around, but rather simply educating children about the dangers of the Internet and letting them find their own path.

I think this is somewhat relevant here, and indeed exactly what you were saying a week ago: we should be very clear about what will be sent to the server so that the user can decide whether or not to consent. Of course, this puts a damper on the fluidity of our mobile apps, as we'd rather not throw a wall of text at the user when they first open the app. But informing users about what it means and letting them make their own decision is the simplest and most elegant solution in my opinion.

I still think it's important to perform some hashing, though, to ward off the laziest attackers. Nothing can ever be made 100% secure; you just want to make it inconvenient enough that most people won't bother trying.
