Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std: Cache HashMap keys in TLS #33318

Merged
merged 1 commit into from May 20, 2016

Conversation

Projects
None yet
9 participants
@alexcrichton
Copy link
Member

alexcrichton commented May 1, 2016

This is a rebase and extension of #31356 where we not only cache the keys in
thread local storage but we also bump each key every time a new HashMap is
created. This should give us a nice speed bost in creating hash maps along with
retaining the property that all maps have a nondeterministic iteration order.

Closes #27243

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented May 1, 2016

r? @aturon

(rust_highfive has picked a reviewer for you, use r? to override)

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented May 10, 2016

ping?

@briansmith

This comment has been minimized.

Copy link

briansmith commented May 10, 2016

Why is it safe to key different HashMaps with keys that are known to each differ by only 1?

@eternaleye

This comment has been minimized.

Copy link
Contributor

eternaleye commented May 10, 2016

@briansmith: Because if it was not safe that would constitute a related-key attack, and none is known for SipHash?

@briansmith

This comment has been minimized.

Copy link

briansmith commented May 10, 2016

briansmith: Because if it was not safe that would constitute a related-key attack, and none is known for SipHash?

  1. The hash function isn't necessarily SipHash. It's pluggable per-hashmap and also the default may be changed.
  2. I don't find "we don't currently know of a related-key attack on SipHash" very encouraging. There are lots of ways to derive N secret keys from an initial secret key that are fast and don't depend on the related-key-security of the hashing function.
@eternaleye

This comment has been minimized.

Copy link
Contributor

eternaleye commented May 10, 2016

@briansmith: Fair enough on both points - Not being assured of SipHash means that it may well be an issue, and I agree that the margin of security even so is unnecessarily thin.

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented May 10, 2016

This logic is only used by the SipHash hasher.

@aturon

This comment has been minimized.

Copy link
Member

aturon commented May 10, 2016

r=me on the code and perf front. But I don't feel qualified to judge the security/DDoS protection side of this.

@briansmith

This comment has been minimized.

Copy link

briansmith commented May 10, 2016

But I don't feel qualified to judge the security/DDoS protection side of this.

Others have quoted this, but again, this is what the SipHash paper says:

On startup a program reads a secret SipHash key from the operating system’s cryptographic random-number generator; the program then uses SipHash for all of its hash tables.

Thus, the easiest thing to do is what the SipHash authors recommend: Use std::sync::Once to generate one key using the OsRng, and then use that same key for every hash table.

Think about the threat model: We assume that the attacker can add or remove arbitrary (key, value) entries from any hash table used in the program. From this, it follows that we assume the attacker can change any hash table A into any other hash table B by removing all the items from A and then copying all the entries from B into A. Thus, it seems to not help if A or B have different keys, at least under this threat model. If you have a different threat model, it would be a good idea to document it.

More generally, crypto people never generate a secret key by adding a constant value to another secret key. See https://en.wikipedia.org/wiki/Related-key_attack for an introduction to why. tl;dr: Knowing the difference of two secrets can help an attacker find the value of one (usually both) secrets, even if they wouldn't be able to find the values any other way. Because no crypto people would do this, it is unlikely that somebody will seriously study the problems that may or may not occur when somebody does what is proposed in this PR because we generally assume it is a-priori wrong to do.

HTH.

@aturon

This comment has been minimized.

Copy link
Member

aturon commented May 17, 2016

@briansmith Thanks for the comment! That does indeed help highlight some of the tradeoffs here.

AIUI, the motivation for using these distinct (but related) keys is just to avoid clients of the default hashmap from accidentally assuming that all instances share a common key -- a behavior we could conceivably want to change in the future. But it could easily be that this cure is worse than the disease, and we'd be better off just very clearly documenting that you cannot rely on the apparent determinism. We just risk de facto lock-in to that behavior, but that seems (to me) better than taking a step that could easily end up revealing hashmap keys.

@rust-lang/libs Thoughts here?

@alexcrichton alexcrichton added the T-libs label May 18, 2016

@alexcrichton

This comment has been minimized.

Copy link
Member Author

alexcrichton commented May 18, 2016

Yeah I'm not too worried about switching to a per-process key with the risk of relying on a per-process deterministic iteration order. It's just a "nice to have" to make everything nondeterministic really I think.

@huonw

This comment has been minimized.

Copy link
Member

huonw commented May 18, 2016

It seems to me that we could have this per-thread without the adjustment, but maybe having inter-thread differences isn't worth the slightly higher complexity vs. just being uniform through a process.

@aturon

This comment has been minimized.

Copy link
Member

aturon commented May 19, 2016

Per-thread sounds like a good compromise across the board. @alexcrichton, want to update accordingly?

std: Cache HashMap keys in TLS
This is a rebase and extension of #31356 where we cache the keys in thread local
storage. This should give us a nice speed bost in creating hash maps along with
mostly retaining the property that all maps have a nondeterministic iteration
order.

Closes #27243

@alexcrichton alexcrichton force-pushed the alexcrichton:hashmap-seed branch from d7503b2 to eaeef3d May 19, 2016

@alexcrichton

This comment has been minimized.

Copy link
Member Author

alexcrichton commented May 20, 2016

Sounds like a plan to me, I've updated the PR, the comment, and I also tweaked to use OsRng directly. The thread_rng isn't necesssary when we cache per-thread b/c we're gonna hit OsRng for the first time on each thread anyway.

@aturon

This comment has been minimized.

Copy link
Member

aturon commented May 20, 2016

Thanks!

@bors: r+

@bors

This comment has been minimized.

Copy link
Contributor

bors commented May 20, 2016

📌 Commit eaeef3d has been approved by aturon

@bors

This comment has been minimized.

Copy link
Contributor

bors commented May 20, 2016

⌛️ Testing commit eaeef3d with merge 179539f...

bors added a commit that referenced this pull request May 20, 2016

Auto merge of #33318 - alexcrichton:hashmap-seed, r=aturon
std: Cache HashMap keys in TLS

This is a rebase and extension of #31356 where we not only cache the keys in
thread local storage but we also bump each key every time a new `HashMap` is
created. This should give us a nice speed bost in creating hash maps along with
retaining the property that all maps have a nondeterministic iteration order.

Closes #27243

@bors bors merged commit eaeef3d into rust-lang:master May 20, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details

@bluss bluss added the relnotes label May 21, 2016

@alexcrichton alexcrichton deleted the alexcrichton:hashmap-seed branch May 25, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.