std: Cache HashMap keys in TLS #33318
rust-highfive assigned aturon May 1, 2016
r? @aturon (rust_highfive has picked a reviewer for you, use r? to override)
alexcrichton referenced this pull request May 1, 2016
Closed: Seed HashMaps thread-locally, straight from the OS. #31356
ping?
briansmith commented May 10, 2016
Why is it safe to key different HashMaps with keys that are known to differ from one another by only 1?
@briansmith: Because if it was not safe that would constitute a related-key attack, and none is known for SipHash?
briansmith commented May 10, 2016

@briansmith: Fair enough on both points. Not being assured of SipHash means that it may well be an issue, and I agree that the margin of security even so is unnecessarily thin.
This logic is only used by the SipHash hasher.
r=me on the code and perf front. But I don't feel qualified to judge the security/DDoS protection side of this.
briansmith commented May 10, 2016
Others have quoted this, but again, this is what the SipHash paper says:
Thus, the easiest thing to do is what the SipHash authors recommend: Use
Think about the threat model: We assume that the attacker can add or remove arbitrary (key, value) entries from any hash table used in the program. From this, it follows that we assume the attacker can change any hash table A into any other hash table B by removing all the items from A and then copying all the entries from B into A. Thus, it seems to not help if A or B have different keys, at least under this threat model. If you have a different threat model, it would be a good idea to document it.
More generally, crypto people never generate a secret key by adding a constant value to another secret key. See https://en.wikipedia.org/wiki/Related-key_attack for an introduction to why. tl;dr: Knowing the difference of two secrets can help an attacker find the value of one (usually both) secrets, even if they wouldn't be able to find the values any other way.
Because no crypto people would do this, it is unlikely that somebody will seriously study the problems that may or may not occur when somebody does what is proposed in this PR because we generally assume it is a priori wrong to do. HTH.
@briansmith Thanks for the comment! That does indeed help highlight some of the tradeoffs here.
AIUI, the motivation for using these distinct (but related) keys is just to prevent clients of the default hashmap from accidentally assuming that all instances share a common key, a behavior we could conceivably want to change in the future. But it could easily be that this cure is worse than the disease, and we'd be better off just very clearly documenting that you cannot rely on the apparent determinism. We just risk de facto lock-in to that behavior, but that seems (to me) better than taking a step that could easily end up revealing hashmap keys.
@rust-lang/libs Thoughts here?
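For readers wondering what "not relying on the apparent determinism" looks like in client code, here is a minimal sketch. The helper name `sorted_entries` is hypothetical and not part of this PR; it simply sorts the entries before using them, so the result does not depend on how (or how often) the map's hashing keys are chosen.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns the entries in key order, so callers never
/// observe the map's internal (key-dependent) iteration order.
fn sorted_entries(map: &HashMap<&'static str, u32>) -> Vec<(&'static str, u32)> {
    // Copy the entries out of the map, then sort them by key (and value).
    let mut entries: Vec<_> = map.iter().map(|(k, v)| (*k, *v)).collect();
    entries.sort();
    entries
}
```

Code that needs a stable order could also use an ordered container such as `BTreeMap` instead of sorting after the fact.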
alexcrichton added the T-libs label May 18, 2016
Yeah I'm not too worried about switching to a per-process key with the risk of relying on a per-process deterministic iteration order. It's just a "nice to have" to make everything nondeterministic really I think.
It seems to me that we could have this per-thread without the adjustment, but maybe having inter-thread differences isn't worth the slightly higher complexity vs. just being uniform through a process.
Per-thread sounds like a good compromise across the board. @alexcrichton, want to update accordingly?
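A rough sketch of the "per-thread without the adjustment" idea, for illustration only: the names `os_random_u64`, `THREAD_KEYS`, and `keys_for_new_map` are hypothetical, and the entropy source is a stand-in (it borrows std's randomly seeded `RandomState` so the example needs no external crates). This is not the code in the PR.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical stand-in for "ask the OS for 64 random bits".
fn os_random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

thread_local! {
    // Initialized lazily, once per thread, the first time it is accessed.
    static THREAD_KEYS: (u64, u64) = (os_random_u64(), os_random_u64());
}

/// Every map created on the same thread gets the same key pair;
/// no per-map adjustment is applied.
fn keys_for_new_map() -> (u64, u64) {
    THREAD_KEYS.with(|keys| *keys)
}
```

Under this scheme the expensive seeding happens once per thread, and no two keys are derived from each other by a known constant.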
alexcrichton force-pushed the alexcrichton:hashmap-seed branch from d7503b2 to eaeef3d May 19, 2016
Sounds like a plan to me, I've updated the PR, the comment, and I also tweaked to use
Thanks! @bors: r+
alexcrichton commented May 1, 2016
This is a rebase and extension of #31356 where we not only cache the keys in
thread local storage but we also bump each key every time a new HashMap is
created. This should give us a nice speed boost in creating hash maps along with
retaining the property that all maps have a nondeterministic iteration order.
Closes #27243
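To make the description above concrete, here is a minimal sketch of caching a key pair in thread-local storage and bumping it for each new map. It rests on several assumptions: `os_random_u64`, `KEYS`, and `next_keys_bumped` are hypothetical names, "bump" is taken to mean adding 1 to each half of the pair, and the entropy source is a stand-in built on std's `RandomState`. The actual diff may differ.

```rust
use std::cell::Cell;
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical stand-in for "ask the OS for 64 random bits".
fn os_random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

thread_local! {
    // Seeded lazily, once per thread; later calls only touch this cell.
    static KEYS: Cell<(u64, u64)> = Cell::new((os_random_u64(), os_random_u64()));
}

/// Returns the key pair for a new map, then bumps the cached pair by 1
/// (assumed meaning of "bump") so the next map sees different keys.
fn next_keys_bumped() -> (u64, u64) {
    KEYS.with(|keys| {
        let (k0, k1) = keys.get();
        keys.set((k0.wrapping_add(1), k1.wrapping_add(1)));
        (k0, k1)
    })
}
```

Note that consecutive key pairs produced this way differ by a known constant, which is exactly the property questioned in the discussion.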