Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upReplace HashMap implementation with SwissTable #56241
Conversation
rust-highfive
assigned
sfackler
Nov 26, 2018
This comment has been minimized.
This comment has been minimized.
|
r? @sfackler (rust_highfive has picked a reviewer for you, use r? to override) |
rust-highfive
added
the
S-waiting-on-review
label
Nov 26, 2018
This comment has been minimized.
This comment has been minimized.
|
@bors try cc @rust-lang/infra, this should get a perf run as soon as the try build finishes (it's needed for the second day of impl days at rustfest rome) |
bors
added a commit
that referenced
this pull request
Nov 26, 2018
This comment has been minimized.
This comment has been minimized.
This comment was marked as outdated.
This comment was marked as outdated.
|
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
This comment has been minimized.
This comment has been minimized.
|
@bors try delegate+ |
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
bors
added a commit
that referenced
this pull request
Nov 26, 2018
This comment was marked as outdated.
This comment was marked as outdated.
|
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
This comment was marked as outdated.
This comment was marked as outdated.
|
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
This comment has been minimized.
This comment has been minimized.
|
|
bors
added
S-waiting-on-author
and removed
S-waiting-on-review
labels
Nov 26, 2018
This comment was marked as outdated.
This comment was marked as outdated.
|
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
Amanieu
referenced this pull request
Nov 26, 2018
Open
compare compiler performance with `hashbrown` #55514
This comment has been minimized.
This comment has been minimized.
|
It seems that the new |
This comment has been minimized.
This comment has been minimized.
|
@Amanieu The workaround is to remove |
This comment has been minimized.
This comment has been minimized.
|
Oh, wait a second, this PR doesn't actually modify |
This comment has been minimized.
This comment has been minimized.
|
Did the previous |
This comment has been minimized.
This comment has been minimized.
|
@petrochenkov There are no ordering guarantees in I am currently trying to narrow down the exact hash map that is causing the issue. I can confirm that switching only Note that it is also possible that there is a bug in the new |
This comment has been minimized.
This comment has been minimized.
Fix: order of elements with a hash collision, not the general order.
Yes, it's better to bisect all uses of |
gnzlbg
reviewed
Nov 27, 2018
gnzlbg
reviewed
Nov 27, 2018
gnzlbg
reviewed
Nov 27, 2018
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
|
(my review is currently on hold as i deal with some medical issues, sorry) |
Amanieu
referenced this pull request
Dec 24, 2018
Closed
Drop impl for RawTable should be #[may_dangle] #39
mati865
referenced this pull request
Dec 29, 2018
Closed
internal lint: forbid HashMap and HashSet and suggest the FxHash* variants #2853
This comment has been minimized.
This comment has been minimized.
stokhos
commented
Jan 14, 2019
|
Ping from triage @sfackler have you had time to review this pr? |
This comment has been minimized.
This comment has been minimized.
|
@alexcrichton suggested that instead of copying hashbrown into libstd, we could have libstd import hashbrown directly as an |
This comment has been minimized.
This comment has been minimized.
|
Ping from triage @Amanieu / @alexcrichton: What are the plans for this PR? |
This comment has been minimized.
This comment has been minimized.
|
(still on medical leave) i do not think we should extern hashbrown without the review we would require to merge this code |
This comment has been minimized.
This comment has been minimized.
|
I would prefer to:
@Gankro get well soon and don't stress yourself about the review ! |
This comment has been minimized.
This comment has been minimized.
|
I'm going to leave the PR as it is until the review is finished. I will be integrating any feedback back into the hashbrown repo. |
This comment has been minimized.
This comment has been minimized.
panaman67
commented
Jan 24, 2019
|
Really excited for this commit. Great work! |
TimNN
added
S-blocked
and removed
S-waiting-on-author
labels
Jan 29, 2019
This comment has been minimized.
This comment has been minimized.
Thanks for the update!
Couldn't agree more, get better soon! I'm marking this as blocked for the moment, so it doesn't show up during the regular PR triage. |
alkis
reviewed
Feb 4, 2019
| } | ||
| fn make_hash<K: Hash + ?Sized>(hash_builder: &impl BuildHasher, val: &K) -> u64 { | ||
| let mut state = hash_builder.build_hasher(); | ||
| val.hash(&mut state); |
This comment has been minimized.
This comment has been minimized.
alkis
Feb 4, 2019
Contributor
One potential mitigation to the quadradic insertion bug is to always hash the pointer to the buckets as well as the key. This will completely eliminate the quadradic behavior on the merge_dos benchmark but it will slow down resizing the buckets array because it will effectively reshuffle all elements on every resize and this will have worse cache performance than the current approach.
This comment has been minimized.
This comment has been minimized.
Amanieu
Feb 5, 2019
Author
Contributor
I considered that but I don't feel comfortable mixing address bits into the hash key. I think that this could allow an attacker to defeat address randomization by inferring address bits from looking at the iteration order of a hash table.
| // Gotta resize now. | ||
| // Ideally we would put this in VacantEntry::insert, but Entry is not | ||
| // generic over the BuildHasher and adding a generic parameter would be | ||
| // a breaking change. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Amanieu
Feb 4, 2019
Author
Contributor
Because the reserve could invalidate the result of the lookup if it rehashes the table.
This comment has been minimized.
This comment has been minimized.
alkis
Feb 4, 2019
•
Contributor
It is ok if it does. In most cases there is no need to rehash. If a rehash happens, the cost of an additional probe round is insignificant. Right now this reserve(1) is breaking the API guarantees. reserve(N) followed by N entry(K).insert(V) will cause a rehash even though it is specified that it will not.
Edit: also the way this is coded you can defer reserve(1) until the entry is inserted, right?
This comment has been minimized.
This comment has been minimized.
Amanieu
Feb 5, 2019
Author
Contributor
Sorry, you were right, this reserve call should indeed be in the else block.
We can't defer the reserve call until the entry is inserted because Entry would need to hold a reference to the hasher, which is not possible since in libstd Entry is Entry<K, V> instead of Entry<K, V, S> and changing that would be a breaking change.
| key: K, | ||
| elem: VacantEntryState<K, V, &'a mut RawTable<K, V>>, | ||
| table: &'a mut RawTable<(K, V)>, |
This comment has been minimized.
This comment has been minimized.
alkis
Feb 4, 2019
Contributor
To get the VacantEntry we already searched the table and we know exactly in which Bucket we will insert. I think this struct needs to hold a Bucket otherwise insert() will need to probe again.
This comment has been minimized.
This comment has been minimized.
Amanieu
Feb 4, 2019
Author
Contributor
There are different constraints when searching for an element vs searching for an insertion slot. In particular, we ignore (skip over) deleted buckets in the former case while in the latter case we can insert on top of a deleted bucket.
This comment has been minimized.
This comment has been minimized.
alkis
Feb 4, 2019
Contributor
Apologies for the confusing comment. I think the Entry API allows two implementations here. On return:
VacantEntryalready points to the actual bucket andinsert()is trivial.VacantEntrycaches the key and hash andinsert()does the actual insert/grow dance.
In either case the optimal way to implement find_or_insert semantics needed by entry() -> insert() is:
- find probe, if found return it (
insert()won't be called) - insert probe, if bucket is deleted or if
growth_left > 0return it (this probe costs very little because we are touching at most the same cachelines we touched in the previous probe - a deleted bucket can only make us return earlier) - rehash and do another insert probe to find where to insert it (the cost of an additional probe here is insignificant - we already did N probes to rehash the container and copied N key, value tuples)
| // This may panic. | ||
| let hash = hasher(item.as_ref()); | ||
|
|
||
| // We can use a simpler version of insert() here since there are no |
This comment has been minimized.
This comment has been minimized.
alkis
Feb 4, 2019
Contributor
The reason find_insert_slot() can be used is because 1) we know there is enough space for all entries and 2) we know all items are unique.
Amanieu
referenced this pull request
Feb 21, 2019
Open
Replace HashMap implementation with SwissTable (as an external crate) #58623
This comment has been minimized.
This comment has been minimized.
|
I opened a new PR (#58623) which makes libstd depend on hashbrown from crates.io. |
bors
added a commit
that referenced
this pull request
Feb 25, 2019
steveklabnik
referenced this pull request
Mar 1, 2019
Open
Exposure of HashMap iteration order allows for O(n²) blowup. #36481
Gankro
requested changes
Mar 4, 2019
|
I am back from leave and am resuming the review. This batch represents my review of RawTable, which contains a few serious correctness concerns. |
| #[inline] | ||
| pub unsafe fn drop(&self) { | ||
| self.ptr.as_ptr().drop_in_place(); | ||
| } |
This comment has been minimized.
This comment has been minimized.
Gankro
Mar 4, 2019
Contributor
Unless you need some kind of interior mutability, you should make the mutating methods take &mut self just so &SomethingContainingABucket can't accidentally do interior mutability things. Also just makes it clear to the reader that there isn't anything weird going on here.
| } | ||
|
|
||
| // Branch prediction hint. This is currently only available on nightly but it | ||
| // consistently improves performance by 10-15%. |
This comment has been minimized.
This comment has been minimized.
Amanieu
added a commit
to Amanieu/hashbrown
that referenced
this pull request
Mar 5, 2019
This comment has been minimized.
This comment has been minimized.
|
I have addressed most of the feedback in this commit: Amanieu/hashbrown@5d1d4e9 Note that I'm not updating this PR any more, all future development towards integrating hashbrown into libstd is in these two PRs: |
This comment has been minimized.
This comment has been minimized.
|
Since this has gotten pretty out-of-sync with the actual code we should probably drop this review, and I'll just do an "offline" review of map.rs (since I expect it to be pretty straightforward, esp. since you moved most logic into the raw_table). Also I'm not sure how important handling size_of<(K, V)>() == 0 particularly gracefully can matter? There's no way to feed in state for hashing to be anything other than random, right? Like sure we shouldn't do UB, but otherwise I think we can spin our wheels or maybe even just panic? |
This comment has been minimized.
This comment has been minimized.
|
The handling of ZSTs should be fixed now (see my changes to I'm going to close this PR, any further reviews should be on one of the two PRs linked above. |
Amanieu commentedNov 26, 2018
•
edited
The implementation is from the hashbrown crate.
This is mostly complete, however it is missing 2 features:
try_reserve-- DONEcc @pietroalbini @Gankro