
Changed HashMap's internal layout. Cleanup. #21973

Closed
wants to merge 1 commit
@pczarn
Contributor

pczarn commented Feb 5, 2015

Changes HashMap's memory layout from [hhhh...KKKK...VVVV...] to [KVKVKVKV...hhhh...]. This makes in-place growth easy to implement (and efficient).

The removal of find_with_or_insert_with has made more cleanup possible.

20 benchmark runs, averaged:

                           before         after
bench::find_existing       40573.50 ns    41227.65 ns
bench::find_nonexisting    41815.45 ns    42362.60 ns
bench::get_remove_insert     197.85 ns      198.60 ns
bench::grow_by_insertion     171.05 ns      154.05 ns
bench::hashmap_as_queue      112.85 ns      112.65 ns
bench::new_drop               79.40 ns       79.20 ns
bench::new_insert_drop       179.40 ns      149.05 ns

thanks to @Gankro for the Entry interface, and to @thestinger for improving jemalloc's in-place realloc!
cc @cgaebel
r? @Gankro
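For illustration, here is a minimal sketch of the two layouts being compared, using Vecs in place of the table's single raw allocation (the names are hypothetical, not std's):

```rust
// Old layout: three parallel arrays -- [hhhh... KKKK... VVVV...]
#[allow(dead_code)]
struct SeparateArrays<K, V> {
    hashes: Vec<u64>,
    keys: Vec<K>,
    vals: Vec<V>,
}

// Proposed layout: interleaved key-value pairs, hashes at the end --
// [KVKVKVKV... hhhh...]
struct InterleavedPairs<K, V> {
    pairs: Vec<(K, V)>,
    hashes: Vec<u64>,
}

fn main() {
    let t = InterleavedPairs {
        pairs: vec![(1u64, "one")],
        hashes: vec![0x9e3779b9u64],
    };
    // After a hash matches, the key and its value sit adjacent in memory,
    // which is what makes in-place growth (and value access) cheap here.
    let (k, v) = &t.pairs[0];
    assert_eq!((*k, *v), (1, "one"));
}
```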

@pczarn pczarn force-pushed the pczarn:hash_map-mem-layout branch from dc6bc2d to 64e8b91 Feb 5, 2015

// middle of a cache line, this strategy pulls in one cache line of hashes on
// most lookups (64-byte cache line with 8-byte hash). I think this choice is
// pretty good, but α could go up to 0.95, or down to 0.84 to trade off some
// space.
//
// > Wait, what? Where did you get 1-α^k from?


@cgaebel

cgaebel Feb 5, 2015

Contributor

This should be updated, too.

@@ -126,23 +150,10 @@ fn test_resize_policy() {
// α^3, etc. Therefore, the odds of colliding k times is α^k. The odds of NOT
// colliding after k tries is 1-α^k.
//
// The paper from 1986 cited below mentions an implementation which keeps track


@cgaebel

cgaebel Feb 5, 2015

Contributor

Why was this removed?

@@ -220,11 +231,12 @@ fn test_resize_policy() {
///
/// Relevant papers/articles:
///
/// 1. Pedro Celis. ["Robin Hood Hashing"](https://cs.uwaterloo.ca/research/tr/1986/CS-86-14.pdf)


@cgaebel

cgaebel Feb 5, 2015

Contributor

Why remove this citation?


@pczarn

pczarn Feb 5, 2015

Author Contributor

To avoid confusion. It doesn't seem relevant to this particular implementation.


@cgaebel

cgaebel Feb 5, 2015

Contributor

I think it's relevant. It's the seminal paper on robin hood hashing, no?

@pczarn pczarn force-pushed the pczarn:hash_map-mem-layout branch from 64e8b91 to 407572c Feb 5, 2015

let size = table.size();
let mut probe = Bucket::new(table, hash);
let mut probe = if let Some(probe) = Bucket::new(table, hash) {


@cgaebel

cgaebel Feb 5, 2015

Contributor

s/if let/match

if is_match(full.read().0) {
return FoundExisting(full);
if is_match(bucket.read().1) {
return InternalEntry::Occupied(OccupiedEntryState {


@cgaebel

cgaebel Feb 5, 2015

Contributor

I don't know what the changes in this function have to do with in-place hashmap growth.


loop {
let (old_hash, old_key, old_val) = bucket.replace(hash, k, v);
let (old_hash, old_key, old_val) = {


@cgaebel

cgaebel Feb 5, 2015

Contributor

What's wrong with what was there before?


@pczarn

pczarn Feb 5, 2015

Author Contributor

I tried to get rid of non-essential methods in the table module. Besides, replace was used only from here.

@cgaebel

Contributor

cgaebel commented Feb 5, 2015

Can this PR be split up into "In-place growth for HashMap" and "Cleanup"?

@Gankro

Contributor

Gankro commented Feb 5, 2015

Just leaving a note: I am concerned about how this change will negatively affect memory consumption for certain choices of K and V.

That said, I think speed is more important than memory consumption, to some limit.

@cgaebel

Contributor

cgaebel commented Feb 5, 2015

It will also affect the performance of the "keys" and "values" iterators.

@pczarn pczarn force-pushed the pczarn:hash_map-mem-layout branch from 49bfe41 to f495fb8 Feb 7, 2015

@pnkfelix

Member

pnkfelix commented Feb 12, 2015

Don't the posted benchmark results indicate that the insertion benchmarks (grow_by_insertion and new_insert_drop) have become faster at the expense of making the lookup benchmarks (find_existing, find_nonexisting) slower?

(This outcome makes some sense to me, at least for a heavily loaded table, since the unused interleaved values for non-matching keys are going to occupy portions of the cache line when we are doing a probe sequence over a series of keys.)

  • (( Well, maybe this explanation is a little too simplistic; the sequence of hhhh ... at the end, I guess, indicates that in the common case we should only need to look at a series of hash codes before we start inspecting the keys themselves, so it's not quite as dire as the above explanation made it out to be. ))

I don't know how to evaluate whether the gain is worth the cost here. I just want to make sure that everyone is on board for this shift (or find out if my interpretation of the results is wrong).

@Gankro

Contributor

Gankro commented Feb 12, 2015

Just a note that I believe @pczarn is currently reworking this PR.

@pczarn pczarn force-pushed the pczarn:hash_map-mem-layout branch from f495fb8 to 6218462 Feb 16, 2015

@pczarn

Contributor Author

pczarn commented Feb 16, 2015

I did the reworking. Some small details still need attention.

It will also affect the performance of the "keys" and "values" iterators.

Sounds bad. Can you find a real-world example of this? To keep the performance the same, I could make hashmap use two allocations, [VVVV…hhhh…] and [KKKK…].

Here's a relevant post on data layout: http://www.reedbeta.com/blog/2015/01/12/data-oriented-hash-table/
Unfortunately, it's of little use for us, because it's not about Robin Hood (and it's for C++, not Rust).

This is how benchmark results have changed:

[40.573, 41.706, 209.1, 179.4, 122.8, 78.4, 182.7]
  Cleanup.
[41.989, 43.615, 212.4, 167.8, 125.0, 78.7, 163.3]
  Changed HashMap's internal layout.
[41.145, 42.298, 206.8, 162.5, 121.7, 78.9, 159.3]
  In-place growth for HashMap.
[41.055, 42.427, 208.3, 156.0, 124.2, 78.4, 161.6]

@pnkfelix: Lookup has become slower after the first commit because of refactoring. Keep in mind that the benchmark does 1000 lookups per iteration and the difference between 41.7ns and 42.4ns per lookup is small.

The improvement from in-place growth is surprisingly low. I'll have to check why.

@cgaebel

Contributor

cgaebel commented Feb 16, 2015

@pczarn The performance of the key/value iterators suffers because with the current design they walk over a compact array, while with your proposed design they pull in twice as much cache while doing so. If doing a small per-key or per-value operation, this essentially halves memory bandwidth.

@Gankro

Contributor

Gankro commented Feb 16, 2015

Well, strictly speaking, it's already walking over the hashes checking for hash != 0 at the same time.

@Gankro

Contributor

Gankro commented Feb 21, 2015

Oh shoot, I let this slip through the cracks. Needs a rebase (hopefully the last one, since we should be done with crazy API churn).

@pczarn pczarn force-pushed the pczarn:hash_map-mem-layout branch from 6218462 to 316b300 Feb 21, 2015

@pczarn

Contributor Author

pczarn commented Feb 21, 2015

Updated. I'm going to test and make corrections when new snapshots land.

So iteration over small keys/values is already 2x-4x more cache-intensive than in an array. With larger values, like in HashMap<usize, (String, String)>, it gets much worse.

To avoid the issue, keys/values can be stored in an array such as [([K; 16], [V; 16]); n].
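A rough sketch of that chunked idea, with hypothetical names, showing how grouping keys and values in blocks of 16 keeps key-only iteration mostly contiguous and avoids the per-pair padding an interleaved layout would pay for a type like (u64, u8):

```rust
use std::mem::size_of;

// Hypothetical chunked layout: 16 keys, then their 16 values, per block.
// Iterating only keys (or only values) touches long contiguous runs.
struct Chunk<K, V> {
    keys: [K; 16],
    vals: [V; 16],
}

fn main() {
    // Interleaved pairs: each (u64, u8) pair is padded from 9 to 16 bytes,
    // so 16 entries take 16 * 16 = 256 bytes.
    assert_eq!(16 * size_of::<(u64, u8)>(), 256);
    // Chunked: 16 u64s + 16 u8s = 128 + 16 = 144 bytes, already a multiple
    // of the 8-byte alignment, so no padding is added.
    assert_eq!(size_of::<Chunk<u64, u8>>(), 144);
}
```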

@pczarn

Contributor Author

pczarn commented Feb 27, 2015

Done, tested.

@Gankro

Contributor

Gankro commented Feb 27, 2015

Great! Will review tonight.

@Gankro

Contributor

Gankro commented Feb 27, 2015

Ack, a few of my issues are addressed by later commits. Doing this commit-by-commit isn't the right strategy here. Shifting gears.

// inform rustc that in fact instances of K and V are reachable from here.
marker: marker::PhantomData<(K,V)>,
// NB. The table will probably need manual impls of Send and Sync if this
// field ever changes.


@Gankro

Gankro Feb 27, 2015

Contributor

I don't think that needs saying? That's true for pretty much all Uniques.

@@ -69,50 +67,42 @@ const EMPTY_BUCKET: u64 = 0u64;
pub struct RawTable<K, V> {
capacity: usize,
size: usize,
hashes: Unique<u64>,


@Gankro

Gankro Feb 27, 2015

Contributor

interest piqued

@bors

Contributor

bors commented Apr 22, 2015

☔️ The latest upstream changes (presumably #24674) made this pull request unmergeable. Please resolve the merge conflicts.

Cleanup. Changed HashMap's internal layout.
* use of NonZero hashes
* refactoring
* correct explanation of the load factor
* better nomenclature
* 'probe distance' -> 'displacement'

@pczarn pczarn force-pushed the pczarn:hash_map-mem-layout branch from 3f67ec1 to 776d23e Apr 29, 2015

@Manishearth

Member

Manishearth commented May 10, 2015

@Gankro (this was rebased and needs review)

@Gankro

Contributor

Gankro commented May 10, 2015

@Manishearth Yes, I tried to get someone else to review over two months ago. :/

Today's the last day of my pseudo-vacation, so I'm free to tackle this again this week.

@Gankro

Contributor

Gankro commented May 12, 2015

Alright, I've started reading the full source at the last commit's hash, because the changes have been so comprehensive that this borders on a rewrite.

Was Deref vs Borrow ever fully addressed? Borrow is generally just a trait for Deref-like+Hash/Eq/Ord equivalence.

pub fn into_bucket(self) -> Bucket<K, V, M> {
/// Duplicates the current position. This can be useful for operations
/// on two or more buckets.
pub fn stash(self) -> Bucket<K, V, Bucket<K, V, M, S>, S> {


@Gankro

Gankro May 12, 2015

Contributor

What about -> Bucket<K, V, Self, S>?

}
/// Pointer to one-past-the-last key-value pair.
pub fn as_mut_ptr(&self) -> *mut (K, V) {
unsafe { self.middle.get() as *const _ as *mut _ }


@Gankro

Gankro May 15, 2015

Contributor

Is the as *const _ necessary?

fn checked_size_generic<K, V>(capacity: usize) -> usize {
let size = size_generic::<K, V>(capacity);
let elem_size = size_of::<(K, V)>() + size_of::<SafeHash>();
assert!(size >= capacity.checked_mul(elem_size).expect("capacity overflow"),


@Gankro

Gankro May 15, 2015

Contributor

I'm confused why this assert instead of just constructing size using checked ops?


#[inline]
fn align<K, V>() -> usize {
cmp::max(mem::min_align_of::<(K, V)>(), mem::min_align_of::<u64>())


@Gankro

Gankro May 15, 2015

Contributor

u64 -> SafeHash?

}

/// A newtyped RawBucket. Not copyable.
pub struct RawFullBucket<K, V, M>(RawBucket<K, V>, PhantomData<M>);


@Gankro

Gankro May 15, 2015

Contributor

So many new types @_@

let mut m = HashMap::new();

for i in range_inclusive(1, 1000) {
m.insert(i, (String::new(), String::new()));


@Gankro

Gankro May 18, 2015

Contributor

PR #999999: Optimize HashMap by reducing size of String

impl<K, V, M> InternalEntry<K, V, M> {
fn into_option(self) -> Option<FullBucket<K, V, M>> {
match self {
InternalEntry::Occupied(bucket) => Some(bucket.elem),


@Gankro

Gankro May 18, 2015

Contributor

Oh lord, a bucket's elem is a bucket?

}

// If the hash doesn't match, it can't be this one..
if hash == full.hash() {
if hash == *bucket.read().0 {


@Gankro

Gankro May 18, 2015

Contributor

full.hash() seemed like a lot clearer of an API...

TableRef(_) => None
}
// Performs insertion with relaxed requirements.
// The caller should ensure that invariants of Robin Hood linear probing hold.


@Gankro

Gankro May 18, 2015

Contributor

Is this the only requirement that's relaxed? Unclear.

let mut buckets = Bucket::new(table, *hash as usize).unwrap();
let ib = buckets.index();

while buckets.index() != ib + cap {


@Gankro

Gankro May 18, 2015

Contributor

This is equivalent to buckets.displacement() != cap, right?


@Gankro

Gankro May 19, 2015

Contributor

Scratch that, obviously not available on a "pure" bucket.

@@ -596,6 +590,8 @@ impl<K, V, S> HashMap<K, V, S>
}

/// Returns the number of elements the map can hold without reallocating.
/// This value may be lower than the real number of elements the map will
/// hold before reallocating.


@Gankro

Gankro May 18, 2015

Contributor

👍

@Gankro

Contributor

Gankro commented May 19, 2015

So, to the best of my knowledge this code seems to be correct, but I have strong reservations about where we're going with respect to the proliferation of type-complexity. I used to be able to grok this code pretty ok: We have HashMap, RawTable, and some Buckets. Now everything's super generic over some "M" type and there's Partial* types all over. It's not clear that these abstractions are pulling their weight: are they preventing real bugs or enabling simpler or more maintainable code?

CC @nikomatsakis @huonw @cgaebel

@cgaebel

Contributor

cgaebel commented May 19, 2015

I share that sentiment.

@alexcrichton alexcrichton added the T-libs label May 26, 2015

@Gankro

Contributor

Gankro commented May 28, 2015

It's been a couple weeks with no response on any of my comments, and I'm not a huge fan of the general design changes. As such I'm closing this for now. We can continue discussion on the PR and maybe re-open if we get somewhere.

@Gankro Gankro closed this May 28, 2015

jonathandturner added a commit to jonathandturner/rust that referenced this pull request Oct 11, 2016

Rollup merge of rust-lang#36692 - arthurprs:hashmap-layout, r=alexcrichton

Cache conscious hashmap table

Right now the internal HashMap representation is 3 unzipped arrays hhhkkkvvv; I propose to change it to hhhkvkvkv (in further iterations kvkvkvhhh may allow in-place growth). A previous attempt is at rust-lang#21973.

This layout is generally more cache conscious, as it makes the value immediately accessible after a key matches. The separate hash array is a _no-brainer_ because of how the RH algorithm works, and that's unchanged.

**Lookups**: Upon a successful match in the hash array the code can check the key and immediately have access to the value in the same or next cache line (effectively saving a L[1,2,3] miss compared to the current layout).
**Inserts/Deletes/Resize**: Moving values in the table (robin hooding it) is faster because it touches consecutive cache lines and uses fewer instructions.

Some backing benchmarks (besides the ones below) for the benefits of this layout can be seen here as well: http://www.reedbeta.com/blog/2015/01/12/data-oriented-hash-table/

The obvious drawback is that padding can be wasted between the key and value. Because of that, keys(), values() and contains() can consume more cache and be slower.

Total wasted padding between items (C being the capacity of the table).
* Old layout: C * (K-K padding) + C * (V-V padding)
* Proposed: C * (K-V padding) + C * (V-K padding)

In practice, padding between K-K and V-V *can* be smaller than K-V and V-K. The overhead is capped(ish) at sizeof u64 - 1, so we can actually measure the worst case (a u8 at the end of the key type and a value with alignment of 1, _hardly the average case in practice_).

Starting from the worst case the memory overhead is:
* `HashMap<u64, u8>` 46% memory overhead. (aka *worst case*)
* `HashMap<u64, u16>` 33% memory overhead.
* `HashMap<u64, u32>` 20% memory overhead.
* `HashMap<T, T>` 0% memory overhead
* Worst case based on sizeof K + sizeof V:

| x              |  16    |  24    |  32    |  64   |  128  |
|----------------|--------|--------|--------|-------|-------|
| (8+x+7)/(8+x)  |  1.29  |  1.22  |  1.18  |  1.1  |  1.05 |
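The worst-case formula in the table above can be checked numerically; a quick sketch (not part of the PR), where x = sizeof K + sizeof V, the hash costs 8 bytes per entry, and up to 7 bytes of padding can appear inside each interleaved pair:

```rust
// Worst-case memory overhead of the interleaved layout relative to the
// separated layout: (hash + data + max padding) / (hash + data).
fn overhead(x: usize) -> f64 {
    (8 + x + 7) as f64 / (8 + x) as f64
}

fn main() {
    // Matches the table rows to two decimal places.
    assert!((overhead(16) - 1.29).abs() < 0.006);
    assert!((overhead(32) - 1.18).abs() < 0.006);
    assert!((overhead(64) - 1.10).abs() < 0.006);
    assert!((overhead(128) - 1.05).abs() < 0.006);
}
```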

I have a test repo here to run benchmarks: https://github.com/arthurprs/hashmap2/tree/layout

```
 ➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt
 name                            hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff %
 grow_10_000                     922,064           783,933               -138,131  -14.98%
 grow_big_value_10_000           1,901,909         1,171,862             -730,047  -38.38%
 grow_fnv_10_000                 443,544           418,674                -24,870   -5.61%
 insert_100                      2,469             2,342                     -127   -5.14%
 insert_1000                     23,331            21,536                  -1,795   -7.69%
 insert_100_000                  4,748,048         3,764,305             -983,743  -20.72%
 insert_10_000                   321,744           290,126                -31,618   -9.83%
 insert_int_bigvalue_10_000      749,764           407,547               -342,217  -45.64%
 insert_str_10_000               337,425           334,009                 -3,416   -1.01%
 insert_string_10_000            788,667           788,262                   -405   -0.05%
 iter_keys_100_000               394,484           374,161                -20,323   -5.15%
 iter_keys_big_value_100_000     402,071           620,810                218,739   54.40%
 iter_values_100_000             424,794           373,004                -51,790  -12.19%
 iterate_100_000                 424,297           389,950                -34,347   -8.10%
 lookup_100_000                  189,997           186,554                 -3,443   -1.81%
 lookup_100_000_bigvalue         192,509           189,695                 -2,814   -1.46%
 lookup_10_000                   154,251           145,731                 -8,520   -5.52%
 lookup_10_000_bigvalue          162,315           146,527                -15,788   -9.73%
 lookup_10_000_exist             132,769           128,922                 -3,847   -2.90%
 lookup_10_000_noexist           146,880           144,504                 -2,376   -1.62%
 lookup_1_000_000                137,167           132,260                 -4,907   -3.58%
 lookup_1_000_000_bigvalue       141,130           134,371                 -6,759   -4.79%
 lookup_1_000_000_bigvalue_unif  567,235           481,272                -85,963  -15.15%
 lookup_1_000_000_unif           589,391           453,576               -135,815  -23.04%
 merge_shuffle                   1,253,357         1,207,387              -45,970   -3.67%
 merge_simple                    40,264,690        37,996,903          -2,267,787   -5.63%
 new                             6                 5                           -1  -16.67%
 with_capacity_10e5              3,214             3,256                       42    1.31%
```

```
➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt
 name                           hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff %
 iter_keys_100_000              391,677           382,839                 -8,838   -2.26%
 iter_keys_1_000_000            10,797,360        10,209,898            -587,462   -5.44%
 iter_keys_big_value_100_000    414,736           662,255                247,519   59.68%
 iter_keys_big_value_1_000_000  10,147,837        12,067,938           1,920,101   18.92%
 iter_values_100_000            440,445           377,080                -63,365  -14.39%
 iter_values_1_000_000          10,931,844        9,979,173             -952,671   -8.71%
 iterate_100_000                428,644           388,509                -40,135   -9.36%
 iterate_1_000_000              11,065,419        10,042,427          -1,022,992   -9.24%
```

bors added a commit that referenced this pull request Oct 12, 2016

Auto merge of #36692 - arthurprs:hashmap-layout, r=alexcrichton
Cache conscious hashmap table


bors added a commit that referenced this pull request Oct 13, 2016

Auto merge of #36692 - arthurprs:hashmap-layout, r=alexcrichton
Cache conscious hashmap table

Right now the internal HashMap representation is 3 unziped arrays hhhkkkvvv, I propose to change it to hhhkvkvkv (in further iterations kvkvkvhhh may allow inplace grow). A previous attempt is at #21973.

This layout is generally more cache conscious as it makes the value immediately accessible after a key matches. The separated hash arrays is a _no-brainer_ because of how the RH algorithm works and that's unchanged.

**Lookups**: Upon a successful match in the hash array the code can check the key and immediately have access to the value in the same or next cache line (effectively saving a L[1,2,3] miss compared to the current layout).
**Inserts/Deletes/Resize**: Moving values in the table (robin hooding it) is faster because it touches consecutive cache lines and uses less instructions.

Some backing benchmarks (besides the ones bellow) for the benefits of this layout can be seen here as well http://www.reedbeta.com/blog/2015/01/12/data-oriented-hash-table/

The obvious drawbacks is: padding can be wasted between the key and value. Because of that keys(), values() and contains() can consume more cache and be slower.

Total wasted padding between items (C being the capacity of the table).
* Old layout: C * (K-K padding) + C * (V-V padding)
* Proposed: C * (K-V padding) + C * (V-K padding)

In practice padding between K-K and V-V *can* be smaller than K-V and V-K. The overhead is capped(ish) at sizeof u64 - 1 so we can actually measure the worst case (u8 at the end of key type and value with aliment of 1, _hardly the average case in practice_).

Starting from the worst case the memory overhead is:
* `HashMap<u64, u8>` 46% memory overhead. (aka *worst case*)
* `HashMap<u64, u16>` 33% memory overhead.
* `HashMap<u64, u32>` 20% memory overhead.
* `HashMap<T, T>` 0% memory overhead
* Worst case based on sizeof K + sizeof V:

| x              |  16    |  24    |  32    |  64   |  128  |
|----------------|--------|--------|--------|-------|-------|
| (8+x+7)/(8+x)  |  1.29  |  1.22  |  1.18  |  1.1  |  1.05 |

I've a test repo here to run benchmarks  https://github.com/arthurprs/hashmap2/tree/layout

```
 ➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt
 name                            hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff %
 grow_10_000                     922,064           783,933               -138,131  -14.98%
 grow_big_value_10_000           1,901,909         1,171,862             -730,047  -38.38%
 grow_fnv_10_000                 443,544           418,674                -24,870   -5.61%
 insert_100                      2,469             2,342                     -127   -5.14%
 insert_1000                     23,331            21,536                  -1,795   -7.69%
 insert_100_000                  4,748,048         3,764,305             -983,743  -20.72%
 insert_10_000                   321,744           290,126                -31,618   -9.83%
 insert_int_bigvalue_10_000      749,764           407,547               -342,217  -45.64%
 insert_str_10_000               337,425           334,009                 -3,416   -1.01%
 insert_string_10_000            788,667           788,262                   -405   -0.05%
 iter_keys_100_000               394,484           374,161                -20,323   -5.15%
 iter_keys_big_value_100_000     402,071           620,810                218,739   54.40%
 iter_values_100_000             424,794           373,004                -51,790  -12.19%
 iterate_100_000                 424,297           389,950                -34,347   -8.10%
 lookup_100_000                  189,997           186,554                 -3,443   -1.81%
 lookup_100_000_bigvalue         192,509           189,695                 -2,814   -1.46%
 lookup_10_000                   154,251           145,731                 -8,520   -5.52%
 lookup_10_000_bigvalue          162,315           146,527                -15,788   -9.73%
 lookup_10_000_exist             132,769           128,922                 -3,847   -2.90%
 lookup_10_000_noexist           146,880           144,504                 -2,376   -1.62%
 lookup_1_000_000                137,167           132,260                 -4,907   -3.58%
 lookup_1_000_000_bigvalue       141,130           134,371                 -6,759   -4.79%
 lookup_1_000_000_bigvalue_unif  567,235           481,272                -85,963  -15.15%
 lookup_1_000_000_unif           589,391           453,576               -135,815  -23.04%
 merge_shuffle                   1,253,357         1,207,387              -45,970   -3.67%
 merge_simple                    40,264,690        37,996,903          -2,267,787   -5.63%
 new                             6                 5                           -1  -16.67%
 with_capacity_10e5              3,214             3,256                       42    1.31%
```

```
➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt
 name                           hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff %
 iter_keys_100_000              391,677           382,839                 -8,838   -2.26%
 iter_keys_1_000_000            10,797,360        10,209,898            -587,462   -5.44%
 iter_keys_big_value_100_000    414,736           662,255                247,519   59.68%
 iter_keys_big_value_1_000_000  10,147,837        12,067,938           1,920,101   18.92%
 iter_values_100_000            440,445           377,080                -63,365  -14.39%
 iter_values_1_000_000          10,931,844        9,979,173             -952,671   -8.71%
 iterate_100_000                428,644           388,509                -40,135   -9.36%
 iterate_1_000_000              11,065,419        10,042,427          -1,022,992   -9.24%
```
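A likely source of the `iter_keys_big_value` regressions above is padding: interleaving keys and values pads each `(K, V)` pair out to the key's alignment, while separate key and value arrays pack each type tightly. A minimal sketch of the per-entry cost, using a hypothetical worst-case pair (8-byte key, 1-byte value) on a typical 64-bit target:

```rust
use std::mem::size_of;

// Hypothetical key/value types chosen to show the worst case for
// interleaved storage: the value adds 1 byte but forces 7 bytes of padding.
type K = u64;
type V = u8;

fn main() {
    // Interleaved (hhkvkv) layout stores one (K, V) pair per slot; the
    // pair's size is rounded up to K's 8-byte alignment.
    let interleaved = size_of::<(K, V)>(); // 16 bytes per entry

    // Separate (hhkkvv) arrays keep keys and values contiguous, so no
    // padding appears between consecutive elements of either array.
    let separate = size_of::<K>() + size_of::<V>(); // 9 bytes per entry

    println!("per-entry bytes: interleaved={interleaved} separate={separate}");
}
```

Iterating only the keys therefore pulls roughly 16 bytes per entry through the cache in the interleaved layout versus 8 in the separated one, which matches the tables: key/value iteration over small types improves, while `iter_keys_big_value` pays for the values sitting between the keys.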
