Amortize the cost of small tables #47

Closed
Amanieu opened this issue Feb 25, 2019 · 7 comments · Fixed by #162

Amanieu (Member) commented Feb 25, 2019

When growing a table, hashbrown starts with a table of size 2 (which can hold only 1 element due to the load factor) and then doubles the table size every time the load factor is reached.

| Table size | Capacity | Memory usage (sizeof(T) = 4) | Memory usage (sizeof(T) = 8) | Memory usage (sizeof(T) = 16) |
|------------|----------|------------------------------|------------------------------|-------------------------------|
| (empty)    | 0        | 0                            | 0                            | 0                             |
| 2          | 1        | 26 (32)                      | 34 (40)                      | 50 (56)                       |
| 4          | 3        | 36 (40)                      | 52 (56)                      | 84 (88)                       |
| 8          | 7        | 56                           | 88                           | 152                           |
| 16         | 14       | 96                           | 160                          | 288                           |
| 32         | 28       | 176                          | 304                          | 560                           |

(values in parentheses are rounded up to a multiple of 8 for allocator alignment)
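
A minimal sketch of the capacity rule implied by these numbers (inferred from the table, not hashbrown's verbatim source): tables of 8 or fewer buckets can use all but one slot, and larger tables keep a 7/8 load factor.

```rust
// Capacity rule inferred from the table above; illustrative only.
fn capacity_for(table_size: usize) -> usize {
    if table_size == 0 {
        0
    } else if table_size <= 8 {
        table_size - 1 // small tables: all but one slot usable
    } else {
        table_size / 8 * 7 // larger tables: 7/8 load factor
    }
}

fn main() {
    for size in [0usize, 2, 4, 8, 16, 32] {
        // Prints 0, 1, 3, 7, 14, 28 -- matching the Capacity column.
        println!("table size {:2} -> capacity {}", size, capacity_for(size));
    }
}
```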

In comparison, the current std HashMap starts off empty but then immediately grows to a table size of 32 (capacity 29) as soon as the first element is added.

We want to balance the memory usage of small tables against the allocation overhead of growing a table up to its final size.

flip111 commented Feb 25, 2019

An exponential growth strategy (doubling) works well to avoid frequent reallocation. However, when memory is limited, doubling might not be possible. Would it be possible to provide (and change) a maximum growth size?

Amanieu (Member, Author) commented Feb 25, 2019

Unlike a vector, the size of the hash table must be a power of two, since we use a bit mask on the hash to map it to a table index. There are other table designs that use non-power-of-two sizes, but they are generally much slower.
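
A minimal sketch of that mapping (illustrative, not hashbrown's exact code): for a power-of-two size, `size - 1` has all low bits set, so the reduction is a single AND instead of a division.

```rust
// Power-of-two table sizes let a hash be mapped to a bucket index
// with a bit mask instead of an integer modulo.
fn bucket_index(hash: u64, table_size: usize) -> usize {
    debug_assert!(table_size.is_power_of_two());
    // Equivalent to `hash % table_size`, but a single AND instruction.
    (hash as usize) & (table_size - 1)
}

fn main() {
    let hash = 0x9e37_79b9_7f4a_7c15_u64;
    assert_eq!(bucket_index(hash, 32), (hash % 32) as usize);
}
```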

ghost commented Feb 25, 2019

One thing we can infer here is that doubling the table size of a small hash table does not really double its size in bytes. This is very much unlike Vec, where doubling the buffer always makes it exactly twice as large in bytes.

I think it would be more appropriate to skip some powers of two when growing small hash tables. Growing the table size as 0 -> 4 -> 16 -> 32 -> 64 -> 128 -> ... seems like a fine strategy to me, for example.
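
A hypothetical sketch of that policy (the sequence is the one proposed above; the function itself is illustrative, not hashbrown's code):

```rust
// Growth policy for the proposed sequence 0 -> 4 -> 16 -> 32 -> 64 -> ...
fn next_table_size(current: usize) -> usize {
    match current {
        0 => 4,     // skip size 2: it holds only 1 element anyway
        4 => 16,    // skip size 8: the byte size barely doubles at this scale
        n => n * 2, // ordinary doubling from 16 onward
    }
}
```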

Amanieu (Member, Author) commented Feb 25, 2019

The reason the size doesn't exactly double is that each allocation carries a fixed cost of 16 bytes: there are 16 + N control bytes and N element slots, so the total size is 16 + N + sizeof(T) * N bytes.
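
Plugging that formula into a few lines of Rust reproduces the unrounded memory-usage figures from the table above (a quick check, not library code):

```rust
// Total allocation size: 16 + N control bytes plus N element slots,
// i.e. 16 + N + sizeof(T) * N.
fn table_bytes(n: usize, elem_size: usize) -> usize {
    16 + n + elem_size * n
}

fn main() {
    for n in [2usize, 4, 8, 16, 32] {
        // For sizeof(T) = 8 this prints 34, 52, 88, 160, 304,
        // matching the sizeof(T) = 8 column before alignment rounding.
        println!("N = {:2}: {} bytes", n, table_bytes(n, 8));
    }
}
```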

Amanieu (Member, Author) commented Feb 25, 2019

By comparison, here is what the std HashMap does:

| Table size | Capacity | Memory usage (sizeof(T) = 4) | Memory usage (sizeof(T) = 8) | Memory usage (sizeof(T) = 16) |
|------------|----------|------------------------------|------------------------------|-------------------------------|
| (empty)    | 0        | 0                            | 0                            | 0                             |
| 32         | 29       | 384                          | 512                          | 768                           |
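
These figures are consistent with the old std table layout storing an 8-byte hash alongside each of the 32 buckets (an assumption inferred from the numbers above, not taken from the std source):

```rust
// Assumed layout inferred from the table: one u64 hash per bucket plus
// one element slot per bucket, so N * 8 + N * sizeof(T) bytes.
// For N = 32 and sizeof(T) = 4: 32 * 8 + 32 * 4 = 384, matching the table.
fn std_table_bytes(n: usize, elem_size: usize) -> usize {
    n * 8 + n * elem_size
}
```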

nnethercote (Contributor) commented

I think jumping from 0 to 4 or even 8 would be fine.

workingjubilee commented

From my perspective as a mentor on Exercism, and from my observations of new Rustaceans:

Hash maps, probably because they have no literal syntax, seem to rarely be declared for just two values. Even people coming from languages that use hash maps everywhere won't reach for them as often, because using one carries extra mental overhead. When people want a tiny number of key/value pairs, I often see them juggle tuples or even declare tiny structs. HashMaps are favored when (from a logical perspective) there could potentially be Many entries and dynamic access is wanted.

So: yeah, 0 -> (4|8) seems fine.
