
Measure the memory usage of HashMap better #6908

Open
jdm opened this issue Aug 3, 2015 · 9 comments

@jdm (Member) commented Aug 3, 2015

Currently we estimate it at capacity() * (size_of::&lt;K&gt;() + size_of::&lt;V&gt;()). I briefly looked into duplicating the internals in mem.rs, but it looked really complicated.
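For reference, the current estimate amounts to something like the sketch below. The function name is invented here for illustration; it is not an existing Servo or std API. Note that it counts only key and value slots, ignoring per-entry hashes, load-factor headroom, and padding, which is why it underestimates the real allocation.

```rust
use std::collections::HashMap;
use std::mem::size_of;

// Sketch of the rough estimate described above: capacity times the
// size of one key plus one value. Hypothetical helper, not a std API.
fn estimate_hashmap_size<K, V>(map: &HashMap<K, V>) -> usize {
    map.capacity() * (size_of::<K>() + size_of::<V>())
}

fn main() {
    let mut map: HashMap<u32, u64> = HashMap::new();
    map.insert(1, 10);
    // This underestimates the true heap block: no hash slots, no slack.
    println!("estimated bytes: {}", estimate_hashmap_size(&map));
}
```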

@jdm jdm added the I-perf-bloat label Aug 3, 2015
@SimonSapin (Member) commented Aug 3, 2015

Should we file an issue on Rust to add some API?

@Gankra (Contributor) commented Aug 3, 2015

As of today (and for the last ~year) it's basically (cap() * 11 / 10) * (size(K) + size(V) + size(u64)), though there's some minor loss due to alignment and integer arithmetic.

Basically a HashMap is [hash, hash, hash, ..., key, key, key, ... value, value, value...] allocated as a single buffer via jemalloc.

But the reported capacity is real_cap * 10 / 11 because we don't let the table get totally full (~91% max load factor). The internal capacity is always a power of two, so some careful math can let you figure out the "true" inner capacity from the reported capacity if you really care (round to the nearest power of two should be sufficient).

In practice I wouldn't expect any meaningful wastage from alignment/padding. At most it would basically be the wastage in a (u64, K, V) tuple, I think.
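Putting the above together, a closer estimate for the pre-hashbrown layout looks like the sketch below: one u64 hash plus one key and one value per bucket, with the real bucket count about 11/10 of the reported capacity. The function name is illustrative, not a std API.

```rust
use std::mem::size_of;

// Sketch of the layout described above for the old std HashMap:
// [hash, hash, ..., key, key, ..., value, value, ...] in one buffer.
// `reported_capacity` is what HashMap::capacity() returns.
fn estimate_old_hashmap_bytes<K, V>(reported_capacity: usize) -> usize {
    // Undo the ~91% max load factor: real_cap ≈ reported * 11 / 10.
    let real_cap = reported_capacity * 11 / 10;
    real_cap * (size_of::<u64>() + size_of::<K>() + size_of::<V>())
}

fn main() {
    // For K = u32 (4 bytes), V = u64 (8 bytes): 20 bytes per bucket.
    println!("{}", estimate_old_hashmap_bytes::<u32, u64>(10));
}
```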

@Gankra (Contributor) commented Aug 3, 2015

As an aside, @pczarn had a PR to change this to [hash, hash, hash,... (K,V), (K,V), (K,V), ...] which shuffled around some perf characteristics, but we didn't go forward with it.

@pczarn commented Aug 4, 2015

Exactly as above, and (capacity() * 11 / 10).next_power_of_two() * (...) will correct the off-by-one loss.
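That correction can be sketched as follows, again with an invented function name. Rounding up to a power of two recovers the true internal capacity lost to the integer division in capacity = real_cap * 10 / 11.

```rust
use std::mem::size_of;

// Sketch of the corrected estimate: recover the power-of-two internal
// bucket count before multiplying by the per-bucket size.
// Hypothetical helper, not a std API.
fn estimate_old_hashmap_bytes_pow2<K, V>(reported_capacity: usize) -> usize {
    // e.g. real_cap 16 reports 16 * 10 / 11 = 14; 14 * 11 / 10 = 15;
    // next_power_of_two() brings it back to 16.
    let real_cap = (reported_capacity * 11 / 10).next_power_of_two();
    real_cap * (size_of::<u64>() + size_of::<K>() + size_of::<V>())
}

fn main() {
    // For K = u32, V = u64: 20 bytes per bucket, 16 buckets.
    println!("{}", estimate_old_hashmap_bytes_pow2::<u32, u64>(14));
}
```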

@nnethercote (Contributor) commented Aug 8, 2017

This issue is now blocking https://bugzilla.mozilla.org/show_bug.cgi?id=1387958. Stylo has some significant HashMaps which I want to measure the size of. I can estimate the size, but (a) it might be inaccurate (i.e. it won't measure slop), and (b) DMD doesn't get told that the block has been reported.

It would be nice if HashMap had a function exposing the pointer to the storage. (That's assuming there is a single pointer; it's conceivable that the internal representation could use 2 or 3 separate chunks of storage.)

What I'll probably end up doing is cloning HashMap for Stylo. Stylo already has its own copy of Arc, and we've been considering cloning Vec and HashMap anyway to add fallible growth methods.

@nnethercote (Contributor) commented Aug 9, 2017

Cloning (forking) HashMap turns out to be difficult, because it uses various unstable Rust features and we are using stable Rust only for Firefox.

@SimonSapin (Member) commented Aug 9, 2017

We could consider an alternative hash map like https://crates.io/crates/ordermap. Adding memory tracking might be easier there.

@dralley (Contributor) commented Mar 24, 2020

Is this still applicable now that HashMap is based on HashBrown?

@nnethercote (Contributor) commented Mar 24, 2020

The change to HashBrown doesn't affect this.
