
Add benchmarks from dashmap and hashbrown #43

Merged · 9 commits into jonhoo:master · Jan 30, 2020

Conversation

@domenicquirl (Collaborator) commented on Jan 29, 2020:

This addresses #8. The added files contain all the original benchmarks from dashmap and hashbrown to compare against (these cover several other hashmap implementations as well), plus implementations of the dashmap- and hashbrown-specific tests for flurry. I also tried to add an implementation of the dashmap tests for flurry that doesn't pin() the epoch for every operation (the original implementation uses rayon's par_iter().for_each(), which requires the executed closure to be Send + Sync). However, that requires setting up the threads differently, and I'm not sure how much overhead this introduces, so keep that in mind when using the comparison.
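
Roughly, the difference between the two setups is the following (a minimal sketch against flurry's pin()/guard() API, which may have shifted since this PR; it is not the exact benchmark code):

```rust
use flurry::HashMap;

fn main() {
    let map: HashMap<u64, u64> = HashMap::new();

    // One epoch pin per operation: what the rayon-based port is forced into,
    // since the for_each closure cannot keep a guard across items.
    for i in 0..100 {
        map.pin().insert(i, i);
    }

    // One guard reused across many operations: cheaper, but requires setting
    // up the worker threads yourself instead of using par_iter().
    let guard = map.guard();
    for i in 0..100 {
        map.insert(i + 100, i, &guard);
    }
}
```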

The main reason this is a draft is that I wanted to make sure about the licenses. As far as I understand, the MIT license requires license and copyright notices to be present in derived works. Am I correct that this is required? If so, where exactly do we add them?

@jonhoo (Owner) commented on Jan 29, 2020:

For licenses, I think the right thing to do is to include the license files from the repositories you copied from in benches/, for example with names like dashmap.LICENSE.

It would also be good to include a short comment at the top of each copied file saying where it was copied from.
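
For example, something like this at the top of each copied file (illustrative wording only, not a prescribed form):

```rust
// Adapted from the dashmap benchmarks (https://github.com/xacrimon/dashmap),
// MIT licensed; see benches/dashmap.LICENSE for the full license text.
```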

Separately, I don't know that we need to pull the benchmarks for other maps in here (benches/chashmap, for example). I think the trick is going to be to just have benches/dashmap_flurry.rs, which is the dashmap benchmark implemented for flurry. When we eventually implement our own benchmark suite (like #30), we will also implement it for other implementations, but I don't think there's an upside to copy-pasting existing benchmarks into this repository. Maybe just point users to them in the README and include instructions there on how to run them?

I would also like us to have separate benchmark files for the flurry versions of the different benchmarks. I'm imagining benches/hashbrown_like.rs and benches/dashmap_like.rs.
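
Such a file would just be an ordinary criterion harness; a rough sketch of the shape (the bench name and loop are placeholders, not the code from this PR):

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn insert(c: &mut Criterion) {
    c.bench_function("flurry_insert", |b| {
        let map: flurry::HashMap<u64, u64> = flurry::HashMap::new();
        // Reuse one guard across the whole run, as in the existing benchmarks.
        let guard = map.guard();
        let mut i = 0;
        b.iter(|| {
            map.insert(i, i, &guard);
            i += 1;
        });
    });
}

criterion_group!(benches, insert);
criterion_main!(benches);
```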

@domenicquirl (Collaborator, Author) commented on Jan 29, 2020:

Wouldn't we want to compare our results against the other hashmaps? Of course one can run their benchmarks separately; I just thought it would be nice to have them available without needing separate projects. For comparing performance across releases, one could still execute only the flurry benchmarks. Or do you want this comparison only for tests such as #30? Since the execution is exactly the same between the original (dashmap/hashbrown) and flurry benchmark code, it felt natural to at least include these two as a reference.

Regarding the license, I will update the branch accordingly. I can also separate flurry.rs into two files 👍

@jonhoo (Owner) commented on Jan 29, 2020:

We do want to compare against other hashmaps, and while it is convenient to copy their benchmarks into this repository, I don't think it's worth it. Better to avoid keeping multiple copies, and instead just refer people to where the master copy of the benchmarks for the other backends lives.

(Review threads on benches/flurry_hashbrown.rs, benches/flurry_dashmap.rs, and benches/README.md were resolved and marked outdated.)
@domenicquirl marked this pull request as ready for review on January 30, 2020 at 16:56.
@jonhoo (Owner) commented on Jan 30, 2020:

Have you tried actually running the benchmarks? I'm curious what kind of numbers you get and how they compare.

@domenicquirl (Collaborator, Author) commented on Jan 30, 2020:

They seem... slow? Inserting in particular is way slower than in the other maps. Basically:

  • Against hashbrown: our insert and insert_erase run on the order of one ms, compared to 10–20 µs in hashbrown. get holds up reasonably well though, at ~20 µs versus ~5–6 µs; given the added concurrency, that looks much better than insert. Both of these use one guard across all operations of a benchmark run and are relatively consistent across input distributions.
  • Against dashmap: dashmap inserts in somewhere between 1 ms on one thread and 0.8 ms on multiple threads, but what's probably more important is throughput. For some reason criterion doesn't include it in the report (see the sketch after this list), but it was on the order of 100 MElements/s. We come in at about 25 ms down to 12 ms depending on thread count, which is so slow that any benchmark takes several minutes to run, and it puts our throughput at something like 5, maybe 10 MElements/s. This uses one guard per map operation, which admittedly is not what you would do in a scenario like this benchmark.
  • Against dashmap, but with only one guard per thread (the different setup): we insert in 14 ms down to 11 ms, which is better, but still considerably slower than dashmap. Note that this does not compare directly, because the threads are set up differently.
  • get against dashmap looks really good! dashmap takes 700–500 µs, while we take 1.5 ms on one thread and 800 µs on two threads, and go below 300 µs on 8 threads, even with pin() for every call to get. With only one guard per thread, this improves to between 800 µs and a bit more than 200 µs, depending on thread count.
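
On the missing throughput numbers: criterion only reports elements/s when the benchmark declares its throughput, so it may be enough to set it on the benchmark group. A sketch under that assumption (N is a placeholder for however many elements one iteration actually processes):

```rust
use criterion::{Criterion, Throughput};

fn insert_throughput(c: &mut Criterion) {
    const N: u64 = 10_000; // placeholder: elements inserted per iteration

    let mut group = c.benchmark_group("insert_flurry");
    // Declare how much work one iteration does, so criterion can
    // report throughput (MElements/s) alongside the timing numbers.
    group.throughput(Throughput::Elements(N));
    group.bench_function("insert", |b| {
        b.iter(|| {
            let map: flurry::HashMap<u64, u64> = flurry::HashMap::new();
            let guard = map.guard();
            for i in 0..N {
                map.insert(i, i, &guard);
            }
        });
    });
    group.finish();
}
```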

So yeah: overall, get is promising, but insert is much slower than both references :/
The other maps were similar and generally beat us on throughput, with get being close and insert... not being close.

@domenicquirl (Collaborator, Author) commented on Jan 30, 2020:

It's worth noting that I ran this on my personal machine while doing other tasks, so it's not the cleanest setup. I don't think that would be responsible for differences of this magnitude, though.

Is there a Rust profiling tool that could tell us which part of insert takes so much time? That would be an interesting question to answer.

@jonhoo (Owner) commented on Jan 30, 2020:

Oh, that is very interesting indeed! Let's land this and then do some profiling.

For profiling, my recommendation is to add:

```toml
[profile.release]
debug = true
[profile.bench]
debug = true
```

to Cargo.toml, start the benchmark, and then use `perf record --call-graph dwarf -p $(pgrep -f target/release)` to record some profiling information. Then feed the results into a flamegraph and see what sticks out at you!
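
For that last step, one option (assuming Brendan Gregg's FlameGraph scripts, or inferno's Rust ports of them, are on your PATH):

```console
$ perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
```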

@jonhoo merged commit 208cfcf into jonhoo:master on Jan 30, 2020.
@domenicquirl (Collaborator, Author) commented on Jan 30, 2020:

I might have to defer this to someone else; I'm on a Windows machine at the moment and will be a little busy for the next few weeks anyway. Perhaps it's better to open an issue for this, so someone can really take a good look at it?

@jonhoo (Owner) commented on Jan 30, 2020:

Yes, that makes sense! Want to open an issue and include your preliminary results?
