Deduplicate do_get_with_hash #204

Closed
wants to merge 2 commits

Conversation

Swatinem
Contributor

Due to the generic `read_recorder` closure, the `do_get_with_hash` function was generated twice in the `cargo-llvm-lines` output for a simple test.

This is the first improvement for #203 that I have found.

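For illustration, here is a minimal sketch of the effect with simplified, hypothetical signatures (not moka's actual code): a function generic over a closure is monomorphized once per distinct closure type, so calling it with two different closures produces two compiled copies.

// Sketch only: each distinct closure type F creates a separate
// monomorphized copy of `do_get`.
fn do_get<F: Fn(u32)>(key: u32, record: F) -> u32 {
    record(key); // the whole body is duplicated per F
    key
}

// One way to deduplicate is to erase the closure type, e.g. by taking
// an optional plain function pointer instead of a generic parameter:
fn do_get_dedup(key: u32, record: Option<fn(u32)>) -> u32 {
    if let Some(record) = record {
        record(key);
    }
    key
}

fn main() {
    do_get(1, |_op| {});              // copy #1: no-op closure
    do_get(2, |op| println!("{op}")); // copy #2: recording closure
    do_get_dedup(3, None);            // single copy, branch at runtime
}

Whether the duplicate copies actually appear can be checked with something like `cargo llvm-lines | grep do_get_with_hash`, which lists the LLVM IR line count per monomorphized function.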
@tatsuya6502 tatsuya6502 self-requested a review November 20, 2022 00:33
@tatsuya6502 tatsuya6502 added the enhancement New feature or request label Nov 20, 2022
@tatsuya6502
Member

tatsuya6502 commented Nov 20, 2022

Thank you for tackling this.

I actually wrote the original code this way on purpose, for performance; I was hoping rustc would be smart enough to do some extra optimization on the duplicated function when it is given a closure that does nothing:

// Define a closure that skips to record a read op.
let record = |_op, _now| {};

Maybe I was right. I ran a benchmark and found that the change in this PR slowed it down by 1.23% on my x86_64 Linux environment. (See the number at the bottom right.)

[Benchmark results image: mokabench-dedup-2022-11-20]

  • Environment
    • CPU: Intel Core i7-12700F
    • OS: WSL2 Ubuntu 22.04 on Windows 11
    • Rust: 1.65.0 (897e37553 2022-11-02)

But I believe what you said on #203 is also true:

This is suboptimal for a couple of reasons:

  • it generates multiple copies of functions that have to be compiled, hurting compile times.
  • may lead to binary bloat
  • may even hurt performance as the instruction cache is not used optimally

So maybe this is just one data point, and I might get the opposite result with a different benchmark in a different environment?

@tatsuya6502
Member

Here is how I ran the benchmark:

$ git clone https://github.com/moka-rs/mokabench.git
$ cd mokabench

## Check out the current head of the `mini-moka` branch.
## (It will be merged into `master` soon.)
$ git checkout acd6cde157c5f6e633a83d5a9514fb7bbc7b8617

## Follow [the README](https://github.com/moka-rs/mokabench/#download-the-trace-datasets)
## and place the ARC S3 trace dataset into `./datasets` directory.

## Apply the patch attached at the bottom of this comment.
$ git apply mokabench.patch

## Check what Moka version is used:
$ cargo tree -i moka

## Run the benchmark:
$ cargo run --release -- --insert-once

## Edit `Cargo.toml` to use the code for this PR, and run the benchmark again.

mokabench.patch

diff --git a/Cargo.toml b/Cargo.toml
index ecf3266..db73ffd 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -44,6 +44,8 @@ optional = true
 # version = "0.10.0"
 git = "https://github.com/moka-rs/moka"
 branch = "master"
+# git = "https://github.com/Swatinem/moka"
+# branch = "do_get_with_hash"
 features = ["future"]
 
 [dependencies.moka09]
diff --git a/src/main.rs b/src/main.rs
index 2ec5981..ac4aae8 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -29,7 +29,8 @@ async fn main() -> anyhow::Result<()> {
 }
 
 async fn run_with_capacity(config: &Config, capacity: usize) -> anyhow::Result<()> {
-    const DEFAULT_NUM_CLIENTS_ARRAY: &[u16] = &[16, 24, 32, 40, 48];
+    // const DEFAULT_NUM_CLIENTS_ARRAY: &[u16] = &[16, 24, 32, 40, 48];
+    const DEFAULT_NUM_CLIENTS_ARRAY: &[u16] = &[4, 8, 16, 32];
 
     let num_clients_slice: &[u16] = if let Some(n) = &config.num_clients {
         &n
diff --git a/src/trace_file.rs b/src/trace_file.rs
index 638eefe..61f5fe7 100644
--- a/src/trace_file.rs
+++ b/src/trace_file.rs
@@ -61,7 +61,8 @@ impl TraceFile {
 
     pub fn default_capacities(&self) -> &[usize] {
         match self {
-            Self::S3 => &[100_000, 400_000, 800_000],
+            // Self::S3 => &[100_000, 400_000, 800_000],
+            Self::S3 => &[100_000, 800_000],
             Self::Ds1 => &[1_000_000, 4_000_000, 8_000_000],
             Self::Oltp => &[256, 512, 1_000, 2_000],
             Self::Loop => &[256, 512, 768, 1024],

@tatsuya6502
Member

I wonder what the performance would be if we took your change but changed it further to avoid the `TrioArc::clone` here when `record_read` is false:

Some((maybe_key, TrioArc::clone(entry), now))

Maybe one way to achieve this is to change the type of the variable `maybe_entry` from

  • Option<(Option<Arc<K>>, TrioArc<ValueEntry<K, V>>)>

to something like the following?

  • Option<(Option<Arc<K>>, V, Option<TrioArc<ValueEntry<K, V>>>)>

(I have to go now. I will try it later today or tomorrow. Timezone: UTC+0800)
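A minimal sketch of that idea, using `std::sync::Arc` and a stub `ValueEntry` as stand-ins for moka's internal `TrioArc` and `ValueEntry` types (illustrative only, not moka's actual code; the `Option<Arc<K>>` key component is omitted for brevity):

use std::sync::Arc;

// Hypothetical stand-in for moka's internal entry type.
struct ValueEntry<V> {
    value: V,
}

// Clone the value itself on every hit, but bump the entry handle's
// reference count only when the read will actually be recorded.
fn select<V: Clone>(
    entry: &Arc<ValueEntry<V>>,
    record_read: bool,
) -> (V, Option<Arc<ValueEntry<V>>>) {
    let value = entry.value.clone();
    let recorded = record_read.then(|| Arc::clone(entry));
    (value, recorded)
}

The point of making the entry handle an `Option` is exactly this: the atomic ref-count increment is skipped on the path where read recording is disabled.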

@Swatinem
Contributor Author

Wow, that is a very thorough analysis.

I think I underestimated how well this crate is tuned for performance!

I applied your suggestion, but I doubt that it will help. Now that I have understood the code paths a bit more, I believe it's okay to have an extremely tuned/inlined version of the code for the ideal case of a cache hit, and another cold copy that is only rarely hit.
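That split is a common Rust pattern in its own right (a sketch, not necessarily what moka does): keep the hit path inlinable and push the rarely taken path into a function marked cold, so it stays off the hot instruction-cache path.

use std::collections::HashMap;

#[inline]
fn get(map: &HashMap<u64, u64>, key: u64) -> u64 {
    match map.get(&key) {
        Some(v) => *v,         // hot path: cache hit, fully inlined
        None => get_cold(key), // cold path: out-of-line copy
    }
}

#[cold]
#[inline(never)]
fn get_cold(key: u64) -> u64 {
    // Rarely executed; #[cold] hints the optimizer to lay this code
    // out away from the hot path.
    key.wrapping_mul(31)
}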

@Swatinem
Contributor Author

I believe the recent work for #212 has changed the structure of the code a lot, and it also removed a bunch of functions that were generating tons of code.

I won’t be directly pursuing this PR anymore, though I believe there might still be some compile-time and code-size gains to be had with some work.

@Swatinem Swatinem closed this Feb 13, 2023