Deduplicate do_get_with_hash #204

Closed
wants to merge 2 commits

Conversation

Swatinem
Contributor

Due to the generic `read_recorder` closure, the `do_get_with_hash` function was generated twice in the `cargo-llvm-lines` output for a simple test.

This is the first improvement for #203 that I have found.

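For illustration, here is a minimal sketch of the effect with simplified, hypothetical signatures (not moka's actual code): a function generic over a closure is monomorphized once per distinct closure type, so calling it with two different closures produces two compiled copies.

// Sketch only: each distinct closure type F creates a separate
// monomorphized copy of `do_get`.
fn do_get<F: Fn(u32)>(key: u32, record: F) -> u32 {
    record(key); // the whole body is duplicated per F
    key
}

// One way to deduplicate is to erase the closure type, e.g. by taking
// an optional plain function pointer instead of a generic parameter:
fn do_get_dedup(key: u32, record: Option<fn(u32)>) -> u32 {
    if let Some(record) = record {
        record(key);
    }
    key
}

fn main() {
    do_get(1, |_op| {});              // copy #1: no-op closure
    do_get(2, |op| println!("{op}")); // copy #2: recording closure
    do_get_dedup(3, None);            // single copy, branch at runtime
}

Whether the duplicate copies actually appear can be checked with something like `cargo llvm-lines | grep do_get_with_hash`, which lists the LLVM IR line count per monomorphized function.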
@tatsuya6502 tatsuya6502 self-requested a review November 20, 2022 00:33
@tatsuya6502 tatsuya6502 added the enhancement New feature or request label Nov 20, 2022
@tatsuya6502
Member

tatsuya6502 commented Nov 20, 2022

Thank you for tackling this.

I actually wrote the original code this way on purpose, for performance; I was hoping rustc would be smart enough to do some extra optimization on the duplicated function when it is given a closure that does nothing:

// Define a closure that skips to record a read op.
let record = |_op, _now| {};

Maybe I was right. I ran a benchmark and found that the change in this PR slowed it down by 1.23% on my x86_64 Linux environment. (See the number at the bottom right.)

[Benchmark results image: mokabench-dedup-2022-11-20]

  • Environment
    • CPU: Intel Core i7-12700F
    • OS: WSL2 Ubuntu 22.04 on Windows 11
    • Rust: 1.65.0 (897e37553 2022-11-02)

But I believe what you said on #203 is also true:

This is suboptimal for a couple of reasons:

  • it generates multiple copies of functions that have to be compiled, hurting compile times.
  • may lead to binary bloat
  • may even hurt performance as the instruction cache is not used optimally

So maybe this is just one data point, and I might get the opposite result with a different benchmark in a different environment?

@tatsuya6502
Member

Here is how I ran the benchmark:

$ git clone https://github.com/moka-rs/mokabench.git
$ cd mokabench

## Check out the current head of the `mini-moka` branch.
## (It will be merged into `master` soon.)
$ git checkout acd6cde157c5f6e633a83d5a9514fb7bbc7b8617

## Follow [the README](https://github.com/moka-rs/mokabench/#download-the-trace-datasets)
## and place the ARC S3 trace dataset into `./datasets` directory.

## Apply the patch attached at the bottom of this comment.
$ git apply mokabench.patch

## Check what Moka version is used:
$ cargo tree -i moka

## Run the benchmark:
$ cargo run --release -- --insert-once

## Edit `Cargo.toml` to use the code for this PR, and run the benchmark again.

mokabench.patch

diff --git a/Cargo.toml b/Cargo.toml
index ecf3266..db73ffd 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -44,6 +44,8 @@ optional = true
 # version = "0.10.0"
 git = "https://github.com/moka-rs/moka"
 branch = "master"
+# git = "https://github.com/Swatinem/moka"
+# branch = "do_get_with_hash"
 features = ["future"]
 
 [dependencies.moka09]
diff --git a/src/main.rs b/src/main.rs
index 2ec5981..ac4aae8 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -29,7 +29,8 @@ async fn main() -> anyhow::Result<()> {
 }
 
 async fn run_with_capacity(config: &Config, capacity: usize) -> anyhow::Result<()> {
-    const DEFAULT_NUM_CLIENTS_ARRAY: &[u16] = &[16, 24, 32, 40, 48];
+    // const DEFAULT_NUM_CLIENTS_ARRAY: &[u16] = &[16, 24, 32, 40, 48];
+    const DEFAULT_NUM_CLIENTS_ARRAY: &[u16] = &[4, 8, 16, 32];
 
     let num_clients_slice: &[u16] = if let Some(n) = &config.num_clients {
         &n
diff --git a/src/trace_file.rs b/src/trace_file.rs
index 638eefe..61f5fe7 100644
--- a/src/trace_file.rs
+++ b/src/trace_file.rs
@@ -61,7 +61,8 @@ impl TraceFile {
 
     pub fn default_capacities(&self) -> &[usize] {
         match self {
-            Self::S3 => &[100_000, 400_000, 800_000],
+            // Self::S3 => &[100_000, 400_000, 800_000],
+            Self::S3 => &[100_000, 800_000],
             Self::Ds1 => &[1_000_000, 4_000_000, 8_000_000],
             Self::Oltp => &[256, 512, 1_000, 2_000],
             Self::Loop => &[256, 512, 768, 1024],

@tatsuya6502
Member

I wonder what the performance would be if we took your change but changed it further to avoid the `TrioArc::clone` here when `record_read` is false:

Some((maybe_key, TrioArc::clone(entry), now))

Maybe one way to achieve this is to change the type of the variable `maybe_entry` from

  • Option<(Option<Arc<K>>, TrioArc<ValueEntry<K, V>>)>

to something like the following?

  • Option<(Option<Arc<K>>, V, Option<TrioArc<ValueEntry<K, V>>>)>

(I have to go now. I will try it later today or tomorrow. Timezone: UTC+0800)
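A minimal sketch of that idea, using `std::sync::Arc` and a stub `ValueEntry` as stand-ins for moka's internal `TrioArc` and `ValueEntry` types (illustrative only, not moka's actual code; the `Option<Arc<K>>` key component is omitted for brevity):

use std::sync::Arc;

// Hypothetical stand-in for moka's internal entry type.
struct ValueEntry<V> {
    value: V,
}

// Clone the value itself on every hit, but bump the entry handle's
// reference count only when the read will actually be recorded.
fn select<V: Clone>(
    entry: &Arc<ValueEntry<V>>,
    record_read: bool,
) -> (V, Option<Arc<ValueEntry<V>>>) {
    let value = entry.value.clone();
    let recorded = record_read.then(|| Arc::clone(entry));
    (value, recorded)
}

The point of making the entry handle an `Option` is exactly this: the atomic ref-count increment is skipped on the path where read recording is disabled.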

@Swatinem
Contributor Author

Wow, that is a very thorough analysis.

I think I underestimated how well this crate is tuned for performance!

I applied your suggestion, but I doubt that it will help. Now that I have understood the code paths a bit more, I believe it's okay to have an extremely tuned/inlined version of the code for the ideal case of a cache hit, and another cold copy that is only rarely hit.
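That split is a common Rust pattern in its own right (a sketch, not necessarily what moka does): keep the hit path inlinable and push the rarely taken path into a function marked cold, so it stays off the hot instruction-cache path.

use std::collections::HashMap;

#[inline]
fn get(map: &HashMap<u64, u64>, key: u64) -> u64 {
    match map.get(&key) {
        Some(v) => *v,         // hot path: cache hit, fully inlined
        None => get_cold(key), // cold path: out-of-line copy
    }
}

#[cold]
#[inline(never)]
fn get_cold(key: u64) -> u64 {
    // Rarely executed; #[cold] hints the optimizer to lay this code
    // out away from the hot path.
    key.wrapping_mul(31)
}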

@Swatinem
Contributor Author

I believe the recent work for #212 has changed the structure of the code a lot, and it also removed a bunch of functions that were generating tons of code.

I won’t be directly pursuing this PR anymore, though I believe there might still be some compile-time and code-size gains to be had with some work.

@Swatinem Swatinem closed this Feb 13, 2023