-
-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deduplicate do_get_with_hash
#204
Conversation
Due to the generic `read_recorder` closure, the `do_get_with_hash` function was generated twice in `cargo-llvm-lines` for a simple test.
Thank you for tackling to this. I actually wrote the original code by purpose for performance; I was hoping // Define a closure that skips to record a read op.
let record = |_op, _now| {}; Maybe I was right. I ran benchmark and found the change in this PR slowed down the benchmark by 1.23% on my x86_64 Linux environment. (The number on the right bottom)
But I believe what you said on #203 is also true:
So maybe this is just one result and I may get opposite result with different benchmark on different environment? |
Here is how I ran the benchmark: $ git clone https://github.com/moka-rs/mokabench.git
$ cd mokabench
## Checkout the current head of `mini-moka` branch.
## (It will be merged into `master` soon)
$ git checkout acd6cde157c5f6e633a83d5a9514fb7bbc7b8617
## Follow [the README](https://github.com/moka-rs/mokabench/#download-the-trace-datasets)
## and place the ARC S3 trace dataset into `./datasets` directory.
## Apply the patch attached at the bottom of this comment.
$ git apply mokabench.patch
## Check what Moka version is used:
$ cargo tree -i moka
## Run the benchmark:
$ cargo run --release -- --insert-once
## Edit `Cargo.toml` to use the code for this PR, and run the benchmark again.
mokabench.patch diff --git a/Cargo.toml b/Cargo.toml
index ecf3266..db73ffd 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -44,6 +44,8 @@ optional = true
# version = "0.10.0"
git = "https://github.com/moka-rs/moka"
branch = "master"
+# git = "https://github.com/Swatinem/moka"
+# branch = "do_get_with_hash"
features = ["future"]
[dependencies.moka09]
diff --git a/src/main.rs b/src/main.rs
index 2ec5981..ac4aae8 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -29,7 +29,8 @@ async fn main() -> anyhow::Result<()> {
}
async fn run_with_capacity(config: &Config, capacity: usize) -> anyhow::Result<()> {
- const DEFAULT_NUM_CLIENTS_ARRAY: &[u16] = &[16, 24, 32, 40, 48];
+ // const DEFAULT_NUM_CLIENTS_ARRAY: &[u16] = &[16, 24, 32, 40, 48];
+ const DEFAULT_NUM_CLIENTS_ARRAY: &[u16] = &[4, 8, 16, 32];
let num_clients_slice: &[u16] = if let Some(n) = &config.num_clients {
&n
diff --git a/src/trace_file.rs b/src/trace_file.rs
index 638eefe..61f5fe7 100644
--- a/src/trace_file.rs
+++ b/src/trace_file.rs
@@ -61,7 +61,8 @@ impl TraceFile {
pub fn default_capacities(&self) -> &[usize] {
match self {
- Self::S3 => &[100_000, 400_000, 800_000],
+ // Self::S3 => &[100_000, 400_000, 800_000],
+ Self::S3 => &[100_000, 800_000],
Self::Ds1 => &[1_000_000, 4_000_000, 8_000_000],
Self::Oltp => &[256, 512, 1_000, 2_000],
Self::Loop => &[256, 512, 768, 1024], |
I wonder what the performance would be if we take your change, but change it further to avoid moka/src/sync_base/base_cache.rs Line 297 in bf5b57f
Maybe one way to achieve this is to change the type of the variable
to something like the followings?
(I have to go now. I will try it later today or tomorrow. timezone:UTC+0800) |
Wow, that is a very thorough analysis. I think I underestimated how well this crate is tuned for performance! I applied your suggestion, but I doubt that will help. Now that I have understood the code paths a bit more, I believe its okay to have an extremely tuned/inlined version of the code for the ideal case of a cache hit, vs another cold copy that is only rarely hit. |
The recent work for #212 I believe has changed the structure of the code a lot, and also removed a bunch of functions that were creating tons of code. I won’t be directly persuing this PR anymore, though I believe there might still be some compile-time and code-size gains to be had with some work. |
Due to the generic
read_recorder
closure, thedo_get_with_hash
function was generated twice incargo-llvm-lines
for a simple test.This is the first improvement to #203 I have found