Double the performance of the archive parsing code #301

Closed
wants to merge 2 commits

Conversation

nagisa commented May 8, 2021

Started at

  1,061.09 msec     task-clock:u              #    0.998 CPUs utilized
 3,374,141,510      cycles:u                  #    3.180 GHz
12,012,570,139      instructions:u            #    3.56  insn per cycle

after these commits we get

    580.57 msec     task-clock:u              #    0.997 CPUs utilized
 1,843,957,595      cycles:u                  #    3.176 GHz
 5,901,570,558      instructions:u            #    3.20  insn per cycle

That's just under 55% of the original wall-clock runtime, executing slightly less than half the instructions.

The test case is effectively just:

diff --git a/examples/ar.rs b/examples/ar.rs
index 22c8553..064c9b1 100644
--- a/examples/ar.rs
+++ b/examples/ar.rs
@@ -34,13 +34,15 @@ fn main() {
             return;
         }
     };
-    match op.chars().next().unwrap() {
-        't' => {
-            for member in archive.members() {
-                let member = member.unwrap();
-                println!("{}", String::from_utf8_lossy(member.name()));
+
+    for _ in 0..1_000_000 {
+        match op.chars().next().unwrap() {
+            't' => {
+                for member in archive.members() {
+                    let member = member.unwrap();
+                }
             }
+            op => println!("Invalid operation: {}", op),
         }
-        op => println!("Invalid operation: {}", op),
     }
 }

memchr is a well-regarded crate, used by the standard library core, so I hope the concern over adding this dependency is minimal.
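For a rough illustration of what that dependency buys, here is a minimal sketch (not the code in this PR) of using `memchr::memchr` to find the `/` terminator of a space-padded `ar` name field in one call instead of a manual byte-by-byte scan. The `name_end` helper and the example field are hypothetical, and the snippet assumes `memchr` is listed as a dependency.

```rust
// Hypothetical sketch, not the patch itself: find the end of a GNU/SysV `ar`
// header name field with a single `memchr` call instead of a manual byte loop.
// Short names are terminated by `/` and the 16-byte field is space-padded.
fn name_end(field: &[u8]) -> usize {
    memchr::memchr(b'/', field).unwrap_or(field.len())
}

fn main() {
    let field = b"foo.o/          "; // 16-byte, space-padded name field
    assert_eq!(&field[..name_end(field)], b"foo.o");
}
```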

See the commit descriptions for slightly more information.

nagisa added 2 commits May 9, 2021 00:56
This makes parsing of the archive headers significantly faster. The `ar`
example adjusted to parse the same archive 1 million times, when run
with the rlib of the `object` crate itself produces the following
metrics:

    788.19 msec     task-clock:u              #    0.998 CPUs utilized
 2,502,967,113      cycles:u                  #    3.176 GHz
 7,780,571,392      instructions:u            #    3.11  insn per cycle

In contrast to the following for the old code:

  1,061.09 msec     task-clock:u              #    0.998 CPUs utilized
 3,374,141,510      cycles:u                  #    3.180 GHz
12,012,570,139      instructions:u            #    3.56  insn per cycle

This is a reduction of about 1 billion cycles, or roughly a 25% reduction in wall-clock time.

Originally `perf` would show a heavy hotspot (around 50% of the total runtime) in `parse_sysv_extended_name`.
Here, instead of figuring out the extent of the integer ahead of time, we check for the spaces while we compute the number itself (a sketch of this idea follows below). This further reduces the runtime of the aforementioned case (see the previous commit) to:

    580.57 msec     task-clock:u              #    0.997 CPUs utilized
 1,843,957,595      cycles:u                  #    3.176 GHz
 5,901,570,558      instructions:u            #    3.20  insn per cycle

`perf report` still shows that most of the time is spent parsing SysV archive names (which makes sense – it's pretty much all the program does, after all!).
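To make the "check for spaces while computing the number" idea concrete, here is a minimal, hypothetical sketch, under the assumption that the extended-name field is a decimal offset padded with trailing spaces; `parse_decimal` is an illustrative helper, not the crate's internal function.

```rust
// Hypothetical sketch of the fused approach, not the committed code:
// accumulate the decimal value and stop at the first space in a single pass,
// instead of first finding the digits' extent and then running a separate
// integer parser over them.
fn parse_decimal(field: &[u8]) -> Option<u64> {
    let mut value: u64 = 0;
    let mut seen_digit = false;
    for &byte in field {
        match byte {
            b'0'..=b'9' => {
                seen_digit = true;
                value = value.checked_mul(10)?.checked_add(u64::from(byte - b'0'))?;
            }
            b' ' => break,     // space padding terminates the number
            _ => return None,  // any other byte means a malformed field
        }
    }
    if seen_digit { Some(value) } else { None }
}

fn main() {
    assert_eq!(parse_decimal(b"123   "), Some(123));
    assert_eq!(parse_decimal(b"   "), None);
    assert_eq!(parse_decimal(b"12x"), None);
}
```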
nagisa commented May 10, 2021

Now that I've done this, I'm wondering whether it would make sense to skip parsing the filenames entirely when merely iterating through members in the first place.

@nagisa nagisa closed this May 10, 2021
philipc commented May 10, 2021

Is there a use case where you want to iterate without getting the filename of every member? Parsing the names as we iterate is simpler (the correct file size is known, and there's no need to pass the names buffer around for parsing the filename later). Also, while I'm happy with this improvement, I'm unsure how relevant it is compared to the file read itself, which you weren't benchmarking.
