Double the performance of the archive parsing code #301

Closed
wants to merge 2 commits

Conversation

nagisa commented May 8, 2021

Started at

  1,061.09 msec     task-clock:u              #    0.998 CPUs utilized
 3,374,141,510      cycles:u                  #    3.180 GHz
12,012,570,139      instructions:u            #    3.56  insn per cycle

after these commits we get

    580.57 msec     task-clock:u              #    0.997 CPUs utilized
 1,843,957,595      cycles:u                  #    3.176 GHz
 5,901,570,558      instructions:u            #    3.20  insn per cycle

That's just under 55% of the original wall-clock runtime, executing slightly less than half the instructions.

The test case is effectively just:

diff --git a/examples/ar.rs b/examples/ar.rs
index 22c8553..064c9b1 100644
--- a/examples/ar.rs
+++ b/examples/ar.rs
@@ -34,13 +34,15 @@ fn main() {
             return;
         }
     };
-    match op.chars().next().unwrap() {
-        't' => {
-            for member in archive.members() {
-                let member = member.unwrap();
-                println!("{}", String::from_utf8_lossy(member.name()));
+
+    for _ in 0..1_000_000 {
+        match op.chars().next().unwrap() {
+            't' => {
+                for member in archive.members() {
+                    let member = member.unwrap();
+                }
             }
+            op => println!("Invalid operation: {}", op),
         }
-        op => println!("Invalid operation: {}", op),
     }
 }

memchr is a well-regarded crate, used by the standard library core, so I hope the concern over adding this dependency is minimal.
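For a rough illustration of what that dependency buys, here is a minimal sketch (not the code in this PR) of using `memchr::memchr` to find the `/` terminator of a space-padded `ar` name field in one call instead of a manual byte-by-byte scan. The `name_end` helper and the example field are hypothetical, and the snippet assumes `memchr` is listed as a dependency.

```rust
// Hypothetical sketch, not the patch itself: find the end of a GNU/SysV `ar`
// header name field with a single `memchr` call instead of a manual byte loop.
// Short names are terminated by `/` and the 16-byte field is space-padded.
fn name_end(field: &[u8]) -> usize {
    memchr::memchr(b'/', field).unwrap_or(field.len())
}

fn main() {
    let field = b"foo.o/          "; // 16-byte, space-padded name field
    assert_eq!(&field[..name_end(field)], b"foo.o");
}
```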

See the commit descriptions for slightly more information.

nagisa added 2 commits May 9, 2021 00:56
This makes parsing of the archive headers significantly faster. The `ar`
example adjusted to parse the same archive 1 million times, when run
with the rlib of the `object` crate itself produces the following
metrics:

    788.19 msec     task-clock:u              #    0.998 CPUs utilized
 2,502,967,113      cycles:u                  #    3.176 GHz
 7,780,571,392      instructions:u            #    3.11  insn per cycle

In contrast to the following for the old code:

  1,061.09 msec     task-clock:u              #    0.998 CPUs utilized
 3,374,141,510      cycles:u                  #    3.180 GHz
12,012,570,139      instructions:u            #    3.56  insn per cycle

This is a reduction of about 1 billion cycles, or roughly a 25% reduction in wall-clock time.

Originally `perf` would show a heavy hotspot (around 50% of the total runtime) in `parse_sysv_extended_name`.
Here, instead of figuring out the extent of the integer ahead of time, we check for the spaces while we compute the number itself (a sketch of this idea follows below). This further reduces the runtime of the aforementioned case (see the previous commit) to:

    580.57 msec     task-clock:u              #    0.997 CPUs utilized
 1,843,957,595      cycles:u                  #    3.176 GHz
 5,901,570,558      instructions:u            #    3.20  insn per cycle

`perf report` still shows that most of the time is spent parsing SysV archive names (which makes sense – it's pretty much all the program does, after all!).
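To make the "check for spaces while computing the number" idea concrete, here is a minimal, hypothetical sketch, under the assumption that the extended-name field is a decimal offset padded with trailing spaces; `parse_decimal` is an illustrative helper, not the crate's internal function.

```rust
// Hypothetical sketch of the fused approach, not the committed code:
// accumulate the decimal value and stop at the first space in a single pass,
// instead of first finding the digits' extent and then running a separate
// integer parser over them.
fn parse_decimal(field: &[u8]) -> Option<u64> {
    let mut value: u64 = 0;
    let mut seen_digit = false;
    for &byte in field {
        match byte {
            b'0'..=b'9' => {
                seen_digit = true;
                value = value.checked_mul(10)?.checked_add(u64::from(byte - b'0'))?;
            }
            b' ' => break,     // space padding terminates the number
            _ => return None,  // any other byte means a malformed field
        }
    }
    if seen_digit { Some(value) } else { None }
}

fn main() {
    assert_eq!(parse_decimal(b"123   "), Some(123));
    assert_eq!(parse_decimal(b"   "), None);
    assert_eq!(parse_decimal(b"12x"), None);
}
```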
nagisa commented May 10, 2021

Now that I've done this, I'm wondering whether it would make sense to skip parsing the filenames entirely when merely iterating through members in the first place.

@nagisa nagisa closed this May 10, 2021
philipc commented May 10, 2021

Is there a use case where you want to iterate without getting the filename of every member? Parsing the names as we iterate is simpler (the correct file size is known, and there's no need to pass the names buffer around for parsing the filename later). Also, while I'm happy with this improvement, I'm unsure how relevant it is compared to the file read itself, which you weren't benchmarking.
