Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace other iter().position() instances with memchr #302

Merged
merged 3 commits into from
May 10, 2021

Conversation

nagisa
Copy link
Contributor

@nagisa nagisa commented May 8, 2021

This might provide similar improvements to those seen in #301, depending on how hot the relevant code is.. This makes memchr a non-optional dependency, due to its use in read_string method of pod::Bytes. If this is a problem it is still possible to make it an optional dependency used by coff and macho code.

Copy link
Contributor

@philipc philipc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks. If the dependency is ever a problem we can make it optional with a naive implementation when disabled.

Let's merge this instead of #301. Can you delete the unneeded fixme?

src/read/archive.rs Outdated Show resolved Hide resolved
@philipc
Copy link
Contributor

philipc commented May 10, 2021

I experimented with using reads instead of mmap (https://github.com/philipc/object/tree/pr302) and it seems promising:

mmap:
          5,324.10 msec task-clock                #    0.997 CPUs utilized          
    18,797,166,250      cycles                    #    3.531 GHz                    
    28,849,521,693      instructions              #    1.53  insn per cycle         

read:
          2,655.91 msec task-clock                #    0.983 CPUs utilized          
     9,361,529,424      cycles                    #    3.525 GHz                    
     7,898,562,868      instructions              #    0.84  insn per cycle         

Note that I changed the loop to cover the file open too (just the inner loop is slower, but that's expected to be better for mmap since it's already in memory after the first loop). It's still not ideal because it does a hash lookup and allocation for every read, but fixing that is harder and something I'll think about. Also not sure how well it translates to real workloads.

This makes parsing of the archive headers significantly faster. The `ar`
example adjusted to parse the same archive 1 million times, when run
with the rlib of the `object` crate itself produces the following
metrics:

    788.19 msec     task-clock:u              #    0.998 CPUs utilized
 2,502,967,113      cycles:u                  #    3.176 GHz
 7,780,571,392      instructions:u            #    3.11  insn per cycle

In contrast to the following for the old code:

  1,061.09 msec     task-clock:u              #    0.998 CPUs utilized
 3,374,141,510      cycles:u                  #    3.180 GHz
12,012,570,139      instructions:u            #    3.56  insn per cycle

This results in a reduction of about 1B cycles, or 25% reduction in wall
clock time.

Originally `perf` would show a heavy hotspot (in the area of 50% of the
total runtime) in `parse_sysv_extended_name`.
Here instead of figuring out the extents of the integer ahead of time we
check for the spaces while we compute the number itself. This further
reduces the runtime of the beforementioned case (see previous commit) to:

    580.57 msec     task-clock:u              #    0.997 CPUs utilized
 1,843,957,595      cycles:u                  #    3.176 GHz
 5,901,570,558      instructions:u            #    3.20  insn per cycle

`perf report` still shows that the most of the time is spent parsing
sysv archive names (which makes sense – its pretty much all the program
does after all!).
This makes `memchr` a non-optional dependency, due to its use in
`read_string` method of `pod::Bytes`.
@nagisa
Copy link
Contributor Author

nagisa commented May 10, 2021

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants