Skip to content
master
Go to file
Code

Files

Permalink
Failed to load latest commit information.

README.md

cargo-profiler

Build Status Crates.io license

Cargo subcommand to profile binaries

To install

NOTE: This subcommand can only be used on Linux machines.

First install valgrind:

$ sudo apt-get install valgrind

Then you can install cargo-profiler via cargo install.

$ cargo install cargo-profiler

Alternatively, you can clone this repo and build the binary from the source.

$ cargo build --release

Now, copy the built binary to the same directory as cargo.

$ sudo cp ./target/release/cargo-profiler $(dirname $(which cargo))/

To run

Cargo profiler currently supports callgrind and cachegrind.

You can call cargo profiler anywhere in a rust project directory with a Cargo.toml.

$ cargo profiler callgrind
$ cargo profiler cachegrind --release

You can also specify a binary directly:

$ cargo profiler callgrind --bin $PATH_TO_BINARY

To specify command line arguments to the executable being profiled, append them after a --:

$ cargo profiler callgrind --bin $PATH_TO_BINARY -- -a 3 --like this

You can chose to keep the callgrind/cachegrind output files using the --keep option

$ cargo profiler callgrind --keep

You can limit the number of functions you'd like to look at:

$ cargo profiler callgrind --bin ./target/debug/rsmat -n 10

Profiling rsmat with callgrind...

Total Instructions...198,466,456

78,346,775 (39.5%) dgemm_kernel.rs:matrixmultiply::gemm::masked_kernel
-----------------------------------------------------------------------
23,528,320 (11.9%) iter.rs:_..std..ops..Range..A....as..std..iter..Iterator..::next
-----------------------------------------------------------------------
16,824,925 (8.5%) loopmacros.rs:matrixmultiply::gemm::masked_kernel
-----------------------------------------------------------------------
10,236,864 (5.2%) mem.rs:core::mem::swap
-----------------------------------------------------------------------
7,712,846 (3.9%) memset.S:memset
-----------------------------------------------------------------------
7,197,344 (3.6%) ???:core::cmp::impls::_..impl..cmp..PartialOrd..for..usize..::lt
-----------------------------------------------------------------------
6,979,680 (3.5%) ops.rs:_..usize..as..ops..Add..::add
-----------------------------------------------------------------------

With cachegrind, you can also sort the data by a particular metric column:

$ cargo profiler cachegrind --bin ./target/debug/rsmat -n 10 --sort dr

Profiling rsmat with cachegrind...

Total Memory Accesses...320,385,356

Total L1 I-Cache Misses...371 (0%)
Total LL I-Cache Misses...308 (0%)
Total L1 D-Cache Misses...58,549 (0%)
Total LL D-Cache Misses...8,451 (0%)

 Ir  I1mr ILmr  Dr  D1mr DLmr  Dw  D1mw DLmw
0.40 0.18 0.21 0.35 0.93 1.00 0.38 0.00 0.00 dgemm_kernel.rs:matrixmultiply::gemm::masked_kernel
-----------------------------------------------------------------------
0.08 0.04 0.05 0.12 0.00 0.00 0.02 0.00 0.00 loopmacros.rs:matrixmultiply::gemm::masked_kernel
-----------------------------------------------------------------------
0.12 0.02 0.02 0.10 0.00 0.00 0.15 0.00 0.00 iter.rs:_std..ops..RangeAasstd..iter..Iterator::next
-----------------------------------------------------------------------
0.05 0.01 0.01 0.07 0.00 0.00 0.08 0.00 0.00 mem.rs:core::mem::swap
-----------------------------------------------------------------------
0.03 0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 ???:core::cmp::impls::_implcmp..PartialOrdforusize::lt
-----------------------------------------------------------------------
0.03 0.01 0.01 0.04 0.00 0.00 0.03 0.00 0.00 ops.rs:_busizeasops..Addausize::add
-----------------------------------------------------------------------
0.04 0.01 0.01 0.04 0.00 0.00 0.03 0.00 0.00 ptr.rs:core::ptr::_implconstT::offset
-----------------------------------------------------------------------
0.02 0.01 0.00 0.03 0.00 0.00 0.01 0.00 0.00 ???:_usizeasops..Add::add
-----------------------------------------------------------------------
0.01 0.01 0.01 0.02 0.00 0.00 0.01 0.00 0.00 mem.rs:core::mem::uninitialized
-----------------------------------------------------------------------
0.02 0.01 0.01 0.02 0.00 0.00 0.04 0.00 0.00 wrapping.rs:_XorShiftRngasRng::next_u32
-----------------------------------------------------------------------


What are the cachegrind metrics?

  • Ir -> Total Instructions
  • I1mr -> Level 1 I-Cache misses
  • ILmr -> Last Level I-Cache misses
  • Dr -> Total Memory Reads
  • D1mr -> Level 1 D-Cache read misses
  • DLmr -> Last Level D-cache read misses
  • Dw -> Total Memory Writes
  • D1mw -> Level 1 D-Cache write misses
  • DLmw -> Last Level D-cache write misses

TODO

  • cmp subcommand - compare binary profiles
  • profiler macros
  • better context around expensive functions
  • support for more profiling tools

About

Cargo subcommand to profile binaries

Topics

Resources

License

Releases

No releases published

Packages

No packages published

Languages

You can’t perform that action at this time.