Welcome to perf-cpp, a robust C++ library designed to streamline the use of the Linux perf subsystem, providing direct access to hardware performance counters. Many profiling tools cannot restrict measurement to specific code segments or associate profiled data, such as memory addresses, with application-specific details. With perf-cpp, you can manage profiling directly within your application and process the profiled data seamlessly.
- Count Hardware Events: Integrate performance monitoring seamlessly into your development process. Directly interact with hardware counters to focus on critical code segments.
- Sampling: Leverage sampling to gather performance data periodically, e.g., instruction pointers, memory addresses, load and store latency, branches, registers, and more.
- Customizable Event Configuration: Extend the built-in hardware events (e.g., cache-misses) with those specific to your underlying hardware substrate.
- Practical Examples: Jumpstart your implementation with our diverse collection of examples that demonstrate practical applications of the library.
Get up and running with perf-cpp in seconds:
```bash
# Clone the repository
git clone https://github.com/jmuehlig/perf-cpp.git
# Switch to the repository folder
cd perf-cpp
# Optional: Switch to the latest stable version
git checkout v0.8.1
# Build the library (in build/)
cmake . -B build -DBUILD_EXAMPLES=1
cmake --build build
# Optional: Build examples (in build/examples/bin)
cmake --build build --target examples
```
For detailed building instructions, including how to integrate perf-cpp into your CMake projects, visit our build guide.
Quickly set up hardware event monitoring:
```cpp
#include <iostream>
#include <perfcpp/event_counter.h>

/// Initialize the counter
auto counters = perf::CounterDefinition{};
auto event_counter = perf::EventCounter{ counters };

/// Specify hardware events to count
event_counter.add({"instructions", "cycles", "cache-misses"});

/// Run the workload
event_counter.start();
your_workload(); /// <-- Your code to profile
event_counter.stop();

/// Print the result to the console
const auto result = event_counter.result();
for (const auto& [event_name, value] : result)
{
  std::cout << event_name << ": " << value << std::endl;
}
```
Possible output:
```
instructions: 5.97298e+07
cycles: 5.02462e+08
cache-misses: 1.36517e+07
```
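Raw counts are often easier to interpret as ratios. The following is a minimal, self-contained sketch and not part of the perf-cpp API: it assumes the event names and values have been copied into a `std::map<std::string, double>`, and the helpers `lookup`, `instructions_per_cycle`, and `misses_per_kilo_instruction` are hypothetical. It derives instructions per cycle (IPC) and cache misses per 1,000 instructions (MPKI).

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

/// Hypothetical helper: returns the count for `name`, or std::nullopt if the event was not recorded.
std::optional<double> lookup(const std::map<std::string, double>& counts, const std::string& name)
{
  if (const auto it = counts.find(name); it != counts.end())
  {
    return it->second;
  }
  return std::nullopt;
}

/// IPC = instructions / cycles (std::nullopt if an event is missing or cycles is zero).
std::optional<double> instructions_per_cycle(const std::map<std::string, double>& counts)
{
  const auto instructions = lookup(counts, "instructions");
  const auto cycles = lookup(counts, "cycles");
  if (!instructions.has_value() || !cycles.has_value() || cycles.value() == 0.0)
  {
    return std::nullopt;
  }
  return instructions.value() / cycles.value();
}

/// MPKI = cache-misses / (instructions / 1000) (std::nullopt if an event is missing or instructions is zero).
std::optional<double> misses_per_kilo_instruction(const std::map<std::string, double>& counts)
{
  const auto misses = lookup(counts, "cache-misses");
  const auto instructions = lookup(counts, "instructions");
  if (!misses.has_value() || !instructions.has_value() || instructions.value() == 0.0)
  {
    return std::nullopt;
  }
  return misses.value() / (instructions.value() / 1000.0);
}
```

With the counts from the output above, IPC is roughly 0.12 and MPKI roughly 229.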
For further details, including how to count events in parallel settings, visit our guide on recording events.
Implement detailed sampling with control over the recorded content:
```cpp
#include <iostream>
#include <perfcpp/sampler.h>

/// Create the sampler
auto counters = perf::CounterDefinition{};
auto sampler = perf::Sampler{ counters };

/// Specify when a sample is recorded: every 4000th cycle
sampler.trigger("cycles", perf::Period{4000U});

/// Specify what metadata is included in a sample: time, CPU ID, instruction pointer
sampler.values()
  .time(true)
  .cpu_id(true)
  .instruction_pointer(true);

/// Run the workload
sampler.start();
your_workload(); /// <-- Your code to profile
sampler.stop();

/// Print the samples to the console
const auto samples = sampler.result();
for (const auto& sample_record : samples)
{
  const auto time = sample_record.time().value();
  const auto cpu_id = sample_record.cpu_id().value();
  const auto instruction = sample_record.instruction_pointer().value();

  std::cout
    << "Time = " << time << " | CPU = " << cpu_id
    << " | Instruction = 0x" << std::hex << instruction << std::dec
    << std::endl;
}
```
Possible output:
```
Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c
```
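Individual samples are most informative in aggregate: instruction pointers that appear in many samples indicate where the workload spends its cycles (in the output above, two addresses each appear twice). The following minimal sketch assumes the values read via `time()`, `cpu_id()`, and `instruction_pointer()` have been copied into a hypothetical plain `SampleData` struct (not a perf-cpp type), and ranks instruction pointers by how often they were sampled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

/// Hypothetical plain struct holding the values copied out of each sample record.
struct SampleData
{
  std::uint64_t time;
  std::uint32_t cpu_id;
  std::uint64_t instruction;
};

/// Counts how often each instruction pointer was sampled and returns (instruction, count)
/// pairs, most frequently sampled first; ties keep ascending address order.
std::vector<std::pair<std::uint64_t, std::size_t>> rank_instructions(const std::vector<SampleData>& samples)
{
  auto histogram = std::map<std::uint64_t, std::size_t>{};
  for (const auto& sample : samples)
  {
    ++histogram[sample.instruction];
  }

  /// std::map iterates in ascending address order; stable_sort preserves that order among ties.
  std::vector<std::pair<std::uint64_t, std::size_t>> ranked(histogram.begin(), histogram.end());
  std::stable_sort(ranked.begin(), ranked.end(),
                   [](const auto& lhs, const auto& rhs) { return lhs.second > rhs.second; });
  return ranked;
}
```

In practice, the vector would be filled inside the loop over `sampler.result()`; mapping the addresses back to functions requires symbol information, which this sketch does not cover.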
For further details, such as which data can be included in samples, visit our sampling guide.
We include a comprehensive collection of examples demonstrating the advanced capabilities of perf-cpp, such as counting events in parallel settings and sampling memory accesses.
All code examples are available in the examples/ folder.
- Full Documentation: Explore detailed guides on every feature of perf-cpp.
- Examples: Learn how to set up different features from code-examples.
- Changelog: Stay updated with the latest changes and improvements.
- Minimum Linux kernel version: `>= 4.0`
- Recommended Linux kernel version: `>= 5.13` (older kernels might not implement all features, such as sampling for latency).
- Installed `perf` (check if `perf stat -- ls` provides any output; otherwise, follow the instructions).
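To check the running kernel against the versions listed above, a minimal sketch can read the release string via POSIX `uname()`. The helpers `parse_kernel_release`, `at_least`, and `running_kernel_release` are hypothetical (not part of perf-cpp), and only the major and minor version numbers are compared.

```cpp
#include <cassert>
#include <cstdio>
#include <optional>
#include <string>
#include <sys/utsname.h>
#include <utility>

/// Parses "major.minor" from a release string such as "5.15.0-91-generic".
std::optional<std::pair<int, int>> parse_kernel_release(const std::string& release)
{
  int major = 0;
  int minor = 0;
  if (std::sscanf(release.c_str(), "%d.%d", &major, &minor) != 2)
  {
    return std::nullopt;
  }
  return std::make_pair(major, minor);
}

/// True if `version` >= `required`; std::pair compares (major, minor) lexicographically.
bool at_least(const std::pair<int, int>& version, const std::pair<int, int>& required)
{
  return version >= required;
}

/// Returns the release string of the running kernel, or std::nullopt if uname() fails.
std::optional<std::string> running_kernel_release()
{
  utsname info{};
  if (uname(&info) != 0)
  {
    return std::nullopt;
  }
  return std::string{ info.release };
}
```

For example, `at_least(*parse_kernel_release(*running_kernel_release()), {5, 13})` tells whether the recommended version is met.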
We welcome contributions and feedback to make perf-cpp even better. For feature requests, feedback, or bug reports, please reach out via our issue tracker or submit a pull request.
Alternatively, you can email me: jan.muehlig@tu-dortmund.de.
While perf-cpp is dedicated to providing developers with clear insights into application performance, it is part of a broader ecosystem of tools that facilitate performance analysis. Below is a non-exhaustive list of some other valuable profiling projects:
- PAPI offers access not only to CPU performance counters but also to a variety of other hardware components, including GPUs, I/O systems, and more.
- Intel's Instrumentation and Tracing Technology allows applications to manage the collection of trace data effectively when used in conjunction with Intel VTune Profiler.
- PerfEvent provides lightweight access to performance counters, facilitating streamlined performance monitoring.
- For those who prefer a more hands-on approach, the perf_event_open system call can be used directly, without any wrapper.
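For comparison, the following is a minimal sketch of calling the system call without a wrapper, following the pattern from the perf_event_open(2) man page: it counts user-space instructions retired by the calling thread while a workload runs. The names `perf_event_open_syscall` and `count_instructions` are hypothetical. Opening the counter can fail (e.g., in containers, in VMs without PMU access, or under a restrictive `perf_event_paranoid` setting), in which case the sketch returns `std::nullopt`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <linux/perf_event.h>
#include <optional>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/// glibc provides no wrapper for perf_event_open, so invoke the raw system call.
long perf_event_open_syscall(perf_event_attr* attr, pid_t pid, int cpu, int group_fd, unsigned long flags)
{
  return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/// Counts user-space instructions retired while `workload` runs; std::nullopt if the counter cannot be used.
template <typename Workload>
std::optional<std::uint64_t> count_instructions(Workload&& workload)
{
  perf_event_attr attr;
  std::memset(&attr, 0, sizeof(attr));
  attr.type = PERF_TYPE_HARDWARE;
  attr.size = sizeof(attr);
  attr.config = PERF_COUNT_HW_INSTRUCTIONS;
  attr.disabled = 1;       /// Start disabled; enabled explicitly below.
  attr.exclude_kernel = 1; /// Count user space only.
  attr.exclude_hv = 1;     /// Exclude the hypervisor.

  /// pid = 0, cpu = -1: measure the calling thread on any CPU; no event group, no flags.
  const auto fd = static_cast<int>(perf_event_open_syscall(&attr, 0, -1, -1, 0));
  if (fd == -1)
  {
    return std::nullopt;
  }

  ioctl(fd, PERF_EVENT_IOC_RESET, 0);
  ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
  workload();
  ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

  /// With the default read_format, read() returns a single 64-bit counter value.
  std::uint64_t count = 0;
  const bool ok = read(fd, &count, sizeof(count)) == static_cast<ssize_t>(sizeof(count));
  close(fd);
  if (!ok)
  {
    return std::nullopt;
  }
  return count;
}
```

Everything perf-cpp adds on top, such as event names, multiple events, and sampling, would have to be built from these primitives by hand.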
This is a non-exhaustive list of academic research papers and blog articles (feel free to add to it, e.g., via pull request, including your own work).
- Quantitative Evaluation of Intel PEBS Overhead for Online System-Noise Analysis (2017)
- Analyzing memory accesses with modern processors (2020)
- Precise Event Sampling on AMD Versus Intel: Quantitative and Qualitative Comparison (2023)
- Multi-level Memory-Centric Profiling on ARM Processors with ARM SPE (2024)