
Lightweight recording and sampling of performance counters for specific code segments directly from your C++ application.



jmuehlig/perf-cpp


perf-cpp: Access Performance Counters from C++ Applications

Welcome to perf-cpp, a robust C++ library designed to streamline the use of the Linux perf subsystem, providing direct access to hardware performance counters. Many modern profiling tools cannot restrict profiling precisely to specific code segments, nor can they associate profiled data, such as memory addresses, with application-specific details. With perf-cpp, you control profiling directly within your application and can process the profiled data in the same program.

Key Features

  • Count Hardware Events: Integrate performance monitoring seamlessly into your development process. Directly interact with hardware counters to focus on critical code segments.
  • Sampling: Leverage sampling to gather performance data periodically, e.g., instruction pointers, memory addresses, load and store latency, branches, registers, and more.
  • Customizable Event Configuration: Extend the built-in hardware events (e.g., cache-misses) with events specific to your underlying hardware.
  • Practical Examples: Jumpstart your implementation with our diverse collection of examples that demonstrate practical applications of the library.
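As an illustration of what a hardware-specific event looks like, the sketch below packs an Intel x86 core event into the raw config value that perf uses for raw (PERF_TYPE_RAW) events. It assumes the Intel core PMU layout the kernel publishes under, e.g., /sys/bus/event_source/devices/cpu/format/ (event in bits 0-7, umask in bits 8-15, cmask in bits 24-31); other vendors and PMUs use different layouts. The helper intel_raw_config is hypothetical and not part of perf-cpp; see the perf-cpp documentation for how such a value is registered as a named event.

```cpp
#include <cstdint>

// Hypothetical helper (not part of perf-cpp): encodes an Intel core PMU event
// as a raw perf config value, following the layout that Intel machines expose
// in /sys/bus/event_source/devices/cpu/format/:
//   event = config:0-7, umask = config:8-15, cmask = config:24-31
constexpr std::uint64_t intel_raw_config(std::uint8_t event_select,
                                         std::uint8_t umask,
                                         std::uint8_t cmask = 0)
{
    return static_cast<std::uint64_t>(event_select)
         | (static_cast<std::uint64_t>(umask) << 8U)
         | (static_cast<std::uint64_t>(cmask) << 24U);
}

// MEM_LOAD_RETIRED.L3_MISS on Skylake-family cores: event 0xD1, umask 0x20.
constexpr std::uint64_t mem_load_retired_l3_miss = intel_raw_config(0xD1, 0x20);
static_assert(mem_load_retired_l3_miss == 0x20D1U, "umask lands in bits 8-15");
```

The same value can be cross-checked on the command line, since perf accepts raw events in the form perf stat -e r20d1.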

Quick Start

Get up and running with perf-cpp in seconds:

# Clone the repository
git clone https://github.com/jmuehlig/perf-cpp.git

# Switch to the repository folder
cd perf-cpp

# Optional: Switch to the latest stable version
git checkout v0.8.1

# Build the library (in build/)
cmake . -B build -DBUILD_EXAMPLES=1
cmake --build build

# Optional: Build examples (in build/examples/bin)
cmake --build build --target examples

For detailed building instructions, including how to integrate perf-cpp into your CMake projects, visit our build guide.

Usage Examples

Count Hardware Events

Quickly set up hardware event monitoring:

#include <iostream>
#include <perfcpp/event_counter.h>

/// Initialize the counter
auto counters = perf::CounterDefinition{};
auto event_counter = perf::EventCounter{ counters };

/// Specify hardware events to count
event_counter.add({"instructions", "cycles", "cache-misses"});

/// Run the workload
event_counter.start();
your_workload(); /// <-- Your code to profile
event_counter.stop();

/// Print the result to the console
const auto result = event_counter.result();
for (const auto& [event_name, value] : result)
{
    std::cout << event_name << ": " << value << std::endl;
}

Possible output:

instructions: 5.97298e+07
cycles: 5.02462e+08
cache-misses: 1.36517e+07
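Raw counts are often easier to interpret as ratios. The sketch below derives instructions per cycle (IPC) and cache misses per 1,000 instructions (MPKI) from event counts; it uses a plain std::map as a hypothetical stand-in for the counter result rather than perf-cpp's own result type. Applied to the possible output above, it gives an IPC of about 0.119 and roughly 228.6 cache misses per 1,000 instructions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a counter result: event name -> counted value.
using EventCounts = std::map<std::string, double>;

// Instructions per cycle: instructions retired, on average, per cycle.
double instructions_per_cycle(const EventCounts& counts)
{
    const double cycles = counts.at("cycles");  // throws if the event is missing
    assert(cycles > 0.0);
    return counts.at("instructions") / cycles;
}

// Cache misses per kilo-instruction (MPKI).
double misses_per_kilo_instruction(const EventCounts& counts)
{
    const double instructions = counts.at("instructions");
    assert(instructions > 0.0);
    return counts.at("cache-misses") / instructions * 1000.0;
}
```

An IPC well below 1 combined with a high MPKI, as in this output, usually suggests a memory-bound workload.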

For further details, including how to count events in parallel settings, visit our guide on recording events.

Record Samples

Implement detailed sampling with control over the recorded content:

#include <iostream>
#include <perfcpp/sampler.h>

/// Create the sampler
auto counters = perf::CounterDefinition{};
auto sampler = perf::Sampler{ counters };

/// Specify when a sample is recorded: every 4000th cycle
sampler.trigger("cycles", perf::Period{4000U});

/// Specify what metadata is included in a sample: time, CPU ID, instruction pointer
sampler.values()
    .time(true)
    .cpu_id(true)
    .instruction_pointer(true);

/// Run the workload
sampler.start();
your_workload(); /// <-- Your code to profile
sampler.stop();

/// Print the samples to the console
const auto samples = sampler.result();
for (const auto& sample_record : samples)
{
    const auto time = sample_record.time().value();
    const auto cpu_id = sample_record.cpu_id().value();
    const auto instruction = sample_record.instruction_pointer().value();
    
    std::cout 
        << "Time = " << time << " | CPU = " << cpu_id
        << " | Instruction = 0x" << std::hex << instruction << std::dec
        << std::endl;
}

Possible output:

Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c 
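Because a sample is recorded once every period occurrences of the trigger event, the period directly controls how many samples a run produces. The helper below is a back-of-the-envelope sketch of our own (not perf-cpp API): if the profiled code ran about 5.02e8 cycles, an illustrative figure borrowed from the counting example, a period of 4000 yields roughly 125,615 samples. The kernel may record fewer, for example when it throttles sampling that exceeds the kernel.perf_event_max_sample_rate limit.

```cpp
#include <cstdint>

// Rough estimate of the number of samples produced by a period-based trigger:
// one sample per `period` occurrences of the trigger event (rounded down).
// Throttling or lost records can only lower the actual number.
// A zero period yields no estimate (0).
constexpr std::uint64_t expected_samples(std::uint64_t event_count, std::uint64_t period)
{
    return period == 0U ? 0U : event_count / period;
}

// 502,462,000 cycles sampled every 4000th cycle -> 125,615 samples.
static_assert(expected_samples(502'462'000U, 4000U) == 125'615U, "cycles / period");
```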

For further details, such as which metrics can be included in samples, visit our sampling guide.
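A common next step is to aggregate samples, for instance to find the instructions that were sampled most often. The sketch below assumes a hypothetical, simplified Sample struct holding only the instruction pointer; with perf-cpp, it would be filled from sample_record.instruction_pointer().value(). Mapping the resulting addresses to functions would additionally require symbol information.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified sample: only the sampled instruction pointer.
struct Sample
{
    std::uint64_t instruction_pointer;
};

using HotSpot = std::pair<std::uint64_t, std::size_t>;  // (address, #samples)

// Counts how often each instruction pointer was sampled and returns the
// addresses ordered from most to least frequently sampled (ties by address).
std::vector<HotSpot> hot_instructions(const std::vector<Sample>& samples)
{
    std::unordered_map<std::uint64_t, std::size_t> histogram;
    for (const auto& sample : samples)
    {
        ++histogram[sample.instruction_pointer];
    }

    std::vector<HotSpot> ranked(histogram.begin(), histogram.end());
    std::sort(ranked.begin(), ranked.end(), [](const HotSpot& lhs, const HotSpot& rhs) {
        return lhs.second != rhs.second ? lhs.second > rhs.second : lhs.first < rhs.first;
    });
    return ranked;
}
```

Applied to the four samples in the possible output above, both addresses would appear with two samples each.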

Advanced Examples

We include a comprehensive collection of examples demonstrating the advanced capabilities of perf-cpp, such as counting events in parallel settings and sampling memory accesses.

All code examples are available in the examples/ folder.
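Since perf-cpp runs inside the application, data addresses recorded when sampling memory accesses can be related to the application's own data structures. The following sketch shows one way to do that: a hypothetical MemoryRegions registry (not part of perf-cpp) records the address range of each named object and looks up the object containing a sampled address. It assumes registered ranges do not overlap; find takes std::uintptr_t, to which 64-bit sampled addresses convert without loss on 64-bit Linux.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical registry (not part of perf-cpp) that maps named application
// objects to the address ranges they occupy. Ranges must not overlap.
class MemoryRegions
{
public:
    void add(const void* begin, std::size_t size, std::string name)
    {
        assert(size > 0U);
        const auto start = reinterpret_cast<std::uintptr_t>(begin);
        _regions[start] = Region{start + size, std::move(name)};
    }

    // Returns the name of the object whose range [start, end) contains `address`.
    std::optional<std::string> find(std::uintptr_t address) const
    {
        auto it = _regions.upper_bound(address);  // first region starting after `address`
        if (it == _regions.begin())
        {
            return std::nullopt;  // `address` lies below every registered region
        }
        --it;  // last region starting at or before `address`
        if (address < it->second.end)
        {
            return it->second.name;
        }
        return std::nullopt;
    }

private:
    struct Region
    {
        std::uintptr_t end;  // one past the last byte
        std::string name;
    };

    std::map<std::uintptr_t, Region> _regions;  // keyed by start address
};
```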

Further Reading

  • Full Documentation: Explore detailed guides on every feature of perf-cpp.
  • Examples: Learn how to set up different features from code examples.
  • Changelog: Stay updated with the latest changes and improvements.

System Requirements

  • Minimum Linux kernel version: >= 4.0
  • Recommended Linux kernel version: >= 5.13 (older kernels may not support all features, e.g., latency sampling)
  • perf installed (check whether perf stat -- ls produces any output; otherwise, follow the instructions)
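The kernel-version requirement can also be checked from within a program. The sketch below uses our own helper functions (not part of perf-cpp): it reads the running kernel's release string via uname(2), parses its leading major.minor part, and compares it against a required version such as the minimum (4.0) or recommended (5.13) version listed above.

```cpp
#include <sys/utsname.h>  // uname(2)

#include <cassert>
#include <cstdio>
#include <optional>
#include <utility>

using KernelVersion = std::pair<int, int>;  // (major, minor)

// Parses the leading "major.minor" of a release string such as "5.15.0-91-generic".
std::optional<KernelVersion> parse_kernel_version(const char* release)
{
    assert(release != nullptr);
    int major = 0;
    int minor = 0;
    if (std::sscanf(release, "%d.%d", &major, &minor) != 2)
    {
        return std::nullopt;
    }
    return KernelVersion{major, minor};
}

// Reads the version of the running kernel via uname(2).
std::optional<KernelVersion> running_kernel_version()
{
    utsname info{};
    if (uname(&info) != 0)
    {
        return std::nullopt;
    }
    return parse_kernel_version(info.release);
}

// std::pair compares lexicographically, so (5, 4) < (5, 13) < (6, 1).
bool at_least(const KernelVersion& version, const KernelVersion& required)
{
    return version >= required;
}
```

A program could, for instance, print a warning at startup when running_kernel_version() returns a version for which at_least(version, KernelVersion{5, 13}) is false.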

Contribute and Engage with Us

We welcome contributions and feedback to make perf-cpp even better. For feature requests, feedback, or bug reports, please reach out via our issue tracker or submit a pull request.

Alternatively, you can email me: jan.muehlig@tu-dortmund.de.


Other Noteworthy Profiling Projects

While perf-cpp is dedicated to providing developers with clear insights into application performance, it is part of a broader ecosystem of tools that facilitate performance analysis. Below is a non-exhaustive list of some other valuable profiling projects:

  • PAPI offers access not only to CPU performance counters but also to a variety of other hardware components including GPUs, I/O systems, and more.
  • Intel's Instrumentation and Tracing Technology allows applications to manage the collection of trace data effectively when used in conjunction with Intel VTune Profiler.
  • PerfEvent provides lightweight access to performance counters, facilitating streamlined performance monitoring.
  • For those who prefer a more hands-on approach, the perf_event_open system call can be utilized directly without any wrappers.
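For comparison, the perf_event_open system call that underlies the Linux perf subsystem can be used directly. The following minimal, Linux-only sketch (our own code, not taken from perf-cpp) follows the pattern of the perf_event_open(2) man page: it counts the user-space instructions of the calling thread around a workload and returns std::nullopt when the kernel refuses the counter, for example under a restrictive perf_event_paranoid setting or inside a container.

```cpp
#include <linux/perf_event.h>  // perf_event_attr, PERF_* constants
#include <sys/ioctl.h>         // ioctl
#include <sys/syscall.h>       // SYS_perf_event_open
#include <unistd.h>            // syscall, read, close

#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// glibc offers no wrapper for perf_event_open, so the raw system call is used.
static int perf_event_open_syscall(perf_event_attr* attr, pid_t pid, int cpu, int group_fd, unsigned long flags)
{
    return static_cast<int>(syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags));
}

// Counts user-space instructions retired by the calling thread while `workload`
// runs. Returns std::nullopt if the kernel refuses the counter (e.g., due to
// perf_event_paranoid, seccomp policies in containers, or a missing PMU).
template <typename Workload>
std::optional<std::uint64_t> count_instructions(Workload&& workload)
{
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;        // created stopped; enabled right before the workload
    attr.exclude_kernel = 1;  // user space only (permitted unprivileged if perf_event_paranoid <= 2)
    attr.exclude_hv = 1;

    // pid = 0, cpu = -1: measure the calling thread on whichever CPU it runs.
    const int fd = perf_event_open_syscall(&attr, 0, -1, -1, 0);
    if (fd == -1)
    {
        return std::nullopt;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    workload();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    std::uint64_t count = 0;
    const ssize_t bytes = read(fd, &count, sizeof(count));
    close(fd);
    if (bytes != static_cast<ssize_t>(sizeof(count)))
    {
        return std::nullopt;
    }
    return count;
}
```

A call such as count_instructions([] { your_workload(); }) mirrors the start()/stop() bracket of the counting example; a wrapper like perf-cpp takes care of event-name lookup, multiple counters, and sampling on top of this.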

Resources about (Perf-) Profiling

This is a non-exhaustive list of academic research papers and blog articles (feel free to add to it via pull request, including your own work).

Academic Papers

Blog Posts