Skip to content

Cpu profiler#4

Merged
olaugh merged 11 commits intomainfrom
cpu-profiler
Jan 20, 2026
Merged

Cpu profiler#4
olaugh merged 11 commits intomainfrom
cpu-profiler

Conversation

@olaugh
Copy link
Owner

@olaugh olaugh commented Jan 20, 2026

Summary

  • Add 68k CPU cycle profiler that hooks into Genesis Plus GX's native CPU execution
  • Support for symbol loading from ELF files via nm
  • Sample-based profiling option (1/N sampling) for reduced overhead
  • Function attribution with call counting and cycles per call
  • Simple and CallStack profiling modes

Features

  • Profiler::AddFunction() - manually add function ranges
  • Profiler::LoadSymbolsFromELF() - auto-load symbols from ELF
  • Profiler::Start(ProfileOptions) - start with optional sampling rate
  • Profiler::GetTotalCycles() - total 68k cycles counted
  • Profiler::GetAllStats() - per-function cycle and call stats
  • Profiler::PrintReport() - formatted profiling report

Test coverage

Added 13 tests covering:

  • Start/stop/reset lifecycle
  • Cycle counting accuracy
  • Function stats population
  • Sample-based profiling accuracy (within 5% of full profiling)
  • Deterministic results across repeated runs
  • Edge cases (no symbols, double start, stop without start)

Other changes

  • Fix command injection vulnerability in LoadSymbolsFromELF (shell-quote paths)
  • Platform-specific -lm linking (Linux only, not needed on macOS)

🤖 Generated with Claude Code

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds a 68k CPU cycle profiler for the Genesis Plus GX emulator test harness. The profiler hooks into the emulator's native CPU execution to collect per-function cycle counts and call statistics without requiring ROM modifications.

Changes:

  • Added CPU cycle profiler with support for symbol loading from ELF files
  • Implemented sample-based profiling for reduced overhead (configurable 1/N sampling rate)
  • Added 13 comprehensive tests covering lifecycle, accuracy, sampling, and edge cases

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
vendor/genplusgx/debug/cpuhook.h Changed cpu_hook from implicit definition to extern declaration for proper linkage
include/profiler.h New profiler API header with documentation for ProfileMode, ProfileOptions, and Profiler class
src/profiler.cpp Core profiler implementation with symbol loading, cycle tracking, and reporting
tests/profiler_test.cpp Comprehensive test suite covering basic functionality, sampling, accumulation, consistency, and edge cases
BUILD.bazel Build configuration updates to include CPU hooks, profiler sources, and platform-specific linking

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

olaugh and others added 6 commits January 20, 2026 12:19
- Add ProfileOptions struct with mode and sample_rate
- Add Start(ProfileOptions) overload
- When sample_rate > 1, only do function lookup every Nth instruction
- Cycles are scaled up to account for skipped samples

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests for CPU cycle profiler using prime_sieve ROM:
- Start/stop state management
- Cycle counting accuracy
- Function stats population
- Reset clears stats
- Sample-based profiling (rate reporting, cycle accuracy, overhead reduction)
- Accumulation across multiple runs
- Consistent deterministic results
- Edge cases (no symbols, double start, stop without start)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add validation for invalid function ranges (end_addr <= start_addr)
- Fix off-by-one error in ROM boundary check for ReadWord
- Add overflow guards for address arithmetic
- Fix sampling to accumulate actual cycles instead of scaling
- Add max call stack depth (256) to prevent unbounded growth
- Shell-quote paths in popen calls to prevent command injection
- Update Bazel deps for Bazel 9.0 compatibility (rules_cc load statement)
- Add noinline to prime_sieve functions for real symbol addresses
- Update profiler_test to use real ELF-extracted function addresses
- Fix documentation (LoadSymbolsFromELF method name, format comments)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (2)

src/profiler.cpp:199

  • The Reset() method should also reset sample_counter_ to ensure consistent behavior when profiling is restarted after a reset. Without this, the sampling phase will be out of sync if Reset() is called during profiling.
void Profiler::Reset() {
    for (auto& kv : stats_) {
        kv.second = FunctionStats();
    }
    call_stack_.clear();
    total_cycles_ = 0;
    pending_cycles_ = 0;
    last_pc_ = 0;
    if (running_) {
        last_cycles_ = m68k.cycles;
    }
}

src/profiler.cpp:271

  • When sampling is enabled, skipping instructions without updating last_pc_ can cause incorrect call count attribution. The call detection logic at line 268 will compare the current PC with a PC from multiple instructions ago, potentially counting normal control flow within a function as function entries. This could artificially inflate call_count values. Consider either: (1) updating last_pc_ on every call to OnExecute regardless of sampling, or (2) disabling call counting when sampling is active.
    // Sampling: only do expensive work every Nth instruction
    if (sample_rate_ > 1) {
        sample_counter_++;
        if (sample_counter_ < sample_rate_) {
            return;  // Don't update last_pc_ - preserve it for call detection
        }
        sample_counter_ = 0;
        // Use accumulated cycles since last sample (more accurate than scaling)
        delta = pending_cycles_;
        pending_cycles_ = 0;
    }

    // Attribute cycles to current function
    const FunctionDef* func = LookupFunction(pc);
    if (func) {
        auto& s = stats_[func->start_addr];
        s.cycles_exclusive += delta;

        // Count function entry (PC moved into this function from outside)
        if (last_pc_ != 0) {
            const FunctionDef* last_func = LookupFunction(last_pc_);
            if (last_func != func) {
                s.call_count++;
            }
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Bazel 9.0 removed built-in cc_library/cc_test rules, but googletest
1.14.0 hasn't been updated yet. Pin to Bazel 8.5.1 until the ecosystem
catches up.

- Add .bazelversion file pinning to 8.5.1
- Revert MODULE.bazel to original dependency versions
- Remove rules_cc load statement (not needed in Bazel 8.x)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

olaugh and others added 2 commits January 20, 2026 13:36
- Move pending_cycles_ accumulation inside sampling branch to prevent
  unbounded growth when sampling is disabled (sample_rate_ == 1)
- Update last_pc_ on every instruction, even skipped samples, so
  CallStack mode can correctly read opcodes for call/return detection
- Add comments documenting that call counts and CallStack inclusive
  timing are less accurate when sampling is enabled

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Explicitly include headers for types directly used:
- <chrono> for std::chrono
- <iostream> for std::cout
- <vector> for std::vector

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/profiler.cpp Outdated
Comment on lines +129 to +130
// Note: This differs from nm output (used by LoadSymbolsFromELF) which uses
// hex for both address and size.
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment about parsing format differs between LoadSymbolsFromELF and LoadSymbolsFromFile. In LoadSymbolsFromELF (line 91), the comment says size is hex, while in LoadSymbolsFromFile (lines 124-130), it explicitly states size is decimal. However, LoadSymbolsFromFile uses %u (decimal) while LoadSymbolsFromELF uses %x (hex) for size. The documentation is correct but could be clearer to emphasize this critical difference to avoid confusion when maintaining or using these functions.

Suggested change
// Note: This differs from nm output (used by LoadSymbolsFromELF) which uses
// hex for both address and size.
//
// Parsing note:
// - This function parses the address as hex (%x) and the size as decimal (%u).
// - This intentionally differs from LoadSymbolsFromELF (nm output), which uses
// hex for both address and size (%x for each).

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

olaugh and others added 2 commits January 20, 2026 14:03
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The comment now explicitly notes that LoadSymbolsFromFile uses hex for
address and decimal for size, while LoadSymbolsFromELF (nm output) uses
hex for both.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@olaugh olaugh merged commit 0353e24 into main Jan 20, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants