A safepoint-based sampling performance profiler for Ruby. Uses actual time deltas as sample weights to correct safepoint bias.
- Requires Ruby >= 3.4.0
- Output: pprof protobuf, collapsed stacks, or text report
- Modes: CPU time (per-thread) and wall time (with GVL/GC tracking)
gem install sperf
# Performance summary (wall mode, prints to stderr)
sperf stat ruby app.rb
# Profile to file
sperf record ruby app.rb # → sperf.data (pprof, cpu mode)
sperf record -m wall -o profile.pb.gz ruby server.rb # wall mode, custom output
# View results (report/diff require Go: https://go.dev/dl/)
sperf report # open sperf.data in browser
sperf report --top profile.pb.gz # print top functions to terminal
# Compare two profiles
sperf diff before.pb.gz after.pb.gz # open diff in browser
sperf diff --top before.pb.gz after.pb.gz # print diff to terminal

require "sperf"
# Block form — profiles and saves to file
Sperf.start(output: "profile.pb.gz", frequency: 500, mode: :cpu) do
# code to profile
end
# Manual start/stop
Sperf.start(frequency: 1000, mode: :wall)
# ...
data = Sperf.stop
Sperf.save("profile.pb.gz", data)

Profile without code changes (e.g., Rails):

SPERF_ENABLED=1 SPERF_MODE=wall SPERF_OUTPUT=profile.pb.gz ruby app.rb

Run `sperf help` for full documentation, or see the online manual.
Inspired by Linux perf — familiar subcommand interface for profiling workflows.
| Command | Description |
|---|---|
| `sperf record` | Profile a command and save to file |
| `sperf stat` | Profile a command and print summary to stderr |
| `sperf report` | Open pprof profile with `go tool pprof` (requires Go) |
| `sperf diff` | Compare two pprof profiles (requires Go) |
| `sperf help` | Show full reference documentation |
Ruby's sampling profilers collect stack traces at safepoints, not at the exact timer tick. Traditional profilers assign equal weight to every sample, so if a safepoint is delayed 5ms, that delay is invisible.
sperf uses time deltas as sample weights:
    Timer (signal or thread)           VM thread (postponed job)
    ────────────────────────           ─────────────────────────
    every 1/frequency sec:             at next safepoint:
      rb_postponed_job_trigger()  →      sperf_sample_job()
                                           time_now = read_clock()
                                           weight = time_now - prev_time
                                           record(backtrace, weight)
On Linux, the timer uses timer_create + signal delivery (no extra thread).
On other platforms, a dedicated pthread with nanosleep is used.
If a safepoint is delayed, the sample carries proportionally more weight. The total weight equals the total time, accurately distributed across call stacks.
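The effect of delta weighting can be illustrated with a small simulation. This is only a sketch of the accounting idea, not sperf's implementation: the `Sample` struct, the trace, and the stack names `fast_path` and `slow_c_call` are made up for illustration.

```ruby
# Each sample records the stack seen at a safepoint and when it was taken.
Sample = Struct.new(:stack, :time_ms)

# Hypothetical trace: a 1 ms timer, where the third sample was delayed
# until 8 ms because no safepoint was reached in between.
SAMPLES = [
  Sample.new("fast_path", 1.0),
  Sample.new("fast_path", 2.0),
  Sample.new("slow_c_call", 8.0),
  Sample.new("fast_path", 9.0),
  Sample.new("fast_path", 10.0)
].freeze

# Traditional accounting: every sample counts as one tick.
def equal_weights(samples)
  samples.each_with_object(Hash.new(0.0)) { |s, totals| totals[s.stack] += 1.0 }
end

# Delta accounting: every sample counts for the time since the previous one.
def delta_weights(samples, start_ms: 0.0)
  prev = start_ms
  samples.each_with_object(Hash.new(0.0)) do |s, totals|
    totals[s.stack] += s.time_ms - prev
    prev = s.time_ms
  end
end

# Convert absolute totals into percentage shares.
def shares(totals)
  sum = totals.values.sum
  totals.transform_values { |v| (100.0 * v / sum).round(1) }
end

puts "equal: #{shares(equal_weights(SAMPLES))}"
puts "delta: #{shares(delta_weights(SAMPLES))}"
```

In this made-up trace, the delayed sample accounts for 20% of the profile under equal weights but 60% under delta weights, and the delta totals add up to the full 10 ms of elapsed time.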
| Mode | Clock | What it measures |
|---|---|---|
| `cpu` (default) | `CLOCK_THREAD_CPUTIME_ID` | CPU time consumed (excludes sleep/I/O) |
| `wall` | `CLOCK_MONOTONIC` | Real elapsed time (includes everything) |
Use cpu to find what consumes CPU. Use wall to find what makes things slow (I/O, GVL contention, GC).
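The difference between the two clocks can be observed from plain Ruby, independent of sperf. The sketch below assumes a platform where Ruby defines `Process::CLOCK_THREAD_CPUTIME_ID` (Linux does); the `measure` helper is a hypothetical name used only for this illustration.

```ruby
# Run a block and report how much wall time and thread CPU time it took.
def measure
  wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_start  = Process.clock_gettime(Process::CLOCK_THREAD_CPUTIME_ID)
  yield
  {
    wall: Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_start,
    cpu:  Process.clock_gettime(Process::CLOCK_THREAD_CPUTIME_ID) - cpu_start
  }
end

# Sleeping advances the wall clock but consumes almost no CPU time.
sleeping = measure { sleep 0.1 }

# A compute loop advances both clocks.
spinning = measure { 200_000.times { |i| Math.sqrt(i) } }

puts format("sleep: wall=%.3fs cpu=%.3fs", sleeping[:wall], sleeping[:cpu])
puts format("spin:  wall=%.3fs cpu=%.3fs", spinning[:wall], spinning[:cpu])
```

A method dominated by sleep or I/O would therefore be nearly invisible in cpu mode and prominent in wall mode.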
sperf hooks GVL and GC events to attribute non-CPU time:
| Frame | Meaning |
|---|---|
| `[GVL blocked]` | Off-GVL time (I/O, sleep, C extension releasing GVL) |
| `[GVL wait]` | Waiting to reacquire the GVL (contention) |
| `[GC marking]` | Time in GC mark phase |
| `[GC sweeping]` | Time in GC sweep phase |
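To see these frames in practice, a small workload that mixes sleeping and CPU-bound threads could be profiled in wall mode, for example with `sperf record -m wall ruby workload.rb`. The script below is a hypothetical workload, not part of sperf. Going by the table above, the sleeping thread would be expected to show up under `[GVL blocked]` and the competing compute threads under `[GVL wait]`, though the exact breakdown will depend on the machine and Ruby version.

```ruby
# CPU-bound work that holds the GVL while it runs.
def busy(iterations)
  sum = 0
  iterations.times { |i| sum += i * i }
  sum
end

# One thread sleeps (off-GVL) while two others compete for the GVL.
def run_workload
  sleeper = Thread.new do
    sleep 0.05
    :slept
  end
  workers = 2.times.map { Thread.new { busy(200_000) } }
  [sleeper.value, workers.map(&:value)]
end

result = run_workload
puts "sleeper finished with #{result[0].inspect}, workers computed #{result[1].length} sums"
```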
- Safepoint-based, but accurate: Unlike signal-based profilers (e.g., stackprof), sperf samples at safepoints. Safepoint sampling is safer — no async-signal-safety constraints, so backtraces and VM state (GC phase, GVL ownership) can be inspected reliably. The downside is less precise sampling timing, but sperf compensates by using actual time deltas as sample weights — so the profiling results faithfully reflect where time is actually spent.
- GVL & GC visibility (wall mode): Attributes off-GVL time, GVL contention, and GC phases to the responsible call stacks with synthetic frames.
- Low overhead: No extra thread on Linux (signal-based timer). Sampling overhead is ~1-5 us per sample.
- pprof compatible: Output works with `go tool pprof`, speedscope, and other standard tools.
- No code changes required: Profile any Ruby program via CLI (`sperf stat ruby app.rb`) or environment variables (`SPERF_ENABLED=1`).
- perf-like CLI: Familiar subcommand interface (`record`, `stat`, `report`, `diff`) inspired by Linux perf.
- Method-level only: Profiles at the method level, not the line level. You can see which method is slow, but not which line within it.
- Ruby >= 3.4.0: Requires recent Ruby for the internal APIs used (postponed jobs, thread event hooks).
- POSIX only: Linux, macOS, etc. No Windows support.
- Safepoint sampling: Cannot sample inside C extensions or during long-running C calls that don't reach a safepoint. Time spent there is attributed to the next sample.
| Format | Extension | Use case |
|---|---|---|
| pprof (default) | `.pb.gz` | `sperf report`, `go tool pprof`, speedscope |
| collapsed | `.collapsed` | FlameGraph (`flamegraph.pl`), speedscope |
| text | `.txt` | Human/AI-readable flat + cumulative report |
Format is auto-detected from extension, or set explicitly with --format.
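As a rough illustration of extension-based detection, the mapping in the table could be expressed as below. This is a sketch, not sperf's actual code: `FORMAT_BY_EXTENSION` and `guess_format` are hypothetical names, and falling back to pprof for unrecognized extensions is an assumption based on pprof being the default format.

```ruby
# Extension-to-format mapping taken from the table above.
FORMAT_BY_EXTENSION = {
  ".pb.gz"     => :pprof,
  ".collapsed" => :collapsed,
  ".txt"       => :text
}.freeze

# Pick an output format. An explicit choice wins over the file extension.
# File.extname would return ".gz" for "profile.pb.gz", so the full suffix
# is matched with end_with? instead.
def guess_format(path, explicit: nil)
  return explicit.to_sym if explicit

  name = File.basename(path)
  FORMAT_BY_EXTENSION.each do |ext, format|
    return format if name.end_with?(ext)
  end
  :pprof # assumed fallback: pprof is the documented default
end

puts guess_format("profile.pb.gz")
puts guess_format("stacks.collapsed")
puts guess_format("profile.pb.gz", explicit: "text")
```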
MIT