Skip to content

imperatormk/metal-debug

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

metal-debug

Printf-style debugging for Metal compute shaders. No Xcode GPU debugger, no buffer dumps, no guessing.

Add #include "metal_debug.h" to your shader, drop in a debug buffer, see what every thread computed.

#include "metal_debug.h"

kernel void my_kernel(
    device float *A       [[buffer(0)]],
    device float *B       [[buffer(1)]],
    device float *C       [[buffer(2)]],
    device uint  *dbg_buf [[buffer(30)]],
    uint id [[thread_position_in_grid]]
) {
    float a = A[id], b = B[id];

    dbg_printf(dbg_buf, id, 0, a);       // log input A
    dbg_printf(dbg_buf, id, 1, b);       // log input B

    float result = a * b;
    dbg_watch_nan(dbg_buf, id, 2, result);  // only logs if NaN/Inf
    dbg_assert(dbg_buf, id, 3, result > 0); // GPU-side assertion

    C[id] = result;
}

Host side (ObjC):

MetalDebugSession *dbg = [[MetalDebugSession alloc] initWithDevice:device maxEntries:4096];
[encoder setBuffer:dbg.buffer offset:0 atIndex:30];
// ... dispatch ...
[dbg dump];

Output:

[metal-debug] 24 entries
  thread[0] 0: 3.5
  thread[0] 1: 2.0
  thread[1] 0: 1.2
  thread[1] 1: -0.5
  thread[1] 3: ASSERTION FAILED

Features

Feature GPU API Description
Printf dbg_printf(buf, tid, tag, val) Log float/int/uint/half/vec values
Conditional dbg_printf_if(buf, cond, tid, tag, val) Only log when condition is true
NaN watchpoint dbg_watch_nan(buf, tid, tag, val) Log only NaN/Inf values
Range watchpoint dbg_watch_range(buf, tid, tag, val, lo, hi) Log values outside range
Assertions dbg_assert(buf, tid, tag, cond) Record assertion failures
Breakpoints dbg_break(buf, tid, tag, cond) Set flag for host to detect
Stats dbg_stats(buf, tag, val) Cross-thread min/max/mean/count
Histogram dbg_histogram(buf, tag, val, lo, hi) Value distribution with bar chart
Named tags Preprocessor or host-side "loss" instead of tag=42
2D grid view Host-side Display values as threadgroup grid
Diff mode Host-side Compare two kernel runs
Zero-overhead disable #define METAL_DEBUG_DISABLE Compiles out all debug calls

How it works

  1. GPU side: metal_debug.h is a single header. Debug calls write (thread_id, tag, type, value) entries into a device buffer using atomic counters.

  2. Host side: MetalDebugSession allocates the buffer, binds it at slot 30, and reads/formats entries after execution.

  3. No recompilation needed when changing buffer size — max_entries is stored in the buffer itself and read by the GPU at runtime.

Build & test

git clone <this repo>
cd metal-debug
make test    # compiles + runs 9 test kernels on your GPU

Integration

ObjC / C++

Copy src/metal_debug.h into your project. Link runtime/MetalDebugSession.{h,m} into your app.

#import "MetalDebugSession.h"

MetalDebugSession *dbg = [[MetalDebugSession alloc]
    initWithDevice:device maxEntries:4096];
[dbg setName:@"loss" forTag:0];

[encoder setBuffer:dbg.buffer offset:0 atIndex:30];
// dispatch kernel...

[dbg dump];                    // all entries, sorted by thread
[dbg dumpTag:0];               // filter by tag
[dbg dumpGrid:0 width:8 height:8]; // 2D threadgroup view
[dbg dumpStats:0];             // min/max/mean
[dbg dumpHistogram:0 lo:0 hi:1]; // value distribution

if ([dbg breakpointHit])
    [dbg dumpBreakpoint];      // what went wrong

[dbg reset];                   // reuse for next dispatch

Swift

import Metal

let dbg = MetalDebugSession(device: device, maxEntries: 4096)
encoder.setBuffer(dbg.buffer, offset: 0, index: 30)
// dispatch kernel...
dbg.dump()

See examples/SwiftDemo/ for a complete Swift example.

Python (PyTorch MPS / Triton)

from metal_debug import MetalDebugSession

dbg = MetalDebugSession(max_entries=4096)
# pass dbg.tensor as buffer(30) to your Metal/Triton kernel
torch.mps.synchronize()
dbg.dump()

Interactive TUI

Explore debug traces interactively — filter, navigate, see grid views and stats live:

pip install textual

# Launch with demo data
python python/tui.py --demo

# Launch with a debug buffer dump
python python/tui.py trace.bin

Or from Python after a kernel dispatch:

dbg.explore(grid_width=8, grid_height=8)

Keyboard shortcuts:

Key Action
↑/↓ Navigate entries
g Show 2D grid for selected tag
a Show assertions only
b Jump to breakpoint thread
c Clear filters
m Toggle mouse (enable copy/paste)
escape Focus table from filter input
q Quit

Source preprocessor

Auto-inject the debug buffer parameter into kernel signatures and use string tags:

python3 src/metal_debug_preprocess.py my_kernel.metal -o my_kernel_debug.metal
xcrun metal -I path/to/metal-debug/src -o out.metallib my_kernel_debug.metal

Before:

kernel void foo(device float *A [[buffer(0)]], uint id [[thread_position_in_grid]]) {
    dbg(id, "value", A[id]);
}

After preprocessing:

kernel void foo(device float *A [[buffer(0)]], uint id [[thread_position_in_grid]],
                device uint *_dbg_buf [[buffer(30)]]) {
    dbg(id, 47248/*value*/, A[id]);
}

Convenience macros

If you use DBG_PARAM in your kernel signature, the short macros work:

kernel void my_kernel(device float *A [[buffer(0)]], DBG_PARAM,
                      uint id [[thread_position_in_grid]]) {
    dbg(id, 0, A[id]);              // printf
    dbg_if(id == 0, id, 1, A[id]);  // conditional
    dbg_nan(id, 2, A[id]);          // NaN watchpoint
    dbg_check(id, 3, A[id] > 0);   // assertion
    dbg_stat(0, A[id]);             // stats accumulator
    dbg_hist(0, A[id], 0, 100);     // histogram
    dbg_brk(id, 4, A[id] < 0);     // breakpoint
}

License

MIT

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors