ANE probe tests + training telemetry for M5 optimization #2
Merged
maderix merged 2 commits into maderix:main on Mar 2, 2026
Conversation
Four standalone probe tests to characterize the M5 ANE:

- test_weight_reload: Can weights be hot-swapped via unload+load without recompilation?
- test_perf_stats: Enumerate _ANEPerformanceStats methods/properties and hardware counters
- test_qos_sweep: Measure compile/load/eval latency across QoS 0-63
- test_ane_advanced: Probe SharedEvents, weightsBuffer IOSurface, procedureIndex, VirtualClient

Training telemetry (train_large.m):

- JSON lines to stderr with per-step timing breakdown and per-batch TFLOPS metrics
- Enables external monitoring tools to visualize ANE utilization in real time

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
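The JSON-lines telemetry described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual train_large.m code; the field names (`step`, `step_ms`, `tflops`) are assumptions, not the real schema.

```python
# Minimal sketch of per-step JSON-lines telemetry, assuming
# illustrative field names (step, step_ms, tflops) rather than
# the actual schema emitted by train_large.m.
import json
import sys

def log_step(step, flops, elapsed_s, stream=sys.stderr):
    """Emit one telemetry record as a single JSON line on `stream`."""
    record = {
        "step": step,
        "step_ms": elapsed_s * 1e3,
        "tflops": flops / elapsed_s / 1e12,
    }
    stream.write(json.dumps(record) + "\n")
    return record

# Example: a 2 GFLOP batch that took 1.25 ms corresponds to 1.6 TFLOPS
r = log_step(0, flops=2e9, elapsed_s=1.25e-3)
```

One record per line keeps the stream trivially tail-able, so an external monitor can parse each line independently as it arrives.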
Key findings from running all 4 probes on Apple M5:

- Weight reload (unload+load after file overwrite) does NOT work — weights are baked at compile time, output is identical regardless of file changes
- weightsBuffer IOSurface parameter also does not override compiled weights
- All QoS values 0-63 work, no measurable latency difference (~0.07 ms/eval)
- _ANEPerformanceStats has hwExecutionTime (ns) + perfCounterData
- _ANEChainingRequest supports loopback execution (output→input chaining)
- _ANEClient has real-time eval path and chaining preparation methods
- procedureIndex 0-15 all succeed on single-procedure models

Fixed probe tests to use fp32 I/O with cast (matching the inmem_peak pattern) and 64+ channel kernels (ANE minimum size requirement). Full analysis in training/m5result.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
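The per-QoS latency comparison above rests on timing many evaluations and taking a robust summary statistic. A hedged sketch of that measurement harness, with a stand-in workload instead of a real ANE eval call:

```python
# Sketch of the latency measurement behind a QoS sweep: time many
# evals and report the median, which is robust to scheduler noise.
# `evaluate` is a stand-in workload here, not a real ANE call.
import statistics
import time

def median_latency_ms(evaluate, iters=100):
    """Run `evaluate` `iters` times and return the median wall-clock
    latency in milliseconds."""
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        evaluate()
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)

# Example: measure a trivial CPU workload
lat = median_latency_ms(lambda: sum(range(1000)), iters=50)
```

With a harness like this, "no measurable difference across QoS 0-63" means the per-QoS medians all land within the noise floor of a single configuration.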
Owner
Hi, thanks for the probe tests contribution. Can you please share a screenshot or output from your system as well? Curious to know the results from your setup.
Edit: my bad, just saw the results.md 😅
maderix approved these changes on Mar 2, 2026
Summary
- test_weight_reload — tests whether weights can be hot-swapped via unload+load without recompilation (would eliminate the compilation bottleneck entirely)
- test_perf_stats — enumerates _ANEPerformanceStats methods/properties and hardware counters
- test_qos_sweep — measures compile/load/eval latency across QoS 0-63
- test_ane_advanced — probes _ANESharedEvents, weightsBuffer IOSurface, procedureIndex, _ANEVirtualClient, _ANEChainingRequest
- train_large.m — per-step timing breakdown and per-batch TFLOPS metrics, enabling real-time monitoring of ANE utilization during training
- make probes builds all tests; make clean removes them

Motivation
The current training pipeline achieves only 11.2% ANE utilization (1.78 of 15.8 TFLOPS) due to the compilation bottleneck — every weight update requires recompiling all 60 weight-bearing kernels via an exec() restart. These probes determine whether faster paths exist (weight reload, weightsBuffer, async compile) before modifying the core training loop.

Test plan
- make probes compiles all 4 test programs
- make train_large compiles with telemetry additions
- ./train_large 2>telem.jsonl

🤖 Generated with Claude Code
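Once telemetry has been captured via `2>telem.jsonl`, a downstream tool only needs to parse one JSON object per line. A minimal sketch of such a consumer, assuming a `tflops` field per record (the real schema may differ) and the 15.8 TFLOPS peak cited in the motivation:

```python
# Sketch of consuming the captured telemetry: parse each JSON line
# and report mean TFLOPS plus utilization against the 15.8 TFLOPS
# peak cited in the motivation. The `tflops` field name is assumed.
import json

PEAK_TFLOPS = 15.8  # M5 ANE peak figure from the motivation section

def summarize(lines):
    """Return (mean TFLOPS, utilization %) over non-empty JSONL records."""
    tflops = [json.loads(line)["tflops"] for line in lines if line.strip()]
    mean = sum(tflops) / len(tflops)
    return mean, 100.0 * mean / PEAK_TFLOPS

# Example: two steps sustaining 1.78 TFLOPS, the figure quoted above
mean, util = summarize(['{"step": 0, "tflops": 1.78}',
                        '{"step": 1, "tflops": 1.78}'])
# 1.78 / 15.8 gives roughly 11% utilization, consistent with the
# 11.2% figure in the motivation section
```

In practice the same loop would read the file incrementally (or follow it tail-style) so the utilization plot updates live during training.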