feat(bootstrap): host performance model and cycle estimator (Wave 42, R-HS-4)#793

Open
gHashTag wants to merge 3 commits into master from feat/wave-42/host-perf

@gHashTag
Owner

Wave 42 — R-HS-4: Host performance model and cycle estimator

Closes #791

What changed

  • New file bootstrap/src/host/perf.rs:

    • EngineConfig — cycles, DMA beats, BRAM utilization, throughput estimation
    • LayerEstimate — per-layer DMA prefetch + compute + DMA drain breakdown
    • PerformanceEstimate — full report across all layers
    • Constants: DATA_WIDTH=54, BRAM_DEPTH=4096, DDR_BEAT=64 bits
    • 19 inline unit tests
  • Updated bootstrap/src/host/mod.rs: pub mod perf + re-exports

  • Updated bootstrap/src/main.rs:

    • New CLI Commands::HostPerf with --clock-mhz flag (default 66 MHz)
    • run_host_perf() prints a single-line estimate
  • New file bootstrap/tests/host_perf.rs: 23 integration tests

  • Updated docs/NOW.md: wave-42 entry
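
As a rough illustration of the LayerEstimate / PerformanceEstimate split described above, the sketch below models a per-layer breakdown that rolls up into a full report. It is not the code in bootstrap/src/host/perf.rs: the field names (`prefetch_cycles`, `compute_cycles`, `drain_cycles`, `layers`) and the assumption that the three phases run back to back without overlap are hypothetical.

```rust
/// Hypothetical per-layer breakdown; field names are assumptions,
/// not taken from perf.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LayerEstimate {
    /// Cycles spent on the DMA prefetch of the layer's weights.
    pub prefetch_cycles: u64,
    /// Cycles spent on the layer's compute phase.
    pub compute_cycles: u64,
    /// Cycles spent on the DMA drain of the layer's results.
    pub drain_cycles: u64,
}

impl LayerEstimate {
    /// Assumes the three phases are strictly sequential (no overlap).
    pub fn total_cycles(&self) -> u64 {
        self.prefetch_cycles + self.compute_cycles + self.drain_cycles
    }
}

/// Hypothetical full report: one LayerEstimate per layer.
#[derive(Debug, Clone, Default)]
pub struct PerformanceEstimate {
    pub layers: Vec<LayerEstimate>,
}

impl PerformanceEstimate {
    /// Sums the per-layer totals across all layers.
    pub fn total_cycles(&self) -> u64 {
        self.layers.iter().map(LayerEstimate::total_cycles).sum()
    }
}
```

If the real model overlaps prefetch with compute, the per-layer total would be a maximum over phases rather than a sum; the sketch deliberately uses the simpler sequential form.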

Test results

77/77 inline host tests pass
25/25 host_driver integration tests pass
21/21 host_irq integration tests pass
20/20 host_engine integration tests pass
23/23 host_perf integration tests pass
Total: 42 new tests (19 inline + 23 integration), zero regressions

Output example

$ t27c host-perf --num-layers 3 --neurons 64 --chunks 8 --clock-mhz 66
OK layers=3 neurons=64 chunks=8 total_cycles=3072 total_weight_words=1536 bram_pct=37.5% dma_beats=3072 throughput=21484.4 inf/s @ 66.0 MHz
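
The figures in the sample line are consistent with a few simple relations, sketched below in Rust as a sanity check. These formulas are inferred from the sample output alone and are assumptions, not the contents of perf.rs; the function names are hypothetical. `total_cycles` is taken as an input because this PR description does not show how it is derived, and `dma_beats` is not modelled for the same reason.

```rust
/// BRAM depth in words, as listed under "Constants" in this PR.
pub const BRAM_DEPTH: u64 = 4096;

/// Assumed: one weight word per (layer, neuron, chunk) triple.
/// 3 * 64 * 8 = 1536 matches total_weight_words in the sample.
pub fn total_weight_words(layers: u64, neurons: u64, chunks: u64) -> u64 {
    layers * neurons * chunks
}

/// Assumed: BRAM utilization is weight words over BRAM depth, in percent.
pub fn bram_pct(weight_words: u64) -> f64 {
    weight_words as f64 / BRAM_DEPTH as f64 * 100.0
}

/// Assumed: one inference takes `total_cycles` clock cycles, so
/// throughput is clock frequency (Hz) divided by cycles per inference.
pub fn throughput_inf_per_s(clock_mhz: f64, total_cycles: u64) -> f64 {
    clock_mhz * 1.0e6 / total_cycles as f64
}

/// Formats the derived fields the way they appear in the sample line
/// (one decimal place each).
pub fn summary_tail(weight_words: u64, total_cycles: u64, clock_mhz: f64) -> String {
    format!(
        "bram_pct={:.1}% throughput={:.1} inf/s @ {:.1} MHz",
        bram_pct(weight_words),
        throughput_inf_per_s(clock_mhz, total_cycles),
        clock_mhz
    )
}
```

With the sample inputs (3 layers, 64 neurons, 8 chunks, 3072 cycles, 66 MHz) these relations give 1536 weight words, 37.5% BRAM utilization, and 66,000,000 / 3072 = 21484.375 inferences per second, which rounds to the 21484.4 shown above.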

claude added 3 commits May 24, 2026 02:34
…-HS-2, Closes #786)

Wave 40 adds IrqHandler callback registry, IrqDrivenDriver with
wait_done_irq, and host-poll-vs-irq CLI that runs both completion
paths against MockMmio and compares write/read counts.

32 new tests (11 inline + 21 integration). Zero regressions.
…loses #789)

Wave 41 adds InferenceEngine with per-layer DMA prefetch -> inference ->
DMA drain cycle, wait_irq_mask generic IRQ wait, and host-inference CLI.

36 new tests (16 inline + 20 integration). Zero regressions.
…loses #791)

Wave 42 adds EngineConfig with cycle/DMA/BRAM/throughput estimation,
LayerEstimate per-layer breakdown, and host-perf CLI that prints
total_cycles, bram_pct, dma_beats, throughput at a given clock freq.

42 new tests (19 inline + 23 integration). Zero regressions.
This was referenced May 23, 2026