feat(hslm): double-buffered batch prefetch #563

Merged: gHashTag merged 1 commit into main from feat/319-double-buffer-prefetch on Apr 30, 2026

@gHashTag
Owner

Summary

Double-buffered batch prefetch to overlap data loading with training.

New file

  • src/b2t/double_buffer.zig — 229 LOC

Performance

  • While the GPU (or CPU) trains on batch A, the CPU prefetches batch B
  • Removes the data-loading stall between batches (~10-20% training speedup)
  • Generic DoubleBufferedPrefetch(T, N) for comptime-sized buffers
  • Runtime BatchPrefetcher for variable batch sizes
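
To make the comptime-sized variant concrete, here is a minimal sketch of what a generic `DoubleBufferedPrefetch(T, N)` could look like. The method names `swap`, `loadFromSlice`, and `isBackReady` come from the commit message; the field layout, the `bool` return values, and the `activeSlice` helper are assumptions for illustration and may not match src/b2t/double_buffer.zig.

```zig
const std = @import("std");

/// Hypothetical sketch; fields and return types are assumptions,
/// not the contents of src/b2t/double_buffer.zig.
pub fn DoubleBufferedPrefetch(comptime T: type, comptime N: usize) type {
    return struct {
        const Self = @This();

        bufs: [2][N]T = undefined, // two fixed-size buffers, A and B
        lens: [2]usize = .{ 0, 0 }, // valid element count in each buffer
        active: u1 = 0, // index of the buffer the consumer reads
        back_ready: bool = false, // true once the back buffer holds fresh data

        /// Copy `src` into the back buffer and mark it ready.
        /// Returns false if `src` does not fit in N elements.
        pub fn loadFromSlice(self: *Self, src: []const T) bool {
            if (src.len > N) return false;
            const back: usize = 1 - @as(usize, self.active);
            @memcpy(self.bufs[back][0..src.len], src);
            self.lens[back] = src.len;
            self.back_ready = true;
            return true;
        }

        pub fn isBackReady(self: *const Self) bool {
            return self.back_ready;
        }

        /// Make the back buffer active. Returns false, changing nothing,
        /// if no data has been loaded since the last swap.
        pub fn swap(self: *Self) bool {
            if (!self.back_ready) return false;
            self.active ^= 1;
            self.back_ready = false;
            return true;
        }

        /// Hypothetical helper: read-only view of the active buffer.
        pub fn activeSlice(self: *const Self) []const T {
            const a: usize = self.active;
            return self.bufs[a][0..self.lens[a]];
        }
    };
}
```

Because `N` is a comptime parameter, both buffers live inline in the struct and no allocator is needed; the trade-off is that the capacity is fixed per instantiation.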

Features

  • DoubleBufferedPrefetch: generic comptime double buffer with swap/load/ready
  • BatchPrefetcher: batch-stride-aware loader with async prefetch
  • loadBatch(): blocking load + swap
  • prefetchAsync(): non-blocking load into back buffer
  • swap(): switch active/back buffers
  • batchCount(): compute number of batches in dataset
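
The operations listed above can be sketched for the runtime-sized case as follows. Only the names `BatchPrefetcher`, `loadBatch`, `prefetchAsync`, `swap`, and `batchCount` come from this PR; the flat `u32` dataset, the caller-provided buffers, the `current` helper, and the choice to drop a trailing partial batch are assumptions. This sketch also performs the prefetch copy synchronously; it is "non-blocking" only in the sense that the active buffer is left untouched until `swap()`, and the real implementation may use a different mechanism.

```zig
const std = @import("std");

/// Hypothetical sketch of a runtime-sized batch prefetcher.
/// Data layout and buffer ownership are assumptions for illustration.
pub const BatchPrefetcher = struct {
    data: []const u32, // whole dataset as one flat slice
    batch_size: usize, // elements per batch (the batch stride)
    bufs: [2][]u32, // caller-provided buffers, each >= batch_size long
    active: u1 = 0,
    back_ready: bool = false,

    /// Number of full batches; a trailing partial batch is dropped here.
    pub fn batchCount(self: *const BatchPrefetcher) usize {
        if (self.batch_size == 0) return 0;
        return self.data.len / self.batch_size;
    }

    /// Fill the back buffer with batch `idx`, leaving the active one alone.
    /// Returns false when `idx` is out of range.
    pub fn prefetchAsync(self: *BatchPrefetcher, idx: usize) bool {
        if (idx >= self.batchCount()) return false;
        const start = idx * self.batch_size;
        const back: usize = 1 - @as(usize, self.active);
        @memcpy(
            self.bufs[back][0..self.batch_size],
            self.data[start .. start + self.batch_size],
        );
        self.back_ready = true;
        return true;
    }

    /// Switch active and back buffers; fails if the back buffer is stale.
    pub fn swap(self: *BatchPrefetcher) bool {
        if (!self.back_ready) return false;
        self.active ^= 1;
        self.back_ready = false;
        return true;
    }

    /// Blocking path: load batch `idx`, then swap it in.
    pub fn loadBatch(self: *BatchPrefetcher, idx: usize) bool {
        return self.prefetchAsync(idx) and self.swap();
    }

    /// Hypothetical helper: read-only view of the active batch.
    pub fn current(self: *const BatchPrefetcher) []const u32 {
        const a: usize = self.active;
        return self.bufs[a][0..self.batch_size];
    }
};
```

Under these assumptions, a training loop would call `loadBatch(0)` once, then on each step call `prefetchAsync(i + 1)`, train on `current()`, and finish the step with `swap()`.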

Tests (6)

  • Buffer swap, double swap (A→B→A)
  • Swap fails when back not ready
  • Batch load and retrieval
  • Out-of-range returns false
  • Async prefetch + swap
  • Batch count calculation

Closes #319

- Add src/b2t/double_buffer.zig
- Generic DoubleBufferedPrefetch(T, N): comptime-sized double buffer
  with swap/loadFromSlice/isBackReady
- BatchPrefetcher: runtime batch loader for training data
  with async prefetch, swap, batch count
- Overlaps data loading with training: while GPU processes
  buffer A, CPU prefetches into buffer B
- 6 tests: swap, double swap, fail-safety, batch load,
  out-of-range, async prefetch+swap, batch count

Closes #319
@gHashTag gHashTag merged commit cba9118 into main Apr 30, 2026
9 of 19 checks passed
@gHashTag gHashTag deleted the feat/319-double-buffer-prefetch branch April 30, 2026 00:48