Phase 0.5 — First H100 Run: Noise Floor, Calibration, and Karpathian-1 #4
bitzic
announced in
Phase Results
H100 spec I used for this run FYI
Date: May 26, 2026
Hardware: NVIDIA H100 PCIe 80GB (Shadeform / ShadeCloud)
Data: 1B tokens from FineWeb-Edu (sample-10BT), GPT-2 BPE tokenizer
Code: KarpathianBase/karpathian@v0.5.0
What was tested
Phase 0.5 is the first run of the Karpathian protocol on real hardware with real data. Three things were measured: the H100 calibration benchmark, the seed-to-seed noise floor, and Karpathian-1 training.
All runs used the unverified tier (α=0.5) since the H100 instance was not CC-capable. Verified-tier (α=1.0) testing with real TDX+nvtrust attestation is Phase 0.5c.
Results
H100 Calibration Benchmark
The calibration benchmark is a deterministic workload (matmul + attention + collective) that fingerprints the hardware. These timings become the reference for normalizing compute claims across different GPUs.
For reference: the same benchmark takes ~47ms on a CPU workstation (45× slower).
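As a rough sketch of how reference timings could normalize compute across devices (this is not the protocol's actual scoring code): divide the reference time by a device's benchmark time to get a relative speed, then scale the device's wall-clock by that factor. The function names are hypothetical. The only inputs taken from the text are the ~47 ms CPU timing and the 45× ratio, from which the H100 time is inferred.

```python
def implied_reference_ms(device_ms: float, slowdown: float) -> float:
    """Reference (H100) time implied by a device time and its slowdown factor."""
    return device_ms / slowdown


def relative_speed(reference_ms: float, device_ms: float) -> float:
    """Speed relative to the reference device (1.0 means as fast as the H100)."""
    return reference_ms / device_ms


def normalized_compute(device_seconds: float, speed: float) -> float:
    """Convert wall-clock seconds on a device into reference-equivalent seconds.

    Assumption: compute scales linearly with the benchmark ratio.
    """
    return device_seconds * speed


# Figures from the post: ~47 ms on a CPU workstation, 45x slower than the H100.
h100_ms = implied_reference_ms(47.0, 45.0)
cpu_speed = relative_speed(h100_ms, 47.0)

print(round(h100_ms, 2))                              # 1.04
print(round(normalized_compute(3600, cpu_speed), 1))  # 80.0
```

Under this linear assumption, one hour on that CPU workstation would count as about 80 seconds of H100-equivalent compute.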
Noise Floor Calibration
10 runs of the unchanged baseline (125M params, 500 steps each) with different random seeds. This measures how much val_bpb varies from seed alone — the noise floor that a real improvement must exceed.
h100_proxy.json (125M params, 500 steps)
What this means: A miner's patch needs to improve val_bpb by at least 0.013 to be considered a genuine improvement rather than seed noise. This is a tight noise floor, so the protocol can detect small, real gains.
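A minimal sketch of the noise-floor check, assuming the threshold is a multiple k of the seed-to-seed standard deviation. The post reports σ = 0.006 and a threshold of 0.013 but does not say how one is derived from the other, so k = 2 and the use of the sample standard deviation are both assumptions. The function names are hypothetical, and the numbers in the usage lines are synthetic placeholders, not values from the run.

```python
import statistics


def noise_floor(val_bpbs, k=2.0):
    """Return (mean, sample std, k * std) for baseline val_bpb across seeds.

    Assumption: the detection threshold is k standard deviations.
    """
    mean = statistics.mean(val_bpbs)
    sigma = statistics.stdev(val_bpbs)
    return mean, sigma, k * sigma


def is_genuine_improvement(baseline_bpb, patched_bpb, threshold):
    """Lower val_bpb is better, so the gain is baseline minus patched."""
    return (baseline_bpb - patched_bpb) >= threshold


# Synthetic placeholder values, chosen only to make the arithmetic easy to follow.
mean, sigma, threshold = noise_floor([1.0, 1.2, 1.1])

# Using the 0.013 threshold reported in the post:
accepted = is_genuine_improvement(1.500, 1.480, 0.013)  # gain of 0.020
rejected = is_genuine_improvement(1.500, 1.495, 0.013)  # gain of 0.005
print(accepted, rejected)
```

A patch whose gain falls below the threshold cannot be told apart from a lucky seed, which is why the baseline is rerun with the code unchanged.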
Per-seed breakdown:
Karpathian-1 Training
The first model trained on the canonical recipe — the proof artifact from whitepaper §6.8.
h100_default.json (dim=1024, 16 layers, 16 heads)
Loss curve:
Note on throughput: This run used fp32 (no mixed precision). Adding bf16 autocast is expected to roughly double throughput to ~34K tok/s, cutting wall-clock to ~2 hours. This is a Phase 0.5b optimization — the architecture proof was the priority for this run.
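The arithmetic behind that estimate can be checked with a few lines. This sketch takes the rounded figures from the note (a roughly 2× speedup, ~34K tok/s, ~2 hours) and works backwards. The fp32 throughput and the token count it produces are implied by those rounded numbers, not reported measurements, and the function names are hypothetical.

```python
def implied_baseline_tok_s(target_tok_s: float, speedup: float) -> float:
    """Throughput before a speedup, given the throughput after it."""
    return target_tok_s / speedup


def wall_clock_hours(total_tokens: float, tok_s: float) -> float:
    """Hours needed to process total_tokens at a steady tok_s."""
    return total_tokens / tok_s / 3600


bf16_tok_s = 34_000   # from the note: ~34K tok/s with bf16 autocast
bf16_hours = 2.0      # from the note: ~2 hours
speedup = 2.0         # from the note: "roughly double"

fp32_tok_s = implied_baseline_tok_s(bf16_tok_s, speedup)
tokens = bf16_tok_s * bf16_hours * 3600   # tokens implied by the rounded figures
fp32_hours = wall_clock_hours(tokens, fp32_tok_s)

print(fp32_tok_s, fp32_hours)  # 17000.0 4.0
```

Because the token budget is fixed, halving the time per token halves the wall-clock, so the implied fp32 run is about 4 hours at about 17K tok/s.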
What was verified
The full data pipeline works at scale. 1B tokens from FineWeb-Edu streamed, tokenized, and sharded in 7.4 minutes at 2.27M tok/s. Content-addressed manifest verified.
The noise floor is measurably tight. σ = 0.006 val_bpb means the network can detect genuine improvements as small as 0.013 val_bpb. This is the number the whitepaper §5.7 promised to measure empirically — now measured.
The canonical training loop runs end-to-end on GPU. Deterministic seeding, AdamW + cosine LR, gradient clipping, structured training log — all working on H100.
The calibration benchmark produces stable H100 reference timings. These anchor the hardware-independent compute unit for scoring (§5.5).
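For readers unfamiliar with content-addressed manifests, here is a minimal sketch of the idea: each shard is recorded under the hash of its bytes, so any change to a shard shows up as a mismatch. The repo's actual manifest format is not shown in this post, so SHA-256, the `.bin` extension, the shard names, and all function names below are assumptions for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large shards never sit fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(shard_dir: Path) -> dict:
    """Map each shard file name to the SHA-256 of its contents."""
    return {p.name: sha256_file(p) for p in sorted(shard_dir.glob("*.bin"))}


def verify_manifest(shard_dir: Path, manifest: dict) -> list:
    """Return names of shards that are missing or whose hash differs."""
    bad = []
    for name, expected in manifest.items():
        path = shard_dir / name
        if not path.is_file() or sha256_file(path) != expected:
            bad.append(name)
    return bad


def demo():
    """Build a manifest over two toy shards, then tamper with one."""
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "shard_0000.bin").write_bytes(b"\x00\x01" * 8)
        (d / "shard_0001.bin").write_bytes(b"\x02\x03" * 8)
        manifest = build_manifest(d)
        clean = verify_manifest(d, manifest)
        (d / "shard_0001.bin").write_bytes(b"tampered")
        tampered = verify_manifest(d, manifest)
        return clean, tampered


print(demo())  # ([], ['shard_0001.bin'])
```

An empty list means every shard matches its recorded hash. After the second shard is overwritten, only that shard is reported.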
How to reproduce
git clone https://github.com/KarpathianBase/karpathian.git
cd karpathian
bash scripts/run_h100.sh
Requires an H100 GPU. The script handles everything: venv, dependencies, data prep, calibration, noise floor, and Karpathian-1 training. Expected wall-clock: ~6-7 hours total (fp32). Results land in runs/.
What's next
New infrastructure shipped with this milestone
--wandb flag in recipe/train.py
dashboard/app.py
miner/hub.py
KarpathianBase/proof-bundles
validator/scoring.py
validator/audit.py
🔗 Repo: github.com/KarpathianBase/karpathian
🏷️ This milestone: v0.5.0 (exact code snapshot for this phase)
📄 Whitepaper: v1.1 (available in repo)