Bug Report
Source: tiny-model-ground-truth parity checker (0/59 passing)
Severity: Critical — blocks ALL GPT-2 inference
Related: Follow-up to #233 (GPT-2 architecture support)
Description
After the GH-233 fix added GPT-2 architecture detection and tensor name mapping, inference still fails because the metadata reports hidden_dim=64 instead of hidden_dim=768. This is GPT-2's head_dim (768 / 12 heads = 64), not the actual hidden dimension.
The contract validator then expects embeddings of shape [50257, 64] (3.2M elements) but finds [50257, 768] (38.6M elements).
Error Output
[APR-LOAD] Embedding tensor 'model.embed_tokens.weight': dims=[50257, 768], expected [vocab=50257, hidden=64]
[APR-LOAD] Token 0 embedding sample: [-0.1101, -0.0393, 0.0331, 0.1338, -0.0485] ← good data
[APR-LOAD] Embedding loaded: 38597376 elements (vocab=50257 x hidden=64) ← 38M is correct for 50257×768
error: F-LAYOUT-CONTRACT-001 Tensor 'token_embedding': Shape mismatch: got 38597376 elements, expected 3216448 (50257x64)
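The two element counts in the error can be checked by hand. The snippet below is only an arithmetic sketch, not code from the loader; `expected_elements` is a hypothetical helper name. It shows that the actual count is exactly 12 times the expected one, which matches GPT-2's head count.

```python
def expected_elements(vocab: int, hidden: int) -> int:
    """Number of elements in a [vocab, hidden] embedding matrix."""
    return vocab * hidden


VOCAB = 50257

actual = expected_elements(VOCAB, 768)    # shape actually present in the file
expected = expected_elements(VOCAB, 64)   # shape implied by the bad metadata

print(actual, expected, actual // expected)
# → 38597376 3216448 12
```

A ratio equal to the number of attention heads is a strong hint that a per-head dimension was stored where the full hidden dimension belongs.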
Root Cause
The GPT-2 metadata extraction path (infer_architecture() or similar) appears to read n_embd_head_k, or a similar per-head dimension field, instead of n_embd (768). GPT-2 config:
| Field | Value |
|---|---|
| n_embd (hidden_dim) | 768 |
| n_head | 12 |
| n_embd / n_head (head_dim) | 64 |
The import pipeline is storing 64 as hidden_dim in APR metadata.
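A loader-side diagnostic could distinguish this specific failure from an arbitrary shape mismatch. The following is a minimal sketch under assumptions: `diagnose_hidden_dim` and its return strings are hypothetical and not part of apr. It only encodes the relationship hidden_dim = head_dim × n_head from the table above.

```python
def diagnose_hidden_dim(stored_hidden: int, n_head: int, embed_cols: int) -> str:
    """Classify how the stored hidden_dim relates to the embedding width.

    stored_hidden: hidden_dim as written in the model metadata
    n_head:        number of attention heads
    embed_cols:    second dimension of the token embedding tensor
    """
    if stored_hidden == embed_cols:
        return "ok"
    # If multiplying by the head count recovers the embedding width,
    # the metadata most likely holds head_dim rather than hidden_dim.
    if stored_hidden * n_head == embed_cols:
        return "head_dim stored as hidden_dim"
    return "unrelated mismatch"


print(diagnose_hidden_dim(64, 12, 768))
# → head_dim stored as hidden_dim
```

With the values from this report (64, 12, 768) the check lands in the second branch, consistent with the root cause described above.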
Affected Models
| Model | Expected hidden_dim | Got hidden_dim | Architecture |
|---|---|---|---|
| GPT-2 124M | 768 | 64 | GPT-2 |
Both the Int4 and Int8 quantized variants are affected.
Fix
In the GPT-2 metadata extraction path (likely infer_architecture() or wherever hidden_dim is read from SafeTensors/HF config), ensure n_embd (768) is used, not n_embd / n_head (64).
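The intended behavior can be sketched as follows. This is an illustration under stated assumptions, not the actual apr code: `extract_dims` is a hypothetical function, and the input is assumed to be a dict shaped like a Hugging Face `config.json`, where GPT-2 uses the keys `n_embd` and `n_head` (other architectures commonly use `hidden_size` and `num_attention_heads`). The point is that hidden_dim is read directly and head_dim is derived from it, never the other way round.

```python
def extract_dims(config: dict) -> dict:
    """Read hidden_dim and head count, then derive head_dim (hypothetical helper)."""
    hidden = config.get("n_embd", config.get("hidden_size"))
    heads = config.get("n_head", config.get("num_attention_heads"))
    if hidden is None or heads is None:
        raise KeyError("config is missing hidden size or head count")
    if hidden % heads != 0:
        raise ValueError(f"hidden_dim {hidden} is not divisible by n_head {heads}")
    return {
        "hidden_dim": hidden,          # full model width, e.g. 768
        "num_heads": heads,            # e.g. 12
        "head_dim": hidden // heads,   # derived per-head width, e.g. 64
    }


# Values taken from the GPT-2 config table in this report.
gpt2_config = {"n_embd": 768, "n_head": 12, "vocab_size": 50257}
dims = extract_dims(gpt2_config)

# Cross-check against the embedding tensor shape before writing metadata.
embedding_shape = (50257, 768)
assert embedding_shape[1] == dims["hidden_dim"]

print(dims)
# → {'hidden_dim': 768, 'num_heads': 12, 'head_dim': 64}
```

Cross-checking against the embedding tensor's second dimension at import time would surface this class of error during conversion rather than at inference.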
Reproduction
cd tiny-model-ground-truth
make clean && make convert
apr run models/gpt2-124m-int4.apr -p "Hello" -n 32 --json
# → expected [vocab=50257, hidden=64] but tensor has 768 columns
Environment
- apr v0.2.16 + #231/232/233 fixes applied
- Platform: Linux x86_64