Bug Report
Source: tiny-model-ground-truth parity checker (0/59 passing)
Severity: Critical. Blocks all Int4 inference for LLaMA-style and Qwen architectures.
Description
apr import --quantize int4 produces APR files in which the embedding tensor (model.embed_tokens.weight) is entirely zeros. Inference then fails the F-DATA-QUALITY-001 density check.
Affected Models
| Model | Architecture | Vocab | Hidden | Embedding elements | Zero % |
|---|---|---|---|---|---|
| SmolLM-135M | LLaMA | 49152 | 576 | 28,311,552 | 100% |
| Qwen2-0.5B | Qwen/GQA | 151936 | 896 | 136,134,656 | 100% |
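The element counts in the table are simply vocab × hidden for a [vocab, hidden] embedding matrix. A minimal Rust sketch of that sanity check; expected_embedding_elements is a hypothetical helper, not part of apr:

```rust
/// Expected element count of a [vocab, hidden] embedding matrix (hypothetical helper).
fn expected_embedding_elements(vocab: u64, hidden: u64) -> u64 {
    vocab * hidden
}

fn main() {
    // (model, vocab, hidden) copied from the table above.
    let models = [("SmolLM-135M", 49_152u64, 576u64), ("Qwen2-0.5B", 151_936, 896)];
    for (name, vocab, hidden) in models {
        // Prints "SmolLM-135M: 28311552" and "Qwen2-0.5B: 136134656".
        println!("{}: {}", name, expected_embedding_elements(vocab, hidden));
    }
}
```

Both products match the table, which is consistent with the shape metadata being correct while the data itself is zeroed.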
Error Output
[APR-LOAD] Embedding tensor 'model.embed_tokens.weight': dims=[49152, 576], expected [vocab=49152, hidden=576]
[APR-LOAD] Embedding dims=[49152, 576], using raw data (no transpose needed)
[APR-LOAD] WARNING: Token 0 embedding is all zeros - possible load failure
[APR-LOAD] Token 0 embedding sample: [0.0000, 0.0000, 0.0000, 0.0000, 0.0000]
[APR-LOAD] Embedding loaded: 28311552 elements (vocab=49152 x hidden=576)
error: Inference failed: Format error: [F-DATA-QUALITY-001] Tensor 'token_embedding': DENSITY FAILURE: 100.0% zeros (max 50%). Data likely loaded from wrong offset!
Note: Unlike Int8 (7,077,889 elements; see issue #231), the Int4 element count (28,311,552) is correct. The shape is right, but the data is all zeros.
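Going by the error message, the density rule rejects a tensor when more than 50% of its elements are zero. The Rust sketch below is a hypothetical reimplementation inferred from the log text alone, not apr's actual code; zero_fraction and check_density are illustrative names, and treating exactly 50% as a pass is an assumption.

```rust
/// Fraction of elements equal to 0.0 (-0.0 also counts, since -0.0 == 0.0).
/// An empty tensor is reported as 0% zeros.
fn zero_fraction(data: &[f32]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    let zeros = data.iter().filter(|&&x| x == 0.0).count();
    zeros as f64 / data.len() as f64
}

/// Hypothetical density gate: fail when strictly more than `max_zero_fraction`
/// of the tensor is zero (assumption: exactly at the limit passes).
fn check_density(name: &str, data: &[f32], max_zero_fraction: f64) -> Result<(), String> {
    let frac = zero_fraction(data);
    if frac > max_zero_fraction {
        Err(format!(
            "Tensor '{}': DENSITY FAILURE: {:.1}% zeros (max {:.0}%)",
            name,
            frac * 100.0,
            max_zero_fraction * 100.0
        ))
    } else {
        Ok(())
    }
}

fn main() {
    // A zeroed 4 x 8 "embedding", like the Int4 output described in this issue.
    let zeroed = vec![0.0f32; 4 * 8];
    // A dense tensor with no zero elements.
    let dense: Vec<f32> = (1..=32).map(|i| i as f32 * 0.01).collect();

    println!("{:?}", check_density("token_embedding", &zeroed, 0.5));
    println!("{:?}", check_density("token_embedding", &dense, 0.5));
}
```

The all-zero tensor reproduces the "100.0% zeros (max 50%)" message from the log, while the dense tensor passes.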
Root Cause Hypothesis
Int4 quantization either writes the embedding tensor data at the wrong offset or fails to write it at all. The metadata (shape, dims) is correct, but the actual float data is zeroed. This may be a different manifestation of the same import pipeline bug as #231: Int8 gets the wrong element count plus corrupt data, while Int4 gets the right count plus zeroed data.
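To illustrate how correct metadata can coexist with zeroed data, here is a sketch using a generic symmetric block-wise Int4 scheme: 32 values per block, one f32 scale per block, each value mapped to an integer in [-8, 7]. This scheme is an assumption for illustration only, not apr's actual Int4 layout, and Int4Tensor, quantize and dequantize are hypothetical names.

```rust
/// Values per quantization block (assumed, not apr's real block size).
const BLOCK: usize = 32;

/// Hypothetical Int4 tensor. Quants are stored unpacked as i8 for clarity;
/// a real format would typically pack two 4-bit values per byte.
struct Int4Tensor {
    scales: Vec<f32>, // one scale per block
    quants: Vec<i8>,  // one value in [-8, 7] per element
}

/// Symmetric per-block quantization: scale = max|x| / 7, q = round(x / scale).
fn quantize(data: &[f32]) -> Int4Tensor {
    let mut scales = Vec::new();
    let mut quants = Vec::with_capacity(data.len());
    for block in data.chunks(BLOCK) {
        let max_abs = block.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
        // Avoid division by zero for an all-zero block.
        let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
        scales.push(scale);
        for &x in block {
            quants.push((x / scale).round().clamp(-8.0, 7.0) as i8);
        }
    }
    Int4Tensor { scales, quants }
}

/// Dequantize: x ≈ q * scale of the element's block.
fn dequantize(t: &Int4Tensor) -> Vec<f32> {
    t.quants
        .iter()
        .enumerate()
        .map(|(i, &q)| q as f32 * t.scales[i / BLOCK])
        .collect()
}

/// Fraction of elements equal to 0.0.
fn zero_fraction(data: &[f32]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    data.iter().filter(|&&x| x == 0.0).count() as f64 / data.len() as f64
}

fn main() {
    let original: Vec<f32> = (0..64).map(|i| ((i as f32) * 0.37).sin()).collect();

    // Correct path: the round trip stays dense (only near-zero inputs round to 0).
    let good = quantize(&original);
    println!("correct round trip: {:.1}% zeros", 100.0 * zero_fraction(&dequantize(&good)));

    // Failure mode matching the hypothesis: element count and scales are written,
    // but the quantized payload is never copied, so every value reads back as 0.
    let broken = Int4Tensor { scales: good.scales.clone(), quants: vec![0; original.len()] };
    println!("payload not written: {:.1}% zeros", 100.0 * zero_fraction(&dequantize(&broken)));
}
```

In the broken case the element count is still 64, mirroring this issue's "right count, zero data" symptom, whereas #231's wrong count would point to a shape or size calculation error instead.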
Reproduction
cd tiny-model-ground-truth
apr pull hf://HuggingFaceTB/SmolLM-135M
apr import hf://HuggingFaceTB/SmolLM-135M --quantize int4 -o models/smollm-135m-int4.apr
apr run models/smollm-135m-int4.apr -p "Hello" -n 32 --json
# → F-DATA-QUALITY-001 density failure
Environment
- apr v0.2.16 (f39b7df)
- Oracle: transformers 5.1.0, torch 2.10.0, float32, CPU, greedy
- Platform: Linux x86_64
Contract Reference
contracts/tensor-layout-v1.yaml, rule F-DATA-QUALITY-001