
Int4 quantization produces all-zero embedding tensors #232

@noahgift

Bug Report

Source: tiny-model-ground-truth parity checker (0/59 passing)
Severity: Critical — blocks ALL Int4 inference for LLaMA-style and Qwen architectures

Description

apr import --quantize int4 produces APR files where the embedding tensor (model.embed_tokens.weight) is entirely zeros. Inference then fails the F-DATA-QUALITY-001 density check.

Affected Models

Model        Architecture  Vocab   Hidden  Elements (vocab × hidden)  Zero %
SmolLM-135M  LLaMA         49152   576     28,311,552                 100%
Qwen2-0.5B   Qwen/GQA      151936  896     136,134,656                100%

Error Output

[APR-LOAD] Embedding tensor 'model.embed_tokens.weight': dims=[49152, 576], expected [vocab=49152, hidden=576]
[APR-LOAD] Embedding dims=[49152, 576], using raw data (no transpose needed)
[APR-LOAD] WARNING: Token 0 embedding is all zeros - possible load failure
[APR-LOAD] Token 0 embedding sample: [0.0000, 0.0000, 0.0000, 0.0000, 0.0000]
[APR-LOAD] Embedding loaded: 28311552 elements (vocab=49152 x hidden=576)

error: Inference failed: Format error: [F-DATA-QUALITY-001] Tensor 'token_embedding': DENSITY FAILURE: 100.0% zeros (max 50%). Data likely loaded from wrong offset!
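For triage, the density rule in the error above can be approximated as "fail if more than 50% of the elements are exactly zero". This is a minimal sketch under that assumption: the threshold is taken from the "max 50%" in the message, the helper names zero_fraction and check_density are hypothetical, and the real rule is whatever contracts/tensor-layout-v1.yaml defines.

```python
import random
from typing import Sequence

MAX_ZERO_FRACTION = 0.5  # assumption: mirrors "max 50%" in the error message


def zero_fraction(values: Sequence[float]) -> float:
    """Return the fraction of elements that are exactly 0.0."""
    if len(values) == 0:
        raise ValueError("empty tensor")
    return sum(1 for v in values if v == 0.0) / len(values)


def check_density(name: str, values: Sequence[float]) -> float:
    """Hypothetical check: raise ValueError if more than MAX_ZERO_FRACTION of values are zero."""
    frac = zero_fraction(values)
    if frac > MAX_ZERO_FRACTION:
        raise ValueError(
            f"Tensor '{name}': DENSITY FAILURE: {frac:.1%} zeros "
            f"(max {MAX_ZERO_FRACTION:.0%})"
        )
    return frac


rng = random.Random(0)
healthy = [rng.gauss(0.0, 0.02) for _ in range(576)]  # one dense 576-wide embedding row
zeroed = [0.0] * 576  # all zeros, like the Token 0 sample in the log above

print(f"healthy: {check_density('token_embedding', healthy):.1%} zeros")
try:
    check_density("token_embedding", zeroed)
except ValueError as err:
    print(err)
```

A dense row passes, while the all-zero row fails with a 100.0% zero fraction, which is the same shape of failure the loader reports for the whole Int4 embedding.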

Note: Unlike Int8, which reports 7,077,889 elements (see issue #231), Int4 reports the correct element count (28,311,552). The shape is right, but the data is all zeros.
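The expected element counts in the table are simply vocab × hidden. A quick arithmetic check in plain Python, restating the table values and assuming the Int8 figure from #231 refers to the same SmolLM-135M embedding:

```python
# Expected embedding element count = vocab_size * hidden_size (values from the table).
MODELS = {
    "SmolLM-135M": (49152, 576),
    "Qwen2-0.5B": (151936, 896),
}


def expected_elements(vocab: int, hidden: int) -> int:
    return vocab * hidden


for name, (vocab, hidden) in MODELS.items():
    print(f"{name}: {expected_elements(vocab, hidden):,}")

# Element counts reported for SmolLM-135M by each quantization path.
reported = {"int4": 28_311_552, "int8": 7_077_889}
expected = expected_elements(*MODELS["SmolLM-135M"])
for quant, count in reported.items():
    status = "matches" if count == expected else "MISMATCH"
    print(f"{quant}: {count:,} ({status}, expected {expected:,})")
```

Only the Int8 count disagrees with vocab × hidden, which is why the Int4 failure is a data problem rather than a shape problem.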

Root Cause Hypothesis

Int4 quantization either writes the embedding tensor data at the wrong offset or does not write it at all. The metadata (shape, dims) is correct, but the actual float data is all zeros. This may be another manifestation of the same import-pipeline bug as #231 (Int8: wrong element count and corrupt data; Int4: correct element count but all-zero data).
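To see why quantization error alone is an unlikely explanation, consider a correct Int4 round trip. The sketch below uses a generic symmetric absmax scheme with one scale per 32-value block and two 4-bit values packed per byte. This is an assumption for illustration only: it is not necessarily apr's on-disk Int4 layout, and quantize_int4, dequantize_int4, and BLOCK_SIZE are hypothetical names.

```python
import random
from typing import List, Tuple

BLOCK_SIZE = 32  # assumption: per-block scales; apr's real block size may differ


def quantize_int4(values: List[float]) -> Tuple[bytes, List[float]]:
    """Symmetric absmax Int4 (hypothetical scheme, not apr's format).

    Each block of BLOCK_SIZE floats gets one scale = absmax / 7. Values are
    rounded to integers in [-7, 7], stored as unsigned nibbles (q + 8), and
    packed two per byte (low nibble first).
    """
    if len(values) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of BLOCK_SIZE")
    packed = bytearray()
    scales: List[float] = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        absmax = max(abs(v) for v in block)
        scale = absmax / 7 if absmax > 0 else 1.0
        scales.append(scale)
        nibbles = [max(-8, min(7, round(v / scale))) + 8 for v in block]
        for lo, hi in zip(nibbles[0::2], nibbles[1::2]):
            packed.append(lo | (hi << 4))
    return bytes(packed), scales


def dequantize_int4(packed: bytes, scales: List[float]) -> List[float]:
    """Inverse of quantize_int4: unpack nibbles and multiply by the block scale."""
    out: List[float] = []
    bytes_per_block = BLOCK_SIZE // 2
    for b, scale in enumerate(scales):
        for byte in packed[b * bytes_per_block:(b + 1) * bytes_per_block]:
            lo = (byte & 0x0F) - 8
            hi = (byte >> 4) - 8
            out.extend((lo * scale, hi * scale))
    return out


rng = random.Random(0)
row = [rng.gauss(0.0, 0.02) for _ in range(576)]  # one row at SmolLM-135M's hidden size
packed, scales = quantize_int4(row)
restored = dequantize_int4(packed, scales)
zero_frac = sum(1 for v in restored if v == 0.0) / len(restored)
print(f"{len(packed)} bytes, {zero_frac:.1%} zeros after round trip")

# In this sketch, zeroed scales reproduce the symptom: every value decodes to 0.0.
all_zero = dequantize_int4(packed, [0.0] * len(scales))
print(all(v == 0.0 for v in all_zero))
```

Under this assumed scheme, rounding zeroes only the smallest values in each block, so a correctly written row stays well below the 50% density limit. A 100% zero tensor is more consistent with the payload (or, in this sketch, its scales) never reaching the file than with quantization error.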

Reproduction

cd tiny-model-ground-truth
apr pull hf://HuggingFaceTB/SmolLM-135M
apr import hf://HuggingFaceTB/SmolLM-135M --quantize int4 -o models/smollm-135m-int4.apr
apr run models/smollm-135m-int4.apr -p "Hello" -n 32 --json
# → F-DATA-QUALITY-001 density failure

Environment

  • apr v0.2.16 (f39b7df)
  • Oracle: transformers 5.1.0, torch 2.10.0, float32, CPU, greedy
  • Platform: Linux x86_64

Contract Reference

  • contracts/tensor-layout-v1.yaml rule F-DATA-QUALITY-001
