GPT-2 hidden_dim metadata reports 64 (head_dim) instead of 768 #235

@noahgift

Bug Report

Source: tiny-model-ground-truth parity checker (0/59 passing)
Severity: Critical — blocks ALL GPT-2 inference
Related: Follow-up to #233 (GPT-2 architecture support)

Description

After the GH-233 fix added GPT-2 architecture detection and tensor name mapping, inference still fails because the metadata reports hidden_dim=64 instead of hidden_dim=768. This is GPT-2's head_dim (768 / 12 heads = 64), not the actual hidden dimension.

The contract validator then expects embeddings of shape [50257, 64] (3.2M elements) but finds [50257, 768] (38.6M elements).
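The element counts above follow directly from the GPT-2 124M dimensions quoted in this report. A minimal arithmetic sketch (the variable names are illustrative only, not identifiers from the codebase):

```python
# Dimensions quoted in this issue for GPT-2 124M.
VOCAB_SIZE = 50257
N_EMBD = 768   # true hidden dimension
N_HEAD = 12

# Per-head dimension, which is what the metadata currently reports as hidden_dim.
head_dim = N_EMBD // N_HEAD

# Element count the validator expects when hidden_dim is wrongly 64.
expected_by_validator = VOCAB_SIZE * head_dim

# Element count actually present in the embedding tensor.
actual_in_tensor = VOCAB_SIZE * N_EMBD

print(head_dim)                # 64
print(expected_by_validator)   # 3216448
print(actual_in_tensor)        # 38597376
# The two counts differ by exactly the number of heads.
print(actual_in_tensor // expected_by_validator)  # 12
```

The factor of 12 between the two counts equals n_head, which is a strong hint that a per-head value was stored where the full hidden dimension belongs.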

Error Output

[APR-LOAD] Embedding tensor 'model.embed_tokens.weight': dims=[50257, 768], expected [vocab=50257, hidden=64]
[APR-LOAD] Token 0 embedding sample: [-0.1101, -0.0393, 0.0331, 0.1338, -0.0485]  ← good data
[APR-LOAD] Embedding loaded: 38597376 elements (vocab=50257 x hidden=64)  ← 38M is correct for 50257×768

error: F-LAYOUT-CONTRACT-001 Tensor 'token_embedding': Shape mismatch: got 38597376 elements, expected 3216448 (50257x64)

Root Cause

The infer_architecture() or metadata extraction path for GPT-2 appears to read n_embd_head_k, or a similar per-head dimension field, instead of n_embd (768). GPT-2 config:

| Field | Value |
| --- | --- |
| n_embd (hidden_dim) | 768 |
| n_head | 12 |
| n_embd / n_head (head_dim) | 64 |

The import pipeline is storing 64 as hidden_dim in APR metadata.
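One way to recognise this class of bug at load time is to compare the embedding tensor's column count with the reported hidden_dim and check whether they differ by exactly n_head. The sketch below is hypothetical: `diagnose_hidden_dim` and its return strings are invented for illustration and are not part of the APR loader.

```python
def diagnose_hidden_dim(embedding_shape, reported_hidden_dim, n_head):
    """Classify a mismatch between metadata and the embedding tensor.

    embedding_shape is (vocab_size, columns) as read from the tensor itself.
    This is an illustrative helper, not an existing APR function.
    """
    _vocab, columns = embedding_shape
    if columns == reported_hidden_dim:
        return "ok"
    # If the tensor is exactly n_head times wider than the metadata claims,
    # the metadata most likely holds head_dim instead of hidden_dim.
    if columns == reported_hidden_dim * n_head:
        return "head_dim_reported_as_hidden_dim"
    return "unrelated_mismatch"


# Values from this report: tensor is [50257, 768], metadata says hidden=64.
print(diagnose_hidden_dim((50257, 768), 64, 12))
# Same tensor with correct metadata.
print(diagnose_hidden_dim((50257, 768), 768, 12))
```

A check like this would only narrow down the cause; the stored metadata still needs to be corrected at import time.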

Affected Models

| Model | Expected hidden_dim | Got hidden_dim | Architecture |
| --- | --- | --- | --- |
| GPT-2 124M | 768 | 64 | GPT-2 |

Both the Int4 and Int8 variants are affected.

Fix

In the GPT-2 metadata extraction path (likely infer_architecture() or wherever hidden_dim is read from SafeTensors/HF config), ensure n_embd (768) is used, not n_embd / n_head (64).
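As a rough sketch of the intended behaviour, assuming the config is available as a dictionary with Hugging Face style GPT-2 keys (n_embd, n_head): hidden_dim is taken directly from n_embd, and head_dim is derived from it rather than the other way round. The function names and the hidden_size fallback are assumptions for illustration, not the project's actual extraction code.

```python
def extract_hidden_dim(config: dict) -> int:
    """Return the full hidden dimension from a model config dict.

    Prefers GPT-2's n_embd; hidden_size is tried as a fallback because other
    Hugging Face architectures use that key. Hypothetical helper.
    """
    for key in ("n_embd", "hidden_size"):
        value = config.get(key)
        if isinstance(value, int) and value > 0:
            return value
    raise KeyError("config has neither n_embd nor hidden_size")


def extract_head_dim(config: dict) -> int:
    """Derive the per-head dimension; never store this as hidden_dim."""
    hidden_dim = extract_hidden_dim(config)
    n_head = config["n_head"]
    if hidden_dim % n_head != 0:
        raise ValueError("hidden_dim must be divisible by n_head")
    return hidden_dim // n_head


gpt2_config = {"n_embd": 768, "n_head": 12}
print(extract_hidden_dim(gpt2_config))  # 768
print(extract_head_dim(gpt2_config))    # 64
```

Keeping the two values behind separate accessors makes it harder for a per-head field to be written into the hidden_dim slot by accident.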

Reproduction

cd tiny-model-ground-truth
make clean && make convert
apr run models/gpt2-124m-int4.apr -p "Hello" -n 32 --json
# → expected [vocab=50257, hidden=64] but tensor has 768 columns

Environment
