GPT-2 hidden_dim metadata reports 64 (head_dim) instead of 768 #235

## Bug Report

**Source**: `tiny-model-ground-truth` parity checker (0/59 passing)
**Severity**: Critical — blocks ALL GPT-2 inference
**Related**: Follow-up to #233 (GPT-2 architecture support)

## Description

After the GH-233 fix added GPT-2 architecture detection and tensor name mapping, inference still fails because the metadata reports `hidden_dim=64` instead of `hidden_dim=768`. The value 64 is GPT-2's `head_dim` (768 / 12 heads = 64), not the actual hidden dimension.

The contract validator then expects embeddings of shape `[50257, 64]` (3,216,448 elements) but finds `[50257, 768]` (38,597,376 elements).
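The two element counts can be checked with a few lines of arithmetic. This is only a worked example using the values quoted in this report; the constant names are illustrative and do not come from the `apr` codebase.

```python
# Values taken from this report (GPT-2 124M).
VOCAB_SIZE = 50257
N_EMBD = 768   # true hidden dimension
N_HEAD = 12

# Per-head dimension, which the metadata wrongly reports as hidden_dim.
head_dim = N_EMBD // N_HEAD

# Element count of the embedding tensor actually stored in the file.
actual_elements = VOCAB_SIZE * N_EMBD

# Element count the validator expects when hidden_dim is taken to be head_dim.
expected_by_validator = VOCAB_SIZE * head_dim

print(head_dim, actual_elements, expected_by_validator)
# → 64 38597376 3216448
```

The actual count is exactly `N_HEAD` (12) times the expected count, which is consistent with a per-head value having been stored in place of the full width.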

## Error Output

```
[APR-LOAD] Embedding tensor 'model.embed_tokens.weight': dims=[50257, 768], expected [vocab=50257, hidden=64]
[APR-LOAD] Token 0 embedding sample: [-0.1101, -0.0393, 0.0331, 0.1338, -0.0485]  ← good data
[APR-LOAD] Embedding loaded: 38597376 elements (vocab=50257 x hidden=64)  ← 38M is correct for 50257×768

error: F-LAYOUT-CONTRACT-001 Tensor 'token_embedding': Shape mismatch: got 38597376 elements, expected 3216448 (50257x64)
```

## Root Cause

The GPT-2 metadata extraction path (`infer_architecture()` or a related function) appears to read `n_embd_head_k` or a similar per-head dimension field instead of `n_embd` (768). The relevant GPT-2 config values:

| Field | Value |
|-------|-------|
| `n_embd` (hidden_dim) | 768 |
| `n_head` | 12 |
| `n_embd / n_head` (head_dim) | 64 |

The import pipeline is storing 64 as hidden_dim in APR metadata.
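One way to confirm this diagnosis on a converted file is to compare the stored value against the config fields in the table above. The helper below is a hypothetical sketch, not part of `apr`; the function name `classify_hidden_dim` and its return labels are assumptions made for illustration.

```python
def classify_hidden_dim(stored: int, n_embd: int, n_head: int) -> str:
    """Classify a stored hidden_dim against the source model config.

    Hypothetical diagnostic helper; names and labels are illustrative only.
    """
    if stored == n_embd:
        return "ok"
    # A stored value that multiplies back up to n_embd is the per-head size.
    if stored * n_head == n_embd:
        return "head_dim"
    return "unknown"


# Values from this report: metadata holds 64, config has n_embd=768, n_head=12.
print(classify_hidden_dim(64, n_embd=768, n_head=12))
print(classify_hidden_dim(768, n_embd=768, n_head=12))
```

With the values from this report, the first call classifies the stored 64 as `head_dim` and the second classifies 768 as `ok`.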

## Affected Models

| Model | Expected hidden_dim | Got hidden_dim | Architecture |
|-------|-------------------|---------------|-------------|
| GPT-2 124M | 768 | 64 | GPT-2 |

Both the Int4 and Int8 variants are affected.

## Fix

In the GPT-2 metadata extraction path (likely `infer_architecture()` or wherever hidden_dim is read from SafeTensors/HF config), ensure `n_embd` (768) is used, not `n_embd / n_head` (64).
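A minimal sketch of the intended logic follows, assuming the importer has access to a Hugging Face style GPT-2 config (with `n_embd` and `n_head` keys) and to the embedding tensor's shape. The function `gpt2_dims` and the returned key names are hypothetical and are not taken from the `apr` source; the actual fix belongs wherever `hidden_dim` is currently read.

```python
def gpt2_dims(config: dict, embedding_shape: tuple) -> dict:
    """Derive GPT-2 dimensions from config, cross-checked against the embedding.

    Hypothetical helper: the function name and output keys are assumptions.
    """
    n_embd = config["n_embd"]   # full hidden width (768 for GPT-2 124M)
    n_head = config["n_head"]   # number of attention heads (12)

    if n_embd % n_head != 0:
        raise ValueError(f"n_embd={n_embd} is not divisible by n_head={n_head}")

    vocab_size, width = embedding_shape
    # The embedding's second dimension is the hidden width, so a mismatch here
    # means the wrong field was read before anything is written to metadata.
    if width != n_embd:
        raise ValueError(
            f"embedding width {width} does not match n_embd {n_embd}"
        )

    return {
        "vocab_size": vocab_size,
        "hidden_dim": n_embd,            # not n_embd // n_head
        "num_heads": n_head,
        "head_dim": n_embd // n_head,
    }


dims = gpt2_dims({"n_embd": 768, "n_head": 12}, (50257, 768))
print(dims["hidden_dim"], dims["head_dim"])
# → 768 64
```

Checking the config value against the embedding shape at import time would surface this class of bug during conversion rather than at load time.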

## Reproduction

```bash
cd tiny-model-ground-truth
make clean && make convert
apr run models/gpt2-124m-int4.apr -p "Hello" -n 32 --json
# → expected [vocab=50257, hidden=64] but tensor has 768 columns
```

## Environment

- `apr` v0.2.16 + GH-231/232/233 fixes applied
- Platform: Linux x86_64
