Qwen3 GPU inference produces garbage output (architecture not supported) #479

## Summary

Qwen3 models (e.g., Qwen3-4B) produce garbage output during GPU inference via `apr run`. Both FP16 and Q4K-quantized models emit incoherent token sequences. Qwen2.5 models produce correct output through the same inference path.

## Reproduction

```bash
apr import /path/to/qwen3-4b/ --arch qwen3 --quantize q4k -o qwen3-4b-q4k.apr
apr run qwen3-4b-q4k.apr --prompt "def fibonacci(n):" --max-tokens 32 --json --chat
# Output: gibberish tokens (e.g., "obufæīĢæľīçļĦ...")
```

An FP16 import (no quantization) produces the same garbage. This rules out Q4K quantization as the cause and points to the inference path for the Qwen3 architecture rather than the import pipeline.

## Root Cause Analysis

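Qwen3 config and weight differences from Qwen2:
- `attention_bias: false` (Qwen2 has QKV biases)
- `head_dim: 128` (set explicitly; for Qwen3-4B this differs from `hidden_size / num_attention_heads` = 2560 / 32 = 80, so it cannot be inferred)
- Per-head RMSNorm on queries and keys (`self_attn.q_norm` / `self_attn.k_norm` weights in every layer, applied before RoPE); Qwen2 has no such layers
- `model_type: "qwen3"` (not `"qwen2"`)
- `Qwen3ForCausalLM` architecture class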
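The `head_dim` difference alone would be enough to corrupt attention if the inference path derived it from `hidden_size / num_attention_heads`. Whether realizar actually does this is not confirmed; the sketch below only shows which projection shapes each assumption implies, using the Qwen3-4B values from the Environment section. `attn_shapes` is a hypothetical helper, and shapes are written as `[out_features, in_features]`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the attention projection shapes implied by an
# explicit head_dim versus one inferred as hidden / heads.
# Shapes use the [out_features, in_features] convention.
attn_shapes() {
  local hidden=$1 heads=$2 kv_heads=$3 head_dim=$4
  local inferred=$(( hidden / heads ))
  echo "explicit head_dim=${head_dim}: q_proj [$(( heads * head_dim )), ${hidden}], k/v_proj [$(( kv_heads * head_dim )), ${hidden}], o_proj [${hidden}, $(( heads * head_dim ))]"
  echo "inferred head_dim=${inferred}: q_proj [$(( heads * inferred )), ${hidden}], k/v_proj [$(( kv_heads * inferred )), ${hidden}], o_proj [${hidden}, $(( heads * inferred ))]"
}

# Qwen3-4B: hidden=2560, heads=32, kv_heads=8, head_dim=128
attn_shapes 2560 32 8 128
```

For Qwen3-4B the two lines disagree (q_proj 4096 vs 2560 output rows), whereas for a model where `hidden_size / num_attention_heads` already equals `head_dim` both lines agree, which would be consistent with Qwen2.5 working and Qwen3 failing.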

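The realizar GPU inference path likely does one or more of the following:
1. Does not recognize `qwen3` as a supported architecture.
2. Falls through to a generic path that incorrectly applies QKV biases, even though Qwen3 has none.
3. Uses the wrong attention layout for Qwen3, e.g. infers `head_dim` instead of reading it from the config, skips Qwen3's per-head Q/K RMSNorm, or mishandles RoPE.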
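One way to confirm which attention tensors the source checkpoint actually contains (and therefore which code path the engine must implement) is to inspect the layer-0 entries of the Hugging Face shard index. This is a sketch assuming the sharded checkpoint directory contains the standard `model.safetensors.index.json` with a `weight_map`; `layer0_attn_tensors` is a hypothetical helper that uses plain `grep` so it needs no JSON tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the distinct layer-0 self-attention tensor names
# found in a Hugging Face model.safetensors.index.json weight map.
layer0_attn_tensors() {
  local index_json=$1
  [ -f "$index_json" ] || { echo "no such file: $index_json" >&2; return 1; }
  # Match quoted keys such as "model.layers.0.self_attn.q_norm.weight";
  # "layers\.0\." does not match layer 10, 20, etc.
  grep -oE '"model\.layers\.0\.self_attn\.[a-z_]+\.(weight|bias)"' "$index_json" \
    | tr -d '"' \
    | sort -u
}
```

Usage: `layer0_attn_tensors /path/to/qwen3-4b/model.safetensors.index.json`. A Qwen3 checkpoint is expected to list `q_norm.weight` and `k_norm.weight` and no `*.bias` entries; a Qwen2.5 checkpoint is expected to list `q_proj.bias`, `k_proj.bias`, and `v_proj.bias`.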

## Environment

- gx10: NVIDIA GB10, sm_121 (Blackwell), CUDA 13.0
- Import pipeline: streaming sharded import with GH-478 fixes (quantization + tokenizer + weight tying all verified working)
- Qwen3-4B: 36 layers, hidden=2560, heads=32, kv_heads=8, intermediate=9728

## Expected Behavior

Qwen3 models should produce coherent text, matching Qwen2.5 quality for equivalent parameter counts.

## Impact

Blocks Qwen3-4B HumanEval evaluation. Until this is fixed, evaluation falls back to Qwen2.5-Coder-7B-Instruct (measured 85.37% pass@1).
