Background
`src/SharpInference.Core/ModelGraph.cs` — `FromGgufMetadata` — has a switch over architecture strings deciding NEOX vs LLaMA-interleaved RoPE. The list is a verbatim mirror of llama.cpp's `llama_model_rope_type()` in `src/llama-model.cpp` (~line 9100, the "pairs of head values are offset by n_rot/2" block). PR #7 captured the state as of this writing — 60+ architectures.
The drift problem
llama.cpp adds new architectures regularly. Anything not in our switch falls through to LLaMA-interleaved, which is silently wrong for any new Qwen/Phi/Gemma/Falcon derivative. The failure mode is the same as the bug fixed in #6 — looks OK on simple prompts, breaks on others.
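To make the mismatch concrete, here is a minimal sketch that prints which element indices within a head get rotated together under each layout. The function name `rope_pairs` and the value `n_rot=8` are illustrative assumptions, not code from either repository. The NEOX pairing follows the "offset by n_rot/2" description; the interleaved pairing takes consecutive elements.

```bash
# Sketch only: rope_pairs is a hypothetical helper; n_rot=8 is an arbitrary example size.
rope_pairs() {
  layout=$1
  n_rot=$2
  i=0
  sep=''
  while [ "$i" -lt $((n_rot / 2)) ]; do
    case $layout in
      # LLaMA-interleaved: consecutive elements form a pair
      norm) printf '%s(%d,%d)' "$sep" $((2 * i)) $((2 * i + 1)) ;;
      # NEOX: the partner sits n_rot/2 elements further along
      neox) printf '%s(%d,%d)' "$sep" "$i" $((i + n_rot / 2)) ;;
    esac
    sep=' '
    i=$((i + 1))
  done
  printf '\n'
}

rope_pairs norm 8   # (0,1) (2,3) (4,5) (6,7)
rope_pairs neox 8   # (0,4) (1,5) (2,6) (3,7)
```

Applying one pairing to weights trained with the other rotates the wrong elements together, which is why a silent fall-through to the default is harmful rather than merely suboptimal.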
Suggested cadence
Quarterly diff:
```bash
# In a llama.cpp checkout, view the NEOX block:
grep -A 70 'pairs of head values are offset by n_rot/2' src/llama-model.cpp
# Map LLM_ARCH_* -> string via:
grep -nE 'LLM_ARCH_[A-Z0-9_]+,\s+"[a-z0-9_-]+"' src/llama-arch.cpp
# Then compare the result against the ModelGraph.cs switch.
```
Any new entries should be added to the SharpInference switch. Non-trivial new variants (anything that isn't "NEOX vs interleaved") need their own dispatch; see #9.
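The manual comparison could be scripted roughly as follows. This is a sketch under several assumptions: the function names are hypothetical, the NEOX block is assumed to end at the first `return` after the marker comment, the table in `src/llama-arch.cpp` is assumed to use lines shaped like `{ LLM_ARCH_X, "name" },`, and the C# switch is assumed to spell each architecture as a double-quoted string literal.

```bash
# Sketch only: all function names are hypothetical helpers.

# Print the LLM_ARCH_* identifiers between the NEOX marker comment and the
# first `return` after it, one per line, de-duplicated.
neox_archs() {
  sed -n '/pairs of head values are offset by n_rot\/2/,/return/p' "$1" |
    grep -oE 'LLM_ARCH_[A-Z0-9_]+' |
    sort -u
}

# Print "LLM_ARCH_X name" pairs from lines shaped like: { LLM_ARCH_X, "name" },
arch_pairs() {
  sed -nE 's/.*(LLM_ARCH_[A-Z0-9_]+),[[:space:]]+"([a-z0-9_-]+)".*/\1 \2/p' "$1"
}

# Print the architecture strings for every identifier in the NEOX block.
# $1 = llama-model.cpp, $2 = llama-arch.cpp
neox_strings() {
  { neox_archs "$1"; arch_pairs "$2"; } |
    awk 'NF == 1 { want[$1] = 1; next } ($1 in want) { print $2 }' |
    sort -u
}

# Print NEOX architecture strings that do not appear as a quoted literal in $3.
# $1 = llama-model.cpp, $2 = llama-arch.cpp, $3 = ModelGraph.cs
missing_in_switch() {
  neox_strings "$1" "$2" | while read -r name; do
    grep -qF "\"$name\"" "$3" || printf '%s\n' "$name"
  done
}
```

From a llama.cpp checkout, `missing_in_switch src/llama-model.cpp src/llama-arch.cpp <path to ModelGraph.cs>` would then list candidate additions. The output still needs a manual look, since the literal match cannot tell whether a string sits in the NEOX branch of the switch or elsewhere in the file.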
Priority
Low, recurring. Set a quarterly reminder, or piggyback on the release of each new Qwen / Phi / Gemma generation.