Periodic sync: keep ModelGraph NEOX rope-arch list in sync with llama.cpp #11

@pekkah

Background

`src/SharpInference.Core/ModelGraph.cs` — `FromGgufMetadata` — has a switch over architecture strings deciding NEOX vs LLaMA-interleaved RoPE. The list is a verbatim mirror of llama.cpp's `llama_model_rope_type()` in `src/llama-model.cpp` (~line 9100, the "pairs of head values are offset by n_rot/2" block). PR #7 captured the state as of this writing — 60+ architectures.
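To make the two layouts concrete, here is a small illustrative sketch of which element indices within a head get rotated together. It is not taken from either codebase; `rope_pairs` is a hypothetical helper, and the only assumptions are the ones stated in the comment quoted above: NEOX pairs are offset by `n_rot/2`, while the LLaMA-interleaved layout pairs adjacent elements.

```shell
#!/bin/sh
# rope_pairs MODE N_ROT
# Hypothetical helper: prints the index pairs that RoPE rotates together.
#   neox        -> (i, i + n_rot/2)
#   anything else is treated as LLaMA-interleaved -> (2i, 2i + 1)
rope_pairs() {
    mode=$1
    half=$(( $2 / 2 ))
    i=0
    out=
    while [ "$i" -lt "$half" ]; do
        case $mode in
            neox) pair="$i,$(( $i + $half ))" ;;
            *)    pair="$(( 2 * $i )),$(( 2 * $i + 1 ))" ;;
        esac
        out="${out:+$out }$pair"
        i=$(( $i + 1 ))
    done
    printf '%s\n' "$out"
}

rope_pairs interleaved 8   # prints: 0,1 2,3 4,5 6,7
rope_pairs neox 8          # prints: 0,4 1,5 2,6 3,7
```

The two layouts share no pair of indices beyond the trivial `n_rot = 2` case, which is why picking the wrong one corrupts positional information rather than failing loudly.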

The drift problem

llama.cpp adds new architectures regularly. Anything not in our switch falls through to LLaMA-interleaved, which is silently wrong for any new Qwen/Phi/Gemma/Falcon derivative. The failure mode is the same as the bug fixed in #6 — looks OK on simple prompts, breaks on others.

Suggested cadence

Quarterly diff:

```bash
# In a llama.cpp checkout, view the NEOX block:
grep -A 70 'pairs of head values are offset by n_rot/2' src/llama-model.cpp

# Map LLM_ARCH_* -> string via:
grep -nE 'LLM_ARCH_[A-Z0-9_]+,\s+"[a-z0-9_-]+"' src/llama-arch.cpp

# Compare against the ModelGraph.cs switch.
```
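The manual comparison could be scripted roughly as follows. This is a sketch under assumptions, not a tested tool: it assumes the NEOX `case` labels sit between the "offset by n_rot/2" comment and the first following `return`, that the name table in `src/llama-arch.cpp` matches the `LLM_ARCH_X, "name"` shape used by the grep above, and that `ModelGraph.cs` spells each architecture as a double-quoted literal. The function names `neox_arch_names` and `missing_neox_archs` are made up for illustration.

```shell
#!/bin/sh
# neox_arch_names MODEL_CPP ARCH_CPP
# Prints the GGUF architecture string for every LLM_ARCH_* case label in the
# NEOX block (from the marker comment up to the first following "return").
neox_arch_names() {
    model_cpp=$1
    arch_cpp=$2
    awk '
        /pairs of head values are offset by n_rot\/2/ { on = 1; next }
        on && /return/ { exit }
        on && match($0, /LLM_ARCH_[A-Z0-9_]+/) { print substr($0, RSTART, RLENGTH) }
    ' "$model_cpp" |
    while read -r enum; do
        # The trailing comma keeps LLM_ARCH_QWEN2 from matching LLM_ARCH_QWEN2MOE.
        sed -nE "s/.*[^A-Z0-9_]${enum},[[:space:]]+\"([a-z0-9_-]+)\".*/\1/p" "$arch_cpp"
    done
}

# missing_neox_archs MODEL_CPP ARCH_CPP MODEL_GRAPH_CS
# Prints NEOX architecture strings that never appear as a quoted literal in
# ModelGraph.cs. Heuristic: a literal anywhere in the file counts as present.
missing_neox_archs() {
    neox_arch_names "$1" "$2" |
    while read -r name; do
        grep -qF "\"$name\"" "$3" || printf '%s\n' "$name"
    done
}
```

Invoked as `missing_neox_archs <llama.cpp>/src/llama-model.cpp <llama.cpp>/src/llama-arch.cpp src/SharpInference.Core/ModelGraph.cs`, empty output would suggest the switch is up to date. Because the check is purely textual, any hit still needs a manual look at the upstream block before the switch is changed.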

Any new entries should be added to the SharpInference switch. Non-trivial new variants (anything that isn't "NEOX vs interleaved") need their own dispatch; see #9.

Priority

Low, recurring. Set a quarterly reminder, or piggyback on the release of each new Qwen / Phi / Gemma generation.

Labels: maintenance (Recurring upkeep / sync with upstream)