
feat(rosetta): Architecture::InternLm2 variant + internlm2 family (closes #1589)#1686

Merged
noahgift merged 7 commits into main from feat/reinternlm2-1589-clean
May 15, 2026
Conversation

@noahgift
Contributor

Summary

Re-authors PR #1674 (InternLM2 rosetta variant) as a clean cherry-pick
onto current main; the original branch accumulated unresolvable merge
conflicts with the FalconClassic and BLOOM variants once those landed
in parallel.

Adds:

  • Architecture::InternLm2 enum variant in converter_types.rs
  • internlm2_map_name function mapping HuggingFace
    model.tok_embeddings/wqkv/wo/w1/w2/w3/attention_norm/
    ffn_norm/output to the APR canonical
    model.embed_tokens/self_attn.qkv_proj/o_proj/mlp.gate_proj/
    down_proj/up_proj/input_layernorm/post_attention_layernorm/
    lm_head layout
  • from_model_type("internlm2" | "internlm2_5" | "internlm2.5")
    Some(Architecture::InternLm2)
  • is_llm() includes InternLm2; display_name() returns "InternLM2"
  • contracts/model-families/internlm2.yaml (123-line family contract,
    InternLM2-7B / 2.5-7B / 20B; shared 92544 vocab, RoPE θ=1000000,
    8 K/V groups)
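
The renaming rules above can be sketched as a small suffix-rewrite function. This is a minimal illustration of the mapping the summary describes, not the actual `internlm2_map_name` in converter_types.rs; the signature and return type are assumptions.

```rust
// Hypothetical sketch of the HF -> APR canonical renaming described above.
// The real internlm2_map_name may differ in signature and error handling.
fn internlm2_map_name(name: &str) -> String {
    // Top-level tensors are renamed wholesale.
    match name {
        "model.tok_embeddings.weight" => return "model.embed_tokens.weight".to_string(),
        "output.weight" => return "lm_head.weight".to_string(),
        _ => {}
    }
    // Per-layer subtrees: rewrite each HF suffix to its APR canonical form.
    const RULES: &[(&str, &str)] = &[
        ("attention.wqkv", "self_attn.qkv_proj"),
        ("attention.wo", "self_attn.o_proj"),
        ("feed_forward.w1", "mlp.gate_proj"), // w1 = gate
        ("feed_forward.w2", "mlp.down_proj"), // w2 = down
        ("feed_forward.w3", "mlp.up_proj"),   // w3 = up
        ("attention_norm", "input_layernorm"),
        ("ffn_norm", "post_attention_layernorm"),
    ];
    for (hf, apr) in RULES {
        if name.contains(hf) {
            return name.replace(hf, apr);
        }
    }
    name.to_string() // unknown tensors pass through unchanged
}
```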

Out of scope (separate tickets):

  • Fused wqkv splitter at conversion layer (InternLM2 packs Q/K/V
    interleaved per GQA group — distinct from GPT-NeoX concat layout)
  • is_inference_verified() returns false until splitter is wired
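
To make the out-of-scope distinction concrete: the interleaved layout stores each GQA group's query heads followed by that group's single K and V head, group by group, whereas the GPT-NeoX-style fused layout is [all Q | all K | all V]. The sketch below computes which rows of a fused wqkv weight belong to Q, K, and V under the interleaved layout; the function name and shape bookkeeping are assumptions, not the planned splitter.

```rust
// Illustrative row bookkeeping for InternLM2's interleaved fused QKV
// (assumption: per GQA group the rows are [q_0..q_{g-1}, k, v], each
// head occupying head_dim rows, repeated for each of the n_kv groups).
fn split_wqkv_rows(
    n_kv: usize,
    q_per_group: usize,
    head_dim: usize,
) -> (Vec<usize>, Vec<usize>, Vec<usize>) {
    let group_rows = (q_per_group + 2) * head_dim; // q heads + 1 K + 1 V
    let (mut q, mut k, mut v) = (Vec::new(), Vec::new(), Vec::new());
    for g in 0..n_kv {
        let base = g * group_rows;
        for r in 0..q_per_group * head_dim {
            q.push(base + r); // this group's query-head rows
        }
        for r in 0..head_dim {
            k.push(base + q_per_group * head_dim + r);       // key-head rows
            v.push(base + (q_per_group + 1) * head_dim + r); // value-head rows
        }
    }
    (q, k, v)
}
```

A GPT-NeoX-style concat split would instead take the first contiguous block for Q, the next for K, and the last for V, which is why the two layouts need distinct splitters.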

Verified

  • pv validate contracts/model-families/internlm2.yaml → 0 errors
  • All 13766 aprender-core --lib tests pass
  • 3 falsifiers green: FALSIFY-PARITY-002, FALSIFY-MF-006, FALSIFY-MF-011
  • Compiles cleanly under cargo check

Closes #1589. Supersedes #1674.

🤖 Generated with Claude Code

…oses #1589)

InternLM2 / InternLM2.5 (Shanghai AI Lab; InternLM2ForCausalLM) is
architecturally a LLaMA derivative (GQA + RoPE + SwiGLU + RMSNorm) but
renames tensor subtrees throughout:

  HF                                      → APR canonical
  model.tok_embeddings.weight             → model.embed_tokens.weight
  model.layers.N.attention.wqkv.weight    → self_attn.qkv_proj.weight (fused)
  model.layers.N.attention.wo.weight      → self_attn.o_proj.weight
  model.layers.N.feed_forward.w1.weight   → mlp.gate_proj.weight
  model.layers.N.feed_forward.w2.weight   → mlp.down_proj.weight  (note: w2=down)
  model.layers.N.feed_forward.w3.weight   → mlp.up_proj.weight    (note: w3=up)
  model.layers.N.attention_norm.weight    → input_layernorm.weight
  model.layers.N.ffn_norm.weight          → post_attention_layernorm.weight
  output.weight                           → lm_head.weight

Adds `Architecture::InternLm2` variant + `internlm2_map_name` (50 LOC)
+ YAML covering InternLM2-7B/2.5-7B and InternLM2-20B (shared 92544
vocab, RoPE θ=1000000, 8 K/V groups).

`from_model_type` adds "internlm2" / "internlm2_5" / "internlm2.5"
recognizers.
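
The recognizer arm is a one-line match over the spellings listed above. A hedged sketch, assuming the enum and function live roughly as described in converter_types.rs; the surrounding variants are omitted:

```rust
// Minimal sketch of the from_model_type recognizer described above;
// the real Architecture enum has many more variants.
#[derive(Debug, PartialEq)]
enum Architecture {
    InternLm2,
}

fn from_model_type(model_type: &str) -> Option<Architecture> {
    // All three spellings seen in HF configs normalize to one variant.
    match model_type {
        "internlm2" | "internlm2_5" | "internlm2.5" => Some(Architecture::InternLm2),
        _ => None,
    }
}
```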

Verified:
- pv validate clean
- FALSIFY-PARITY-002, FALSIFY-MF-006, FALSIFY-MF-011 pass
- All 13764 aprender-core --lib tests pass

Out of scope (separate tickets):
- Fused wqkv splitter at conversion layer (InternLM2 packs Q/K/V
  interleaved per GQA group, distinct from GPT-NeoX concat layout)
- is_inference_verified() returns false until splitter is wired
@noahgift noahgift enabled auto-merge (squash) May 15, 2026 06:30
@noahgift noahgift merged commit 930ec2d into main May 15, 2026
10 checks passed
@noahgift noahgift deleted the feat/reinternlm2-1589-clean branch May 15, 2026 11:29


Development

Successfully merging this pull request may close these issues.

feat: add InternLM2/2.5 (InternLM2ForCausalLM) loader to aprender::rosetta
