feat(rosetta): Architecture::FalconClassic variant + falcon family (closes #1587) #1673

Merged: noahgift merged 1 commit into `main` from `fix/1587-falcon-classic-variant` on May 14, 2026
@noahgift
Contributor

Summary

Closes #1587. Adds `Architecture::FalconClassic` variant + Falcon-specific tensor mapper + `contracts/model-families/falcon.yaml`.

Why FalconClassic needs its own variant

Falcon (TII; `FalconForCausalLM`) is distinct from both:

  • FalconH1 (hybrid Transformer+SSM, uses `Architecture::FalconH1` mapper)
  • BLOOM (different prefix, different position encoding)

Falcon-specific traits:

  • HF prefix `transformer.h.N.*` (BLOOM: `h.N.*`; LLaMA: `model.layers.N.*`)
  • RoPE position encoding (BLOOM uses ALiBi)
  • Fused QKV with MQA (7B: 1 K/V head) or MGQA (40B/11B: 8 K/V groups)
  • Two layernorm layouts: 7B has single per-block; 40B has separate `ln_attn` + `ln_mlp` (parallel residual design)
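
To make the MQA/MGQA point concrete, here is a minimal sketch of how the fused `query_key_value` projection grows with the number of K/V heads. `HeadLayout` and its methods are hypothetical names for illustration, not types from this PR, and the head counts (71 and 128 query heads, head dimension 64) are assumed from the published Hugging Face configs rather than stated in this description.

```rust
/// Attention head layout behind a fused `query_key_value` projection.
/// Hypothetical illustration type; not part of the aprender codebase.
#[derive(Debug, Clone, Copy)]
pub struct HeadLayout {
    pub num_q_heads: usize,
    pub num_kv_heads: usize,
    pub head_dim: usize,
}

impl HeadLayout {
    /// Output rows of the fused QKV weight: all query heads,
    /// plus one K head and one V head per K/V group.
    pub fn fused_qkv_rows(&self) -> usize {
        (self.num_q_heads + 2 * self.num_kv_heads) * self.head_dim
    }

    /// Number of query heads that share each K/V head.
    pub fn q_heads_per_kv(&self) -> usize {
        self.num_q_heads / self.num_kv_heads
    }
}

// Assumed values (from the public HF configs, not from this PR).
pub const FALCON_7B: HeadLayout = HeadLayout { num_q_heads: 71, num_kv_heads: 1, head_dim: 64 };
pub const FALCON_40B: HeadLayout = HeadLayout { num_q_heads: 128, num_kv_heads: 8, head_dim: 64 };
```

Under these assumptions the 7B fused weight has (71 + 2) × 64 = 4672 output rows and the 40B one has (128 + 16) × 64 = 9216, which is why a QKV splitter has to know the K/V head count rather than dividing the tensor into three equal parts.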

Engine changes

  • `converter_types.rs::Architecture`: adds the `FalconClassic` variant
  • `tensor_expectation.rs::map_name`: dispatches `FalconClassic` to the new mapper
  • `tensor_expectation.rs::is_llm`, `display_name`, `from_model_type` updates
  • `tensor_expectation.rs::falcon_classic_map_name` NEW (95 LOC) — handles both 7B (single `input_layernorm`) and 40B (`ln_attn` + `ln_mlp`) variants
  • Coverage test `test_from_model_type_unknown_gh219` updated: `from_model_type("falcon")` now returns `Some(FalconClassic)`
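
The behavior change covered by the updated test can be sketched roughly as follows. This is a reduced stand-in, not the real `Architecture` enum: the actual enum has more variants, and the `"llama"` and `"falcon_h1"` strings are assumptions added only to give the match some context.

```rust
/// Reduced stand-in for `converter_types.rs::Architecture`.
/// Only `FalconClassic` and `FalconH1` are named in this PR; `Llama` is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Architecture {
    Llama,
    FalconH1,
    FalconClassic,
}

impl Architecture {
    /// Map a Hugging Face `model_type` string to an architecture.
    pub fn from_model_type(model_type: &str) -> Option<Self> {
        match model_type {
            "llama" => Some(Self::Llama),        // assumed string
            "falcon_h1" => Some(Self::FalconH1), // assumed string
            // Previously fell through to `None`; now resolves to the new variant.
            "falcon" => Some(Self::FalconClassic),
            _ => None,
        }
    }
}
```

With this shape, `Architecture::from_model_type("falcon")` yields `Some(Architecture::FalconClassic)`, while unrecognized strings still yield `None`.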

YAML

`contracts/model-families/falcon.yaml` covers 7B (MQA, RoPE θ=10000), 11B (MGQA, RoPE θ=500000), 40B (MGQA, RoPE θ=10000). All share 65024-token vocab.
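
As a rough picture of what the contract encodes, the three sizes can be written as a Rust table. The type and field names below are hypothetical and do not reflect the YAML schema; the values are the ones listed above.

```rust
/// How K/V heads are shared across query heads.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum KvSharing {
    Mqa,
    Mgqa,
}

/// One size entry of the Falcon family (hypothetical field names).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FalconSize {
    pub name: &'static str,
    pub kv_sharing: KvSharing,
    pub rope_theta: f64,
    pub vocab_size: usize,
}

/// Values as stated in this PR description.
pub const FALCON_SIZES: [FalconSize; 3] = [
    FalconSize { name: "7B", kv_sharing: KvSharing::Mqa, rope_theta: 10_000.0, vocab_size: 65_024 },
    FalconSize { name: "11B", kv_sharing: KvSharing::Mgqa, rope_theta: 500_000.0, vocab_size: 65_024 },
    FalconSize { name: "40B", kv_sharing: KvSharing::Mgqa, rope_theta: 10_000.0, vocab_size: 65_024 },
];

/// True when every size in the family declares the same vocabulary size.
pub fn vocab_is_consistent(sizes: &[FalconSize]) -> bool {
    sizes.windows(2).all(|pair| pair[0].vocab_size == pair[1].vocab_size)
}
```

A check along the lines of `vocab_is_consistent` is one plausible reading of what a vocabulary-consistency test verifies for a family; the real test may be stricter.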

Test plan

  • `pv validate` clean
  • FALSIFY-PARITY-002 `test_every_model_family_yaml_has_architecture` passes
  • FALSIFY-MF-006 `no_duplicate_architecture_classes` passes
  • FALSIFY-MF-011 `vocab_consistency` passes
  • All 13764 aprender-core --lib tests pass
  • CI: workspace-test

Out of scope

  • Parallel attn+mlp residual runtime — `is_inference_verified()` returns false for FalconClassic; engine has no parallel-residual code path
  • MQA/MGQA-aware QKV splitter at conversion layer
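
For context on the first item, the difference between a sequential residual block and a parallel one can be sketched with plain vectors. This is an illustration of the general technique only, not engine code: the function names are hypothetical, and each closure is assumed to include its own layernorm (for the 40B layout, `attn` stands for attention applied after `ln_attn` and `mlp` for the MLP applied after `ln_mlp`).

```rust
/// Element-wise sum of two equal-length vectors.
fn add(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

/// Sequential residuals: the MLP consumes the output of the attention residual.
pub fn sequential_block<A, M>(x: &[f32], attn: A, mlp: M) -> Vec<f32>
where
    A: Fn(&[f32]) -> Vec<f32>,
    M: Fn(&[f32]) -> Vec<f32>,
{
    let h = add(x, &attn(x));
    add(&h, &mlp(&h))
}

/// Parallel residuals: attention and MLP both read the same block input,
/// and both outputs are added to it.
pub fn parallel_block<A, M>(x: &[f32], attn: A, mlp: M) -> Vec<f32>
where
    A: Fn(&[f32]) -> Vec<f32>,
    M: Fn(&[f32]) -> Vec<f32>,
{
    let a = attn(x);
    let m = mlp(x);
    add(&add(x, &a), &m)
}
```

The two functions give different results for the same sub-layers, so a runtime that only implements the sequential form cannot produce correct output for a parallel-residual checkpoint even when every tensor name maps cleanly.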

🤖 Generated with Claude Code

…loses #1587)

Falcon classic (TII; FalconForCausalLM, 7B/40B/11B/RW variants) is
distinct from FalconH1 (hybrid Transformer+SSM) and from BLOOM:
- HF prefix is `transformer.h.N.*` (BLOOM uses `h.N.*`; LLaMA uses `model.layers.N.*`)
- RoPE position encoding (not ALiBi like BLOOM)
- Fused QKV (`self_attention.query_key_value`) with MQA/MGQA layout
- Falcon-7B uses single per-block layernorm
- Falcon-40B uses separate `ln_attn` + `ln_mlp` (parallel attn+mlp residuals)

Adds `Architecture::FalconClassic` variant + `falcon_classic_map_name`
(95 LOC) that translates both 7B and 40B layernorm variants:
  transformer.word_embeddings.weight               → model.embed_tokens.weight
  transformer.h.N.input_layernorm.*                → model.layers.N.input_layernorm.* (7B)
  transformer.h.N.ln_attn.*                        → model.layers.N.input_layernorm.* (40B)
  transformer.h.N.ln_mlp.*                         → model.layers.N.post_attention_layernorm.* (40B)
  transformer.h.N.self_attention.query_key_value.* → model.layers.N.self_attn.qkv_proj.* (fused)
  transformer.h.N.self_attention.dense.*           → model.layers.N.self_attn.o_proj.*
  transformer.h.N.mlp.dense_h_to_4h.*              → model.layers.N.mlp.up_proj.*
  transformer.h.N.mlp.dense_4h_to_h.*              → model.layers.N.mlp.down_proj.*
  transformer.ln_f.*                               → model.norm.*
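
The table above could be implemented along these lines. This is a sketch written from the table alone, not the actual `falcon_classic_map_name`; the function name, the `Option<String>` return type, and the choice to return `None` for names outside the table are all assumptions.

```rust
/// Hypothetical re-creation of the mapping table; the real
/// `falcon_classic_map_name` may differ in signature and coverage.
pub fn falcon_classic_map_name_sketch(hf_name: &str) -> Option<String> {
    // Top-level tensors.
    if hf_name == "transformer.word_embeddings.weight" {
        return Some("model.embed_tokens.weight".to_string());
    }
    if let Some(param) = hf_name.strip_prefix("transformer.ln_f.") {
        return Some(format!("model.norm.{param}"));
    }

    // Per-layer tensors have the form `transformer.h.N.<module>.<param>`.
    let rest = hf_name.strip_prefix("transformer.h.")?;
    let (layer, tail) = rest.split_once('.')?;
    let layer: usize = layer.parse().ok()?;

    // (source module prefix, target module prefix). `input_layernorm` (7B) and
    // `ln_attn` (40B) both land on `input_layernorm`.
    const RULES: [(&str, &str); 7] = [
        ("input_layernorm.", "input_layernorm."),
        ("ln_attn.", "input_layernorm."),
        ("ln_mlp.", "post_attention_layernorm."),
        ("self_attention.query_key_value.", "self_attn.qkv_proj."),
        ("self_attention.dense.", "self_attn.o_proj."),
        ("mlp.dense_h_to_4h.", "mlp.up_proj."),
        ("mlp.dense_4h_to_h.", "mlp.down_proj."),
    ];

    RULES.iter().find_map(|&(from, to)| {
        tail.strip_prefix(from)
            .map(|param| format!("model.layers.{layer}.{to}{param}"))
    })
}
```

Keeping the rules in one table makes the 7B and 40B layernorm variants share a single code path, since a checkpoint only ever contains one of the two spellings.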

YAML at contracts/model-families/falcon.yaml covers 7B (MQA),
11B (MGQA, RoPE θ=500000), and 40B (MGQA, RoPE θ=10000) sizes.

Coverage test updated: `test_from_model_type_unknown_gh219` previously
asserted `from_model_type("falcon") == None`. Updated to expect
`Some(FalconClassic)` post-#1587.

Verified:
- pv validate clean
- FALSIFY-PARITY-002, FALSIFY-MF-006, FALSIFY-MF-011 pass
- All 13764 aprender-core --lib tests pass

Out of scope (separate tickets):
- Parallel attn+mlp residual runtime support
- MQA/MGQA-aware QKV splitter at conversion layer
@noahgift noahgift enabled auto-merge (squash) May 14, 2026 14:58
@noahgift noahgift merged commit 41a5914 into main May 14, 2026
11 checks passed
@noahgift noahgift deleted the fix/1587-falcon-classic-variant branch May 14, 2026 15:21
Development

Successfully merging this pull request may close these issues.

feat: add Falcon (classic) (FalconForCausalLM) loader to aprender::rosetta