Add KORMo model #46427

Open
mjkmain wants to merge 1 commit into huggingface:main from mjkmain:add-kormo-model

mjkmain commented Jun 4, 2026

Add KORMo model

What does this PR do?

Adds KORMo (Korean Open Reasoning Model), a fully
open bilingual (Korean–English) LLM, to Transformers as a native model (KORMoForCausalLM,
model_type="kormo").

KORMo is architecturally identical to Llama. The only difference is that the two
decoder-layer RMSNorms are named pre_attention_layernorm / pre_mlp_layernorm (Llama uses
input_layernorm / post_attention_layernorm). The model is therefore implemented with the
modular mechanism, inheriting from Llama and overriding just the decoder layer. Keeping the
KORMo norm names means the existing public checkpoints load unchanged (no weight renaming
required), which matters because the KORMo repos publish many training-dynamics checkpoints.

Checkpoints: https://huggingface.co/KORMo-Team

Implementation notes

  • modular_kormo.py inherits from Llama; configuration_kormo.py / modeling_kormo.py are
    generated by the modular converter.
  • Equivalence checked numerically: loading a KORMo checkpoint into LlamaForCausalLM after
    renaming the two layernorm keys produces bit-identical logits in fp32 (max abs diff 0.0),
    confirming KORMo == Llama up to the norm naming.
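The key renaming used in that equivalence check can be sketched as follows. This is a minimal illustration, not the script used in the PR: the helper name `rename_kormo_keys` is hypothetical, and the example key layout (`model.layers.<i>.<norm>.weight`) is an assumption based on Llama's usual state-dict naming. Only the two norm-name pairs come from the description above.

```python
import re

# KORMo norm name -> Llama norm name, as stated in the PR description.
KORMO_TO_LLAMA = {
    "pre_attention_layernorm": "input_layernorm",
    "pre_mlp_layernorm": "post_attention_layernorm",
}

# Match a norm name only when it is a full dotted path component,
# e.g. "...layers.0.pre_mlp_layernorm.weight" (assumed key layout).
_NORM_PATTERN = re.compile(
    r"(?<=\.)(" + "|".join(re.escape(name) for name in KORMO_TO_LLAMA) + r")(?=\.)"
)


def rename_kormo_keys(state_dict):
    """Return a new dict whose KORMo layernorm keys use Llama's names.

    Values are passed through untouched, so this works for tensors as
    well as for the plain placeholders used in the demo below.
    """
    return {
        _NORM_PATTERN.sub(lambda m: KORMO_TO_LLAMA[m.group(1)], key): value
        for key, value in state_dict.items()
    }


if __name__ == "__main__":
    # Hypothetical keys for illustration only.
    demo = {
        "model.layers.0.pre_attention_layernorm.weight": "w0",
        "model.layers.0.pre_mlp_layernorm.weight": "w1",
        "model.layers.0.self_attn.q_proj.weight": "w2",
    }
    for key in rename_kormo_keys(demo):
        print(key)
```

Because only key names change and values are untouched, a renamed checkpoint holds exactly the same weights, which is consistent with the bit-identical logits reported above.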

Tests

pytest tests/models/kormo/test_modeling_kormo.py
# 156 passed, 121 skipped, 1206 subtests passed

Local consistency / quality (CI parity):

make check-repository-consistency   # all checks pass
make check-code-quality             # all checks pass (ruff 0.14.10, ty 0.0.20)

Before submitting

AI assistance disclosure

This PR was scaffolded with AI assistance (modular file, registration, docs, tests). The
submitter has reviewed every changed line, understands the change end-to-end (KORMo is the
submitter's own model), and ran the tests above.

cc @Cyrilvallez

KORMo (Korean Open Reasoning Model) is a fully open Korean-English LLM
(https://huggingface.co/papers/2510.09426). It shares Llama's architecture;
the only difference is that the two decoder-layer RMSNorms are named
pre_attention_layernorm / pre_mlp_layernorm, so existing checkpoints load
without any weight renaming.

Added via modular (inherits from Llama).

github-actions Bot commented Jun 4, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, kormo

Rocketknight1 (Member) commented

Since the models are identical, can't we just make new checkpoints with the weights renamed? I don't think we need any extra architecture in transformers for this!
