Add KORMo model by mjkmain · Pull Request #46427 · huggingface/transformers

mjkmain · 2026-06-04T22:16:59Z

Add KORMo model

What does this PR do?

Adds KORMo (Korean Open Reasoning Model), a fully open bilingual (Korean–English) LLM, to
Transformers as a native model (KORMoForCausalLM, model_type="kormo").

KORMo is architecturally identical to Llama. The only difference is that the two
decoder-layer RMSNorms are named pre_attention_layernorm / pre_mlp_layernorm (Llama uses
input_layernorm / post_attention_layernorm). The model is therefore implemented with the
modular mechanism, inheriting from Llama and overriding just the decoder layer. Keeping the
KORMo norm names means the existing public checkpoints load unchanged (no weight renaming
required), which matters because the KORMo repos publish many training-dynamics checkpoints.
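
The inherit-and-rename idea described above can be illustrated with a dependency-free toy. This is only a sketch: ToyNorm, ToyLlamaLayer and ToyKormoLayer are hypothetical stand-ins invented for illustration. They are not the Transformers classes and not the code in this PR. The point is that the subclass keeps the parent's computation and changes only the attribute names under which the two norms (and therefore their weights) are registered.

```python
# Toy stand-ins only: hypothetical classes, NOT the Transformers Llama/KORMo code.


class ToyNorm:
    """Placeholder for an RMSNorm holding one 'weight' vector."""

    def __init__(self, size):
        self.weight = [1.0] * size

    def __call__(self, x):
        # Root-mean-square normalisation followed by an elementwise scale.
        rms = (sum(v * v for v in x) / len(x) + 1e-6) ** 0.5
        return [w * v / rms for w, v in zip(self.weight, x)]


class ToyLlamaLayer:
    """Parent layer that uses the Llama norm attribute names."""

    def __init__(self, size):
        self.input_layernorm = ToyNorm(size)
        self.post_attention_layernorm = ToyNorm(size)

    def attn(self, x):  # stand-in for self-attention
        return [2.0 * v for v in x]

    def mlp(self, x):  # stand-in for the feed-forward block
        return [v + 1.0 for v in x]

    def forward(self, x):
        h = self.attn(self.input_layernorm(x))
        x = [a + b for a, b in zip(x, h)]  # residual connection
        h = self.mlp(self.post_attention_layernorm(x))
        return [a + b for a, b in zip(x, h)]

    def param_names(self):
        # Names under which the norm weights would appear in a checkpoint.
        return sorted(
            name + ".weight"
            for name, value in vars(self).items()
            if isinstance(value, ToyNorm)
        )


class ToyKormoLayer(ToyLlamaLayer):
    """Same computation as the parent; only the two norm names differ."""

    def __init__(self, size):
        super().__init__(size)
        # Re-register the parent's norms under the KORMo attribute names.
        self.pre_attention_layernorm = self.__dict__.pop("input_layernorm")
        self.pre_mlp_layernorm = self.__dict__.pop("post_attention_layernorm")

    def forward(self, x):
        h = self.attn(self.pre_attention_layernorm(x))
        x = [a + b for a, b in zip(x, h)]
        h = self.mlp(self.pre_mlp_layernorm(x))
        return [a + b for a, b in zip(x, h)]
```

Calling ToyKormoLayer(4).param_names() lists the two pre_* names, while ToyLlamaLayer(4).param_names() lists the Llama names. Both layers return the same output for the same input.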

Checkpoints: https://huggingface.co/KORMo-Team

Implementation notes

modular_kormo.py inherits from Llama; configuration_kormo.py / modeling_kormo.py are
generated by the modular converter.
Equivalence checked numerically: loading a KORMo checkpoint into LlamaForCausalLM after
renaming the two layernorm keys produces bit-identical logits in fp32 (max abs diff 0.0),
confirming KORMo == Llama up to the norm naming.
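
A minimal sketch of the key renaming behind the equivalence check, in plain Python. The two norm names come from the description above. The helper names (rename_kormo_keys, max_abs_diff) and the example key layout model.layers.N.<norm>.weight are assumptions made for illustration, not code from this PR. A real check would load the renamed tensors into LlamaForCausalLM and compare logits; that step is omitted here.

```python
# Assumed mapping, taken from the norm names given in the PR description.
KORMO_TO_LLAMA = {
    "pre_attention_layernorm": "input_layernorm",
    "pre_mlp_layernorm": "post_attention_layernorm",
}


def rename_kormo_keys(state_dict):
    """Return a copy of state_dict with the KORMo norm names replaced.

    Only whole dotted components are replaced, so unrelated keys that
    merely contain a similar substring are left untouched.
    """
    renamed = {}
    for key, value in state_dict.items():
        parts = key.split(".")
        new_key = ".".join(KORMO_TO_LLAMA.get(part, part) for part in parts)
        renamed[new_key] = value
    return renamed


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical key names, shown only to illustrate the renaming.
example = {
    "model.layers.0.pre_attention_layernorm.weight": "w0",
    "model.layers.0.pre_mlp_layernorm.weight": "w1",
    "model.layers.0.self_attn.q_proj.weight": "w2",
}
converted = rename_kormo_keys(example)
```

In this example, converted holds the same three values under the keys model.layers.0.input_layernorm.weight, model.layers.0.post_attention_layernorm.weight and the unchanged model.layers.0.self_attn.q_proj.weight.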

Tests

pytest tests/models/kormo/test_modeling_kormo.py
# 156 passed, 121 skipped, 1206 subtests passed

Local consistency / quality (CI parity):

make check-repository-consistency   # all checks pass
make check-code-quality             # all checks pass (ruff 0.14.10, ty 0.0.20)

Before submitting

Coordinated on an issue (link: Add KORMo model #46426)
Tests pass locally (see above)
make fix-repo / style run

AI assistance disclosure

This PR was scaffolded with AI assistance (modular file, registration, docs, tests). The
submitter has reviewed every changed line, understands the change end-to-end (KORMo is the
submitter's own model), and ran the tests above.

cc @Cyrilvallez

KORMo (Korean Open Reasoning Model) is a fully open Korean-English LLM (https://huggingface.co/papers/2510.09426). It shares Llama's architecture; the only difference is that the two decoder-layer RMSNorms are named pre_attention_layernorm / pre_mlp_layernorm, so existing checkpoints load without any weight renaming. Added via modular (inherits from Llama).

github-actions · 2026-06-04T22:18:09Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, kormo

Rocketknight1 · 2026-06-05T11:19:48Z

Since the models are identical, can't we just make new checkpoints with the weights renamed? I don't think we need any extra architecture in transformers for this!

mjkmain wants to merge 1 commit into huggingface:main from mjkmain:add-kormo-model