
Conversation

@maincode-prabod
Contributor

Added support for Maincoder-1B

fixes #18346

Collaborator

@ngxson left a comment


The model seems like a normal llama arch to me. Probably enough to just add one single line to convert_hf_to_gguf.py

@CISC
Collaborator

CISC commented Jan 2, 2026

The model seems like a normal llama arch to me. Probably enough to just add one single line to convert_hf_to_gguf.py

Agreed, this is another arch rebrand, though I'd say Qwen given the tokenizer and chat template.

@maincode-prabod
Contributor Author

maincode-prabod commented Jan 2, 2026

Thanks for the review! I understand the concern about architecture similarity with LLAMA and Qwen.

Let me clarify the key difference:
MAINCODER applies QK normalization AFTER RoPE, not before:

# From modelling_maincoder.py (lines 200-206)
# RoPE first
query_states, key_states = apply_rotary_emb(query_states, key_states, ...)

# Then QK norm
query_states = self.q_norm(query_states)
key_states = self.k_norm(key_states)

This differs from:

  • Qwen3: Applies QK norm → then RoPE
  • Llama (with use_kq_norm): Uses unweighted RMS norm, no learned weights

The order matters mathematically: RoPE modifies the query/key vectors, so normalizing before vs. after produces different results.

Additionally, the model has learned QK norm weights (q_norm.weight, k_norm.weight per layer), which are not present in Qwen2 and are applied in a different order than in Qwen3.

If there's an existing architecture that matches this exact pattern (RoPE → learned QK norm), I'm happy to use that instead. I checked llama.cpp, qwen2.cpp, qwen3.cpp, and others, but none match this specific order.
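
To make the ordering concrete, here's a minimal toy sketch (illustrative only, not the actual modelling_maincoder.py code; the RMSNorm, rotate_half-style RoPE helper, shapes, and frequencies are just assumptions for the comparison):

# Toy comparison of norm -> RoPE (Qwen3-style) vs RoPE -> norm (Maincoder-style).
# Illustrative only: helper names, shapes, and frequencies are made up.
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-dim scale
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_normed = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x_normed * self.weight


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    return x * cos + rotate_half(x) * sin


head_dim = 8
q = torch.randn(1, 4, 2, head_dim)  # (batch, seq, n_heads, head_dim)
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
angles = torch.outer(torch.arange(4).float(), inv_freq)           # (seq, head_dim/2)
cos = torch.cat((angles.cos(), angles.cos()), dim=-1)[None, :, None, :]
sin = torch.cat((angles.sin(), angles.sin()), dim=-1)[None, :, None, :]

q_norm = RMSNorm(head_dim)
with torch.no_grad():
    q_norm.weight.copy_(torch.rand(head_dim) + 0.5)  # stand-in for learned weights

norm_then_rope = rope(q_norm(q), cos, sin)   # Qwen3-style: norm, then RoPE
rope_then_norm = q_norm(rope(q, cos, sin))   # Maincoder-style: RoPE, then norm
print(torch.allclose(norm_then_rope, rope_then_norm))  # False (in general)

Worth noting: with the default all-ones weight the two paths coincide numerically, since RoPE is a per-pair rotation and preserves the RMS; it's the learned q_norm/k_norm weights that make the ordering observable.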

@ngxson
Collaborator

ngxson commented Jan 2, 2026

A quick search shows that hunyuan-dense.cpp also has a weighted norm after RoPE, but I'm OK with adding a new one as this case is rare.

In any case, it's always better to leave comments in the code to avoid confusion for future contributors.

Just an extra question: what's the reason behind having the norm after RoPE (vs. before RoPE)? If I understand correctly, most models apply the norm before RoPE to avoid adding any distortion to the embedded positional information.

@maincode-prabod
Contributor Author

Thanks @ngxson. Yeah, it is similar. The catch is that Hunyuan uses ROPE_TYPE_NEOX (which rotates pairs split across the two halves of the head dimension) while Maincoder uses ROPE_TYPE_NORM (which rotates adjacent dimension pairs), so the dimension ordering differs and we can't directly reuse it. Let me know if I'm missing something that would allow us to reuse it, though!
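
A quick toy sketch of the two pairings as I understand them (illustrative only, not the ggml kernels; head_dim and angles are made up):

import torch


def rope_norm(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """ROPE_TYPE_NORM: rotate adjacent pairs (x[2i], x[2i+1])."""
    x = x.reshape(*x.shape[:-1], -1, 2)             # (..., head_dim/2, 2)
    x0, x1 = x[..., 0], x[..., 1]
    out = torch.stack((x0 * cos - x1 * sin, x0 * sin + x1 * cos), dim=-1)
    return out.flatten(-2)


def rope_neox(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """ROPE_TYPE_NEOX: rotate pairs (x[i], x[i + head_dim/2])."""
    x0, x1 = x.chunk(2, dim=-1)                     # first half, second half
    return torch.cat((x0 * cos - x1 * sin, x0 * sin + x1 * cos), dim=-1)


head_dim = 8
x = torch.randn(head_dim)
theta = torch.rand(head_dim // 2)                   # one angle per rotated pair
cos, sin = theta.cos(), theta.sin()

print(rope_norm(x, cos, sin))   # rotates pairs (0,1), (2,3), (4,5), (6,7)
print(rope_neox(x, cos, sin))   # rotates pairs (0,4), (1,5), (2,6), (3,7)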

On norm-after-RoPE:
Good question. Normalizing after RoPE means we're scaling the combined semantic and positional representation together, which keeps attention scores more stable across different positions. We found this helped during RL fine-tuning, where attention distributions are sensitive. Most models do norm-before-RoPE to preserve the positional encoding exactly, but post-RoPE norm worked better for our specific use case.

@pwilkin
Collaborator

pwilkin commented Jan 2, 2026

Also, can we please appreciate the new formulation of RoPE that I at least haven't seen in Transformers before?

def apply_rotary_emb(
    xq: torch.Tensor,
    xk: torch.Tensor,
    freqs_cis: torch.Tensor,
) -> tuple[torch.Tensor, torch.Tensor]:
    """Apply rotary embeddings to query and key tensors."""
    xq_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2))
    xk_ = torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2))

    # Broadcast freqs_cis
    freqs_cis = freqs_cis[:, :, None, :]

    xq_out = torch.view_as_real(xq_ * freqs_cis).flatten(3)
    xk_out = torch.view_as_real(xk_ * freqs_cis).flatten(3)

    return xq_out.type_as(xq), xk_out.type_as(xk)

It's amazing in how many ways you can express the same thing in Python!
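
For readers less used to the complex-number trick, a rough real-valued equivalent (illustrative only; it assumes freqs_cis carries cos(theta) + i*sin(theta) for each adjacent (even, odd) dimension pair, and that cos/sin below are pre-broadcast to (batch, seq, 1, head_dim/2)):

import torch


def apply_rotary_emb_real(
    xq: torch.Tensor,
    xk: torch.Tensor,
    cos: torch.Tensor,
    sin: torch.Tensor,
) -> tuple[torch.Tensor, torch.Tensor]:
    def rotate(x: torch.Tensor) -> torch.Tensor:
        x = x.float().reshape(*x.shape[:-1], -1, 2)   # pair up adjacent dims
        x0, x1 = x[..., 0], x[..., 1]
        # (x0 + i*x1) * (cos + i*sin), expanded into real arithmetic
        out = torch.stack((x0 * cos - x1 * sin, x0 * sin + x1 * cos), dim=-1)
        return out.flatten(3)

    return rotate(xq).type_as(xq), rotate(xk).type_as(xk)


# Tiny shape check
b, s, h, d = 1, 3, 2, 8
xq, xk = torch.randn(b, s, h, d), torch.randn(b, s, h, d)
theta = torch.rand(b, s, d // 2)
cos, sin = theta.cos()[:, :, None, :], theta.sin()[:, :, None, :]
q_out, k_out = apply_rotary_emb_real(xq, xk, cos, sin)
print(q_out.shape, k_out.shape)  # torch.Size([1, 3, 2, 8]) twice

Same rotation, just without going through torch.view_as_complex.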

@CISC merged commit 5755e52 into ggml-org:master on Jan 2, 2026
75 checks passed
@maincode-prabod
Contributor Author

Thanks @CISC and @ngxson for the review and for helping get this over the line! Really appreciate the time; expect more PRs from our team soon!

@pwilkin I wish I could take credit for that RoPE implementation! I had the exact same reaction when I saw it. I actually adapted it from the Llama 4 implementation here

