Conversation

@philip-essential (Contributor) commented Dec 6, 2025

This adds support for Rnj-1, which is an 8B model we just released. We've been using llama.cpp to play around with the model internally, and we released a GGUF checkpoint for the instruction-tuned version.

The model architecture is similar enough to Gemma3 that in Transformers/VLLM/SGLang we can reuse the same model file. However, in llama.cpp we need some small changes, so I've added a new implementation, based closely on the Gemma3 one. The changes are:

  • All layers use global attention.
  • Long-context is via YaRN.
  • (edited to add:) Uses final_logit_softcapping

Because our huggingface config.json uses "Gemma3ForCausalLM" as the architecture, convert_hf_to_gguf.py is unable to tell that these configs are for Rnj-1. The solution I came up with is to manually change the architecture to Rnj1ForCausalLM before converting the checkpoint. I added a note in convert_hf_to_gguf.py about this. But perhaps there's a better solution?
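
For reference, a minimal sketch of that manual step, assuming a local copy of the Hugging Face checkpoint (the directory name below is illustrative):

```python
# Sketch: patch config.json so convert_hf_to_gguf.py takes the Rnj-1 code path.
# "architectures" is the standard HF config key; the checkpoint path is an example.
import json
from pathlib import Path

config_path = Path("rnj-1-instruct") / "config.json"  # illustrative local checkpoint dir

config = json.loads(config_path.read_text())
config["architectures"] = [
    "Rnj1ForCausalLM" if arch == "Gemma3ForCausalLM" else arch
    for arch in config.get("architectures", [])
]
config_path.write_text(json.dumps(config, indent=2) + "\n")
print("architectures is now:", config["architectures"])
```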

@CISC (Collaborator) commented Dec 6, 2025

> Because our huggingface config.json uses "Gemma3ForCausalLM" as the architecture, convert_hf_to_gguf.py is unable to tell that these configs are for Rnj-1. The solution I came up with is to manually change the architecture to Rnj1ForCausalLM before converting the checkpoint. I added a note in convert_hf_to_gguf.py about this. But perhaps there's a better solution?

Instead change llm_build_gemma3_iswa into a templated llm_build_gemma3, like f.ex. smallthinker and add support for YaRN and non-SWA in Gemma3Model conversion.
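
For context, the YaRN half of that conversion change could look roughly like the standalone sketch below; the output key names and the example values are assumptions for illustration, not the converter's actual GGUF keys.

```python
# Sketch: map an HF-style rope_scaling block to YaRN metadata.
# Output key names are illustrative, not the real GGUF keys.
from typing import Any

def yarn_metadata(hparams: dict[str, Any]) -> dict[str, Any]:
    meta: dict[str, Any] = {}
    rope_scaling = hparams.get("rope_scaling") or {}
    if rope_scaling.get("rope_type", rope_scaling.get("type")) == "yarn":
        meta["rope.scaling.type"] = "yarn"
        meta["rope.scaling.factor"] = rope_scaling["factor"]
        meta["rope.scaling.original_context_length"] = rope_scaling[
            "original_max_position_embeddings"
        ]
    return meta

# Example values only; Rnj-1's actual YaRN parameters live in its config.json.
print(yarn_metadata({"rope_scaling": {"rope_type": "yarn",
                                      "factor": 4.0,
                                      "original_max_position_embeddings": 8192}}))
```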

@faisal-fida commented Dec 7, 2025

@philip-essential Just following up on PR #17811 (Rnj-1 support).

Currently hitting an error, `unknown model architecture: 'rnj1'`, when trying to load the GGUF. Any chance we can prioritize merging this so the community can use Rnj-1?

@sirmo commented Dec 7, 2025

I tested the current state of this PR and it works pretty well with the published GGUF Q4 quants. The model follows OpenCode (TUI coding agent) instructions well in my brief testing. Neat model!

Though the 32K context size is a bit limiting for local coding agents, this might be a great agentic model for efficient execution. Thank you for all your work!

Hardware tested on: 7900xtx with the ROCm backend.

@philip-essential (Contributor, Author)

> Instead change llm_build_gemma3_iswa into a templated llm_build_gemma3, like f.ex. smallthinker and add support for YaRN and non-SWA in Gemma3Model conversion.

That makes sense. I can try and do that soon.

@philip-essential (Contributor, Author)

Q: There are a few GGUF quantizations of Rnj-1 out now that use rnj1 for the architecture. We can update the one in our official repo, but there are some created by 3rd parties. Do you prefer that we add rnj1 as an alias of gemma3, or only use gemma3? Since rnj1 was never on master, I would guess the latter.

@CISC (Collaborator) commented Dec 8, 2025

> Q: There are a few GGUF quantizations of Rnj-1 out now that use rnj1 for the architecture. We can update the one in our official repo, but there are some created by 3rd parties. Do you prefer that we add rnj1 as an alias of gemma3, or only use gemma3? Since rnj1 was never on master, I would guess the latter.

Yes, the latter please. It can't be helped that GGUFs pop up pre-merge that end up being incompatible; if they are not updated post-merge, so be it.

@philip-essential (Contributor, Author)

I just pushed a commit that refactors it as you describe. While doing this I also noticed a bug: Rnj-1 should use final_logit_softcapping, but it wasn't being applied. I fixed this as well, and I'm currently pushing updated weights to our HF repo.
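
For reference, final logit softcapping squashes the output logits with a scaled tanh. A tiny sketch, using an example cap value rather than Rnj-1's actual setting:

```python
# Sketch: final logit softcapping maps logits smoothly into (-cap, +cap).
import math

def softcap(logits: list[float], cap: float) -> list[float]:
    return [cap * math.tanh(x / cap) for x in logits]

# cap=30.0 is just an example; the real value comes from final_logit_softcapping.
print(softcap([-50.0, 0.0, 12.5, 80.0], cap=30.0))
```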

If you try to run Rnj-1 without this PR (exactly as though it were gemma3), it does start and gives superficially coherent responses at short context. As expected, at long context it breaks down. I tested this by attaching this paper and asking the model who wrote it; with this PR the responses are coherent, and without it they are not.

@philip-essential (Contributor, Author)

Thanks, I applied those changes and reconverted the checkpoint to get the metadata differences. It seems to run as expected.

@CISC (Collaborator) commented Dec 9, 2025

> Thanks, I applied those changes and reconverted the checkpoint to get the metadata differences. It seems to run as expected.

Just beware your previous one will run as SWA. :)

@philip-essential (Contributor, Author)

Ah yes, I missed that. The new checkpoint conversion ensures that sliding_window is not present at all in the metadata if sliding_window_pattern==1, and we use that to determine whether to run with sliding window. I can see in the logs of llama-server that sliding_window is no longer present, and I just ran a 28k-token example that seems correct.
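
As a rough illustration of that check, here is a standalone sketch; the metadata key name is illustrative, not the exact one used by the converter or loader.

```python
# Sketch of both halves of the logic described above.
from typing import Any

def converted_metadata(hparams: dict[str, Any]) -> dict[str, Any]:
    """Conversion side: omit sliding_window entirely when every layer is global."""
    meta: dict[str, Any] = {}
    if hparams.get("sliding_window_pattern", 1) != 1:
        meta["attention.sliding_window"] = hparams["sliding_window"]
    return meta

def uses_swa(meta: dict[str, Any]) -> bool:
    """Load side: enable sliding-window attention only if the key is present."""
    return "attention.sliding_window" in meta

# Rnj-1-style config: every layer global, so no sliding_window key is written.
print(uses_swa(converted_metadata({"sliding_window": 512, "sliding_window_pattern": 1})))  # False
# Gemma3-style config: interleaved local layers keep the key.
print(uses_swa(converted_metadata({"sliding_window": 512, "sliding_window_pattern": 6})))  # True
```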

@CISC (Collaborator) commented Dec 9, 2025

> Ah yes, I missed that. The new checkpoint conversion ensures that sliding_window is not present at all in the metadata if sliding_window_pattern==1, and we use that to determine whether to run with sliding window. I can see in the logs of llama-server that sliding_window is no longer present, and I just ran a 28k-token example that seems correct.

Yep, just be sure to update the GGUFs on HF; will merge in a bit.

@philip-essential (Contributor, Author)

Yes, this commit is the one that I tested; it's deployed to HF.

@CISC merged commit 1d2a1ab into ggml-org:master on Dec 9, 2025. 82 checks passed.