model: support Rnj-1 #17811
Conversation
Instead change
@philip-essential Just following up on PR #17811 (Rnj-1 support). Currently hitting an error, `unknown model architecture: 'rnj1'`, when trying to load the GGUF. Any chance we can prioritize merging this so the community can use Rnj-1?
I tested the current fork of this PR and it works well with the published GGUF Q4 quants. The model follows OpenCode (TUI coding agent) instructions well in my brief testing. Neat model! The 32K context size is a bit limiting for local coding agents, but this might be a great agentic model for efficient execution. Thank you for all your work! Hardware tested: 7900 XTX with the ROCm backend.
That makes sense. I can try to do that soon.
Q: There are a few GGUF quantizations of Rnj-1 out now that use
Yes, the latter, please. It can't be helped that GGUFs pop up pre-merge and end up being incompatible; if they are not updated post-merge, so be it.
I just pushed a commit that refactors it as you describe. While doing this I also noticed a bug, which is that Rnj-1 should use

If you try to run Rnj-1 without this PR (exactly as though it were Gemma3), it does start and gives superficially coherent responses at short context. As expected, at long context it breaks down. I tested this by attaching this paper and asking the model who wrote it; the responses are coherent with this PR and not without it.
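As a side note on why the failure mode looks like this, here is a minimal sketch in plain Python/NumPy (not llama.cpp code; the window size and context length are illustrative) of how a sliding-window mask hides early context that a full causal mask keeps reachable, which is why facts from the start of an attached paper become unanswerable:

```python
# Illustrative only: compares a full causal mask with a sliding-window
# (SWA) mask. With SWA, a late query can no longer attend to early
# tokens, so long-context retrieval breaks down.
import numpy as np

def causal_mask(n_tokens: int, window: int | None = None) -> np.ndarray:
    """True where query position i may attend to key position j."""
    i = np.arange(n_tokens)[:, None]
    j = np.arange(n_tokens)[None, :]
    mask = j <= i                   # standard causal mask
    if window is not None:
        mask &= (i - j) < window    # SWA: drop keys older than the window
    return mask

full = causal_mask(8192)                # full attention sees everything
swa = causal_mask(8192, window=1024)    # SWA sees only the last 1024 keys
print(full[-1].sum(), swa[-1].sum())    # 8192 vs. 1024 visible positions
```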
Thanks, I applied those changes and reconverted the checkpoint to pick up the metadata differences. It seems to run as expected.
Just beware that your previous one will run as SWA. :)
Ah yes, I missed that. The new checkpoint conversion ensures that `sliding_window` is not present at all in the metadata if `sliding_window_pattern == 1`, and we use that to determine whether to run with sliding window. I can see in the logs of
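To make the convention concrete, here is a minimal sketch (the helper names and dict layout are hypothetical, not the actual convert_hf_to_gguf.py or llama.cpp code) of the rule described above: the converter omits the `sliding_window` key entirely when `sliding_window_pattern == 1`, and the loader treats the key's absence as full attention:

```python
# Hypothetical sketch of the metadata convention described above;
# function names and dict layout are illustrative, not llama.cpp's API.

def conversion_metadata(hf_config: dict) -> dict:
    """Build GGUF-style metadata from a HF config.json dict."""
    meta = {"context_length": hf_config["max_position_embeddings"]}
    pattern = hf_config.get("sliding_window_pattern", 1)
    if pattern != 1:
        # Only emit sliding-window keys when SWA layers actually exist;
        # pattern == 1 means every layer uses full attention.
        meta["sliding_window"] = hf_config["sliding_window"]
        meta["sliding_window_pattern"] = pattern
    return meta

def uses_swa(meta: dict) -> bool:
    # Loader side: the presence of the key alone decides whether the
    # model runs with sliding-window attention.
    return "sliding_window" in meta
```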
Yep, just be sure to update the GGUFs on HF; will merge in a bit.
Yes, this commit is the one that I tested; it's deployed to HF.
This adds support for Rnj-1, which is an 8B model we just released. We've been using llama.cpp to play around with the model internally, and we released a GGUF checkpoint for the instruction-tuned version.
The model architecture is similar enough to Gemma3 that in Transformers/vLLM/SGLang we can reuse the same model file. However, in llama.cpp we need some small changes, so I've added a new implementation, based closely on the Gemma3 one. The changes are:

- `final_logit_softcapping`

Because our huggingface config.json uses "Gemma3ForCausalLM" as the architecture, convert_hf_to_gguf.py is unable to tell that these configs are for Rnj-1. The solution I came up with is to manually change the architecture to `Rnj1ForCausalLM` before converting the checkpoint (see the sketch below). I added a note in convert_hf_to_gguf.py about this. But perhaps there's a better solution?
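For reference, a minimal sketch of the manual workaround described above; the checkpoint directory path is illustrative:

```python
# Sketch of the manual pre-conversion step described above: rewrite the
# architecture name in config.json so convert_hf_to_gguf.py can tell the
# checkpoint is Rnj-1 rather than plain Gemma3. The path is illustrative.
import json
from pathlib import Path

cfg_path = Path("rnj-1-8b-it/config.json")   # illustrative local checkpoint
cfg = json.loads(cfg_path.read_text())
cfg["architectures"] = ["Rnj1ForCausalLM"]   # was ["Gemma3ForCausalLM"]
cfg_path.write_text(json.dumps(cfg, indent=2))

# Then convert as usual:
#   python convert_hf_to_gguf.py rnj-1-8b-it/
```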