
Conversation


@ravenouse ravenouse commented Oct 31, 2025

This pull request introduces support for the Janus-Pro 1B and Janus-Pro 7B models within the llama.cpp framework.

The focus of this update is image understanding (visual input → text output); image generation is not covered by this PR.

Usage & Current Progress

Convert models to GGUF files:

# Convert the base Janus-Pro 1B model
python convert_hf_to_gguf.py deepseek-community/Janus-Pro-1B \
    --remote \
    --outfile janus-pro-1b-f16.gguf \
    --outtype f16

# Convert the mmproj component
python convert_hf_to_gguf.py deepseek-community/Janus-Pro-1B \
    --remote \
    --outfile mmproj-janus-pro-1b-f16.gguf \
    --outtype f16 \
    --mmproj

The converted GGUF files can be accessed here: https://huggingface.co/Ericwang/Janus-Pro-1B-GGUF
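The 7B checkpoint should convert the same way; as a sketch (not retested here), assuming deepseek-community/Janus-Pro-7B follows the same layout and flags:

# Convert the base Janus-Pro 7B model
python convert_hf_to_gguf.py deepseek-community/Janus-Pro-7B \
    --remote \
    --outfile janus-pro-7b-f16.gguf \
    --outtype f16

# Convert the mmproj component
python convert_hf_to_gguf.py deepseek-community/Janus-Pro-7B \
    --remote \
    --outfile mmproj-janus-pro-7b-f16.gguf \
    --outtype f16 \
    --mmproj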

Run the model:

# Build the project:
cmake -B build
cmake --build build --target llama-mtmd-cli

./build/bin/llama-mtmd-cli \
    -m janus-pro-1b-f16.gguf \
    --mmproj mmproj-janus-pro-1b-f16.gguf \
    --chat-template deepseek
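The command above starts an interactive chat session. For a single-shot run, llama-mtmd-cli also accepts an image and a prompt directly; a sketch, where llama.jpg is a placeholder path:

# One-shot image understanding: pass the image and prompt on the command line
./build/bin/llama-mtmd-cli \
    -m janus-pro-1b-f16.gguf \
    --mmproj mmproj-janus-pro-1b-f16.gguf \
    --chat-template deepseek \
    --image llama.jpg \
    -p "Describe this image."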

References

Janus-Pro 1B model card:
https://huggingface.co/deepseek-community/Janus-Pro-1B

Janus-Pro 7B model card:
https://huggingface.co/deepseek-community/Janus-Pro-7B

Configurations:
https://huggingface.co/deepseek-community/Janus-Pro-1B/blob/main/config.json
https://huggingface.co/deepseek-community/Janus-Pro-7B/blob/main/config.json

HF Implementation:
https://github.com/huggingface/transformers/tree/main/src/transformers/models/janus

@ravenouse ravenouse requested review from CISC and ngxson as code owners October 31, 2025 23:24

ravenouse and others added 5 commits November 1, 2025 09:05
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
@ravenouse (Contributor Author)

Hi @CISC and @ngxson ,

Thank you for the thorough review and valuable feedback.
I've addressed all the comments. I also re-ran the conversion and inference workflows, and both are working as expected.

Ready for another look when you have a moment. Thanks a lot!

ravenouse and others added 2 commits November 2, 2025 10:05
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>

ravenouse commented Nov 2, 2025

Thanks again for the review!

I've updated the code and retested it with the following image:
https://1.bp.blogspot.com/-tLB0HRLcOp4/Tj4Pvhsq6vI/AAAAAAAAAG8/h6ahy6g4GJI/s1600/Llama_lying_down.jpg
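For reference, a run along these lines reproduces the test (the local filename and prompt here are assumptions, not the exact command used):

# Fetch the test image, then run single-shot inference on it
curl -L -o llama.jpg "https://1.bp.blogspot.com/-tLB0HRLcOp4/Tj4Pvhsq6vI/AAAAAAAAAG8/h6ahy6g4GJI/s1600/Llama_lying_down.jpg"
./build/bin/llama-mtmd-cli \
    -m janus-pro-1b-f16.gguf \
    --mmproj mmproj-janus-pro-1b-f16.gguf \
    --chat-template deepseek \
    --image llama.jpg \
    -p "What animal is in this image?"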

[Screenshot: llama-mtmd-cli output for the test image, 2025-11-02 10:17 AM]

PS: The force push was made to correct a formatting typo in the commit message.


@ngxson ngxson left a comment


Looking good! Merging once the CI passes


CISC commented Nov 2, 2025

Fix all the whitespace errors though. :)
https://github.com/ggml-org/llama.cpp/actions/runs/19016845485/job/54305935425?pr=16906

ngxson and others added 2 commits November 2, 2025 21:14
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@ngxson ngxson requested a review from CISC November 2, 2025 20:18
@ngxson ngxson merged commit 6b9a524 into ggml-org:master Nov 2, 2025
75 of 79 checks passed
@ravenouse (Contributor Author)

Hi @CISC and @ngxson ,

Thank you so much for the quick review and support to finalize this PR. Truly appreciated!

gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Nov 3, 2025
* origin/master: (169 commits)
opencl: support imrope (ggml-org#16914)
fix: Viewing multiple PDF attachments (ggml-org#16974)
model-conversion : pass config to from_pretrained (ggml-org#16963)
server : add props.model_alias (ggml-org#16943)
ggml: CUDA: add head size 72 for flash-attn (ggml-org#16962)
mtmd: add --image-min/max-tokens (ggml-org#16921)
mtmd: pad mask for qwen2.5vl (ggml-org#16954)
ggml : LoongArch fixes (ggml-org#16958)
sync: minja (glm 4.6 & minmax m2 templates) (ggml-org#16949)
SYCL: optimized repeat_back kernel (3× fewer asm instructions, 2× faster) (ggml-org#16869)
feat(webui): improve LaTeX rendering with currency detection (ggml-org#16508)
test-backend-ops : fix segfault in moe-expert-reduce test in support mode and coverage (ggml-org#16936)
ci : disable failing riscv cross build (ggml-org#16952)
model: add Janus Pro for image understanding (ggml-org#16906)
clip : use FA (ggml-org#16837)
server : support unified cache across slots (ggml-org#16736)
common : move gpt-oss reasoning processing to init params (ggml-org#16937)
docs: remove llama_sampler_accept reference in sampling sample usage (ggml-org#16920)
CUDA: add FLOOR, CEIL, ROUND, TRUNC unary ops (ggml-org#16917)
devops: fix failing s390x docker build (ggml-org#16918)
...

Labels: examples, python (python script changes)
