model: add Janus Pro for image understanding #16906
Conversation
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Thanks again for the review! I've updated the code and retested it with the following image:
Looking good! Merging once the CI passes
Fix all the whitespace errors though. :)
* origin/master: (169 commits)
  opencl: support imrope (ggml-org#16914)
  fix: Viewing multiple PDF attachments (ggml-org#16974)
  model-conversion : pass config to from_pretrained (ggml-org#16963)
  server : add props.model_alias (ggml-org#16943)
  ggml: CUDA: add head size 72 for flash-attn (ggml-org#16962)
  mtmd: add --image-min/max-tokens (ggml-org#16921)
  mtmd: pad mask for qwen2.5vl (ggml-org#16954)
  ggml : LoongArch fixes (ggml-org#16958)
  sync: minja (glm 4.6 & minmax m2 templates) (ggml-org#16949)
  SYCL: optimized repeat_back kernel (3× fewer asm instructions, 2× faster) Feature/sycl repeat back opt (ggml-org#16869)
  feat(webui): improve LaTeX rendering with currency detection (ggml-org#16508)
  test-backend-ops : fix segfault in moe-expert-reduce test in support mode and coverage (ggml-org#16936)
  ci : disable failing riscv cross build (ggml-org#16952)
  model: add Janus Pro for image understanding (ggml-org#16906)
  clip : use FA (ggml-org#16837)
  server : support unified cache across slots (ggml-org#16736)
  common : move gpt-oss reasoning processing to init params (ggml-org#16937)
  docs: remove llama_sampler_accept reference in sampling sample usage (ggml-org#16920)
  CUDA: add FLOOR, CEIL, ROUND, TRUNC unary ops (ggml-org#16917)
  devops: fix failing s390x docker build (ggml-org#16918)
  ...


This pull request introduces support for the Janus‑Pro 1B and Janus‑Pro 7B models within the llama.cpp framework.
The focus of this update is on image understanding (i.e., visual-input → textual or conceptual output).
Image generation is not covered by this PR.
Usage & Current Progress
Convert models to GGUF files:
The converted GGUF files can be accessed here: https://huggingface.co/Ericwang/Janus-Pro-1B-GGUF
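A possible shape for the conversion step, using llama.cpp's standard converter script (the local checkout path, output file names, and the availability of the `--mmproj` flag for this architecture are assumptions for illustration, not taken from the PR):

```shell
# Hypothetical sketch: convert the Hugging Face checkpoint to GGUF with
# llama.cpp's converter. Paths and output names are placeholders.
python convert_hf_to_gguf.py ./Janus-Pro-1B \
  --outfile janus-pro-1b-f16.gguf

# Vision models additionally need a multimodal projector (mmproj) file;
# recent converters can emit it via --mmproj (assumed supported here).
python convert_hf_to_gguf.py ./Janus-Pro-1B \
  --mmproj \
  --outfile mmproj-janus-pro-1b-f16.gguf
```

The pre-converted files linked above can be used directly if you want to skip this step.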
Run the model:
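One plausible invocation via llama.cpp's multimodal CLI (`llama-mtmd-cli`); the file names and prompt are placeholders, not commands confirmed by the PR:

```shell
# Hypothetical sketch: run image understanding with the multimodal CLI.
# -m      : the converted language-model GGUF
# --mmproj: the multimodal projector GGUF
# --image : the input image to describe
llama-mtmd-cli \
  -m janus-pro-1b-f16.gguf \
  --mmproj mmproj-janus-pro-1b-f16.gguf \
  --image input.png \
  -p "Describe this image."
```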
References
Janus-Pro 1B model card:
https://huggingface.co/deepseek-community/Janus-Pro-1B
Janus-Pro 7B model card:
https://huggingface.co/deepseek-community/Janus-Pro-7B
Configurations:
https://huggingface.co/deepseek-community/Janus-Pro-1B/blob/main/config.json
https://huggingface.co/deepseek-community/Janus-Pro-7B/blob/main/config.json
HF Implementation:
https://github.com/huggingface/transformers/tree/main/src/transformers/models/janus