Add support for Ovis-Image #12740
Ovis-Image has been released:
@bot /style
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
yiyixuxu
left a comment
Thanks so much for the PR! I left some feedback, and I think we can merge this very soon.
Congrats on the release!! Sorry we overlooked the PR (it was the Thanksgiving holiday in the US).
We will reach out to set up a collaboration channel for your future releases.
yiyixuxu
left a comment
thanks!
can you run and do you want to add docs and test in a follow up PR? we need to add these docs:
test I saw you already created a folder:)
What does this PR do?
This PR introduces Ovis-Image into the diffusers library. Ovis-Image integrates a diffusion-based visual decoder with the Ovis 2.5 multimodal backbone, leveraging a text-centric training pipeline that combines large-scale pre-training with carefully tailored post-training refinements. Despite its compact architecture, Ovis-Image achieves text rendering performance on par with significantly larger open models such as Qwen-Image and approaches closed-source systems like Seedream and GPT-4o.