Add Support for Ovis2.5 Multi-Modal Model #40841

@xschen-beb

Model description

Key Features:

Small Model Performance: Optimized training strategies give small-scale models a higher capability density, so they remain competitive with models in larger size tiers.

Enhanced Reasoning Capabilities: Chain-of-Thought (CoT) reasoning is significantly strengthened by combining instruction tuning with preference learning.

Video and Multi-Image Processing: Video and multi-image data are incorporated into training to enhance the ability to handle complex visual information across frames and images.

Multilingual Support and OCR: Enhances multilingual OCR beyond English and Chinese and improves structured data extraction from complex visual elements like tables and charts.

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

[1] arXiv: https://arxiv.org/abs/2508.11737
[2] Hugging Face: https://huggingface.co/AIDC-AI/Ovis2-8B
