Conversation

@slin000111
Collaborator

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Write the detailed information belonging to this PR.
PaddlePaddle/PaddleOCR-VL-1.5, #7975

Experiment results

Paste your experiment result here (if needed).

@gemini-code-assist
Contributor

Summary of Changes

Hello @slin000111, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the PaddleOCR-VL-1.5 model into the system. The changes involve updating documentation to reflect the new model, defining necessary constants and architecture details, implementing specific loading and input processing logic for its multimodal capabilities, and adding a dedicated test case to ensure its proper functioning.

Highlights

  • New Model Support: The pull request introduces comprehensive support for the PaddleOCR-VL-1.5 multimodal model, enabling its integration and use within the system.
  • Documentation Updates: The Supported-models-and-datasets.md documentation, in both Chinese and English versions, has been updated to include PaddleOCR-VL-1.5 in the list of available models.
  • Core Model Integration: New constants for paddle_ocr_1_5 have been added to MLLMModelType and MLLMTemplateType, and its architecture details (language model, aligner, vision tower) are now registered in MLLMModelArch.
  • Model Loading and Template Implementation: A dedicated PaddleOCR1_5Loader class handles the model's configuration and loading, and a PaddleOCR1_5Template manages multimodal input encoding, specifically image embeddings (a generic sketch of this embedding step follows this list).
  • Test Coverage: A new test function, test_paddle_ocr_1_5(), has been added to tests/test_align/test_template/test_vision.py to validate the correct integration and functionality of the new model.
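
To make the template highlight above concrete, here is a generic, illustrative sketch of how multimodal templates typically splice image features into the text embedding sequence. The token id, tensor shapes, and function name below are assumptions for illustration only and are not taken from this PR's PaddleOCR1_5Template.

import torch

IMAGE_TOKEN_ID = 151655  # hypothetical placeholder token id, not the PR's actual value

def merge_image_embeddings(input_ids: torch.Tensor,
                           text_embeds: torch.Tensor,
                           image_embeds: torch.Tensor) -> torch.Tensor:
    # input_ids:    (seq_len,) token ids containing IMAGE_TOKEN_ID placeholders
    # text_embeds:  (seq_len, hidden) embeddings from the language model's embedding table
    # image_embeds: (num_placeholders, hidden) features from the vision tower and aligner
    merged = text_embeds.clone()
    positions = (input_ids == IMAGE_TOKEN_ID).nonzero(as_tuple=True)[0]
    assert positions.numel() == image_embeds.shape[0], 'placeholder/feature count mismatch'
    merged[positions] = image_embeds.to(dtype=text_embeds.dtype)
    return merged

The actual template in this PR may differ in details such as the placeholder token, batching, and where the merge happens.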

Changelog
  • docs/source/Instruction/Supported-models-and-datasets.md
    • Added an entry for PaddlePaddle/PaddleOCR-VL-1.5 to the list of supported models.
  • docs/source_en/Instruction/Supported-models-and-datasets.md
    • Added an entry for PaddlePaddle/PaddleOCR-VL-1.5 to the English list of supported models.
  • swift/model/constant.py
    • Introduced paddle_ocr_1_5 as a new MLLMModelType (a sketch of this kind of constant addition follows the changelog).
  • swift/model/model_arch.py
    • Added paddle_ocr_1_5 to MLLMModelArch.
    • Registered the specific architecture details for paddle_ocr_1_5, including its language model, aligner, and vision tower components.
  • swift/model/models/baidu.py
    • Imported PretrainedConfig.
    • Implemented PaddleOCR1_5Loader for loading the PaddleOCR-VL-1.5 model.
    • Registered the PaddleOCR-VL-1.5 model with its metadata and architecture.
  • swift/template/constant.py
    • Defined paddle_ocr_1_5 as a new MLLMTemplateType.
  • swift/template/templates/baidu.py
    • Created PaddleOCR1_5Template to handle multimodal input encoding for PaddleOCR-VL-1.5, specifically managing image embeddings.
    • Registered the new PaddleOCR1_5Template.
  • tests/test_align/test_template/test_vision.py
    • Added a new test function test_paddle_ocr_1_5() to validate the new model's functionality.
    • Updated the main execution block to include the test_paddle_ocr_1_5() test.
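
As a purely illustrative sketch of the constant additions listed in the changelog, the new type strings would look roughly like this; the surrounding class layout is an assumption, not the exact ms-swift source.

class MLLMModelType:
    # ... existing model types ...
    paddle_ocr_1_5 = 'paddle_ocr_1_5'

class MLLMTemplateType:
    # ... existing template types ...
    paddle_ocr_1_5 = 'paddle_ocr_1_5'
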
Activity
  • The pull request was initiated by slin000111 with the explicit goal of supporting PaddleOCR-VL-1.5.
  • The changes involve adding new code for model integration, updating documentation, and extending test coverage.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request adds support for the PaddleOCR-VL-1.5 model. The changes are comprehensive, including updates to documentation, model constants, architecture registration, a new loader, a new template, and an integration test. The implementation is mostly solid, but I've identified a potential bug in the new PaddleOCR1_5Loader where it overrides a base class method incorrectly. I've provided a suggestion to fix this, which also simplifies the code.

Comment on lines +88 to +98
class PaddleOCR1_5Loader(ModelLoader):

    def get_config(self, model_dir: str) -> PretrainedConfig:
        from transformers import AutoConfig
        auto_config_cls = self.auto_config_cls or AutoConfig
        return auto_config_cls.from_pretrained(model_dir)

    def get_model(self, model_dir: str, *args, **kwargs) -> PreTrainedModel:
        from transformers import AutoModelForImageTextToText
        self.auto_model_cls = self.auto_model_cls or AutoModelForImageTextToText
        return super().get_model(model_dir, *args, **kwargs)
Contributor


Severity: high

The get_config method overrides the base implementation and omits trust_remote_code=True, which is present in the base ModelLoader and important for loading many models from the hub. This could lead to errors. It's safer to remove this method and inherit the correct behavior from the base class (a small illustration of the failure mode follows the suggested change).

Suggested change

-class PaddleOCR1_5Loader(ModelLoader):
-
-    def get_config(self, model_dir: str) -> PretrainedConfig:
-        from transformers import AutoConfig
-        auto_config_cls = self.auto_config_cls or AutoConfig
-        return auto_config_cls.from_pretrained(model_dir)
-
-    def get_model(self, model_dir: str, *args, **kwargs) -> PreTrainedModel:
-        from transformers import AutoModelForImageTextToText
-        self.auto_model_cls = self.auto_model_cls or AutoModelForImageTextToText
-        return super().get_model(model_dir, *args, **kwargs)
+class PaddleOCR1_5Loader(ModelLoader):
+
+    def get_model(self, model_dir: str, *args, **kwargs) -> PreTrainedModel:
+        from transformers import AutoModelForImageTextToText
+        self.auto_model_cls = self.auto_model_cls or AutoModelForImageTextToText
+        return super().get_model(model_dir, *args, **kwargs)
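
To illustrate the failure mode the reviewer describes, here is a minimal sketch. It is not the ms-swift ModelLoader source; it only assumes, as the comment states, that the base loader forwards trust_remote_code=True while the override drops it.

from transformers import AutoConfig

def get_config_with_remote_code(model_dir: str):
    # What the base loader is described as doing: custom configuration code
    # shipped in a hub repository is allowed to run.
    return AutoConfig.from_pretrained(model_dir, trust_remote_code=True)

def get_config_without_remote_code(model_dir: str):
    # What the overriding method does: for repositories whose config requires
    # custom code, transformers raises a ValueError asking for trust_remote_code=True
    # instead of loading the config.
    return AutoConfig.from_pretrained(model_dir)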

Collaborator Author


The model code has been merged into the transformers library and differs from the code on the Model Hub, so trust_remote_code is not needed.
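
A minimal sanity check of that claim, assuming a transformers release that already ships the PaddleOCR-VL architecture; the model id below is taken from the PR description, and whether it resolves against your configured hub is an assumption.

from transformers import AutoConfig, AutoModelForImageTextToText

model_id = 'PaddlePaddle/PaddleOCR-VL-1.5'

# No trust_remote_code flag: the architecture is resolved from transformers itself,
# not from custom modeling code stored in the repository.
config = AutoConfig.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)
print(type(model).__name__)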

@slin000111 slin000111 merged commit 4b7fc22 into modelscope:main Feb 4, 2026
2 of 3 checks passed