support PaddleOCR-VL-1.5 #7979
Summary of Changes
Hello @slin000111, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request integrates the …
Code Review
This pull request adds support for the PaddleOCR-VL-1.5 model. The changes are comprehensive, including updates to documentation, model constants, architecture registration, a new loader, a new template, and an integration test. The implementation is mostly solid, but I've identified a potential bug in the new PaddleOCR1_5Loader where it overrides a base class method incorrectly. I've provided a suggestion to fix this, which also simplifies the code.
    class PaddleOCR1_5Loader(ModelLoader):

        def get_config(self, model_dir: str) -> PretrainedConfig:
            from transformers import AutoConfig
            auto_config_cls = self.auto_config_cls or AutoConfig
            return auto_config_cls.from_pretrained(model_dir)

        def get_model(self, model_dir: str, *args, **kwargs) -> PreTrainedModel:
            from transformers import AutoModelForImageTextToText
            self.auto_model_cls = self.auto_model_cls or AutoModelForImageTextToText
            return super().get_model(model_dir, *args, **kwargs)
The get_config method overrides the base implementation and misses trust_remote_code=True, which is present in the base ModelLoader and important for loading many models from the hub. This could lead to errors. It's safer to remove this method and inherit the correct behavior from the base class.
Suggested change

    - class PaddleOCR1_5Loader(ModelLoader):
    -     def get_config(self, model_dir: str) -> PretrainedConfig:
    -         from transformers import AutoConfig
    -         auto_config_cls = self.auto_config_cls or AutoConfig
    -         return auto_config_cls.from_pretrained(model_dir)
    -     def get_model(self, model_dir: str, *args, **kwargs) -> PreTrainedModel:
    -         from transformers import AutoModelForImageTextToText
    -         self.auto_model_cls = self.auto_model_cls or AutoModelForImageTextToText
    -         return super().get_model(model_dir, *args, **kwargs)
    + class PaddleOCR1_5Loader(ModelLoader):
    +     def get_model(self, model_dir: str, *args, **kwargs) -> PreTrainedModel:
    +         from transformers import AutoModelForImageTextToText
    +         self.auto_model_cls = self.auto_model_cls or AutoModelForImageTextToText
    +         return super().get_model(model_dir, *args, **kwargs)
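The concern in this review comment can be illustrated with a minimal, self-contained sketch. It does not use the real ModelLoader or transformers classes: FakeAutoConfig, BaseLoader, OverridingLoader and InheritingLoader are hypothetical stand-ins, and the base class passing trust_remote_code=True is taken from the reviewer's description rather than verified against the repository.

```python
class FakeAutoConfig:
    """Hypothetical stand-in for a config factory; it echoes the kwargs it receives."""

    @classmethod
    def from_pretrained(cls, model_dir, **kwargs):
        return {"model_dir": model_dir, **kwargs}


class BaseLoader:
    """Hypothetical base loader that always passes trust_remote_code=True."""

    auto_config_cls = None

    def get_config(self, model_dir):
        config_cls = self.auto_config_cls or FakeAutoConfig
        return config_cls.from_pretrained(model_dir, trust_remote_code=True)


class OverridingLoader(BaseLoader):
    """Re-implements get_config and, in doing so, drops trust_remote_code."""

    def get_config(self, model_dir):
        config_cls = self.auto_config_cls or FakeAutoConfig
        return config_cls.from_pretrained(model_dir)


class InheritingLoader(BaseLoader):
    """Defines no get_config, so the base class behaviour is inherited unchanged."""


overriding = OverridingLoader().get_config("some/model")
inheriting = InheritingLoader().get_config("some/model")

print("trust_remote_code" in overriding)   # the override lost the flag
print(inheriting["trust_remote_code"])     # the inherited method kept it
```

Whether losing the flag matters depends on whether the model actually relies on repository-hosted code, which is the point the reply below addresses.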
The code for this model has been merged into the transformers library and differs from the code on the Model Hub, so trust_remote_code is not needed.
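The reasoning in this reply can be sketched as a toy decision rule. This is a simplified illustration, not the transformers implementation: NATIVE_MODEL_TYPES, needs_remote_code and the model type strings are hypothetical, and the only assumption carried over from real checkpoints is that repository-hosted classes are advertised through an "auto_map" entry in the config.

```python
# Hypothetical set of model types that ship with the installed library.
NATIVE_MODEL_TYPES = {"natively_supported_model"}


def needs_remote_code(config_dict, native_types=NATIVE_MODEL_TYPES):
    """Return True only when the config points at repository code and the
    library has no built-in class for its model type."""
    has_native_class = config_dict.get("model_type") in native_types
    has_remote_class = "auto_map" in config_dict
    return has_remote_class and not has_native_class


# A checkpoint whose repository still carries custom code, but whose model
# type is also implemented natively: the native class is enough.
merged = {"model_type": "natively_supported_model",
          "auto_map": {"AutoConfig": "configuration_custom.CustomConfig"}}

# A checkpoint whose model type exists only as repository code.
hub_only = {"model_type": "hub_only_model",
            "auto_map": {"AutoConfig": "configuration_custom.CustomConfig"}}

print(needs_remote_code(merged))
print(needs_remote_code(hub_only))
```

Under this reading, a loader for a natively supported model can call from_pretrained without the flag, which matches the author's position.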
PR type
PR information
PaddlePaddle/PaddleOCR-VL-1.5, #7975