
Conversation

@Jintao-Huang
Collaborator

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Jintao-Huang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on extending the system's capabilities to support the new mistral_2506 large language model. It involves defining the new model and template types, implementing the necessary model and tokenizer loading functions, and registering them within the framework. Additionally, the PR includes a significant refactoring of existing Mistral model template registrations, moving common template metadata into a dedicated dataclass for improved maintainability and consistency across Mistral models.

Highlights

  • New Mistral Model Integration: Added comprehensive support for the mistral_2506 model, including its model type, template type, and specific model and tokenizer loading logic.
  • Mistral Template Refactoring: Streamlined the registration of Mistral-related templates (mistral_2501, mistral_2503, devstral) by introducing a new Mistral3TemplateMeta dataclass for improved consistency and maintainability.
  • Dependency Management Update: Modified the mistral.py model loading to directly import Mistral3ForConditionalGeneration without a try-except block, indicating a more stable dependency assumption.
  • Model Architecture Cleanup: Removed mistral_2503 from MLLMModelArch, clarifying the distinction between model types and architecture definitions within the system.
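The dependency-management change above swaps a guarded import for a direct one. For illustration, here is the guarded-import pattern the PR removed, as a hedged reconstruction: `some_optional_dependency` and `OptionalClass` are placeholders, not the actual transformers names.

```python
# Guarded optional import: the pattern the PR dropped in favor of a direct
# "from transformers import Mistral3ForConditionalGeneration".
# The module and class names below are placeholders for illustration.
try:
    from some_optional_dependency import OptionalClass  # hypothetical module
except ImportError:
    OptionalClass = None  # callers must check for None before use

HAS_OPTIONAL = OptionalClass is not None
```

Dropping the guard signals that the class is expected to exist in every supported transformers version, so a missing or outdated dependency now fails loudly at import time instead of surfacing later as a None check.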

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for the mistral_2506 model. The changes include adding the new model type and template, along with the necessary registration and loading functions. The implementation has a few issues that need to be addressed. There appears to be a copy-paste error in the model IDs used for registration, which could lead to loading the wrong model. More critically, the template class for mistral_2506 has unimplemented methods, which will cause errors during tokenization. Additionally, there's a hardcoded filename for the tokenizer which could be improved for better maintainability. Overall, the changes are in the right direction but require fixes to be functional.
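On the hardcoded tokenizer filename point, a common remedy is to probe the known mistral_common tokenizer filenames rather than pinning a single one. A minimal sketch under that assumption; the helper name and candidate list are illustrative, not actual ms-swift code:

```python
from pathlib import Path

# Filenames that mistral_common tokenizers are commonly shipped under.
# Probing a candidate list keeps the loader working if the checkpoint
# uses a different variant; the exact list here is illustrative.
_TOKENIZER_CANDIDATES = ('tekken.json', 'tokenizer.model.v7', 'tokenizer.model')


def find_mistral_tokenizer_file(model_dir: str) -> str:
    """Return the first tokenizer file present in model_dir, or raise."""
    for name in _TOKENIZER_CANDIDATES:
        path = Path(model_dir) / name
        if path.is_file():
            return str(path)
    raise FileNotFoundError(f'no Mistral tokenizer file found in {model_dir}')
```

Centralizing the candidate list in one place also means a future tokenizer format only requires a one-line change.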

@Jintao-Huang
Copy link
Collaborator Author

/gemini review

@Jintao-Huang Jintao-Huang merged commit 7e1d218 into modelscope:main Nov 17, 2025
1 of 2 checks passed

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for the mistral_2506 model. The changes, including the addition of model types, templates, registration, and tests, are well-implemented. However, I've identified a couple of areas for improvement. There's an unused import that should be removed for code cleanliness. More importantly, I've found an issue where a stale date is used in the system prompts for some existing Mistral models, which could lead to incorrect model behavior over time. I've provided detailed feedback on how to address this to ensure the system prompts are generated dynamically.

Comment on lines +13 to +18
today = datetime.now().strftime('%Y-%m-%d')

mistral_2501_system = (
'You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup '
'headquartered in Paris.\n'
f'Your knowledge base was last updated on 2023-10-01. The current date is {today}.\n\n'

high

The today variable is defined at the module level, which means it's only evaluated once when the module is imported. If the application runs for more than a day, the date in the system prompt will become stale. This can lead to incorrect behavior from the model. A similar issue was addressed for Mistral2506Template by dynamically generating the system prompt. Please apply a similar fix here for mistral_2501 and mistral_2503 templates.

This would involve:

  1. Creating a get_mistral_2501_system() function that returns the system prompt with the current date.
  2. Creating a Mistral2501Template(Template) class with an _swift_encode method that calls get_mistral_2501_system() to set the system prompt if it's not provided.
  3. Updating the registration for mistral_2501 to use this new template class with default_system=None.
  4. Changing Mistral2503Template to inherit from Mistral2501Template.
  5. Updating the registration for mistral_2503 to set default_system=None.
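The steps above can be sketched as follows. This is a minimal illustration with a stand-in Template base class; the real ms-swift Template class and _swift_encode signature will differ.

```python
from datetime import datetime


def get_mistral_2501_system() -> str:
    # Step 1: evaluate the date at call time, so a long-running process
    # never serves a system prompt with a stale date.
    today = datetime.now().strftime('%Y-%m-%d')
    return ('You are Mistral Small 3, a Large Language Model (LLM) created by '
            'Mistral AI, a French startup headquartered in Paris.\n'
            f'Your knowledge base was last updated on 2023-10-01. '
            f'The current date is {today}.\n\n')


class Template:
    # Stand-in for ms-swift's Template base class.
    def _swift_encode(self, inputs: dict, **kwargs) -> dict:
        return inputs


class Mistral2501Template(Template):
    # Step 2: fill in the default system prompt lazily. Steps 3 and 5
    # would register these templates with default_system=None so this
    # branch is actually taken.
    def _swift_encode(self, inputs: dict, **kwargs) -> dict:
        if inputs.get('system') is None:
            inputs['system'] = get_mistral_2501_system()
        return super()._swift_encode(inputs, **kwargs)


class Mistral2503Template(Mistral2501Template):
    # Step 4: mistral_2503 inherits the dynamic system prompt unchanged.
    pass
```

An explicitly provided system prompt passes through untouched; only the default is generated per call.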

model_kwargs: Dict[str, Any],
load_model: bool = True,
**kwargs):
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

medium

The MistralTokenizer is imported but not used in this function. It should be removed to keep the code clean.

vx120 pushed a commit to vx120/ms-swift that referenced this pull request Nov 19, 2025