[model] support mistral 2506 #6624
Conversation
Summary of Changes

Hello @Jintao-Huang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request primarily focuses on extending the system's capabilities to support the new mistral_2506 model.
Code Review
This pull request adds support for the mistral_2506 model. The changes include adding the new model type and template, along with the necessary registration and loading functions. The implementation has a few issues that need to be addressed. There appears to be a copy-paste error in the model IDs used for registration, which could lead to loading the wrong model. More critically, the template class for mistral_2506 has unimplemented methods, which will cause errors during tokenization. Additionally, there's a hardcoded filename for the tokenizer which could be improved for better maintainability. Overall, the changes are in the right direction but require fixes to be functional.
/gemini review
Code Review
This pull request adds support for the mistral_2506 model. The changes, including the addition of model types, templates, registration, and tests, are well-implemented. However, I've identified a couple of areas for improvement. There's an unused import that should be removed for code cleanliness. More importantly, I've found an issue where a stale date is used in the system prompts for some existing Mistral models, which could lead to incorrect model behavior over time. I've provided detailed feedback on how to address this to ensure the system prompts are generated dynamically.
```python
today = datetime.now().strftime('%Y-%m-%d')

mistral_2501_system = (
    'You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup '
    'headquartered in Paris.\n'
    f'Your knowledge base was last updated on 2023-10-01. The current date is {today}.\n\n'
```
The `today` variable is defined at the module level, which means it's only evaluated once when the module is imported. If the application runs for more than a day, the date in the system prompt will become stale. This can lead to incorrect behavior from the model. A similar issue was addressed for `Mistral2506Template` by dynamically generating the system prompt. Please apply a similar fix here for the `mistral_2501` and `mistral_2503` templates.

This would involve:

- Creating a `get_mistral_2501_system()` function that returns the system prompt with the current date.
- Creating a `Mistral2501Template(Template)` class with an `_swift_encode` method that calls `get_mistral_2501_system()` to set the system prompt if it's not provided.
- Updating the registration for `mistral_2501` to use this new template class with `default_system=None`.
- Changing `Mistral2503Template` to inherit from `Mistral2501Template`.
- Updating the registration for `mistral_2503` to set `default_system=None`.
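The core of the suggested fix can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function name follows the reviewer's suggestion, while the ms-swift `Template` base class and `_swift_encode` hook are omitted so the snippet stays self-contained.

```python
from datetime import datetime


def get_mistral_2501_system() -> str:
    # Evaluate the date at call time, so a long-running process
    # never serves a prompt with a stale "current date".
    today = datetime.now().strftime('%Y-%m-%d')
    return (
        'You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup '
        'headquartered in Paris.\n'
        f'Your knowledge base was last updated on 2023-10-01. The current date is {today}.\n\n'
    )
```

In the template class, `_swift_encode` would call `get_mistral_2501_system()` only when no system prompt was supplied, so each request picks up the date at encode time rather than at import time.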
```python
        model_kwargs: Dict[str, Any],
        load_model: bool = True,
        **kwargs):
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
```
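On the hardcoded-filename concern raised in the review, one way to make the tokenizer loading more maintainable is to search for known tokenizer filenames instead of pinning a single one. The helper below is a hedged sketch: the function name and the candidate filename patterns are assumptions for illustration, not the repository's actual code.

```python
import os
from glob import glob


def find_mistral_tokenizer_file(model_dir: str) -> str:
    """Return the path to a Mistral tokenizer file in model_dir.

    Candidate patterns are assumptions based on filenames commonly
    shipped with Mistral checkpoints (e.g. tekken.json); adjust the
    list to match the checkpoints actually supported.
    """
    for pattern in ('tekken.json', 'tokenizer.model.v*', 'tokenizer.model'):
        matches = glob(os.path.join(model_dir, pattern))
        if matches:
            # Prefer the first match; patterns are ordered by priority.
            return sorted(matches)[0]
    raise FileNotFoundError(f'No Mistral tokenizer file found in {model_dir}')
```

The resulting path could then be passed to `MistralTokenizer` instead of a hardcoded filename, keeping the loading function working across checkpoints that ship differently named tokenizer files.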