[API][Misc] Add new model capabilities and update its determination logic#294
Conversation
…termination logic
Summary of ChangesHello @beiguo218, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the model capability system by introducing a set of standardized, industry-aligned names for various model functionalities. It also overhauls the internal logic responsible for identifying these capabilities, ensuring that models are correctly categorized under the new scheme. While maintaining backward compatibility for existing deployments, this change aims to streamline future development and improve the clarity of model descriptions. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces new standardized model capabilities, aligning with industry standards. The capability determination logic has been refactored for correctness and clarity, removing unreachable code. The unit tests have been updated to reflect these changes. The changes are a good improvement. I have one suggestion to further improve the readability and efficiency of the capability determination logic.
| architecture := hfModel.GetArchitecture() | ||
| modelType := hfModel.GetModelType() | ||
|
|
||
| // Add vision capability if the model supports it | ||
| // For vision, only support image text capability right now | ||
| if hfModel.HasVision() { | ||
| capabilities = append(capabilities, string(v1beta1.ModelCapabilityVision)) | ||
| } else if strings.Contains(strings.ToLower(architecture), "embedding") || | ||
| return append(capabilities, string(v1beta1.ModelCapabilityImageTextToText)) | ||
| } | ||
|
|
||
| // Check for text embedding capability | ||
| if strings.Contains(strings.ToLower(architecture), "embedding") || | ||
| strings.Contains(strings.ToLower(architecture), "sentence") || | ||
| strings.Contains(strings.ToLower(modelType), "bert") || | ||
| // Special case for known embedding models | ||
| (strings.Contains(strings.ToLower(modelType), "mistral") && | ||
| strings.Contains(strings.ToLower(architecture), "mistralmodel")) { | ||
| capabilities = append(capabilities, string(v1beta1.ModelCapabilityTextEmbeddings)) | ||
| return capabilities | ||
| } else if !slices.Contains(capabilities, string(v1beta1.ModelCapabilityChat)) { | ||
| capabilities = append(capabilities, string(v1beta1.ModelCapabilityChat)) | ||
| } else { | ||
| capabilities = append(capabilities, string(v1beta1.ModelCapabilityTextGeneration)) | ||
| return append(capabilities, string(v1beta1.ModelCapabilityEmbedding)) | ||
| } | ||
|
|
||
| return capabilities | ||
| // Default to text-to-text capability | ||
| return append(capabilities, string(v1beta1.ModelCapabilityTextToText)) |
There was a problem hiding this comment.
To improve readability and avoid repeated calls to strings.ToLower, you can convert architecture and modelType to lowercase once at the beginning of the function.
| architecture := hfModel.GetArchitecture() | |
| modelType := hfModel.GetModelType() | |
| // Add vision capability if the model supports it | |
| // For vision, only support image text capability right now | |
| if hfModel.HasVision() { | |
| capabilities = append(capabilities, string(v1beta1.ModelCapabilityVision)) | |
| } else if strings.Contains(strings.ToLower(architecture), "embedding") || | |
| return append(capabilities, string(v1beta1.ModelCapabilityImageTextToText)) | |
| } | |
| // Check for text embedding capability | |
| if strings.Contains(strings.ToLower(architecture), "embedding") || | |
| strings.Contains(strings.ToLower(architecture), "sentence") || | |
| strings.Contains(strings.ToLower(modelType), "bert") || | |
| // Special case for known embedding models | |
| (strings.Contains(strings.ToLower(modelType), "mistral") && | |
| strings.Contains(strings.ToLower(architecture), "mistralmodel")) { | |
| capabilities = append(capabilities, string(v1beta1.ModelCapabilityTextEmbeddings)) | |
| return capabilities | |
| } else if !slices.Contains(capabilities, string(v1beta1.ModelCapabilityChat)) { | |
| capabilities = append(capabilities, string(v1beta1.ModelCapabilityChat)) | |
| } else { | |
| capabilities = append(capabilities, string(v1beta1.ModelCapabilityTextGeneration)) | |
| return append(capabilities, string(v1beta1.ModelCapabilityEmbedding)) | |
| } | |
| return capabilities | |
| // Default to text-to-text capability | |
| return append(capabilities, string(v1beta1.ModelCapabilityTextToText)) | |
| architecture := strings.ToLower(hfModel.GetArchitecture()) | |
| modelType := strings.ToLower(hfModel.GetModelType()) | |
| // For vision, only support image text capability right now | |
| if hfModel.HasVision() { | |
| return append(capabilities, string(v1beta1.ModelCapabilityImageTextToText)) | |
| } | |
| // Check for text embedding capability | |
| if strings.Contains(architecture, "embedding") || | |
| strings.Contains(architecture, "sentence") || | |
| strings.Contains(modelType, "bert") || | |
| // Special case for known embedding models | |
| (strings.Contains(modelType, "mistral") && | |
| strings.Contains(architecture, "mistralmodel")) { | |
| return append(capabilities, string(v1beta1.ModelCapabilityEmbedding)) | |
| } | |
| // Default to text-to-text capability | |
| return append(capabilities, string(v1beta1.ModelCapabilityTextToText)) |
What type of PR is this?
/kind feature
What this PR does / why we need it:
The legacy capabilities (TEXT_GENERATION, TEXT_SUMMARIZATION, TEXT_EMBEDDINGS, TEXT_RERANK, CHAT, VISION) are preserved for backward compatibility but marked for future deprecation.
Does this PR introduce a user-facing change?
Yes
Model capabilities are now reported using standardized names aligned with HuggingFace conventions. Models will now return capabilities such as
TEXT_TO_TEXT(instead ofCHAT),EMBEDDING(instead ofTEXT_EMBEDDINGS), andIMAGE_TEXT_TO_TEXT(instead ofVISION). The old capability names remain supported for backward compatibility but are considered legacy, and users can still pass old capability names in the model spec.