[model] support qwen3_asr #78
Code Review
This pull request introduces support for the Qwen3-ASR model, including its bridge implementation, audio feature handling, and registration. It also refactors multimodal type detection by replacing the boolean support_multimodal flag with a more specific test_mm_type attribute (e.g., 'image', 'audio', 'text') across the model configuration and various model classes. A new test case for qwen3_asr has been added to the test suite. The review feedback suggests making the audio feature handling in qwen3_asr.py more robust by deriving the dummy-input batch size from input_ids instead of hardcoding it.
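As a rough illustration of the two points in this summary, the sketch below shows how a string-valued test_mm_type could replace a boolean flag, and how dummy audio inputs could take their batch size from input_ids. Everything except the names test_mm_type and input_ids is hypothetical: ModelMeta, build_dummy_inputs, the input_features and pixel_values keys, and the feature width are stand-ins, not the PR's actual code.

```python
from dataclasses import dataclass

# Assumed set of values, taken from the examples named in the review summary.
VALID_MM_TYPES = ("text", "image", "audio")


@dataclass
class ModelMeta:
    """Hypothetical stand-in for a model configuration entry."""
    name: str
    test_mm_type: str = "text"  # replaces a boolean such as support_multimodal

    def __post_init__(self):
        if self.test_mm_type not in VALID_MM_TYPES:
            raise ValueError(f"unknown test_mm_type: {self.test_mm_type!r}")


def build_dummy_inputs(meta, input_ids, feature_dim=4):
    """Build dummy test inputs whose batch size follows input_ids."""
    batch_size = len(input_ids)  # dynamic, not a hardcoded constant
    inputs = {"input_ids": input_ids}
    if meta.test_mm_type == "audio":
        # One dummy feature row per sample in the batch.
        inputs["input_features"] = [[0.0] * feature_dim for _ in range(batch_size)]
    elif meta.test_mm_type == "image":
        inputs["pixel_values"] = [[0.0] * feature_dim for _ in range(batch_size)]
    return inputs


meta = ModelMeta(name="qwen3_asr", test_mm_type="audio")
batch = build_dummy_inputs(meta, input_ids=[[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(len(batch["input_features"]))
```

With a string attribute, the test harness can branch on the kind of input a model expects rather than only on whether it is multimodal at all.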
/gemini review
Code Review
This pull request introduces support for the qwen3_asr model, including the implementation of the Qwen3ASRBridge and Qwen3ASRVit classes. It refactors the multimodal configuration by replacing the support_multimodal boolean with a test_mm_type attribute to better distinguish between input types like 'image', 'audio', and 'text'. Additionally, it includes a fix in gpt_model.py to expand position_ids when using 'mrope' embeddings. A review comment correctly identified a potential AttributeError in the new qwen3_asr.py file where a module's .dtype was accessed, suggesting the use of self.config.params_dtype instead.
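The position_ids expansion mentioned above can be pictured with a minimal sketch. It uses nested Python lists in place of tensors and assumes that 'mrope' expects one copy of the position ids per rotary axis, with three axes by default. The function name, the num_axes parameter, and the list representation are all hypothetical and are not taken from gpt_model.py.

```python
def expand_position_ids(position_ids, position_embedding_type, num_axes=3):
    """Replicate [batch][seq] position ids to [num_axes][batch][seq] for mrope.

    For any other embedding type the input is returned unchanged.
    num_axes=3 is an assumption for illustration.
    """
    if position_embedding_type != "mrope":
        return position_ids
    # Copy each row so the axes do not share the same underlying lists.
    return [[list(row) for row in position_ids] for _ in range(num_axes)]


ids = [[0, 1, 2, 3], [0, 1, 2, 3]]  # batch of 2, sequence length 4
expanded = expand_position_ids(ids, "mrope")
print(len(expanded), len(expanded[0]), len(expanded[0][0]))
```

For text-only or audio-only inputs every axis carries the same positions, which is why a plain replication is enough in this sketch.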
No description provided.