[Feature Request] Multimodality support (LLAVA) #678
Comments
Hey @JianbangZ, thanks for bringing this up! We're bringing in LLaVA support this coming week after some major announcements are finished. We'll also bring a simple Python API that handles multimodality.
Woah, why specifically LLaVA? Any plans for InstructBLIP or other options? Either way, VERY cool. @Kathryn-cat
I think mainly due to its simplicity: a straight matmul projector instead of cross-attention, etc.
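(For context on what "straight matmul projector" means here: below is a minimal sketch, assuming the original LLaVA design, which uses a single learned linear layer to map CLIP patch features into the LLM's embedding space rather than a cross-attention module. Dimensions are illustrative, not MLC's actual code; LLaVA-1.5 later replaced the single layer with a small MLP.)

```python
import torch
import torch.nn as nn

class LlavaProjector(nn.Module):
    """Sketch of the original LLaVA projector: one learned matmul that maps
    CLIP patch features into the LLM's token-embedding space."""
    def __init__(self, clip_dim=1024, llm_dim=4096):  # illustrative dims
        super().__init__()
        self.proj = nn.Linear(clip_dim, llm_dim)

    def forward(self, clip_features):
        # clip_features: (batch, num_patches, clip_dim)
        # returns visual "tokens" of shape (batch, num_patches, llm_dim)
        return self.proj(clip_features)

# e.g. 576 patch features, as from CLIP ViT-L/14 at 336x336 resolution
visual_tokens = LlavaProjector()(torch.randn(1, 576, 1024))
```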
@Kathryn-cat Great news! Will the entire pipeline be able to run on Vulkan or CUDA? Particularly the CLIP visual encoder part, as the LLM already has great Vulkan/CUDA support.
Any updates on adding LLaVA to MLC?
Really looking forward to this. Is it still happening?
Hello folks, LLaVA has been supported in #1974 and other follow-up PRs recently (many thanks to @anibohara2000!!). You are more than welcome to try it out, and to open new issues if there are errors or questions regarding the LLaVA support.
🚀 Feature
Multimodality model support (LLAVA)
There has been more and more community interest in multimodal models, such as LLaVA. LLaVA itself has a quite simple architecture: CLIP + projector + LLM.
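To make the three stages concrete, here is a minimal, self-contained sketch of that pipeline; the vision encoder and LLM are stand-in modules and all dimensions are illustrative assumptions, not the actual MLC or LLaVA implementation.

```python
import torch
import torch.nn as nn

class TinyLlavaPipeline(nn.Module):
    """Toy sketch of the CLIP + projector + LLM flow."""
    def __init__(self, clip_dim=1024, llm_dim=4096, vocab=32000):
        super().__init__()
        self.vision_encoder = nn.Identity()             # stand-in for CLIP: image -> patch features
        self.projector = nn.Linear(clip_dim, llm_dim)   # the "straight matmul" stage
        self.embed = nn.Embedding(vocab, llm_dim)       # LLM token embeddings
        self.llm = nn.Identity()                        # stand-in for the decoder-only LLM

    def forward(self, image_features, token_ids):
        # 1. CLIP: image -> (batch, num_patches, clip_dim) patch features
        vis = self.vision_encoder(image_features)
        # 2. Projector: map visual features into the LLM embedding space
        vis = self.projector(vis)
        # 3. LLM: splice the visual "tokens" in front of the text embeddings
        txt = self.embed(token_ids)
        return self.llm(torch.cat([vis, txt], dim=1))

# Usage: one image (576 CLIP patches) plus an 8-token prompt.
out = TinyLlavaPipeline()(torch.randn(1, 576, 1024),
                          torch.randint(0, 32000, (1, 8)))
print(out.shape)  # torch.Size([1, 584, 4096])
```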