[Tracking] Multimodality Support #679
Comments
@Kathryn-cat Any update on LLaVA progress?
Hi @JianbangZ, sorry for the delay. We recently went through a major refactoring of the Python/C++ and iOS codebases, so hopefully we can officially introduce it in the next week or two. Side question: would you prefer running LLaVA on MLC through a Gradio frontend (a webpage for uploading images) or in a phone environment (an iPhone app that lets you take a picture and ask questions)?
I think a Gradio demonstration with full Vulkan would be great |
Got it! We're working on it now and will release it soon.
+1 for llava-llama-2-chat support! Any updates or a timeline for this, @Kathryn-cat? I see you have a dev branch here: https://github.com/Kathryn-cat/mlc-llm/tree/pr-llava-support

It seems the approach is to include the CLIP encoder, projection, and embeddings inside MLC chat. It's nice for it to run out of the box like that; however, for flexibility it would also be useful to be able to embed your own tokens into the prompt. LLaVA is just a Llama model that has the image patch tokens embedded in the prompt. For example, I could run CLIP with TensorRT (although if MLC is fast enough with it, I can just use that).

EDIT: Would
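To illustrate the point above: the LLaVA recipe amounts to embedding the text tokens as usual, projecting the CLIP patch features into the language model's hidden space, and splicing them in where the image placeholder sits. The sketch below is a minimal, hypothetical NumPy illustration of that splicing; the function name, toy dimensions, and the assumption that CLIP features are precomputed are all mine, not MLC-LLM's API.

```python
import numpy as np

def build_llava_embeddings(pre_ids, post_ids, image_features,
                           embed_table, projector):
    """Splice projected image-patch embeddings into the prompt.

    pre_ids / post_ids: token ids before/after the image placeholder.
    image_features: (num_patches, clip_dim) CLIP output, assumed
        precomputed (e.g. by an external runtime such as TensorRT).
    embed_table: (vocab_size, hidden) LLM token-embedding matrix.
    projector: (clip_dim, hidden) linear projection into LLM space.
    """
    pre = embed_table[pre_ids]          # (len(pre_ids), hidden)
    img = image_features @ projector    # (num_patches, hidden)
    post = embed_table[post_ids]        # (len(post_ids), hidden)
    # The LLM then consumes this sequence as its input embeddings.
    return np.concatenate([pre, img, post], axis=0)

# Toy dimensions for illustration only.
rng = np.random.default_rng(0)
embed_table = rng.standard_normal((100, 16))   # vocab=100, hidden=16
projector = rng.standard_normal((8, 16))       # clip_dim=8 -> hidden=16
patches = rng.standard_normal((4, 8))          # 4 image patches
emb = build_llava_embeddings([1, 2], [3], patches, embed_table, projector)
print(emb.shape)  # (2 + 4 + 1, 16) -> (7, 16)
```

This is why exposing an "embed your own tokens" hook would be enough for external CLIP backends: only the concatenation step needs to live inside the chat runtime.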
What's the status of this? I just saw the latest LLaVA version and it seems pretty cool! It would be great to have it in MLC-LLM |
I want to use the QwenVL model with 4 billion parameters in a mobile app. Is there any update on multimodality support? I would love to be able to run it offline on a phone with MLC LLM. If not, would I have to use TVM directly to do that?
A late update: LLaVA has been supported since #1974, so I think we can conclude this issue for now.
Overview
Currently, we have multimodality support for MiniGPT4, but we have not yet concretized a high-level Python API for it, and we have not announced its CLI and iOS support. We also need to look into more modalities, such as LLaVA.
Action Items
Links to Related Issues and PRs