
[Tracking] Multimodality Support #679

Closed
4 tasks
Kathryn-cat opened this issue Aug 7, 2023 · 8 comments

@Kathryn-cat
Contributor

Overview

Currently, we have multimodality support for MiniGPT4, but we have not yet finalized a high-level Python API for it, and we have not announced its CLI and iOS support. We also need to look into more models, such as LLaVA.

Action Items

  • Test and tune the MiniGPT4 module to ensure fast performance
  • Bring in a high-level Python API that supports multimodal generation
  • Add LLaVA model support
  • Finalize the usage documentation and make an announcement

Links to Related Issues and PRs

@Kathryn-cat Kathryn-cat added the status: tracking Tracking work in progress label Aug 7, 2023
@Kathryn-cat Kathryn-cat self-assigned this Aug 7, 2023
@JianbangZ

@Kathryn-cat Any update on LLaVA progress?

@Kathryn-cat
Contributor Author

Hi @JianbangZ, sorry for the delay. We recently went through a major refactoring of the Python/C++ and iOS codebases, so we hope to officially introduce it in the next week or two.

Side question: would you prefer running LLaVA on MLC in the Gradio frontend (a web page for uploading images) or in a phone environment (an iPhone app that lets you take a picture and ask questions)?

@JianbangZ


I think a Gradio demonstration with full Vulkan support would be great.

@Kathryn-cat
Contributor Author

Got it! We're working on it now and will release it soon.

@dusty-nv

dusty-nv commented Sep 16, 2023

+1 for llava-llama-2-chat support! Any updates or timeline for this @Kathryn-cat?

I see you have a dev branch here: https://github.com/Kathryn-cat/mlc-llm/tree/pr-llava-support

It seems the approach is to include the CLIP encoder, projection, and embeddings inside MLC Chat. Yes, it's nice for it to run out of the box like that; however, for flexibility, it would also be useful to be able to embed your own tokens into the prompt. LLaVA is just a Llama model with the image patch tokens embedded in the prompt. For example, I could run CLIP with TensorRT (although if MLC is fast enough with it, I can just use that).

EDIT: Would --sep-embed and prefill_with_embed() from #419 (comment) be the correct mlc_chat API for that?
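
For anyone following along, here is a rough sketch of that "bring your own embeddings" flow. It is only illustrative: the CLIP checkpoint, the stand-in projector, and the final prefill step are assumptions, not the actual mlc_chat API; only --sep-embed and prefill_with_embed() come from #419.

```python
# Illustrative only: encode an image into LLaVA-style patch embeddings outside
# of MLC, so they could be handed to the runtime via something like
# prefill_with_embed(). The projector below is a stand-in for LLaVA's trained
# multimodal projector, not a real checkpoint.
import torch
from transformers import CLIPImageProcessor, CLIPVisionModel

vision = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
projector = torch.nn.Linear(1024, 4096)  # stand-in: CLIP hidden -> Llama hidden

def image_to_patch_embeddings(image):
    """Turn one PIL image into a (num_patches, llm_hidden) embedding tensor."""
    pixels = processor(images=image, return_tensors="pt").pixel_values
    with torch.no_grad():
        patches = vision(pixels).last_hidden_state[:, 1:]  # drop the CLS token
    return projector(patches).squeeze(0)

# These embeddings would then be spliced in place of the <image> placeholder,
# between the embeddings of the surrounding prompt tokens, before prefill.
```

Whether prefill_with_embed() accepts exactly this tensor layout is something the MLC folks would need to confirm.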

@acalatrava
Contributor

What's the status of this? I just saw the latest LLaVA version and it seems pretty cool! It would be great to have it in MLC-LLM.

@Smaran222

I want to use the 4-billion-parameter QwenVL model in a mobile app. Is there any update on multimodality support? I would love to be able to run it offline on a phone with MLC LLM. If not, would I have to use TVM directly?

@MasterJH5574
Collaborator

A late update: LLaVA has been supported since #1974, so I think we can conclude this issue for now.
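
For readers landing here later, a minimal usage sketch (mine, not from #1974): it assumes the model has been compiled for MLC, that `mlc_llm serve` is running on its default port, and that the OpenAI-compatible endpoint accepts OpenAI-style image_url content parts for vision models; the model id below is a placeholder, so check the PR and docs for the exact format.

```python
# Hypothetical usage sketch: query a LLaVA model through the OpenAI-compatible
# /v1/chat/completions endpoint exposed by `mlc_llm serve`. The model id and
# the image_url content part follow the OpenAI vision format and are
# assumptions; consult #1974 and the docs for what MLC LLM actually expects.
import requests

payload = {
    "model": "llava-1.5-7b-q4f16_1-MLC",  # placeholder model id
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ],
}
resp = requests.post("http://127.0.0.1:8000/v1/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```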
