[Question] Possibility for Multi-image input? #197

Open
jsg921019 opened this issue May 26, 2023 · 5 comments

@jsg921019

Question

I really enjoyed reading this paper, and after playing with the model for a few days I came up with this question:
The LLaVA architecture is capable of taking more than one image as input. I tried giving two images as input, but the inference result was not good. That is probably because the model was not trained with multiple image inputs. Have you tried training LLaVA on a dataset that has more than one image per sample?
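
For reference, here is a minimal sketch of what I mean by the architecture admitting several images: each image is encoded on its own, projected into the LLM embedding space, and the resulting token blocks are concatenated so the LLM attends over all images at once. This is only an illustration under those assumptions; the class and attribute names below are placeholders, not the actual modules from this repo.

```python
# Illustrative sketch only, not the actual LLaVA implementation.
import torch
import torch.nn as nn


class MultiImagePrefix(nn.Module):
    def __init__(self, vision_tower: nn.Module, vision_dim: int, llm_dim: int):
        super().__init__()
        self.vision_tower = vision_tower                    # e.g. a frozen ViT returning patch features
        self.mm_projector = nn.Linear(vision_dim, llm_dim)  # vision features -> LLM embedding space

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (num_images, 3, H, W)
        feats = self.vision_tower(images)                   # (num_images, num_patches, vision_dim)
        tokens = self.mm_projector(feats)                   # (num_images, num_patches, llm_dim)
        # Flatten per-image blocks into one visual prefix of num_images * num_patches tokens,
        # which is prepended/interleaved with the text embeddings of the prompt.
        return tokens.reshape(1, -1, tokens.shape[-1])
```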

@haotian-liu
Owner

Hi @jsg921019

Thank you for your interest in our work. Due to the current training recipe, we do not observe the model having a very good capability for referring to / comparing multiple images. We are working on improving this aspect as well, so stay tuned! You can also check out some of the discussions here.

@codybum commented May 29, 2023

This is of great interest to our group as well. We work with pathology images, and due to image size it can take more than one image to describe a region of interest. In this case we don't need to compare images, but rather to let several images represent one thing. This would be similar in concept to multiple-instance learning (MIL) modeling (https://github.com/Project-MONAI/tutorials/tree/main/pathology/multiple_instance_learning).
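
To make the idea concrete, here is a rough sketch of attention-based MIL pooling (in the spirit of Ilse et al., 2018) over per-tile embeddings, where several image tiles jointly represent one region of interest. It is illustrative only and is not the code from the linked MONAI tutorial.

```python
# Rough sketch of attention-based MIL pooling over a "bag" of tile embeddings.
import torch
import torch.nn as nn


class AttentionMILPool(nn.Module):
    def __init__(self, feat_dim: int, attn_dim: int = 128):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, attn_dim),
            nn.Tanh(),
            nn.Linear(attn_dim, 1),
        )

    def forward(self, tile_feats: torch.Tensor) -> torch.Tensor:
        # tile_feats: (num_tiles, feat_dim) -- one embedding per image tile in the bag
        weights = torch.softmax(self.attn(tile_feats), dim=0)  # (num_tiles, 1), sums to 1 over tiles
        return (weights * tile_feats).sum(dim=0)               # (feat_dim,) single bag-level embedding
```

Something like this could sit between the vision encoder and the projector, so that many tiles collapse into one representation before reaching the language model.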

@haotian-liu
Owner

@codybum Thanks for explaining this interesting direction! I am curious whether there is any plan in your group to work in this direction? We would be happy to integrate that into LLaVA :)

@sskorol commented Oct 19, 2023

I am also looking forward to seeing this feature soon. For instance, GPT-4V can take several images and find relationships between objects across different images. It's pretty cool and has a variety of use cases.

@unnikrishnanrnair

@haotian-liu In my understanding, GPT-4V slices higher-resolution images into 512x512 tiles plus one context image, and then tokenizes and collates those tokens. Have you tried something like this with the latest LLaVA model by any chance? Is it something worth trying? My use case is similar to @codybum's, where I need to pass in higher-resolution images.
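
For what it's worth, here is a small sketch of the tiling scheme I have in mind: cut the high-resolution image into 512x512 crops and keep one downscaled overview as the context image. The exact preprocessing GPT-4V uses is not public, so this is just the general idea.

```python
# Sketch of tiling a high-resolution image into 512x512 crops plus a context image.
from PIL import Image

TILE = 512


def tile_image(path: str):
    img = Image.open(path).convert("RGB")
    w, h = img.size
    tiles = []
    for top in range(0, h, TILE):
        for left in range(0, w, TILE):
            # Edge tiles may be smaller than 512x512; they could also be padded if needed.
            tiles.append(img.crop((left, top, min(left + TILE, w), min(top + TILE, h))))
    # One low-resolution overview so the model keeps global context alongside the crops.
    context = img.resize((TILE, TILE))
    return tiles, context
```

Each crop (plus the context image) would then be encoded and its tokens collated into one sequence, similar to the multi-image case discussed above.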
