How can I let the model receive multiple images at once #60

Open
bibibabibo26 opened this issue Jul 3, 2024 · 1 comment

Comments

@bibibabibo26

Can your model be fed with multiple images at once, such as different frames of a video? Or can it be modified so that the input to the language model is the tokens of multiple images at once?

@mmaaz60
Member

mmaaz60 commented Jul 8, 2024

Hi @bibibabibo26,

Thank you for your interest in our work. The current GLaMM model is designed to work with a single image only. However, it can be modified to accept multiple images. On the LLM side, this would be relatively simple, as we can treat the multiple images as video frames and concatenate them. On the grounding side, we may have to introduce special tokens to indicate whether a generated <seg> token refers to the first or the second image. Alternatively, we would need to design a segmentation encoder-decoder architecture that can work with multiple images.
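A rough sketch of the two ideas above, not code from the GLaMM repository. It assumes that each image has already been encoded into a sequence of tokens, that "concatenating" means joining those sequences in frame order, and that the special tokens take an indexed form such as `<seg_0>` and `<seg_1>`. The function names `concat_image_tokens` and `route_seg_tokens` and the `<seg_i>` naming are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical indexed segmentation token, e.g. "<seg_0>" for the first image.
SEG_PATTERN = re.compile(r"<seg_(\d+)>")


def concat_image_tokens(per_image_tokens):
    """Join per-image token sequences in frame order.

    Returns the flat sequence plus (start, end) spans, so that each
    image's tokens can still be located after concatenation.
    """
    flat, spans = [], []
    for tokens in per_image_tokens:
        start = len(flat)
        flat.extend(tokens)
        spans.append((start, len(flat)))
    return flat, spans


def route_seg_tokens(generated_text, num_images):
    """Return, for each indexed seg token in the output, the image it refers to."""
    image_indices = []
    for match in SEG_PATTERN.finditer(generated_text):
        idx = int(match.group(1))
        if idx >= num_images:
            raise ValueError(f"seg token refers to image {idx}, but only {num_images} given")
        image_indices.append(idx)
    return image_indices


# Two images with 2 and 3 placeholder tokens each.
flat, spans = concat_image_tokens([["a0", "a1"], ["b0", "b1", "b2"]])
routed = route_seg_tokens("A dog <seg_0> and a cat <seg_1>.", num_images=2)
```

Here `spans` records where each image's tokens sit in the concatenated input, and `routed` lists which image each segmentation token points to, so the grounding part could pick the matching image features before decoding a mask.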

Please do share if you have made any progress towards this interesting research direction. Good Luck!
