
How to use ImageBind to generate image or audio? #42

Open
NateDong72 opened this issue May 12, 2023 · 10 comments

Comments

@NateDong72

I can run the example code, but how do I run the model to generate images and audio?

@SoftologyPro

Agreed. How can you guys spend all that time training the model and writing the paper and setting up the demo website and not spend a few hours giving working example scripts to show us how to use it?

@echo-lalia

I don't think the model can actually generate those things; I think it just 'translates' the information from one form to another. I think it'll have to be built into an extension for SD-WebUI or something, in order to let us play with it more easily.

@WilTay1

WilTay1 commented May 13, 2023

> I don't think the model can actually generate those things; I think it just 'translates' the information from one form to another. I think it'll have to be built into an extension for SD-WebUI or something, in order to let us play with it more easily.

But the model can be downloaded and loaded in the script.

@bakachan19

bakachan19 commented May 15, 2023

I am also interested in this. Any news?
Also, how can you retrieve an image based on an image plus audio/text? I am referring to the embedding space arithmetic examples in Figure 4 of the paper.
Do you just sum the image embedding and the audio/text embedding, compute the cosine similarity against all the image embeddings, and take the most similar image?
Thanks!
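For reference, here is a minimal sketch of the procedure the question describes, assuming the ImageBind embeddings have already been computed and exported as plain lists of floats. Whether Figure 4 of the paper uses exactly this recipe (normalize each embedding, sum them, then rank the gallery by cosine similarity) is an assumption, not something confirmed in this thread; the function names and toy vectors are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def l2_normalize(v: Sequence[float]) -> list[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def compose_query(image_emb: Sequence[float], other_emb: Sequence[float]) -> list[float]:
    """Assumed embedding arithmetic: normalize each modality, then add them."""
    a, b = l2_normalize(image_emb), l2_normalize(other_emb)
    return [x + y for x, y in zip(a, b)]


def retrieve(query: Sequence[float], gallery: list[Sequence[float]]) -> int:
    """Return the index of the gallery embedding most similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    return max(range(len(scores)), key=lambda i: scores[i])


# Toy 3-d vectors standing in for real ImageBind embeddings (hypothetical data).
image_emb = [1.0, 0.0, 0.0]
audio_emb = [0.0, 1.0, 0.0]
gallery = [
    [1.0, 0.0, 0.0],  # matches the image only
    [0.0, 1.0, 0.0],  # matches the audio only
    [1.0, 1.0, 0.0],  # mixes both
    [0.0, 0.0, 1.0],  # unrelated
]

best = retrieve(compose_query(image_emb, audio_emb), gallery)
print(best)  # → 2
```

Normalizing before summing keeps one modality from dominating just because its embedding has a larger norm. With real data you would probably also drop the query image itself from the gallery so it cannot be returned as its own best match.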

@ikuinen

ikuinen commented May 16, 2023

> I am also interested in this. Any news? Also, how can you retrieve an image based on image and audio/text? I am referring to the embedding space arithmetic examples in Figure 4 in the paper. Do you just sum the image embeddings with the audio/text embedding and perform cosine similarity with all the image embeddings and get the most similar image? Thanks!

We made a quick attempt: https://github.com/sail-sg/BindDiffusion

@Zeqiang-Lai

Zeqiang-Lai commented May 16, 2023

See also Anything2Image and InternGPT. Anything2Image is implemented with Diffusers.

@SoftologyPro

> See also Anything2Image, it is implemented with Diffusers.

This works well, with a nice Gradio GUI.

@ChloeL19

I'm rather new to diffusion, but does ImageBind provide any sort of decoder? I thought it only trains an encoder; if that's the case, how do these diffusion methods work?

@Zeqiang-Lai

> I'm rather new to diffusion, but does Imagebind provide any sort of decoder? I thought it was just training an encoder, and if that's the case how are these diffusion methods working?

Maybe this could help Zeqiang-Lai/Anything2Image#4

@celster

celster commented Jul 4, 2023

> > I'm rather new to diffusion, but does Imagebind provide any sort of decoder? I thought it was just training an encoder, and if that's the case how are these diffusion methods working?
>
> Maybe this could help Zeqiang-Lai/Anything2Image#4

This is great!!
I'm also looking for "Image+Text --> Image". For example, take a photo and ask the model to apply some edit to the person in the photo (e.g., makeup).


9 participants