[FEATURE REQUEST] Image2SFX #209

Merzmensch · 2024-01-31T11:37:23Z

Would it be possible to implement Image2SFX (https://huggingface.co/spaces/fffiloni/Image2SFX-comparison)? Especially with the possibility of comparing different models, and perhaps even a multiple-choice UI where you can select the models you would like to use.

Thank you!

d8ahazard · 2024-04-04T23:37:08Z

So, basically, all this does is use some Kosmos API to caption the image, and then feed that caption to one of the AudioGen models.

As such, this feels like a great opportunity for an extension that leverages one or more LLMs to create the caption, similar to my smartprocess extension for Auto1111.

Load the image, pick an LLM to do the captioning, feed the caption into one of the MusicGen models...
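
A rough sketch of that flow, in case it helps whoever picks this up. Everything named here is hypothetical: `image_to_sfx`, the `Captioner` and `AudioGenerator` type aliases, and the stand-in backends are placeholders rather than anything from this project or from the Image2SFX space. A real extension would swap the stand-ins for an actual captioning model and actual audio models. The point is only the shape: caption once, then fan the caption out to whichever audio models were selected, so the results can be compared.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical interfaces, assumed for illustration only.
Captioner = Callable[[str], str]               # image path -> caption text
AudioGenerator = Callable[[str], List[float]]  # caption -> audio samples


def image_to_sfx(
    image_path: str,
    captioner: Captioner,
    generators: Dict[str, AudioGenerator],
    selected: Iterable[str],
) -> Dict[str, List[float]]:
    """Caption the image once, then run every selected audio model on it."""
    selected = list(selected)
    unknown = [name for name in selected if name not in generators]
    if unknown:
        raise KeyError(f"unknown audio model(s): {unknown}")
    caption = captioner(image_path)
    # One entry per selected model, so outputs can be compared side by side.
    return {name: generators[name](caption) for name in selected}


# Stand-in backends so the sketch runs without any model downloads.
def fake_captioner(image_path: str) -> str:
    return f"a photo stored at {image_path}"


def fake_audio_model(scale: float) -> AudioGenerator:
    def generate(prompt: str) -> List[float]:
        # Placeholder "audio": one sample per word in the prompt.
        return [scale * len(word) for word in prompt.split()]
    return generate


GENERATORS: Dict[str, AudioGenerator] = {
    "model_a": fake_audio_model(0.1),
    "model_b": fake_audio_model(0.2),
}

if __name__ == "__main__":
    results = image_to_sfx(
        "beach.png", fake_captioner, GENERATORS, ["model_a", "model_b"]
    )
    for name, samples in results.items():
        print(name, len(samples))
```

Keeping the captioner and the generators as plain callables means the model picker in the UI only has to pass a list of names, and adding another model is one more dictionary entry.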

Merzmensch added the enhancement New feature or request label Jan 31, 2024

