
Widget for text to image generation #113

Closed
osanseviero opened this issue Jun 16, 2021 · 22 comments
Labels
frontend good first issue Good for newcomers widgets About our Inference widgets

Comments

@osanseviero
Member

The input would be text.
The output would be the generated image.
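
The requested contract could be sketched as follows (the endpoint URL and payload shape here are assumptions for illustration, not the actual widget implementation):

```python
import json

# Hypothetical endpoint; real model IDs would go in place of the placeholder.
API_URL = "https://api-inference.huggingface.co/models/<text-to-image-model>"

def build_request(prompt):
    """Serialize a text prompt into the JSON payload the widget would send."""
    return json.dumps({"inputs": prompt}).encode("utf-8")

def parse_response(body, content_type):
    """The API would be expected to return raw image bytes (e.g. image/png)."""
    if not content_type.startswith("image/"):
        raise ValueError(f"expected an image response, got {content_type}")
    return body
```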

osanseviero added the good first issue (Good for newcomers), widgets (About our Inference widgets), and frontend labels on Jun 16, 2021
@mishig25
Contributor

More than happy to take this one.
@osanseviero @julien-c please let me know if there's anything I need to improve in terms of the process I have taken for huggingface/hub-docs#14 and huggingface/hub-docs#15
Otherwise, I'll follow a similar approach and implement the other widgets.

@osanseviero
Member Author

You're going full speed! ⚡

The approach has been great so far, I think you can go ahead with the widget.

@mishig25
Contributor

Awesome! 🤗

@mishig25
Contributor

Created draft PR #131

osanseviero linked a pull request on Jun 21, 2021 that will close this issue
@osanseviero
Member Author

Some example motivation for this :)
https://discuss.huggingface.co/t/generate-gif-reply-to-english-text-with-vqgan-clip/7166

@patil-suraj

> Some example motivation for this :)
> https://discuss.huggingface.co/t/generate-gif-reply-to-english-text-with-vqgan-clip/7166

And this as well huggingface/transformers#12281 😉

@borisdayma
Contributor

Hi, I'm interested in this widget.
What's the progress?

@osanseviero
Member Author

There is an existing PR – #131 – but at the moment it is not integrated into the Inference API (there is no transformers pipeline for this, nor one in the other libraries). If you want to build a demo, consider using Spaces for the time being.

@osanseviero
Member Author

Btw, if you have model repos with a text-to-image generation model, I would be interested in seeing them, along with an example of how to run inference.

@borisdayma
Contributor

Would it make sense to allow custom pipelines?
We're working on a mini DALL-E repo, so we'll need something completely custom.

@osanseviero
Member Author

Yes, having a generic Docker image to support custom inference is on our roadmap. The API will enforce a specific input/output contract, but beyond that it will be up to users to provide their own implementations, including any custom dependencies.
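
A sketch of what such a user-supplied entry point might look like (the class name and method signature are assumptions for illustration, not the actual spec):

```python
# Hypothetical handler interface for a generic custom-inference image.
# The API would fix the contract (JSON in, image bytes out); everything
# in between is up to the model author.
class CustomHandler:
    def __init__(self, model_dir):
        # Load weights, tokenizers, custom dependencies, etc.
        self.model_dir = model_dir

    def __call__(self, inputs):
        prompt = inputs["inputs"]
        # A real implementation would run the model here; this stub just
        # returns bytes beginning with a PNG signature to show the shape.
        return b"\x89PNG" + prompt.encode("utf-8")
```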

FYI @Narsil @julien-c

@julien-c
Member

They can already submit a PR against https://github.com/huggingface/huggingface_hub/tree/main/api-inference-community though no?

@osanseviero
Member Author

@julien-c you mean in the form of a new docker image just for this use case?

@julien-c
Member

Yes – since the boundary between a single model and a "library" is rather slim, as we've seen in other cases, wouldn't mini-dall-e be a good image name in api-inference-community?
The image could also support different checkpoints down the line. WDYT?

@mishig25
Contributor

Should we change the API output from a single image to an array of images?

That way, we can show multiple generated images from a single text prompt, as is currently done in most papers, demos, and tweets (see the attached image from @borisdayma's tweet).

[Screenshot 2021-07-19 at 15:25:55]

If the output changes to an array of images, I imagine the widget would involve some kind of horizontal scrolling or switching between the different images.
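
One possible shape for an array-of-images response – a JSON array of base64-encoded images – could be sketched like this (the encoding choice is an assumption, not a decided format):

```python
import base64
import json

def encode_images(images):
    """Serialize a list of raw image byte strings as a JSON array of base64 strings."""
    return json.dumps([base64.b64encode(img).decode("ascii") for img in images])

def decode_images(body):
    """Inverse: recover the raw image bytes the widget would page through."""
    return [base64.b64decode(item) for item in json.loads(body)]
```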

@osanseviero
Member Author

> Yes – since the boundary between a single model and a "library" is rather slim, as we've seen in other cases, wouldn't mini-dall-e be a good image name in api-inference-community?

If this is a short-term solution, let's do it! @borisdayma, feel free to open a PR about this.

For one-off cases, instead of having to maintain a very large number of images, I would prefer to go down the generic-image path so we don't end up with hundreds of images. I think our future selves will thank us for going down that road. I can work on adding the generic Docker image in the first week of August (two weeks from now).

> Should we change the API output from a single image to an array of images?
>
> That way, we can show multiple generated images from a single text prompt, as is currently done in most papers, demos, and tweets (see the attached image from @borisdayma's tweet).

I'll ask @Narsil to share his opinion on this; from an endpoint point of view it makes more sense to return only one, imo. Alternatively, users could specify the number of generated images as a parameter, though it may be simpler to just make multiple calls to the API.

@borisdayma
Contributor

I'm not sure whether the best fit here is the Inference API or HF Spaces (which probably has more flexibility for these cases).

@julien-c
Member

> I'll ask @Narsil to share his opinion on this; from an endpoint point of view it makes more sense to return only one, imo

Personally, I'd like widgets to be a 1:1 representation of the underlying API call, so if we want to display multiple generated images in one click, I'd be strongly in favor of doing it at the API level.

(for text generation, we do have an API param to generate multiple continuations, no?)

@borisdayma
Contributor

It depends on what you want to do exactly.
For example, for our demo we generate a bunch of images and then display only the best ones (8 out of 64 or 128).
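
That generate-many, keep-the-best step could be sketched as a simple top-k selection (the scores here stand in for whatever ranking the demo uses, e.g. a CLIP-style similarity; this is an illustration, not the demo's actual code):

```python
def select_best(images, scores, k=8):
    """Keep the k highest-scoring images from a larger generated batch."""
    # Rank image indices by descending score, then take the top k.
    ranked = sorted(range(len(images)), key=lambda i: scores[i], reverse=True)
    return [images[i] for i in ranked[:k]]
```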

@julien-c
Member

Yep, in that case Spaces are more flexible (whereas widgets are meant to be the canonical 1:1 representation of a model's interface).

@mishig25
Contributor

mishig25 commented Jul 20, 2021

(can confirm: for text generation, the `num_return_sequences` param exists for multiple continuations)
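
For reference, a text-generation payload using that parameter would look roughly like this (exact accepted parameters depend on the API version):

```python
def build_generation_request(prompt, n):
    """Payload requesting n continuations via num_return_sequences."""
    return {"inputs": prompt, "parameters": {"num_return_sequences": n}}
```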

@osanseviero
Member Author

Closing this issue since there's now a widget!


5 participants