
Widget for text to image generation #113

Closed
osanseviero opened this issue Jun 16, 2021 · 22 comments
Labels
frontend good first issue Good for newcomers widgets About our Inference widgets

Comments

@osanseviero
Member

The input would be text.
The output would be the generated image.
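
The requested contract could be sketched as follows (the endpoint URL and payload shape here are assumptions for illustration, not the actual widget implementation):

```python
import json

# Hypothetical endpoint; real model IDs would go in place of the placeholder.
API_URL = "https://api-inference.huggingface.co/models/<text-to-image-model>"

def build_request(prompt):
    """Serialize a text prompt into the JSON payload the widget would send."""
    return json.dumps({"inputs": prompt}).encode("utf-8")

def parse_response(body, content_type):
    """The API would be expected to return raw image bytes (e.g. image/png)."""
    if not content_type.startswith("image/"):
        raise ValueError(f"expected an image response, got {content_type}")
    return body
```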

osanseviero added the good first issue (Good for newcomers), widgets (About our Inference widgets), and frontend labels on Jun 16, 2021
@mishig25
Contributor

More than happy to take this one.
@osanseviero @julien-c please let me know if there's anything I need to improve in terms of the process I have taken for huggingface/hub-docs#14 and huggingface/hub-docs#15
Otherwise, I'll follow a similar approach and implement the other widgets.

@osanseviero
Member Author

You're going full speed! ⚡

The approach has been great so far, I think you can go ahead with the widget.

@mishig25
Contributor

Awesome! 🤗

@mishig25
Contributor

Created draft PR #131

osanseviero linked a pull request on Jun 21, 2021 that will close this issue
@osanseviero
Member Author

Some example motivation for this :)
https://discuss.huggingface.co/t/generate-gif-reply-to-english-text-with-vqgan-clip/7166

@patil-suraj

> Some example motivation for this :)
> https://discuss.huggingface.co/t/generate-gif-reply-to-english-text-with-vqgan-clip/7166

And this as well huggingface/transformers#12281 😉

@borisdayma
Contributor

Hi, I'm interested in this widget.
What's the progress?

@osanseviero
Member Author

There is an existing PR – #131 – but at the moment it is not integrated into the Inference API (there is no transformers pipeline for this, nor one in the other libraries). If you want to build a demo, consider using Spaces for the time being.

@osanseviero
Member Author

Btw, if you have model repos with a text-to-image generation model, I would be interested in seeing them, along with an example of how to run inference.

@borisdayma
Contributor

Would it make sense to allow custom pipelines?
We're working on a mini DALL-E repo, so we'll need something completely custom.

@osanseviero
Member Author

Yes, having a generic Docker image to support custom inference is on our roadmap. The API will enforce a specific input/output contract, but beyond that it will be up to users to provide their own implementations, including any custom dependencies.
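
A sketch of what such a user-supplied entry point might look like (the class name and method signature are assumptions for illustration, not the actual spec):

```python
# Hypothetical handler interface for a generic custom-inference image.
# The API would fix the contract (JSON in, image bytes out); everything
# in between is up to the model author.
class CustomHandler:
    def __init__(self, model_dir):
        # Load weights, tokenizers, custom dependencies, etc.
        self.model_dir = model_dir

    def __call__(self, inputs):
        prompt = inputs["inputs"]
        # A real implementation would run the model here; this stub just
        # returns bytes beginning with a PNG signature to show the shape.
        return b"\x89PNG" + prompt.encode("utf-8")
```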

FYI @Narsil @julien-c

@julien-c
Member

They can already submit a PR against https://github.com/huggingface/huggingface_hub/tree/main/api-inference-community though no?

@osanseviero
Member Author

@julien-c you mean in the form of a new docker image just for this use case?

@julien-c
Member

Yes – since the boundary between a single model and a "library" is rather slim, as we've seen in other cases, wouldn't mini-dall-e be a good image name in api-inference-community?
The image could also support different checkpoints down the line. WDYT?

@mishig25
Contributor

Should we change the API output from a single image to an array of images?

That way, we can show multiple generated images from a single text prompt, as is currently done in most papers, demos, and tweets (see the attached image from @borisdayma's tweet).

[Screenshot 2021-07-19 at 15:25:55]

If the output changes to an array of images, I imagine the widget would involve some kind of horizontal scrolling or switching between the different images.
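
One possible shape for an array-of-images response – a JSON array of base64-encoded images – could be sketched like this (the encoding choice is an assumption, not a decided format):

```python
import base64
import json

def encode_images(images):
    """Serialize a list of raw image byte strings as a JSON array of base64 strings."""
    return json.dumps([base64.b64encode(img).decode("ascii") for img in images])

def decode_images(body):
    """Inverse: recover the raw image bytes the widget would page through."""
    return [base64.b64decode(item) for item in json.loads(body)]
```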

@osanseviero
Member Author

> Yes – since the boundary between a single model and a "library" is rather slim, as we've seen in other cases, wouldn't mini-dall-e be a good image name in api-inference-community?

If this is a short-term solution, let's do it! @borisdayma, feel free to open a PR about this.

For one-off cases, instead of having to maintain a very large number of images, I would prefer to go down the generic-image path so we don't end up with hundreds of images. I think our future selves will thank us for going down that road. I can work on adding the generic Docker image in the first week of August (two weeks from now).

> Should we change the API output from a single image to an array of images?
>
> That way, we can show multiple generated images from a single text prompt, as is currently done in most papers, demos, and tweets (see the attached image from @borisdayma's tweet).

I'll ask @Narsil to share his opinion on this; from an endpoint point of view it makes more sense to return only one, imo. Alternatively, users could specify the number of generated images as a parameter, though it may be simpler to just make multiple calls to the API.

@borisdayma
Contributor

I'm not sure whether the best fit here is the Inference API or HF Spaces (which probably has more flexibility for these cases).

@julien-c
Member

> I'll ask @Narsil to share his opinion on this; from an endpoint point of view it makes more sense to return only one, imo

Personally, I'd like widgets to be a 1:1 representation of the underlying API call, so if we want to display multiple generated images in one click, I'd be strongly in favor of doing it at the API level.

(for text generation, we do have an API param to generate multiple continuations, no?)

@borisdayma
Contributor

It depends on what you want to do exactly.
For example, for our demo we generate a bunch of images and then display only the best ones (8 out of 64 or 128).
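
That generate-many, keep-the-best step could be sketched as a simple top-k selection (the scores here stand in for whatever ranking the demo uses, e.g. a CLIP-style similarity; this is an illustration, not the demo's actual code):

```python
def select_best(images, scores, k=8):
    """Keep the k highest-scoring images from a larger generated batch."""
    # Rank image indices by descending score, then take the top k.
    ranked = sorted(range(len(images)), key=lambda i: scores[i], reverse=True)
    return [images[i] for i in ranked[:k]]
```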

@julien-c
Member

Yep, in that case Spaces are more flexible (whereas widgets are meant to be the canonical 1:1 representation of a model's interface).

@mishig25
Contributor

mishig25 commented Jul 20, 2021

(can confirm: for text generation, the `num_return_sequences` param exists for multiple continuations)
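
For reference, a text-generation payload using that parameter would look roughly like this (exact accepted parameters depend on the API version):

```python
def build_generation_request(prompt, n):
    """Payload requesting n continuations via num_return_sequences."""
    return {"inputs": prompt, "parameters": {"num_return_sequences": n}}
```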

@osanseviero
Member Author

Closing this issue since there's now a widget!


5 participants