
Any plans for a diffusers version? #3

Closed
tonyf opened this issue Jun 6, 2023 · 8 comments

Comments

@tonyf

tonyf commented Jun 6, 2023

Hey guys, this paper looks great. Really excited to see the full training code. Was curious -- do you have any plans to make a diffusers port?

@haoosz
Owner

haoosz commented Jun 7, 2023

Yeah. We will make a diffusers version after all the work is done. Thank you.

@haoosz haoosz closed this as completed Jun 7, 2023
@tonyf
Author

tonyf commented Jun 7, 2023

Amazing! Looking forward to seeing it. Just curious -- is there an expected timeline for the diffusers version? I'm debating whether to implement it myself.

@haoosz
Owner

haoosz commented Jun 8, 2023

Sorry, but I am occupied with follow-up work and might not get to the diffusers version right now. I will work on the diffusers version in August. If that is too late for you, I would be very glad if you implemented it yourself. Thank you!

@okaris

okaris commented Jun 21, 2023

I am working on this.

@garychan22

I have finished the diffusers version, but simply feeding the reference image to the frozen UNet and running the Otsu step is slow, which is weird. hahaha
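
For context, the Otsu step here binarizes a cross-attention map into a foreground mask. Below is a minimal sketch of that operation, assuming a 2-D attention map already averaged over heads and subject tokens, and using scikit-image's threshold_otsu; the function name and shapes are illustrative, not this repo's actual code.

```python
# Illustrative sketch only -- not this repo's actual implementation.
import numpy as np
from skimage.filters import threshold_otsu

def attention_to_mask(attn_map: np.ndarray) -> np.ndarray:
    """Binarize a 2-D cross-attention map with Otsu's method.

    attn_map: (H, W) scores, e.g. averaged over heads and over the
    tokens of the subject word.
    """
    t = threshold_otsu(attn_map)  # threshold that maximizes inter-class variance
    return (attn_map > t).astype(np.float32)  # 1 = subject region, 0 = background
```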

@okaris

okaris commented Jul 13, 2023

@garychan22 I've also recently finished it and have been working on getting the hyperparams to fit my needs. Otsu itself is the bottleneck; the point of having it is to avoid the need for preprocessing, but if you are already preprocessing, a manually supplied mask could help and speed things up. Other than that, this repo is not taking advantage of higher-performance attention processors. You can't use them for the attention calculations where you need to extract the scores, but everywhere else it's possible to use xformers or PyTorch's scaled_dot_product_attention for faster calculation.

Were you able to replicate the results exactly like the samples here?
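
A rough sketch of the mixed strategy described above: use the fused kernel whenever scores are not needed, and fall back to a manual softmax(QK^T)V only in the layers whose score matrix must be read out, since the fused kernel never materializes it. The helper below is illustrative, assuming (..., seq_len, head_dim) tensors; it is not this repo's or diffusers' actual code.

```python
# Illustrative sketch -- the fused kernel never materializes the score
# matrix, so layers that must expose scores take the slow manual path.
import torch
import torch.nn.functional as F

def attention(q, k, v, need_scores: bool = False):
    if not need_scores:
        # PyTorch 2.x fused kernel: fast and memory-efficient, scores not exposed
        return F.scaled_dot_product_attention(q, k, v), None
    scale = q.shape[-1] ** -0.5
    scores = torch.softmax((q @ k.transpose(-2, -1)) * scale, dim=-1)
    return scores @ v, scores  # scores available for mask extraction
```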

@okaris

okaris commented Jul 13, 2023

Also if you would like to submit a PR, here is my issue: huggingface/diffusers#3719

@garychan22

> @garychan22 I've also recently finished it and have been working on getting the hyperparams to fit my needs. […] Were you able to replicate the results exactly like the samples here?

Thanks for the useful tips! For now, I have not replicated results comparable to this repo's, and I will keep working on it.

Moreover, I have been training my own BLIP-Diffusion and finding that results better than DreamBooth's can be achieved within one minute of fine-tuning, which is awesome. I hope to replicate the results shown in the paper and release the pre-trained model to the hub soon.
