[Fix Issue #1197 since 2022] Support pre-trained openai/guided-diffusion (ADM) with minimal code change #6730
base: main
Conversation
Some other samples using the converted models with diffusers:

```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_lsun_cat_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```

```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_ffhq_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```

```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_lsun_horse_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```

```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_lsun_bedroom_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```
Thanks very much for your work on this. I agree that ADM is still very much used by the academic community but probably doesn't have a lot of real-world significance because of the lower quality. On the other hand, we do support Consistency Models as well as the original DDPM and DDIM models to respect the literature. So, given the above point and also considering the minimal changes introduced in this PR, I'd be supportive of adding it. My only major feedback would be to try not to use legacy attention blocks if possible. @patrickvonplaten @yiyixuxu WDYT here?
The problem here is that all the official pre-trained ADM models by openai use legacy attention, so I really have no choice but to use it. I have tried to use the diffusers attention, but the model produces garbage images like this (supposed to be a bedroom):
Can we try to maybe find a way to port the legacy attention to the one that's used now?
Sorry, I did not quite get what you mean by "port". Did you mean to create a separate class for legacy attention, and use an argument like attention_type?
See my comment here https://github.com/huggingface/diffusers/pull/6730/files#r1468858192
In fact, the part you refer to is about model conversion only, and I have already done it by calling the code of https://github.com/tongdaxu/diffusers/blob/main/scripts/convert_consistency_to_diffusers.py#L143. In this way, we can indeed unify the model weights of openai/guided-diffusion and diffusers. However, it has nothing to do with the legacy / non-legacy attention order; it is purely about the parameterization of the linear layers. What cannot be avoided is the runtime difference between legacy and non-legacy attention. The "qkv, q, k, v" you are referring to are model weights; the "qkv, q, k, v" I am referring to are activation tensors. They are different things with different shapes.
In openai/guided-diffusion, normal attention and legacy attention are implemented as separate classes. Those two attentions have exactly the same model weights, and the two classes have no parameters at all, so it has nothing to do with clever tricks in model conversion. This has to be solved at runtime (see the sketch below).
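To make the runtime difference concrete, here is a minimal self-contained sketch of the two splitting orders (paraphrasing, to my reading, QKVAttentionLegacy and QKVAttention in guided_diffusion/unet.py; the tensor names are mine):

```python
import torch

# Both attention classes receive the same fused activation tensor of shape
# (batch, 3 * n_heads * ch, length), produced by a single qkv projection.
bs, n_heads, ch, length = 2, 4, 8, 16
qkv = torch.randn(bs, 3 * n_heads * ch, length)

# Legacy order: fold heads into the batch dim first, then split q/k/v.
# The channel axis is read as [q_h0, k_h0, v_h0, q_h1, k_h1, v_h1, ...].
q1, k1, v1 = qkv.reshape(bs * n_heads, ch * 3, length).split(ch, dim=1)

# Non-legacy order: split q/k/v first, then fold heads into the batch dim.
# The channel axis is read as [q_all_heads, k_all_heads, v_all_heads].
q2, k2, v2 = (t.reshape(bs * n_heads, ch, length) for t in qkv.chunk(3, dim=1))

# The same activations are interpreted differently, so swapping the class
# changes the output even with identical weights.
print(torch.allclose(q1, q2))  # False in general
```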
Hi @sayakpaul, any further comments?
The change under discussion in UNetMidBlock2D.__init__:

```diff
         output_scale_factor: float = 1.0,
     ):
         super().__init__()
         resnet_groups = resnet_groups if resnet_groups is not None else min(in_channels // 4, 32)
         self.add_attention = add_attention

         if attn_groups is None:
-            attn_groups = resnet_groups if resnet_time_scale_shift == "default" else None
+            attn_groups = None if resnet_time_scale_shift == "spatial" else resnet_groups
```
This will break for resnet_time_scale_shift = "scale_shift", no? Even though the docstring says it only accepts "default" and "spatial", I'm not sure that no model has been configured with resnet_time_scale_shift = "scale_shift". I want to know your thoughts here @sayakpaul
The case is that when resnet_time_scale_shift = "scale_shift", the original code will produce attn_groups = None, which nullifies the argument resnet_groups. This behaviour of UNetMidBlock2D is inconsistent with the other up and down classes such as AttnDownBlock2D and ResnetDownsampleBlock2D. The incoming code will produce attn_groups = resnet_groups, which makes the attention layer of the mid block consistent with the up and down blocks (see the sketch below).
I am just following the docstring here, but I think it is better to have the mid block behaviour consistent with the up and down blocks.
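To spell out the behaviour change for the three documented values (my own summary, not part of the PR):

```python
def old_attn_groups(resnet_time_scale_shift: str, resnet_groups: int):
    # Condition before this PR.
    return resnet_groups if resnet_time_scale_shift == "default" else None

def new_attn_groups(resnet_time_scale_shift: str, resnet_groups: int):
    # Condition introduced by this PR.
    return None if resnet_time_scale_shift == "spatial" else resnet_groups

for mode in ("default", "spatial", "scale_shift"):
    print(mode, old_attn_groups(mode, 32), new_attn_groups(mode, 32))
# default 32 32
# spatial None None
# scale_shift None 32   <- the only case where behaviour changes
```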
Can we maybe come back with a better condition that doesn't leave any grounds for a potentially backward-breaking change? I agree with @yiyixuxu here.
> Can we maybe come back with a better condition that doesn't leave any grounds for a potentially backward-breaking change? I agree with @yiyixuxu here.
Cool, let me see if I can solve it in some other way. Would you prefer the current solution, or a slightly uglier one like a new UNetMiddleBlock class? In fact, I can easily make it free of backward-breaking changes, but I might introduce uglier additional blocks with more code.
Hi, I have found a way to avoid breaking the possible backward deps in the UNetMidBlock2D class and have updated the PR. The change is still minimal, but it does not break anything. I would love your advice @yiyixuxu @sayakpaul.
"ResnetUpsampleBlock2D", | ||
"ResnetUpsampleBlock2D", | ||
], | ||
"resnet_time_scale_shift": "scale_shift", |
Could you show me the codepath that becomes effective when resnet_time_scale_shift == "scale_shift"?
The convert_adm_to_diffusers.py script just calls the con_pt_to_diffuser function from convert_consistency_to_diffusers.py.
First, resnet_time_scale_shift == "scale_shift" is not a new option introduced by this PR.
Setting resnet_time_scale_shift == "scale_shift" passes the argument through UNet2DModel. From the init of UNet2DModel, it is passed into get_down_block, UNetMidBlock2D, and get_up_block, and further into each building block such as ResnetDownsampleBlock2D, ResnetUpsampleBlock2D, AttnDownBlock2D, and AttnUpBlock2D. The eventual effect of resnet_time_scale_shift == "scale_shift" is to set ResnetBlock2D's time_embedding_norm == "scale_shift". This option affects the shape of the resnet's time-embedding projection (https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py#L283) and the behaviour of the time embedding (https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py#L378).
It has no special effect on UNetMidBlock2D, as I circumvent this problem in https://github.com/tongdaxu/diffusers/blob/main/src/diffusers/models/unets/unet_2d.py#L200.
The resnet_time_scale_shift == "scale_shift" option is necessary in the model conversion script because the resnet's time-embedding projection is doubled in size under resnet_time_scale_shift == "scale_shift".
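For reference, the "scale_shift" branch boils down to something like the following (a simplified sketch of what ResnetBlock2D does, not the verbatim library code):

```python
import torch
import torch.nn as nn

temb_channels, out_channels = 512, 256
h = torch.randn(1, out_channels, 8, 8)    # feature map inside a resnet block
temb = torch.randn(1, temb_channels)      # time embedding

# time_embedding_norm == "default": temb is projected to out_channels
# and simply added to the features.
proj_default = nn.Linear(temb_channels, out_channels)
h_default = h + proj_default(temb)[:, :, None, None]

# time_embedding_norm == "scale_shift": the projection is doubled so temb
# can be split into a per-channel scale and shift (FiLM-style modulation),
# applied after the second group norm.
proj_scale_shift = nn.Linear(temb_channels, 2 * out_channels)
scale, shift = proj_scale_shift(temb)[:, :, None, None].chunk(2, dim=1)
h_scale_shift = h * (1 + scale) + shift
```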
That makes sense, thanks!
Looking really lean. I left a question.
Let's add this model to the research_projects folder, no? It's a bit too outdated to be in core diffusers, I'd say (cc @yiyixuxu)
@tongdaxu
I am fine with that; what should I do to move this to the research folder?
You can follow the structure of https://github.com/huggingface/diffusers/tree/main/examples/research_projects/controlnetxs as an example. Here's what you could consider: have the conversion script, modeling, and pipeline files under one folder and make sure they work.
Hi @tongdaxu, thank you for your great work! I am having trouble generating nice images with your PR. I hope you can help :) I installed the PR as follows:

Then I ran:

```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_imagenet_256x256_unconditional").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```

My generated images look like this: Did I install the PR wrong, or is there a bug?
I am out of office until 17 Feb; would you like to try other models first? Also, the Hugging Face model hub has been updated since the middle of the commit history. Are you using the latest model?
Sorry, I do not have access to a GPU for now, but the instructions in https://github.com/tongdaxu/InverseDiffusion should work. Would you like to give them a try?
Hi, I just ran a small test with tongdaxu@111eac1. It seems to be OK, and I am not sure what is happening here; I might need more time to figure it out when I am back in the office after 17 Feb. I can only run some small sanity checks for now. All I can say is that with the commit above and the ImageNet model (I checked the hash sum), the sampling should be fine. I am not sure if there can be some dependency problem (I am using torch 2.1.0). I might need to go back to the office for more testing. Thanks for pointing it out, and for your patience.
Thank you @kschwethelm, that is very strange. I also found that it fails with torch 1.9 and works with torch 2.1. I do not have a clue why it fails. Have you figured out why?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Not stale. @yiyixuxu WDYT?
How to finetune with the ADM pretrained model?
What does this PR do?
TL;DR: This PR adds support for openai/guided-diffusion (ADM) pre-trained models in diffusers, covering a diverse family of checkpoints trained on the ImageNet, LSUN bedroom, LSUN cat, LSUN horse, and FFHQ datasets.
The openai/guided-diffusion pre-trained models (ADM unconditional 256x256) are used a lot by the academic community (e.g. DPS, ICLR 2023: https://github.com/DPS2022/diffusion-posterior-sampling; FreeDOM, ICCV 2023: https://github.com/vvictoryuki/FreeDoM?tab=readme-ov-file). However, they are not supported in huggingface/diffusers.
This issue was raised as early as 2022 (#1197) but left unsolved, as it is indeed quite complicated.
I keep the changes to UNet2DModel as minimal as possible, so that we can use openai/guided-diffusion pretrained models like any standard unconditional diffusion model (https://huggingface.co/docs/diffusers/using-diffusers/unconditional_image_generation).
Those changes include:
I have been very careful not to break any existing code, and to keep the new code as short as possible.
I have provided a script to convert pre-trained openai/guided-diffusion models to huggingface-compatible models: https://github.com/tongdaxu/diffusers/blob/main/scripts/convert_adm_to_diffusers.py.
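To sanity-check a conversion, the two UNets can be compared on identical inputs along these lines (a hedged sketch, not the verbatim script; original_adm_unet and converted_unet are placeholder names for already-loaded models):

```python
import torch

# Placeholder names: original_adm_unet is the openai/guided-diffusion UNet,
# converted_unet is the diffusers UNet2DModel produced by the script.
noise = torch.randn(1, 3, 256, 256)
t = torch.tensor([10])

with torch.no_grad():
    out_orig = original_adm_unet(noise, t)      # guided-diffusion forward
    out_conv = converted_unet(noise, t).sample  # diffusers forward

mae = (out_orig - out_conv).abs().mean()
rel = mae / out_orig.abs().mean()
print(f"mean abs error: {mae:.1e}, relative abs error: {rel:.1e}")
```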
I have also provided my converted models with configs. Those conversions have a mean absolute error of $\sim 5\times10^{-5}$ and a relative absolute error of $\sim 6\times10^{-5}$ when the input noise is the same. As the error is minimal, the model conversion is correct. The complete list of converted models is:
Now we can sample from the pre-trained openai/guided-diffusion models using diffusers in an out-of-the-box way.
And the results are as good as those of the original openai/guided-diffusion model:
Who can review?
@patrickvonplaten @yiyixuxu and @sayakpaul