✨ [Core] Add FreeU mechanism #5164
Conversation
Code:

from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-512")
pipe = pipe.to("cuda")
pipe.enable_xformers_memory_efficient_attention()
pipe.unet.freeu.enable = True

steps = 50
pipe.unet.freeu.sd21()

def cb(step, _, __):
    if step == int(steps * 0.5):
        pipe.unet.freeu.ones()

generator = torch.manual_seed(42)
output = pipe(
    prompt="A photo of a man",
    generator=generator,
    height=512,
    width=512,
    callback=cb,
)

SD2.1: (result image)
Thank you for this implementation. Do you think it'll work for e.g. SD XL and Zeroscope XL too?
We are going to want to anneal this over the given timestep schedule so the user can decide when the effect should end. It seems mostly beneficial for low-frequency features during the initial stages of inference, and not for the tail that adds the fine details. You didn't add any parameters to control it? Or I might have overlooked that.
Additionally, you can use
RuntimeError: mixed dtype (CPU): expect parameter to have scalar type of Float
I solved the error and added float16 support. It now runs 10 seconds faster.
It supports the SD-XL model. Does the Zeroscope model have a pipeline?
I added the parameters manually. The Diffusers team can fix this; I don't know the UNet structure.
Hi @kadirnar, I adapted the solution of FreeU in my local repo, and there is a slight difference that makes me think your solution is not exactly what the paper describes. Take into account that if you do this:
You are applying both the FFT and the mask multiplication to the same tensor. I was curious whether that works and retrieves an image comparably good to the vanilla one. The idea is to apply the mask over the sample, and the FFT over the sample coming from DownSample (image below). I would suggest you apply it to every kind of UpBlock inside
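For reference, the separation described here can be sketched as follows. This is a hedged NumPy sketch of the operation order described in the FreeU paper (scale half of the backbone channels, Fourier-filter the skip-connection features), not the diffusers implementation; `apply_free_u`, its defaults, and `fourier_filter` are illustrative names only.

```python
import numpy as np

def fourier_filter(x, threshold, scale):
    """Scale the centered low-frequency band of x (shape B, C, H, W) by `scale`."""
    x_freq = np.fft.fftshift(np.fft.fftn(x, axes=(-2, -1)), axes=(-2, -1))
    B, C, H, W = x_freq.shape
    mask = np.ones((B, C, H, W))
    crow, ccol = H // 2, W // 2
    mask[..., crow - threshold : crow + threshold, ccol - threshold : ccol + threshold] = scale
    x_freq = x_freq * mask
    return np.fft.ifftn(np.fft.ifftshift(x_freq, axes=(-2, -1)), axes=(-2, -1)).real

def apply_free_u(backbone, skip, b=1.1, s=0.9, threshold=1):
    # Backbone features: amplify only the first half of the channels.
    backbone = backbone.copy()
    backbone[:, : backbone.shape[1] // 2] *= b
    # Skip-connection features: attenuate low frequencies, leave the rest untouched.
    skip = fourier_filter(skip, threshold=threshold, scale=s)
    return backbone, skip
```

The key point is that the spectral mask touches only the skip-connection tensor, while the backbone tensor gets a plain channel-wise boost; the two are concatenated afterwards as in any UpBlock.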
This is how Zeroscope + XL is used in Diffusers: https://huggingface.co/docs/diffusers/v0.21.0/en/api/pipelines/text_to_video#cerspensezeroscopev2576w-cerspensezeroscopev2xl
Can you share your sample code?
@sayakpaul can you implement the timestep annealing here? <3
Also, what if we do not want to use FreeU? I think it is better to be optional. Do you think something like this would help? Maybe a community pipeline?
The optional parameters should be implemented; they're just not in at this stage of the PR. If it is annealed over the process, you could define a beginning and end range for when it is applied during the pipeline's run. Alternatively, I guess you could just return the noisy latents and pass them into the pipeline again, continuing where it left off with the parameters disabled. I think having it built into the single-shot process is much cleaner.
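The begin/end-range idea could be sketched with a small callback-driven schedule. Everything here (`FreeUSchedule`, its field names, the default window) is hypothetical and not an existing diffusers API; it only illustrates the bookkeeping.

```python
from dataclasses import dataclass

@dataclass
class FreeUSchedule:
    """Hypothetical helper: FreeU is active only inside [begin_step, end_step)."""
    begin_step: int = 0
    end_step: int = 15  # stop boosting low frequencies before the fine-detail phase
    enabled: bool = False

    def update(self, step: int) -> None:
        self.enabled = self.begin_step <= step < self.end_step

schedule = FreeUSchedule(begin_step=0, end_step=15)

def cb(step, timestep, latents):
    # Pipelines invoke the callback once per denoising step; toggle FreeU here.
    schedule.update(step)
```

A model hook would then consult `schedule.enabled` before applying the Fourier filter.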
I'm excited to see this integrated so I can stop keeping a full copy of the UNet file. Converting the samples to a different data type on every invocation isn't ideal: you only need to convert the tensors when the dimensions aren't powers of 2. I only convert in that case, and it's quite a bit faster for power-of-2 dimensions on my 3090.

import torch
from torch import fft

def Fourier_filter(x_in, threshold, scale):
    x = x_in
    B, C, H, W = x.shape
    # Non-power-of-2 images must be float32
    if (W & (W - 1)) != 0 or (H & (H - 1)) != 0:
        x = x.to(dtype=torch.float32)
    # FFT
    x_freq = fft.fftn(x, dim=(-2, -1))
    x_freq = fft.fftshift(x_freq, dim=(-2, -1))
    B, C, H, W = x_freq.shape
    mask = torch.ones((B, C, H, W), device=x.device)
    crow, ccol = H // 2, W // 2
    mask[..., crow - threshold : crow + threshold, ccol - threshold : ccol + threshold] = scale
    x_freq = x_freq * mask
    # IFFT
    x_freq = fft.ifftshift(x_freq, dim=(-2, -1))
    x_filtered = fft.ifftn(x_freq, dim=(-2, -1)).real
    return x_filtered.to(dtype=x_in.dtype)

Also, the application appears to be wrong, as @JorgeAV-ai points out. You'll need to adapt it to how the Diffusers library wants to configure it. For my case, I define a dataclass and manipulate it in my callback functions:

from dataclasses import dataclass

@dataclass
class UNetFreeUConfig:
    enabled: bool = True
    s1: float = 1.0
    s2: float = 1.0
    b1: float = 1.0
    b2: float = 1.0

    def sd21(self):
        """Set the default weighting values for SD2.1 suggested by the paper authors"""
        self.s1 = 0.9
        self.s2 = 0.2
        self.b1 = 1.1
        self.b2 = 1.2

    def ones(self):
        """Set all weights to 1.0 to disable the FreeU adaptation"""
        self.s1 = 1.0
        self.s2 = 1.0
        self.b1 = 1.0
        self.b2 = 1.0

Then I instantiate it in the UNet class init:

self.freeu = UNetFreeUConfig()

And I set the desired config before using the pipe. Here I switch back to all-1 weights halfway through the diffusion process, because with my fine-tuned model the results are much worse if I leave FreeU enabled for the whole run:

steps = 30
pipe.unet.freeu.sd21()

def cb(step, _, __):
    if step == int(steps * 0.5):
        pipe.unet.freeu.ones()

output = pipe(prompt, num_inference_steps=steps, callback=cb)
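The float32 fallback in `Fourier_filter` above relies on a standard bit trick for power-of-two detection, which can be checked in isolation:

```python
def is_power_of_two(n: int) -> bool:
    # n is a power of two exactly when it has a single set bit,
    # i.e. n & (n - 1) clears the lowest set bit and leaves zero.
    return n > 0 and (n & (n - 1)) == 0

# Typical latent spatial sizes: 512x512 images give 64x64 latents
# (power of 2, no dtype conversion needed), while 768x768 gives
# 96x96 latents (conversion to float32 needed before the FFT).
```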
@justindujardin
Super cool to get this started @kadirnar, can't wait to play around with it - it's a really neat idea :-)
@patrickvonplaten There is also this project: https://github.com/lyn-rgb/FreeU_Diffusers
@kadirnar thanks for kicking this off! It's indeed going to be very useful. To be able to merge this PR, we first need to settle on the API design.
Once that has been addressed, we can write a thorough doc page about it. To help you get started, I prepared this dummy PR (#5186) which you could refer to if need be. Let me know :-)
Okay, after discussing with @patrickvonplaten internally, we mutually agreed on how enabling FreeU support should be exposed. I updated #5186 to reflect that. @kadirnar let me know if you have any questions :-)
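For context, the API that eventually shipped in diffusers exposes FreeU as pipeline-level `enable_freeu(s1, s2, b1, b2)` and `disable_freeu()` methods. A minimal stand-in class sketches that shape so the enable/disable bookkeeping can be followed without diffusers or a GPU; the stub class itself is illustrative only.

```python
class FreeUPipelineStub:
    """Illustrative stand-in mirroring the enable_freeu / disable_freeu method pair."""

    def __init__(self):
        self.freeu_kwargs = None  # None means FreeU is off

    def enable_freeu(self, s1: float, s2: float, b1: float, b2: float) -> None:
        # Stage-wise skip scalings (s1, s2) and backbone boosts (b1, b2).
        self.freeu_kwargs = {"s1": s1, "s2": s2, "b1": b1, "b2": b2}

    def disable_freeu(self) -> None:
        # Resetting to None keeps the UNet forward pass untouched.
        self.freeu_kwargs = None
```

With a real pipeline the call looks like `pipe.enable_freeu(s1=0.9, s2=0.2, b1=1.1, b2=1.2)` for the SD2.1 values suggested by the paper authors.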
Hi, thank you for your feedback and help. I'm very busy, but I will update tomorrow evening 👍🏻
Gonna merge after the CI runs fully. As discussed with @DN6 over Slack, gonna merge without the SDXL slow test, as it needs its own slow test suite.
Failing test is unrelated.
Thanks @kadirnar for kickstarting this!
Cool. Thank you! Is FreeU for video/Zeroscope planned too?
Great 🚀 I will test it soon ⭐
It is available. I added it in this PR itself :-) @tin2tin
Does FreeU work for inpainting too?
Hi @justindujardin, nice work! But it's still unclear to me how I can use this. Could you share the code or the details?
Hey @justindujardin, will there be FreeU for SimpleCrossAttnUpBlock2D? It seems that it doesn't work for now due to shape incompatibility.
Hi, not sure you'd call this a bug, but disabling FreeU when it hasn't been enabled yet gives an error.
Please open a new issue with a fully reproducible snippet.
#5544 raised
Squashed commit history:

* ✨ Added Fourier filter function to upsample blocks
* 🔧 Update Fourier_filter for float16 support
* ✨ Added UNetFreeUConfig to UNet model for FreeU adaptation 🛠️
* move unet to its original form and add fourier_filter to torch_utils.
* implement freeU enable mechanism
* implement disable mechanism
* resolution index.
* correct resolution idx condition.
* fix copies.
* no need to use resolution_idx in vae.
* spell out the kwargs
* proper config property
* fix attribution setting
* place unet hasattr properly.
* fix: attribute access.
* proper disable
* remove validation method.
* debug (×6)
* potential fix.
* add: doc.
* fix copies
* add: tests.
* add: support freeU in SDXL.
* set default value of resolution idx.
* set default values for resolution_idx.
* fix copies
* fix rest.
* fix copies
* address PR comments.
* run fix-copies
* move apply_free_u to utils and other minors.
* introduce support for video (unet3D)
* minor ups
* consistent fix-copies.
* consistent stuff
* fix-copies
* add: rest
* add: docs.
* fix: tests
* fix: doc path
* Apply suggestions from code review (Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>)
* style up
* move to techniques.
* add: slow test for sd freeu. (×6)
* add: slow test for video with freeu (×3)
* style

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@patrickvonplaten and @sayakpaul
FreeU: https://github.com/ChenyangSi/FreeU
#5153