
✨ [Core] Add FreeU mechanism #5164

Merged

merged 63 commits into huggingface:main from kadirnar:add-freeU on Oct 5, 2023
Conversation

@kadirnar
Contributor Author

kadirnar commented Sep 24, 2023

Code:

from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-512")
pipe = pipe.to("cuda")
pipe.enable_xformers_memory_efficient_attention()
pipe.unet.freeu.enable = True

steps = 50
pipe.unet.freeu.sd21()
def cb(step, _, __):
    if step == int(steps * 0.5):
        pipe.unet.freeu.ones()

generator = torch.manual_seed(42)
output = pipe(
    prompt="A photo of a man",
    generator=generator,
    height=512,
    width=512,
    num_inference_steps=steps,
    callback=cb,
)

SD2.1:
https://huggingface.co/datasets/kadirnar/diffusers_readme_images/blob/main/normal.png
SD2.1 + FreeU:
https://huggingface.co/datasets/kadirnar/diffusers_readme_images/blob/main/output.png

@tin2tin

tin2tin commented Sep 24, 2023

Thank you for this implementation. Do you think it'll work for ex. SD XL and Zeroscope XL too?

@bghira
Contributor

bghira commented Sep 24, 2023

we are going to want to anneal this over the given timestep schedule so the user can decide when the effect should end.

it seems mostly beneficial for low frequency features during the initial stages of inference and not for the tail that adds the fine details.

you didn't add any parameters to control it? or i might've overlooked that

@bghira
Contributor

bghira commented Sep 24, 2023

additionally, you can use ptx0/pseudo-flex-base as a terminal snr 1024px test for sd 2.1 using some of the test prompts and parameters it has on its model card

@kadirnar
Contributor Author

RuntimeError: mixed dtype (CPU): expect parameter to have scalar type of Float

I solved the error and added float16 support. It now runs 10 seconds faster.

@kadirnar
Contributor Author

> Thank you for this implementation. Do you think it'll work for ex. SD XL and Zeroscope XL too?

It supports the SD-XL model. Does the Zeroscope model have a pipeline?

@kadirnar
Contributor Author

> we are going to want to anneal this over the given timestep schedule so the user can decide when the effect should end.
>
> it seems mostly beneficial for low frequency features during the initial stages of inference and not for the tail that adds the fine details.
>
> you didn't add any parameters to control it? or i might've overlooked that

I added the parameters manually. The Diffusers team can fix this; I don't know the UNet structure.

@JorgeAV-ai

Hi @kadirnar, I adapted the solution of FreeU in my local repo and there is a slight difference that makes me think your solution is not exactly what the paper describes. Take into account that if you do this:

if sample.shape[1] == 1280:
    sample[:, :640] *= 1.1  # For SD2.1
    sample = Fourier_filter(sample, threshold=1, scale=0.9)

You are applying both the FFT and the mask multiplication to the same tensor; I was curious whether it works and produces an image comparable to the vanilla one.

The idea is to apply the mask over the sample, and the FFT over the sample coming from the DownSample block (image below). I would suggest applying the code provided in the original FreeU GitHub to every kind of UpBlock inside unet_2d_block.py 😄
[image: FreeU diagram showing the backbone scaling and the Fourier filter on the skip features]
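As a sketch of the distinction being pointed out here (a NumPy stand-in with illustrative names, not the diffusers implementation): the b factor scales the backbone features in place, while the Fourier filter is applied to the separate skip-connection tensor.

```python
import numpy as np

def fourier_filter(x, threshold=1, scale=0.9):
    """Scale the low-frequency band of (B, C, H, W) features (NumPy stand-in)."""
    x_freq = np.fft.fftshift(np.fft.fftn(x, axes=(-2, -1)), axes=(-2, -1))
    B, C, H, W = x_freq.shape
    mask = np.ones((B, C, H, W))
    crow, ccol = H // 2, W // 2
    mask[..., crow - threshold:crow + threshold, ccol - threshold:ccol + threshold] = scale
    x_freq = x_freq * mask
    return np.fft.ifftn(np.fft.ifftshift(x_freq, axes=(-2, -1)), axes=(-2, -1)).real

# Per the paper: b1 scales the *backbone* features in place, while the
# Fourier filter (s1) is applied to the *skip* features -- two different tensors.
b1, s1 = 1.1, 0.9  # SD2.1 values quoted in this thread
hidden_states = np.random.randn(1, 1280, 8, 8)      # backbone features
res_hidden_states = np.random.randn(1, 1280, 8, 8)  # skip-connection features
if hidden_states.shape[1] == 1280:
    hidden_states[:, :640] *= b1
    res_hidden_states = fourier_filter(res_hidden_states, threshold=1, scale=s1)
```

For a constant input, all spectral energy sits at DC, so the filter scales the whole tensor by `scale`, which makes the low-frequency attenuation easy to check.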

@tin2tin

tin2tin commented Sep 24, 2023

> Thank you for this implementation. Do you think it'll work for ex. SD XL and Zeroscope XL too?
>
> It supports the SD-XL model. Does the Zeroscope model have a pipeline?

This is how Zeroscope +XL is used in Diffusers: https://huggingface.co/docs/diffusers/v0.21.0/en/api/pipelines/text_to_video#cerspensezeroscopev2576w-cerspensezeroscopev2xl

@kadirnar
Contributor Author

> Hi @kadirnar, I adapted the solution of FreeU in my local repo and there is a slight difference that makes me think your solution is not exactly what the paper describes […]

Can you share your sample code?

@bghira
Contributor

bghira commented Sep 24, 2023

@sayakpaul can you implement the timestep annealing here? <3

@adhikjoshi

Also, what if we do not want to use FreeU?

I think it would be better for it to be optional.

Do you think something like this would help? Maybe a community pipeline?


def apply_FreeU(pipe):
    # ... enable FreeU on the pipeline's UNet ...
    return pipe

pipe = apply_FreeU(pipe)

@patrickvonplaten

@bghira
Contributor

bghira commented Sep 24, 2023

the optional parameters should be implemented, they're just not at this current stage of the PR. if it is annealed over the process, you could define a beginning and end range for when it is applied during the pipeline's process.

alternatively i guess you could just return the noisy latents and pass them into the pipeline again continuing where it left off, with the parameters disabled then. idk. i think having it built into the single shot process is much cleaner.

@justindujardin

justindujardin commented Sep 24, 2023

I'm excited to see this integrated so I can stop having a full copy of the unet file.

Converting the samples to a different data type for each invocation isn't ideal. It's only when you're using dimensions that aren't powers of 2 that you need to convert the tensors.

I only convert the tensors to another type when the dimensions aren't pow2. It's quite a bit faster for pow2 dimensions on my 3090.

import torch
from torch import fft


def Fourier_filter(x_in, threshold, scale):
    x = x_in
    B, C, H, W = x.shape

    # Non-power of 2 images must be float32
    if (W & (W - 1)) != 0 or (H & (H - 1)) != 0:
        x = x.to(dtype=torch.float32)

    # FFT
    x_freq = fft.fftn(x, dim=(-2, -1))
    x_freq = fft.fftshift(x_freq, dim=(-2, -1))

    B, C, H, W = x_freq.shape
    mask = torch.ones((B, C, H, W), device=x.device)

    crow, ccol = H // 2, W // 2
    mask[..., crow - threshold : crow + threshold, ccol - threshold : ccol + threshold] = scale
    x_freq = x_freq * mask

    # IFFT
    x_freq = fft.ifftshift(x_freq, dim=(-2, -1))
    x_filtered = fft.ifftn(x_freq, dim=(-2, -1)).real

    return x_filtered.to(dtype=x_in.dtype)
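The non-power-of-2 guard above is the standard bit trick: a power of two has exactly one set bit, so `n & (n - 1)` clears it to zero. A quick standalone check (the helper name is illustrative):

```python
def is_pow2(n: int) -> bool:
    # a power of two has a single set bit; n & (n - 1) clears it to zero
    return n > 0 and (n & (n - 1)) == 0

print([n for n in (256, 512, 576, 768, 1024) if not is_pow2(n)])  # → [576, 768]
```

So only the odd sizes like 576 or 768 pay the float32 conversion cost; 512 and 1024 stay in half precision.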

Also, the application appears to be wrong as @JorgeAV-ai points out.

You'll need to adapt it to how the Diffusers library wants to configure it. In my case, I define a dataclass and manipulate it in my callback functions:

@dataclass
class UNetFreeUConfig:
    enabled: bool = True
    s1: float = 1.0
    s2: float = 1.0
    b1: float = 1.0
    b2: float = 1.0

    def sd21(self):
        """Set the default weighting values for SD2.1 suggested by the paper authors"""
        self.s1 = 0.9
        self.s2 = 0.2
        self.b1 = 1.1
        self.b2 = 1.2

    def ones(self):
        """Set all ones to disable the FreeU adaptation"""
        self.s1 = 1.0
        self.s2 = 1.0
        self.b1 = 1.0
        self.b2 = 1.0

Then I instantiate it in the UNet class init:

self.freeu = UNetFreeUConfig()

And I set the desired config before using the pipe. Here I switch back to all-1 weights halfway through the diffusion process, because with my fine-tuned model the results are much worse if I leave it enabled for the whole process.

steps = 30
pipe.unet.freeu.sd21()
def cb(step, _, __):
    if step == int(steps * 0.5):
        pipe.unet.freeu.ones()
output = pipe(prompt, num_inference_steps=steps, callback=cb)

freeu_partial_mario

@kadirnar
Contributor Author

kadirnar commented Sep 24, 2023

@justindujardin
Thank you for your contributions. Open source ❤️

@patrickvonplaten
Contributor

Super cool to get this started @kadirnar, can't wait to play around with it - it's a really neat idea :-)

@tin2tin

tin2tin commented Sep 25, 2023

@patrickvonplaten There is also this project: https://github.com/lyn-rgb/FreeU_Diffusers

@sayakpaul
Member

@kadirnar thanks for kicking this off! It's indeed going to be very useful.

To be able to merge this PR, we first need to settle on an API design that respects diffusers design philosophy. In order for us to get there, let's maybe do the following:

Once these things have been addressed, we can write a thorough doc page about it.

To help you get started with this a bit, I prepared this dummy PR (#5186) which you could refer to if need be.

Let me know :-)

@sayakpaul
Member

Okay, so after discussing with @patrickvonplaten internally, we mutually agreed that enabling FreeU support via an enable() & disable() paradigm (much akin to how we have it for xformers, for example) is a better idea.

I updated #5186 to reflect that.

@kadirnar let me know if you have any questions :-)
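For reference, the paradigm that ended up merged is `pipe.enable_freeu(s1=0.9, s2=0.2, b1=1.1, b2=1.2)` / `pipe.disable_freeu()` (the SD2.1 values come from earlier in this thread; the call shape is visible in the traceback later on). A minimal stand-in class illustrating the pattern, not the actual diffusers code:

```python
class FreeUPipelineSketch:
    """Stand-in for the merged API shape; NOT the real diffusers implementation."""

    def enable_freeu(self, s1, s2, b1, b2):
        # s1/s2 scale the skip-connection features, b1/b2 the backbone features
        self.s1, self.s2, self.b1, self.b2 = s1, s2, b1, b2
        self.freeu_enabled = True

    def disable_freeu(self):
        self.freeu_enabled = False


pipe = FreeUPipelineSketch()
pipe.enable_freeu(s1=0.9, s2=0.2, b1=1.1, b2=1.2)  # SD2.1 values from this thread
pipe.disable_freeu()
```

The appeal over a free-floating `apply_FreeU(pipe)` helper is that enable/disable lives on the pipeline itself, mirroring `enable_xformers_memory_efficient_attention()`.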

@kadirnar
Contributor Author

kadirnar commented Sep 26, 2023

> Okay, so after discussing with @patrickvonplaten internally, we mutually agreed that enabling FreeU support via an enable() & disable() paradigm (much akin to how we have it for xformers, for example) is a better idea.
>
> I updated #5186 to reflect that.
>
> @kadirnar let me know if you have any questions :-)

Hi, thank you for your feedback and help. I'm very busy but I will update tomorrow evening👍🏻

@sayakpaul
Member

Gonna merge after the CI runs fully.

As discussed with @DN6 over Slack, gonna merge without the SDXL slow test as it needs its own slow test suite.

@sayakpaul
Member

Failing test is unrelated.

@sayakpaul merged commit 84b82a6 into huggingface:main Oct 5, 2023
10 of 11 checks passed
@sayakpaul
Member

Thanks @kadirnar for kickstarting this!

@tin2tin

tin2tin commented Oct 5, 2023

Cool. Thank you! Is FreeU for video/Zeroscope planned too?

@kadirnar
Contributor Author

kadirnar commented Oct 5, 2023

Great 🚀 I will test it soon ⭐

@kadirnar deleted the add-freeU branch October 5, 2023 08:46
@sayakpaul
Member

> Cool. Thank you! Is FreeU for video/Zeroscope planned too?

It is available. I added it in this PR itself :-) @tin2tin

@ZhouXiner

Does FreeU work for inpainting too?

@SkylerZheng

> I'm excited to see this integrated so I can stop having a full copy of the unet file. […]

Hi @justindujardin, nice work! But it's still unclear to me how to use this. Could you share the code or the details?

@americanexplorer13

Hey @justindujardin, will there be FreeU for SimpleCrossAttnUpBlock2D? It seems that it doesn't work for now due to a shape incompatibility.

@Vargol

Vargol commented Oct 26, 2023

Hi, not sure you'd call this a bug, but disabling FreeU when it hasn't been enabled yet raises an error.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-a9c69bf88260> in <cell line: 39>()
     47       pipe.enable_freeu(s1=0.95, s2=1.0, b1=1.0, b2=1.05)
     48     else:
---> 49       pipe.disable_freeu()
     50
     51

2 frames
/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py in disable_freeu(self)
    619     def disable_freeu(self):
    620         """Disables the FreeU mechanism if enabled."""
--> 621         self.unet.disable_freeu()
    622
    623     @torch.no_grad()

/usr/local/lib/python3.10/dist-packages/diffusers/models/unet_2d_condition.py in disable_freeu(self)
    792         for i, upsample_block in enumerate(self.up_blocks):
    793             for k in freeu_keys:
--> 794                 if hasattr(upsample_block, k) or getattr(upsample_block, k) is not None:
    795                     setattr(upsample_block, k, None)
    796

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
   1693             if name in modules:
   1694                 return modules[name]
-> 1695         raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
   1696
   1697     def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:

AttributeError: 'CrossAttnUpBlock2D' object has no attribute 's1'
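The guard in the frame above looks like the culprit: when `hasattr(...)` is False, the `or` branch still evaluates `getattr(...)` on the missing attribute and raises. A defensive sketch of what was presumably intended (illustrative, not the actual patch):

```python
class DummyBlock:
    """An up block where FreeU was never enabled: no s1/s2/b1/b2 attributes."""


def disable_freeu(up_blocks, keys=("s1", "s2", "b1", "b2")):
    for block in up_blocks:
        for k in keys:
            # getattr with a default avoids the AttributeError from the traceback
            if getattr(block, k, None) is not None:
                setattr(block, k, None)


disable_freeu([DummyBlock()])  # no error even though FreeU was never enabled
```

Equivalently, `hasattr(...) and getattr(...) is not None` would short-circuit safely; the `or` defeats that.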

@sayakpaul
Member

Please open a new issue with a fully reproducible snippet.

@Vargol

Vargol commented Oct 26, 2023

#5544 raised

yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
* ✨ Added Fourier filter function to upsample blocks

* 🔧 Update Fourier_filter for float16 support

* ✨ Added UNetFreeUConfig to UNet model for FreeU adaptation 🛠️

* move unet to its original form and add fourier_filter to torch_utils.

* implement freeU enable mechanism

* implement disable mechanism

* resolution index.

* correct resolution idx condition.

* fix copies.

* no need to use resolution_idx in vae.

* spell out the kwargs

* proper config property

* fix attribution setting

* place unet hasattr properly.

* fix: attribute access.

* proper disable

* remove validation method.

* debug

* debug

* debug

* debug

* debug

* debug

* potential fix.

* add: doc.

* fix copies

* add: tests.

* add: support freeU in SDXL.

* set default value of resolution idx.

* set default values for resolution_idx.

* fix copies

* fix rest.

* fix copies

* address PR comments.

* run fix-copies

* move apply_free_u to utils and other minors.

* introduce support for video (unet3D)

* minor ups

* consistent fix-copies.

* consistent stuff

* fix-copies

* add: rest

* add: docs.

* fix: tests

* fix: doc path

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* style up

* move to techniques.

* add: slow test for sd freeu.

* add: slow test for sd freeu.

* add: slow test for sd freeu.

* add: slow test for sd freeu.

* add: slow test for sd freeu.

* add: slow test for sd freeu.

* add: slow test for video with freeu

* add: slow test for video with freeu

* add: slow test for video with freeu

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024