Make T2I-Adapter downscale padding match the UNet #5435

Merged: 9 commits merged into huggingface:main on Oct 23, 2023

Conversation

@RyanJDick (Contributor) commented Oct 17, 2023:

What does this PR do?

Fixes #5377

This PR updates the padding behavior of the T2I-Adapter down blocks to more closely mirror the padding behavior of the UNet model.

Prior to this change, the T2I-Adapter pipelines only worked with input/output image dimensions that were multiples of the T2I-Adapter's total downscale factor (64 for SD1 and 32 for SDXL). After this change, the same pipelines can be run with input/output dimensions that are multiples of the T2I-Adapter's initial pixel unshuffling downscale factor (8 for SD1 and 16 for SDXL).
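
To make the new constraint concrete, here is a minimal sketch (the helper name and its defaults are hypothetical, not part of diffusers) of the dimension check for SD1, before and after this PR:

def t2i_adapter_dims_ok(height: int, width: int, unshuffle_factor: int = 8, total_downscale_factor: int = 64):
    # Before this PR: both dimensions had to be multiples of the total downscale factor (64 for SD1).
    before_ok = height % total_downscale_factor == 0 and width % total_downscale_factor == 0
    # After this PR: multiples of the initial pixel-unshuffle factor (8 for SD1) are enough,
    # because the down blocks now pad (ceil_mode) the same way the UNet does.
    after_ok = height % unshuffle_factor == 0 and width % unshuffle_factor == 0
    return before_ok, after_ok

print(t2i_adapter_dims_ok(384, 680))  # (False, True): accepted only after this change
print(t2i_adapter_dims_ok(384, 640))  # (True, True): accepted before and after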

Manual Testing

The change is covered by unit tests, but the following script can be used as a visual regression test for this feature. It is adapted from a comment on the original issue (#5377).

import torch
from PIL import Image
from controlnet_aux import PidiNetDetector
import numpy as np
from diffusers import T2IAdapter, StableDiffusionAdapterPipeline

# (width, height): 680 is a multiple of 8 but not of 64, so it only works after this change.
in_dims = (680, 384)
# in_dims = (640, 384)  # multiple of 64: works both before and after this change

# Build a binarized sketch conditioning image from the input photo.
image = Image.open("dog.png")
processor = PidiNetDetector.from_pretrained("lllyasviel/Annotators")
sketch_image = processor(image).resize(in_dims).convert("L")
tensor_img = (torch.from_numpy(np.array(sketch_image).astype(np.float32) / 255.0) > 0.5).float()
sketch_image = tensor_img.numpy()
sketch_image = (sketch_image * 255).astype(np.uint8)
sketch_image = Image.fromarray(sketch_image)
sketch_image.save("sketch.png")

# Run the SD1.5 sketch T2I-Adapter pipeline with a fixed seed.
adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_sketch_sd15v2", torch_dtype=torch.float16)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", adapter=adapter, safety_checker=None, torch_dtype=torch.float16, variant="fp16"
)
pipe.to("cuda")
generator = torch.Generator().manual_seed(0)
sketch_image_out = pipe(prompt="a dog", image=sketch_image, generator=generator).images[0]
sketch_image_out.save("sketch_image_out.png")

print(f"In dims: {in_dims}, Out dims: {(sketch_image_out.width, sketch_image_out.height)}")

Input dog.png:
[image]

Before this change, in_dims=(680, 384)

Note that the output dimensions do not match the input dimensions.
In dims: (680, 384), Out dims: (640, 384)
[image]

Before this change, in_dims=(640, 384)

In dims: (640, 384), Out dims: (640, 384)
[image]

After this change, in_dims=(680, 384)

The output dimensions match the input dimensions, as expected. The change in the output's appearance is also expected: the latent dimensions have changed, so the same random seed no longer produces the same noise tensor.
In dims: (680, 384), Out dims: (680, 384)

[image]

After this change, in_dims=(640, 384)

For dimensions that already worked before the change, the output image is identical to the previous output.
In dims: (640, 384), Out dims: (640, 384)

[image]

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@@ -399,7 +409,7 @@ def __init__(self, in_channels: int, out_channels: int, num_res_blocks: int, dow

         self.downsample = None
         if down:
-            self.downsample = Downsample2D(in_channels)
+            self.downsample = nn.AvgPool2d(kernel_size=2, stride=2, ceil_mode=True)
@RyanJDick (Contributor, Author) commented:

Note to reviewers:

Downsample2D(in_channels) is really just a nn.AvgPool2d(kernel_size=2, stride=2) layer. The only real change here is setting ceil_mode=True.
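
To see the shape difference directly, here is a small standalone PyTorch sketch (the tensor sizes below are illustrative, not taken from the model) comparing the old floor-mode pooling with the new ceil-mode pooling on an odd-width feature map:

import torch
import torch.nn as nn

# An odd spatial width, e.g. a 680 px image after the 8x pixel unshuffle -> width 85.
x = torch.randn(1, 320, 48, 85)

old_pool = nn.AvgPool2d(kernel_size=2, stride=2)                  # Downsample2D default: floor mode
new_pool = nn.AvgPool2d(kernel_size=2, stride=2, ceil_mode=True)  # this PR: ceil mode

print(old_pool(x).shape)  # torch.Size([1, 320, 24, 42]) -- the trailing column is dropped
print(new_pool(x).shape)  # torch.Size([1, 320, 24, 43]) -- matches the UNet's strided conv (kernel 3, stride 2, padding 1)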

Comment on lines 224 to 226
expected_slice = np.array(
[0.27978146, 0.36439905, 0.3206715, 0.29253614, 0.36390454, 0.3165658, 0.4384598, 0.43083128, 0.38120443]
)
@RyanJDick (Contributor, Author) commented:

Note to reviewers:

You'll notice several expected_slice values changing in this PR. These values changed as a result of changes to get_dummy_components(...) in 6d4a060. The changes to adapter.py did not require changes to expected_slice. I was careful to keep these changes in separate commits so that you can verify this, if desired.

@sayakpaul (Member) commented Oct 18, 2023:

I still prefer to keep the default tests unchanged and instead add new tests. If possible, could we do that? Applies to all the tests below.

@RyanJDick (Contributor, Author) commented Oct 18, 2023:

Done. See my comment here: #5435 (comment)

(My original comment in this thread no longer applies.)

@RyanJDick (Contributor, Author) commented:

@sayakpaul @MC-E

@sayakpaul (Member) left a comment:

The changes look very clean to me. Thanks for keeping them that way!

I have some concerns regarding the changes made to the default tests. Otherwise, the PR is already in a very good state!

@yiyixuxu (Collaborator) left a comment:

great! thank you!

@sayakpaul (Member) left a comment:

Very comprehensive and clean PR. Thank you for your hard work!

I wonder if something about the original issue underlying this PR should be included in our docs in the interest of full transparency. WDYT?

@sayakpaul (Member) commented:

@RyanJDick you'd need to run:

make style && make quality
make fix-copies

to fix the failing style and repo-consistency checks.

@RyanJDick (Contributor, Author) commented:

Fixed style checks and noted for future.

> I wonder if something about the original issue underlying this PR should be included in our docs in the interest of full transparency. WDYT?

Good question. I feel like it's lower-level than most people would be interested in, and lower-level than the rest of the documentation. It might just add confusion. What do you think?

@sayakpaul (Member) commented:

The failing consistency check is unrelated to this PR.

@patrickvonplaten (Contributor) commented:

Very nice PR!

@patrickvonplaten merged commit 0eac9cd into huggingface:main on Oct 23, 2023 (10 of 11 checks passed).
kashif pushed a commit to kashif/diffusers that referenced this pull request Nov 11, 2023
* Update get_dummy_inputs(...) in T2I-Adapter tests to take image height and width as params.

* Update the T2I-Adapter unit tests to run with the standard number of UNet down blocks so that all T2I-Adapter down blocks get exercised.

* Update the T2I-Adapter down blocks to better match the padding behavior of the UNet.

* Revert "Update the T2I-Adapter unit tests to run with the standard number of UNet down blocks so that all T2I-Adapter down blocks get exercised."

This reverts commit 6d4a060.

* Create utility functions for testing the T2I-Adapter downscaling behavior.

* (minor) Improve readability with an intermediate named variable.

* Statically parameterize T2I-Adapter test dimensions rather than generating them dynamically.

* Fix static checks.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024