Make T2I-Adapter downscale padding match the UNet #5435

Merged: 9 commits merged into huggingface:main on Oct 23, 2023

Conversation

@RyanJDick (Contributor) commented Oct 17, 2023:

What does this PR do?

Fixes #5377

This PR updates the padding behavior of the T2I-Adapter down blocks to more closely mirror the padding behavior of the UNet model.

Prior to this change, the T2I-Adapter pipelines only worked with input/output image dimensions that were multiples of the T2I-Adapter's total downscale factor (64 for SD1 and 32 for SDXL). After this change, the same pipelines can be run with input/output dimensions that are multiples of the T2I-Adapter's initial pixel unshuffling downscale factor (8 for SD1 and 16 for SDXL).
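
To make the new constraint concrete, here is a minimal sketch (the helper name and its defaults are hypothetical, not part of diffusers) of the dimension check for SD1, before and after this PR:

def t2i_adapter_dims_ok(height: int, width: int, unshuffle_factor: int = 8, total_downscale_factor: int = 64):
    # Before this PR: both dimensions had to be multiples of the total downscale factor (64 for SD1).
    before_ok = height % total_downscale_factor == 0 and width % total_downscale_factor == 0
    # After this PR: multiples of the initial pixel-unshuffle factor (8 for SD1) are enough,
    # because the down blocks now pad (ceil_mode) the same way the UNet does.
    after_ok = height % unshuffle_factor == 0 and width % unshuffle_factor == 0
    return before_ok, after_ok

print(t2i_adapter_dims_ok(384, 680))  # (False, True): accepted only after this change
print(t2i_adapter_dims_ok(384, 640))  # (True, True): accepted before and after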

Manual Testing

The change is covered by unit tests, but the following script can be used as a visual regression test for this feature. It is adapted from a comment on the original issue (#5377).

import torch
from PIL import Image
from controlnet_aux import PidiNetDetector
import numpy as np
from diffusers import T2IAdapter, StableDiffusionAdapterPipeline

# (width, height): 680 is a multiple of 8 but not of 64, so it only works after this change.
in_dims = (680, 384)
# in_dims = (640, 384)  # multiple of 64: works both before and after this change

# Build a binarized sketch conditioning image from the input photo.
image = Image.open("dog.png")
processor = PidiNetDetector.from_pretrained("lllyasviel/Annotators")
sketch_image = processor(image).resize(in_dims).convert("L")
tensor_img = (torch.from_numpy(np.array(sketch_image).astype(np.float32) / 255.0) > 0.5).float()
sketch_image = tensor_img.numpy()
sketch_image = (sketch_image * 255).astype(np.uint8)
sketch_image = Image.fromarray(sketch_image)
sketch_image.save("sketch.png")

# Run the SD1.5 sketch T2I-Adapter pipeline with a fixed seed.
adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_sketch_sd15v2", torch_dtype=torch.float16)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", adapter=adapter, safety_checker=None, torch_dtype=torch.float16, variant="fp16"
)
pipe.to("cuda")
generator = torch.Generator().manual_seed(0)
sketch_image_out = pipe(prompt="a dog", image=sketch_image, generator=generator).images[0]
sketch_image_out.save("sketch_image_out.png")

print(f"In dims: {in_dims}, Out dims: {(sketch_image_out.width, sketch_image_out.height)}")

Input dog.png:
[image]

Before this change, in_dims=(680, 384)

Note that the output dimensions do not match the input dimensions.
In dims: (680, 384), Out dims: (640, 384)
[image]

Before this change, in_dims=(640, 384)

In dims: (640, 384), Out dims: (640, 384)
[image]

After this change, in_dims=(680, 384)

The output dimensions match the input dimensions, as expected. The change in the output's appearance is also expected: the latent dimensions have changed, so the same random seed no longer produces the same noise tensor.
In dims: (680, 384), Out dims: (680, 384)

[image]

After this change, in_dims=(640, 384)

For dimensions that already worked before the change, the output image is identical to the previous output.
In dims: (640, 384), Out dims: (640, 384)

[image]

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@@ -399,7 +409,7 @@ def __init__(self, in_channels: int, out_channels: int, num_res_blocks: int, dow

         self.downsample = None
         if down:
-            self.downsample = Downsample2D(in_channels)
+            self.downsample = nn.AvgPool2d(kernel_size=2, stride=2, ceil_mode=True)
@RyanJDick (Contributor, Author) commented:

Note to reviewers:

Downsample2D(in_channels) is really just a nn.AvgPool2d(kernel_size=2, stride=2) layer. The only real change here is setting ceil_mode=True.
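
To see the shape difference directly, here is a small standalone PyTorch sketch (the tensor sizes below are illustrative, not taken from the model) comparing the old floor-mode pooling with the new ceil-mode pooling on an odd-width feature map:

import torch
import torch.nn as nn

# An odd spatial width, e.g. a 680 px image after the 8x pixel unshuffle -> width 85.
x = torch.randn(1, 320, 48, 85)

old_pool = nn.AvgPool2d(kernel_size=2, stride=2)                  # Downsample2D default: floor mode
new_pool = nn.AvgPool2d(kernel_size=2, stride=2, ceil_mode=True)  # this PR: ceil mode

print(old_pool(x).shape)  # torch.Size([1, 320, 24, 42]) -- the trailing column is dropped
print(new_pool(x).shape)  # torch.Size([1, 320, 24, 43]) -- matches the UNet's strided conv (kernel 3, stride 2, padding 1)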

Comment on lines 224 to 226
expected_slice = np.array(
[0.27978146, 0.36439905, 0.3206715, 0.29253614, 0.36390454, 0.3165658, 0.4384598, 0.43083128, 0.38120443]
)
@RyanJDick (Contributor, Author) commented:

Note to reviewers:

You'll notice several expected_slice values changing in this PR. These values changed as a result of changes to get_dummy_components(...) in 6d4a060. The changes to adapter.py did not require changes to expected_slice. I was careful to keep these changes in separate commits so that you can verify this, if desired.

@sayakpaul (Member) commented Oct 18, 2023:

I still prefer to keep the default tests unchanged and instead add new tests. If possible, could we do that? Applies to all the tests below.

@RyanJDick (Contributor, Author) commented Oct 18, 2023:

Done. See my comment here: #5435 (comment)

(My original comment in this thread no longer applies.)

@RyanJDick (Contributor, Author) commented:

@sayakpaul @MC-E

@sayakpaul (Member) left a comment:

The changes look very clean to me. Thanks for keeping them that way!

I have some concerns regarding the changes made to the default tests. Otherwise, the PR is already in a very good state!

@yiyixuxu (Collaborator) left a comment:

great! thank you!

@sayakpaul (Member) left a comment:

Very comprehensive and clean PR. Thank you for your hard work!

I wonder if something about the original issue underlying this PR should be included in our docs in the interest of full transparency. WDYT?

@sayakpaul (Member) commented:

@RyanJDick you'd need to run:

make style && make quality
make fix-copies

to fix the failing style and repo-consistency checks.

@RyanJDick (Contributor, Author) commented:

Fixed style checks and noted for future.

> I wonder if something about the original issue underlying this PR should be included in our docs in the interest of full transparency. WDYT?

Good question. I feel like it's lower-level than most people would be interested in, and lower-level than the rest of the documentation. It might just add confusion. What do you think?

@sayakpaul (Member) commented:

The failing consistency check is unrelated to this PR.

@patrickvonplaten (Contributor) commented:

Very nice PR!

@patrickvonplaten merged commit 0eac9cd into huggingface:main on Oct 23, 2023 (10 of 11 checks passed).
kashif pushed a commit to kashif/diffusers that referenced this pull request Nov 11, 2023
* Update get_dummy_inputs(...) in T2I-Adapter tests to take image height and width as params.

* Update the T2I-Adapter unit tests to run with the standard number of UNet down blocks so that all T2I-Adapter down blocks get exercised.

* Update the T2I-Adapter down blocks to better match the padding behavior of the UNet.

* Revert "Update the T2I-Adapter unit tests to run with the standard number of UNet down blocks so that all T2I-Adapter down blocks get exercised."

This reverts commit 6d4a060.

* Create utility functions for testing the T2I-Adapter downscaling behavior.

* (minor) Improve readability with an intermediate named variable.

* Statically parameterize T2I-Adapter test dimensions rather than generating them dynamically.

* Fix static checks.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024