What's the difference between 1-control and control in gradio_lineart.py and gradio_canny.py? #135

Closed
deepmayuot opened this issue Nov 25, 2023 · 2 comments


deepmayuot commented Nov 25, 2023

Thanks for sharing the code!

I noticed that the control in cldm.py is prepared as follows:

    def get_input(self, batch, k, bs=None, *args, **kwargs):
        x, c = super().get_input(batch, self.first_stage_key, *args, **kwargs)
        control = batch[self.control_key]
        if bs is not None:
            control = control[:bs]
        control = control.to(self.device)
        control = einops.rearrange(control, 'b h w c -> b c h w')
        control = control.to(memory_format=torch.contiguous_format).float()

And gradio_canny.py uses the following code to prepare the control when generating images:

        detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_LINEAR)

        control = torch.from_numpy(detected_map.copy()).float().cuda() / 255.0
        control = torch.stack([control for _ in range(num_samples)], dim=0)
        control = einops.rearrange(control, 'b h w c -> b c h w').clone()

However, in gradio_lineart.py, the control is as follows:

        detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_LINEAR)

        control = 1.0 - torch.from_numpy(detected_map.copy()).float().cuda() / 255.0
        control = torch.stack([control for _ in range(num_samples)], dim=0)
        control = einops.rearrange(control, 'b h w c -> b c h w').clone()

I am confused about this. Can anyone give some suggestions?


geroldmeisinger commented Nov 25, 2023

`1 - v` on a floating-point grayscale image is an "invert": it turns a black image with white lines into a white image with black lines, and vice versa. I think the assumption here is that canny images are usually generated with a Canny edge detector, which outputs black images with white lines, whereas lineart images are usually scanned from real paper (black lines on white), so the script inverts them. It's for convenience.
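To make the inversion concrete, here is a minimal sketch in NumPy rather than PyTorch. The helper `to_control` and its `invert` flag are hypothetical names made up for this illustration and are not part of the repository. It only mirrors the scaling, optional `1 - v`, batch stacking, and `b h w c -> b c h w` reordering shown in the snippets above.

```python
import numpy as np


def to_control(detected_map: np.ndarray, num_samples: int, invert: bool) -> np.ndarray:
    """Hypothetical helper: HxWxC uint8 map -> (B, C, H, W) float32 array in [0, 1]."""
    control = detected_map.astype(np.float32) / 255.0  # scale 0..255 to 0..1
    if invert:
        # 1 - v swaps black (0.0) and white (1.0) on a [0, 1] image
        control = 1.0 - control
    control = np.stack([control] * num_samples, axis=0)  # (B, H, W, C)
    # Same reordering as einops.rearrange(control, 'b h w c -> b c h w')
    return np.ascontiguousarray(control.transpose(0, 3, 1, 2))


# A tiny 2x2 "scan": white paper (255) with a single black line pixel (0).
scan = np.full((2, 2, 3), 255, dtype=np.uint8)
scan[0, 0] = 0

canny_style = to_control(scan, num_samples=2, invert=False)   # used as-is
lineart_style = to_control(scan, num_samples=2, invert=True)  # black/white swapped
```

In `lineart_style` the line pixel becomes 1.0 and the paper becomes 0.0, which is the white-on-black convention an edge detector would produce directly.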

@deepmayuot
Author

Thanks for your quick reply!
