# ControlNet Image Generation with Stable Diffusion

This notebook demonstrates how to use ControlNet with Stable Diffusion to generate images guided by edge detection. ControlNet allows us to condition the image generation process on structural information from an input image while maintaining the freedom to change the style and content.

In this example, we'll:
1. Load and prepare an input image
2. Apply Canny edge detection to extract structural information
3. Optimize the pipeline with Intel Extension for PyTorch (IPEX) to accelerate inference
4. Generate a new image based on the edge map and a text prompt

## Import Base Libraries

First, we import the core libraries needed for deep learning:
- `torch`: PyTorch for deep learning operations
- `nn`: Neural network modules from PyTorch
- `intel_extension_for_pytorch` (IPEX): Intel's extension to optimize PyTorch performance on Intel hardware
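
The rest of this notebook assumes an Intel XPU is present. As a minimal sketch of a device check, assuming the installed PyTorch or IPEX build exposes the `torch.xpu` namespace, something like the following could be used. `pick_device` is a hypothetical helper, not part of either library:

```python
def pick_device() -> str:
    """Return "xpu" when an Intel XPU is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch we simply report "cpu".
        return "cpu"
    # torch.xpu may be missing on builds without XPU support.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


device = pick_device()
print(device)
```

The returned string could stand in for a hard-coded `"xpu"` passed to `.to(...)`; the IPEX-specific calls in this notebook would still need an XPU.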

In [None]:
import torch
from torch import nn
import intel_extension_for_pytorch as ipex

## Import Image Processing and Diffusion Libraries

Next, we import libraries for image processing and the diffusion model:
- `diffusers`: HuggingFace's library for diffusion models including Stable Diffusion
- `PIL`: Python Imaging Library for image manipulation
- `cv2`: OpenCV for computer vision tasks, particularly edge detection
- `numpy`: For numerical operations on image arrays
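- From `diffusers` we also import the ControlNet pipeline components (`StableDiffusionControlNetPipeline`, `ControlNetModel`, `UniPCMultistepScheduler`) and the helpers `load_image` and `make_image_grid`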

In [None]:
from diffusers.utils import load_image, make_image_grid
from PIL import Image
import cv2
import numpy as np
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler

## Load Image, Apply Edge Detection, and Generate with ControlNet

This section contains the main workflow:

1. **Load and Resize Input Image**: We load a house outline image and resize it to 1024x1024

2. **Define Pipeline Optimization Function**: The `optimize_pipeline` function leverages Intel IPEX to accelerate model inference

3. **Apply Canny Edge Detection**: We extract edges from the original image to guide the generation
   - Low threshold (100) and high threshold (200) control the edge sensitivity
   - The single-channel edge map is replicated across three channels and converted to an RGB image

4. **Set Up the ControlNet Model**:
   - We load a pre-trained ControlNet model specialized for Canny edge maps
   - We also load the base Stable Diffusion v1.5 model
   - Models are configured to use bfloat16 precision for efficiency

5. **Generate the Image**:
   - We run inference with the prompt "Vivid Watercolor", guided by our edge map
   - Generation runs on the Intel XPU (`"xpu"` device) with bfloat16 autocast

6. **Display and Save the Results**:
   - We create a grid showing the original image, edge map, and generated result
   - The output is saved as "house.png"
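
To illustrate how the two Canny thresholds in step 3 interact, here is a simplified NumPy sketch of hysteresis thresholding. It is not OpenCV's implementation: `hysteresis` is a hypothetical helper that works on a precomputed gradient-magnitude array and skips the smoothing and non-maximum suppression stages of the full detector.

```python
import numpy as np


def hysteresis(magnitude, low, high):
    """Keep strong pixels, plus weak pixels 8-connected to a kept pixel."""
    strong = magnitude >= high
    weak = (magnitude >= low) & ~strong
    keep = strong.copy()
    height, width = keep.shape
    while True:
        # Dilate the kept mask by one pixel in all 8 directions.
        padded = np.pad(keep, 1)
        grown = np.zeros_like(keep)
        for dy in (0, 1, 2):
            for dx in (0, 1, 2):
                grown |= padded[dy:dy + height, dx:dx + width]
        # Promote weak pixels that touch an already kept pixel.
        new_keep = keep | (grown & weak)
        if np.array_equal(new_keep, keep):
            return keep
        keep = new_keep


# Toy gradient magnitudes: one strong pixel, two connected weak pixels,
# one pixel below the low threshold, and one isolated weak pixel.
magnitude = np.array([[250, 150, 150, 50, 150]])
edges = hysteresis(magnitude, low=100, high=200)
print(edges)
```

In this toy row, the two weak pixels next to the strong one are kept, while the isolated weak pixel at the end is dropped. Lowering the thresholds therefore tends to add detail to the edge map, and raising them tends to remove it.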

In [None]:
original_image = load_image(
    "https://img.freepik.com/premium-photo/house-outline-illustration-white-background_1112329-31710.jpg"
).resize((1024, 1024))

def optimize_pipeline(pipeline):
    """
    Optimizes the model for inference using ipex.

    Parameters:
    - pipeline: The model pipeline to be optimized.

    Returns:
    - pipeline: The optimized model pipeline.
    """

    for attr in dir(pipeline):
        try:
            if isinstance(getattr(pipeline, attr), nn.Module):
                setattr(
                    pipeline,
                    attr,
                    ipex.optimize(
                        getattr(pipeline, attr).eval(),
                        dtype=pipeline.text_encoder.dtype,
                        inplace=True,
                    ),
                )
        except AttributeError:
            pass
    return pipeline

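# Convert the PIL image to a NumPy array for OpenCV
image = np.array(original_image)

# Hysteresis thresholds for the Canny detector
low_threshold = 100
high_threshold = 200

# Canny returns a single-channel (H, W) edge map; replicate it to three channels
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)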

# Load the ControlNet weights trained on Canny edge maps
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.bfloat16, use_safetensors=True)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.bfloat16, use_safetensors=True
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("xpu")
pipe = optimize_pipeline(pipe)
# pipe.enable_xformers_memory_efficient_attention()
with torch.inference_mode():
    with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
        output = pipe(
            "Vivid Watercolor", image=canny_image
        ).images[0]

# Place the original image, edge map, and generated image side by side
image_grid = make_image_grid([original_image, canny_image, output], rows=1, cols=3)
# Save with PIL; cv2.imwrite would treat the RGB array as BGR and swap red and blue
image_grid.save("house.png")

## Results and Next Steps

The generated image keeps the structure of the house outline while rendering it in the style suggested by the "Vivid Watercolor" prompt. The output is saved as "house.png" with three panels showing:
- Left: The original house outline image
- Middle: The Canny edge detection result
- Right: The generated image combining the structure with the "Vivid Watercolor" prompt

To experiment further:
- Try different edge detection thresholds to see how they affect the results
- Use different text prompts to generate various styles while keeping the same structure
- Experiment with ControlNet models trained on other conditioning inputs, such as depth maps, segmentation maps, or pose estimation
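
One way to organize such experiments is to build the keyword arguments for each run up front. The sketch below is only illustrative: `generation_variants` is a hypothetical helper, "Pencil sketch" is a made-up example prompt, and the scale and step values are arbitrary. The argument names `prompt`, `controlnet_conditioning_scale`, and `num_inference_steps` are parameters of the `StableDiffusionControlNetPipeline` call.

```python
from itertools import product


def generation_variants(prompts, conditioning_scales, steps=20):
    """Build one keyword-argument dict per (prompt, scale) combination."""
    return [
        {
            "prompt": prompt,
            "controlnet_conditioning_scale": scale,
            "num_inference_steps": steps,
        }
        for prompt, scale in product(prompts, conditioning_scales)
    ]


variants = generation_variants(["Vivid Watercolor", "Pencil sketch"], [0.5, 1.0])
for kwargs in variants:
    print(kwargs)
    # With the pipeline and edge map from this notebook, each dict could be used as:
    # output = pipe(image=canny_image, **kwargs).images[0]
```

A lower `controlnet_conditioning_scale` gives the text prompt more freedom relative to the edge map, so sweeping it alongside the prompt shows how strongly the structure is enforced.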