# Diffusion 2

## Inference 2: Image to Image

Adapted from the [fast.ai repo](https://github.com/fastai/diffusion-nbs).

## Workflow

#### Drive

If you need to load/save to your drive:

```python
import sys
if 'google.colab' in sys.modules:
    from google.colab import drive
    drive.mount('/content/drive/')

import os
os.chdir('drive/My Drive/IS53055B-DMLCP/DMLCP/python') # to change to another directory
```

#### Huggingface login

Some models and datasets require you to be logged into your Hugging Face (HF) account, as does pushing your own model to HF (the equivalent of GitHub, but for models).

To do so, create an account [here](https://huggingface.co/), then go to ['/settings/tokens'](https://huggingface.co/settings/tokens) to create an access token.

```python
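from pathlib import Path
from huggingface_hub import notebook_login

# The token location depends on the huggingface_hub version:
# older versions used ~/.huggingface/token, newer ones ~/.cache/huggingface/token
token_files = [Path.home()/'.huggingface'/'token', Path.home()/'.cache'/'huggingface'/'token']
if not any(f.exists() for f in token_files):
    notebook_login()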
```

#### Install

1. On Colab, just use `pip` to install Huggingface libraries (see below).

2. Locally, the install is the same as the one used for language models, see [`setup.md`](https://github.com/jchwenger/DMLCP/blob/main/setup.md#pytorch--huggingfacegradio).

In [None]:
import sys

if 'google.colab' in sys.modules:
    !pip install --upgrade transformers diffusers accelerate

In [None]:
import requests
from io import BytesIO
from PIL import Image
from pathlib import Path
import matplotlib.pyplot as plt

import torch

# Get cpu, gpu or mps device (used here for inference).
# See: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html#creating-models
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)

from diffusers import StableDiffusionImg2ImgPipeline

A convenience function to display a batch of images.

In [None]:
# https://matplotlib.org/stable/gallery/axes_grid1/simple_axesgrid.html
from mpl_toolkits.axes_grid1 import ImageGrid

def plot_images(imgs, rows=1, cols=None, figsize=(12,8), title=None):
    fig = plt.figure(figsize=figsize)           # control figure size
    grid = ImageGrid(
        fig, 111,                                                     # similar to subplot(111) | see: https://stackoverflow.com/a/11404223
        nrows_ncols=(rows, cols if cols is not None else len(imgs)),  # control rows/cols
        axes_pad=0.1,                                                 # pad between axes in inch
    )
    if title is not None:               # https://matplotlib.org/3.2.1/gallery/subplots_axes_and_figures/figure_title.html
        fig.suptitle(title, x=0, y=0.5)

    for ax, im in zip(grid, imgs):      # Iterating over the grid returns the Axes.
        ax.set_xticks([])               # no x/y ticks: https://stackoverflow.com/a/45149018
        ax.set_yticks([])               #               https://stackoverflow.com/a/58535290
        ax.imshow(im)

In [None]:
MODEL_ID = "CompVis/stable-diffusion-v1-4" # same comment as in the first notebook

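pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    MODEL_ID,
    revision="fp16",
    torch_dtype=torch.float16 if device != "cpu" else torch.float32, # float16 is poorly supported on CPU
    safety_checker=None, # remove NSFW filter
).to(device)             # use the device detected above instead of hard-coding "cuda"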

# Note: removing the filter is not a licence to do harm: it gives *you* the responsibility
# for how you use the model. (Also, the HF safety_checker is very, very conservative, and
# rejects a lot of abstract images.)

In [None]:
SAVE_MEMORY = False

if SAVE_MEMORY:                     # Saves memory at the cost of speed:
    pipe.enable_attention_slicing() #  https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.enable_attention_slicing 

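if device=="mps":                           # First-time "warmup" pass for M1/M2 macs
    _ = pipe(                               # https://huggingface.co/docs/diffusers/v0.4.1/en/optimization/mps
        prompt="warmup",                    # `prompt` is only defined further down, so use a placeholder
        image=Image.new("RGB", (512, 512)), # img2img needs an init image: a blank one is enough here
        strength=1.0,                       # with a single step, strength=1.0 ensures that step is actually run
        num_inference_steps=1,
    )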

In [None]:
url = 'https://cdn-uploads.huggingface.co/production/uploads/1664665907257-noauth.png'
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image
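Stable Diffusion v1 was trained on 512×512 images, and its VAE encodes an image into a latent that is 8 times smaller in each dimension, so init images work best with sides that are multiples of 8 and a longer side around 512. Depending on your diffusers version, the pipeline may already resize for you; the helper below is a hypothetical sketch (`fit_to_sd` is not part of any library) that computes such a size yourself, using integer arithmetic to avoid rounding surprises.

```python
def fit_to_sd(width, height, max_side=512, multiple=8):
    """Hypothetical helper: return a (width, height) that keeps the aspect ratio,
    shrinks the longer side to at most `max_side`, and rounds both sides down
    to a multiple of `multiple` (8 matches the VAE's downsampling factor)."""
    longest = max(width, height)
    if longest > max_side:  # only shrink, never enlarge
        width = width * max_side // longest
        height = height * max_side // longest
    # round down to the nearest multiple, but never below one block
    new_w = max(multiple, width // multiple * multiple)
    new_h = max(multiple, height // multiple * multiple)
    return new_w, new_h

print(fit_to_sd(768, 512))
print(fit_to_sd(500, 300))
```

In the notebook, this could be applied as `init_image = init_image.resize(fit_to_sd(*init_image.size))`, since PIL's `Image.size` is `(width, height)`.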

In [None]:
torch.manual_seed(1000)
prompt = "Wolf robot on the ice flow, photorealistic 4K"
n_img = 3
result = pipe(
    prompt=prompt,
    num_images_per_prompt=n_img,
    image=[init_image]*n_img, # pass the same number of init images
    strength=0.8,
    num_inference_steps=50
)
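`strength` (between 0 and 1) sets how much noise is added to `init_image` before denoising: close to 0 the output stays near the init image, while 1 essentially ignores it. It also determines how many of the `num_inference_steps` are actually run. The hypothetical helper below reproduces that arithmetic, assuming the logic of the pipeline's internal `get_timesteps` method in recent diffusers versions (for a first-order scheduler); it is a sketch for intuition, not the library's code.

```python
def img2img_steps(num_inference_steps, strength):
    """Hypothetical helper: number of denoising steps an img2img pipeline runs,
    assuming diffusers' `get_timesteps` logic with a first-order scheduler."""
    # the init image is noised up to this fraction of the schedule...
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # ...so denoising starts this many steps into the schedule
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start

for steps, strength in [(50, 0.8), (70, 0.78), (50, 0.5), (1, 0.8)]:
    print(f"num_inference_steps={steps}, strength={strength}: "
          f"{img2img_steps(steps, strength)} denoising steps")
```

So the call above, with `strength=0.8` and `num_inference_steps=50`, runs 40 denoising steps, and a single step with the default strength would run none at all.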

In [None]:
result

In [None]:
plot_images(result.images, rows=1, cols=3)

In [None]:
init_image = result.images[2]

torch.manual_seed(1000)
prompt = "The Wooden Wolf, painting by Vincent Van Gogh"

images = pipe(
    prompt=prompt,
    num_images_per_prompt=3,
    image=init_image,
    strength=0.78,
    num_inference_steps=70
).images

plot_images(images, rows=1, cols=3)
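The two cells above generate candidates, pick one (`result.images[2]`) and feed it back as the next init image. That loop can be generalised; the sketch below is hypothetical (`iterate`, `generate` and `select` are illustrative names, not diffusers API) and keeps the model out of the function so any generator and any selection rule can be plugged in.

```python
def iterate(generate, select, init_image, prompts):
    """Hypothetical helper: chain img2img rounds. Each round generates candidates
    from the current image, keeps one with `select`, and uses it as the next
    init image. Returns every selected image, starting with `init_image`."""
    image = init_image
    history = [image]
    for prompt in prompts:
        candidates = generate(prompt, image)  # e.g. a list of PIL images
        image = select(candidates)            # the basis for the next round
        history.append(image)
    return history

# Toy run with strings standing in for images
print(iterate(lambda p, img: [img + p, img + p.upper()], lambda c: c[-1], "x", ["a", "b"]))
```

With the pipeline loaded above, `generate` could be `lambda p, img: pipe(prompt=p, image=img, num_images_per_prompt=3, strength=0.78).images` and `select` could pick an index, or ask you which image to keep.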

---

## Extra: upscaling!

Diffusion models can also be trained to upscale images! The one below multiplies the width and height of an image by 4 (so a 128×128 image becomes 512×512).

Original notebook [here](https://github.com/huggingface/notebooks/blob/main/diffusers/latent_diffusion_upscaler.ipynb).

The model was originally released in the [Latent Diffusion repo](https://github.com/CompVis/latent-diffusion). It's a simple 4x super-resolution latent diffusion model. It is not conditioned on text.

In [None]:
from diffusers import LDMSuperResolutionPipeline

In [None]:
MODEL_ID = "CompVis/ldm-super-resolution-4x-openimages"
pipe = LDMSuperResolutionPipeline.from_pretrained(
    MODEL_ID,
    # revision="fp16",           # better for memory, unsure if that has
    # torch_dtype=torch.float16, # an impact on image quality...
).to(device)

In [None]:
# let's download an image
url = "https://i.pinimg.com/236x/af/84/56/af8456faa55d76bd9afa18cd2fd72d58.jpg"
response = requests.get(url)
low_res_img = Image.open(BytesIO(response.content)).convert("RGB")

n = 128                                   # shrink to 128x128: the 4x upscaler should return 512x512
low_res_img = low_res_img.resize((n,n))
low_res_img

In [None]:
# run pipeline in inference (sample random noise and denoise)
upscaled_image = pipe(
    low_res_img,
    num_inference_steps=100,
    eta=1
).images[0]

upscaled_image

## Experiments

- Can you build a pipeline that repeats the process of generating an image and then selecting one as a basis for the next?
- This is of course also valid for super-resolution, where it might be possible to get interesting results by applying super-resolution several times in a row (perhaps upscale, then select a portion of the result image, upscale again...).
- How do you display your results? Note that, as with GANs, if you create a series of images, you can then merge them into a video.
- These pipelines are fully compatible with Gradio (you can see the code for many apps using these models on [Huggingface Spaces](https://huggingface.co/spaces?sort=modified&search=diffusion)): it could be cool to build your own, or modify an existing one!
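For merging a series of images, an animated GIF is the simplest option with Pillow, which this notebook already uses; a true video file would need an extra tool such as ffmpeg. A minimal sketch, where `save_gif` is a hypothetical helper and the frame duration is an arbitrary choice:

```python
from PIL import Image

def save_gif(frames, path, duration=200):
    """Hypothetical helper: save a list of PIL images as a looping animated GIF.
    `duration` is the display time of each frame in milliseconds."""
    size = frames[0].size
    # GIF frames should share one size and mode
    frames = [f.convert("RGB").resize(size) for f in frames]
    frames[0].save(path, save_all=True, append_images=frames[1:],
                   duration=duration, loop=0)  # loop=0: repeat forever

# e.g. with the images generated above: save_gif(result.images + images, "wolves.gif")
```

Note that Pillow may merge consecutive identical frames when writing a GIF.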