# CSC2516 Project - Coloring B/W Manga images using InstructPix2Pix

### Team Members: `Rajesh Marudhachalam`, `Gurman Bhullar`, `Naveen Thangavelu`


InstructPix2Pix is a fine-tuned Stable Diffusion model that edits images according to natural-language instructions. This notebook runs it through the Hugging Face [diffusers](https://github.com/huggingface/diffusers) library.

Most of the code is adapted from the official documentation by the authors of [InstructPix2Pix](https://www.timothybrooks.com/instruct-pix2pix/). 

##### Run this notebook on Colab to avoid out-of-memory and CUDA errors.

---

### Install the necessary packages

In [None]:
!pip install -qqq git+https://github.com/huggingface/diffusers.git gradio transformers accelerate safetensors


---

### Load the `StableDiffusionInstructPix2PixPipeline` pipeline

In [None]:
import PIL
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16, revision="fp16", safety_checker=None)
pipe.to("cuda")
pipe.enable_attention_slicing()


You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_instruct_pix2pix.StableDiffusionInstructPix2PixPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .


---

### Load a list of B/W images and create an individual directory for every corresponding image

In [None]:
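import os

def load_bw_imgs(src_dir: str, tgt_dir: str) -> list:
    # Collect the image names (without the .jpg extension) in sorted order
    list_bw_imgs = sorted([i.replace('.jpg', '') for i in os.listdir(f'./{src_dir}')])
    for img in list_bw_imgs:
        # One output directory per input image; exist_ok makes the cell safe to re-run
        os.makedirs(f'./{tgt_dir}/{img}', exist_ok=True)
    return list_bw_imgs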
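
A small variation on the listing step, sketched under assumptions: `os.path.splitext` strips the extension only at the end of a file name, whereas `str.replace` removes `.jpg` wherever it occurs. The helper name `list_image_stems` and the extension filter are illustrative and not part of the original notebook.

```python
import os

def list_image_stems(src_dir: str, ext: str = ".jpg") -> list:
    """Return the sorted file names in src_dir without their extension (hypothetical helper)."""
    stems = []
    for name in os.listdir(src_dir):
        stem, suffix = os.path.splitext(name)
        # Skip anything that does not carry the expected image extension
        if suffix.lower() == ext:
            stems.append(stem)
    return sorted(stems)
```

Filtering by extension also keeps stray files (for example checkpoints or notes) from getting an output directory of their own.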

---

### Image Colorisation using diffuser

This generates 50 outputs for every B/W input image, sweeping the image guidance scale from 1 to 2 in steps of 0.25 (10 outputs per value).
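
The sweep over image guidance scales can be sketched in isolation. The function name `image_cfg_schedule` and its parameters are illustrative; the values themselves (1, 1.25, 1.5, 1.75, 2, with 10 outputs each) are the ones used by the generation cell in this section.

```python
def image_cfg_schedule(n_outputs: int = 50, group_size: int = 10,
                       start: float = 1.0, step: float = 0.25) -> list:
    """Image guidance scale for each of the n_outputs generations (hypothetical helper)."""
    # Every block of group_size consecutive outputs shares one scale
    return [start + step * (n // group_size) for n in range(n_outputs)]

schedule = image_cfg_schedule()
print(sorted(set(schedule)))  # the distinct scales in the sweep
```

A higher image guidance scale keeps the output closer to the input image, so the later outputs in each batch of 50 tend to change the input less.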

In [None]:
edit_instructions = [
    "Colorize this person to have natural skintones",
    "turn it colorful",
    "make it a colorful professional headshot realistically animated"
]

import random
import PIL.Image
import PIL.ImageOps

def generate(
    list_bw_imgs: list,
    instruction: str,
    steps: int,
    randomize_seed: bool,
    seed: int,
    # Assumption: text_cfg_scale was never defined in the original notebook;
    # 7.5 is the pipeline's default guidance_scale
    text_cfg_scale: float = 7.5,
):
    seed = random.randint(0, 100000) if randomize_seed else seed
    generator = torch.manual_seed(seed)

    for img in list_bw_imgs:
        input_image = PIL.Image.open(f"./grayscale/{img}.jpg")
        input_image = PIL.ImageOps.exif_transpose(input_image)
        input_image = input_image.convert("RGB")

        # Generate 50 images for every input
        for n in range(50):
            # Image guidance scale: 1, 1.25, 1.5, 1.75, 2 (10 outputs each)
            image_cfg_scale = 1 + 0.25 * (n // 10)
            try:
                output_img = pipe(
                    instruction, image=input_image,
                    guidance_scale=text_cfg_scale, image_guidance_scale=image_cfg_scale,
                    num_inference_steps=steps, generator=generator,
                ).images[0]
                output_img.save(f'./color/{img}/{img}_{n}.jpg')
            except Exception as e:
                print(f'Failed at {n} for {img}: {e}')

        print(f'Done {img}!')

In [None]:
def main():
    list_bw_imgs = load_bw_imgs('grayscale', 'color')
    generate(
        list_bw_imgs=list_bw_imgs,
        instruction=edit_instructions[2],
        steps=20,
        randomize_seed=True,
        seed=123
    )

if __name__ == "__main__":
    main()

---

### Download the generated outputs as a zip file

In [None]:
!zip -r ./output.zip ./color
from google.colab import files
files.download("./output.zip")

---

### Choosing the best generated output out of the 50 outputs for every input using SSIM score.
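
For intuition, SSIM compares two images through their means, variances, and covariance. The sketch below computes a single global SSIM value over flat lists of pixel intensities. It is a simplification: `structural_similarity` from scikit-image averages SSIM over local sliding windows, so its numbers will generally differ from this one. The function name `global_ssim` is illustrative, and the constants follow the common K1 = 0.01, K2 = 0.03 convention.

```python
def global_ssim(x, y, data_range: float = 255.0) -> float:
    """Single-window SSIM over two equal-length pixel sequences (simplified sketch)."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small constants that stabilise the division when means or variances are near zero
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

ramp = [0, 50, 100, 150, 200, 250]
print(global_ssim(ramp, ramp))        # identical inputs give the maximum score
print(global_ssim(ramp, ramp[::-1]))  # a reversed ramp scores lower
```

Because both the input and the colorized output are converted to grayscale before scoring, a high SSIM here rewards outputs that preserve the line art and shading of the original.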

In [None]:
from skimage.metrics import structural_similarity as ssim
import cv2
import random

In [None]:
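# Create a dictionary that maps every input image to the paths of its best outputs.
# list_bw_imgs is rebuilt here because it is local to main() in the earlier cell.

list_bw_imgs = sorted(i.replace('.jpg', '') for i in os.listdir('./grayscale'))
best_outputs = {img: [] for img in list_bw_imgs}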

In [None]:
def get_best_output(list_bw_imgs):
    for img in list_bw_imgs:
        orig = cv2.cvtColor(cv2.imread(f"./grayscale/{img}.jpg"), cv2.COLOR_BGR2GRAY)
        score = 0
        for i in range(50):
            output_img = cv2.cvtColor(cv2.imread(f"./color/{img}/{img}_{i}.jpg"), cv2.COLOR_BGR2GRAY)
            x, y = output_img.shape

            # Crop the original to the output's size so both arrays match
            cur_score = ssim(orig[:x, :y], output_img)
            # Record every output that ties or beats the running best score,
            # so the list can hold several candidates per image
            if cur_score >= score:
                score = cur_score
                best_outputs[img].append(f"./color/{img}/{img}_{i}.jpg")

In [None]:
get_best_output(list_bw_imgs)

# Several candidate outputs are recorded for many images
best_outputs['Yuuki_Enokibata']

['./color/Yuuki_Enokibata/Yuuki_Enokibata_0.jpg',
 './color/Yuuki_Enokibata/Yuuki_Enokibata_1.jpg',
 './color/Yuuki_Enokibata/Yuuki_Enokibata_2.jpg',
 './color/Yuuki_Enokibata/Yuuki_Enokibata_4.jpg',
 './color/Yuuki_Enokibata/Yuuki_Enokibata_5.jpg',
 './color/Yuuki_Enokibata/Yuuki_Enokibata_7.jpg',
 './color/Yuuki_Enokibata/Yuuki_Enokibata_10.jpg',
 './color/Yuuki_Enokibata/Yuuki_Enokibata_11.jpg']
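
Why several paths appear per image can be seen on a toy list of scores. The sketch below mirrors the comparison used in `get_best_output` (keep every index whose score ties or beats the running best) and contrasts it with keeping only the single highest score. The function names and the score values are made up for illustration.

```python
def running_best(scores):
    """Indices kept by a 'ties or beats the running best' rule."""
    best, kept = 0, []
    for i, s in enumerate(scores):
        if s >= best:
            best = s
            kept.append(i)
    return kept

def single_best(scores):
    """Index of the first occurrence of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)

toy_scores = [0.5, 0.6, 0.4, 0.6, 0.7]
print(running_best(toy_scores))  # [0, 1, 3, 4]
print(single_best(toy_scores))   # 4
```

With the running-best rule, the last entry of each list is the highest-scoring output, while the earlier entries were only the best seen up to that point.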

In [None]:
import shutil

def pick_random_best_output(list_bw_imgs):
    os.makedirs('./best', exist_ok=True)
    for img in list_bw_imgs:
        # Copy one randomly chosen candidate to ./best/<img>.jpg
        shutil.copy(random.choice(best_outputs[img]), f'./best/{img}.jpg')

In [None]:
pick_random_best_output(list_bw_imgs)

### Download only the best outputs

In [None]:
!zip -qr ./best.zip ./best
files.download("./best.zip")


---

### Top 10 outputs

We rank the best outputs by their perceptual distance to the corresponding grayscale inputs and keep the 10 with the lowest distance.
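
Selecting the lowest-distance entries amounts to a top-k query on a dictionary of scores. As a minimal sketch, `heapq.nsmallest` does this without sorting the whole dictionary; the helper name `top_k_lowest` is illustrative, and the three sample scores are copied from the results in this section.

```python
import heapq

def top_k_lowest(scores: dict, k: int = 10) -> list:
    """Names of the k entries with the smallest score, in ascending order (hypothetical helper)."""
    return [name for name, _ in heapq.nsmallest(k, scores.items(), key=lambda kv: kv[1])]

sample = {
    'Subaru_Ichinose': 0.0024928477286526207,
    'Fuuka_Kamiigusa': 0.0007221855573914985,
    'Pierre_Kang': 0.003076483438493879,
}
print(top_k_lowest(sample, k=2))  # ['Fuuka_Kamiigusa', 'Subaru_Ichinose']
```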

In [None]:
import json

#Please include the evaluation metrics files in the root for the below imports to work
from base_dataset import BaseDataset
from utils import perceptual_distance



In [None]:
perceptual_dist = {}

for i in list_bw_imgs:
    src=f'./grayscale/{i}.jpg'
    dst=f'./best/{i}.jpg'
    perceptual_dist[i] = float(perceptual_distance(BaseDataset(src), BaseDataset(dst), cuda=False, batch_size = 1, resize=True))

In [None]:
#Sorting the dictionary in ascending order.

dict(sorted(perceptual_dist.items(), key=lambda x:x[1]))

{'Fuuka_Kamiigusa': 0.0007221855573914985,
 'Subaru_Ichinose': 0.0024928477286526207,
 'Collon_Rin_Purgatrio': 0.0026431882988635705,
 'Pierre_Kang': 0.003076483438493879,
 'Miguel_Aiman': 0.0030910123032593722,
 'Remilia_Scarlet': 0.003133904680063787,
 'Mikoto_Kibitsu': 0.00397096439420725,
 'Christine_Minato': 0.004740342578915965,
 'Yuuzoo_Tanegashima': 0.005385565860398724,
 'Akina_Hinatsuru': 0.006368244899347876,
 'Gina_Beaumont': 0.007561833216342853,
 'Akiko_Ifukube': 0.009227465517227557,
 'Keiichirou_Usubaru': 0.009846909969155274,
 'Zetto_Ichimura': 0.010903031278794089,
 'Ryouma_Takebayashi': 0.014143316510283612,
 'Fukukaichou_Nagata': 0.01441246387545523,
 'Erika_Shibasaki': 0.014429342129479376,
 'Sanae_Shimizu': 0.015401510675651537,
 'Yura_Sakurazuki': 0.015989141616115976,
 'Mayu_Tamano': 0.017952960223436072,
 'Bancroft': 0.02331600032520329,
 'Shuuzou_Tsurumaki': 0.025536626143784658,
 'Adeltrud_Olter': 0.027026836037725144,
 'Shigeo_Umezu': 0.0280060272105763,
 'A

In [None]:
# Write the perceptual distance scores to a text file (as JSON) for later use

with open('best_difussion_outputs.txt', 'w') as convert_file:
    convert_file.write(json.dumps(dict(sorted(perceptual_dist.items(), key=lambda x:x[1]))))
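
Reading the scores back later is a plain JSON load. The round trip below is a sketch that writes to a temporary directory so it can run anywhere; `save_scores` and `load_scores` are hypothetical helper names, and the two sample scores are placeholders.

```python
import json
import os
import tempfile

def save_scores(scores: dict, path: str) -> None:
    """Write the scores as JSON, sorted by ascending distance."""
    with open(path, 'w') as f:
        json.dump(dict(sorted(scores.items(), key=lambda kv: kv[1])), f)

def load_scores(path: str) -> dict:
    """Load the scores; json.load keeps the key order stored in the file."""
    with open(path) as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'best_difussion_outputs.txt')
    save_scores({'b': 0.2, 'a': 0.1}, path)
    restored = load_scores(path)

print(list(restored))  # ['a', 'b']
```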

In [None]:
# Top 10 outputs of the diffusion model

list(dict(sorted(perceptual_dist.items(), key=lambda x:x[1])).keys())[:10]

['Fuuka_Kamiigusa',
 'Subaru_Ichinose',
 'Collon_Rin_Purgatrio',
 'Pierre_Kang',
 'Miguel_Aiman',
 'Remilia_Scarlet',
 'Mikoto_Kibitsu',
 'Christine_Minato',
 'Yuuzoo_Tanegashima',
 'Akina_Hinatsuru']

In [None]:
# Store the top 10 outputs in a new directory and download them as a zip

!mkdir top_10

for i in list(dict(sorted(perceptual_dist.items(), key=lambda x:x[1])).keys())[:10]:
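    # Assumption: the selected outputs live in ./best (the original path was ./generated, which this notebook never creates)
    os.system(f'cp ./best/{i}.jpg ./top_10/{i}.jpg')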

!zip -qr ./top_10.zip ./top_10
files.download("./top_10.zip")
