# Generating Headshot Images Using InstantID
InstantID is a state-of-the-art, tuning-free method for identity-preserving image generation from only a single reference image. It supports various downstream tasks.

[More Details](https://github.com/InstantID/InstantID)


## 1. Clone the Repo

In [1]:
!git clone https://github.com/surajkarki66/instantid-headshot.git

Cloning into 'instantid-headshot'...
remote: Enumerating objects: 43, done.
remote: Counting objects: 100% (43/43), done.
remote: Compressing objects: 100% (33/33), done.
remote: Total 43 (delta 18), reused 32 (delta 9), pack-reused 0
Unpacking objects: 100% (43/43), 49.35 KiB | 2.47 MiB/s, done.


## 2. Install Dependencies

In [2]:
!cd ./instantid-headshot && pip install -r "requirements.txt"

Collecting gdown (from -r requirements.txt (line 1))
  Downloading gdown-5.2.0-py3-none-any.whl.metadata (5.8 kB)
Collecting diffusers (from -r requirements.txt (line 6))
  Downloading diffusers-0.28.2-py3-none-any.whl.metadata (19 kB)
Collecting onnxruntime-gpu (from -r requirements.txt (line 7))
  Downloading onnxruntime_gpu-1.18.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (4.3 kB)
Collecting insightface (from -r requirements.txt (line 8))
  Downloading insightface-0.7.3.tar.gz (439 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 439.5/439.5 kB 9.5 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting omegaconf (from -r requirements.txt (line 9))
  Downloading omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)
Collecting contro

## 3. Download Models

In [3]:
!cd ./instantid-headshot && python download_models.py

ControlNetModel/config.json: 100%|█████████| 1.38k/1.38k [00:00<00:00, 7.02MB/s]
diffusion_pytorch_model.safetensors: 100%|██| 2.50G/2.50G [00:07<00:00, 315MB/s]
ip-adapter.bin: 100%|███████████████████████| 1.69G/1.69G [00:05<00:00, 301MB/s]
pytorch_lora_weights.safetensors: 100%|███████| 394M/394M [00:01<00:00, 307MB/s]
Downloading...
From (original): https://drive.google.com/uc?id=18wEUfMNohBJ4K3Ly5wpTejPfDzp-8fI8
From (redirected): https://drive.google.com/uc?id=18wEUfMNohBJ4K3Ly5wpTejPfDzp-8fI8&confirm=t&uuid=a6f1f0e5-83a4-45b9-a2cd-5cb615d16bac
To: /kaggle/working/instantid-headshot/models/antelopev2.zip
100%|█████████████████████████████████████████| 361M/361M [00:01<00:00, 216MB/s]
Archive:  ./models/antelopev2.zip
   creating: ./models/antelopev2/
  inflating: ./models/antelopev2/genderage.onnx  
  inflating: ./models/antelopev2/2d106det.onnx  
  inflating: ./models/antelopev2/1k3d68.onnx  
  inflating: ./models/antelopev2/glintr100.onnx  
  inflating: ./models/antelopev2/scrf

## 4. Imports

In [4]:
%cd ./instantid-headshot
import sys
sys.path.append('./')

from typing import Tuple

import os
import cv2
import math
import torch
import random
import numpy as np
import argparse

import PIL
from PIL import Image

import diffusers
from diffusers.utils import load_image
from diffusers.models import ControlNetModel
from diffusers import LCMScheduler

from huggingface_hub import hf_hub_download

import insightface
from insightface.app import FaceAnalysis

from style_template import styles
from pipeline_stable_diffusion_xl_instantid_full import StableDiffusionXLInstantIDPipeline
from model_util import load_models_xl, get_torch_device, torch_gc

/kaggle/working/instantid-headshot


The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.


0it [00:00, ?it/s]

2024-06-06 07:37:49.270470: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-06 07:37:49.270572: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-06 07:37:49.386202: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
  deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)


## 5. Run

In [5]:
def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
    if randomize_seed:
        seed = random.randint(0, MAX_SEED)
    return seed

In [6]:
def convert_from_cv2_to_image(img: np.ndarray) -> Image.Image:
    # OpenCV stores channels as BGR; PIL expects RGB.
    return Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

In [7]:
def convert_from_image_to_cv2(img: Image.Image) -> np.ndarray:
    # PIL images are RGB; convert to the BGR layout OpenCV expects.
    return cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)

In [8]:
def draw_kps(image_pil, kps, color_list=[(255,0,0), (0,255,0), (0,0,255), (255,255,0), (255,0,255)]):
    stickwidth = 4
    limbSeq = np.array([[0, 2], [1, 2], [3, 2], [4, 2]])
    kps = np.array(kps)

    w, h = image_pil.size
    out_img = np.zeros([h, w, 3])

    for i in range(len(limbSeq)):
        index = limbSeq[i]
        color = color_list[index[0]]

        x = kps[index][:, 0]
        y = kps[index][:, 1]
        length = ((x[0] - x[1]) ** 2 + (y[0] - y[1]) ** 2) ** 0.5
        angle = math.degrees(math.atan2(y[0] - y[1], x[0] - x[1]))
        polygon = cv2.ellipse2Poly((int(np.mean(x)), int(np.mean(y))), (int(length / 2), stickwidth), int(angle), 0, 360, 1)
        out_img = cv2.fillConvexPoly(out_img.copy(), polygon, color)
    out_img = (out_img * 0.6).astype(np.uint8)

    for idx_kp, kp in enumerate(kps):
        color = color_list[idx_kp]
        x, y = kp
        out_img = cv2.circle(out_img.copy(), (int(x), int(y)), 10, color, -1)

    out_img_pil = Image.fromarray(out_img.astype(np.uint8))
    return out_img_pil
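
The limb geometry in `draw_kps` can be checked in isolation. The following is a minimal sketch of the per-limb arithmetic only (length, angle, and ellipse center), without OpenCV. The helper name `limb_ellipse_params` and the two sample points are hypothetical and chosen purely for illustration.

```python
import math

def limb_ellipse_params(p0, p1, stickwidth=4):
    """Return (center, axes, angle_deg) for the ellipse joining p0 and p1.

    Mirrors the arithmetic used inside draw_kps for a single limb.
    """
    (x0, y0), (x1, y1) = p0, p1
    length = math.hypot(x0 - x1, y0 - y1)
    angle = math.degrees(math.atan2(y0 - y1, x0 - x1))
    center = (int((x0 + x1) / 2), int((y0 + y1) / 2))
    axes = (int(length / 2), stickwidth)
    return center, axes, int(angle)

# Hypothetical keypoints: an outer keypoint and the central keypoint (index 2).
outer_kp, central_kp = (100.0, 100.0), (130.0, 140.0)
center, axes, angle = limb_ellipse_params(outer_kp, central_kp)
print(center, axes, angle)  # (115, 120) (25, 4) -126
```

The resulting tuple corresponds to the first three arguments that `draw_kps` passes to `cv2.ellipse2Poly`.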

In [9]:
def resize_img(input_image, max_side=1280, min_side=1024, size=None,
      pad_to_max_side=False, mode=PIL.Image.BILINEAR, base_pixel_number=64):

    w, h = input_image.size
    if size is not None:
        w_resize_new, h_resize_new = size
    else:
        ratio = min_side / min(h, w)
        w, h = round(ratio*w), round(ratio*h)
        ratio = max_side / max(h, w)
        input_image = input_image.resize([round(ratio*w), round(ratio*h)], mode)
        w_resize_new = (round(ratio * w) // base_pixel_number) * base_pixel_number
        h_resize_new = (round(ratio * h) // base_pixel_number) * base_pixel_number
    input_image = input_image.resize([w_resize_new, h_resize_new], mode)

    if pad_to_max_side:
        res = np.ones([max_side, max_side, 3], dtype=np.uint8) * 255
        offset_x = (max_side - w_resize_new) // 2
        offset_y = (max_side - h_resize_new) // 2
        res[offset_y:offset_y+h_resize_new, offset_x:offset_x+w_resize_new] = np.array(input_image)
        input_image = Image.fromarray(res)
    return input_image
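
To see which output dimensions `resize_img` produces when `size` is not given, the arithmetic can be reproduced without PIL. This is a sketch of the size calculation only; the helper name `target_size` is hypothetical. Note that the second scaling step rescales the longest side to `max_side`, so `min_side` only affects intermediate rounding.

```python
def target_size(w, h, max_side=1280, min_side=1024, base_pixel_number=64):
    """Reproduce the width/height arithmetic of resize_img (size=None path)."""
    # Step 1: scale so the shorter side equals min_side.
    ratio = min_side / min(h, w)
    w, h = round(ratio * w), round(ratio * h)
    # Step 2: scale so the longer side equals max_side.
    ratio = max_side / max(h, w)
    # Step 3: round each side down to a multiple of base_pixel_number.
    w_new = (round(ratio * w) // base_pixel_number) * base_pixel_number
    h_new = (round(ratio * h) // base_pixel_number) * base_pixel_number
    return w_new, h_new

print(target_size(1000, 1500))  # (832, 1280)
print(target_size(512, 512))    # (1280, 1280)
```

Both sides end up as multiples of 64, so the aspect ratio of the result can differ slightly from that of the input.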

In [10]:
def apply_style(style_name: str, positive: str, negative: str = "") -> Tuple[str, str]:
    p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])
    return p.replace("{prompt}", positive), n + ' ' + negative
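
The fallback behavior of `apply_style` is easiest to see with a small stand-in for the `styles` dictionary. The sketch below assumes `styles` maps a style name to a `(prompt_template, negative_prompt)` tuple, as the unpacking in the function suggests. The dictionary entries here are hypothetical and are not the contents of `style_template`.

```python
from typing import Dict, Tuple

DEFAULT_STYLE_NAME = "(No style)"

# Hypothetical stand-in for the `styles` dict imported from style_template.
styles: Dict[str, Tuple[str, str]] = {
    "(No style)": ("{prompt}", ""),
    "Film Noir": ("film noir style, {prompt}", "color, blurry"),
}

def apply_style(style_name, positive: str, negative: str = "") -> Tuple[str, str]:
    # Unknown names (including None) fall back to the default style.
    p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])
    return p.replace("{prompt}", positive), n + " " + negative

print(apply_style(None, "professional headshot", "bad quality"))
# ('professional headshot', ' bad quality')
print(apply_style("Film Noir", "professional headshot", "bad quality"))
# ('film noir style, professional headshot', 'color, blurry bad quality')
```

With these stand-in entries, passing `style_name=None` leaves the prompt unchanged and only prepends a space to the negative prompt.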

In [11]:
def generate_image(pipe, face_image_path, pose_image_path, prompt, negative_prompt, style_name, num_steps, identitynet_strength_ratio, adapter_strength_ratio, guidance_scale, seed, enable_LCM, enhance_face_region):
  if enable_LCM:
      pipe.enable_lora()
      pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
  else:
      pipe.disable_lora()
      pipe.scheduler = diffusers.EulerDiscreteScheduler.from_config(pipe.scheduler.config)


  if prompt is None:
      prompt = "a person"

  # apply the style template
  prompt, negative_prompt = apply_style(style_name, prompt, negative_prompt)

  face_image = load_image(face_image_path)
  face_image = resize_img(face_image)
  face_image_cv2 = convert_from_image_to_cv2(face_image)
  height, width, _ = face_image_cv2.shape

  # Extract face features
  face_info = app.get(face_image_cv2)

  if len(face_info) == 0:
      raise Exception("Cannot find any face in the image! Please upload another image of a person.")

  face_info = sorted(face_info, key=lambda x:(x['bbox'][2]-x['bbox'][0])*(x['bbox'][3]-x['bbox'][1]))[-1]  # only use the maximum face
  face_emb = face_info['embedding']
  face_kps = draw_kps(convert_from_cv2_to_image(face_image_cv2), face_info['kps'])

  if pose_image_path is not None:
      pose_image = load_image(pose_image_path)
      pose_image = resize_img(pose_image)
      pose_image_cv2 = convert_from_image_to_cv2(pose_image)

      face_info = app.get(pose_image_cv2)

      if len(face_info) == 0:
          raise Exception("Cannot find any face in the reference image! Please upload another image of a person.")

      face_info = face_info[-1]
      face_kps = draw_kps(pose_image, face_info['kps'])

      width, height = face_kps.size

  if enhance_face_region:
      control_mask = np.zeros([height, width, 3])
      x1, y1, x2, y2 = face_info["bbox"]
      x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
      control_mask[y1:y2, x1:x2] = 255
      control_mask = Image.fromarray(control_mask.astype(np.uint8))
  else:
      control_mask = None

  generator = torch.Generator(device=device).manual_seed(seed)

  print("Start inference...")
  print(f"[Debug] Prompt: {prompt}, \n[Debug] Neg Prompt: {negative_prompt}")

  pipe.set_ip_adapter_scale(adapter_strength_ratio)
  images = pipe(
      prompt=prompt,
      negative_prompt=negative_prompt,
      image_embeds=face_emb,
      image=face_kps,
      control_mask=control_mask,
      controlnet_conditioning_scale=float(identitynet_strength_ratio),
      num_inference_steps=num_steps,
      guidance_scale=guidance_scale,
      height=height,
      width=width,
      generator=generator
  ).images

  return images[0]
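
`generate_image` keeps only the largest detected face from the input image by sorting on bounding-box area. The selection step can be sketched on plain dictionaries, assuming each detection exposes a `bbox` of the form `[x1, y1, x2, y2]` as the code above uses it. The helper name `largest_face` and the sample detections are hypothetical.

```python
def bbox_area(face):
    """Area of a face detection's bounding box, given as [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = face["bbox"]
    return (x2 - x1) * (y2 - y1)

def largest_face(faces):
    """Return the detection with the largest bounding box, as generate_image does."""
    if len(faces) == 0:
        raise ValueError("no faces detected")
    return sorted(faces, key=bbox_area)[-1]

# Hypothetical detections: a small background face and a large foreground face.
faces = [
    {"bbox": [100, 80, 400, 460], "label": "subject"},
    {"bbox": [10, 10, 60, 60], "label": "background"},
]
print(largest_face(faces)["label"])  # subject
```

For the pose image, the function instead takes `face_info[-1]`, the last detection in the list, which is not necessarily the largest face.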

In [17]:
# Global variables
MAX_SEED = np.iinfo(np.int32).max
device = get_torch_device()
dtype = torch.float16 if "cuda" in str(device) else torch.float32
STYLE_NAMES = list(styles.keys())
DEFAULT_STYLE_NAME = "(No style)"

# Load face encoder
app = FaceAnalysis(name='antelopev2', root='./', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

# Path to InstantID models
face_adapter = './checkpoints/ip-adapter.bin'
controlnet_path = './checkpoints/ControlNetModel'

# Load pipeline
controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=dtype)

pretrained_model_name_or_path="wangqixun/YamerMIX_v8"

if pretrained_model_name_or_path.endswith(
        ".ckpt"
    ) or pretrained_model_name_or_path.endswith(".safetensors"):
        scheduler_kwargs = hf_hub_download(
            repo_id="wangqixun/YamerMIX_v8",
            subfolder="scheduler",
            filename="scheduler_config.json",
        )

        (tokenizers, text_encoders, unet, _, vae) = load_models_xl(
            pretrained_model_name_or_path=pretrained_model_name_or_path,
            scheduler_name=None,
            weight_dtype=dtype,
        )

        scheduler = diffusers.EulerDiscreteScheduler.from_config(scheduler_kwargs)
        pipe = StableDiffusionXLInstantIDPipeline(
            vae=vae,
            text_encoder=text_encoders[0],
            text_encoder_2=text_encoders[1],
            tokenizer=tokenizers[0],
            tokenizer_2=tokenizers[1],
            unet=unet,
            scheduler=scheduler,
            controlnet=controlnet,
        ).to(device)

else:
    pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
        pretrained_model_name_or_path,
        controlnet=controlnet,
        torch_dtype=dtype,
        safety_checker=None,
        feature_extractor=None,
    ).to(device)

    pipe.scheduler = diffusers.EulerDiscreteScheduler.from_config(pipe.scheduler.config)


pipe.load_ip_adapter_instantid(face_adapter)

# load and disable LCM
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.disable_lora()

input_img = "./WhatsApp Image 2024-06-05 at 12.56.09.jpeg"
reference_img = "./WhatsApp Image 2024-06-05 at 12.57.39.jpeg"

out_img = generate_image(
    pipe,
    face_image_path=input_img,
    pose_image_path=reference_img,
    prompt="professional headshot, linkedin profile picture, realistic photo",
    negative_prompt="bad quality, unrealistic",
    style_name=None,
    num_steps=30,
    identitynet_strength_ratio=0.8,
    adapter_strength_ratio=0.8,
    guidance_scale=5,
    seed=810087237,
    enable_LCM=True,
    enhance_face_region=True,
)


Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./models/antelopev2/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./models/antelopev2/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./models/antelopev2/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0


2024-06-06 07:46:04.689382793 [E:onnxruntime:Default, provider_bridge_ort.cc:1744 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory
2024-06-06 07:46:05.131404747 [E:onnxruntime:Default, provider_bridge_ort.cc:1744 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory
2024-06-06 07:46:05.155951343 [E:onnxruntime:Default, provider_bridge_ort.cc:1744 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/sess

Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./models/antelopev2/glintr100.onnx recognition ['None', 3, 112, 112] 127.5 127.5
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: ./models/antelopev2/scrfd_10g_bnkps.onnx detection [1, 3, '?', '?'] 127.5 128.0
set det-size: (640, 640)


2024-06-06 07:46:05.966692819 [E:onnxruntime:Default, provider_bridge_ort.cc:1744 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory
The config attributes {'controlnet_list': ['controlnet', 'RPMultiControlNetModel'], 'requires_aesthetics_score': False} were passed to StableDiffusionXLInstantIDPipeline, but are not expected and will be ignored. Please verify your model_index.json configuration file.
Keyword arguments {'controlnet_list': ['controlnet', 'RPMultiControlNetModel'], 'requires_aesthetics_score': False, 'safety_checker': None} are not expected by StableDiffusionXLInstantIDPipeline and will be ignored.


Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 15.89 GiB of which 30.12 MiB is free. Process 2538 has 15.86 GiB memory in use. Of the allocated memory 15.31 GiB is allocated by PyTorch, and 312.08 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
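
The traceback reports that PyTorch already holds almost all of the 15.89 GiB on the GPU. One possible cause, assumed here rather than confirmed by the log, is that the loading cell was re-run in the same kernel while earlier models were still referenced. A minimal sketch for releasing cached memory before retrying is shown below. The helper name `free_gpu_memory` is hypothetical, and the notebook's own `torch_gc` helper (imported from `model_util`, implementation not shown here) may already do something similar.

```python
import gc

def free_gpu_memory() -> bool:
    """Run garbage collection and release PyTorch's cached CUDA memory.

    Returns True if a CUDA cache was cleared, False otherwise.
    """
    gc.collect()
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()
    return True

# Before re-running the loading cell, drop references to old objects first,
# for example: del pipe, controlnet
cleared = free_gpu_memory()
print("CUDA cache cleared:", cleared)
```

This only frees memory that is no longer referenced. If the models genuinely do not fit on the GPU, restarting the kernel or using a GPU with more memory is still required.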

Note: The free tier of Google Colab is not sufficient to run `app-multicontrolnet.py`.