<a href="https://colab.research.google.com/github/nrustamli/Galanthus/blob/main/fitting_room_poc1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🪞 The Fitting Room - AI Proof of Concept

**Goal:** Generate a photorealistic image of a specific person wearing a "White T-shirt and Blue Jeans" while preserving facial identity and body shape.

**Stack:**
- Base Model: SDXL 1.0 (Stable Diffusion XL)
- Identity Adapter: IP-Adapter-Plus-Face (SDXL), conditioned on a face crop located by InsightFace; IP-Adapter-FaceID weights are also downloaded as an alternative
- Pose Control: ControlNet OpenPose
- Face Analysis: InsightFace (antelopev2 model)

---

## 🚀 Google Colab Setup

**Before running:** Go to `Runtime` → `Change runtime type` → Select **T4 GPU**

---

## Phase 1: Environment Setup

Run this cell to install all required dependencies:


In [None]:
# Install all required dependencies (Google Colab)
# Upgrade pip first for better dependency resolution
!pip install -q --upgrade pip

# Uninstall existing packages to prevent conflicts
!pip uninstall -y mediapipe opencv-python controlnet-aux

# Version specifiers are quoted below: unquoted, the shell treats ">" as output
# redirection, drops the version constraint, and writes a stray file such as "=0.25.0".

# Core diffusers and transformers
!pip install -q "diffusers[torch]>=0.25.0" "transformers>=4.36.0" "accelerate>=0.25.0" "peft>=0.7.0"

# Image processing and pose detection
!pip install -q "opencv-python>=4.8.0"  # Install opencv-python first
!pip install -q mediapipe               # Then install latest mediapipe
!pip install -q "controlnet-aux>=0.0.7" "Pillow>=10.0.0"  # Then other libraries

# InsightFace for face analysis (CRITICAL for face detection and embeddings)
!pip install -q "insightface>=0.7.3" "onnxruntime-gpu>=1.16.0"

# Utilities
!pip install -q "huggingface-hub>=0.20.0" "safetensors>=0.4.0" matplotlib "numpy>=1.24.0"
!pip install -q gdown  # For downloading models from Google Drive

print("✅ All dependencies installed!")

In [None]:
# 🔥 Download ALL Required Models for the Fitting Room
# This cell downloads and caches all models needed for the project

import os
import torch
import subprocess
from pathlib import Path
from huggingface_hub import hf_hub_download, snapshot_download

# =============================================================================
# 1️⃣ INSIGHTFACE MODELS (Face Analysis & Embedding)
# =============================================================================
print("=" * 60)
print("1️⃣ DOWNLOADING INSIGHTFACE MODELS")
print("=" * 60)

INSIGHTFACE_ROOT = os.path.expanduser("~/.insightface/models")
os.makedirs(INSIGHTFACE_ROOT, exist_ok=True)

def download_insightface_model(model_name, urls, target_dir):
    """Download InsightFace model with fallback URLs."""
    model_dir = os.path.join(target_dir, model_name)

    # Check if model already exists
    if os.path.exists(model_dir):
        files = os.listdir(model_dir) if os.path.isdir(model_dir) else []
        if len(files) >= 4:
            print(f"   ✅ {model_name} already exists ({len(files)} files)")
            return True

    print(f"📥 Downloading {model_name} model...")

    for i, url in enumerate(urls):
        try:
            print(f"   Trying source {i+1}/{len(urls)}...")
            zip_path = f"/tmp/{model_name}.zip"

            # Download
            result = subprocess.run(
                ["wget", "-q", "--show-progress", url, "-O", zip_path],
                capture_output=True, text=True, timeout=300
            )

            if result.returncode != 0:
                raise Exception(f"wget failed: {result.stderr}")

            # Check if file was downloaded
            if not os.path.exists(zip_path) or os.path.getsize(zip_path) < 1000:
                raise Exception("Downloaded file is too small or missing")

            # Unzip
            subprocess.run(["unzip", "-q", "-o", zip_path, "-d", target_dir], check=True)

            # Cleanup
            os.remove(zip_path)

            print(f"   ✅ {model_name} downloaded successfully!")
            return True

        except Exception as e:
            print(f"   ⚠️ Source {i+1} failed: {e}")
            continue

    print(f"   ❌ Failed to download {model_name} from all sources")
    return False

# ---- Download antelopev2 (REQUIRED for IP-Adapter-FaceID) ----
antelopev2_urls = [
    "https://huggingface.co/MonsterMMORPG/tools/resolve/main/antelopev2.zip",
    "https://huggingface.co/DIAMONIK7777/antelopev2/resolve/main/antelopev2.zip",
    "https://huggingface.co/ashleykleynhans/inswapper/resolve/main/antelopev2.zip",
]
download_insightface_model("antelopev2", antelopev2_urls, INSIGHTFACE_ROOT)

# ---- Download buffalo_l (Full face analysis suite) ----
buffalo_urls = [
    "https://huggingface.co/datasets/Gourieff/ReActor/resolve/main/models/insightface/buffalo_l.zip",
    "https://huggingface.co/deepinsight/insightface/resolve/main/models/buffalo_l.zip",
]
download_insightface_model("buffalo_l", buffalo_urls, INSIGHTFACE_ROOT)

# ---- Download buffalo_sc (Smaller, faster model) ----
buffalo_sc_urls = [
    "https://huggingface.co/datasets/Gourieff/ReActor/resolve/main/models/insightface/buffalo_sc.zip",
]
download_insightface_model("buffalo_sc", buffalo_sc_urls, INSIGHTFACE_ROOT)

# Verify InsightFace installation
print(f"\n📁 InsightFace models location: {INSIGHTFACE_ROOT}")
print("\n📋 Installed models:")
for model in os.listdir(INSIGHTFACE_ROOT):
    model_path = os.path.join(INSIGHTFACE_ROOT, model)
    if os.path.isdir(model_path):
        files = os.listdir(model_path)
        print(f"   📦 {model}: {len(files)} files")
        for f in files[:5]:  # Show first 5 files
            print(f"      - {f}")
        if len(files) > 5:
            print(f"      ... and {len(files) - 5} more")

# =============================================================================
# 2️⃣ IP-ADAPTER MODELS (Identity Preservation)
# =============================================================================
print("\n" + "=" * 60)
print("2️⃣ DOWNLOADING IP-ADAPTER MODELS")
print("=" * 60)

# Create models directory
MODELS_DIR = Path("./models")
MODELS_DIR.mkdir(exist_ok=True)

# IP-Adapter Face models for SDXL
print("📥 Downloading IP-Adapter-Plus-Face for SDXL...")
ip_adapter_face_path = hf_hub_download(
    repo_id="h94/IP-Adapter",
    filename="sdxl_models/ip-adapter-plus-face_sdxl_vit-h.safetensors",
    local_dir=MODELS_DIR / "ip-adapter"
)
print(f"   ✅ IP-Adapter-Plus-Face: {ip_adapter_face_path}")

# IP-Adapter-FaceID for SDXL (uses InsightFace embeddings directly - BETTER!)
print("📥 Downloading IP-Adapter-FaceID for SDXL (recommended)...")
try:
    ip_adapter_faceid_path = hf_hub_download(
        repo_id="h94/IP-Adapter-FaceID",
        filename="ip-adapter-faceid_sdxl.bin",
        local_dir=MODELS_DIR / "ip-adapter-faceid"
    )
    print(f"   ✅ IP-Adapter-FaceID: {ip_adapter_faceid_path}")
except Exception as e:
    print(f"   ⚠️ Could not download IP-Adapter-FaceID: {e}")

# IP-Adapter-FaceID-Plus (combines face embedding + face image)
print("📥 Downloading IP-Adapter-FaceID-Plus for SDXL (best quality)...")
try:
    ip_adapter_faceid_plus_path = hf_hub_download(
        repo_id="h94/IP-Adapter-FaceID",
        filename="ip-adapter-faceid-plusv2_sdxl.bin",
        local_dir=MODELS_DIR / "ip-adapter-faceid"
    )
    print(f"   ✅ IP-Adapter-FaceID-Plus: {ip_adapter_faceid_plus_path}")

    # Download the LoRA weights for FaceID-Plus
    ip_adapter_faceid_lora_path = hf_hub_download(
        repo_id="h94/IP-Adapter-FaceID",
        filename="ip-adapter-faceid-plusv2_sdxl_lora.safetensors",
        local_dir=MODELS_DIR / "ip-adapter-faceid"
    )
    print(f"   ✅ IP-Adapter-FaceID-Plus LoRA: {ip_adapter_faceid_lora_path}")
except Exception as e:
    print(f"   ⚠️ Could not download IP-Adapter-FaceID-Plus: {e}")

# Download image encoder for IP-Adapter
print("📥 Downloading CLIP Image Encoder (for IP-Adapter)...")
try:
    snapshot_download(
        repo_id="h94/IP-Adapter",
        allow_patterns=["sdxl_models/image_encoder/*"],
        local_dir=MODELS_DIR / "ip-adapter"
    )
    print("   ✅ CLIP Image Encoder downloaded!")
except Exception as e:
    print(f"   ⚠️ Could not download image encoder: {e}")

# =============================================================================
# 3️⃣ CONTROLNET MODELS (Pose Control)
# =============================================================================
print("\n" + "=" * 60)
print("3️⃣ DOWNLOADING CONTROLNET MODELS")
print("=" * 60)

# ControlNet OpenPose for SDXL
print("📥 Pre-caching ControlNet OpenPose for SDXL...")
try:
    from diffusers import ControlNetModel
    controlnet_pose = ControlNetModel.from_pretrained(
        "thibaud/controlnet-openpose-sdxl-1.0",
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
        cache_dir=MODELS_DIR / "controlnet"
    )
    del controlnet_pose  # Free memory after caching
    print("   ✅ ControlNet OpenPose cached!")
except Exception as e:
    print(f"   ⚠️ Could not pre-cache ControlNet: {e}")
    print("      (Will be downloaded on first use)")

# =============================================================================
# 4️⃣ OPENPOSE MODELS (for controlnet-aux)
# =============================================================================
print("\n" + "=" * 60)
print("4️⃣ PRE-CACHING OPENPOSE DETECTOR")
print("=" * 60)

print("📥 Pre-caching OpenPose detector...")
try:
    from controlnet_aux import OpenposeDetector
    openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
    del openpose  # Free memory after caching
    print("   ✅ OpenPose detector cached!")
except Exception as e:
    print(f"   ⚠️ Could not pre-cache OpenPose: {e}")

# =============================================================================
# 📊 SUMMARY
# =============================================================================
print("\n" + "=" * 60)
print("📊 MODEL DOWNLOAD SUMMARY")
print("=" * 60)
print("""
✅ InsightFace Models (Face Analysis):
   - antelopev2: Face detection + embedding (for IP-Adapter-FaceID)
   - buffalo_l: Full suite (age, gender, expression, embedding)

✅ IP-Adapter Models (Identity Preservation):
   - ip-adapter-plus-face_sdxl: Standard face adapter
   - ip-adapter-faceid_sdxl: Uses InsightFace embeddings (recommended)
   - ip-adapter-faceid-plusv2_sdxl: Best quality (face + embedding)

✅ ControlNet Models (Pose Control):
   - controlnet-openpose-sdxl: Body pose control

✅ Auxiliary Models:
   - OpenPose detector: Pose extraction
   - CLIP image encoder: For IP-Adapter

🚀 All models are ready! Proceed to the next cells.
""")

print(f"\n📁 Models cached in: {MODELS_DIR.resolve()}")
!du -sh {MODELS_DIR}


In [None]:
# 🧪 Verify InsightFace Installation
# This cell tests that InsightFace models are properly installed and working

import os  # IMPORTANT: ensure os is imported
from insightface.app import FaceAnalysis
import numpy as np

print("=" * 60)
print("🧪 VERIFYING INSIGHTFACE INSTALLATION")
print("=" * 60)

# Define the InsightFace root directory
INSIGHTFACE_ROOT = os.path.expanduser("~/.insightface")

# Test loading antelopev2 model (REQUIRED for IP-Adapter-FaceID)
print("\n📦 Testing antelopev2 model (required for face embeddings)...")
try:
    app_antelope = FaceAnalysis(
        name='antelopev2',
        root=INSIGHTFACE_ROOT,
        providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
    )
    app_antelope.prepare(ctx_id=0, det_size=(640, 640))
    print("   ✅ antelopev2 loaded successfully!")
    print(f"   📊 Models loaded: {list(app_antelope.models.keys())}")

    # Verify the model has the required components
    required_models = ['detection', 'recognition']
    for model_type in required_models:
        if model_type in app_antelope.models:
            print(f"   ✓ {model_type} model available")
        else:
            print(f"   ⚠️ {model_type} model missing - this may cause issues")

    del app_antelope
except Exception as e:
    print(f"   ❌ Failed to load antelopev2: {e}")
    print("   💡 Run Cell 2 to download the required models")

# Test loading buffalo_l model (alternative model)
print("\n📦 Testing buffalo_l model (alternative face analysis suite)...")
try:
    app_buffalo = FaceAnalysis(
        name='buffalo_l',
        root=INSIGHTFACE_ROOT,
        providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
    )
    app_buffalo.prepare(ctx_id=0, det_size=(640, 640))
    print("   ✅ buffalo_l loaded successfully!")
    print(f"   📊 Models loaded: {list(app_buffalo.models.keys())}")
    del app_buffalo
except Exception as e:
    print(f"   ⚠️ buffalo_l not available: {e}")
    print("   (This is optional - antelopev2 is sufficient)")

# Create a test image to verify face detection works
print("\n🔍 Testing face detection with a dummy image...")
try:
    # Create a simple test image (blank)
    test_image = np.zeros((640, 640, 3), dtype=np.uint8)

    app_test = FaceAnalysis(
        name='antelopev2',
        root=INSIGHTFACE_ROOT,
        providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
    )
    app_test.prepare(ctx_id=0, det_size=(640, 640))

    # Run detection (should return empty list for blank image)
    faces = app_test.get(test_image)
    print(f"   ✅ Face detection working! (Found {len(faces)} faces in test image)")
    del app_test
except Exception as e:
    print(f"   ❌ Face detection test failed: {e}")

print("\n" + "=" * 60)
print("✅ INSIGHTFACE VERIFICATION COMPLETE")
print("=" * 60)
print("""
📋 InsightFace provides the following capabilities:
   - Face Detection (RetinaFace/SCRFD) - locates faces in images
   - Face Recognition (ArcFace embeddings) - 512-dimensional identity vector
   - Face Landmarks (2D & 3D) - 5 keypoints for alignment
   - Face Attributes (age, gender) - demographic analysis

🔗 The 512-dim face embeddings from ArcFace are used by IP-Adapter-FaceID
   for identity preservation in image generation.
""")


In [None]:
import torch
import cv2
import numpy as np
from PIL import Image
import os
from pathlib import Path
import time

# Check CUDA availability
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    print(f"CUDA memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")


## Phase 2: Upload Your Images

We need two inputs:
1. **Face Image (selfie)**: A clear photo of your face for identity preservation
2. **Pose Image (full body)**: A full-body photo for body shape and pose reference
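
Before uploading, it can help to confirm that both inputs exist and look like image files. The sketch below is a minimal illustration using only the standard library; the helper name `check_image_path` and the accepted suffix set are assumptions for this example, not part of the notebook.

```python
from pathlib import Path

# Assumed set of accepted suffixes (hypothetical; adjust to the formats you use).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def check_image_path(path):
    """Return the path as a Path if it exists and has an image suffix, else raise."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"Image not found: {p}")
    if p.suffix.lower() not in IMAGE_SUFFIXES:
        raise ValueError(f"Unsupported image type: {p.suffix or '(none)'}")
    return p
```

For example, `check_image_path("selfie.jpg")` returns a `Path` when the file is present and raises a descriptive error otherwise, so a bad input is caught before any model is loaded.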

**Run the cell below to upload your images:**


In [None]:
# 📤 Set up your images
# Choose between Google Colab upload or Local file paths

import os
from pathlib import Path

# ============================================================
# CONFIGURATION: Set this based on your environment
# ============================================================
USE_COLAB = True  # Set to True for Google Colab, False for local development

if USE_COLAB:
    # Google Colab: Upload images interactively
    from google.colab import files
    import io

    print("📸 Please upload your FACE IMAGE (selfie):")
    uploaded_face = files.upload()
    FACE_IMAGE_NAME = list(uploaded_face.keys())[0]
    print(f"   ✅ Uploaded: {FACE_IMAGE_NAME}")

    print("\n📸 Please upload your POSE IMAGE (full body):")
    uploaded_pose = files.upload()
    POSE_IMAGE_NAME = list(uploaded_pose.keys())[0]
    print(f"   ✅ Uploaded: {POSE_IMAGE_NAME}")

    # Save uploaded files
    with open(FACE_IMAGE_NAME, 'wb') as f:
        f.write(uploaded_face[FACE_IMAGE_NAME])
    with open(POSE_IMAGE_NAME, 'wb') as f:
        f.write(uploaded_pose[POSE_IMAGE_NAME])

    FACE_IMAGE_PATH = Path(FACE_IMAGE_NAME)
    POSE_IMAGE_PATH = Path(POSE_IMAGE_NAME)
else:
    # Local Development: Use pre-existing images in the project
    # Images are located in the parent directory of proof_of_concept
    PROJECT_ROOT = Path("..").resolve()  # Go up from proof_of_concept

    # Available images in the project:
    # - selfie.jpg (for face/identity)
    # - full_body.jpeg (for pose reference)
    # - closer_body_and_face.jpeg (alternative)

    FACE_IMAGE_PATH = PROJECT_ROOT / "selfie.jpg"
    POSE_IMAGE_PATH = PROJECT_ROOT / "full_body.jpeg"

    # Verify files exist
    if not FACE_IMAGE_PATH.exists():
        raise FileNotFoundError(f"Face image not found: {FACE_IMAGE_PATH}")
    if not POSE_IMAGE_PATH.exists():
        raise FileNotFoundError(f"Pose image not found: {POSE_IMAGE_PATH}")

    print(f"📂 Using local images from: {PROJECT_ROOT}")

# Output directory
OUTPUT_DIR = Path("./outputs")
OUTPUT_DIR.mkdir(exist_ok=True)

print("\n" + "=" * 50)
print("📋 IMAGE CONFIGURATION")
print("=" * 50)
print(f"👤 Face image: {FACE_IMAGE_PATH}")
print(f"🦴 Pose image: {POSE_IMAGE_PATH}")
print(f"📁 Output dir: {OUTPUT_DIR.resolve()}")
print(f"🖥️ Environment: {'Google Colab' if USE_COLAB else 'Local Development'}")

In [None]:
from diffusers.utils import load_image
import matplotlib.pyplot as plt

# Load the input images
face_image = load_image(str(FACE_IMAGE_PATH))
pose_image = load_image(str(POSE_IMAGE_PATH))

# Display the input images
fig, axes = plt.subplots(1, 2, figsize=(12, 6))
axes[0].imshow(face_image)
axes[0].set_title("Face Image (Identity Source)")
axes[0].axis('off')

axes[1].imshow(pose_image)
axes[1].set_title("Pose Image (Body Shape Source)")
axes[1].axis('off')

plt.tight_layout()
plt.show()


## Phase 3: Extract Face Embeddings (InsightFace)

Using InsightFace's `antelopev2` model to extract facial features that will preserve identity.
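
One common way to use such an embedding for validation is to compare two of them with cosine similarity: values closer to 1 suggest the same identity. The sketch below is a plain-Python illustration that treats an embedding as a sequence of floats; the function name is hypothetical, and any decision threshold would need to be chosen empirically for the model in use.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)
```

With the arrays this notebook extracts, a call might look like `cosine_similarity(face_emb.tolist(), other_emb.tolist())`, where `other_emb` is an embedding taken from a generated image (an assumption, since the notebook does not compute one here).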


In [None]:
# 🔍 Initialize InsightFace for Face Embedding Extraction
# The antelopev2 model provides ArcFace embeddings used by IP-Adapter-FaceID

import os
import cv2
import numpy as np
from insightface.app import FaceAnalysis

# Define paths
INSIGHTFACE_ROOT = os.path.expanduser("~/.insightface")
model_dir = os.path.join(INSIGHTFACE_ROOT, "models", "antelopev2")

# Download antelopev2 model if not present
if not os.path.exists(model_dir) or len(os.listdir(model_dir)) < 4:
    # Shell magics (!wget) never raise a Python exception, so a failed download
    # would go unnoticed; subprocess.run(check=True) makes failures detectable.
    import subprocess

    print("📥 Downloading antelopev2 model from HuggingFace...")
    os.makedirs(model_dir, exist_ok=True)

    # Try multiple sources for reliability
    download_urls = [
        "https://huggingface.co/MonsterMMORPG/tools/resolve/main/antelopev2.zip",
        "https://huggingface.co/DIAMONIK7777/antelopev2/resolve/main/antelopev2.zip",
    ]

    zip_path = "/tmp/antelopev2.zip"
    models_root = os.path.join(INSIGHTFACE_ROOT, "models")
    success = False
    for url in download_urls:
        try:
            # Index 3 of the split URL is the HuggingFace account name
            print(f"   Trying: {url.split('/')[3]}...")
            subprocess.run(["wget", "-q", url, "-O", zip_path], check=True, timeout=300)
            subprocess.run(["unzip", "-q", "-o", zip_path, "-d", models_root], check=True)
            os.remove(zip_path)
            success = True
            print("   ✅ Model downloaded successfully!")
            break
        except Exception as e:
            print(f"   ⚠️ Failed: {e}")
            continue

    if not success:
        raise RuntimeError("❌ Could not download antelopev2 model. Please download manually.")
else:
    print(f"✅ antelopev2 model already exists at: {model_dir}")

# Initialize InsightFace FaceAnalysis
print("\n🔧 Loading InsightFace model...")
app = FaceAnalysis(
    name='antelopev2',
    root=INSIGHTFACE_ROOT,
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
)
app.prepare(ctx_id=0, det_size=(640, 640))

# Verify the model loaded correctly
print("✅ InsightFace model loaded!")
print(f"   📊 Available models: {list(app.models.keys())}")
print(f"   🎯 Detection size: 640x640")
print(f"   💡 This model will extract 512-dim face embeddings for identity preservation")


In [None]:
# 🔍 Extract Face Embedding from the Selfie
# InsightFace extracts a 512-dimensional ArcFace embedding that captures facial identity

import matplotlib.pyplot as plt

# Convert PIL image to OpenCV format (BGR)
face_image_cv2 = cv2.cvtColor(np.array(face_image), cv2.COLOR_RGB2BGR)

# Run face detection and analysis
print("🔍 Detecting faces in the selfie...")
face_info_list = app.get(face_image_cv2)

if len(face_info_list) == 0:
    raise ValueError("""
❌ No face detected in the selfie image!

Possible causes:
1. Face is too small in the image
2. Face is partially occluded
3. Lighting is too dark/bright
4. Image resolution is too low

Solutions:
- Use a clear, well-lit selfie
- Ensure face takes up at least 20% of the image
- Face should be looking directly at the camera
""")

print(f"   Found {len(face_info_list)} face(s) in image")

# Get the largest face (in case multiple faces detected)
# Sorting by face bounding box area (width * height)
face_info = sorted(
    face_info_list,
    key=lambda x: (x.bbox[2] - x.bbox[0]) * (x.bbox[3] - x.bbox[1])
)[-1]

# Extract the embedding (512-dimensional vector)
face_emb = face_info.embedding  # numpy array of shape (512,)

# Extract face keypoints (5 points: left_eye, right_eye, nose, left_mouth, right_mouth)
face_kps = face_info.kps  # numpy array of shape (5, 2)

# Extract bounding box for visualization
face_bbox = face_info.bbox.astype(int)  # [x1, y1, x2, y2]

print(f"\n✅ Face embedding extracted successfully!")
print(f"   📊 Embedding shape: {face_emb.shape} (512-dim ArcFace vector)")
print(f"   📍 Keypoints shape: {face_kps.shape} (5 facial landmarks)")
print(f"   📐 Face bounding box: {face_bbox}")

# Visualize the detected face with landmarks
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Original image
axes[0].imshow(face_image)
axes[0].set_title("Original Selfie", fontsize=12)
axes[0].axis('off')

# Image with face detection overlay
face_image_annotated = np.array(face_image).copy()

# Draw bounding box
cv2.rectangle(
    face_image_annotated,
    (face_bbox[0], face_bbox[1]),
    (face_bbox[2], face_bbox[3]),
    (0, 255, 0), 3
)

# Draw keypoints
colors = [(255, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 0, 255)]
labels = ['L Eye', 'R Eye', 'Nose', 'L Mouth', 'R Mouth']
for i, (kp, color) in enumerate(zip(face_kps, colors)):
    cv2.circle(face_image_annotated, (int(kp[0]), int(kp[1])), 5, color, -1)

axes[1].imshow(face_image_annotated)
axes[1].set_title("Detected Face + Landmarks", fontsize=12)
axes[1].axis('off')

plt.tight_layout()
plt.show()

print("\n💡 The 512-dim face embedding captures your unique facial identity.")
print("   The detected bounding box is used later to crop the face image passed to IP-Adapter.")


## Phase 4: Extract Body Pose (OpenPose)

Convert the full-body photo into a skeleton/pose map so the AI knows body positioning.
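
The pose map is later resized to the generation size, and a direct resize to a different aspect ratio stretches the skeleton. The sketch below shows one possible way to compute an aspect-preserving size plus centering offsets; the function name `fit_within` is hypothetical and the approach (letterboxing) is an illustration, not what the notebook currently does.

```python
def fit_within(src_w, src_h, dst_w, dst_h):
    """Scale (src_w, src_h) to fit inside (dst_w, dst_h) without distortion.

    Returns (new_w, new_h, pad_left, pad_top): the scaled size and the offsets
    that center it on a dst_w x dst_h canvas.
    """
    if min(src_w, src_h, dst_w, dst_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # The smaller ratio guarantees both sides fit inside the target.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = min(dst_w, max(1, round(src_w * scale)))
    new_h = min(dst_h, max(1, round(src_h * scale)))
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top
```

With Pillow, the result could be applied by resizing the pose map to `(new_w, new_h)` and pasting it onto a black canvas of the target size at `(pad_left, pad_top)`.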


In [None]:
from controlnet_aux import OpenposeDetector

# Initialize OpenPose detector
print("Loading OpenPose detector...")
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
print("OpenPose detector loaded!")

In [None]:
# Extract pose from full body image
pose_map = openpose(pose_image, include_body=True, include_hand=False, include_face=False)

# Display the pose map
fig, axes = plt.subplots(1, 2, figsize=(12, 8))
axes[0].imshow(pose_image)
axes[0].set_title("Original Full Body")
axes[0].axis('off')

axes[1].imshow(pose_map)
axes[1].set_title("Extracted Pose Map")
axes[1].axis('off')

plt.tight_layout()
plt.show()

# Save pose map for reference
pose_map.save(OUTPUT_DIR / "pose_map.png")
print(f"✅ Pose map saved to {OUTPUT_DIR / 'pose_map.png'}")


## Phase 5: Load the Generation Pipeline

Load the SDXL base model with ControlNet for pose control and IP-Adapter for identity preservation.


In [None]:
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel

# Determine device and dtype
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

print(f"Using device: {device}")
print(f"Using dtype: {dtype}")


In [None]:
# Load ControlNet for OpenPose (body pose)
print("Loading ControlNet for OpenPose...")
controlnet_pose = ControlNetModel.from_pretrained(
    "thibaud/controlnet-openpose-sdxl-1.0",
    torch_dtype=dtype
)
print("✅ ControlNet for pose loaded!")


In [None]:
# Load the main SDXL Pipeline with ControlNet
print("Loading SDXL Pipeline with ControlNet (this may take a few minutes)...")
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet_pose,
    torch_dtype=dtype,
    variant="fp16" if dtype == torch.float16 else None
).to(device)
print("✅ SDXL Pipeline loaded!")


In [None]:
# 📋 IP-Adapter Loading Options
#
# This cell initializes flags for IP-Adapter loading.
# The actual loading happens in the next cell which uses InsightFace-detected face.
#
# IP-Adapter Options for SDXL:
# 1. ip-adapter-plus-face_sdxl_vit-h.safetensors - Optimized for face images
# 2. ip-adapter-faceid_sdxl.bin - Uses InsightFace embeddings directly (requires projection)
#
# We use option 1 because it's more stable and works well with InsightFace-cropped faces.

print("📋 IP-Adapter Configuration")
print("=" * 50)
print("""
IP-Adapter will be loaded with the face region detected by InsightFace.

Why use InsightFace + IP-Adapter together?
1. InsightFace detects and crops the face region accurately
2. InsightFace extracts 512-dim embeddings for identity analysis
3. IP-Adapter uses the cropped face image for generation conditioning
4. This combination provides better identity preservation than full-image input

The IP-Adapter will be loaded in the next cell...
""")

# Initialize flags
IP_ADAPTER_LOADED = False
IP_ADAPTER_TYPE = None
IP_ADAPTER_SCALE = 0.6


### 🔧 Loading IP-Adapter with InsightFace Face Detection

This cell loads IP-Adapter-Plus-Face and uses the face region detected by InsightFace.

**How InsightFace improves identity preservation:**
1. **Accurate Face Detection**: InsightFace's SCRFD detector precisely locates faces
2. **Face Cropping**: We crop the detected face region with padding for context
3. **Embedding Reference**: The 512-dim ArcFace embedding can be used for validation
4. **Better Input**: IP-Adapter receives a clean, centered face image instead of full photo

**Technical Note:** While IP-Adapter-FaceID can use raw InsightFace embeddings directly,
we use IP-Adapter-Plus-Face with InsightFace-cropped images for better stability with SDXL.
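
The padded crop described in step 2 comes down to a little box arithmetic. The sketch below restates it as a standalone function; the name `padded_crop_box` is hypothetical, and the default 30% padding follows the value used in the loading cell.

```python
def padded_crop_box(bbox, image_size, pad_ratio=0.3):
    """Expand a face box by pad_ratio on each side, clamped to the image bounds.

    bbox is (x1, y1, x2, y2) in pixels; image_size is (width, height).
    """
    x1, y1, x2, y2 = bbox
    img_w, img_h = image_size
    pad_x = int((x2 - x1) * pad_ratio)
    pad_y = int((y2 - y1) * pad_ratio)
    return (
        max(0, x1 - pad_x),
        max(0, y1 - pad_y),
        min(img_w, x2 + pad_x),
        min(img_h, y2 + pad_y),
    )
```

The returned tuple has the `(left, upper, right, lower)` layout that Pillow's `Image.crop` accepts, so it could be passed straight to `face_image.crop(...)`.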


In [None]:
# 🆕 Load IP-Adapter-Plus-Face for identity preservation
# The adapter is conditioned on the face region that InsightFace (antelopev2) detected
#
# IP-Adapter-FaceID is trained to consume InsightFace embeddings directly,
# but its SDXL version requires additional projection layers.
#
# ⚠️ NOTE: This notebook therefore loads the standard IP-Adapter-Plus-Face and
#    passes it the cropped face image; the raw embedding is not fed to the
#    adapter and is kept for reference only.

print("=" * 60)
print("🔧 LOADING IP-ADAPTER FOR IDENTITY PRESERVATION")
print("=" * 60)

# Strategy: Use IP-Adapter-Plus-Face which works well with face images
# We'll use the face image (cropped from the detected face) for conditioning

USE_FACEID_APPROACH = True  # Use face-focused approach

if USE_FACEID_APPROACH:
    print("\n📥 Loading IP-Adapter-Plus-Face for SDXL...")

    try:
        # Load IP-Adapter-Plus-Face (optimized for face images)
        # The "vit-h" adapters expect the ViT-H image encoder stored under
        # models/image_encoder, not the encoder in sdxl_models/image_encoder.
        # image_encoder_folder requires a diffusers release that supports it.
        pipe.load_ip_adapter(
            "h94/IP-Adapter",
            subfolder="sdxl_models",
            weight_name="ip-adapter-plus-face_sdxl_vit-h.safetensors",
            image_encoder_folder="models/image_encoder"
        )

        # Set IP-Adapter scale (controls identity strength vs prompt adherence)
        # 0.5-0.7 is a good balance; higher = more identity, less outfit accuracy
        IP_ADAPTER_SCALE = 0.6
        pipe.set_ip_adapter_scale(IP_ADAPTER_SCALE)

        IP_ADAPTER_LOADED = True
        IP_ADAPTER_TYPE = "Plus-Face"

        print("✅ IP-Adapter-Plus-Face loaded successfully!")
        print(f"   📊 IP-Adapter Scale: {IP_ADAPTER_SCALE}")
        print("   💡 This adapter is optimized for face identity preservation")

    except Exception as e:
        print(f"❌ Could not load IP-Adapter: {e}")
        IP_ADAPTER_LOADED = False
        IP_ADAPTER_TYPE = None
        print("\n⚠️ Proceeding without identity preservation (pose-only mode)")

# Prepare the face image for IP-Adapter conditioning
# We'll crop the face region detected by InsightFace for better results
if IP_ADAPTER_LOADED:
    print("\n🖼️ Preparing face image for IP-Adapter...")

    # Get the face bounding box (with some padding for context)
    x1, y1, x2, y2 = face_bbox

    # Add 30% padding around the face
    face_width = x2 - x1
    face_height = y2 - y1
    padding_x = int(face_width * 0.3)
    padding_y = int(face_height * 0.3)

    # Ensure we don't go out of image bounds
    img_width, img_height = face_image.size
    x1_padded = max(0, x1 - padding_x)
    y1_padded = max(0, y1 - padding_y)
    x2_padded = min(img_width, x2 + padding_x)
    y2_padded = min(img_height, y2 + padding_y)

    # Crop the face region
    face_cropped = face_image.crop((x1_padded, y1_padded, x2_padded, y2_padded))

    # Resize to 224x224 (IP-Adapter expected size)
    face_for_ip_adapter = face_cropped.resize((224, 224))

    print(f"   ✅ Face cropped and resized to 224x224")
    print(f"   📐 Original crop region: ({x1_padded}, {y1_padded}) to ({x2_padded}, {y2_padded})")

    # Display the cropped face
    plt.figure(figsize=(4, 4))
    plt.imshow(face_for_ip_adapter)
    plt.title("Face for IP-Adapter")
    plt.axis('off')
    plt.show()

print("\n" + "=" * 60)
print("📊 IP-ADAPTER CONFIGURATION SUMMARY")
print("=" * 60)
print(f"   Type: {IP_ADAPTER_TYPE if IP_ADAPTER_LOADED else 'None'}")
print(f"   Loaded: {'✅ Yes' if IP_ADAPTER_LOADED else '❌ No'}")
print(f"   Scale: {IP_ADAPTER_SCALE if IP_ADAPTER_LOADED else 'N/A'}")
print(f"\n💡 InsightFace supplies the face bounding box used for cropping,")
print("   while the cropped face image provides the visual reference for the adapter.")


In [None]:
# Enable memory optimizations for GPU
if device == "cuda":
    try:
        pipe.enable_model_cpu_offload()
        print("✅ Model CPU offload enabled for memory optimization")
    except Exception as e:
        print(f"⚠️ Could not enable CPU offload: {e}")

    # Uncomment if you have xformers installed for faster inference
    # pipe.enable_xformers_memory_efficient_attention()


## Phase 6: Generate the Image! 🎨

Now we combine everything to generate the person wearing "White T-Shirt + Blue Jeans"
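
The two settings that most directly trade identity against pose and outfit fidelity are the IP-Adapter scale and the ControlNet conditioning scale. The sketch below shows one way to enumerate combinations for a small comparison run. The function name `build_sweep` and the dictionary keys are assumptions for illustration; the fixed seed follows the notebook's value of 42 so that only the scales differ between runs.

```python
from itertools import product


def build_sweep(ip_scales, pose_scales, base_seed=42):
    """Return one settings dict per (IP-Adapter scale, pose scale) pair."""
    runs = []
    for ip_scale, pose_scale in product(ip_scales, pose_scales):
        runs.append({
            "ip_adapter_scale": ip_scale,
            "controlnet_conditioning_scale": pose_scale,
            "seed": base_seed,  # same seed so only the scales differ
        })
    return runs
```

Each entry could then drive one generation: call `pipe.set_ip_adapter_scale(run["ip_adapter_scale"])`, pass `run["controlnet_conditioning_scale"]` to the pipeline call, and recreate the generator from `run["seed"]` before each run.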


In [None]:
# The Prompt defining the outfit
prompt = """A photorealistic full-body shot of a young woman wearing a plain white cotton t-shirt,
dark blue denim jeans, white sneakers, standing in a bright modern studio with neutral background,
natural lighting, 8k resolution, highly detailed, professional photography, masterpiece"""

negative_prompt = """deformed, ugly, bad anatomy, cartoon, drawing, messy, blurry, low quality,
bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs,
missing arms, missing legs, mutated hands, poorly drawn face, poorly drawn hands"""

print("✅ Prompts configured!")
print(f"\n📝 Positive prompt:\n{prompt[:150]}...")
print(f"\n🚫 Negative prompt:\n{negative_prompt[:100]}...")


In [None]:
# 📐 Resize images to match generation dimensions
generation_width = 768
generation_height = 1024  # Portrait orientation for full-body

# Resize pose map to generation size
pose_map_resized = pose_map.resize((generation_width, generation_height))

# Note: face_for_ip_adapter was already prepared in the IP-Adapter loading cell
# It's the InsightFace-cropped face region resized to 224x224

print("=" * 50)
print("📐 IMAGE DIMENSIONS CONFIGURED")
print("=" * 50)
print(f"\n🎨 Generation size: {generation_width}x{generation_height} (portrait)")
print(f"🦴 Pose map resized to: {pose_map_resized.size}")
print(f"👤 Face image for IP-Adapter: {face_for_ip_adapter.size if 'face_for_ip_adapter' in dir() else 'Not prepared yet'}")

# Show the resized pose map
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(10, 6))
axes[0].imshow(pose_map_resized)
axes[0].set_title(f"Pose Map ({generation_width}x{generation_height})")
axes[0].axis('off')

if 'face_for_ip_adapter' in dir():
    axes[1].imshow(face_for_ip_adapter)
    axes[1].set_title("Face for IP-Adapter (224x224)")
    axes[1].axis('off')
else:
    axes[1].text(0.5, 0.5, "Face not prepared", ha='center', va='center')
    axes[1].axis('off')

plt.tight_layout()
plt.show()


In [None]:
# 🎛️ Generation Parameters - TUNE THESE FOR BEST RESULTS
# This cell configures all parameters for the image generation

print("=" * 60)
print("🎛️ CONFIGURING GENERATION PARAMETERS")
print("=" * 60)

# Core generation config
generation_config = {
    "prompt": prompt,
    "negative_prompt": negative_prompt,
    "image": pose_map_resized,  # ControlNet condition (pose skeleton)
    "controlnet_conditioning_scale": 0.8,  # Pose control strength (0.5-1.0)
    "num_inference_steps": 30,  # Quality vs speed tradeoff (20-50)
    "guidance_scale": 7.5,  # How strictly to follow prompt (5-15)
    "width": generation_width,
    "height": generation_height,
    "generator": torch.Generator(device=device).manual_seed(42),  # Reproducibility
}

# Add IP-Adapter conditioning for identity preservation
if IP_ADAPTER_LOADED:
    # Use the InsightFace-detected and cropped face image
    # This provides better identity preservation than using the full selfie
    generation_config["ip_adapter_image"] = face_for_ip_adapter

    print(f"\n✅ IP-Adapter configured with InsightFace-cropped face")
    print(f"   📊 IP-Adapter Type: {IP_ADAPTER_TYPE}")
    print(f"   📊 IP-Adapter Scale: {IP_ADAPTER_SCALE}")
    print(f"   🔗 Face embedding shape: {face_emb.shape} (for reference)")
else:
    print("\n⚠️ IP-Adapter not loaded - proceeding with pose-only generation")
    print("   Identity may not be preserved without IP-Adapter")

# Display parameter summary
print("\n" + "=" * 60)
print("📊 GENERATION PARAMETER SUMMARY")
print("=" * 60)
print(f"""
🎨 Image Generation:
   - Width: {generation_width}px
   - Height: {generation_height}px
   - Inference Steps: {generation_config['num_inference_steps']}
   - Guidance Scale: {generation_config['guidance_scale']}

🦴 Pose Control (ControlNet):
   - Conditioning Scale: {generation_config['controlnet_conditioning_scale']}
   - Pose extracted from: pose_image

👤 Identity Preservation (IP-Adapter):
   - Enabled: {'✅ Yes' if IP_ADAPTER_LOADED else '❌ No'}
   - Type: {IP_ADAPTER_TYPE if IP_ADAPTER_LOADED else 'N/A'}
   - Face source: InsightFace-cropped region

🎲 Reproducibility:
   - Seed: 42
""")

# Tuning tips
print("=" * 60)
print("💡 TUNING TIPS")
print("=" * 60)
print("""
If results need improvement, adjust these parameters:

1. IDENTITY TOO WEAK (face doesn't look like input):
   → Increase the IP-Adapter scale to 0.7-0.9, e.g. pipe.set_ip_adapter_scale(0.8)

2. OUTFIT IGNORED (wrong clothes):
   → Decrease the IP-Adapter scale to 0.4-0.5, e.g. pipe.set_ip_adapter_scale(0.4)
   → Increase guidance_scale to 9-12

3. POSE INCORRECT (body position wrong):
   → Increase controlnet_conditioning_scale to 0.9-1.0

4. QUALITY ISSUES (blurry/artifacts):
   → Increase num_inference_steps to 40-50
""")


In [None]:
# 🎨 Run the generation!
import time  # Harmless if already imported in an earlier cell

print("🎨 Starting image generation...")
print("   This may take 15-60 seconds depending on your GPU...")
print()

start_time = time.time()

result = pipe(**generation_config)
generated_image = result.images[0]

generation_time = time.time() - start_time
print(f"\n✅ Generation complete in {generation_time:.2f} seconds!")

# Check whether the run meets the 20-second target
if generation_time < 20:
    print("   🏆 SUCCESS: Meets the 20-second performance target!")
else:
    print(f"   ⚠️ Note: Generation took {generation_time:.0f}s (target: <20s on T4 GPU)")


In [None]:
# Display the result comparison
fig, axes = plt.subplots(1, 3, figsize=(18, 10))

axes[0].imshow(face_image)
axes[0].set_title("Input: Face (Identity)", fontsize=14)
axes[0].axis('off')

axes[1].imshow(pose_image)
axes[1].set_title("Input: Pose Reference", fontsize=14)
axes[1].axis('off')

axes[2].imshow(generated_image)
axes[2].set_title("Output: Generated (White Tee + Jeans)", fontsize=14)
axes[2].axis('off')

plt.suptitle("🪞 The Fitting Room - AI Virtual Try-On Result", fontsize=16, fontweight='bold')
plt.tight_layout()
plt.savefig(OUTPUT_DIR / "comparison.png", dpi=150, bbox_inches='tight')
plt.show()

print(f"\n📊 Comparison saved to: {OUTPUT_DIR / 'comparison.png'}")


In [None]:
# 💾 Save the generated image
output_path = OUTPUT_DIR / "result.png"
generated_image.save(output_path)
print(f"🖼️ Result saved to: {output_path.resolve()}")

# Download for Google Colab users
if USE_COLAB:
    from google.colab import files
    files.download(str(output_path))
    print("📥 Download started!")
else:
    print(f"\n📂 Image saved locally at: {output_path.resolve()}")
    print("   Open this path in your file browser to view the result.")


## Phase 7: Tuning & Success Metrics

### 🎛️ Tuning Guide

1. **If the face looks like a caricature:**
   - Reduce the IP-Adapter scale from 0.6 to 0.4 with `pipe.set_ip_adapter_scale(0.4)`
   
2. **If the clothes are ignored:**
   - Increase `guidance_scale` from 7.5 to 9 or 10
   - Make prompt more descriptive
   
3. **If the pose is wrong (body position or proportions):**
   - Increase `controlnet_conditioning_scale` to 0.9 or 1.0
   - Ensure pose image has correct aspect ratio
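
The adjustments above can also be explored systematically instead of one at a time. The following is a minimal sketch, not part of the original notebook: it builds a small grid of parameter combinations whose keys match the keyword arguments of the `generate_variation` helper defined later in this notebook. The candidate values are assumptions picked from the ranges suggested in this guide, and `build_sweep` is a hypothetical helper name.

```python
from itertools import product

# Candidate values (assumptions, chosen from the ranges in the tuning guide).
IP_SCALES = [0.4, 0.6]
CONTROLNET_SCALES = [0.8, 1.0]
GUIDANCE_SCALES = [7.5, 9.0]


def build_sweep(ip_scales, controlnet_scales, guidance_scales, seed=42):
    """Return one keyword-argument dict per parameter combination.

    The seed is held fixed so that differences between images come from
    the tuned parameters rather than from the random noise.
    """
    return [
        {"seed": seed, "ip_scale": ip, "controlnet_scale": cn, "guidance": g}
        for ip, cn, g in product(ip_scales, controlnet_scales, guidance_scales)
    ]


sweep = build_sweep(IP_SCALES, CONTROLNET_SCALES, GUIDANCE_SCALES)
print(f"{len(sweep)} combinations to try")
for kwargs in sweep[:3]:
    print(kwargs)

# In the notebook, each entry could then be passed to the variation helper:
# images = [generate_variation(**kwargs) for kwargs in sweep]
```

Each combination costs one full generation, so on a T4 it is probably worth keeping the grid small.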

### ✅ Success Criteria
- [ ] The generated face is recognizable as the input user
- [ ] The outfit is consistently White T-Shirt + Jeans
- [ ] The generation time is under 20 seconds on a T4 GPU
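
The first criterion is judged by eye in this notebook. One possible way to make it measurable is to compare face embeddings of the input and the generated image with cosine similarity. The sketch below assumes that an embedding for the generated face can be extracted the same way `face_emb` was for the input (that step is not shown here). The function names and the 0.5 threshold are hypothetical placeholders, and a real cut-off would need to be calibrated on your own images.

```python
import math

# Hypothetical cut-off; calibrate on real input/output pairs before relying on it.
SIMILARITY_THRESHOLD = 0.5


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_identity(input_emb, generated_emb, threshold=SIMILARITY_THRESHOLD):
    """Return True when the two embeddings are at least `threshold` similar."""
    return cosine_similarity(input_emb, generated_emb) >= threshold


# Toy vectors standing in for real face embeddings.
print(same_identity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors
print(same_identity([1.0, 0.0], [0.0, 1.0]))            # orthogonal vectors
```

With NumPy arrays such as `face_emb`, passing `face_emb.tolist()` would work with this pure-Python version.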


In [None]:
# 🎲 Helper function to generate variations with different parameters
from typing import Optional


def generate_variation(seed: int, outfit_prompt: Optional[str] = None, ip_scale: float = 0.6,
                       controlnet_scale: float = 0.8, guidance: float = 7.5):
    """
    Generate a variation with different parameters.

    Note: when the IP-Adapter is loaded, ip_scale is applied with
    pipe.set_ip_adapter_scale() and stays in effect after this call returns.

    Args:
        seed: Random seed for reproducibility
        outfit_prompt: Custom prompt for different outfits
        ip_scale: IP-Adapter scale (0.0-1.0, controls identity strength)
        controlnet_scale: ControlNet scale (0.0-1.0, controls pose adherence)
        guidance: Guidance scale (5-15, controls prompt adherence)

    Returns:
        PIL Image of the generated result
    """
    # Copy the base configuration
    config = generation_config.copy()

    # Override parameters
    config["generator"] = torch.Generator(device=device).manual_seed(seed)
    config["controlnet_conditioning_scale"] = controlnet_scale
    config["guidance_scale"] = guidance

    # Use custom prompt if provided
    if outfit_prompt:
        config["prompt"] = outfit_prompt

    # Update IP-Adapter scale if loaded
    if IP_ADAPTER_LOADED:
        pipe.set_ip_adapter_scale(ip_scale)
        # Ensure the face image is in the config
        config["ip_adapter_image"] = face_for_ip_adapter

    # Generate the image
    result = pipe(**config)
    return result.images[0]

print("✅ Variation helper function defined")
print("""
Usage examples:
    # Generate with different seed
    img = generate_variation(seed=123)

    # Generate with different outfit
    img = generate_variation(seed=42, outfit_prompt="person wearing a red dress")

    # Adjust identity preservation
    img = generate_variation(seed=42, ip_scale=0.8)  # Stronger identity
    img = generate_variation(seed=42, ip_scale=0.4)  # Weaker identity, better outfit
""")


In [None]:
# Generate 3 variations with different seeds
print("🎲 Generating variations with different seeds...")
variations = []
seeds = [42, 123, 456]

for seed in seeds:
    print(f"   Generating with seed {seed}...")
    var_img = generate_variation(seed)
    variations.append(var_img)
    var_img.save(OUTPUT_DIR / f"variation_seed_{seed}.png")

print("✅ Variations complete!")


In [None]:
# Display all variations
fig, axes = plt.subplots(1, 3, figsize=(18, 10))

for i, (img, seed) in enumerate(zip(variations, seeds)):
    axes[i].imshow(img)
    axes[i].set_title(f"Variation (Seed: {seed})", fontsize=14)
    axes[i].axis('off')

plt.suptitle("🎲 Generated Variations", fontsize=16, fontweight='bold')
plt.tight_layout()
plt.savefig(OUTPUT_DIR / "variations.png", dpi=150, bbox_inches='tight')
plt.show()


## 🎁 Bonus: Try Different Outfits

Test with different clothing descriptions to verify the system works for various outfits.
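
The test prompts in this section share the same wording around the garment list, so they could also be generated from a template. The snippet below is a small sketch under that assumption. `PROMPT_TEMPLATE` and `build_outfit_prompt` are hypothetical names that do not exist elsewhere in the notebook, and the example garments are made up for illustration.

```python
# Template mirroring the shared wording of the outfit prompts in this notebook.
PROMPT_TEMPLATE = (
    "A photorealistic full-body shot of a young woman wearing {garments}, "
    "standing in {setting}, natural lighting, 8k resolution, masterpiece"
)


def build_outfit_prompt(garments, setting="a bright modern studio"):
    """Compose a full prompt from a list of garment descriptions."""
    if not garments:
        raise ValueError("at least one garment is required")
    return PROMPT_TEMPLATE.format(garments=", ".join(garments), setting=setting)


example_prompt = build_outfit_prompt(
    ["a red knit sweater", "grey wool trousers", "brown leather boots"]
)
print(example_prompt)
```

The resulting string can be passed as `outfit_prompt` to `generate_variation`, which keeps the pose and identity inputs unchanged while only the clothing description varies.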


In [None]:
# Different outfit prompts to test
outfits = {
    "black_dress": """A photorealistic full-body shot of a young woman wearing an elegant black midi dress,
    black high heels, standing in a bright modern studio, natural lighting, 8k resolution, masterpiece""",

    "casual_summer": """A photorealistic full-body shot of a young woman wearing a floral summer dress,
    white sandals, standing in a bright sunny studio, natural lighting, 8k resolution, masterpiece""",

    "business_casual": """A photorealistic full-body shot of a young woman wearing a beige blazer,
    white blouse, black dress pants, black heels, standing in a modern office setting, 8k resolution, masterpiece""",

    "sporty": """A photorealistic full-body shot of a young woman wearing a navy blue sports hoodie,
    black yoga pants, white running shoes, standing in a modern gym, natural lighting, 8k resolution, masterpiece"""
}

print("Available test outfits:")
for name in outfits:  # Iterate over keys only so the global `prompt` is not overwritten
    print(f"  🏷️ {name}")


In [None]:
# Uncomment to generate an alternative outfit
# outfit_name = "black_dress"
# print(f"🎨 Generating '{outfit_name}' outfit...")
# alt_result = generate_variation(seed=42, outfit_prompt=outfits[outfit_name])
# alt_result.save(OUTPUT_DIR / f"{outfit_name}.png")
#
# plt.figure(figsize=(8, 12))
# plt.imshow(alt_result)
# plt.title(f"Alternative Outfit: {outfit_name}", fontsize=14)
# plt.axis('off')
# plt.show()
# print(f"✅ Saved to: {OUTPUT_DIR / f'{outfit_name}.png'}")


---

## ✅ POC Complete!

### 📊 Summary

This proof of concept demonstrates:
1. **Identity Preservation** using IP-Adapter with face embeddings from InsightFace
2. **Pose Control** using ControlNet with OpenPose skeleton detection
3. **Outfit Generation** using descriptive prompts with SDXL 1.0

### 📁 Files Generated
- `outputs/result.png` - Main generated image
- `outputs/pose_map.png` - Extracted pose skeleton
- `outputs/comparison.png` - Side-by-side comparison
- `outputs/variations.png` - Multiple seed variations
- `outputs/variation_seed_<seed>.png` - One image per seed (42, 123, 456)

Run the cell below to download all generated files!


In [None]:
# 📥 Package all generated files
import shutil

# Create a zip of all outputs
zip_path = shutil.make_archive('fitting_room_results', 'zip', OUTPUT_DIR)
print(f"📦 Results packaged: {zip_path}")

# Download for Google Colab users
if USE_COLAB:
    from google.colab import files
    files.download('fitting_room_results.zip')
    print("📥 Download started!")
else:
    print(f"\n📂 All results saved in: {OUTPUT_DIR.resolve()}")
    print(f"📦 Zip archive created: {Path(zip_path).resolve()}")
    print("\nGenerated files:")
    for f in OUTPUT_DIR.glob("*"):
        print(f"   - {f.name}")
