from IPython.display import display, Markdown

display(Markdown(r"""
# DA623 Course Project  
## Instruction-Guided Image Editing with InstructPix2Pix  
**Harsh Vardhan**  
**210104044**

---

### Motivation

InstructPix2Pix caught my attention because of its intuitive approach to image editing: it takes a natural-language instruction rather than 
complex editing tools, masks, or a full caption of the target image. I was fascinated by how large pretrained models like GPT-3 and Stable Diffusion can 
be combined to generate synthetic training data, which is then used to train a model that edits real-world images from instructions alone, 
such as *"Add snow"* or *"Replace the sky with fireworks."*

---

### Historical Context & Multimodal Connections

Multimodal learning has evolved rapidly, especially at the intersection of language and vision. Prior works include:

- **Text-guided editing with CLIP** (e.g., StyleCLIP, Text2LIVE)  
- **Diffusion-based generation and editing** (Stable Diffusion, SDEdit)  
- **Instruction tuning in LMs** (InstructGPT, trained with RLHF)

InstructPix2Pix stands out by combining these paradigms into a single model that follows image editing instructions directly, 
without per-image inversion or a full text description of the desired output.

---

### Method Overview

#### Two-Stage Pipeline:

**1. Dataset Generation**  
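- Fine-tune GPT-3 on ~700 human-written (caption, instruction, edited caption) triplets  
- Use the fine-tuned model to generate ~454K (caption, instruction, edited caption) examples from LAION captions  
- Render before/after image pairs using **Stable Diffusion + Prompt-to-Prompt**  
- Filter pairs using **CLIP similarity** so that the change between images matches the change between captions  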
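
The filtering step above can be sketched in a few lines. This is only an illustration, assuming the CLIP embeddings have already been computed; short toy vectors stand in for real embeddings. The function names (`cosine`, `directional_similarity`, `keep_pair`) are hypothetical, and the threshold defaults are assumed placeholder values, not settings taken from the official code.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def directional_similarity(img_before, img_after, cap_before, cap_after):
    # Compare the direction of change in image space with the direction of
    # change in caption space.
    d_img = [a - b for a, b in zip(img_after, img_before)]
    d_cap = [a - b for a, b in zip(cap_after, cap_before)]
    return cosine(d_img, d_cap)

def keep_pair(img_before, img_after, cap_before, cap_after,
              min_image_sim=0.75, min_caption_sim=0.2, min_directional=0.2):
    # Thresholds are assumed placeholder values for illustration only.
    return (
        cosine(img_before, img_after) >= min_image_sim        # images not too different
        and cosine(img_before, cap_before) >= min_caption_sim  # each image matches its caption
        and cosine(img_after, cap_after) >= min_caption_sim
        and directional_similarity(
            img_before, img_after, cap_before, cap_after
        ) >= min_directional                                   # edit goes in the right direction
    )
```

A pair whose image change points the same way as its caption change passes the filter, while a pair whose images are unrelated, or whose edit goes in the opposite direction, is dropped.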

**2. Model Training**  
- Train a latent diffusion model conditioned on image + instruction  
- No need for inversion or per-example optimization  
- Uses **Classifier-Free Guidance (CFG)** with two separate scales:  
    - `s_I`: how closely the output stays faithful to the input image  
    - `s_T`: how strongly the output follows the edit instruction  
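
The two scales combine three noise predictions at each denoising step. The sketch below follows the dual-guidance formula as I understand it from the paper, with plain Python lists standing in for noise tensors. The function name `dual_cfg`, its argument names, and the default scale values are assumptions made for illustration.

```python
def dual_cfg(eps_uncond, eps_image, eps_full, s_I=1.5, s_T=7.5):
    # eps_uncond: noise prediction with neither image nor instruction
    # eps_image:  noise prediction conditioned on the input image only
    # eps_full:   noise prediction conditioned on image and instruction
    # The default scales are assumed example values.
    return [
        u + s_I * (i - u) + s_T * (f - i)
        for u, i, f in zip(eps_uncond, eps_image, eps_full)
    ]

# With both scales at 1.0 the result is just the fully conditioned prediction.
print(dual_cfg([0.0, 1.0], [1.0, 1.0], [2.0, 3.0], s_I=1.0, s_T=1.0))
```

Raising `s_I` pushes the result further along the image-conditioned direction, and raising `s_T` pushes it further along the instruction direction, which is why the two can be tuned independently.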

---

### Key Insights from the Paper & GitHub

- **One-pass editing**: No per-image fine-tuning or iterative optimization  
- **Quick inference (~9s/image)**  
- **Dual CFG** allows fine-grained control over edits  
- **Generalizes well to real user instructions**  
- **Codebase is clean** with a working Gradio demo  

---

### Code / Notebook (Sample Walkthrough)

```bash
# Clone and set up the environment
git clone https://github.com/timothybrooks/instruct-pix2pix.git
cd instruct-pix2pix
conda env create -f environment.yaml
conda activate ip2p
bash scripts/download_checkpoints.sh

# Run a single image edit
python edit_cli.py \
    --input imgs/input.jpg \
    --output imgs/output.jpg \
    --edit "Turn him into a cartoon character"

# Launch the interactive demo
python edit_app.py
```
"""))
