## Stable Diffusion Image Generation

- Generating images from text prompts with Stable Diffusion and scoring them with CLIP

### Table of Contents
1. [Hugging Face Login](#section1)
2. [Stable Diffusion Pipeline](#section2)
3. [Image Generation](#section3)
4. [Evaluation of Generated Images](#section4)

In [None]:
!pip install diffusers transformers accelerate scipy safetensors
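# diffusers: Stable Diffusion pipelines and schedulers
# transformers: CLIP text encoder and tokenizer used by the pipeline
# accelerate: faster model loading and device (CPU/GPU) placement
# scipy: numerical routines needed by some schedulers
# safetensors: safe, pickle-free format for model weights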



### 1. Hugging Face Login <a id="section1"></a>

In [None]:
# Log in to Hugging Face
from huggingface_hub import login

login(token="YOUR_HF_TOKEN")  # Paste your token here; never commit a real token
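A token written directly into a notebook can leak when the notebook is shared. A minimal sketch of an alternative, assuming the token is stored in an environment variable named `HF_TOKEN`; the helper names `read_hf_token` and `login_from_env` are hypothetical and not part of `huggingface_hub`:

```python
import os


def read_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token stored in the given environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token


def login_from_env(env_var: str = "HF_TOKEN") -> None:
    """Log in to Hugging Face without writing the token into the notebook."""
    # Imported lazily so read_hf_token works even without huggingface_hub.
    from huggingface_hub import login

    login(token=read_hf_token(env_var))
```

Calling `login_from_env()` in place of the `login(token=...)` line keeps the secret out of the saved notebook.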


In [None]:
import torch
torch.cuda.is_available()


False

### 2. Stable Diffusion Pipeline <a id="section2"></a>

In [None]:
# Loading the Stable Diffusion Pipeline
import torch
from diffusers import StableDiffusionPipeline

# Load with float16 for GPU use
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")


Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
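The `RuntimeError` above is raised because the pipeline is moved to `"cuda"` on a machine where `torch.cuda.is_available()` returned `False`. The sketch below shows one way to fall back to the CPU, assuming that float32 is the safer dtype there; the helper names `choose_device_and_dtype` and `load_pipeline` are hypothetical. CPU inference works but is much slower than on a GPU.

```python
from typing import Tuple


def choose_device_and_dtype(cuda_available: bool) -> Tuple[str, str]:
    """Pick a device and dtype name: float16 on GPU, float32 on CPU."""
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")


def load_pipeline(model_id: str = "runwayml/stable-diffusion-v1-5"):
    """Load the pipeline on whichever device is available."""
    # Imported lazily so choose_device_and_dtype can be used on its own.
    import torch
    from diffusers import StableDiffusionPipeline

    device, dtype_name = choose_device_and_dtype(torch.cuda.is_available())
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=getattr(torch, dtype_name),  # torch.float16 or torch.float32
    )
    return pipe.to(device)
```

With this helper, `pipe = load_pipeline()` replaces the hard-coded `.to("cuda")` call.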

In [None]:
# Folder to save generated images
import os
output_folder = "Generated_images_SD"
os.makedirs(output_folder, exist_ok = True)

### 3. Image Generation <a id="section3"></a>

In [None]:
import pandas as pd
from tqdm import tqdm

# Load CSV
df = pd.read_csv("Image_prompts_modified.csv")

# Loop through prompts
for idx, row in tqdm(df.iterrows(), total=len(df)):
    prompt = row["Prompt"]
    image = pipe(prompt).images[0]
    image.save(os.path.join(output_folder, f"image_{idx+1:03d}.png"))

  0%|          | 0/100 [00:00<?, ?it/s]

Potential NSFW content was detected in one or more images. A black image will be returned instead. Try again with a different prompt and/or seed.
 64%|██████▍   | 64/100 [08:22<04:41,  7.82s/it]

100%|██████████| 100/100 [13:05<00:00,  7.86s/it]
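The NSFW warning in the output above means that at least one saved image is a black placeholder, which would distort any later score for that prompt. The pipeline output exposes an `nsfw_content_detected` field alongside `images`, so flagged prompts can be recorded during generation. A minimal sketch, assuming the safety checker is enabled; the helper names `flagged_positions` and `generate_with_nsfw_log` are hypothetical:

```python
from typing import List, Optional, Sequence


def flagged_positions(nsfw_flags: Optional[Sequence[bool]]) -> List[int]:
    """Return 0-based positions of images that the safety checker flagged."""
    if nsfw_flags is None:  # the field is None when the checker is disabled
        return []
    return [i for i, flagged in enumerate(nsfw_flags) if flagged]


def generate_with_nsfw_log(pipe, prompts: Sequence[str]) -> List[int]:
    """Run each prompt and collect the 1-based numbers of flagged images."""
    flagged = []
    for number, prompt in enumerate(prompts, start=1):
        result = pipe(prompt)
        if flagged_positions(result.nsfw_content_detected):
            flagged.append(number)
    return flagged
```

The returned numbers match the `image_001.png` style numbering used when saving, so the affected prompts can be regenerated with a different seed or excluded from evaluation.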


### 4. Evaluation of Generated Images <a id="section4"></a>

In [None]:
!pip install transformers torch torchvision ftfy regex tqdm


Collecting ftfy
  Downloading ftfy-6.3.1-py3-none-any.whl.metadata (7.3 kB)
Downloading ftfy-6.3.1-py3-none-any.whl (44 kB)
Installing collected packages: ftfy
Successfully installed ftfy-6.3.1


In [None]:
!pip install git+https://github.com/openai/CLIP.git

Collecting git+https://github.com/openai/CLIP.git
  Cloning https://github.com/openai/CLIP.git to /tmp/pip-req-build-fb2d_x_k
  Running command git clone --filter=blob:none --quiet https://github.com/openai/CLIP.git /tmp/pip-req-build-fb2d_x_k
  Resolved https://github.com/openai/CLIP.git to commit dcba3cb2e2827b402d2701e7e1c7d9fed8a20ef1
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: clip
  Building wheel for clip (setup.py) ... done
  Created wheel for clip: filename=clip-1.0-py3-none-any.whl size=1369490 sha256=135b19aee47fd34faee75fc7bca2fa95c6d13d135db1203dc18ca02df85eaca2
  Stored in directory: /tmp/pip-ephem-wheel-cache-6an4snha/wheels/3f/7c/a4/9b490845988bf7a4db33674d52f709f088f64392063872eb9a
Successfully built clip
Installing collected packages: clip
Successfully installed clip-1.0


In [None]:
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

model, preprocess = clip.load("ViT-B/32", device=device)


100%|███████████████████████████████████████| 338M/338M [00:29<00:00, 12.0MiB/s]
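The CLIP score used in this section is the cosine similarity between the image embedding and the text embedding: the dot product of the two vectors divided by the product of their lengths. A small pure-Python illustration of that formula, using made-up two-dimensional vectors rather than real CLIP embeddings:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (not real embeddings): orthogonal, then nearly aligned.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(cosine_similarity([3.0, 4.0], [4.0, 3.0]))
```

Values closer to 1 indicate that the two vectors point in a more similar direction, which is why a higher score is read as better agreement between prompt and image.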


In [None]:

import torch
import clip
from PIL import Image
import pandas as pd
import os

# Load CLIP model and device
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Load your prompts
df = pd.read_csv("Image_prompts.csv").dropna(subset=["Prompt"])

# Initialize results list
clip_scores = []

# Score the first 100 prompts against their generated images
for idx, row in df.head(100).iterrows():
    prompt = row["Prompt"]
    image_path = f"Generated_images_SD/image_{idx+1:03d}.png"

    if not os.path.exists(image_path):
        print(f"Image {image_path} not found, skipping.")
        continue

    # Load and preprocess image and text
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize([prompt]).to(device)

    # Compute similarity
    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        similarity = torch.nn.functional.cosine_similarity(image_features, text_features)

    score = similarity.item()
    clip_scores.append((image_path, prompt, score))
    print(f"[{idx+1}] CLIP Score: {score:.4f}")

# Collect results in a DataFrame and preview the first rows
results_df = pd.DataFrame(clip_scores, columns=["image_path", "Prompt", "Clip_score"])
results_df.head()

[1] CLIP Score: 0.2961
[2] CLIP Score: 0.2986
[3] CLIP Score: 0.3425
[4] CLIP Score: 0.3604
[5] CLIP Score: 0.3755
[6] CLIP Score: 0.3430
[7] CLIP Score: 0.3225
[8] CLIP Score: 0.3142
[9] CLIP Score: 0.3230
[10] CLIP Score: 0.3035
[11] CLIP Score: 0.3452
[12] CLIP Score: 0.3362
[13] CLIP Score: 0.3313
[14] CLIP Score: 0.3171
[15] CLIP Score: 0.3501
[16] CLIP Score: 0.3179
[17] CLIP Score: 0.3425
[18] CLIP Score: 0.3062
[19] CLIP Score: 0.3284
[20] CLIP Score: 0.2859
[21] CLIP Score: 0.2937
[22] CLIP Score: 0.2832
[23] CLIP Score: 0.3042
[24] CLIP Score: 0.3157
[25] CLIP Score: 0.3052
[26] CLIP Score: 0.2573
[27] CLIP Score: 0.2659
[28] CLIP Score: 0.3101
[29] CLIP Score: 0.3730
[30] CLIP Score: 0.3330
[31] CLIP Score: 0.3623
[32] CLIP Score: 0.3435
[33] CLIP Score: 0.3276
[34] CLIP Score: 0.3430
[35] CLIP Score: 0.3562
[36] CLIP Score: 0.3247
[37] CLIP Score: 0.2942
[38] CLIP Score: 0.3191
[39] CLIP Score: 0.3347
[40] CLIP Score: 0.3250
[41] CLIP Score: 0.3181
[42] CLIP Score: 0.3298
[

| | image_path | Prompt | Clip_score |
| --- | --- | --- | --- |
| 0 | Generated_images_SD/image_001.png | A colossal, ancient tree, its bark a swirling ... | 0.296143 |
| 1 | Generated_images_SD/image_002.png | A sleek, chrome-plated spacecraft, resembling ... | 0.298584 |
| 2 | Generated_images_SD/image_003.png | A lone samurai, silhouetted against a blood-or... | 0.342529 |
| 3 | Generated_images_SD/image_004.png | A whimsical teacup city, built on the back of ... | 0.360352 |
| 4 | Generated_images_SD/image_005.png | An underwater library, filled with glowing jel... | 0.375488 |
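Per-image scores are easier to compare across models once they are reduced to a few summary numbers. The sketch below summarizes only the five scores shown in the preview above, not the full run; the helper name `summarize_scores` is hypothetical:

```python
import statistics
from typing import Dict, Sequence


def summarize_scores(scores: Sequence[float]) -> Dict[str, float]:
    """Return the mean, minimum and maximum of a list of CLIP scores."""
    return {
        "mean": statistics.mean(scores),
        "min": min(scores),
        "max": max(scores),
    }


# The five Clip_score values from the preview table above.
first_five = [0.296143, 0.298584, 0.342529, 0.360352, 0.375488]
summary = summarize_scores(first_five)
print({name: round(value, 4) for name, value in summary.items()})
```

On the full DataFrame, `results_df["Clip_score"].describe()` gives a comparable summary, and `results_df.to_csv("clip_scores.csv", index=False)` would persist the scores (the file name here is an assumption).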
