In [None]:
"""
Experiment 8: Text-to-Image Generation using Stable Diffusion

Goal:
- Generate images from text prompts
- Use open-source Stable Diffusion
- Industry-style clean implementation

What is Stable Diffusion?

Stable Diffusion is an open-source AI model that generates images from text
descriptions by gradually removing noise until a realistic picture emerges.

Why is it called “Diffusion”?

Because the method has two halves:

- During training, noise is gradually added to images.
- The model learns to remove that noise step by step.

So it “diffuses” data into noise, then learns to reverse (“de-diffuse”) the process.

"""

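# A minimal sketch of the "noise is added" half of diffusion, assuming the
# commonly used closed-form step x_t = sqrt(a) * x_0 + sqrt(1 - a) * noise,
# where `a` (the fraction of signal kept) shrinks as the timestep grows.
# `add_noise` and `signal_kept` are illustrative names, not part of diffusers,
# and the real model works on latent tensors rather than Python lists.

```python
import math
import random


def add_noise(x0, signal_kept, rng):
    """Blend clean values with Gaussian noise; signal_kept is in [0, 1]."""
    keep = math.sqrt(signal_kept)
    noise_weight = math.sqrt(1.0 - signal_kept)
    return [keep * v + noise_weight * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)                        # fixed seed for repeatability
clean = [1.0, -1.0, 0.5, 0.0]
slightly_noisy = add_noise(clean, 0.99, rng)  # early timestep: mostly signal
pure_noise = add_noise(clean, 0.0, rng)       # final timestep: signal is gone
unchanged = add_noise(clean, 1.0, rng)        # no noise added at all
```

# Generation runs this in reverse: it starts from pure noise and removes a
# little of it at every inference step.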
# -------------------------------------------------
# Step 1: Import Required Libraries
# -------------------------------------------------
# PyTorch is a deep learning library for building and running neural networks.
# It handles tensors (multidimensional arrays) efficiently on CPU and GPU.
# Stable Diffusion uses it for model inference, tensor operations, and GPU acceleration.
import torch

# diffusers is a Hugging Face library for diffusion models.
# StableDiffusionPipeline is a ready-to-use class that:
# - loads the Stable Diffusion model,
# - handles text-to-image generation,
# - converts prompts → images without manually wiring up the U-Net, scheduler, or tokenizer.
from diffusers import StableDiffusionPipeline

# PIL (Python Imaging Library, installed as Pillow) handles images in Python.
# The pipeline returns PIL images, which provide:
# - image.save() for saving generated images,
# - image.show() for displaying them.
from PIL import Image                # Image handling

# -------------------------------------------------
# Step 2: Device Configuration
# -------------------------------------------------
"""
Industry practice:
- Automatically use GPU if available
- Fall back to CPU otherwise
"""

device = "cuda" if torch.cuda.is_available() else "cpu"
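# The fallback rule above also decides the precision used in Step 3. A minimal
# sketch of that pairing as one function; `pick_device_and_dtype` is a
# hypothetical helper, and it takes a plain boolean (instead of calling
# torch.cuda.is_available()) so the rule can be checked without a GPU.

```python
def pick_device_and_dtype(cuda_available):
    """Return (device, dtype_name) using the same rule as this experiment."""
    if cuda_available:
        return "cuda", "float16"   # half precision to save GPU memory
    return "cpu", "float32"        # standard precision on CPU


gpu_choice = pick_device_and_dtype(True)
cpu_choice = pick_device_and_dtype(False)
```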

# -------------------------------------------------
# Step 3: Load Stable Diffusion Model
# -------------------------------------------------
"""
Model choice:
- runwayml/stable-diffusion-v1-5
- Widely used
- Open-source
- Production-tested

⚠️ First run downloads ~4GB model
"""


model_id = "runwayml/stable-diffusion-v1-5"

# 1. model_id is the identifier of the pretrained Stable Diffusion model on Hugging Face.
#    v1-5 is a widely used, open-source, production-ready model.
#    The model weights (~4GB) are downloaded on the first run.

pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32  # float16 saves GPU memory; float32 is the standard precision on CPU
)

# 2. from_pretrained() downloads the model (on the first run) and loads it into memory.
#    The pipeline handles:
#    - text encoding (CLIP tokenizer and encoder),
#    - noise generation and iterative denoising,
#    - scheduler logic for the inference steps.

pipe = pipe.to(device)

# 3. Moves the entire pipeline (model weights) to the GPU or CPU.
#    Essential because the weights and the computation must be on the same device.
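
# Why float16 on the GPU: each parameter takes 2 bytes instead of 4, so the
# weights need roughly half the memory. A rough back-of-the-envelope sketch;
# the parameter count below is an assumed round figure for illustration, not a
# measured value, and memory used during generation is ignored.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weights_size_gb(num_params, dtype_name):
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype_name] / 1e9


approx_params = 1_000_000_000  # assumption: round figure, not a real count
fp32_gb = weights_size_gb(approx_params, "float32")
fp16_gb = weights_size_gb(approx_params, "float16")
print(f"float32: {fp32_gb:.1f} GB, float16: {fp16_gb:.1f} GB")
```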

# -------------------------------------------------
# Step 4: Define Text Prompt
# -------------------------------------------------
"""
Prompt wording strongly affects the result, so prompt engineering matters
a lot in real products. Teams typically iterate over many prompt variants.
"""

prompt = (
    "A futuristic city at sunset, ultra realistic, "
    "cinematic lighting, high detail, 4k resolution"
)
# prompt is a single string: adjacent string literals inside the parentheses are
# joined automatically (implicit concatenation), which keeps long prompts readable.
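
# A small check of the implicit concatenation used above, with a reminder that
# Python inserts nothing between the pieces, so each piece needs its own
# trailing space. The variable names here are illustrative only.

```python
joined = (
    "A futuristic city at sunset, ultra realistic, "
    "cinematic lighting, high detail, 4k resolution"
)
single = (
    "A futuristic city at sunset, ultra realistic, cinematic lighting, "
    "high detail, 4k resolution"
)

# Forgetting the trailing space silently glues two phrases together.
missing_space = (
    "ultra realistic,"
    "cinematic lighting"
)
```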


# -------------------------------------------------
# Step 5: Generate Image
# -------------------------------------------------
"""
Key parameters:
- guidance_scale → how strongly model follows prompt
- num_inference_steps → quality vs speed tradeoff
"""

image = pipe(
    prompt=prompt,
    guidance_scale=7.5,        # Higher = more prompt adherence
    num_inference_steps=40     # More steps = usually better quality, but slower
).images[0]

# 1. pipe(prompt=prompt, ...) runs the pipeline:
#    - Text → embedding (CLIP text encoder).
#    - Generate random noise (in latent space).
#    - Iteratively denoise it to match the prompt.
#    - Decode the result into an image.
#
# 2. guidance_scale=7.5
#    - Controls how strongly the generated image follows the prompt.
#    - Higher → more faithful to the text.
#    - Lower → more creativity / randomness.
#    - Typical range: 5–15.
#
# 3. num_inference_steps=40
#    - Number of denoising iterations.
#    - More steps = higher quality but slower.
#    - Fewer steps = faster but may be blurry or low-quality.
#    - Common range: 25–50.
#
# 4. .images[0]
#    - The pipeline output holds a list of generated images in .images.
#    - We pick the first image in the batch.
#    - Batch size defaults to 1.
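
# How guidance_scale is typically applied: with classifier-free guidance, each
# denoising step makes two noise predictions (one with an empty prompt, one
# with the real prompt) and extrapolates from the first toward the second.
# A toy sketch on plain lists, assuming that formulation; `apply_guidance` is
# a hypothetical name, and the real pipeline does this on tensors internally.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Move the prediction from the unconditional one toward the prompt."""
    return [
        u + guidance_scale * (t - u)
        for u, t in zip(noise_uncond, noise_text)
    ]


uncond = [0.0, 1.0, 2.0]   # toy prediction with an empty prompt
text = [1.0, 1.0, 0.0]     # toy prediction conditioned on the prompt

no_guidance = apply_guidance(uncond, text, 1.0)  # equals the text prediction
strong = apply_guidance(uncond, text, 7.5)       # exaggerates the difference
```

# With a scale of 1.0 the prompt-conditioned prediction is used as-is; larger
# values amplify whatever the prompt changed, which is why very high scales
# can look over-saturated.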

# -------------------------------------------------
# Step 6: Save and Display Image
# -------------------------------------------------
image.save("generated_image.png")
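image.show()  # Opens the system image viewer; in a hosted notebook this may show nothing, so use display(image) there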

print("✅ Image generated and saved as generated_image.png")

Flax classes are deprecated and will be removed in Diffusers v1.0.0. We recommend migrating to PyTorch classes or pinning your version of Diffusers.
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


model_index.json:   0%|          | 0.00/541 [00:00<?, ?B/s]

Downloading (incomplete total...): 0.00B [00:00, ?B/s]

Fetching 15 files:   0%|          | 0/15 [00:00<?, ?it/s]

Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

Loading weights:   0%|          | 0/196 [00:00<?, ?it/s]

CLIPTextModel LOAD REPORT from: /root/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/451f4fe16113bff5a5d2269ed5ad43b0592e9a14/text_encoder
Key                                | Status     |  | 
-----------------------------------+------------+--+-
text_model.embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


Loading weights:   0%|          | 0/396 [00:00<?, ?it/s]

StableDiffusionSafetyChecker LOAD REPORT from: /root/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/451f4fe16113bff5a5d2269ed5ad43b0592e9a14/safety_checker
Key                                               | Status     |  | 
--------------------------------------------------+------------+--+-
vision_model.vision_model.embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


  0%|          | 0/40 [00:00<?, ?it/s]

✅ Image generated and saved as generated_image.png
