# AI Image Generation with Stable Diffusion

This notebook demonstrates how to generate images from textual descriptions using a pre-trained Stable Diffusion model. We will walk through the entire process, from setting up the environment to loading the model and generating an image from a custom prompt. The model used is `runwayml/stable-diffusion-v1-5`, a widely used Stable Diffusion v1.5 checkpoint.

### 1. Setting Up the Environment

Before we begin, we need to install the necessary Python libraries. The `requirements.txt` file specifies all the dependencies for this project. The command below uses `pip` to install them.

* **`diffusers`**: A core Hugging Face library that provides pre-trained diffusion models and pipelines, like Stable Diffusion.
* **`transformers`**: Another essential Hugging Face library. It provides the CLIP text encoder and tokenizer that the Stable Diffusion pipeline uses to interpret the prompt.
* **`accelerate`**: A library from Hugging Face that optimizes PyTorch code to run efficiently on various hardware setups, including GPUs.
* **`scipy`**: A scientific computing library that is a dependency for some schedulers in the `diffusers` library.
* **`safetensors`**: A secure and fast file format for storing model weights.

In [2]:
!pip install -r requirements.txt



### 2. Importing Necessary Libraries

With the dependencies installed, we can now import the modules we'll use throughout the notebook.

* **`StableDiffusionPipeline`**: The primary class from the `diffusers` library that encapsulates the entire text-to-image generation process.
* **`torch`**: The fundamental deep learning framework (PyTorch) used to perform tensor computations and manage GPU operations.
* **`PIL.Image`**: A module from the Pillow library for opening, manipulating, and saving many different image file formats.
* **`os`**: A standard Python library for interacting with the operating system, which is useful for creating directories.
* **`datetime`**: A library for working with dates and times, which we'll use to create unique filenames for our generated images.

In [2]:
from diffusers import StableDiffusionPipeline
import torch
from PIL import Image
import os
from datetime import datetime

### 3. Hardware Configuration

To ensure fast performance, we first verify that a CUDA-enabled GPU is available. Image generation is resource-intensive, and a GPU is essential for reasonable processing times.

This block will:
* Confirm **GPU availability** and stop with an error if no CUDA device is found.
* Print the GPU name and the installed PyTorch version.
* Print a summary of current GPU memory usage.

The `"cuda"` **device** is selected in the next section, and the `torch.float16` data type is applied when the model is loaded.

In [3]:
import torch

# Fail early if PyTorch cannot see a CUDA-capable GPU.
if not torch.cuda.is_available():
    raise RuntimeError("GPU Unavailable, Ensure PyTorch was installed with CUDA support.")

print("CUDA Available", torch.cuda.is_available())  # Always True once the check above has passed
print("Using device:", torch.cuda.get_device_name())  # Name of the current CUDA device
print("PyTorch version:", torch.__version__)  # Installed PyTorch version
print("GPU Memory Summary:", torch.cuda.memory_summary())  # Allocator statistics for the current device

CUDA Available True
Using device: NVIDIA GeForce RTX 4070 Laptop GPU
PyTorch version: 2.7.1+cu126
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |      0 B   |      0 B   |      0 B   |      0 B   |
|       from large pool |      0 B   |      0 B   |      0 B   |      0 B   |
|       from small pool |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Active memory         |      0 B   |      0 B   |      0 B   |      0 B   |
|       from large pool |      0 B   |      0 B   |      0 B   |      0 B   |
|       from small pool |      0 B   |      

### 4. Configuring the Inference Device

Here, we explicitly set the computational device to `"cuda"`. This string is passed to the pipeline later so that PyTorch places the model on the GPU and runs inference there. The half-precision data type, `torch.float16`, is applied when the model is loaded in the next section. Compared with the default `torch.float32`, it halves the memory needed for the weights and is typically faster on compatible GPUs.

In [4]:
device = "cuda"
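
To make the memory claim concrete, here is a back-of-the-envelope sketch. It assumes roughly 860 million parameters for the UNet, an approximate figure commonly cited for Stable Diffusion v1 and not stated in this notebook. The helper name `weight_memory_gb` is hypothetical, and the estimate covers stored weights only, not activations or the other pipeline components.

```python
# Bytes used to store one value in each floating-point format.
BYTES_PER_VALUE = {"float32": 4, "float16": 2}


def weight_memory_gb(num_parameters: int, dtype_name: str) -> float:
    """Estimate the memory needed for model weights alone, in GB (10**9 bytes)."""
    return num_parameters * BYTES_PER_VALUE[dtype_name] / 1e9


# Assumption: roughly 860 million UNet parameters (approximate, for illustration).
UNET_PARAMETERS = 860_000_000

for name in ("float32", "float16"):
    print(f"{name}: {weight_memory_gb(UNET_PARAMETERS, name):.2f} GB")
```

Under that assumption the weights take about 3.44 GB in `float32` and 1.72 GB in `float16`, which is why half precision matters on GPUs with limited memory.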

### 5. Loading the Stable Diffusion Model

This is where we load the pre-trained **Stable Diffusion v1.5 model** from the Hugging Face Hub.

* `StableDiffusionPipeline.from_pretrained(...)` downloads the model weights and configuration. **Note: The first time you run this, it will download several gigabytes of data.**
* We pass our `model_id` and set `torch_dtype` to `torch.float16` for half precision.
* `pipe.to(device)` moves the entire pipeline into GPU memory.
* The optional `safety_checker` line replaces the built-in content filter with a pass-through function that returns every image unchanged and flags none as unsafe. The filter normally blacks out images it flags, so this prevents blacked-out results when the classifier is overly sensitive.

In [5]:
model_id = "runwayml/stable-diffusion-v1-5"

pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32
)
pipe = pipe.to(device)

#Optional: Turn off safety checker
pipe.safety_checker = lambda images, **kwargs: (images, [False] * len(images))

Fetching 15 files:   0%|          | 0/15 [00:00<?, ?it/s]

model.safetensors:  49%|####9     | 1.18G/2.40G [00:00<?, ?B/s]

diffusion_pytorch_model.safetensors:  23%|##2       | 1.00G/4.44G [00:00<?, ?B/s]

Error while downloading from https://cdn-lfs.hf.co/repos/66/6f/666f465fa70158515404e8de2c6bc6fe2f90c46f9296293aa14daededeb32c52/19da7aaa4b880e59d56843f1fcb4dd9b599c28a1d9d9af7c1143057c8ffae9f1?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27diffusion_pytorch_model.safetensors%3B+filename%3D%22diffusion_pytorch_model.safetensors%22%3B&Expires=1751445740&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc1MTQ0NTc0MH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5oZi5jby9yZXBvcy82Ni82Zi82NjZmNDY1ZmE3MDE1ODUxNTQwNGU4ZGUyYzZiYzZmZTJmOTBjNDZmOTI5NjI5M2FhMTRkYWVkZWRlYjMyYzUyLzE5ZGE3YWFhNGI4ODBlNTlkNTY4NDNmMWZjYjRkZDliNTk5YzI4YTFkOWQ5YWY3YzExNDMwNTdjOGZmYWU5ZjE%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qIn1dfQ__&Signature=qL-5w58vFsWoYLqRl7lg6mfBnGVcmQtFA9TWgUwnKpWnkvk%7ENeIO6vbsLQjXMeLb0LkyLhdwQ28RHXavx1ugbmpFFf5AYxrIIhvUFVSFC42Tdw9ypIOTwrn7C2k-rpDRRbLl%7EPPvtjFYFujwGLSpSCT5gnZp4nInbwrB8tzqiVxQkbBEA3dyRUO7N1%7EF0eh37naGrEjZRqu7S703ahODcF3DutIF9ep3K

diffusion_pytorch_model.safetensors:  37%|###6      | 2.02G/5.46G [00:00<?, ?B/s]

diffusion_pytorch_model.safetensors:  48%|####8     | 3.19G/6.63G [00:00<?, ?B/s]

Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]
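
The conditional `torch_dtype` expression in the loading cell implies a CPU fallback, although the check in section 3 stops the notebook before a CPU run could reach it. As a minimal sketch of how device and precision might be chosen together, the hypothetical helper `choose_device_and_dtype` below returns plain strings, so it can be tried without a GPU. It is an illustration, not part of the original notebook.

```python
from typing import Tuple


def choose_device_and_dtype(cuda_available: bool) -> Tuple[str, str]:
    """Return (device, dtype name) for the given GPU availability.

    Half precision is chosen only on CUDA; float32 is used on CPU.
    """
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"


# In the notebook, the flag would come from torch.cuda.is_available().
for available in (True, False):
    device_name, dtype_name = choose_device_and_dtype(available)
    print(f"cuda_available={available}: device={device_name}, dtype={dtype_name}")
```

In the loading cell, the dtype name could then be resolved with `getattr(torch, dtype_name)` and passed as `torch_dtype`, with the device string passed to `pipe.to(...)`.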

### 6. Generating an Image from a Text Prompt

Now for the creative part! We define a text `prompt` that describes the image we want to create.

* **`prompt`**: The textual description that guides the model. More descriptive prompts generally give more control over the details of the output.
* **`num_inference_steps`**: This parameter controls how many denoising steps the model takes. A higher number can lead to a more refined image but takes longer to generate. A value between 20 and 50 is typically a good balance.

Calling `pipe(...)` with the prompt and other parameters runs the generation process and returns a result object. We access the generated image with `.images[0]` and call `.show()` to open it in the system's default image viewer.

In [19]:
prompt = "A robotic Cat"

num_inference_steps = 30

generated_image = pipe(prompt, num_inference_steps=num_inference_steps).images[0]
generated_image.show()

  0%|          | 0/30 [00:00<?, ?it/s]

Detected locale "C" with character encoding "ANSI_X3.4-1968", which is not UTF-8.
Qt depends on a UTF-8 locale, and has switched to "C.UTF-8" instead.
If this causes problems, reconfigure your locale. See the locale(1) manual
for more information.
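
The `os` and `datetime` imports from section 2 are not used in the cells shown here. As a minimal sketch of how they might be combined to save each result under a unique name, the hypothetical helper `build_output_path` below creates an output directory and returns a timestamped file path. The directory name `outputs` and the `image` prefix are illustrative assumptions.

```python
import os
from datetime import datetime


def build_output_path(output_dir: str, prefix: str = "image") -> str:
    """Create output_dir if needed and return a timestamped PNG path inside it."""
    os.makedirs(output_dir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return os.path.join(output_dir, f"{prefix}_{timestamp}.png")


# "outputs" is an illustrative directory name, not one taken from the notebook.
output_path = build_output_path("outputs")
print(output_path)

# In the notebook, the image could then be written with:
# generated_image.save(output_path)
```

The timestamp has one-second resolution, so two images saved within the same second would share a name. Adding a counter or the prompt text to the prefix is one way around that.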
