<a href="https://colab.research.google.com/github/mariap2021/Bayesiana/blob/main/Stable_Diffusion.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Week 1 - Stable Diffusion

Stable Diffusion is a text-to-image generative model originally developed by the CompVis group at LMU Munich together with Runway and [Stability AI](https://stability.ai/). It uses diffusion models, a family of deep generative models, to produce realistic images from textual descriptions (called prompts). Unlike earlier models that were often slow or required massive hardware, Stable Diffusion is efficient and openly released, so individuals can run it on consumer-grade GPUs.

**Key features include:**

- 🔤 **Text-to-Image Generation:** You input a sentence like “a cyberpunk city at night,” and it generates a unique image based on that.

- 🧠 **Latent Diffusion:** Instead of working directly in pixel space, it operates in a compressed “latent” space, which makes it faster and more memory-efficient.

- 🛠️ **Customizability:** Users can fine-tune it or use tools like ControlNet, LoRA, and DreamBooth to guide image generation.

- 🌍 **Open Source:** Anyone can download and run it, sparking a large community of artists, developers, and hobbyists.
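To make the memory saving of latent diffusion concrete, here is a small back-of-the-envelope sketch. It assumes the shapes commonly cited for Stable Diffusion v1.x (a 512x512 RGB image and a 4-channel latent that is 8 times smaller per side); the helper name `tensor_size` is ours and not part of any library.

```python
def tensor_size(channels, height, width):
    """Number of values in a channels x height x width tensor."""
    return channels * height * width


# Assumed shapes for Stable Diffusion v1.x (illustrative, not read from the model)
pixel_values = tensor_size(3, 512, 512)             # RGB image in pixel space
latent_values = tensor_size(4, 512 // 8, 512 // 8)  # 4-channel latent, 64x64

print(pixel_values)                   # 786432
print(latent_values)                  # 16384
print(pixel_values // latent_values)  # 48
```

Under these assumptions the denoising network works on about 48 times fewer values per image than it would in pixel space.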


**Fig 1. Applications of Computer Vision** - [Stable Diffusion Explanation](https://www.youtube.com/watch?v=QdRP9pO89MY)
<img src="Computer Vision Applications.png" alt="Computer Vision" width="1000" height="1000">


In [None]:
!pip install huggingface_hub transformers accelerate torch torchvision diffusers pillow numpy "imageio[ffmpeg]"



## Hugging Face

[Hugging Face](https://huggingface.co/) is a company and open-source community that builds tools for natural language processing (NLP), machine learning, and AI research. It’s best known for the Transformers library, which provides access to pre-trained models like BERT, GPT, T5, and RoBERTa. Its companion Diffusers library, used in this notebook, does the same for diffusion models such as Stable Diffusion.

**Key features:**

- 🧠 **Transformers Library:** Easy-to-use interface for working with state-of-the-art models for tasks like text classification, translation, question answering, and more.

- 🌐 **Model Hub:** A massive online repository ([https://huggingface.co](https://huggingface.co)) where you can find, upload, and share thousands of pre-trained models.

- 🛠️ **Trainer API:** High-level API for training and fine-tuning models with just a few lines of code.

- 🤗 **Community & Open Source:** Actively maintained by a huge community of researchers, developers, and enthusiasts.

- 📦 **Integration:** Works seamlessly with PyTorch, TensorFlow, and JAX.

In [None]:
# Create a file named AUTH.json containing {"API_KEY": "My API KEY"}
import json

with open('./AUTH.json', 'r') as file:
    creds = json.load(file)

In [None]:
from huggingface_hub import login

# Put here your personal token -> login("MY_API_KEY")
login(creds["API_KEY"])
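As an optional variation on the two cells above, the token lookup can be wrapped in a helper that prefers an environment variable and falls back to the JSON file. This is only a sketch: the function name `load_hf_token` and its defaults are our own choices, and the `HF_TOKEN` variable name is an assumption you can change.

```python
import json
import os
from pathlib import Path


def load_hf_token(auth_path="AUTH.json", env_var="HF_TOKEN"):
    """Return the token from the environment if set, otherwise from the JSON file."""
    token = os.environ.get(env_var)
    if token:
        return token
    path = Path(auth_path)
    if not path.exists():
        raise FileNotFoundError(
            f"Set {env_var} or create {auth_path} with an API_KEY entry"
        )
    with path.open("r") as file:
        return json.load(file)["API_KEY"]
```

The result can then be passed to `login(...)` in place of `creds["API_KEY"]`, which keeps the token out of the notebook itself.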

## Stable Diffusion v1.5

[**Stable Diffusion v1.5**](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) is a latent text-to-image diffusion model capable of generating photo-realistic images from textual descriptions. It was initialized from the v1.2 checkpoint and fine-tuned for 595,000 steps at a resolution of 512x512 on the "laion-aesthetics v2 5+" dataset. To improve classifier-free guidance sampling, the text conditioning was dropped for 10% of the training samples.
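The two ideas in this paragraph can be sketched in a few lines of plain Python. This is a toy illustration, not the actual training or sampling code: `drop_conditioning` mimics replacing some prompts with an empty string during training, and `guided_noise` applies the usual classifier-free guidance combination to two noise predictions. Both function names and the toy numbers are our own.

```python
import random


def drop_conditioning(prompts, drop_prob=0.1, seed=0):
    """Replace each prompt with an empty string with probability drop_prob."""
    rng = random.Random(seed)
    return ["" if rng.random() < drop_prob else p for p in prompts]


def guided_noise(noise_uncond, noise_text, guidance_scale=7.5):
    """Classifier-free guidance: move from the unconditional prediction
    toward the text-conditioned one, scaled by guidance_scale."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


# Toy noise predictions for two values of a latent
print(guided_noise([0.0, 1.0], [1.0, 1.0]))  # [7.5, 1.0]
```

With `guidance_scale=1.0` the result equals the text-conditioned prediction; larger values push the sample further toward the prompt.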

---

### ⚙️ Key Features

- **Model Type:** Latent Diffusion Model (LDM)  
- **Text Encoder:** CLIP ViT-L/14  
- **Architecture Components:** Variational Autoencoder (VAE), U-Net, and a fixed text encoder  
- **Training Data:** Images from the "laion-aesthetics v2 5+" dataset  
- **License:** CreativeML Open RAIL-M  

---

### ⚠️ Limitations

- **Biases:** The model may reflect societal biases present in the training data (e.g., gender, race, age).  
- **NSFW Content:** Capable of generating not safe for work (NSFW) or offensive content, including nudity or violence.  
- **Hallucinations:** It may generate inaccurate or unrealistic representations, especially for abstract prompts.  
- **No Real-World Understanding:** The model does not understand the world; it only generates patterns learned from data.  
- **Ethical Risks:** Potential misuse includes deepfakes, misinformation, or offensive imagery.  

---

### 📜 License: CreativeML Open RAIL-M

- **Permissive Use:** Free for commercial and non-commercial use.  
- **Restrictions:**  
  - You cannot use the model to deliberately create or disseminate harmful, illegal, or offensive content.  
  - When you redistribute the model or a derivative, you must keep the existing copyright and attribution notices.  
- **Modifications:** You can modify the model, but redistribution must include the same license terms.

> This license aims to balance open access with responsible use.

In [None]:
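from diffusers import StableDiffusionPipeline
import torch

# Use the GPU with half precision when available; otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

pipe_1 = StableDiffusionPipeline.from_pretrained(
    "sd-legacy/stable-diffusion-v1-5", torch_dtype=dtype
).to(device)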


## Applications of Stable Diffusion

1. **Text-to-Image Generation**  
Turn a prompt into a unique image.

*Example:*  
Prompt → "A futuristic city at sunset"  
→ Generates a visually rich image of that scene.

2. **Image-to-Image Transformation (img2img)**  
Modify existing images based on new prompts.

*Example:*  
Turn a sketch into a detailed artwork.

3. **Inpainting (Image Editing)**  
Fill in or edit specific parts of an image using a prompt.

*Example:*  
Remove objects, change backgrounds, or "paint" missing parts.


## 🖼️ 1. Text-to-Image Generation – My Golden Retriever 🐕

One warm evening, as the sun dipped low on the horizon, I imagined a beautiful scene and typed into the AI:  
*"a golden retriever running in a field during sunset, 4k photo, motion blur, natural lighting."*

Almost instantly, an image appeared: a golden retriever sprinting joyfully across a vast open field. The fading sunlight cast a warm glow over the scene, highlighting the dog’s flowing fur. Motion blur captured the speed and energy of his run, while the natural light bathed everything in soft, golden hues.

With this simple prompt, I was able to bring to life a moment of pure happiness and freedom, capturing not just a dog, but a feeling.

Thanks to text-to-image generation, memories and dreams like this can be created again and again, just by describing them in words.

<img src="golden retriever.png" alt="Golden Retriever" width="300" height="300">

In [None]:
# Generate image
prompt = "a golden retriever running in a field during sunset, 4k photo, motion blur, natural lighting"
image = pipe_1(prompt, num_inference_steps=50).images[0]

# Show and save image
image.show()
image.save("golden retriever.png")
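The prompt above follows a simple pattern: a subject followed by comma-separated style modifiers. As a small sketch of that pattern (the helper `build_prompt` is hypothetical and not part of diffusers), the same string can be assembled from its parts, which makes it easy to swap modifiers while experimenting:

```python
def build_prompt(subject, modifiers):
    """Join a subject and optional style modifiers into one comma-separated prompt."""
    parts = [subject.strip()] + [m.strip() for m in modifiers if m.strip()]
    return ", ".join(parts)


prompt = build_prompt(
    "a golden retriever running in a field during sunset",
    ["4k photo", "motion blur", "natural lighting"],
)
print(prompt)
# a golden retriever running in a field during sunset, 4k photo, motion blur, natural lighting
```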


## InstructPix2Pix

**InstructPix2Pix** is an image editing model that modifies images according to natural language instructions. Developed by Tim Brooks, Aleksander Holynski, and Alexei A. Efros, it is a fine-tuned Stable Diffusion model trained on editing examples that were generated with GPT-3 and Stable Diffusion, so it can apply edits described in plain English.

---

### ⚙️ Key Features

- **Instruction-Based Editing:** Modify images by providing textual prompts (e.g., "make the sky sunset-colored").  
- **No Per-Image Fine-Tuning:** Performs edits in the forward pass, without per-example fine-tuning or inversion.  
- **Rapid Processing:** Generates edited images in seconds.  
- **Versatile Applications:** Suitable for tasks like object addition/removal, style changes, and more.

---

### ⚠️ Limitations

- **Biases in Output:** May reflect societal biases present in the training data.  
- **Limited Spatial Transformations:** Not ideal for significant spatial changes like altering camera angles or object positions.  
- **Research-Oriented Weights:** The provided model weights are intended for research purposes and may not be suitable for commercial applications without additional safety measures.

---

### 📜 License

- **MIT License:** Permissive license allowing for use, modification, and distribution.  
- **Use-Based Restrictions:** Prohibits use of the model and its derivatives for unlawful purposes or in violation of applicable laws.


In [None]:
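from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image
import torch

# Use the GPU with half precision when available; otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

pipe_2 = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=dtype
).to(device)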


## 🖼️ 2. Image-to-Image Transformation – *I Wanna Sell My Wallet*

I had a simple photo of my old leather wallet, worn but full of character. I wanted to make it look fresh and appealing for an online sale. So, I used an instruction-based image-to-image model.

By feeding the original photo and a prompt like *"Extract the wallet of the image and put it in a white background"*, the AI transformed it beautifully. The wrinkles softened, the leather gleamed under bright light, and even the zipper looked pristine.

This powerful tool helped me turn a casual snapshot into a professional-looking product photo, ready to attract buyers.

With image-to-image editing, selling something online doesn’t mean you need expensive equipment or perfect photos, just smart AI and a clear vision.

<img src="wallet.jpeg" alt="My Wallet" width="300" height="300">

In [None]:
# Load an image to modify
init_image = Image.open("wallet.jpeg").convert("RGB").resize((512, 512))

# Define your prompt
prompt = "Extract the wallet of the image and put it in a white background"

# Run the instruction-based edit. This pipeline has no `strength` input;
# `image_guidance_scale` controls how closely the result follows the input image.
image = pipe_2(prompt=prompt, image=init_image, image_guidance_scale=1.5, guidance_scale=7.5, num_inference_steps=50).images[0]

# Save and show the result
image.save("wallet without background.jpg")
image.show()
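The InstructPix2Pix pipeline in diffusers exposes two guidance settings, `guidance_scale` for the text instruction and `image_guidance_scale` for the input image. The sketch below shows, on toy numbers, how the InstructPix2Pix paper combines three noise predictions with those two scales. The function name and the example values are illustrative assumptions, not the library's internal code.

```python
def instruct_pix2pix_guidance(e_uncond, e_image, e_full,
                              image_scale=1.5, text_scale=7.5):
    """Combine three noise predictions:
    e_uncond: no image, no instruction
    e_image:  input image only
    e_full:   input image and instruction
    """
    return [
        u + image_scale * (i - u) + text_scale * (f - i)
        for u, i, f in zip(e_uncond, e_image, e_full)
    ]


# Toy one-value example
print(instruct_pix2pix_guidance([0.0], [1.0], [2.0]))  # [9.0]
```

Raising `image_scale` keeps the result closer to the original photo, while raising `text_scale` makes the edit follow the instruction more strongly.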

### The result
<img src="wallet without background.jpg" alt="My modified Wallet" width="300" height="300">

## 2.1 Improving my prompt

In [None]:
# Load an image to modify
init_image = Image.open("wallet.jpeg").convert("RGB").resize((512, 512))

# Define your prompt
prompt = "Isolate the wallet and place it on a plain white background. Make it look like a professional product photo."

# Run the instruction-based edit. This pipeline has no `strength` input;
# `image_guidance_scale` controls how closely the result follows the input image.
image = pipe_2(prompt=prompt, image=init_image, image_guidance_scale=1.5, guidance_scale=7.5, num_inference_steps=50).images[0]

# Save and show the result
image.save("wallet without background with a optimized prompt.jpg")
image.show()


### The result
<img src="wallet without background with a optimized prompt.jpg" alt="My wallet edited with the optimized prompt" width="300" />

## 🎥 Creating My Futuristic City in a Video

I dreamed of building a breathtaking vision of the future: a cityscape bathed in the gentle light of sunrise. To bring this idea to life, I generated a sequence of AI keyframes and blended them into a video, starting from the prompt:  
*"A futuristic city at sunrise, cinematic, high detail, 8k."*

The AI generated a stunning video sequence showcasing towering skyscrapers with sleek designs, glowing softly in the early morning light. The cinematic angles and rich detail made every element pop: the reflections on glass, the subtle haze, and the vibrant colors of dawn.

Watching the city come alive on screen felt like stepping into a sci-fi movie. This technology allowed me to transform a simple description into a dynamic, immersive experience, a vision of the future unfolding frame by frame.

Thanks to AI-powered video generation, creating complex and vivid scenes is no longer limited by traditional tools or budgets. Now, imagination is truly the only limit.

In [None]:
import os
import imageio.v2 as imageio  # v2 API: imread/mimsave without deprecation warnings
import numpy as np
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Create a folder to store the keyframes
os.makedirs("frames", exist_ok=True)

# Build an img2img pipeline that reuses the weights already loaded in pipe_1.
# The text-to-image pipeline has no `image` or `strength` inputs, so it cannot
# continue from a previous frame on its own.
pipe_img2img = StableDiffusionImg2ImgPipeline(**pipe_1.components)

# Initial image from text
initial_prompt = "A futuristic city at sunrise, cinematic, high detail, 8k"
init_image = pipe_1(initial_prompt, guidance_scale=7.5, num_inference_steps=50).images[0]
init_image.save("frames/frame_000.png")

# Prompt sequence (you can change the theme gradually)
prompts = [
    "A futuristic city at morning",
    "A futuristic city at noon",
    "A futuristic city at evening",
    "A futuristic city at sunset",
    "A futuristic city at night"
]

# Generate each keyframe from the previous one with img2img
prev_image = init_image
for i, prompt in enumerate(prompts):
    print(f"Generating frame {i+1} - Prompt: {prompt}")

    # Make sure the input is a 512x512 RGB image
    image = prev_image.resize((512, 512)).convert("RGB")

    new_image = pipe_img2img(
        prompt=prompt,
        image=image,
        strength=0.6,  # Raise or lower the strength for more or less change
        guidance_scale=7.5,
        num_inference_steps=50
    ).images[0]

    frame_name = f"frames/frame_{i+1:03d}.png"
    new_image.save(frame_name)

    # Use this image as the base for the next one
    prev_image = new_image

def interpolate_frames(img1, img2, steps=5):
    # Convert the NumPy arrays to PIL images and cross-fade between them
    img1 = Image.fromarray(img1)
    img2 = Image.fromarray(img2)
    return [Image.blend(img1, img2, alpha=i/steps) for i in range(1, steps)]

frames = []
for i in range(len(prompts)):
    frame1 = imageio.imread(f"frames/frame_{i:03d}.png")
    frame2 = imageio.imread(f"frames/frame_{i+1:03d}.png")

    frames.append(frame1)
    transition_frames = interpolate_frames(frame1, frame2, steps=5)
    frames.extend([np.array(f) for f in transition_frames])

# Add the last frame
frames.append(imageio.imread(f"frames/frame_{len(prompts):03d}.png"))

imageio.mimsave("a_futuristic_city_video.mp4", frames, fps=10)
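To get a feel for how the cross-fade settings affect the final clip, the frame arithmetic can be worked out without generating any images. This is a small sketch with hypothetical helper names (`blend_alphas`, `total_frames`); it mirrors the blending loop, where each pair of keyframes contributes the first keyframe plus `steps - 1` blended frames, and the last keyframe is appended at the end.

```python
def blend_alphas(steps=5):
    """Blend weights used between two keyframes (endpoints excluded)."""
    return [i / steps for i in range(1, steps)]


def total_frames(num_keyframes, steps=5):
    """Frames in the final video: each transition adds one keyframe plus
    its blended frames, and the last keyframe is appended once."""
    transitions = num_keyframes - 1
    return transitions * (1 + len(blend_alphas(steps))) + 1


print(blend_alphas(5))         # [0.2, 0.4, 0.6, 0.8]
print(total_frames(6, 5))      # 26
print(total_frames(6, 5) / 10) # 2.6 seconds at 10 fps
```

With one initial image and five prompts (six keyframes), the video has 26 frames, so it lasts 2.6 seconds at 10 fps. Increasing `steps` gives smoother but longer transitions.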


## Activities

1. Create your own image with a nifty prompt (free topic)
2. Take a picture of your own and modify it with an optimized prompt (free topic), e.g., change the environment or style, remove the background, etc.
3. Make your own video with a sequence prompt (free topic)

## 🌐 Other Resources about Stable Diffusion (Beyond Hugging Face)

### 🧠 1. [Stability AI (Official)](https://stability.ai)
- Company behind the official Stable Diffusion model releases.
- Offers research papers, blog posts, updates, and commercial tools.

---

### 🖼️ 2. [DreamStudio](https://dreamstudio.ai)
- Web-based tool developed by Stability AI.
- Generate images from text with adjustable parameters (steps, guidance scale, etc.).

---

### 📦 3. GitHub Repositories
- [CompVis/stable-diffusion](https://github.com/CompVis/stable-diffusion): The original open-source release.
- [AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui): Popular web UI for running Stable Diffusion locally with extensive features.

---

### 📚 4. Research Papers
- [Original Paper on arXiv](https://arxiv.org/abs/2112.10752): *High-Resolution Image Synthesis with Latent Diffusion Models*.
- Explore new innovations like SDXL, ControlNet, and LoRA via [arXiv.org](https://arxiv.org).

---

### 🌐 5. [Civitai](https://civitai.com)
- Community platform for sharing and downloading custom models, LoRAs, embeddings, and fine-tuned styles for Stable Diffusion.

---

### 🧰 6. [Runway ML](https://runwayml.com)
- User-friendly platform for creative AI tools.
- Offers text-to-image, image-to-video, and more, including Stable Diffusion-based models.

---

### 💬 7. Community Forums
- [r/StableDiffusion](https://www.reddit.com/r/StableDiffusion/)
- [r/StableDiffusionAI](https://www.reddit.com/r/StableDiffusionAI/)
- Many GitHub and tool websites link to their own Discord servers for support and discussion.



# References

1. Stability AI, “Stability.Ai,” Stability.Ai, 2024. https://stability.ai/
2. Hugging Face, “Hugging Face – On a mission to solve NLP, one commit at a time,” huggingface.co, 2024. https://huggingface.co/
3. Render Realm, “Stable Diffusion explained (in less than 10 minutes),” YouTube, Mar. 29, 2024. https://www.youtube.com/watch?v=QdRP9pO89MY (accessed Nov. 26, 2024).
4. “timbrooks/instruct-pix2pix · Hugging Face,” Huggingface.co, 2025. https://huggingface.co/timbrooks/instruct-pix2pix (accessed Jun. 03, 2025).