# Stable Diffusion vs. original diffusion (Ho et al.)

```mermaid
graph TD
    subgraph "Original Diffusion Model (Ho et al.)"
        A1[Random Noise] --> B1[Denoising U-Net]
        B1 --> X1[Less Noisy Image]
        X1 --> Y1{Finished?}
        Y1 -->|No| B1
        Y1 -->|Yes| C1[Generated Image]
    end

    subgraph "Stable Diffusion Model"
        A2[Text Prompt] --> B2[Text Encoder]
        B2 --> |Text Embeddings| C2[Denoising U-Net]
        D2[Random Latent Noise] --> C2
        C2 --> E2[Refined Latents]
        E2 --> F2{Finished?}
        F2 -->|No| C2
        F2 -->|Yes| G2[VAE Decoder]
        G2 --> H2[Generated Image]
    end

    classDef input fill:#bbf,stroke:#333,stroke-width:2px;
    classDef process fill:#f9f,stroke:#333,stroke-width:2px;
    classDef output fill:#bfb,stroke:#333,stroke-width:2px;

    class A1,A2,D2 input;
    class B1,B2,C2,G2 process;
    class C1,H2 output;
```
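One practical consequence of the diagram above: the original model denoises directly in pixel space, while Stable Diffusion denoises a smaller latent tensor and only expands it to pixels in the VAE decoder. The sketch below compares the two tensor sizes. The concrete numbers (a 1024×1024 RGB output, a VAE that downsamples each side by 8, and 4 latent channels) are assumptions typical of Stable Diffusion checkpoints, not values stated in the diagram, and the helper names are hypothetical.

```python
def tensor_elements(channels: int, height: int, width: int) -> int:
    """Number of scalar values in a channels x height x width tensor."""
    return channels * height * width


def latent_shape(height: int, width: int,
                 downsample: int = 8, latent_channels: int = 4) -> tuple:
    """Shape of the latent tensor for a given image size.

    downsample=8 and latent_channels=4 are assumed defaults, see the lead-in.
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be divisible by the downsample factor")
    return (latent_channels, height // downsample, width // downsample)


# Pixel-space model: the U-Net sees the full RGB image at every step.
pixel_elements = tensor_elements(3, 1024, 1024)

# Latent-space model: the U-Net sees only the compressed latents.
latent_elements = tensor_elements(*latent_shape(1024, 1024))

print(pixel_elements, latent_elements, pixel_elements // latent_elements)
# 3145728 65536 48
```

Under these assumptions each denoising step touches roughly 48 times fewer values, which is the main reason the loop runs on latents and the VAE decoder is applied once at the end.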

# Core classes

```mermaid
classDiagram
    class StableDiffusionXLPipeline {
        +tokenizer
        +tokenizer_2
        +text_encoder
        +text_encoder_2
        +unet
        +vae
        +scheduler
        +__call__(prompt, negative_prompt, etc.)
        Orchestrates the entire image generation process
    }

    class Tokenizer {
        +encode(text)
        Converts text into numerical tokens
        the model can understand
    }

    class TextEncoder {
        +encode(tokens)
        Transforms tokens into text embeddings
        that capture the prompt's semantics
    }

    class UNet {
        +forward(latents, timestep, context)
        Predicts the noise to remove from the latents,
        conditioned on text embeddings and timestep
    }

    class VAE {
        +encode(image)
        +decode(latents)
        Encodes images to latent space and
        decodes latents back to images
    }

    class Scheduler {
        +set_timesteps(num_inference_steps)
        +step(model_output, timestep, sample)
        Manages the diffusion process,
        controlling noise levels and step sizes
    }

    StableDiffusionXLPipeline --> Tokenizer
    StableDiffusionXLPipeline --> TextEncoder
    StableDiffusionXLPipeline --> UNet
    StableDiffusionXLPipeline --> VAE
    StableDiffusionXLPipeline --> Scheduler
```
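The class diagram can also be read as a set of interfaces. The following is a minimal sketch of that reading using `typing.Protocol`; the method names come from the diagram, while `PipelineComponents`, `validate`, and `WhitespaceTokenizer` are hypothetical names introduced only for illustration. It is not how any particular library defines these classes.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Tokenizer(Protocol):
    def encode(self, text: str) -> Any: ...


@runtime_checkable
class TextEncoder(Protocol):
    def encode(self, tokens: Any) -> Any: ...


@runtime_checkable
class UNet(Protocol):
    def forward(self, latents: Any, timestep: int, context: Any) -> Any: ...


@runtime_checkable
class VAE(Protocol):
    def encode(self, image: Any) -> Any: ...
    def decode(self, latents: Any) -> Any: ...


@runtime_checkable
class Scheduler(Protocol):
    def set_timesteps(self, num_inference_steps: int) -> None: ...
    def step(self, model_output: Any, timestep: int, sample: Any) -> Any: ...


@dataclass
class PipelineComponents:
    """Hypothetical container mirroring the five attributes in the diagram."""
    tokenizer: Any
    text_encoder: Any
    unet: Any
    vae: Any
    scheduler: Any

    def validate(self) -> list:
        """Return the names of components that lack the expected methods."""
        expected = {
            "tokenizer": Tokenizer,
            "text_encoder": TextEncoder,
            "unet": UNet,
            "vae": VAE,
            "scheduler": Scheduler,
        }
        # isinstance() on a runtime_checkable Protocol only checks that the
        # methods exist by name; it does not check signatures or behaviour.
        return [name for name, proto in expected.items()
                if not isinstance(getattr(self, name), proto)]


class WhitespaceTokenizer:
    """Toy stand-in: splits on whitespace instead of using a real vocabulary."""
    def encode(self, text: str) -> list:
        return text.split()


tok = WhitespaceTokenizer()
# Deliberately wrong wiring: the tokenizer is placed in every slot.
broken = PipelineComponents(tok, tok, tok, tok, tok)
print(broken.validate())  # ['unet', 'vae', 'scheduler']
```

Note that the toy tokenizer also passes the `TextEncoder` check because both interfaces expose a method named `encode`; structural checks of this kind catch missing methods, not swapped components.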

# Core interactions

```mermaid
flowchart TD
    A[Start] --> B[Input Prompt]
    B --> C[Tokenizer]
    C --> |Tokens| D[Text Encoder]
    D --> |Text Embeddings| E[UNet]
    
    F[Random Latent Noise] --> E
    G[Scheduler] --> |Timesteps| E
    
    E --> |Refined Latents| H{Finished?}
    H --> |No| E
    H --> |Yes| I[VAE Decoder]
    I --> J[Generated Image]
    J --> K[End]

    subgraph "Iterative Refinement Loop"
        E
        H
    end

    classDef process fill:#f9f,stroke:#333,stroke-width:2px;
    class C,D,E,I process;
    classDef input fill:#bbf,stroke:#333,stroke-width:2px;
    class B,F input;
    classDef output fill:#bfb,stroke:#333,stroke-width:2px;
    class J output;
```
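The flow above can be sketched end to end in plain Python. Every function here is a toy stand-in with a hypothetical name: the "U-Net" simply treats the distance from a scalar text embedding as the noise, the "scheduler" removes a fixed fraction of it, and the "VAE decoder" only clamps values. The point is the control flow (prompt, tokens, embeddings, loop over timesteps, decode), not the numerics of a real model.

```python
import random


def tokenize(prompt: str) -> list:
    # Toy tokenizer: one integer id per whitespace-separated word.
    return [sum(ord(c) for c in word) % 1000 for word in prompt.lower().split()]


def encode_text(tokens: list) -> float:
    # Toy text encoder: collapses the tokens into a single scalar in [0, 1).
    return sum(tokens) / (1000 * max(len(tokens), 1))


def make_timesteps(num_inference_steps: int, num_train_timesteps: int = 1000) -> list:
    # Evenly spaced timesteps, from most noisy to least noisy.
    stride = num_train_timesteps // num_inference_steps
    return [i * stride for i in reversed(range(num_inference_steps))]


def unet(latents: list, timestep: int, context: float) -> list:
    # Toy noise prediction: whatever separates the latents from the text
    # embedding counts as noise. The timestep is accepted but unused here.
    return [x - context for x in latents]


def scheduler_step(model_output: list, timestep: int, sample: list,
                   step_size: float = 0.5) -> list:
    # Toy update rule: remove a fixed fraction of the predicted noise.
    return [x - step_size * eps for x, eps in zip(sample, model_output)]


def vae_decode(latents: list) -> list:
    # Toy decoder: clamp each value into the [0, 1] "pixel" range.
    return [min(1.0, max(0.0, x)) for x in latents]


def generate(prompt: str, num_inference_steps: int = 10,
             latent_size: int = 4, seed: int = 0) -> list:
    rng = random.Random(seed)
    context = encode_text(tokenize(prompt))                 # Tokenizer -> Text Encoder
    latents = [rng.gauss(0.0, 1.0) for _ in range(latent_size)]  # random noise
    # The "Finished?" decision is simply: have all timesteps been consumed?
    for t in make_timesteps(num_inference_steps):
        noise_pred = unet(latents, t, context)
        latents = scheduler_step(noise_pred, t, latents)
    return vae_decode(latents)                              # VAE Decoder


image = generate("a photo of a cat")
print(len(image))
```

Because the seed is fixed, calling `generate` twice with the same prompt returns the same values, which mirrors how a fixed random seed makes a diffusion run reproducible.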