<center>
    <h1>Text to Image Generation with LLMs</h1>
</center>

# Brief Recap of Text to Image Generation with LLMs

- Text-to-image generation involves understanding the relationship between language (text input) and visual elements (image output).
- DALL-E was trained on large-scale datasets of text-image pairs, enabling it to learn these associations effectively.

- For example, when given a prompt like "an astronaut riding a horse in space," DALL-E can produce an image that aligns with this imaginative concept.

## Introduction to DALL-E

- DALL-E is a revolutionary AI model developed by OpenAI, designed to generate detailed and coherent images from textual descriptions.

- The original DALL-E is a 12-billion-parameter version of GPT-3 trained on text-image pairs, so it extends the Large Language Model (LLM) approach from text generation to image synthesis.

- The model interprets natural language prompts and transforms them into unique visual representations, making it highly versatile for creative and design applications.

## Key Characteristics of DALL-E



- **Multimodal Learning**: DALL-E processes both text and visual data, bridging the gap between these two domains.

- **Generative Creativity**: It can generate novel visual content, even for combinations of objects and attributes that do not exist in reality.

- **High-Resolution Outputs**: Recent versions produce detailed images at resolutions such as 1024x1024 that follow the input prompt closely, making the model suitable for a variety of use cases.

<center>
    <img src="static/image1.gif" alt="DALL-E" style="width:50%;">
</center>

## Architecture Overview of DALL-E

DALL-E's architecture is built upon several key components that work together to generate images from textual prompts:

- **Text Input Processing**:
  - The input to DALL-E is a sequence of words that forms a prompt, which is tokenized and transformed into embeddings.
  - **Tokenization**: The text is split into tokens, which are then converted into numerical representations using an embedding layer.
  - **Positional Encoding**: Like other transformer models, DALL-E uses positional encoding to maintain the order and structure of the words in the sentence, which is crucial for understanding context.

- **Image Decoding via a Discrete Variational AutoEncoder (VQ-VAE style)**:
  - **Discrete Image Representation**: Instead of generating pixel-by-pixel outputs, DALL-E compresses an image into a grid of discrete tokens using a discrete VAE, a close relative of the VQ-VAE. In the original model this is a 32x32 grid (1,024 tokens), each drawn from a codebook of 8,192 entries, so the transformer models a far shorter sequence than raw pixels would require.
  - **Latent Space**: Each token indexes a learned codebook entry that summarizes a small region of the image. The autoencoder's decoder maps a full grid of tokens back to a complete image.

- **Attention Between Text and Image Tokens**:
  - In the original DALL-E, text tokens and image tokens are concatenated into a single sequence, and each image token can attend to every text token through the transformer's masked self-attention layers (later text-to-image systems often use a separate cross-attention module for the same purpose).
  - This links specific words or phrases in the prompt to corresponding visual features, which helps keep the generated image consistent with the description.

- **Transformer Decoder**:
  - Similar to how transformers generate text in language models, DALL-E’s decoder generates sequences of image tokens, which are later decoded into the final visual output.
  - This sequential token generation allows the model to create images with spatial coherence, ensuring that parts of the image fit together seamlessly (e.g., an astronaut’s helmet correctly placed on their head).

- **Training**:
  - DALL-E is trained on a massive dataset of text-image pairs using autoregressive modeling. The model learns to predict the next token given the text prompt and the previous tokens.
  - The training objective is a cross-entropy (negative log-likelihood) loss over the token sequence: the model is penalized whenever it assigns low probability to the token that actually comes next in the encoded training image, rather than comparing generated and true images pixel by pixel.

- **Generation Process**:
  - Once trained, DALL-E can generate images by taking a new text prompt, processing it through its transformer layers, and outputting a sequence of visual tokens. These tokens are then decoded back into a full image that corresponds to the input description.
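
To make the idea of a discrete image representation concrete, here is a toy sketch of nearest-codebook quantization in plain Python. The four-entry `CODEBOOK` and the helper names `quantize` and `decode` are illustrative assumptions; a real model learns a far larger codebook over high-dimensional features and uses a neural decoder.

```python
from typing import List, Sequence

Vector = Sequence[float]

# Toy codebook (hypothetical): each entry stands in for a learned visual concept.
CODEBOOK: List[Vector] = [
    (0.0, 0.0),
    (1.0, 0.0),
    (0.0, 1.0),
    (1.0, 1.0),
]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(patch_features: Sequence[Vector]) -> List[int]:
    """Map each patch feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(CODEBOOK)), key=lambda k: squared_distance(vec, CODEBOOK[k]))
        for vec in patch_features
    ]


def decode(tokens: Sequence[int]) -> List[Vector]:
    """Look each token up in the codebook (a stand-in for the image decoder)."""
    return [CODEBOOK[t] for t in tokens]


patches = [(0.1, -0.2), (0.9, 0.8), (0.2, 1.1)]
tokens = quantize(patches)
print(tokens)          # [0, 3, 2]
print(decode(tokens))  # [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

The transformer only ever sees the integer tokens, which is what lets it treat image generation like text generation.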

<center>
    <img src="static/image2.jpg" alt="DALL-E Architecture" style="width:50%;">
</center>
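
The training and generation steps can also be sketched with a toy autoregressive model. Everything here is an assumption made for illustration: `toy_model` is a hand-written rule rather than a trained transformer, the vocabulary has four tokens, and decoding is greedy, whereas real systems usually sample from the predicted distribution.

```python
import math
from typing import Callable, List, Sequence

# A "model" maps a context (text tokens followed by the image tokens produced
# so far) to a probability distribution over the image vocabulary.
Model = Callable[[Sequence[int]], List[float]]

IMAGE_VOCAB_SIZE = 4  # toy value


def toy_model(context: Sequence[int]) -> List[float]:
    """Hypothetical rule: favor the token equal to sum(context) mod vocab size."""
    favored = sum(context) % IMAGE_VOCAB_SIZE
    return [0.7 if k == favored else 0.1 for k in range(IMAGE_VOCAB_SIZE)]


def next_token_loss(model: Model, text_tokens: Sequence[int],
                    image_tokens: Sequence[int]) -> float:
    """Average negative log-likelihood of each image token given its prefix."""
    total = 0.0
    for i, target in enumerate(image_tokens):
        context = list(text_tokens) + list(image_tokens[:i])
        total += -math.log(model(context)[target])
    return total / len(image_tokens)


def generate(model: Model, text_tokens: Sequence[int],
             num_image_tokens: int) -> List[int]:
    """Greedy decoding: repeatedly append the most likely next image token."""
    context = list(text_tokens)
    generated: List[int] = []
    for _ in range(num_image_tokens):
        probs = model(context)
        token = max(range(len(probs)), key=probs.__getitem__)
        generated.append(token)
        context.append(token)
    return generated


image_tokens = generate(toy_model, text_tokens=[1, 2], num_image_tokens=3)
print(image_tokens)  # [3, 2, 0]
print(round(next_token_loss(toy_model, [1, 2], image_tokens), 4))  # 0.3567
```

Training lowers this loss over many text-image pairs; generation runs the same model forward one token at a time and then hands the tokens to the image decoder.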

## Major Advantages of DALL-E

- **1. Creativity and Flexibility**:
  - DALL-E excels at producing creative and original images from diverse text inputs, making it highly useful for design, art, and advertising.
  - It can combine concepts that don’t normally coexist, such as “a cat dressed as a king,” creating visually imaginative outputs that are difficult to achieve with traditional models.
  - The flexibility of DALL-E allows it to interpret and generate content even for prompts that involve abstract or complex scenarios, enhancing its creative potential.

- **2. High-Quality and Detail**:
  - DALL-E generates high-fidelity images with intricate details that closely align with the provided text.
  - The model captures fine visual elements, from color schemes to object placement, ensuring that even nuanced descriptions are reflected in the output.
  - This capability is particularly valuable in fields like marketing, product design, and digital content creation, where visual quality is critical.

- **3. Generalization to Unseen Concepts**:
  - Unlike models restricted to specific image categories, DALL-E can generalize to novel combinations of objects and attributes, allowing it to generate content for previously unseen prompts.
  - This is achieved through its training on diverse text-image pairs, which helps the model learn broad visual-linguistic relationships.

- **4. Multimodal Integration**:
  - DALL-E effectively integrates language understanding with visual generation, providing an end-to-end solution for multimodal tasks.
  - This makes it useful for applications that require both textual and visual outputs, such as creating interactive content or generating visuals for storytelling.

- **5. Scalability**:
  - The architecture of DALL-E is scalable: as model size and training data grow, it tends to handle more complex prompts and produce more detailed, more diverse images.

- **6. Applicability Across Industries**:
  - DALL-E’s ability to generate images from text has broad applicability, from aiding in content creation and advertising to being used in education, healthcare, and entertainment.
  - For example, in product design, DALL-E can quickly generate visual prototypes based on verbal descriptions, saving time and resources in the design process.
  - In the gaming and entertainment industry, DALL-E can generate concept art or in-game assets based on creative prompts, enhancing the speed and diversity of content production.

# Interacting with DALL-E

## Step 1: Create an OpenAI Account


1. Visit OpenAI's website (https://openai.com)
2. Click on "Sign Up" or "Get Started"
3. Enter your email address
4. Create a secure password
5. Verify your email through the confirmation link sent to your inbox

## Step 2: Complete Account Setup

1. Fill in your user profile information
2. Provide your full name
3. Set up two-factor authentication (recommended)
4. Complete any additional verification steps if prompted

## Step 3: Set Up Billing

1. Navigate to the API section in your dashboard
2. Click on "Billing" in the left sidebar
3. Click "Set up paid account"
4. Add a payment method (credit card required)
5. Set usage limits (recommended to avoid unexpected charges)
6. Consider setting a monthly spending cap
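
Dashboard limits can be complemented by a small client-side guard. The sketch below is an assumption-laden illustration, not part of the OpenAI SDK: `SpendingGuard` and `BudgetExceededError` are hypothetical names, and the per-image price must be supplied by the caller from the current pricing page.

```python
class BudgetExceededError(RuntimeError):
    """Raised when a request would push estimated spend over the cap."""


class SpendingGuard:
    """Tracks estimated spend locally and refuses requests beyond a cap."""

    def __init__(self, monthly_cap_usd: float, price_per_image_usd: float):
        if monthly_cap_usd < 0 or price_per_image_usd < 0:
            raise ValueError("cap and price must be non-negative")
        self.monthly_cap_usd = monthly_cap_usd
        self.price_per_image_usd = price_per_image_usd
        self.spent_usd = 0.0

    def charge(self, n_images: int = 1) -> float:
        """Record n_images and return the remaining budget in USD."""
        cost = n_images * self.price_per_image_usd
        if self.spent_usd + cost > self.monthly_cap_usd:
            raise BudgetExceededError(
                f"request of {cost:.2f} USD would exceed the "
                f"{self.monthly_cap_usd:.2f} USD cap"
            )
        self.spent_usd += cost
        return self.monthly_cap_usd - self.spent_usd


# The numbers below are made up for the example, not real prices.
guard = SpendingGuard(monthly_cap_usd=1.00, price_per_image_usd=0.25)
print(guard.charge(2))  # 0.5
```

Calling `guard.charge()` before each generation request gives a local estimate only; the billing dashboard remains the authoritative record.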

## Step 4: Generate API Key

1. Go to the API Keys section in your dashboard
2. Click "Create new secret key"
3. Give your key a descriptive name
4. Copy and save the API key immediately (it won't be shown again)
5. Store the key securely
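
One common way to store the key securely is to keep it in an environment variable instead of in source code. The sketch below assumes the variable is named `OPENAI_API_KEY`, which is the name the official Python client looks for by default; `load_api_key` and `mask_key` are hypothetical helper names.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 var_name: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment and fail early if it is missing."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before running.")
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Return a log-safe form of the key showing only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# Demonstration with an explicit mapping and a placeholder key.
demo_key = load_api_key({"OPENAI_API_KEY": "sk-test-1234"})
print(mask_key(demo_key))  # ********1234
```

In real use, `load_api_key()` would be called without arguments so that it reads `os.environ`.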

## Step 5: Configure Development Environment

1. Install required Python packages:
    ```bash
    pip install openai requests pillow
    ```

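2. Set up your development environment with the API key. Reading the key from an environment variable (assumed here to be `OPENAI_API_KEY`) keeps it out of source code:
    ```python
    import os

    from openai import OpenAI

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    ```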

## Step 6: Test DALL-E Access

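1. Create a simple test script (it reuses the `client` created in Step 5):
    ```python
    response = client.images.generate(
        model="dall-e-3",
        prompt="a simple test image",
        n=1,
        size="1024x1024"
    )
    image_url = response.data[0].url
    print(image_url)  # open this URL in a browser to view the image
    ```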

## Step 7: Understanding Usage and Pricing

1. Image generation is billed per image, and the price depends on the model, size, and quality
2. Monitor your usage in the dashboard
3. Be aware of the size and quality options for DALL-E 3 and their costs:
   - 1024x1024 (standard square size)
   - 1024x1792 or 1792x1024 (portrait and landscape options)
   - HD quality, which costs more per image than standard quality
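
Because an unsupported option only fails once the request reaches the API, it can help to check the parameters locally first. The sketch below encodes the size, quality, and `n` constraints as the API reference has described them for `dall-e-2` and `dall-e-3`; treat the table as an assumption and confirm it against the current documentation. `validate_request` is a hypothetical helper name.

```python
SUPPORTED_SIZES = {
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
    "dall-e-3": {"1024x1024", "1024x1792", "1792x1024"},
}


def validate_request(model: str, size: str,
                     quality: str = "standard", n: int = 1) -> None:
    """Raise ValueError if the combination of options is not supported."""
    if model not in SUPPORTED_SIZES:
        raise ValueError(f"unknown model: {model!r}")
    if size not in SUPPORTED_SIZES[model]:
        raise ValueError(f"{model} does not support size {size!r}")
    if model == "dall-e-3":
        if quality not in {"standard", "hd"}:
            raise ValueError(f"unknown quality: {quality!r}")
        if n != 1:
            raise ValueError("dall-e-3 generates one image per request (n=1)")
    else:
        if quality != "standard":
            raise ValueError("the hd quality option applies to dall-e-3 only")
        if not 1 <= n <= 10:
            raise ValueError("dall-e-2 accepts n between 1 and 10")


validate_request("dall-e-3", "1024x1792", quality="hd")  # passes silently
validate_request("dall-e-2", "512x512", n=4)             # passes silently
```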

# Sample Project: Text to Image Generation with DALL-E

# Project: AI Art Gallery Creator - Learning DALL-E Image Generation

## Problem Description

This hands-on project introduces AI-powered image generation with DALL-E 3 and covers the basics of API integration, image handling, and building a small AI application.

## Configuration Class Explanation

The `Config` class is designed to manage configuration settings for the DALL-E API integration. Here's a detailed breakdown:

### Class Initialization

```python
def __init__(self):
    self._api_key = None
    self.default_image_size = "1024x1024"
    self.default_quality = "standard"
    self.default_model = "dall-e-3"
    self.output_directory = "generated_images"
```

**Instance Variables:**
- `self._api_key`: Stores the OpenAI API key; the leading underscore marks it as private by convention, though Python does not enforce this
- `self.default_image_size`: Sets default image dimensions to 1024x1024 pixels
- `self.default_quality`: Sets image quality to "standard" (alternatives: "hd")
- `self.default_model`: Uses DALL-E 3 as the default model
- `self.output_directory`: Specifies where generated images will be saved


### API Key Property

```python
@property
def api_key(self):
    if not self._api_key:
        raise ValueError("API key not set. Call set_api_key() first.")
    return self._api_key
```

**Key Features:**
- Uses Python's `@property` decorator for controlled access to the API key
- Implements a getter method that checks if the API key is set
- Raises an error if attempting to access an unset API key
- Provides controlled read access to the `_api_key` variable

### API Key Setter

```python
def set_api_key(self, key):
    self._api_key = key
```

**Functionality:**
- Simple method to set the API key
- Takes a single parameter `key` which is the OpenAI API key
- Assigns the key to the private `_api_key` variable

### Important Notes:

1. **Security:**
   - The API key is stored in a variable that is private by convention
   - Read access goes through the `api_key` property, which fails fast if the key is missing
   - The underscore discourages, but does not prevent, direct modification of the key

2. **Default Settings:**
   - Image size: 1024x1024 (square format)
   - Quality: standard (balanced option)
   - Model: DALL-E 3 (the most recent DALL-E version)

3. **Best Practices:**
   - Always set the API key before attempting to use the configuration
   - Use the property getter to access the API key
   - Don't modify the API key directly through the private variable

```python
class Config:
    def __init__(self):
        self._api_key = None
        self.default_image_size = "1024x1024"
        self.default_quality = "standard"
        self.default_model = "dall-e-3"
        self.output_directory = "generated_images"

    @property
    def api_key(self):
        if not self._api_key:
            raise ValueError("API key not set. Call set_api_key() first.")
        return self._api_key

    def set_api_key(self, key):
        self._api_key = key
```
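
The following sketch exercises the access rules of `Config`. It uses a compact copy of the class (API key handling only) so that it runs on its own, and `"sk-placeholder"` is a made-up value rather than a real key.

```python
class Config:
    """Compact copy of the Config class, limited to API key handling."""

    def __init__(self):
        self._api_key = None

    @property
    def api_key(self):
        if not self._api_key:
            raise ValueError("API key not set. Call set_api_key() first.")
        return self._api_key

    def set_api_key(self, key):
        self._api_key = key


config = Config()

# Reading the key before it is set fails fast.
try:
    config.api_key
except ValueError as err:
    print(f"Before set_api_key(): {err}")

config.set_api_key("sk-placeholder")
print(config.api_key)

# The property defines no setter, so plain assignment is rejected ...
try:
    config.api_key = "other"
except AttributeError:
    print("api_key is read-only")

# ... but the underscore is only a convention, so this still works.
config._api_key = "changed-directly"
print(config.api_key)
```

The last two lines show why the underscore should be read as a signal to other developers rather than as a security boundary.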

```python
# Importing required libraries
import os
from openai import OpenAI
import requests
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt
```


## DallEDemo Class Explanation

This class serves as the main interface for interacting with DALL-E's image generation capabilities.

### Class Initialization

```python
def __init__(self, config: Config):
    self.config = config
    self.client = OpenAI(api_key=self.config.api_key)
    self.output_dir = self.config.output_directory
    os.makedirs(self.output_dir, exist_ok=True)
```

**Parameters:**
- `config`: Configuration object containing API key and settings

**Implementation:**
- Creates an OpenAI client instance using the configured API key
- Creates the output directory for saved images if it does not already exist

### Image Generation Method

```python
def generate_image(self, prompt, size=None, quality=None, n=1):
```

**Parameters:**
- `prompt`: Text description for the image to generate
- `size`: Optional image dimensions (e.g., "1024x1024", "1024x1792")
- `quality`: Optional image quality ("standard" or "hd")
- `n`: Number of images to request (default: 1; DALL-E 3 accepts only `n=1`)


**Implementation Details:**
```python
response = self.client.images.generate(
    model=self.config.default_model,
    prompt=prompt,
    size=size or self.config.default_image_size,
    quality=quality or self.config.default_quality,
    n=n
)
```

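**Returns:**
- `image_path`: Path to the saved image (only the first image in the response is saved)
- `revised_prompt`: The rewritten prompt that DALL-E 3 actually used to generate the image
- `(None, None)` if the request fails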
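
As an alternative to returning a URL that then has to be downloaded, the Images API can return the image inline as base64 when `response_format="b64_json"` is passed. The sketch below shows only the decoding and saving step; `save_b64_image` is a hypothetical helper, and the API call itself is left as a comment because it needs a configured client.

```python
import base64
from pathlib import Path


def save_b64_image(b64_data: str, path: str) -> Path:
    """Decode a base64-encoded image and write it to disk."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_bytes(base64.b64decode(b64_data))
    return out


# Intended use with an existing client (not executed here):
# response = client.images.generate(
#     model="dall-e-3",
#     prompt="a simple test image",
#     response_format="b64_json",
# )
# save_b64_image(response.data[0].b64_json, "generated_images/example.png")
```

This avoids the second HTTP request that a URL download requires, at the cost of a larger API response.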

### Variation Creation Method

```python
def create_variation(self, image_path, n=1, size=None):
```

**Parameters:**
- `image_path`: Path to source image
- `n`: Number of variations (default: 1)
- `size`: Optional size for variations (DALL-E 2 supports "256x256", "512x512", and "1024x1024")

**Implementation Details:**
```python
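# Open the source image in a context manager so the file handle is closed
with open(image_path, "rb") as image_file:
    response = self.client.images.create_variation(
        model="dall-e-2",
        image=image_file,
        n=n,
        size=size or self.config.default_image_size
    )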
```

**Returns:**
- List of paths to generated variation images
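
The variations endpoint has been documented as requiring a square PNG smaller than 4 MB. Assuming those limits still hold, a source image can be checked locally before uploading. The sketch reads only the PNG header with the standard library; `png_dimensions` and `check_variation_input` are hypothetical helper names.

```python
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_BYTES = 4 * 1024 * 1024  # assumed upload limit of 4 MB


def png_dimensions(header: bytes):
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian 32-bit integers at the start of IHDR.
    return struct.unpack(">II", header[16:24])


def check_variation_input(path: str) -> None:
    """Raise ValueError if the file is too large, not a PNG, or not square."""
    if os.path.getsize(path) >= MAX_BYTES:
        raise ValueError("image must be smaller than 4 MB")
    with open(path, "rb") as f:
        width, height = png_dimensions(f.read(24))
    if width != height:
        raise ValueError(f"image must be square, got {width}x{height}")
```

Calling `check_variation_input(image_path)` before `create_variation` turns a remote API error into an immediate local one.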

### Image Saving Method

```python
def _save_image(self, image_url, filename):
```

**Parameters:**
- `image_url`: URL of the generated image
- `filename`: Name for saving the image

**Implementation:**
- Downloads image from URL using requests
- Saves to specified output directory
- Returns path to saved image
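
The class in this project names files by counting the entries in the output directory, which can reuse an existing name if files are deleted between runs. A possible alternative, sketched here with a hypothetical helper `unique_image_path`, combines a UTC timestamp with a short random suffix.

```python
import os
import uuid
from datetime import datetime, timezone


def unique_image_path(output_dir: str, prefix: str = "generated",
                      ext: str = ".png") -> str:
    """Build a file path that is very unlikely to collide with an existing one."""
    os.makedirs(output_dir, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    suffix = uuid.uuid4().hex[:8]  # random part guards against same-second calls
    return os.path.join(output_dir, f"{prefix}_{stamp}_{suffix}{ext}")
```

The timestamp also keeps files sorted chronologically when listed by name.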


### Image Display Method

```python
def display_image(self, image_path):
```

**Parameters:**
- `image_path`: Path to image file

**Implementation:**
- Opens image using PIL
- Displays using matplotlib
- Configures display settings (10x10 figure size)
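
When several images need to be compared, for example a set of variations, they can be shown side by side instead of one figure at a time. This is a sketch under assumptions: `grid_shape` and `display_images` are hypothetical helpers that are not part of the class, and the plotting libraries are imported lazily so the layout logic can be used without them.

```python
import math
from typing import Sequence, Tuple


def grid_shape(n_images: int, max_cols: int = 3) -> Tuple[int, int]:
    """Return (rows, cols) for laying out n_images with at most max_cols columns."""
    if n_images < 1 or max_cols < 1:
        raise ValueError("n_images and max_cols must be at least 1")
    cols = min(n_images, max_cols)
    rows = math.ceil(n_images / cols)
    return rows, cols


def display_images(image_paths: Sequence[str], max_cols: int = 3) -> None:
    """Show all images in a single matplotlib figure."""
    import matplotlib.pyplot as plt
    from PIL import Image

    rows, cols = grid_shape(len(image_paths), max_cols)
    fig, axes = plt.subplots(rows, cols, figsize=(4 * cols, 4 * rows), squeeze=False)
    for ax in axes.flat:
        ax.axis("off")  # hide axes, including those of unused cells
    for ax, path in zip(axes.flat, image_paths):
        ax.imshow(Image.open(path))
    plt.show()
```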

```python
class DallEDemo:
    def __init__(self, config: Config):
        """
        Initialize DALL-E Demo with configuration
        """
        self.config = config
        self.client = OpenAI(api_key=self.config.api_key)
        self.output_dir = self.config.output_directory
        os.makedirs(self.output_dir, exist_ok=True)

    def generate_image(self, prompt, size=None, quality=None, n=1):
        """
        Generate an image using DALL-E 3 and save the first result
        """
        try:
            response = self.client.images.generate(
                model=self.config.default_model,
                prompt=prompt,
                size=size or self.config.default_image_size,
                quality=quality or self.config.default_quality,
                n=n  # DALL-E 3 accepts only n=1
            )

            image_url = response.data[0].url
            image_path = self._save_image(image_url, f"generated_{len(os.listdir(self.output_dir))}.png")
            return image_path, response.data[0].revised_prompt

        except Exception as e:
            print(f"Error generating image: {e}")
            return None, None

    def create_variation(self, image_path, n=1, size=None):
        """
        Create variations of an existing image
        """
        try:
            # Open the source image in a context manager so the file handle is closed
            with open(image_path, "rb") as image_file:
                response = self.client.images.create_variation(
                    model="dall-e-2",  # DALL-E 2 for variations
                    image=image_file,
                    n=n,
                    size=size or self.config.default_image_size
                )

            variations = []
            for i, data in enumerate(response.data):
                # Use a separate name so the image_path argument is not overwritten
                variation_path = self._save_image(
                    data.url,
                    f"variation_{len(os.listdir(self.output_dir))}_{i}.png"
                )
                variations.append(variation_path)
            return variations

        except Exception as e:
            print(f"Error creating variations: {e}")
            return None

    def _save_image(self, image_url, filename):
        """
        Save image from URL to local file
        """
        response = requests.get(image_url, timeout=60)
        response.raise_for_status()  # fail instead of saving an error page as an image
        image_path = os.path.join(self.output_dir, filename)

        with open(image_path, 'wb') as f:
            f.write(response.content)
        return image_path

    def display_image(self, image_path):
        """
        Display an image using matplotlib
        """
        img = Image.open(image_path)
        plt.figure(figsize=(10, 10))
        plt.imshow(img)
        plt.axis('off')
        plt.show()
```

## Driver Code

```python
def main():
    try:
        # Initialize configuration
        config = Config()

        # Set API key
        api_key = input("Please enter your OpenAI API key: ").strip()
        config.set_api_key(api_key)

        # Initialize DALL-E demo with config
        demo = DallEDemo(config)

        # Generate image
        print("\nGenerating new image...")
        prompt = "A futuristic city with flying cars and neon lights"
        image_path, revised_prompt = demo.generate_image(prompt)

        if image_path:
            print(f"Original prompt: {prompt}")
            print(f"Revised prompt: {revised_prompt}")
            print(f"Image saved to: {image_path}")
            demo.display_image(image_path)

            # Create variations
            print("\nCreating variations...")
            variations = demo.create_variation(image_path, n=2)
            if variations:
                print(f"Created {len(variations)} variations")
                for var_path in variations:
                    demo.display_image(var_path)

    except Exception as e:
        print(f"An error occurred: {e}")
        print("\nTroubleshooting steps:")
        print("1. Make sure your API key is correct")
        print("2. Check your internet connection")
        print("3. Verify you have sufficient API credits")

if __name__ == "__main__":
    main()
```
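
Transient failures such as rate limits or brief network drops often succeed on a second attempt. The sketch below shows a generic retry helper with exponential backoff; `with_retries` is a hypothetical name, and because `generate_image` catches exceptions and returns `None`, such a helper would have to wrap the underlying client call rather than the method itself. The official Python client also exposes a `max_retries` option, which may be enough on its own.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Intended use with an existing client (not executed here):
# response = with_retries(
#     lambda: client.images.generate(model="dall-e-3", prompt="a test image")
# )
```

In production code it is usually better to retry only on specific exception types, since errors such as an invalid API key will not succeed on retry.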