# Lesson 3 Project: Image Generation & Editing with DALL-E

## Introduction

Welcome to Lesson 3 of our course on multimodal AI! Today, you're stepping into the fascinating world of AI-powered image generation and editing using DALL-E. Imagine being able to create or modify images simply by describing what you want in words.

In this lesson, you'll learn the art of crafting effective text prompts to generate the images you envision, and discover how to harness DALL-E's power for image editing tasks.

By the end of this lesson, you will be able to:
- Use the DALL-E API to generate images from text prompts
- Implement an image editing feature using DALL-E
- Combine text and image generation in a single application

Get ready to turn your words into visuals and push the boundaries of creativity with DALL-E!

## Setting Up OpenAI Development Environment

Refer to the Python Crash Course lesson to learn how to set up your OpenAI development environment.

In [None]:
# Install dependencies
!pip install Pillow

In [None]:
# Load the OpenAI library
from openai import OpenAI

# Set up relevant environment variables
from dotenv import load_dotenv

load_dotenv()

# Create the OpenAI connection object
client = OpenAI()
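`OpenAI()` reads the API key from the `OPENAI_API_KEY` environment variable, which `load_dotenv()` fills in from a `.env` file. A missing key otherwise only shows up as an error when the client is created or first used, so a small pre-flight check can help. The following is a minimal sketch; the helper name `require_api_key` is hypothetical and not part of the OpenAI library.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key found in env, or raise a clear error if it is missing."""
    # Fall back to the real process environment when no mapping is passed in
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key


# Example with an explicit mapping instead of the real environment
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
```

In the notebook, you would call `require_api_key()` with no arguments right after `load_dotenv()`.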

## Using DALL-E API for Generating Images

DALL-E is a powerful AI model that can generate images from textual descriptions.

First, you want to import some libraries.

In [None]:
# Import necessary libraries
import requests
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt
import base64

Here, you imported:

- `requests` for making HTTP requests.
- `PIL` (from `Pillow`) for image manipulation.
- `BytesIO` (from `io`) for handling binary data.
- `matplotlib.pyplot` for displaying images.
- `base64` for encoding and decoding image data.

## Writing Helper Functions

Next, you want to define a function to generate an image using DALL-E.

In [None]:
# Define a function to generate an image using DALL-E
def generate_image(client, model, prompt, size, quality=None, style=None, response_format='url', n=1):
    params = {
        'model': model,
        'prompt': prompt,
        'size': size,
        'n': n,
        'response_format': response_format
    }
    if style:
        params['style'] = style
    if quality:
        params['quality'] = quality
    response = client.images.generate(**params)
    return response

Here, you defined a function `generate_image` that wraps the DALL-E image generation call. It lets you specify the model, prompt, size, quality, style, response format, and the number of images to generate.

To generate an image, you use the `client.images.generate` method. It accepts `model`, `prompt`, `size`, and other optional arguments. The `style` and `quality` parameters are only supported by the DALL-E 3 model, not by the DALL-E 2 model. That's why these parameters are only added to the request if they are explicitly provided.
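To see the effect of the optional parameters without calling the API, the dictionary-building step can be run on its own. This is a sketch under that assumption; `build_image_params` is a hypothetical helper that mirrors the first half of `generate_image` and sends nothing over the network.

```python
def build_image_params(model, prompt, size, quality=None, style=None,
                       response_format="url", n=1):
    """Build the keyword arguments for an image request, skipping unset options."""
    params = {
        "model": model,
        "prompt": prompt,
        "size": size,
        "n": n,
        "response_format": response_format,
    }
    # Only include the DALL-E 3 options when the caller provided them
    if style:
        params["style"] = style
    if quality:
        params["quality"] = quality
    return params


dalle2_params = build_image_params("dall-e-2", "a samurai cat", "512x512")
dalle3_params = build_image_params("dall-e-3", "a samurai cat", "1024x1024",
                                   quality="hd", style="vivid")
print(sorted(dalle2_params))
print(sorted(dalle3_params))
```

The DALL-E 2 dictionary carries no `style` or `quality` key at all, which is what keeps the request valid for that model.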

Now, you want to define a function to display an image from a URL.

In [None]:
# Define a function to display an image from a URL
def display_image_from_url(image_url):
    response = requests.get(image_url)
    img = Image.open(BytesIO(response.content))
    plt.imshow(img)
    plt.axis('off')
    plt.show()

Here, you defined a function `display_image_from_url` that takes an image URL as input. This function downloads the image using the `requests` library, opens it with `PIL`, and displays it using `matplotlib.pyplot`.

Next, you want to define a function to display an image from a base64 string.

In [None]:
# Define a function to display an image from base64
def display_image_from_base64(b64_string):
    img_data = base64.b64decode(b64_string)
    img = Image.open(BytesIO(img_data))
    plt.imshow(img)
    plt.axis('off')
    plt.show()

Here, you defined a function `display_image_from_base64` that takes a base64-encoded string as input. This function decodes the base64 string using the `base64` module, opens the image with `PIL`, and displays it using `matplotlib.pyplot`.
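The decoding step is easy to try in isolation. The sketch below uses a stand-in byte string (a PNG signature followed by filler, not a real image) to show that base64 text decodes back to exactly the bytes that were encoded, and that `BytesIO` then exposes those bytes as a file-like object.

```python
import base64
from io import BytesIO

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Stand-in for the bytes of a real image: a PNG signature plus filler
fake_png = PNG_SIGNATURE + b"example-bytes"

# The kind of value an API returns: base64 text that is safe to embed in JSON
b64_string = base64.b64encode(fake_png).decode("ascii")

# The first step of display_image_from_base64: turn the text back into bytes
decoded = base64.b64decode(b64_string)
buffer = BytesIO(decoded)

print(decoded == fake_png)
print(buffer.read(8) == PNG_SIGNATURE)
```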

Sometimes you want to save an image rather than display it. Define a function to save an image to local storage.

In [None]:
# Define a function to save an image to local storage
def save_image_to_local(image_url, filename):
    response = requests.get(image_url)
    img = Image.open(BytesIO(response.content))
    img.save(filename)

Here, you defined a function `save_image_to_local` that takes an image URL and a filename as input. This function downloads the image using the `requests` library, opens it with `PIL`, and saves it to the specified filename.
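When the image arrives as a base64 string rather than a URL, there is nothing to download: the decoded bytes can be written straight to disk. A minimal sketch, where `save_base64_image` is a hypothetical companion to `save_image_to_local`:

```python
import base64
from pathlib import Path


def save_base64_image(b64_string, filename):
    """Decode a base64 string and write the raw bytes to filename."""
    path = Path(filename)
    path.write_bytes(base64.b64decode(b64_string))
    return path
```

Because the bytes are written unchanged, the file keeps whatever format the API produced, so the filename extension should match it.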

Next, you want to define a function to display multiple images in a grid. This is useful when you generate more than one image.

In [None]:
import math

# Define a function to display multiple images in a grid
def display_images_in_grid(image_urls):
    num_images = len(image_urls)
    # Round up so every image fits, even when the count is not a perfect square
    cols = math.ceil(num_images**0.5)
    rows = math.ceil(num_images / cols)
    # squeeze=False keeps axes two-dimensional, even for a single image
    fig, axes = plt.subplots(rows, cols, figsize=(10, 10), squeeze=False)
    # Hide the axes of every cell, including any that stay empty
    for ax in axes.flat:
        ax.axis('off')
    for i, image_url in enumerate(image_urls):
        response = requests.get(image_url)
        img = Image.open(BytesIO(response.content))
        row, col = divmod(i, cols)
        axes[row, col].imshow(img)
    plt.show()

Here, you defined a function `display_images_in_grid` that takes a list of image URLs as input. This function calculates the grid size based on the number of images, downloads each image using the `requests` library, opens it with `PIL`, and displays all images in a grid using `matplotlib.pyplot`.
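The grid-size calculation is the part most sensitive to the number of images: a count that is not a perfect square, such as 3 or 5, needs rounding up so that no image falls outside the grid. The arithmetic can be checked on its own. This is a sketch, with `grid_shape` as a hypothetical helper that is not part of the lesson's code.

```python
import math


def grid_shape(num_images):
    """Return (rows, cols) for a near-square grid that fits num_images."""
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    # Enough columns for a square layout, then only as many rows as needed
    cols = math.ceil(math.sqrt(num_images))
    rows = math.ceil(num_images / cols)
    return rows, cols


for n in (1, 3, 4, 5):
    print(n, grid_shape(n))
```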

## Generating Images Using DALL-E 3

With the helper functions defined, you can now generate and display an image with DALL-E 3.

In [None]:
# Generate and display an image with DALL-E 3
dalle_model = "dall-e-3"
dalle_prompt = "a samurai cat is eating ramen"
image_size = "1024x1792"
image_quality = "standard"

response = generate_image(client, dalle_model, dalle_prompt, image_size, image_quality)
image_url = response.data[0].url
display_image_from_url(image_url)

Here, you set up the parameters for the request: the `dall-e-3` model, the portrait `1024x1792` image size, the `standard` image quality, and, most importantly, the `prompt`, which describes the image you want: "a samurai cat is eating ramen". You then call the `generate_image` function to create the image, read the image URL from the response, and pass it to the `display_image_from_url` function to display the image. Your wish is granted!

You now want to change the image quality and size while generating an image with DALL-E 3. Why don't you create a landscape image?

In [None]:
# Change the image quality and size
image_quality = "hd"
image_size = "1792x1024"

response = generate_image(client, dalle_model, dalle_prompt, image_size, image_quality)
image_url = response.data[0].url
display_image_from_url(image_url)

Here, you updated the image quality to `hd` and the image size to `1792x1024`. You then used the `generate_image` function to create the image with the new parameters and get the URL of the generated image. Finally, you used the `display_image_from_url` function to display the image.
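The size string is always written as width by height, so `1024x1792` is a portrait image and `1792x1024` is a landscape one. A small sketch that makes the convention explicit; `orientation` is a hypothetical helper, not an API feature.

```python
def orientation(size):
    """Classify a 'WIDTHxHEIGHT' size string as landscape, portrait, or square."""
    width, height = (int(part) for part in size.split("x"))
    if width > height:
        return "landscape"
    if width < height:
        return "portrait"
    return "square"


for size in ("1024x1792", "1792x1024", "1024x1024"):
    print(size, orientation(size))
```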

You now want to change the image quality and style while generating an image with DALL-E 3.

In [None]:
# Change image quality and style
image_quality = "standard"
image_style = "natural"

response = generate_image(client, dalle_model, dalle_prompt, image_size, image_quality, image_style)
image_url = response.data[0].url
display_image_from_url(image_url)

Here, you updated the image quality to `standard` and added a new parameter, `image_style`, set to `natural`. You then used the `generate_image` function to create the image with the new parameters and get the URL of the generated image. Finally, you used the `display_image_from_url` function to display the image.

As you can see, this image doesn't have the high-definition quality of the previous one.

## Generating Images Using DALL-E 2

You now want to generate multiple images with DALL-E 2 by setting the `n` parameter to 4, and display them in a grid. A value greater than 1 for the `n` parameter is only supported by DALL-E 2.

However, with DALL-E 2 the `style` and `quality` parameters are not available, and only the square sizes `256x256`, `512x512`, and `1024x1024` are supported.

In [None]:
# Generate multiple images with DALL-E 2
dalle_model = "dall-e-2"
image_size = "512x512"
dalle_prompt = "a samurai cat is singing on a stage"

response = generate_image(client, dalle_model, dalle_prompt, image_size, n=4)
image_urls = [img.url for img in response.data]
display_images_in_grid(image_urls)

Here, you set up the parameters for the DALL-E 2 model, including the prompt and image size. You then used the `generate_image` function to create four images and obtained their URLs. Finally, you used the `display_images_in_grid` function to display the images in a grid.
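The differences between the two models are easy to mix up, so it can help to check a request before sending it. The sketch below encodes the limits described in this lesson, plus two taken from OpenAI's documentation at the time of writing (the `1024x1024` size for DALL-E 3 and a maximum of 10 images per DALL-E 2 request). Treat the tables as assumptions that may change; `check_request` is a hypothetical helper.

```python
# Assumed limits; verify against the current API documentation
SUPPORTED_SIZES = {
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
    "dall-e-3": {"1024x1024", "1792x1024", "1024x1792"},
}
MAX_IMAGES = {"dall-e-2": 10, "dall-e-3": 1}


def check_request(model, size, n=1, quality=None, style=None):
    """Return a list of problems with the requested options (empty if none)."""
    if model not in SUPPORTED_SIZES:
        return [f"unknown model: {model}"]
    problems = []
    if size not in SUPPORTED_SIZES[model]:
        problems.append(f"{model} does not support size {size}")
    if not 1 <= n <= MAX_IMAGES[model]:
        problems.append(f"{model} supports n from 1 to {MAX_IMAGES[model]}")
    if model == "dall-e-2" and (quality or style):
        problems.append("quality and style are DALL-E 3 options")
    return problems


print(check_request("dall-e-2", "512x512", n=4))
print(check_request("dall-e-3", "512x512", n=4))
```

The first call returns an empty list, while the second reports both the unsupported size and the unsupported image count.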

## Response Format

Instead of a URL, the response can carry the image itself as a base64-encoded string. Generate an image in base64 format and display it.

In [None]:
# Generate an image in base64 format
image_response_format = "b64_json"

response = generate_image(client, dalle_model, dalle_prompt, image_size, response_format=image_response_format)
b64_string = response.data[0].b64_json
display_image_from_base64(b64_string)

Here, you specify the response format as `b64_json` to get the image in base64 format. You then use the `generate_image` function to create the image and obtain the base64 string. Finally, you use the `display_image_from_base64` function to display the image.

## Saving Images to Local Storage

You now want to save the generated image to a local file.

In [None]:
# Save the image to a local file
file_path = "samurai_cat_singing_on_stage.png"

response = generate_image(client, dalle_model, dalle_prompt, image_size)
image_url = response.data[0].url
save_image_to_local(image_url, file_path)

Here, you specify the file path where you want to save the image. After generating the image, you use the `save_image_to_local` function to save the image to the specified file path. There will be a new image file inside the current directory.

## Using DALL-E 2 API for Variations

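With DALL-E 2, you can create variations of an existing image. Here, you open an image, the Kodeco logo, pass it to the `client.images.create_variation` method to request four `512x512` variations, and display the results in a grid.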

In [None]:
# Create variations of an image with DALL-E 2

# Define the path to the logo image
logo_path = "images/kodeco.png"

# Open the image
with open(logo_path, "rb") as f:
    # Call the API to create variations
    response = client.images.create_variation(
      model="dall-e-2",
      image=f,
      n=4,
      size="512x512"
    )

# Display images in a grid
image_urls = [img.url for img in response.data]
display_images_in_grid(image_urls)
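OpenAI's documentation describes the input for variations as a square PNG file smaller than 4 MB. A rough pre-flight check can be written with the standard library alone, because a PNG file stores its width and height at a fixed position in its header. This is a sketch under those assumptions; `png_size` and `is_valid_variation_input` are hypothetical helpers.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_BYTES = 4 * 1024 * 1024  # assumed 4 MB upload limit


def png_size(data):
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit integers at bytes 16 to 23
    return struct.unpack(">II", data[16:24])


def is_valid_variation_input(data):
    """Check the assumed requirements: PNG format, square, under the size limit."""
    try:
        width, height = png_size(data)
    except ValueError:
        return False
    return width == height and len(data) < MAX_BYTES


# Build just enough of a PNG header to exercise the check
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 512, 512)
print(png_size(header))
print(is_valid_variation_input(header))
```

For a real file, you would pass in the bytes read from disk, for example the contents of `images/kodeco.png` opened in `"rb"` mode.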

## Implementing Image Editing with DALL-E API

DALL-E also allows you to edit existing images, but this feature is currently available only with DALL-E 2, so you must use the correct model.

First, take a look at the image you want to edit with DALL-E—it's a cat CEO image!

In [None]:
# Display the original image

# Image path
cat_ceo_image_path = "images/cat_ceo.png"

# Open the image
img = Image.open(cat_ceo_image_path)

# Display the image
plt.imshow(img)
plt.axis('off')
plt.show()

This is a cool cat CEO! But you want to make it even cooler by adding a computer on the table. To do this, you need to delete parts of the table and window and replace them with transparent pixels, creating a mask image.

To create the mask, use an image editor like Photoshop, GIMP, or Photopea. Remove the parts of the image where you want the DALL-E API to generate new pixels. You can also use an online tool like Online PNG Tools, which is convenient if you don't want to install software.

Here's how to do it with GIMP:

In [None]:
from IPython.display import Video

video_path = 'videos/add_transparency_in_gimp.mp4'

Video(video_path, width=600, height=400)
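As an alternative to an image editor, a simple rectangular mask can also be produced in code with Pillow. The sketch below makes one rectangle of an image fully transparent. `make_mask` is a hypothetical helper, and the demonstration runs on a small solid-color image, not the real photo, because the right coordinates depend on where the table and window sit in your picture.

```python
from PIL import Image


def make_mask(image, box):
    """Return an RGBA copy of image with the region box made fully transparent.

    box is (left, upper, right, lower) in pixels.
    """
    mask = image.convert("RGBA")
    # Filling with alpha 0 marks the pixels that may be regenerated
    mask.paste((0, 0, 0, 0), box)
    return mask


# Demonstration on a small solid-color image
original = Image.new("RGB", (8, 8), (255, 0, 0))
mask = make_mask(original, (2, 2, 6, 6))
print(mask.getpixel((3, 3)))
print(mask.getpixel((0, 0)))
```

Saving the result with `mask.save(...)` under a `.png` filename keeps the transparency, which is what the mask needs.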

Now that you have both the image and the mask image, you can edit the image using the DALL-E API.

In [None]:
# Edit an image using DALL-E 2

# Define the paths to the original image and mask image
original_image_path = "images/cat_ceo.png"
mask_image_path = "images/cat_ceo_mask.png"

# Define the prompt for the edit; it describes the full image you want to end up with
image_prompt = "A cat CEO sitting at a desk with a computer on the table."

# Call the API to edit the image
with open(original_image_path, "rb") as image_file, open(mask_image_path, "rb") as mask_file:
    edit_response = client.images.edit(
        model="dall-e-2",
        image=image_file,
        mask=mask_file,
        prompt=image_prompt,
        n=1,
        size="1024x1024"
    )

After generating the edited image, download and display it!

In [None]:
# Download and display the edited image

# Retrieve the image URL
image_url = edit_response.data[0].url

# Download the image
response = requests.get(image_url)

# Create an image object
img = Image.open(BytesIO(response.content))

# Display the image
plt.imshow(img)
plt.axis('off')
plt.show()

## Combining Text and Image Generation

Next, you'll create a simple application that combines text and image generation. This application will be a food recipe generator that provides both a recipe and an image of the dish. For text generation, you'll use a different OpenAI API, not the DALL-E API. Please refer to the previous course on OpenAI text generation for guidance.

In [None]:
# Combine text and image generation

# Function to generate a recipe with an image of the food
def generate_recipe(food: str) -> str:

    # Generate the recipe text
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You're an expert in culinary arts and cooking."},
            {
                "role": "user",
                "content": f"Provide a recipe for {food}."
            }
        ]
    )
    
    # Extract the recipe description
    recipe_description = completion.choices[0].message.content

    # Prompt for DALL-E image generation
    dalle_prompt = f"a hyper-realistic image of {food}"
    # DALL-E model and image size
    dalle_model = "dall-e-3"
    image_size = "1792x1024"

    # Image Generation
    response = client.images.generate(
      model=dalle_model,
      prompt=dalle_prompt,
      size=image_size,
      n=1,
    )
    
    # Retrieve the image URL
    image_url = response.data[0].url
    
    # Download the image
    response = requests.get(image_url)
    
    # Open the image
    img = Image.open(BytesIO(response.content))

    # Displaying the image
    plt.imshow(img)
    plt.axis('off')
    plt.show()

    # You can also save the image if you want

    # Return the recipe description
    return recipe_description

To execute the function, why not start by getting the recipe for Chicken Tikka Masala?

In [None]:
# Generate a recipe for Chicken Tikka Masala
chicken_tikka_masala_recipe = generate_recipe("Chicken Tikka Masala")
print(chicken_tikka_masala_recipe)

Then, test the application with another dish. How about Spaghetti Bolognese?

In [None]:
# Generate a recipe for Spaghetti Bolognese
spaghetti_bolognese_recipe = generate_recipe("Spaghetti Bolognese")
print(spaghetti_bolognese_recipe)
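The `generate_recipe` function above generates text, generates an image, and displays the image in one place, which makes it hard to reuse outside a notebook. One possible restructuring is to return the recipe and the image URL together and let the caller decide how to show them. The sketch below passes the text and image generators in as callables, so it runs without any API calls. All names here (`RecipeResult`, `generate_recipe_with_image`, `text_fn`, `image_fn`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RecipeResult:
    food: str
    recipe: str
    image_url: str


def generate_recipe_with_image(
    food: str,
    text_fn: Callable[[str], str],
    image_fn: Callable[[str], str],
) -> RecipeResult:
    """Combine a text generator and an image generator for one dish."""
    recipe = text_fn(f"Provide a recipe for {food}.")
    image_url = image_fn(f"a hyper-realistic image of {food}")
    return RecipeResult(food=food, recipe=recipe, image_url=image_url)


# Stand-ins for the real API calls, so the example runs offline
result = generate_recipe_with_image(
    "Chicken Tikka Masala",
    text_fn=lambda prompt: f"[recipe text for: {prompt}]",
    image_fn=lambda prompt: "https://example.com/image.png",
)
print(result.recipe)
print(result.image_url)
```

In the notebook, `text_fn` would wrap the `client.chat.completions.create` call and `image_fn` would wrap `client.images.generate`, each returning a plain string.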