# Lesson 3 Project: Image Generation & Editing with DALL-E

## Introduction

Welcome to Lesson 3 of our course on multimodal AI! Today, you're stepping into the fascinating world of AI-powered image generation and editing using DALL-E. Imagine being able to create or modify images simply by describing what you want in words.

In this lesson, you'll learn the art of crafting effective text prompts to generate the images you envision, and discover how to harness DALL-E's power for image editing tasks.

By the end of this lesson, you will be able to:
- Use the DALL-E API to generate images from text prompts
- Implement an image editing feature using DALL-E
- Combine text and image generation in a single application

Get ready to turn your words into visuals and push the boundaries of creativity with DALL-E!

## Setting Up the OpenAI Development Environment

Refer to the Python Crash Course lesson to learn how to set up your OpenAI development environment.

In [None]:
# Install the libraries
!pip install openai python-dotenv Pillow matplotlib

# Load the OpenAI library
from openai import OpenAI

# Set up relevant environment variables
# Make sure OPENAI_API_KEY=... exists in .env
from dotenv import load_dotenv

load_dotenv()

# Create the OpenAI connection object
client = OpenAI()

## Using the DALL-E API to Generate Images

DALL-E is a powerful AI model that can generate images from textual descriptions. To begin, you can generate an image using DALL-E 3.

In [None]:
dalle_model = "dall-e-3"

dalle_prompt = "a samurai cat is eating ramen"

# Choose one of 1024x1024, 1024x1792, or 1792x1024 for DALL-E 3
image_size = "1024x1792"

# "standard" or "hd" (the quality parameter is DALL-E 3 only)
image_quality = "standard"

response = client.images.generate(
  model=dalle_model,
  prompt=dalle_prompt,
  size=image_size,
  quality=image_quality,
  n=1,
)

image_url = response.data[0].url

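Now, download the image. The returned URL is temporary (OpenAI documents it as valid for 60 minutes), so fetch the image soon after generating it.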

In [None]:
# Import libraries
import requests
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt

# Download the image and raise an error if the request failed
response = requests.get(image_url, timeout=30)
response.raise_for_status()

To view the image, load the downloaded bytes with Pillow and display them with `matplotlib`.

In [None]:
# Open the image
img = Image.open(BytesIO(response.content))

# Display the image
plt.imshow(img)
plt.axis('off')
plt.show()

### Parameters for DALL-E 3

To experiment further, set `quality` to `hd`, which produces finer details and greater consistency across the image, and try a different size, such as the landscape `1792x1024`.

In [None]:
image_quality = "hd"

image_size = "1792x1024"

response = client.images.generate(
  model=dalle_model,
  prompt=dalle_prompt,
  size=image_size,
  quality=image_quality,
  n=1,
)

image_url = response.data[0].url

response = requests.get(image_url)

img = Image.open(BytesIO(response.content))

plt.imshow(img)
plt.axis('off')
plt.show()

Another DALL-E 3 parameter is `style`. The default value, `vivid`, leans toward hyper-real, dramatic images, while `natural` produces more natural-looking, less hyper-real results. You can try the `natural` style instead.

In [None]:
image_quality = "standard"

image_style = "natural"

response = client.images.generate(
  model=dalle_model,
  prompt=dalle_prompt,
  size=image_size,
  quality=image_quality,
  style=image_style,
  n=1,
)

image_url = response.data[0].url

response = requests.get(image_url)

img = Image.open(BytesIO(response.content))

plt.imshow(img)
plt.axis('off')
plt.show()
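To compare `quality` and `style` side by side, you could enumerate every combination of the values covered above and send each one to `client.images.generate`. The sketch below only builds the keyword arguments, so it runs without an API key; the helper name `dalle3_variants` is hypothetical, and the value lists are limited to the ones this lesson uses.

```python
from itertools import product


def dalle3_variants(prompt: str, size: str = "1792x1024") -> list[dict]:
    """Return one keyword-argument dict per quality/style combination (hypothetical helper)."""
    qualities = ["standard", "hd"]  # DALL-E 3 quality values used in this lesson
    styles = ["vivid", "natural"]   # DALL-E 3 style values; "vivid" is the default
    return [
        {
            "model": "dall-e-3",
            "prompt": prompt,
            "size": size,
            "quality": quality,
            "style": style,
            "n": 1,  # DALL-E 3 generates one image per request
        }
        for quality, style in product(qualities, styles)
    ]


variants = dalle3_variants("a samurai cat is eating ramen")
for kwargs in variants:
    print(kwargs["quality"], kwargs["style"])
```

In the notebook, you could loop over `variants` and call `client.images.generate(**kwargs)` for each one, then display the results in a grid. Keep in mind that every call generates, and is billed for, a separate image.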

### Parameters for DALL-E 2

With DALL-E 2, you can generate multiple images in a single API call (DALL-E 3 accepts only `n=1`). However, DALL-E 2 doesn't support the `style` and `quality` parameters, and it only supports the square sizes `256x256`, `512x512`, and `1024x1024`.

In [None]:
dalle_model = "dall-e-2"

image_size = "512x512"

dalle_prompt = "a samurai cat is singing on a stage"

# Generate 4 images with DALL-E 2
response = client.images.generate(
  model=dalle_model,
  prompt=dalle_prompt,
  size=image_size,
  n=4,
)

fig, axes = plt.subplots(2, 2, figsize=(10, 10))

# Loop through the images and display them
for i in range(4):
    image_url = response.data[i].url
    img_response = requests.get(image_url)
    img = Image.open(BytesIO(img_response.content))

    # Determine the position in the grid
    row, col = divmod(i, 2)
    axes[row, col].imshow(img)
    axes[row, col].axis('off')

plt.show()
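Because the two models accept different parameters, a small validation step can catch mismatches before you spend an API call. `check_image_params` below is a hypothetical helper, not part of the OpenAI SDK. Its rules encode what this lesson states (the DALL-E 3 and DALL-E 2 size lists, one image per DALL-E 3 request, no `style` or `quality` for DALL-E 2) plus the API reference's limit of 1 to 10 images per DALL-E 2 request; check the current docs, since these limits can change.

```python
from typing import Optional

# Supported sizes per model, as listed in this lesson
DALLE_SIZES = {
    "dall-e-3": {"1024x1024", "1024x1792", "1792x1024"},
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
}


def check_image_params(model: str, size: str, n: int = 1,
                       quality: Optional[str] = None,
                       style: Optional[str] = None) -> list[str]:
    """Return a list of problems; an empty list means the parameters look valid."""
    if model not in DALLE_SIZES:
        return [f"unknown model: {model}"]
    problems = []
    if size not in DALLE_SIZES[model]:
        allowed = ", ".join(sorted(DALLE_SIZES[model]))
        problems.append(f"{model} does not support size {size} (allowed: {allowed})")
    if model == "dall-e-3":
        if n != 1:
            problems.append("dall-e-3 generates only one image per request (n=1)")
    else:
        if not 1 <= n <= 10:
            problems.append("dall-e-2 accepts n between 1 and 10")
        if quality is not None or style is not None:
            problems.append("dall-e-2 does not accept the quality or style parameters")
    return problems


print(check_image_params("dall-e-2", "512x512", n=4))  # → []
print(check_image_params("dall-e-3", "512x512", n=4))
```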

### Response Format

So far, you've displayed images directly from their URLs. To save an image to local storage, download it and call Pillow's `save` method:

In [None]:
response = client.images.generate(
  model=dalle_model,
  prompt=dalle_prompt,
  size=image_size,
  n=1,
)

image_url = response.data[0].url

response = requests.get(image_url)

img = Image.open(BytesIO(response.content))
img.save("samurai_cat_singing.png")
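A hard-coded filename works for one image, but if you save many generations, deriving the filename from the prompt keeps the files recognizable. The hypothetical helper below lowercases the prompt, replaces runs of non-alphanumeric characters with underscores, truncates the result, and appends a counter so repeated prompts don't overwrite earlier files.

```python
import re
from pathlib import Path


def prompt_to_filename(prompt: str, folder: str = ".", max_len: int = 50) -> Path:
    """Build a unique, filesystem-safe PNG path from an image prompt (hypothetical helper)."""
    # Collapse every run of characters other than a-z/0-9 into one underscore
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower())[:max_len].strip("_") or "image"
    candidate = Path(folder) / f"{slug}.png"
    counter = 1
    # Append _1, _2, ... until the name is not already taken
    while candidate.exists():
        candidate = Path(folder) / f"{slug}_{counter}.png"
        counter += 1
    return candidate


print(prompt_to_filename("a samurai cat is singing on a stage"))
```

Pillow's `save` accepts a `Path`, so in the notebook you could write `img.save(prompt_to_filename(dalle_prompt))`.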

Instead of a URL, you can receive the image data itself as a base64-encoded string by setting the `response_format` parameter. The default value is `url`; set it to `b64_json` instead.

In [None]:
image_response_format = "b64_json"

response = client.images.generate(
  model=dalle_model,
  prompt=dalle_prompt,
  size=image_size,
  response_format=image_response_format,
  n=1
)

Next, extract the base64 string from the response, decode it into bytes, and load those bytes into a Pillow `Image` object.

In [None]:
import base64

first_image = response.data[0]

b64_string = first_image.b64_json

# Decode the base64 string
img_data = base64.b64decode(b64_string)

img = Image.open(BytesIO(img_data))

plt.figure(figsize=(10, 10))
plt.imshow(img)
plt.axis('off')
plt.show()
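Because `b64_json` holds the encoded image file itself, you don't need Pillow just to save it: decoding the string yields the exact file bytes. The sketch below assumes the data is a PNG, which is what the DALL-E models in this lesson return. The helper name `save_b64_image` is hypothetical; it decodes the string, checks the 8-byte PNG signature as a sanity check, and writes the bytes to disk.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # every PNG file starts with these 8 bytes


def save_b64_image(b64_string: str, path: str) -> int:
    """Decode a base64-encoded PNG, write it to `path`, and return the number of bytes written."""
    data = base64.b64decode(b64_string, validate=True)  # reject non-base64 characters
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG file")
    return Path(path).write_bytes(data)
```

In the notebook, `save_b64_image(response.data[0].b64_json, "samurai_cat_singing_b64.png")` would save the image from the previous cell; the filename is only an example.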

## Using the DALL-E 2 API for Variations

DALL-E 2 can also create variations of an existing image. First, display the source image, here the Kodeco logo.

In [None]:
logo_path = "images/kodeco.png"

image = Image.open(logo_path)

plt.imshow(image)
plt.axis('off')
plt.show()
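The variations endpoint is picky about its input: OpenAI's API reference describes it as a valid PNG file, smaller than 4 MB, and square (check the current docs, since limits can change). You can verify this locally by reading the PNG header: every PNG starts with an IHDR chunk, and the width and height are stored there as big-endian 32-bit integers at bytes 16 to 23. `check_variation_input` below is a hypothetical helper that does this with the standard library; treating 4 MB as 4 × 1024 × 1024 bytes is an assumption.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # every PNG file starts with these 8 bytes
MAX_BYTES = 4 * 1024 * 1024          # assumed reading of the documented 4 MB limit


def png_size(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk, which must be the first chunk in a PNG."""
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def check_variation_input(data: bytes) -> list[str]:
    """Return reasons the variations endpoint might reject this image (empty list = looks OK)."""
    try:
        width, height = png_size(data)
    except ValueError as exc:
        return [str(exc)]
    problems = []
    if width != height:
        problems.append(f"image must be square, got {width}x{height}")
    if len(data) >= MAX_BYTES:
        problems.append(f"image is {len(data)} bytes, over the 4 MB limit")
    return problems
```

In the notebook, you could pass the logo's bytes, for example `Path(logo_path).read_bytes()` with `from pathlib import Path`, to `check_variation_input`; an empty list means the file looks acceptable.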

Now, generate four variations of the logo. This feature is especially useful for exploring design alternatives, for example when you're designing a logo for your next startup.

In [None]:
with open(logo_path, "rb") as f:
    response = client.images.create_variation(
      model="dall-e-2",
      image=f,
      n=4,
      size="512x512"
    )

fig, axes = plt.subplots(2, 2, figsize=(10, 10))

# Loop through the images and display them
for i in range(4):
    image_url = response.data[i].url
    img_response = requests.get(image_url)
    img = Image.open(BytesIO(img_response.content))

    # Determine the position in the grid
    row, col = divmod(i, 2)
    axes[row, col].imshow(img)
    axes[row, col].axis('off')

plt.show()
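Both grids above hard-code a 2 x 2 layout for exactly four images. If you change `n`, the same `divmod(i, cols)` trick still works as long as the grid is sized from `n`. The hypothetical `grid_positions` helper below computes the grid shape and each image's (row, column) slot in pure Python, so the layout logic can be checked without plotting anything.

```python
import math


def grid_positions(n: int, cols: int = 2) -> tuple[int, int, list[tuple[int, int]]]:
    """Return (rows, cols, positions) for laying out n images in a grid (hypothetical helper)."""
    if n < 1 or cols < 1:
        raise ValueError("n and cols must be positive")
    cols = min(cols, n)                               # avoid empty columns when n is small
    rows = math.ceil(n / cols)                        # enough rows to hold every image
    positions = [divmod(i, cols) for i in range(n)]   # image i goes to (row, col)
    return rows, cols, positions


rows, cols, positions = grid_positions(5, cols=3)
print(rows, cols, positions)  # → 2 3 [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
```

You would then call `plt.subplots(rows, cols, squeeze=False)`, which always returns a 2D array of axes, so `axes[row, col]` works even when there is a single row or column. Call `.axis('off')` on any leftover empty axes.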

## Implementing Image Editing with the DALL-E API

DALL-E also lets you edit existing images. At the time of writing, this feature is only available with DALL-E 2, so make sure to specify that model.

First, take a look at the image you want to edit with DALL-E: a cat CEO!

In [None]:
cat_ceo_image_path = "images/cat_ceo.png"

img = Image.open(cat_ceo_image_path)

plt.imshow(img)
plt.axis('off')
plt.show()

This is a cool cat CEO! But you can make it even cooler by adding a laptop on the table. To do this, create a mask image: a copy of the image in which the parts of the table and window where the laptop should appear are erased, leaving transparent pixels. DALL-E generates new content only in those transparent areas.

To create the mask, use an image editor such as Photoshop, GIMP, or Photopea, and erase the parts of the image where you want the DALL-E API to generate new pixels. Save the result as a PNG so the transparency is preserved. If you don't want to install any software, an online tool such as Online PNG Tools also works.

Here's how to do it with GIMP:

In [None]:
from IPython.display import Video

video_path = 'videos/add_transparency_in_gimp.mp4'

Video(video_path, width=600, height=400)
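If you'd rather skip the image editor, you can also generate a simple rectangular mask in code. According to OpenAI's API reference, the mask's fully transparent areas (alpha of zero) mark where the image should be edited, and the mask must be a PNG with the same dimensions as the original. So an opaque image with one transparent rectangle is enough. The sketch below writes such a mask as an RGBA PNG using only the standard library (`zlib` for compression and CRCs, `struct` for the chunk layout); `make_rect_mask` is a hypothetical helper. With Pillow, drawing `ImageDraw.Draw(img).rectangle(box, fill=(0, 0, 0, 0))` on an RGBA copy of the image is a shorter alternative.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Encode one PNG chunk: 4-byte length, type, data, then a CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def make_rect_mask(width: int, height: int, box: tuple[int, int, int, int]) -> bytes:
    """Return RGBA PNG bytes that are opaque everywhere except a transparent box.

    box is (left, top, right, bottom) in pixels; right and bottom are exclusive.
    """
    left, top, right, bottom = box
    if not (0 <= left <= right <= width and 0 <= top <= bottom <= height):
        raise ValueError("box must lie inside the image")
    opaque = b"\x00\x00\x00\xff"  # black, alpha 255: not edited
    clear = b"\x00\x00\x00\x00"   # alpha 0: area for DALL-E to regenerate
    # Each scanline starts with filter byte 0 ("None"), followed by RGBA pixels
    full_row = b"\x00" + opaque * width
    hole_row = b"\x00" + opaque * left + clear * (right - left) + opaque * (width - right)
    raw = b"".join(hole_row if top <= y < bottom else full_row for y in range(height))
    # IHDR: width, height, bit depth 8, color type 6 (RGBA), compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(raw))
        + png_chunk(b"IEND", b"")
    )
```

For the cat CEO, you would pass the original image's dimensions (for example, `Image.open(cat_ceo_image_path).size`) and a box around the part of the table to replace, then save the result with something like `Path("images/cat_ceo_mask_generated.png").write_bytes(mask_png)`. The box coordinates and that filename are placeholders.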

Now that you have both the image and the mask image, you can edit the image using the DALL-E API.

In [None]:
original_image_path = "images/cat_ceo.png"
mask_image_path = "images/cat_ceo_mask.png"

image_prompt = "a highly detailed laptop on top of the table."

with open(original_image_path, "rb") as image_file, open(mask_image_path, "rb") as mask_file:
    edit_response = client.images.edit(
        model="dall-e-2",  # image editing requires DALL-E 2
        image=image_file,
        mask=mask_file,
        prompt=image_prompt,
        n=1,
        size="1024x1024"
    )

After generating the edited image, download and display it!

In [None]:
image_url = edit_response.data[0].url

response = requests.get(image_url)

img = Image.open(BytesIO(response.content))

plt.imshow(img)
plt.axis('off')
plt.show()
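If the edit request is rejected, a quick local check of the two files can help diagnose the problem. OpenAI's API reference describes the original as a square PNG and the mask as a PNG with the same dimensions whose transparent areas mark what to edit, so the mask needs an alpha channel. The hypothetical `check_edit_inputs` helper below reads both PNG headers with the standard library and flags non-square images, mismatched sizes, and masks whose color type lacks alpha (only color types 4 and 6 carry an alpha channel in the PNG spec). As a simplification, it ignores palette images that store transparency in a separate `tRNS` chunk.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
ALPHA_COLOR_TYPES = {4, 6}  # grayscale + alpha, RGBA


def png_header(data: bytes) -> tuple[int, int, int]:
    """Return (width, height, color_type) from a PNG file's IHDR chunk."""
    if len(data) < 26 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height, data[25]  # byte 25 holds the color type


def check_edit_inputs(image_bytes: bytes, mask_bytes: bytes) -> list[str]:
    """Return problems that make an image/mask pair unsuitable for editing (hypothetical helper)."""
    img_w, img_h, _ = png_header(image_bytes)
    mask_w, mask_h, mask_type = png_header(mask_bytes)
    problems = []
    if img_w != img_h:
        problems.append(f"image must be square, got {img_w}x{img_h}")
    if (img_w, img_h) != (mask_w, mask_h):
        problems.append(f"mask is {mask_w}x{mask_h}, image is {img_w}x{img_h}")
    if mask_type not in ALPHA_COLOR_TYPES:
        problems.append("mask has no alpha channel, so it cannot mark transparent areas")
    return problems
```

For example, you could read `images/cat_ceo.png` and `images/cat_ceo_mask.png` in binary mode and pass their bytes to `check_edit_inputs`; an empty list means the pair looks consistent.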

## Combining Text and Image Generation

Next, you'll create a simple application that combines text and image generation: a recipe generator that returns a recipe for a dish and displays an image of it. For the text, you'll use OpenAI's Chat Completions API rather than the DALL-E API; refer to the previous course on OpenAI text generation for details.

In [None]:
def generate_recipe(food: str) -> str:
    """Return a recipe for `food` and display a DALL-E 3 image of the dish."""

    # Generate the recipe text with the Chat Completions API
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You're an expert in culinary arts and cooking."},
            {
                "role": "user",
                "content": f"Provide a recipe for {food}."
            }
        ]
    )

    recipe_description = completion.choices[0].message.content

    dalle_prompt = f"a hyper-realistic image of {food}"
    dalle_model = "dall-e-3"
    image_size = "1792x1024"

    # Generate an image of the dish with DALL-E 3
    image_response = client.images.generate(
      model=dalle_model,
      prompt=dalle_prompt,
      size=image_size,
      n=1,
    )

    image_url = image_response.data[0].url

    # Download the image and load it with Pillow
    download = requests.get(image_url, timeout=30)
    download.raise_for_status()
    img = Image.open(BytesIO(download.content))

    # Display the image
    plt.imshow(img)
    plt.axis('off')
    plt.show()

    # You can also save the image if you want, for example with img.save(...)

    return recipe_description

To execute the function, why not start by getting the recipe for Chicken Tikka Masala?

In [None]:
chicken_tikka_masala_recipe = generate_recipe("Chicken Tikka Masala")
print(chicken_tikka_masala_recipe)

Then, test the application with another dish. How about Spaghetti Bolognese?

In [None]:
spaghetti_bolognese_recipe = generate_recipe("Spaghetti Bolognese")
print(spaghetti_bolognese_recipe)
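`generate_recipe` mixes prompt construction, two API calls, and plotting, which makes it hard to test without spending API credits. One option, sketched below with hypothetical helper names, is to move the prompt-building logic into pure functions you can check offline; `generate_recipe` would then pass their results to `client.chat.completions.create(model="gpt-4o", messages=...)` and `client.images.generate(**...)`. The prompt wording is adapted from the function above.

```python
SYSTEM_PROMPT = "You're an expert in culinary arts and cooking."


def build_recipe_messages(food: str) -> list[dict[str, str]]:
    """Chat messages asking the model for a recipe (hypothetical helper)."""
    food = food.strip()
    if not food:
        raise ValueError("food must be a non-empty string")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Provide a recipe for {food}."},
    ]


def build_dish_image_request(food: str, size: str = "1792x1024") -> dict[str, object]:
    """Keyword arguments for client.images.generate (hypothetical helper)."""
    return {
        "model": "dall-e-3",
        "prompt": f"a hyper-realistic image of {food.strip()}",
        "size": size,
        "n": 1,
    }


for dish in ["Chicken Tikka Masala", "Spaghetti Bolognese"]:
    print(build_recipe_messages(dish)[1]["content"])
    print(build_dish_image_request(dish)["prompt"])
```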