# Import Required Libraries

This notebook demonstrates how to use the image generation capabilities of GPT Image 1 (`gpt-image-1`) through the OpenAI API. The examples cover generating images from text prompts and editing images using masks.

```python
# Example Python code
import openai
import sys

# Initialize the client
client = openai.Client()
```

The code above initializes the OpenAI client. The client reads the API key from the `OPENAI_API_KEY` environment variable, so make sure it is set before running the notebook.
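A quick way to catch a missing key before any request is made is to check the environment first. This is a minimal sketch, assuming the key lives in the `OPENAI_API_KEY` environment variable (the SDK's default); `api_key_is_set` is a hypothetical helper name, not part of the library.

```python
import os


def api_key_is_set(env=None):
    """Return True if OPENAI_API_KEY is present and not blank.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not api_key_is_set():
    # Assumption: the client is created without an explicit api_key argument
    print("OPENAI_API_KEY is not set; the client will not be able to authenticate.")
```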

# Generating Images

To generate an image, use the `images.generate` method. Provide a text prompt and specify the desired image size.

```python
import base64

# Generate an image
response = client.images.generate(
  model="gpt-image-1",
  prompt="A futuristic cityscape at sunset",
  n=1,
  size="1024x1024"
)

# gpt-image-1 returns the image as a base64-encoded string, so decode it before saving
with open("generated_image.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))
```

The above code generates an image and saves it as `generated_image.png`.
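Since every example in this notebook ends by writing an image to disk, the saving step can be factored into a small helper. This is a sketch under the assumption that the image arrives as a base64 string in `response.data[0].b64_json`; `save_b64_image` is a hypothetical name introduced here for illustration.

```python
import base64
from pathlib import Path


def save_b64_image(b64_data, path):
    """Decode a base64-encoded image and write it to `path`.

    Returns the number of bytes written.
    """
    image_bytes = base64.b64decode(b64_data)
    Path(path).write_bytes(image_bytes)
    return len(image_bytes)


# Hypothetical usage with a response from the API:
# save_b64_image(response.data[0].b64_json, "generated_image.png")
```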

# Editing Images

You can edit an existing image by providing a mask and a prompt. The mask specifies the areas to edit.

```python
import base64

# Edit an image
response = client.images.edit(
  model="gpt-image-1",
  image=open("input_image.png", "rb"),
  mask=open("mask.png", "rb"),
  prompt="Add a tree to the empty field",
  n=1,
  size="1024x1024"
)

# Decode the base64 result and save the edited image
with open("edited_image.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))
```

This code edits the input image based on the mask and saves the result.

# Image Quality

The `quality` parameter allows you to specify the quality of the generated image. Higher quality images take longer to generate.

```python
import base64

# Generate a high-quality image
response = client.images.generate(
  model="gpt-image-1",
  prompt="A detailed painting of a mountain landscape",
  n=1,
  quality="high",
  # gpt-image-1 accepts 1024x1024, 1536x1024, 1024x1536, or "auto"
  size="1024x1024"
)

# Decode the base64 result and save the high-quality image
with open("high_quality_image.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))
```

This code generates a high-quality image and saves it.
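The API only accepts a limited set of `size` and `quality` values, so a pre-flight check can save a failed request. This is a minimal sketch, assuming the sets below match what `gpt-image-1` supports at the time of writing; they are not taken from this notebook, so verify them against the API reference. `check_image_options` is a hypothetical helper name.

```python
# Assumed supported values for gpt-image-1; confirm against the API reference.
SUPPORTED_SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}
SUPPORTED_QUALITIES = {"low", "medium", "high", "auto"}


def check_image_options(size="auto", quality="auto"):
    """Return a list of human-readable problems; an empty list means the options look valid."""
    problems = []
    if size not in SUPPORTED_SIZES:
        problems.append(f"unsupported size: {size!r}")
    if quality not in SUPPORTED_QUALITIES:
        problems.append(f"unsupported quality: {quality!r}")
    return problems


for problem in check_image_options(size="1024x1024", quality="high"):
    print(problem)  # nothing is printed when both options are in the assumed sets
```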

# Background Parameter

The `background` parameter can be used to specify the background of the generated image. This is useful for creating images with transparent or specific backgrounds.

```python
import base64

# Generate an image with a transparent background
response = client.images.generate(
  model="gpt-image-1",
  prompt="A logo for a tech company",
  n=1,
  background="transparent",
  # gpt-image-1 accepts 1024x1024, 1536x1024, 1024x1536, or "auto"
  size="1024x1024"
)

# Decode the base64 result and save the image with a transparent background
with open("logo.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))
```

This code generates an image with a transparent background.

### Using masks

Editing mode provides a mask parameter that we can use to specify the areas where the image should be edited. The mask must be a PNG image of at most 4 MB and have the same size as the image. Areas with 100% transparency correspond to the areas that GPT Image 1 is allowed to edit.
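The constraints above (PNG, at most 4 MB, same dimensions as the image) can be checked locally before uploading. This is a rough sketch using only the standard library: it reads the width, height, and color type from the PNG `IHDR` chunk. It assumes the input image is also a PNG, and it only detects a full alpha channel (color types 4 and 6), not palette transparency. `png_info` and `check_mask` are hypothetical helper names.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_MASK_BYTES = 4 * 1024 * 1024  # the 4 MB limit mentioned above


def png_info(data):
    """Return (width, height, has_alpha_channel) read from a PNG's IHDR chunk."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    # Color type 4 is grayscale+alpha, 6 is RGBA
    return width, height, color_type in (4, 6)


def check_mask(image_bytes, mask_bytes):
    """Return a list of problems with the mask; an empty list means it passed these checks."""
    problems = []
    if len(mask_bytes) > MAX_MASK_BYTES:
        problems.append("mask is larger than 4 MB")
    image_w, image_h, _ = png_info(image_bytes)
    mask_w, mask_h, mask_has_alpha = png_info(mask_bytes)
    if (image_w, image_h) != (mask_w, mask_h):
        problems.append(f"mask is {mask_w}x{mask_h} but image is {image_w}x{image_h}")
    if not mask_has_alpha:
        problems.append("mask has no alpha channel, so no area is marked as editable")
    return problems


# Hypothetical usage:
# with open("input_image.png", "rb") as i, open("mask.png", "rb") as m:
#     print(check_mask(i.read(), m.read()))
```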

We provide the mask in the same way as the image, except it isn't a list in this case:

```python
img = client.images.edit(
  model="gpt-image-1",
  image=[
    open(sys.argv[1], "rb"),
  ],
  # We provide the mask like this
  mask=open("mask.png", "rb"),
  prompt=prompt,
  n=1,
  quality="high",
  size="1536x1024",
)
```

However, when I experimented with it, it didn't work very well, and I've seen reports online of people with similar issues.

Here's an example:

![](../resources/img/gpt-image-1-masks.png)

I've also tried using it to add elements at specific locations, and it didn't work consistently. Just like using the background parameter for image generation, I found that describing what I want in the prompt works best.

### Using multiple images

The model can process and combine multiple images at once. In the example below, we use it to create a marketing poster combining the images of these three individual drinks:

![](../resources/img/gpt-image-1-multi-imgs.png)

We provide the three images as a list in the `image` parameter, as follows:

```python
prompt = """
Create a vibrant and eye-catching marketing poster to 
promote the cold drinks offerings at our coffee shop.
"""

img = client.images.edit(
  model="gpt-image-1",
  # We can provide multiple images at once
  image=[
    open("latte.png", "rb"),
    open("americano.png", "rb"),
    open("icetea.png", "rb"),
  ],
  prompt=prompt,
  size="1536x1024",
)
```

Here’s the result:

![](../resources/img/gpt-image-2-multi-imgs.png)
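The multi-image call above opens three files and never closes them explicitly. One way to tidy that up is `contextlib.ExitStack`, which closes every file once the request finishes. This is a sketch, not the only approach; `open_all` and `edit_with_images` are hypothetical helper names, and `client` is assumed to be the OpenAI client created at the top of the notebook.

```python
from contextlib import ExitStack


def open_all(stack, paths):
    """Open every path in binary mode and register it with the ExitStack for cleanup."""
    return [stack.enter_context(open(path, "rb")) for path in paths]


def edit_with_images(client, paths, prompt, size="1536x1024"):
    """Send several input images to images.edit and close the files afterwards."""
    with ExitStack() as stack:
        images = open_all(stack, paths)
        return client.images.edit(
            model="gpt-image-1",
            image=images,
            prompt=prompt,
            size=size,
        )


# Hypothetical usage:
# img = edit_with_images(client, ["latte.png", "americano.png", "icetea.png"], prompt)
```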

## GPT-Image-1 Pricing

Generating images is charged based on:

1. The number of tokens in the text prompt.
2. The number of tokens in the input images.
3. The number of tokens in the output image.

![](../resources/img/gpt-image-pricing.png)

Sometimes it’s hard to get an idea of what these costs represent because we don’t know how many tokens an image consists of. 

Because the dimensions of the output images are known, we know how many tokens each one requires, so we can give precise prices for the output image tokens (the most expensive part):

![](../resources/img/gpt-image-pricing-1.png)

This pricing depends on the quality and the size of the image. For more details, check the [GPT Image 1 pricing](https://platform.openai.com/docs/models/gpt-image-1) page.

When we generate an image, the API returns the number of tokens it used, so we can combine it with the above information to know exactly how much it costs. 

We can display the number of tokens used by printing out the `usage` field of the result:

```python
img = client.images.generate(
  model="gpt-image-1",
  prompt=prompt,
  background="transparent",
  n=1,
  quality="medium",
  size="1024x1024",
  moderation="auto",
  output_format="png",
)
# Add this to see the usage
print("Prompt tokens:", img.usage.input_tokens_details.text_tokens)
print("Input images tokens:", img.usage.input_tokens_details.image_tokens)
print("Output image tokens:", img.usage.output_tokens)
```

Output:

```bash
Prompt tokens: 8
Input images tokens: 0
Output image tokens: 272
```
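To turn those token counts into a dollar figure, multiply each count by its per-token rate and add them up. The sketch below assumes rates of $5, $10, and $40 per million tokens for text input, image input, and image output respectively; these numbers are an assumption for illustration, so replace them with the values from the pricing page. `estimate_cost` is a hypothetical helper name.

```python
# Assumed rates in USD per 1M tokens; check the pricing page for current values.
PRICE_PER_MILLION = {
    "text_input": 5.00,
    "image_input": 10.00,
    "image_output": 40.00,
}


def estimate_cost(text_tokens, image_tokens, output_tokens, prices=PRICE_PER_MILLION):
    """Estimate the cost in USD of one request from its token usage."""
    total = (
        text_tokens * prices["text_input"]
        + image_tokens * prices["image_input"]
        + output_tokens * prices["image_output"]
    )
    return total / 1_000_000


# Token counts from the output above
cost = estimate_cost(text_tokens=8, image_tokens=0, output_tokens=272)
print(f"Estimated cost: ${cost:.5f}")
```

Under these assumed rates, the example request comes out at roughly one cent, almost all of it from the output image tokens.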
 

## Conclusion

Despite a few shortcomings in the API, like masking and transparency not being reliable enough, the model follows the instructions provided in the prompt with high precision.

I think this model opens up many possibilities for building around it. In this tutorial, we learned the basics of how to use it. Here are a few ideas you might wanna explore to build on top of what you learned here:

* Streamline the conversion of phone food photos into beautiful food photography that restaurants can use in their menus.
* Create a sticker pack expressing several emotions, based on a photo of a friend or a selfie, to be used in chat apps.
* Create a tool that, given the descriptions of individual scenes, creates a comic book strip from those scenes.

