# Building an Image Generation Application 

LLMs aren't just for generating text. You can also create images from text descriptions. Having images as an additional modality can be extremely helpful in fields like MedTech, architecture, tourism, game development, and more. In this chapter, we'll explore two of the most popular image generation models: DALL-E and Midjourney.

## Introduction 

In this lesson, we'll cover:

- What image generation is and why it's useful.
- What DALL-E and Midjourney are, and how they work.
- How to build an image generation app.

## Learning Goals 

By the end of this lesson, you will be able to:

- Build an image generation application.
- Set boundaries for your app using meta prompts.
- Work with DALL-E and Midjourney.

## Why build an image generation application?

Image generation apps are a great way to see what Generative AI can do. They can be used for things like:

- **Image editing and synthesis**. You can create images for many purposes, for example by editing an existing image or combining several images.

- **Applications across industries**. Generated images are also useful in industries like MedTech, tourism, game development, and more.

## Scenario: Edu4All 

In this lesson, we'll keep working with our startup, Edu4All. The students will create images for their assignments. What kind of images they create is up to them: they might illustrate their own fairy tales, design a new character for a story, or create visuals that help explain their ideas and concepts.

For example, if Edu4All's students are working on a lesson about monuments, they could generate something like this:

![Edu4All startup, class on monuments, Eiffel Tower](../../../../translated_images/startup.94d6b79cc4bb3f5afbf6e2ddfcf309aa5d1e256b5f30cc41d252024eaa9cc5dc.en.png)

They could use a prompt like this:

> "Dog next to Eiffel Tower in early morning sunlight"

## What are DALL-E and Midjourney?

[DALL-E](https://openai.com/dall-e-2?WT.mc_id=academic-105485-koreyst) and [Midjourney](https://www.midjourney.com/?WT.mc_id=academic-105485-koreyst) are two of the most popular image generation models. They let you use prompts to create images.

### DALL-E

Let's start with DALL-E, which is a Generative AI model that creates images from text descriptions. 

> [DALL-E is a combination of two models, CLIP and diffused attention](https://towardsdatascience.com/openais-dall-e-and-clip-101-a-brief-introduction-3a4367280d4e?WT.mc_id=academic-105485-koreyst).  

- **CLIP** is a model that creates embeddings, which are numerical representations of data, from both images and text.  

- **Diffused attention** is a model that generates images from those embeddings. DALL-E is trained on a dataset of images and text, and can generate images from text descriptions. For example, DALL-E can create an image of a cat in a hat, or a dog with a mohawk. 
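
To make the idea of embeddings more concrete, here is a minimal sketch of how a text embedding can be compared with image embeddings once both live in the same vector space. The three-dimensional vectors and file names are made up for illustration; a real model such as CLIP produces much longer vectors, and this is not how DALL-E is implemented internally.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(text_vec, image_vecs):
    """Return the name of the image whose embedding is closest to the text embedding."""
    return max(image_vecs, key=lambda name: cosine_similarity(text_vec, image_vecs[name]))


# Hypothetical embeddings, invented for this example.
text_cat_in_hat = [0.9, 0.1, 0.3]
images = {
    "cat_in_hat.png": [0.8, 0.2, 0.4],
    "dog_mohawk.png": [0.1, 0.9, 0.2],
}

print(best_match(text_cat_in_hat, images))  # prints "cat_in_hat.png"
```

The closer two embeddings point in the same direction, the more related the text and the image are considered to be.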

### Midjourney
 
Midjourney works similarly to DALL-E—it generates images from text prompts. You can use Midjourney to create images with prompts like “a cat in a hat” or “a dog with a mohawk”. 

 

![Image generated by Midjourney, mechanical pigeon](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Rupert_Breheny_mechanical_dove_eca144e7-476d-4976-821d-a49c408e4f36.png/440px-Rupert_Breheny_mechanical_dove_eca144e7-476d-4976-821d-a49c408e4f36.png?WT.mc_id=academic-105485-koreyst)

*Image credit: Wikipedia, image generated by Midjourney*

## How do DALL-E and Midjourney work?

Let's start with [DALL-E](https://arxiv.org/pdf/2102.12092.pdf?WT.mc_id=academic-105485-koreyst). DALL-E is a Generative AI model built on an *autoregressive transformer*.

An *autoregressive transformer* generates an image from a text description piece by piece. It produces one image element at a time (a pixel, in this simplified picture), then uses the elements it has already produced to generate the next one. Each step passes through multiple layers of a neural network, and the process repeats until the image is finished.
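
The loop described above can be sketched in a few lines. This is a toy illustration only: `toy_next_token` is a hypothetical stand-in for a trained network, and the integer "tokens" are placeholders for image elements. What it shows is the autoregressive pattern, where every new element depends on the prompt and on everything generated so far.

```python
def toy_next_token(prompt_tokens, image_tokens, vocab_size=8):
    """Hypothetical stand-in for a trained model.

    It deterministically derives the next token from the prompt and
    from all image tokens generated so far.
    """
    context = list(prompt_tokens) + list(image_tokens)
    return (sum(context) + len(context)) % vocab_size


def generate_image_tokens(prompt_tokens, length):
    """Generate `length` tokens one at a time, feeding earlier output back in."""
    image_tokens = []
    for _ in range(length):
        next_token = toy_next_token(prompt_tokens, image_tokens)
        image_tokens.append(next_token)  # becomes context for the next step
    return image_tokens


tokens = generate_image_tokens(prompt_tokens=[3, 1, 4], length=4)
print(tokens)
```

A real model replaces `toy_next_token` with a neural network that predicts a probability distribution over the next element.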

With this approach, DALL-E can control the attributes, objects, characteristics, and more in the images it creates. DALL-E 2 and DALL-E 3 offer even more control over the generated images.


## Building your first image generation application

So what do you need to build an image generation application? You’ll need the following libraries:

- **python-dotenv**: It’s highly recommended to use this library to keep your secrets in a *.env* file, separate from your code.
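- **openai**: This is the library you’ll use to interact with the OpenAI API. The code in this lesson uses the library's pre-1.0 interface (`openai.Image.create`), so install a version below 1.0.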
- **pillow**: For working with images in Python.
- **requests**: To help you make HTTP requests.

1. Create a *.env* file with the following content:

    ```text
    AZURE_OPENAI_ENDPOINT=<your endpoint>
    AZURE_OPENAI_API_KEY=<your key>
    ```

    You can find this information in the Azure Portal for your resource under the "Keys and Endpoint" section.


1. Gather the above libraries in a file named *requirements.txt* as follows:

    ```text
    python-dotenv
    openai<1.0
    pillow
    requests
    ```

1. Next, create a virtual environment and install the libraries:


    ```bash
    # create virtual env
    python3 -m venv venv
    # activate environment
    source venv/bin/activate
    # install libraries (or run: pip install -r requirements.txt)
    pip install python-dotenv "openai<1.0" pillow requests
    ```

> [!NOTE]
> For Windows, use the following commands to create and activate your virtual environment:

    ```bash
    python3 -m venv venv
    venv\Scripts\activate.bat
    ```

1. Add the following code in a file called *app.py*:

    ```python
    import openai
    import os
    import requests
    from PIL import Image
    import dotenv
    
    # Load environment variables from the .env file
    dotenv.load_dotenv()
    
    # Get endpoint and key from environment variables
    openai.api_base = os.environ['AZURE_OPENAI_ENDPOINT']
    openai.api_key = os.environ['AZURE_OPENAI_API_KEY']     
    
    # Assign the API version (DALL-E is currently supported for the 2023-06-01-preview API version only)
    openai.api_version = '2023-06-01-preview'
    openai.api_type = 'azure'
    
    
    try:
        # Create an image by using the image generation API
        generation_response = openai.Image.create(
            prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',    # Enter your prompt text here
            size='1024x1024',
            n=2,
            temperature=0,
        )
        # Set the directory for the stored image
        image_dir = os.path.join(os.curdir, 'images')
    
        # If the directory doesn't exist, create it
        if not os.path.isdir(image_dir):
            os.mkdir(image_dir)
    
        # Initialize the image path (note the filetype should be png)
        image_path = os.path.join(image_dir, 'generated-image.png')
    
        # Retrieve the generated image
        image_url = generation_response["data"][0]["url"]  # extract image URL from response
        generated_image = requests.get(image_url).content  # download the image
        with open(image_path, "wb") as image_file:
            image_file.write(generated_image)
    
        # Display the image in the default image viewer
        image = Image.open(image_path)
        image.show()
    
    # catch exceptions
    except openai.InvalidRequestError as err:
        print(err)

    ```

Let's break down this code:

- First, we import the necessary libraries, including the OpenAI library, the dotenv library, the requests library, and the Pillow library.

    ```python
    import openai
    import os
    import requests
    from PIL import Image
    import dotenv
    ```

- Next, we load the environment variables from the *.env* file.

    ```python
    # Load environment variables from the .env file
    dotenv.load_dotenv()
    ```

- After that, we set the endpoint, API key, API version, and API type for the Azure OpenAI service.

    ```python
    # Get endpoint and key from environment variables
    openai.api_base = os.environ['AZURE_OPENAI_ENDPOINT']
    openai.api_key = os.environ['AZURE_OPENAI_API_KEY'] 

    # add version and type, Azure specific
    openai.api_version = '2023-06-01-preview'
    openai.api_type = 'azure'
    ```

- Next, we generate the image:

    ```python
    # Create an image by using the image generation API
    generation_response = openai.Image.create(
        prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',    # Enter your prompt text here
        size='1024x1024',
        n=2,
        temperature=0,
    )
    ```

    The code above returns a JSON object that contains a URL for each generated image. We can use such a URL to download the image and save it to a file.

- Finally, we open the image and use the default image viewer to display it:

    ```python
    image = Image.open(image_path)
    image.show()
    ```
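
The walkthrough above requests two images (`n=2`) but only downloads the first one. As a minimal sketch, assuming the response keeps the `{"data": [{"url": ...}]}` shape used in *app.py*, the helpers below collect every URL and build one file name per image. The helper names and the sample URLs are hypothetical.

```python
import os


def extract_image_urls(generation_response):
    """Collect every image URL from a response shaped like {"data": [{"url": ...}, ...]}."""
    return [item["url"] for item in generation_response["data"]]


def numbered_image_paths(image_dir, count, stem="generated-image"):
    """Build one PNG path per image, for example images/generated-image-1.png."""
    return [os.path.join(image_dir, f"{stem}-{i}.png") for i in range(1, count + 1)]


# Stand-in response with made-up URLs, mirroring a request with n=2.
sample_response = {
    "data": [
        {"url": "https://example.com/a.png"},
        {"url": "https://example.com/b.png"},
    ]
}

urls = extract_image_urls(sample_response)
paths = numbered_image_paths("images", len(urls))
for url, path in zip(urls, paths):
    print(url, "->", path)
```

In *app.py*, you could loop over `zip(urls, paths)` and download each URL with `requests.get(url).content`, exactly as the single-image version does.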

### More details on generating the image

Let's take a closer look at the code that generates the image:

```python
generation_response = openai.Image.create(
    prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',    # Enter your prompt text here
    size='1024x1024',
    n=2,
    temperature=0,
)
```

- **prompt** is the text prompt used to generate the image. In this example, we're using the prompt "Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils".
- **size** is the size of the generated image. Here, we're generating an image that is 1024x1024 pixels.
- **n** is the number of images to generate. In this case, we're generating two images, although *app.py* only downloads and saves the first one.
- **temperature** is a parameter that controls the randomness of a Generative AI model's output. It takes a value between 0 and 1, where values near 0 make the output more deterministic and values near 1 make it more random. The default value is 0.7. Not every image generation endpoint accepts this parameter, so remove it if your API version rejects it.
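
Because an invalid parameter only surfaces as an error after the request has been sent, it can help to check the arguments first. The sketch below covers only `prompt`, `size`, and `n`. The allowed sizes and the upper bound for `n` are assumptions based on OpenAI's published limits for DALL-E 2, so verify them against the API version you use. `build_image_request` is a hypothetical helper, not part of the `openai` library.

```python
# Assumption: values documented by OpenAI for DALL-E 2; your endpoint may differ.
ALLOWED_SIZES = ("256x256", "512x512", "1024x1024")
MAX_IMAGES = 10


def build_image_request(prompt, size="1024x1024", n=1):
    """Validate the arguments and return them as keyword arguments for the API call."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {ALLOWED_SIZES}, got {size!r}")
    if not 1 <= n <= MAX_IMAGES:
        raise ValueError(f"n must be between 1 and {MAX_IMAGES}, got {n}")
    return {"prompt": prompt.strip(), "size": size, "n": n}


request = build_image_request("Bunny on horse, holding a lollipop", size="1024x1024", n=2)
print(request)
```

In *app.py*, the result could then be passed along as `openai.Image.create(**request)`.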

There are more things you can do with images, which we'll cover in the next section.

## Additional capabilities of image generation

So far, you've seen how we can generate an image with just a few lines of Python. But there are more things you can do with images.

You can also:

- **Edit images**. You can modify an existing image, for example by adding something to a specific part of it. Imagine our bunny image: you could add a hat to the bunny. To do this, you provide the image, a mask (to identify the area to change), and a text prompt describing what should be done.

    ```python
    response = openai.Image.create_edit(
      image=open("base_image.png", "rb"),
      mask=open("mask.png", "rb"),
      prompt="An image of a rabbit with a hat on its head.",
      n=1,
      size="1024x1024"
    )
    image_url = response['data'][0]['url']
    ```

    The base image would only have the rabbit, but the final image would show the rabbit with a hat.

- **Create variations**. You provide an existing image and ask the model to create variations of it.
    Check out our [OpenAI notebook for more information](./oai-assignment.ipynb?WT.mc_id=academic-105485-koreyst).
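
For image edits, OpenAI's documentation describes the mask as a PNG with the same dimensions as the base image, whose transparent areas mark the region to change. Treat that as an assumption for your API version. The sketch below checks both conditions before uploading by reading the PNG header with the standard library only. The helper names are hypothetical, and the check is simplified: it looks only at the header's color type and ignores palette-based transparency.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data):
    """Return (width, height, has_alpha) from the first bytes of a PNG file."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    color_type = data[25]  # 4 = grayscale with alpha, 6 = RGBA
    return width, height, color_type in (4, 6)


def check_edit_inputs(image_bytes, mask_bytes):
    """Raise ValueError if the mask does not fit the image; return the shared size."""
    image_w, image_h, _ = read_png_header(image_bytes)
    mask_w, mask_h, mask_has_alpha = read_png_header(mask_bytes)
    if (image_w, image_h) != (mask_w, mask_h):
        raise ValueError("mask and image must have the same dimensions")
    if not mask_has_alpha:
        raise ValueError("mask needs an alpha channel to mark the editable area")
    return image_w, image_h


def fake_png_header(width, height, color_type):
    """Build just enough PNG bytes for the demo (not a complete, viewable image)."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, color_type, 0, 0, 0)
    return PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr


base = fake_png_header(1024, 1024, 2)  # RGB image without alpha
mask = fake_png_header(1024, 1024, 6)  # RGBA mask
print(check_edit_inputs(base, mask))  # prints (1024, 1024)
```

With real files, you would pass the result of `open("base_image.png", "rb").read()` and `open("mask.png", "rb").read()` instead of the fake headers.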



---

**Disclaimer**:
This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.
