# Week 9: Generating images

**Artificial intelligence** has made significant progress in content generation, from turning text into visuals to crafting intricate artworks and animations. This is particularly evident in image synthesis, where tools like **Midjourney, Stable Diffusion and DALL-E** have streamlined the process. These capabilities are built on generative AI models, which have become valuable to individual creators and businesses alike. By learning from large collections of training images, these models can quickly produce new, high-quality images that resemble that data. Applications range from art and design, where they push creative boundaries, to medicine, where synthetic medical images support diagnosis and clinical training and may help improve patient outcomes. These models also power immersive virtual environments for entertainment and gaming, fostering innovation across industries.

<img width="955" alt="Dall-E_Robot_Illustration" src="https://user-images.githubusercontent.com/108916132/221384925-ef576131-8b61-4d87-94e4-b1e44b1d61a2.png">

<img width="956" alt="Dall-E_Robot_2_Illustration" src="https://user-images.githubusercontent.com/108916132/221384935-f0516537-f0b3-4ea9-80a2-0f089cd09ada.png">

We discuss generative AI models for image synthesis, their importance, use cases and more.

### Table of contents:

* What are generative AI models?
* Understanding image synthesis and its importance
* Choosing the right dataset for modelling
* Preparing data for training
* Midjourney AI- How does it work?
* How to use Midjourney AI?
* DALL·E 2- How does it work?
* How to use DALL·E 2?
* Generating creative prompts automatically through code
* Implementing DALL·E 2 through python script
* Superpowers of generative AI models
* Use Generative AI models effectively
* Applications of generative AI models for image synthesis
* Quiz

### What are generative AI models?
Generative AI models belong to a category of machine learning algorithms capable of producing novel content by learning patterns from extensive training data. These models employ deep learning techniques to grasp patterns and features from the training data, which they leverage to generate new data instances.

These models find applications in diverse domains like image, text, code, video and music generation. One prominent example is the Generative Adversarial Network (GAN), comprising a generator network crafting new data and a discriminator network assessing the authenticity of generated samples.
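The two loss functions commonly used to train the networks make this adversarial setup concrete. This minimal sketch assumes the discriminator outputs the probability that a sample is real, and it uses the widely adopted "non-saturating" form of the generator loss. The function names are illustrative and do not come from any library.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator.

    d_real: discriminator probabilities for real samples (ideally close to 1).
    d_fake: discriminator probabilities for generated samples (ideally close to 0).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants D(G(z)) close to 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# A discriminator that cannot tell real from fake outputs 0.5 for every sample.
print(round(discriminator_loss([0.5, 0.5], [0.5, 0.5]), 4))  # 1.3863 (= 2 ln 2)
print(round(generator_loss([0.5, 0.5]), 4))                  # 0.6931 (= ln 2)
```

Training alternates between the two objectives. The discriminator lowers its loss by separating real from fake samples, and the generator lowers its own loss by producing samples that the discriminator scores as real.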

With the potential to swiftly create novel and distinct content, generative AI models hold promise in revolutionizing sectors like entertainment, art, and fashion.

### Understanding image synthesis and its importance
In image synthesis, generative AI models create novel images that resemble their training set. Deep learning algorithms learn patterns from large image databases, and the trained models can fill in missing or unclear regions of an image to produce impressive, lifelike results.

They can also enhance low-resolution images so that they look professionally captured. They can blend portraits or learn facial features to produce lifelike synthetic human faces.

The crux of generative AI's worth in image synthesis lies in its capacity to create entirely new visuals. This has wide-reaching implications for creativity, product design, marketing, and scientific fields, including anatomical and medical modeling.

Commonly used models include variational autoencoders (VAEs), generative adversarial networks (GANs), autoregressive models, transformers, and diffusion models.

### Choosing the right dataset for modelling
Generative AI models depend heavily on their training dataset to create diverse, high-quality images. The dataset must be large and varied enough to cover the target image domain. This is especially important in fields like medical imaging, where illnesses, organs, and imaging modalities vary widely.

Accurate labeling is essential. Images need proper labels for the model to grasp semantic properties. Labels indicate depicted objects or scenes and can be added through manual or automated methods.

Dataset quality matters. The data should be free of errors and artifacts and as unbiased as possible. A biased dataset can lead the model to replicate undesired patterns in the images it generates.

Picking the right dataset is pivotal for successful generative AI models in image synthesis. It should be sizable, diverse, correctly labeled, and of high quality to ensure accurate and unbiased learning of the target picture domain.
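Counting how often each label appears is one quick check for the labeling and bias concerns above. This is a hypothetical sketch: it assumes the labels are available as a plain list of strings, and the threshold (half the average count per label) is an arbitrary illustrative choice rather than a standard.

```python
from collections import Counter


def underrepresented_labels(labels, ratio=0.5):
    """Return labels whose count is below `ratio` times the mean count per label."""
    counts = Counter(labels)
    mean_count = sum(counts.values()) / len(counts)
    return sorted(label for label, n in counts.items() if n < ratio * mean_count)


# Toy medical-imaging label list with one rare category.
labels = ["chest_xray"] * 500 + ["brain_mri"] * 450 + ["knee_ct"] * 40
print(Counter(labels))
print(underrepresented_labels(labels))  # ['knee_ct']
```

A category flagged this way is a candidate for collecting more data or for targeted augmentation before training.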

### Preparing data for training
Preparing data for training a generative AI model intended for image synthesis encompasses data collection, preprocessing, augmentation, normalization, and division into training, validation, and testing sets. Each phase is vital to ensure accurate pattern and feature learning, enhancing image synthesis precision.

* **Data Collection:**<br>
The initial step involves gathering requisite data to train the generative AI model for image synthesis. The quality and quantity of collected data significantly impact the model's performance. Data can originate from web databases, stock photo archives, or commissioned projects.
<br>

* **Data Preprocessing:**<br>
Raw data undergoes preprocessing to render it usable and comprehensible to the model. In image data scenarios, this often involves cleaning, resizing, and formatting images to a consistent standard compatible with the model.
<br>

* **Data Augmentation:**<br>
This phase entails applying transformations to the original dataset, generating additional training examples artificially. It broadens training data diversity, crucial when working with limited datasets. Augmentation aids in preventing overfitting, a common machine learning challenge where models excel at training data but perform poorly on new data.
<br>

* **Data Normalization:**<br>
Pixel values are scaled to a fixed range, commonly 0 to 1 (many GANs and diffusion models use -1 to 1 instead). Normalization makes training more stable and helps the model converge faster.
<br>

* **Data Division:**<br>
The data is split into training, validation, and testing sets. The model is fit on the training set, its hyperparameters are tuned on the validation set, and its final performance is evaluated on the testing set. A common split is 70% training, 15% validation, and 15% testing, although the ratio may vary with dataset size.
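The normalization, augmentation and splitting steps above can be sketched with the Python standard library. This illustrative sketch assumes each image is a small grayscale image stored as a nested list of 0-255 pixel values. Real pipelines would typically use an image or deep-learning library. The horizontal flip stands in for augmentation in general, and the 70/15/15 ratios mirror the ones above.

```python
import random


def normalize(image):
    """Scale 0-255 pixel values to the range [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def horizontal_flip(image):
    """Simple augmentation: mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def split_dataset(items, train=0.70, val=0.15, seed=42):
    """Shuffle, then split into train/validation/test lists (test gets the remainder)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


# Toy dataset: 100 random 4x4 grayscale "images".
rng = random.Random(0)
dataset = [[[rng.randint(0, 255) for _ in range(4)] for _ in range(4)] for _ in range(100)]

# Normalize every image, split, then augment only the training set with flipped copies.
dataset = [normalize(img) for img in dataset]
train_set, val_set, test_set = split_dataset(dataset)
train_set = train_set + [horizontal_flip(img) for img in train_set]

print(len(train_set), len(val_set), len(test_set))  # 140 15 15
```

The sketch augments only the training set, after splitting. Flipped copies of validation or test images would otherwise leak near-duplicates of evaluation data into training.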

### Midjourney AI- How does it work?

Midjourney is an example of generative AI that can convert natural language prompts into images. In some cases, images from Midjourney have even deceived experts in photography and other domains. Likewise, we have seen some extremely convincing AI-generated images on social media. Examples range from Pope Francis dressed in a puffer jacket to Trump supposedly getting arrested days before the actual event.

Midjourney relies on two relatively new machine learning technologies, namely **large language models and diffusion models**. A large language model first helps Midjourney understand the meaning of whatever we type into the prompts. This is then converted into what is known as a vector, which we can imagine as a numerical version of the prompt. Finally, the vector guides another complex process known as diffusion.

___Sidenote:___
<br>___Diffusion models___
<br>Diffusion models generate data by learning to reverse a gradual noising process. Training starts with a piece of information, such as an image, and introduces progressively more random noise over a series of time steps until the image becomes unrecognizable. A neural network (commonly a U-Net, although transformer-based backbones are also used) then learns to undo this corruption and restore the original. Through this process, the model learns to create new images or other forms of data.

<img width="604" alt="Dall-E_Diffusion_Illustration" src="https://user-images.githubusercontent.com/108916132/221385104-9c373595-441c-4107-89d3-07333b8ba005.png">

The noising process depicted in the illustration is modeled as a Markov chain. It corrupts an image with small, incremental amounts of Gaussian noise until the image is indistinguishable from pure Gaussian noise. The diffusion model is trained to reverse this process, removing the noise one time step at a time. Once training is complete, the model can generate a new image by sampling random Gaussian noise and iteratively denoising it into a realistic picture.
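The forward (noising) half of this process has a convenient closed form. Given a noise schedule of variances beta_1 ... beta_T, define alpha_bar_t as the product of (1 - beta_s) over all steps up to t. A noisy sample at step t can then be drawn directly as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, where eps is standard Gaussian noise. The sketch below applies this to a toy "image" stored as a flat list of pixel values. It assumes the linear schedule from 1e-4 to 0.02 over 1000 steps used in the original DDPM paper. It covers only the forward process; the learned denoising network that reverses it is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_t that increase linearly over the diffusion steps."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for all steps s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noisy_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I); t is a 0-based step index."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule())
x0 = [1.0] * 8  # a toy 8-pixel "image" with values in [-1, 1]

for t in (0, 250, 500, 999):
    # The signal weight sqrt(alpha_bar_t) shrinks toward 0 as t grows,
    # so x_t drifts from the original image toward pure Gaussian noise.
    signal = round(math.sqrt(alpha_bars[t]), 3)
    print(t, signal, [round(v, 2) for v in noisy_sample(x0, t, alpha_bars, rng)[:3]])
```

Because any x_t can be sampled in one step, training can pick random time steps rather than simulating the whole Markov chain for every image.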

When we pair this mastery of image regeneration with the AI’s understanding of text-image relationships, we get a powerful tool that can generate realistic images based on your creative prompts.

### How to use Midjourney AI?

#### Step 1: Sign up for Midjourney
Go to https://www.midjourney.com/home/ and sign in. You will then be redirected to Discord.

#### Step 2: Look around in a newcomer group
Once you have joined Midjourney's Discord server, you will find the newcomer rooms in the channel list on the left. These rooms are usually busy, so ideally pick one with less traffic at the moment.

![Midjourney newcomer rooms in the Discord channel list](https://github.com/Hmittal15/MedIQ-ChatBot/assets/108916132/c4a11eb6-3d32-4d83-a293-de76a82a0124)

#### Step 3: Create text prompts
Use the message box at the bottom of the channel to describe the image you want. Short, precise descriptions work best. Start by typing **/imagine** in the text field. A prompt field then opens, and you can enter your image description there.

<img width="517" alt="Screenshot 2023-08-18 153526" src="https://github.com/Hmittal15/MedIQ-ChatBot/assets/108916132/43a1a5cf-9411-4cdf-acb5-9e5966a5823f">

### DALL·E 2- How does it work?
DALL·E is a state-of-the-art artificial intelligence program developed by OpenAI that creates images from textual descriptions. Given a textual prompt, DALL·E generates an original image based on that description, showcasing its ability to understand and execute creative tasks. The model has been trained on a diverse dataset of images and can generate a wide range of images, from photorealistic to highly stylized illustrations. This technology has the potential to revolutionize fields such as advertising, product design, and visual storytelling.

At its core, DALL-E 2 operates in a straightforward manner:

* First, a text prompt is fed into a text encoder that has been trained to convert the prompt into a representative space.
* Then, a component referred to as the 'prior' takes the text encoding and maps it to a corresponding image encoding that reflects the semantic content of the text encoding.
* Finally, an image decoder generates a visual representation of this semantic information in the form of an image.

**Step 1: Text encoding**

The process begins by feeding the prompt into a text encoder, which converts it into _'text embeddings'_. But what is a text embedding? An embedding is a vector of numbers that represents a piece of text or an image. ASCII codes are a simple analogy, since ASCII represents each character as a number. Unlike ASCII codes, however, embeddings are not predetermined. A neural network learns them during training.

The text embeddings used by DALL·E 2 are produced by _'CLIP'_ (Contrastive Language-Image Pre-training), a separate neural network developed by OpenAI.

The basic concept behind CLIP's training is straightforward.
<br>
1. Produce encodings of both image and text for each image-caption pair.
2. Determine the cosine similarity between each image and text embedding pair.
3. Continuously reduce the cosine similarity between mismatched image-caption pairs and increase the cosine similarity between corresponding image-caption pairs.

<img width="519" alt="Dall-E_CLIPTraining_Illustration" src="https://user-images.githubusercontent.com/108916132/221385094-d8a2e8db-882b-4985-bce5-c87c9ef89fe5.png">
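The three CLIP training steps above can be sketched in a few lines of plain Python. This toy version assumes the image and text encoders have already produced embedding vectors; here they are hand-written 2-D vectors. The sketch computes the matrix of cosine similarities and applies a symmetric cross-entropy loss in which the correct caption for image i is caption i, which is the contrastive objective CLIP uses. The fixed temperature of 0.07 is an illustrative assumption, since CLIP learns this value during training.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cross_entropy(logits, target):
    """Negative log of the softmax probability assigned to the target index."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss where (image i, caption i) is the matching pair."""
    n = len(image_embs)
    # Rows are images, columns are captions.
    sims = [[cosine_similarity(img, txt) / temperature for txt in text_embs] for img in image_embs]
    image_to_text = sum(cross_entropy(sims[i], i) for i in range(n)) / n
    text_to_image = sum(cross_entropy([sims[r][j] for r in range(n)], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2


images = [[1.0, 0.0], [0.0, 1.0]]
aligned_captions = [[0.9, 0.1], [0.1, 0.9]]  # caption i points the same way as image i
swapped_captions = [[0.1, 0.9], [0.9, 0.1]]  # captions paired with the wrong images

print(round(clip_loss(images, aligned_captions), 4))
print(round(clip_loss(images, swapped_captions), 4))  # much larger loss
```

Minimizing this loss pushes matching pairs toward high cosine similarity and mismatched pairs toward low similarity, which is exactly step 3 above.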

**Step 2: Image encoding**


DALL·E 2's intermediate text and image encodings are both called CLIP embeddings, but the CLIP image encoder does not produce the image embedding at generation time. Instead, a separate model called the _'prior'_ generates the _'image embedding'_ from the text embedding produced by the CLIP text encoder. During the development of DALL·E 2, two options were considered for the prior: an autoregressive model and a diffusion model. Both performed similarly, but the diffusion model was more computationally efficient and was chosen as the prior for DALL·E 2. According to the DALL·E 2 paper, explicitly generating an image embedding in this way improves the diversity of the generated images.
<br>The diffusion prior generates a new image embedding from a given text description. It uses a decoder-only _'Transformer'_. The Transformer's input sequence consists of the encoded caption text, the CLIP text embedding, an embedding of the diffusion timestep, and the noised CLIP image embedding. A final token completes the sequence, and its output is used to predict the denoised CLIP image embedding.

**Step 3: Decoding**

DALL·E 2's decoder is a modified version of _'GLIDE'_ (Guided Language to Image Diffusion for Generation and Editing), another model developed by OpenAI. An autoencoder's decoder only aims to reconstruct an image from its embedding. DALL·E 2's decoder instead aims to generate an image that preserves the key features captured in the image embedding while letting other details vary. GLIDE itself extends the standard diffusion model by incorporating textual information into training. A standard diffusion model starts from random Gaussian noise with no guidance toward any particular content. GLIDE instead conditions the denoising process on an encoding of the text prompt, which enables text-conditional image generation. DALL·E 2 modifies GLIDE in two ways. It projects the CLIP image embedding and adds it to the model's existing timestep embedding. It also projects the same embedding into four extra context tokens, which are concatenated to the output sequence of the GLIDE text encoder.

<img width="468" alt="Dall-E_GLIDE_Illustration" src="https://user-images.githubusercontent.com/108916132/221385139-d08b6323-c460-4e47-ba9b-ba2a979e0fa1.png">
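The two conditioning changes DALL·E 2 makes to GLIDE can be illustrated with toy vectors. This sketch is not a working decoder. It assumes a hypothetical 4-dimensional model width and a 3-dimensional CLIP embedding, and random matrices stand in for learned projection layers. The timestep embedding is a sinusoidal embedding of the kind commonly used in diffusion models. The sketch shows only how the CLIP image embedding is (1) added to the timestep embedding and (2) turned into four extra context tokens appended to the text encoder's outputs.

```python
import math
import random

D_MODEL = 4   # hypothetical model width
CLIP_DIM = 3  # hypothetical CLIP embedding size
N_EXTRA = 4   # number of extra context tokens


def timestep_embedding(t, dim=D_MODEL):
    """Sinusoidal embedding of the diffusion timestep t."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def project(matrix, vec):
    """Matrix-vector product standing in for a learned linear layer."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
W_time = random_matrix(D_MODEL, CLIP_DIM, rng)              # CLIP -> timestep-embedding space
W_tokens = random_matrix(N_EXTRA * D_MODEL, CLIP_DIM, rng)  # CLIP -> N_EXTRA context tokens


def condition(clip_image_emb, t, text_tokens):
    # (1) Add the projected CLIP embedding to the timestep embedding.
    t_emb = [a + b for a, b in zip(timestep_embedding(t), project(W_time, clip_image_emb))]
    # (2) Project the CLIP embedding into N_EXTRA tokens and append them
    #     to the sequence of text-encoder outputs.
    flat = project(W_tokens, clip_image_emb)
    extra = [flat[i * D_MODEL:(i + 1) * D_MODEL] for i in range(N_EXTRA)]
    return t_emb, text_tokens + extra


text_tokens = [[0.0] * D_MODEL for _ in range(5)]  # pretend text-encoder output: 5 tokens
t_emb, context = condition([0.2, -0.5, 0.8], t=10, text_tokens=text_tokens)
print(len(t_emb), len(context))  # 4 9
```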

### How to use DALL·E 2?

#### Step 1: Create an account on OpenAI's DALL-E 2's website
Before you start generating images, visit DALL-E 2's website (https://openai.com/dall-e-2) and click the "Try DALL-E" button. The site will either prompt you to create an account or sign in. You can do this on either a mobile or web browser.

#### Step 2: Start creating
Once you're logged in, you're ready to start creating. At the top of the webpage is the search bar. You will also find other images created by DALL-E 2 artists underneath it.

<img width="939" alt="Screenshot 2023-08-18 155706" src="https://github.com/Hmittal15/MedIQ-ChatBot/assets/108916132/8922d4f4-5419-48b3-8913-2cbdf3f1e6ba">

#### Step 3: Type a specific phrase into the search bar
Once you're ready to give your own idea a try, type a phrase into the search bar.

#### Step 4: Generate and modify
Select "Generate" when you're satisfied with your search term and DALL-E 2 will produce four different images for you to preview. If you're not satisfied with the images, identify any common denominator between the four and tweak your search phrase accordingly.

For example, you may want to move a subject to the foreground or background, or reorder the terms in the phrase so that one thing comes before the other.

If you're satisfied with the output, select the image that you'd like to save.

<img width="486" alt="Screenshot 2023-08-18 160303" src="https://github.com/Hmittal15/MedIQ-ChatBot/assets/108916132/ae3503a3-c432-4cd2-90df-38d9df546dae">

#### Step 5: Save and share
Once you've made a selection, the image will appear in full size. From here, you can save the image to your DALL-E 2 gallery by selecting the save button on the top right corner. You can either save to your "Favorites" collection, which will constantly appear in a window to the right of your DALL-E 2 workspace, or to a specific collection you've started.

There is also an option to download the artwork in the upper right corner of the image. Your creation will save to your device and you'll be able to send it to friends and family. Or, you can print the photo to have on display.

### Generating creative prompts automatically through code

Now the question arises: how can we efficiently use these generative AI models to produce vivid, beautiful, and varied images? One approach is to automate the generation of numerous prompts describing objects of our choice.

This code generates a series of imaginative prompts inspired by skateboarding scenes. Developed with inspiration from **AISkunks**, the code employs randomness to generate unique scenarios for skateboarding enthusiasts. It utilizes various attributes such as skateboard brands, styles, colors, deck shapes, and themes to craft vivid descriptions. With a collection of predefined scenes including street skating, artistic expression, and futuristic skateboarding, the code produces 100 prompts that allow readers to envision themselves skateboarding in dynamic and captivating settings. Whether it's cruising along urban streets, performing tricks at a skatepark, or embracing artistic designs, these prompts offer a thrilling glimpse into the world of skateboarding adventures.

```python
# Prompt generation for Skateboards (Code inspired from AISkunks)

import random

skateboard_brands = ["Element", "Santa Cruz", "Plan B", "Girl", "Zero", "Creature", "Powell Peralta"]
skateboard_styles = ["Street", "Vert", "Cruiser", "Longboard", "Freestyle", "Park"]
colors = ["Black", "Red", "Blue", "White", "Silver", "Yellow", "Green"]
deck_shapes = ["Popsicle", "Cruiser", "Old School", "Longboard", "Fish", "Penny"]
themes = ["Classic", "Modern", "Retro", "Futuristic", "Artistic"]
riding_styles = ["Urban commuting", "Tricks and flips", "Downhill cruising", "Freestyle tricks"]

Street_Skating = ["City Streets", "Skaters", "Urban Environment", "Staircase", "Rail Grind"]
Cruiser_Adventure = ["Cruising", "Beach Boardwalk", "Sunset", "Cruiser Board", "Relaxing Ride"]
Ramp_Riding = ["Vert Ramp", "Skatepark", "Air", "Skateboarding Tricks"]
Artistic_Expression = ["Artistic Skateboard", "Graffiti Wall", "Artistic Design", "Creative Freestyle"]
Future_Skate = ["Futuristic Electric Skateboard", "Neon Lights", "Futuristic Cityscape", "Innovative Design"]

prompts = []

for _ in range(100):  # Generate 100 prompts
    prompt = ""

    brand = random.choice(skateboard_brands)
    style = random.choice(skateboard_styles)
    color = random.choice(colors)
    deck_shape = random.choice(deck_shapes)
    theme = random.choice(themes)
    riding_style = random.choice(riding_styles)

    category = random.choice([
        Street_Skating, Cruiser_Adventure, Ramp_Riding,
        Artistic_Expression, Future_Skate
    ])

    prompt += f"Imagine yourself skateboarding on a {color} {brand} {style}.\n"
    prompt += f"The {deck_shape} deck provides stability and maneuverability for your ride.\n"
    prompt += f"Customize the skateboard with a {theme} aesthetic that represents your style.\n"
    prompt += f"Visualize an exciting skateboarding session in {random.choice(category)}.\n"
    prompt += f"Consider your preferred riding style: {riding_style}.\n"
    prompt += "Envision the thrill of the ride as you cruise on your skateboard.\n"

    prompts.append(prompt)

# Print the generated prompts
for i, prompt in enumerate(prompts):
    print(f"Prompt {i+1}: {prompt}\n")
```


```text
Prompt 1: Imagine yourself skateboarding on a Black Zero Park.
The Longboard deck provides stability and maneuverability for your ride.
Customize the skateboard with a Retro aesthetic that represents your style.
Visualize an exciting skateboarding session in Urban Environment.
Consider your preferred riding style: Freestyle tricks.
Envision the thrill of the ride as you cruise on your skateboard.


Prompt 2: Imagine yourself skateboarding on a Blue Santa Cruz Street.
The Cruiser deck provides stability and maneuverability for your ride.
Customize the skateboard with a Classic aesthetic that represents your style.
Visualize an exciting skateboarding session in Creative Freestyle.
Consider your preferred riding style: Urban commuting.
Envision the thrill of the ride as you cruise on your skateboard.


Prompt 3: Imagine yourself skateboarding on a Red Powell Peralta Vert.
The Penny deck provides stability and maneuverability for your ride.
Customize the skateboard with a Modern aesthetic
```

### Implementing DALL·E 2 through python script

Below is an implementation of image generation from DALL·E 2 through the OpenAI DALL·E 2 API. To use the DALL-E 2 API, we will need to sign up for an API key on the OpenAI website. Once we have an API key, we can make API calls to generate images. The API is a REST API, which means you can make requests to it using HTTP methods like GET, POST, and DELETE.
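For comparison, OpenAI's documented Images API takes a JSON `POST` request to `https://api.openai.com/v1/images/generations`, authenticated with an API key in a `Bearer` header. The class in this notebook instead calls `labs.openai.com`. That appears to be the internal endpoint behind the DALL·E web app rather than the public API, so it may not accept a regular API key. The minimal sketch below only builds such a request with Python's standard library and does not send it. The `model`, `prompt`, `n` and `size` fields follow the public API reference at the time of writing, and `YOUR_API_KEY` is a placeholder.

```python
import json
import urllib.request

IMAGES_ENDPOINT = "https://api.openai.com/v1/images/generations"


def build_generation_request(prompt, api_key, n=1, size="1024x1024"):
    """Build (but do not send) a POST request for the OpenAI Images API."""
    body = {"model": "dall-e-2", "prompt": prompt, "n": n, "size": size}
    return urllib.request.Request(
        IMAGES_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_generation_request("a skunk robot drawing a picture of skunk robot", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
print(json.loads(req.data))

# Sending it requires a valid key and network access, for example:
# with urllib.request.urlopen(req) as resp:
#     image_urls = [item["url"] for item in json.load(resp)["data"]]
```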

__NOTE:__ For demonstration purposes, I have stored my OpenAI account's secret API key in a JSON file on Google Drive. The notebook reads the key from that file to establish the connection.

```python
# Mount Google Drive so the notebook can read the JSON file that holds the OpenAI API key.
from google.colab import drive
drive.mount('/content/gdrive')
```

```text
Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
```


```python
# Read the API key from the JSON file stored on Google Drive.
import json

with open('/content/gdrive/MyDrive/OpenAI_API_secret.json') as f:
    secrets = json.load(f)
```
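Outside Colab, an environment variable is a common alternative to a secrets file. The sketch below shows one possible approach, not part of the original notebook. It assumes a variable named `OPENAI_API_KEY`, the name OpenAI's official client libraries read by default. If that variable is unset, the function falls back to the JSON file and the `Dalle_API_KEY` field this notebook uses.

```python
import json
import os


def load_api_key(json_path="/content/gdrive/MyDrive/OpenAI_API_secret.json", field="Dalle_API_KEY"):
    """Return the key from OPENAI_API_KEY, falling back to a JSON secrets file."""
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key
    with open(json_path) as f:
        return json.load(f)[field]
```

Keeping the key out of the notebook itself, in either location, avoids accidentally publishing it when the notebook is shared.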

The Python code below defines a class called `Dalle2_picture_generation` that streamlines image generation with OpenAI's DALL·E 2 model. The `create_image` method takes a prompt and generates a batch of images, `create_number` handles bulk generation, and `create_with_download` combines generation with downloading. Behind the scenes, `action_proceedings` submits the generation task and monitors its progress, and `download_image` saves the resulting images locally. Overall, the class gives developers a simple interface for generating and managing images through OpenAI's service.

```python
import json
import math
import os
import time
import urllib.request
from pathlib import Path

import pandas as pd
import requests

pd.options.display.max_colwidth = 1000


class Dalle2_picture_generation():

    # Initializes the bearer token, the number of images per request for regular
    # and inpainting tasks, and the polling interval (in seconds) for task status.
    def __init__(self, bearer):
        self.bearer = bearer
        self.batch_size = 4
        self.inpainting_batch_size = 4
        self.task_sleep_seconds = 2

    # Creates one batch of images (batch_size images) from a text prompt.
    def create_image(self, prompt):
        body = {"task_type": "text2im", "prompt": {"caption": prompt, "batch_size": self.batch_size}}
        return self.action_proceedings(body)

    # Submits ceil(number_image / batch_size) tasks and returns one result per task
    # (a list of generations, or None if that task failed).
    def create_number(self, prompt, number_image):
        if number_image < self.batch_size:
            raise ValueError(f"Please request at least {self.batch_size} images")
        return [self.create_image(prompt) for _ in range(math.ceil(number_image / self.batch_size))]

    # Creates one batch of images and downloads it to image_dir
    # (defaults to the current working directory at call time).
    def create_with_download(self, prompt, image_dir=None):
        my_image = self.create_image(prompt)
        if not my_image:
            return None
        return self.download_image(my_image, image_dir)

    # Sends the JSON body describing the task, then polls the task status every
    # task_sleep_seconds until it succeeds, fails, or is rejected.
    # Returns the list of generations on success, otherwise None.
    def action_proceedings(self, body):
        openAIUrl = "https://labs.openai.com/api/labs/tasks"
        headers = {'Authorization': "Bearer " + self.bearer, 'Content-Type': "application/json"}

        response = requests.post(openAIUrl, headers=headers, data=json.dumps(body))
        if response.status_code != 200:
            print(response.text)
            return None
        data = response.json()
        print(f"Task created and respecive ID is: {data['id']}")
        print("Task in progress...")

        requested_url = f"{openAIUrl}/{data['id']}"
        while True:
            response = requests.get(requested_url, headers=headers)
            # Check the HTTP status before reading task fields from the response body.
            if not response.ok:
                print(f"Request failed. Status: {response.status_code}, data: {response.text}")
                return None
            data = response.json()

            if data["status"] == "failed":
                print(f"Requested task failed. Please find the details: {data['status_information']}")
                return None

            if data["status"] == "rejected":
                print(f"Requested task got rejected. Please find the details: {data['status_information']}")
                return None

            if data["status"] == "succeeded":
                print("Task completed successfully!")
                return data["generations"]["data"]

            time.sleep(self.task_sleep_seconds)

    # Downloads each generated image as a .webp file into image_dir and returns
    # a DataFrame with the web URL and local file path of every image.
    def download_image(self, my_image, image_dir=None):
        if not my_image:
            raise ValueError("Couldn't download images as data is empty!")
        image_dir = image_dir or os.getcwd()

        rows = []
        for image in my_image:
            url_image = image["generation"]["image_path"]
            file_path = Path(image_dir, image["id"]).with_suffix(".webp")
            urllib.request.urlretrieve(url_image, file_path)
            print(f"Image downloaded successfully: {file_path}")
            rows.append([str(url_image), str(file_path)])

        return pd.DataFrame(rows, columns=["Image URL", "Downloaded image filepath"])
```

To generate the images, we pass the text prompt to the `create_with_download()` method. The method downloads the images into the local directory and returns a DataFrame whose "Image URL" column lists the web URLs of the generated images.

```python
dalle = Dalle2_picture_generation(secrets['Dalle_API_KEY'])
file_path = dalle.create_with_download("a skunk robot drawing a picture of skunk robot")
file_path
```

```text
Task created and respecive ID is: task-oF25PRHGDKwzcLOn5YYqfUVM
Task in progress...
Task completed successfully!
Image downloaded successfully: /content/generation-SPC1RvtwqA5RKmIyLSMjquG9.webp
Image downloaded successfully: /content/generation-RuWV1034kvx6TR0KkU6Q1Jgd.webp
Image downloaded successfully: /content/generation-sjqFVVe2u2W9ixxJFhgxFcOF.webp
Image downloaded successfully: /content/generation-GYWi4SBO21Bxeck5Oet5hTAN.webp
```


| | Image URL | Downloaded image filepath |
|---|---|---|
| 0 | https://openailabsprodscus.blob.core.windows.net/private/user-C6HPBjjEuSDHAnBvdXYgjma5/generations/generation-SPC1RvtwqA5RKmIyLSMjquG9/image.webp?st=2023-02-26T00%3A36%3A40Z&se=2023-02-26T02%3A34%3A40Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/webp&skoid=15f0b47b-a152-4599-9e98-9cb4a58269f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-02-26T01%3A33%3A08Z&ske=2023-03-05T01%3A33%3A08Z&sks=b&skv=2021-08-06&sig=D%2B7XmjG4QKPzTWdwbTbAUtx0Vg4nGqb1PDCrs4RjVf8%3D | /content/generation-SPC1RvtwqA5RKmIyLSMjquG9.webp |
| 1 | https://openailabsprodscus.blob.core.windows.net/private/user-C6HPBjjEuSDHAnBvdXYgjma5/generations/generation-RuWV1034kvx6TR0KkU6Q1Jgd/image.webp?st=2023-02-26T00%3A36%3A40Z&se=2023-02-26T02%3A34%3A40Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/webp&skoid=15f0b47b-a152-4599-9e98-9cb4a58269f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-02-26T01%3A33%3A08Z&ske=2023-03-05T01%3A33%3A08Z&sks=b&skv=2021-08-06&sig=aOfhABjDUnLUFvoEhwU0rJfwpCjiaIzO1RD10Tt/THM%3D | /content/generation-RuWV1034kvx6TR0KkU6Q1Jgd.webp |
| 2 | https://openailabsprodscus.blob.core.windows.net/private/user-C6HPBjjEuSDHAnBvdXYgjma5/generations/generation-sjqFVVe2u2W9ixxJFhgxFcOF/image.webp?st=2023-02-26T00%3A36%3A40Z&se=2023-02-26T02%3A34%3A40Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/webp&skoid=15f0b47b-a152-4599-9e98-9cb4a58269f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-02-26T01%3A33%3A08Z&ske=2023-03-05T01%3A33%3A08Z&sks=b&skv=2021-08-06&sig=50MXOvg2GLNvnI/hef7Lp%2B2gG7MlP3c3VOE6LK7afaw%3D | /content/generation-sjqFVVe2u2W9ixxJFhgxFcOF.webp |
| 3 | https://openailabsprodscus.blob.core.windows.net/private/user-C6HPBjjEuSDHAnBvdXYgjma5/generations/generation-GYWi4SBO21Bxeck5Oet5hTAN/image.webp?st=2023-02-26T00%3A36%3A40Z&se=2023-02-26T02%3A34%3A40Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/webp&skoid=15f0b47b-a152-4599-9e98-9cb4a58269f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-02-26T01%3A33%3A08Z&ske=2023-03-05T01%3A33%3A08Z&sks=b&skv=2021-08-06&sig=xmztI6HGa%2B3vKn1wJmmX4keyRdNgLKkV9Y6uCkCN7Zw%3D | /content/generation-GYWi4SBO21Bxeck5Oet5hTAN.webp |


### Superpowers of generative AI models

DALL·E 2 is trained on a large dataset of text-image pairs and can generate diverse images with different styles and contents.

Here are DALL·E 2's main capabilities:

* __Image synthesis__: One of the key capabilities of DALL·E 2 is its ability to generate new images from textual descriptions. For example, if we provide the model with a description such as "a giraffe reading a book on a sunny day," it can generate a unique, high-resolution image of such a scene. The model is capable of synthesizing images of diverse objects, scenes, and animals, and can generate images with intricate details and textures.
<img width="956" alt="Dall-E_Giraffe_Illustration" src="https://user-images.githubusercontent.com/108916132/217738637-f58cd1bf-7c00-45d9-b88d-9cd70328a3a5.png">

* __Style transfer__: DALL·E 2 can also apply specific styles to existing images. For instance, it can turn a photograph into a painting or a comic book, or make a piece of architecture look like it was made of candy. Describing the desired style in the prompt lets the model render the same subject in many different styles, resulting in unique and creative outputs.
<img width="955" alt="Dall-E_Style_Illustration" src="https://user-images.githubusercontent.com/108916132/217738643-468a5649-c227-4ae9-a5ea-7040217cf2a3.png">

* __Image completion__: DALL·E 2 can also complete missing parts of an image based on the surrounding context. For example, if we provide the model with an image of a partially-obscured object, it can fill in the missing parts based on the visible parts and the surrounding context. This capability can be useful in a variety of applications, such as restoring damaged or corrupted images.

Original image:
![hospital](https://user-images.githubusercontent.com/108916132/217738646-40acec67-5526-465a-9c7e-839eae972e1b.png)

Completed image:
<img width="1104" alt="Dall-E_ImageFrameAdd_Illustration" src="https://user-images.githubusercontent.com/108916132/217738638-01c093bc-5951-4da7-92fe-61ef31d7c502.png">

### Use Generative AI models effectively

Here are some tips for writing effective prompts that get good images from DALL·E 2:

1. __Be clear and specific:__
<br>DALL·E 2 works from the prompts it is given, so it's important to be clear and specific about what we want the image to show. This includes details such as the *style, subject, and colors* we want in the image.

Before: "Generate a picture of a car"
<img width="953" alt="Dall-E_Car_Illustration" src="https://user-images.githubusercontent.com/108916132/221385142-f751a475-f9fc-4f04-be86-24939392bdff.png">

After: "Please generate an image of a red sports car, with a glossy finish and racing stripes"
<img width="952" alt="Dall-E_Car_2_Illustration" src="https://user-images.githubusercontent.com/108916132/221385144-430ab39d-6d6d-4f48-9f2f-c5a17afd17c9.png">

By adding details such as the color, finish, and design of the car, we help DALL·E 2 create a more specific image that matches the requester's expectations.

2. __Use simple language:__
<br>Use simple and straightforward language when writing prompts for DALL·E 2. Avoid complex or convoluted phrasing, which could confuse the model and result in suboptimal images.

Before: "Please generate a picture of a big red balloon floating in the sky"
<img width="953" alt="Dall-E_Balloon_Illustration" src="https://user-images.githubusercontent.com/108916132/221385154-104eeaf9-f91b-4f20-b39e-df57ae19d58a.png">

After: "Please generate an image of a large red balloon in the sky, floating gracefully above the clouds"
<img width="953" alt="Dall-E_Balloon_2_Illustration" src="https://user-images.githubusercontent.com/108916132/221385159-65314b17-936d-4880-aa67-f08fc4c1ef35.png">

The revised prompt stays plain and direct while adding concrete detail (a large balloon floating above the clouds). Straightforward wording like this helps DALL·E 2 interpret the prompt and generate more accurate images.

3. __Provide context:__
<br>Provide context for the image we want DALL·E 2 to create. This could include the intended use of the image, the target audience, and any specific requirements or constraints.

Before: "Generate a picture of a cake"
<img width="953" alt="Dall-E_Cake_Illustration" src="https://user-images.githubusercontent.com/108916132/221385167-d200afc4-5881-4c07-bbdf-20c3b451bb5d.png">

After: "Please generate an image of a three-tiered wedding cake, with white frosting and pink flowers, suitable for a traditional wedding reception"
<img width="950" alt="Dall-E_Cake_2_Illustration" src="https://user-images.githubusercontent.com/108916132/221385174-7b379f6e-e6aa-4ef3-9383-ac1fd6f0b3a3.png">

Providing context, such as the image's intended use at a wedding reception, can help DALL·E 2 better understand the requirements and generate an image that meets the requester's needs.

4. __Use examples:__
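<br>Describing images that we like or want to emulate can help DALL·E 2 understand what we're looking for. DALL·E 2 reads the prompt as text and does not open links, so describe the reference image in words; to build on an existing image directly, use DALL·E 2's variation or editing features instead.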

Before: "Generate a picture of a house"
<img width="950" alt="Dall-E_Home_Illustration" src="https://user-images.githubusercontent.com/108916132/221385179-2cab772f-14ff-43ed-b539-f2e0733a9b8e.png">

After: "Please generate an image of a large two-story house, with a wrap-around porch and white picket fence, similar to this picture [link to a reference image]"
<img width="949" alt="Dall-E_Home_2_Illustration" src="https://user-images.githubusercontent.com/108916132/221385183-8d19c785-cf82-4d14-8a7f-6437a9ec20c7.png">

Spelling out the features we want from a reference image, such as the wrap-around porch and white picket fence, can help DALL·E 2 better understand the requester's preferences and generate an image that matches their expectations.

5. __Iterate and refine:__
<br>DALL·E 2 may not always produce the exact image we're looking for on the first try. Try different variations of the same prompt and fold what we learn from each result into the next one. The model does not remember earlier requests, so every adjustment has to be stated in the prompt itself.

Before: "Generate a picture of a tree"
<img width="949" alt="Dall-E_Tree_Illustration" src="https://user-images.githubusercontent.com/108916132/221385191-33f9adf0-193d-43e4-b600-943db016d6a6.png">

After: "Please generate an image of an oak tree with sprawling branches and golden leaves, as shown in the attached reference image. Could you make the leaves more vibrant and the trunk thicker?"
<img width="950" alt="Dall-E_Tree_2_Illustration" src="https://user-images.githubusercontent.com/108916132/221385196-dfc8bac4-bba7-4d68-9041-345922f1e4f0.png">

Revising the prompt with small, specific adjustments based on earlier results can help DALL·E 2 generate images that are more closely aligned with the requester's needs and preferences.

DALL·E 2 is an AI model and can only work with the information given to it in the prompt. By providing clear and specific instructions, we can help it produce high-quality images that meet our needs.

### Applications of generative AI models for image synthesis
Generative AI tools such as Midjourney, Stable Diffusion and DALL-E have a wide range of potential applications across industries, including:

* **Art and Design:**<br>
The ability to generate unique and imaginative images based on textual descriptions opens up new possibilities for artists and designers. For example, an artist could describe a concept they have in mind, and DALL-E could generate an image that captures the essence of that idea. This could be especially useful for digital artists who work with 3D modeling, animation, and other digital media.
<br>

* **Advertising:**<br>
Generative AI models can be used to generate images for advertisements, making it easier for marketers to create engaging and memorable campaigns. For example, a company could describe the product or service they want to advertise, and DALL-E could generate images that capture the essence of that offering. This could be useful for creating visual content for social media, online advertisements, or print materials.
<br>

* **Product Visualization:**<br>
Midjourney can be used to generate images of products, allowing companies to quickly and easily showcase new offerings. For example, a company could describe a new product, and Midjourney could generate images that give consumers an idea of what the product looks like and how it functions. This could be especially useful for companies in the retail or e-commerce industries that need to showcase their products in a visually appealing way.
<br>

* **Movie and Video Game Concept Art:**<br>
We can generate concept art for movies and video games. For example, a movie director or video game designer could describe a scene or character they have in mind, and DALL-E could generate images that capture the essence of that idea. This could be useful for quickly generating initial concept art or for exploring different visual directions for a project.
<br>

* **Web Design:**<br>
AI could be used to generate unique and imaginative images for websites. For example, a web designer could describe the look and feel they want for a particular section of a website, and Stable Diffusion could generate images that match that description. This could be useful for creating custom graphics or backgrounds for websites.
<br>

These are just a few examples of the potential applications of GenAI. The technology is still new, and it’s likely that creative individuals and businesses will find new and innovative ways to use it in the future.

### Assessment Quiz

1. What is the process of image synthesis in generative AI?

a) Recognizing objects in images<br>
b) Enhancing image resolution<br>
c) Generating images from patterns learned from data<br>
d) Converting images to text<br>

_(Correct Answer: c)_

2. What is an essential factor when choosing an image dataset for training a generative AI model?

a) Dataset size and color variety<br>
b) Dataset licensing fees<br>
c) Dataset popularity on social media<br>
d) Dataset compatibility with multiple platforms<br>


_(Correct Answer: a)_

3. How does data augmentation contribute to improving the performance of generative AI models?

a) By increasing the size of the dataset<br>
b) By reducing the diversity of training examples<br>
c) By decreasing the complexity of the model<br>
d) By introducing variations in the training data<br>

_(Correct Answer: d)_

4. What is a text embedding, as generated by the CLIP model in DALL·E 2?

a) A pre-determined set of numbers that represent a text or image<br>
b) A set of numbers acquired through a neural network’s learning process that depict a text or image<br>
c) A type of attention mechanism used in natural language processing and computer vision tasks<br>
d) A method that introduces increasingly more randomness over a series of time steps until an image becomes unrecognizable<br>

_(Correct Answer: b)_

5. What is the CLIP model?

a) A type of image embedding used in DALL·E 2<br>
b) A model that generates new image embeddings based on a given text description<br>
c) A type of machine learning model used in natural language processing and computer vision tasks<br>
d) A neural network model that determines the most fitting caption for a given image<br>

_(Correct Answer: d)_

6. In generative AI models, what is the role of an "encoder" network?

a) To generate new data samples<br>
b) To generate adversarial examples<br>
c) To map input data to a latent space<br>
d) To evaluate the quality of generated images<br>

_(Correct Answer: c)_

7. What is a diffusion model?

a) A method that introduces increasingly more randomness over a series of time steps until an image becomes unrecognizable<br>
b) A type of machine learning model used in natural language processing and computer vision tasks<br>
c) A model that generates new image embeddings based on a given text description<br>
d) A model that learns to create new images or other forms of data by adding noise and then removing it<br>

_(Correct Answer: d)_

8. What is a Transformer?

a) A type of attention mechanism used in natural language processing and computer vision tasks<br>
b) A model that generates new image embeddings based on a given text description<br>
c) A method that introduces increasingly more randomness over a series of time steps until an image becomes unrecognizable<br>
d) A type of machine learning model used in natural language processing and computer vision tasks that transforms input data into a different representation by considering the relationships between different elements in the input data<br>

_(Correct Answer: d)_