# Hands-On with Generative AI

This notebook demonstrates how to generate text and images from input prompts. We show how to perform both tasks by calling APIs and by running open models locally in this notebook.

Before getting into the examples, we will install the packages that we need.

In [None]:
# Install packages that we will use
!pip install openai==1.59.9
!pip install accelerate==1.2.1
!pip install diffusers==0.32.2
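After the install cell has run, it can be useful to confirm that the pinned versions are the ones actually present in the runtime. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `check_pins` are hypothetical and not part of the original notebook.

```python
from importlib import metadata

# Versions pinned by the install cell above.
PINNED = {"openai": "1.59.9", "accelerate": "1.2.1", "diffusers": "0.32.2"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins):
    """Return {package: (expected, installed)} for every package that does not match."""
    mismatches = {}
    for package, expected in pins.items():
        found = installed_version(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


# An empty dictionary means every pinned package matches.
print(check_pins(PINNED))
```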

---

## Use large language models to generate text

In [None]:
# Import packages that you will use for accessing the OpenAI API
import json
from google.colab import drive
from openai import OpenAI

You will need an API key to access the [OpenAI API](https://openai.com/index/openai-api). One option is to load the API key from a file on Google Drive. This avoids hardcoding a personal API key in the notebook, which would make it visible to anyone who uses the notebook.

With this option, the API key is stored in a file containing a JSON object of the form:
```
{
  "api_key": "<MY_API_KEY>"
}
```
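A file in this format could be read with a small helper like the one below. This is a sketch under stated assumptions: the function name `load_api_key` is hypothetical, and rejecting values that start with `<` is only a guard against leaving the placeholder in place. The demo writes a fake key to a temporary file so that it runs anywhere.

```python
import json
import os
import tempfile


def load_api_key(path, field="api_key"):
    """Read the API key from a JSON file shaped like {"api_key": "..."}."""
    with open(path, "r") as f:
        data = json.load(f)
    key = data.get(field) if isinstance(data, dict) else None
    # Assumption: a missing value or an unfilled "<...>" placeholder is an error.
    if not isinstance(key, str) or not key or key.startswith("<"):
        raise ValueError(f"No usable '{field}' value found in {path}")
    return key


# Demo with a temporary file and a fake key (not a real credential).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "keys.json")
    with open(demo_path, "w") as f:
        json.dump({"api_key": "sk-demo-not-a-real-key"}, f)
    demo_key = load_api_key(demo_path)

print(len(demo_key))
```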

Alternatively, we can use Google Colab's built-in secrets feature, which is what we do below.

In [None]:
# Option 1: read the API key from a JSON file stored in Google Drive.
# Uncomment the lines below to use this option.
# drive.mount("/content/gdrive")
# with open("/content/gdrive/MyDrive/OpenAI/keys.json", "r") as f:
#   api_key = json.loads(f.read())["api_key"]

# Option 2 (used here): read the API key from Colab secrets,
# falling back to an interactive prompt if the secret is not available.
import getpass
import os
from google.colab import userdata

if not os.environ.get("OPENAI_API_KEY"):
    try:
        # userdata.get raises if the secret is missing or access is not granted
        secret = userdata.get("OPENAI_API_KEY")
    except (userdata.SecretNotFoundError, userdata.NotebookAccessError):
        secret = None
    os.environ["OPENAI_API_KEY"] = secret or getpass.getpass("OpenAI API key: ")

We will use the OpenAI API to generate a response from an input prompt. To call the API, we will create an OpenAI client object. We can later use this same client object to generate images using the DALL-E model.

In [None]:
# Create an OpenAI client
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

We can now generate text from a given prompt using a single API call. Here we pass the prompt "do snakes have ears?", then use a GPT model to generate a response.

In [None]:
# Example of the chat message format: a list of role/content dictionaries
[
    {'role': 'system', 'content': 'do snakes have ears?'},
    {'role': 'user', 'content': 'do snakes have ears?'},
    {'role': 'assistant', 'content': 'No, snakes do not have external ears like mammals do. Instead, they have inner ears that are responsible for sensing vibrations and sound waves. This allows them to hear low-frequency sounds and detect movements in their environment.'},
    {'role': 'user', 'content': 'XXXX'}
]
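The message list above follows the chat format: an optional system message, then alternating user and assistant turns, ending with the new user message. A list like this could be assembled with a small helper. The sketch below is illustrative only; `build_messages` and its parameters are hypothetical names, not part of the OpenAI library.

```python
def build_messages(user_prompt, system_prompt=None, history=()):
    """Assemble a chat message list.

    history is a sequence of (user_text, assistant_text) pairs from earlier turns.
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The newest user message always comes last.
    messages.append({"role": "user", "content": user_prompt})
    return messages


example_messages = build_messages(
    "can they feel vibrations through the ground?",
    system_prompt="You are a concise zoology tutor.",
    history=[("do snakes have ears?", "No, snakes do not have external ears.")],
)
for message in example_messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`.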

In [None]:
model_name = 'gpt-4o-mini'
openai_response = client.chat.completions.create(
    model = model_name,
    messages = [
         {'role': 'user', 'content': 'do snakes have ears?'}
    ]
)

# Show the generated text
openai_response.choices[0].message.content

In [None]:
openai_response.choices[0].message
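Besides the message itself, the response object carries metadata such as the finish reason and token usage. The sketch below pulls a few of these fields into a dictionary. The helper name `summarize_response` is hypothetical, and the `SimpleNamespace` object is only a stand-in that mimics the shape of a chat completion so the example runs without an API call.

```python
from types import SimpleNamespace


def summarize_response(response):
    """Collect the reply text and some metadata from a chat completion response."""
    choice = response.choices[0]
    usage = getattr(response, "usage", None)
    return {
        "text": choice.message.content,
        "role": choice.message.role,
        "finish_reason": choice.finish_reason,
        "total_tokens": usage.total_tokens if usage is not None else None,
    }


# Stand-in object with made-up values; a real response comes from the API.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(role="assistant", content="Snakes have inner ears only."),
            finish_reason="stop",
        )
    ],
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30, total_tokens=42),
)

summary = summarize_response(fake_response)
print(summary)
```

In the notebook, `summarize_response(openai_response)` would apply the same logic to the real response.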

---

Now we will download a model to this notebook and generate a response locally from this model. We will use the Hugging Face Transformers package to accomplish this. There are various open models we can use with Transformers. We will use Microsoft's Phi-3-mini model, which is a relatively small (3.8 billion parameters) but capable model.

In [None]:
# Import the Transformers package
from transformers import AutoModelForCausalLM, AutoTokenizer

We will need a tokenizer to convert our text input into a sequence of tokens and a model to generate a response from the provided context tokens.

In [None]:
# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct",
                                             device_map="cuda",
                                             torch_dtype="auto",
                                             trust_remote_code=True
                                             )
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

We will evaluate the same prompt as above ("do snakes have ears?") with the Phi-3-mini model. We will first convert our prompt to tokens, generate the output tokens, then convert back to text to view the output.

In [None]:
# Tokenize the input context
messages = [{"role": "user", "content": "do snakes have ears?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
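Before tokenizing, the chat template wraps each message in special marker tokens. The sketch below approximates the format described for Phi-3 instruct models in plain Python. Treat it as an assumption rather than the exact template: the real one is stored with the tokenizer and may differ in details such as a beginning-of-sequence token or whitespace. The function name `format_phi3_prompt` is hypothetical.

```python
def format_phi3_prompt(messages, add_generation_prompt=True):
    """Approximate the Phi-3 chat layout: <|role|>, content, <|end|> for each message."""
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # Cue the model to begin the assistant's reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


approx_prompt = format_phi3_prompt([{"role": "user", "content": "do snakes have ears?"}])
print(approx_prompt)
```

To see the exact string the tokenizer produces, call `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` in the notebook and compare.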

In [None]:
# Encode a single word into its token IDs
tokenizer.encode("Generative")

In [None]:
# Decode a single token ID back into text
tokenizer.decode(3251)

In [None]:
inputs[0]

In [None]:
# Run a single forward pass; the result holds next-token logits for every input position
result = model(inputs.to("cuda"))

In [None]:
# Pass the context to the model and generate an output
outputs = model.generate(inputs.to("cuda"), max_new_tokens=128)
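Conceptually, generation repeats the forward pass: score the candidates for the next token, pick one, append it to the context, and continue until a stop token or the token limit is reached. The sketch below shows a greedy version of that loop with a toy scoring function standing in for the model. It is a simplified illustration, not how `generate` is implemented: the real method adds caching, sampling options, and other stopping criteria, and its decoding strategy depends on the generation configuration. All names here are hypothetical.

```python
def greedy_decode(next_token_scores, prompt_ids, max_new_tokens, eos_id=None):
    """Repeatedly append the highest-scoring next token to the sequence."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = next_token_scores(ids)
        # Greedy choice: index of the largest score.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids


def toy_scores(ids):
    """Toy stand-in for a model over a 5-token vocabulary: prefers (last token + 1) mod 5."""
    preferred = (ids[-1] + 1) % 5
    return [1.0 if token == preferred else 0.0 for token in range(5)]


# Token 4 plays the role of the end-of-sequence token in this toy setup.
toy_output = greedy_decode(toy_scores, [0, 1], max_new_tokens=10, eos_id=4)
print(toy_output)
```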

In [None]:
# Print each output token ID next to the text it decodes to
for token in outputs[0]:
    print(token, tokenizer.decode(token))

In [None]:
text = tokenizer.batch_decode(outputs)[0]
text
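For a decoder-only model, the generated sequence starts with the prompt tokens, so the decoded text repeats the prompt along with its special markers. One way to keep only the newly generated part is to slice off the prompt prefix. The sketch below shows the idea on plain lists of token IDs; `new_tokens_only` is a hypothetical helper and the IDs are made up.

```python
def new_tokens_only(output_ids, prompt_ids):
    """Return the tokens that follow the prompt in a generated sequence."""
    output_ids = list(output_ids)
    prompt_ids = list(prompt_ids)
    if output_ids[: len(prompt_ids)] != prompt_ids:
        raise ValueError("The output sequence does not start with the prompt")
    return output_ids[len(prompt_ids):]


# Made-up token IDs: the first three are the prompt, the rest is the reply.
reply_ids = new_tokens_only([101, 7, 42, 9, 15, 2], [101, 7, 42])
print(reply_ids)
```

With the tensors in this notebook, the equivalent would be `tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)`, which also drops the special tokens from the decoded text.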

---

## Use diffusion models to generate images

We can also generate an image from a given prompt using a single API call. We will use the same OpenAI API client that we created earlier to call OpenAI's DALL-E model. Here we pass the prompt "a realistic photograph of a snake with ears", then use DALL-E 3 to generate an image.

In [None]:
# Use the OpenAI API to generate an image with DALL-E 3
response = client.images.generate(
  model="dall-e-3",
  prompt="a realistic photograph of a snake with ears",
  size="1024x1024",
  quality="standard",
  n=1,
)

image_url = response.data[0].url

In [None]:
image_url

The DALL-E API generates an image, then provides a URL that can be used to access the image. We pull the image data from this URL below, then display it in the notebook.

In [None]:
# Import packages for displaying the generated image
import urllib.request
from io import BytesIO
from PIL import Image

# Load and display the image
with urllib.request.urlopen(image_url) as url:
    img = Image.open(BytesIO(url.read()))
display(img.resize((500, 500)))
#display(img)
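As an alternative to fetching a URL, the images endpoint can return the image inline as base64 text when the request includes `response_format="b64_json"`; the data is then available as `response.data[0].b64_json`. The sketch below shows only the decoding step, using placeholder bytes in place of a real image. The helper name `save_b64_image` is hypothetical.

```python
import base64
import os
import tempfile


def save_b64_image(b64_string, path):
    """Decode base64 image data, write it to path, and return the raw bytes."""
    data = base64.b64decode(b64_string)
    with open(path, "wb") as f:
        f.write(data)
    return data


# Placeholder payload standing in for the base64 string returned by the API.
demo_payload = base64.b64encode(b"not-a-real-image").decode("ascii")

with tempfile.TemporaryDirectory() as tmp:
    demo_file = os.path.join(tmp, "image.bin")
    demo_bytes = save_b64_image(demo_payload, demo_file)
    demo_size = os.path.getsize(demo_file)

print(demo_size)
```

In the notebook, the decoded bytes could be opened with `Image.open(BytesIO(data))`, just like the bytes downloaded from the URL.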

---

Now we will download a model to this notebook and generate an image locally from this model. We will use the Hugging Face Diffusers package to accomplish this. There are various open models we can use with Diffusers. We will use Stability AI's Stable Diffusion XL model for this.

The free Colab T4 instance might run out of GPU memory if we run the code below after running the language model. You can restart the runtime before running the code below to free up the GPU memory.

In [None]:
# Check on GPU memory usage
!nvidia-smi
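If restarting the runtime is inconvenient, a best-effort alternative is to drop the references to the language model and ask PyTorch to release its cached GPU memory. The sketch below is an assumption-laden helper rather than a guaranteed fix: memory is only freed if no other variable still references the model, and restarting remains the more reliable option. The function name `free_gpu_memory` is hypothetical.

```python
import gc


def free_gpu_memory():
    """Run garbage collection and, if PyTorch with CUDA is present, empty its cache.

    Returns the number of unreachable objects found by the garbage collector.
    """
    collected = gc.collect()
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no CUDA cache to clear.
        return collected
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return collected


# In the notebook, run `del model, tokenizer` first so the weights become collectable.
print(free_gpu_memory())
```

Running `!nvidia-smi` again afterwards shows whether the memory was actually released.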

In [None]:
# Import the Diffusers package
from diffusers import DiffusionPipeline
import torch

Below we load the Stable Diffusion XL model into a pipeline that combines all of the components used to generate an image (e.g., a text encoder, a text-conditioned U-Net, a scheduler, and a variational autoencoder).

In [None]:
# Load the Stable Diffusion XL model
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16"
)
pipe.to("cuda")

Now that our model pipeline is loaded, we will pass a prompt ("a realistic photograph of an astronaut riding a snake") to Stable Diffusion XL.

In [None]:
# Generate an image from a prompt
prompt = "a realistic photograph of an astronaut riding a snake"
image = pipe(prompt=prompt, width=512, height=512, num_inference_steps=50).images[0]
image

In [None]:
pipe