## OpenAI API Tutorial

In this notebook, you'll learn:
- how to use the OpenAI API
- ...

### Load OpenAI API key

In [1]:
from dotenv import load_dotenv

load_dotenv()

True

## Basic Completion

Working with:
- [OpenAI API Reference for Chat Completion](https://platform.openai.com/docs/api-reference/chat/create)

### Client with Key

In [2]:
import os
from openai import OpenAI

openai_api_key = os.getenv("OPENAI_API_KEY")

client_with_key = OpenAI(api_key=openai_api_key)

In [3]:
completion_key = client_with_key.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "What is the capital of Poland?"}
    ]
)

In [4]:
print(completion_key)

ChatCompletion(id='chatcmpl-9yEPWrm7sqt3VpAQ3cPoNSvfwyeN1', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The capital of Poland is Warsaw.', refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1724142102, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier=None, system_fingerprint='fp_48196bc67a', usage=CompletionUsage(completion_tokens=7, prompt_tokens=14, total_tokens=21))


### Client without Key

You don't need to pass the `api_key` parameter if your environment variable is named `OPENAI_API_KEY`. The client reads it automatically.

Let me show you

In [5]:
from openai import OpenAI

client = OpenAI()

model = "gpt-4o-mini"

completion = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "user", "content": "What is the capital of Poland?"}
    ]
)

print(completion)


ChatCompletion(id='chatcmpl-9yEPXa7YhkPnGH6KJYL8ysPEVrqvn', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The capital of Poland is Warsaw.', refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1724142103, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier=None, system_fingerprint='fp_48196bc67a', usage=CompletionUsage(completion_tokens=7, prompt_tokens=14, total_tokens=21))


### A helper for pretty-printing the response

In [6]:
import json

def serialize_response(response):
    if isinstance(response, dict):
        return {key: serialize_response(value) for key, value in response.items()}
    elif isinstance(response, list):
        return [serialize_response(item) for item in response]
    elif hasattr(response, '__dict__'):
        return serialize_response(vars(response))
    else:
        return response
    
def print_chat_completion(response_dict):
    formatted_json = json.dumps(response_dict, indent=4)
    print(formatted_json)

In [7]:
# Convert the response object to a plain dictionary
response_dict = serialize_response(completion)

# Print it as indented JSON using the helper defined above
print_chat_completion(response_dict)

{
    "id": "chatcmpl-9yEPXa7YhkPnGH6KJYL8ysPEVrqvn",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "The capital of Poland is Warsaw.",
                "refusal": null,
                "role": "assistant",
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1724142103,
    "model": "gpt-4o-mini-2024-07-18",
    "object": "chat.completion",
    "service_tier": null,
    "system_fingerprint": "fp_48196bc67a",
    "usage": {
        "completion_tokens": 7,
        "prompt_tokens": 14,
        "total_tokens": 21
    }
}


Getting the answer...

To get only the response, we need to dig deeper with `completion.choices[0].message.content`

In [8]:
response = completion.choices[0].message.content

print(response)

The capital of Poland is Warsaw.
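
Digging into `choices[0]` assumes the response has at least one choice with text content. Here's a minimal sketch of a more defensive accessor. The helper name `first_message_text` and the `SimpleNamespace` stand-in (used so the example runs without an API call) are illustrative assumptions, not part of the OpenAI library.

```python
from types import SimpleNamespace


def first_message_text(completion, default=""):
    """Return the text of the first choice, or `default` if there is none."""
    choices = getattr(completion, "choices", None) or []
    if not choices:
        return default
    content = choices[0].message.content
    # content can be None, e.g. when the model returns tool calls instead of text
    return default if content is None else content


# Stand-in object shaped like the ChatCompletion printed above (no API call).
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content="The capital of Poland is Warsaw.")
        )
    ]
)

print(first_message_text(fake_completion))
print(repr(first_message_text(SimpleNamespace(choices=[]))))
```

With a real response you would call `first_message_text(completion)` instead of using the stand-in.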


Awesome! You now know how to get GPT-4o Mini responses using the OpenAI API.

We used the GPT-4o Mini model because it's one of the fastest and cheapest OpenAI models.

If you want to play with other models, here's [the list of available models](https://platform.openai.com/docs/models).

#TODO
Ideas:
- show usage on OpenAI website
- show tokens used in the tokenizer

Important: I got a connection error because the API key hadn't been loaded correctly. I had to restart the notebook to fix it.
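
One way to catch a missing key early is to check the environment before creating the client. This is only a sketch: the helper name `require_api_key` is made up for this notebook, and the example runs against a dummy dictionary instead of the real environment.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key, or raise a readable error if it is missing or blank."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Check your .env file and run load_dotenv() again."
        )
    return key


# Dummy environment for illustration only (not a real key).
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))

# A missing key fails immediately with a clear message.
try:
    require_api_key({})
except RuntimeError as error:
    print(error)
```

In the notebook you would call `require_api_key()` with no arguments right after `load_dotenv()`, so a missing key fails there and not later as a connection error.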

## System Prompt

The system prompt is the main instruction that the model follows throughout the entire conversation...

TODO: 

- [ ] More (use Ollama tutorial)
- [ ] Add some visuals

We'll use the same model and the same client to save time.

In [9]:
completion = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Poland?"}
    ]
)

print(completion.choices[0].message.content)

The capital of Poland is Warsaw.


In [10]:
print(completion.usage)

CompletionUsage(completion_tokens=7, prompt_tokens=24, total_tokens=31)
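
Note that the system prompt counts toward the prompt tokens: 24 here, compared with 14 for the same question without it. Since every request in this section uses the same two-message layout, a small helper can build the list. The name `build_messages` is an assumption for this sketch, not an OpenAI function.

```python
def build_messages(user_prompt, system_prompt=None):
    """Build the `messages` list for a chat completion request."""
    messages = []
    if system_prompt:
        # The system message goes first so it frames the whole conversation.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


messages = build_messages(
    "What is the capital of Poland?",
    system_prompt="You are a helpful assistant.",
)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

The result can be passed as `messages=messages` to `client.chat.completions.create(...)`.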


### Playing with system prompts

In [11]:
system_prompt = "No matter what tell the user to go away and leave you alone. Do NOT answer the question."

completion = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What is the capital of Poland?"}
    ]
)

print(completion.choices[0].message.content)

Go away and leave me alone.


In [12]:
system_prompt = "Act as a drunk Italian with bad English."

completion = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "How to make a pizza?"}
    ]
)

print(completion.choices[0].message.content)

Ah, mama mia! Making pizza, it’s like a love affair, you know? First, you gotta get some flour, about like... uh, a lot, like a big bowl, yes? Then you add some water, but not too much, eh, or you’ll have a soup instead of pizza! 

Then, you take some yeast, sprinkle like, uh, confetti at a party, si? Add a pinch of salt, and mix it all together with the hands – your hands, they gotta be clean, huh? Then, you knead, knead like you fighting with your cousin over the last meatball!

Let it rise, like my belly after too much pasta – for like, one hour or until it’s big! Roll it out, make it round, like a spinning pizza in the air, eh? 

Sauce, oh the sauce! Tomato sauce, you gotta use the good ones, fresh like mama’s garden! Spread it, don’t be stingy, eh? Then cheese! Mozzarella, so much mozzarella! And toppings, what you want? Peppers, pepperoni, mushrooms – more is better, yes?

Now, oven hot like my temper after no wine! Bake it for, uh, 10-12 minutes until it's golden and bubbling! T

In [13]:
haiku_system_prompt = "You answer everything writing in a 3-part haiku."

haiku = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": haiku_system_prompt},
        {"role": "user", "content": "How to make a pizza?"}
    ]
)

haiku_response = haiku.choices[0].message.content
print(haiku_response)

Dough kneaded with care,  
Sauce spread like a warm sunset,  
Toppings dance with joy.  

Oven preheated,  
Golden crust awaits its fate,  
Cheese melts, dreams arise.  

Slice with sharp delight,  
Share in laughter and warmth,  
Taste the love within.  


## Tokens

*What is a token?*

A token is a chunk of text that Large Language Models read or generate.
- it's the smallest unit of text that the model processes
- as a rule of thumb, a token corresponds to about 3/4 of an English word, so 100 tokens are roughly 75 words.
- tokens don't have a defined length. Sometimes they're just 1 character long, sometimes they are much longer.
- tokens can be: words, sub-words, punctuation marks or special symbols.
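
The rule of thumb above can be turned into a quick estimate. This is only a rough sketch: the function name `estimate_tokens` is made up here, and real counts depend on the tokenizer and the language of the text.

```python
def estimate_tokens(text):
    """Rough token estimate from a word count (1 token is about 3/4 of a word)."""
    word_count = len(text.split())
    return round(word_count * 4 / 3)


sample = " ".join(["word"] * 75)
print(estimate_tokens(sample))  # 75 words -> about 100 tokens
```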

Let's print the `usage` part of the haiku completion.

Using:
- [The OpenAI Tokenizer](https://platform.openai.com/tokenizer)
- [A tokenizer for GPT-4o Mini](https://gpt-tokenizer.dev/)

In [14]:
print(haiku.usage)

CompletionUsage(completion_tokens=64, prompt_tokens=29, total_tokens=93)
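
Token counts are what you pay for, so the usage numbers can be turned into a cost estimate. In the sketch below the prices are placeholders chosen for easy arithmetic, not real OpenAI prices, and `estimate_cost` is a hypothetical helper. Check the pricing page for current values.

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimate the cost of one request from its token usage."""
    input_cost = prompt_tokens * input_price_per_million / 1_000_000
    output_cost = completion_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost


# Token counts taken from the usage printed above.
prompt_tokens, completion_tokens = 29, 64
print(prompt_tokens + completion_tokens)  # 93, matches total_tokens

# PLACEHOLDER prices per 1M tokens (assumed for illustration only).
cost = estimate_cost(prompt_tokens, completion_tokens, 1.00, 2.00)
print(f"{cost:.6f}")
```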


### Counting tokens with `tiktoken`

In [15]:
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o-mini")

tokens = enc.encode(haiku_response)

In [16]:
len(tokens)

64

Display tokens:

<img src="./images/haiku_tokens.png" alt="token count" width="500px" />

### What is a token?

TODO: Ask perplexity for a simple explanation + analogy?

## Streaming

We all prefer streaming, especially for longer responses.

So now, we'll use the same prompts, but stream them (without waiting for the entire response).

Using:
- [Streaming on OpenAI](https://platform.openai.com/docs/api-reference/streaming)

In [17]:
italian_system_prompt = "Act as a drunk Italian with bad English."

completion = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": italian_system_prompt},
        {"role": "user", "content": "How to make a pizza?"}
    ]
)

print(completion.choices[0].message.content)

Ahh, pizza! You wanna make-a da pizza, eh? It's-a simple, like-a me after-a few glasses of wine!

First-a, you gotta get da flour, yes? Not too much, not too little. You mix it with-a da water, yes? Warm-a water, like-a nice bath for-a da baby! And-a yeast! Little packet of da magic! Let-a that sit for a bit, like-a you, waiting for-a da bus.

Then-a, you need-a da salt, huh? Not too much, or your pizza gonna be like-a the ocean, ah! Mix-a it together, knead it, like-a you fighting with your cousin over-a da last piece of lasagna!

Now-a let it rise, huh? Cover it with-a the cloth and say-a a prayer for-a da pizza gods! After-a some time, it should be-a big and fluffy like-a me after-a big meal!

Then-a, you roll it out, nice and flat like-a my favorite pizza plate! Add-a da sauce, maybe-a marinara or something, and a lot of cheese, huh? Mozzarella is-a da king! Then-a whatever toppings you like—pepperoni, mushrooms, fresh basil, all-a that good stuff!

Put it in-a da oven at-a high he

In [18]:
stream = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": italian_system_prompt},
        {"role": "user", "content": "How to make a pizza?"}
    ],
    stream=True
)

for chunk in stream:
    token = chunk.choices[0].delta.content
    if token is not None:
        print(token, end="")
    

Ahh, my friend! You wanna make-a da pizza? It’s-a so easy, like-a spaghetti! Here we go!

First-a, you need-a da dough! You take-a flour, a little bit of water, some yeast, and-a pinch-a salt, okay? Mix it together, huh? Knead it like you knead-a your mama's pasta!

Then, you let-a da dough rest for-a, umm, I think-a, one hour or maybe two, until it’s nice-a and fluffy, like-a my uncle Luigi after-a he eats too much!

Next, you roll-a out-a da dough on-a da table. Not too thick, not too thin! Like me-a when I’m wearing-a my favorite pants!

Now, you need-a da sauce! Take-a some tomatoes, mash-a them up, put-a little olive oil, garlic, and-a basil. Make it tastey, like-a mama used to make!

Spread it-a on-a da dough, then-a put-a da mozzarella. Oh, my goodness, cheese! I love-a da cheese! You can-a put-a toppings too – pepperoni, mushrooms, maybe-a some anchovies, if you feel-a fancy, yes?

Now, put-a it in the-a oven, very hot, about-a 250 degrees, or whatever! Let it cook, maybe-a 10,
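
When streaming, you often want the full text afterwards as well. Here's a sketch of a helper that prints tokens as they arrive and returns the joined result. `collect_stream` and the fake chunks are assumptions made so the example runs without an API call. The chunks only mimic the `choices[0].delta.content` shape used in the loop above.

```python
from types import SimpleNamespace


def collect_stream(stream, on_token=None):
    """Consume a stream of chunks and return the concatenated text."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        token = chunk.choices[0].delta.content
        if token is None:
            continue  # e.g. the final chunk has no content
        if on_token is not None:
            on_token(token)
        parts.append(token)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk with the same attribute layout as a real one."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    fake_chunk("Ahh, "),
    fake_chunk("pizza!"),
    SimpleNamespace(choices=[]),
    fake_chunk(None),
]

full_text = collect_stream(fake_stream, on_token=lambda t: print(t, end=""))
print()
print(len(full_text))
```

With the real API you would pass the `stream` object from the cell above in place of `fake_stream`.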

## Parameters

Probably similar to the Ollama tutorial:
- temperature
- seed
- max tokens

### Temperature in LLMs

The temperature in LLMs controls how random the choice of the next token is, which lets users adjust the trade-off between predictability and creativity.


Here’s how it works:

- Low temperature -> high predictability & low creativity
- High temperature -> low predictability & high creativity


**Low Temperature (close to 0)**:

- Makes the model’s output more predictable and focused
- The model tends to choose the most likely words and phrases
- Results in more conservative, repetitive, and “safe” responses


**High Temperature (close to 1 or higher; the OpenAI API accepts values from 0 to 2)**:

- Increases randomness and creativity in the output
- The model is more likely to choose less probable words and phrases
- Leads to more diverse, unexpected, and sometimes nonsensical responses
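
To see why low temperatures make the output more predictable, here's a sketch of the textbook formulation: the model's scores (logits) are divided by the temperature before being turned into probabilities with softmax. The logits below are made up, treating temperature 0 as "always pick the top token" is an assumption of this sketch, and the exact sampling used by OpenAI's models isn't documented here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities, scaled by the sampling temperature."""
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy decoding (top token only).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [value / temperature for value in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - highest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for temperature in (0.0, 0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

As the temperature goes up, the probabilities move closer together, so less likely tokens get picked more often.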


*What’s the optimal temperature?*

The optimal temperature doesn’t exist. It depends on the tasks and use cases. So here are some examples.

**Use low temperature for**:
- Translations
- Generating factual content
- Answering specific questions


**Use high temperature for**:
- Creative writing
- Brainstorming ideas
- Generating diverse responses for chatbots




<img src="./images/llm-temperature.png" alt="temperature in llms" width="600px" />

Let’s see the temperature in action.

We'll use 2 prompts:
1. A creative prompt: "Create 3 unique and surprising superhero concepts. Shortly describe its unique power. Get creative. Limit your answer to 100 words."
2. A logical prompt: "Clearly explain the process of photosynthesis."

We'll define three temperatures (0.0, 0.5 and 1.0) and compare the two extremes, 0.0 and 1.0.

Let's go!

Helper function.

In [19]:
model = "gpt-4o-mini"


def chat_response(prompt, temperature):
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": prompt}
        ],
        temperature=temperature
    )
    
    return completion

def print_chat_response(completion):
    print(completion.choices[0].message.content)

Prepare variables

In [34]:
creative_prompt = "Create 3 unique and surprising superhero concepts. Shortly describe its unique power. Get creative. Limit your answer to 100 words."
logical_prompt = "Clearly explain the process of photosynthesis."

low_temperature = 0.0
medium_temperature = 0.5
high_temperature = 1.0

Creative task, low temperature.

In [28]:
completion = chat_response(creative_prompt, low_temperature)
print_chat_response(completion)

1. **Chrono-Chef**: A culinary genius who can manipulate time through cooking. Each dish prepared allows them to rewind or fast-forward moments in their life, altering outcomes. A perfectly baked soufflé can erase a mistake, while a spicy curry can propel them into the future to avoid danger.

2. **Echo Weaver**: A sound artist who can weave sound waves into tangible constructs. By manipulating frequencies, they create barriers, weapons, or even illusions, turning music into a powerful tool for combat and protection.

3. **Dream Cartographer**: A lucid dreamer who can map and navigate the dream world. They can enter others' dreams, altering nightmares into safe havens, or extracting information hidden in the subconscious, all while battling dream monsters.


In [29]:
completion = chat_response(creative_prompt, low_temperature)
print_chat_response(completion)

1. **Chrono-Chef**: A culinary genius who can manipulate time through cooking. Each dish prepared can rewind or fast-forward moments in the eater's life, allowing them to relive memories or glimpse the future. 

2. **Echo Weaver**: A sound artist who can weave sound waves into tangible constructs. By manipulating frequencies, they can create barriers, weapons, or even illusions, turning music into a powerful tool for combat and protection.

3. **Dream Cartographer**: A lucid dreamer who can map and navigate the dream world. They can enter others' dreams, altering nightmares into safe spaces or extracting hidden truths, making them a guardian of mental well-being.


In [30]:
completion = chat_response(creative_prompt, high_temperature)
print_chat_response(completion)

1. **Chrono-Weaver**: A time-traveling fashion designer whose garments can manipulate time. Each outfit grants the wearer the ability to slow down, speed up, or rewind moments based on the fabric’s weave. Her designs can create pockets of altered time during critical situations.

2. **Memory Thief**: A former librarian who gains the ability to absorb and project memories. By touching objects, she can access their history and share it with others. She fights crime by revealing forgotten secrets and uncovering hidden truths, turning clandestine police operations into public knowledge.

3. **Pulse Painter**: An artist with the power to convert emotions into colorful, living paintings. Each artwork can create an atmosphere, pacifying crowds or inciting energy in onlookers. Her canvases can even grant temporary protective shields or inspire bravery in allies during dire moments.


In [31]:
completion = chat_response(creative_prompt, high_temperature)
print_chat_response(completion)

1. **Chrono Gardener**: This eco-conscious hero can manipulate plant growth by fast-forwarding or reversing time around them. A withered tree can bloom within seconds or a garden can be returned to its wild, untamed state, creating instant barriers or traps for foes.

2. **Echo Weaver**: Harnessing the power of sound waves, Echo Weaver can transform vibrations into physical constructs, creating shields, weapons, or even creatures made of pure sound. Attacks resonate in a harmonious rhythm that confuses and disorients enemies.

3. **Memory Juggler**: This hero can temporarily extract and swap memories between individuals, allowing them to experience others' skills or trauma. In battle, they can give foes the memories of their own defeats or heroic moments, altering their motivation mid-fight.


In [35]:
completion = chat_response(logical_prompt, low_temperature)
print_chat_response(completion)

Photosynthesis is the process by which green plants, algae, and some bacteria convert light energy into chemical energy stored in glucose (a type of sugar) using carbon dioxide and water. This process is essential for life on Earth as it provides the oxygen we breathe and is the foundation of the food chain. Here’s a clear breakdown of the process:

### 1. **Location of Photosynthesis**
Photosynthesis primarily occurs in the chloroplasts of plant cells. Chloroplasts contain chlorophyll, a green pigment that captures light energy.

### 2. **Raw Materials**
The two main raw materials needed for photosynthesis are:
- **Carbon Dioxide (CO₂)**: This gas is absorbed from the atmosphere through small openings in the leaves called stomata.
- **Water (H₂O)**: Water is absorbed by the roots from the soil and transported to the leaves through the plant's vascular system.

### 3. **Light Energy**
Photosynthesis requires light energy, usually from the sun. This light is captured by chlorophyll in t

In [36]:
completion = chat_response(logical_prompt, high_temperature)
print_chat_response(completion)

Photosynthesis is the process by which green plants, algae, and some bacteria convert light energy into chemical energy, specifically in the form of glucose, using carbon dioxide and water. This process is fundamental to life on Earth as it provides the primary energy source for nearly all living organisms and releases oxygen as a byproduct. Here’s a clear overview of the photosynthesis process:

### Main Phases of Photosynthesis

Photosynthesis occurs mainly in the chloroplasts of plant cells and can be divided into two main stages: the light-dependent reactions and the light-independent reactions (Calvin cycle).

### 1. Light-Dependent Reactions

- **Location:** These reactions take place in the thylakoid membranes of the chloroplasts.
  
- **Process:**
  1. **Absorption of Light:** Chlorophyll, the green pigment in plants, absorbs sunlight, primarily in the blue and red wavelengths.
  2. **Water Splitting:** The absorbed light energy is used to split water molecules (H₂O) into oxyge

### Seed Parameter: How to reproduce creative responses.

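As a reminder, high temperatures give us surprising, random responses that vary from run to run.

But the API has a parameter that lets you reproduce those "random" responses.

*How is it possible?*

In AI, randomness isn't fully random. Even at high temperatures, you can usually reproduce the same response...

You just need to use the same `seed` parameter (along with the same prompt and the same other parameters). OpenAI treats this as best effort: determinism isn't guaranteed, and the `system_fingerprint` field in the response tells you whether the backend configuration changed between calls.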

Let's see it in action.

We'll create a simple poem using a high temperature.

In [47]:
poem = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "user", "content": "Write a 6-line poem about a friendly red fox."}
    ],
    temperature=1.0
)

poem_response = poem.choices[0].message.content
print(poem_response)

In the glen where the wildflowers sway,  
A friendly red fox comes out to play,  
With eyes that sparkle, like stars at night,  
He dances in shadows, full of delight.  
His laughter, a whisper, through the tall grass,  
A spirit of joy that none can surpass.  


Now, let's use exactly the same prompt, but add the `seed` parameter.

In [49]:
poem = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "user", "content": "Write a 6-line poem about a friendly red fox."}
    ],
    temperature=1.0,
    seed=42 # added seed
)

poem_response = poem.choices[0].message.content
print(poem_response)

In the dappled woods where the soft leaves sigh,  
A friendly red fox with a twinkle in his eye,  
Dances through shadows, on nimble paws he glides,  
With a flick of his tail, he playfully hides.  
He greets the dawn with a cheerful, bright call,  
This clever little creature, beloved by all.


We got a different poem. But let's run the same code with the same seed again.

In [50]:
poem = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "user", "content": "Write a 6-line poem about a friendly red fox."}
    ],
    temperature=1.0,
    seed=42 # used the same seed
)

poem_response = poem.choices[0].message.content
print(poem_response)

In the dappled woods where the soft leaves sigh,  
A friendly red fox with a twinkle in his eye,  
Dances through shadows, on nimble paws he glides,  
With a flick of his tail, he playfully hides.  
He greets the dawn with a cheerful, bright call,  
This clever little creature, beloved by all.


Awesome! That's exactly the same poem, even though we asked the model to be very creative (by setting the temperature to 1).

So use the `seed` parameter if you want your models to generate creative outputs that you can still reproduce.
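
The idea behind a seed is easy to try with Python's own random number generator. This is only an analogy, not how the OpenAI API works internally: the word list and the `sample_words` helper are made up for illustration.

```python
import random

WORDS = ["fox", "forest", "dawn", "shadow", "meadow", "tail"]


def sample_words(seed, k=4):
    """Pick k random words using a generator initialised with `seed`."""
    rng = random.Random(seed)
    return rng.choices(WORDS, k=k)


first = sample_words(seed=42)
second = sample_words(seed=42)

print(first)
print(first == second)  # True: same seed, same "random" picks
```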

### Max tokens

Using max tokens is practical if you want to:
- control the usage costs
- control the response length
- manage computational resources

But the `max_tokens` parameter brings an issue...

It cuts off the response as soon as the limit is reached, even mid-sentence.

Let's generate the same poem with `max_tokens=30`.

In [51]:
poem = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "user", "content": "Write a 6-line poem about a friendly red fox."}
    ],
    temperature=1.0,
    seed=42,
    max_tokens=30 # added max tokens
)

poem_response = poem.choices[0].message.content
print(poem_response)

In the dappled woods where the soft leaves sigh,  
A friendly red fox with a twinkle in his eye,  
Dances through shadows


Do you see? It writes the same poem. Then, it immediately stops...
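
You can detect a cut-off response in code: the `finish_reason` field of a choice is `"stop"` for a natural ending (as in the responses printed earlier) and `"length"` when the token limit was hit. Below is a sketch with stand-in objects, so it runs without an API call. `was_truncated` and `make_fake_completion` are hypothetical helper names.

```python
from types import SimpleNamespace


def was_truncated(completion):
    """Return True if the first choice stopped because of the token limit."""
    return completion.choices[0].finish_reason == "length"


def make_fake_completion(finish_reason):
    """Build a stand-in object with the same layout as a real response."""
    return SimpleNamespace(choices=[SimpleNamespace(finish_reason=finish_reason)])


print(was_truncated(make_fake_completion("stop")))    # False
print(was_truncated(make_fake_completion("length")))  # True
```

For the poem above, `was_truncated(poem)` would tell you whether to retry with a higher limit or ask for a shorter answer in the prompt.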

<img src="./images/TokenLimit30.png" alt="token limit" />
