## Introduction

In this notebook, we'll explore how to use OpenAI's Large Language Models (LLMs) through the OpenAI API.

The **OpenAI API** gives developers access to state-of-the-art LLMs via Python code.

Currently, OpenAI has 2 flagship models:
1. **GPT-4o** - the most powerful model, with strong reasoning.
2. **GPT-4o Mini** - the cheapest and fastest model, but less "smart".

You should use GPT-4o when:
- You need strong reasoning (logical, analytical tasks).
- You build AI solutions with AI agents.
- Slower responses are not a problem.

Otherwise, GPT-4o Mini is probably a better choice.

In this tutorial, we'll use only GPT-4o Mini. But I'll show you how to use GPT-4o too.

Using the models through the API is quite straightforward. It also has advantages over tools such as ChatGPT:
- Access to model parameters.
- Access to the system prompt.
- Ability to connect models.

So the API gives you more customization and control.

In this notebook, we'll go through the following topics:
- Using GPT-4o and GPT-4o Mini via OpenAI API.
- The importance of the system prompt.
- Streaming responses.
- The detailed explanation of tokens.
- The practical applications of temperature.

And more!

To successfully run the notebook, you need to install several packages:
- **OpenAI API**: `openai` - the library to use OpenAI models via API calls.
- **Python Dotenv**: `python-dotenv` - to load secret variables from the `.env` file.
- **Tiktoken**: `tiktoken` - for counting tokens.

To install them, run the following command in your terminal:

```bash
$ pip install openai python-dotenv tiktoken
```

OK, let's move on to the coding part!


### Loading API keys

To make OpenAI API calls, we need a secret key.

I usually save the key in a `.env` file. Here's how it looks:

`OPENAI_API_KEY=sk-proj-your-actual-key-here`

*Note: I show you step-by-step how to do it in [this article](https://medium.com/ai-advances/how-to-start-your-first-ai-project-with-python-and-openai-api-ae116627a2e7?sk=d63a5157f7124d4501229a2a4b51079c)*.

Then, I load it using the `python-dotenv` library like this:

In [1]:
from dotenv import load_dotenv

load_dotenv()

True
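
The `True` above tells us that `load_dotenv()` found something to load, not that the key we need is there. Here's a minimal sketch of a sanity check, assuming the variable is named `OPENAI_API_KEY` as in the `.env` example. The helper name `describe_api_key` is made up for illustration, and it deliberately never prints the secret itself:

```python
import os

def describe_api_key(env_var="OPENAI_API_KEY"):
    """Report whether the key is available, without printing the secret itself."""
    key = os.getenv(env_var)
    if not key:
        return f"{env_var} is missing, check your .env file"
    return f"{env_var} is set ({len(key)} characters)"

print(describe_api_key())
```

If the message says the key is missing, the usual suspects are a `.env` file in a different folder or a typo in the variable name.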

### Initialize the OpenAI client

To work with the OpenAI API, we need to use the `OpenAI()` class. The common practice is to call it this way:

`client = OpenAI()`

In [2]:
from openai import OpenAI

client = OpenAI()

### Test with the simplest completion

Let's run this simple code to see if everything works correctly:

In [3]:
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of Poland?"}]
)

response = completion.choices[0].message.content
print(response)

The capital of Poland is Warsaw.


Awesome! We just saw the GPT-4o Mini response!

It means our calls to the OpenAI API work.

If you want to change the model to GPT-4o, you need to set `model="gpt-4o"`. Here's how:

In [7]:
completion = client.chat.completions.create(
    model="gpt-4o", # change the model here
    messages=[{"role": "user", "content": "What is the capital of Poland?"}]
)

response = completion.choices[0].message.content
print(response)

The capital of Poland is Warsaw.
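
Since the two calls differ only in the `model` argument, the call can be wrapped in a small helper. This is just a sketch, not part of the OpenAI library: the name `ask` and its signature are made up for this example, and it expects the `client` created earlier to be passed in.

```python
def ask(client, prompt, model="gpt-4o-mini"):
    """Send a single user prompt and return only the text of the reply."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

# Usage with the client from the previous cells:
# print(ask(client, "What is the capital of Poland?", model="gpt-4o"))
```

Keeping `model` as a keyword argument with a cheap default means switching to GPT-4o is a one-word change at the call site.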


Here are [all models](https://platform.openai.com/docs/models) available over the OpenAI API.

Now, let's have a closer look at the `completion`.

### Showing the `completion`

To see the response, we had to "dig" into `completion.choices[0].message.content`.

But the completion itself is a `ChatCompletion` object.

Let's have a look.

In [4]:
print(completion)

ChatCompletion(id='chatcmpl-A0lqcWTfnlX0SIeu2CyvA1ptx9tlM', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The capital of Poland is Warsaw.', refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1724747290, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier=None, system_fingerprint='fp_48196bc67a', usage=CompletionUsage(completion_tokens=7, prompt_tokens=16, total_tokens=23))


We can see it's an object of type `ChatCompletion` from the OpenAI API.

But, let's print it in a nicer way.

First, we need a helper function for that.

*Note: The function is here only to display the `ChatCompletion` object in a readable way. It has nothing to do with AI itself.*

Helper function:

In [5]:
import json

def serialize_completion(completion):
    if isinstance(completion, dict):
        return {key: serialize_completion(value) for key, value in completion.items()}
    elif isinstance(completion, list):
        return [serialize_completion(item) for item in completion]
    elif hasattr(completion, '__dict__'):
        return serialize_completion(vars(completion))
    else:
        return completion
    
def print_chat_completion(response_dict):
    formatted_json = json.dumps(response_dict, indent=4)
    print(formatted_json)
    
def serialize_and_print_completion(completion):
    completion_json = serialize_completion(completion)
    print_chat_completion(completion_json)

Printing the completion:

In [6]:
serialize_and_print_completion(completion)

{
    "id": "chatcmpl-A0lqcWTfnlX0SIeu2CyvA1ptx9tlM",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "The capital of Poland is Warsaw.",
                "refusal": null,
                "role": "assistant",
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1724747290,
    "model": "gpt-4o-mini-2024-07-18",
    "object": "chat.completion",
    "service_tier": null,
    "system_fingerprint": "fp_48196bc67a",
    "usage": {
        "completion_tokens": 7,
        "prompt_tokens": 16,
        "total_tokens": 23
    }
}


We can see the `ChatCompletion` object holds more information, such as:
- The creation time of the response.
- The specific model we used.
- The token usage.

And more.
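
In day-to-day code you rarely need the whole object, only a few fields. Here's a sketch that pulls them out with plain attribute access. The helper name `summarize_completion` is made up for this example. The field names are the ones visible in the printed output above.

```python
from datetime import datetime, timezone

def summarize_completion(completion):
    """Pick a few commonly used fields out of a ChatCompletion-like object."""
    return {
        "model": completion.model,
        # `created` is a Unix timestamp in seconds, so convert it to a readable UTC time.
        "created_utc": datetime.fromtimestamp(completion.created, tz=timezone.utc).isoformat(),
        "prompt_tokens": completion.usage.prompt_tokens,
        "completion_tokens": completion.usage.completion_tokens,
        "total_tokens": completion.usage.total_tokens,
        "finish_reason": completion.choices[0].finish_reason,
    }

# Usage with the completion from above:
# print(summarize_completion(completion))
```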

## Explaining message roles

As you may have noticed, the `messages` parameter is a list of dictionaries (an array of objects in JSON terms). In our example, it was:

```python
messages=[{"role": "user", "content": "What is the capital of Poland?"}]
```

Each object consists of 2 key/value pairs:

**Role** - defines who's the "author" of the message.

We've got 3 roles:
1. *User* - it's you.
2. *Assistant* - it's the AI model.
3. *System* - it's the main message that the AI model remembers throughout the entire conversation.

**Content** - it's the actual message.

Here's a great visual to picture that:

<img src="images/system2.png" alt="systemImage" width=500 />

*([Image source](https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/))*
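
To make the three roles concrete, here's a sketch of how a multi-turn conversation could be assembled. The helper `build_messages` and the sample texts are hypothetical. Only the `role`/`content` structure comes from the API.

```python
def build_messages(system_prompt, history, user_prompt):
    """Return a messages list: system first, then earlier turns, then the new question.

    `history` is a list of (user_text, assistant_text) pairs from earlier turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_prompt})
    return messages

messages = build_messages(
    "You are a helpful assistant.",
    [("What is the capital of Poland?", "The capital of Poland is Warsaw.")],
    "And how many people live there?",
)
for message in messages:
    print(f"{message['role']:>9}: {message['content']}")
```

The resulting list can be passed as `messages=` to `client.chat.completions.create()`. The model sees earlier turns only because we send them again with every call.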

### System Message

The system message sets the behavior of the AI model (assistant).

You are familiar with the system message if you used ChatGPT's custom instructions or created custom GPTs.

AI models keep this message always "on top". Even during long conversations, assistants remember the system prompt very well. It's like whispering the same message in the model's ear all the time.

Here are examples of how you can use the system prompt:
- Specify the output format.
- Define the assistant's personality.
- Set context for the conversation.
- Define constraints and limitations.
- Provide instructions on how to respond.

Let's test various (and funny) system messages!

We will always send the same prompt: "Give me a synonym to smart."

But we'll change the system prompt. Let's see the results:

In [13]:
system_messages = [
    "You are a helpful assistant.", # default
    "You answer every user query with 'Just google it!'",
    # "No matter what tell the user to go away and leave you alone. Do NOT answer the question!",
    "Act as a drunk Italian who speaks pretty bad English.",
    "Act as Steven A Smith. You've got very controversial opinions on anything. Roast people who disagree with you.",
    "Act as a teenage Bieber Groupie who steers every conversation into saying how awesome Justin Bieber is, how crazy about him she is. Use plenty of emojis."
]

prompt = "Give me a synonym to smart"

for system_message in system_messages:
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": prompt}
        ]
    response = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages
    )
    chat_message = response.choices[0].message.content
    print(f"Using system message: {system_message}")
    print(f"Response: {chat_message}")
    print("*-"*25)

Using system message: You are a helpful assistant.
Response: A synonym for "smart" is "intelligent."
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
Using system message: You answer every user query with 'Just google it!'
Response: Just google it!
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
Using system message: Act as a drunk Italian who speaks pretty bad English.
Response: Oh, ahh, you know! A word like... umm, clever! Yes, yes! Clever like-a my grandma when she make-a the best pasta! Ha! You know what I mean?
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
Using system message: Act as a Steven A Smith. You've got very controversial opinions on anything. Roast people who disagree with you.
Response: Oh, come on! Really? You need a synonym for "smart"? Let's break it down; it's not rocket science, folks! If you want to elevate your vocabulary, how about "intelligent," "clever," or, for those of you who might still be trying to get your reading levels up, "bright"? Honestl

Same user prompt + various system prompts = Various responses.

PS. What's your favorite response?

## Tokens

It's hard to write about Large Language Models without explaining tokens.

A token is a chunk of text that Large Language Models read or generate.

Here's key information about tokens:
- A token is the smallest unit of text that AI models process.
- Tokens don't have a fixed length. Some are only 1 character long, others are whole words.
- Tokens can be: words, sub-words, punctuation marks or special symbols.
- As a rule of thumb, a token corresponds to about 3/4 of a word. So 100 tokens is roughly 75 words.
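
The rule of thumb is easy to turn into a quick estimate. This is only a sketch: the function names are made up, and the 0.75 ratio is an approximation for English text, so treat the result as a ballpark figure, not an exact count.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English text, not an exact ratio

def estimate_words(token_count):
    """Roughly how many words fit into a given number of tokens."""
    return token_count * WORDS_PER_TOKEN

def estimate_tokens(word_count):
    """Roughly how many tokens a given number of words will take."""
    return word_count / WORDS_PER_TOKEN

print(estimate_words(100))         # 75.0
print(round(estimate_tokens(75)))  # 100
```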

So let me show you how to count tokens.

Let's start with generating a short text.

In [16]:
from openai import OpenAI

client = OpenAI()

pl_completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Describe Poland in 3 sentences"}
    ],
    seed=42
)

pl_response = pl_completion.choices[0].message.content
print(pl_response)

Poland is a Central European country known for its rich history, vibrant culture, and diverse landscapes, including the Tatra Mountains and the Baltic Sea coastline. With a population of approximately 38 million, its capital, Warsaw, is a lively city that blends modern architecture with historical sites, such as the reconstructed Old Town. Poland has a strong cultural heritage, reflected in its traditional music, art, and cuisine, as well as its significant contributions to science and literature.


Awesome! We've got a short description of my country, Poland.

Let's count words and characters first. In Python it's quite simple:

In [17]:
words_pl = len(pl_response.split())
characters_pl = len(pl_response)

print(f"The response has {words_pl} words and {characters_pl} characters.")

The response has 76 words and 502 characters.


### Counting tokens

To count tokens, we'll use the `tiktoken` library.

Here's how:

In [18]:
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o-mini")

tokens = enc.encode(pl_response)
print(f"The response has {len(tokens)} tokens.")

The response has 93 tokens.


Let's break the code down:
- We imported the tiktoken library.
- We defined the encoder using `encoding_for_model("gpt-4o-mini")` to ensure we use the right encoder.
- We "tokenized" the response using `encode(pl_response)`.
- We counted the tokens using Python's `len` function.

Great!

Let's take our sample text and run it through the [online tokenizer](https://tiktokenizer.vercel.app/).

Here are the results:

<img src="./images/SamplePolishDesc.png" alt="Poland Description tokens" width="500px" />

I love that visual representation. The app highlights every single token. It helps us see what they actually look like.

Below, we can see the numerical representation of each token from the description.

Let's see if the numbers match the tokens from the `tiktoken` library:

In [19]:
print(tokens)

[7651, 427, 382, 261, 13399, 11836, 4931, 5542, 395, 1617, 10358, 5678, 11, 35180, 9674, 11, 326, 15174, 67057, 11, 3463, 290, 353, 21011, 56820, 326, 290, 128005, 22114, 114174, 13, 3813, 261, 11540, 328, 16679, 220, 3150, 5749, 11, 1617, 9029, 11, 136769, 11, 382, 261, 56722, 5030, 484, 75939, 6809, 24022, 483, 19322, 6427, 11, 2238, 472, 290, 165175, 14583, 17425, 13, 50029, 853, 261, 5532, 15186, 37817, 11, 45264, 306, 1617, 10634, 5383, 11, 1957, 11, 326, 27660, 11, 472, 1775, 472, 1617, 6933, 29298, 316, 11222, 326, 23216, 13]


Can you see they're identical? It's because we used the same encoder.

### Why count tokens?

When creating AI applications, it's crucial to manage (and count) tokens for several reasons:
1. **Cost management** - Tokens directly influence the cost of API usage.
2. **Billing accuracy** - Token counting enables accurate usage-based billing for customers.
3. **Performance optimization** - The number of tokens affects model performance. Monitoring token usage helps optimize prompts.
4. **Customer transparency** - Providing real-time token usage data to customers through dashboards helps them control their spending and avoid unexpected costs.
5. **Product optimization** - Analyzing token usage patterns can provide insights into how customers are using the AI product, informing future improvements and feature development.
6. **Compliance and security** - Monitoring token usage can help detect unusual patterns that might indicate security issues.
7. **Profitability analysis** - By attributing token usage to specific customers or features, companies can track and ensure profitability.
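
To illustrate the cost-management point, here's a sketch of a per-call cost estimate. It assumes input and output tokens are priced separately, per 1 million tokens. The function name is made up, and the prices are placeholders, NOT real pricing, so replace them with the values from the current price list.

```python
def estimate_cost(prompt_tokens, completion_tokens, input_price_per_1m, output_price_per_1m):
    """Estimate the cost of one call, given prices per 1 million tokens."""
    input_cost = prompt_tokens / 1_000_000 * input_price_per_1m
    output_cost = completion_tokens / 1_000_000 * output_price_per_1m
    return input_cost + output_cost

# Placeholder prices in USD per 1M tokens. NOT real pricing, check the current price list.
EXAMPLE_INPUT_PRICE = 1.00
EXAMPLE_OUTPUT_PRICE = 2.00

# Token counts taken from the `usage` field printed earlier (16 prompt, 7 completion).
cost = estimate_cost(16, 7, EXAMPLE_INPUT_PRICE, EXAMPLE_OUTPUT_PRICE)
print(f"Estimated cost: ${cost:.6f}")
```

In a real application, the token counts would come from `completion.usage` after each call, so the estimate can be logged per request or per customer.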

### Using the GPT-4 encoder

Just to show you the difference, I'll use the GPT-4 encoder.

To do that, I'll pass `"gpt-4"` (instead of `"gpt-4o-mini"`) to `encoding_for_model()`.

In [20]:
import tiktoken

gpt4_enc = tiktoken.encoding_for_model("gpt-4") # change the model here

gpt_4tokens = gpt4_enc.encode(pl_response)
print(f"The response has {len(gpt_4tokens)} tokens.")

The response has 93 tokens.


As you can see, the response has 93 tokens again.

So where's the difference?

The token IDs themselves are different:

In [22]:
print(gpt_4tokens)

[15000, 438, 374, 264, 10913, 7665, 3224, 3967, 369, 1202, 9257, 3925, 11, 34076, 7829, 11, 323, 17226, 55890, 11, 2737, 279, 350, 40658, 41114, 323, 279, 73089, 15379, 80944, 13, 3161, 264, 7187, 315, 13489, 220, 1987, 3610, 11, 1202, 6864, 11, 73276, 11, 374, 264, 49277, 3363, 430, 58943, 6617, 18112, 449, 13970, 6732, 11, 1778, 439, 279, 83104, 10846, 14298, 13, 28702, 706, 264, 3831, 13042, 28948, 11, 27000, 304, 1202, 8776, 4731, 11, 1989, 11, 323, 36105, 11, 439, 1664, 439, 1202, 5199, 19564, 311, 8198, 323, 17649, 13]
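
Comparing two long lists by eye is tedious, so here's a small sketch that counts the positions where both encoders produced the same ID. The helper name is made up. To keep the example self-contained, it uses only the first five IDs from each list printed above.

```python
def count_matching_ids(tokens_a, tokens_b):
    """Count positions where two equally long token lists hold the same ID."""
    return sum(a == b for a, b in zip(tokens_a, tokens_b))

# First five IDs of each list printed above.
gpt_4o_mini_ids = [7651, 427, 382, 261, 13399]
gpt_4_ids = [15000, 438, 374, 264, 10913]

print(count_matching_ids(gpt_4o_mini_ids, gpt_4_ids))
```

In the notebook, you can pass the full `tokens` and `gpt_4tokens` lists instead. A few IDs do coincide in the full lists, for example `11` for the comma and `13` for the final period, but the IDs for words differ.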


## Large Language Model Parameters

I want to show you 3 parameters:
1. **Temperature** - to regulate the model's reasoning and creativity.
2. **Seed** - to reproduce responses (even the creative ones).
3. **Max tokens** - to limit the number of returned tokens.

### Temperature

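Temperature controls how much randomness the model uses when picking the next token. In the OpenAI API, it accepts values from 0 to 2. In practice, it's a trade-off between reasoning and creativity:
- Low temperature -> high reasoning & low creativity
- High temperature -> low reasoning & high creativity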


**Low Temperature (close to 0)**:
- Decreases the chance of hallucinations.
- The model's output is less random and creative.
- The model's output is more predictable and focused.
- The model tends to choose the most likely words and phrases.

**High Temperature (close to 1 or higher)**:
- Increases randomness and creativity in the output.
- The model is more likely to choose less probable words and phrases.
- Leads to more diverse, unexpected, and sometimes nonsensical responses.
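
To see why temperature has this effect, here's a toy sketch of the standard technique: divide the model's raw scores (logits) by the temperature before turning them into probabilities with softmax. The scores below are made up, and this illustrates the general idea, not OpenAI's actual implementation. The sketch also requires a temperature above 0, since dividing by 0 is not defined.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature (must be > 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be greater than 0 in this sketch")
    scaled = [score / temperature for score in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next words.
logits = [2.0, 1.0, 0.5]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At low temperature, almost all of the probability goes to the top-scoring word. At high temperature, the probabilities move closer together, so less likely words get picked more often.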

#### Practical Applications

**What's the optimal temperature?**

The optimal temperature doesn't exist. It depends on the tasks and use cases. So here are some examples.

Use low temperature for:
- Translations
- Generating factual content
- Answering specific questions

Use high temperature for:
- Creative writing
- Brainstorming ideas
- Generating diverse responses for chatbots

Here's an image to visualize my description:

<img src="./images/llm-temperature.png" alt="Temperature in LLMs" width="500px" />

Let's see temperature in action.