# How to use an OpenAI Chat model
In this notebook we look into:
1. The basics of using an OpenAI chat model (the ChatGPT models) with a few lines of code.
2. Which settings you can adjust to tune the model's behaviour for your use case.

**Table of contents**
>[OpenAI Setup](#scrollTo=7NJMLISB0P3K)

>[Simple inference with OpenAI Chat Model](#scrollTo=DyVzBkA2wdHx)

>[Advanced Options](#scrollTo=LHvj27ypwf2w)

>[Streaming](#scrollTo=NEvt9LQPlWFe)



## OpenAI Setup

In [1]:
import openai

To use OpenAI models, you'll need to create an API key and configure it in your Google Colab Secrets.
1. Create an OpenAI API key [here](https://platform.openai.com/settings/organization/api-keys) (you'll need an account on the OpenAI platform, but no ChatGPT subscription).
2. Open your Colab secrets (click on the key icon in the left sidebar).
3. Give the secret a name, for instance `OPENAI_API_KEY`, and paste the key in `Value`.
4. Toggle `Notebook access` to give this specific notebook access to the API key.


🔑 Note that this API key will now be available in your secrets every time you open or create a Colab notebook. However, you'll still need to grant access explicitly in each notebook.


💸 Using an OpenAI model costs money! Use a small, cheap model like `gpt-4o-mini` for testing and learning, then switch to a more capable model if needed for more complex tasks.

In [2]:
from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
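
If you also run this notebook outside Colab, `google.colab` will not be importable. The sketch below is one possible fallback, not part of the original setup: the helper name `load_api_key` is hypothetical, and it assumes you have exported the key as an environment variable with the same name as the Colab secret.

```python
import os


def load_api_key(name="OPENAI_API_KEY"):
    """Read the key from Colab Secrets if available, else from the environment."""
    try:
        # google.colab can only be imported inside a Colab runtime.
        from google.colab import userdata
        return userdata.get(name)
    except ImportError:
        pass  # Not running in Colab: fall back to an environment variable.

    key = os.environ.get(name)
    if key is None:
        raise RuntimeError(f"No API key found: set the {name} environment variable.")
    return key
```

You would then write `OPENAI_API_KEY = load_api_key()` in place of the `userdata.get(...)` call above.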

## Simple inference with OpenAI Chat Model
Text generation is very simple. Create an OpenAI `client` object, then call `.chat.completions.create()` and pass **the two most important arguments**:

- 🧠 `model`: the large language model to use.
- 💬 `messages`: the conversation so far, as a list containing the system prompt (optional), user messages, and AI assistant responses.

#### OpenAI Models
I recommend testing models in the following order (from cheapest and least capable to most expensive and most capable):
1. `gpt-4o-mini`: most cost-efficient small model. The model has 128K context and an October 2023 knowledge cutoff.
2. `gpt-4o`: most intelligent yet affordable flagship GPT model. The model has 128K context and an October 2023 knowledge cutoff.
3. `o1-mini`: faster and cheaper reasoning model particularly good at coding, math, and science.
4. `o1`: reasoning model designed to solve hard problems across domains.

All the models above have a knowledge cutoff of October 2023. The GPT-4o models and `o1-mini` have a 128k-token context window (approx. 100k English words, shared between the input messages and the generated output); OpenAI lists a larger 200k-token window for `o1`.

For pricing information, look [here](https://openai.com/api/pricing/).

In [3]:
from openai import OpenAI

client = OpenAI(api_key=OPENAI_API_KEY)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Write a very short poem about an astronaut on the Moon"
        }
    ]
)

print(completion.choices[0].message.content)

Beneath a quiet, silver sky,  
An astronaut walks, dreams soaring high.  
With each soft step on lunar dust,  
He leaves his mark, in stars, he trusts.  
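
The `messages` list is also how you carry a conversation across several calls: the API keeps no memory between requests, so each call must include the earlier turns. Here is a minimal sketch of building such a list. The helper `add_turn` is hypothetical (it is not part of the `openai` library), and it only accepts the three roles mentioned in this notebook.

```python
def add_turn(messages, role, content):
    """Return a new message list with one more turn appended."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"Unexpected role: {role!r}")
    return messages + [{"role": role, "content": content}]


messages = [{"role": "system", "content": "You are a helpful assistant."}]
messages = add_turn(messages, "user", "Write a very short poem about an astronaut on the Moon")

# After a real call you would append the model's reply, e.g.
# completion.choices[0].message.content, so the next request has the full context.
messages = add_turn(messages, "assistant", "Beneath a quiet, silver sky,")
messages = add_turn(messages, "user", "Now make it two lines longer.")

print([m["role"] for m in messages])
```

Passing this `messages` list to `client.chat.completions.create(...)` would let the model see its own earlier answer before responding to the follow-up.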


## Advanced Options

Here are some more exotic parameters you can use. Their meaning is described right after the code.

In [5]:
from openai import OpenAI

client = OpenAI(api_key=OPENAI_API_KEY)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Write a very short poem about an astronaut on the Moon"
        }
    ],
    max_completion_tokens=256,
    n=3,
    temperature=0.7,
    frequency_penalty=0.5,
    logprobs=True
)

In [6]:
print(completion.choices[0].message.content)

Beneath the stars, in silence vast,  
An astronaut treads where dreams are cast.  
Footprints mark the lunar dust,  
In weightless wonder, hope and trust.  

With Earth a jewel in endless night,  
He dances softly in silver light.  
A quiet glance at the cosmic sea,  
In that stillness, he feels truly free.


In [7]:
print(completion.choices[1].message.content)

In silver dust, where shadows play,  
An astronaut drifts far away.  
Beneath the stars, so cold and bright,  
He whispers dreams in lunar night.  
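
Since the request used `n=3`, `completion.choices` holds three alternatives. Rather than printing them one index at a time, you can loop over them. This is a small sketch with a hypothetical helper, `summarize_choices`. It also returns each choice's `finish_reason`, which is `"length"` when the output was cut off by `max_completion_tokens`.

```python
def summarize_choices(completion):
    """Return (index, finish_reason, text) for every choice in a chat completion."""
    return [
        (choice.index, choice.finish_reason, choice.message.content)
        for choice in completion.choices
    ]


# Usage with the `completion` object from the cell above:
# for index, finish_reason, text in summarize_choices(completion):
#     print(f"--- choice {index} ({finish_reason}) ---")
#     print(text)
# print("Generated tokens billed:", completion.usage.completion_tokens)
```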


For the full documentation, look [here](https://platform.openai.com/docs/api-reference/chat/create). Below are my favorite parameters.

### Controlling OpenAI chat model behaviour
You can pass more arguments to control the behaviour of the model:
- `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
- `temperature`: The sampling temperature, between 0 and 2. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. OpenAI's documentation generally recommends altering this or `top_p`, but not both.
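
To build intuition for `temperature`, here is a toy illustration of the standard technique: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The logits below are made up, and this sketch rejects a temperature of 0, whereas the API accepts it; it is not OpenAI's actual implementation.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("This sketch only supports temperature > 0.")
    scaled = [logit / temperature for logit in logits]
    highest = max(scaled)  # Subtract the max for numerical stability.
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # Made-up scores for three candidate tokens.
for temperature in (0.2, 0.8, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At low temperature almost all the probability goes to the top-scoring token; at high temperature the distribution flattens, so less likely tokens get sampled more often.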

### Controlling OpenAI response's additional information
- `response_format`: An object specifying the format that the model must output. Compatible with GPT-4o and GPT-4o mini. Typically used to make the model return JSON.

- `n`: How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
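
As a sketch of `response_format` in practice, the function below requests JSON mode with `{"type": "json_object"}` and parses the reply. The helper name `ask_for_json` and the system prompt are illustrative choices, not taken from the OpenAI documentation. Note that JSON mode expects the word "JSON" to appear somewhere in the messages, which the system prompt takes care of here.

```python
import json


def ask_for_json(client, question, model="gpt-4o-mini"):
    """Ask a question in JSON mode and return the reply parsed into a Python dict."""
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant. Reply with a JSON object."},
            {"role": "user", "content": question},
        ],
        response_format={"type": "json_object"},
        n=1,  # A single choice keeps the cost down.
    )
    return json.loads(completion.choices[0].message.content)


# Usage with the `client` created earlier in this notebook:
# data = ask_for_json(client, "List three facts about the Moon under the key 'facts'.")
# print(data)
```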

### OpenAI chat model - Very Advanced 💪
- `frequency_penalty`: Number between -2.0 and 2.0. Positive values penalize tokens in proportion to how often they already appear in the text so far, which reduces verbatim repetition.
- `presence_penalty`: Number between -2.0 and 2.0. Positive values penalize tokens that already appear in the text so far, regardless of how often, which nudges the model towards new topics.
- `top_p`: Restricts the pool of tokens to sample from (nucleus sampling). For instance, 0.1 means only the tokens comprising the top 10% probability mass are considered.
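
The three parameters above all reshape the token distribution before sampling. The toy sketch below shows the idea on made-up numbers. The penalty adjustment follows the formula OpenAI has published for these parameters (subtract `count * frequency_penalty`, plus `presence_penalty` once if the token has appeared at all); the function names are hypothetical and the production implementation may differ.

```python
def apply_penalties(logits, counts, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the score of tokens that were already generated.

    logits: dict mapping token -> raw score
    counts: dict mapping token -> number of times it was already generated
    """
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        seen = 1.0 if count > 0 else 0.0
        adjusted[token] = logit - count * frequency_penalty - seen * presence_penalty
    return adjusted


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    kept = {}
    cumulative = 0.0
    for token, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}  # Renormalize.


# "moon" was already generated three times, so its score drops the most.
print(apply_penalties({"moon": 2.0, "star": 1.0}, {"moon": 3}, frequency_penalty=0.5))

# With top_p=0.1 only the single most likely token survives.
print(top_p_filter({"moon": 0.5, "star": 0.3, "dust": 0.15, "sea": 0.05}, top_p=0.1))
```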

The following two are mostly used for evaluation and auditing.
- `logprobs` (defaults to `False`): Whether to return the log probabilities of the output tokens. If true, returns the log probability of each output token in the `content` of the message.
- `top_logprobs` (defaults to `null`): An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to true if this parameter is used.
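
The Advanced Options request earlier already set `logprobs=True`, so each choice carries a `logprobs.content` list with one entry per generated token. Log probabilities are easier to read once converted back to probabilities with `exp`. Here is a minimal sketch; the helper name `token_probabilities` is hypothetical.

```python
import math


def token_probabilities(choice):
    """Return (token, probability) pairs for one choice of a chat completion."""
    if choice.logprobs is None:
        raise ValueError("No log probabilities: pass logprobs=True in the request.")
    return [(item.token, math.exp(item.logprob)) for item in choice.logprobs.content]


# Usage with the `completion` object from the Advanced Options cell:
# for token, prob in token_probabilities(completion.choices[0])[:10]:
#     print(f"{token!r}: {prob:.1%}")
```

Tokens with a low probability are places where the model was unsure, which makes this a convenient first signal when auditing an answer.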

## Streaming
Without streaming, you have to wait until the model has generated the full response before you can see it.
With **streaming**, you see each token as soon as it is generated, like in the ChatGPT interface. Streaming provides a much better user experience.
If you don't have a user-facing app, you may not need it.

In [8]:
from openai import OpenAI

client = OpenAI(api_key=OPENAI_API_KEY)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a very short poem about an astronaut on the Moon"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

In silence deep, where shadows play,  
An astronaut treads on silver gray.  
Stars like diamonds blink above,  
In the vast stillness, whispers of love.  
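
The loop above prints the tokens but does not keep them. If you also need the full text afterwards (to log it or to append it to `messages`), you can accumulate the pieces while printing. This is a sketch with a hypothetical helper, `collect_stream`. It also skips chunks whose `choices` list is empty, which can occur with some stream options.

```python
def collect_stream(stream, echo=True):
    """Print streamed tokens as they arrive and return the complete text."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # Some chunks carry no choices, e.g. usage-only chunks.
        text = chunk.choices[0].delta.content
        if text is not None:
            if echo:
                print(text, end="", flush=True)
            parts.append(text)
    return "".join(parts)


# Usage with a stream created as in the cell above (stream=True):
# full_text = collect_stream(stream)
```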