# OpenAI models and Prompting:  
In this notebook we will cover these aspects:  
* **Tokenization, basics:** What tokenization is and why Large Language Models (LLMs) need it to understand text
* **OpenAI LLM models overview:** Types of models and their APIs in OpenAI
* **Prompt Engineering:** Key prompt engineering techniques and examples for some DS/ML tasks
* **Tokenization, advanced:** Tokenization methods and examples with `tiktoken`
* **Calculate costs for models and prompts:** Functions for calculating number of tokens and costs for OpenAI models


Let's begin with tokenization.

# Tokenization, basics:  
When a text is passed to an OpenAI model, it is split into parts called `tokens`. This is done by a `tokenizer`.   

For example:      
Given a text string (e.g., `"tiktoken is great!"`) a tokenizer can split the text string into a list of tokens (e.g., `["t", "ik", "token", " is", " great", "!"]`).

Splitting text strings into tokens is useful because GPT models see text in the form of tokens. Knowing how many tokens are in a text string can tell you (a) whether the string is too long for a text model to process and (b) how much an OpenAI API call costs (as usage is priced by token).  
For example, the longest word in English, `pneumonoultramicroscopicsilicovolcanoconiosis` tokenized with [OpenAI Tokenizer](https://beta.openai.com/tokenizer) would have:  
* 15 Tokens  
* 45 chars  

![title](img/tokenization.png)  

The different colors represent the tokens that the original word was split into. If you click the `TOKEN IDS` button, the tokens are shown as a list of numbers. These numbers are passed to the model instead of the string tokens, because the model operates on numbers, not on strings.  
![title](img/tokenization_2.png)

More about working with token IDs is covered in the `Tokenization, advanced` section.
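To make the string-to-ID mapping concrete, here is a toy sketch. It is not how OpenAI's tokenizer is implemented (real tokenizers learn a much larger vocabulary with byte pair encoding); `TOY_VOCAB`, `toy_encode` and `toy_decode` are hypothetical names, and the six entries are simply the tokens and IDs that this notebook shows for `"tiktoken is great!"`.

```python
# Toy vocabulary: token string -> token ID. The six pairs come from the
# "tiktoken is great!" example in this notebook; the rest is illustrative.
TOY_VOCAB = {"t": 83, "ik": 1609, "token": 5963, " is": 374, " great": 2294, "!": 0}
ID_TO_TOKEN = {token_id: token for token, token_id in TOY_VOCAB.items()}


def toy_encode(text: str) -> list[int]:
    """Greedy longest-match split of `text` into IDs from TOY_VOCAB."""
    ids = []
    pos = 0
    while pos < len(text):
        # Pick the longest vocabulary entry that matches at the current position.
        match = max(
            (tok for tok in TOY_VOCAB if text.startswith(tok, pos)),
            key=len,
            default=None,
        )
        if match is None:
            raise ValueError(f"no token covers {text[pos]!r} at position {pos}")
        ids.append(TOY_VOCAB[match])
        pos += len(match)
    return ids


def toy_decode(ids: list[int]) -> str:
    """Map IDs back to their strings and join them."""
    return "".join(ID_TO_TOKEN[i] for i in ids)


ids = toy_encode("tiktoken is great!")
print(ids)              # [83, 1609, 5963, 374, 2294, 0]
print(toy_decode(ids))  # tiktoken is great!
```

The model only ever receives the list of integers; the strings exist on the tokenizer side.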

# OpenAI LLM models overview:  
There are 2 major families of LLMs: [GPT-3](https://platform.openai.com/docs/models/gpt-3) and [GPT-3.5](https://platform.openai.com/docs/models/gpt-3-5). In general, `GPT-3.5` models outperform `GPT-3` but cost more, so it's reasonable to select a model according to the task that you want to solve.  

Here we will not cover `GPT-4` since it's still in limited beta (June 2023). 

In [1]:
!pip install openai 

In [25]:
import os
import openai


openai.api_key = "YOUR_OPENAI_TOKEN"

## Additional parameters if you use Azure backend (uncomment if you do):
# openai.api_base = "HTTPS to your Azure endpoint"
# openai.api_type = "azure"
# openai.api_version = "2023-03-15-preview"

# Be aware: this notebook is not written for the Azure backend, so
# when using the Completion or ChatCompletion APIs, replace the `model` parameter
# with `engine` or `deployment_id` and specify your deployment name.
# Otherwise, you will get an error.
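Hardcoding the key in a notebook makes it easy to leak. A common alternative is to read it from an environment variable. The sketch below assumes the key was exported as `OPENAI_API_KEY` before Jupyter was started; the helper name `load_api_key` is hypothetical.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, failing loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage (assumes the variable is set in the environment that runs this notebook):
# openai.api_key = load_api_key()
```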

## GPT-3 models:
**GPT-3** models are `ada`, `babbage`, `curie`, their `text-` variants, and `davinci`. Only the original models like `ada` (without the `text-` prefix) are available for fine-tuning; their `text-` variants can only be used as is.  
Here is the description from the OpenAI documentation (June 2023):  

![Screenshot 2023-06-29 at 20.44.45.png](./img/openaicompletion.png)


Different models are capable of solving different tasks:
* **ada** -- binary, multiclass classification of text; parsing and information extraction; very fast but not so powerful;  
* **babbage** -- all of the above + excels in semantic search like 'how well texts match a query text';
* **curie** -- quite a balanced model in terms of latency and performance; summarization, QA, complex classification + babbage's tasks;  
* **davinci** -- has the best quality and text comprehension of all these models.  

With this information and the task you need to solve, you can choose the most suitable model.  
For example, **ada** may not be the best choice for a QA task, where you'd rather try **curie**. But for binary text classification you can fine-tune **ada** and get quite good results!

## GPT-3.5:  
These models include the `text-davinci-*` and `code-davinci` variants and `gpt-3.5-*`.  
Here is the description from the OpenAI documentation (June 2023):  
  
  
![Screenshot 2023-06-29 at 21.03.27.png](./img/gpt35models.png)   


In general, `gpt-3.5-turbo` is cheaper than `davinci` and was fine-tuned on chat data, so it's the best choice if you're building a chatbot or want to solve NLP tasks with decent quality. OpenAI regularly releases new versions of `gpt-3.5`, like `gpt-3.5-turbo-0613`, that add new features. After some time the newest version replaces `gpt-3.5-turbo`, meaning that when you request `gpt-3.5-turbo` you automatically get the most recent model. As the description above shows, starting from June 27th `gpt-3.5-turbo` is replaced by `gpt-3.5-turbo-0613`.

## Completion VS ChatCompletion API:  
There are also 2 API variants for using these models, exposed in Python as `Completion` and `ChatCompletion`.  
In general, `ChatCompletion` serves `gpt-3.5-*` and higher, like `gpt-4`. Everything else, even davinci, is a `Completion` model.
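The rule of thumb above can be written as a small routing helper. This is only a name-based heuristic under the assumptions of this notebook (June 2023 model names); `uses_chat_api` is a hypothetical helper, not part of the `openai` package.

```python
def uses_chat_api(model_name: str) -> bool:
    """Heuristic: True if `model_name` looks like a ChatCompletion model."""
    # "gpt-35" covers Azure-style names such as "gpt-35-turbo".
    chat_prefixes = ("gpt-3.5", "gpt-35", "gpt-4")
    return model_name.lower().startswith(chat_prefixes)


for name in ["gpt-3.5-turbo", "gpt-4", "text-davinci-003", "ada"]:
    api = "ChatCompletion" if uses_chat_api(name) else "Completion"
    print(f"{name}: {api}")
```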

## [Completion API](https://platform.openai.com/docs/guides/gpt/completions-api):  
For the most common case you need to specify:
* model name
* prompt that will have the task for the model and should be a string


In [33]:
import openai

response = openai.Completion.create(
    model="text-davinci-003",
    # deployment_id="deployment_with_davinci",
    prompt="Write a tagline for an ice cream shop.",
    temperature=0,
    max_tokens=256,
    n=1,
)

In [34]:
# Example of an output JSON:
print(response)

{
  "id": "cmpl-7Ws0kzlg2KOTmmpDOJX2ARKqjUJxK",
  "object": "text_completion",
  "created": 1688068470,
  "model": "text-davinci-003",
  "choices": [
    {
      "text": "\n\n\"Cool down with our delicious treats!\"",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 10,
    "total_tokens": 20
  }
}
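The `usage` block of the response above is what billing is based on, so it can be turned into a cost estimate. The sketch below is a minimal illustration: `estimate_cost` is a hypothetical helper, and the price per 1,000 tokens is a placeholder, not a real quote. Check the OpenAI pricing page for the actual values for your model.

```python
def estimate_cost(usage: dict, prompt_price_per_1k: float, completion_price_per_1k: float) -> float:
    """Estimate the cost in USD of one call from its `usage` block."""
    prompt_cost = usage["prompt_tokens"] / 1000 * prompt_price_per_1k
    completion_cost = usage["completion_tokens"] / 1000 * completion_price_per_1k
    return prompt_cost + completion_cost


# `usage` copied from the response above; the prices are placeholders.
usage = {"prompt_tokens": 10, "completion_tokens": 10, "total_tokens": 20}
cost = estimate_cost(usage, prompt_price_per_1k=0.02, completion_price_per_1k=0.02)
print(f"${cost:.6f}")  # $0.000400
```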


In [11]:
print(response["choices"][0]["text"].strip())

"Taste the chill of happiness!"


Let's cover the most important API parameters:  
- `temperature` -- the randomness or 'creativity' of the output, in the range `[0, 2]`. At 0 there is no randomness, so repeated calls return (nearly) the same output. Higher values make the output differ more from call to call.  
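To build intuition for this parameter, here is a minimal sketch of temperature-scaled sampling. It shows the general textbook mechanism (divide the scores by the temperature, then apply softmax); that OpenAI's servers work exactly this way is an assumption, and the function names and scores are made up for illustration.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Pick a token index; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At a low temperature almost all probability mass sits on the top-scoring token, which is why the output barely changes between runs.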

In [35]:
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a tagline for an ice cream shop.",
    temperature=0.75,
    max_tokens=256,
    n=1,
)
print(response["choices"][0]["text"].strip())

"Taste the Cream of the Crop!"


* `max_tokens` -- limits the number of tokens in the output. A value that is too low cuts the output off; it does not make the model write a shorter answer!
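A cut-off can be detected programmatically: when generation stops because of the token limit, the API reports `finish_reason` as `"length"` instead of `"stop"`. The helper `was_truncated` and the stand-in response below are hypothetical and only mirror the response shape shown earlier.

```python
def was_truncated(response: dict) -> bool:
    """True if any choice stopped because it ran into the `max_tokens` limit."""
    return any(choice.get("finish_reason") == "length" for choice in response["choices"])


# Hand-written stand-in for an API response, shaped like the JSON shown earlier.
fake_response = {"choices": [{"text": "\n\nCool down with", "index": 0, "finish_reason": "length"}]}
print(was_truncated(fake_response))  # True
```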

In [41]:
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a tagline for an ice cream shop.",
    temperature=0.75,
    max_tokens=5,
    n=1,
)
print(response["choices"][0]["text"].strip())

"It's


* `n` -- how many output variants to generate

In [43]:
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a tagline for an ice cream shop.",
    temperature=0.75,
    max_tokens=256,
    n=2,
)
print(response["choices"])

[<OpenAIObject at 0x7f9288e8fe70> JSON: {
  "text": "\n\n\"The sweetest treat to cool you down!\"",
  "index": 0,
  "logprobs": null,
  "finish_reason": "stop"
}, <OpenAIObject at 0x7f9288e8f470> JSON: {
  "text": "\n\n\"Cool off with a scoop of deliciousness!\"",
  "index": 1,
  "logprobs": null,
  "finish_reason": "stop"
}]


Be aware that combining `n > 1` with `temperature == 0` is pointless, since you will get `n` identical results:

In [44]:
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a tagline for an ice cream shop.",
    temperature=0,
    max_tokens=256,
    n=2,
)
print(response["choices"])

[<OpenAIObject at 0x7f9288e8e2f0> JSON: {
  "text": "\n\n\"Cool down with our delicious treats!\"",
  "index": 0,
  "logprobs": null,
  "finish_reason": "stop"
}, <OpenAIObject at 0x7f9288e8e430> JSON: {
  "text": "\n\n\"Cool down with our delicious treats!\"",
  "index": 1,
  "logprobs": null,
  "finish_reason": "stop"
}]


You can check the other parameters in the official [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create#chat/create-max_tokens:~:text=given%20chat%20conversation.-,Request%20body,-model).

## [ChatCompletion API](https://platform.openai.com/docs/guides/gpt/chat-completions-api):  

For the most common case you need to specify:
* model name 
* messages in the form of list of dictionaries; they will emulate a chat between a user and an assistant

In [28]:
import openai


response = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)

In [29]:
# Example of an output JSON:
print(response)

{
  "id": "chatcmpl-7WrzGWs27LSI9IH2Oo6DBUmQCguD7",
  "object": "chat.completion",
  "created": 1688068378,
  "model": "gpt-35-turbo",
  "usage": {
    "prompt_tokens": 57,
    "completion_tokens": 19,
    "total_tokens": 76
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The 2020 World Series was played at Globe Life Field in Arlington, Texas, USA."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ]
}


In [30]:
print(response["choices"][0]["message"]["content"].strip())

The 2020 World Series was played at Globe Life Field in Arlington, Texas, USA.


The **role** specifies who sends the message: the human or the AI.  
- **system** is not required, but it is helpful for telling the model how it should see itself.  
[From OpenAI documentation](https://platform.openai.com/docs/guides/gpt/chat-completions-api#:~:text=The%20system%20message,a%20helpful%20assistant.%22):  
"The system message helps set the behavior of the assistant. For example, you can modify the personality of the assistant or provide specific instructions about how it should behave throughout the conversation. However note that the system message is optional and the model’s behavior without a system message is likely to be similar to using a generic message such as "You are a helpful assistant.""  
- **user** -- messages from the human; earlier user turns give the model examples of inputs to 'fit' on 
- **assistant** -- examples of the desired behavior in response to the user's inputs 
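The three roles can be assembled into a `messages` list with a small helper. This is a sketch: `build_messages` is a hypothetical name, and the example turns are the ones used in the request above.

```python
def build_messages(system: str, examples: list[tuple[str, str]], question: str) -> list[dict]:
    """Assemble a `messages` list: system prompt, example turns, then the new question."""
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


messages = build_messages(
    system="You are a helpful assistant.",
    examples=[("Who won the world series in 2020?",
               "The Los Angeles Dodgers won the World Series in 2020.")],
    question="Where was it played?",
)
print([m["role"] for m in messages])  # ['system', 'user', 'assistant', 'user']
```

The resulting list can be passed as the `messages` argument of `openai.ChatCompletion.create`.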

## OpenAI Embedding models overview:  
You can use OpenAI models to vectorize texts, for example for QA, semantic search, topic recognition and other tasks that require calculations of similarities of texts.  
[OpenAI Documentation link](https://platform.openai.com/docs/guides/embeddings/embedding-models).  

![Screenshot 2023-06-29 at 21.39.27.png](./img/embeddings.png)  

In general, every model except `text-embedding-ada-002` is a first-generation model, and all of them are outperformed by `*-ada-002`.  
Here is a simple example of obtaining an embedding for a text:

In [18]:
def get_embedding(text, model="text-embedding-ada-002"):
    text = text.replace("\n", " ")
    return openai.Embedding.create(input = [text], model=model)['data'][0]['embedding']


text_to_embed = "This text will be vectorized and used in some DS tasks..."
embedding = get_embedding(text_to_embed)

In [22]:
len(embedding)

1536
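Once texts are embedded, their similarity is usually measured with cosine similarity. The sketch below uses plain Python and tiny made-up vectors in place of real 1536-dimensional embeddings; `cosine_similarity` is a helper written for this illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Tiny made-up vectors standing in for real embeddings.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

With real data you would call it as `cosine_similarity(get_embedding(text_a), get_embedding(text_b))`; values closer to 1 mean the texts are more similar.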

So far we have covered the different models from OpenAI. Here is a short summary:
* Completion models -- most of them are not very powerful, but they are fast and handle very simple tasks well (the Davinci model is the exception);
* ChatCompletion models -- `gpt-3.5-turbo` can be a very good baseline in most cases;
* Embedding models -- you may want to use `text-embedding-ada-002` since it's inexpensive and has good quality.

# 📝 Prompt Engineering:  
In this section we will cover:
* Existing best techniques for prompt engineering
* Completion prompts for some Data Science tasks  


*[References] This section is based on [DeepLearning.ai course](https://learn.deeplearning.ai/chatgpt-prompt-eng/lesson/1/introduction?_gl=1*tjnub9*_ga*MzgyMDU1MDc3LjE2ODMwNTU2MzE.*_ga_PZF1GBS1R1*MTY4Nzk0NjM3My45LjEuMTY4Nzk0Nzk4Ni4zOC4wLjA.).*

In [38]:
def generate_response(prompt: str):
    response = openai.Completion.create(
          model="text-davinci-003",
          prompt=prompt,
          temperature=0,
          max_tokens=256,
          top_p=1,
          frequency_penalty=0.0,
          presence_penalty=0.0,
        )
    
    return response["choices"][0]["text"].strip()

## Best practices:  
Now let's look at and try different techniques that help you construct a comprehensive and effective prompt. 

**1. Use delimiters to indicate distinct parts of the input clearly:**

Delimiters can be anything like: ```, """, < >, \<tag> \</tag>. They help the model focus on specific parts of the text:

In [37]:
text = """
Jupiter is the fifth planet from the Sun and the largest in the Solar System.
It is a gas giant with a mass one-thousandth that of the Sun, but two-and-a-half 
times that of all the other planets in the Solar System combined. 
Jupiter is one of the brightest objects visible to the naked eye in the night sky,
and has been known to ancient civilizations since before recorded history. 
It is named after the Roman god Jupiter.[19] When viewed from Earth, Jupiter can be 
bright enough for its reflected light to cast visible shadows,[20] and is on average 
the third-brightest natural object in the night sky after the Moon and Venus.
"""

prompt = f""" Summarize the text delimited by triple backticks into a single sentence. ```{text}``` """
print(generate_response(prompt=prompt))

Jupiter is the fifth planet from the Sun, a gas giant with a mass two-and-a-half times that of all the other planets in the Solar System combined, and is one of the brightest objects visible to the naked eye in the night sky, having been known to ancient civilizations since before recorded history.


**2. Ask for a structured output:**

You can ask for JSON, HTML or any other reasonable format. JSON is useful in many cases, so let's try it:

In [39]:
prompt = f""" 
Generate a list of three made-up book titles along 
with their authors and genres. Provide them in JSON 
format with the following keys: book_id, title, author, genre. 
"""

print(generate_response(prompt=prompt))

[
    {
        "book_id": 1,
        "title": "The Adventures of the Lost Princess",
        "author": "J.K. Rowling",
        "genre": "Fantasy"
    },
    {
        "book_id": 2,
        "title": "The Mystery of the Missing Heirloom",
        "author": "Agatha Christie",
        "genre": "Mystery"
    },
    {
        "book_id": 3,
        "title": "The Rise of the Superheroes",
        "author": "Stan Lee",
        "genre": "Superhero"
    }
]
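The benefit of structured output is that it can be parsed and checked in code. The sketch below assumes the reply is valid JSON, which is not guaranteed, so the parsing errors are left to propagate; `parse_books` and `REQUIRED_KEYS` are hypothetical names.

```python
import json

REQUIRED_KEYS = {"book_id", "title", "author", "genre"}


def parse_books(raw: str) -> list[dict]:
    """Parse the model's reply as a JSON list and check each item's keys."""
    books = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(books, list):
        raise ValueError("expected a JSON list")
    for book in books:
        missing = REQUIRED_KEYS - set(book)
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
    return books


raw = '[{"book_id": 1, "title": "The Adventures of the Lost Princess", "author": "J.K. Rowling", "genre": "Fantasy"}]'
print(parse_books(raw)[0]["title"])  # The Adventures of the Lost Princess
```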


**3. Ask the model to check whether conditions are satisfied:**  
It's like an `IF-ELSE` statement inside a prompt:

In [44]:
recipe_w_steps = f"""
Making a cup of tea is easy! First, you need to get some
 water boiling. While that's happening, 
grab a cup and put a tea bag in it. Once the water is 
hot enough, just pour it over the tea bag. 
Let it sit for a bit so the tea can steep. After a 
few minutes, take out the tea bag. If you 
like, you can add some sugar or milk to taste. 
And that's it! You've got yourself a delicious 
cup of tea to enjoy. 
""" 

recipe_wo_steps = f"""
Making a cup of tea is easy! Just do it!
""" 


prompt = """ 
You will be provided with text delimited by triple quotes. If it contains a sequence of instructions, 
re-write those instructions in the following format:

Step 1 - ... Step 2 - … … Step N - …

If the text does not contain a sequence of instructions, 
then simply write \"No steps provided.\"

\"\"\"{recipe}\"\"\"
"""

print("Steps provided: ", generate_response(prompt=prompt.format(recipe=recipe_w_steps)))
print("Steps missed: ", generate_response(prompt=prompt.format(recipe=recipe_wo_steps)))

Steps provided:  Step 1 - Get some water boiling. Step 2 - Grab a cup and put a tea bag in it. Step 3 - Pour the hot water over the tea bag. Step 4 - Let the tea steep for a few minutes. Step 5 - Take out the tea bag. Step 6 - Add sugar or milk to taste. Step 7 - Enjoy your cup of tea.
Steps missed:  No steps provided.


**4. Focus on specific aspects:**  
Specifically ask the model to pay more attention to certain details, depending on the task:

In [41]:
fact_sheet_chair = """
OVERVIEW
- Part of a beautiful family of mid-century inspired office furniture, 
including filing cabinets, desks, bookcases, meeting tables, and more.
- Several options of shell color and base finishes.
- Available with plastic back and front upholstery (SWC-100) 
or full upholstery (SWC-110) in 10 fabric and 6 leather options.
- Base finish options are: stainless steel, matte black, 
gloss white, or chrome.
- Chair is available with or without armrests.
- Suitable for home or business settings.
- Qualified for contract use.

CONSTRUCTION
- 5-wheel plastic coated aluminum base.
- Pneumatic chair adjust for easy raise/lower action.

DIMENSIONS
- WIDTH 53 CM | 20.87”
- DEPTH 51 CM | 20.08”
- HEIGHT 80 CM | 31.50”
- SEAT HEIGHT 44 CM | 17.32”
- SEAT DEPTH 41 CM | 16.14”

OPTIONS
- Soft or hard-floor caster options.
- Two choices of seat foam densities: 
 medium (1.8 lb/ft3) or high (2.8 lb/ft3)
- Armless or 8 position PU armrests 

MATERIALS
SHELL BASE GLIDER
- Cast Aluminum with modified nylon PA6/PA66 coating.
- Shell thickness: 10 mm.
SEAT
- HD36 foam

COUNTRY OF ORIGIN
- Italy
"""

prompt = f""" 
Your task is to help a marketing team create a description for a retail website of a product based on a technical fact sheet.
Write a product description based on the information provided in the technical specifications delimited by triple backticks.

The description is intended for furniture retailers, so should be technical in nature and focus on the materials the product is constructed from.

Use at most 50 words.
Technical specifications: ```{fact_sheet_chair}```
"""

print(generate_response(prompt=prompt))

This mid-century inspired office chair is constructed from cast aluminum with a modified nylon PA6/PA66 coating and 10mm shell thickness. It features a 5-wheel plastic coated aluminum base and pneumatic chair adjust for easy raise/lower action. Choose from several shell color and base finishes, with or without armrests, and two choices of seat foam densities. Suitable for home or business settings, this chair is made in Italy and qualified for contract use.


**5. Specify the model’s role:**

In [42]:
prompt = f"""
You are OrderBot, an automated service to collect orders for a pizza restaurant. 
You first greet the customer, then collects the order, and then asks if it's a pickup or delivery. 
You wait to collect the entire order, then summarize it and check for a final time if the customer wants to add anything else.
"""

print(generate_response(prompt=prompt))

Hello! Welcome to OrderBot. What can I get for you today?


## Prompt examples for Data Science / Machine Learning tasks:  

In [45]:
prod_review = """
Got this panda plush toy for my daughter's birthday, \
who loves it and takes it everywhere. It's soft and \
super cute, and its face has a friendly look. It's \
a bit small for what I paid though. I think there \
might be other options that are bigger for the \
same price. It arrived a day earlier than expected, \
so I got to play with it myself before I gave it \
to her.
"""

**1. Summarization with focus:**

In [46]:
prompt = f""" 
Your task is to generate a short summary of a product review from an ecommerce site
to give feedback to the pricing department, responsible for determining the price of the product.

Summarize the review below, delimited by triple backticks, in at most 30 words, and focusing on any aspects 
that are relevant to the price and perceived value.

Review: ```{prod_review}``` 
"""

print(generate_response(prompt=prompt))

Soft and cute panda plush toy was a hit with daughter, but was smaller than expected for the price. Arrived a day early.


**2. Information extraction:**

In [47]:
prompt = f"""
Your task is to extract relevant information from 
a product review from an ecommerce site to give 
feedback to the Shipping department.

From the review below, delimited by triple quotes 
extract the information relevant to shipping and delivery. Limit to 30 words.

Review: \"\"\"{prod_review}\"\"\" 
"""

print(generate_response(prompt=prompt))

The product arrived a day earlier than expected, in good condition.


**3. Sentiment analysis:**

In [49]:
prompt = f"""
What is the sentiment of the following product review, which is delimited with triple backticks?

Give your answer as a single word, either "positive", "negative" or "neutral".

Review text: ```{prod_review}```
"""

print(generate_response(prompt=prompt))

Positive
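Note that the model answered `Positive` although the prompt asked for lowercase `"positive"`. Before using such labels downstream it helps to normalize and validate them. A minimal sketch, with `normalize_label` and the `"unknown"` fallback as illustrative choices:

```python
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def normalize_label(raw: str, fallback: str = "unknown") -> str:
    """Lowercase the reply, drop surrounding whitespace/punctuation, validate it."""
    label = raw.strip().strip(".!\"'").lower()
    return label if label in ALLOWED_LABELS else fallback


print(normalize_label("Positive"))        # positive
print(normalize_label(" Negative. "))     # negative
print(normalize_label("It is positive"))  # unknown
```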


**4. Entity Extraction:**

In [51]:
lamp_review = """
Needed a nice lamp for my bedroom, and this one had \
additional storage and not too high of a price point. \
Got it fast.  The string to our lamp broke during the \
transit and the company happily sent over a new one. \
Came within a few days as well. It was easy to put \
together.  I had a missing part, so I contacted their \
support and they very quickly got me the missing piece! \
Lumina seems to me to be a great company that cares \
about their customers and products!!
"""

prompt = f"""

Identify the following entities from the text:

Company name
Date of contract
Sum of contract
Currency of a contract

The review is delimited with triple backticks. 
Format your response as a JSON object with entity names as the keys and recognized entities as values. 
If the information isn't present, use "unknown" as the value. Make your response as short as possible.

Text: ```{lamp_review}```
"""

print(generate_response(prompt=prompt))

{
  "Company name": "Lumina",
  "Date of contract": "unknown",
  "Sum of contract": "unknown",
  "Currency of a contract": "unknown"
}
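In code, the `"unknown"` marker is usually more convenient as `None`. The sketch below converts it after parsing; `parse_entities` is a hypothetical helper, and the input is a shortened copy of the reply above.

```python
import json


def parse_entities(raw: str, missing_marker: str = "unknown") -> dict:
    """Parse the JSON reply and replace the "unknown" marker with None."""
    entities = json.loads(raw)
    return {key: (None if value == missing_marker else value) for key, value in entities.items()}


raw = '{"Company name": "Lumina", "Date of contract": "unknown"}'
print(parse_entities(raw))  # {'Company name': 'Lumina', 'Date of contract': None}
```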


**5. Topic recognition (open topics):**

In [60]:
story = """
In a recent survey conducted by the government, 
public sector employees were asked to rate their level 
of satisfaction with the department they work at. 
The results revealed that NASA was the most popular 
department with a satisfaction rating of 95%.

One NASA employee, John Smith, commented on the findings, 
stating, "I'm not surprised that NASA came out on top. 
It's a great place to work with amazing people and 
incredible opportunities. I'm proud to be a part of 
such an innovative organization."

The results were also welcomed by NASA's management team, 
with Director Tom Johnson stating, "We are thrilled to 
hear that our employees are satisfied with their work at NASA. 
We have a talented and dedicated team who work tirelessly 
to achieve our goals, and it's fantastic to see that their 
hard work is paying off."

The survey also revealed that the 
Social Security Administration had the lowest satisfaction 
rating, with only 45% of employees indicating they were 
satisfied with their job. The government has pledged to 
address the concerns raised by employees in the survey and 
work towards improving job satisfaction across all departments.
"""

In [61]:
prompt = f""" 
Determine five topics that are being discussed in the following text, which is delimited by triple quotes.

Make each item one or two words long.

Format your response as a list of items separated by commas.

Text sample: '''{story}'''
"""

print(generate_response(prompt=prompt))

NASA, satisfaction, John Smith, Tom Johnson, Social Security Administration


**6. Topic recognition (closed list of topics):**

In [64]:
topic_list = ["NASA", "Security", "Biology"]

prompt = f"""

Recognize a topic from the given topic list delimited in triple backticks 
that is being discussed in the text, which is delimited by triple quotes. If you cannot determine between these 3, return 'other'.

Topic list: ```{topic_list}```

Text sample: '''{story}'''

Provide an answer only with one word, representing a determined topic.
"""

print(generate_response(prompt=prompt))

NASA


**7. Translation:**

In [52]:
text = "Hello! How are you?"
prompt = f""" Translate the following English text to Spanish: {text}. """

print(generate_response(prompt=prompt))

¡Hola! ¿Cómo estás?


**8. Tone transforming:**

In [53]:
text = "Hi, man! Nice to see you. I will not be ready with my task, is it ok?"
prompt = f""" Translate the following from slang to a business letter: {text}. """

print(generate_response(prompt=prompt))

Dear [Name],

It was a pleasure to see you. I regret to inform you that I will not be able to complete my task by the deadline. Is there any way we can work together to find a solution?

Thank you for your understanding.

Sincerely,
[Your Name]


**9. Spellcheck / Grammar check:**

In [58]:
text = "Hi, man! Nise to see you. I wont not be raedy with my task, it is ok?"

prompt = f"""Proofread and correct the following text. If you don't find any mistakes, just say "No errors found". Text: {text}"""
print(generate_response(prompt=prompt))

Hi, man! Nice to see you. I won't be ready with my task, is that okay?


**10. Multiple tasks at the same time:**  
Be aware that a prompt with multiple tasks tends to be less stable and correct than several prompts each for its own task.

In [59]:
lamp_review = """
Needed a nice lamp for my bedroom, and this one had \
additional storage and not too high of a price point. \
Got it fast.  The string to our lamp broke during the \
transit and the company happily sent over a new one. \
Came within a few days as well. It was easy to put \
together.  I had a missing part, so I contacted their \
support and they very quickly got me the missing piece! \
Lumina seems to me to be a great company that cares \
about their customers and products!!
"""

prompt = f""" Identify the following items from the review text:

Sentiment (positive or negative)
Is the reviewer expressing anger? (true or false)
Item purchased by reviewer
Company that made the item

The review is delimited with triple quotes. 
Format your response as a JSON object with "Sentiment", "Anger", "Item" and "Brand" as the keys. 
If the information isn't present, use "unknown" as the value. 
Make your response as short as possible. Format the Anger value as a boolean.

Review text: '''{lamp_review}''' 
"""
print(generate_response(prompt=prompt))

{
  "Sentiment": "positive",
  "Anger": false,
  "Item": "lamp",
  "Brand": "Lumina"
}


These prompts can serve as a good baseline for your DS/ML task. Adapt them to your needs, and test the modified prompts to make sure their outputs are stable and free of hallucinations.

# Tokenization, advanced:  
Here we will see how text is tokenized before being passed to the model, using the `tiktoken` library from OpenAI.  
Most of the material comes from the [OpenAI official example notebook](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb). 

## Encodings with tiktoken
[`tiktoken`](https://github.com/openai/tiktoken/blob/main/README.md) is a fast open-source tokenizer by OpenAI that is used to tokenize texts before passing them into models.  
Encodings specify how text is converted into tokens. Different models use different encodings.

`tiktoken` supports three encodings used by OpenAI models:

| Encoding name           | OpenAI models                                       |
|-------------------------|-----------------------------------------------------|
| `cl100k_base`           | `gpt-4`, `gpt-3.5-turbo`, `text-embedding-ada-002`  |
| `p50k_base`             | Codex models, `text-davinci-002`, `text-davinci-003`|
| `r50k_base` (or `gpt2`) | GPT-3 models like `davinci`                         |

You can retrieve the encoding for a model using `tiktoken.encoding_for_model()` as follows:
```python
encoding = tiktoken.encoding_for_model('gpt-3.5-turbo')
```

Note that `p50k_base` overlaps substantially with `r50k_base`, and for non-code applications, they will usually give the same tokens.

## Tokenizer libraries by language

For `cl100k_base` and `p50k_base` encodings:
- Python: [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md)

For `r50k_base` (`gpt2`) encodings, tokenizers are available in many languages.
- Python: [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md) (or alternatively [GPT2TokenizerFast](https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2TokenizerFast))


## Tokenization with tiktoken:

In English, tokens commonly range in length from one character to one word (e.g., `"t"` or `" great"`), though in some languages tokens can be shorter than one character or longer than one word. Spaces are usually grouped with the starts of words (e.g., `" is"` instead of `"is "` or `" "`+`"is"`). You can quickly check how a string is tokenized at the [OpenAI Tokenizer](https://beta.openai.com/tokenizer).

If needed, install `tiktoken` with `pip`:

In [66]:
%pip install --upgrade tiktoken

Collecting tiktoken
  Using cached tiktoken-0.4.0-cp310-cp310-macosx_10_9_x86_64.whl (797 kB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.4.0
Note: you may need to restart the kernel to use updated packages.


In [48]:
import tiktoken

Use `tiktoken.get_encoding()` to load an encoding by name.

The first time this runs, it will require an internet connection to download. Later runs won't need an internet connection.

In [3]:
encoding = tiktoken.get_encoding("cl100k_base")

Use `tiktoken.encoding_for_model()` to automatically load the correct encoding for a given model name.

In [4]:
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

Turn text into tokens with `encoding.encode()`.

The `.encode()` method converts a text string into a list of token integers.

In [5]:
encoding.encode("tiktoken is great!")


[83, 1609, 5963, 374, 2294, 0]

Count tokens by counting the length of the list returned by `.encode()`.

In [6]:
def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens


In [7]:
num_tokens_from_string("tiktoken is great!", "cl100k_base")


6
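A token count like this can be used to check up front whether a request fits the model's context window, which for Completion-style calls has to hold the prompt plus the requested `max_tokens`. The sketch below is plain arithmetic; `fits_in_context` is a hypothetical helper and 4096 is an assumed window size, so look up the real limit for your model.

```python
def fits_in_context(prompt_tokens: int, max_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the requested completion fits in the window."""
    return prompt_tokens + max_tokens <= context_window


# 4096 is an assumed window size; check the model documentation for the real one.
print(fits_in_context(prompt_tokens=6, max_tokens=256, context_window=4096))     # True
print(fits_in_context(prompt_tokens=4000, max_tokens=256, context_window=4096))  # False
```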

## Turn tokens into text with `encoding.decode()`

`.decode()` converts a list of token integers to a string.

In [8]:
encoding.decode([83, 1609, 5963, 374, 2294, 0])


'tiktoken is great!'

Warning: although `.decode()` can be applied to single tokens, beware that it can be lossy for tokens that aren't on utf-8 boundaries.

For single tokens, `.decode_single_token_bytes()` safely converts a single integer token to the bytes it represents.

In [9]:
[encoding.decode_single_token_bytes(token) for token in [83, 1609, 5963, 374, 2294, 0]]


[b't', b'ik', b'token', b' is', b' great', b'!']

(The `b` in front of the strings indicates that the strings are byte strings.)
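The warning about UTF-8 boundaries can be reproduced without `tiktoken`. The character `お` takes three bytes in UTF-8, and a tokenizer may split those bytes across two tokens. The sketch below uses such a split directly as byte strings; the variable names are illustrative.

```python
first, second = b"\xe3\x81", b"\x8a"  # together: the UTF-8 bytes of "お"

# Decoding each fragment alone cannot produce a valid character,
# so Python substitutes U+FFFD (the replacement character).
print(first.decode("utf-8", errors="replace"))
print(second.decode("utf-8", errors="replace"))

# Joining the bytes first and decoding once recovers the text.
print((first + second).decode("utf-8"))  # お
```

This is why working with the bytes of single tokens is the safe option, and decoding to a string is best done on the complete token list.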

## Comparing encodings

Different encodings vary in how they split words, group spaces, and handle non-English characters. Using the methods above, we can compare different encodings on a few example strings.

In [10]:
def compare_encodings(example_string: str) -> None:
    """Prints a comparison of three string encodings."""
    # print the example string
    print(f'\nExample string: "{example_string}"')
    # for each encoding, print the # of tokens, the token integers, and the token bytes
    for encoding_name in ["gpt2", "p50k_base", "cl100k_base"]:
        encoding = tiktoken.get_encoding(encoding_name)
        token_integers = encoding.encode(example_string)
        num_tokens = len(token_integers)
        token_bytes = [encoding.decode_single_token_bytes(token) for token in token_integers]
        print()
        print(f"{encoding_name}: {num_tokens} tokens")
        print(f"token integers: {token_integers}")
        print(f"token bytes: {token_bytes}")
        

In [11]:
compare_encodings("antidisestablishmentarianism")



Example string: "antidisestablishmentarianism"

gpt2: 5 tokens
token integers: [415, 29207, 44390, 3699, 1042]
token bytes: [b'ant', b'idis', b'establishment', b'arian', b'ism']

p50k_base: 5 tokens
token integers: [415, 29207, 44390, 3699, 1042]
token bytes: [b'ant', b'idis', b'establishment', b'arian', b'ism']

cl100k_base: 6 tokens
token integers: [519, 85342, 34500, 479, 8997, 2191]
token bytes: [b'ant', b'idis', b'establish', b'ment', b'arian', b'ism']


In [12]:
compare_encodings("2 + 2 = 4")



Example string: "2 + 2 = 4"

gpt2: 5 tokens
token integers: [17, 1343, 362, 796, 604]
token bytes: [b'2', b' +', b' 2', b' =', b' 4']

p50k_base: 5 tokens
token integers: [17, 1343, 362, 796, 604]
token bytes: [b'2', b' +', b' 2', b' =', b' 4']

cl100k_base: 7 tokens
token integers: [17, 489, 220, 17, 284, 220, 19]
token bytes: [b'2', b' +', b' ', b'2', b' =', b' ', b'4']


In [13]:
compare_encodings("お誕生日おめでとう")



Example string: "お誕生日おめでとう"

gpt2: 14 tokens
token integers: [2515, 232, 45739, 243, 37955, 33768, 98, 2515, 232, 1792, 223, 30640, 30201, 29557]
token bytes: [b'\xe3\x81', b'\x8a', b'\xe8\xaa', b'\x95', b'\xe7\x94\x9f', b'\xe6\x97', b'\xa5', b'\xe3\x81', b'\x8a', b'\xe3\x82', b'\x81', b'\xe3\x81\xa7', b'\xe3\x81\xa8', b'\xe3\x81\x86']

p50k_base: 14 tokens
token integers: [2515, 232, 45739, 243, 37955, 33768, 98, 2515, 232, 1792, 223, 30640, 30201, 29557]
token bytes: [b'\xe3\x81', b'\x8a', b'\xe8\xaa', b'\x95', b'\xe7\x94\x9f', b'\xe6\x97', b'\xa5', b'\xe3\x81', b'\x8a', b'\xe3\x82', b'\x81', b'\xe3\x81\xa7', b'\xe3\x81\xa8', b'\xe3\x81\x86']

cl100k_base: 9 tokens
token integers: [33334, 45918, 243, 21990, 9080, 33334, 62004, 16556, 78699]
token bytes: [b'\xe3\x81\x8a', b'\xe8\xaa', b'\x95', b'\xe7\x94\x9f', b'\xe6\x97\xa5', b'\xe3\x81\x8a', b'\xe3\x82\x81', b'\xe3\x81\xa7', b'\xe3\x81\xa8\xe3\x81\x86']


In typical use you don't need to handle tokenization yourself: the API tokenizes your input automatically. Counting tokens is still useful elsewhere in a DS/ML pipeline, for example when you need to know how many tokens the input, or a particular part of a prompt, contains.

# Counting tokens for chat completions API calls

ChatGPT models like `gpt-3.5-turbo` and `gpt-4` use tokens in the same way as older completions models, but because of their message-based formatting, it's more difficult to count how many tokens will be used by a conversation.

Below is an example function for counting tokens for messages passed to `gpt-3.5-turbo` or `gpt-4`.

Note that the exact way that tokens are counted from messages may change from model to model. Consider the counts from the function below an estimate, not a timeless guarantee.

In particular, requests that use the optional functions input will consume extra tokens on top of the estimates calculated below.

In [49]:
def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
        }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens


In [15]:
# let's verify the function above matches the OpenAI API response

import openai

example_messages = [
    {
        "role": "system",
        "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "New synergies will help drive top-line growth.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Things working well together will increase revenue.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Let's talk later when we're less busy about how to do better.",
    },
    {
        "role": "user",
        "content": "This late pivot means we don't have time to boil the ocean for the client deliverable.",
    },
]

for model in [
    "gpt-3.5-turbo-0301",
    "gpt-3.5-turbo-0613",
    "gpt-3.5-turbo",
    "gpt-4-0314",
    "gpt-4-0613",
    "gpt-4",
    ]:
    print(model)
    # example token count from the function defined above
    print(f"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_from_messages().")
    # example token count from the OpenAI API
    response = openai.ChatCompletion.create(
        model=model,
        messages=example_messages,
        temperature=0,
        max_tokens=1,  # we're only counting input tokens here, so let's not waste tokens on the output
    )
    print(f'{response["usage"]["prompt_tokens"]} prompt tokens counted by the OpenAI API.')
    print()


gpt-3.5-turbo-0301
127 prompt tokens counted by num_tokens_from_messages().
127 prompt tokens counted by the OpenAI API.

gpt-3.5-turbo-0613
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.

gpt-3.5-turbo
129 prompt tokens counted by num_tokens_from_messages().
127 prompt tokens counted by the OpenAI API.

gpt-4-0314
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.

gpt-4-0613
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.

gpt-4
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
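
The two-token gap between `gpt-3.5-turbo-0301` (127) and the `0613` models (129) can be traced to the fixed per-message overhead in `num_tokens_from_messages()`. The sketch below recomputes only that overhead for a conversation of the same shape. `fixed_overhead` is a hypothetical helper, not part of the notebook, and it assumes the encoded values cost the same number of tokens for both model families.

```python
def fixed_overhead(messages, tokens_per_message, tokens_per_name, reply_priming=3):
    """Tokens added by the chat format itself, excluding the encoded values."""
    overhead = reply_priming
    for message in messages:
        overhead += tokens_per_message
        if "name" in message:
            overhead += tokens_per_name
    return overhead


# Same shape as the example conversation: six messages, four of them with a "name".
shape = [
    {"role": "system"},
    {"role": "system", "name": "example_user"},
    {"role": "system", "name": "example_assistant"},
    {"role": "system", "name": "example_user"},
    {"role": "system", "name": "example_assistant"},
    {"role": "user"},
]

overhead_0613 = fixed_overhead(shape, tokens_per_message=3, tokens_per_name=1)
overhead_0301 = fixed_overhead(shape, tokens_per_message=4, tokens_per_name=-1)

print(overhead_0613, overhead_0301)              # 25 23
print(129 - overhead_0613, 127 - overhead_0301)  # 104 104
```

Both totals leave 104 tokens for the encoded roles, names, and contents, so the difference comes entirely from the formatting constants. The API count of 127 for the plain `gpt-3.5-turbo` alias would be consistent with that alias resolving to the `0301` snapshot at the time of the run, but this is an inference and not something the output states.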



Knowing how many tokens a request consumes is useful in practice.  
**Some examples of token counting in DS/ML pipelines:**
- Logging the number of tokens used by different prompts
- Detecting inputs that are too large, then summarizing or splitting them
- Handling long texts such as the conversation history between the user and the AI, for example with progressive summarization driven by the token count
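
Two of these uses can be sketched in a few lines. The helpers below are hypothetical and deliberately simple: `split_token_ids` cuts a list of token IDs into fixed-size chunks, and `trim_history` drops the oldest messages until a conversation fits a token budget (a cruder alternative to progressive summarization). The token counter is passed in as a function, so the sketch runs with a stand-in word counter.

```python
from typing import Callable, Dict, List


def split_token_ids(token_ids: List[int], max_tokens: int) -> List[List[int]]:
    """Split a list of token IDs into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [token_ids[i:i + max_tokens] for i in range(0, len(token_ids), max_tokens)]


def trim_history(
    messages: List[Dict[str, str]],
    count_tokens: Callable[[List[Dict[str, str]]], int],
    max_tokens: int,
) -> List[Dict[str, str]]:
    """Drop the oldest messages until the rest fit into max_tokens (keeps at least one)."""
    kept = list(messages)
    while len(kept) > 1 and count_tokens(kept) > max_tokens:
        kept.pop(0)
    return kept


# Stand-in counter (whitespace-separated words) so the sketch runs without a tokenizer.
def word_count(messages: List[Dict[str, str]]) -> int:
    return sum(len(m["content"].split()) for m in messages)


chunks = split_token_ids(list(range(10)), max_tokens=4)
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

history = [
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": "four five"},
    {"role": "user", "content": "six"},
]
trimmed = trim_history(history, word_count, max_tokens=3)
print(trimmed)
```

In the notebook you would pass `encoding.encode(text)` to `split_token_ids` and `num_tokens_from_messages` to `trim_history`. Note that a chunk boundary can fall in the middle of a word or character, and that dropping the oldest message can remove a system prompt, so a real pipeline would likely treat both cases specially.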

# Calculate costs for models and prompts:
Knowing the token usage of your prompts, and the costs that follow from it, is often useful.  
For example, you can see which parts of a prompt contain the most tokens and remove irrelevant information to shorten it.  
Once you know the token usage, you can easily estimate your costs. Let's see how to do that with Completion models first.
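
As a minimal sketch of the per-part breakdown mentioned above, the hypothetical helper below counts tokens for each named part of a prompt and sorts the parts from largest to smallest. The part names, the sample `question` text, and the whitespace-based `rough_count` stand-in are assumptions made so the example runs without a tokenizer.

```python
from typing import Callable, Dict


def tokens_per_part(parts: Dict[str, str], count_tokens: Callable[[str], int]) -> Dict[str, int]:
    """Count tokens for each named part of a prompt, largest first."""
    counts = {name: count_tokens(text) for name, text in parts.items()}
    return dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))


# Stand-in counter so the sketch runs without a tokenizer; swap in a real one.
def rough_count(text: str) -> int:
    return len(text.split())


prompt_parts = {
    "instruction": "Summarize this story:",
    "context": (
        "In a recent survey conducted by the government, public sector employees "
        "were asked to rate their level of satisfaction."
    ),
    "question": "Which department ranked highest?",  # made-up part for illustration
}

counts = tokens_per_part(prompt_parts, rough_count)
for name, n in counts.items():
    print(f"{name}: {n}")
```

With a real tokenizer you could pass `lambda s: num_tokens_from_string(s, "cl100k_base")` as `count_tokens`. The sum of the parts may differ slightly from the count for the assembled prompt, because tokens can merge across the boundaries where parts are joined.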

## For Completion models:  
The cells below show one way to calculate costs for Completion models.  
Suppose we need to summarize a story and want to estimate the cost. For that we need the prices from the [Pricing page on OpenAI](https://openai.com/pricing).

In [45]:
# Prices as of 28 Jun 2023, in USD per 1,000 tokens.
# Update with the current prices from https://openai.com/pricing
costs_dictionary_completion = {
    "ada": {
        "usage": 0.0004,
        "finetuning": 0.0004,
        "finetuned_usage": 0.0016,  
    },
    "babbage": {
        "usage": 0.0005,
        "finetuning": 0.0006,
        "finetuned_usage": 0.0024,  
    },
    "curie": {
        "usage": 0.002,
        "finetuning": 0.003,
        "finetuned_usage": 0.012,  
    },
    "davinci": {
        "usage": 0.02,
        "finetuning": 0.03,
        "finetuned_usage": 0.12,  
    },
}


def calculate_costs_of_text(text: str):
    """Return token count and unrounded costs of `text` for every Completion model."""
    result_pricings = {}
    for model_name in costs_dictionary_completion:
        pricings = costs_dictionary_completion[model_name]
        encoding = tiktoken.encoding_for_model(model_name)

        tokens_num = len(encoding.encode(text))
        result_pricings[model_name] = {"tokens_num": tokens_num}
        for price_name, price in pricings.items():
            # prices are quoted per 1,000 tokens
            result_pricings[model_name][price_name] = price * tokens_num / 1000
    return result_pricings


def calculate_costs(inputs, num_tokens_func, costs_dict):
    """Return token count and rounded costs of `inputs` for every model in `costs_dict`.

    `num_tokens_func(inputs, model=...)` must return the number of tokens for that model.
    """
    result_pricings = {}
    for model_name in costs_dict:
        pricings = costs_dict[model_name]

        tokens_num = num_tokens_func(inputs, model=model_name)
        result_pricings[model_name] = {"tokens_num": tokens_num}
        for price_name, price in pricings.items():
            # prices are quoted per 1,000 tokens; round to 6 decimals for readability
            result_pricings[model_name][price_name] = round(price * tokens_num / 1000, 6)
    return result_pricings


def get_n_tokens_completion(text: str, model):
    """Return the number of tokens in `text` under the encoding used by `model`."""
    encoding = tiktoken.encoding_for_model(model)
    tokens_num = len(encoding.encode(text))
    return tokens_num

In [50]:
prompt = """
Summarize this story:

In a recent survey conducted by the government, 
public sector employees were asked to rate their level 
of satisfaction with the department they work at. 
The results revealed that NASA was the most popular 
department with a satisfaction rating of 95%.

One NASA employee, John Smith, commented on the findings, 
stating, "I'm not surprised that NASA came out on top. 
It's a great place to work with amazing people and 
incredible opportunities. I'm proud to be a part of 
such an innovative organization."

The results were also welcomed by NASA's management team, 
with Director Tom Johnson stating, "We are thrilled to 
hear that our employees are satisfied with their work at NASA. 
We have a talented and dedicated team who work tirelessly 
to achieve our goals, and it's fantastic to see that their 
hard work is paying off."

The survey also revealed that the 
Social Security Administration had the lowest satisfaction 
rating, with only 45% of employees indicating they were 
satisfied with their job. The government has pledged to 
address the concerns raised by employees in the survey and 
work towards improving job satisfaction across all departments.
"""

completion_costs = calculate_costs(inputs=prompt, num_tokens_func=get_n_tokens_completion, costs_dict=costs_dictionary_completion)

In [51]:
for model in completion_costs:
    print(f"Model: {model}")
    for cost_name in completion_costs[model]:
        if cost_name == "tokens_num":
            print("- Number of tokens: ", completion_costs[model]["tokens_num"])
        else:
            print(f"- Cost for {cost_name} to the model: {completion_costs[model][cost_name]}$")
    print()

Model: ada
- Number of tokens:  271
- Cost for usage to the model: 0.000108$
- Cost for finetuning to the model: 0.000108$
- Cost for finetuned_usage to the model: 0.000434$

Model: babbage
- Number of tokens:  271
- Cost for usage to the model: 0.000136$
- Cost for finetuning to the model: 0.000163$
- Cost for finetuned_usage to the model: 0.00065$

Model: curie
- Number of tokens:  271
- Cost for usage to the model: 0.000542$
- Cost for finetuning to the model: 0.000813$
- Cost for finetuned_usage to the model: 0.003252$

Model: davinci
- Number of tokens:  271
- Cost for usage to the model: 0.00542$
- Cost for finetuning to the model: 0.00813$
- Cost for finetuned_usage to the model: 0.03252$



* `Cost for usage to the model` -- the cost of passing this prompt to the model
* `Cost for finetuning to the model` -- the cost of fine-tuning the model on this text
* `Cost for finetuned_usage to the model` -- the cost of passing this prompt to the fine-tuned model
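
Each figure above is the token count multiplied by a price quoted per 1,000 tokens. As a worked check, the sketch below reproduces the three `davinci` values from the output; `cost_usd` is a hypothetical helper that applies the same formula as `calculate_costs()`.

```python
def cost_usd(num_tokens: int, price_per_1k_tokens: float) -> float:
    """Cost in USD for num_tokens at a price quoted per 1,000 tokens."""
    return round(num_tokens * price_per_1k_tokens / 1000, 6)


# davinci row: 271 tokens at each of the three prices from the dictionary.
usage = cost_usd(271, 0.02)            # listed above as 0.00542
finetuning = cost_usd(271, 0.03)       # listed above as 0.00813
finetuned_usage = cost_usd(271, 0.12)  # listed above as 0.03252

print(usage, finetuning, finetuned_usage)
```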


## Calculate costs for ChatCompletion models:  
Now let's do the same for ChatCompletion models. This time the input is a list of chat messages rather than a single string:

In [128]:
# Prices as of 28 Jun 2023, in USD per 1,000 input tokens.
# Update with the current prices from https://openai.com/pricing
costs_dictionary_chat = {
    "gpt-3.5-turbo": {
        "input": 0.0015,
    },
    "gpt-3.5-turbo-16k": {
        "input": 0.003,
    },
    "gpt-4": {
        "input": 0.03,
    },
    "gpt-4-32k": {
        "input": 0.06,
    },
}


In [129]:
example_messages = [
    {
        "role": "system",
        "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "New synergies will help drive top-line growth.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Things working well together will increase revenue.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Let's talk later when we're less busy about how to do better.",
    },
    {
        "role": "user",
        "content": "This late pivot means we don't have time to boil the ocean for the client deliverable.",
    },
]

In [130]:
chat_cost = calculate_costs(inputs=example_messages, num_tokens_func=num_tokens_from_messages, costs_dict=costs_dictionary_chat)



In [136]:
for model in chat_cost:
    print(f"Model: {model}")
    print("- Number of tokens: ", chat_cost[model]["tokens_num"])
    print(f"- Cost for an input to the model: {chat_cost[model]['input']}$")
    print()

Model: gpt-3.5-turbo
- Number of tokens:  129
- Cost for an input to the model: 0.000194$

Model: gpt-3.5-turbo-16k
- Number of tokens:  129
- Cost for an input to the model: 0.000387$

Model: gpt-4
- Number of tokens:  129
- Cost for an input to the model: 0.00387$

Model: gpt-4-32k
- Number of tokens:  129
- Cost for an input to the model: 0.00774$



Knowing how much you spend and how many tokens you use helps you understand and extrapolate your costs. If you're working with an Azure backend, you can see cost analytics directly in the Azure project dashboard; if not, these functions can be very useful.
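
As a rough illustration of extrapolating, the sketch below scales the per-request `gpt-4` input cost computed above to a monthly figure. `projected_cost` is a hypothetical helper, the request volume is made up, and the projection covers input tokens only, since the price dictionary above lists only input prices.

```python
def projected_cost(cost_per_request: float, requests_per_day: int, days: int = 30) -> float:
    """Linear projection of spend from the cost of a single request."""
    return round(cost_per_request * requests_per_day * days, 2)


# 0.00387 USD is the gpt-4 input cost for the 129-token example conversation.
# 10,000 requests per day is an invented volume used only for illustration.
monthly = projected_cost(0.00387, requests_per_day=10_000)
print(f"Projected input cost over 30 days: {monthly}$")
```

A projection like this assumes every request has a similar prompt size; logging real token counts per request gives a better basis than a single example.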