# 01 - Tokens

GPT models process text using *tokens*, which are common sequences of characters found in text. The models understand the statistical relationships between these tokens, and excel at producing the next token in a sequence of tokens.

The conversion of a prompt into tokens happens automatically when you submit it, so you don't need to do anything yourself. However, OpenAI services such as Azure OpenAI use the number of tokens processed as part of their pricing model; Azure OpenAI, for example, charges per 1,000 tokens. Understanding how many tokens your prompts consume is therefore an important part of planning and building any application that uses OpenAI.
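
As a rough illustration of per-1,000-token pricing, the sketch below turns a token count into an estimated charge. The function name and the price are hypothetical placeholders, not real Azure OpenAI rates; check the current pricing page for the model you use.

```python
def estimate_cost(token_count: int, price_per_1k_tokens: float) -> float:
    """Estimate the charge for processing `token_count` tokens."""
    return token_count / 1000 * price_per_1k_tokens


# Hypothetical price per 1,000 tokens, for illustration only.
HYPOTHETICAL_PRICE_PER_1K = 0.02

cost = estimate_cost(2_500, HYPOTHETICAL_PRICE_PER_1K)
print(f"Estimated cost for 2,500 tokens: {cost:.4f}")
```

Real pricing can differ between models and between prompt and completion tokens, so treat this as a planning aid only.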

The prompt **"Hello world, this is fun!"** gets tokenized as follows:

```
Hello
 world
,
 this
 is
 fun
!

(7 tokens)
```
Notice how each leading space is part of the token that follows it, and how punctuation marks become tokens of their own. A token doesn't necessarily correspond to a single word.

Let's try the prompt **"Example using words like indivisible and emojis"**.

```
Example
 using
 words
 like
 ind
iv
isible
 and
 em
oj
is

(11 tokens)
```
This time you can see that some of the words, **indivisible** and **emojis**, got broken up into smaller chunks.

A helpful rule of thumb is that one token corresponds to roughly 4 characters of common English text. This translates to about ¾ of a word (so 100 tokens ≈ 75 words).
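
The rule of thumb can be sketched as a pair of small helper functions. The function names are made up for this example, and the results are only estimates; an actual tokenizer will often give a different count.

```python
def estimate_tokens_from_text(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate the token count using the ~4 characters per token rule."""
    return round(len(text) / chars_per_token)


def estimate_words_from_tokens(token_count: int, words_per_token: float = 0.75) -> int:
    """Estimate the word count using the ~3/4 of a word per token rule."""
    return round(token_count * words_per_token)


prompt = "Hello world, this is fun!"
print(f"{len(prompt)} characters is roughly {estimate_tokens_from_text(prompt)} tokens")
print(f"100 tokens is roughly {estimate_words_from_tokens(100)} words")
```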

:thumbsup: You can experiment with this yourself using the *tokenizer* tool available on the OpenAI website at https://platform.openai.com/tokenizer

## Experimenting with tokens in code

OpenAI provides the `tiktoken` package that you can use to experiment with tokenization in your code.

`tiktoken` supports three encodings used by Azure OpenAI Service models:

| Encoding name | Azure OpenAI Service models |
| ------------- | -------------- |
| gpt2 (or r50k_base) | Most GPT-3 models |
| p50k_base | Code models, text-davinci-002, text-davinci-003 |
| cl100k_base | text-embedding-ada-002 |

You can use `tiktoken` as follows to tokenize a string and see what the output looks like.

```python
import tiktoken

# Load the encoding and tokenize a sample string
encoding = tiktoken.get_encoding("p50k_base")
encoding.encode("Hello world, this is fun!")
```

Was the output of the above code what you were expecting?

If you were expecting text broken up like the examples at the top of this page, you're probably wondering why you got back a list of seemingly random numbers. This is because the AI models don't work on words directly. Instead, they use a method called *BPE* (Byte Pair Encoding) to convert the text into numeric tokens.

One of the features of BPE is that it's reversible, so you can convert the tokens back into the original text.
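
To see why this kind of encoding is reversible, here is a deliberately simplified toy version. It is not how `tiktoken` is implemented: the merge table is invented for this example, and all names (`MERGES`, `build_vocab`, `toy_encode`, `toy_decode`) are hypothetical. The idea is that every token ID maps to a fixed byte sequence, so joining those sequences recovers the original text.

```python
# Made-up merge rules, in priority order. NOT a real GPT vocabulary.
MERGES = [(b"l", b"l"), (b"H", b"e"), (b"He", b"ll"), (b"Hell", b"o")]


def build_vocab(merges):
    """IDs 0-255 are single bytes; each merge rule gets the next free ID."""
    vocab = {i: bytes([i]) for i in range(256)}
    for offset, (left, right) in enumerate(merges):
        vocab[256 + offset] = left + right
    return vocab


def toy_encode(text, merges, vocab):
    """Split text into bytes, then apply each merge rule in turn."""
    to_id = {piece: token_id for token_id, piece in vocab.items()}
    pieces = [bytes([b]) for b in text.encode("utf-8")]
    for left, right in merges:
        merged, i = [], 0
        while i < len(pieces):
            if i + 1 < len(pieces) and pieces[i] == left and pieces[i + 1] == right:
                merged.append(left + right)  # join the adjacent pair
                i += 2
            else:
                merged.append(pieces[i])
                i += 1
        pieces = merged
    return [to_id[piece] for piece in pieces]


def toy_decode(token_ids, vocab):
    """Look up the bytes for each ID and join them back into text."""
    return b"".join(vocab[t] for t in token_ids).decode("utf-8")


VOCAB = build_vocab(MERGES)
tokens = toy_encode("Hello!", MERGES, VOCAB)
print(tokens)
print(toy_decode(tokens, VOCAB))
```

Real encodings learn tens of thousands of merge rules from large amounts of text, but the round trip works the same way in principle.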

### Challenge - Display the text instead of the tokens

See if you can write code to display the text instead of the tokens.

:bulb: **HINT:** See the following cookbook for some tips on working with `tiktoken`: [How to count tokens with tiktoken](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb)

```python
# Write code to display the text from the tokens below

#FIXME
```

If you're successful, the results should be similar to the following:

`[b'Hello', b' world', b',', b' this', b' is', b' fun', b'!']`

### Challenge - Write a function to return the number of tokens

Using what you've learned so far, complete the following function so that it returns the count of the number of tokens in a text string.

```python
def get_num_tokens_from_string(string: str, encoding_name: str='p50k_base') -> int:
    #FIXME

get_num_tokens_from_string("Hello World, this is fun!")
```

## Blowing up the prompt!

Apart from cost, there's another reason that you'll want to be in control of the number of tokens you use. All AI models have a limit on the maximum number of tokens that a request can consume. The limit per request includes the number of tokens in the prompt **plus** the number of tokens in the response. Different models can have different token limits, but ultimately the overall size of your prompt and the response to that prompt have to be smaller than that limit.

Let's show this in action by deliberately sending a prompt that's too large.

We have a file, `movies.csv`, in this folder which contains a long list of movie data. Let's use `tiktoken` to see how big this file is in tokens.

```python
import os
import tiktoken

# Open the file with information about movies
movie_data = os.path.join(os.getcwd(), "movies.csv")
with open(movie_data, "r", encoding="utf-8") as f:
    content = f.read()

# Use tiktoken to tokenize the content and get a count of tokens used.
encoding = tiktoken.get_encoding("p50k_base")
print(f"Token count: {len(encoding.encode(content))}")
```

You should see a result of around 60,000 tokens, which is a huge number! Let's continue regardless.

Let's set up the OpenAI API to use for this example.

```python
import os
import openai
from dotenv import load_dotenv

# Read the API settings from a local .env file
load_dotenv()

openai.api_type = os.getenv("OPENAI_API_TYPE")
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_version = os.getenv("OPENAI_API_VERSION")
```

As you've seen in the Prompts section of this workshop, we can provide additional data to an AI model by including that data in the prompt. So, let's construct a prompt that asks for the highest rated movie and then provides the list of movies for the AI to work with.

```python
query = "What's the highest rated movie from the following list\n"
query += "CSV list of movies:\n"
query += content

# Show the start of the prompt and its total size in characters
print(f"{query[0:500]} ...[{len(query)} characters]")
```

You can see we've output the first few lines of the prompt and we've printed the overall size (in characters) of the prompt. Now, let's see what happens if we submit that query to the AI.

```python
# No max_tokens is set, so the API's default completion length applies.
r = openai.Completion.create(
    model=os.getenv("OPENAI_COMPLETION_MODEL"),
    deployment_id=os.getenv("AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME"),
    prompt=query,
)

print(r)
```

This request will fail. At the end of the output, you should see an error message similar to the following:

```bash
InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 60797 tokens (60781 in your prompt; 16 for the completion). Please reduce your prompt; or completion length.
```

The error message makes it clear that we've exceeded the model's maximum context length. The maximum is **4,097** tokens, but our request required **60,797** tokens: **60,781** for the prompt and **16** reserved for the response.
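
The arithmetic behind that error can be sketched as a simple pre-flight check. The helper names here are hypothetical and are not part of the `openai` package; the numbers are taken from the error message above.

```python
def fits_in_context(prompt_tokens: int, completion_tokens: int, context_limit: int) -> bool:
    """Return True if the prompt plus the response fits within the model's limit."""
    return prompt_tokens + completion_tokens <= context_limit


def tokens_over_limit(prompt_tokens: int, completion_tokens: int, context_limit: int) -> int:
    """Return how many tokens the request exceeds the limit by (0 if it fits)."""
    return max(0, prompt_tokens + completion_tokens - context_limit)


# Values from the error message: 60,781 prompt tokens, 16 completion tokens,
# and a maximum context length of 4,097 tokens.
print(fits_in_context(60_781, 16, 4_097))
print(tokens_over_limit(60_781, 16, 4_097))
```

Running a check like this on a token count from `tiktoken` before calling the API lets you catch an oversized prompt without sending the request.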

## Summary

In this lab, we've learned about tokens. Your prompts are broken up into tokens before the AI processes them, and every model has a maximum token limit that you must take care not to exceed. You will also be charged based on the number of tokens that your queries consume.

## Up Next

In the next lab, we'll look at one of the ways you can take control of the number of tokens your prompts consume by introducing the concept of **Embeddings**.