# Tokenization, counting tokens, and cost calculations

Although LLMs allow text-to-text user-computer interaction, behind the scenes they work with numbers.
This means that any input text needs to be converted into a sequence of integers ("encoded") that represent the words, subwords, and symbols in the input in a way the model can process.
This process of converting text inputs into a sequence of integers is called *tokenization*.
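
To make the idea of "text in, integers out" concrete, here is a toy sketch. The vocabulary and the helper functions `toy_encode` and `toy_decode` are made up for illustration; real LLM tokenizers learn much larger subword vocabularies and split the text themselves.

```python
# Toy illustration only: the vocabulary below is hypothetical and hand-written.
toy_vocab = {"Hello": 0, ",": 1, " world": 2, "!": 3}


def toy_encode(pieces, vocab):
    """Map a list of text pieces to their integer IDs in the vocabulary."""
    return [vocab[piece] for piece in pieces]


def toy_decode(ids, vocab):
    """Map integer IDs back to their text pieces and join them."""
    id_to_piece = {idx: piece for piece, idx in vocab.items()}
    return "".join(id_to_piece[i] for i in ids)


ids = toy_encode(["Hello", ",", " world", "!"], toy_vocab)
print(ids)                         # [0, 1, 2, 3]
print(toy_decode(ids, toy_vocab))  # Hello, world!
```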

You should think about tokenization when you rely on a **commercial API** to use an LLM.
Commercial providers charge per **input and output token**:

- input tokens are those that go into the model
- output tokens are those that come out of the model (i.e., that the model generates to respond to your prompt)

Counting input tokens and estimating the number of output tokens is important because it helps you to compute the costs of using a language model and ensure that your input text is within the maximum token limit of the model you are using.

In this notebook, we'll use some custom classes I've defined to count tokens for OpenAI models and for open-weights models such as Llama 3.

## Background

> The atomic unit of consumption for a language model is not a “word”, but rather a “token”.
> You can kind of think of tokens as syllables, and on average they work out to about 750 words per 1,000 tokens.
> They represent many concepts beyond just alphabetical characters – such as punctuation, sentence boundaries, and the end of a document.
> &mdash; [source](https://github.com/brexhq/prompt-engineering?tab=readme-ov-file#tokens)

Learn more about tokenizers and why they exist here: https://huggingface.co/docs/transformers/tokenizer_summary
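
The rule of thumb in the quote above (about 750 words per 1,000 tokens) can be turned into a quick back-of-the-envelope estimate. This is only a rough sketch: the function name `estimate_tokens_from_words` is made up here, and the actual ratio depends on the language and the tokenizer.

```python
import math

# Rule of thumb from the quote above: ~750 words per 1,000 tokens.
WORDS_PER_TOKEN = 750 / 1000


def estimate_tokens_from_words(n_words: int) -> int:
    """Rough token estimate from a word count, rounded up."""
    return math.ceil(n_words / WORDS_PER_TOKEN)


print(estimate_tokens_from_words(750))   # 1000
print(estimate_tokens_from_words(1500))  # 2000
```

For anything that affects your budget, count tokens with the model's actual tokenizer instead of relying on this estimate.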

## Token limits a.k.a. context window size

LLMs are "stateless" and thus cannot remember anything about previous requests or conversations.
This means that you always need to include everything the model might need to know that is specific to the current session.

This is a major downside of LLMs, because the leading language model architecture, the Transformer, has a fixed input and output size: at a certain point, the prompt cannot grow any larger.
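
Because the model is stateless, every request has to carry the conversation so far, which is why prompts grow over the course of a session. The sketch below illustrates this with a hypothetical helper `build_prompt` and a made-up plain-text format; real chat APIs typically expect a structured list of messages instead.

```python
def build_prompt(history, new_message):
    """Concatenate all previous turns with the new user message."""
    turns = history + [("user", new_message)]
    return "\n".join(f"{role}: {text}" for role, text in turns)


# previous turns of the (made-up) conversation
history = [
    ("user", "My name is Ada."),
    ("assistant", "Nice to meet you, Ada!"),
]

# the model can only answer this if the earlier turns are sent along
prompt = build_prompt(history, "What is my name?")
print(prompt)
```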

The total size of the prompt, sometimes referred to as the **context window**, is model dependent.
For GPT-3, it is 4,096 tokens. 
For GPT-4, it is 8,192 tokens or 32,768 tokens depending on which variant you use.

You can find a detailed overview here: 

- for GPT-4 and its variants: https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo
- for GPT-3.5-turbo and its variants: https://platform.openai.com/docs/models/gpt-3-5-turbo
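
A simple budget check follows from these limits. The sketch below assumes that the prompt and the generated tokens count against the same context window; the function name `fits_context_window` is made up, so check your provider's documentation for how the limit is actually applied.

```python
def fits_context_window(n_input_tokens, max_output_tokens, context_window):
    """Check whether the prompt plus the reserved output tokens fit into the window."""
    return n_input_tokens + max_output_tokens <= context_window


# 8,192 is the GPT-4 context window size mentioned above
print(fits_context_window(8_000, 500, 8_192))  # False
print(fits_context_window(7_000, 500, 8_192))  # True
```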

In [1]:
import tiktoken

`tiktoken` makes available several encodings that are used by the various OpenAI models, including GPT-3 and GPT-4.

In [2]:
# list encoding names
tiktoken.list_encoding_names()

['gpt2',
 'r50k_base',
 'p50k_base',
 'p50k_edit',
 'cl100k_base',
 'o200k_base',
 'o200k_harmony']

For example, GPT-4o (snapshot from August 2024) uses the 'o200k_base' encoding:

In [3]:
# get the encoding model for the desired model
encoding = tiktoken.encoding_for_model('gpt-4o-2024-08-06')
encoding.name

'o200k_base'

With the `encoding` instance created above, you can tokenize and encode any text input:

In [4]:
encoding.encode('Hello, world!')

[13225, 11, 2375, 0]

These numbers are just the tokens' indexes in the tokenizer's vocabulary. They are not token counts. To see which piece of text each index stands for, we can decode the tokens one by one:

In [5]:
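# decode each token ID separately; errors='replace' avoids a UnicodeDecodeError
# when a token holds only part of a multi-byte character
[encoding.decode_single_token_bytes(tok).decode('utf-8', errors='replace') for tok in encoding.encode('Hello, world!')]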

['Hello', ',', ' world', '!']

But since we can tokenize a text, counting the number of tokens is trivial:

In [6]:
toks = encoding.encode('Hello, world!')
len(toks)

4

### Pre-defined token counter classes

In [7]:
from src.utils.token_counters import OpenAITokenCounter

tokens_counter = OpenAITokenCounter(model='gpt-4o-2024-08-06')

In [8]:
text = "Liberal Alliance er det eneste alternativ til  et træt VKO-flertal, som er bange for både  reformer, udlændinge og vælgere, og en  populistisk S/SF-regering, som er bange  for præcis de samme ting - og som vil indføre endnu flere skatter, afgifter, regler og  forbud,  end  den  nuværende  regering  plager os med."
tokens_counter(text)

96

In [9]:
tokens_counter(["Hello, world!", "I'm tiktoken!"])

[4, 5]
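
The implementation of `OpenAITokenCounter` is not shown in this notebook. As a rough idea of how such a counter could behave (an integer for a single string, a list of integers for a list of strings), here is a hypothetical sketch called `SimpleTokenCounter`. It accepts any `encode` function and uses whitespace splitting as a stand-in tokenizer so that it runs without extra dependencies.

```python
from typing import Callable, List, Union


class SimpleTokenCounter:
    """Hypothetical sketch: count tokens for a string or a list of strings."""

    def __init__(self, encode: Callable[[str], list]):
        # `encode` is any function that maps a text to a list of tokens
        self.encode = encode

    def __call__(self, texts: Union[str, List[str]]) -> Union[int, List[int]]:
        if isinstance(texts, str):
            return len(self.encode(texts))
        return [len(self.encode(text)) for text in texts]


# whitespace splitting is only a stand-in for a real tokenizer
counter = SimpleTokenCounter(encode=str.split)
print(counter("Hello, world!"))                     # 2
print(counter(["Hello, world!", "I'm tiktoken!"]))  # [2, 2]
```

To count real tokens with this sketch, you could pass the `encode` method of a `tiktoken` encoding (such as the `encoding` object created above) instead of `str.split`.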

### The same with open-source models (but trickier)

Typically, open-weights models like Llama 3 are available through the [Hugging Face model hub](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct).
So we can download their tokenizers from there to compute the number of (input) tokens.

In [10]:
from src.utils.token_counters import HFTokenCounter

# say you use 'meta/meta-llama-3-70b-instruct' via Replicate (see https://replicate.com/meta/meta-llama-3-70b-instruct)
# NOTE: `REPLICATE_MODELS_TOKENIZERS` cannot currently be imported from `src.utils.token_counters`
# (it raises an ImportError), so the lookup below stays commented out and we set a tokenizer name directly

# tokenizer_name = REPLICATE_MODELS_TOKENIZERS['meta/meta-llama-3-70b-instruct']
# print(tokenizer_name)
tokenizer_name = "Qwen/Qwen2.5-1.5B-Instruct"

tokens_counter = HFTokenCounter(tokenizer_name=tokenizer_name)

In [None]:
tokens_counter(["Hello, world!", "I'm tiktoken!"])

## Computing API usage costs

OpenAI charges model usage costs based on the number of tokens processed by the model.
This means that you need to be aware of the number of tokens in your input text and the (expected) number of tokens in the model's response to avoid unexpected costs.

To see what OpenAI charges you per 1,000,000 (one million) input and output tokens, see https://openai.com/pricing

On September 16, 2024, the costs for using GPT-4o (`gpt-4o-2024-08-06`) were $2.50 per 1M input tokens and $10.00 per 1M output tokens.

In [None]:
MODEL = 'gpt-4o-2024-08-06'
tokens_counter = OpenAITokenCounter(model=MODEL)

### Example calculations

Say you have a dataset with ten sentences:

In [None]:
dataset = [
    "I absolutely love this product. It's incredibly user-friendly.",
    "I'm really disappointed with the service I received.",
    "The weather today is absolutely beautiful, it makes me feel so happy.",
    "I'm feeling really down today, nothing seems to be going right.",
    "This is the best day of my life, I couldn't be happier!",
    "I'm so frustrated with the lack of communication from the team.",
    "The movie was a masterpiece, the storyline was captivating and the acting was superb.",
    "I'm feeling really stressed about the upcoming exam.",
    "The food at the restaurant was delicious, I'll definitely be going back.",
    "I'm really angry about the decision, it's completely unfair."
]
len(dataset)

And say your instructions are:

In [None]:
instructions = """
You will be provided with a sentence. 

Your task is to classify the sentence's sentiment as either positive, negative, or neutral.

Please choose one of the following categories: positive, negative, neutral.

Only respond with your chosen category and no further text or explanations.
"""

#### Number of input tokens

Components:

- `n = len(dataset)`
- `n_tokens_instructions = tokens_counter(instructions)`
- `n_tokens_data = sum(tokens_counter(dataset))`

Finally: `n_input_tokens_total = n * n_tokens_instructions + n_tokens_data`

#### Number of output tokens

`n_output_tokens_total = n * 2` (this is specific to a single-label text classification task; here we assume up to two output tokens per response)

#### Finally

`total_cost = n_input_tokens_total * input_token_cost + n_output_tokens_total * output_token_cost` (with both costs expressed per token)
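
The cost formula can be wrapped in a small helper. This is a minimal sketch: the function name `estimate_cost` and its parameters are made up for illustration, and prices are passed per 1M tokens because that is how they are usually listed. The token counts in the example are hypothetical round numbers.

```python
def estimate_cost(n_input_tokens, n_output_tokens, input_price_per_1m, output_price_per_1m):
    """Estimated cost (in U.S. $) given token counts and prices per 1M tokens."""
    return (
        n_input_tokens / 1_000_000 * input_price_per_1m
        + n_output_tokens / 1_000_000 * output_price_per_1m
    )


# hypothetical example: 1M input tokens and 100k output tokens
cost = estimate_cost(1_000_000, 100_000, input_price_per_1m=2.50, output_price_per_1m=10.00)
print(cost)  # 3.5
```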

Our label classes have the following numbers of tokens:

In [None]:
# number of tokens per label class?
tokens_counter(['positive', 'negative', 'neutral'])

Then for each sentence, we need to send the instructions plus the sentence as input, and we will receive one of the three answer categories.
So we can calculate:

In [None]:
n = len(dataset)
# each request sends the instructions plus one sentence
n_input_tokens = sum(tokens_counter(dataset)) + n * tokens_counter(instructions)
# assume two output tokens per classified sentence
n_output_tokens = n * 2

print('# of input tokens:', n_input_tokens)
print('# of output tokens:', n_output_tokens)

Now we can compute the cost (in U.S. $) for requesting classifications of the ten examples in our dataset:

In [None]:
(
    n_input_tokens/1_000_000 * 2.50 # $2.50 per 1M input tokens
    +
    n_output_tokens/1_000_000 * 10.00 # $10 per 1M output tokens
)

This is 19/100 of a U.S. cent.

**IMPORTANT:** Note that additional charges for value added tax (VAT) may apply. Check this when you plan your budget.