##### Copyright 2024 Google LLC.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemini API: All about tokens

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/gemini-api-cookbook/blob/main/quickstarts/Token_Counting.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>


An understanding of tokens is central to using the Gemini API. This guide provides an interactive introduction to what tokens are and how they are used in the Gemini API.

## About tokens

LLMs break up their input and produce their output at a granularity that is smaller than a word, but larger than a single character or code-point.

The objective used to build a token vocabulary varies from implementation to implementation, but often the vocabulary is generated to minimise the total number of tokens required across a corpus. This encourages high-frequency words (such as `the` in English) to be represented by fewer ***tokens***, while lower-frequency words are made up of more tokens.

Tokens can be single characters, like `z`, or whole words, like `the`.
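To make the idea concrete, here is a minimal sketch of a greedy longest-match tokenizer. It is not the Gemini tokenizer: `TOY_VOCAB` and `toy_tokenize` are hypothetical names, and the vocabulary is hand-written for illustration rather than learned from a corpus. It only shows how a frequent word can map to one token while a rarer word is split into several.

```python
# Hypothetical toy vocabulary for illustration only; real vocabularies are
# learned from a corpus and are far larger.
TOY_VOCAB = {"the", " ", "z", "e", "b", "r", "a"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split `text` greedily, always taking the longest piece found in `vocab`."""
    tokens = []
    max_len = max(len(piece) for piece in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one is in the vocabulary.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No vocabulary entry matched: emit the single character as its own token.
            tokens.append(text[i])
            i += 1
    return tokens


print(toy_tokenize("the zebra"))
# ['the', ' ', 'z', 'e', 'b', 'r', 'a']
```

In this toy setup the common word `the` costs one token, while `zebra` costs five because only its individual letters are in the vocabulary.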

The specific vocabulary is learned ahead of time, and the process of splitting text into tokens from that vocabulary is called "tokenization". As the Gemini API's vocabulary is considered an implementation detail, this guide will not go into detail, but you can find a technical deep-dive in TensorFlow's [Sub-word Tokenization](https://www.tensorflow.org/text/guide/subwords_tokenizer) tutorial, including the section on [the algorithm](https://www.tensorflow.org/text/guide/subwords_tokenizer#optional_the_algorithm).

## Tokens in the Gemini API

### Set up the API

You'll use the API to explore these concepts, so start by setting up the Python environment.

In [1]:
!pip install -U -q google-generativeai  # Install the Python SDK


In [2]:
import google.generativeai as genai

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see the [Authentication](https://github.com/google-gemini/gemini-api-cookbook/blob/main/quickstarts/Authentication.ipynb) quickstart for an example.

In [3]:
from google.colab import userdata
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)

### Context windows

The models available through the Gemini API have limits that are measured in tokens. An input limit defines how much input you can provide, and an output limit defines how much output the model can generate; combined, they are referred to as the "context window". This information is available directly through [the API](https://ai.google.dev/api/rest/v1/models/get) and in the [models](https://ai.google.dev/models/gemini) documentation.

In this example you can see that the `gemini-1.0-pro-latest` model has an input limit of 30k tokens (30,720) and an output limit of 2k tokens (2,048), giving a total context window of 32k tokens (32,768).

In [30]:
model_info = genai.get_model('models/gemini-1.0-pro-latest')
(model_info.input_token_limit, model_info.output_token_limit)

(30720, 2048)
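The arithmetic behind the context window can be sketched in a few lines. The two limits below are copied from the output above; `context_window` and `fits_input` are hypothetical helper names for illustration, not part of the SDK.

```python
# Limits reported above for gemini-1.0-pro-latest.
INPUT_TOKEN_LIMIT = 30720
OUTPUT_TOKEN_LIMIT = 2048


def context_window(input_limit, output_limit):
    """Total context window: input limit plus output limit."""
    return input_limit + output_limit


def fits_input(prompt_tokens, input_limit=INPUT_TOKEN_LIMIT):
    """Return True if a prompt of `prompt_tokens` tokens is within the input limit."""
    return prompt_tokens <= input_limit


print(context_window(INPUT_TOKEN_LIMIT, OUTPUT_TOKEN_LIMIT))  # 32768
print(fits_input(10))      # True
print(fits_input(40000))   # False
```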

## Counting tokens

The API provides [an endpoint](https://ai.google.dev/api/rest/v1/models/countTokens) for counting the number of tokens in a request, exposed in the Python SDK as [`GenerativeModel.count_tokens`](https://ai.google.dev/api/python/google/generativeai/GenerativeModel#count_tokens). You pass the same arguments as you would to the [`GenerativeModel.generate_content`](https://ai.google.dev/api/python/google/generativeai/GenerativeModel#generate_content) call, and the service returns the number of tokens in that request.

### Text tokens

In [29]:
model = genai.GenerativeModel('models/gemini-1.0-pro-latest')

model.count_tokens("The quick brown fox jumps over the lazy dog.")

total_tokens: 10
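If you are not using the Python SDK, the same count can be requested from the `countTokens` REST endpoint. The sketch below only builds the request and does not send it. The base URL, the `x-goog-api-key` header and the request body shape are assumptions based on the REST reference, and `build_count_tokens_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumed base URL for the v1 REST API.
BASE_URL = "https://generativelanguage.googleapis.com/v1"


def build_count_tokens_request(model, text, api_key):
    """Build (but do not send) a countTokens request for a single text prompt."""
    url = f"{BASE_URL}/{model}:countTokens"
    # Assumed body shape: one content entry with one text part.
    body = {"contents": [{"parts": [{"text": text}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


request = build_count_tokens_request(
    "models/gemini-1.0-pro-latest",
    "The quick brown fox jumps over the lazy dog.",
    "YOUR_API_KEY",  # placeholder, replace with a real key before sending
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request, for example with `urllib.request.urlopen(request)`, requires a valid API key and network access.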

### Multi-turn tokens

Multi-turn conversational (chat) objects work similarly.

In [32]:
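# Give each turn an explicit role so the history is unambiguous.
chat = model.start_chat(history=[{'role': 'user', 'parts': ['Hi my name is Bob']},
                                 {'role': 'model', 'parts': ['Hi Bob!']}])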

model.count_tokens(chat.history)

total_tokens: 8

To understand how big your next conversational turn will be, you will need to append it to the history when you call `count_tokens`.

In [52]:
from google.generativeai.types.content_types import to_contents

model.count_tokens(chat.history + to_contents('What is the meaning of life?'))

total_tokens: 15
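From the two counts above, the new turn adds 15 - 8 = 7 tokens. The same subtraction can be wrapped in a small helper, sketched below. `added_tokens` is a hypothetical name, and `word_count` is an offline stand-in that counts whitespace-separated words, so its numbers will not match the API's token counts.

```python
def added_tokens(count_fn, history, new_turn):
    """Tokens that `new_turn` adds on top of `history`, according to `count_fn`."""
    before = count_fn(history)
    after = count_fn(history + [new_turn])
    return after - before


def word_count(turns):
    """Offline stand-in counter: whitespace-separated words, NOT real tokens."""
    return sum(len(turn.split()) for turn in turns)


history = ["Hi my name is Bob", "Hi Bob!"]
print(added_tokens(word_count, history, "What is the meaning of life?"))
# 6 with the word-count stand-in; the API reported 15 - 8 = 7 tokens above.
```

To use real counts, you could pass a function that calls `model.count_tokens(...)` and returns its `total_tokens` value, with the history and new turn already converted to the content format used above.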

### Multi-modal tokens

All input to the API is tokenized, including images or other non-text modalities.

In [39]:
from PIL import Image

!wget -q https://goo.gle/instrument-img -O instrument.jpg
organ = Image.open('instrument.jpg')

model.count_tokens(['Tell me about this instrument', organ])

total_tokens: 263

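Internally, images are treated as a fixed size, so each image consumes the same number of tokens regardless of its original dimensions. The next cell shows this with two images of different sizes.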

In [44]:
!wget -q https://goo.gle/sketch-img -O sketch.jpg
sketch = Image.open('sketch.jpg')

print(organ.size)
print(model.count_tokens(organ))

print(sketch.size)
print(model.count_tokens(sketch))

(2048, 1362)
total_tokens: 258

(768, 1024)
total_tokens: 258
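Because each image costs a fixed number of tokens, you can estimate the size of a mixed request with simple arithmetic. This is a sketch using only the counts reported above: 258 tokens per image, and 263 tokens for the earlier text-plus-image request. `estimate_tokens` is a hypothetical helper, and the per-image cost may differ for other models.

```python
# Per-image cost reported above for this model.
IMAGE_TOKENS = 258


def estimate_tokens(text_tokens, num_images, image_tokens=IMAGE_TOKENS):
    """Estimate a request's size: text tokens plus a fixed cost per image."""
    return text_tokens + num_images * image_tokens


# The text-plus-image request above reported 263 tokens in total,
# so the text prompt alone accounts for 263 - 258 tokens.
text_tokens = 263 - IMAGE_TOKENS
print(text_tokens)                      # 5
print(estimate_tokens(text_tokens, 1))  # 263
print(estimate_tokens(text_tokens, 2))  # 521
```

Treat the result as an estimate and confirm it with `count_tokens` before relying on it.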



## Further reading

For more on token counting, see these references.

* [`countTokens`](https://ai.google.dev/api/rest/v1/models/countTokens) REST API reference,
* [`count_tokens`](https://ai.google.dev/api/python/google/generativeai/GenerativeModel#count_tokens) Python API reference,
* TensorFlow Text's [Sub-word tokenization](https://www.tensorflow.org/text/guide/subwords_tokenizer) tutorial.