##### Copyright 2024 Google LLC.

In [1]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemini API: Context Caching Quickstart

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Caching.ipynb"><img src="../images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

This notebook introduces context caching with the Gemini API and provides examples of interacting with the Apollo 11 transcript using the Python SDK. For a more comprehensive look, check out [the caching guide](https://ai.google.dev/gemini-api/docs/caching?lang=python).

### Install dependencies

In [1]:
!pip install -q -U "google-genai>=0.0.1"


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [11]:
from google import genai
from google.genai import types

### Configure your API key

To run the following cell, your API key must be stored it in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](../quickstarts/Authentication.ipynb) for an example.

In [3]:
try:
    from google.colab import userdata
    
    GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
    client = genai.Client(api_key=GOOGLE_API_KEY)
except ImportError:
    client = genai.Client() # It will look for the key in the `GOOGLE_API_KEY` environment variable.

## Upload a file

A common pattern with the Gemini API is to ask a number of questions of the same document. Context caching is designed to assist with this case, and can be more efficient by avoiding the need to pass the same tokens through the model for each new request.

This example will be based on the transcript from the Apollo 11 mission.

Start by downloading that transcript.

In [8]:
!curl -O https://storage.googleapis.com/generativeai-downloads/data/a11.txt
!head a11.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  827k  100  827k    0     0  2925k      0 --:--:-- --:--:-- --:--:-- 2935k
INTRODUCTION

This is the transcription of the Technical Air-to-Ground Voice Transmission (GOSS NET 1) from the Apollo 11 mission.

Communicators in the text may be identified according to the following list.

Spacecraft:
CDR	Commander	Neil A. Armstrong
CMP	Command module pilot   	Michael Collins
LMP	Lunar module pilot	Edwin E. ALdrin, Jr.


Now upload the transcript using the [File API](../quickstarts/File_API.ipynb).

In [12]:
document = client.files.upload(path="a11.txt")

document = types.Part.from_uri(
    file_uri=document.uri, mime_type=document.mime_type
)  # TODO delete

## Cache the prompt

Next create a [`CachedContent`](https://ai.google.dev/api/python/google/generativeai/protos/CachedContent) object specifying the prompt you want to use, including the file and other fields you wish to cache. In this example the [`system_instruction`](../quickstarts/System_instructions.ipynb) has been set, and the document was provided in the prompt.

In [66]:
# Note that caching requires a frozen model, e.g. one with a `-001` version suffix.
model_name = "gemini-1.5-flash-001"

apollo_cache = client.caches.create(
    model=model_name,
    contents=[document],
    config = dict(system_instruction="You are an expert at analyzing transcripts."), # TODO: why config?
)

apollo_cache

CachedContent(name='cachedContents/47gl279ciatz', display_name='', model='models/gemini-1.5-flash-001', create_time='2024-11-15T17:23:05.348463Z', update_time='2024-11-15T17:23:05.348463Z', expire_time='2024-11-15T18:23:04.614234474Z', usage_metadata=CachedContentUsageMetadata(audio_duration_seconds=None, image_count=None, text_count=None, total_token_count=323384, video_duration_seconds=None))

## Manage the cache expiry

Once you have a `CachedContent` object, you can update the expiry time to keep it alive while you need it.

In [8]:
import datetime

apollo_cache.update(ttl=datetime.timedelta(hours=2))
apollo_cache

CachedContent(
    name='cachedContents/40k37vcojf2o',
    model='models/gemini-1.5-flash-001',
    display_name='',
    usage_metadata={
        'total_token_count': 323383,
    },
    create_time=2024-06-18 16:15:48.903792+00:00,
    update_time=2024-06-18 16:15:49.983070+00:00,
    expire_time=2024-06-18 18:15:49.822943+00:00
)

In [25]:
client.caches.update?

[0;31mSignature:[0m
[0mclient[0m[0;34m.[0m[0mcaches[0m[0;34m.[0m[0mupdate[0m[0;34m([0m[0;34m[0m
[0;34m[0m    [0;34m*[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mname[0m[0;34m:[0m [0mstr[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mconfig[0m[0;34m:[0m [0mUnion[0m[0;34m[[0m[0mgoogle[0m[0;34m.[0m[0mgenai[0m[0;34m.[0m[0mtypes[0m[0;34m.[0m[0mUpdateCachedContentConfig[0m[0;34m,[0m [0mgoogle[0m[0;34m.[0m[0mgenai[0m[0;34m.[0m[0mtypes[0m[0;34m.[0m[0mUpdateCachedContentConfigDict[0m[0;34m,[0m [0mNoneType[0m[0;34m][0m [0;34m=[0m [0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m[0;34m)[0m [0;34m->[0m [0mgoogle[0m[0;34m.[0m[0mgenai[0m[0;34m.[0m[0mtypes[0m[0;34m.[0m[0mCachedContent[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m <no docstring>
[0;31mFile:[0m      ~/Projects/venv3/lib/python3.12/site-packages/google/genai/caches.py
[0;31mType:[0m      method

In [33]:
client.caches.update(name = apollo_cache.name, config=dict(ttl='7200s'))

CachedContent(name='cachedContents/n2zesh6xzj2p', display_name='', model='models/gemini-1.5-flash-001', create_time='2024-11-15T16:52:19.923215Z', update_time='2024-11-15T17:05:07.543698Z', expire_time='2024-11-15T19:05:07.517968538Z', usage_metadata=CachedContentUsageMetadata(audio_duration_seconds=None, image_count=None, text_count=None, total_token_count=323384, video_duration_seconds=None))

In [34]:
# import datetime
# types.UpdateCachedContentConfig(ttl=datetime.timedelta(hours=2))  # TODO: this should work.

## Use the cache for generation

As the `CachedContent` object refers to a specific model and parameters, you must create a [`GenerativeModel`](https://ai.google.dev/api/python/google/generativeai/GenerativeModel) using [`from_cached_content`](https://ai.google.dev/api/python/google/generativeai/GenerativeModel#from_cached_content). Then, generate content as you would with a directly instantiated model object.

In [35]:
apollo_cache.model

'models/gemini-1.5-flash-001'

In [47]:
response = client.models.generate_content(
    model=apollo_cache.model,
    contents="Find a lighthearted moment from this transcript",
    config=types.GenerateContentConfig(cached_content=apollo_cache.name)
)
print(response.text)

A lighthearted moment occurs on page 11, right after the crew successfully fires their RCS thrusters:

**00 01 42 24 CDR**
*Have you seen all three axes fire?*

**00 01 42 31 CC**
*We've seen pitch and yaw; we've not seen roll to date.*

**00 01 42 36 CDR**
*Okay. I'll put in a couple more rolls.*

This exchange shows the crew's sense of humor, even in the midst of a complex mission. The "roll" comment is a playful reference to the roll axis of the spacecraft.  It demonstrates that even during a high-pressure situation, the astronauts could maintain a lighthearted attitude. 



You can inspect token usage through `usage_metadata`. Note that the cached prompt tokens are included in `prompt_token_count`, but excluded from the `total_token_count`.

In [48]:
response.usage_metadata

GenerateContentResponseUsageMetadata(cached_content_token_count=323384, candidates_token_count=169, prompt_token_count=323393, total_token_count=323562)

You can ask new questions of the model, and the cache is reused.

In [52]:
chat = client.chats.create(
    model=apollo_cache.model,
    config=types.GenerateContentConfig(cached_content=apollo_cache.name)
)

In [53]:
response = chat.send_message("Give me a quote from the most important part of the transcript.")
print(response.text)

The most important part of the transcript is the moment when the lunar module lands on the moon.  The quote is from Neil Armstrong, who says:

"Houston, Tranquility Base here. The Eagle has landed." 



In [54]:
response = chat.send_message("What was recounted after that?")
print(response.text)

After Neil Armstrong announced "Houston, Tranquility Base here. The Eagle has landed," the following events are recounted in the transcript:

* **Mission Control's response:**  The capsule communicator (CC) replies, "Roger, Tranquility. We copy you on the ground. You got a bunch of guys about to turn blue. We're breathing again. Thanks a lot." 
* **Armstrong's reply:** Armstrong responds, "Thank you."
* **Buzz Aldrin's actions:** Aldrin, the lunar module pilot, announces, "MASTER ARM, ON. Take care of the ... I'll get this..." He then says, "Very smooth touchdown."  
* **Aldrin venting the oxidizer:** Aldrin reports, "Okay. It looks like we're venting the oxidizer now."
* **Mission Control's instructions:**  The CC instructs, "Eagle, you are STAY for..." and then confirms, "Roger. And we see you venting the OX." 
* **Aldrin continues with the checklist:** Aldrin says, "...circuit breaker..."  
* **Aldrin requests information:** He says, "... copy NOUN 60, NOUN 43. Over."
* **Mission Co

In [55]:
response.usage_metadata

GenerateContentResponseUsageMetadata(cached_content_token_count=323384, candidates_token_count=374, prompt_token_count=323453, total_token_count=323827)

As you can see, among the 323455 tokens, 323383 were cached (and thus less expensive) and only 236 were from the prompt.

Since the cached tokens are cheaper than the normal ones, it means this prompt was 75% cheaper that if you had not used caching. Check the [pricing here](https://ai.google.dev/pricing) for the up-to-date discount on cached tokens.

## Delete the cache

The cache has a small recurring storage cost (cf. [pricing](https://ai.google.dev/pricing)) so by default it is only saved for an hour. In this case you even set it up for a shorter amont of time (using `"ttl"`) of 2h.

Still, if you don't need you cache anymore, it is good practice to delete it proactively.

In [56]:
print(apollo_cache.name)

cachedContents/n2zesh6xzj2p


In [59]:
client.caches.delete(name=apollo_cache.name)

DeleteCachedContentResponse()

## Next Steps
### Useful API references:

If you want to know more about the caching API, you can check the full [API specifications](https://ai.google.dev/api/rest/v1beta/cachedContents) and the [caching documentation](https://ai.google.dev/gemini-api/docs/caching).

### Continue your discovery of the Gemini API

Check the File API notebook to know more about that API. The [vision capabilities](../quickstarts/Video.ipynb) of the Gemini API are a good reason to use the File API and the caching. 
The Gemini API also has configurable [safety settings](../quickstarts/Safety.ipynb) that you might have to customize when dealing with big files.
