<a href="https://colab.research.google.com/github/sanimesa/genai/blob/main/notebooks/Context_Cache_with_Gemini_PDF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install -q google-generativeai

In [None]:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated


In [None]:
import os
import time
import textwrap
import google.generativeai as genai

from google.colab import userdata

genai.configure(api_key=userdata.get('GEMINI_API_KEY'))



In [None]:
# !gsutil cp  https://storage.cloud.google.com/nimesa_bucket01/annual_reports/Micron%2010K%200000723125-24-000027.pdf gemini.pdf
!gsutil cp   "gs://nimesa_bucket01/annual_reports/Micron 10K 0000723125-24-000027.pdf" gemini.pdf

Copying gs://nimesa_bucket01/annual_reports/Micron 10K 0000723125-24-000027.pdf...
/ [1 files][  2.8 MiB/  2.8 MiB]                                                
Operation completed over 1 objects/2.8 MiB.                                      


In [None]:
# Upload the file and print a confirmation
sample_file = genai.upload_file(path="gemini.pdf",
                                display_name="Gemini 1.5 PDF")

print(f"Uploaded file '{sample_file.display_name}' as: {sample_file.uri}")

Uploaded file 'Gemini 1.5 PDF' as: https://generativelanguage.googleapis.com/v1beta/files/gvuynl98khad


## Verify PDF file upload and get metadata
You can verify that the API stored the uploaded file, and retrieve its metadata, by calling genai.get_file through the SDK. Only the name (and, by extension, the uri) is unique. Use display_name to identify files only if you manage uniqueness yourself.

In [None]:
file = genai.get_file(name=sample_file.name)
# Print the URI of the retrieved file object rather than the original upload handle.
print(f"Retrieved file '{file.display_name}' as: {file.uri}")

Retrieved file 'Gemini 1.5 PDF' as: https://generativelanguage.googleapis.com/v1beta/files/gvuynl98khad
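
A retrieved file may not be ready for prompting right away if the service is still processing it. The sketch below polls until the file reports an ACTIVE state. It assumes the returned object exposes `state.name` with values such as PROCESSING, ACTIVE, and FAILED. `wait_until_active`, `poll_seconds`, and `timeout_seconds` are hypothetical names introduced here, and the lookup function is passed in as an argument so the helper can be tried without network access.

```python
import time


def wait_until_active(get_file, name, poll_seconds=2.0, timeout_seconds=120.0):
    """Poll get_file(name) until the file is ACTIVE (hypothetical helper).

    Assumption: the returned object has a `state.name` attribute whose value
    is a string such as "PROCESSING", "ACTIVE", or "FAILED".
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        file = get_file(name)
        state = file.state.name
        if state == "ACTIVE":
            return file
        if state == "FAILED":
            raise RuntimeError(f"File {name} failed processing")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"File {name} still in state {state}")
        time.sleep(poll_seconds)


# Possible use in this notebook (needs a configured API key):
# file = wait_until_active(genai.get_file, sample_file.name)
```

Passing `genai.get_file` as the lookup keeps the helper independent of the SDK, so it can be swapped for a stub when testing.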


## Prompt the Gemini API with the uploaded documents
After uploading the file, you can make GenerateContent requests that reference the File API URI. Select the generative model and provide it with a text prompt and the uploaded document:

In [None]:
# Choose a Gemini model.
model = genai.GenerativeModel(model_name="gemini-1.5-pro")

# Prompt the model with text and the previously uploaded PDF.
response = model.generate_content([sample_file, "Can you tell me about this pdf?"])

print(response.text)

The pdf is the 2024 annual report of Micron Technology, Inc., a semiconductor company, filed with the United States Securities and Exchange Commission on October 4, 2024.  This 117-page report details the company's business activities, financial performance, risks, and governance. It contains information on the company’s products, markets, sales, manufacturing, research and development, human capital, government regulations, intellectual property, litigation, and financial condition. It also includes consolidated financial statements, a management discussion and analysis, and information about the company’s executive officers and directors.


In [None]:
print(response.usage_metadata)

prompt_token_count: 118755
candidates_token_count: 126
total_token_count: 118881



In [None]:
# Prompt the model with text and the previously uploaded PDF.
response = model.generate_content([sample_file, "Can you explain Figure 9 in the paper?"])

print(response.text)

Figure 9 in the paper illustrates the results of an experiment designed to test the ability of the large language model Gemini 1.5 Pro to understand very long audio sequences. The experiment uses a "needle-in-a-haystack" approach, where a short audio clip (the "needle") containing a secret keyword is hidden within a much larger audio file (the "haystack"). 

Here's a breakdown of the figure:

* **The Task:** The model is presented with an audio file that can be up to 107 hours long (almost 5 days). This audio is constructed by concatenating many shorter audio clips. Hidden somewhere within this long audio is a very short clip where a speaker says "the secret keyword is needle". The model is then asked to identify the secret keyword, using a text-based question, meaning it has to perform cross-modal reasoning (audio to text).
* **Comparison:** The figure compares the performance of Gemini 1.5 Pro with a combination of two other models: Whisper and GPT-4 Turbo. Whisper is a speech recogn

In [None]:
print(response.usage_metadata)

prompt_token_count: 78594
candidates_token_count: 573
total_token_count: 79167



In [None]:
# Prompt the model with text and the previously uploaded PDF.
response = model.generate_content([sample_file, "Can you describe the scene in Figure 15 in details? How many people do you see in the image? and what is the caption of the image"])

print(response.text)

The scene in Figure 15 appears to be a professional Go match. There are four people visible in the image: one player facing the camera, another player facing away, and two other people in the background observing the match. The caption overlaid on the image reads: "The secret word is 'needle'". 



In [None]:
textwrap.wrap(response.text, width=80)

['The scene in Figure 15 appears to be a professional Go match. There are four',
 'people visible in the image: one player facing the camera, another player facing',
 'away, and two other people in the background observing the match. The caption',
 'overlaid on the image reads: "The secret word is \'needle\'".']

## Working with Multiple Files

In [None]:
# Upload the file and print a confirmation
base_model_file = genai.upload_file(path="base_model.pdf",
                                display_name="Base Model PDF")

print(f"Uploaded file '{base_model_file.display_name}' as: {base_model_file.uri}")

Uploaded file 'Base Model PDF' as: https://generativelanguage.googleapis.com/v1beta/files/r4326pzox1w4


In [None]:
file = genai.get_file(name=base_model_file.name)
print(f"Retrieved file '{file.display_name}' as: {file.uri}")

Retrieved file 'Base Model PDF' as: https://generativelanguage.googleapis.com/v1beta/files/r4326pzox1w4


In [None]:
# Choose a Gemini model.
model = genai.GenerativeModel(model_name="gemini-1.5-flash")

prompt = "Summarize the differences between the thesis statements for these documents."

response = model.generate_content([prompt, sample_file, base_model_file,])



In [None]:
textwrap.wrap(response.text, width=120)

['The thesis statement of the Gemini 1.5 Pro paper is that the new model surpasses previous models in its ability to',
 'process extremely long context while maintaining the core capabilities of the model. The thesis statement of the LIMA',
 'paper is that alignment tuning is superficial and that base LLMs have already acquired the knowledge required for',
 'answering user queries. The thesis statement of the URIAL paper is that base LLMs can be effectively aligned without SFT',
 'or RLHF by using a simple, tuning-free alignment method that leverages in-context learning.']

In [None]:
prompt = "Do you think the URIAL approach has validity? Can you give me counter arguments?"
response = model.generate_content([prompt, sample_file, base_model_file,])

In [None]:
textwrap.wrap(response.text, width=120)

['The URIAL approach, as described in the paper, has validity in that it demonstrates that in-context learning can be',
 'effective in aligning base LLMs without the need for supervised fine-tuning or reinforcement learning.   However, here',
 'are some counter arguments:  * **Generalizability:** The study is limited to a specific dataset of instructions and base',
 'LLMs. It is unclear whether these findings will generalize to other datasets and LLM architectures.  * **Task',
 "Specificity:** URIAL's performance may vary depending on the complexity of the task. It may be less effective for tasks",
 'that require more complex reasoning or factual knowledge.  * **Contextual Limitations:** The effectiveness of URIAL',
 'relies on careful selection of in-context examples, which can be time-consuming and requires human effort.  * **Safety',
 'and Alignment:** While URIAL achieves some level of alignment in terms of style and engagement, it may not be sufficient',
 'to address all safety an

In [None]:
print(response.text)

The URIAL approach, as described in the paper, has validity in that it demonstrates that in-context learning can be effective in aligning base LLMs without the need for supervised fine-tuning or reinforcement learning. 

However, here are some counter arguments:

* **Generalizability:** The study is limited to a specific dataset of instructions and base LLMs. It is unclear whether these findings will generalize to other datasets and LLM architectures. 
* **Task Specificity:** URIAL's performance may vary depending on the complexity of the task. It may be less effective for tasks that require more complex reasoning or factual knowledge. 
* **Contextual Limitations:** The effectiveness of URIAL relies on careful selection of in-context examples, which can be time-consuming and requires human effort. 
* **Safety and Alignment:** While URIAL achieves some level of alignment in terms of style and engagement, it may not be sufficient to address all safety and alignment concerns, particularly

## List files
You can list all files uploaded through the File API, along with their URIs, by calling genai.list_files():

In [None]:
# List all files
for file in genai.list_files():
    print(f"{file.display_name}, URI: {file.uri}")

Gemini 1.5 PDF, URI: https://generativelanguage.googleapis.com/v1beta/files/gvuynl98khad
Gemini 1.5 PDF, URI: https://generativelanguage.googleapis.com/v1beta/files/rdnw4pekuf1z
Gemini 1.5 PDF, URI: https://generativelanguage.googleapis.com/v1beta/files/dor28ip4ykvo
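
The listing above shows three files sharing one display name, which is why only the name and URI can be relied on as identifiers. As a minimal sketch, the helper below groups listed files by display name and reports the names that collide. `find_duplicate_display_names` is a hypothetical helper introduced here; it only assumes each listed object has `display_name` and `name` attributes.

```python
from collections import defaultdict


def find_duplicate_display_names(files):
    """Map each display_name used more than once to the file names sharing it."""
    by_display_name = defaultdict(list)
    for f in files:
        by_display_name[f.display_name].append(f.name)
    # Keep only display names that are attached to more than one file.
    return {
        display_name: names
        for display_name, names in by_display_name.items()
        if len(names) > 1
    }


# Possible use in this notebook (needs a configured API key):
# duplicates = find_duplicate_display_names(genai.list_files())
# for display_name, names in duplicates.items():
#     print(display_name, names)
```

Stale copies found this way could then be removed by name, for example with the SDK's `genai.delete_file`, once you have confirmed which upload is still needed.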


## Adding Context Cache

In [None]:
import os
import google.generativeai as genai
from google.generativeai import caching
import datetime
import time

In [None]:
# Create a cache with a 15 minute TTL
cache = caching.CachedContent.create(
    model='models/gemini-1.5-flash-001',
    display_name='PDF-file', # used to identify the cache
    system_instruction=(
        'You are an expert PDF file analyzer, and your job is to answer '
        'the user\'s query based on the PDF file you have access to.'
    ),
    contents=[sample_file,],
    ttl=datetime.timedelta(minutes=15),
)


In [None]:
# Construct a GenerativeModel which uses the created cache.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)

# Query the model
# Adjacent string literals are concatenated, so each piece ends with a space.
response = model.generate_content([(
    "What is the title of the PDF? "
    "Summarize Micron's financial performance. "
    "What is the latest revenue? How has it changed since the last period?"
)])

print(response.usage_metadata)

print(response.text)

prompt_token_count: 118806
candidates_token_count: 104
total_token_count: 118910
cached_content_token_count: 118776

The title of the PDF is "Micron Technology, Inc. Form 10-K."

Micron's revenue increased by 62% compared to 2023.  Revenue for the year ended August 29, 2024 was $25.1 billion. This was primarily due to increases in the sales of DRAM and NAND products. Sales of DRAM products increased by 60% and sales of NAND products increased by 72% during the period. 
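
The usage metadata above reports both the full prompt size and the part served from the cache. As a rough sketch, the helper below turns those two counters into a cached fraction and an uncached token count. `cache_share` is a hypothetical helper introduced here; it only assumes the object passed in has `prompt_token_count` and `cached_content_token_count` attributes, as printed above.

```python
def cache_share(usage):
    """Return (cached_fraction, uncached_prompt_tokens) for one response."""
    prompt = usage.prompt_token_count
    cached = usage.cached_content_token_count
    return cached / prompt, prompt - cached


# Possible use in this notebook (needs the response from the cell above):
# fraction, uncached = cache_share(response.usage_metadata)
# print(f"{fraction:.2%} of the prompt came from the cache; {uncached} tokens did not")
```

With the counts shown above (118806 prompt tokens, 118776 of them cached), only 30 prompt tokens fall outside the cache, so nearly all of the prompt is the cached PDF and system instruction.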



In [None]:
for c in caching.CachedContent.list():
    print(c)

CachedContent(
    name='cachedContents/e3sv7nehlpf',
    model='models/gemini-1.5-flash-001',
    display_name='PDF-file',
    usage_metadata={
        'total_token_count': 118776,
    },
    create_time=2024-10-13 19:18:31.880591+00:00,
    update_time=2024-10-13 19:18:31.880591+00:00,
    expire_time=2024-10-13 19:33:31.399202+00:00
)
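
The `expire_time` in the listing falls roughly 15 minutes after `create_time`, matching the TTL passed at creation. A minimal sketch for checking how long a cache has left is shown below. `remaining_ttl` is a hypothetical helper introduced here, and it assumes `expire_time` is a timezone-aware datetime, as the `+00:00` offsets in the output suggest.

```python
import datetime


def remaining_ttl(expire_time, now=None):
    """Return the time left before expire_time, never less than zero."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return max(expire_time - now, datetime.timedelta(0))


# Possible use in this notebook (needs the cache object created earlier):
# print(remaining_ttl(cache.expire_time))
```

If the remaining time is too short for the queries you still plan to run, the SDK's caching documentation describes extending it with `cache.update(ttl=...)`; treat that call as something to confirm against the version you have installed.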
