In [None]:
!pip install -q google-generativeai

In [None]:
import os
import time
import textwrap
import google.generativeai as genai

from google.colab import userdata

genai.configure(api_key=userdata.get('GOOGLE_API_KEY'))



In [None]:
!curl -o gemini.pdf https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7059k  100 7059k    0     0  11.1M      0 --:--:-- --:--:-- --:--:-- 11.1M


In [None]:
!curl -o base_model.pdf https://arxiv.org/pdf/2312.01552

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3656k  100 3656k    0     0  6819k      0 --:--:-- --:--:-- --:--:-- 6833k
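
Before uploading, it can be worth confirming that each download really is a PDF, since a failed request may save an HTML error page under the same filename. The following is a minimal sketch that assumes only that PDF files begin with the `%PDF-` signature; `looks_like_pdf` is a hypothetical helper, not part of the SDK.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF signature."""
    # Hypothetical helper: inspects only the first five bytes, so it does
    # not prove the rest of the file is a valid PDF.
    with open(path, "rb") as handle:
        return handle.read(5) == b"%PDF-"


# Check the two downloads if they are present in the working directory.
for name in ("gemini.pdf", "base_model.pdf"):
    if Path(name).exists():
        print(name, looks_like_pdf(name))
```

The check is deliberately cheap: it catches the common failure of saving an error page, not a truncated or corrupted PDF.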


In [None]:
# Upload the file and print a confirmation
sample_file = genai.upload_file(path="gemini.pdf",
                                display_name="Gemini 1.5 PDF")

print(f"Uploaded file '{sample_file.display_name}' as: {sample_file.uri}")

Uploaded file 'Gemini 1.5 PDF' as: https://generativelanguage.googleapis.com/v1beta/files/bvdtt8stukk5


## Verify PDF file upload and get metadata
You can verify that the API stored the uploaded file, and retrieve its metadata, by calling genai.get_file() in the SDK. Only the name (and, by extension, the uri) is unique. Use display_name to identify files only if you manage uniqueness yourself.

In [None]:
file = genai.get_file(name=sample_file.name)
print(f"Retrieved file '{file.display_name}' as: {file.uri}")

Retrieved file 'Gemini 1.5 PDF' as: https://generativelanguage.googleapis.com/v1beta/files/bvdtt8stukk5
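
Because display_name is not guaranteed to be unique, a lookup by display name should fail loudly when it is ambiguous. The sketch below shows one way to do that. `find_by_display_name` is a hypothetical helper, and the records in `demo_files` are made-up stand-ins with the same `name` and `display_name` attributes as the SDK's file objects; in the notebook you would pass the result of `genai.list_files()` instead.

```python
from types import SimpleNamespace


def find_by_display_name(files, display_name):
    """Return the single file whose display_name matches, or raise."""
    matches = [f for f in files if f.display_name == display_name]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one file named {display_name!r}, "
            f"found {len(matches)}"
        )
    return matches[0]


# Made-up records for illustration only; these are not real uploads.
demo_files = [
    SimpleNamespace(name="files/aaa", display_name="Report A"),
    SimpleNamespace(name="files/bbb", display_name="Report B"),
    SimpleNamespace(name="files/ccc", display_name="Report B"),
]

print(find_by_display_name(demo_files, "Report A").name)
```

Looking up "Report B" in this example raises `LookupError`, which is the point: the caller has to resolve the ambiguity by using the unique `name` instead.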


## Prompt the Gemini API with the uploaded documents
After uploading the file, you can make GenerateContent requests that reference the File API URI. Select the generative model and provide it with a text prompt and the uploaded document:

In [None]:
# Choose a Gemini model.
model = genai.GenerativeModel(model_name="gemini-1.5-pro")

# Prompt the model with text and the previously uploaded PDF.
response = model.generate_content([sample_file, "Can you summarize this document as a bulleted list?"])

print(response.text)

Here is a summary of the document in a bulleted list:

* **Introduction:** The document introduces Gemini 1.5 Pro, a new multimodal language model from Google DeepMind. It's the first of its kind capable of recalling and reasoning over 10 million tokens of context, including text, video, and audio. 
* **Key advancements:**
    * **Novel mixture-of-experts architecture:** Improves efficiency, reasoning, and long-context performance.
    * **Unprecedented context length:** Processes entire documents, hours of video, and days of audio.
    * **Multimodal capabilities:**  Processes and retrieves information across various modalities.
* **Long-context evaluation:**
    * **Qualitative examples:** Demonstrates capabilities like understanding long codebases, learning a new language from limited context, and answering questions about a full-length movie.
    * **Quantitative evaluations:**
        * **Diagnostic tests:** Shows near-perfect recall in needle-in-a-haystack tasks for text, video, 

In [None]:
print(response.usage_metadata)

prompt_token_count: 78595
candidates_token_count: 520
total_token_count: 79115



In [None]:
# Prompt the model with text and the previously uploaded PDF.
response = model.generate_content([sample_file, "Can you explain Figure 9 in the paper?"])

print(response.text)

Figure 9 in the paper illustrates the results of an experiment designed to test the ability of the large language model Gemini 1.5 Pro to understand very long audio sequences. The experiment uses a "needle-in-a-haystack" approach, where a short audio clip (the "needle") containing a secret keyword is hidden within a much larger audio file (the "haystack"). 

Here's a breakdown of the figure:

* **The Task:** The model is presented with an audio file that can be up to 107 hours long (almost 5 days). This audio is constructed by concatenating many shorter audio clips. Hidden somewhere within this long audio is a very short clip where a speaker says "the secret keyword is needle". The model is then asked to identify the secret keyword, using a text-based question, meaning it has to perform cross-modal reasoning (audio to text).
* **Comparison:** The figure compares the performance of Gemini 1.5 Pro with a combination of two other models: Whisper and GPT-4 Turbo. Whisper is a speech recogn

In [None]:
print(response.usage_metadata)

prompt_token_count: 78594
candidates_token_count: 573
total_token_count: 79167



In [None]:
# Prompt the model with text and the previously uploaded PDF.
response = model.generate_content([sample_file, "Can you describe the scene in Figure 15 in details? How many people do you see in the image? and what is the caption of the image"])

print(response.text)

The scene in Figure 15 appears to be a professional Go match. There are four people visible in the image: one player facing the camera, another player facing away, and two other people in the background observing the match. The caption overlaid on the image reads: "The secret word is 'needle'". 



In [None]:
textwrap.wrap(response.text, width=80)

['The scene in Figure 15 appears to be a professional Go match. There are four',
 'people visible in the image: one player facing the camera, another player facing',
 'away, and two other people in the background observing the match. The caption',
 'overlaid on the image reads: "The secret word is \'needle\'".']

## Working with Multiple Files

In [None]:
# Upload the file and print a confirmation
base_model_file = genai.upload_file(path="base_model.pdf",
                                    display_name="Base Model PDF")

print(f"Uploaded file '{base_model_file.display_name}' as: {base_model_file.uri}")

Uploaded file 'Base Model PDF' as: https://generativelanguage.googleapis.com/v1beta/files/r4326pzox1w4


In [None]:
file = genai.get_file(name=base_model_file.name)
print(f"Retrieved file '{file.display_name}' as: {file.uri}")

Retrieved file 'Base Model PDF' as: https://generativelanguage.googleapis.com/v1beta/files/r4326pzox1w4


In [None]:
# Choose a Gemini model.
model = genai.GenerativeModel(model_name="gemini-1.5-flash")

prompt = "Summarize the differences between the thesis statements for these documents."

response = model.generate_content([prompt, sample_file, base_model_file,])



In [None]:
textwrap.wrap(response.text, width=120)

['The thesis statement of the Gemini 1.5 Pro paper is that the new model surpasses previous models in its ability to',
 'process extremely long context while maintaining the core capabilities of the model. The thesis statement of the LIMA',
 'paper is that alignment tuning is superficial and that base LLMs have already acquired the knowledge required for',
 'answering user queries. The thesis statement of the URIAL paper is that base LLMs can be effectively aligned without SFT',
 'or RLHF by using a simple, tuning-free alignment method that leverages in-context learning.']

In [None]:
prompt = "Do you think the URIAL approach has validity? Can you give me counter arguments?"
response = model.generate_content([prompt, sample_file, base_model_file,])

In [None]:
textwrap.wrap(response.text, width=120)

['The URIAL approach, as described in the paper, has validity in that it demonstrates that in-context learning can be',
 'effective in aligning base LLMs without the need for supervised fine-tuning or reinforcement learning.   However, here',
 'are some counter arguments:  * **Generalizability:** The study is limited to a specific dataset of instructions and base',
 'LLMs. It is unclear whether these findings will generalize to other datasets and LLM architectures.  * **Task',
 "Specificity:** URIAL's performance may vary depending on the complexity of the task. It may be less effective for tasks",
 'that require more complex reasoning or factual knowledge.  * **Contextual Limitations:** The effectiveness of URIAL',
 'relies on careful selection of in-context examples, which can be time-consuming and requires human effort.  * **Safety',
 'and Alignment:** While URIAL achieves some level of alignment in terms of style and engagement, it may not be sufficient',
 'to address all safety an

In [None]:
print(response.text)

The URIAL approach, as described in the paper, has validity in that it demonstrates that in-context learning can be effective in aligning base LLMs without the need for supervised fine-tuning or reinforcement learning. 

However, here are some counter arguments:

* **Generalizability:** The study is limited to a specific dataset of instructions and base LLMs. It is unclear whether these findings will generalize to other datasets and LLM architectures. 
* **Task Specificity:** URIAL's performance may vary depending on the complexity of the task. It may be less effective for tasks that require more complex reasoning or factual knowledge. 
* **Contextual Limitations:** The effectiveness of URIAL relies on careful selection of in-context examples, which can be time-consuming and requires human effort. 
* **Safety and Alignment:** While URIAL achieves some level of alignment in terms of style and engagement, it may not be sufficient to address all safety and alignment concerns, particularly
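
`textwrap.wrap` treats its input as a single paragraph and collapses newlines, which is why the bullets run together in the wrapped output while `print` keeps them on separate lines. One way to get both is to wrap each line on its own. The sketch below uses only the standard library; `wrap_preserving_lines` is a hypothetical helper name.

```python
import textwrap


def wrap_preserving_lines(text, width=80):
    """Wrap each line of `text` separately so existing line breaks survive."""
    wrapped = []
    for line in text.splitlines():
        # textwrap.wrap returns [] for a blank line, so keep it explicitly.
        wrapped.extend(textwrap.wrap(line, width=width) or [""])
    return "\n".join(wrapped)


sample = "Intro sentence.\n\n* **First:** a point.\n* **Second:** another point."
print(wrap_preserving_lines(sample, width=40))
```

In the notebook this would be called as `print(wrap_preserving_lines(response.text, width=120))`.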

## List files
You can list all files uploaded through the File API, along with their URIs, using genai.list_files():

In [None]:
# List all files
for file in genai.list_files():
    print(f"{file.display_name}, URI: {file.uri}")

Base Model PDF, URI: https://generativelanguage.googleapis.com/v1beta/files/r4326pzox1w4
Gemini 1.5 PDF, URI: https://generativelanguage.googleapis.com/v1beta/files/bvdtt8stukk5
Base Model PDF, URI: https://generativelanguage.googleapis.com/v1beta/files/fpygzv9hp2c2
Base Model PDF, URI: https://generativelanguage.googleapis.com/v1beta/files/5j2rhoqxz7th
Base Model PDF, URI: https://generativelanguage.googleapis.com/v1beta/files/xsn3l96pmhoq
Gemini 1.5 PDF, URI: https://generativelanguage.googleapis.com/v1beta/files/bdk8holpqvfz
Base Model PDF, URI: https://generativelanguage.googleapis.com/v1beta/files/20hkjq8z6ita
Gemini 1.5 PDF, URI: https://generativelanguage.googleapis.com/v1beta/files/38k5wg9heefa
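
The listing shows the same display names several times, most likely because the upload cells were run more than once. A quick count makes the duplication visible. This sketch works on the display names copied from the listing above rather than calling the API.

```python
from collections import Counter

# Display names in the order they appear in the listing above.
listed_names = [
    "Base Model PDF", "Gemini 1.5 PDF", "Base Model PDF", "Base Model PDF",
    "Base Model PDF", "Gemini 1.5 PDF", "Base Model PDF", "Gemini 1.5 PDF",
]

counts = Counter(listed_names)
for display_name, count in counts.most_common():
    print(f"{display_name}: {count} uploads")
```

In the notebook, the same count could be built from `file.display_name` for each `file` in `genai.list_files()`.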


## Add a context cache

In [None]:
import os
import google.generativeai as genai
from google.generativeai import caching
import datetime
import time

In [None]:
# Create a cache with a 15 minute TTL
cache = caching.CachedContent.create(
    model='models/gemini-1.5-flash-001',
    display_name='PDF-file', # used to identify the cache
    system_instruction=(
        'You are an expert PDF file analyzer, and your job is to answer '
        'the user\'s query based on the PDF file you have access to.'
    ),
    contents=[sample_file,],
    ttl=datetime.timedelta(minutes=15),
)
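
The TTL is an ordinary `datetime.timedelta`, so you can reason about expiry with plain datetime arithmetic. The sketch below assumes the cache expires roughly one TTL after it is created. The creation time is a made-up value, and `seconds_remaining` is a hypothetical helper.

```python
import datetime

ttl = datetime.timedelta(minutes=15)

# Made-up creation time, used only to make the arithmetic concrete.
created = datetime.datetime(2024, 1, 1, 12, 0, tzinfo=datetime.timezone.utc)
expires = created + ttl


def seconds_remaining(expire_time, now):
    """Seconds until expiry, clamped at zero once the cache has expired."""
    return max((expire_time - now).total_seconds(), 0.0)


print(expires.isoformat())
print(seconds_remaining(expires, created + datetime.timedelta(minutes=10)))
```

A check like this is useful before reusing a cache object in a long-running session, since requests against an expired cache will fail.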


In [None]:
# Construct a GenerativeModel which uses the created cache.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)

# Query the model
response = model.generate_content([(
    'What is the title of the paper? '
    'Who are the authors? '
    'What are the major contributions of the paper according to the authors?'
)])

print(response.usage_metadata)

print(response.text)

prompt_token_count: 77914
candidates_token_count: 278
total_token_count: 78192
cached_content_token_count: 77886

The title of the paper is "Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context".

The authors are the Gemini Team at Google. 

The major contributions of the paper are:
- The authors introduce Gemini 1.5 Pro, a new, highly compute-efficient multimodal mixture-of-experts model that can recall and reason over fine-grained information from millions of tokens of context. 
- They show that Gemini 1.5 Pro achieves near-perfect recall on long-context retrieval tasks across modalities, improves the state-of-the-art in long-document QA, long-video QA and long-context ASR, and matches or surpasses Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks.
- They show that Gemini 1.5 Pro is able to handle extremely long contexts; it has the ability to recall and reason over fine-grained information from up to at least 10M tokens. 

In [None]:
# Construct a GenerativeModel which uses the created cache.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)

# Query the model
response = model.generate_content([(
    'What is the title of the paper? '
    'Who are the authors? Provide a list. '
    'What are the major contributions of the paper according to the authors?'
)])

print(response.usage_metadata)

print(response.text)

prompt_token_count: 77917
candidates_token_count: 310
total_token_count: 78227
cached_content_token_count: 77886

The title of the paper is "Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context". The authors of the paper are the "Gemini Team, Google". 

The major contributions of the paper, according to the authors, are:

* **Gemini 1.5 Pro, a new multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context**, including multiple long documents and hours of video and audio.
* **Gemini 1.5 Pro achieves near-perfect recall on long-context retrieval tasks across modalities, improves the state-of-the-art in long-document QA, long-video QA and long-context ASR**, and matches or surpasses Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks.
* **Gemini 1.5 Pro can handle extremely long contexts**, up to at least 10M tokens, which is unprecedented among contempo
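
The usage metadata for the cached requests shows how little of each prompt is new. The sketch below assumes `prompt_token_count` includes the cached tokens, which is consistent with the two outputs above; the figures are copied from those outputs.

```python
# (prompt_token_count, cached_content_token_count) from the two outputs above.
runs = [(77914, 77886), (77917, 77886)]

fresh_counts = []
for prompt_tokens, cached_tokens in runs:
    # Tokens that were not served from the cache: the question itself
    # plus any request overhead.
    fresh = prompt_tokens - cached_tokens
    fresh_counts.append(fresh)
    share = cached_tokens / prompt_tokens
    print(f"new tokens: {fresh}, share served from cache: {share:.2%}")
```

Only a few dozen tokens per request fall outside the cache here; the rest is the PDF and system instruction stored when the cache was created.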

In [None]:
for c in caching.CachedContent.list():
    print(c)

CachedContent(
    name='cachedContents/pbfam6rwcdl5',
    model='models/gemini-1.5-flash-001',
    display_name='PDF-file',
    usage_metadata={
        'total_token_count': 77886,
    },
    create_time=2024-08-21 02:42:15.266216+00:00,
    update_time=2024-08-21 02:42:15.266216+00:00,
    expire_time=2024-08-21 02:57:13.884981+00:00
)
CachedContent(
    name='cachedContents/l01zlu22q67z',
    model='models/gemini-1.5-flash-001',
    display_name='PDF-file',
    usage_metadata={
        'total_token_count': 78613,
    },
    create_time=2024-08-21 02:37:48.009988+00:00,
    update_time=2024-08-21 02:37:48.009988+00:00,
    expire_time=2024-08-21 02:52:47.005240+00:00
)
CachedContent(
    name='cachedContents/bkn36tvko8ws',
    model='models/gemini-1.5-flash-001',
    display_name='PDF-file',
    usage_metadata={
        'total_token_count': 112395,
    },
    create_time=2024-08-21 02:35:04.336944+00:00,
    update_time=2024-08-21 02:35:04.336944+00:00,
    expire_time=2024-08-21 02: