# Gemini 1.5 Pro Python code samples



| | |
|-|-|
|Author(s) | [Eric Dong](https://github.com/gericdong)|

## Setting up environment

#### Install Vertex AI SDK for Python (Dev build)


In [None]:
# Install the latest released SDK.
! pip3 install --upgrade --user --quiet google-cloud-aiplatform

# To use the dev build instead, un-comment the line below.
# ! pip3 install --force-reinstall --quiet git+https://github.com/googleapis/python-aiplatform.git@main

#### Restart runtime (Colab only)

In [None]:
import IPython

# Restart the kernel so the newly installed packages are picked up.
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

#### Authenticate your notebook environment (Colab only)

In [None]:
from google.colab import auth

auth.authenticate_user()
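
The authentication cell above fails outside Colab because `google.colab` is not importable there. A minimal sketch of a guard, assuming that finding `google.colab` in `sys.modules` is a good enough signal that the notebook is running in Colab; the helper name `running_in_colab` is hypothetical:

```python
import sys


def running_in_colab() -> bool:
    """Return True when the google.colab module is already loaded."""
    return "google.colab" in sys.modules


if running_in_colab():
    # Only imported inside Colab, where the package is available.
    from google.colab import auth

    auth.authenticate_user()
else:
    print("Not running in Colab; skipping Colab authentication.")
```

The same check could wrap the runtime restart cell so that the notebook runs top to bottom in other environments.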

#### Set Google Cloud project information

In [None]:
PROJECT_ID = "vertex-training-356201"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}

project_id = PROJECT_ID
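
A hard-coded project ID ties the notebook to one project. As a sketch only, the ID could instead be read from the environment, assuming a `GOOGLE_CLOUD_PROJECT` variable has been set (this is an assumption, Colab does not necessarily set it). `resolve_project_id` and the fallback value are hypothetical names used for illustration:

```python
import os

FALLBACK_PROJECT_ID = "your-project-id"  # hypothetical placeholder, replace with your own


def resolve_project_id(fallback: str = FALLBACK_PROJECT_ID) -> str:
    """Prefer GOOGLE_CLOUD_PROJECT from the environment, else use the fallback."""
    return os.environ.get("GOOGLE_CLOUD_PROJECT") or fallback


project_id = resolve_project_id()
print(project_id)
```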

## Code samples

- Make a best effort to keep samples copy-paste-runnable.
- Set default parameter values where possible; users should only need to supply `project_id`.
- Model output is non-deterministic, so don't hard-code generated text in tests.
- Use `generativeaionvertexai` as the product prefix in region tags (rather than `aiplatform`).
- Put the region tags and all sample code inside a function; the function itself does not need to be exposed to users.
```
def funct(project_id: str) -> str:
  # [START generativeaionvertexai_region_tag]
  import vertexai
  ...  # sample code that produces `response`

  # [END generativeaionvertexai_region_tag]
  return response.text
```
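
The guideline about non-deterministic output can be sketched as follows. `fake_summarize` is a hypothetical stand-in for a sample function, so the snippet runs without calling the model; its random wording only imitates a response that changes between runs.

```python
import random


def fake_summarize(project_id: str) -> str:
    """Hypothetical stand-in for a model call; the wording varies per run."""
    openers = ["This audio is about", "The recording covers", "The episode discusses"]
    return f"{random.choice(openers)} the Pixel Feature Drops."


def test_summary_is_not_empty() -> None:
    text = fake_summarize("my-project")
    # Assert on properties that hold for any valid response,
    # never on the exact generated wording.
    assert isinstance(text, str)
    assert len(text.strip()) > 0


test_summary_is_not_empty()
```

A check such as `assert text == "This audio is about the Pixel Feature Drops."` would pass only some of the time, which is why the tests in this notebook assert on length alone.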

## 1. Audio understanding

In [None]:
# https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/gemini_audio.py

# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


def summarize_audio(project_id: str) -> str:
  # [START generativeaionvertexai_gemini_audio_summarization]

  import vertexai
  from vertexai.generative_models import GenerativeModel, Part

  # TODO(developer): Update and un-comment below lines
  # project_id = "PROJECT_ID"

  vertexai.init(project=project_id, location="us-central1")

  model = GenerativeModel("gemini-1.5-pro-preview-0409")

  prompt = """
    Please provide a summary for the audio.
    Provide chapter titles with timestamps, be concise and short, no need to provide chapter summaries.
    Do not make up any information that is not part of the audio and do not be verbose.
  """

  audio_file_uri = "gs://cloud-samples-data/generative-ai/audio/pixel.mp3"
  audio_file = Part.from_uri(audio_file_uri, mime_type="audio/mpeg")

  contents = [audio_file, prompt]

  response = model.generate_content(contents)
  print(response.text)

  # [END generativeaionvertexai_gemini_audio_summarization]
  return response.text


def transcript_audio(project_id: str) -> str:
  # [START generativeaionvertexai_gemini_audio_transcription]

  import vertexai
  from vertexai.generative_models import GenerativeModel, Part

  # TODO(developer): Update and un-comment below lines
  # project_id = "PROJECT_ID"

  vertexai.init(project=project_id, location="us-central1")

  model = GenerativeModel("gemini-1.5-pro-preview-0409")

  prompt = """
    Can you transcribe this interview, in the format of timecode, speaker, caption?
    Use speaker A, speaker B, etc. to identify speakers.
  """

  audio_file_uri = "gs://cloud-samples-data/generative-ai/audio/pixel.mp3"
  audio_file = Part.from_uri(audio_file_uri, mime_type="audio/mpeg")

  contents = [audio_file, prompt]

  response = model.generate_content(contents)
  print(response.text)

  # [END generativeaionvertexai_gemini_audio_transcription]
  return response.text

In [None]:
# https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/test_gemini_examples.py

# import gemini_audio

def test_summarize_audio() -> None:
    text = summarize_audio(PROJECT_ID)
    assert len(text) > 0

def test_transcript_audio() -> None:
    text = transcript_audio(PROJECT_ID)
    assert len(text) > 0


test_summarize_audio()
test_transcript_audio()

This audio is about the Pixel Feature Drops. 

Chapter Titles & Timestamps:

*  Introduction (00:00)
*  Importance of Feature Drops (01:49)
*  January Feature Drop Highlights (02:48)
*  March Feature Drop - Pixel Watch (03:41)
*  March Feature Drop - Pixel Phone (05:58)
*  March Feature Drop - Other Devices (07:41) 
*  Pixel Superfans Question (08:12)
*  Favorite Feature Drop Features (08:23)
*  Feature Drop Release Date (08:07)
*  Outro (10:17) 

## Interview Transcription

**00:00** | **Speaker A** | your devices are getting better over time and so we think about it across the entire portfolio from phones to watch to buds to tablet we get really excited about how we can tell a joint narrative across everything.

**00:14** | **Speaker B** | Welcome to the made by Google podcast where we meet the people who work on the Google products you love. Here's your host Rasheed Finch. Today we're talking to Aisha Sharif and the Carlos Love, they're both product managers for various pixel device
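
The transcription prompt asks for timecode, speaker, and caption, and the output above follows a `**MM:SS** | **Speaker X** | caption` layout. A minimal parsing sketch, assuming the model keeps that layout (it is not guaranteed to); `Caption` and `parse_transcript` are hypothetical names, and the sample text is abbreviated from the output above:

```python
import re
from typing import List, NamedTuple


class Caption(NamedTuple):
    timecode: str
    speaker: str
    text: str


# Matches lines such as: **00:14** | **Speaker B** | Welcome to the podcast
LINE_PATTERN = re.compile(
    r"^\*\*(\d{1,2}:\d{2})\*\*\s*\|\s*\*\*(.+?)\*\*\s*\|\s*(.+)$"
)


def parse_transcript(markdown_text: str) -> List[Caption]:
    """Extract (timecode, speaker, text) rows, skipping lines that don't match."""
    captions = []
    for line in markdown_text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            captions.append(Caption(*match.groups()))
    return captions


sample = """## Interview Transcription

**00:00** | **Speaker A** | your devices are getting better over time
**00:14** | **Speaker B** | Welcome to the made by Google podcast
"""

for caption in parse_transcript(sample):
    print(caption.timecode, caption.speaker, caption.text)
```

Lines that don't match the pattern, such as the heading, are skipped rather than treated as errors, since the model may add extra text around the transcript.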

## 2. Video with audio

In [None]:
# https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/gemini_video_audio.py


def analyze_video_with_audio(project_id: str) -> str:
  # [START generativeaionvertexai_gemini_video_with_audio]

  import vertexai
  from vertexai.generative_models import GenerativeModel, Part

  # TODO(developer): Update and un-comment below lines
  # project_id = "PROJECT_ID"

  vertexai.init(project=project_id, location="us-central1")

  model = GenerativeModel("gemini-1.5-pro-preview-0409")

  prompt = """
    Provide a description of the video.
    The description should also contain anything important which people say in the video.
  """

  video_file_uri = "gs://cloud-samples-data/generative-ai/video/pixel8.mp4"
  video_file = Part.from_uri(video_file_uri, mime_type="video/mp4")

  contents = [video_file, prompt]

  response = model.generate_content(contents)
  print(response.text)

  # [END generativeaionvertexai_gemini_video_with_audio]
  return response.text

In [None]:
# https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/test_gemini_examples.py

# import gemini_video_audio

def test_analyze_video_with_audio() -> None:
    text = analyze_video_with_audio(PROJECT_ID)
    assert len(text) > 0

test_analyze_video_with_audio()

The video is a night tour of Tokyo with a photographer named Saeka Shimada. She shows us different parts of the city and how they look different at night. She also demonstrates the new Pixel phone's "Video Boost" feature, which improves video quality in low-light conditions. The video ends with Saeka in Shibuya, a popular district in Tokyo known for its nightlife.


## 3. All modalities

In [None]:
# https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/gemini_all_modalities.py


def analyze_all_modalities(project_id: str) -> str:
  # [START generativeaionvertexai_gemini_all_modalities]

  import vertexai
  from vertexai.generative_models import GenerativeModel, Part

  # TODO(developer): Update and un-comment below lines
  # project_id = "PROJECT_ID"

  vertexai.init(project=project_id, location="us-central1")

  model = GenerativeModel("gemini-1.5-pro-preview-0409")

  video_file_uri = "gs://cloud-samples-data/generative-ai/video/behind_the_scenes_pixel.mp4"
  video_file = Part.from_uri(video_file_uri, mime_type="video/mp4")

  image_file_uri = "gs://cloud-samples-data/generative-ai/image/a-man-and-a-dog.png"
  image_file = Part.from_uri(image_file_uri, mime_type="image/png")

  prompt = """
    Watch each frame in the video carefully and answer the questions.
    Only base your answers strictly on what information is available in the video attached.
    Do not make up any information that is not part of the video and do not be too
    verbose, be to the point.

    Questions:
    - When is the moment in the image happening in the video? Provide a timestamp.
    - What is the context of the moment and what does the narrator say about it?
  """

  contents = [
      video_file,
      image_file,
      prompt,
  ]

  response = model.generate_content(contents)
  print(response.text)

  # [END generativeaionvertexai_gemini_all_modalities]
  return response.text

In [None]:
# https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/test_gemini_examples.py

# import gemini_all_modalities

def test_analyze_all_modalities() -> None:
    text = analyze_all_modalities(PROJECT_ID)
    assert len(text) > 0

test_analyze_all_modalities()

- The image appears at the 0:49 mark. 
-  It's part of a montage of blurred video clips depicting significant moments in the relationship between the blind man and his girlfriend. The narrator says the AI feature on the Pixel phone allows the blind man to capture these moments.
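
Each `Part.from_uri` call in these samples passes a `mime_type` by hand. When several modalities are mixed, a small lookup keeps the pairs consistent. This is a sketch under stated assumptions: `mime_type_for` is a hypothetical helper, and the table only covers the file types used in this notebook.

```python
import posixpath

# Extension-to-MIME-type pairs taken from the samples in this notebook.
MIME_TYPES = {
    ".mp3": "audio/mpeg",
    ".mp4": "video/mp4",
    ".png": "image/png",
    ".pdf": "application/pdf",
}


def mime_type_for(uri: str) -> str:
    """Look up the MIME type for a URI based on its file extension."""
    extension = posixpath.splitext(uri)[1].lower()
    try:
        return MIME_TYPES[extension]
    except KeyError:
        raise ValueError(f"No MIME type configured for {uri!r}") from None


print(mime_type_for("gs://cloud-samples-data/generative-ai/video/behind_the_scenes_pixel.mp4"))
print(mime_type_for("gs://cloud-samples-data/generative-ai/image/a-man-and-a-dog.png"))
```

The result could then be passed as `Part.from_uri(uri, mime_type=mime_type_for(uri))`. Raising on unknown extensions is a deliberate choice here, so that an unsupported file type fails early instead of being sent with a guessed type.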


## 4. System instruction

In [None]:
# https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/gemini_system_instruction.py


def set_system_instruction(project_id: str) -> str:
  # [START generativeaionvertexai_gemini_system_instruction]

  import vertexai
  from vertexai.generative_models import GenerativeModel

  # TODO(developer): Update and un-comment below lines
  # project_id = "PROJECT_ID"

  vertexai.init(project=project_id, location="us-central1")

  model = GenerativeModel(
      "gemini-1.5-pro-preview-0409",
      system_instruction=[
          "You are a helpful language translator.",
          "Your mission is to translate text in English to French.",
      ],
  )

  prompt = """
    User input: I like bagels.
    Answer:
  """

  contents = [prompt]

  response = model.generate_content(contents)
  print(response.text)

  # [END generativeaionvertexai_gemini_system_instruction]
  return response.text

In [None]:
# https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/test_gemini_examples.py

# import gemini_system_instruction

def test_set_system_instruction() -> None:
    text = set_system_instruction(PROJECT_ID)
    assert len(text) > 0

test_set_system_instruction()

J'aime les bagels. 
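
The sample passes `system_instruction` as a list of strings. As an illustration, that list could be built from parameters so the same code serves other language pairs. `build_translator_instructions` is a hypothetical helper; its defaults reproduce the two instructions used in the sample.

```python
from typing import List


def build_translator_instructions(
    source: str = "English", target: str = "French"
) -> List[str]:
    """Build the system instruction list for a source-to-target translator."""
    return [
        "You are a helpful language translator.",
        f"Your mission is to translate text in {source} to {target}.",
    ]


instructions = build_translator_instructions(target="Spanish")
print(instructions)
# The list could then be passed as
# GenerativeModel(..., system_instruction=instructions).
```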



## 5. PDF

In [None]:
# https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/gemini_pdf_example.py


def analyze_pdf(project_id: str) -> str:
    # [START generativeaionvertexai_gemini_pdf]

    import vertexai
    from vertexai.generative_models import GenerativeModel, Part

    # TODO(developer): Update and un-comment below lines
    # project_id = "PROJECT_ID"

    vertexai.init(project=project_id, location="us-central1")

    model = GenerativeModel("gemini-1.5-pro-preview-0409")

    prompt = """
    You are a very professional document summarization specialist.
    Please summarize the given document.
    """

    pdf_file_uri = "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf"
    pdf_file = Part.from_uri(pdf_file_uri, mime_type="application/pdf")
    contents = [pdf_file, prompt]

    response = model.generate_content(contents)
    print(response.text)

    # [END generativeaionvertexai_gemini_pdf]
    return response.text

In [None]:
# https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/test_gemini_examples.py

# import gemini_pdf_example

def test_gemini_pdf_example() -> None:
    text = analyze_pdf(PROJECT_ID)
    assert len(text) > 0

test_gemini_pdf_example()

## Gemini 1.5 Pro: Summary of Capabilities

**Gemini 1.5 Pro** is a new, highly efficient multimodal AI model from Google DeepMind. It excels at understanding and reasoning over vast amounts of information across various formats, including text, video, audio, and code. 

Here's a breakdown of its key features:

**Long-Context Understanding:**

* **Unprecedented Scale:**  Gemini 1.5 Pro handles context lengths of **up to 10 million tokens**, exceeding existing models like Claude 2.1 (200k) and GPT-4 Turbo (128k) by a significant margin. This allows it to process large documents, multiple hours of video, and almost five days of audio recordings.
* **Near-Perfect Recall:** It achieves near-perfect recall (>99%) on retrieval tasks across modalities, even when information is buried within millions of tokens.
* **Reasoning and Understanding:** Beyond simple recall, Gemini 1.5 Pro demonstrates the ability to reason and understand relationships within long and complex contexts, enabling it to 

# Tryout

In [None]:
import vertexai
from vertexai.generative_models import (
    GenerativeModel,
    HarmBlockThreshold,
    HarmCategory,
    Part,
)

# Read the local audio file as raw bytes. Part.from_data accepts bytes
# directly, so no base64 round trip is needed.
audio_file_path = "pixel.mp3"
with open(audio_file_path, "rb") as file:
    audio1 = Part.from_data(data=file.read(), mime_type="audio/mpeg")

generation_config = {
    "max_output_tokens": 8192,
    "temperature": 1,
    "top_p": 0.95,
}

safety_settings = {
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
}


def generate() -> None:
  vertexai.init(project="cloud-llm-preview1", location="us-central1")
  model = GenerativeModel("gemini-1.5-pro-preview-0409")
  # stream=True yields partial responses as they are generated.
  responses = model.generate_content(
      [audio1, "Summarize"],
      generation_config=generation_config,
      safety_settings=safety_settings,
      stream=True,
  )

  for response in responses:
    print(response.text, end="")


generate()



The March Pixel Feature Drop brings new features and enhancements to Pixel devices, including:

**For Pixel Watch:**

* **Pixel Watch 1 gets Pixel Watch 2 features:**  This includes heart rate zone training, pace coaching, automatic workout tracking, and the Fitbit Sleep Profile app.
* **Improved Bluetooth connectivity:** Makes it easier to connect previously paired Pixel Buds to your Pixel Watch.

**For Pixel Phones:**

* **Circle to Search expands to Pixel 7 and 7 Pro:** Allows users to search for anything on their phone from any app or screen.
* **10-bit HDR comes to Instagram:** Enables users to take high-quality videos with a wider range of colors and contrast.
* **Partial screen sharing:** Allows users to share only a specific app during video calls, enhancing privacy.
* **Direct my Call for non-toll-free numbers:** The call screening feature now works with more phone numbers, providing greater convenience and time savings.
* **Clear Calling:** Improves call quality by reducing b
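
The tryout cell streams the response and prints each chunk as it arrives, so the full text is not kept. A minimal sketch of collecting the chunks while still printing them, assuming only that each streamed item exposes a `.text` attribute as in the loop above. `collect_stream` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real response chunks so the snippet runs without calling the model.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(responses: Iterable) -> str:
    """Print each streamed chunk as it arrives and return the joined text."""
    parts = []
    for response in responses:
        print(response.text, end="")
        parts.append(response.text)
    return "".join(parts)


# Stand-in chunks that imitate a streamed response.
fake_responses = [
    SimpleNamespace(text="The March Pixel Feature Drop "),
    SimpleNamespace(text="brings new features."),
]

full_text = collect_stream(fake_responses)
```

With a real model, the iterable returned by `generate_content(..., stream=True)` would take the place of `fake_responses`.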