In [21]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Getting Started with the Gemini API in Vertex AI with cURL

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_curl.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fgetting-started%2Fintro_gemini_curl.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Run in Colab Enterprise
    </a>
  </td>       
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_curl.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/getting-started/intro_gemini_curl.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
</table>

<div style="clear: both;"></div>



| | |
|-|-|
|Author(s) | [Eric Dong](https://github.com/gericdong), [Polong Lin](https://github.com/polong-lin) |

## Overview

### Gemini
Gemini is a family of generative AI models developed by Google DeepMind. Gemini models accept prompts that include text, images, and video, and return text responses.

### Gemini API in Vertex AI

The Gemini API in Vertex AI provides a unified interface for interacting with Gemini models. You can interact with the Gemini API by using the following methods:

* Use [Vertex AI Studio](https://cloud.google.com/generative-ai-studio) for quick testing and command generation.
* Use cURL commands in Cloud Shell.
* Use the Vertex AI SDK for Python in a Jupyter notebook.

This notebook focuses on using **cURL commands** to call the Gemini API in Vertex AI.

For more information, see the [Generative AI on Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview) documentation.

### Objectives

In this tutorial, you learn how to use the Gemini API in Vertex AI with cURL commands to interact with the Gemini 2.0 Flash (`gemini-2.0-flash-001`) model.

You will complete the following tasks:

- Set the Google Cloud project and the environment variables used by the cURL commands.
- Use the Gemini API in Vertex AI to interact with the Gemini 2.0 Flash (`gemini-2.0-flash-001`) model:
  - Generate text from text prompts.
  - Explore various features and configuration options.
  - Generate text from image(s) and text prompts.
  - Generate text from video.
  

### Costs
This tutorial uses billable components of Google Cloud:

- Vertex AI

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

## Getting Started

### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment.

This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).

In [22]:
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [23]:
PROJECT_ID = "qwiklabs-gcp-00-9d421d4315f8"  # @param {type:"string"}
LOCATION = "us-east1"  # @param {type:"string"}

### Defining environment variables for cURL commands

These environment variables are used to construct the cURL commands.

In [24]:
import os

os.environ["PROJECT_ID"] = PROJECT_ID
os.environ["LOCATION"] = LOCATION
os.environ["API_ENDPOINT"] = f"{LOCATION}-aiplatform.googleapis.com"
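
To illustrate how these variables fit together, here is a minimal Python sketch that assembles the request URL in the same shape the cURL commands in this notebook use. The helper name `build_endpoint_url` and the placeholder project ID `my-project` are assumptions made for illustration; they are not part of any SDK.

```python
def build_endpoint_url(project_id, location, model_id,
                       method="generateContent", api_version="v1"):
    """Assemble the REST URL in the same shape as the cURL commands in this notebook."""
    # The regional host name mirrors the API_ENDPOINT environment variable.
    api_endpoint = f"{location}-aiplatform.googleapis.com"
    return (
        f"https://{api_endpoint}/{api_version}/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/{model_id}:{method}"
    )


# "my-project" is a placeholder, not a real project ID.
url = build_endpoint_url("my-project", "us-east1", "gemini-2.0-flash-001")
print(url)
```

Passing `method="streamGenerateContent"` or `api_version="v1beta1"` yields the other URL variants that appear in this notebook.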

## Use the Gemini 2.0 Flash model

In [25]:
os.environ["MODEL_ID"] = "gemini-2.0-flash-001"

### Generate content

The `generateContent` method can handle a wide variety of use cases, including multi-turn chat and multimodal input, depending on what the underlying model supports. In this example, you send a text prompt using the `generateContent` method.
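
A `generateContent` response returns the generated text nested under `candidates`, `content`, and `parts`, as the recorded output of the cell below shows. The following sketch, assuming that shape, pulls the text out of a parsed response. The helper `extract_text` is hypothetical, and the sample response is a shortened, hand-written stand-in rather than real model output.

```python
import json


def extract_text(response):
    """Concatenate the text parts of the first candidate in a parsed response."""
    candidates = response.get("candidates", [])
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    # Parts without a "text" field (for example function calls) contribute nothing.
    return "".join(part.get("text", "") for part in parts)


# Hand-written sample in the same shape as a generateContent response.
sample = json.loads("""
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [{"text": "The sky appears blue due to Rayleigh scattering."}]
      }
    }
  ]
}
""")

print(extract_text(sample))
```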

In [26]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": { "text": "Why is the sky blue?" }
    }
  }'


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2803    0  2702  100   101   1139     42  0:00:02  0:00:02 --:--:--  1182


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "The sky appears blue due to a phenomenon called **Rayleigh scattering**. Here's a breakdown of why it happens:\n\n1.  **Sunlight and Colors:** Sunlight is actually made up of all the colors of the rainbow. These colors have different wavelengths.\n\n2.  **Entering the Atmosphere:** When sunlight enters the Earth's atmosphere, it collides with tiny air molecules (mostly nitrogen and oxygen).\n\n3.  **Scattering of Light:** This collision causes the sunlight to scatter in different directions.  Shorter wavelengths of light (like blue and violet) are scattered more strongly than longer wavelengths (like red and orange).\n\n4.  **Why Blue, Not Violet?:** Violet light is scattered even more than blue light. However, several factors contribute to why we see a predominantly blue sky:\n\n    *   The sun emits less violet light than blue light.\n    *   The upper atmosphere 

### Streaming

The Gemini API provides a streaming response mechanism. With this approach, you don't need to wait for the complete response; you can start processing fragments as soon as they're accessible.
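
The `streamGenerateContent` output recorded in the next cell is a JSON array of chunks, each carrying a fragment of the text. As a rough sketch, assuming each chunk has the same `candidates` shape as a regular response, the fragments can be joined as shown here. The helper `join_stream_chunks` is hypothetical, and the sample chunks are abbreviated copies of the first three recorded chunks.

```python
def join_stream_chunks(chunks):
    """Join the text fragments of the first candidate across streamed chunks."""
    fragments = []
    for chunk in chunks:
        candidates = chunk.get("candidates", [])
        if not candidates:
            continue
        for part in candidates[0].get("content", {}).get("parts", []):
            fragments.append(part.get("text", ""))
    return "".join(fragments)


# Abbreviated chunks; metadata fields such as usageMetadata are omitted.
chunks = [
    {"candidates": [{"content": {"role": "model", "parts": [{"text": "The"}]}}]},
    {"candidates": [{"content": {"role": "model", "parts": [{"text": " sky appears blue"}]}}]},
    {"candidates": [{"content": {"role": "model", "parts": [
        {"text": " due to a phenomenon called **Rayleigh scattering**."}]}}]},
]

full_text = join_stream_chunks(chunks)
print(full_text)
```

A real client would process each fragment as it arrives instead of collecting the whole list first; the join is shown only to make the chunk structure concrete.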

In [27]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": { "text": "Why is the sky blue?" }
    }
  }'


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed


[{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "The"
          }
        ]
      }
    }
  ],
  "usageMetadata": {
    "trafficType": "ON_DEMAND"
  },
  "modelVersion": "gemini-2.0-flash-001",
  "createTime": "2025-04-14T08:40:25.817222Z",
  "responseId": "-cn8Z8bwMYGEjNsP0r6tmQw"
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " sky appears blue"
          }
        ]
      }
    }
  ],
  "usageMetadata": {
    "trafficType": "ON_DEMAND"
  },
  "modelVersion": "gemini-2.0-flash-001",
  "createTime": "2025-04-14T08:40:25.817222Z",
  "responseId": "-cn8Z8bwMYGEjNsP0r6tmQw"
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " due to a phenomenon called **Rayleigh scattering**. Here's a breakdown:"
          }
        ]
      }
    }
  ],
  "usageMetadata": {
    "tr

100  5436    0  5335  100   101   2706     51  0:00:01  0:00:01 --:--:--  2756


            "text": " to travel through more of the atmosphere to reach our eyes. This means the blue light is scattered away even more, leaving the longer wavelengths like orange and red to reach us directly. That's why we see those colors during those times.\n**In summary:** Blue light is scattered more than other colors by the air molecules in the"
          }
        ]
      }
    }
  ],
  "usageMetadata": {
    "trafficType": "ON_DEMAND"
  },
  "modelVersion": "gemini-2.0-flash-001",
  "createTime": "2025-04-14T08:40:25.817222Z",
  "responseId": "-cn8Z8bwMYGEjNsP0r6tmQw"
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " atmosphere, so when we look up, we see the scattered blue light from all directions.\n"
          }
        ]
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 6,
    "candidatesTokenCount": 292,
    "totalTokenCount": 298,
    "trafficType": "ON

### Model parameters

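Every prompt you send to the model includes parameter values that control how the model generates a response. The model can generate different results for different parameter values, so experiment with them to see how the results change. This example sets `temperature`, `top_p`, `top_k`, `max_output_tokens`, `candidate_count`, and `stop_sequences` in `generation_config`, and a blocking threshold for one harm category in `safety_settings`.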
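
To build intuition for `temperature`, `top_k`, and `top_p`, the toy sketch below filters a four-token distribution. It is not how the Gemini service implements sampling: the vocabulary, the logit values, the function name `filter_token_probs`, and the order of the filtering steps are all assumptions chosen for illustration.

```python
import math


def filter_token_probs(logits, temperature=1.0, top_k=None, top_p=1.0):
    """Return renormalized probabilities of tokens surviving top-k and top-p filtering.

    Toy illustration only; temperature must be greater than zero.
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    max_scaled = max(scaled.values())
    exps = {tok: math.exp(value - max_scaled) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = {tok: value / total for tok, value in exps.items()}

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


toy_logits = {"blue": 2.0, "grey": 1.0, "green": 0.5, "red": 0.1}

# Low temperature and low top_p: only the most likely token survives.
narrow = filter_token_probs(toy_logits, temperature=0.2, top_k=16, top_p=0.1)
# Higher temperature and top_p: several tokens remain candidates.
wide = filter_token_probs(toy_logits, temperature=1.0, top_k=3, top_p=0.95)
print(narrow)
print(wide)
```

In this toy setting, lower values narrow the set of candidate tokens, which is why low settings tend to give more repeatable responses.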

In [28]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": [
        {"text": "Describe this image"},
        {"file_data": {
          "mime_type": "image/jpeg",
          "file_uri": "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
        }}
      ]
    },
    "generation_config": {
      "temperature": 0.2,
      "top_p": 0.1,
      "top_k": 16,
      "max_output_tokens": 2048,
      "candidate_count": 1,
      "stop_sequences": []
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }'


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3244    0  2641  100   603   1457    332  0:00:01  0:00:01 --:--:--  1789


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Here's a description of the image:\n\n**Overall Impression:**\n\nThe image shows a tabby cat standing in the snow. The cat is alert and looking directly at the viewer.\n\n**Details:**\n\n*   **Subject:** A medium-sized tabby cat.\n*   **Coat:** The cat has a brown and black tabby coat with distinct stripes.\n*   **Eyes:** The cat has yellow-green eyes.\n*   **Pose:** The cat is standing with one paw slightly raised, as if in mid-step. Its tail is slightly curved.\n*   **Background:** The background is a snowy landscape. The snow appears to be relatively fresh and undisturbed.\n*   **Lighting:** The lighting is bright and natural, casting soft shadows.\n*   **Composition:** The cat is positioned slightly off-center, drawing the viewer's eye. The snowy background provides a clean and contrasting backdrop."
          }
        ]
      },
      "finishReason": "STOP",
 

### Chat

The Gemini API supports natural multi-turn conversations and is ideal for text tasks that require back-and-forth interactions.

Specify the `role` field only if the content represents a turn in a conversation. You can set `role` to one of the following values: `user`, `model`.
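
A multi-turn request is an ordered list of turns, each with a `role` and a list of `parts`. The sketch below, assuming that shape, builds the `contents` list for the conversation sent in the next cell. The helper `add_turn` is hypothetical and only constructs the JSON body; it does not call the API.

```python
import json

ALLOWED_ROLES = ("user", "model")


def add_turn(history, role, text):
    """Append one conversation turn to the history and return the history."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"role must be one of {ALLOWED_ROLES}, got {role!r}")
    history.append({"role": role, "parts": [{"text": text}]})
    return history


history = []
add_turn(history, "user", "Hello")
add_turn(history, "model", "Hello! I am glad you could both make it.")
add_turn(history, "user", "So what is the first order of business?")

# This string can be passed to curl as the request body.
request_body = json.dumps({"contents": history}, indent=2)
print(request_body)
```

To continue the conversation, append the model's reply as a `model` turn, then the next `user` turn, and send the whole history again.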

In [29]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          { "text": "Hello" }
        ]
      },
      {
        "role": "model",
        "parts": [
          { "text": "Hello! I am glad you could both make it." }
        ]
      },
      {
        "role": "user",
        "parts": [
          { "text": "So what is the first order of business?" }
        ]
      }
    ]
  }'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1960    0  1562  100   398   1086    276  0:00:01  0:00:01 --:--:--  1363


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Since you asked, and without knowing any specific context, I'll make a suggestion for a general \"first order of business\":\n\n**Establishing the Purpose/Objective of the Meeting/Conversation.**\n\nBefore diving into details, it's crucial to clarify *why* we're talking. This helps ensure everyone is on the same page and that the conversation stays focused.  Some questions to consider are:\n\n*   **What do we hope to achieve by the end of this conversation?**\n*   **What problem are we trying to solve?**\n*   **What decisions need to be made?**\n*   **What information needs to be shared?**\n\nOnce we have a clear objective, we can then determine the best way to proceed.\n\nDoes that sound like a reasonable starting point, or is there something specific you had in mind? Let me know what you are planning to use me for.\n"
          }
        ]
      },
      "finishRe

### Function calling

Function calling lets you create a description of a function in your code, then pass that description to a language model in a request. This sample passes descriptions of functions that return information about where a movie is playing. The request includes several function declarations: `find_movies`, `find_theaters`, and `get_showtimes`. The model does not run these functions; it returns a `functionCall` part that names the function to call and the arguments to pass.

Learn more about [function calling](https://cloud.google.com/vertex-ai/docs/generative-ai/multimodal/function-calling).

In [30]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
  "contents": {
    "role": "user",
    "parts": {
      "text": "Which theaters in Mountain View show Barbie movie?"
    }
  },
  "tools": [
    {
      "function_declarations": [
        {
          "name": "find_movies",
          "description": "find movie titles currently playing in theaters based on any description, genre, title words, etc.",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              },
              "description": {
                "type": "string",
                "description": "Any kind of description including category or genre, title words, attributes, etc."
              }
            },
            "required": [
              "description"
            ]
          }
        },
        {
          "name": "find_theaters",
          "description": "find theaters based on location and optionally movie title which is currently playing in theaters",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              },
              "movie": {
                "type": "string",
                "description": "Any movie title"
              }
            },
            "required": [
              "location"
            ]
          }
        },
        {
          "name": "get_showtimes",
          "description": "Find the start times for movies playing in a specific theater",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              },
              "movie": {
                "type": "string",
                "description": "Any movie title"
              },
              "theater": {
                "type": "string",
                "description": "Name of theater"
              },
              "date": {
                "type": "string",
                "description": "Date for requested showtime"
              }
            },
            "required": [
              "location",
              "movie",
              "theater",
              "date"
            ]
          }
        }
      ]
    }
  ]
}'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3478    0   915  100  2563   2118   5932 --:--:-- --:--:-- --:--:--  8050


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "functionCall": {
              "name": "find_theaters",
              "args": {
                "location": "Mountain View, CA",
                "movie": "Barbie"
              }
            }
          }
        ]
      },
      "finishReason": "STOP",
      "avgLogprobs": -0.10945702682841908
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 191,
    "candidatesTokenCount": 11,
    "totalTokenCount": 202,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 191
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 11
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash-001",
  "createTime": "2025-04-14T08:40:34.582177Z",
  "responseId": "Asr8Z6HEI4GEjNsP0r6tmQw"
}
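
The response above contains a `functionCall` part, not text, so your code is expected to run the named function itself. The sketch below shows one way to dispatch such a call to a local Python function and wrap the result in a `functionResponse` part for a follow-up request. The `find_theaters` stub, its return value, the `LOCAL_FUNCTIONS` registry, and the `content` key inside `response` are all assumptions for illustration; no real theater data is involved.

```python
def find_theaters(location, movie=None):
    """Hypothetical stub; a real implementation would query a theater service."""
    return {"location": location, "movie": movie,
            "theaters": ["Example Cinema 1", "Example Cinema 2"]}


# Maps declared function names to local callables.
LOCAL_FUNCTIONS = {"find_theaters": find_theaters}


def handle_function_call(response):
    """Run the function requested by the model and build a functionResponse part."""
    part = response["candidates"][0]["content"]["parts"][0]
    call = part.get("functionCall")
    if call is None:
        return None  # The model answered with text, so there is nothing to run.
    result = LOCAL_FUNCTIONS[call["name"]](**call.get("args", {}))
    return {"functionResponse": {"name": call["name"],
                                 "response": {"content": result}}}


# Abbreviated copy of the recorded response above.
model_response = {
    "candidates": [{
        "content": {
            "role": "model",
            "parts": [{"functionCall": {
                "name": "find_theaters",
                "args": {"location": "Mountain View, CA", "movie": "Barbie"},
            }}],
        },
        "finishReason": "STOP",
    }]
}

reply_part = handle_function_call(model_response)
print(reply_part)
```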


## Multimodal input

Gemini 2.0 Flash (`gemini-2.0-flash-001`) is a multimodal model: you can include images and video in text or chat prompts, and the model returns a text response.


### Download an image from Google Cloud Storage

In [31]:
! gsutil cp "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg" ./image.jpg

Copying gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg...
/ [1 files][ 17.4 KiB/ 17.4 KiB]                                                
Operation completed over 1 objects/17.4 KiB.                                     


### Generate text from a local image

To include an image or video inline in the prompt, specify its [base64](https://en.wikipedia.org/wiki/Base64)-encoded data together with the `mime_type` field. The supported [MIME types](https://en.wikipedia.org/wiki/Media_type) for images include `image/png` and `image/jpeg`.

In [32]:
%%bash

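# Encode image data in base64 on a single line.
# NOTE: -w 0 (disable line wrapping) is a GNU coreutils option, so this
# command works on Linux but may need adjusting on other systems.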
data=$(base64 -w 0 image.jpg)

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d "{
      'contents': {
        'role': 'USER',
        'parts': [
          {
            'text': 'Is it a cat?'
          },
          {
            'inline_data': {
              'data': '${data}',
              'mime_type':'image/jpeg'
            }
          }
        ]
       }
     }"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 24951    0   857  100 24094   1252  35225 --:--:-- --:--:-- --:--:-- 36478


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Yes, it is a cat. It appears to be a brown tabby cat.\n"
          }
        ]
      },
      "finishReason": "STOP",
      "avgLogprobs": -0.12815998582278981
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 263,
    "candidatesTokenCount": 17,
    "totalTokenCount": 280,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
      {
        "modality": "IMAGE",
        "tokenCount": 258
      },
      {
        "modality": "TEXT",
        "tokenCount": 5
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 17
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash-001",
  "createTime": "2025-04-14T08:40:38.499167Z",
  "responseId": "Bsr8Z9-7HoGEjNsP0r6tmQw"
}
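
Because the `base64 -w 0` command above depends on GNU coreutils, a portable alternative is to build the request body in Python. The sketch below assumes the same `inline_data` shape used in the cURL request. The helper `inline_image_part` is hypothetical, and the bytes are a stand-in, not a real JPEG; in the notebook you would read `image.jpg` instead.

```python
import base64
import json


def inline_image_part(image_bytes, mime_type="image/jpeg"):
    """Build an inline_data part from raw image bytes."""
    # b64encode produces a single line with no wrapping, like `base64 -w 0`.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"inline_data": {"data": encoded, "mime_type": mime_type}}


# Stand-in bytes for illustration; replace with open("image.jpg", "rb").read().
fake_image = b"\xff\xd8\xff\xe0 placeholder bytes, not a real JPEG"

body = {
    "contents": {
        "role": "user",
        "parts": [{"text": "Is it a cat?"}, inline_image_part(fake_image)],
    }
}

# json.dumps emits strictly valid JSON with double-quoted keys and strings.
payload = json.dumps(body)
print(len(payload))
```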


### Generate text from an image on Google Cloud Storage

Specify the Cloud Storage URI of the image to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that's sending the request. You must also specify the `mime_type` field. The supported image MIME types include `image/png` and `image/jpeg`.
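
Since `mime_type` has to match the file, it can help to derive it from the file extension instead of typing it by hand. The sketch below is one way to do that for the MIME types named in this notebook. The helper `file_data_part` and the `MIME_BY_EXTENSION` table are assumptions for illustration, and the table is deliberately limited; it is not a list of everything the API supports.

```python
import posixpath

# Only the MIME types mentioned in this notebook; extend as needed.
MIME_BY_EXTENSION = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".mp4": "video/mp4",
}


def file_data_part(file_uri):
    """Build a file_data part, inferring mime_type from the URI's extension."""
    extension = posixpath.splitext(file_uri)[1].lower()
    if extension not in MIME_BY_EXTENSION:
        raise ValueError(f"no MIME type known for extension {extension!r}")
    return {"file_data": {"mime_type": MIME_BY_EXTENSION[extension],
                          "file_uri": file_uri}}


part = file_data_part(
    "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
)
print(part)
```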

In [33]:
%%bash

MODEL_ID="gemini-2.0-flash-001"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": [
        {
          "text": "Describe this image"
        },
        {
          "file_data": {
            "mime_type": "image/jpeg",
            "file_uri": "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
          }
        }
      ]
    },
    "generation_config": {
      "temperature": 0.2,
      "top_p": 0.1,
      "top_k": 16,
      "max_output_tokens": 2048,
      "candidate_count": 1,
      "stop_sequences": []
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3253    0  2604  100   649   1455    362  0:00:01  0:00:01 --:--:--  1817


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Here's a description of the image:\n\n**Overall Impression:**\n\nThe image shows a tabby cat standing in the snow. The cat is alert and looking directly at the viewer.\n\n**Details:**\n\n*   **Subject:** A medium-sized tabby cat.\n*   **Coat:** The cat has a brown and black tabby coat with distinct stripes.\n*   **Eyes:** The cat has yellow-green eyes.\n*   **Pose:** The cat is standing with one paw slightly raised, as if in mid-step. Its tail is slightly curved.\n*   **Background:** The background is a snowy landscape. The snow appears to be relatively fresh and undisturbed.\n*   **Lighting:** The lighting is bright and natural, casting soft shadows.\n*   **Composition:** The cat is positioned slightly off-center, drawing the viewer's eye. The background is blurred, emphasizing the cat as the main subject."
          }
        ]
      },
      "finishReason": "STOP

### Generate text from a video file

Specify the Cloud Storage URI of the video to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that's sending the request. You must also specify the `mime_type` field. The supported MIME types for video include `video/mp4`.
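
The prompt in the next cell asks for a JSON answer, and the recorded response wraps that JSON in a Markdown code fence inside the `text` field. As a minimal sketch, assuming the answer arrives in that fenced form, the fence can be stripped before parsing. The helper `parse_fenced_json` is hypothetical, and `model_text` is copied from the recorded output.

```python
import json
import re

# Three backticks, built programmatically so this example nests in Markdown.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_fenced_json(text):
    """Parse JSON from model text, removing a surrounding code fence if present."""
    match = FENCED_BLOCK.search(text)
    raw = match.group(1) if match else text
    return json.loads(raw)


model_text = (
    FENCE + "json\n"
    '{\n  "profession": "photographer",\n  "phone_features": [\n'
    '    "Video Boost feature",\n'
    '    "Night Sight activation in low light for better quality"\n'
    '  ],\n  "city": "Tokyo"\n}\n' + FENCE
)

answer = parse_fenced_json(model_text)
print(answer["city"])
```

Text that is already plain JSON passes through unchanged, and anything that is not valid JSON raises `json.JSONDecodeError`, which the caller should be prepared to handle.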


In [34]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d \
'{
    "contents": {
      "role": "USER",
      "parts": [
        {
          "text": "Answer the following questions using the video only. What is the profession of the main person? What are the main features of the phone highlighted? Which city was this recorded in? Provide the answer in JSON."
        },
        {
          "file_data": {
            "mime_type": "video/mp4",
            "file_uri": "gs://github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4"
          }
        }
      ]
    }
  }'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1599    0  1085  100   514    310    147  0:00:03  0:00:03 --:--:--   458


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "```json\n{\n  \"profession\": \"photographer\",\n  \"phone_features\": [\n    \"Video Boost feature\",\n    \"Night Sight activation in low light for better quality\"\n  ],\n  \"city\": \"Tokyo\"\n}\n```"
          }
        ]
      },
      "finishReason": "STOP",
      "avgLogprobs": -0.16299362182617189
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 16285,
    "candidatesTokenCount": 55,
    "totalTokenCount": 16340,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
      {
        "modality": "AUDIO",
        "tokenCount": 1425
      },
      {
        "modality": "VIDEO",
        "tokenCount": 14820
      },
      {
        "modality": "TEXT",
        "tokenCount": 40
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 55
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash-001",
  "cre