## Getting Started with the Vertex AI Gemini API with cURL

### Vertex AI Gemini API
The Vertex AI Gemini API provides a unified interface for interacting with Gemini models. There are currently two models available in the Gemini API:

- **Gemini Pro model** (gemini-pro): Designed to handle natural language tasks, multiturn text and code chat, and code generation.
    - Generate text from text prompts.
    - Explore various features and configuration options.
- **Gemini Pro Vision model** (gemini-pro-vision): Supports multimodal prompts. You can include text, images, and video in your prompt requests and get text or code responses.
    - Generate text from image and text prompts.
    - Generate text from video.

In [1]:
! pip3 install --upgrade --user --quiet google-cloud-aiplatform

[0m

In [2]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

### Task 2: Use the Gemini Pro Model

The Gemini Pro (`gemini-pro`) model is tailored for natural language tasks such as classification, summarization, extraction, and writing.

In [1]:
PROJECT_ID = "qwiklabs-gcp-01-a5f1d4a80a31"
LOCATION = "us-central1"

In [2]:
import os

os.environ["PROJECT_ID"] = PROJECT_ID
os.environ["LOCATION"] = LOCATION
os.environ["API_ENDPOINT"] = f"{LOCATION}-aiplatform.googleapis.com"

#### Generate content

The generateContent method can handle a wide variety of use cases, including multi-turn chat and multimodal input, depending on what the underlying model supports.

In [17]:
%%bash

MODEL_ID="gemini-1.5-pro"

curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": { "text": "Why is the sky blue?" }
    }
  }'

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "The sky appears blue due to a phenomenon called **Rayleigh scattering**. Here's a simplified explanation:\n\n1. **Sunlight Enters the Atmosphere:** Sunlight, which appears white, is actually made up of all the colors of the rainbow.\n\n2. **Scattering of Light:** When sunlight enters Earth's atmosphere, it collides with tiny air molecules (mostly nitrogen and oxygen). This causes the light to scatter in different directions.\n\n3. **Shorter Wavelengths Scatter More:** Blue and violet light have shorter wavelengths compared to other colors in the visible spectrum. This means they are scattered more strongly by the air molecules than longer wavelengths like red and orange.\n\n4. **Our Eyes' Perception:** While violet light is scattered even more than blue, our eyes are more sensitive to blue light.  Therefore, we perceive the sky as blue because the scattered blue lig

#### Streaming

Send a text prompt to the model. The Gemini Pro (`gemini-pro`) model provides a streaming response mechanism. With this approach, we don't need to wait for the complete response; we can start processing fragments as soon as they're accessible.

In [18]:
%%bash

MODEL_ID="gemini-1.5-pro"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": { "text": "Why is the sky blue?" }
    }
  }'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed


[{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "The"
          }
        ]
      }
    }
  ],
  "usageMetadata": {},
  "modelVersion": "gemini-1.5-pro-001"
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " sky appears blue due to a phenomenon called **Rayleigh scattering**. Here's"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.08886719,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.026000977
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.23925781,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.103515625
        },
        {
         

100 11523    0 11422  100   101   2052     18  0:00:05  0:00:05 --:--:--  1834


          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.12890625,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.025512695
        }
      ]
    }
  ],
  "modelVersion": "gemini-1.5-pro-001"
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "\n\n* When the sun is near the horizon, sunlight has to travel through a much thicker layer of the atmosphere to reach us. \n* The longer path through the atmosphere means most of the blue and green light is scattered away before it reaches our eyes. \n* This leaves the longer wavelength colors like red,"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.06298828,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore

### Model parameters

Every prompt you send to the model includes parameter values that control how the model generates a response. The model can generate different results for different parameter values. You can experiment with different model parameters to see how the results change.

In [19]:
%%bash

MODEL_ID="gemini-1.5-pro"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": [
        {"text": "Describe this image"},
        {"file_data": {
          "mime_type": "image/png",
          "file_uri": "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
        }}
      ]
    },
    "generation_config": {
      "temperature": 0.2,
      "top_p": 0.1,
      "top_k": 16,
      "max_output_tokens": 2048,
      "candidate_count": 1,
      "stop_sequences": []
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2301    0  1698  100   603    968    343  0:00:01  0:00:01 --:--:--  1312


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "A brown tabby cat is carefully walking across a snow-covered surface. The cat has bright yellow eyes and a slightly concerned expression. Its tail is held high, and its fur looks thick and warm against the cold snow. The background is blurred, emphasizing the cat as the focus of the image. \n"
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.018798828,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.055908203
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.057373047,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.057373047
        },
     

#### Chat

The Gemini Pro model supports natural multi-turn conversations and is ideal for text tasks that require back-and-forth interactions.

We should specify the `role` field only if the content represents a turn in a conversation. You can set `role` to one of the following values: `user`, `model`.

In [20]:
%%bash

MODEL_ID="gemini-1.5-pro"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          { "text": "Hello" }
        ]
      },
      {
        "role": "model",
        "parts": [
          { "text": "Hello! I am glad you could both make it." }
        ]
      },
      {
        "role": "user",
        "parts": [
          { "text": "So what is the first order of business?" }
        ]
      }
    ]
  }'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2000    0  1602  100   398    934    232  0:00:01  0:00:01 --:--:--  1167


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "I'm ready for anything!  Do *you* have something in mind you'd like to start with? I'm happy to answer questions, write stories, translate languages, and much more. 😊 \n\nWhat can I do for you today? \n"
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.040771484,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.017456055
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.122558594,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.083984375
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE

#### Function calling

Function calling lets you create a description of a function in their code, then pass that description to a language model in a request. This is an example of passing in a description of a function that returns information about where a movie is playing. Several function declarations are included in the request, such as `find_movies` and `find_theaters`.

In [21]:
%%bash

MODEL_ID="gemini-1.5-pro"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
  "contents": {
    "role": "user",
    "parts": {
      "text": "Which theaters in Mountain View show Barbie movie?"
    }
  },
  "tools": [
    {
      "function_declarations": [
        {
          "name": "find_movies",
          "description": "find movie titles currently playing in theaters based on any description, genre, title words, etc.",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              },
              "description": {
                "type": "string",
                "description": "Any kind of description including category or genre, title words, attributes, etc."
              }
            },
            "required": [
              "description"
            ]
          }
        },
        {
          "name": "find_theaters",
          "description": "find theaters based on location and optionally movie title which are is currently playing in theaters",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              },
              "movie": {
                "type": "string",
                "description": "Any movie title"
              }
            },
            "required": [
              "location"
            ]
          }
        },
        {
          "name": "get_showtimes",
          "description": "Find the start times for movies playing in a specific theater",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              },
              "movie": {
                "type": "string",
                "description": "Any movie title"
              },
              "theater": {
                "type": "string",
                "description": "Name of theater"
              },
              "date": {
                "type": "string",
                "description": "Date for requested showtime"
              }
            },
            "required": [
              "location",
              "movie",
              "theater",
              "date"
            ]
          }
        }
      ]
    }
  ]
}'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4150    0  1587  100  2563   1572   2540  0:00:01  0:00:01 --:--:--  4112


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "functionCall": {
              "name": "find_theaters",
              "args": {
                "location": "Mountain View, CA",
                "movie": "Barbie"
              }
            }
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.15722656,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.12158203
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.13671875,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.17871094
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabili

### Task 3. Use the Gemini Pro Vision Model

The Gemini Pro Vision (`gemini-pro-vision`) is a multimodal model that supports adding image and video in text or chat prompts for a text response.

**Note**: Text-only prompts are not supported by the Gemini Pro Vision model. Instead, use the Gemini Pro model for text-only prompts.

In [14]:
! gsutil cp "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg" ./image.jpg

Copying gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg...
/ [1 files][ 17.4 KiB/ 17.4 KiB]                                                
Operation completed over 1 objects/17.4 KiB.                                     


#### Generate text from a local image

Specify the [base64](https://en.wikipedia.org/wiki/Base64) encoding of the image or video to include inline in the prompt and the mime_type field. The supported [MIME](https://en.wikipedia.org/wiki/Media_type) types for images include `image/png` and `image/jpeg`.

In [25]:
%%bash

MODEL_ID="gemini-1.5-pro"

# Encode image data in base64
# NOTE: This command only works on Linux.
data=$(base64 -w 0 image.jpg)

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${model_}:generateContent \
  -d "{
      'contents': {
        'role': 'USER',
        'parts': [
          {
            'text': 'Is it a cat?'
          },
          {
            'inline_data': {
              'data': '${data}',
              'mime_type':'image/jpeg'
            }
          }
        ]
      }
    }"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 24092    0     0  100 24092      0   784k --:--:-- --:--:-- --:--:--  784k


#### Generate text from an image on Google Cloud Storage

Specify the Cloud Storage URI of the image to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that's sending the request. You must also specify the `mime_type` field. The supported image MIME types include `image/png` and `image/jpeg`.

In [23]:
%%bash

MODEL_ID="gemini-1.5-pro"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": [
        {
          "text": "Describe this image"
        },
        {
          "file_data": {
            "mime_type": "image/png",
            "file_uri": "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
          }
        }
      ]
    },
    "generation_config": {
      "temperature": 0.2,
      "top_p": 0.1,
      "top_k": 16,
      "max_output_tokens": 2048,
      "candidate_count": 1,
      "stop_sequences": []
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2384    0  1735  100   649    940    351  0:00:01  0:00:01 --:--:--  1292


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "A brown tabby cat is carefully walking across a snow-covered surface. The cat has bright yellow eyes and a slightly concerned expression. Its tail is held high, and its fur looks thick and warm against the cold snow. The background is blurry, suggesting a focus on the cat and its cautious movement through the wintery landscape. \n"
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.01586914,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.05419922
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.06542969,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severity

#### Generate text from a video file

Specify the Cloud Storage URI of the video to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that's sending the request. You must also specify the `mime_type` field. The supported MIME types for video include `video/mp4`.

In [26]:
%%bash

MODEL_ID="gemini-1.5-pro"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d \
'{
    "contents": {
      "role": "USER",
      "parts": [
        {
          "text": "Answer the following questions using the video only. What is the profession of the main person? What are the main features of the phone highlighted?Which city was this recorded in?Provide the answer JSON."
        },
        {
          "file_data": {
            "mime_type": "video/mp4",
            "file_uri": "gs://github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4"
          }
        }
      ]
    }
  }'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2062    0  1548  100   514    324    107  0:00:04  0:00:04 --:--:--   433


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "{\"profession\": \"Photographer\", \"phone features\": \"Night Sight activates in low light to make the video quality better.\", \"city\": \"Tokyo\"}"
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.10253906,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.06738281
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.09423828,
          "severity": "HARM_SEVERITY_LOW",
          "severityScore": 0.22851563
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.10107422,
          "severity