## Getting Started

### Install required libraries

In [None]:
%%capture

!sudo apt install -q -y jq

### Restart current runtime

To use the newly installed packages, restart the Jupyter runtime by running the cell below, which restarts the current kernel.

In [None]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>

### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment.

This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).

In [None]:
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project

To get started using Vertex AI, set the project ID and region in the cell below. The organizers will provide these values for connecting to the API.

In [None]:
# Define project information
PROJECT_ID = "Project_id"  # @param {type:"string"}
LOCATION = "Region"  # @param {type:"string"}

# Import libraries
import os

## Use the Gemini 2.0 Flash model

In [None]:
MODEL_ID = "gemini-2.0-flash"
API_HOST = f"{LOCATION}-aiplatform.googleapis.com"

os.environ["API_ENDPOINT"] = (
    f"{API_HOST}/v1/projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{MODEL_ID}"
)
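The endpoint string above is easy to get wrong when the placeholders are still in place. As a minimal sketch, the same construction can be wrapped in a small function that rejects empty or space-containing values. The function name `build_api_endpoint` and the sample values `my-project` and `us-central1` are assumptions for illustration, not part of the notebook.

```python
def build_api_endpoint(project_id: str, location: str, model_id: str) -> str:
    """Return the model endpoint (without scheme), built as in the cell above."""
    for name, value in (
        ("project_id", project_id),
        ("location", location),
        ("model_id", model_id),
    ):
        # Catch unset or obviously malformed values early.
        if not value or " " in value:
            raise ValueError(f"{name} is empty or contains spaces: {value!r}")
    api_host = f"{location}-aiplatform.googleapis.com"
    return (
        f"{api_host}/v1/projects/{project_id}/locations/{location}"
        f"/publishers/google/models/{model_id}"
    )


# Hypothetical values, for illustration only.
endpoint = build_api_endpoint("my-project", "us-central1", "gemini-2.0-flash")
print(f"https://{endpoint}:generateContent")
```

The returned string has the same shape as the `API_ENDPOINT` environment variable, so it could be assigned to `os.environ["API_ENDPOINT"]` in place of the inline f-string.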

## Text generation

The `generateContent` method can handle a wide variety of use cases, including multi-turn chat and multimodal input, depending on what the underlying model supports. In this example, you send a text prompt and request the model response in text.

In [None]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}:generateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": { "text": "Why is the sky blue?" }
    },
    "generation_config": {
      "response_modalities": ["TEXT"]
    }
  }' 2>/dev/null >response.json

jq -r ".candidates[].content.parts[].text" response.json

The sky appears blue due to a phenomenon called **Rayleigh scattering**. Here's the breakdown:

*   **Sunlight and its Colors:** Sunlight, which appears white, is actually made up of all the colors of the rainbow. Each color has a different wavelength, with violet and blue having the shortest wavelengths, and red having the longest.

*   **The Atmosphere and Molecules:** The Earth's atmosphere is filled with tiny particles of nitrogen and oxygen molecules, as well as other small particles.

*   **Rayleigh Scattering:** When sunlight enters the atmosphere, it collides with these tiny air molecules. This collision causes the light to scatter in different directions. The amount of scattering depends on the wavelength of the light. Shorter wavelengths (blue and violet) are scattered much more strongly than longer wavelengths (red and orange).

*   **Why Blue and Not Violet?** While violet light is scattered more than blue light, our eyes are more sensitive to blue. Also, the sun emits slig
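The same request can also be assembled from Python instead of `curl`. The sketch below uses only the standard library. The helper names (`get_access_token`, `build_request`, `extract_text`) are assumptions introduced here, and `extract_text` only approximates the `jq` filter `.candidates[].content.parts[].text` by skipping parts that carry no `text` field.

```python
import json
import subprocess
import urllib.request


def get_access_token() -> str:
    """Fetch a token the same way the curl command does (requires gcloud)."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def build_request(endpoint: str, token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    body = {
        "contents": {"role": "user", "parts": [{"text": prompt}]},
        "generation_config": {"response_modalities": ["TEXT"]},
    }
    return urllib.request.Request(
        f"https://{endpoint}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Join the text of every part of every candidate, one per line."""
    return "\n".join(
        part["text"]
        for candidate in response.get("candidates", [])
        for part in candidate.get("content", {}).get("parts", [])
        if "text" in part
    )


# To send the request (needs network access and credentials):
# request = build_request(os.environ["API_ENDPOINT"], get_access_token(), "Why is the sky blue?")
# with urllib.request.urlopen(request) as resp:
#     print(extract_text(json.load(resp)))
```

Building the body with `json.dumps` avoids hand-written JSON, so quoting and trailing-comma mistakes cannot reach the API.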

### Model parameters

Every prompt you send to the model includes parameter values that control how the model generates a response. The model can generate different results for different parameter values. You can experiment with different model parameters to see how the results change.
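To build intuition for parameters such as `temperature`, `top_k`, and `top_p`, the toy function below applies them to a hand-written set of token scores. This illustrates the general sampling idea only. It is not Gemini's actual decoder, and the order of the filtering steps, the function name, and the sample logits are assumptions made for this sketch.

```python
import math


def filtered_distribution(logits, temperature=1.0, top_k=None, top_p=1.0):
    """Return {token: probability} after temperature, top-k, and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this toy example")
    # Temperature scaling: lower values sharpen the distribution.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(val - peak) for tok, val in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, w / total) for tok, w in weights.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


# Hypothetical scores for three candidate tokens.
toy_logits = {"blue": 2.0, "grey": 1.0, "green": 0.0}
print(filtered_distribution(toy_logits, temperature=1.0))
print(filtered_distribution(toy_logits, temperature=0.2, top_k=16, top_p=0.1))
```

With a low `temperature` and a small `top_p`, almost all of the probability mass lands on the single most likely token, which is why such settings tend to give more repeatable output.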

In [None]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}:generateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": [
        {"text": "Tell me a story."}
      ]
    },
    "generation_config": {
      "temperature": 0.2,
      "top_p": 0.1,
      "top_k": 16,
      "max_output_tokens": 2048,
      "candidate_count": 1,
      "stop_sequences": []
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }' 2>/dev/null >response.json

jq -r ".candidates[].content.parts[].text" response.json

The lighthouse keeper, Silas, was a man woven from the sea itself. His skin was tanned and weathered like driftwood, his eyes the grey-green of a storm-tossed wave. He'd spent thirty years tending the beacon on the jagged, isolated Isle of Aethel, a lonely sentinel against the unforgiving Atlantic.

Silas wasn't lonely, though. He had the gulls for company, the rhythmic crash of the waves, and the stories whispered on the wind. He knew the sea's moods intimately, its gentle caress and its furious rage. He knew the names of the constellations, the migratory patterns of the birds, and the secrets hidden in the tide pools.

One day, a storm unlike any he'd ever witnessed descended upon Aethel. The wind howled like a banshee, tearing at the lighthouse walls. Waves, mountains of black water, crashed against the tower, threatening to swallow it whole. Silas, his face grim, fought to keep the lamp burning, its beam a defiant finger pointing into the abyss.

Amidst the chaos, he saw something 

### Chat

The Gemini API supports natural multi-turn conversations and is ideal for text tasks that require back-and-forth interactions.

Specify the `role` field only if the content represents a turn in a conversation. You can set `role` to one of the following values: `user`, `model`.
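Because each request carries the whole conversation, it can help to keep the history in a list and append turns as they happen. The following is a minimal sketch of that bookkeeping. The helper names `add_turn` and `chat_body` are hypothetical, and the role check simply mirrors the two values listed above.

```python
ALLOWED_ROLES = ("user", "model")


def add_turn(history: list, role: str, text: str) -> list:
    """Return a new history list with one more conversation turn appended."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"role must be one of {ALLOWED_ROLES}, got {role!r}")
    return history + [{"role": role, "parts": [{"text": text}]}]


def chat_body(history: list) -> dict:
    """Wrap the accumulated turns in a generateContent request body."""
    return {"contents": history}


history = []
history = add_turn(history, "user", "Hello")
history = add_turn(history, "model", "Hello! I am glad you could both make it.")
history = add_turn(history, "user", "So what is the first order of business?")
print(chat_body(history))
```

After each response, the model's reply would be appended with `role` set to `model` before the next user turn is added.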

In [None]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}:generateContent \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          { "text": "Hello" }
        ]
      },
      {
        "role": "model",
        "parts": [
          { "text": "Hello! I am glad you could both make it." }
        ]
      },
      {
        "role": "user",
        "parts": [
          { "text": "So what is the first order of business?" }
        ]
      }
    ]
  }' 2>/dev/null >response.json

jq -r ".candidates[].content.parts[].text" response.json

Alright, let's get down to business! To give you the most relevant answer, I need a little more context.  What are we meeting about?  What kind of business are we talking about here?

For example, are we:

*   **Starting a project?** We might need to define goals and assign tasks.
*   **Planning an event?** We could discuss venue, date, and budget.
*   **Having a general chat?** We could talk about anything!
*   **Troubleshooting a problem?** We'd need to define the issue clearly.
*   **Working on a software project?** Perhaps we should discuss user stories.
*   **Running a meeting for a company?** Perhaps we need to ratify some agenda.

Once I know the context, I can help you determine the most important first step. Give me some more information!



## Multimodal input

Gemini is a multimodal model: you can include images and video alongside text or chat prompts and receive a text response.


### Download an image from Google Cloud Storage

In [None]:
! gsutil cp "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg" ./image.jpg

Copying gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg...
/ [1 files][ 17.4 KiB/ 17.4 KiB]                                                
Operation completed over 1 objects/17.4 KiB.                                     


### Generate text from a local image

Specify the [base64](https://en.wikipedia.org/wiki/Base64) encoding of the image or video to include inline in the prompt and the `mime_type` field. The supported [MIME types](https://en.wikipedia.org/wiki/Media_type) for images include `image/png` and `image/jpeg`.
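As a sketch of what the inline payload looks like, the helper below base64-encodes raw image bytes and pairs them with a MIME type chosen from the file extension. The function name `inline_image_part` is an assumption, and the extension table covers only the two image types named above.

```python
import base64
import os

# Only the image types mentioned in this section.
IMAGE_MIME_TYPES = {".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg"}


def inline_image_part(image_bytes: bytes, filename: str) -> dict:
    """Build an `inline_data` part from raw bytes and a file name."""
    suffix = os.path.splitext(filename)[1].lower()
    if suffix not in IMAGE_MIME_TYPES:
        raise ValueError(f"unsupported image extension: {suffix!r}")
    return {
        "inline_data": {
            "mime_type": IMAGE_MIME_TYPES[suffix],
            # b64encode never inserts line breaks, so no wrapping flag is needed.
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    }


# Placeholder bytes stand in for a real image file here.
part = inline_image_part(b"hello", "image.jpg")
print(part)
```

For a real file, the bytes would come from `open("image.jpg", "rb").read()`, and the resulting part would sit next to a `text` part inside `contents.parts`.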

In [None]:
%%bash

# Encode image data in base64
image_file="image.jpg"
if [[ -f "$image_file" ]]; then
  if command -v base64 &> /dev/null; then
    # base64 is available
    if [[ "$(uname -s)" == "Darwin" ]]; then
      # macOS -b 0 to avoid line wrapping
      data=$(base64 -b 0 -i "$image_file")
    else
      # Linux -w 0 to avoid line wrapping
      data=$(base64 -w 0 "$image_file")
    fi
  else
    echo "Error: base64 command not found."
    exit 1
  fi
else
  echo "Error: Image file '$image_file' not found."
  exit 1
fi

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}:generateContent \
  -d "{
      'contents': {
        'role': 'USER',
        'parts': [
          {
            'text': 'Is it a cat?'
          },
          {
            'inline_data': {
              'data': '${data}',
              'mime_type':'image/jpeg'
            }
          }
        ]
       }
    }" 2>/dev/null >response.json

jq -r ".candidates[].content.parts[].text" response.json

Yes, the image shows a cat. It's a tabby cat with a striped coat.



### Generate text from an image on Google Cloud Storage

Specify the Cloud Storage URI of the image to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that's sending the request. You must also specify the `mime_type` field. The supported image MIME types include `image/png` and `image/jpeg`.
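A mismatch between the file extension and the `mime_type` field is an easy mistake to make by hand. One way to avoid it, sketched below, is to derive the MIME type from the URI itself. The helper name `file_data_part` is hypothetical, and the extension table lists only the types this notebook mentions.

```python
import posixpath

# Only the MIME types mentioned in this notebook.
MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".mp4": "video/mp4",
}


def file_data_part(gcs_uri: str) -> dict:
    """Build a `file_data` part whose MIME type matches the URI's extension."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {gcs_uri!r}")
    suffix = posixpath.splitext(gcs_uri)[1].lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"unsupported file extension: {suffix!r}")
    return {"file_data": {"mime_type": MIME_TYPES[suffix], "file_uri": gcs_uri}}


image_part = file_data_part(
    "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
)
print(image_part)
```

The same helper would also cover a video URI ending in `.mp4`, returning `video/mp4` as the MIME type.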

In [None]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}:generateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": [
        {
          "text": "Describe this image"
        },
        {
          "file_data": {
            "mime_type": "image/jpeg",
            "file_uri": "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
          }
        }
      ]
    },
    "generation_config": {
      "temperature": 0.2,
      "top_p": 0.1,
      "top_k": 16,
      "max_output_tokens": 2048,
      "candidate_count": 1,
      "stop_sequences": []
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }' 2>/dev/null >response.json

jq -r ".candidates[].content.parts[].text" response.json

Here's a description of the image:

**Overall Impression:**

The image shows a tabby cat standing in the snow. The cat is the main subject and is in focus, while the snowy background is slightly blurred.

**Cat's Appearance:**

*   **Coat:** The cat has a classic tabby coat pattern, with dark brown or black stripes on a lighter brown background.
*   **Eyes:** The cat has yellow or golden eyes.
*   **Pose:** The cat is standing with one paw slightly raised, as if it's about to take a step. It's looking directly at the viewer with a curious or alert expression.
*   **Build:** The cat appears to be of average build, neither overly thin nor overweight.

**Background:**

*   The background is entirely snow-covered.
*   There are some subtle textures and variations in the snow, suggesting it might be a path or a field.
*   The background is out of focus, which helps to emphasize the cat as the main subject.

**Overall Tone:**

The image has a natural and slightly cold feel due to the snow. T

### Generate text from a video file

Specify the Cloud Storage URI of the video to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that's sending the request. You must also specify the `mime_type` field. The supported MIME types for video include `video/mp4`.


In [None]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}:generateContent \
  -d \
'{
    "contents": {
      "role": "USER",
      "parts": [
        {
          "text": "Answer the following questions using the video only. What is the profession of the main person? What are the main features of the phone highlighted? Which city was this recorded in?"
        },
        {
          "file_data": {
            "mime_type": "video/mp4",
            "file_uri": "gs://github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4"
          }
        }
      ]
    }
  }' 2>/dev/null >response.json

jq -r ".candidates[].content.parts[].text" response.json

Okay, here's the answer to your questions based on the video:

*   **Profession of the main person:** Photographer
*   **Main features of the phone highlighted:** Video Boost with Night Sight (to make the quality even better)
*   **City recorded in:** Tokyo

I hope this answers your questions!


### Code Execution

The Gemini API code execution feature enables the model to generate and run Python code and learn iteratively from the results until it arrives at a final output.
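A code-execution response interleaves three kinds of parts, as the output later in this section shows: `text`, `executableCode`, and `codeExecutionResult`. The sketch below sorts a list of such parts into labelled entries. The function name `summarize_parts` and the abbreviated sample data are assumptions. Only the field names come from the printed output.

```python
def summarize_parts(parts: list) -> list:
    """Turn response parts into (kind, content) tuples for easier reading."""
    summary = []
    for part in parts:
        if "text" in part:
            summary.append(("text", part["text"].strip()))
        elif "executableCode" in part:
            # Code the model generated and asked the service to run.
            summary.append(("code", part["executableCode"]["code"].strip()))
        elif "codeExecutionResult" in part:
            result = part["codeExecutionResult"]
            summary.append(
                ("result", f"{result['outcome']}: {result['output'].strip()}")
            )
    return summary


# Abbreviated sample shaped like the output in this section.
sample_parts = [
    {"text": "Okay, I can do that.\n\n"},
    {"executableCode": {"language": "PYTHON", "code": "print(f'{fib_20=}')\n"}},
    {"codeExecutionResult": {"outcome": "OUTCOME_OK", "output": "fib_20=6765\n"}},
]
for kind, content in summarize_parts(sample_parts):
    print(f"[{kind}] {content}")
```

With a real response, the input would be the `parts` list of a candidate, for example `response["candidates"][0]["content"]["parts"]` after loading `response.json`.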

In [None]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}:generateContent \
  -d '{
  "contents": {
    "role": "user",
    "parts": {
      "text": "Calculate 20th fibonacci number. Then find the nearest palindrome to it."
    }
  },
  "tools": [
      {"code_execution": {}}
  ]
}' 2>/dev/null >response.json

jq -r ".candidates[].content.parts[]" response.json

{
  "text": "Okay, I can do that. First, I'll calculate the 20th Fibonacci number, and then I'll find the nearest palindrome.\n\n"
}
{
  "executableCode": {
    "language": "PYTHON",
    "code": "def fibonacci(n):\n    if n <= 0:\n        return 0\n    elif n == 1:\n        return 1\n    else:\n        a, b = 0, 1\n        for _ in range(2, n + 1):\n            a, b = b, a + b\n        return b\n\nfib_20 = fibonacci(20)\nprint(f'{fib_20=}')\n"
  }
}
{
  "codeExecutionResult": {
    "outcome": "OUTCOME_OK",
    "output": "fib_20=6765\n"
  }
}
{
  "text": "The 20th Fibonacci number is 6765. Now, let's find the nearest palindrome to 6765. I'll check palindromes above and below it.\n\nThe closest palindrome smaller than 6765 is 6776 in reverse which is 6776. Let's see if there's one closer. We can try generating some palindromes around 6765. Numbers of the form 6XX6 and 6XXX6 should be considered.\n\nLet's check 6666 and 6886, which are clearly closer. Then, we'll find the absolute differe