In [1]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Intro to Grounding with Gemini in Vertex AI

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/grounding/intro-grounding-gemini.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fgrounding%2Fintro-grounding-gemini.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Run in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/grounding/intro-grounding-gemini.ipynb">
      <img width="32px" src="https://www.svgrepo.com/download/217753/github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/grounding/intro-grounding-gemini.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://goo.gle/4jeQyFS">
      <img width="32px" src="https://cdn.qwiklabs.com/assets/gcp_cloud-e3a77215f0b8bfa9b3f611c0d2208c7e8708ed31.svg" alt="Google Cloud logo"><br> Open in  Cloud Skills Boost
    </a>
  </td>
</table>

<div style="clear: both;"></div>

<b>Share to:</b>

<a href="https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/grounding/intro-grounding-gemini.ipynb" target="_blank">
  <img width="20px" src="https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg" alt="LinkedIn logo">
</a>

<a href="https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/grounding/intro-grounding-gemini.ipynb" target="_blank">
  <img width="20px" src="https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg" alt="Bluesky logo">
</a>

<a href="https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/grounding/intro-grounding-gemini.ipynb" target="_blank">
  <img width="20px" src="https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg" alt="X logo">
</a>

<a href="https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/grounding/intro-grounding-gemini.ipynb" target="_blank">
  <img width="20px" src="https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png" alt="Reddit logo">
</a>

<a href="https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/grounding/intro-grounding-gemini.ipynb" target="_blank">
  <img width="20px" src="https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg" alt="Facebook logo">
</a>            

| | |
|-|-|
|Author(s) | [Holt Skinner](https://github.com/holtskinner), [Kristopher Overholt](https://github.com/koverholt) |

## Overview

**YouTube Video: Introduction to grounding with Gemini on Vertex AI**

<a href="https://www.youtube.com/watch?v=Ph0g6dnsB4g&list=PLIivdWyY5sqJio2yeg1dlfILOUO2FoFRx" target="_blank">
  <img src="https://img.youtube.com/vi/Ph0g6dnsB4g/maxresdefault.jpg" alt="Introduction to grounding with Gemini on Vertex AI" width="500">
</a>

[Grounding in Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/ground-gemini) lets you use generative text models to generate content grounded in your own documents and data. This capability lets the model access information at runtime that goes beyond its training data. By grounding model responses in Google Search results or data stores within [Vertex AI Search](https://cloud.google.com/generative-ai-app-builder/docs/enterprise-search-introduction), LLMs that are grounded in data can produce more accurate, up-to-date, and relevant responses.

Grounding provides the following benefits:

- Reduces model hallucinations (instances where the model generates content that isn't factual)
- Anchors model responses to specific information, documents, and data sources
- Enhances the trustworthiness, accuracy, and applicability of the generated content

You can configure two different sources of grounding in Vertex AI:

1. Google Search results for data that is publicly available and indexed.
   - If you use this service in a production application, you will also need to [use a Google Search entry point](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/grounding-search-entry-points).
2. [Data stores in Vertex AI Search](https://cloud.google.com/generative-ai-app-builder/docs/create-datastore-ingest), which can include your own data in the form of website data, unstructured data, or structured data

### Objective

In this tutorial, you learn how to:

- Generate LLM text and chat model responses grounded in Google Search results
- Compare the results of ungrounded LLM responses with grounded LLM responses
- Create and use a data store in Vertex AI Search to ground responses in custom documents and data
- Generate LLM text and chat model responses grounded in Vertex AI Search results

This tutorial uses the following Google Cloud AI services and resources:

- Vertex AI
- Vertex AI Search

The steps performed include:

- Configuring the LLM and prompt for various examples
- Sending example prompts to generative text and chat models in Vertex AI
- Setting up a data store in Vertex AI Search with your own data
- Sending example prompts with various levels of grounding (no grounding, web grounding, data store grounding)

## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.
1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).
1. Enable the [Vertex AI and Vertex AI Agent Builder APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,discoveryengine.googleapis.com).
1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).

### Install Google Gen AI SDK for Python

Install the following packages required to execute this notebook.

In [2]:
%pip install --upgrade --quiet google-genai

Note: you may need to restart the kernel to use updated packages.


### Authenticate your Google Cloud account

If you are running this notebook on Google Colab, you will need to authenticate your environment. To do this, run the new cell below. This step is not required if you are using Vertex AI Workbench.

In [3]:
import sys

if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information and create client

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)

You can also change the `LOCATION` variable used by Vertex AI. Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [4]:
import os

PROJECT_ID = "qwiklabs-gcp-04-68b16004a141"  # @param {type: "string"}
if not PROJECT_ID or PROJECT_ID == "qwiklabs-gcp-04-68b16004a141":
    PROJECT_ID = str(os.environ.get("GOOGLE_CLOUD_PROJECT"))

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

from google import genai

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

### Import libraries

In [5]:
from IPython.display import Markdown, display
from google.genai.types import (
    GenerateContentConfig,
    GenerateContentResponse,
    GoogleSearch,
    Part,
    Retrieval,
    Tool,
    VertexAISearch,
)

### Helper functions

In [6]:
def print_grounding_data(response: GenerateContentResponse) -> None:
    candidate = response.candidates[0]

    citation_metadata = getattr(candidate, "citation_metadata")
    if citation_metadata:
        print("Citations")
        for citation in citation_metadata.citations:
            print(citation)

    grounding_metadata = getattr(candidate, "grounding_metadata")
    if grounding_metadata:
        print("\nGrounding Chunks")
        for grounding_chunk in grounding_metadata.grounding_chunks:
            print(grounding_chunk)

        print("\nGrounding Supports")
        for grounding_support in grounding_metadata.grounding_supports:
            print(grounding_support)

Initialize the Gemini model from Vertex AI:

In [7]:
MODEL_ID = "gemini-2.0-flash"  # @param {type: "string"}

## Example: Grounding with Google Search results

In this example, you'll compare LLM responses with no grounding with responses that are grounded in the results of a Google Search. You'll ask a question about a the most recent solar eclipse.

In [8]:
PROMPT = "You are an expert in astronomy. When is the next solar eclipse in the US?"

### Text generation without grounding

Make a prediction request to the LLM with no grounding:

In [9]:
response = client.models.generate_content(
    model=MODEL_ID,
    contents=PROMPT,
)

display(Markdown(response.text))

Alright, let's talk solar eclipses in the US!

The next solar eclipse visible in the United States will be an **annular solar eclipse on October 14, 2023**.

**Here's a breakdown:**

*   **Type:** Annular Solar Eclipse - This means the Moon will pass in front of the Sun, but it won't completely cover it. Instead, a bright ring of sunlight (the "annulus" or "ring of fire") will be visible around the Moon.

*   **Date:** October 14, 2023

*   **Path of Annularity (where the ring of fire is visible):** A narrow path will cross the US, starting in Oregon and moving southeast through parts of Nevada, Utah, New Mexico, and Texas, before continuing into Mexico and Central America.

*   **Partial Eclipse Visibility:** A partial solar eclipse (where the Moon covers only a portion of the Sun) will be visible across a much wider area of North and Central America, including most of the United States.

**Important Note:** Always use proper eye protection (eclipse glasses or a solar viewer) when viewing any solar eclipse. Looking directly at the Sun, even during an eclipse, can cause serious and permanent eye damage.

Enjoy the celestial event!


### Text generation grounded in Google Search results

You can add the `tools` keyword argument with a `Tool` including `GoogleSearch` to instruct Gemini to first perform a Google Search with the prompt, then construct an answer based on the web search results.

The search queries and [Search Entry Point](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/grounding-search-entry-points) are available for each `Candidate` in the response.

In [10]:
google_search_tool = Tool(google_search=GoogleSearch())

response = client.models.generate_content(
    model=MODEL_ID,
    contents=PROMPT,
    config=GenerateContentConfig(tools=[google_search_tool]),
)

display(Markdown(response.text))

print(print_grounding_data(response))

Okay, I can help you with that! Let me check when the next solar eclipse will be visible in the US.

Okay, here's the information about upcoming solar eclipses in the US:

*   **Partial Solar Eclipse:** The next partial solar eclipse in the United States will occur on **August 12, 2026.**
*   **Total Solar Eclipse (Alaska only):** The next total solar eclipse in the United States will occur on **March 30, 2033,** but it will only be visible in Alaska.
*   **Total Solar Eclipse (Contiguous US):** For the contiguous United States, the next total solar eclipse will be on **August 23, 2044**. This eclipse will start in Greenland, travel through Canada, and end around sunset in Montana, North Dakota, and South Dakota.
*   **Total Solar Eclipse (Significant portion of the US):** A more significant eclipse that will cross a large portion of the continental US will occur on **August 12, 2045.** This eclipse will travel from California to Florida.

So, while there's a partial eclipse in 2026 and a total eclipse in Alaska in 2033, the next total solar eclipse that will be widely visible in the contiguous United States will be in 2044, with an even more extensive one in 2045.



Grounding Chunks
retrieved_context=None web=GroundingChunkWeb(domain='wikipedia.org', title='wikipedia.org', uri='https://vertexaisearch.cloud.google.com/grounding-api-redirect/AWQVqALlLEV8VRJFBwHC3QSLYZFK_j7LgZEx2Kld1AqnoGY5EP1qZe5_i1u073A-X_nFYKR96IvHGprf806W1JwGmN-YzQOFPmlItxZ088j5-pRVZlA3IbaP_W8jcG-fHkI8KKU0iLggS-6Vnl8nsj4t66Yb54VnHoxIpPkCWJ7ubambERUjqAB2KZdHUlka-rg=')
retrieved_context=None web=GroundingChunkWeb(domain='cbsnews.com', title='cbsnews.com', uri='https://vertexaisearch.cloud.google.com/grounding-api-redirect/AWQVqAKqoAI7gVJC_FIIYK7qsePvbj9cHXwmlpeGPGMcaMg4d-yEb89XrvRRFvFL5_nWoDa_Poon9pQlVVUmx0xxiH1Wr5PvC8Ci6FvfoCfp7LCC3_-KFeWfqTFkO-VXtHfZXVM02q3H0OX7sf7vuNGqK-6rlPA0ei3ewmPNrg==')

Grounding Supports
confidence_scores=[0.91968614] grounding_chunk_indices=[0] segment=Segment(end_index=289, part_index=None, start_index=172, text='*   **Partial Solar Eclipse:** The next partial solar eclipse in the United States will occur on **August 12, 2026.**')
confidence_scores=[0.9

Note that the response without grounding only has limited information from the LLM about solar eclipses. Whereas the response that was grounded in web search results contains the most up to date information from web search results that are returned as part of the LLM with grounding request.

### Text generation with multimodal input grounded in Google Search results

Gemini can also generate grounded responses with multimodal input. Let's try with this image of the Eiffel Tower.

![Paris](https://storage.googleapis.com/github-repo/generative-ai/gemini/grounding/paris.jpg)

In [11]:
PROMPT = "What is the current temperature at this location?"

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        Part.from_uri(
            file_uri="gs://github-repo/generative-ai/gemini/grounding/paris.jpg",
            mime_type="image/jpeg",
        ),
        PROMPT,
    ],
    config=GenerateContentConfig(
        tools=[google_search_tool],
    ),
)

display(Markdown(response.text))
print_grounding_data(response)

The image shows a view of Paris, France, featuring the Eiffel Tower and the Seine River. To provide you with the current temperature in Paris, I will perform a search.

The current temperature in Paris, France is 13°C (55°F). It feels like 13°C (55°F). The weather is partly cloudy, and the chance of rain is around 0%.



Grounding Chunks
retrieved_context=None web=GroundingChunkWeb(domain='google.com', title='Weather information for locality: Paris', uri='https://www.google.com/search?q=weather+in+Paris')

Grounding Supports
confidence_scores=[0.7194718] grounding_chunk_indices=[0] segment=Segment(end_index=227, part_index=None, start_index=169, text='The current temperature in Paris, France is 13°C (55°F).')
confidence_scores=[0.96209973] grounding_chunk_indices=[0] segment=Segment(end_index=256, part_index=None, start_index=228, text='It feels like 13°C (55°F).')
confidence_scores=[0.98002464] grounding_chunk_indices=[0] segment=Segment(end_index=323, part_index=None, start_index=257, text='The weather is partly cloudy, and the chance of rain is around 0%.')


## Example: Grounding with custom documents and data

In this example, you'll compare LLM responses with no grounding with responses that are grounded in the [results of a data store in Vertex AI Search](https://cloud.google.com/generative-ai-app-builder/docs/create-datastore-ingest).

The data store will contain internal documents from a fictional bank, Cymbal Bank. These documents aren't available on the public internet, so the Gemini model won't have any information about them by default.

### Creating a data store in Vertex AI Search

In this example, you'll use a Google Cloud Storage bucket with a few sample internal documents for our bank. There's some docs about booking business travel, strategic plan for this Fiscal Year and HR docs describing the different jobs available in the company.

Follow the tutorial steps in the Vertex AI Search documentation to:

1. [Create a data store with unstructured data](https://cloud.google.com/generative-ai-app-builder/docs/try-enterprise-search#unstructured-data) that loads in documents from the GCS folder `gs://cloud-samples-data/gen-app-builder/search/cymbal-bank-employee`.
2. [Create a search app](https://cloud.google.com/generative-ai-app-builder/docs/try-enterprise-search#create_a_search_app) that is attached to that data store. You should also enable the **Enterprise edition features** so that you can search indexed records within the data store.

Note: The data store must be in the same project that you are using for Gemini.

You can also follow this notebook to do it with code. [Create a Vertex AI Search Datastore and App
](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/search/create_datastore_and_search.ipynb)

Once you've created a data store, obtain the Data Store ID and input it below.

Note: You will need to wait for data ingestion to finish before using a data store with grounding. For more information, see [create a data store](https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es).

In [12]:
DATA_STORE_PROJECT_ID = PROJECT_ID  # @param {type: "string"}
DATA_STORE_REGION = "global"  # @param {type: "string"}
# Replace this with your data store ID from Vertex AI Search
DATA_STORE_ID = "loli_1745436079620"  # @param {type: "string"}

DATA_STORE_NAME = f"projects/{DATA_STORE_PROJECT_ID}/locations/{DATA_STORE_REGION}/collections/default_collection/dataStores/{DATA_STORE_ID}"

Now you can ask a question about the company culture:

In [13]:
PROMPT = "What is the company culture like?"

### Text generation without grounding

Make a prediction request to the LLM with no grounding:

In [14]:
response = client.models.generate_content(
    model=MODEL_ID,
    contents=PROMPT,
)

display(Markdown(response.text))

To give you the best answer about a company's culture, I need a little more information!  Tell me:

*   **Which company are you asking about?** Please provide the name of the company.
*   **Are you a job applicant, employee, or just curious?** This helps me tailor the response.
*   **Are you looking for information on a specific aspect of the company culture?** (e.g., work-life balance, opportunities for growth, management style, diversity and inclusion)

In the meantime, here are some general things to consider when researching a company's culture:

**Common Aspects of Company Culture:**

*   **Values:** What principles guide the company's decisions and actions?
*   **Mission:** What is the company trying to achieve?
*   **Vision:** Where does the company see itself in the future?
*   **Leadership Style:** How do managers lead their teams? Is it collaborative, hierarchical, or something else?
*   **Communication:** How is information shared within the company? Is it open and transparent?
*   **Work-Life Balance:** How does the company support employees' personal lives?
*   **Opportunities for Growth:** Are there opportunities for professional development and advancement?
*   **Diversity and Inclusion:** How does the company promote diversity and create an inclusive environment?
*   **Employee Recognition:** How are employees recognized and rewarded for their contributions?
*   **Social Activities:** Are there company-sponsored events or activities that promote social interaction?
*   **Dress Code:** What is the typical attire at the office?
*   **Work Environment:** Is it a formal or informal environment? Is it fast-paced or more relaxed?

Once you give me the company name, I can use publicly available information from sources like:

*   **Company Website:** Look for sections on "About Us," "Our Values," "Careers," or "Life at [Company Name]."
*   **Glassdoor:** This site provides employee reviews, salary information, and insights into the company's culture.
*   **LinkedIn:** Check out the company's LinkedIn page for posts about company events, employee achievements, and company culture.
*   **Indeed:** Similar to Glassdoor, Indeed has company reviews and employee ratings.
*   **Comparably:** This site focuses on providing data-driven insights into company culture and compensation.
*   **News Articles and Press Releases:** These can provide information about the company's values and initiatives.
*   **Social Media:** Check out the company's social media accounts (e.g., Facebook, Twitter, Instagram) for a glimpse into their culture.
*   **Personal Network:** If you know someone who works at the company, reach out and ask them about their experience.

I look forward to helping you learn more!

### Text generation grounded in Vertex AI Search results

Now we can add the `tools` keyword arg with a grounding tool of `grounding.VertexAISearch()` to instruct the LLM to first perform a search within your custom data store, then construct an answer based on the relevant documents:

In [15]:
vertex_ai_search_tool = Tool(
    retrieval=Retrieval(vertex_ai_search=VertexAISearch(datastore=DATA_STORE_NAME))
)

response = client.models.generate_content(
    model=MODEL_ID,
    contents="What is the company culture like?",
    config=GenerateContentConfig(tools=[vertex_ai_search_tool]),
)

display(Markdown(response.text))
print_grounding_data(response)

Company culture encompasses the values, beliefs, attitudes, and behaviors that characterize how a company and its employees operate. It influences all aspects of a company, including how employees interact with each other, how decisions are made, and how customers are served. A positive company culture can lead to increased employee engagement, productivity, and retention, while a negative culture can lead to the opposite.



Grounding Chunks


TypeError: 'NoneType' object is not iterable

Note that the response without grounding doesn't have any context about what company we are asking about. Whereas the response that was grounded in Vertex AI Search results contains information from the documents provided, along with citations of the information.

<div class="alert alert-block alert-warning">
<b>⚠️ Important notes:</b><br>
<br>
<b>If you get an error when running the previous cell:</b><br>
&nbsp;&nbsp;&nbsp;&nbsp;In order for this sample notebook to work with data store in Vertex AI Search,<br>
&nbsp;&nbsp;&nbsp;&nbsp;you'll need to create a <a href="https://cloud.google.com/generative-ai-app-builder/docs/try-enterprise-search#create_a_data_store">data store</a> <b>and</b> a <a href="https://cloud.google.com/generative-ai-app-builder/docs/try-enterprise-search#create_a_search_app">search app</a> associated with it in Vertex AI Search.<br>
&nbsp;&nbsp;&nbsp;&nbsp;If you only create a data store, the previous request will return errors when making queries against the data store.
<br><br>
<b>If you get an empty response when running the previous cell:</b><br>
&nbsp;&nbsp;&nbsp;&nbsp;You will need to wait for data ingestion to finish before using a data store with grounding.<br>
&nbsp;&nbsp;&nbsp;&nbsp;For more information, see <a href="https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es">create a data store</a>.
</div>
</div>

## Example: Grounded chat responses

You can also use grounding when using chat conversations in Vertex AI. In this example, you'll compare LLM responses with no grounding with responses that are grounded in the results of a Google Search and a data store in Vertex AI Search.

In [16]:
PROMPT = "What are managed datasets in Vertex AI?"
PROMPT_FOLLOWUP = "What types of data can I use?"

### Chat session grounded in Google Search results

Now you can add the `tools` keyword arg with a Tool of `GoogleSearch` to instruct the chat model to first perform a Google Search with the prompt, then construct an answer based on the web search results:

In [17]:
chat = client.chats.create(
    model=MODEL_ID,
    config=GenerateContentConfig(tools=[Tool(google_search=GoogleSearch())]),
)

response = chat.send_message(PROMPT)
display(Markdown(response.text))

response = chat.send_message(PROMPT_FOLLOWUP)
display(Markdown(response.text))

In Vertex AI, managed datasets serve as a centralized repository for your machine learning data, offering a structured approach to manage, prepare, and access data for training and serving ML models. Here's a breakdown of their key aspects:

**Key Features and Benefits:**

*   **Centralized Management:** Datasets are stored in a central location within Vertex AI, providing a single source of truth for your machine learning data.
*   **Simplified Dataset Management:** They eliminate the need for separate data storage and management infrastructure, streamlining dataset organization, access, and version control.
*   **Data Preparation:** Managed datasets support various data preparation tasks, including cleansing, transformation, and encoding.
*   **Data Labeling:** They provide the ability to annotate and label data directly within the user interface. If your data is unlabeled, you can use the data labeling service for human assistance.
*   **Automatic Data Splitting**: Automatically split data into training, test, and validation sets
*   **Data Exploration:** Managed datasets allow you to generate data statistics and visualizations to quickly inspect data distributions and identify missing values.
*   **Model Governance:** They enable tracking of lineage to models for governance and iterative development.
*   **Performance Comparison**: Managed datasets allow comparing model performance by training AutoML and custom models using the same datasets.

**Supported Data Types:**

Vertex AI managed datasets support various data types, including:

*   **Tabular Datasets:** Consist of structured data in rows and columns, typically in CSV or JSON format.
*   **Image Datasets:** Contain images for tasks like image classification, object detection, and image segmentation.
*   **Text Datasets:** Contain written text for tasks like text classification, sentiment analysis, and NLP.
*   **Video Datasets:** Contain videos for tasks like video classification, object tracking, and action recognition.

**Accessing Managed Datasets:**

To access managed datasets from custom training code, Vertex AI injects key environment variables into the training container, including file locations and schema details.

**How to Use Managed Datasets:**

1.  **Create a Dataset:** In the Vertex AI console, go to the Datasets section and click "Create Dataset." Choose the appropriate data type (Tabular, Image, Text, or Video) and name the dataset.
2.  **Choose a Storage Location:** Specify the storage location. You can import data from local files, Cloud Storage buckets, or BigQuery tables.
3.  **Define Data Schema (Optional):** For tabular datasets, define the schema, including column names and data types. This helps Vertex AI understand the data structure and perform automated preprocessing.



Vertex AI managed datasets support a variety of data types to accommodate different machine learning tasks:

*   **Tabular Datasets:** These datasets consist of structured data organized in rows and columns, commonly found in CSV or JSON formats. You can import tabular data from CSV files, BigQuery tables, or Cloud Storage buckets.
*   **Image Datasets:** These datasets contain images suitable for tasks such as image classification, object detection (identifying and locating objects within an image), and image segmentation (assigning labels to individual pixels in an image).
*   **Text Datasets:** These datasets comprise written text and are used for tasks like text classification, sentiment analysis (determining the emotional tone of text), and natural language processing (NLP).
*   **Video Datasets:** These datasets include videos for tasks like video classification, object tracking (following objects across video frames), and action recognition (identifying actions occurring in videos).

In addition to the data types, Vertex AI also supports various data formats, including JSONL, CSV, and BigQuery.


### Chat session grounded in Vertex AI Search results

Now we can add the `tools` keyword arg with a grounding tool of `grounding.VertexAISearch()` to instruct the chat model to first perform a search within your custom data store, then construct an answer based on the relevant documents:

In [18]:
PROMPT = "How do I book business travel?"
PROMPT_FOLLOWUP = "Give me more details."

In [19]:
chat = client.chats.create(
    model=MODEL_ID,
    config=GenerateContentConfig(
        tools=[
            Tool(
                retrieval=Retrieval(
                    vertex_ai_search=VertexAISearch(datastore=DATA_STORE_NAME)
                )
            )
        ]
    ),
)

response = chat.send_message(PROMPT)
display(Markdown(response.text))

response = chat.send_message(PROMPT_FOLLOWUP)
display(Markdown(response.text))

Booking business travel involves a few key steps, and the best approach often depends on the size of your company and the frequency of travel. Here's a general guide:

**1. Determine Your Company's Travel Policy:**

*   **Does your company have a preferred booking method?** Many companies use specific online booking tools or travel management companies.
*   **Are there pre-approved hotels or airlines?** Travel policies often include preferred vendors to leverage negotiated rates or loyalty programs.
*   **What is the approval process?** Understand if you need to get approval from a manager before booking.
*   **What are the spending limits?** Policies usually outline acceptable price ranges for flights, hotels, and other expenses.

**2. Choose a Booking Method:**

Here are several options for booking business travel:

*   **Online Travel Agencies (OTAs):** Websites like Expedia, Kayak, and Booking.com can be convenient for comparing prices and options. However, they may not always offer the best deals for business travelers, and you might miss out on corporate discounts or benefits.
*   **Directly with Airlines and Hotels:** Booking directly can sometimes give you access to better rates, especially if you're a loyalty program member.
*   **Travel Management Companies (TMCs):** TMCs (like American Express Global Business Travel, BCD Travel, or CWT) specialize in corporate travel. They offer a range of services, including booking flights and hotels, managing travel itineraries, and providing 24/7 support. They can also help you track travel expenses and ensure compliance with your company's travel policy.
*   **Dedicated Business Travel Booking Platforms:** Several platforms cater specifically to business travel, such as Navan (formerly TripActions), TravelPerk, and Spotnana. These platforms often integrate with expense management systems and offer features like travel policy enforcement, automated reporting, and duty of care support.

**3. Search and Compare Options:**

*   **Flights:** Look for flights that fit your schedule and budget. Consider factors like layovers, baggage fees, and seat selection.
*   **Hotels:** Choose hotels that are conveniently located and offer the amenities you need (e.g., Wi-Fi, business center). Consider your company's preferred hotels or those that offer corporate rates.
*   **Car Rentals/Ground Transportation:** If necessary, book a rental car or arrange for airport transfers.

**4. Book and Confirm:**

*   **Double-check all details** before confirming your booking (dates, times, names, etc.).
*   **Save your confirmation numbers** and any relevant contact information.
*   **Add the trip to your calendar** and share your itinerary with colleagues or family members.

**5. Expense Reporting:**

*   **Keep track of all your expenses,** including receipts for flights, hotels, meals, and transportation.
*   **Submit your expense report** according to your company's policy. Many travel booking platforms integrate with expense management systems to automate this process.

**Tips for Efficient Business Travel Booking:**

*   **Book in advance:** You'll often get better rates by booking flights and hotels well in advance of your trip.
*   **Be flexible with your dates:** If possible, try to be flexible with your travel dates. Sometimes flying on a Tuesday or Wednesday can save you money.
*   **Consider alternative airports:** Flying into or out of a smaller, less busy airport can sometimes be cheaper.
*   **Use travel apps:** Download the apps for your preferred airlines, hotels, and rental car companies. These apps can provide you with real-time updates on your flights, hotel reservations, and other travel information.
*   **Take advantage of loyalty programs:** If you travel frequently, sign up for airline, hotel, and rental car loyalty programs. You can earn points or miles that can be redeemed for free travel or other benefits.


Okay, let's break down the details of business travel booking further, expanding on the key steps and providing more granular advice:

**1. Diving Deeper into Company Travel Policy:**

*   **Policy Access:** Where is the policy documented? (Intranet, shared drive, HR portal, travel booking platform). Get familiar with it.
*   **Specific Restrictions:** Are there restrictions on booking first class or business class flights, or specific hotel brands? What's the approval process for exceptions?
*   **Duty of Care:** What are the company's policies regarding traveler safety and security? Is there a travel risk management program? Who do you contact in an emergency?
*   **Sustainability:** Does the company prioritize sustainable travel options (e.g., direct flights, eco-friendly hotels)?
*   **Payment Methods:** Are you required to use a corporate credit card? Are virtual cards used? Is there a process for pre-trip expense advances?
*   **Who to contact with questions**: Who in your company is responsible for travel-related questions? Is there a dedicated travel department, or an HR contact?

**2. Exploring Booking Methods in Detail:**

*   **Online Travel Agencies (OTAs):**
    *   *Pros:* Comparison shopping, wide selection, user reviews.
    *   *Cons:* Lack of personalized service, difficult to make changes, may not integrate with expense systems, potential hidden fees, might not comply with corporate travel policy
    *   *When to use:* Independent contractors or very small businesses without negotiated rates, for simple domestic trips when policy allows.
*   **Direct Booking (Airlines/Hotels):**
    *   *Pros:* Can earn loyalty points, potentially better customer service for changes, sometimes access to exclusive deals.
    *   *Cons:* Time-consuming to compare options, doesn't ensure policy compliance, difficult to track expenses.
    *   *When to use:* When you have strong loyalty to a specific brand, or if the company policy requires booking direct for certain vendors.
*   **Travel Management Companies (TMCs):**
    *   *Pros:* Policy compliance, negotiated rates, 24/7 support, expense tracking, travel risk management, reporting capabilities.
    *   *Cons:* Can be more expensive than OTAs (but savings often outweigh the cost), may require training on the platform.
    *   *When to use:* Mid-sized to large companies with complex travel needs, frequent travelers, strict policy requirements.
*   **Business Travel Booking Platforms (e.g., Navan, TravelPerk):**
    *   *Pros:* User-friendly interface, policy enforcement, automated expense reporting, real-time data, duty of care features, often integrates with other business systems.
    *   *Cons:* Can be expensive, may not offer the same level of personalized service as a TMC.
    *   *When to use:* Startups to mid-sized companies that want a modern, technology-driven solution with built-in policy and expense management.
*   **Travel Agencies**
    *   *Pros:* Personalized service, can handle complex itineraries, good for groups.
    *   *Cons:* More expensive than online options, might not have access to all the best deals.
    *   *When to use:* Complex international itineraries, group travel, VIP travelers who require high-touch service.

**3. Refining Your Search and Comparison Strategies:**

*   **Flights:**
    *   *Optimal Booking Window:* Generally, 2-3 weeks in advance for domestic and 1-3 months for international travel can yield the best prices (but this varies greatly depending on route, season, and demand).
    *   *Use Flight Comparison Tools:* Google Flights, Kayak, and Skyscanner are useful for identifying the cheapest days to fly and alternative airports.
    *   *Consider "Hidden City" Ticketing:* Be aware that "hidden city" ticketing (booking a flight with a layover at your desired destination and getting off there) can violate airline terms of service and may result in penalties.
    *   *Watch out for Basic Economy:* Basic Economy fares often restrict seat selection, baggage allowance, and changes. Make sure it aligns with your needs.
*   **Hotels:**
    *   *Location, Location, Location:* Choose hotels near your meeting location or with easy access to transportation.
    *   *Read Reviews:* Check reviews on TripAdvisor, Google Hotels, and other sites to get an idea of the hotel's quality and service.
    *   *Negotiate Corporate Rates:* If you travel frequently to a particular city, try to negotiate a corporate rate with a hotel.
    *   *Consider Amenities:* Does the hotel have free Wi-Fi, a business center, a gym, and free breakfast?
*   **Car Rentals/Ground Transportation:**
    *   *Compare Rental Companies:* Use comparison sites to find the best deals on rental cars.
    *   *Consider Ride-Sharing:* Uber and Lyft can be convenient and cost-effective for short trips.
    *   *Look for Airport Shuttles:* Many hotels offer free airport shuttles, which can save you money on transportation.

**4. Booking and Confirmation - The Finer Points:**

*   **Passport and Visa Requirements:** If traveling internationally, ensure your passport is valid for at least six months beyond your return date and that you have any necessary visas.
*   **TSA PreCheck/Global Entry:** If you travel frequently, consider applying for TSA PreCheck or Global Entry to expedite airport security.
*   **Seat Selection:** Choose your seat in advance to ensure you get a comfortable spot.
*   **Mobile Check-In:** Check in for your flight online or via the airline's app to save time at the airport.
*   **Travel Insurance:** Consider purchasing travel insurance to protect yourself against unexpected events like flight cancellations, lost luggage, or medical emergencies.
*   **Share your Itinerary:** Leave a copy of your itinerary with a colleague, family member, or friend.

**5. Optimizing the Expense Reporting Process:**

*   **Track Expenses Diligently:** Use a mobile app or spreadsheet to track your expenses as you incur them.
*   **Take Photos of Receipts:** Take pictures of your receipts with your smartphone to avoid losing them.
*   **Submit Expense Reports Promptly:** Submit your expense reports as soon as possible to get reimbursed quickly.
*   **Automated Expense Reporting Tools:** Integrate your travel booking platform with an expense management system like Expensify, Concur, or Zoho Expense.
*   **Understand Per Diem Rates:** If your company uses per diem rates for meals and incidentals, familiarize yourself with the allowable amounts.

**Additional Tips for the Savvy Business Traveler:**

*   **Travel Light:** Pack only what you need to avoid checked baggage fees and delays.
*   **Stay Connected:** Bring a portable charger to keep your devices powered up.
*   **Learn Basic Phrases:** If traveling internationally, learn a few basic phrases in the local language.
*   **Be Prepared for Delays:** Pack a book, download movies, or bring other entertainment to keep yourself occupied during delays.
*   **Stay Healthy:** Get enough sleep, eat healthy foods, and stay hydrated to avoid getting sick while traveling.
*   **Use Airline Lounges:** If you have access to airline lounges, take advantage of the free food, drinks, and Wi-Fi.

By paying attention to these details, you can make your business travel more efficient, comfortable, and cost-effective. Remember to always prioritize your company's travel policy and seek clarification from your travel department or HR if you have any questions.


## Cleaning up

To avoid incurring charges to your Google Cloud account for the resources used in this notebook, follow these steps:

1. To avoid unnecessary Google Cloud charges, use the [Google Cloud console](https://console.cloud.google.com/) to delete your project if you do not need it. Learn more in the Google Cloud documentation for [managing and deleting your project](https://cloud.google.com/resource-manager/docs/creating-managing-projects).
1. If you used an existing Google Cloud project, delete the resources you created to avoid incurring charges to your account. For more information, refer to the documentation to [Delete data from a data store in Vertex AI Search](https://cloud.google.com/generative-ai-app-builder/docs/delete-datastores), then delete your data store.
2. Disable the [Vertex AI Search API](https://console.cloud.google.com/apis/api/discoveryengine.googleapis.com) and [Vertex AI API](https://console.cloud.google.com/apis/api/aiplatform.googleapis.com) in the Google Cloud Console.