# Installation

In [2]:
! pip install --upgrade --quiet langchain langchain_google_community langchain_google_vertexai google-cloud-discoveryengine google_cloud_aiplatform


In [3]:
# Automatically restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

In [1]:
# Authenticate with Colab (if you're using Google Colab)

import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

from google.auth import default
creds, _ = default()

In [2]:
# Set your project and region
PROJECT_ID = "commsenglabs-poc-4187240"  # @param {type:"string"}
REGION = "us-central1"  # @param {type:"string"}
LOCATION = "global"  # @param {type:"string"}

In [3]:
# This cell is only needed if you run Google Colab

import vertexai

vertexai.init(project=PROJECT_ID, location=REGION)
! gcloud config set project {PROJECT_ID}
#! gcloud auth application-default login
#! gcloud auth application-default set-quota-project {PROJECT_ID}

Updated property [core/project].


# Hallucinations

## How prompt engineering helps to avoid hallucinations

Hallucinations are a real problem for LLMs. Because an LLM completes text with whatever is plausible, it can be tricked into producing plausible but completely false responses. Note that we are intentionally using an "old" model here, as newer models have much better built-in guardrails. Here's a simple example:

In [4]:
from langchain.chains import LLMChain
from langchain_core.prompts.prompt import PromptTemplate
from langchain_google_vertexai import VertexAI


llm = VertexAI(model_name="gemini-2.0-flash-001", temperature=0.8, max_output_tokens=128)

template = """Describe {plant}.
"""


prompt_template = PromptTemplate(
   input_variables=["plant"],
   template=template,
)


chain = LLMChain(llm=llm, prompt=prompt_template)
chain.run(plant="black cucumbers")


'Black cucumbers are a fascinating and somewhat misleading name! There aren\'t cucumbers that are truly black in the same way that, say, a black olive or black grape is. However, there are a few reasons why a cucumber might be described as "black":\n\n*   **Very Dark Green:** The term "black cucumber" is most often used to describe cucumbers that are a very deep, dark green color. This can be so dark that they appear almost black, especially from a distance or in certain lighting. This dark color is often due to specific varieties or growing conditions.\n\n*   **Immature Stage:** Some cucumber varieties may'

Let's look at a simple prompt adjustment that helps to avoid hallucinations:

In [5]:
from langchain.chains import LLMChain
from langchain_core.prompts.prompt import PromptTemplate
from langchain_google_vertexai import VertexAI


llm = VertexAI(model_name="gemini-2.0-flash-001", temperature=0.8, max_output_tokens=128)

template = """Describe {plant}.


First, think whether {plant} exist.
If {plant} doesn't exist, answer "I don't have enough information about {plant}".
Otherwise, give its title, a short summary and then talk about origin and cultivation.
After that, describe its physical characteristics.
"""


prompt_template = PromptTemplate(
   input_variables=["plant"],
   template=template,
)


chain = LLMChain(llm=llm, prompt=prompt_template)
chain.run(plant="black cucumbers")

'Yes, black cucumbers do exist.\n\n**Title:** Black Cucumber (Also sometimes called Black Serpent Cucumber)\n\n**Summary:** Black cucumbers are a unique variety of cucumber known for their dark green, almost black skin. They offer a slightly sweet and mild flavor, making them a refreshing addition to salads, snacks, and various culinary dishes.\n\n**Origin and Cultivation:** The exact origin of the black cucumber is somewhat vague, but they are believed to be derived from Asian varieties. They are relatively easy to grow, similar to other cucumber varieties. They thrive in warm weather, well-drained soil, and require consistent watering. Seeds can'
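
The adjusted prompt asks the model to reply with a fixed sentence when the plant does not exist, so calling code can check for that sentence before trusting the description. The sketch below is a minimal illustration of that check; `REFUSAL_PREFIX`, `build_prompt` and `is_refusal` are hypothetical helper names, not part of LangChain. Note that in the run above the model asserted that black cucumbers exist, so the instruction reduces the risk of hallucination but does not remove it.

```python
# Hypothetical helpers (not part of LangChain) built around the prompt above.
REFUSAL_PREFIX = "I don't have enough information about"


def build_prompt(plant: str) -> str:
    """Rebuild the guarded prompt from the cell above for a given plant."""
    return (
        f"Describe {plant}.\n\n"
        f"First, think whether {plant} exist.\n"
        f'If {plant} doesn\'t exist, answer "{REFUSAL_PREFIX} {plant}".\n'
        "Otherwise, give its title, a short summary and then talk about "
        "origin and cultivation.\n"
        "After that, describe its physical characteristics.\n"
    )


def is_refusal(response: str) -> bool:
    """Return True if the response contains the agreed refusal sentence."""
    return REFUSAL_PREFIX.lower() in response.lower()


print(is_refusal("I don't have enough information about black cucumbers"))
print(is_refusal("Yes, black cucumbers do exist."))
```

A caller could use `is_refusal` to skip downstream steps, or to fall back to retrieval, whenever the model declines to answer.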

# Vertex AI Agent Builder

## Using Vertex AI Agent Builder as a tool

Vertex AI Agent Builder provides an end-to-end experience for building out-of-the-box RAG agents. It finds relevant documents and prepares a final answer with generative AI, and you can use it as a tool. We will look at what tools are, and how they help to build powerful generative AI agents, in Chapter 9. All we need to know for now is that Vertex AI Agent Builder is an interface that takes a question as input and returns an answer.

Check the [Google Cloud Documentation](https://cloud.google.com/products/agent-builder?e=0&hl=en#use-vertex-ai-search-for-out-of-the-box-rag-for-your-agents-and-apps) for a deep discussion of Vertex AI Agent Builder for Search.

Vertex AI Agent Builder relies on so-called data stores, which hold the content it searches. Follow the instructions in the book to set up your data store. Once you have created it, you can use it with LangChain.

### Creating the Datastore

First, we need to create the datastore. This is the place where you will hold the data that flows into your RAG. Confusingly, the underlying API is called Discovery Engine. Check the documentation for details on datastores [here](https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es).

In [6]:
# Let's start with some helper functions
def create_data_store(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # Once the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

def import_documents(
    project_id: str,
    location: str,
    data_store_id: str,
    gcs_uri: str,
    ):
    # Create a client
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )
    client = discoveryengine.DocumentServiceClient(client_options=client_options)

    # The full resource name of the search engine branch.
    # e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
    parent = client.branch_path(
        project=project_id,
        location=location,
        data_store=data_store_id,
        branch="default_branch",
    )
    source_documents = [f"{gcs_uri}/*"]

    request = discoveryengine.ImportDocumentsRequest(
        parent=parent,
        gcs_source=discoveryengine.GcsSource(
            input_uris=source_documents, data_schema="content"
        ),
        # Options: `FULL`, `INCREMENTAL`
        reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
    )

    # Make the request
    operation = client.import_documents(request=request)

    response = operation.result()

    # Once the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

    # Handle the response
    return operation.operation.name

def create_engine(
    project_id: str, location: str, data_store_name: str, data_store_id: str
  ):
    # Create a client
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )
    client = discoveryengine.EngineServiceClient(client_options=client_options)

    # Initialize request argument(s)
    config = discoveryengine.Engine.SearchEngineConfig(
        search_tier="SEARCH_TIER_ENTERPRISE", search_add_ons=["SEARCH_ADD_ON_LLM"]
    )

    engine = discoveryengine.Engine(
        display_name=data_store_name,
        solution_type="SOLUTION_TYPE_SEARCH",
        industry_vertical="GENERIC",
        data_store_ids=[data_store_id],
        search_engine_config=config,
    )

    request = discoveryengine.CreateEngineRequest(
        parent=discoveryengine.DataStoreServiceClient.collection_path(
            project_id, location, "default_collection"
        ),
        engine=engine,
        engine_id=engine.display_name,
    )

    # Make the request
    operation = client.create_engine(request=request)
    response = operation.result(timeout=90)
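
All three helpers above repeat the same rule for choosing the API endpoint: the `global` location uses the default endpoint, while any other location uses a location-prefixed host. The sketch below isolates that rule, together with the collection resource name format quoted in the comments, as plain string functions. `discovery_endpoint` and `collection_path` are hypothetical names used only for illustration; the real clients expose their own path helpers.

```python
from typing import Optional


def discovery_endpoint(location: str) -> Optional[str]:
    """Return the regional API host, or None to use the default endpoint."""
    if location == "global":
        return None
    return f"{location}-discoveryengine.googleapis.com"


def collection_path(
    project_id: str, location: str, collection: str = "default_collection"
) -> str:
    """Build the collection resource name in the format shown in the comments above."""
    return f"projects/{project_id}/locations/{location}/collections/{collection}"


print(discovery_endpoint("global"))
print(discovery_endpoint("us"))
print(collection_path("my-project", "global"))
```

Keeping the rule in one place makes it harder for a datastore created in one location to be queried through the endpoint of another.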

In [7]:
from google.cloud import discoveryengine_v1alpha as discoveryengine
from google.api_core.client_options import ClientOptions

# The datastore name can only contain lowercase letters, numbers, and hyphens
DATASTORE_NAME = "movie-database-maxtsc"
DATASTORE_ID = f"{DATASTORE_NAME}-id"
COLLECTION = create_data_store(PROJECT_ID, LOCATION, DATASTORE_ID)


PermissionDenied: 403 Discovery Engine API has not been used in project 522309567947 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/discoveryengine.googleapis.com/overview?project=522309567947 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry. [reason: "SERVICE_DISABLED"
domain: "googleapis.com"
metadata {
  key: "service"
  value: "discoveryengine.googleapis.com"
}
metadata {
  key: "serviceTitle"
  value: "Discovery Engine API"
}
metadata {
  key: "containerInfo"
  value: "522309567947"
}
metadata {
  key: "consumer"
  value: "projects/522309567947"
}
metadata {
  key: "activationUrl"
  value: "https://console.developers.google.com/apis/api/discoveryengine.googleapis.com/overview?project=522309567947"
}
, locale: "en-US"
message: "Discovery Engine API has not been used in project 522309567947 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/discoveryengine.googleapis.com/overview?project=522309567947 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry."
, links {
  description: "Google developers console API activation"
  url: "https://console.developers.google.com/apis/api/discoveryengine.googleapis.com/overview?project=522309567947"
}
]

In [None]:
# Now we need to also set up a search engine on top of the datastore so we can use enterprise features
create_engine(PROJECT_ID, LOCATION, DATASTORE_NAME, DATASTORE_ID)

In [None]:
# Now we import documents. We load a folder that contains arxiv articles. This will take some time ☕🥪
import_documents(PROJECT_ID, LOCATION, DATASTORE_ID,"gs://cloud-samples-data/gen-app-builder/search/arxiv")

# Don't worry if it times out, you can check the status in the Agent Builder console and use the tool once it's complete.

In [13]:
from langchain_google_community import VertexAISearchSummaryTool

DATASTORE_NAME = "alphabet-investor-pdfs"
DATASTORE_ID = "alphabet-investor-pdfs_1754412379585"
LOCATION = "us"
vertex_search = VertexAISearchSummaryTool(
  project_id=PROJECT_ID, location_id=LOCATION,
  data_store_id=DATASTORE_ID, get_extractive_answers = True,
  name="Vertex AI Agent Builder", description="")

query = "How many users Youtube had last year?"

print(vertex_search.invoke(query))



No results could be found. Try rephrasing the search query.


### LangChain Documents

Before we dive into building the RAG, it's important to understand how LangChain documents work. Below is a simple example.

In [14]:
from langchain_core.documents import Document

doc = Document(page_content="my page",
              metadata={"source_id": "example.pdf", "page": 1})
print(doc.page_content)

my page
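
A document is essentially a piece of text (`page_content`) plus a dictionary of `metadata`. To show why the metadata matters in a RAG, the sketch below joins several documents into a single context string that keeps track of where each chunk came from. It uses a small stand-in dataclass named `Doc` with the same two fields, so that it runs without LangChain installed; `Doc` and `format_docs` are hypothetical names for this illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Stand-in for a LangChain Document: text plus free-form metadata."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def format_docs(docs: List[Doc]) -> str:
    """Join documents into one context string, labelling each chunk with its source."""
    parts = []
    for doc in docs:
        source = doc.metadata.get("source_id", "unknown")
        page = doc.metadata.get("page", "?")
        parts.append(f"[{source}, page {page}]\n{doc.page_content}")
    return "\n\n".join(parts)


docs = [Doc("my page", {"source_id": "example.pdf", "page": 1})]
context = format_docs(docs)
print(context)
```

The same idea applies to real `Document` objects, since they expose the same `page_content` and `metadata` attributes.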


### Using Vertex AI Agent Builder Search as retriever

Now we can also use our newly built tool as a retriever to have more control over the context it returns.

In [18]:
from langchain_google_community import VertexAISearchRetriever

vertex_search_retriever = VertexAISearchRetriever(
   project_id=PROJECT_ID,
   location_id=LOCATION,
   data_store_id=DATASTORE_ID,
   max_documents=3,
   beta = True
)


result = vertex_search_retriever.invoke(query)
print(len(result))

for doc in result:
   print(len(doc.page_content), doc.metadata)

3
1623 {'id': 'fec239d1570ce88f57a8971c3d6e8415', 'source': 'gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/2006_google_annual_report.pdf7', 'previous_segments': [], 'next_segments': [{'content': 'Local\nPeople use Google products to learn not just about\nthe farthest reaches of the universe but about places\ncloser to home. Google MapsTM has become the #1\nmapping site across Europe and #2 in the U.S., and\nnow offers detailed street maps in more than 50\ncountries. We are pleased that so many developers\nhave used our mapping technology as a platform\nfor further innovation, and proud that more than\n30,000 websites use our maps API. Local authorities\nin London now use the Google Maps API to let\nresidents report problems such as road defects and\ntrash on the streets. Google Maps is also available\nnow on mobile devices and plays an integral role\nin our partnerships with mobile providers. We expect\nmore and better local products to result from our\nwork in 

## Building a RAG

Finally, with everything in place, we can build a beautiful RAG application.

In [20]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_google_vertexai import VertexAI


template = """Answer the question based only on the following context:
{context}


Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)


llm = VertexAI(model_name="gemini-2.0-flash-001", temperature=0.4, max_output_tokens=512)

chain = (
   {"context": vertex_search_retriever, "question": RunnablePassthrough()}
   | prompt
   | llm
   | StrOutputParser()
)


chain.invoke("What are Alphabet's Other Bets?")

'Alphabet’s investment in the portfolio of Other Bets includes businesses that are at various stages of development, ranging from those in the R&D phase to those that are in the beginning stages of commercialization. These bets include emerging businesses across many industries, from improving transportation and health technology to exploring solutions to address climate change.\n'

## Query expansion

Query expansion is an attempt to reformulate queries to get a better (and broader) list of chunks. One way of doing this is to use the `MultiQueryRetriever` component. Let's see how the number of chunks increases after query expansion:


In [21]:
from langchain.retrievers.multi_query import MultiQueryRetriever


retriever_with_expansion = MultiQueryRetriever.from_llm(
   retriever=vertex_search_retriever, llm=llm
)

result = vertex_search_retriever.invoke(query)
print(len(result))


result_expansion = retriever_with_expansion.invoke(query)
print(len(result_expansion))


3
6
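
The count grows because each reformulated query is sent to the retriever separately and the results are merged. The sketch below illustrates that idea with plain functions: it is not the `MultiQueryRetriever` implementation, the rewriter and retriever are toy stand-ins, and the `id` field used for deduplication is an assumption made for this example.

```python
from typing import Callable, Dict, List

Chunk = Dict[str, str]


def retrieve_with_expansion(
    query: str,
    rewrite: Callable[[str], List[str]],
    retrieve: Callable[[str], List[Chunk]],
) -> List[Chunk]:
    """Retrieve once per query variant and return the unique union, in first-seen order."""
    seen_ids = set()
    merged: List[Chunk] = []
    for variant in rewrite(query):
        for chunk in retrieve(variant):
            if chunk["id"] not in seen_ids:
                seen_ids.add(chunk["id"])
                merged.append(chunk)
    return merged


# Toy stand-ins: a real setup would call an LLM and a search backend here.
TOY_INDEX = {
    "youtube users": [{"id": "a", "text": "chunk A"}, {"id": "b", "text": "chunk B"}],
    "youtube audience size": [{"id": "b", "text": "chunk B"}, {"id": "c", "text": "chunk C"}],
}


def toy_rewrite(query: str) -> List[str]:
    return ["youtube users", "youtube audience size"]


def toy_retrieve(variant: str) -> List[Chunk]:
    return TOY_INDEX.get(variant, [])


merged = retrieve_with_expansion(
    "How many users Youtube had last year?", toy_rewrite, toy_retrieve
)
print([chunk["id"] for chunk in merged])  # ['a', 'b', 'c']
```

Because overlapping results are deduplicated, the merged list is usually shorter than the number of variants multiplied by `max_documents`.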


## Filtering out irrelevant chunks

Sometimes retrieval returns documents that are not really relevant, and they confuse the model. You can filter them out by making an additional pass over each document and asking the LLM to evaluate its relevance with `LLMChainFilter`:

In [24]:
from langchain.retrievers.document_compressors import LLMChainFilter

# Retrieve many documents from the retrieval
vertex_search_retriever_many = VertexAISearchRetriever(
   project_id=PROJECT_ID,
   location_id=LOCATION,
   data_store_id=DATASTORE_ID,
   beta = True,
   max_documents=30,
)
results_many = vertex_search_retriever_many.invoke(query)


llm_compression = VertexAI(temperature=0., model_name="gemini-2.0-flash-001")
chain_filter = LLMChainFilter.from_llm(llm=llm_compression)


results_filtered_many = chain_filter.compress_documents(results_many, query)
print(len(results_many), len(results_filtered_many))

28 11
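
Conceptually, the filter asks a yes/no relevance question for every chunk and keeps only the chunks that pass. The sketch below shows that loop with a pluggable judge. `filter_chunks` and `keyword_judge` are hypothetical names, and the keyword overlap rule is only a cheap stand-in for the LLM call that `LLMChainFilter` makes.

```python
from typing import Callable, List


def filter_chunks(
    chunks: List[str], query: str, is_relevant: Callable[[str, str], bool]
) -> List[str]:
    """Keep only the chunks that the judge marks as relevant to the query."""
    return [chunk for chunk in chunks if is_relevant(query, chunk)]


def keyword_judge(query: str, chunk: str) -> bool:
    """Toy judge: relevant if the chunk shares a longer word with the query."""
    query_words = {w.lower().strip("?.,") for w in query.split() if len(w) > 3}
    chunk_words = {w.lower().strip("?.,") for w in chunk.split()}
    return bool(query_words & chunk_words)


chunks = [
    "YouTube reached many users.",
    "Google Maps covers 50 countries.",
]
kept = filter_chunks(chunks, "How many users Youtube had last year?", keyword_judge)
print(len(chunks), len(kept))  # 2 1
```

With an LLM as the judge, this costs one model call per chunk, which is worth keeping in mind when retrieving 30 documents at a time.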


And this is how you adjust the original chain to add filtering to your RAG:

In [25]:
from langchain_core.runnables import RunnableLambda

chain = (
   {"context": vertex_search_retriever, "question": RunnablePassthrough()}
   | RunnableLambda(lambda x: {"context": chain_filter.compress_documents(x["context"], x["question"]), "question": x["question"]})
   | prompt
   | llm
   | StrOutputParser()
)


chain.invoke("How can I make my LLM prompts perform better?")


'Since the provided context is empty, I cannot answer the question. I have no information about how to improve LLM prompt performance.\n'

Let's run extraction and compare document lengths. We should see a significant reduction in the overall length of the chunks (and therefore of the context):

In [26]:
from langchain.retrievers.document_compressors import LLMChainExtractor

llm_extractor = VertexAI(temperature=0., model_name="gemini-1.5-flash-001")
chain_extractor = LLMChainExtractor.from_llm(llm=llm_extractor)


# Use the extractor (not the filter) so that each chunk is trimmed to its relevant passages
results_compressed = chain_extractor.compress_documents(result_expansion, query)


# The extractor may drop chunks with nothing relevant, so pairing by position is approximate
for original_doc, compressed_doc in zip(result_expansion, results_compressed):
    print(f"Document reduced from {len(original_doc.page_content)} to {len(compressed_doc.page_content)}.")
