In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Building a Gen AI RAG application with Vertex AI Feature Store and BigQuery

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/rag_qna_with_bq_and_featurestore.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fuse-cases%2Fretrieval-augmented-generation%2Frag_qna_with_bq_and_featurestore.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/use-cases/retrieval-augmented-generation/rag_qna_with_bq_and_featurestore.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/bigquery/import?url=https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/rag_qna_with_bq_and_featurestore.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/bigquery/v1/32px.svg" alt="BigQuery Studio logo"><br> Open in BigQuery Studio
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/rag_qna_with_bq_and_featurestore.ipynb">
      <img width="32px" src="https://upload.wikimedia.org/wikipedia/commons/9/91/Octicons-mark-github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

<div style="clear: both;"></div>

<b>Share to:</b>

<a href="https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/rag_qna_with_bq_and_featurestore.ipynb" target="_blank">
  <img width="20px" src="https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg" alt="LinkedIn logo">
</a>

<a href="https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/rag_qna_with_bq_and_featurestore.ipynb" target="_blank">
  <img width="20px" src="https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg" alt="Bluesky logo">
</a>

<a href="https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/rag_qna_with_bq_and_featurestore.ipynb" target="_blank">
  <img width="20px" src="https://upload.wikimedia.org/wikipedia/commons/5/53/X_logo_2023_original.svg" alt="X logo">
</a>

<a href="https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/rag_qna_with_bq_and_featurestore.ipynb" target="_blank">
  <img width="20px" src="https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png" alt="Reddit logo">
</a>

<a href="https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/rag_qna_with_bq_and_featurestore.ipynb" target="_blank">
  <img width="20px" src="https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg" alt="Facebook logo">
</a>            

| | |
|-|-|
| Authors | [Elia Secchi](https://github.com/eliasecchig) [Lorenzo Spataro](https://github.com/lspataroG) |

## Overview

This notebook guides you through building a low-latency vector search system for your Gen AI application using [BigQuery Vector Search](https://cloud.google.com/bigquery/docs/vector-search-intro) and [Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore/latest/overview). We'll use the [`BigQueryVectorStore` LangChain integration](https://github.com/langchain-ai/langchain-google/blob/main/libs/community/langchain_google_community/bq_storage_vectorstores/bigquery.py#L26) and the [`VertexFSVectorStore` LangChain integration](https://github.com/langchain-ai/langchain-google/blob/main/libs/community/langchain_google_community/bq_storage_vectorstores/featurestore.py#L33) to streamline this process.

Vertex AI Feature Store integrates with BigQuery, providing unified data storage and flexible vector search options:

- **BigQuery Vector Search**: with **`BigQueryVectorStore`** LangChain class, ideal for batch retrieval and prototyping, as it requires no infrastructure setup.
- **Feature Store Online Store**: with **`VertexFSVectorStore`** LangChain class, enables low-latency retrieval with manual or scheduled data sync. Perfect for production-ready user-facing Gen AI applications.

As part of this notebook you will learn how to:
1. Ingest data and embeddings using BigQuery Vector Search with the class `BigQueryVectorStore`
2. Perform retrieval leveraging BigQuery Vector Search with the class `BigQueryVectorStore`
3. Transition to Vertex AI Feature Store with the class `VertexFSVectorStore` for low-latency retrieval
4. Understand pros and cons of both options through a performance deep dive

![bq_fs_diagram_journey.png](https://storage.googleapis.com/github-repo/generative-ai/gemini/use-cases/retrieval-augmented-generation/bq_fs_diagram_journey.png)

## Get started

### Install Vertex AI SDK and other required packages

In [29]:
%pip install --upgrade --user --quiet google-cloud-aiplatform "langchain-google-vertexai" "langchain-google-community[featurestore]" google-cloud-bigquery pypdf==4.2.0

### Restart runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.

The restart might take a minute or longer. After it's restarted, continue to the next step.

In [30]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️</b>
</div>

### Authenticate your notebook environment (Colab only)

If you're running this notebook on Google Colab, run the cell below to authenticate your environment.

In [1]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information and initialize Vertex AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [2]:
PROJECT_ID = "genai-llm01"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}


import vertexai

vertexai.init(project=PROJECT_ID, location=LOCATION)

## Getting started

### Import libraries

In [3]:
from langchain.chains import RetrievalQA
from langchain.globals import set_debug
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_google_community import BigQueryVectorStore, VertexFSVectorStore
from langchain_google_vertexai import VertexAI, VertexAIEmbeddings
from google.cloud import bigquery
from google.cloud.exceptions import NotFound

In [4]:
DATASET_ID = "bq_rag_embedding"  # @param {type:"string"}
TABLE = "fixmycar"  # @param {type:"string"}

In [5]:
client = bigquery.Client(project=PROJECT_ID)

In [6]:
# Set dataset_id to the ID of the dataset to create.
dataset = f"{PROJECT_ID}.{DATASET_ID}"

# Construct a full Dataset object to send to the API.
dataset_object = bigquery.Dataset(dataset)

# Specify the geographic location where the dataset should reside.
dataset_object.location = LOCATION

# Check whether the dataset already exists and create it only if the
# lookup raises NotFound.
try:
    client.get_dataset(dataset_object)  # Make an API request.
    print(f"Dataset {dataset} already exists")

except NotFound:
    # Send the dataset to the API for creation, with an explicit timeout.
    client.create_dataset(dataset_object, timeout=30)  # Make an API request.
    print(f"Created dataset {dataset}")

Dataset genai-llm01.bq_rag_embedding already exists


## Add documents to `BigQueryVectorStore`

This step ingests and parses PDF documents, splits them into chunks, generates embeddings, and adds the embeddings to the vector store. The document corpus is a car owner's manual.

**Summary steps**
- Create text embeddings: LangChain `VertexAIEmbeddings`
- Ingest PDF files: LangChain `PyPDFLoader`
- Chunk documents: LangChain `TextSplitter`
- Create Vector Store: LangChain `BigQueryVectorStore`

### Create the Vertex AI Embedding model

In [7]:
embedding_model = VertexAIEmbeddings(
    model_name="textembedding-gecko@latest", project=PROJECT_ID
)

### Ingest PDF file

The document is hosted in a Cloud Storage bucket (at `gs://github-repo/generative-ai/sample-apps/fixmycar/cymbal-starlight-2024.pdf`). LangChain provides a convenient document loader, [`PyPDFLoader`](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/), to load documents from PDFs.

In [8]:
GCS_BUCKET_DOCS = "github-repo/generative-ai/sample-apps/fixmycar"

# Copy the file to the current path
!gsutil cp "gs://$GCS_BUCKET_DOCS/*.pdf" .

Copying gs://github-repo/generative-ai/sample-apps/fixmycar/cymbal-starlight-2024.pdf...
/ [1 files][328.9 KiB/328.9 KiB]
Operation completed over 1 objects/328.9 KiB.                                    


In [9]:
# Ingest PDF files
loader = PyPDFLoader("cymbal-starlight-2024.pdf")
documents = loader.load()

# Add document name and source to the metadata
for document in documents:
    doc_md = document.metadata
    document_name = doc_md["source"].split("/")[-1]
    # derive doc source from Document loader
    doc_source_prefix = "/".join(GCS_BUCKET_DOCS.split("/")[:3])
    doc_source_suffix = "/".join(doc_md["source"].split("/")[4:-1])
    source = f"{doc_source_prefix}/{doc_source_suffix}"
    document.metadata = {"source": source, "document_name": document_name}

print(f"# of documents loaded (pre-chunking) = {len(documents)}")


# of documents loaded (pre-chunking) = 22


Verify document metadata

In [10]:
documents[0].metadata

{'source': 'github-repo/generative-ai/sample-apps/',
 'document_name': 'cymbal-starlight-2024.pdf'}
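The `source` value ends with a trailing slash because the loader was given a local file name rather than a full path. The trace below is a minimal sketch that repeats the same string slicing outside the loop, assuming the loader reports `cymbal-starlight-2024.pdf` as its `source` (the variable `loader_source` is a hypothetical stand-in for `doc_md["source"]`).

```python
# Values mirror the cells above. The loader received a bare file name,
# so its "source" metadata has no directory components.
GCS_BUCKET_DOCS = "github-repo/generative-ai/sample-apps/fixmycar"
loader_source = "cymbal-starlight-2024.pdf"  # stand-in for doc_md["source"]

# Last path component: the file name itself.
document_name = loader_source.split("/")[-1]

# First three components of the bucket path.
doc_source_prefix = "/".join(GCS_BUCKET_DOCS.split("/")[:3])

# Components 4..-1 of a one-element list: an empty slice, so an empty string.
doc_source_suffix = "/".join(loader_source.split("/")[4:-1])

source = f"{doc_source_prefix}/{doc_source_suffix}"

print(document_name)
print(repr(doc_source_suffix))
print(source)
```

If you load files from deeper local paths, the suffix slice starts picking up directory names, so it is worth checking the metadata again after changing where the PDFs are stored.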

## Chunk documents - `TextSplitter`

Split the documents into smaller chunks. Choose a chunk size small enough that several chunks fit within the LLM's context window.
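To build intuition for how a chunk size and a chunk overlap interact, here is a simplified sliding-window sketch in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that class also tries to break on the configured separators); the helper `chunk_text` and the sample string are hypothetical and only illustrate the two parameters.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.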

In [11]:
# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=50,
    separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],
)
doc_splits = text_splitter.split_documents(documents)

# Add chunk number to metadata
for idx, split in enumerate(doc_splits):
    split.metadata["chunk"] = idx

print(f"# of documents = {len(doc_splits)}")

# of documents = 60


In [12]:
doc_splits[0].metadata

{'source': 'github-repo/generative-ai/sample-apps/',
 'document_name': 'cymbal-starlight-2024.pdf',
 'chunk': 0}

## Configure `BigQueryVectorStore` as Vector Store

We start with `BigQueryVectorStore` because it requires no infrastructure setup or startup time, which makes it a good fit for prototyping.

You can initialize the class by providing:
- `project_id`
- `location`
- `dataset_name`
- `table_name`

The table will be used to store embeddings and metadata. You can also point to an existing table. The class will use [BigQuery Vector Search](https://cloud.google.com/bigquery/docs/vector-search-intro) to perform vector search.

See [here](https://github.com/langchain-ai/langchain-google/blob/main/libs/community/langchain_google_community/bq_storage_vectorstores/bigquery.py#L26) for the full list of parameters of the class.
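Conceptually, a vector search ranks the stored embeddings by their similarity to the query embedding and returns the closest rows. The sketch below shows that idea as a brute-force scan in plain Python. It is an illustration only: the three-dimensional vectors, the helper names, and the choice of cosine similarity are assumptions, and it says nothing about how BigQuery executes the search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding, rows, k=2):
    """Score every (text, embedding) row and return the k best matches."""
    scored = [(cosine_similarity(query_embedding, emb), text) for text, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "table" of text chunks with made-up embeddings.
toy_rows = [
    ("roadside assistance", [0.9, 0.1, 0.0]),
    ("tire pressure", [0.1, 0.9, 0.0]),
    ("infotainment", [0.0, 0.1, 0.9]),
]

results = top_k([1.0, 0.0, 0.0], toy_rows, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A full scan like this touches every row on every query, which is acceptable for prototyping and batch work but is one reason a dedicated online store is attractive for low-latency serving.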

In [14]:
bq_store = BigQueryVectorStore(
    project_id=PROJECT_ID,
    location=LOCATION,
    dataset_name=DATASET_ID,
    table_name=TABLE,
    embedding=embedding_model,
)

INFO:langchain_google_community.bq_storage_vectorstores._base:BigQuery table genai-llm01.bq_rag_embedding.fixmycar initialized/validated as persistent storage. Access via BigQuery console:
 https://console.cloud.google.com/bigquery?project=genai-llm01&ws=!1m5!1m4!4m3!1sgenai-llm01!2sbq_rag_embedding!3sfixmycar


### Add documents to the store

Note: If you have precomputed embeddings, you can add texts, embeddings, and optional metadata using the `add_texts_with_embeddings` method.
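As a rough sketch of the precomputed path, the hypothetical helper below checks that the inputs line up before handing them to the store. It assumes `add_texts_with_embeddings` accepts texts, embeddings, and optional metadata in that order, which is how it reads in the linked source; verify the signature against your installed version of `langchain-google-community`.

```python
def add_precomputed(store, texts, embeddings, metadatas=None):
    """Validate inputs, then pass precomputed vectors to the vector store.

    `store` is any object exposing add_texts_with_embeddings, for example
    the bq_store created in this notebook (argument order is an assumption).
    """
    if len(texts) != len(embeddings):
        raise ValueError("texts and embeddings must have the same length")
    if metadatas is not None and len(metadatas) != len(texts):
        raise ValueError("metadatas must have one entry per text")
    # Positional call: texts, embeddings, optional metadata.
    return store.add_texts_with_embeddings(texts, embeddings, metadatas)


# Possible usage with the objects from this notebook (not executed here):
# texts = [split.page_content for split in doc_splits]
# vectors = embedding_model.embed_documents(texts)
# metadatas = [split.metadata for split in doc_splits]
# doc_ids = add_precomputed(bq_store, texts, vectors, metadatas)
```

Validating lengths up front keeps a mismatched batch from being partially written to the table.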

In [15]:
doc_ids = bq_store.add_documents(doc_splits)

Verify the `BigQueryVectorStore` with a similarity search

In [16]:
bq_store.similarity_search(
    "What should I do when I call the emergency roadside assistance?"
)

[Document(metadata={'doc_id': 'ee7bd1a855b543038e24860abefdb5ca', 'source': 'github-repo/generative-ai/sample-apps/', 'document_name': 'cymbal-starlight-2024.pdf', 'chunk': 56, 'score': 0.6757340150553269}, page_content="manual.md 2024-03-23\n21 / 22Wash your vehicle regularly to remove dirt and grime.\nWax your vehicle twice a year to protect the paint.\nCheck the tire pressure regularly and adjust it as needed.\nInspect the brakes regularly for wear and tear.\nKeep the interior of your vehicle clean and free of debris.\nBy following these tips, you can help to keep your Cymbal Starlight 2024 in top condition for many years to\ncome.\nChapter 18: Emergencies\nRoadside Assistance\nIf you experience a roadside emergency, such as a flat tire or a dead battery, you can call roadside\nassistance for help. Roadside assistance is available 24 hours a day, 7 days a week.\nTo call roadside assistance, dial the following number:\n1-800-555-1212\nWhen you call roadside assistance, be prepared to

### Get a LangChain retriever
The retriever will be used in a LangChain Chain to find the most similar documents for a given query.

In [17]:
langchain_retriever = bq_store.as_retriever()

### Compose a LangChain Chain

We are going to use the [`RetrievalQA` chain](https://python.langchain.com/docs/modules/chains/popular/vector_db_qa).
Several chain types are available, listed [here](https://docs.langchain.com/docs/components/chains/index_related_chains).
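The `stuff` chain type used in the next cell places all retrieved chunks into a single prompt, together with the question. The sketch below shows the idea in plain Python. The helper `build_stuff_prompt`, the sample chunks, and the template wording are illustrative assumptions, not necessarily the exact prompt LangChain sends.

```python
def build_stuff_prompt(question, chunks):
    """Concatenate every retrieved chunk into one context block for the LLM."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


# Hypothetical retrieved chunks standing in for the retriever output.
retrieved_chunks = [
    "Roadside assistance is available 24 hours a day, 7 days a week.",
    "Check the tire pressure regularly and adjust it as needed.",
]

prompt = build_stuff_prompt(
    "What should I do when calling the emergency roadside assistance?",
    retrieved_chunks,
)
print(prompt)
```

Because every chunk lands in one prompt, the number of retrieved chunks multiplied by the chunk size has to stay within the model's context window.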

In [20]:
# Set high verbosity
set_debug(True)

llm = VertexAI(model_name="gemini-pro")

search_query = "What should I do when calling the emergency roadside assistance?"  # @param {type:"string"}

retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=langchain_retriever
)
response = retrieval_qa.invoke(search_query)
print("\n################ Final Answer ################\n")
print(response["result"])

[chain/start] [chain:RetrievalQA] Entering Chain run with input:
{
  "query": "What should I do when calling the emergency roadside assistance?"
}
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] Entering Chain run with input:
{
  "question": "What should I do when calling the emergency roadside assistance?",
  "context": "manual.md 2024-03-23\n21 / 22Wash your vehicle regularly to remove dirt and grime.\nWax your vehicle twice a year to protect the paint.\nCheck the tire pressure regularly and adjust it as needed.\nInspect the brakes regularly for wear and tear.\nKeep the interior of your vehicle clean and free of debris.\nBy following these tips, you can help to keep your Cymbal Starlight 2024 in top condition for many years to\ncome.\nChapter 18: Emergencies\nRoadside Assistance\

### Filtering by metadata

You can post-filter results by metadata by passing the `filter` parameter to any search method.

`VertexFSVectorStore` also supports metadata filtering during search. For this to work:
- the `filter_columns` parameter must be passed to `VertexFSVectorStore` when the online feature store feature view is created (the first time the class is initialised with a given online store name and feature view name).

- the `string_filters` parameter must be passed to any search method. Note that only string fields are supported at the moment. See [here](https://github.com/googleapis/python-aiplatform/blob/8a4a41afe47aaff2f69a73e5011b34bcba5cd2e9/google/cloud/aiplatform_v1beta1/types/feature_online_store_service.py#L345).
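To make the term concrete, post-filtering means ranking by similarity first and dropping non-matching results afterwards. The sketch below shows that second step in plain Python. The helper `post_filter` and the toy result dictionaries are hypothetical and only mimic the shape of the metadata used in this notebook.

```python
def post_filter(results, metadata_filter):
    """Keep only results whose metadata matches every key/value in the filter."""
    return [
        doc
        for doc in results
        if all(doc["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]


# Toy results, already ranked by similarity (best first).
ranked_results = [
    {"page_content": "chunk about roadside assistance", "metadata": {"chunk": 56}},
    {"page_content": "chunk about staying in the vehicle", "metadata": {"chunk": 28}},
    {"page_content": "chunk about tire pressure", "metadata": {"chunk": 3}},
]

filtered = post_filter(ranked_results, {"chunk": 28})
print(filtered)
```

One consequence of filtering after ranking is that a query can return fewer results than requested when most of the top matches fail the filter.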

In [21]:
bq_store.similarity_search(search_query, filter={"chunk": 28})

[Document(metadata={'doc_id': '625a716e9ca44b00a91a9232df5f6975', 'source': 'github-repo/generative-ai/sample-apps/', 'document_name': 'cymbal-starlight-2024.pdf', 'chunk': 28, 'score': 0.6867311598638806}, page_content='Stay in your vehicle and wait for emergency services to arrive.\nIf you are stranded in a remote area, stay with your vehicle and try to attract attention by waving your\narms or using a flashlight.')]

## Retrieval with memory

Now let's take a slightly more sophisticated example. We will use the `ConversationalRetrievalChain`. It still uses BigQuery Vector Search, but it keeps the previous conversation history in memory and passes it to the LLM as context, which lets you hold a conversation with your data.

In [28]:
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
conversational_retrieval = ConversationalRetrievalChain.from_llm(
    llm=llm, retriever=langchain_retriever, memory=memory
)

search_query = "Does the Cymbal Starlight have roadside assistance?"

conversational_retrieval.invoke(search_query)["answer"]

[chain/start] [chain:ConversationalRetrievalChain] Entering Chain run with input:
{
  "question": "Does the Cymbal Starlight have roadside assistance?",
  "chat_history": []
}
[chain/start] [chain:ConversationalRetrievalChain > chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:ConversationalRetrievalChain > chain:StuffDocumentsChain > chain:LLMChain] Entering Chain run with input:
{
  "question": "Does the Cymbal Starlight have roadside assistance?",
  "context": "the safety features in your Cymbal Starlight 2024. This chapter provides important information on how to\nuse the following emergency assistance features:\nHazard lights\nHorn\nEmergency roadside assistance\nTire repair kit\nHazard Lights\n\nHorn\nThe horn is used to alert other drivers and pedestrians of your presence.\nTo sound the horn, press the horn button located on the steering wheel.\nEmergency Roadside Ass

'Yes, the Cymbal Starlight 2024 comes with 24/7 emergency roadside assistance. If you experience a flat tire or a dead battery, you can call 1-800-555-1212 for help.'

We can then ask a follow-up question without providing much additional context, because the previous question and answer are passed along in the chat history.
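The mechanism can be sketched in a few lines of plain Python: earlier turns are flattened into a text transcript, and that transcript is placed in a prompt that asks the LLM to rewrite the follow-up as a standalone question before retrieval. The class `ChatBuffer` and the function `build_condense_prompt` are hypothetical. The transcript format and the opening instruction follow the debug output of the next cell, while the ending after the chat history is an assumption because that output is truncated.

```python
class ChatBuffer:
    """Minimal in-memory store of (human, assistant) turns."""

    def __init__(self):
        self.turns = []

    def add_turn(self, human, assistant):
        self.turns.append((human, assistant))

    def as_text(self):
        # Same "\nHuman: ...\nAssistant: ..." layout as the chat_history debug field.
        return "".join(f"\nHuman: {h}\nAssistant: {a}" for h, a in self.turns)


def build_condense_prompt(chat_history, follow_up):
    """Ask the LLM to turn a follow-up into a standalone question."""
    return (
        "Given the following conversation and a follow up question, rephrase the "
        "follow up question to be a standalone question, in its original language.\n\n"
        f"Chat History:\n{chat_history}\n"
        # Assumed ending; the notebook output is cut off before this point.
        f"Follow Up Input: {follow_up}\n"
        "Standalone question:"
    )


buffer = ChatBuffer()
buffer.add_turn(
    "Does the Cymbal Starlight have roadside assistance?",
    "Yes, the Cymbal Starlight 2024 comes with 24/7 emergency roadside assistance.",
)
condense_prompt = build_condense_prompt(buffer.as_text(), "What number do I call?")
print(condense_prompt)
```

The rewritten standalone question, not the short follow-up, is what gets embedded and sent to the retriever, which is why "What number do I call?" can still find the roadside assistance chunk.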

In [30]:
new_query = "What number do I call?"

result = conversational_retrieval.invoke(new_query)

print(result["answer"])

[chain/start] [chain:ConversationalRetrievalChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:ConversationalRetrievalChain > chain:LLMChain] Entering Chain run with input:
{
  "question": "What number do I call?",
  "chat_history": "\nHuman: Does the Cymbal Starlight have roadside assistance?\nAssistant: Yes, the Cymbal Starlight 2024 comes with 24/7 emergency roadside assistance. If you experience a flat tire or a dead battery, you can call 1-800-555-1212 for help."
}
[llm/start] [chain:ConversationalRetrievalChain > chain:LLMChain > llm:VertexAI] Entering LLM run with input:
{
  "prompts": [
    "Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n\nHuman: Does the Cymbal Starlight have roadside assistance?\nAssistant: Yes, the Cymbal Starlight 2024 comes with 24/7 emergency roadsid

# Appendix