# Building a GenAI RAG application with Feature Store and BigQuery

## Overview
This notebook guides you through building a low-latency vector search system for your GenAI application using Vertex AI Feature Store. We'll leverage the [Vertex Feature Store LangChain integration]([link to integration]) to streamline this process.

Feature Store integrates with BigQuery, providing unified data storage and two flexible vector search options:

- **BigQuery Vector Search**: Ideal for batch retrieval and prototyping, as it requires no infrastructure setup.
- **Feature Store Online Store**: Enables low-latency retrieval with manual or scheduled data sync. Perfect for production-ready user-facing GenAI applications.

![Image notebook journey](diagram_journey.png)


# Setup


### Install libraries

In [None]:
!pip install langchain-google-vertexai pypdf==4.2.0 langchain pyarrow==16.0.0 db-dtypes==1.2.0 --upgrade

### Authenticating your notebook environment
* If you are using **Colab** to run this notebook, uncomment the cell below and continue.
* If you are using **Vertex AI Workbench**, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env).

In [None]:
# from google.colab import auth

# auth.authenticate_user()

### Import libraries

In [None]:
# import sys
# sys.path.append("../")

In [None]:
%load_ext autoreload
%autoreload 2
from langchain_google_vertexai.vectorstores.feature_store.featurestore import VertexFSVectorStore
from langchain_google_vertexai.vectorstores.feature_store.bigquery import BigQueryVectorStore
from langchain_google_vertexai.vectorstores.feature_store._base import BaseBigQueryVectorStore

# import logging
# logging.basicConfig(level=logging.INFO)

### Define environment variables

In [None]:
PROJECT_ID = "cloud-llm-preview2"
DATASET = "vertex_documentation_new1"
TABLE = "mytest4"
REGION = "europe-west4"

# Add documents to `VertexFSVectorStore`

This step ingests and parses PDF documents, splits them into chunks, generates embeddings, and adds the embeddings to the vector store. The document corpus used as the dataset is a collection of car owner's manuals.

**Summary steps**
- Create text embeddings: LangChain `VertexAIEmbeddings`
- Ingest PDF files: LangChain `PyPDFLoader`
- Chunk documents: LangChain `TextSplitter`
- Create Vector Store: LangChain `VertexFSVectorStore`

### Create the VertexAI Embedding model

In [None]:
from langchain_google_vertexai import VertexAIEmbeddings

embedding_model = VertexAIEmbeddings(
    model_name="textembedding-gecko@latest", project=PROJECT_ID
)

### Ingest PDF file

The document is hosted in a Cloud Storage bucket (at `gs://github-repo/generative-ai/sample-apps/fixmycar/cymbal-starlight-2024.pdf`), and LangChain provides a convenient document loader, [`PyPDFLoader`](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/), to load documents from PDFs.


In [None]:
GCS_BUCKET_DOCS = (
    "github-repo/generative-ai/sample-apps/fixmycar"  # @param {type: "string"}
)

# Copy the file to the current path
!gsutil cp "gs://$GCS_BUCKET_DOCS/*.pdf" .

In [None]:
# Ingest PDF files
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("cymbal-starlight-2024.pdf")
documents = loader.load()


# Add the document name and its Cloud Storage location to the metadata
for document in documents:
    # PyPDFLoader stores the local file path in metadata["source"]
    document_name = document.metadata["source"].split("/")[-1]
    # The file was copied from the bucket, so rebuild its original GCS URI
    source = f"gs://{GCS_BUCKET_DOCS}/{document_name}"
    document.metadata = {"source": source, "document_name": document_name}

print(f"# of documents loaded (pre-chunking) = {len(documents)}")

Verify the document metadata:

In [None]:
documents[0].metadata

## Chunk documents - TextSplitter

Split the documents into smaller chunks. When choosing the chunk size, ensure that a few chunks fit within the LLM's context length.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=50,
    separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],
)
doc_splits = text_splitter.split_documents(documents)

# Add chunk number to metadata
for idx, split in enumerate(doc_splits):
    split.metadata["chunk"] = idx

print(f"# of chunks = {len(doc_splits)}")

In [None]:
doc_splits[0].metadata

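To make `chunk_size` and `chunk_overlap` concrete, here is a minimal fixed-window sketch in plain Python. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on the listed separators); the helper name `sliding_window_chunks` is hypothetical and only illustrates how consecutive chunks share `chunk_overlap` characters.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: unlike RecursiveCharacterTextSplitter,
    it ignores separators and cuts at exact character offsets.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each printed chunk starts with the last three characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one neighbour when the overlap is large enough.
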
## Configure `VertexFSVectorStore` as Vector Store

You are now ready to start using Vertex AI Feature Store!
Initialize the class by providing `project_id`, `location`, and the BigQuery `dataset_name` and `table_name` used to store the embeddings.
You can also point to an existing table. By default, the class uses [BigQuery Vector Search](https://cloud.google.com/bigquery/docs/vector-search-intro) to perform vector search.

See [here](TODO) for the full list of parameters of the class.

In [50]:
vertex_fs = VertexFSVectorStore(
    project_id=PROJECT_ID,
    location=REGION,
    dataset_name=DATASET + "2",
    table_name=TABLE + "233",
    embedding=embedding_model,
)

BigQuery table cloud-llm-preview2.vertex_documentation_new12.mytest4233 initialized/validated as persistent storage. Access via BigQuery console:
 https://console.cloud.google.com/bigquery?project=cloud-llm-preview2&ws=!1m5!1m4!4m3!1scloud-llm-preview2!2svertex_documentation_new12!3smytest4233
Creating feature store online store
name: "projects/323656405210/locations/europe-west4/featureOnlineStores/vertex_documentation_new12"

VertexFSVectorStore initialized with Feature Store optimized Vector Search.
Batch serving accessible through .to_big_query_vector_store() method


In [44]:
%%time
vertex_fs.get_documents(ids=["5263a6e149ec43b7bf6c723bd3aabb56"])

CPU times: user 8.89 ms, sys: 6.36 ms, total: 15.3 ms
Wall time: 398 ms




In [25]:
vertex_fs.add_documents(doc_splits)

Creating FeatureView
Create FeatureView backing LRO: projects/323656405210/locations/europe-west4/featureOnlineStores/vertex_documentation_new1/featureViews/mytest4233/operations/8342730018337587200
FeatureView created. Resource name: projects/323656405210/locations/europe-west4/featureOnlineStores/vertex_documentation_new1/featureViews/mytest4233
To use this FeatureView in another session:
feature_view = aiplatform.FeatureView('projects/323656405210/locations/europe-west4/featureOnlineStores/vertex_documentation_new1/featureViews/mytest4233')
Sync ongoing, waiting for 30 seconds.
Sync ongoing, waiting for 30 seconds.
Sync ongoing, waiting for 30 seconds.
Sync ongoing, waiting for 30 seconds.
Sync ongoing, waiting for 30 seconds.
Sync ongoing, waiting for 30 seconds.
Sync Succeed for projects/cloud-llm-preview2/locations/europe-west4/featureOnlineStores/vertex_documentation_new1/featureViews/mytest4233/featureViewSyncs/6192508723823378432.


['0117f1ef1c7a48e4823ff52ff840d2a4',
 'e2cbe8b4e004462ea6b62dff51b21977',
 '1d829b6aed7443cf99de18c210f154e8',
 '7d3f64cebd9a44c693e9e14046284043',
 '269a00e6f42f4875ae49a3fa7feffc38',
 'b339829e85c749d3a83cc415221dcb58',
 'c674e302ea174d24a632f5d6f797eea6',
 '19cfb5ce9f994b0a8f38a6f77e477951',
 '1d0cff4b82fe4dfdae98e49151db1647',
 'ae0c3bfec0894ff3a09a241e8550a1e3',
 'abf9c711279444c6b3fd30b1456e01a9',
 '5263a6e149ec43b7bf6c723bd3aabb56',
 '22cb5d9f95cd41228dd740f09075afda',
 '792b75710b6b42c3adb8366bc997d82d',
 'fdead3e4028948d4a10591416a903ccc',
 '9930f35c377c47c486081504e17d1ad3',
 'f5bafb6b4e7b4183a13899bf131c8a62',
 'af09d62aa1604fda892455ad82fc953b',
 '29a77980594147de9816b9cbd665bf58',
 '3f7c01cd57f84db69c7a9a7121f58ec1',
 'bd7a6991ca0443f9a78cd1efeb2fcb03',
 '5d2e1800d87f4489a297be2a9e620dc7',
 '91ed57ed492a451c88c487c067d38130',
 '81f0427308d046e0b733c2ef8087497b',
 'c39b3a39d7d247ef84674b7d3958ac79',
 '1652cae8a50b4a3b9b92c19705eaa5f2',
 '9bf3e8c37ea542ef85980dfa3eeeb14c',
 

In [None]:
from google.cloud.aiplatform_v1beta1 import NearestNeighborQuery




In [49]:
%%time

vertex_fs.similarity_search(
    "treat",
    k=6,
    # string_filters=[
    #     NearestNeighborQuery.StringFilter({"name": "kind", "deny_tokens": ["treat"]})
    # ],
)

CPU times: user 22.8 ms, sys: 3.45 ms, total: 26.2 ms
Wall time: 431 ms


[Document(page_content="manual.md 2024-03-23\n2 / 22VSC can be turned off by pressing the VSC OFF button on the dashboard. However, it is\nrecommended to leave VSC on for optimal safety.\nAnti-Lock Braking System (ABS)\nABS prevents the wheels from locking during braking, allowing you to maintain control of the vehicle.\nABS can be felt as a pulsation in the brake pedal during braking. Do not release the brake pedal;\ncontinue applying steady pressure until the vehicle comes to a stop.\nTire Safety\nMaintain proper tire pressure at all times (see the Tire Pressure Information label on the driver's door\njamb).\nCheck tire tread depth regularly and replace tires when they reach the minimum tread depth of 2/32\ninches.\nAvoid sudden starts, stops, and turns that can cause excessive tire wear.\nVehicle Inspection\nInspect your vehicle regularly for any signs of damage or malfunction, including:\nLeaks under the vehicle\nUnusual noises or vibrations\nDim or flickering lights\nWorn or damag

The similarity search above verifies that the vector store returns relevant chunks.

### Get a LangChain retriever
The retriever will be used in a LangChain chain to find the most similar documents for a given query.

In [32]:
langchain_retriever = vertex_fs.as_retriever()

### Compose a LangChain chain

We are going to use the [`RetrievalQA` chain](https://python.langchain.com/docs/modules/chains/popular/vector_db_qa).
Several other chain types are available, listed [here](https://docs.langchain.com/docs/components/chains/index_related_chains).

In [33]:
%%time
from langchain_google_vertexai import VertexAI
from langchain.chains import RetrievalQA
from langchain.globals import set_debug

# Set high verbosity
set_debug(True)

llm = VertexAI(model_name="gemini-pro")

search_query = "What should I do when call the emergency roadside assistance?"  # @param {type:"string"}

retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=langchain_retriever
)
response = retrieval_qa.invoke(search_query)
print("\n################ Final Answer ################\n")
print(response["result"])

[chain/start] [chain:RetrievalQA] Entering Chain run with input:
{
  "query": "What should I do when call the emergency roadside assistance?"
}
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] Entering Chain run with input:
{
  "question": "What should I do when call the emergency roadside assistance?",
  "context": "manual.md 2024-03-23\n21 / 22Wash your vehicle regularly to remove dirt and grime.\nWax your vehicle twice a year to protect the paint.\nCheck the tire pressure regularly and adjust it as needed.\nInspect the brakes regularly for wear and tear.\nKeep the interior of your vehicle clean and free of debris.\nBy following these tips, you can help to keep your Cymbal Starlight 2024 in top condition for many years to\ncome.\nChapter 18: Emergencies\nRoadside Assistance\nIf yo

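The `stuff` chain type used above places every retrieved chunk into a single prompt, which matches the `question` and `context` inputs visible in the debug trace. A minimal sketch of that idea follows; the `build_stuff_prompt` helper and its prompt wording are hypothetical and are not LangChain's actual template.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate all retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)  # every chunk is "stuffed" into the same prompt
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Two sentences taken from the manual text shown in the trace above.
retrieved = [
    "Check the tire pressure regularly and adjust it as needed.",
    "Inspect the brakes regularly for wear and tear.",
]
prompt = build_stuff_prompt("How do I maintain my tires?", retrieved)
print(prompt)
```

Because all chunks share one prompt, its length grows with the number of retrieved documents and the chunk size, which is why the chunks need to fit within the LLM's context length.
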
## Low latency Vector Search with FeatureStore

We are now ready to perform low latency serving with Feature Store! 

To do that, call the `set_executor` method with the executor type set to `feature_online_store`.

See the [function definition](TODO) for all the parameters you can use.

In [None]:
vertex_fs.set_executor({"type": "feature_online_store"})

#### Kick off a synchronization process

You can use the `sync` method to synchronize the data from BigQuery to the Feature Online Store and achieve low-latency serving.
In a production environment, you can also use `cron_schedule` to set up automatic scheduled synchronization.

The synchronization process takes around 20 minutes.

In [None]:
vertex_fs.sync()

You can also monitor the synchronization process from GCP Console: [Vertex AI Feature Store Tab](https://console.cloud.google.com/vertex-ai/feature-store/online-stores)

#### Serve with Feature Online Store

You are now ready to serve with Feature Store! You can re-use the same retriever to perform low-latency Vector Search.

In [None]:
results = langchain_retriever.invoke(search_query)
results[0]

In [None]:
%%time
results = langchain_retriever.invoke("Leaks under the vehicle")

In [None]:
%%time
response = retrieval_qa.invoke(search_query)
print("\n################ Final Answer ################\n")
print(response["result"])

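A single `%%time` run can be noisy, so it may help to repeat a query several times and look at the median and the worst case. The sketch below is plain Python; `measure_latency` is a hypothetical helper, and `fake_retriever` merely stands in for a call such as `langchain_retriever.invoke`.

```python
import statistics
import time
from typing import Callable


def measure_latency(fn: Callable[[str], object], query: str, repeats: int = 5) -> dict:
    """Call fn(query) `repeats` times and summarize wall-clock latency in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(query)
        timings_ms.append((time.perf_counter() - start) * 1000)
    return {
        "runs": repeats,
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


def fake_retriever(query: str) -> list:
    """Stand-in for a real retriever call; sleeps briefly to simulate a network round trip."""
    time.sleep(0.01)
    return [f"result for {query}"]


stats = measure_latency(fake_retriever, "Leaks under the vehicle", repeats=5)
print(stats)
```

With the objects defined in this notebook, the equivalent call would be `measure_latency(langchain_retriever.invoke, search_query)`.
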
### Filtering by metadata


# Appendix

This appendix collects other useful examples for working with the `VertexFSVectorStore` LangChain integration.

### Local Bruteforce

You can also prototype with a local brute-force executor. During initialization, the data is downloaded from BigQuery into local memory.

This works well for prototyping when the number of documents is small.

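Conceptually, a brute-force search scores the query against every stored embedding and keeps the best matches, which is exact but scales linearly with the number of documents. The sketch below illustrates that idea in plain Python with cosine similarity; it is not the integration's implementation, and the function names and toy 3-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(query, embeddings, k=2):
    """Score the query against every stored vector and return the top-k (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy in-memory "table": document ID -> embedding (hypothetical values).
embeddings = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}

top = brute_force_search([1.0, 0.0, 0.0], embeddings, k=2)
print([doc_id for doc_id, _ in top])
```

Every query touches every vector, so this approach only suits small document sets, in line with the prototyping use case described above.
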
### Get documents by ID

You can also use the `get_documents` method to retrieve documents by their IDs:


In [34]:
vertex_fs.get_documents(ids=["65a6b83ec5ae428ab8a9607a4f845e47"])

NotFound: 404 The FeatureOnlineStore does not exist.