# Building a GenAI RAG application with Feature Store and BigQuery

## Overview
This notebook guides you through building a low-latency vector search system for your GenAI application using Vertex AI Feature Store. We'll use the [Vertex Feature Store LangChain integration]([link to integration]) to streamline this process.

Feature Store integrates with BigQuery, providing unified data storage and flexible vector search options:

- **BigQuery Vector Search**: Ideal for batch retrieval and prototyping, as it requires no infrastructure setup.
- **Feature Store Online Store**: Enables low-latency retrieval with manual or scheduled data sync. Perfect for production-ready user-facing GenAI applications.

![Image notebook journey](diagram_journey.png)


# Setup


### Install libraries

In [None]:
!pip install langchain-google-vertexai pypdf==4.2.0 langchain pyarrow==16.0.0 db-dtypes==1.2.0 scikit-learn --upgrade

### Authenticating your notebook environment
* If you are using **Colab** to run this notebook, uncomment the cell below and continue.
* If you are using **Vertex AI Workbench**, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env).

In [None]:
# from google.colab import auth
# auth.authenticate_user()

### Import libraries

In [None]:
%load_ext autoreload
%autoreload 2
from langchain_community.document_loaders import PyPDFLoader
from langchain_google_vertexai import VertexAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_google_community.feature_store.bigquery import BigQueryVectorStore
from langchain_google_community.feature_store.featurestore import VertexFSVectorStore
from langchain_google_community.feature_store.local_store import BigQueryInMemoryVectorStore

### Define environment variables

In [None]:
PROJECT_ID = "cloud-llm-preview2"
DATASET = "vertex_documentation"
TABLE = "mytest99"
REGION = "europe-west4"

# Add documents to `BigQueryVectorStore`

This step ingests and parses PDF documents, splits them into chunks, generates embeddings, and adds the embeddings to the vector store. The document corpus used as the dataset is a collection of car owner's manuals.

**Summary steps**
- Create text embeddings: LangChain `VertexAIEmbeddings`
- Ingest PDF files: LangChain `PyPDFLoader`
- Chunk documents: LangChain `RecursiveCharacterTextSplitter`
- Create Vector Store: LangChain `BigQueryVectorStore`

### Create the Vertex AI embedding model

In [None]:
embedding_model = VertexAIEmbeddings(
    model_name="textembedding-gecko@latest", project=PROJECT_ID
)

### Ingest PDF file

The document is hosted in a Cloud Storage bucket (at `gs://github-repo/generative-ai/sample-apps/fixmycar/cymbal-starlight-2024.pdf`). LangChain provides a convenient document loader, [`PyPDFLoader`](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/), to load documents from PDFs.


In [None]:
GCS_BUCKET_DOCS = (
    "github-repo/generative-ai/sample-apps/fixmycar"  # @param {type: "string"}
)

# Copy the file to the current path
!gsutil cp "gs://$GCS_BUCKET_DOCS/*.pdf" .

In [None]:
# Ingest PDF files
loader = PyPDFLoader("cymbal-starlight-2024.pdf")
documents = loader.load()

# Add document name and source to the metadata
for document in documents:
    # The loader records the local file path; keep only the file name
    document_name = document.metadata["source"].split("/")[-1]
    # Point the source at the Cloud Storage location the file was copied from
    source = f"gs://{GCS_BUCKET_DOCS}/{document_name}"
    document.metadata = {"source": source, "document_name": document_name}

print(f"# of documents loaded (pre-chunking) = {len(documents)}")

Verify document metadata

In [None]:
documents[0].metadata

## Chunk documents - TextSplitter

Split the documents into smaller chunks. Choose a chunk size small enough that several chunks fit within the context length of the LLM.

In [None]:
# split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=50,
    separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],
)
doc_splits = text_splitter.split_documents(documents)

# Add chunk number to metadata
for idx, split in enumerate(doc_splits):
    split.metadata["chunk"] = idx

print(f"# of chunks (post-chunking) = {len(doc_splits)}")

In [None]:
doc_splits[0].metadata
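
To make the effect of `chunk_size` and `chunk_overlap` more concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on the listed separators, which this toy helper does not. The name `split_with_overlap` is hypothetical and not part of LangChain.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Toy fixed-size splitter (hypothetical helper, not a LangChain API).

    Each chunk starts `chunk_size - chunk_overlap` characters after the
    previous one, so consecutive chunks share `chunk_overlap` characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


sample_text = "abcdefghijklmnopqrstuvwxyz"
sample_chunks = split_with_overlap(sample_text, chunk_size=10, chunk_overlap=3)
print(sample_chunks)
```

The overlap means a sentence cut at a chunk boundary is still partly visible in the next chunk, at the cost of storing and embedding some text twice.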

## Configure `BigQueryVectorStore` as Vector Store

You are now ready to use BigQuery Vector Store!

You can initialize the class by providing:
- `project_id`
- `location`
- `dataset_name`
- `table_name`

The table will be used to store embeddings and metadata. You can also point to an existing table. 

The class will use [BigQuery Vector Search](https://cloud.google.com/bigquery/docs/vector-search-intro) to perform vector search.

See [here](TODO) for the full list of parameters of the class. 

In [None]:
bq_store = BigQueryVectorStore(
    project_id=PROJECT_ID,
    location=REGION,
    dataset_name=DATASET,
    table_name=TABLE,
    embedding=embedding_model,
)

### Add documents to the store

Note: If you have precomputed embeddings, you can add the texts, their embeddings, and optional metadata with the `add_texts_with_embeddings` method.

In [None]:
bq_store.add_documents(doc_splits)

Verify the `BigQueryVectorStore` with a similarity search

In [None]:
bq_store.similarity_search(
    "What should I do when I call the emergency roadside assistance?"
)

### Get a LangChain retriever
The retriever will be used in a LangChain chain to find the documents most similar to a given query.

In [None]:
langchain_retriever = bq_store.as_retriever()

### Compose a LangChain chain

We are going to use the [`RetrievalQA` chain](https://python.langchain.com/docs/modules/chains/popular/vector_db_qa).
Several other chain types are available, listed [here](https://docs.langchain.com/docs/components/chains/index_related_chains).

In [None]:
%%time
from langchain_google_vertexai import VertexAI
from langchain.chains import RetrievalQA
from langchain.globals import set_debug

# Set high verbosity
set_debug(True)

llm = VertexAI(model_name="gemini-pro")

search_query = "What should I do when I call the emergency roadside assistance?"  # @param {type:"string"}

retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=langchain_retriever
)
response = retrieval_qa.invoke(search_query)
print("\n################ Final Answer ################\n")
print(response["result"])
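
The `"stuff"` chain type places all retrieved chunks into a single prompt, which is why several chunks need to fit in the LLM context at once. The sketch below illustrates that idea only. The helper `build_stuff_prompt`, the prompt wording, and the placeholder chunk texts are assumptions for illustration; LangChain's actual prompt template differs.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Join every retrieved chunk into one context block, then append the question.

    Hypothetical helper: it mimics the idea behind the "stuff" chain type,
    not LangChain's real template.
    """
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder texts standing in for chunks returned by the retriever
example_chunks = ["First retrieved chunk.", "Second retrieved chunk."]
example_prompt = build_stuff_prompt("An example question?", example_chunks)
print(example_prompt)
```

Because the prompt grows with every retrieved chunk, the chunk size and the number of retrieved documents together determine whether the prompt stays within the model's context window.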

## Low-latency vector search with Feature Store

We are now ready to perform low-latency serving with Feature Store!

To do that, call the `.get_vertex_fs_vector_store()` method to get a `VertexFSVectorStore` object.

See the [function definition](TODO) for all the parameters you can use.

Note: Any method we ran earlier can be called on both `BigQueryVectorStore` and `VertexFSVectorStore`. For instance, you can add new documents to an instance of `VertexFSVectorStore`, because both stores share the same underlying BigQuery source.

In [None]:
vertex_fs = bq_store.get_vertex_fs_vector_store()  # pass optional parameters here

### Alternatively, initialize the `VertexFSVectorStore` class directly

In [None]:
vertex_fs = VertexFSVectorStore(
    project_id=PROJECT_ID,
    location=REGION,
    dataset_name=DATASET,
    table_name=TABLE,
    embedding=embedding_model,
    # pass optional parameters here
)

#### Kick off a synchronization process

We use the `sync` method to synchronize the data from BigQuery to the Feature Online Store, which enables low-latency serving.

In a production environment, you can also use `cron_schedule` to set up automatic, scheduled synchronization.

The synchronization process takes around 20 minutes.

In [None]:
# force sync
vertex_fs.sync()

You can monitor the synchronization process from the GCP Console: [Vertex AI Feature Store Tab](https://console.cloud.google.com/vertex-ai/feature-store/online-stores).

#### Serve with Feature Online Store

You are now ready to serve with Feature Store!

In [None]:
langchain_retriever = vertex_fs.as_retriever()

In [None]:
%%time
results = langchain_retriever.invoke("Leaks under the vehicle")
results

In [None]:
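%%time
# Rebuild the chain so that it uses the Feature Store retriever
retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=langchain_retriever
)
response = retrieval_qa.invoke(search_query)
print("\n################ Final Answer ################\n")
print(response["result"])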

### Filtering by metadata

You can post-filter results by metadata by passing the `filter` parameter to any search method.

`VertexFSVectorStore` also supports metadata filtering while performing the search. For this to work:
- the `filter_columns` parameter must be passed to `VertexFSVectorStore` when the online store feature view is created (the first time the class is initialized with a given online store name and feature view name).

- the `string_filters` parameter must be passed to the search method you call. Note that only string fields are supported at the moment. See [here](https://github.com/googleapis/python-aiplatform/blob/8a4a41afe47aaff2f69a73e5011b34bcba5cd2e9/google/cloud/aiplatform_v1beta1/types/feature_online_store_service.py#L345).


In [None]:
# perform post search filtering
vertex_fs.similarity_search(search_query, filter={'chunk': 56})
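
Post-filtering means the similarity search runs first and non-matching results are dropped afterwards, so a filtered search can return fewer results than requested. The sketch below shows that idea with plain dictionaries standing in for documents. The helper `post_filter` is hypothetical, and exact equality on every filter key is an assumption about the matching semantics.

```python
def post_filter(results: list[dict], metadata_filter: dict) -> list[dict]:
    """Keep only results whose metadata matches every key/value pair in the filter.

    Hypothetical helper illustrating post-search filtering; the ranking of
    the surviving results is left unchanged.
    """
    return [
        result
        for result in results
        if all(
            result["metadata"].get(key) == value
            for key, value in metadata_filter.items()
        )
    ]


# Placeholder search results, assumed to be already ranked by similarity
ranked_results = [
    {"text": "chunk 12 text", "metadata": {"chunk": 12}},
    {"text": "chunk 56 text", "metadata": {"chunk": 56}},
    {"text": "chunk 57 text", "metadata": {"chunk": 57}},
]
filtered = post_filter(ranked_results, {"chunk": 56})
print(filtered)
```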

### Batch search

Some use cases require batch searches (e.g., running a retrieval evaluation).

Instead of running a search for each query in a loop, we can do this more efficiently with a single batch search.

While any of the classes introduced in this notebook can run batch searches, the most efficient option is `BigQueryVectorStore`.

In [None]:
# get a bq vector store back
bq_vector_store = vertex_fs.get_big_query_vector_store()

bq_vector_store.batch_search(
    embeddings=None,  # pass precomputed query embeddings here, or
    queries=[search_query, search_query],  # pass the raw queries
    with_scores=True,  # return matching scores
    with_embeddings=True,  # return matched embeddings
)

# Appendix

### Local Bruteforce

You can also prototype with a local brute-force executor. During initialization, the data is downloaded from BigQuery into memory.

This is suitable for prototyping when the number of documents is small.

In [None]:
memory_store = BigQueryInMemoryVectorStore(
    project_id=PROJECT_ID,
    location=REGION,
    dataset_name=DATASET,
    table_name=TABLE,
    embedding=embedding_model
)
# sync the data from BQ
memory_store.sync()

In [None]:
memory_store.similarity_search(search_query)

In [None]:
memory_retriever = memory_store.as_retriever()
memory_retriever.invoke(search_query)
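
Conceptually, a brute-force search scores every stored vector against the query and keeps the best matches, which is why it only suits small collections. The sketch below illustrates this with toy two-dimensional vectors. The helper names are hypothetical, and cosine similarity is an assumption; the store may use a different distance measure.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(
    query: list[float], corpus: dict[str, list[float]], k: int = 2
) -> list[tuple[str, float]]:
    """Score every stored vector against the query and return the top-k (id, score) pairs."""
    scored = [
        (doc_id, cosine_similarity(query, vector)) for doc_id, vector in corpus.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real embeddings have hundreds of dimensions
toy_corpus = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [1.0, 1.0],
}
top_hits = brute_force_search([1.0, 0.1], toy_corpus, k=2)
print([doc_id for doc_id, _ in top_hits])
```

The cost grows linearly with the number of stored vectors, since every query touches every vector.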

### Max Marginal Relevance

In [None]:
mmr_retriever = vertex_fs.as_retriever(search_type="mmr")
mmr_retriever.invoke(search_query)
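
Maximal marginal relevance (MMR) trades off relevance to the query against redundancy with documents that were already selected, so near-duplicate chunks are less likely to fill the result list. The following is a minimal sketch of the greedy selection step on toy vectors. The helper names are hypothetical, and the details of LangChain's own MMR implementation (such as how candidates are fetched and its default weighting) may differ.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(
    query: list[float],
    candidates: dict[str, list[float]],
    k: int = 2,
    lambda_mult: float = 0.5,
) -> list[str]:
    """Greedy MMR: repeatedly pick the candidate with the best
    lambda_mult * relevance - (1 - lambda_mult) * redundancy score."""
    selected: list[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def mmr_score(doc_id: str) -> float:
            relevance = cosine(query, candidates[doc_id])
            # Redundancy is the highest similarity to anything already selected
            redundancy = max(
                (cosine(candidates[doc_id], candidates[s]) for s in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# "a1" and "a2" are near-duplicates; "b" is less relevant but different
toy_vectors = {"a1": [1.0, 0.01], "a2": [1.0, 0.0], "b": [0.0, 1.0]}
query_vector = [1.0, 0.3]

diverse = mmr_select(query_vector, toy_vectors, k=2, lambda_mult=0.5)
relevance_only = mmr_select(query_vector, toy_vectors, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With `lambda_mult=1.0` the selection reduces to plain relevance ranking and returns both near-duplicates, while a lower value swaps the second near-duplicate for the more distinct document.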

### Get documents by ID

You can also use the `get_documents` method to retrieve a set of documents by their IDs:


In [None]:
vertex_fs.get_documents(ids=["6470d610abbd40e2af3c32ede29e09d5"])

### Remove documents by ID

You can also use the `delete` method to remove a set of documents by their IDs:

In [None]:
vertex_fs.delete(ids=["my_id1", "my_id2"])