<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Retrieval/Retrieval%20-%20Vertex%20AI%20Feature%20Store.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FApplied%2520GenAI%2FRetrieval%2FRetrieval%2520-%2520Vertex%2520AI%2520Feature%2520Store.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Retrieval/Retrieval%20-%20Vertex%20AI%20Feature%20Store.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/Applied%20GenAI/Retrieval/Retrieval%20-%20Vertex%20AI%20Feature%20Store.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Retrieval - Vertex AI Feature Store

In prior workflows, a series of documents was [processed into chunks](../Chunking/readme.md), and for each chunk, [embeddings](../Embeddings/readme.md) were created:

- Process: [Large Document Processing - Document AI Layout Parser](../Chunking/Large%20Document%20Processing%20-%20Document%20AI%20Layout%20Parser.ipynb)
- Embed: [Vertex AI Text Embeddings API](../Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb)

Retrieving chunks for a query involves calculating the embedding for the query and then using similarity metrics to find relevant chunks. A thorough review of similarity matching can be found in [The Math of Similarity](../Embeddings/The%20Math%20of%20Similarity.ipynb) - use dot product! As development moves from experiment to application, the process of storing and computing similarity is migrated to a [retrieval](./readme.md) system. This workflow is part of a [series of workflows exploring many retrieval systems](./readme.md).  
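
As a minimal sketch of what "calculate the query embedding, then rank chunks by similarity" means, the following ranks a few made-up two-dimensional vectors by dot product. The names `toy_chunks`, `toy_query`, and `top_k` are illustrative only and are not part of this workflow's code:

```python
# Toy example: the vectors and ids below are made up for illustration.
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query, candidates, k=2):
    """Return the k (id, score) pairs with the largest dot product to the query."""
    scored = [(cid, dot(query, vec)) for cid, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

toy_chunks = {
    'chunk_a': [1.0, 0.0],
    'chunk_b': [0.6, 0.8],
    'chunk_c': [0.0, 1.0],
}
toy_query = [0.8, 0.6]

for chunk_id, score in top_k(toy_query, toy_chunks):
    print(chunk_id, round(score, 2))
```

A retrieval system does the same ranking, but over many stored embeddings and usually with an index so that not every vector has to be compared.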

A detailed [comparison of many retrieval systems](./readme.md#comparison-of-vector-database-solutions) can be found in the readme as well.

---

**Vertex AI Feature Store For Storage, Indexing, And Search**

[Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore/latest/overview) creates online feature views from BigQuery tables or views, either directly or through the feature registry, and synchronizes the source data in BigQuery with the online serving. The online stores provide a low-latency API for vector matching using either vectors or entity IDs as input. The response includes all features in the feature view, which can encompass the text and metadata for the match, as these can be included as features. Learn more about Vertex AI Feature Store in this repository's [MLOps](../../MLOps/readme.md) section, which includes a deep dive into [feature stores](../../MLOps/Feature%20Store/readme.md).

---

**Use Case Data**

Buying a home usually involves borrowing money from a lending institution, typically through a mortgage secured by the home's value. But how do these institutions manage the risks associated with such large loans, and how are lending standards established?

In the United States, two government-sponsored enterprises (GSEs) play a vital role in the housing market:

- Federal National Mortgage Association ([Fannie Mae](https://www.fanniemae.com/))
- Federal Home Loan Mortgage Corporation ([Freddie Mac](https://www.freddiemac.com/))

These GSEs purchase mortgages from lenders, enabling those lenders to offer more loans. This process also allows Fannie Mae and Freddie Mac to set standards for mortgages, ensuring they are responsible and borrowers are more likely to repay them. This system makes homeownership more affordable and stabilizes the housing market by maintaining a steady flow of liquidity for lenders and keeping interest rates controlled.

However, navigating the complexities of these GSEs and their extensive servicing guides can be challenging.

**Approaches**

[This series](../readme.md) covers many generative AI workflows. These documents are used directly as long context for Gemini in the workflow [Long Context Retrieval With The Vertex AI Gemini API](../Generate/Long%20Context%20Retrieval%20With%20The%20Vertex%20AI%20Gemini%20API.ipynb). The workflow below uses a [retrieval](./readme.md) approach with the already generated chunks and embeddings.

---
## Colab Setup

When running this notebook in [Colab](https://colab.google/) or [Colab Enterprise](https://cloud.google.com/colab/docs/introduction), this section will authenticate to GCP (follow prompts in the popup) and set the current project for the session.

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

---
## Installs and API Enablement

The client packages may need to be installed in this environment.

### Installs (If Needed)

In [3]:
# tuples of (import name, install name, min_version)
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform', '1.69.0'),
    ('google.cloud.bigquery', 'google-cloud-bigquery')
]

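import importlib.util, importlib.metadata

def version_tuple(version):
    # compare numerically: as plain strings, '1.9.0' would sort after '1.69.0'
    return tuple(int(part) for part in version.split('.')[:3] if part.isdigit())

install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # look up the installed version by distribution (install) name
        if version_tuple(importlib.metadata.version(package[1])) < version_tuple(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user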

### API Enablement

In [4]:
!gcloud services enable aiplatform.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart, execution can resume from the cell after this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

Inputs

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [7]:
REGION = 'us-central1'
SERIES = 'applied-genai'
EXPERIMENT = 'retrieval-vertex-feature-store'

# make this the BigQuery Project / Dataset / Table prefix to store results
BQ_PROJECT = PROJECT_ID
BQ_DATASET = SERIES.replace('-', '_')
BQ_TABLE = EXPERIMENT
BQ_REGION = REGION[0:2]

# Vertex AI Feature Store names:
FS_NAME = PROJECT_ID.replace('-', '_')
FV_NAME = f"{SERIES}-{EXPERIMENT}".replace('-', '_')

Packages

In [8]:
import os, json, time, glob, shutil, datetime

import numpy as np

# Vertex AI
from google.cloud import aiplatform
import vertexai.language_models # for embeddings API
import vertexai.generative_models # for Gemini Models
from vertexai.resources.preview import feature_store

# BigQuery
from google.cloud import bigquery

In [9]:
aiplatform.__version__

'1.71.0'

Clients

In [10]:
# Vertex AI clients
vertexai.init(project = PROJECT_ID, location = REGION)

# BigQuery client
bq = bigquery.Client(project = PROJECT_ID)

---
## Text & Embeddings For Examples

This repository contains a [section for document processing (chunking)](../Chunking/readme.md) that includes an example of processing multiple large PDFs (over 1000 pages) into chunks: [Large Document Processing - Document AI Layout Parser](../Chunking/Large%20Document%20Processing%20-%20Document%20AI%20Layout%20Parser.ipynb).  The chunks of text from that workflow are stored with this repository and loaded by another companion workflow that augments the chunks with text embeddings: [Vertex AI Text Embeddings API](../Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb).

The following code loads the version of the chunks that includes text embeddings and prepares it for a local example of retrieval augmented generation.

### Get The Documents

If you are working from a clone of this notebook's [repository](https://github.com/statmike/vertex-ai-mlops), then the documents are already present. The following cell checks for the documents folder and, if it is missing, retrieves it (`git clone`):

In [11]:
local_dir = '../Embeddings/files/embeddings-api'

In [12]:
if not os.path.exists(local_dir):
    print('Retrieving documents...')
    parent_dir = os.path.dirname(local_dir)
    temp_dir = os.path.join(parent_dir, 'temp')
    if not os.path.exists(temp_dir):
        os.makedirs(temp_dir)
    !git clone https://www.github.com/statmike/vertex-ai-mlops {temp_dir}/vertex-ai-mlops
    shutil.copytree(f'{temp_dir}/vertex-ai-mlops/Applied GenAI/Embeddings/files/embeddings-api', local_dir)
    shutil.rmtree(temp_dir)
    print(f'Documents are now in folder `{local_dir}`')
else:
    print(f'Documents Found in folder `{local_dir}`')             

Documents Found in folder `../Embeddings/files/embeddings-api`


### Load The Chunks

In [13]:
jsonl_files = glob.glob(f"{local_dir}/large-files*.jsonl")
jsonl_files.sort()
jsonl_files

['../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0000.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0001.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0002.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0003.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0004.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0005.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0006.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0007.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0008.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0009.jsonl']

In [14]:
chunks = []
for file in jsonl_files:
    with open(file, 'r') as f:
        chunks.extend([json.loads(line) for line in f])
len(chunks)

9040

### Review A Chunk

In [15]:
chunks[0].keys()

dict_keys(['instance', 'predictions', 'status'])

In [16]:
chunks[0]['instance']['chunk_id']

'fannie_part_0_c17'

In [17]:
print(chunks[0]['instance']['content'])

# Selling Guide Fannie Mae Single Family

## Fannie Mae Copyright Notice

### Fannie Mae Copyright Notice

|-|
| Section B3-4.2, Verification of Depository Assets 402 |
| B3-4.2-01, Verification of Deposits and Assets (05/04/2022) 403 |
| B3-4.2-02, Depository Accounts (12/14/2022) 405 |
| B3-4.2-03, Individual Development Accounts (02/06/2019) 408 |
| B3-4.2-04, Pooled Savings (Community Savings Funds) (04/01/2009) 411 |
| B3-4.2-05, Foreign Assets (05/04/2022) 411 |
| Section B3-4.3, Verification of Non-Depository Assets 412 |
| B3-4.3-01, Stocks, Stock Options, Bonds, and Mutual Funds (06/30/2015) 412 |
| B3-4.3-02, Trust Accounts (04/01/2009) 413 |
| B3-4.3-03, Retirement Accounts (06/30/2015) 414 |
| B3-4.3-04, Personal Gifts (09/06/2023) 415 |
| B3-4.3-05, Gifts of Equity (10/07/2020) 418 |
| B3-4.3-06, Grants and Lender Contributions (12/14/2022) 419 |
| B3-4.3-07, Disaster Relief Grants or Loans (04/01/2009) 423 |
| B3-4.3-08, Employer Assistance (09/29/2015) 423 |
| B3-4.3-09,

In [18]:
chunks[0]['predictions'][0]['embeddings']['values'][0:10]

[0.031277116388082504,
 0.03056905046105385,
 0.010865348391234875,
 0.0623614676296711,
 0.03228681534528732,
 0.05066155269742012,
 0.046544693410396576,
 0.05509665608406067,
 -0.014074751175940037,
 0.008380400016903877]

### Prepare Chunk Structure

Make a list of dictionaries with information for each chunk:

In [19]:
content_chunks = [
    dict(
        gse = chunk['instance']['gse'],
        chunk_id = chunk['instance']['chunk_id'],
        content = chunk['instance']['content'],
        embedding = chunk['predictions'][0]['embeddings']['values']
    ) for chunk in chunks
]
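
To make the record shape concrete, here is a small sketch of the same flattening applied to a single hypothetical record. The record below only mirrors the keys shown above; its values (including the empty `status`) are placeholders and do not come from the actual files:

```python
# Hypothetical record that mirrors the keys reviewed above; values are placeholders.
toy_chunk = {
    'instance': {'gse': 'fannie', 'chunk_id': 'toy_c0', 'content': 'Example text'},
    'predictions': [{'embeddings': {'values': [0.1, 0.2, 0.3]}}],
    'status': '',
}

def flatten(chunk):
    """Pull the fields used for retrieval out of the nested record into one flat dict."""
    return dict(
        gse=chunk['instance']['gse'],
        chunk_id=chunk['instance']['chunk_id'],
        content=chunk['instance']['content'],
        embedding=chunk['predictions'][0]['embeddings']['values'],
    )

flat = flatten(toy_chunk)
print(sorted(flat.keys()))
```

A flat structure with one key per column is convenient because each dictionary can become one table row with one column per key.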

### Query Embedding

Create a query, or prompt, and get the embedding for it:

Connect to models for text embeddings. Learn more about the model API:
- [Vertex AI Text Embeddings API](../Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb)

In [20]:
question = "Does a lender have to perform servicing functions directly?"

In [21]:
embedder = vertexai.language_models.TextEmbeddingModel.from_pretrained('text-embedding-004')

In [22]:
question_embedding = embedder.get_embeddings([question])[0].values
question_embedding[0:10]

[-0.0005117303808219731,
 0.009651427157223225,
 0.01768726110458374,
 0.014538003131747246,
 -0.01829824410378933,
 0.027877431362867355,
 -0.021124685183167458,
 0.008830446749925613,
 -0.02669006586074829,
 0.06414774805307388]

---
## Load To BigQuery - The Offline Store For Vertex AI Feature Store

The offline store for [Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore/latest/overview) is BigQuery.  This streamlines ML feature management prior to serving online with feature store.  We will first load the data to BigQuery.

In this case the information to load to BigQuery is local.  It could be in GCS or other BigQuery sources.  You can also get embeddings for information within BigQuery using the [ML.GENERATE_EMBEDDING function](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-generate-embedding) and even use the exact same model as was used for the data imported above.

### Create/Recall Dataset

In [28]:
dataset = bigquery.Dataset(f"{BQ_PROJECT}.{BQ_DATASET}")
dataset.location = BQ_REGION
bq_dataset = bq.create_dataset(dataset, exists_ok = True)

### Load JSON To BigQuery Table

In [29]:
bq_table = bq_dataset.table(BQ_TABLE)

In [30]:
job_config = bigquery.LoadJobConfig(
    source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE,
    autodetect = True
)

In [31]:
load_job = bq.load_table_from_json(
    json_rows = content_chunks,
    destination = bq_table,
    job_config = job_config
)
load_job.result()

LoadJob<project=statmike-mlops-349915, location=US, id=51dbc28a-9236-40bf-abaf-8e8671d66c58>

In [32]:
bq.query(f"SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` LIMIT 5").to_dataframe()

| | chunk_id | embedding | content | gse |
|---|---|---|---|---|
| 0 | freddie_part_1_c275 | [-0.010820388793945312, 0.03051573783159256, 0... | # (g) Texas Equity Uniform Instruments and oth... | freddie |
| 1 | freddie_part_6_c129 | [0.033966064453125, 0.02275390923023224, 0.038... | # 9209.5: Property valuation requirements for ... | freddie |
| 2 | freddie_part_4_c91 | [0.021045684814453125, 0.0005818947684019804, ... | # 6302.16: Special delivery requirements for r... | freddie |
| 3 | freddie_part_2_c479 | [0.0327911376953125, -0.03755122050642967, -0.... | # 5401.1: Monthly housing expense-to-income ra... | freddie |
| 4 | freddie_part_3_c576 | [0.02548600733280182, 0.06772162765264511, -0.... | # 6201.14: Postsettlement adjustments for Mort... | freddie |


---
## Retrieval With BigQuery

BigQuery is a fully managed data warehouse where SQL queries run without the need to plan for storage or compute requirements.  Built into this solution is the  `VECTOR_SEARCH` function that can perform brute-force searches for neighboring embeddings and utilize an index for efficient search. BigQuery offers two built-in methods for [creating vector indexes](https://cloud.google.com/bigquery/docs/vector-index): the [Inverted File (IVF) index](https://cloud.google.com/bigquery/docs/vector-index#ivf-index) and the [TreeAH index](https://cloud.google.com/bigquery/docs/vector-index#tree-ah-index).

Check out a companion workflow using BigQuery here: 
- [Retrieval - BigQuery Vector Indexing And Search](Retrieval%20-%20BigQuery%20Vector%20Indexing%20And%20Search.ipynb)

This workflow continues with Vertex AI Feature Store as an online serving infrastructure that syncs with BigQuery.
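
For orientation only, the sketch below builds a `VECTOR_SEARCH` query string for a single query embedding. It is an assumption-laden illustration rather than the companion workflow's code: the helper name `vector_search_sql` is hypothetical, the column names `chunk_id`, `content`, and `embedding` are taken from the table loaded above, and the query text should be checked against the BigQuery documentation before use:

```python
def vector_search_sql(table, embedding, top_k=5):
    """Build a VECTOR_SEARCH query string for one query embedding (hypothetical helper)."""
    # Render the embedding as a SQL array literal, e.g. [0.1, 0.2]
    literal = ', '.join(str(float(value)) for value in embedding)
    return f"""
SELECT base.chunk_id, base.content, distance
FROM VECTOR_SEARCH(
    TABLE `{table}`,
    'embedding',
    (SELECT [{literal}] AS embedding),
    top_k => {top_k},
    distance_type => 'DOT_PRODUCT'
)
ORDER BY distance
"""

# The table name here is a placeholder, not a real table.
print(vector_search_sql('my-project.my_dataset.my_table', [0.1, 0.2], top_k=3))
```

With the clients from this notebook, a string like this could be passed to `bq.query(...)` using the table `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` and `question_embedding`.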

---
## Retrieval With Vertex AI Feature Store

[Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore/latest/overview) creates online feature views from BigQuery tables or views, either directly or through the feature registry, and synchronizes the source data in BigQuery with the online serving. The online stores provide a low-latency API for vector matching using either vectors or entity IDs as input. The response includes all features in the feature view, which can encompass the text and metadata for the match, as these can be included as features. Learn more about Vertex AI Feature Store in this repository's [MLOps](../../MLOps/readme.md) section, which includes a deep dive into [feature stores](../../MLOps/Feature%20Store/readme.md).

### Create/Retrieve Online Store

The first step is to create a Vertex AI Feature Store.  There are two serving types to choose from when setting up a feature store: Bigtable and Optimized.  For this work, Optimized online serving is chosen because it [provides vector similarity search](https://cloud.google.com/vertex-ai/docs/featurestore/latest/embeddings-search) functionality that Bigtable serving does not.
>**NOTE:** This can take around 10 minutes if creating a new feature store instance.

**Reference:**
- [Create an Online Store Instance](https://cloud.google.com/vertex-ai/docs/featurestore/latest/create-onlinestore)
- [Online Serving Types](https://cloud.google.com/vertex-ai/docs/featurestore/latest/online-serving-types)

In [33]:
try:
    online_store = feature_store.FeatureOnlineStore(name = FS_NAME)
    print(f"Found the feature store:\n{online_store.resource_name}")
except Exception:
    print("Create the feature store...")
    online_store = feature_store.FeatureOnlineStore.create_optimized_store(
        name = FS_NAME
    )
    print(f"Created the feature store:\n{online_store.resource_name}")

In [34]:
online_store.resource_name

'projects/1026793852137/locations/us-central1/featureOnlineStores/statmike_mlops_349915'

In [35]:
online_store.name

'statmike_mlops_349915'

### Create/Retrieve Feature View With Vector Index: From BigQuery Source

There are two paths to [creating feature views](https://cloud.google.com/vertex-ai/docs/featurestore/latest/create-featureview) in feature store. The one used here syncs a BigQuery table or view directly to the online store. The alternative uses the feature registry, which gives greater control over selecting features (columns) from multiple BigQuery source tables and views.  Learn more about Vertex AI Feature Store in this repository's [MLOps](../../MLOps/readme.md) section, which includes a deep dive into [feature stores](../../MLOps/Feature%20Store/readme.md).

**Vector Index**

The feature view specification includes the vector index specification, which sets the name of the embedding column and the dimension of the embedding.  For the index you can set:
- `algorithm_config` - can be `TreeAhConfig` or `BruteForceConfig`
- `filter_columns` for any pre-filter columns
- `crowding_column` for a column whose unique values can be used to cap the number of matches per value during queries
- `distance_measure_type` to select Euclidean distance, Cosine Similarity, or Dot Product as the measurement method
    - Use 3 for dot product here for these normalized embeddings and read more about why in [The Math of Similarity](../Embeddings/The%20Math%20of%20Similarity.ipynb)

**Reference:**
- [Create a feature view instance](https://cloud.google.com/vertex-ai/docs/featurestore/latest/create-featureview)
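
The choice of dot product for normalized embeddings can be checked with a small sketch. For unit-length vectors, the dot product equals cosine similarity, and squared Euclidean distance equals `2 - 2 * dot`, so all three measures rank neighbors the same way. The vectors below are made up for illustration:

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Two made-up vectors, normalized to unit length
a = normalize([3.0, 4.0])
b = normalize([1.0, 2.0])

print(round(dot(a, b), 6), round(cosine(a, b), 6))
print(round(squared_l2(a, b), 6), round(2 - 2 * dot(a, b), 6))
```

Because the measures agree on ranking for normalized vectors, the dot product is a reasonable default here since it needs the least computation.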

In [36]:
try:
    feature_view = feature_store.FeatureView(
        name = FV_NAME,
        feature_online_store_id = online_store.resource_name
    )
except Exception:
    feature_view = online_store.create_feature_view(
        name = FV_NAME,
        source = feature_store.utils.FeatureViewBigQuerySource(
            uri = f'bq://{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}',
            entity_id_columns = ['chunk_id'] 
        ),
        index_config = feature_store.utils.IndexConfig(
            embedding_column = 'embedding',
            dimensions = len(question_embedding),
            algorithm_config = feature_store.utils.TreeAhConfig(
                leaf_node_embedding_count = 250 # default is 1000
            ), # TreeAhConfig is the default, can override here with BruteForceConfig
            filter_columns = ['gse'],
            crowding_column = 'gse',
            distance_measure_type = feature_store.utils.DistanceMeasureType(3) # 1=Euclidean/L2, 2=Cosine, 3=dot product
        ),
        sync_config = 'TZ=America/New_York 10 * * * *' # 10 minutes past the hour, every hour
    )   

In [37]:
feature_view.name

'applied_genai_retrieval_vertex_feature_store'

In [38]:
feature_view.resource_name

'projects/1026793852137/locations/us-central1/featureOnlineStores/statmike_mlops_349915/featureViews/applied_genai_retrieval_vertex_feature_store'

### Managing Synchronization

Force a synchronization rather than wait for the next scheduled sync:

In [39]:
force_sync = feature_view.sync()

In [40]:
type(force_sync)

vertexai.resources.preview.feature_store.feature_view.FeatureView.FeatureViewSync

In [41]:
force_sync.to_dict()

{'name': 'projects/1026793852137/locations/us-central1/featureOnlineStores/statmike_mlops_349915/featureViews/applied_genai_retrieval_vertex_feature_store/featureViewSyncs/895940494457044992',
 'createTime': '2024-11-10T19:55:01.636814Z',
 'runTime': {'startTime': '2024-11-10T19:55:01.636814Z'}}

Get updated information about the sync job:

In [42]:
force_sync = feature_view.get_sync(name = force_sync.name)
force_sync.to_dict()

{'name': 'projects/1026793852137/locations/us-central1/featureOnlineStores/statmike_mlops_349915/featureViews/applied_genai_retrieval_vertex_feature_store/featureViewSyncs/895940494457044992',
 'createTime': '2024-11-10T19:55:01.636814Z',
 'runTime': {'startTime': '2024-11-10T19:55:01.636814Z'}}

Wait on the sync job to complete and report timing and rows synced:

In [43]:
waited = 0
while True:
    sync_status = feature_view.get_sync(name = force_sync.name).to_dict()
    if 'endTime' in list(sync_status['runTime'].keys()):
        seconds = (
            datetime.datetime.fromisoformat(sync_status['runTime']['endTime'].replace('Z', '+00:00'))
            -
            datetime.datetime.fromisoformat(sync_status['runTime']['startTime'].replace('Z', '+00:00'))
        ).total_seconds()
        rows = sync_status['syncSummary']['rowSynced']
        print(f"Sync completed in {seconds} seconds and synced {rows} rows.")
        break
    else:
        print(f"Waited {waited} seconds, Update again in 30 seconds...")
        time.sleep(30)
        waited += 30

Waited 0 seconds, Update again in 30 seconds...
Waited 30 seconds, Update again in 30 seconds...
Waited 60 seconds, Update again in 30 seconds...
Waited 90 seconds, Update again in 30 seconds...
Waited 120 seconds, Update again in 30 seconds...
Sync completed in 154.04748 seconds and synced 27120 rows.
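
The wait loop above runs until `endTime` appears. As a hedged variation, the sketch below adds a timeout so that a stalled job does not block forever. The helper `wait_for` and the fake status responses are hypothetical stand-ins, not part of the Vertex AI SDK:

```python
import time

def wait_for(get_status, is_done, interval=30, timeout=1800, sleep=time.sleep):
    """Poll get_status() until is_done(status) is true, or raise after timeout seconds."""
    waited = 0
    while True:
        status = get_status()
        if is_done(status):
            return status
        if waited >= timeout:
            raise TimeoutError(f'not finished after {waited} seconds')
        sleep(interval)
        waited += interval

# Stand-in for the sync status dictionaries: the job finishes on the third poll.
fake_responses = iter([
    {'runTime': {'startTime': 't0'}},
    {'runTime': {'startTime': 't0'}},
    {'runTime': {'startTime': 't0', 'endTime': 't1'}},
])

final = wait_for(
    get_status=lambda: next(fake_responses),
    is_done=lambda status: 'endTime' in status.get('runTime', {}),
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(final)
```

With the real feature view, `get_status` could be `lambda: feature_view.get_sync(name = force_sync.name).to_dict()`.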


Get a list of sync jobs:

In [44]:
list_syncs = feature_view.list_syncs()

Print out the end time and rows synced for each job:

In [None]:
for sync in list_syncs:
    s = feature_view.get_sync(name = sync.name).to_dict()
    # skip syncs that are still running (no endTime yet)
    if 'endTime' not in s.get('runTime', {}):
        continue
    ended = datetime.datetime.fromisoformat(s['runTime']['endTime'].replace('Z', '+00:00')).strftime("%m/%d/%Y %H:%M:%S")
    rows = s['syncSummary']['rowSynced']
    print(f"Sync completed at {ended} and synced {rows} rows.")

### Retrieve: Features For Entity

**NOTE:** The embedding is also retrieved.

In [46]:
results = feature_view.read(key = ['fannie_part_0_c1']).to_dict()['features']

Public endpoint for the optimized online store statmike_mlops_349915 is 6457115130579648512.us-central1-1026793852137.featurestore.vertexai.goog


In [47]:
# drop the embedding feature for display; build a new list rather than popping while iterating
results = [feature for feature in results if feature['name'] != 'embedding']
results

[{'name': 'content', 'value': {'string_value': 'Fannie Mae'}},
 {'name': 'gse', 'value': {'string_value': 'fannie'}}]

### Matches: For Entity

Given an entity id value find the neighbors using the similarity search with embeddings:

In [48]:
results = feature_view.search(
    entity_id = 'fannie_part_0_c40',
    neighbor_count = 5,
    return_full_entity = False,
    #per_crowding_attribute_neighbor_count = 5,
    #leaf_nodes_search_fraction = 0.25
)

In [49]:
results

SearchNearestEntitiesResponse(_response=nearest_neighbors {
  neighbors {
    entity_id: "fannie_part_0_c39"
    distance: -0.90960991382598877
  }
  neighbors {
    entity_id: "fannie_part_0_c38"
    distance: -0.84551787376403809
  }
  neighbors {
    entity_id: "fannie_part_0_c5"
    distance: -0.804158091545105
  }
  neighbors {
    entity_id: "fannie_part_0_c37"
    distance: -0.79278743267059326
  }
  neighbors {
    entity_id: "fannie_part_0_c6"
    distance: -0.7925485372543335
  }
}
)
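
The distances in this response are negative. A likely reading, stated here as an assumption, is that with the dot product measure the reported distance is the negated dot product, so the smallest (most negative) distance is the closest match. The sketch below applies that reading to the first three values above, rounded:

```python
# (entity_id, distance) pairs copied from the response above, rounded to 4 places.
neighbors = [
    ('fannie_part_0_c39', -0.9096),
    ('fannie_part_0_c38', -0.8455),
    ('fannie_part_0_c5', -0.8042),
]

# Assumption: distance is the negated dot product, so similarity = -distance.
similarities = [(entity_id, -distance) for entity_id, distance in neighbors]

# Under that assumption the best match has the smallest distance.
best = min(neighbors, key=lambda pair: pair[1])

print(similarities)
print(best[0])
```

For normalized embeddings this similarity can be read like a cosine similarity, where values near 1 indicate very similar chunks.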

### Matches: For Entity Returning All Features

Given an entity id value find the neighbors using the similarity search with embeddings and return all the features for the matches:

In [50]:
results = feature_view.search(
    entity_id = 'fannie_part_0_c40',
    neighbor_count = 2,
    return_full_entity = True
)

In [51]:
matches = []
for result in results.to_dict()['neighbors']:
    for feature in result['entity_key_values']['key_values']['features']:
        if feature['name'] == 'content':
            matches.append(dict(
                chunk_id = result['entity_id'],
                content = feature['value']['string_value']
            ))
matches

[{'chunk_id': 'fannie_part_0_c39',
  'content': '# Selling Guide Fannie Mae Single Family\n\n## Fannie Mae Copyright Notice\n\n### Fannie Mae Copyright Notice\n\nE-3-08, Acronyms and Glossary of Defined Terms: H (04/05/2023) 1129\n| E-3-09, Acronyms and Glossary of Defined Terms: I (05/03/2023) 1131 |\n| E-3-10, Acronyms and Glossary of Defined Terms: J (04/01/2009) 1134 |\n| applicable terms. E-3-11, Acronyms and No 1134 Glossary of Defined Terms: K (12/14/2022) 1134 |\n| applicable terms. E-3-12, Acronyms and Glossary of Defined Terms: L (09/04/2018) No 1134 1134 |\n| and Glossary of Defined Terms: M (12/13/2023) E-3-13, Acronyms 1137 |\n| E-3-14, Acronyms Terms: N (05/26/2015) Defined Glossary of and 1142 |\n| Terms: O (11/10/2014) Defined Glossary of and Acronyms E-3-15, 1143 |\n| E-3-16, Acronyms E-3-17, Acronyms Terms: P (02/07/2024) Glossary of Defined and 1144 and Glossary of Defined Terms: Q (04/01/2009) 1148 |\n| No applicable terms. 1148 |\n| Terms: R (02/07/2024) Glossary o
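
The parsing loop above is repeated for each search in the following sections. One option, sketched here with a hypothetical helper named `extract_feature`, is to wrap it in a function. The toy dictionary only imitates the nesting used in the loop above and is not a real response:

```python
def extract_feature(response_dict, feature_name='content'):
    """Collect one string feature per neighbor from a search response dictionary."""
    matches = []
    for neighbor in response_dict.get('neighbors', []):
        features = neighbor['entity_key_values']['key_values']['features']
        for feature in features:
            if feature['name'] == feature_name:
                matches.append({
                    'chunk_id': neighbor['entity_id'],
                    feature_name: feature['value']['string_value'],
                })
    return matches

# Toy dictionary imitating the nesting of results.to_dict(); values are made up.
toy_response = {
    'neighbors': [
        {
            'entity_id': 'toy_c1',
            'entity_key_values': {'key_values': {'features': [
                {'name': 'content', 'value': {'string_value': 'Example text'}},
                {'name': 'gse', 'value': {'string_value': 'fannie'}},
            ]}},
        },
    ],
}

print(extract_feature(toy_response))
print(extract_feature(toy_response, feature_name='gse'))
```

With a real response, the call would be `extract_feature(results.to_dict())`.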

### Matches: For Query Embedding

Given a query embedding, find the neighbors using the similarity search with embeddings and return all the features for the matches:

In [52]:
question

'Does a lender have to perform servicing functions directly?'

In [53]:
len(question_embedding)

768

In [54]:
results = feature_view.search(
    embedding_value = question_embedding,
    neighbor_count = 2,
    return_full_entity = True
)

In [55]:
matches = []
for result in results.to_dict()['neighbors']:
    for feature in result['entity_key_values']['key_values']['features']:
        if feature['name'] == 'content':
            matches.append(dict(
                chunk_id = result['entity_id'],
                content = feature['value']['string_value']
            ))
matches

[{'chunk_id': 'fannie_part_0_c352',
  'content': '# A3-3-03, Other Servicing Arrangements (12/15/2015)\n\nIntroduction This topic provides an overview of other servicing arrangements, including: • Subservicing • General Requirements for Subservicing Arrangements • Pledge of Servicing Rights and Transfer of Interest in Servicing Income\n\n## Subservicing\n\nA lender may use other organizations to perform some or all of its servicing functions. Fannie Mae refers to these arrangements as “subservicing” arrangements, meaning that a servicer (the “subservicer”) other than the contractually responsible servicer (the “master” servicer) is performing the servicing functions. The following are not considered to be subservicing arrangements: • when a computer service bureau is used to perform accounting and reporting functions; • when the originating lender sells and assigns servicing to another lender, unless the originating lender continues to be the contractually responsible servicer.'},
 {'c

### Matches: For Query Embedding - Expand Search Candidates

Given a query embedding, find the neighbors using the similarity search with embeddings and return all the features for the matches.  Expand the search candidates by providing an override for `leaf_nodes_search_fraction`:

In [56]:
results = feature_view.search(
    embedding_value = question_embedding,
    neighbor_count = 2,
    return_full_entity = True,
    leaf_nodes_search_fraction = 0.5
)

In [57]:
matches = []
for result in results.to_dict()['neighbors']:
    for feature in result['entity_key_values']['key_values']['features']:
        if feature['name'] == 'content':
            matches.append(dict(
                chunk_id = result['entity_id'],
                content = feature['value']['string_value']
            ))
matches

[{'chunk_id': 'fannie_part_0_c352',
  'content': '# A3-3-03, Other Servicing Arrangements (12/15/2015)\n\nIntroduction This topic provides an overview of other servicing arrangements, including: • Subservicing • General Requirements for Subservicing Arrangements • Pledge of Servicing Rights and Transfer of Interest in Servicing Income\n\n## Subservicing\n\nA lender may use other organizations to perform some or all of its servicing functions. Fannie Mae refers to these arrangements as “subservicing” arrangements, meaning that a servicer (the “subservicer”) other than the contractually responsible servicer (the “master” servicer) is performing the servicing functions. The following are not considered to be subservicing arrangements: • when a computer service bureau is used to perform accounting and reporting functions; • when the originating lender sells and assigns servicing to another lender, unless the originating lender continues to be the contractually responsible servicer.'},
 {'c

### Matches: For Query Embedding - Diversity of Responses

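Given a query embedding, find the neighbors using the similarity search with embeddings and return all the features for the matches.  Use the crowding attribute, `gse`, to limit how many neighbors are returned for each unique value through `per_crowding_attribute_neighbor_count`: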

In [58]:
results = feature_view.search(
    embedding_value = question_embedding,
    #neighbor_count = 4,
    return_full_entity = True,
    per_crowding_attribute_neighbor_count = 4
)

In [59]:
matches = []
for result in results.to_dict()['neighbors']:
    for feature in result['entity_key_values']['key_values']['features']:
        if feature['name'] == 'content':
            matches.append(dict(
                chunk_id = result['entity_id'],
                #content = feature['value']['string_value']
            ))
matches

[{'chunk_id': 'fannie_part_0_c352'},
 {'chunk_id': 'freddie_part_4_c509'},
 {'chunk_id': 'freddie_part_4_c510'},
 {'chunk_id': 'fannie_part_0_c353'}]
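
Conceptually, crowding caps how many results may share the same attribute value. The sketch below illustrates the idea on a made-up ranked list. It is not the service's internal algorithm, and the function name `crowd` is hypothetical:

```python
def crowd(ranked, per_value, total):
    """Walk ranked (id, attribute) pairs, keeping at most per_value results per attribute."""
    kept, counts = [], {}
    for entity_id, attribute in ranked:
        if counts.get(attribute, 0) < per_value:
            kept.append(entity_id)
            counts[attribute] = counts.get(attribute, 0) + 1
        if len(kept) == total:
            break
    return kept

# Made-up ranking, best match first, where one attribute value dominates the top results
toy_ranked = [
    ('f1', 'fannie'), ('f2', 'fannie'), ('f3', 'fannie'),
    ('r1', 'freddie'), ('f4', 'fannie'), ('r2', 'freddie'),
]

print(crowd(toy_ranked, per_value=2, total=4))
```

Without the cap, the first four results would all but one come from a single source; with it, lower-ranked results from the other source are included.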

### Matches: For Query Embedding - Limit Search Area

Given a query embedding, find the neighbors using similarity search and return all the features for the matches. Use a string filter on the `gse` attribute to limit the search to entities with the value 'fannie':

In [60]:
results = feature_view.search(
    embedding_value = question_embedding,
    neighbor_count = 4,
    return_full_entity = True,
    string_filters = [aiplatform.gapic.NearestNeighborQuery.StringFilter(name = 'gse', allow_tokens = ['fannie'])]
)

In [61]:
matches = []
for result in results.to_dict()['neighbors']:
    for feature in result['entity_key_values']['key_values']['features']:
        if feature['name'] == 'content':
            matches.append(dict(
                chunk_id = result['entity_id'],
                #content = feature['value']['string_value']
            ))
matches

[{'chunk_id': 'fannie_part_0_c352'},
 {'chunk_id': 'fannie_part_0_c353'},
 {'chunk_id': 'fannie_part_0_c326'},
 {'chunk_id': 'fannie_part_0_c92'}]
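
The same extraction loop follows each search above, so it could be factored into a small helper. The sketch below assumes the nested dictionary layout seen in the cells above (`neighbors` > `entity_key_values` > `key_values` > `features`). The helper name `extract_matches` and the sample response are hypothetical.

```python
def extract_matches(results_dict, feature_name='content', include_value=True):
    """Collect one dict per neighbor from a search response that has already
    been converted with `.to_dict()`, using the nested layout shown above."""
    matches = []
    for neighbor in results_dict.get('neighbors', []):
        features = neighbor['entity_key_values']['key_values']['features']
        for feature in features:
            if feature['name'] == feature_name:
                match = {'chunk_id': neighbor['entity_id']}
                if include_value:
                    match[feature_name] = feature['value']['string_value']
                matches.append(match)
    return matches

# Hypothetical response in the same shape as `results.to_dict()`.
fake_results = {
    'neighbors': [
        {
            'entity_id': 'example_chunk_1',
            'entity_key_values': {'key_values': {'features': [
                {'name': 'gse', 'value': {'string_value': 'fannie'}},
                {'name': 'content', 'value': {'string_value': 'Example text.'}},
            ]}},
        },
    ]
}

print(extract_matches(fake_results))
print(extract_matches(fake_results, include_value=False))
```

With a helper like this, the cells above would reduce to `matches = extract_matches(results.to_dict(), include_value=False)`.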

---
## Retrieval Augmented Generation (RAG)

Build a simple retrieval augmented generation (RAG) process that enhances a query by retrieving context. This is done here by constructing three functions, one for each stage:
- `retrieve` - a function that uses an embedding to search for matching pieces of text to use as context
    - this uses the system built earlier in this workflow!
- `augment` - a function that prepares the retrieved chunks into a prompt
- `generate` - a function that makes the LLM request with the augmented prompt

A final function executes the full RAG workflow:
- `rag` - a function that receives the query and orchestrates the workflow through `retrieve` > `augment` > `generate`

### Clients

In [62]:
embedder = vertexai.language_models.TextEmbeddingModel.from_pretrained('text-embedding-004')
llm = vertexai.generative_models.GenerativeModel("gemini-1.5-flash-002")

### Retrieve Function

In [69]:
def retrieve_featurestore(query_embedding, n_matches = 5):
    
    results = feature_view.search(
        embedding_value = query_embedding,
        neighbor_count = n_matches,
        return_full_entity = True
    )
    matches = []
    for result in results.to_dict()['neighbors']:
        for feature in result['entity_key_values']['key_values']['features']:
            if feature['name'] == 'content':
                matches.append(dict(
                    chunk_id = result['entity_id'],
                    content = feature['value']['string_value']
                ))
    
    return matches

### Augment Function

In [70]:
def augment(matches):

    prompt = ''
    for m, match in enumerate(matches):
        prompt += f"Context {m+1}:\n{match['content']}\n\n"
    prompt += 'Answer the following question using the provided contexts:\n'

    return prompt
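
A quick way to see the layout that `augment` produces is to call it on a couple of placeholder matches. The function is repeated here only so the snippet runs on its own. The match contents and the question text are hypothetical.

```python
def augment(matches):
    # Number each retrieved chunk, then add the instruction line.
    prompt = ''
    for m, match in enumerate(matches):
        prompt += f"Context {m+1}:\n{match['content']}\n\n"
    prompt += 'Answer the following question using the provided contexts:\n'
    return prompt

# Hypothetical matches in the shape returned by the retrieve function.
example_matches = [
    {'chunk_id': 'example_chunk_1', 'content': 'First chunk of text.'},
    {'chunk_id': 'example_chunk_2', 'content': 'Second chunk of text.'},
]

example_prompt = augment(example_matches) + 'An example question?'
print(example_prompt)
```

The question is appended by the caller rather than inside `augment`, so the prompt ends with the instruction line followed directly by the question.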

### Generate Function

In [71]:
def generate(prompt):

    result = llm.generate_content(prompt)

    return result

### RAG Function

In [72]:
def rag(query):
    
    query_embedding = embedder.get_embeddings([query])[0].values
    matches = retrieve_featurestore(query_embedding)
    prompt = augment(matches) + query
    result = generate(prompt)
    
    return result.text

### Example In Use

In [76]:
question

'Does a lender have to perform servicing functions directly?'

In [74]:
print(rag(question))

No.  A lender may use other organizations to perform some or all of its servicing functions through a "subservicing" arrangement (Context 1).  However, the lender remains ultimately responsible, even if they use a subservicer (Context 4).  The specifics of these arrangements, including notice requirements to borrowers, are detailed in the provided texts.



---
### Profiling Performance

Profile the timing of each step in the RAG function for sequential calls. The environment chosen for this workflow is a minimal testing environment, so load testing (simultaneous requests) would not be helpful.

In [77]:
profile = []

In [78]:
def rag(query, profile = profile):
    
    timings = {}
    start_time = time.time()
    
    
    # 1. Get embeddings
    embedding_start = time.time()
    query_embedding = embedder.get_embeddings([query])[0].values
    timings['embedding'] = time.time() - embedding_start

    # 2. Retrieve from Feature Store
    retrieval_start = time.time()
    matches = retrieve_featurestore(query_embedding)
    timings['retrieve_featurestore'] = time.time() - retrieval_start

    # 3. Augment the prompt
    augment_start = time.time()
    prompt = augment(matches) + query
    timings['augment'] = time.time() - augment_start

    # 4. Generate text
    generate_start = time.time()
    result = generate(prompt)
    timings['generate'] = time.time() - generate_start

    total_time = time.time() - start_time
    timings['total'] = total_time
    
    profile.append(timings)
    
    return result.text
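
The start/stop bookkeeping repeated for each stage above could also be wrapped in a small context manager. This is a minimal sketch of that alternative, not what the function above uses. The name `timed` is hypothetical, the stage bodies are stand-ins, and `time.perf_counter()` is swapped in for `time.time()` because it is a monotonic clock intended for measuring short durations.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(timings, key):
    """Store the elapsed seconds of the enclosed block under timings[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the enclosed block raises.
        timings[key] = time.perf_counter() - start

timings = {}
with timed(timings, 'total'):
    with timed(timings, 'augment'):
        prompt = 'Context 1:\n...\n\n' + 'An example question?'  # stand-in stage
    with timed(timings, 'generate'):
        time.sleep(0.01)  # stand-in for a slow network call

print(timings)
```

Each stage in `rag` would then become a `with timed(timings, 'stage_name'):` block around the existing call.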

In [79]:
print(rag(question))

No, a lender does not have to perform servicing functions directly.  Context 1 explicitly states that a lender "may use other organizations to perform some or all of its servicing functions," referring to this as "subservicing."  However, there are stipulations and requirements regarding these subservicing arrangements, as outlined in the provided texts.  The lender remains ultimately responsible, even when using a subservicer or master servicer.



In [80]:
profile

[{'embedding': 0.1379692554473877,
  'retrieve_featurestore': 0.18321871757507324,
  'augment': 4.100799560546875e-05,
  'generate': 0.8281786441802979,
  'total': 1.1494147777557373}]

In [81]:
for i in range(100):
    response = rag(question)

### Report From Profile

In [82]:
all_timings = {}
for timings in profile:
    for key, value in timings.items():
        if key not in all_timings:
            all_timings[key] = []
        all_timings[key].append(value)

In [83]:
for key, values in all_timings.items():
    arr = np.array(values)
    print(f"Statistics for '{key}':")
    print(f"  Min: {np.min(arr):.4f} seconds")
    print(f"  Max: {np.max(arr):.4f} seconds")
    print(f"  Mean: {np.mean(arr):.4f} seconds")
    print(f"  Median: {np.median(arr):.4f} seconds")
    print(f"  Std Dev: {np.std(arr):.4f} seconds")
    print(f"  P95: {np.percentile(arr, 95):.4f} seconds")
    print(f"  P99: {np.percentile(arr, 99):.4f} seconds")
    print("")

Statistics for 'embedding':
  Min: 0.0468 seconds
  Max: 0.2955 seconds
  Mean: 0.0589 seconds
  Median: 0.0534 seconds
  Std Dev: 0.0262 seconds
  P95: 0.0751 seconds
  P99: 0.1380 seconds

Statistics for 'retrieve_featurestore':
  Min: 0.0503 seconds
  Max: 0.1832 seconds
  Mean: 0.0810 seconds
  Median: 0.0629 seconds
  Std Dev: 0.0289 seconds
  P95: 0.1226 seconds
  P99: 0.1423 seconds

Statistics for 'augment':
  Min: 0.0000 seconds
  Max: 0.0000 seconds
  Mean: 0.0000 seconds
  Median: 0.0000 seconds
  Std Dev: 0.0000 seconds
  P95: 0.0000 seconds
  P99: 0.0000 seconds

Statistics for 'generate':
  Min: 0.5262 seconds
  Max: 1.0575 seconds
  Mean: 0.7240 seconds
  Median: 0.6985 seconds
  Std Dev: 0.1103 seconds
  P95: 0.9799 seconds
  P99: 1.0316 seconds

Statistics for 'total':
  Min: 0.6308 seconds
  Max: 1.2209 seconds
  Mean: 0.8639 seconds
  Median: 0.8283 seconds
  Std Dev: 0.1217 seconds
  P95: 1.1393 seconds
  P99: 1.2022 seconds
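
One way to read the report is to express each stage's mean as a share of the mean total. The sketch below does that arithmetic with the mean values copied from the output above. The variable names are hypothetical, and the shares only describe this particular run.

```python
# Mean seconds per stage, copied from the report above.
mean_seconds = {
    'embedding': 0.0589,
    'retrieve_featurestore': 0.0810,
    'augment': 0.0000,
    'generate': 0.7240,
}
mean_total = 0.8639

# Fraction of the mean total spent in each stage.
shares = {stage: value / mean_total for stage, value in mean_seconds.items()}

for stage, share in shares.items():
    print(f"{stage}: {share:.1%}")
```

In this run, generation accounts for roughly 84% of the mean total time, while embedding and retrieval together account for roughly 16%.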



---
## Remove Resources

The resources created above in BigQuery and Feature Store will persist unless deleted. The Feature Store service has ongoing costs even when it is not in use, so it may be desirable to remove it here. Uncomment the lines in the following cells to remove the resources:

In [None]:
#online_store.delete(force = True)

In [None]:
#bq.delete_table(bq_table)