![tracker](https://us-central1-vertex-ai-mlops-369716.cloudfunctions.net/pixel-tracking?path=statmike%2Fvertex-ai-mlops%2FApplied+GenAI%2FRetrieval&file=Retrieval+-+Spanner.ipynb)
<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Retrieval/Retrieval%20-%20Spanner.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FApplied%2520GenAI%2FRetrieval%2FRetrieval%2520-%2520Spanner.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Retrieval/Retrieval%20-%20Spanner.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/Applied%20GenAI/Retrieval/Retrieval%20-%20Spanner.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Retrieval - Spanner

In prior workflows, a series of documents was [processed into chunks](../Chunking/readme.md), and for each chunk, [embeddings](../Embeddings/readme.md) were created:

- Process: [Large Document Processing - Document AI Layout Parser](../Chunking/Large%20Document%20Processing%20-%20Document%20AI%20Layout%20Parser.ipynb)
- Embed: [Vertex AI Text Embeddings API](../Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb)

Retrieving chunks for a query involves calculating the embedding for the query and then using similarity metrics to find relevant chunks. A thorough review of similarity matching can be found in [The Math of Similarity](../Embeddings/The%20Math%20of%20Similarity.ipynb) - use dot product! As development moves from experiment to application, the process of storing and computing similarity is migrated to a [retrieval](./readme.md) system. This workflow is part of a [series of workflows exploring many retrieval systems](./readme.md).  
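The retrieval step described above can be sketched in a few lines. This assumes the chunk and query embeddings are already plain Python lists of floats. It ranks chunks by dot product and keeps the top `k`. The `top_k_by_dot_product` helper and the toy vectors are hypothetical and only illustrate the idea.

```python
import heapq

def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def top_k_by_dot_product(query, chunks, k = 3):
    """Return the k (chunk_id, score) pairs with the highest dot product to the query."""
    scored = ((chunk_id, dot(query, emb)) for chunk_id, emb in chunks.items())
    return heapq.nlargest(k, scored, key = lambda pair: pair[1])

# toy 3-dimensional embeddings (hypothetical)
chunks = {
    'a': [1.0, 0.0, 0.0],
    'b': [0.6, 0.8, 0.0],
    'c': [0.0, 0.0, 1.0],
}
query = [0.8, 0.6, 0.0]
result = top_k_by_dot_product(query, chunks, k = 2)
print(result)
```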

A detailed [comparison of many retrieval systems](./readme.md#comparison-of-vector-database-solutions) can be found in the readme as well.

---

**Spanner For Storage, Indexing, And Search**

**Spanner** ([https://cloud.google.com/spanner](https://cloud.google.com/spanner)) is Google Cloud's globally distributed, scalable database, suitable for a wide range of applications, from gaming databases to financial ledgers.

- **Effortless Scalability:** Spanner decouples compute and storage, enabling effortless scalability to accommodate growing data and traffic demands.
- **Always-On Availability:**  Spanner provides automatic maintenance with zero downtime and allows for 100% online schema changes, even with synchronous replication, ensuring continuous availability.
- **Spanner Graph:**  Leverage [Spanner Graph](https://cloud.google.com/spanner/docs/graph/overview) for knowledge graphs, social networks, GraphRAG, and more, using the ISO Graph Query Language.
- **Vertex AI Integration:** Spanner integrates with [Vertex AI](https://cloud.google.com/spanner/docs/ml) for generative AI and custom ML model inference.
- **LangChain Integration:**  Build LLM-powered applications with Spanner's integration with [LangChain](https://cloud.google.com/spanner/docs/langchain).
- **Vector Similarity Search:** Spanner offers [built-in vector similarity search](https://cloud.google.com/spanner/docs/find-k-nearest-neighbors) with indexing for efficient approximate nearest neighbor search in applications like retrieval augmented generation (RAG).

---

**Use Case Data**

Buying a home usually involves borrowing money from a lending institution, typically through a mortgage secured by the home's value. But how do these institutions manage the risks associated with such large loans, and how are lending standards established?

In the United States, two government-sponsored enterprises (GSEs) play a vital role in the housing market:

- Federal National Mortgage Association ([Fannie Mae](https://www.fanniemae.com/))
- Federal Home Loan Mortgage Corporation ([Freddie Mac](https://www.freddiemac.com/))

These GSEs purchase mortgages from lenders, enabling those lenders to offer more loans. This process also allows Fannie Mae and Freddie Mac to set standards for mortgages, ensuring they are responsible and borrowers are more likely to repay them. This system makes homeownership more affordable and stabilizes the housing market by maintaining a steady flow of liquidity for lenders and keeping interest rates controlled.

However, navigating the complexities of these GSEs and their extensive servicing guides can be challenging.

**Approaches**

[This series](../readme.md) covers many generative AI workflows. These documents are used directly as long context for Gemini in the workflow [Long Context Retrieval With The Vertex AI Gemini API](../Generate/Long%20Context%20Retrieval%20With%20The%20Vertex%20AI%20Gemini%20API.ipynb). The workflow below uses a [retrieval](./readme.md) approach with the already generated chunks and embeddings.

---
## Colab Setup

When running this notebook in [Colab](https://colab.google/) or [Colab Enterprise](https://cloud.google.com/colab/docs/introduction), this section will authenticate to GCP (follow prompts in the popup) and set the current project for the session.

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

---
## Installs and API Enablement

The client packages may need to be installed in this environment.

### Installs (If Needed)

In [48]:
# tuples of (import name, install name, min_version)
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform', '1.69.0'),
    ('google.cloud.spanner', 'google-cloud-spanner'),
    ('google.cloud.sqlalchemy_spanner', 'sqlalchemy-spanner')
]

import importlib.util, importlib.metadata
from packaging.version import Version # compare versions numerically, not as strings

install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # installed versions are looked up by distribution (install) name
        if Version(importlib.metadata.version(package[1])) < Version(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user

### API Enablement

In [4]:
!gcloud services enable aiplatform.googleapis.com
!gcloud services enable spanner.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart, continue running from the next cell after this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

Inputs

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [7]:
REGION = 'us-central1'
SERIES = 'applied-genai'
EXPERIMENT = 'retrieval-spanner'

# Spanner names
SPANNER_INSTANCE_NAME = PROJECT_ID
SPANNER_DATABASE_NAME = SERIES
SPANNER_TABLE_NAME = EXPERIMENT.replace('-', '_')

Packages

In [49]:
import os, json, time, glob, datetime, asyncio

import numpy as np

# Vertex AI
from google.cloud import aiplatform
import vertexai.language_models # for embeddings API
import vertexai.generative_models # for Gemini Models
from vertexai.resources.preview import feature_store

# spanner
from google.cloud import spanner
from google.cloud import spanner_admin_instance_v1
from google.cloud import spanner_admin_database_v1
import sqlalchemy

In [9]:
aiplatform.__version__

'1.71.0'

Clients

In [10]:
# vertex ai clients
vertexai.init(project = PROJECT_ID, location = REGION)

# spanner client
spanner_client = spanner.Client(project = PROJECT_ID)
spanner_instance_client = spanner_admin_instance_v1.InstanceAdminClient()
spanner_database_client = spanner_admin_database_v1.DatabaseAdminClient()

---
## Text & Embeddings For Examples

This repository contains a [section for document processing (chunking)](../Chunking/readme.md) that includes an example of processing multiple large PDFs (over 1000 pages) into chunks: [Large Document Processing - Document AI Layout Parser](../Chunking/Large%20Document%20Processing%20-%20Document%20AI%20Layout%20Parser.ipynb). The chunks of text from that workflow are stored with this repository and loaded by a companion workflow that augments the chunks with text embeddings: [Vertex AI Text Embeddings API](../Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb).

The following code loads the version of the chunks that includes text embeddings and prepares it for a local example of retrieval augmented generation.

### Get The Documents

If you are working from a clone of this notebook's [repository](https://github.com/statmike/vertex-ai-mlops), then the documents are already present. The following cell checks for the documents folder and, if it is missing, retrieves it with `git clone`:

In [11]:
local_dir = '../Embeddings/files/embeddings-api'

In [12]:
import shutil # used below to copy the documents and remove the temporary clone

if not os.path.exists(local_dir):
    print('Retrieving documents...')
    parent_dir = os.path.dirname(local_dir)
    temp_dir = os.path.join(parent_dir, 'temp')
    if not os.path.exists(temp_dir):
        os.makedirs(temp_dir)
    !git clone https://www.github.com/statmike/vertex-ai-mlops {temp_dir}/vertex-ai-mlops
    shutil.copytree(f'{temp_dir}/vertex-ai-mlops/Applied GenAI/Embeddings/files/embeddings-api', local_dir)
    shutil.rmtree(temp_dir)
    print(f'Documents are now in folder `{local_dir}`')
else:
    print(f'Documents Found in folder `{local_dir}`')             

Documents Found in folder `../Embeddings/files/embeddings-api`


### Load The Chunks

In [13]:
jsonl_files = glob.glob(f"{local_dir}/large-files*.jsonl")
jsonl_files.sort()
jsonl_files

['../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0000.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0001.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0002.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0003.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0004.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0005.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0006.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0007.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0008.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0009.jsonl']

In [14]:
chunks = []
for file in jsonl_files:
    with open(file, 'r') as f:
        chunks.extend([json.loads(line) for line in f])
len(chunks)

9040

### Review A Chunk

In [15]:
chunks[0].keys()

dict_keys(['instance', 'predictions', 'status'])

In [16]:
chunks[0]['instance']['chunk_id']

'fannie_part_0_c17'

In [17]:
print(chunks[0]['instance']['content'])

# Selling Guide Fannie Mae Single Family

## Fannie Mae Copyright Notice

### Fannie Mae Copyright Notice

|-|
| Section B3-4.2, Verification of Depository Assets 402 |
| B3-4.2-01, Verification of Deposits and Assets (05/04/2022) 403 |
| B3-4.2-02, Depository Accounts (12/14/2022) 405 |
| B3-4.2-03, Individual Development Accounts (02/06/2019) 408 |
| B3-4.2-04, Pooled Savings (Community Savings Funds) (04/01/2009) 411 |
| B3-4.2-05, Foreign Assets (05/04/2022) 411 |
| Section B3-4.3, Verification of Non-Depository Assets 412 |
| B3-4.3-01, Stocks, Stock Options, Bonds, and Mutual Funds (06/30/2015) 412 |
| B3-4.3-02, Trust Accounts (04/01/2009) 413 |
| B3-4.3-03, Retirement Accounts (06/30/2015) 414 |
| B3-4.3-04, Personal Gifts (09/06/2023) 415 |
| B3-4.3-05, Gifts of Equity (10/07/2020) 418 |
| B3-4.3-06, Grants and Lender Contributions (12/14/2022) 419 |
| B3-4.3-07, Disaster Relief Grants or Loans (04/01/2009) 423 |
| B3-4.3-08, Employer Assistance (09/29/2015) 423 |
| B3-4.3-09,

In [18]:
chunks[0]['predictions'][0]['embeddings']['values'][0:10]

[0.031277116388082504,
 0.03056905046105385,
 0.010865348391234875,
 0.0623614676296711,
 0.03228681534528732,
 0.05066155269742012,
 0.046544693410396576,
 0.05509665608406067,
 -0.014074751175940037,
 0.008380400016903877]

### Prepare Chunk Structure

Make a list of dictionaries with information for each chunk:

In [19]:
content_chunks = [
    dict(
        gse = chunk['instance']['gse'],
        chunk_id = chunk['instance']['chunk_id'],
        content = chunk['instance']['content'],
        embedding = chunk['predictions'][0]['embeddings']['values']
    ) for chunk in chunks
]
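A sanity check on `content_chunks` before loading can help. The table created later uses `chunk_id` as its primary key and declares a fixed `vector_length` for `embedding`. Duplicate IDs or embeddings of different lengths would likely cause inserts to fail. The `validate_chunks` helper below is a hypothetical sketch, shown on toy records. In this notebook it could be called as `validate_chunks(content_chunks)`.

```python
from collections import Counter

def validate_chunks(records):
    """Check that chunk_ids are unique and that all embeddings share one length.

    Returns the common embedding dimension; raises ValueError otherwise.
    """
    counts = Counter(r['chunk_id'] for r in records)
    duplicates = [cid for cid, n in counts.items() if n > 1]
    if duplicates:
        raise ValueError(f'duplicate chunk_id values: {duplicates[:5]}')
    lengths = {len(r['embedding']) for r in records}
    if len(lengths) != 1:
        raise ValueError(f'inconsistent embedding lengths: {sorted(lengths)}')
    return lengths.pop()

# toy records (hypothetical) shaped like the entries of content_chunks
toy = [
    dict(gse = 'fannie', chunk_id = 'f_c1', content = 'example text', embedding = [0.1, 0.2, 0.3]),
    dict(gse = 'freddie', chunk_id = 'm_c1', content = 'example text', embedding = [0.3, 0.2, 0.1]),
]
dimension = validate_chunks(toy)
print(dimension)
```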

### Query Embedding

Create a query, or prompt, and get the embedding for it:

Connect to models for text embeddings. Learn more about the model API:
- [Vertex AI Text Embeddings API](../Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb)

In [20]:
question = "Does a lender have to perform servicing functions directly?"

In [21]:
embedder = vertexai.language_models.TextEmbeddingModel.from_pretrained('text-embedding-004')

In [24]:
question_embedding = embedder.get_embeddings([question])[0].values
question_embedding[0:10]

[-0.0005117303808219731,
 0.009651427157223225,
 0.01768726110458374,
 0.014538003131747246,
 -0.01829824410378933,
 0.027877431362867355,
 -0.021124685183167458,
 0.008830446749925613,
 -0.02669006586074829,
 0.06414774805307388]

---
## Setup Spanner

Spanner is Google Cloud's fully managed, globally distributed, scalable database service. This workflow guides you through creating an instance, creating a database, adding a table to the database, and loading records into the table. [Spanner pricing](https://cloud.google.com/spanner/pricing#spanner-pricing) is based on compute, storage, and data movement (replication, egress). There are three tiers, called [Spanner editions](https://cloud.google.com/spanner/docs/editions-overview). They differ in the features available and in the number of regions and replicas supported. This workflow uses a small configuration: one node in a single-region (regional) instance configuration. It uses the lowest edition that includes the vector indexing capabilities (Enterprise edition).

The setup here is done with the [Python Client for Cloud Spanner](https://cloud.google.com/python/docs/reference/spanner/latest).  Alternatively, the [console](https://cloud.google.com/spanner/docs/create-query-database-console) and [Cloud SDK `gcloud spanner`](https://cloud.google.com/spanner/docs/getting-started/gcloud) as well as [client libraries](https://cloud.google.com/spanner/docs/getting-started/set-up) in other languages can be used.

### Create/Retrieve An Instance

The starting point for using Spanner is to create an instance. This is where the compute size (number of nodes), location(s), and replication are specified.

Documentation References:
- [Instances overview](https://cloud.google.com/spanner/docs/instances)
- [Instance Configurations (regional, dual-region, and multi-region)](https://cloud.google.com/spanner/docs/instance-configurations)
- [Python SDK for Cloud Spanner: `InstanceAdminClient`](https://cloud.google.com/python/docs/reference/spanner/latest/google.cloud.spanner_admin_instance_v1.services.instance_admin)
    - Setup as `spanner_instance_client` here
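Compute capacity can be expressed in nodes or in processing units. The instance retrieved later in this section reports `node_count: 1` alongside `processing_units: 1000`. The sketch below assumes the granularity described in the Spanner compute capacity documentation: multiples of 100 processing units below 1,000, and multiples of 1,000 at or above that. Confirm this against the current documentation. The hypothetical helpers convert and validate a requested size before calling `create_instance`.

```python
PROCESSING_UNITS_PER_NODE = 1000

def nodes_to_processing_units(node_count):
    """Convert a node count to processing units (1 node = 1000 processing units)."""
    return node_count * PROCESSING_UNITS_PER_NODE

def validate_processing_units(processing_units):
    """Return True if the value fits the assumed Spanner compute granularity."""
    if processing_units <= 0:
        return False
    if processing_units < PROCESSING_UNITS_PER_NODE:
        return processing_units % 100 == 0
    return processing_units % PROCESSING_UNITS_PER_NODE == 0

print(nodes_to_processing_units(1)) # the configuration used in this workflow
checks = [validate_processing_units(pu) for pu in (100, 500, 1000, 1500, 2000)]
print(checks)
```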

Use the client to list configurations, if needed:

In [25]:
#spanner_instance_client.list_instance_configs(parent = f"projects/{PROJECT_ID}")

Make a [configuration choice](https://cloud.google.com/spanner/docs/instance-configurations), here a single region. 

In [26]:
# regional configuration
config_id = 'regional-us-central1'
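The create and get calls in this section pass fully qualified resource names built with f-strings. A small hypothetical helper can keep those paths in one place. The formats below mirror the strings used in the cells here: `projects/{project}/instances/{instance}`, `projects/{project}/instanceConfigs/{config}`, and `.../databases/{database}`.

```python
def spanner_resource_names(project_id, instance_id, config_id, database_id):
    """Build the fully qualified Spanner resource names used by the admin clients."""
    project = f'projects/{project_id}'
    instance = f'{project}/instances/{instance_id}'
    return dict(
        project = project,
        config = f'{project}/instanceConfigs/{config_id}',
        instance = instance,
        database = f'{instance}/databases/{database_id}',
    )

names = spanner_resource_names('my-project', 'my-instance', 'regional-us-central1', 'applied-genai')
for key, value in names.items():
    print(f'{key}: {value}')
```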

In [27]:
try:
    spanner_instance = spanner_instance_client.get_instance(
        name = f'projects/{PROJECT_ID}/instances/{SPANNER_INSTANCE_NAME}'
    )
    print(f"Found the instance: {spanner_instance.name}")
except Exception:
    print('Creating an instance ...')
    create_instance = spanner_instance_client.create_instance(
        parent = f"projects/{PROJECT_ID}",
        instance_id = SPANNER_INSTANCE_NAME,
        instance = spanner_admin_instance_v1.Instance(
            name = f'projects/{PROJECT_ID}/instances/{SPANNER_INSTANCE_NAME}',
            config = f'projects/{PROJECT_ID}/instanceConfigs/{config_id}',
            display_name = SPANNER_INSTANCE_NAME,
            node_count = 1,
            edition = 'ENTERPRISE' # minimum needed for vector indexing features
        )
    )
    spanner_instance = create_instance.result()
    spanner_instance = spanner_instance_client.get_instance(
        name = f'projects/{PROJECT_ID}/instances/{SPANNER_INSTANCE_NAME}'
    )
    print(f"Created the instance: {spanner_instance.name}")

Found the instance: projects/statmike-mlops-349915/instances/statmike-mlops-349915


In [28]:
spanner_instance

name: "projects/statmike-mlops-349915/instances/statmike-mlops-349915"
config: "projects/statmike-mlops-349915/instanceConfigs/regional-us-central1"
display_name: "statmike-mlops-349915"
node_count: 1
state: READY
processing_units: 1000
create_time {
  seconds: 1729873343
  nanos: 540788000
}
update_time {
  seconds: 1729873343
  nanos: 540788000
}
edition: ENTERPRISE

---
## Working With Spanner

Now that the instance exists, it can be used to create a database, add a table, and load the records.

### Create/Retrieve A Database

[Spanner databases](https://cloud.google.com/spanner/docs/databases) are containers for tables, views, and indexes, and an instance can hold multiple databases. Each database uses one of two dialects, GoogleSQL or PostgreSQL, chosen when the database is created. The vector storage and indexing features are specific to the GoogleSQL dialect. For that reason, a single GoogleSQL database is created and used for this workflow.

Documentation References:
- [Database overview](https://cloud.google.com/spanner/docs/databases)
- [Choosing the Right Dialect for Your Spanner Database](https://cloud.google.com/spanner/docs/choose-googlesql-or-postgres)
- [Create and manage databases](https://cloud.google.com/spanner/docs/create-manage-databases)
- [Python SDK for Cloud Spanner `DatabaseAdminClient`](https://cloud.google.com/python/docs/reference/spanner/latest/google.cloud.spanner_admin_database_v1.services.database_admin)
    - Setup as `spanner_database_client` here
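The dialect is fixed when the database is created, and the `create_statement` quotes the database ID differently for each dialect. GoogleSQL uses backticks, as in the cell below. PostgreSQL uses double quotes, and the request would also need `database_dialect` set to PostgreSQL. The helper below is a hypothetical sketch of that difference. The dialect names match the `database_dialect` values reported by the admin API, such as `GOOGLE_STANDARD_SQL`.

```python
def create_database_statement(database_id, dialect = 'GOOGLE_STANDARD_SQL'):
    """Return a CREATE DATABASE statement with dialect-appropriate identifier quoting."""
    if dialect == 'GOOGLE_STANDARD_SQL':
        return f'CREATE DATABASE `{database_id}`'
    if dialect == 'POSTGRESQL':
        return f'CREATE DATABASE "{database_id}"'
    raise ValueError(f'unknown dialect: {dialect}')

googlesql_statement = create_database_statement('applied-genai')
postgresql_statement = create_database_statement('applied-genai', dialect = 'POSTGRESQL')
print(googlesql_statement)
print(postgresql_statement)
```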

In [62]:
try:
    spanner_database = spanner_database_client.get_database(
        name = f"{spanner_instance.name}/databases/{SPANNER_DATABASE_NAME}"
    )
    print(f"Found the database: {spanner_database.name}")
except Exception:
    print('Creating a database ...')
    create_database = spanner_database_client.create_database(
        request = spanner_admin_database_v1.types.CreateDatabaseRequest(
            parent = spanner_instance.name,
            create_statement = f'CREATE DATABASE `{SPANNER_DATABASE_NAME}`',
            extra_statements = [], # you could go ahead and CREATE TABLE here   
        )
    )
    spanner_database = create_database.result()
    spanner_database = spanner_database_client.get_database(
        name = f"{spanner_instance.name}/databases/{SPANNER_DATABASE_NAME}"
    )
    print(f"Created the database: {spanner_database.name}")

Found the database: projects/statmike-mlops-349915/instances/statmike-mlops-349915/databases/applied-genai


In [63]:
spanner_database

name: "projects/statmike-mlops-349915/instances/statmike-mlops-349915/databases/applied-genai"
state: READY
create_time {
  seconds: 1729873484
  nanos: 192873000
}
version_retention_period: "1h"
earliest_version_time {
  seconds: 1731511990
  nanos: 491087000
}
encryption_info {
  encryption_type: GOOGLE_DEFAULT_ENCRYPTION
}
database_dialect: GOOGLE_STANDARD_SQL

### Connection To Databases With Client

To work inside a database, this workflow calls the Cloud Spanner API through the [Python Client For Cloud Spanner](https://cloud.google.com/python/docs/reference/spanner/latest/google.cloud.spanner_v1.client). That client was set up above as the `spanner_client` object.

Documentation References:
- [Data types in GoogleSQL](https://cloud.google.com/spanner/docs/reference/standard-sql/data-types)
- [Python Client For Cloud Spanner database Module](https://cloud.google.com/python/docs/reference/spanner/latest/google.cloud.spanner_v1.database)
- [Python Client For Cloud Spanner table Module](https://cloud.google.com/python/docs/reference/spanner/latest/google.cloud.spanner_v1.table.Table)


Use the client, `spanner_client`, to return the database as a Python object:

In [145]:
instance = spanner_client.instance(SPANNER_INSTANCE_NAME)
database = instance.database(SPANNER_DATABASE_NAME)

In [146]:
database.name

'projects/statmike-mlops-349915/instances/statmike-mlops-349915/databases/applied-genai'

Test a query (SELECT) with the client:

In [147]:
with database.snapshot() as snapshot:
    result = snapshot.execute_sql("SELECT 'Success' as did_it_work")
    # iterate the streamed result while the snapshot is still open
    for r in result:
        print(r)

['Success']



### Connection To Databases With Client Using SQLAlchemy

The Spanner client also integrates with several language frameworks. Of note here, from Python, is the [integration with SQLAlchemy](https://cloud.google.com/spanner/docs/use-sqlalchemy). [SQLAlchemy](https://www.sqlalchemy.org/) is a Python SQL toolkit and object-relational mapper that runs SQL over managed database connections.

In SQLAlchemy, connections are created and managed by an engine. The [Spanner dialect for SQLAlchemy](https://github.com/googleapis/python-spanner-sqlalchemy/tree/main) is a Python package that adds Spanner engine support to SQLAlchemy. Install it with `pip install sqlalchemy-spanner`. This section shows how to pass the client created and used in the previous section to a SQLAlchemy engine.
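The engine URL used in this section is the `spanner+spanner:///` prefix followed by the database's full resource name. The hypothetical helper below assembles that URL from its parts. It only builds the string. Creating the engine still requires `sqlalchemy-spanner`.

```python
def spanner_sqlalchemy_url(project_id, instance_id, database_id):
    """Assemble a sqlalchemy-spanner connection URL from resource IDs."""
    database_name = f'projects/{project_id}/instances/{instance_id}/databases/{database_id}'
    return f'spanner+spanner:///{database_name}'

url = spanner_sqlalchemy_url('my-project', 'my-instance', 'applied-genai')
print(url)
```

In this notebook, such a URL would be passed the same way as in the next cell: `sqlalchemy.create_engine(url, connect_args = dict(client = spanner_client))`.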

Use the client, `spanner_client`, with SQLAlchemy to create an engine:

In [149]:
engine = sqlalchemy.create_engine(
    f"spanner+spanner:///{database.name}",
    connect_args = dict(client = spanner_client)
)
# https://github.com/googleapis/python-spanner-sqlalchemy/tree/main?tab=readme-ov-file#autocommit-mode
autocommit_engine = engine.execution_options(isolation_level = "AUTOCOMMIT")

Test a query (SELECT) with SQLAlchemy:

In [150]:
with autocommit_engine.connect() as connection:
    result = connection.execute(sqlalchemy.text("SELECT 'Success' as did_it_work"))
    # fetch the rows before the connection is closed
    for r in result:
        print(r)

('Success',)


### Query Orchestrator

Use the client and SQLAlchemy engine as the basis for a simple function that executes queries and returns results:

In [155]:
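def run_query(query, engine = autocommit_engine):
    """Execute a SQLAlchemy statement and return any rows as dictionaries.

    Returns a single dict for one row, a list of dicts for several rows,
    and an empty list for statements that return no rows (DDL, DML).
    """
    rows = []
    with engine.connect() as connection:
        result = connection.execute(query)
        # consume rows while the connection is still open
        if result.returns_rows:
            rows = [dict(row) for row in result.mappings()]
    return rows[0] if len(rows) == 1 else rows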
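One design choice in `run_query` is worth noting. One row comes back as a plain dictionary, several rows as a list of dictionaries, and no rows as an empty list. That is why a later cell wraps a single `index_result` in a list before looping. The shaping logic can be sketched without a database, using hypothetical column names and row tuples. The `as_list` helper is also hypothetical and undoes the single-row unwrapping.

```python
def shape_rows(keys, rows):
    """Mirror run_query's return convention for a list of row tuples."""
    records = [dict(zip(keys, row)) for row in rows]
    return records[0] if len(records) == 1 else records

def as_list(result):
    """Normalize a run_query-style result back to a list of dictionaries."""
    return [result] if isinstance(result, dict) else result

single = shape_rows(['did_it_work'], [('Success',)])
several = shape_rows(['chunk_id'], [('a',), ('b',)])
print(single)
print(several)
print(as_list(single))
```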

Execute a test query:

In [157]:
run_query(sqlalchemy.text("SELECT 'Success' as did_it_work"))

{'did_it_work': 'Success'}

### Create/Retrieve A Table

Create a table. If the table already exists, remove it first to start fresh for this example workflow.

Documentation References:
- [Data types in GoogleSQL](https://cloud.google.com/spanner/docs/reference/standard-sql/data-types)

#### Check For Table And Delete It

This uses queries against the Spanner [Information Schema tables](https://cloud.google.com/spanner/docs/information-schema) to check for the table and any indexes on it. If found, these are dropped to start fresh for this workflow.

Check for the table:

In [264]:
table_result = run_query(sqlalchemy.text(f"SELECT * from information_schema.tables WHERE table_name = '{SPANNER_TABLE_NAME}'"))
table_result

{'TABLE_CATALOG': '',
 'TABLE_SCHEMA': '',
 'TABLE_NAME': 'retrieval_spanner',
 'PARENT_TABLE_NAME': None,
 'ON_DELETE_ACTION': None,
 'TABLE_TYPE': 'BASE TABLE',
 'SPANNER_STATE': 'COMMITTED',
 'INTERLEAVE_TYPE': None,
 'ROW_DELETION_POLICY_EXPRESSION': None}

If the table was found, check for any indexes on it:

In [265]:
if table_result:
    index_result = run_query(sqlalchemy.text(f"SELECT INDEX_NAME, INDEX_TYPE FROM information_schema.indexes WHERE table_name = '{SPANNER_TABLE_NAME}'"))
else:
    index_result = []

In [266]:
index_result

{'INDEX_NAME': 'PRIMARY_KEY', 'INDEX_TYPE': 'PRIMARY_KEY'}

Remove any vector indexes:

In [267]:
if index_result:
    if isinstance(index_result, dict): index_result = [index_result] # a single row is returned as a dict
    for index in index_result:
        if index['INDEX_TYPE'] == 'VECTOR':
            run_query(sqlalchemy.text(f"DROP INDEX {index['INDEX_NAME']}"))

Remove the table:

In [268]:
if table_result:
    run_query(sqlalchemy.text(f"DROP TABLE {SPANNER_TABLE_NAME}"))
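The check-and-drop steps above follow one ordering rule: indexes on the table are dropped before the table itself. The hypothetical pure function below turns the information-schema results into an ordered list of DDL statements. It accepts either the single-dict or the list form that `run_query` returns. It is a slightly broader sketch than the cells above: it drops every non-primary-key index, not only `VECTOR` ones. The index name in the example is made up.

```python
def drop_statements(table_name, table_found, indexes):
    """Return DDL to drop a table's secondary indexes, then the table itself."""
    if not table_found:
        return []
    if isinstance(indexes, dict): # run_query returns a bare dict for one row
        indexes = [indexes]
    statements = [
        f"DROP INDEX {index['INDEX_NAME']}"
        for index in indexes
        if index['INDEX_TYPE'] != 'PRIMARY_KEY'
    ]
    statements.append(f'DROP TABLE {table_name}')
    return statements

indexes = [
    {'INDEX_NAME': 'PRIMARY_KEY', 'INDEX_TYPE': 'PRIMARY_KEY'},
    {'INDEX_NAME': 'chunk_embedding_index', 'INDEX_TYPE': 'VECTOR'}, # hypothetical index name
]
planned = drop_statements('retrieval_spanner', True, indexes)
for statement in planned:
    print(statement)
```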

#### Create A Table

To create a table with a schema, specify the appropriate [Data Types](https://cloud.google.com/spanner/docs/reference/standard-sql/data-types) for each column. It can also be helpful to look ahead to the [Spanner vector search](https://cloud.google.com/spanner/docs/find-k-nearest-neighbors) documentation.

In [269]:
run_query(sqlalchemy.text(f"""
    CREATE TABLE {SPANNER_TABLE_NAME} (
        chunk_id STRING(100) NOT NULL,
        embedding ARRAY<FLOAT32>(vector_length=>{len(question_embedding)}),
        gse STRING(25),
        content STRING(MAX)
    ) PRIMARY KEY (chunk_id)
"""))

[]

Checking the information schema for the table was shown above. The [Python Client For Cloud Spanner's `Table` module](https://cloud.google.com/python/docs/reference/spanner/latest/google.cloud.spanner_v1.table) can also be used:

In [270]:
database.table(SPANNER_TABLE_NAME).exists()

True

### Add, Retrieve, And Delete Rows

Learn about inserting, retrieving, and deleting records/rows with the following simple examples.

#### Get A Record

Dictionaries for each record/row are stored in `content_chunks` from earlier in this workflow:

In [271]:
first_record = content_chunks[0]

In [272]:
first_record.keys()

dict_keys(['gse', 'chunk_id', 'content', 'embedding'])

In [273]:
first_record['chunk_id']

'fannie_part_0_c17'

#### Insert Row

In [274]:
with database.batch() as batch:
    batch.insert(
        table = SPANNER_TABLE_NAME,
        columns = tuple(first_record.keys()),
        values = [tuple(first_record.values())]
    )
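`batch.insert` takes a tuple of column names and a list of value tuples whose positions must line up. Building both from the same dictionary works because Python dictionaries preserve insertion order. Fixing the column order explicitly also guards against records whose keys were added in a different order. The `to_mutation` helper below is a hypothetical sketch of that.

```python
def to_mutation(records, columns = None):
    """Return (columns, values) suitable for a Spanner batch insert.

    Column order comes from `columns` if given, else from the first record's keys;
    every record is read in that same order so values line up with columns.
    """
    columns = tuple(columns or records[0].keys())
    values = [tuple(record[column] for column in columns) for record in records]
    return columns, values

records = [
    dict(gse = 'fannie', chunk_id = 'c1', content = 'text one', embedding = [0.1, 0.2]),
    dict(chunk_id = 'c2', gse = 'freddie', embedding = [0.3, 0.4], content = 'text two'), # keys in a different order
]
columns, values = to_mutation(records)
print(columns)
print(values[1])
```

The result could then be passed as `batch.insert(table = SPANNER_TABLE_NAME, columns = columns, values = values)`.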

#### Retrieve Row

There are several ways to retrieve rows. The examples here use SQL through the Spanner client, the Spanner client's `read` method, and SQL through SQLAlchemy.

Using the Spanner client with SQL:

In [275]:
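columns = list(first_record.keys())
with database.snapshot() as snapshot:
    # pass chunk_id as a query parameter instead of formatting it into the SQL text
    response = snapshot.execute_sql(
        f"SELECT {', '.join(columns)} FROM {SPANNER_TABLE_NAME} WHERE chunk_id = @chunk_id",
        params = {'chunk_id': first_record['chunk_id']},
        param_types = {'chunk_id': spanner.param_types.STRING}
    )
    results = [dict(zip(columns, row)) for row in response] # read rows while the snapshot is open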

In [276]:
results[0]['chunk_id']

'fannie_part_0_c17'

In [277]:
results[0]['gse']

'fannie'

Using the Spanner client's `read` method:

In [278]:
with database.snapshot() as snapshot:
    response = snapshot.read(
        table = SPANNER_TABLE_NAME,
        columns = list(first_record.keys()), # or a subset provided as a list [column names]
        keyset = spanner.KeySet(keys=[(first_record['chunk_id'],)]) # or all rows with spanner.KeySet(all_ = True)
    )
    results = []
    for row in list(response):
        results.append(dict(zip(first_record.keys(), row)))

In [279]:
len(results)

1

In [280]:
results[0].keys()

dict_keys(['gse', 'chunk_id', 'content', 'embedding'])

In [281]:
results[0]['chunk_id']

'fannie_part_0_c17'

In [282]:
results[0]['gse']

'fannie'

In [283]:
results[0]['content']

'# Selling Guide Fannie Mae Single Family\n\n## Fannie Mae Copyright Notice\n\n### Fannie Mae Copyright Notice\n\n|-|\n| Section B3-4.2, Verification of Depository Assets 402 |\n| B3-4.2-01, Verification of Deposits and Assets (05/04/2022) 403 |\n| B3-4.2-02, Depository Accounts (12/14/2022) 405 |\n| B3-4.2-03, Individual Development Accounts (02/06/2019) 408 |\n| B3-4.2-04, Pooled Savings (Community Savings Funds) (04/01/2009) 411 |\n| B3-4.2-05, Foreign Assets (05/04/2022) 411 |\n| Section B3-4.3, Verification of Non-Depository Assets 412 |\n| B3-4.3-01, Stocks, Stock Options, Bonds, and Mutual Funds (06/30/2015) 412 |\n| B3-4.3-02, Trust Accounts (04/01/2009) 413 |\n| B3-4.3-03, Retirement Accounts (06/30/2015) 414 |\n| B3-4.3-04, Personal Gifts (09/06/2023) 415 |\n| B3-4.3-05, Gifts of Equity (10/07/2020) 418 |\n| B3-4.3-06, Grants and Lender Contributions (12/14/2022) 419 |\n| B3-4.3-07, Disaster Relief Grants or Loans (04/01/2009) 423 |\n| B3-4.3-08, Employer Assistance (09/29/20

Using SQL With SQLAlchemy:

In [284]:
query = sqlalchemy.text(f"SELECT * FROM `{SPANNER_TABLE_NAME}` WHERE chunk_id = '{first_record['chunk_id']}'")
result = run_query(query)

In [285]:
result.keys()

dict_keys(['chunk_id', 'embedding', 'gse', 'content'])

In [286]:
result['chunk_id']

'fannie_part_0_c17'

In [287]:
type(result['embedding'])

list

In [288]:
result['embedding'][0:10]

[0.031277116388082504,
 0.03056905046105385,
 0.010865348391234875,
 0.0623614676296711,
 0.03228681534528732,
 0.05066155269742012,
 0.046544693410396576,
 0.05509665608406067,
 -0.014074751175940037,
 0.008380400016903877]

#### Delete Row

Delete the row added here.  Verify the action by counting the rows before and after the deletion.

In [289]:
run_query(sqlalchemy.text(f"SELECT COUNT(*) as count FROM `{SPANNER_TABLE_NAME}`"))

{'count': 1}

In [290]:
run_query(sqlalchemy.text(f"DELETE FROM `{SPANNER_TABLE_NAME}` WHERE chunk_id = '{first_record['chunk_id']}'"))

[]

In [291]:
run_query(sqlalchemy.text(f"SELECT COUNT(*) as count FROM `{SPANNER_TABLE_NAME}`"))

{'count': 0}

### Load Data

There are many rows, so the batch method is used for loading:

Documentation References:
- [Insert, update, and delete data using mutations](https://cloud.google.com/spanner/docs/modify-mutation-api#python)
- [Python Client For Database Batch](https://cloud.google.com/python/docs/reference/spanner/latest/google.cloud.spanner_v1.database.Database#google_cloud_spanner_v1_database_Database_batch)

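Keep in mind that Spanner limits each commit in both total size and number of mutations. An inserted row counts roughly one mutation per column written, plus any index entries. Large loads may therefore need to be split into multiple batches.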
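One way to stay under those limits is to split the rows into fixed-size groups and commit each group in its own `database.batch()`. The sketch below only plans the groups. As a rough estimate, it assumes each inserted row contributes one mutation per column. The `max_mutations` budget is a hypothetical parameter; set it from the current Spanner quotas and limits documentation.

```python
import math

def plan_batches(n_rows, n_columns, max_mutations):
    """Return (rows_per_batch, n_batches) so each batch stays within max_mutations.

    Assumes, as a rough estimate, one mutation per column per inserted row.
    """
    if n_columns <= 0 or max_mutations < n_columns:
        raise ValueError('max_mutations must allow at least one row per batch')
    rows_per_batch = max_mutations // n_columns
    return rows_per_batch, math.ceil(n_rows / rows_per_batch)

def batches(rows, rows_per_batch):
    """Yield consecutive slices of `rows` with at most rows_per_batch items each."""
    for start in range(0, len(rows), rows_per_batch):
        yield rows[start:start + rows_per_batch]

# 9040 chunks x 4 columns with a hypothetical budget of 10,000 mutations per commit
rows_per_batch, n_batches = plan_batches(9040, 4, 10_000)
print(rows_per_batch, n_batches)
```

In this notebook, each group from `batches(content_chunks, rows_per_batch)` would go into its own `with database.batch() as batch:` block. That way a failure only affects one group.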

In [292]:
def insert_data(database, input_data):
    """Insert all rows into the Spanner table in a single batch (one commit)."""
    columns = tuple(input_data[0].keys())
    with database.batch() as batch: # the batch commits when the block exits
        batch.insert(
            table = SPANNER_TABLE_NAME,
            columns = columns,
            values = [tuple(record[c] for c in columns) for record in input_data]
        )

In [293]:
insert_data(database, content_chunks)

Verify the results with a row count:

In [294]:
run_query(sqlalchemy.text(f"SELECT COUNT(*) as count FROM `{SPANNER_TABLE_NAME}`"))

{'count': 9040}

---
## Vector Similarity Search, Matching

This section uses vector similarity metrics to find the nearest neighbors of a query vector, first by brute force and then with a vector index. For background on similarity metrics and the intuition behind choosing one (this workflow uses dot product), see [The Math of Similarity](../Embeddings/The%20Math%20of%20Similarity.ipynb).

### Check For Vector Indexes

At this point in the workflow no vector indexes have been created.  The following cells show how to check for indexes and will be reused later in the workflow to verify the details of indexes after they are created.

In [295]:
run_query(sqlalchemy.text(f"""
    SELECT INDEX_NAME, INDEX_TYPE
    FROM information_schema.indexes
    WHERE
        table_name = '{SPANNER_TABLE_NAME}'
        AND index_type = 'VECTOR'
"""))

[]

### Brute Force Search - No Index

Without an index you can still find nearest neighbors with a brute-force search that compares the query embedding to every row.

Run a brute-force match with any of these GoogleSQL distance functions:
- [COSINE_DISTANCE()](https://cloud.google.com/spanner/docs/reference/standard-sql/mathematical_functions#cosine_distance) for Cosine distance
- [EUCLIDEAN_DISTANCE()](https://cloud.google.com/spanner/docs/reference/standard-sql/mathematical_functions#euclidean_distance) for L2, Euclidean distance
- [DOT_PRODUCT()](https://cloud.google.com/spanner/docs/reference/standard-sql/mathematical_functions#dot_product) for Dot product
    
Documentation Reference: [Perform vector similarity search in Spanner by finding the K-nearest neighbors](https://cloud.google.com/spanner/docs/find-k-nearest-neighbors)
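Conceptually, each brute-force query scores every row and keeps the best `k`. The sketch below illustrates that idea in pure Python; it is not how Spanner executes the query internally. It assumes rows are in-memory dicts shaped like this table (`chunk_id`, `embedding`, `gse`), and the optional `keep` predicate plays the role of a `WHERE` pre-filter. The toy rows are hypothetical.

```python
import heapq
import math
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]  # e.g. {"chunk_id": ..., "embedding": [...], "gse": ...}


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # 1 - cosine similarity, the same meaning as COSINE_DISTANCE()
    return 1.0 - dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def brute_force_top_k(
    query: Sequence[float],
    rows: List[Row],
    k: int = 5,
    metric: str = "dot",
    keep: Optional[Callable[[Row], bool]] = None,
) -> List[Tuple[Any, float]]:
    """Score every (optionally pre-filtered) row and return the k best (chunk_id, score) pairs."""
    scorers = {"dot": dot_product, "euclidean": euclidean_distance, "cosine": cosine_distance}
    score = scorers[metric]
    scored = (
        (row["chunk_id"], score(row["embedding"], query))
        for row in rows
        if keep is None or keep(row)  # like a WHERE clause applied before ranking
    )
    if metric == "dot":
        # Larger dot product = closer, like ORDER BY dot_product DESC
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])
    # Smaller distance = closer, like ORDER BY distance ASC
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy rows
rows = [
    {"chunk_id": "a", "embedding": [1.0, 0.0], "gse": "fannie"},
    {"chunk_id": "b", "embedding": [0.6, 0.8], "gse": "freddie"},
    {"chunk_id": "c", "embedding": [0.0, 1.0], "gse": "fannie"},
]
print(brute_force_top_k([1.0, 0.0], rows, k=2))  # [('a', 1.0), ('b', 0.6)]
print(brute_force_top_k([1.0, 0.0], rows, keep=lambda r: r["gse"] == "fannie"))  # [('a', 1.0), ('c', 0.0)]
```

The cost grows linearly with the number of rows, which is what the vector index later in this section is meant to reduce.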

Dot product with `DOT_PRODUCT()`

In [296]:
run_query(sqlalchemy.text(f"""
    SELECT
        chunk_id,
        DOT_PRODUCT(embedding, ARRAY<FLOAT32>{question_embedding}) AS dot_product
    FROM {SPANNER_TABLE_NAME}
    ORDER BY dot_product DESC
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': 0.7099841833114624},
 {'chunk_id': 'freddie_part_4_c509', 'dot_product': 0.6805261373519897},
 {'chunk_id': 'freddie_part_4_c510', 'dot_product': 0.6753296852111816},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': 0.6723706722259521},
 {'chunk_id': 'fannie_part_0_c326', 'dot_product': 0.6683496832847595}]

Euclidean distance with `EUCLIDEAN_DISTANCE()`

In [297]:
run_query(sqlalchemy.text(f"""
    SELECT
        chunk_id,
        EUCLIDEAN_DISTANCE(embedding, ARRAY<FLOAT32>{question_embedding}) AS euclidean_distance
    FROM {SPANNER_TABLE_NAME}
    ORDER BY euclidean_distance
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'euclidean_distance': 0.7615657946265283},
 {'chunk_id': 'freddie_part_4_c509', 'euclidean_distance': 0.7992874727506176},
 {'chunk_id': 'freddie_part_4_c510', 'euclidean_distance': 0.8057848660615564},
 {'chunk_id': 'fannie_part_0_c353', 'euclidean_distance': 0.8094337265330812},
 {'chunk_id': 'fannie_part_0_c326', 'euclidean_distance': 0.8144253513348422}]

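Cosine distance with `COSINE_DISTANCE()` (this returns 1 minus the cosine similarity, so smaller values are closer matches, even though the query below aliases the result as `cosine_similarity`)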

In [298]:
run_query(sqlalchemy.text(f"""
    SELECT
        chunk_id,
        COSINE_DISTANCE(embedding, ARRAY<FLOAT32>{question_embedding}) AS cosine_similarity
    FROM {SPANNER_TABLE_NAME}
    ORDER BY cosine_similarity
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'cosine_similarity': 0.28999842323157043},
 {'chunk_id': 'freddie_part_4_c509', 'cosine_similarity': 0.31944418927007767},
 {'chunk_id': 'freddie_part_4_c510', 'cosine_similarity': 0.3246529853234844},
 {'chunk_id': 'fannie_part_0_c353', 'cosine_similarity': 0.32760391792511945},
 {'chunk_id': 'fannie_part_0_c326', 'cosine_similarity': 0.3316462732543465}]
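All three brute-force queries above return the same five chunks in the same order. That is expected if the embeddings have (close to) unit length, which is an assumption here rather than something this notebook checks: for unit vectors, cosine distance equals 1 - dot product and Euclidean distance equals sqrt(2 - 2 * dot product), so both shrink as the dot product grows. The printed values fit this pattern; for `fannie_part_0_c352`, 1 - 0.70998 ≈ 0.2900 and sqrt(2 - 2 × 0.70998) ≈ 0.7616. The sketch below checks the identities on random unit vectors (synthetic data, not the notebook's embeddings):

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def metrics(a, b):
    """Return (dot product, Euclidean distance, cosine distance) for two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    euclid = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    cosine_dist = 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
    return dot, euclid, cosine_dist


random.seed(0)
query = normalize([random.gauss(0, 1) for _ in range(768)])
docs = [normalize([random.gauss(0, 1) for _ in range(768)]) for _ in range(20)]

for d in docs:
    dot, euclid, cos_d = metrics(query, d)
    # Identities that hold for unit-length vectors
    assert math.isclose(cos_d, 1.0 - dot, abs_tol=1e-9)
    assert math.isclose(euclid, math.sqrt(2.0 - 2.0 * dot), abs_tol=1e-9)

# Consequently the three metrics rank the documents identically
by_dot = sorted(range(len(docs)), key=lambda i: -metrics(query, docs[i])[0])
by_euclid = sorted(range(len(docs)), key=lambda i: metrics(query, docs[i])[1])
by_cosine = sorted(range(len(docs)), key=lambda i: metrics(query, docs[i])[2])
print(by_dot == by_euclid == by_cosine)
```

For vectors that are not normalized the rankings can differ, which is one reason the choice of metric matters.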

### Brute Force Search With Pre-Filtering - No Index

Extending a brute-force match with pre-filtering means adding a `WHERE` clause that first filters to rows meeting a desired condition:

Find the top 5 matches where the GSE is 'fannie':

In [299]:
run_query(sqlalchemy.text(f"""
    SELECT
        chunk_id,
        DOT_PRODUCT(embedding, ARRAY<FLOAT32>{question_embedding}) AS dot_product
    FROM {SPANNER_TABLE_NAME}
    WHERE gse = 'fannie'
    ORDER BY dot_product DESC
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': 0.7099841833114624},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': 0.6723706722259521},
 {'chunk_id': 'fannie_part_0_c326', 'dot_product': 0.66834956407547},
 {'chunk_id': 'fannie_part_0_c92', 'dot_product': 0.661433756351471},
 {'chunk_id': 'fannie_part_0_c240', 'dot_product': 0.6608578562736511}]

Find the top 5 matches where the GSE is 'freddie':

In [300]:
run_query(sqlalchemy.text(f"""
    SELECT
        chunk_id,
        DOT_PRODUCT(embedding, ARRAY<FLOAT32>{question_embedding}) AS dot_product
    FROM {SPANNER_TABLE_NAME}
    WHERE gse = 'freddie'
    ORDER BY dot_product DESC
    LIMIT 5
"""))

[{'chunk_id': 'freddie_part_4_c509', 'dot_product': 0.6805261373519897},
 {'chunk_id': 'freddie_part_4_c510', 'dot_product': 0.6753296852111816},
 {'chunk_id': 'freddie_part_4_c472', 'dot_product': 0.6619843244552612},
 {'chunk_id': 'freddie_part_6_c439', 'dot_product': 0.6604535579681396},
 {'chunk_id': 'freddie_part_4_c558', 'dot_product': 0.6575404405593872}]

### Create And Use An Index

Indexes make search across many rows more efficient by first identifying the most promising partitions of rows and then comparing the query only to rows within those partitions. Because not every row is compared, results are approximate. This section covers [creating indexes](https://cloud.google.com/spanner/docs/find-approximate-nearest-neighbors#vector-index) and using them in queries.

- ScaNN: [developed by Google](https://github.com/google-research/google-research/blob/master/scann/docs/algorithms.md)
    - tree-based quantization index
    - [options](https://cloud.google.com/spanner/docs/reference/standard-sql/data-definition-language#vector_index_option_list) for the distance type, tree depth, number of leaves, and number of branches
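To build intuition for `num_leaves` and `num_leaves_to_search`, here is a deliberately simplified, single-level partitioned search. This is not ScaNN, which uses a tree plus quantization; the leaf centroids here are just a random sample of the data (an assumption to keep the sketch short) rather than trained. Each vector is assigned to its closest centroid by dot product, and a query scans only the rows in the `num_leaves_to_search` leaves whose centroids score highest. Skipping the other leaves is what makes the result approximate.

```python
import heapq
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_leaves(vectors, num_leaves, seed=0):
    """Pick `num_leaves` vectors as centroids and bucket every vector by its closest centroid."""
    rng = random.Random(seed)
    centroids = [vectors[i] for i in rng.sample(range(len(vectors)), num_leaves)]
    leaves = [[] for _ in range(num_leaves)]
    for idx, v in enumerate(vectors):
        best = max(range(num_leaves), key=lambda c: dot(v, centroids[c]))
        leaves[best].append(idx)
    return centroids, leaves


def approx_top_k(query, vectors, centroids, leaves, k=5, num_leaves_to_search=2):
    """Score only rows in the most promising leaves; return (index, dot) pairs, best first."""
    ranked_leaves = heapq.nlargest(
        num_leaves_to_search, range(len(centroids)), key=lambda c: dot(query, centroids[c])
    )
    candidates = [i for c in ranked_leaves for i in leaves[c]]
    return heapq.nlargest(k, ((i, dot(query, vectors[i])) for i in candidates), key=lambda p: p[1])


# Synthetic data for illustration only
rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(500)]
query = [rng.gauss(0, 1) for _ in range(16)]
centroids, leaves = build_leaves(vectors, num_leaves=20)

exact = heapq.nlargest(5, ((i, dot(query, v)) for i, v in enumerate(vectors)), key=lambda p: p[1])
approx = approx_top_k(query, vectors, centroids, leaves, k=5, num_leaves_to_search=3)
full = approx_top_k(query, vectors, centroids, leaves, k=5, num_leaves_to_search=20)
print(exact == full, [i for i, _ in approx])
```

Searching every leaf reproduces the exact answer; searching fewer leaves trades some recall for scanning fewer rows, which is the same knob `num_leaves_to_search` exposes in the queries below.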

Documentation References:
- [Vector Indexes](https://cloud.google.com/spanner/docs/find-approximate-nearest-neighbors#vector-index)
- [VECTOR INDEX statements](https://cloud.google.com/spanner/docs/reference/standard-sql/data-definition-language#vector_index_statements)

#### Prepare Table For Vector Index Creation

The column type for the embedding [needs to include](https://cloud.google.com/spanner/docs/reference/standard-sql/data-definition-language#parameters_34) the `vector_length=>INT` information.  It was set during table creation above and verified here:
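Because the column declares `vector_length=>768`, writes whose embeddings have a different length are expected to be rejected (see the DDL reference linked above). A small client-side check before loading can surface such rows early. This is a sketch; the function name is hypothetical and it assumes records are dicts with `chunk_id` and `embedding` keys, like `content_chunks` in this notebook.

```python
import math
from typing import Any, Dict, Iterable, List

EXPECTED_DIM = 768  # matches ARRAY<FLOAT32>(vector_length=>768) in the schema above


def find_invalid_embeddings(records: Iterable[Dict[str, Any]], dim: int = EXPECTED_DIM) -> List[Any]:
    """Return the chunk_id of each record whose embedding is missing, the wrong length, or non-finite."""
    invalid = []
    for record in records:
        embedding = record.get("embedding")
        if embedding is None or len(embedding) != dim or not all(math.isfinite(x) for x in embedding):
            invalid.append(record.get("chunk_id"))
    return invalid
```

Before a load, `find_invalid_embeddings(content_chunks)` returning an empty list indicates every embedding has the declared length.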

In [301]:
run_query(sqlalchemy.text(f"SELECT COLUMN_NAME, SPANNER_TYPE from information_schema.columns WHERE table_name = '{SPANNER_TABLE_NAME}'"))

[{'COLUMN_NAME': 'chunk_id', 'SPANNER_TYPE': 'STRING(100)'},
 {'COLUMN_NAME': 'embedding',
  'SPANNER_TYPE': 'ARRAY<FLOAT32>(vector_length=>768)'},
 {'COLUMN_NAME': 'gse', 'SPANNER_TYPE': 'STRING(25)'},
 {'COLUMN_NAME': 'content', 'SPANNER_TYPE': 'STRING(MAX)'}]

#### Index: ScaNN

Reference: [Vector Indexes](https://cloud.google.com/spanner/docs/find-approximate-nearest-neighbors#vector-index)

In [None]:
run_query(sqlalchemy.text(f"""
    CREATE VECTOR INDEX IF NOT EXISTS embedding_index ON `{SPANNER_TABLE_NAME}`(embedding)
    WHERE embedding IS NOT NULL
    OPTIONS(
        distance_type = 'DOT_PRODUCT',
        tree_depth = 2,
        num_leaves = 100
    )
"""))

Review the index details:

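The `INDEX_STATE` reports build progress: values such as `PREPARE` or `WRITE_ONLY` mean the index is still being created and backfilled. Before continuing, wait for the value `READ_WRITE`, as shown in the output below.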
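Rather than re-running the check by hand, a small polling helper can wait for the state to reach `READ_WRITE`. This is a sketch: `fetch_state` is a hypothetical zero-argument callable you supply, for example a lambda that runs the `information_schema.indexes` query below and returns its `INDEX_STATE` value. The timeout and interval defaults are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_for_index_ready(
    fetch_state: Callable[[], str],
    timeout_s: float = 1800.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `fetch_state()` until it returns 'READ_WRITE'; raise TimeoutError after `timeout_s`."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch_state()
        if state == "READ_WRITE":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index still in state {state!r} after {timeout_s} seconds")
        sleep(interval_s)


# Hypothetical usage with this notebook's run_query helper (returns a dict for a single-row result):
# wait_for_index_ready(lambda: run_query(sqlalchemy.text(
#     f"SELECT INDEX_STATE FROM information_schema.indexes "
#     f"WHERE table_name = '{SPANNER_TABLE_NAME}' AND index_name = 'embedding_index'"
# ))['INDEX_STATE'])
```

The `sleep` parameter is injectable so the helper can be exercised without actually waiting.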

In [330]:
run_query(sqlalchemy.text(f"""
    SELECT *
    FROM information_schema.indexes
    WHERE
        table_name = '{SPANNER_TABLE_NAME}'
        AND index_type = 'VECTOR'
"""))

{'TABLE_CATALOG': '',
 'TABLE_SCHEMA': '',
 'TABLE_NAME': 'retrieval_spanner',
 'INDEX_NAME': 'embedding_index',
 'INDEX_TYPE': 'VECTOR',
 'PARENT_TABLE_NAME': '',
 'IS_UNIQUE': False,
 'IS_NULL_FILTERED': False,
 'INDEX_STATE': 'READ_WRITE',
 'FILTER': 'embedding IS NOT NULL',
 'SPANNER_IS_MANAGED': False,
 'SEARCH_PARTITION_BY': None,
 'SEARCH_ORDER_BY': None}

#### Query An Index

In Spanner you [query using a vector index](https://cloud.google.com/spanner/docs/find-approximate-nearest-neighbors#query-vector-embeddings) by calling the approximate versions of the distance functions:
    
- [APPROX_COSINE_DISTANCE()](https://cloud.google.com/spanner/docs/reference/standard-sql/mathematical_functions#approx_cosine_distance) for Cosine distance
- [APPROX_EUCLIDEAN_DISTANCE()](https://cloud.google.com/spanner/docs/reference/standard-sql/mathematical_functions#approx_euclidean_distance) for L2, Euclidean distance
- [APPROX_DOT_PRODUCT()](https://cloud.google.com/spanner/docs/reference/standard-sql/mathematical_functions#approx_dot_product) for Dot product

Important Notes:
- The query must use the [FORCE_INDEX directive](https://cloud.google.com/spanner/docs/secondary-indexes#index-directive) to name the vector index. A table can have multiple vector indexes and the directive selects which one to use; it is required even when only one vector index exists.
- The distance function must match the one used to create the index. For instance, you cannot call `APPROX_COSINE_DISTANCE()` when the vector index was created with `distance_type = 'DOT_PRODUCT'`.
- The distance function must include the `options` argument with `num_leaves_to_search`.
- If the embedding column is nullable, the query must filter out nulls with `WHERE embedding IS NOT NULL`, even if no null values actually exist.
- The query must include both an `ORDER BY` on the distance and a `LIMIT`.
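The rules above can be encoded in a small helper that assembles the query text. This is a hypothetical convenience function, not part of any Spanner library; it only builds a string for the notebook's `run_query(sqlalchemy.text(...))` pattern, and like the cells in this notebook it inlines the embedding as an array literal. The `distance_type` keys assume the `COSINE`, `EUCLIDEAN`, and `DOT_PRODUCT` values accepted by `CREATE VECTOR INDEX`.

```python
import json
from typing import Optional, Sequence

# distance_type used in CREATE VECTOR INDEX -> (approximate function, sort direction for "closest first")
APPROX_FUNCTIONS = {
    "DOT_PRODUCT": ("APPROX_DOT_PRODUCT", "DESC"),
    "COSINE": ("APPROX_COSINE_DISTANCE", "ASC"),
    "EUCLIDEAN": ("APPROX_EUCLIDEAN_DISTANCE", "ASC"),
}


def build_ann_query(
    table: str,
    index: str,
    query_embedding: Sequence[float],
    distance_type: str,
    column: str = "embedding",
    k: int = 5,
    num_leaves_to_search: int = 10,
    extra_filter: Optional[str] = None,
    select_columns: Sequence[str] = ("chunk_id",),
) -> str:
    """Assemble an approximate nearest neighbor query that follows the rules listed above."""
    if distance_type not in APPROX_FUNCTIONS:
        raise ValueError(f"unknown distance_type: {distance_type!r}")
    if int(k) < 1 or int(num_leaves_to_search) < 1:
        raise ValueError("k and num_leaves_to_search must be positive")
    func, direction = APPROX_FUNCTIONS[distance_type]  # must match the index's distance_type
    vector = "ARRAY<FLOAT32>[" + ", ".join(repr(float(x)) for x in query_embedding) + "]"
    options = json.dumps({"num_leaves_to_search": int(num_leaves_to_search)})  # required options argument
    where = f"{column} IS NOT NULL"  # required when the column is nullable
    if extra_filter:
        where += f" AND ({extra_filter})"  # optional pre-filter, e.g. "gse = 'fannie'"
    return (
        f"SELECT {', '.join(select_columns)}, "
        f"{func}({column}, {vector}, options => JSON '{options}') AS score\n"
        f"FROM {table}@{{FORCE_INDEX={index}}}\n"  # FORCE_INDEX is required
        f"WHERE {where}\n"
        f"ORDER BY score {direction}\n"  # ORDER BY and LIMIT are required
        f"LIMIT {int(k)}"
    )
```

A possible use in this notebook would be `run_query(sqlalchemy.text(build_ann_query(SPANNER_TABLE_NAME, "embedding_index", question_embedding, "DOT_PRODUCT")))`. Keying the function choice off `distance_type` makes it harder to pair an index with the wrong approximate function.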

In [347]:
run_query(sqlalchemy.text(f"""
    SELECT
        chunk_id,
        APPROX_DOT_PRODUCT(embedding, ARRAY<FLOAT32>{question_embedding}, options => JSON '{{\"num_leaves_to_search\": 10}}') AS dot_product
    FROM {SPANNER_TABLE_NAME}@{{FORCE_INDEX=embedding_index}}
    WHERE embedding IS NOT NULL
    ORDER BY dot_product DESC
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': 0.7101420164108276},
 {'chunk_id': 'freddie_part_4_c509', 'dot_product': 0.6804349422454834},
 {'chunk_id': 'freddie_part_4_c510', 'dot_product': 0.6751958131790161},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': 0.6723408699035645},
 {'chunk_id': 'fannie_part_0_c326', 'dot_product': 0.668167233467102}]

#### Query An Index With Pre-Filtering

In [351]:
run_query(sqlalchemy.text(f"""
    SELECT
        chunk_id,
        APPROX_DOT_PRODUCT(embedding, ARRAY<FLOAT32>{question_embedding}, options => JSON '{{\"num_leaves_to_search\": 10}}') AS dot_product
    FROM {SPANNER_TABLE_NAME}@{{FORCE_INDEX=embedding_index}}
    WHERE embedding IS NOT NULL
        AND gse = 'fannie'
    ORDER BY dot_product DESC
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': 0.7101420164108276},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': 0.6723408699035645},
 {'chunk_id': 'fannie_part_0_c326', 'dot_product': 0.668167233467102},
 {'chunk_id': 'fannie_part_0_c92', 'dot_product': 0.6617515087127686},
 {'chunk_id': 'fannie_part_0_c240', 'dot_product': 0.6610549688339233}]

#### Replace/Rebuild/Add Vector Indexes

Multiple vector indexes are possible on a single embedding column, and the query selects one with the FORCE_INDEX directive as shown above. There is [no `ALTER VECTOR INDEX`](https://cloud.google.com/spanner/docs/find-approximate-nearest-neighbors#limitations) statement, so changing an index means building a new one. Once the new index is ready, point the FORCE_INDEX directive at it to replace the prior index. Outdated indexes can then be removed with a [`DROP INDEX index_name` statement](https://cloud.google.com/spanner/docs/secondary-indexes#drop-index).
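A sketch of that replace-then-drop sequence, generating the DDL strings with two hypothetical helper functions. The name `embedding_index_v2` and the `num_leaves=200` value are illustrative assumptions, and the commented rollout assumes the Spanner client's `Database.update_ddl()`, which returns a long-running operation.

```python
def create_vector_index_ddl(index, table, column="embedding", distance_type="DOT_PRODUCT", tree_depth=2, num_leaves=100):
    """Build a CREATE VECTOR INDEX statement shaped like the one used earlier in this notebook."""
    return (
        f"CREATE VECTOR INDEX IF NOT EXISTS {index} ON {table}({column}) "
        f"WHERE {column} IS NOT NULL "
        f"OPTIONS (distance_type = '{distance_type}', tree_depth = {int(tree_depth)}, num_leaves = {int(num_leaves)})"
    )


def drop_index_ddl(index):
    """Build a DROP INDEX statement."""
    return f"DROP INDEX {index}"


# Hypothetical rollout; each step is a separate DDL call so queries never reference a missing index:
# 1. database.update_ddl([create_vector_index_ddl("embedding_index_v2", SPANNER_TABLE_NAME, num_leaves=200)]).result()
# 2. wait until INDEX_STATE = 'READ_WRITE', then switch queries to @{FORCE_INDEX=embedding_index_v2}
# 3. database.update_ddl([drop_index_ddl("embedding_index")]).result()
```

Dropping the old index only after queries have moved avoids a window where `FORCE_INDEX` names an index that no longer exists.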

#### Override An Index - Force Brute Force Search

This is as simple as using the exact distance function instead of the approximate version and not specifying an index in the query:

In [352]:
run_query(sqlalchemy.text(f"""
    SELECT
        chunk_id,
        DOT_PRODUCT(embedding, ARRAY<FLOAT32>{question_embedding}) AS dot_product
    FROM {SPANNER_TABLE_NAME}
    ORDER BY dot_product DESC
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': 0.7099841833114624},
 {'chunk_id': 'freddie_part_4_c509', 'dot_product': 0.6805261373519897},
 {'chunk_id': 'freddie_part_4_c510', 'dot_product': 0.6753296852111816},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': 0.6723706722259521},
 {'chunk_id': 'fannie_part_0_c326', 'dot_product': 0.6683496832847595}]

---
## Retrieval Augmented Generation (RAG)

Build a simple retrieval augmented generation (RAG) process that enhances a query with retrieved context. This is done by constructing three functions, one per stage:
- `retrieve` (implemented below as `retrieve_spanner`) - uses an embedding to search for matching context, i.e. chunks of text
    - this uses the system built earlier in this workflow!
- `augment` - prepares the retrieved chunks into a prompt
- `generate` - makes the LLM request with the augmented prompt

A final function executes the full RAG workflow:
- `rag` - receives the query and orchestrates the workflow through `retrieve` > `augment` > `generate`

### Clients

In [359]:
embedder = vertexai.language_models.TextEmbeddingModel.from_pretrained('text-embedding-004')
llm = vertexai.generative_models.GenerativeModel("gemini-1.5-flash-002")

### Retrieve Function

In [360]:
def retrieve_spanner(query_embedding, n_matches = 5):

    matches = run_query(sqlalchemy.text(f"""
        SELECT chunk_id, content
        FROM {SPANNER_TABLE_NAME}@{{FORCE_INDEX=embedding_index}}
        WHERE embedding IS NOT NULL
        ORDER BY APPROX_DOT_PRODUCT(embedding, ARRAY<FLOAT32>{query_embedding}, options => JSON '{{\"num_leaves_to_search\": 10}}') DESC
        LIMIT {n_matches}
    """))
    
    return matches

### Augment Function

In [361]:
def augment(matches):

    prompt = ''
    for m, match in enumerate(matches):
        prompt += f"Context {m+1}:\n{match['content']}\n\n"
    prompt += f'Answer the following question using the provided contexts:\n'

    return prompt

### Generate Function

In [362]:
def generate(prompt):

    result = llm.generate_content(prompt)

    return result

### RAG Function

In [363]:
def rag(query):
    
    query_embedding = embedder.get_embeddings([query])[0].values
    matches = retrieve_spanner(query_embedding)
    prompt = augment(matches) + query
    result = generate(prompt)
    
    return result.text

### Example In Use

In [366]:
question

'Does a lender have to perform servicing functions directly?'

In [365]:
print(rag(question))

No, a lender does not have to perform servicing functions directly.  Context 1 explicitly states that a lender "may use other organizations to perform some or all of its servicing functions," referring to this as "subservicing."  This involves a "master servicer" and a "subservicer," where the subservicer performs some or all of the servicing functions on behalf of the master servicer.  The contexts also detail requirements and regulations surrounding these subservicing arrangements.



---
### Profiling Performance

Profile the timing of each step in the RAG function across sequential calls. The environment chosen for this workflow is a minimal testing environment, so load testing (simultaneous requests) would not be informative.
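The profiling below wraps each stage in paired `time.time()` calls. An alternative sketch (a design choice, not what this notebook uses) is a small context manager built on `time.perf_counter()`, a monotonic, high-resolution clock intended for measuring elapsed time. The `timed` name is hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(timings, key):
    """Store the elapsed seconds of the `with` body in timings[key], even if the body raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[key] = time.perf_counter() - start


# Example with a stand-in workload; in rag() the body would be, e.g., embedder.get_embeddings([query])
timings = {}
with timed(timings, "embedding"):
    time.sleep(0.01)
print(timings)
```

Using `finally` means a failed stage still records how long it ran before the error, which can help when diagnosing slow timeouts.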

In [367]:
profile = []

In [368]:
def rag(query, profile = profile):
    
    timings = {}
    start_time = time.time()
    
    
    # 1. Get embeddings
    embedding_start = time.time()
    query_embedding = embedder.get_embeddings([query])[0].values
    timings['embedding'] = time.time() - embedding_start

    # 2. Retrieve from Spanner
    retrieval_start = time.time()
    matches = retrieve_spanner(query_embedding)
    timings['retrieve_spanner'] = time.time() - retrieval_start

    # 3. Augment the prompt
    augment_start = time.time()
    prompt = augment(matches) + query
    timings['augment'] = time.time() - augment_start

    # 4. Generate text
    generate_start = time.time()
    result = generate(prompt)
    timings['generate'] = time.time() - generate_start

    total_time = time.time() - start_time
    timings['total'] = total_time
    
    profile.append(timings)
    
    return result.text

In [369]:
print(rag(question))

No, a lender does not have to perform servicing functions directly.  Context 1 explicitly states that a lender may use other organizations to perform some or all of its servicing functions, referring to this as "subservicing."  The text also outlines circumstances where arrangements are *not* considered subservicing.  Further, Context 4 details requirements for subservicing arrangements, indicating that it is a permissible and regulated practice.



In [370]:
profile

[{'embedding': 0.17748260498046875,
  'retrieve_spanner': 0.675225019454956,
  'augment': 0.0004048347473144531,
  'generate': 0.8636410236358643,
  'total': 1.7167634963989258}]

In [371]:
for i in range(100):
    response = rag(question)

### Report From Profile

In [372]:
all_timings = {}
for timings in profile:
    for key, value in timings.items():
        if key not in all_timings:
            all_timings[key] = []
        all_timings[key].append(value)

In [373]:
for key, values in all_timings.items():
    arr = np.array(values)
    print(f"Statistics for '{key}':")
    print(f"  Min: {np.min(arr):.4f} seconds")
    print(f"  Max: {np.max(arr):.4f} seconds")
    print(f"  Mean: {np.mean(arr):.4f} seconds")
    print(f"  Median: {np.median(arr):.4f} seconds")
    print(f"  Std Dev: {np.std(arr):.4f} seconds")
    print(f"  P95: {np.percentile(arr, 95):.4f} seconds")
    print(f"  P99: {np.percentile(arr, 99):.4f} seconds")
    print("")

Statistics for 'embedding':
  Min: 0.0473 seconds
  Max: 0.1775 seconds
  Mean: 0.0584 seconds
  Median: 0.0536 seconds
  Std Dev: 0.0184 seconds
  P95: 0.0767 seconds
  P99: 0.1537 seconds

Statistics for 'retrieve_spanner':
  Min: 0.0374 seconds
  Max: 2.2023 seconds
  Mean: 0.0728 seconds
  Median: 0.0414 seconds
  Std Dev: 0.2226 seconds
  P95: 0.1064 seconds
  P99: 0.6752 seconds

Statistics for 'augment':
  Min: 0.0000 seconds
  Max: 0.0004 seconds
  Mean: 0.0000 seconds
  Median: 0.0000 seconds
  Std Dev: 0.0000 seconds
  P95: 0.0001 seconds
  P99: 0.0001 seconds

Statistics for 'generate':
  Min: 0.5126 seconds
  Max: 1.0149 seconds
  Mean: 0.7079 seconds
  Median: 0.6888 seconds
  Std Dev: 0.0831 seconds
  P95: 0.8553 seconds
  P99: 0.9275 seconds

Statistics for 'total':
  Min: 0.6125 seconds
  Max: 2.9663 seconds
  Mean: 0.8392 seconds
  Median: 0.7938 seconds
  Std Dev: 0.2466 seconds
  P95: 0.9833 seconds
  P99: 1.7168 seconds



## Remove Resources

In [133]:
#spanner_database_client.drop_database(database = spanner_database.name)

In [134]:
#spanner_instance_client.delete_instance(name = spanner_instance.name)