<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Retrieval/Retrieval%20-%20Firestore.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FApplied%2520GenAI%2FRetrieval%2FRetrieval%2520-%2520Firestore.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Retrieval/Retrieval%20-%20Firestore.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/Applied%20GenAI/Retrieval/Retrieval%20-%20Firestore.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Retrieval - Firestore

In prior workflows, a series of documents was [processed into chunks](../Chunking/readme.md), and for each chunk, [embeddings](../Embeddings/readme.md) were created:

- Process: [Large Document Processing - Document AI Layout Parser](../Chunking/Large%20Document%20Processing%20-%20Document%20AI%20Layout%20Parser.ipynb)
- Embed: [Vertex AI Text Embeddings API](../Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb)

Retrieving chunks for a query involves calculating the embedding for the query and then using similarity metrics to find relevant chunks. A thorough review of similarity matching can be found in [The Math of Similarity](../Embeddings/The%20Math%20of%20Similarity.ipynb) - use dot product! As development moves from experiment to application, the process of storing and computing similarity is migrated to a [retrieval](./readme.md) system. This workflow is part of a [series of workflows exploring many retrieval systems](./readme.md).  

A detailed [comparison of many retrieval systems](./readme.md#comparison-of-vector-database-solutions) can be found in the readme as well.
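The retrieval step described above (embed the query, then rank chunks by similarity) can be sketched locally with only the standard library. This is a minimal illustration, not the method used by any retrieval system: the three-dimensional vectors and chunk IDs are made-up toy values, and `top_k_by_dot_product` is a hypothetical helper name.

```python
import heapq

def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    return sum(x * y for x, y in zip(a, b))

def top_k_by_dot_product(query, chunks, k=2):
    """Return the k (score, chunk_id) pairs with the largest dot product."""
    scored = ((dot(query, chunk['embedding']), chunk['chunk_id']) for chunk in chunks)
    return heapq.nlargest(k, scored)

# toy data (assumed values, for illustration only)
toy_chunks = [
    {'chunk_id': 'a', 'embedding': [1.0, 0.0, 0.0]},
    {'chunk_id': 'b', 'embedding': [0.0, 1.0, 0.0]},
    {'chunk_id': 'c', 'embedding': [0.5, 0.5, 0.0]},
]
toy_query = [1.0, 0.0, 0.0]
toy_matches = top_k_by_dot_product(toy_query, toy_chunks, k=2)
print(toy_matches)
```

Scanning every chunk like this is fine for small experiments; a retrieval system takes over the storage and the similarity computation as the data grows.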

---

**Firestore For Storage, Indexing, And Search**

[Firestore](https://cloud.google.com/firestore) is a fully managed, serverless document database on Google Cloud that scales automatically to meet any demand, without requiring partitioning or incurring downtime. It's ideal for mobile, web, and server development because it keeps data in sync across client apps with real-time listeners and offers offline support for mobile and web.

- **Data Model:**  Firestore stores data in documents (similar to JSON) comprised of key-value pairs.  Keys (fields) are mapped to values of various supported data types. ([see the full list here](https://cloud.google.com/firestore/docs/concepts/data-types?hl=en)) ([Learn more about the data model](https://cloud.google.com/firestore/docs/data-model?hl=en))
- **Flexible Structure:** Data is organized in a hierarchical structure where collections contain documents. Documents can be nested objects, and collections can have subcollections, providing flexibility in how you structure your data.
- **BigQuery Integration:**  Firestore integrates directly with BigQuery, allowing you to stream data into BigQuery or export query results to Firestore. ([Learn more about BigQuery integration](https://cloud.google.com/firestore/docs/solutions/bigquery?hl=en))
- **Generative AI Features:** Firestore offers integrated GenAI features, such as generating text embeddings and seamless integration with LangChain.
- **Vector Similarity Search:** Firestore provides built-in vector similarity search with indexing for efficient nearest neighbor matching. ([Learn more about vector search](https://cloud.google.com/firestore/docs/vector-search?hl=en))

---

**Use Case Data**

Buying a home usually involves borrowing money from a lending institution, typically through a mortgage secured by the home's value. But how do these institutions manage the risks associated with such large loans, and how are lending standards established?

In the United States, two government-sponsored enterprises (GSEs) play a vital role in the housing market:

- Federal National Mortgage Association ([Fannie Mae](https://www.fanniemae.com/))
- Federal Home Loan Mortgage Corporation ([Freddie Mac](https://www.freddiemac.com/))

These GSEs purchase mortgages from lenders, enabling those lenders to offer more loans. This process also allows Fannie Mae and Freddie Mac to set standards for mortgages, ensuring they are responsible and borrowers are more likely to repay them. This system makes homeownership more affordable and stabilizes the housing market by maintaining a steady flow of liquidity for lenders and keeping interest rates controlled.

However, navigating the complexities of these GSEs and their extensive servicing guides can be challenging.

**Approaches**

[This series](../readme.md) covers many generative AI workflows. These documents are used directly as long context for Gemini in the workflow [Long Context Retrieval With The Vertex AI Gemini API](../Generate/Long%20Context%20Retrieval%20With%20The%20Vertex%20AI%20Gemini%20API.ipynb). The workflow below uses a [retrieval](./readme.md) approach with the already generated chunks and embeddings.

---
## Colab Setup

When running this notebook in [Colab](https://colab.google/) or [Colab Enterprise](https://cloud.google.com/colab/docs/introduction), this section will authenticate to GCP (follow prompts in the popup) and set the current project for the session.

In [23]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [24]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

---
## Installs and API Enablement

The client packages may need to be installed in this environment.

### Installs (If Needed)

In [25]:
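# tuples of (import name, install name, min_version)
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform', '1.69.0'),
    ('google.cloud.firestore', 'google-cloud-firestore')
]

import importlib.util, importlib.metadata
# assumption: `packaging` is available (it normally ships with Jupyter kernels)
from packaging.version import Version

install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # look up the installed version by install (distribution) name and
        # compare as versions, not strings ('1.9.0' < '1.69.0' fails as text)
        if Version(importlib.metadata.version(package[1])) < Version(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user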

### API Enablement

In [26]:
!gcloud services enable aiplatform.googleapis.com
!gcloud services enable firestore.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart, execution can resume from the cell that follows this one.

In [27]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

Inputs

In [28]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [29]:
REGION = 'us-central1'
SERIES = 'applied-genai'
EXPERIMENT = 'retrieval-firestore'

Packages

In [125]:
import os, json, time, glob, shutil, datetime, asyncio

# Vertex AI
from google.cloud import aiplatform
import vertexai.language_models # for embeddings API
import vertexai.generative_models # for Gemini Models
from vertexai.resources.preview import feature_store

# firestore
from google.cloud import firestore
from google.cloud import firestore_v1
from google.cloud import firestore_admin_v1

In [31]:
aiplatform.__version__

'1.71.0'

Clients

In [198]:
# vertex ai clients
vertexai.init(project = PROJECT_ID, location = REGION)

# firestore clients
fs = firestore.Client(project = PROJECT_ID)
fs_async = firestore.AsyncClient(project = PROJECT_ID)
fs_admin = firestore_admin_v1.FirestoreAdminClient()

---
## Text & Embeddings For Examples

This repository contains a [section for document processing (chunking)](../Chunking/readme.md) that includes an example of processing multiple large PDFs (over 1000 pages) into chunks: [Large Document Processing - Document AI Layout Parser](../Chunking/Large%20Document%20Processing%20-%20Document%20AI%20Layout%20Parser.ipynb).  The chunks of text from that workflow are stored with this repository and loaded by another companion workflow that augments the chunks with text embeddings: [Vertex AI Text Embeddings API](../Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb).

The following code loads the version of the chunks that includes text embeddings and prepares it for a local example of retrieval augmented generation.

### Get The Documents

If you are working from a clone of this notebook's [repository](https://github.com/statmike/vertex-ai-mlops) then the documents are already present. The following cell checks for the documents folder and, if it is missing, gets it (`git clone`):

In [213]:
local_dir = '../Embeddings/files/embeddings-api'

In [214]:
if not os.path.exists(local_dir):
    print('Retrieving documents...')
    parent_dir = os.path.dirname(local_dir)
    temp_dir = os.path.join(parent_dir, 'temp')
    if not os.path.exists(temp_dir):
        os.makedirs(temp_dir)
    !git clone https://www.github.com/statmike/vertex-ai-mlops {temp_dir}/vertex-ai-mlops
    shutil.copytree(f'{temp_dir}/vertex-ai-mlops/Applied GenAI/Embeddings/files/embeddings-api', local_dir)
    shutil.rmtree(temp_dir)
    print(f'Documents are now in folder `{local_dir}`')
else:
    print(f'Documents Found in folder `{local_dir}`')             

Documents Found in folder `../Embeddings/files/embeddings-api`


### Load The Chunks

In [215]:
jsonl_files = glob.glob(f"{local_dir}/large-files*.jsonl")
jsonl_files.sort()
jsonl_files

['../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0000.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0001.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0002.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0003.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0004.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0005.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0006.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0007.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0008.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0009.jsonl']

In [216]:
chunks = []
for file in jsonl_files:
    with open(file, 'r') as f:
        chunks.extend([json.loads(line) for line in f])
len(chunks)

9040

### Review A Chunk

In [217]:
chunks[0].keys()

dict_keys(['instance', 'predictions', 'status'])

In [218]:
chunks[0]['instance']['chunk_id']

'fannie_part_0_c17'

In [219]:
print(chunks[0]['instance']['content'])

# Selling Guide Fannie Mae Single Family

## Fannie Mae Copyright Notice

### Fannie Mae Copyright Notice

|-|
| Section B3-4.2, Verification of Depository Assets 402 |
| B3-4.2-01, Verification of Deposits and Assets (05/04/2022) 403 |
| B3-4.2-02, Depository Accounts (12/14/2022) 405 |
| B3-4.2-03, Individual Development Accounts (02/06/2019) 408 |
| B3-4.2-04, Pooled Savings (Community Savings Funds) (04/01/2009) 411 |
| B3-4.2-05, Foreign Assets (05/04/2022) 411 |
| Section B3-4.3, Verification of Non-Depository Assets 412 |
| B3-4.3-01, Stocks, Stock Options, Bonds, and Mutual Funds (06/30/2015) 412 |
| B3-4.3-02, Trust Accounts (04/01/2009) 413 |
| B3-4.3-03, Retirement Accounts (06/30/2015) 414 |
| B3-4.3-04, Personal Gifts (09/06/2023) 415 |
| B3-4.3-05, Gifts of Equity (10/07/2020) 418 |
| B3-4.3-06, Grants and Lender Contributions (12/14/2022) 419 |
| B3-4.3-07, Disaster Relief Grants or Loans (04/01/2009) 423 |
| B3-4.3-08, Employer Assistance (09/29/2015) 423 |
| B3-4.3-09,

In [220]:
chunks[0]['predictions'][0]['embeddings']['values'][0:10]

[0.031277116388082504,
 0.03056905046105385,
 0.010865348391234875,
 0.0623614676296711,
 0.03228681534528732,
 0.05066155269742012,
 0.046544693410396576,
 0.05509665608406067,
 -0.014074751175940037,
 0.008380400016903877]

### Prepare Chunk Structure

Make a list of dictionaries with information for each chunk:

In [221]:
content_chunks = [
    dict(
        gse = chunk['instance']['gse'],
        chunk_id = chunk['instance']['chunk_id'],
        content = chunk['instance']['content'],
        embedding = chunk['predictions'][0]['embeddings']['values']
    ) for chunk in chunks
]

### Query Embedding

Create a query, or prompt, and get the embedding for it:

Connect to models for text embeddings. Learn more about the model API:
- [Vertex AI Text Embeddings API](../Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb)

In [222]:
question = "Does a lender have to perform servicing functions directly?"

In [223]:
embedder = vertexai.language_models.TextEmbeddingModel.from_pretrained('text-embedding-004')

In [224]:
question_embedding = embedder.get_embeddings([question])[0].values
question_embedding[0:10]

[-0.0005117303808219731,
 0.009651427157223225,
 0.01768726110458374,
 0.014538003131747246,
 -0.01829824410378933,
 0.027877431362867355,
 -0.021124685183167458,
 0.008830446749925613,
 -0.02669006586074829,
 0.06414774805307388]

---
## Setup Firestore

Firestore is a scalable database for mobile, web, and server development. Scalable because:
- there is no need to pre-provision storage, just add documents and pay for what you use
- flexible data model supporting hierarchical structures
- multi-region data replication
- data synchronization to update data on connected devices
- much more!

**Firestore Databases**

Firestore as a service hosts databases.  Setting up a new database starts with choosing between two modes:
- Firestore in Native mode (**used below**)
    - scales to millions of concurrent clients.  Great for mobile and web apps.
- Firestore in Datastore Mode
    - scales to millions of writes per second
    - backwards compatible with Datastore APIs
    - no real-time capabilities from Firestore

The first database created can be named `(default)` and will have a [free quota tier](https://cloud.google.com/firestore/quotas) for getting started (see pricing next).
    
**Firestore Pricing**

Regardless of the mode chosen, [the pricing](https://cloud.google.com/firestore/pricing) is the same and is based on the size of data stored and on usage (reads, writes, deletes, and data transfer).  For the `(default)` database you get a [free tier](https://cloud.google.com/firestore/pricing) for stored data, reads, writes, deletes, and data transfers with daily limits. This free tier covers this demonstration workflow as long as the database is named `(default)`, which is the name used when the database is created below.

**Firestore Data Structure**

Within a database, data is stored as **documents**. Think of a document as a dictionary or JSON object with many fields and values, including nested fields and values (nesting is referred to as a map).  Documents are limited to 1 MiB in size.  A document has [these supported data types](https://cloud.google.com/firestore/docs/concepts/data-types), which are more expansive than JSON and even include a type for vectors, which is used in this workflow.

Documents are stored in **collections**, which are named containers for a group of documents that make queries simple.

Hierarchies of up to 100 levels can be created in a specific way.  A document can contain a **subcollection**, which is a nested container for a group of documents.  The key concept here is that a subcollection must be contained within a document, not directly under a collection or another subcollection.

Remember that Firestore is schemaless so documents can have different fields or even the same fields with different data types even if they are in the same collection.  Documents just need unique keys, the name, like an id, and Firestore can create random IDs automatically if needed.
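Because subcollections always hang off a document, a slash-separated path alternates between collection and document segments. The sketch below illustrates that rule by counting segments; `classify_path` and the example paths are hypothetical names made up for illustration and are not part of the Firestore client library.

```python
def classify_path(path):
    """Classify a slash-separated path by its number of segments."""
    segments = [segment for segment in path.split('/') if segment]
    if not segments:
        raise ValueError('path must have at least one segment')
    # segments alternate collection/document/collection/...:
    # an odd count ends on a collection, an even count ends on a document
    return 'collection' if len(segments) % 2 == 1 else 'document'

# hypothetical example paths
examples = [
    'guides',                           # top-level collection
    'guides/fannie',                    # document in that collection
    'guides/fannie/chunks',             # subcollection under the document
    'guides/fannie/chunks/part_0_c17',  # document in the subcollection
]
kinds = [classify_path(path) for path in examples]
print(list(zip(examples, kinds)))
```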

**Working With Firestore**

Firestore has a rich ecosystem of [client libraries](https://cloud.google.com/firestore/docs/reference/libraries) as well as direct HTTP and RPC calls.  There are mobile and web SDKs for common environments (Web, iOS+, Android, Flutter) and server client libraries in many common languages (C#, Go, Java, Node.js, PHP, Python, and Ruby).  These are accompanied by the Firebase Admin SDK.

Here the [Google Cloud Python Client Library for Firestore](https://cloud.google.com/python/docs/reference/firestore/latest/index.html) is used.

### Create/Retrieve A Database

The starting point for using Firestore is a database.  No preplanning is needed for storage size or compute, so creating the database is the key step. It involves choosing a mode (covered above) and using the name `(default)` to get the free tier quota.

Documentation References:
- [Create and manage databases](https://cloud.google.com/firestore/docs/manage-databases)
- [Choosing between Native mode and Datastore mode](https://cloud.google.com/firestore/docs/firestore-or-datastore)
- [Python Client for Cloud Firestore API](https://cloud.google.com/python/docs/reference/firestore/latest)
    - [FirestoreAdminClient](https://cloud.google.com/python/docs/reference/firestore/latest/google.cloud.firestore_admin_v1.services.firestore_admin.client.FirestoreAdminClient)

In [225]:
try:
    database = fs_admin.get_database(name = f'projects/{PROJECT_ID}/databases/(default)')
    print(f"Found the `default` database: {database.name}")
except Exception:
    print(f"Creating the `default` database...")
    create_db = fs_admin.create_database(
        request = firestore_admin_v1.types.CreateDatabaseRequest(
            parent = f"projects/{PROJECT_ID}",
            database_id = '(default)',
            database = firestore_admin_v1.types.database.Database(
                type_ = firestore_admin_v1.types.database.Database.DatabaseType.FIRESTORE_NATIVE,
                location_id = REGION
            )
        )
    )
    print('Waiting on creation to complete...')
    create_db.result()
    database = fs_admin.get_database(name = f'projects/{PROJECT_ID}/databases/(default)')
    print(f'Created the `default` database: {database.name}')

Found the `default` database: projects/statmike-mlops-349915/databases/(default)


In [226]:
database

name: "projects/statmike-mlops-349915/databases/(default)"
uid: "b29a6d74-7ad0-4232-9850-519e3471912b"
create_time {
  seconds: 1729703138
  nanos: 163054000
}
update_time {
  seconds: 1729703138
  nanos: 163054000
}
location_id: "us-central1"
type_: FIRESTORE_NATIVE
concurrency_mode: PESSIMISTIC
version_retention_period {
  seconds: 3600
}
earliest_version_time {
  seconds: 1731779065
  nanos: 660085000
}
app_engine_integration_mode: DISABLED
point_in_time_recovery_enablement: POINT_IN_TIME_RECOVERY_DISABLED
delete_protection_state: DELETE_PROTECTION_DISABLED
etag: "IMrH9ZTB4YkDMInatbn+pIkD"

---
## Working With Firestore

### Create/Retrieve Collection

The client is set up above without a specific database reference, so it defaults to the `(default)` database.  Collections can be referenced directly even if they don't exist yet.

Documentation References:
- [Use a server client library](https://cloud.google.com/firestore/docs/create-database-server-client-library)
- [Python Client for Cloud Firestore API](https://cloud.google.com/python/docs/reference/firestore/latest)
    - [Firestore Client](https://cloud.google.com/python/docs/reference/firestore/latest/google.cloud.firestore_v1.client)

In [227]:
collection = fs.collection(f'{SERIES}-{EXPERIMENT}')
doc_list = collection.limit(1).get()

In [228]:
if doc_list:
    print(f"Collection '{collection.id}' exists.")
else:
    print(f"Collection '{collection.id}' does not exist.")

Collection 'applied-genai-retrieval-firestore' exists.


Delete content from the environment to start this workflow fresh:

In [229]:
if doc_list:
    clear_collection = fs.recursive_delete(collection)

### Prepare Data For Firestore

A document is like a dictionary or JSON object: `key:value` pairs, with nesting supported (a nested object is referred to as a map).  Read more about [data structure](https://cloud.google.com/firestore/docs/concepts/structure-data) and [supported data types](https://cloud.google.com/firestore/docs/concepts/data-types). Note in particular the vector type, which is used here to store the embedding instead of the array type (also supported).

#### Get A Record/Document

Dictionaries for each record/row are stored in `content_chunks` from earlier in this workflow:

In [230]:
first_record = content_chunks[0].copy()

In [231]:
first_record.keys()

dict_keys(['gse', 'chunk_id', 'content', 'embedding'])

In [232]:
first_record['chunk_id']

'fannie_part_0_c17'

In [233]:
type(first_record['embedding'])

list

#### Prepare The Document

Convert the embedding array (list of floats) into the `Vector` data type for Firestore:

In [234]:
first_record['embedding'] = firestore_v1.vector.Vector(first_record['embedding'])

In [235]:
type(first_record['embedding'])

google.cloud.firestore_v1.vector.Vector

In [236]:
len(first_record['embedding'])

768

In [237]:
first_record['embedding'][0:5]

(0.031277116388082504,
 0.03056905046105385,
 0.010865348391234875,
 0.0623614676296711,
 0.03228681534528732)
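Before converting embeddings to the `Vector` type, it can help to validate them locally. The following is a minimal sketch with assumptions labeled: `check_embedding` is a hypothetical helper, and the 2048 ceiling reflects the maximum embedding dimension listed in the Firestore vector search documentation at the time of writing, so confirm it against the current docs.

```python
import math

# assumption: maximum embedding dimension per the Firestore vector search docs
MAX_DIMENSION = 2048

def check_embedding(values, expected_dimension, max_dimension=MAX_DIMENSION):
    """Return `values` as a list of floats, or raise ValueError if unusable."""
    if expected_dimension > max_dimension:
        raise ValueError(f'dimension {expected_dimension} exceeds {max_dimension}')
    if len(values) != expected_dimension:
        raise ValueError(f'expected {expected_dimension} values, got {len(values)}')
    cleaned = [float(value) for value in values]
    if not all(math.isfinite(value) for value in cleaned):
        raise ValueError('embedding contains NaN or infinite values')
    return cleaned

# a well-formed toy embedding passes and comes back as floats
ok = check_embedding([1, 2.5, 0], expected_dimension=3)

# a toy embedding with the wrong length is rejected
try:
    check_embedding([1.0, 2.0], expected_dimension=3)
    rejected = False
except ValueError:
    rejected = True
```

For the data in this workflow the expected dimension would be 768, matching the length shown above.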

### Add, Retrieve, And Delete Documents In The Database > Collection

Learn about inserting, retrieving, and deleting records/rows with the following simple examples.

From the collection object you can refer to a `.document('name')` object by name, even prior to creating it.  From this document object you can add its data to the database with `.set()`, update its contents with `.update()`, and remove the document with `.delete()`.
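To build intuition for how these three operations differ, here is a small in-memory stand-in. It is not the Firestore client: `ToyCollection` is a hypothetical class that only mimics the general behavior (set replaces the whole document, update changes named fields of an existing document, delete removes it).

```python
class ToyCollection:
    """In-memory stand-in for a collection; NOT the Firestore client."""

    def __init__(self):
        self.docs = {}

    def set(self, name, data):
        # set creates the document or replaces its entire contents
        self.docs[name] = dict(data)

    def update(self, name, changes):
        # update changes only the named fields and needs an existing document
        if name not in self.docs:
            raise KeyError(f'document {name} does not exist')
        self.docs[name].update(changes)

    def delete(self, name):
        # in this sketch, deleting a missing document is a no-op
        self.docs.pop(name, None)

toy = ToyCollection()
toy.set('doc1', {'gse': 'fannie', 'content': 'text'})
toy.update('doc1', {'content': 'new text'})
after_update = dict(toy.docs['doc1'])   # both fields present, content changed
toy.set('doc1', {'gse': 'freddie'})
after_set = dict(toy.docs['doc1'])      # content is gone: set replaced everything
toy.delete('doc1')
```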

#### Insert Document


In [238]:
document = collection.document(first_record['chunk_id'])
document.set(first_record)

update_time {
  seconds: 1731782673
  nanos: 262245000
}

In [275]:
# alternative - combined steps
#collection.document(first_record['chunk_id']).set(first_record)

#### Retrieve Document

In [262]:
local_document = document.get()

In [263]:
local_document = local_document.to_dict()
local_document.keys()

dict_keys(['content', 'embedding', 'chunk_id', 'gse'])

In [264]:
print(local_document['gse'])

fannie


In [265]:
type(local_document['embedding'])

google.cloud.firestore_v1.vector.Vector

#### Delete Document Field

In [266]:
document.update(dict(content = firestore.DELETE_FIELD))

update_time {
  seconds: 1731783297
  nanos: 799459000
}

In [267]:
local_document = document.get()

In [268]:
local_document = local_document.to_dict()
local_document.keys()

dict_keys(['chunk_id', 'embedding', 'gse'])

#### Delete Document

In [269]:
document.delete()

DatetimeWithNanoseconds(2024, 11, 16, 18, 55, 1, 455058, tzinfo=datetime.timezone.utc)

In [270]:
document.get().exists

False

### Load Data

There are a lot of rows to load, but [using `asyncio`](https://docs.python.org/3/library/asyncio.html) with the async connection makes this easy to orchestrate.

Verify the collection is empty:

In [297]:
async_collection = fs_async.collection(f'{SERIES}-{EXPERIMENT}')

In [298]:
doc_list = await async_collection.limit(1).get()
len(doc_list)

0

Prepare the embedding values as `Vector` type for all records:

In [254]:
for chunk in content_chunks:
    if isinstance(chunk['embedding'], list):
        chunk['embedding'] = firestore_v1.vector.Vector(chunk['embedding'])

Create a list of tasks that will add documents to the collection:

In [None]:
tasks = [async_collection.document(chunk['chunk_id']).set(chunk) for chunk in content_chunks]

Run all the tasks with `asyncio.gather` and `await` the results: `results = await asyncio.gather(*tasks)`

Here, the tasks are run in batches with the following for loop:

In [303]:
results = []
for b in range(0, len(tasks), 1000):
    batch = tasks[b:b+1000]
    print(f"Running tasks {len(results)+1} through {len(results)+len(batch)}...")
    batch_results = await asyncio.gather(*batch)
    results.extend(batch_results)

Running tasks 1 through 1000...
Running tasks 1001 through 2000...
Running tasks 2001 through 3000...
Running tasks 3001 through 4000...
Running tasks 4001 through 5000...
Running tasks 5001 through 6000...
Running tasks 6001 through 7000...
Running tasks 7001 through 8000...
Running tasks 8001 through 9000...
Running tasks 9001 through 9040...
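An alternative to fixed batches is to cap the number of writes in flight with a semaphore, so a new write starts as soon as any earlier one finishes. The sketch below shows the pattern with a stand-in coroutine: `fake_write` and `write_all` are hypothetical names, and no Firestore calls are made. In a notebook, where an event loop is already running, replace `asyncio.run(...)` with `await write_all(...)`.

```python
import asyncio

async def fake_write(doc_id, store):
    """Stand-in for an async document write; it only records the ID."""
    await asyncio.sleep(0)  # yield to the event loop like a network call would
    store.append(doc_id)
    return doc_id

async def write_all(doc_ids, store, max_in_flight=100):
    """Run one write per ID with at most `max_in_flight` running at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(doc_id):
        async with semaphore:
            return await fake_write(doc_id, store)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(doc_id) for doc_id in doc_ids))

store = []
ids = [f'doc_{i}' for i in range(250)]
written = asyncio.run(write_all(ids, store, max_in_flight=100))
print(len(written))
```

The batching loop above is simpler to read and prints progress; the semaphore variant avoids waiting for the slowest write in each batch before starting the next.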


Verify the results:

In [305]:
doc_list = collection.limit(1).get()
len(doc_list)

1

---
## Vector Similarity Search, Matching

This section covers using a vector similarity metric to find nearest neighbors for a query vector while also taking advantage of indexing.  To understand similarity metrics and build the intuition for choosing one (choose dot product), check out [The Math of Similarity](../Embeddings/The%20Math%20of%20Similarity.ipynb).


**Notes On [Vector Search](https://firebase.google.com/docs/firestore/vector-search) With Firestore**

The workflow below shows setting up indexes and using them for vector search.  Searching requires an index to be created. Multiple indexes can be created.  The distance measure is part of the search query.

### Check For Vector Indexes

At this point in the workflow no vector indexes have been created.  The following cells show how to check for indexes and will be reused later in the workflow to verify the details of indexes after they are created.

In [306]:
indexes = list(fs_admin.list_indexes(
    parent = f"{database.name}/collectionGroups/{collection.id}"
))
indexes

[]
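The admin calls in this workflow address databases and collection groups by resource name strings. As a small sketch of that naming pattern, based only on the names that appear in this workflow, the helpers below build those strings. `database_name` and `collection_group_name` are hypothetical helpers, and `my-project` is a placeholder project ID.

```python
def database_name(project_id, database_id='(default)'):
    """Resource name of a database, as passed to `get_database` above."""
    return f'projects/{project_id}/databases/{database_id}'

def collection_group_name(project_id, collection_id, database_id='(default)'):
    """Parent resource name used when listing or creating indexes."""
    return f'{database_name(project_id, database_id)}/collectionGroups/{collection_id}'

# placeholder project ID, for illustration only
parent = collection_group_name('my-project', 'applied-genai-retrieval-firestore')
print(parent)
```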

### Create And Use An Index

Indexes are the only way to use vector search in Firestore.  Firestore does not offer approximate search methods (IVF, HNSW, ScaNN, etc.) but does offer an index for brute force searches across all rows with an index type named `flat`.

Indexes are created on the database with the admin client.  Then, they can be used along with the collection and the [.find_nearest()](https://firebase.google.com/docs/firestore/vector-search#make_a_nearest-neighbor_query) method.

To pre-filter, that is, to search for neighbors within a subset of documents, a composite index is needed that includes the field(s) used for subsetting as well as the embedding field. The query automatically uses the index that matches the query parameters and raises an error if a suitable index is not available.  The workflow here creates indexes for just the embedding as well as for the embedding with a pre-filter field of `gse`.
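Conceptually, pre-filtering restricts the candidate documents first and ranks only what remains. The following local sketch illustrates that idea with toy data; it is not how Firestore executes the query, and `find_nearest_prefiltered`, the chunk IDs, and the two-dimensional vectors are all made up for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def find_nearest_prefiltered(chunks, query, field, value, limit=2):
    """Keep chunks where chunk[field] == value, then rank them by dot product."""
    subset = [chunk for chunk in chunks if chunk.get(field) == value]
    ranked = sorted(subset, key=lambda chunk: dot(query, chunk['embedding']), reverse=True)
    return [chunk['chunk_id'] for chunk in ranked[:limit]]

# toy data (assumed values, for illustration only)
toy_chunks = [
    {'chunk_id': 'f1', 'gse': 'fannie', 'embedding': [1.0, 0.0]},
    {'chunk_id': 'f2', 'gse': 'fannie', 'embedding': [0.2, 0.9]},
    {'chunk_id': 'r1', 'gse': 'freddie', 'embedding': [0.9, 0.1]},
]
toy_query = [1.0, 0.0]

fannie_only = find_nearest_prefiltered(toy_chunks, toy_query, 'gse', 'fannie')
freddie_only = find_nearest_prefiltered(toy_chunks, toy_query, 'gse', 'freddie')
print(fannie_only, freddie_only)
```

Note that `r1` scores higher than `f2` overall, yet it never appears in the `fannie` results because it is removed before ranking.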

**Distance Metric Choices**
- `DOT_PRODUCT` for inner product or dot product
- `COSINE` for cosine similarity
- `EUCLIDEAN` for Euclidean distance

Documentation Links For This Section:
- [Search with vector embeddings](https://firebase.google.com/docs/firestore/vector-search#create_and_manage_vector_indexes)
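To make the three measures concrete, here is a standard-library sketch that computes each one and shows that, for vectors scaled to unit length, all three produce the same ranking. The function names and the two-dimensional vectors are illustrative assumptions; this is not Firestore's implementation.

```python
import math

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot_product(v, v))
    return [x / norm for x in v]

# toy vectors (assumed values, for illustration only)
query = normalize([3.0, 4.0])
candidates = {'near': normalize([4.0, 3.0]), 'far': normalize([-1.0, 0.0])}

# larger is better for dot product and cosine; smaller is better for Euclidean
by_dot = sorted(candidates, key=lambda k: dot_product(query, candidates[k]), reverse=True)
by_cosine = sorted(candidates, key=lambda k: cosine_similarity(query, candidates[k]), reverse=True)
by_euclidean = sorted(candidates, key=lambda k: euclidean_distance(query, candidates[k]))
print(by_dot, by_cosine, by_euclidean)
```

For unit-length vectors the dot product equals the cosine similarity, and the squared Euclidean distance equals 2 minus twice the dot product, which is why the orderings agree. With unnormalized vectors the measures can rank candidates differently.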

### Index: Flat (Brute Force) - Just Embedding

Check for existing index that has just the embedding field:

In [307]:
indexes = list(fs_admin.list_indexes(
    parent = f"{database.name}/collectionGroups/{collection.id}"
))

vector_index = next(
    (index for index in indexes if {'embedding'} == {field.field_path for field in index.fields if field.field_path != '__name__'}),
    None
)

vector_index

Create the index if it does not exist yet:

In [308]:
if vector_index:
    print(f'Found an index for just the embedding:\n')
else:
    print(f'Creating an index for just the embedding ...\n')
    create_index = fs_admin.create_index(
        request = firestore_admin_v1.types.CreateIndexRequest(
            parent = f"{database.name}/collectionGroups/{collection.id}",
            index = firestore_admin_v1.types.Index(
                query_scope = firestore_admin_v1.types.Index.QueryScope.COLLECTION,
                fields = [
                    firestore_admin_v1.types.Index.IndexField(
                        field_path = 'embedding',
                        vector_config = firestore_admin_v1.types.Index.IndexField.VectorConfig(
                            dimension = len(question_embedding),
                            flat = firestore_admin_v1.types.Index.IndexField.VectorConfig.FlatIndex()
                        )
                    ),
                ]
            )
        )
    )
    index = create_index.result()
    vector_index = fs_admin.get_index(name = index.name)

Creating an index for just the embedding ...



Review the index details:

In [309]:
vector_index

name: "projects/statmike-mlops-349915/databases/(default)/collectionGroups/applied-genai-retrieval-firestore/indexes/CICAgNi47oMK"
query_scope: COLLECTION
fields {
  field_path: "__name__"
  order: ASCENDING
}
fields {
  field_path: "embedding"
  vector_config {
    dimension: 768
    flat {
    }
  }
}
state: READY

Query the index for matches:

In [310]:
matches = collection.find_nearest(
    vector_field = 'embedding',
    query_vector = firestore_v1.vector.Vector(question_embedding),
    limit = 5,
    distance_measure = firestore_v1.base_vector_query.DistanceMeasure.DOT_PRODUCT,
    distance_result_field = "dot_product"
).get()

In [311]:
[(match.id, match.get('dot_product')) for match in matches]

[('fannie_part_0_c352', 0.7099842015202704),
 ('freddie_part_4_c509', 0.6805260859043879),
 ('freddie_part_4_c510', 0.6753296984114657),
 ('fannie_part_0_c353', 0.6723706814818051),
 ('fannie_part_0_c326', 0.6683496311110355)]

In [312]:
matches[0].get('gse')

'fannie'

In [313]:
matches[0].to_dict()['content']

'# A3-3-03, Other Servicing Arrangements (12/15/2015)\n\nIntroduction This topic provides an overview of other servicing arrangements, including: • Subservicing • General Requirements for Subservicing Arrangements • Pledge of Servicing Rights and Transfer of Interest in Servicing Income\n\n## Subservicing\n\nA lender may use other organizations to perform some or all of its servicing functions. Fannie Mae refers to these arrangements as “subservicing” arrangements, meaning that a servicer (the “subservicer”) other than the contractually responsible servicer (the “master” servicer) is performing the servicing functions. The following are not considered to be subservicing arrangements: • when a computer service bureau is used to perform accounting and reporting functions; • when the originating lender sells and assigns servicing to another lender, unless the originating lender continues to be the contractually responsible servicer.'

### Index: Flat (Brute Force) - Composite with Field = `gse`

Check for existing index that has the embedding field and the gse field:

In [314]:
indexes = list(fs_admin.list_indexes(
    parent = f"{database.name}/collectionGroups/{collection.id}"
))

composite_index = next(
    (index for index in indexes if {'embedding', 'gse'} == {field.field_path for field in index.fields if field.field_path != '__name__'}),
    None
)

composite_index
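
The matching rule in the cell above can be tried without a Firestore connection. This is a minimal sketch: `find_index` and `fake_index` are hypothetical helpers, and the `SimpleNamespace` objects only imitate the `fields` and `field_path` attributes that the real index objects expose. It shows why `__name__` is excluded from the comparison, since Firestore adds that field to the index on its own (it appears in the index details later in this section).

```python
from types import SimpleNamespace

def find_index(indexes, wanted_fields):
    """Return the first index whose fields, ignoring __name__, equal wanted_fields."""
    for index in indexes:
        paths = {f.field_path for f in index.fields if f.field_path != '__name__'}
        if paths == set(wanted_fields):
            return index
    return None

def fake_index(name, *paths):
    # stand-in for an index object: only the attributes used above are present
    return SimpleNamespace(name=name, fields=[SimpleNamespace(field_path=p) for p in paths])

# hypothetical index listings, not the notebook's actual indexes
fake_indexes = [
    fake_index('vector-only', '__name__', 'embedding'),
    fake_index('composite', 'gse', '__name__', 'embedding'),
]

print(find_index(fake_indexes, {'embedding', 'gse'}).name)
```

Because the comparison uses set equality rather than a subset test, an index on `embedding` alone is not mistaken for the composite index.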

Create the index if it does not exist yet:

In [315]:
if composite_index:
    print(f'Found a composite index for the embedding and gse:\n')
else:
    print(f'Creating a composite index for just the embedding and gse ...\n')
    create_index = fs_admin.create_index(
        request = firestore_admin_v1.types.CreateIndexRequest(
            parent = f"{database.name}/collectionGroups/{collection.id}",
            index = firestore_admin_v1.types.Index(
                query_scope = firestore_admin_v1.types.Index.QueryScope.COLLECTION,
                fields = [
                    firestore_admin_v1.types.Index.IndexField(
                        field_path = 'gse',
                        order = firestore_admin_v1.types.Index.IndexField.Order.ASCENDING
                    ),
                    firestore_admin_v1.types.Index.IndexField(
                        field_path = 'embedding',
                        vector_config = firestore_admin_v1.types.Index.IndexField.VectorConfig(
                            dimension = len(question_embedding),
                            flat = firestore_admin_v1.types.Index.IndexField.VectorConfig.FlatIndex()
                        )
                    ),
                ]
            )
        )
    )
    index = create_index.result()
    composite_index = fs_admin.get_index(name = index.name)

Creating a composite index for just the embedding and gse ...



Review the index details:

In [316]:
composite_index

name: "projects/statmike-mlops-349915/databases/(default)/collectionGroups/applied-genai-retrieval-firestore/indexes/CICAgOi36pgK"
query_scope: COLLECTION
fields {
  field_path: "gse"
  order: ASCENDING
}
fields {
  field_path: "__name__"
  order: ASCENDING
}
fields {
  field_path: "embedding"
  vector_config {
    dimension: 768
    flat {
    }
  }
}
state: READY

Query the index for matches:

In [317]:
matches = collection.find_nearest(
    vector_field = 'embedding',
    query_vector = firestore_v1.vector.Vector(question_embedding),
    limit = 5,
    distance_measure = firestore_v1.base_vector_query.DistanceMeasure.DOT_PRODUCT,
    distance_result_field = "dot_product"
).get()

In [318]:
[(match.id, match.get('dot_product')) for match in matches]

[('fannie_part_0_c352', 0.7099842015202704),
 ('freddie_part_4_c509', 0.6805260859043879),
 ('freddie_part_4_c510', 0.6753296984114657),
 ('fannie_part_0_c353', 0.6723706814818051),
 ('fannie_part_0_c326', 0.6683496311110355)]

Query the index for matches with pre-filtering:

In [319]:
matches = collection.where(filter = firestore_v1.base_query.FieldFilter('gse', '==', 'fannie')).find_nearest(
    vector_field = 'embedding',
    query_vector = firestore_v1.vector.Vector(question_embedding),
    limit = 5,
    distance_measure = firestore_v1.base_vector_query.DistanceMeasure.DOT_PRODUCT,
    distance_result_field = "dot_product"
).get()

In [320]:
[(match.id, match.get('dot_product')) for match in matches]

[('fannie_part_0_c352', 0.7099842015202704),
 ('fannie_part_0_c353', 0.6723706814818051),
 ('fannie_part_0_c326', 0.6683496311110355),
 ('fannie_part_0_c92', 0.6614337345375677),
 ('fannie_part_0_c240', 0.6608578617010461)]

In [321]:
matches[0].get('gse')

'fannie'

In [322]:
matches[0].to_dict()['content']

'# A3-3-03, Other Servicing Arrangements (12/15/2015)\n\nIntroduction This topic provides an overview of other servicing arrangements, including: • Subservicing • General Requirements for Subservicing Arrangements • Pledge of Servicing Rights and Transfer of Interest in Servicing Income\n\n## Subservicing\n\nA lender may use other organizations to perform some or all of its servicing functions. Fannie Mae refers to these arrangements as “subservicing” arrangements, meaning that a servicer (the “subservicer”) other than the contractually responsible servicer (the “master” servicer) is performing the servicing functions. The following are not considered to be subservicing arrangements: • when a computer service bureau is used to perform accounting and reporting functions; • when the originating lender sells and assigns servicing to another lender, unless the originating lender continues to be the contractually responsible servicer.'
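
The effect of pre-filtering on a flat index can be illustrated locally. The following is a minimal pure-Python sketch, not Firestore's implementation: it assumes a flat (brute force) search scores every candidate, and it applies the equality filter before ranking by dot product. The function `flat_search` and the toy documents are hypothetical and exist only for illustration.

```python
def flat_search(docs, query, limit=5, gse=None):
    """Score every candidate by dot product and return the top (id, score) pairs."""
    # pre-filter: drop documents that fail the equality test before any scoring
    candidates = [d for d in docs if gse is None or d['gse'] == gse]
    scored = [
        (d['id'], sum(a * b for a, b in zip(d['embedding'], query)))
        for d in candidates
    ]
    # a larger dot product means a closer match, so sort in descending order
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]

# hypothetical toy documents, not the notebook's data
toy_docs = [
    {'id': 'fannie_a', 'gse': 'fannie', 'embedding': [1.0, 0.0]},
    {'id': 'freddie_a', 'gse': 'freddie', 'embedding': [0.9, 0.1]},
    {'id': 'fannie_b', 'gse': 'fannie', 'embedding': [0.5, 0.5]},
    {'id': 'freddie_b', 'gse': 'freddie', 'embedding': [0.0, 1.0]},
]
toy_query = [1.0, 0.0]

unfiltered = flat_search(toy_docs, toy_query, limit=3)
filtered = flat_search(toy_docs, toy_query, limit=3, gse='fannie')
print([doc_id for doc_id, _ in unfiltered])
print([doc_id for doc_id, _ in filtered])
```

With the filter applied, lower-ranked `fannie` documents take the slots that `freddie` documents held, which mirrors the change between the unfiltered and filtered results above.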

---
## Retrieval Augmented Generation (RAG)

Build a simple retrieval augmented generation (RAG) process that enhances a query with retrieved context. The process is split into three functions, one per stage:
- `retrieve` - a function that uses the query embedding to search for matching chunks of text
    - this uses the Firestore vector search built earlier in this workflow!
- `augment` - a function that assembles the retrieved chunks into a prompt
- `generate` - a function that sends the augmented prompt to the LLM

A final function executes the full RAG workflow:
- `rag` - a function that receives the query and orchestrates the workflow through `retrieve` > `augment` > `generate`

### Clients

In [323]:
embedder = vertexai.language_models.TextEmbeddingModel.from_pretrained('text-embedding-004')
llm = vertexai.generative_models.GenerativeModel("gemini-1.5-flash-002")

### Retrieve Function

In [324]:
def retrieve_firestore(query_embedding, n_matches = 5):

    matches = collection.find_nearest(
        vector_field = 'embedding',
        query_vector = firestore_v1.vector.Vector(query_embedding),
        limit = n_matches,
        distance_measure = firestore_v1.base_vector_query.DistanceMeasure.DOT_PRODUCT,
        distance_result_field = "dot_product"
    ).get()
    matches = [match.to_dict() for match in matches]
    
    return matches

### Augment Function

In [325]:
def augment(matches):

    prompt = ''
    for m, match in enumerate(matches):
        prompt += f"Context {m+1}:\n{match['content']}\n\n"
    prompt += f'Answer the following question using the provided contexts:\n'

    return prompt
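
To see the shape of the prompt that reaches the model, the function can be run on stand-in data. This sketch repeats the logic of `augment` so that it runs on its own; the two sample matches and the sample question are hypothetical and only need a `content` key, like the dictionaries returned by the retrieve function.

```python
def augment(matches):
    # same logic as the augment function above: number each context, then add the instruction
    prompt = ''
    for m, match in enumerate(matches):
        prompt += f"Context {m+1}:\n{match['content']}\n\n"
    prompt += 'Answer the following question using the provided contexts:\n'
    return prompt

# hypothetical matches standing in for retrieved Firestore documents
sample_matches = [{'content': 'First chunk.'}, {'content': 'Second chunk.'}]
sample_prompt = augment(sample_matches) + 'What do the chunks say?'
print(sample_prompt)
```

The question is appended after the instruction line, so the contexts come first and the query is the last thing in the prompt.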

### Generate Function

In [326]:
def generate(prompt):

    result = llm.generate_content(prompt)

    return result

### RAG Function

In [327]:
def rag(query):
    
    query_embedding = embedder.get_embeddings([query])[0].values
    matches = retrieve_firestore(query_embedding)
    prompt = augment(matches) + query
    result = generate(prompt)
    
    return result.text

### Example In Use

In [328]:
question

'Does a lender have to perform servicing functions directly?'

In [329]:
print(rag(question))

No, a lender does not have to perform servicing functions directly.  Context 1 explicitly states that a lender may use other organizations to perform some or all of its servicing functions, referring to this as "subservicing."  The contexts also detail the requirements and regulations surrounding these subservicing arrangements, including the roles of master servicers and subservicers.



---
### Profiling Performance

Profile the timing of each step in the RAG function over sequential calls. The environment chosen for this workflow is a minimal testing environment, so load testing (simultaneous requests) would not be informative.

In [330]:
profile = []

In [331]:
def rag(query, profile = profile):
    
    timings = {}
    start_time = time.time()
    
    
    # 1. Get embeddings
    embedding_start = time.time()
    query_embedding = embedder.get_embeddings([query])[0].values
    timings['embedding'] = time.time() - embedding_start

    # 2. Retrieve from Firestore
    retrieval_start = time.time()
    matches = retrieve_firestore(query_embedding)
    timings['retrieval_firestore'] = time.time() - retrieval_start

    # 3. Augment the prompt
    augment_start = time.time()
    prompt = augment(matches) + query
    timings['augment'] = time.time() - augment_start

    # 4. Generate text
    generate_start = time.time()
    result = generate(prompt)
    timings['generate'] = time.time() - generate_start

    total_time = time.time() - start_time
    timings['total'] = total_time
    
    profile.append(timings)
    
    return result.text

In [332]:
print(rag(question))

No, a lender does not have to perform servicing functions directly.  Context 1 explicitly states that a lender may use other organizations to perform some or all of its servicing functions, referring to this as "subservicing."  This involves a "master servicer" and a "subservicer," where the subservicer performs the functions on behalf of the master servicer.  However, there are stipulations and requirements around these arrangements, as detailed in the provided texts.



In [333]:
profile

[{'embedding': 0.17287921905517578,
  'retrieval_firestore': 1.753068447113037,
  'augment': 0.00027871131896972656,
  'generate': 0.8971724510192871,
  'total': 2.82340669631958}]
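
The repeated start/stop bookkeeping in the profiled function could also be factored into a context manager. This is one possible sketch, not the approach used in this notebook: `timed` is a hypothetical helper, it uses `time.perf_counter` in place of `time.time`, and the `time.sleep` call is a stand-in for a real step such as the embedding request.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(timings, key):
    """Store the elapsed wall-clock seconds of the enclosed block in timings[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # runs even if the block raises, so a failed step is still timed
        timings[key] = time.perf_counter() - start

# usage with a stand-in workload (hypothetical, in place of a real RAG step)
timings = {}
with timed(timings, 'total'):
    with timed(timings, 'sleep'):
        time.sleep(0.01)

print(sorted(timings))
```

Nesting the blocks gives the same per-step and total breakdown as the dictionary shown above.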

In [334]:
for i in range(100):
    response = rag(question)

### Report From Profile

In [335]:
all_timings = {}
for timings in profile:
    for key, value in timings.items():
        if key not in all_timings:
            all_timings[key] = []
        all_timings[key].append(value)

In [336]:
for key, values in all_timings.items():
    arr = np.array(values)
    print(f"Statistics for '{key}':")
    print(f"  Min: {np.min(arr):.4f} seconds")
    print(f"  Max: {np.max(arr):.4f} seconds")
    print(f"  Mean: {np.mean(arr):.4f} seconds")
    print(f"  Median: {np.median(arr):.4f} seconds")
    print(f"  Std Dev: {np.std(arr):.4f} seconds")
    print(f"  P95: {np.percentile(arr, 95):.4f} seconds")
    print(f"  P99: {np.percentile(arr, 99):.4f} seconds")
    print("")

Statistics for 'embedding':
  Min: 0.0466 seconds
  Max: 10.0823 seconds
  Mean: 0.1069 seconds
  Median: 0.0523 seconds
  Std Dev: 0.7057 seconds
  P95: 0.0744 seconds
  P99: 0.2228 seconds

Statistics for 'retrieval_firestore':
  Min: 0.1388 seconds
  Max: 1.7531 seconds
  Mean: 0.2297 seconds
  Median: 0.2286 seconds
  Std Dev: 0.1299 seconds
  P95: 0.2787 seconds
  P99: 0.5592 seconds

Statistics for 'augment':
  Min: 0.0000 seconds
  Max: 0.0003 seconds
  Mean: 0.0000 seconds
  Median: 0.0000 seconds
  Std Dev: 0.0000 seconds
  P95: 0.0001 seconds
  P99: 0.0001 seconds

Statistics for 'generate':
  Min: 0.5483 seconds
  Max: 1.1185 seconds
  Mean: 0.7176 seconds
  Median: 0.6960 seconds
  Std Dev: 0.0992 seconds
  P95: 0.8972 seconds
  P99: 1.0408 seconds

Statistics for 'total':
  Min: 0.7528 seconds
  Max: 11.1640 seconds
  Mean: 1.0542 seconds
  Median: 0.9677 seconds
  Std Dev: 0.7379 seconds
  P95: 1.2523 seconds
  P99: 1.7210 seconds
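
In the `embedding` statistics above, the mean (0.1069) is about twice the median (0.0523) because the maximum reached 10.08 seconds. The sketch below reproduces that pattern on made-up numbers using only the standard library. The `summarize` helper is hypothetical, and its nearest-rank percentile is a simplification that can differ slightly from `np.percentile`, which interpolates by default.

```python
import math
import statistics

def summarize(values):
    """Return min, max, mean, median and a nearest-rank P95 for a list of timings."""
    ordered = sorted(values)

    def nearest_rank(p):
        # smallest observed value with at least p percent of samples at or below it
        k = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[k - 1]

    return {
        'min': ordered[0],
        'max': ordered[-1],
        'mean': statistics.fmean(ordered),
        'median': statistics.median(ordered),
        'p95': nearest_rank(95),
    }

# made-up timings: 99 fast calls and one very slow call
toy_timings = [0.05] * 99 + [10.0]
summary = summarize(toy_timings)
print(summary)
```

A single slow call roughly triples the mean here while the median and P95 stay at the typical value, which is why the median and percentiles are the more useful figures for sequential latency.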



## Remove Resources

In [293]:
# delete all documents in the collection
#fs.recursive_delete(collection)

In [284]:
#for index in list(fs_admin.list_indexes(parent = f"{database.name}/collectionGroups/{collection.id}")):
#    fs_admin.delete_index(name = index.name)

In [286]:
# delete the database - be careful in case the database was being used for work prior to this workflow
#fs_admin.delete_database(name = database.name)