<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Retrieval/Retrieval%20-%20Memorystore.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FApplied%2520GenAI%2FRetrieval%2FRetrieval%2520-%2520Memorystore.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Retrieval/Retrieval%20-%20Memorystore.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/Applied%20GenAI/Retrieval/Retrieval%20-%20Memorystore.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Retrieval - Memorystore (Redis)

In prior workflows, a series of documents was [processed into chunks](../Chunking/readme.md), and for each chunk, [embeddings](../Embeddings/readme.md) were created:

- Process: [Large Document Processing - Document AI Layout Parser](../Chunking/Large%20Document%20Processing%20-%20Document%20AI%20Layout%20Parser.ipynb)
- Embed: [Vertex AI Text Embeddings API](../Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb)

Retrieving chunks for a query involves calculating the embedding for the query and then using similarity metrics to find relevant chunks. A thorough review of similarity matching can be found in [The Math of Similarity](../Embeddings/The%20Math%20of%20Similarity.ipynb) - use dot product! As development moves from experiment to application, the process of storing and computing similarity is migrated to a [retrieval](./readme.md) system. This workflow is part of a [series of workflows exploring many retrieval systems](./readme.md).  

A detailed [comparison of many retrieval systems](./readme.md#comparison-of-vector-database-solutions) can be found in the readme as well.
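As a minimal sketch of the retrieval step described above, the following ranks a few toy embeddings against a query by dot product using NumPy. The vectors and the helper name `top_k_by_dot_product` are made up for illustration; the embeddings in this workflow are much longer, but the calculation is the same.

```python
import numpy as np

def top_k_by_dot_product(query, embeddings, k = 3):
    """Return (indices, scores) for the k rows of `embeddings` with the largest dot product with `query`."""
    scores = embeddings @ query           # one dot product per row
    order = np.argsort(-scores)[:k]       # indices sorted by descending score
    return order.tolist(), scores[order].tolist()

# toy data: 4 "chunks" with 3-dimensional embeddings (illustrative values only)
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
], dtype = 'float32')
query = np.array([0.8, 0.6, 0.0], dtype = 'float32')

indices, scores = top_k_by_dot_product(query, embeddings, k = 2)
print(indices)  # [2, 0]
```

A retrieval system takes over exactly this work, storing the embeddings and computing the ranking, once the data no longer fits comfortably in a local array.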

---

**Memorystore For Storage, Indexing, And Search**

[Google Cloud Memorystore](https://cloud.google.com/memorystore) is a fully managed in-memory data store service that can be used to enhance the performance of applications by caching frequently accessed data. Memorystore offers several in-memory database options, including [Redis](https://cloud.google.com/memorystore/docs/redis), [Redis Cluster](https://cloud.google.com/memorystore/docs/cluster), [Memcached](https://cloud.google.com/memorystore/docs/memcached), and [Valkey](https://cloud.google.com/memorystore/docs/valkey/). Each option provides unique features and capabilities to suit different needs.

This example uses [Redis](https://redis.io/docs/latest/), which is an open-source, in-memory data store that can be used as a database, cache, message broker, and streaming engine. In Redis, data is stored as key-value pairs. The key serves as an identifier to request or retrieve specific data. Redis offers a [variety of data types](https://redis.io/docs/latest/develop/data-types/) for values, including strings, lists, sets, hashes, and sorted sets. These data types provide flexibility in how you structure and organize your data. In addition to basic retrieval using keys, Redis offers querying capabilities across data elements, [including vector search](https://cloud.google.com/memorystore/docs/redis/about-vector-search) for efficient similarity matching.

---

**Use Case Data**

Buying a home usually involves borrowing money from a lending institution, typically through a mortgage secured by the home's value. But how do these institutions manage the risks associated with such large loans, and how are lending standards established?

In the United States, two government-sponsored enterprises (GSEs) play a vital role in the housing market:

- Federal National Mortgage Association ([Fannie Mae](https://www.fanniemae.com/))
- Federal Home Loan Mortgage Corporation ([Freddie Mac](https://www.freddiemac.com/))

These GSEs purchase mortgages from lenders, enabling those lenders to offer more loans. This process also allows Fannie Mae and Freddie Mac to set standards for mortgages, ensuring they are responsible and borrowers are more likely to repay them. This system makes homeownership more affordable and stabilizes the housing market by maintaining a steady flow of liquidity for lenders and keeping interest rates controlled.

However, navigating the complexities of these GSEs and their extensive servicing guides can be challenging.

**Approaches**

[This series](../readme.md) covers many generative AI workflows. These documents are used directly as long context for Gemini in the workflow [Long Context Retrieval With The Vertex AI Gemini API](../Generate/Long%20Context%20Retrieval%20With%20The%20Vertex%20AI%20Gemini%20API.ipynb). The workflow below uses a [retrieval](./readme.md) approach with the already generated chunks and embeddings.

---
## Colab Setup

When running this notebook in [Colab](https://colab.google/) or [Colab Enterprise](https://cloud.google.com/colab/docs/introduction), this section will authenticate to GCP (follow prompts in the popup) and set the current project for the session.

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

---
## Installs and API Enablement

The client packages may need to be installed in this environment.

### Installs (If Needed)

In [3]:
# tuples of (import name, install name, min_version)
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform', '1.69.0'),
    ('google.cloud.redis', 'google-cloud-redis'),
    ('redis', 'redis', '5.0.8')
]

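import importlib.util, importlib.metadata
from packaging.version import Version  # numeric comparison; as strings '1.9.0' > '1.69.0'

install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # look up the installed version by distribution name and compare to the minimum
        if Version(importlib.metadata.version(package[1])) < Version(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user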

### API Enablement

In [4]:
!gcloud services enable aiplatform.googleapis.com
!gcloud services enable redis.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart, resume running code from the cell that follows this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

Inputs

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [7]:
REGION = 'us-central1'
SERIES = 'applied-genai'
EXPERIMENT = 'retrieval-memorystore-redis'

# Redis names
REDIS_INSTANCE_NAME = EXPERIMENT

Packages

In [8]:
import os, json, time, glob, datetime, tempfile, shutil

import numpy as np
import redis

# Vertex AI
from google.cloud import aiplatform
import vertexai.language_models # for embeddings API
import vertexai.generative_models # for Gemini Models
from vertexai.resources.preview import feature_store

# memorystore
from google.cloud import redis_v1

In [9]:
aiplatform.__version__

'1.71.0'

Clients

In [10]:
# vertex ai clients
vertexai.init(project = PROJECT_ID, location = REGION)

# memorystore clients
redis_client = redis_v1.CloudRedisClient()

---
## Text & Embeddings For Examples

This repository contains a [section for document processing (chunking)](../Chunking/readme.md) that includes an example of processing multiple large PDFs (over 1000 pages) into chunks: [Large Document Processing - Document AI Layout Parser](../Chunking/Large%20Document%20Processing%20-%20Document%20AI%20Layout%20Parser.ipynb).  The chunks of text from that workflow are stored with this repository and loaded by another companion workflow that augments the chunks with text embeddings: [Vertex AI Text Embeddings API](../Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb).

The following code will load the version of the chunks that includes text embeddings and prepare it for a local example of retrieval augmented generation.

### Get The Documents

If you are working from a clone of this notebook's [repository](https://github.com/statmike/vertex-ai-mlops) then the documents are already present. The following cell checks for the documents folder and, if it is missing, retrieves it (`git clone`):

In [11]:
local_dir = '../Embeddings/files/embeddings-api'

In [12]:
if not os.path.exists(local_dir):
    print('Retrieving documents...')
    parent_dir = os.path.dirname(local_dir)
    temp_dir = os.path.join(parent_dir, 'temp')
    if not os.path.exists(temp_dir):
        os.makedirs(temp_dir)
    !git clone https://www.github.com/statmike/vertex-ai-mlops {temp_dir}/vertex-ai-mlops
    shutil.copytree(f'{temp_dir}/vertex-ai-mlops/Applied GenAI/Embeddings/files/embeddings-api', local_dir)
    shutil.rmtree(temp_dir)
    print(f'Documents are now in folder `{local_dir}`')
else:
    print(f'Documents Found in folder `{local_dir}`')             

Documents Found in folder `../Embeddings/files/embeddings-api`


### Load The Chunks

In [13]:
jsonl_files = glob.glob(f"{local_dir}/large-files*.jsonl")
jsonl_files.sort()
jsonl_files

['../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0000.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0001.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0002.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0003.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0004.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0005.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0006.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0007.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0008.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0009.jsonl']

In [14]:
chunks = []
for file in jsonl_files:
    with open(file, 'r') as f:
        chunks.extend([json.loads(line) for line in f])
len(chunks)

9040

### Review A Chunk

In [15]:
chunks[0].keys()

dict_keys(['instance', 'predictions', 'status'])

In [16]:
chunks[0]['instance']['chunk_id']

'fannie_part_0_c17'

In [17]:
print(chunks[0]['instance']['content'])

# Selling Guide Fannie Mae Single Family

## Fannie Mae Copyright Notice

### Fannie Mae Copyright Notice

|-|
| Section B3-4.2, Verification of Depository Assets 402 |
| B3-4.2-01, Verification of Deposits and Assets (05/04/2022) 403 |
| B3-4.2-02, Depository Accounts (12/14/2022) 405 |
| B3-4.2-03, Individual Development Accounts (02/06/2019) 408 |
| B3-4.2-04, Pooled Savings (Community Savings Funds) (04/01/2009) 411 |
| B3-4.2-05, Foreign Assets (05/04/2022) 411 |
| Section B3-4.3, Verification of Non-Depository Assets 412 |
| B3-4.3-01, Stocks, Stock Options, Bonds, and Mutual Funds (06/30/2015) 412 |
| B3-4.3-02, Trust Accounts (04/01/2009) 413 |
| B3-4.3-03, Retirement Accounts (06/30/2015) 414 |
| B3-4.3-04, Personal Gifts (09/06/2023) 415 |
| B3-4.3-05, Gifts of Equity (10/07/2020) 418 |
| B3-4.3-06, Grants and Lender Contributions (12/14/2022) 419 |
| B3-4.3-07, Disaster Relief Grants or Loans (04/01/2009) 423 |
| B3-4.3-08, Employer Assistance (09/29/2015) 423 |
| B3-4.3-09,

In [18]:
chunks[0]['predictions'][0]['embeddings']['values'][0:10]

[0.031277116388082504,
 0.03056905046105385,
 0.010865348391234875,
 0.0623614676296711,
 0.03228681534528732,
 0.05066155269742012,
 0.046544693410396576,
 0.05509665608406067,
 -0.014074751175940037,
 0.008380400016903877]

### Prepare Chunk Structure

Make a list of dictionaries with information for each chunk:

In [19]:
content_chunks = [
    dict(
        gse = chunk['instance']['gse'],
        chunk_id = chunk['instance']['chunk_id'],
        content = chunk['instance']['content'],
        embedding = chunk['predictions'][0]['embeddings']['values']
    ) for chunk in chunks
]

### Query Embedding

Create a query, or prompt, and get the embedding for it:

Connect to models for text embeddings. Learn more about the model API:
- [Vertex AI Text Embeddings API](../Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb)

In [20]:
question = "Does a lender have to perform servicing functions directly?"

In [21]:
embedder = vertexai.language_models.TextEmbeddingModel.from_pretrained('text-embedding-004')

In [22]:
question_embedding = embedder.get_embeddings([question])[0].values
question_embedding[0:10]

[-0.0005117303808219731,
 0.009651427157223225,
 0.01768726110458374,
 0.014538003131747246,
 -0.01829824410378933,
 0.027877431362867355,
 -0.021124685183167458,
 0.008830446749925613,
 -0.02669006586074829,
 0.06414774805307388]

---
## Setup Memorystore (Redis)

**Memorystore - Choosing an engine**

Memorystore offers multiple caching engines as a service: [Valkey](https://cloud.google.com/memorystore/docs/valkey/), [Redis](https://cloud.google.com/memorystore/docs/redis), [Redis Cluster](https://cloud.google.com/memorystore/docs/cluster), and [Memcached](https://cloud.google.com/memorystore/docs/memcached). Redis is offered as a single instance or a clustered instance where the instance is a series of shards that contains subsets of the cached data.

This workflow will use a single instance of [Redis](https://cloud.google.com/memorystore/docs/redis) because it [supports vector search](https://cloud.google.com/memorystore/docs/redis/about-vector-search).

**Memorystore data structure**

Memorystore, Redis in this case, is a key-value database.  Think of the key as the value used to request or retrieve data, like a primary key in a database or an entity ID.  The data is a flexible, multi-field object with multiple [possible data types](https://redis.io/docs/latest/develop/data-types/).  Beyond retrieving data by key, there are also querying capabilities across the data elements, including vector search across elements stored as embeddings.

**Memorystore and Redis Data Types**

Redis has a concept called [modules](https://redis.io/docs/latest/develop/reference/modules/) which can extend the functionality of Redis. For instance, the [JSON data type module extension](https://redis.io/docs/latest/develop/data-types/json/) allows for storing values as JSON data.  Many examples found on the web for vector search in Redis use it with the JSON module, like [this one](https://redis.io/docs/latest/develop/get-started/vector-database/).  In the [supported functionality](https://cloud.google.com/memorystore/docs/redis/supported-versions) for Memorystore for Redis, **modules are not supported**.  For this reason, the native [HASH data type](https://redis.io/docs/latest/develop/data-types/#hashes) is the best choice for the vector search feature.  While [a HASH](https://redis.io/docs/latest/develop/data-types/hashes/) is comparable to a Python dictionary, it requires converting vector embeddings to a [serialized form as bytes](https://cloud.google.com/memorystore/docs/redis/indexing-vectors) rather than an array of floats.
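The conversion between a Python dictionary holding a list of floats and a HASH-ready mapping can be sketched as follows. The helper names `to_hash_mapping` and `embedding_from_bytes` and the sample record are hypothetical, not part of this workflow. The key point is that the same dtype (`float32` here) must be used in both directions.

```python
import numpy as np

def to_hash_mapping(record):
    """Copy a chunk record, replacing the embedding list with float32 bytes."""
    mapping = dict(record)  # shallow copy so the original record is untouched
    mapping['embedding'] = np.asarray(record['embedding'], dtype = 'float32').tobytes()
    return mapping

def embedding_from_bytes(raw):
    """Decode float32 bytes back to a list of floats (dtype must match the encoding)."""
    return np.frombuffer(raw, dtype = 'float32').tolist()

# hypothetical record with a tiny 3-dimensional embedding
record = dict(gse = 'fannie', chunk_id = 'example_chunk', content = 'example text', embedding = [0.5, -0.25, 2.0])

mapping = to_hash_mapping(record)
print(len(mapping['embedding']))                   # 12 (3 values x 4 bytes each)
print(embedding_from_bytes(mapping['embedding']))  # [0.5, -0.25, 2.0]
```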

**Notes on Redis And The HASH Data Type**

[Redis Hashes](https://redis.io/docs/latest/develop/data-types/hashes/) are a native data type.  There is no practical limit to the number of fields in a hash other than available memory.  Redis has multiple databases and each connection defaults to the database indexed 0.  Making use of more than one database is an advanced topic - [read more here](https://redis.io/docs/latest/commands/select/).  Within a database, data is stored by key.  A common practice is to use prefixes within key names, which is commonly referred to as a namespace. Redis offers TTL or expiration features for [keys](https://redis.io/docs/latest/develop/use/keyspace/#key-expiration) and individual [field expiration](https://redis.io/docs/latest/develop/data-types/hashes/#field-expiration).
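To illustrate the namespace practice, here is a small sketch of building and splitting prefixed keys. The helpers `make_key` and `split_key`, the `:` separator, and the `chunks` namespace are assumptions chosen for illustration; this workflow itself uses the bare `chunk_id` as the key.

```python
def make_key(namespace, record_id, sep = ':'):
    """Build a namespaced key such as 'chunks:fannie_part_0_c17'."""
    return f"{namespace}{sep}{record_id}"

def split_key(key, sep = ':'):
    """Split a namespaced key into (namespace, record_id) at the first separator."""
    namespace, _, record_id = key.partition(sep)
    return namespace, record_id

key = make_key('chunks', 'fannie_part_0_c17')
print(key)             # chunks:fannie_part_0_c17
print(split_key(key))  # ('chunks', 'fannie_part_0_c17')
```

A consistent prefix makes it possible to address groups of keys together, for example when an index is restricted to keys that share a prefix.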

**Vector Search With Redis**

Google Cloud Memorystore for Redis supports these [versions of Redis](https://cloud.google.com/memorystore/docs/redis/supported-versions).  This [includes version 7.2](https://cloud.google.com/memorystore/docs/redis/supported-versions#redis_version_72), which is the first to offer built-in vector search functionality.

**Understanding Memorystore Costs**

[Pricing for Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/pricing) is straightforward and based on these parameters that are part of the instance creation:
- **Service Tier**: [read more about service tiers](https://cloud.google.com/memorystore/docs/redis/redis-tiers)
    - Basic - a simple Redis cache with a standalone instance (**Used in this workflow.**)
    - Standard - A high availability instance with cross-zone replication and automatic failover
- **Provisioned capacity**: 
    - Size in GB priced in $ per GB per hour. Capacity tiers make additional capacity progressively cheaper. (**The minimum of 1 GB is used in this workflow.**)
- **Region**: 
    - Pricing varies by the region chosen for the instance.
- **Replicas**: 
    - For the Standard Service Tier you have the option to enable read replicas for distributed reads.
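When choosing the provisioned capacity, a rough estimate of the embedding footprint helps. The sketch below uses two figures from this workflow: 9,040 chunks and 3,072 bytes per serialized `float32` embedding. The helper name `embedding_footprint_mib` is hypothetical, and the estimate deliberately ignores the chunk text, key names, index structures, and Redis overhead, so treat it as a lower bound.

```python
def embedding_footprint_mib(n_records, bytes_per_embedding):
    """Lower-bound estimate of memory used by raw embedding bytes, in MiB."""
    return n_records * bytes_per_embedding / 1024**2

n_records = 9040            # chunks loaded in this workflow
bytes_per_embedding = 3072  # length of one float32-serialized embedding

total_bytes = n_records * bytes_per_embedding
print(total_bytes)                                               # 27770880
print(embedding_footprint_mib(n_records, bytes_per_embedding))   # 26.484375
```

At roughly 26.5 MiB for the raw vectors, the 1 GB minimum leaves ample headroom for the text content and indexes in this example.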

### Create/Retrieve An Instance

The starting point for using Redis on Memorystore is an instance.  This is where the engine (Redis), tier, memory limits, and version are all selected and launched.

Documentation References:
- [Create and manage Redis instances](https://cloud.google.com/memorystore/docs/redis/create-manage-instances)
- [Redis tier capabilities](https://cloud.google.com/memorystore/docs/redis/redis-tiers)
- [Python SDK for Memorystore Redis Admin](https://cloud.google.com/python/docs/reference/redis/latest/google.cloud.redis_v1.services.cloud_redis.CloudRedisClient)

In [23]:
try:
    redis_instance = redis_client.get_instance(
        name = f"projects/{PROJECT_ID}/locations/{REGION}/instances/{REDIS_INSTANCE_NAME}"
    )
    print(f"Retrieved Redis instance: {redis_instance.name}")
except Exception:
    print(f"Creating Redis instance ...")
    create_instance = redis_client.create_instance(
        parent = f"projects/{PROJECT_ID}/locations/{REGION}",
        instance_id = REDIS_INSTANCE_NAME,
        instance = redis_v1.Instance(
            name = f"projects/{PROJECT_ID}/locations/{REGION}/instances/{REDIS_INSTANCE_NAME}",
            tier = redis_v1.Instance.Tier.BASIC,
            memory_size_gb = 1,
            redis_version = 'REDIS_7_2',
            transit_encryption_mode=redis_v1.Instance.TransitEncryptionMode.SERVER_AUTHENTICATION  # Enable TLS
        )
    )
    response = create_instance.result()
    redis_instance = redis_client.get_instance(
        name = f"projects/{PROJECT_ID}/locations/{REGION}/instances/{REDIS_INSTANCE_NAME}"
    )
    print(f"Created Redis instance: {redis_instance.name}")

Retrieved Redis instance: projects/statmike-mlops-349915/locations/us-central1/instances/retrieval-memorystore-redis


In [24]:
redis_instance.name

'projects/statmike-mlops-349915/locations/us-central1/instances/retrieval-memorystore-redis'

---
## Working With Memorystore Redis

### Connect To Memorystore Redis Instance

Connect to the Redis instance using the [redis-py](https://redis-py.readthedocs.io/en/stable/) Python package. The following code creates two clients using [redis.Redis()](https://redis-py.readthedocs.io/en/stable/connections.html#generic-client):
- `decode_client` with `decode_responses = True`
- `bytes_client` with `decode_responses = False`

**Why two clients?**  We will be storing data in the [Redis Hashes](https://redis.io/docs/latest/develop/data-types/hashes/) data type, which is like a Python dictionary. By default, a Redis client returns responses as bytes.  Adding the option `decode_responses = True` to the client setup will automatically decode responses to strings.  In this workflow we are using vector embeddings, which need to be encoded as bytes before being sent to Redis.  These bytes are not valid text, which causes an issue on retrieval unless the client uses the default `decode_responses = False`.  For this reason two clients are set up.  The `bytes_client` is only needed when retrieving embedding values.
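The retrieval issue can be reproduced without a Redis connection. The sketch below takes the first eight bytes of the stored embedding as printed later in this workflow and shows that they cannot be decoded as UTF-8 text, while they read back cleanly as `float32` values. The variable names are illustrative only.

```python
import numpy as np

# first 8 bytes of a float32-serialized embedding (two values)
raw = b'o\x1c\x00=\xf2k\xfa<'

# decoding as text is what a client with decode_responses = True would attempt
try:
    raw.decode('utf-8')
    decodable = True
except UnicodeDecodeError:
    decodable = False

# reading the same bytes as float32 recovers the numeric values
values = np.frombuffer(raw, dtype = 'float32')

print(decodable)    # False
print(len(values))  # 2
```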

Store the certificate for connecting to the instance in a temp file:

In [25]:
with tempfile.NamedTemporaryFile(delete=False) as cert_file:
    cert_file.write(redis_instance.server_ca_certs[0].cert.encode())  # Write certificate content
    cert_path = cert_file.name

Create the `decode_client` that will decode responses from bytes:

In [26]:
decode_client = redis.Redis(
    host = redis_instance.host,
    port = redis_instance.port,
    ssl = True,
    ssl_ca_certs = cert_path,
    decode_responses = True
)

Create the `bytes_client` that will not decode responses from bytes:

In [27]:
bytes_client = redis.Redis(
    host = redis_instance.host,
    port = redis_instance.port,
    ssl = True,
    ssl_ca_certs = cert_path,
    decode_responses = False
)

Test the connections with the `.ping()` method:

In [28]:
try:
    decode_client.ping() and bytes_client.ping()
    print("Connected to Redis successfully!")
except Exception as e:
    print(f"Error connecting to Redis: {e}")

Connected to Redis successfully!


### Prepare Data For Redis

There are multiple storage [data formats](https://redis.io/docs/latest/develop/data-types/) possible with Redis.  The [hash](https://redis.io/docs/latest/develop/data-types/hashes/) data type is like a Python dictionary and offers many advantages, like the vector search capability that will be used here.  To use the hash data type for vector search, the embeddings [need to be serialized as bytes before inserting](https://cloud.google.com/memorystore/docs/redis/indexing-vectors).

#### Get A Record

Dictionaries for each record/row are stored in `content_chunks` from earlier in this workflow:

In [34]:
first_record = content_chunks[0].copy()

In [35]:
first_record.keys()

dict_keys(['gse', 'chunk_id', 'content', 'embedding'])

In [36]:
first_record['chunk_id']

'fannie_part_0_c17'

#### Prepare The Record

Use Numpy to convert the list of floats to an array and then to bytes:

In [37]:
first_record['embedding'] = np.array(first_record['embedding']).astype('float32').tobytes()

In [38]:
type(first_record['embedding'])

bytes

In [39]:
len(first_record['embedding'])

3072

### Add, Retrieve, And Delete Records In The Instance

Learn about inserting, retrieving, and deleting records/rows with the following simple examples.

This uses the client to execute [Hash Commands](https://redis.io/docs/latest/commands/?group=hash)


#### Insert Row

Hash command [HSET](https://redis.io/docs/latest/commands/hset/)

In [44]:
first_record.keys()

dict_keys(['gse', 'chunk_id', 'content', 'embedding'])

In [45]:
first_record['chunk_id']

'fannie_part_0_c17'

In [46]:
if decode_client.exists(first_record['chunk_id']):
    print(f"Found this record already in the database: {first_record['chunk_id']}")
else:
    print(f"Adding the record to the database: {first_record['chunk_id']}")
    decode_client.hset(first_record['chunk_id'], mapping = first_record)

Adding the record to the database: fannie_part_0_c17


Verify the record count after the insertion:

In [47]:
decode_client.dbsize()

1

#### Check For Values In The Row

There are multiple helpful commands to check rows:

- Check whether a specific field exists on a row: [HEXISTS](https://redis.io/docs/latest/commands/hexists/)
- List all fields for a record: [HKEYS](https://redis.io/docs/latest/commands/hkeys/)
- Get a count of fields for a record: [HLEN](https://redis.io/docs/latest/commands/hlen/)

In [66]:
decode_client.hexists(first_record['chunk_id'], 'chunk_id')

True

In [67]:
decode_client.hlen(first_record['chunk_id'])

4

In [68]:
decode_client.hkeys(first_record['chunk_id'])

['embedding', 'chunk_id', 'content', 'gse']

#### Execute Commands Directly

In this section the client is being used through its native methods to interact with Redis.  It is also possible to use the `.execute_command()` method to directly execute commands.  This will be helpful later in this workflow when creating and using vector indexes.

Here is a comparison using the previous `hkeys` requests for all keys available for a specific record:

In [90]:
decode_client.hkeys(first_record['chunk_id'])

['embedding', 'chunk_id', 'content', 'gse']

In [91]:
decode_client.execute_command(f"HKEYS {first_record['chunk_id']}")

['embedding', 'chunk_id', 'content', 'gse']

#### Retrieve Values From Row

There are multiple helpful ways to retrieve rows:
- Get a single value from a row with [HGET](https://redis.io/docs/latest/commands/hget/)
- Get multiple values from a row with [HMGET](https://redis.io/docs/latest/commands/hmget/)
- Get all values from a row with [HGETALL](https://redis.io/docs/latest/commands/hgetall/)

Retrieve the value of `gse` for the record key:

In [58]:
decode_client.hget(first_record['chunk_id'], 'gse')

'fannie'

Retrieve the value of `gse` for the record key using the bytes client. Note that the response is not decoded to a string:

In [48]:
bytes_client.hget(first_record['chunk_id'], 'gse')

b'fannie'

Retrieve the value of `content` for the record key:

In [49]:
decode_client.hget(first_record['chunk_id'], 'content')

'# Selling Guide Fannie Mae Single Family\n\n## Fannie Mae Copyright Notice\n\n### Fannie Mae Copyright Notice\n\n|-|\n| Section B3-4.2, Verification of Depository Assets 402 |\n| B3-4.2-01, Verification of Deposits and Assets (05/04/2022) 403 |\n| B3-4.2-02, Depository Accounts (12/14/2022) 405 |\n| B3-4.2-03, Individual Development Accounts (02/06/2019) 408 |\n| B3-4.2-04, Pooled Savings (Community Savings Funds) (04/01/2009) 411 |\n| B3-4.2-05, Foreign Assets (05/04/2022) 411 |\n| Section B3-4.3, Verification of Non-Depository Assets 412 |\n| B3-4.3-01, Stocks, Stock Options, Bonds, and Mutual Funds (06/30/2015) 412 |\n| B3-4.3-02, Trust Accounts (04/01/2009) 413 |\n| B3-4.3-03, Retirement Accounts (06/30/2015) 414 |\n| B3-4.3-04, Personal Gifts (09/06/2023) 415 |\n| B3-4.3-05, Gifts of Equity (10/07/2020) 418 |\n| B3-4.3-06, Grants and Lender Contributions (12/14/2022) 419 |\n| B3-4.3-07, Disaster Relief Grants or Loans (04/01/2009) 423 |\n| B3-4.3-08, Employer Assistance (09/29/20

Retrieve multiple values, `gse` and `chunk_id`, for the record key:

In [50]:
decode_client.hmget(first_record['chunk_id'], ['gse', 'chunk_id'])

['fannie', 'fannie_part_0_c17']

Retrieve all values using the record key.  Note that the embedding was converted to bytes prior to storage and needs to be returned as bytes and decoded locally. This requires using the `bytes_client`:

In [51]:
result = bytes_client.hgetall(first_record['chunk_id'])

In [53]:
result.keys()

dict_keys([b'embedding', b'chunk_id', b'content', b'gse'])

In [54]:
result[b'chunk_id']

b'fannie_part_0_c17'

Retrieve the value of `embedding` using the `bytes_client` since it was converted to bytes prior to being stored:

In [69]:
result = bytes_client.hget(first_record['chunk_id'], 'embedding')

In [70]:
type(result)

bytes

In [71]:
result[0:20]

b'o\x1c\x00=\xf2k\xfa<\x93\x042<\xbdn\x7f=.?\x04='

Convert the `embedding` value back to a list:

In [72]:
# decode with the same dtype used to serialize (float32); the NumPy default is float64
result = np.frombuffer(result, dtype = 'float32').tolist()
result[0:5]

[0.031277116388082504,
 0.03056905046105385,
 0.010865348391234875,
 0.0623614676296711,
 0.03228681534528732]

#### Delete Row

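Delete the row added here.  Verify the action by counting the rows before and after the deletion.

The client's `delete` method issues the [DEL](https://redis.io/docs/latest/commands/del/) command, which removes the whole key. The hash command [HDEL](https://redis.io/docs/latest/commands/hdel/) removes individual fields from a hash instead.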

In [73]:
decode_client.dbsize()

1

In [74]:
decode_client.delete(first_record['chunk_id'])

1

In [75]:
decode_client.dbsize()

0

### Load Data

In the single-record example above, a record was inserted using `HSET`.  Now all the records need to be inserted.  Redis has a useful construct called [Redis pipelining](https://redis.io/docs/latest/develop/use/pipelining/) that allows issuing multiple commands at once without waiting for each individual command to complete sequentially, which can drastically improve performance.  Check out [this example](https://redis-py.readthedocs.io/en/stable/examples/pipeline_examples.html).

This section does 4 things:
- Prepare the embedding value as bytes for all the records
- Use a pipeline to check for presence of each value in the database
- Use a pipeline to load all the records that were not found in the database
- Verify the record count of the database

Get the starting record count of the database:

In [77]:
decode_client.dbsize()

0

Prepare the embedding values as bytes for all records:

In [79]:
for chunk in content_chunks:
    if not isinstance(chunk['embedding'], bytes):
        chunk['embedding'] = np.array(chunk['embedding']).astype('float32').tobytes()

Use a pipeline to check for the existence of each record in the database:

In [84]:
with decode_client.pipeline() as pipe:
    for chunk in content_chunks:
        pipe.exists(chunk['chunk_id'])
    exists_results = pipe.execute()

In [85]:
print(sum(exists_results), ' records already in database')

0  records already in database


Use a pipeline to load all records that were not already in the database:

In [86]:
with decode_client.pipeline() as pipe:
    load_indexes = []
    for i, (chunk, exists) in enumerate(zip(content_chunks, exists_results)):
        if not exists:
            load_indexes.append(i)
            pipe.hset(chunk['chunk_id'], mapping=chunk)
    load_results = pipe.execute()

In [87]:
# check for loading issues and give first failure id info for diagnostic
if all(load_results):
    print(f'All chunks({len(load_results)}) loaded successfully.')
else:
    print(f"During loading {load_results.count(0)} records were not successfully loaded.")
    first_fail_index = load_indexes[load_results.index(0)]
    print(f"Start troubleshooting with the record at index {first_fail_index} which has 'chunk_id' = {content_chunks[first_fail_index]['chunk_id']}")

All chunks(9040) loaded successfully.


Verify the record count of the database:

In [88]:
decode_client.dbsize()

9040
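The pipelines above queue every command before a single `execute()`. For larger datasets, one possible variation is to send the pipeline in fixed-size batches so that neither the client nor the server buffers everything at once. The following is a sketch under assumptions: the helper names `batched` and `load_in_batches` and the batch size of 1000 are illustrative, and `client` is expected to be a redis-py client such as `decode_client`.

```python
from itertools import islice

def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch

def load_in_batches(client, records, batch_size = 1000):
    """HSET each record, using one pipeline round trip per batch.

    Returns the total number of hash fields added, as reported by HSET.
    """
    fields_added = 0
    for batch in batched(records, batch_size):
        with client.pipeline(transaction = False) as pipe:
            for record in batch:
                pipe.hset(record['chunk_id'], mapping = record)
            fields_added += sum(pipe.execute())
    return fields_added

# batch sizes for the 9040 records in this workflow (no Redis connection needed)
batch_sizes = [len(batch) for batch in batched(range(9040), 1000)]
print(len(batch_sizes), batch_sizes[-1])  # 10 40
```

With a connection available, the call would look like `load_in_batches(decode_client, content_chunks)`.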

---
## Vector Similarity Search, Matching

This section covers using a vector similarity metric to find the nearest neighbors of a query vector while also taking advantage of indexing.  To understand similarity metrics and build the intuition for choosing one (choose dot product), check out [The Math of Similarity](../Embeddings/The%20Math%20of%20Similarity.ipynb).
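As a small illustration of similarity metrics, the sketch below computes dot product, cosine similarity, and Euclidean distance for a few toy vectors with NumPy. The vectors are made up and chosen to have unit length, in which case all three metrics produce the same ranking. The helper name `similarity_metrics` is hypothetical.

```python
import numpy as np

def similarity_metrics(query, candidates):
    """Return (dot product, cosine similarity, Euclidean distance) for each candidate row."""
    dot = candidates @ query
    cosine = dot / (np.linalg.norm(candidates, axis = 1) * np.linalg.norm(query))
    l2 = np.linalg.norm(candidates - query, axis = 1)
    return dot, cosine, l2

# unit-length toy vectors (illustrative values only)
candidates = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([0.8, 0.6])

dot, cosine, l2 = similarity_metrics(query, candidates)

# larger is better for dot product and cosine, smaller is better for distance
rank_dot = np.argsort(-dot).tolist()
rank_cosine = np.argsort(-cosine).tolist()
rank_l2 = np.argsort(l2).tolist()
print(rank_dot, rank_cosine, rank_l2)  # [1, 0, 2] [1, 0, 2] [1, 0, 2]
```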

**Notes On [Vector Search](https://cloud.google.com/memorystore/docs/redis/about-vector-search) With Redis**

The workflow below shows setting up indexes and using them for vector search.  Searching requires an index to be created. Multiple indexes can be created and the search parameters require specifying the desired index to search.  The distance measure is part of the index.

### Check For Vector Indexes

At this point in the workflow no vector indexes have been created.  The following cells show how to check for indexes and will be reused later in the workflow to verify the details of indexes after they are created.

In [92]:
decode_client.execute_command(
    'FT._LIST'
)

[]

### Create And Use An Index

Indexes are the only way to search in Redis.  Usually indexes make search across many rows more efficient by first matching partitions or rows and then only comparing to rows within the selected partitions.  This is still true with the HNSW index type.  Brute force searches across all rows are also done through an index, using the index type FLAT.  Since everything in Redis is in-memory, even these FLAT index searches across all rows are incredibly fast.

Details for [creating indexes](https://cloud.google.com/memorystore/docs/redis/ftcreate) with `FT.CREATE`:
- FLAT: Brute Force
    - an index of all records with embeddings that match the `PREFIX`
- HNSW: Hierarchical Navigable Small World
    - create a multilayer graph
    - faster queries across a smaller range of records retrieved using the graph

The index is specified at search time, which makes it possible to have multiple indexes.  In fact, pre-filtered queries would require a separate index for each desired pre-filter condition.

**Distance Metric Choices**
- `DISTANCE_METRIC IP` for inner product or dot product
- `DISTANCE_METRIC COSINE` for cosine similarity
- `DISTANCE_METRIC L2` for Euclidean distance
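
To make the three choices concrete, here is a minimal NumPy sketch that computes each quantity for two small made-up vectors. It shows only the underlying math. How the server turns these into the `distance` value it reports is not covered here.

```python
import numpy as np

# Two small hypothetical vectors, used only for illustration.
a = np.array([1.0, 2.0, 2.0], dtype='float32')
b = np.array([2.0, 0.0, 1.0], dtype='float32')

# IP: inner (dot) product of the two vectors.
inner_product = float(np.dot(a, b))

# COSINE: inner product divided by the product of the vector lengths.
cosine_similarity = inner_product / float(np.linalg.norm(a) * np.linalg.norm(b))

# L2: length of the difference between the two vectors.
euclidean_distance = float(np.linalg.norm(a - b))

print(inner_product, cosine_similarity, euclidean_distance)
```

When vectors are normalized to unit length, the inner product and the cosine similarity are the same number.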

Documentation Links For This Section:
- [Google Cloud Memorystore Documentation For Vector Search Command](https://cloud.google.com/memorystore/docs/redis/vector-commands)
- [Redis Documentation For Vectors](https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/vectors/)
- [Redis Documentation Examples for Vector Similarity](https://redis-py.readthedocs.io/en/stable/examples/search_vector_similarity_examples.html)

#### Define Functions For Search and Parsing Responses


In [172]:
def parse_matches(results, include_embed = False):
    encoding = bytes_client.get_encoder().encoding
    responses = []
    for i in range(2, len(results), 2):
        response = {}
        for j in range(0, len(results[i]), 2):
            key = results[i][j].decode(encoding)
            value = results[i][j+1]
            if key == 'embedding':
                if include_embed:
                    response[key] = np.frombuffer(value, dtype='float32').tolist()
            elif key == 'distance':
                response[key] = float(value.decode(encoding))
            else:
                response[key] = value.decode(encoding)
        responses.append(response)
    responses.reverse()    
    return responses
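
The indexing in `parse_matches` assumes the raw reply is a flat list: a count first, then a key followed by a list of alternating field names and values for each match. The sketch below runs the same parsing idea on a hand-built reply so the layout is visible without a server. The function name `parse_reply`, the sample reply, and the hard-coded `utf-8` encoding are assumptions for illustration.

```python
import numpy as np

# Hand-built stand-in for a raw search reply (assumed layout):
# [total, key_1, [field, value, ...], key_2, [field, value, ...]]
raw_reply = [
    2,
    b'doc_a', [b'distance', b'0.25', b'chunk_id', b'doc_a',
               b'embedding', np.array([1.0, 0.5], dtype='float32').tobytes()],
    b'doc_b', [b'distance', b'0.75', b'chunk_id', b'doc_b',
               b'embedding', np.array([0.0, 2.0], dtype='float32').tobytes()],
]

def parse_reply(results, encoding='utf-8'):
    rows = []
    # Field lists sit at positions 2, 4, 6, ... of the reply.
    for fields in results[2::2]:
        row = {}
        # Pair up the alternating field names and values.
        for name, value in zip(fields[0::2], fields[1::2]):
            name = name.decode(encoding)
            if name == 'embedding':
                # Embeddings are raw float32 bytes, not text.
                row[name] = np.frombuffer(value, dtype='float32').tolist()
            elif name == 'distance':
                row[name] = float(value.decode(encoding))
            else:
                row[name] = value.decode(encoding)
        rows.append(row)
    return rows

parsed = parse_reply(raw_reply)
print(parsed[0])
```

Unlike `parse_matches`, this sketch keeps the rows in the order they arrive and always decodes the embedding.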

In [208]:
def vector_search(index_name, query_embedding, n_matches = 5):

    query_args = [
        f"FT.SEARCH {index_name}",
        f"*=>[KNN {n_matches} @embedding $query_embedding AS distance]",
        "PARAMS",
        2,
        "query_embedding",
        np.array(query_embedding).astype('float32').tobytes(),
        "DIALECT",
        2,
    ]

    return bytes_client.execute_command(*query_args)
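
To see what `vector_search` sends, the sketch below builds the same argument list without executing it. In the query string, `*` selects all records (no pre-filter), `KNN {n_matches}` asks for that many nearest neighbors on the `embedding` field, `$query_embedding` is a placeholder filled from `PARAMS`, and `AS distance` names the returned score field. The `2` after `PARAMS` counts the name and value that follow. The helper name `build_knn_query` is hypothetical.

```python
import numpy as np

def build_knn_query(index_name, query_embedding, n_matches=5):
    """Return the argument list for a KNN search without sending it."""
    return [
        "FT.SEARCH", index_name,
        f"*=>[KNN {n_matches} @embedding $query_embedding AS distance]",
        "PARAMS", 2, "query_embedding",
        # The query vector is packed the same way as the stored embeddings.
        np.array(query_embedding).astype('float32').tobytes(),
        "DIALECT", 2,
    ]

# Hypothetical 2-dimensional query vector, for illustration only.
args = build_knn_query('flat-index', [0.5, 0.25], n_matches=3)
print(args[2])
```

Inspecting the arguments this way can help when a search returns an error, since the query string and the byte length of the vector are both easy to check.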

#### Index: Flat (Brute Force)

**References**
- [Redis: FLAT Index](https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/vectors/#flat-index)

Create the index:

This index is a `FLAT` index with three options (`TYPE`, `DIM`, `DISTANCE_METRIC`).  Each option contributes a name and a value, so the attribute count that follows `FLAT` is 3*2 = 6.

In [209]:
INDEX_NAME = 'flat-index'

try:
    check_index = decode_client.ft(INDEX_NAME).info()
    print(f"This index already exists: {check_index['index_name']}")
except Exception:
    print(f'Create the index ...')

    command = (
        f"FT.CREATE {INDEX_NAME} ON HASH "
        f"SCHEMA embedding VECTOR FLAT 6 "
        f"TYPE FLOAT32 "
        f"DIM {len(question_embedding)} "
        f"DISTANCE_METRIC IP"
    )

    decode_client.execute_command(command)
    
    print(f'Checking for index backfill ...')
    while True:
        check_index = decode_client.ft(INDEX_NAME).info()
        if check_index['backfill_in_progress'] == '1':
            complete_pct = 100*float(check_index['backfill_complete_percent'])
            print(f"Backfill still in progress with {complete_pct:.2f} percent complete ...")
            time.sleep(1)
        else:
            print(f"Backfill complete.")
            print(f"Index created and covers {check_index['num_docs']} records")
            break

This index already exists: flat-index
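
The count that follows the algorithm name in the create command can be derived from the options instead of being typed by hand. This is a minimal sketch with a hypothetical helper, `build_create_command`, that assembles the same command string from a dictionary of options:

```python
def build_create_command(index_name, algorithm, options, field='embedding'):
    """Assemble a create-index command string; the count is 2 entries per option."""
    option_text = " ".join(f"{name} {value}" for name, value in options.items())
    count = 2 * len(options)  # one name and one value per option
    return (
        f"FT.CREATE {index_name} ON HASH "
        f"SCHEMA {field} VECTOR {algorithm} {count} {option_text}"
    )

flat_command = build_create_command(
    'flat-index', 'FLAT',
    {'TYPE': 'FLOAT32', 'DIM': 768, 'DISTANCE_METRIC': 'IP'},
)
print(flat_command)
```

Adding or removing an option then updates the count automatically, which avoids a mismatch between the declared count and the options that follow.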


Review the index details:

In [210]:
decode_client.execute_command('FT._LIST')

['flat-index']

In [211]:
decode_client.ft(INDEX_NAME).info()

{'index_name': 'flat-index',
 'index_options': [],
 'index_definition': ['key_type',
  'HASH',
  'prefixes',
  [''],
  'default_score',
  '1'],
 'attributes': [['identifier',
   'embedding',
   'attribute',
   'embedding',
   'type',
   'VECTOR',
   'index',
   ['capacity',
    10240,
    'dimensions',
    768,
    'distance_metric',
    'IP',
    'data_type',
    'FLOAT32',
    'algorithm',
    ['name', 'FLAT', 'block_size', 1024]]]],
 'num_docs': '9040',
 'num_terms': '0',
 'num_records': '9040',
 'hash_indexing_failures': '0',
 'backfill_in_progress': '0',
 'backfill_complete_percent': '1.000000',
 'mutation_queue_size': '0',
 'recent_mutations_queue_delay': '0 sec'}

Query the index for matches:

In [212]:
matches = parse_matches(vector_search(INDEX_NAME, question_embedding))

In [214]:
matches[0]

{'distance': 0.290015816689,
 'chunk_id': 'fannie_part_0_c352',
 'content': '# A3-3-03, Other Servicing Arrangements (12/15/2015)\n\nIntroduction This topic provides an overview of other servicing arrangements, including: • Subservicing • General Requirements for Subservicing Arrangements • Pledge of Servicing Rights and Transfer of Interest in Servicing Income\n\n## Subservicing\n\nA lender may use other organizations to perform some or all of its servicing functions. Fannie Mae refers to these arrangements as “subservicing” arrangements, meaning that a servicer (the “subservicer”) other than the contractually responsible servicer (the “master” servicer) is performing the servicing functions. The following are not considered to be subservicing arrangements: • when a computer service bureau is used to perform accounting and reporting functions; • when the originating lender sells and assigns servicing to another lender, unless the originating lender continues to be the contractually re

#### Index: HNSW

**References**
- [Redis: HNSW Index](https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/vectors/#hnsw-index)

Create the index:

This index is an `HNSW` index with five options (`TYPE`, `DIM`, `DISTANCE_METRIC`, `M`, `EF_CONSTRUCTION`).  Each option contributes a name and a value, so the attribute count that follows `HNSW` is 5*2 = 10.

In [219]:
INDEX_NAME = 'hnsw-index'

try:
    check_index = decode_client.ft(INDEX_NAME).info()
    print(f"This index already exists: {check_index['index_name']}")
except Exception:
    print(f'Create the index ...')

    command = (
        f"FT.CREATE {INDEX_NAME} ON HASH "
        f"SCHEMA embedding VECTOR HNSW 10 "
        f"TYPE FLOAT32 "
        f"DIM {len(question_embedding)} "
        f"DISTANCE_METRIC IP "
        f"M 10 "
        f"EF_CONSTRUCTION 40"
    )

    decode_client.execute_command(command)
    
    print(f'Checking for index backfill ...')
    while True:
        check_index = decode_client.ft(INDEX_NAME).info()
        if check_index['backfill_in_progress'] == '1':
            complete_pct = 100*float(check_index['backfill_complete_percent'])
            print(f"Backfill still in progress with {complete_pct:.2f} percent complete ...")
            time.sleep(1)
        else:
            print(f"Backfill complete.")
            print(f"Index created and covers {check_index['num_docs']} records")
            break

Create the index ...
Checking for index backfill ...
Backfill still in progress with 0.00 percent complete ...
Backfill still in progress with 52.91 percent complete ...
Backfill complete.
Index created and covers 9040 records
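
The backfill loop appears in both index cells and runs until the server reports completion. One possible refactor, sketched here under the assumption that the `info()` reply keeps the `backfill_in_progress` field shown above, is a reusable helper with a timeout. The helper name `wait_for_backfill` and the stand-in replies are hypothetical.

```python
import time

def wait_for_backfill(get_info, poll_seconds=1.0, timeout_seconds=300.0):
    """Poll get_info() until backfill finishes or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        info = get_info()
        if info['backfill_in_progress'] != '1':
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError('index backfill did not finish in time')
        time.sleep(poll_seconds)

# Stand-in for the index info call: two in-progress replies, then done.
fake_replies = iter([
    {'backfill_in_progress': '1', 'backfill_complete_percent': '0.000000'},
    {'backfill_in_progress': '1', 'backfill_complete_percent': '0.529100'},
    {'backfill_in_progress': '0', 'backfill_complete_percent': '1.000000', 'num_docs': '9040'},
])
final_info = wait_for_backfill(lambda: next(fake_replies), poll_seconds=0.0)
print(final_info['num_docs'])
```

In this notebook the call would look something like `wait_for_backfill(decode_client.ft(INDEX_NAME).info)`.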


Review the index details:

In [220]:
decode_client.execute_command('FT._LIST')

['hnsw-index', 'flat-index']

In [221]:
decode_client.ft(INDEX_NAME).info()

{'index_name': 'hnsw-index',
 'index_options': [],
 'index_definition': ['key_type',
  'HASH',
  'prefixes',
  [''],
  'default_score',
  '1'],
 'attributes': [['identifier',
   'embedding',
   'attribute',
   'embedding',
   'type',
   'VECTOR',
   'index',
   ['capacity',
    10240,
    'dimensions',
    768,
    'distance_metric',
    'IP',
    'data_type',
    'FLOAT32',
    'algorithm',
    ['name', 'HNSW', 'm', 10, 'ef_construction', 40, 'ef_runtime', 10]]]],
 'num_docs': '9040',
 'num_terms': '0',
 'num_records': '9040',
 'hash_indexing_failures': '0',
 'backfill_in_progress': '0',
 'backfill_complete_percent': '1.000000',
 'mutation_queue_size': '0',
 'recent_mutations_queue_delay': '0 sec'}
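
The `attributes` entry in the `info()` output is a nested list of alternating names and values, which is awkward to read. The sketch below converts such a list into nested dictionaries. The helper `pairs_to_dict` is hypothetical, and its test for "is this a pair list" is a heuristic (even length, string names in the even positions), so it may misread other reply shapes.

```python
def pairs_to_dict(flat):
    """Turn [k1, v1, k2, v2, ...] into a dict, converting nested pair lists too."""
    def convert(value):
        is_pair_list = (
            isinstance(value, list)
            and len(value) > 0
            and len(value) % 2 == 0
            and all(isinstance(k, str) for k in value[0::2])
        )
        return pairs_to_dict(value) if is_pair_list else value
    return {key: convert(value) for key, value in zip(flat[0::2], flat[1::2])}

# The single attribute entry copied from the info() output above.
attribute = ['identifier', 'embedding', 'attribute', 'embedding', 'type', 'VECTOR',
             'index', ['capacity', 10240, 'dimensions', 768, 'distance_metric', 'IP',
                       'data_type', 'FLOAT32',
                       'algorithm', ['name', 'HNSW', 'm', 10,
                                     'ef_construction', 40, 'ef_runtime', 10]]]

settings = pairs_to_dict(attribute)
print(settings['index']['algorithm'])
```

With the settings in dictionary form, values such as `settings['index']['dimensions']` can be checked directly against the embedding length.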

Query the index for matches:

In [222]:
matches = parse_matches(vector_search(INDEX_NAME, question_embedding))

In [225]:
matches[0]

{'distance': 0.319473981857,
 'chunk_id': 'freddie_part_4_c509',
 'content': "# (1) Notice requirements\n\nThe notice must advise the Borrower of the following: 1. The date the new Servicing Agent or Master Servicer undertakes the performance of the Servicing obligations 2. The name and address of the Servicer undertaking the performance of the Servicing obligations 3. The names and telephone numbers of the contact persons or departments where the Borrowers' inquiries relating to the transfer should be directed. (If toll-free numbers are not available, the letter must indicate that collect calls will be accepted.) Such names and telephone numbers must be provided for the party previously performing the Servicing obligations as well as the new Servicing Agent or Master Servicer undertaking the performance of the Servicing obligations. 4. The date when the party previously performing the Servicing obligation will no longer collect the Borrowers' payments and when the new Servicing Agent 
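
The first entry returned from the HNSW index differs from the one returned from the FLAT index. HNSW searches a graph rather than comparing against every record, so its results can differ from a brute force search. One way to quantify the difference is the share of brute force matches that the HNSW search also returns. This is a minimal sketch with a hypothetical helper and made-up ids, not results from the indexes above:

```python
def overlap_at_k(exact_matches, approximate_matches):
    """Fraction of exact-search chunk_ids also returned by the approximate search."""
    exact_ids = {m['chunk_id'] for m in exact_matches}
    approximate_ids = {m['chunk_id'] for m in approximate_matches}
    if not exact_ids:
        return 0.0
    return len(exact_ids & approximate_ids) / len(exact_ids)

# Hypothetical results, for illustration only.
flat_like = [{'chunk_id': 'a'}, {'chunk_id': 'b'}, {'chunk_id': 'c'}, {'chunk_id': 'd'}]
hnsw_like = [{'chunk_id': 'a'}, {'chunk_id': 'c'}, {'chunk_id': 'x'}, {'chunk_id': 'd'}]
print(overlap_at_k(flat_like, hnsw_like))
```

Applied to this workflow, the inputs would be the parsed matches from each index for the same query embedding.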

---
## Retrieval Augmented Generation (RAG)

Build a simple retrieval augmented generation process that enhances a query by retrieving context.  This is done here by constructing three functions, one per stage:
- `retrieve` - a function that uses an embedding to search for matching pieces of text to use as context
    - this uses the system built earlier in this workflow!
- `augment` - prepare the retrieved chunks into a prompt
- `generate` - make the LLM request with the augmented prompt

A final function executes the full RAG workflow:
- `rag` - a function that receives the query and orchestrates the workflow through `retrieve` > `augment` > `generate`

### Clients

In [227]:
embedder = vertexai.language_models.TextEmbeddingModel.from_pretrained('text-embedding-004')
llm = vertexai.generative_models.GenerativeModel("gemini-1.5-flash-002")

### Retrieve Function

In [228]:
def retrieve_memorystore(query_embedding, n_matches = 5):
    
    # pass n_matches through so the caller controls how many chunks are retrieved
    matches = parse_matches(vector_search('flat-index', query_embedding, n_matches))
    
    return matches

### Augment Function

In [229]:
def augment(matches):

    prompt = ''
    for m, match in enumerate(matches):
        prompt += f"Context {m+1}:\n{match['content']}\n\n"
    prompt += f'Answer the following question using the provided contexts:\n'

    return prompt

### Generate Function

In [230]:
def generate(prompt):

    result = llm.generate_content(prompt)

    return result

### RAG Function

In [231]:
def rag(query):
    
    query_embedding = embedder.get_embeddings([query])[0].values
    matches = retrieve_memorystore(query_embedding)
    prompt = augment(matches) + query
    result = generate(prompt)
    
    return result.text

### Example In Use

In [232]:
question

'Does a lender have to perform servicing functions directly?'

In [233]:
print(rag(question))

No, a lender does not have to perform servicing functions directly.  Context 1 explicitly states that a lender may use other organizations ("subservicing arrangements") to perform some or all of its servicing functions.  However,  the lender remains contractually responsible (the "master servicer") unless they sell and assign servicing to another lender and relinquish that responsibility.



---
### Profiling Performance

Profile the timing of each step in the RAG function for sequential calls. The environment chosen for this workflow is a minimal testing environment, so load testing (simultaneous requests) would not be helpful.

In [234]:
profile = []

In [235]:
def rag(query, profile = profile):
    
    timings = {}
    start_time = time.time()
    
    
    # 1. Get embeddings
    embedding_start = time.time()
    query_embedding = embedder.get_embeddings([query])[0].values
    timings['embedding'] = time.time() - embedding_start

    # 2. Retrieve from Memorystore
    retrieval_start = time.time()
    matches = retrieve_memorystore(query_embedding)
    timings['retrieval_memorystore'] = time.time() - retrieval_start

    # 3. Augment the prompt
    augment_start = time.time()
    prompt = augment(matches) + query
    timings['augment'] = time.time() - augment_start

    # 4. Generate text
    generate_start = time.time()
    result = generate(prompt)
    timings['generate'] = time.time() - generate_start

    total_time = time.time() - start_time
    timings['total'] = total_time
    
    profile.append(timings)
    
    return result.text

In [236]:
print(rag(question))

No.  A lender may use other organizations to perform some or all of its servicing functions through subservicing arrangements (Context 1).  However, the lender (master servicer) remains contractually responsible (Context 1).  The use of a subservicer must not interfere with the lender's ability to meet Fannie Mae's requirements (Context 4).



In [237]:
profile

[{'embedding': 0.1565110683441162,
  'retrieval_memorystore': 0.030549049377441406,
  'augment': 3.9577484130859375e-05,
  'generate': 0.6989932060241699,
  'total': 0.8860993385314941}]
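
The profiled `rag` function repeats the same start and stop pattern for every stage. One alternative, sketched here with a hypothetical `timed` context manager, records each stage's elapsed time with less repetition. It uses `time.perf_counter()`, a clock intended for measuring short durations, where the cell above uses `time.time()`.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(timings, label):
    """Record the elapsed seconds for the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so partial timings are kept.
        timings[label] = time.perf_counter() - start

timings = {}
with timed(timings, 'total'):
    with timed(timings, 'sleep_step'):  # hypothetical stage standing in for a real call
        time.sleep(0.01)
print(sorted(timings))
```

Each stage in `rag` would then become a `with timed(timings, 'embedding'):` style block around the existing call.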

In [238]:
for i in range(100):
    response = rag(question)

### Report From Profile

In [239]:
all_timings = {}
for timings in profile:
    for key, value in timings.items():
        if key not in all_timings:
            all_timings[key] = []
        all_timings[key].append(value)

In [240]:
for key, values in all_timings.items():
    arr = np.array(values)
    print(f"Statistics for '{key}':")
    print(f"  Min: {np.min(arr):.4f} seconds")
    print(f"  Max: {np.max(arr):.4f} seconds")
    print(f"  Mean: {np.mean(arr):.4f} seconds")
    print(f"  Median: {np.median(arr):.4f} seconds")
    print(f"  Std Dev: {np.std(arr):.4f} seconds")
    print(f"  P95: {np.percentile(arr, 95):.4f} seconds")
    print(f"  P99: {np.percentile(arr, 99):.4f} seconds")
    print("")

Statistics for 'embedding':
  Min: 0.0479 seconds
  Max: 0.1565 seconds
  Mean: 0.0556 seconds
  Median: 0.0522 seconds
  Std Dev: 0.0126 seconds
  P95: 0.0745 seconds
  P99: 0.0926 seconds

Statistics for 'retrieval_memorystore':
  Min: 0.0052 seconds
  Max: 0.0305 seconds
  Mean: 0.0068 seconds
  Median: 0.0067 seconds
  Std Dev: 0.0025 seconds
  P95: 0.0076 seconds
  P99: 0.0108 seconds

Statistics for 'augment':
  Min: 0.0000 seconds
  Max: 0.0001 seconds
  Mean: 0.0000 seconds
  Median: 0.0000 seconds
  Std Dev: 0.0000 seconds
  P95: 0.0000 seconds
  P99: 0.0000 seconds

Statistics for 'generate':
  Min: 0.5160 seconds
  Max: 1.0820 seconds
  Mean: 0.7345 seconds
  Median: 0.7339 seconds
  Std Dev: 0.1026 seconds
  P95: 0.9450 seconds
  P99: 1.0642 seconds

Statistics for 'total':
  Min: 0.5742 seconds
  Max: 1.1403 seconds
  Mean: 0.7969 seconds
  Median: 0.7938 seconds
  Std Dev: 0.1029 seconds
  P95: 1.0023 seconds
  P99: 1.1225 seconds



## Remove Resources

In [226]:
#decode_client.execute_command('FT.DROPINDEX flat-index')
#decode_client.execute_command('FT.DROPINDEX hnsw-index')

In [30]:
#decode_client.flushall() # empty all records in the instance, across all databases

In [26]:
#redis_client.delete_instance(name = redis_instance.name)