<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Retrieval/Retrieval%20-%20AlloyDB%20For%20PostgreSQL.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FApplied%2520GenAI%2FRetrieval%2FRetrieval%2520-%2520AlloyDB%2520For%2520PostgreSQL.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Retrieval/Retrieval%20-%20AlloyDB%20For%20PostgreSQL.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/Applied%20GenAI/Retrieval/Retrieval%20-%20AlloyDB%20For%20PostgreSQL.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Retrieval - AlloyDB For PostgreSQL

In prior workflows, a series of documents was [processed into chunks](../Chunking/readme.md), and for each chunk, [embeddings](../Embeddings/readme.md) were created:

- Process: [Large Document Processing - Document AI Layout Parser](../Chunking/Large%20Document%20Processing%20-%20Document%20AI%20Layout%20Parser.ipynb)
- Embed: [Vertex AI Text Embeddings API](../Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb)

Retrieving chunks for a query involves calculating the embedding for the query and then using similarity metrics to find relevant chunks. A thorough review of similarity matching can be found in [The Math of Similarity](../Embeddings/The%20Math%20of%20Similarity.ipynb) - use dot product! As development moves from experiment to application, the process of storing and computing similarity is migrated to a [retrieval](./readme.md) system. This workflow is part of a [series of workflows exploring many retrieval systems](./readme.md).  

A detailed [comparison of many retrieval systems](./readme.md#comparison-of-vector-database-solutions) can be found in the readme as well.
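
As a minimal sketch of what a retrieval system computes, the snippet below ranks a few candidate vectors against a query by dot product. The three-dimensional vectors, the chunk names, and the `top_k` helper are illustrative assumptions rather than data or code from this workflow; real embeddings have many more dimensions.

```python
# Illustrative only: tiny hand-made vectors stand in for real embeddings.
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query, candidates, k=2):
    """Return the k (chunk_id, score) pairs with the largest dot product."""
    scored = [(chunk_id, dot(query, vec)) for chunk_id, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

candidates = {
    'chunk_a': [1.0, 0.0, 0.0],
    'chunk_b': [0.6, 0.8, 0.0],
    'chunk_c': [0.0, 0.0, 1.0],
}
query = [0.8, 0.6, 0.0]

matches = top_k(query, candidates)
print(matches)
```

A retrieval system performs the same ranking, but over all stored chunks and usually with an index to avoid scoring every row.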

---

**AlloyDB For Storage, Indexing, And Search**

- [**AlloyDB for PostgreSQL**](https://cloud.google.com/alloydb) is a fully managed database service on Google Cloud that is compatible with and significantly faster than standard PostgreSQL. 
- AlloyDB boasts a comprehensive suite of [generative AI features](https://cloud.google.com/alloydb/docs/ai), including the ability to generate embeddings and predictions through seamless integration with Vertex AI.  You can even request predictions from any [endpoint in Google Cloud](https://cloud.google.com/alloydb/docs/ai/model-endpoint-register-model#add-generic).
- Similarity metrics are built into AlloyDB through an optimized implementation of [`pgvector`](https://github.com/pgvector/pgvector?tab=readme-ov-file#indexing), simply called [`vector`](https://cloud.google.com/alloydb/docs/ai/store-embeddings#required-extension) in AlloyDB. This allows for the creation of highly efficient inverted file (IVF) indexes for accelerated similarity search. 
- Additionally, AlloyDB offers the `alloydb_scann` extension, which implements the [ScaNN algorithm](https://github.com/google-research/google-research/blob/master/scann/docs/algorithms.md) for super-efficient nearest neighbor matching.

---

**Use Case Data**

Buying a home usually involves borrowing money from a lending institution, typically through a mortgage secured by the home's value. But how do these institutions manage the risks associated with such large loans, and how are lending standards established?

In the United States, two government-sponsored enterprises (GSEs) play a vital role in the housing market:

- Federal National Mortgage Association ([Fannie Mae](https://www.fanniemae.com/))
- Federal Home Loan Mortgage Corporation ([Freddie Mac](https://www.freddiemac.com/))

These GSEs purchase mortgages from lenders, enabling those lenders to offer more loans. This process also allows Fannie Mae and Freddie Mac to set standards for mortgages, ensuring they are responsible and borrowers are more likely to repay them. This system makes homeownership more affordable and stabilizes the housing market by maintaining a steady flow of liquidity for lenders and keeping interest rates controlled.

However, navigating the complexities of these GSEs and their extensive servicing guides can be challenging.

**Approaches**

[This series](../readme.md) covers many generative AI workflows. These documents are used directly as long context for Gemini in the workflow [Long Context Retrieval With The Vertex AI Gemini API](../Generate/Long%20Context%20Retrieval%20With%20The%20Vertex%20AI%20Gemini%20API.ipynb). The workflow below uses a [retrieval](./readme.md) approach with the already generated chunks and embeddings.

---
## Colab Setup

When running this notebook in [Colab](https://colab.google/) or [Colab Enterprise](https://cloud.google.com/colab/docs/introduction), this section will authenticate to GCP (follow prompts in the popup) and set the current project for the session.

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

---
## Installs and API Enablement

The client packages may need to be installed in this environment.

### Installs (If Needed)

In [3]:
# tuples of (import name, install name, min_version)
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform', '1.69.0'),
    ('google.cloud.compute', 'google-cloud-compute'), # imported later for the network check
    ('google.cloud.alloydb', 'google-cloud-alloydb'),
    ('google.cloud.alloydb.connector', 'google-cloud-alloydb-connector'),
    ('sqlalchemy', 'sqlalchemy'),
    ('pg8000', 'pg8000'),
    ('asyncpg', 'asyncpg')
]

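import importlib.util, importlib.metadata

def version_tuple(version):
    # compare numerically: '1.69.0' -> (1, 69, 0); plain string comparison would mis-order '1.9.0' and '1.69.0'
    return tuple(int(part) for part in version.split('.')[:3] if part.isdigit())

install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # look up the installed version by distribution (install) name
        if version_tuple(importlib.metadata.version(package[1])) < version_tuple(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user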
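
One detail worth keeping in mind for minimum-version checks like this: comparing version strings directly is lexicographic, so `'1.9.0'` sorts after `'1.69.0'`. A minimal sketch of the difference follows. The `version_tuple` helper here is illustrative and only handles plain dotted numeric versions (no pre-release suffixes).

```python
def version_tuple(version):
    """Convert a dotted numeric version such as '1.69.0' to (1, 69, 0)."""
    return tuple(int(part) for part in version.split('.'))

installed, minimum = '1.9.0', '1.69.0'

# Lexicographic comparison: '9' sorts after '6', so the string check misses the outdated package.
string_says_outdated = installed < minimum
# Numeric comparison: 9 < 69, so the package is correctly flagged as outdated.
numeric_says_outdated = version_tuple(installed) < version_tuple(minimum)

print(string_says_outdated, numeric_says_outdated)
```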

### API Enablement

In [4]:
!gcloud services enable aiplatform.googleapis.com
!gcloud services enable alloydb.googleapis.com

### Restart Kernel (If Installs Occurred)

After the kernel restarts, continue from the cell that follows this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

Inputs

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [None]:
REGION = 'us-central1'
SERIES = 'applied-genai'
EXPERIMENT = 'retrieval-alloydb'

# AlloyDB Names
ALLOYDB_CLUSTER_NAME = PROJECT_ID
ALLOYDB_INSTANCE_NAME = PROJECT_ID
ALLOYDB_DATABASE_NAME = SERIES
ALLOYDB_TABLE_NAME = EXPERIMENT
ALLOYDB_NETWORK = 'Default' # Replace this with your network name if you are not using Default network

ALLOYDB_USER = 'test_alloydb'
ALLOYDB_PASS = 'test_alloydb_pass'

Packages

In [8]:
#!pip install google-cloud-aiplatform -U -q --user --force-reinstall

In [None]:
import os, json, time, glob, shutil, datetime, asyncio

import numpy as np

# Compute

from google.cloud import compute

# Vertex AI
from google.cloud import aiplatform
import vertexai.language_models # for embeddings API
import vertexai.generative_models # for Gemini Models
from vertexai.resources.preview import feature_store

# AlloyDB
from google.cloud import alloydb
import google.cloud.alloydb.connector
import pg8000
import sqlalchemy
import asyncpg
import sqlalchemy.ext.asyncio

In [10]:
aiplatform.__version__

'1.71.0'

Clients

In [11]:
# vertex ai clients
vertexai.init(project = PROJECT_ID, location = REGION)

# alloydb
alloydb_client = alloydb.AlloyDBAdminClient()

---
## Text & Embeddings For Examples

This repository contains a [section for document processing (chunking)](../Chunking/readme.md) that includes an example of processing multiple large PDFs (over 1000 pages) into chunks: [Large Document Processing - Document AI Layout Parser](../Chunking/Large%20Document%20Processing%20-%20Document%20AI%20Layout%20Parser.ipynb).  The chunks of text from that workflow are stored with this repository and loaded by another companion workflow that augments the chunks with text embeddings: [Vertex AI Text Embeddings API](../Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb).

The following code will load the version of the chunks that includes text embeddings and prepare it for a local example of retrieval augmented generation.

### Get The Documents

If you are working from a clone of this notebook's [repository](https://github.com/statmike/vertex-ai-mlops) then the documents are already present. The following cell checks for the documents folder and, if it is missing, retrieves it (`git clone`):

In [12]:
local_dir = '../Embeddings/files/embeddings-api'

In [13]:
if not os.path.exists(local_dir):
    print('Retrieving documents...')
    parent_dir = os.path.dirname(local_dir)
    temp_dir = os.path.join(parent_dir, 'temp')
    if not os.path.exists(temp_dir):
        os.makedirs(temp_dir)
    !git clone https://www.github.com/statmike/vertex-ai-mlops {temp_dir}/vertex-ai-mlops
    shutil.copytree(f'{temp_dir}/vertex-ai-mlops/Applied GenAI/Embeddings/files/embeddings-api', local_dir)
    shutil.rmtree(temp_dir)
    print(f'Documents are now in folder `{local_dir}`')
else:
    print(f'Documents Found in folder `{local_dir}`')             

Documents Found in folder `../Embeddings/files/embeddings-api`


### Load The Chunks

In [14]:
jsonl_files = glob.glob(f"{local_dir}/large-files*.jsonl")
jsonl_files.sort()
jsonl_files

['../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0000.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0001.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0002.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0003.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0004.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0005.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0006.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0007.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0008.jsonl',
 '../Embeddings/files/embeddings-api/large-files-chunk-embeddings-0009.jsonl']

In [15]:
chunks = []
for file in jsonl_files:
    with open(file, 'r') as f:
        chunks.extend([json.loads(line) for line in f])
len(chunks)

9040

### Review A Chunk

In [16]:
chunks[0].keys()

dict_keys(['instance', 'predictions', 'status'])

In [17]:
chunks[0]['instance']['chunk_id']

'fannie_part_0_c17'

In [18]:
print(chunks[0]['instance']['content'])

# Selling Guide Fannie Mae Single Family

## Fannie Mae Copyright Notice

### Fannie Mae Copyright Notice

|-|
| Section B3-4.2, Verification of Depository Assets 402 |
| B3-4.2-01, Verification of Deposits and Assets (05/04/2022) 403 |
| B3-4.2-02, Depository Accounts (12/14/2022) 405 |
| B3-4.2-03, Individual Development Accounts (02/06/2019) 408 |
| B3-4.2-04, Pooled Savings (Community Savings Funds) (04/01/2009) 411 |
| B3-4.2-05, Foreign Assets (05/04/2022) 411 |
| Section B3-4.3, Verification of Non-Depository Assets 412 |
| B3-4.3-01, Stocks, Stock Options, Bonds, and Mutual Funds (06/30/2015) 412 |
| B3-4.3-02, Trust Accounts (04/01/2009) 413 |
| B3-4.3-03, Retirement Accounts (06/30/2015) 414 |
| B3-4.3-04, Personal Gifts (09/06/2023) 415 |
| B3-4.3-05, Gifts of Equity (10/07/2020) 418 |
| B3-4.3-06, Grants and Lender Contributions (12/14/2022) 419 |
| B3-4.3-07, Disaster Relief Grants or Loans (04/01/2009) 423 |
| B3-4.3-08, Employer Assistance (09/29/2015) 423 |
| B3-4.3-09,

In [19]:
chunks[0]['predictions'][0]['embeddings']['values'][0:10]

[0.031277116388082504,
 0.03056905046105385,
 0.010865348391234875,
 0.0623614676296711,
 0.03228681534528732,
 0.05066155269742012,
 0.046544693410396576,
 0.05509665608406067,
 -0.014074751175940037,
 0.008380400016903877]

### Prepare Chunk Structure

Make a list of dictionaries with information for each chunk:

In [20]:
content_chunks = [
    dict(
        gse = chunk['instance']['gse'],
        chunk_id = chunk['instance']['chunk_id'],
        content = chunk['instance']['content'],
        embedding = chunk['predictions'][0]['embeddings']['values']
    ) for chunk in chunks
]
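
To make the record shape explicit, here is a minimal sketch that parses one synthetic JSONL line with the same keys used above (`instance`, `predictions`, and the nested `embeddings` values). The sample values and the `to_content_chunk` helper are invented for illustration and are not taken from the repository files.

```python
import json

# A synthetic line shaped like the records loaded above (all values are made up).
line = json.dumps({
    'instance': {'gse': 'fannie', 'chunk_id': 'example_c0', 'content': 'Example text.'},
    'predictions': [{'embeddings': {'values': [0.1, 0.2, 0.3]}}],
    'status': '',
})

def to_content_chunk(raw_line):
    """Flatten one JSONL record into the dict layout used for loading."""
    record = json.loads(raw_line)
    return dict(
        gse=record['instance']['gse'],
        chunk_id=record['instance']['chunk_id'],
        content=record['instance']['content'],
        embedding=record['predictions'][0]['embeddings']['values'],
    )

chunk = to_content_chunk(line)
print(chunk['chunk_id'], len(chunk['embedding']))
```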

### Query Embedding

Create a query, or prompt, and get the embedding for it:

Connect to models for text embeddings. Learn more about the model API:
- [Vertex AI Text Embeddings API](../Embeddings/Vertex%20AI%20Text%20Embeddings%20API.ipynb)

In [21]:
question = "Does a lender have to perform servicing functions directly?"

In [22]:
embedder = vertexai.language_models.TextEmbeddingModel.from_pretrained('text-embedding-004')

In [23]:
question_embedding = embedder.get_embeddings([question])[0].values
question_embedding[0:10]

[-0.0005117303808219731,
 0.009651427157223225,
 0.01768726110458374,
 0.014538003131747246,
 -0.01829824410378933,
 0.027877431362867355,
 -0.021124685183167458,
 0.008830446749925613,
 -0.02669006586074829,
 0.06414774805307388]

---
## Setup AlloyDB

AlloyDB is a fully managed, PostgreSQL-compatible database service on Google Cloud designed for high performance and scalability. This workflow will guide you through creating an AlloyDB cluster, configuring it with a read/write instance, loading data, and running queries. While AlloyDB offers high availability and read replicas, this example focuses on a single-node cluster to demonstrate the fundamental setup. You'll learn about different AlloyDB instance types and how they influence performance and cost (see [pricing](https://cloud.google.com/alloydb/pricing)). This workflow utilizes a minimal configuration for demonstration purposes, keeping costs low. At the end of this notebook, you'll find instructions for shutting down and deleting resources to avoid ongoing charges.


The setup here is done with the [Python Client for AlloyDB](https://cloud.google.com/python/docs/reference/alloydb/latest).  Alternatively, the console and [Cloud SDK `gcloud alloydb`](https://cloud.google.com/sdk/gcloud/reference/alloydb) as well as clients in other languages can be used.


### AlloyDB Network

AlloyDB instances use public or private IPs on a [Virtual Private Cloud (VPC)](https://cloud.google.com/vpc/docs/overview).

In [None]:
if ALLOYDB_NETWORK != 'Default':
    try:
        client = compute.NetworksClient()
        request = compute.GetNetworkRequest(
            project = PROJECT_ID,
            network = ALLOYDB_NETWORK,
        )
        # raises an exception if the network cannot be retrieved
        client.get(request = request)
        print(f"Network '{ALLOYDB_NETWORK}' exists in project '{PROJECT_ID}'.")
    except Exception:
        print(f"Network '{ALLOYDB_NETWORK}' does not exist in project '{PROJECT_ID}'.\nPlease check the network name and, if needed, review the following link:\nhttps://cloud.google.com/vpc/docs/create-modify-vpc-networks")

### Create/Retrieve Cluster

The starting point for using AlloyDB is creating a cluster to hold our instance.  The network and an initial user are set at the cluster level.  The compute choices follow in the instance creation.

The documentation can be referenced for:
- [Create Clusters](https://cloud.google.com/alloydb/docs/cluster-create)
- [Python Client `.create_cluster()`](https://cloud.google.com/python/docs/reference/alloydb/latest/google.cloud.alloydb_v1.services.alloy_db_admin.AlloyDBAdminClient#google_cloud_alloydb_v1_services_alloy_db_admin_AlloyDBAdminClient_create_cluster)

In [None]:
try:
    alloydb_cluster = alloydb_client.get_cluster(
        name = f'projects/{PROJECT_ID}/locations/{REGION}/clusters/{ALLOYDB_CLUSTER_NAME}'
    )
    print(f"Found the cluster: {alloydb_cluster.name}")

except Exception:
    print('Creating a cluster ...')
    cluster = alloydb.Cluster(
        initial_user = alloydb.UserPassword(
            user = ALLOYDB_USER,
            password = ALLOYDB_PASS
        ),
    )
    # Conditionally add network config
    if ALLOYDB_NETWORK != 'Default':
        cluster.network_config = alloydb.Cluster.NetworkConfig(
            network = f"projects/{PROJECT_ID}/global/networks/{ALLOYDB_NETWORK}"
        )
    create_cluster = alloydb_client.create_cluster(
        parent = f"projects/{PROJECT_ID}/locations/{REGION}",
        cluster_id = ALLOYDB_CLUSTER_NAME,
        cluster = cluster
    )
    alloydb_cluster = create_cluster.result()
    print(f"Created the cluster: {alloydb_cluster.name}")

Found the cluster: projects/statmike-mlops-349915/locations/us-central1/clusters/statmike-mlops-349915


In [159]:
#alloydb_cluster

### Create/Retrieve Instance

Now that we have a cluster, the next step is creating a primary instance.  This is where compute size and location are selected. Later, the compute can still be [scaled up or down](https://cloud.google.com/alloydb/docs/instance-read-pool-scale).

The documentation can be referenced for:
- [Create primary instance](https://cloud.google.com/alloydb/docs/cluster-settings?resource=prim-instance)
- [Python Client `.create_instance()`](https://cloud.google.com/python/docs/reference/alloydb/latest/google.cloud.alloydb_v1.services.alloy_db_admin.AlloyDBAdminClient#google_cloud_alloydb_v1_services_alloy_db_admin_AlloyDBAdminClient_create_instance)


In [26]:
try:
    alloydb_instance = alloydb_client.get_instance(
        name = f"{alloydb_cluster.name}/instances/{ALLOYDB_INSTANCE_NAME}"
    )
    print(f"Found the instance: {alloydb_instance.name}")
except Exception:
    print('Creating an instance ...')
    create_instance = alloydb_client.create_instance(
        parent = alloydb_cluster.name,
        instance_id = ALLOYDB_INSTANCE_NAME,
        instance = alloydb.Instance(
            instance_type = alloydb.Instance.InstanceType.PRIMARY, # PRIMARY supports read/write, READ_POOL supports read only
            machine_config = alloydb.Instance.MachineConfig(cpu_count = 2),
            availability_type = alloydb.Instance.AvailabilityType.ZONAL, # ZONAL is a single zone, REGIONAL is multi-zone within the region (high availability)
            gce_zone = REGION+'-a', # a zone within the region, used with availability_type = ZONAL
        )
    )
    alloydb_instance = create_instance.result()
    print(f"Created the instance: {alloydb_instance.name}")

Found the instance: projects/statmike-mlops-349915/locations/us-central1/clusters/statmike-mlops-349915/instances/statmike-mlops-349915


In [160]:
#alloydb_instance

name: "projects/statmike-mlops-349915/locations/us-central1/clusters/statmike-mlops-349915/instances/statmike-mlops-349915"
uid: "2770ab1d-63b9-459b-969f-d20cf42ef5e5"
create_time {
  seconds: 1730380657
  nanos: 626971868
}
update_time {
  seconds: 1731189236
  nanos: 212953983
}
state: READY
instance_type: PRIMARY
machine_config {
  cpu_count: 2
}
availability_type: ZONAL
ip_address: "172.16.0.5"
writable_node {
  zone_id: "us-central1-a"
}
query_insights_config {
  record_application_tags: false
  record_client_address: false
  query_string_length: 1024
  query_plans_per_minute: 5
}
client_connection_config {
  ssl_config {
    ssl_mode: ENCRYPTED_ONLY
  }
}

### Create/Retrieve User

We need a user account to log in and use the AlloyDB instance.  An example user with a password was created along with the cluster above.  Below, the Python client is used to retrieve the user information.  In production, access should be configured carefully to protect the environment, with particular attention to:
- [Managing User Roles](https://cloud.google.com/alloydb/docs/database-users/about)
- [Managed IAM authentication](https://cloud.google.com/alloydb/docs/manage-iam-authn)
- [Managing Password Policies](https://cloud.google.com/alloydb/docs/manage-password-policy)


In [28]:
alloydb_client.get_user(name = f"{alloydb_cluster.name}/users/{ALLOYDB_USER}")

name: "projects/statmike-mlops-349915/locations/us-central1/clusters/statmike-mlops-349915/users/test_alloydb"
database_roles: "alloydbsuperuser"
user_type: ALLOYDB_BUILT_IN

### Connections To Databases

AlloyDB has a default database, [postgres](https://cloud.google.com/alloydb/docs/database-create), like any other PostgreSQL instance.

There are many ways to [connect to a database](https://cloud.google.com/alloydb/docs/connection-overview) depending on where and how you need to connect.

Here we want to use Python and will use the convenient [AlloyDB Language Connectors](https://cloud.google.com/alloydb/docs/language-connectors-overview).

That means we will [create a connector](https://cloud.google.com/alloydb/docs/connect-language-connectors) and interact with the database through the connector.  A connector has three parts:
- a **connection tool**, in this case provided by [AlloyDB Python Connector](https://github.com/GoogleCloudPlatform/alloydb-python-connector/tree/main)
- drivers to create a **connection pool**
    - synchronous with [pg8000](https://github.com/tlocke/pg8000)
    - asynchronous with [asyncpg](https://github.com/MagicStack/asyncpg)
- a client library that can use connection pools to **orchestrate SQL queries**, [SQLAlchemy](https://www.sqlalchemy.org/)
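
The three layers can be sketched without any cloud resources. The example below uses the standard library's `sqlite3` as a stand-in driver and a deliberately tiny, hypothetical `TinyPool` class in place of SQLAlchemy's pooling. None of these names come from the AlloyDB connector. It only illustrates how a zero-argument creator function feeds a pool and how a query helper borrows connections from that pool.

```python
import queue
import sqlite3
from contextlib import contextmanager

def make_creator(database):
    """Layer 1: return a zero-argument function that opens a new connection."""
    def getconn():
        return sqlite3.connect(database)
    return getconn

class TinyPool:
    """Layer 2: a minimal pool that creates connections lazily and reuses them."""
    def __init__(self, creator, size=2):
        self._creator = creator
        self._idle = queue.LifoQueue(maxsize=size)

    @contextmanager
    def connect(self):
        try:
            conn = self._idle.get_nowait()   # reuse an idle connection if one exists
        except queue.Empty:
            conn = self._creator()           # otherwise open a new one
        try:
            yield conn
        finally:
            try:
                self._idle.put_nowait(conn)  # hand the connection back to the pool
            except queue.Full:
                conn.close()                 # pool is full, so discard it

def run(pool, sql):
    """Layer 3: borrow a connection, run a query, return rows as dicts."""
    with pool.connect() as conn:
        cursor = conn.execute(sql)
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]

pool = TinyPool(make_creator(':memory:'))
rows = run(pool, "SELECT 'Success' AS did_it_work")
print(rows)
```

In this workflow the same roles are filled by the AlloyDB Python Connector (creator), the SQLAlchemy engine (pool), and the query helpers defined further below.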

#### Connection Tool

In [29]:
sync_connector = google.cloud.alloydb.connector.Connector()
async_connector = google.cloud.alloydb.connector.AsyncConnector()

#### Connection

In [30]:
def get_sync_conn(
    connector: google.cloud.alloydb.connector.Connector,
    db: str
):
    def getconn():
        conn = connector.connect(
            alloydb_instance.name,
            "pg8000",
            user = ALLOYDB_USER,
            password = ALLOYDB_PASS,
            db = db
        )
        return conn
    return getconn

In [31]:
async def get_async_conn(
    connector: google.cloud.alloydb.connector.AsyncConnector,
    db: str
):
    async def getconn():
        conn = await connector.connect(
            alloydb_instance.name,
            "asyncpg",
            user = ALLOYDB_USER,
            password = ALLOYDB_PASS,
            db = db
        )
        return conn
    return getconn

#### Connection Pool

In [32]:
def get_sync_pool(
    connector: google.cloud.alloydb.connector.Connector,
    db: str
) -> sqlalchemy.engine.Engine:

    pool = sqlalchemy.create_engine(
        "postgresql+pg8000://",
        creator = get_sync_conn(connector, db)
    )
    pool.dialect.description_encoding = None
    pool.execution_options(isolation_level="AUTOCOMMIT")
    return pool

In [33]:
async def get_async_pool(
    connector: google.cloud.alloydb.connector.AsyncConnector,
    db: str
) -> sqlalchemy.ext.asyncio.AsyncEngine:

    pool = sqlalchemy.ext.asyncio.create_async_engine(
        "postgresql+asyncpg://",
        async_creator = await get_async_conn(connector, db)
    )
    pool.dialect.description_encoding = None
    pool.execution_options(isolation_level="AUTOCOMMIT")
    return pool

In [34]:
sync_pool = get_sync_pool(sync_connector, 'postgres')

In [35]:
async_pool = await get_async_pool(async_connector, 'postgres')

#### Query Orchestrator

Use a pool as a context manager to orchestrate queries.

In [36]:
def run_query(query, pool = None, connector = sync_connector):
    # get the current connection pool:
    if pool is None:
        pool = sync_pool
        
    # run the query and get the response as 'result'
    with pool.connect().execution_options(isolation_level="AUTOCOMMIT") as connection:
        result = connection.execute(query)
        #connector.close()
        
    # prepare the response
    rows = []
    try:
        for row in result:
            rows.append(dict(zip(result.keys(), row)))
    except Exception:
        pass
    
    # return the response
    return rows[0] if len(rows) == 1 else rows

In [37]:
async def async_run_query(query, pool = None, connector = async_connector):
    # get the current connection pool
    if pool is None:
        pool = async_pool
        
    # run the query and get the response as 'result'
    async with pool.connect() as connection:
        result = await connection.execute(query)
        await connection.commit()
        #await connector.close()
        
    # prepare the response
    rows = []
    try:
        for row in result:
            rows.append(dict(zip(result.keys(), row)))
    except Exception:
        pass
    
    # return the response
    return rows[0] if len(rows) == 1 else rows

#### Execute A Test Query

When submitting SQL statements either of the connectors should work for DML (SELECT, INSERT, DELETE, UPDATE) but the synchronous connector should be preferred for DDL (CREATE, ALTER, DROP) statements.

In [38]:
query = sqlalchemy.text("SELECT 'Success' as did_it_work")

In [39]:
run_query(query)

{'did_it_work': 'Success'}

#### Execute Async Queries

Use [`asyncio`](https://docs.python.org/3/library/asyncio.html) to work with async queries.

In [40]:
await async_run_query(query)

{'did_it_work': 'Success'}

In [41]:
queries = [query]*5
tasks = [async_run_query(q) for q in queries]
results = await asyncio.gather(*tasks)
results

[{'did_it_work': 'Success'},
 {'did_it_work': 'Success'},
 {'did_it_work': 'Success'},
 {'did_it_work': 'Success'},
 {'did_it_work': 'Success'}]

---
## Working With AlloyDB

Now that a connection to PostgreSQL is established, the environment can be interacted with using SQL!

### Create A Database

**PostgreSQL References:**
- [CREATE DATABASE](https://www.postgresql.org/docs/current/sql-createdatabase.html)
- [The catalog `pg_database`](https://www.postgresql.org/docs/current/catalog-pg-database.html)

In [58]:
query = sqlalchemy.text(f"SELECT datname FROM pg_database WHERE datname = '{ALLOYDB_DATABASE_NAME}'")
result = run_query(query)
result

[]

In [59]:
if not result:
    query = sqlalchemy.text(f"CREATE DATABASE \"{ALLOYDB_DATABASE_NAME}\"")
    run_query(query)

In [60]:
query = sqlalchemy.text(f"SELECT * FROM pg_database WHERE datname = '{ALLOYDB_DATABASE_NAME}'")
run_query(query)

{'oid': 118285,
 'datname': 'applied-genai',
 'datdba': 16470,
 'encoding': 6,
 'datlocprovider': 'i',
 'datistemplate': False,
 'datallowconn': True,
 'datconnlimit': -1,
 'datfrozenxid': 720,
 'datminmxid': 1,
 'dattablespace': 1663,
 'datcollate': 'C',
 'datctype': 'C',
 'daticulocale': 'und-x-icu',
 'datcollversion': '153.112',
 'datacl': None}

### Move Connection To New Database

Note that the connection pool connects to a specific database.  Now that a new database is created we can switch the connection pool to it by first closing the existing connection pool and creating a new one.

Verify database for current connection:

In [61]:
run_query(sqlalchemy.text('SELECT current_database()'))

{'current_database': 'postgres'}

In [62]:
await async_run_query(sqlalchemy.text('SELECT current_database()'))

{'current_database': 'postgres'}

Close the current connection and create a new one:

In [42]:
sync_pool.dispose()
sync_connector.close()
sync_connector = google.cloud.alloydb.connector.Connector()
sync_pool = get_sync_pool(sync_connector, ALLOYDB_DATABASE_NAME)

await async_pool.dispose()
await async_connector.close()
async_connector = google.cloud.alloydb.connector.AsyncConnector()
async_pool = await get_async_pool(async_connector, ALLOYDB_DATABASE_NAME)

Verify the database of the new connection:

In [43]:
run_query(sqlalchemy.text('SELECT current_database()'))

{'current_database': 'applied-genai'}

In [44]:
await async_run_query(sqlalchemy.text('SELECT current_database()'))

{'current_database': 'applied-genai'}

### Create Table


**PostgreSQL References:**
- [CREATE TABLE statement](https://www.postgresql.org/docs/current/sql-createtable.html)
- [Information Schema Tables](https://www.postgresql.org/docs/current/information-schema.html)

In [66]:
result = run_query(sqlalchemy.text(f"SELECT * from information_schema.tables WHERE table_name = '{ALLOYDB_TABLE_NAME}'"))
result

[]

In [67]:
run_query(sqlalchemy.text(f"DROP TABLE IF EXISTS \"{ALLOYDB_TABLE_NAME}\""))

[]

In [68]:
run_query(
    sqlalchemy.text(f"""
            CREATE TABLE IF NOT EXISTS \"{ALLOYDB_TABLE_NAME}\" (
                chunk_id VARCHAR(100) NOT NULL PRIMARY KEY,
                gse VARCHAR(50),
                content TEXT,
                embedding REAL[]
            );
        """
    )
)

[]

In [69]:
result = run_query(sqlalchemy.text(f"SELECT * from information_schema.tables WHERE table_name = '{ALLOYDB_TABLE_NAME}'"))
result

{'table_catalog': 'applied-genai',
 'table_schema': 'public',
 'table_name': 'retrieval-alloydb',
 'table_type': 'BASE TABLE',
 'self_referencing_column_name': None,
 'reference_generation': None,
 'user_defined_type_catalog': None,
 'user_defined_type_schema': None,
 'user_defined_type_name': None,
 'is_insertable_into': 'YES',
 'is_typed': 'NO',
 'commit_action': None}

In [70]:
run_query(sqlalchemy.text(f"SELECT column_name, data_type FROM information_schema.columns WHERE table_name = '{ALLOYDB_TABLE_NAME}'"))

[{'column_name': 'embedding', 'data_type': 'ARRAY'},
 {'column_name': 'chunk_id', 'data_type': 'character varying'},
 {'column_name': 'gse', 'data_type': 'character varying'},
 {'column_name': 'content', 'data_type': 'text'}]

### Add, Retrieve, And Delete Rows

Learn about inserting, retrieving and deleting records/rows with the following simple examples.

#### Get A Record

Dictionaries for each record/row are stored in `content_chunks` from earlier in this workflow:

In [71]:
first_record = content_chunks[0]

In [72]:
first_record.keys()

dict_keys(['gse', 'chunk_id', 'content', 'embedding'])

In [73]:
first_record['chunk_id']

'fannie_part_0_c17'

#### Insert Row

In [74]:
table = sqlalchemy.Table(
    ALLOYDB_TABLE_NAME,
    sqlalchemy.MetaData(),
    autoload_with = sync_pool
)

In [75]:
for c in table.columns:
    print(c)

retrieval-alloydb.chunk_id
retrieval-alloydb.gse
retrieval-alloydb.content
retrieval-alloydb.embedding


In [76]:
insert_row = sqlalchemy.insert(table).values(first_record)

In [77]:
run_query(insert_row)

[]

#### Retrieve Row

There are two helpful ways to retrieve rows: with a SQL string, or with the SQLAlchemy client's `select` method.  Both are demonstrated here.

Using SQL:

In [78]:
query = sqlalchemy.text(f"SELECT * FROM \"{ALLOYDB_TABLE_NAME}\" WHERE chunk_id = '{first_record['chunk_id']}'")
result = run_query(query)

In [79]:
result.keys()

dict_keys(['chunk_id', 'gse', 'content', 'embedding'])

In [80]:
result['chunk_id']

'fannie_part_0_c17'

Using the sqlalchemy `select` construct:

In [81]:
query = sqlalchemy.select(table).where(table.columns.chunk_id == first_record['chunk_id'])
result = run_query(query)

In [82]:
result.keys()

dict_keys(['chunk_id', 'gse', 'content', 'embedding'])

In [83]:
result['chunk_id']

'fannie_part_0_c17'

In [84]:
type(result['embedding'])

list

In [85]:
result['embedding'][0:10]

[0.031277116,
 0.03056905,
 0.010865348,
 0.062361468,
 0.032286815,
 0.050661553,
 0.046544693,
 0.055096656,
 -0.014074751,
 0.0083804]

#### Delete Row

Delete the row added here.  Verify the action by counting the rows before and after the deletion.

In [86]:
run_query(sqlalchemy.text(f"SELECT COUNT(*) as count FROM \"{ALLOYDB_TABLE_NAME}\""))

{'count': 1}

In [87]:
run_query(sqlalchemy.text(f"DELETE FROM \"{ALLOYDB_TABLE_NAME}\" WHERE chunk_id = '{first_record['chunk_id']}'"))

[]

In [88]:
run_query(sqlalchemy.text(f"SELECT COUNT(*) as count FROM \"{ALLOYDB_TABLE_NAME}\""))

{'count': 0}

## Load Data 

There are many rows to load, but [using `asyncio`](https://docs.python.org/3/library/asyncio.html) with the async connection makes this easy to orchestrate.

Create a list of query statements:

In [89]:
queries = [sqlalchemy.insert(table).values(c) for c in content_chunks]

Create a list of tasks that will run the queries.  Do not `await` these yet.

In [90]:
tasks = [async_run_query(query) for query in queries]

Run all the tasks with `asyncio.gather` and await the result.

In [91]:
results = await asyncio.gather(*tasks)

Verify the results with a row count:

In [93]:
run_query(sqlalchemy.text(f"SELECT COUNT(*) as count FROM \"{ALLOYDB_TABLE_NAME}\""))

{'count': 9040}
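
Launching one task per row with `asyncio.gather` starts every insert at once.  If the connection pool or the database becomes a bottleneck, one option is to cap the number of in-flight queries with an `asyncio.Semaphore`.  The sketch below is a minimal, self-contained illustration: `fake_insert` is a hypothetical stand-in for `async_run_query`, and the limit of 10 is an arbitrary assumption, not a recommended setting.

```python
import asyncio

async def fake_insert(row_id: int) -> int:
    """Hypothetical stand-in for async_run_query: pretend to insert one row."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return row_id

async def bounded_gather(coroutine_factories, limit: int = 10):
    """Run the coroutines with at most `limit` in flight at the same time."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(factory):
        async with semaphore:
            return await factory()

    # asyncio.gather returns results in the same order as its inputs
    return await asyncio.gather(*(run_one(f) for f in coroutine_factories))

def load_all(n_rows: int, limit: int = 10):
    # Each factory creates its coroutine only when the semaphore admits it
    factories = [lambda i=i: fake_insert(i) for i in range(n_rows)]
    return asyncio.run(bounded_gather(factories, limit))

results = load_all(25, limit=10)
```

Inside a notebook, where an event loop is already running, `await bounded_gather(factories, limit)` would replace the `asyncio.run` call.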

---
## Setup AlloyDB for Vector Similarity Search

To store embeddings as vectors, and then index and match them, the database needs two extensions.

**Store Embeddings As Vectors**

[Google provides](https://cloud.google.com/alloydb/docs/ai/store-embeddings#required-extension) a version of [`pgvector`](https://github.com/pgvector/pgvector#indexing) named `vector` that includes functions and operators for working with vector values.  This also includes the `pgvector` indexing options of IVFFlat and HNSW.

```CREATE EXTENSION IF NOT EXISTS vector```

**Create Indexes For Vectors Using ScaNN**

Indexing of vectors allows for faster approximate search.  The `vector` extension above includes the `pgvector` IVFFlat and HNSW index types.  The Google-developed [ScaNN index](https://github.com/google-research/google-research/blob/master/scann/docs/algorithms.md) can be added as an extension named `alloydb_scann`.

```CREATE EXTENSION IF NOT EXISTS alloydb_scann```

In [94]:
run_query(sqlalchemy.text(f"CREATE EXTENSION IF NOT EXISTS vector"))

[]

In [95]:
run_query(sqlalchemy.text(f"CREATE EXTENSION IF NOT EXISTS alloydb_scann"))

[]

### Convert `embedding` Column To Vector Data Type

The data was loaded/inserted above with the embedding stored in a column named 'embedding' as an ARRAY of float values.  This column can now be converted to the vector type with the specific dimension using an `ALTER TABLE` command.

In [96]:
run_query(sqlalchemy.text(f"ALTER TABLE \"{ALLOYDB_TABLE_NAME}\" ALTER COLUMN embedding TYPE vector({len(question_embedding)});"))

[]
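
Because `vector(n)` has a fixed dimension, a conversion like the one above should only succeed when every stored array has the same length.  A quick client-side check on the source records can confirm this before altering the table.  This is a small sketch; `sample_records` is made-up data shaped like the entries of `content_chunks`.

```python
def embedding_dimensions(records):
    """Return the set of distinct embedding lengths found in the records."""
    return {len(record["embedding"]) for record in records}

# Hypothetical sample records shaped like the entries of content_chunks
sample_records = [
    {"chunk_id": "a", "embedding": [0.1, 0.2, 0.3]},
    {"chunk_id": "b", "embedding": [0.4, 0.5, 0.6]},
]

dims = embedding_dimensions(sample_records)
consistent = len(dims) == 1  # True when all embeddings share one dimension
```

With the real data, `embedding_dimensions(content_chunks)` would be expected to contain the single value `len(question_embedding)`.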

In [97]:
run_query(sqlalchemy.text(f"SELECT column_name, data_type FROM information_schema.columns WHERE table_name = '{ALLOYDB_TABLE_NAME}'"))

[{'column_name': 'embedding', 'data_type': 'USER-DEFINED'},
 {'column_name': 'chunk_id', 'data_type': 'character varying'},
 {'column_name': 'gse', 'data_type': 'character varying'},
 {'column_name': 'content', 'data_type': 'text'}]

In [98]:
query = sqlalchemy.text(f"SELECT * FROM \"{ALLOYDB_TABLE_NAME}\" WHERE chunk_id = '{first_record['chunk_id']}'")
result = run_query(query)
result.keys()

dict_keys(['chunk_id', 'gse', 'content', 'embedding'])

In [99]:
type(result['embedding'])

str

In [100]:
result['embedding'][0:100]

'[0.031277116,0.03056905,0.010865348,0.062361468,0.032286815,0.050661553,0.046544693,0.055096656,-0.0'
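
As the output above shows, after the conversion the embedding comes back as a string rather than a list.  The text form is a bracketed, comma-separated list of numbers, so it can be parsed with the standard library when a Python list is needed.  A minimal sketch, where `parse_vector` is a hypothetical helper name:

```python
import json

def parse_vector(text: str) -> list:
    """Convert a vector's text form, e.g. '[0.1,0.2]', into a list of floats."""
    return [float(value) for value in json.loads(text)]

embedding = parse_vector("[0.031277116,0.03056905,0.010865348]")
```

With the query result from this section, the call would be `parse_vector(result['embedding'])`.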

---
## Vector Similarity Search, Matching

This section covers using a vector similarity metric to find the nearest neighbors of a query vector while also taking advantage of indexing.  To understand similarity metrics and build the intuition for choosing one (choose dot product), check out [The Math of Similarity](../Embeddings/The%20Math%20of%20Similarity.ipynb).

### Check For Vector Indexes

At this point in the workflow no vector indexes have been created.  The following cells show how to check for indexes and will be reused later in the workflow to verify the details of indexes after they are created.

In [102]:
run_query(sqlalchemy.text(f"""
    SELECT *
    FROM pg_indexes 
    WHERE tablename = '{ALLOYDB_TABLE_NAME}'
"""))

{'schemaname': 'public',
 'tablename': 'retrieval-alloydb',
 'indexname': 'retrieval-alloydb_pkey',
 'tablespace': None,
 'indexdef': 'CREATE UNIQUE INDEX "retrieval-alloydb_pkey" ON public."retrieval-alloydb" USING btree (chunk_id)'}

### Brute Force Search - No Index

Without an index you can still use distance measures to find nearest neighbor matches through a brute force search that compares a query embedding to all rows.

Easily run a brute force (compare to all rows) match with a choice of distance measure using the [`pgvector` querying notation](https://github.com/pgvector/pgvector?tab=readme-ov-file#querying):
- `<=>` for Cosine distance
- `<->` for L2, Euclidean distance
- `<#>` for Dot product
    - this is actually the negative of the inner product
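
To make the three operators concrete, the sketch below computes the same quantities in plain Python.  The two vectors are made up for illustration and have unit length.  For unit-length vectors the measures are tied together: cosine distance equals 1 plus the negative inner product, and the L2 distance equals the square root of 2 plus twice the negative inner product, so all three produce the same ranking.  Whether the stored embeddings are normalized is an assumption to verify for a given embedding model.

```python
import math

def negative_inner_product(a, b):
    """What `<#>` returns: the inner product multiplied by -1."""
    return -sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    """What `<->` returns: the Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """What `<=>` returns: 1 minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

# Two unit-length example vectors (made up for illustration)
a = [1.0, 0.0]
b = [0.6, 0.8]

nip = negative_inner_product(a, b)  # smaller (more negative) means closer
l2 = l2_distance(a, b)              # smaller means closer
cos = cosine_distance(a, b)         # smaller means closer
```

Because every operator returns a value where smaller means closer, `ORDER BY` in ascending order returns the best matches first in each case.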

Dot product with `<#>`

In [103]:
run_query(sqlalchemy.text(f"""
    SELECT
        chunk_id,
        embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': -0.7099841833114624},
 {'chunk_id': 'freddie_part_4_c509', 'dot_product': -0.680526077747345},
 {'chunk_id': 'freddie_part_4_c510', 'dot_product': -0.6753296852111816},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': -0.6723706722259521},
 {'chunk_id': 'fannie_part_0_c326', 'dot_product': -0.66834956407547}]

Euclidean distance with `<->`

In [109]:
run_query(sqlalchemy.text(f"""
    SELECT
        chunk_id,
        embedding <-> '{question_embedding}' AS euclidean_distance
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY euclidean_distance
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'euclidean_distance': 0.7615658337594855},
 {'chunk_id': 'freddie_part_4_c509', 'euclidean_distance': 0.7992875100367289},
 {'chunk_id': 'freddie_part_4_c510', 'euclidean_distance': 0.8057848660615564},
 {'chunk_id': 'fannie_part_0_c353', 'euclidean_distance': 0.8094337265330812},
 {'chunk_id': 'fannie_part_0_c326', 'euclidean_distance': 0.8144253147417732}]

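Cosine distance with `<=>`.  The column alias in the query is `cosine_similarity`, but the returned values are distances (1 minus the cosine similarity), so smaller values are closer matches: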

In [113]:
run_query(sqlalchemy.text(f"""
    SELECT
        chunk_id,
        embedding <=> '{question_embedding}' AS cosine_similarity
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY cosine_similarity
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'cosine_similarity': 0.2899983636254655},
 {'chunk_id': 'freddie_part_4_c509', 'cosine_similarity': 0.31944424887732137},
 {'chunk_id': 'freddie_part_4_c510', 'cosine_similarity': 0.3246529458452222},
 {'chunk_id': 'fannie_part_0_c353', 'cosine_similarity': 0.32760391792511945},
 {'chunk_id': 'fannie_part_0_c326', 'cosine_similarity': 0.33164633285935186}]

### Brute Force Search With Pre-Filtering - No Index

Extending a brute force match with pre-filtering means including a `WHERE` clause to first filter to the rows that meet a desired condition:

Find the top 5 matches where the GSE is 'fannie':

In [111]:
run_query(sqlalchemy.text(f"""
    SELECT
        chunk_id,
        embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    WHERE gse = 'fannie'
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': -0.7099841833114624},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': -0.6723706722259521},
 {'chunk_id': 'fannie_part_0_c326', 'dot_product': -0.66834956407547},
 {'chunk_id': 'fannie_part_0_c92', 'dot_product': -0.6614338159561157},
 {'chunk_id': 'fannie_part_0_c240', 'dot_product': -0.6608578562736511}]

Find the top 5 matches where the GSE is 'freddie':

In [112]:
run_query(sqlalchemy.text(f"""
    SELECT
        chunk_id,
        embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    WHERE gse = 'freddie'
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'freddie_part_4_c509', 'dot_product': -0.680526077747345},
 {'chunk_id': 'freddie_part_4_c510', 'dot_product': -0.6753296852111816},
 {'chunk_id': 'freddie_part_4_c472', 'dot_product': -0.661984384059906},
 {'chunk_id': 'freddie_part_6_c439', 'dot_product': -0.6604534983634949},
 {'chunk_id': 'freddie_part_4_c558', 'dot_product': -0.6575403213500977}]
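
The difference between filtering before ranking (pre-filtering) and filtering after ranking (post-filtering) can be illustrated in plain Python.  This is a sketch on made-up rows, where `distance` plays the role of the `<#>` result; the function names are hypothetical.

```python
def top_k(rows, k):
    """Return the k rows with the smallest distance."""
    return sorted(rows, key=lambda row: row["distance"])[:k]

def pre_filter(rows, gse, k):
    """Filter first, then rank: returns k rows whenever k matching rows exist."""
    return top_k([row for row in rows if row["gse"] == gse], k)

def post_filter(rows, gse, k):
    """Rank first, then filter: may return fewer than k rows."""
    return [row for row in top_k(rows, k) if row["gse"] == gse]

# Made-up rows for illustration
rows = [
    {"chunk_id": "fannie_1", "gse": "fannie", "distance": -0.71},
    {"chunk_id": "freddie_1", "gse": "freddie", "distance": -0.68},
    {"chunk_id": "freddie_2", "gse": "freddie", "distance": -0.67},
    {"chunk_id": "fannie_2", "gse": "fannie", "distance": -0.66},
    {"chunk_id": "fannie_3", "gse": "fannie", "distance": -0.65},
]

pre = [row["chunk_id"] for row in pre_filter(rows, "fannie", k=3)]
post = [row["chunk_id"] for row in post_filter(rows, "fannie", k=3)]
```

Here `pre` holds three `fannie` rows, while `post` holds only the one `fannie` row that made the overall top three.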

### Create And Use An Index

Indexes make search across many rows more efficient by first matching partitions of rows and then only comparing to rows within those partitions.  This section covers [creating indexes](https://cloud.google.com/alloydb/docs/ai/store-index-query-vectors) and using them in queries.

- IVF: Inverted File Lists, a general approach where the quantization can be selected
    - partitions rows into lists, only searches the subset of lists closest to the query vector
    - fast build, low memory, slow query
    - can increase the number of lists used in searches at query time for greater recall
    - Reference: [AlloyDB Create An Index with IVF](https://cloud.google.com/alloydb/docs/ai/store-index-query-vectors?resource=ivf#create-index)
- IVFFlat: Inverted File Lists, specifically with flat quantization
    - partitions rows into lists, only searches the subset of lists closest to the query vector
    - fast build, low memory usage, slower query
    - can increase the number of lists used in search at query time for greater recall
    - Reference: [pgvector Indexing IVFFlat](https://github.com/pgvector/pgvector?tab=readme-ov-file#ivfflat)
    - Reference: [AlloyDB Create An Index With IVFFlat](https://cloud.google.com/alloydb/docs/ai/store-index-query-vectors?resource=ivfflat#create-index)
- [HNSW](https://arxiv.org/abs/1603.09320): Hierarchical Navigable Small World graphs
    - creates a multilayer graph
    - slower build, more memory, faster query
    - can increase the number of candidates in the search for greater recall
    - Reference: [pgvector Indexing HNSW](https://github.com/pgvector/pgvector?tab=readme-ov-file#hnsw)
    - Reference: [AlloyDB Create An Index With HNSW](https://cloud.google.com/alloydb/docs/ai/store-index-query-vectors?resource=hnsw#create-index)
- ScaNN: [Developed by Google](https://github.com/google-research/google-research/blob/master/scann/docs/algorithms.md)
    - tree-based quantization index
    - faster build with less memory than HNSW
    - faster query times
    - Reference: [AlloyDB Create An Index With ScaNN](https://cloud.google.com/alloydb/docs/ai/store-index-query-vectors?resource=scann#create-index)
    
The query optimizer will use indexes to speed up queries.  If multiple indexes are present on the embedding column, the optimizer selects the one it estimates to be cheapest for the query.  Some queries may still trigger full row scans, that is, brute force matching.  The examples below use [PostgreSQL `EXPLAIN ANALYZE`](https://www.postgresql.org/docs/current/sql-explain.html) to understand the impact of an index on different types of matching queries.
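
The inverted-file idea behind IVF and IVFFlat can be sketched in a few lines of plain Python.  This toy version is an illustration only, not how `pgvector` or AlloyDB implement it: the centroids are fixed by hand (real indexes learn them from the data), the vectors are made up, and the function names are hypothetical.  It shows why scanning more lists (more probes) can recover neighbors that a single probe misses.

```python
def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_lists(vectors, centroids):
    """Assign each vector to the list of its closest centroid (largest inner product)."""
    lists = {i: [] for i in range(len(centroids))}
    for vector_id, vector in vectors.items():
        best = max(range(len(centroids)),
                   key=lambda i: inner_product(vector, centroids[i]))
        lists[best].append(vector_id)
    return lists

def ivf_search(query, vectors, centroids, lists, probes, k):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    ranked = sorted(range(len(centroids)),
                    key=lambda i: -inner_product(query, centroids[i]))
    candidates = [vid for i in ranked[:probes] for vid in lists[i]]
    candidates.sort(key=lambda vid: -inner_product(query, vectors[vid]))
    return candidates[:k]

# Hand-picked centroids and made-up vectors (assumptions for illustration)
centroids = [[1.0, 0.0], [0.0, 1.0]]
vectors = {"a": [0.9, 0.1], "b": [0.8, 0.3], "c": [0.55, 0.6], "d": [0.1, 0.9]}
query = [0.8, 0.6]

lists = build_lists(vectors, centroids)
one_probe = ivf_search(query, vectors, centroids, lists, probes=1, k=2)
two_probes = ivf_search(query, vectors, centroids, lists, probes=2, k=2)
```

With one probe only the first list is scanned, so vector `c`, a true top-2 match that sits in the second list, is missed; with two probes every list is scanned and the exact answer is returned.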

#### IVF

Reference: [AlloyDB Create An Index with IVF](https://cloud.google.com/alloydb/docs/ai/store-index-query-vectors?resource=ivf#create-index)

Create the index:

In [46]:
run_query(sqlalchemy.text(f"""
    CREATE INDEX IF NOT EXISTS ivf_index
    ON \"{ALLOYDB_TABLE_NAME}\"
    USING ivf (embedding vector_ip_ops)
    WITH (lists = 100, quantizer = 'FLAT')
"""))

[]

Review the index details:

In [47]:
run_query(sqlalchemy.text(f"SELECT * FROM pg_indexes  WHERE tablename = '{ALLOYDB_TABLE_NAME}' AND indexname = 'ivf_index'"))

{'schemaname': 'public',
 'tablename': 'retrieval-alloydb',
 'indexname': 'ivf_index',
 'tablespace': None,
 'indexdef': 'CREATE INDEX ivf_index ON public."retrieval-alloydb" USING ivf (embedding vector_ip_ops) WITH (lists=\'100\', quantizer=\'FLAT\')'}

In [48]:
run_query(sqlalchemy.text('SELECT * FROM pg_stat_progress_create_index'))

[]

In [49]:
run_query(sqlalchemy.text('SELECT * FROM pg_stat_ann_indexes'))

[]

Run the dot product query, using `<#>`, now that the index exists:

In [50]:
run_query(sqlalchemy.text(f"""
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': -0.7099841833114624},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': -0.6723706722259521},
 {'chunk_id': 'freddie_part_4_c472', 'dot_product': -0.661984384059906},
 {'chunk_id': 'fannie_part_0_c92', 'dot_product': -0.6614338159561157},
 {'chunk_id': 'fannie_part_2_c417', 'dot_product': -0.6559132933616638}]

Use `EXPLAIN ANALYZE` to understand the query execution.  Note that the index was used:

In [51]:
result = run_query(sqlalchemy.text(f"""
EXPLAIN ANALYZE
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))
result[0:2] + result[-2:]

[{'QUERY PLAN': 'Limit  (cost=115.68..115.88 rows=5 width=27) (actual time=0.245..0.271 rows=5 loops=1)'},
 {'QUERY PLAN': '  ->  Index Scan using ivf_index on "retrieval-alloydb"  (cost=115.68..487.18 rows=9040 width=27) (actual time=0.244..0.268 rows=5 loops=1)'},
 {'QUERY PLAN': 'Planning Time: 0.088 ms'},
 {'QUERY PLAN': 'Execution Time: 0.298 ms'}]
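
When comparing several indexes and query options, it can be convenient to pull the key facts out of the `EXPLAIN ANALYZE` rows programmatically.  The sketch below assumes the rows arrive as a list of dictionaries with a `QUERY PLAN` key, as in the output above; `execution_time_ms` and `used_index` are hypothetical helper names.

```python
import re

def execution_time_ms(plan_rows):
    """Return the 'Execution Time' in milliseconds from EXPLAIN ANALYZE rows, or None."""
    pattern = re.compile(r"Execution Time: ([0-9.]+) ms")
    for row in plan_rows:
        match = pattern.search(row["QUERY PLAN"])
        if match:
            return float(match.group(1))
    return None

def used_index(plan_rows, index_name):
    """Report whether any plan row is an index scan on the named index."""
    return any(f"Index Scan using {index_name} " in row["QUERY PLAN"]
               for row in plan_rows)

# Rows copied from the output above
plan = [
    {'QUERY PLAN': 'Limit  (cost=115.68..115.88 rows=5 width=27) (actual time=0.245..0.271 rows=5 loops=1)'},
    {'QUERY PLAN': '  ->  Index Scan using ivf_index on "retrieval-alloydb"  (cost=115.68..487.18 rows=9040 width=27) (actual time=0.244..0.268 rows=5 loops=1)'},
    {'QUERY PLAN': 'Planning Time: 0.088 ms'},
    {'QUERY PLAN': 'Execution Time: 0.298 ms'},
]

elapsed = execution_time_ms(plan)
hit = used_index(plan, "ivf_index")
```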

Use the query option `ivf.probes = 10` to specify the number of partitions to scan:

In [52]:
run_query(sqlalchemy.text(f"""
SET LOCAL ivf.probes = 10;
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': -0.7099841833114624},
 {'chunk_id': 'freddie_part_4_c509', 'dot_product': -0.680526077747345},
 {'chunk_id': 'freddie_part_4_c510', 'dot_product': -0.6753296852111816},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': -0.6723706722259521},
 {'chunk_id': 'fannie_part_0_c326', 'dot_product': -0.66834956407547}]
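
The effect of scanning more partitions can be quantified as recall: the fraction of the exact (brute force) neighbors that the approximate search returned.  The sketch below uses the `chunk_id` values shown in this notebook's outputs; `recall_at_k` is a hypothetical helper name.

```python
def recall_at_k(exact_ids, approximate_ids):
    """Fraction of the exact nearest neighbors that the approximate search returned."""
    exact = set(exact_ids)
    return len(exact & set(approximate_ids)) / len(exact)

# Brute force (no index) top 5 from earlier in this section
brute_force = ["fannie_part_0_c352", "freddie_part_4_c509", "freddie_part_4_c510",
               "fannie_part_0_c353", "fannie_part_0_c326"]

# IVF index with the default number of probes
ivf_default = ["fannie_part_0_c352", "fannie_part_0_c353", "freddie_part_4_c472",
               "fannie_part_0_c92", "fannie_part_2_c417"]

# IVF index with ivf.probes = 10
ivf_probes_10 = ["fannie_part_0_c352", "freddie_part_4_c509", "freddie_part_4_c510",
                 "fannie_part_0_c353", "fannie_part_0_c326"]

recall_default = recall_at_k(brute_force, ivf_default)
recall_probes_10 = recall_at_k(brute_force, ivf_probes_10)
```

For this single query, the default setting recovers two of the five exact matches, while ten probes recovers all five.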

Use `EXPLAIN ANALYZE` to understand the impact of the query option and notice the longer execution time:

In [53]:
result = run_query(sqlalchemy.text(f"""
SET LOCAL ivf.probes = 10;
EXPLAIN ANALYZE
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))
result[0:2] + result[-2:]

[{'QUERY PLAN': 'Limit  (cost=1156.78..1158.23 rows=5 width=27) (actual time=1.328..1.379 rows=5 loops=1)'},
 {'QUERY PLAN': '  ->  Index Scan using ivf_index on "retrieval-alloydb"  (cost=1156.78..3784.42 rows=9040 width=27) (actual time=1.326..1.375 rows=5 loops=1)'},
 {'QUERY PLAN': 'Planning Time: 0.100 ms'},
 {'QUERY PLAN': 'Execution Time: 1.407 ms'}]

Add a filter, `gse = 'fannie'`, to the query and note that it still returns the requested number of matches.  This is pre-filtering.

In [54]:
run_query(sqlalchemy.text(f"""
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    WHERE gse = 'fannie'
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': -0.7099841833114624},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': -0.6723706722259521},
 {'chunk_id': 'fannie_part_0_c92', 'dot_product': -0.6614338159561157},
 {'chunk_id': 'fannie_part_2_c417', 'dot_product': -0.6559132933616638},
 {'chunk_id': 'fannie_part_0_c335', 'dot_product': -0.6521918177604675}]

Use `EXPLAIN ANALYZE` to see if the index is still used in the pre-filtering query.  Note that it is!

In [55]:
result = run_query(sqlalchemy.text(f"""
EXPLAIN ANALYZE
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    WHERE gse = 'fannie'
    ORDER BY dot_product
    LIMIT 5
"""))
result[0:2] + result[-2:]

[{'QUERY PLAN': 'Limit  (cost=115.68..116.27 rows=5 width=27) (actual time=0.316..0.344 rows=5 loops=1)'},
 {'QUERY PLAN': '  ->  Index Scan using ivf_index on "retrieval-alloydb"  (cost=115.68..472.38 rows=3032 width=27) (actual time=0.315..0.342 rows=5 loops=1)'},
 {'QUERY PLAN': 'Planning Time: 0.071 ms'},
 {'QUERY PLAN': 'Execution Time: 0.363 ms'}]

Drop the index:

In [56]:
run_query(sqlalchemy.text('DROP INDEX IF EXISTS ivf_index'))

[]

#### IVFFlat

Reference: 
- [pgvector Indexing IVFFlat](https://github.com/pgvector/pgvector?tab=readme-ov-file#ivfflat)
- [AlloyDB Create An Index With IVFFlat](https://cloud.google.com/alloydb/docs/ai/store-index-query-vectors?resource=ivfflat#create-index)

Create the index:

In [57]:
run_query(sqlalchemy.text(f"""
    CREATE INDEX IF NOT EXISTS ivfflat_index
    ON \"{ALLOYDB_TABLE_NAME}\"
    USING ivfflat (embedding vector_ip_ops)
    WITH (lists = 100)
"""))

[]

Review the index details:

In [58]:
run_query(sqlalchemy.text(f"SELECT * FROM pg_indexes  WHERE tablename = '{ALLOYDB_TABLE_NAME}' AND indexname = 'ivfflat_index'"))

{'schemaname': 'public',
 'tablename': 'retrieval-alloydb',
 'indexname': 'ivfflat_index',
 'tablespace': None,
 'indexdef': 'CREATE INDEX ivfflat_index ON public."retrieval-alloydb" USING ivfflat (embedding vector_ip_ops) WITH (lists=\'100\')'}

In [59]:
run_query(sqlalchemy.text('SELECT * FROM pg_stat_progress_create_index'))

[]

In [60]:
run_query(sqlalchemy.text('SELECT * FROM pg_stat_ann_indexes'))

[]

Run the dot product query, using `<#>`, now that the index exists:

In [61]:
run_query(sqlalchemy.text(f"""
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': -0.7099841833114624},
 {'chunk_id': 'freddie_part_4_c509', 'dot_product': -0.680526077747345},
 {'chunk_id': 'freddie_part_4_c510', 'dot_product': -0.6753296852111816},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': -0.6723706722259521},
 {'chunk_id': 'fannie_part_0_c326', 'dot_product': -0.66834956407547}]

Use `EXPLAIN ANALYZE` to understand the query execution.  Note that the index was used:

In [62]:
result = run_query(sqlalchemy.text(f"""
EXPLAIN ANALYZE
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))
result[0:2] + result[-2:]

[{'QUERY PLAN': 'Limit  (cost=115.68..115.88 rows=5 width=27) (actual time=0.221..0.253 rows=5 loops=1)'},
 {'QUERY PLAN': '  ->  Index Scan using ivfflat_index on "retrieval-alloydb"  (cost=115.68..487.18 rows=9040 width=27) (actual time=0.219..0.249 rows=5 loops=1)'},
 {'QUERY PLAN': 'Planning Time: 0.087 ms'},
 {'QUERY PLAN': 'Execution Time: 0.278 ms'}]

Use the query option `ivfflat.probes = 10` to specify the number of partitions to scan:

In [63]:
run_query(sqlalchemy.text(f"""
SET LOCAL ivfflat.probes = 10;
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': -0.7099841833114624},
 {'chunk_id': 'freddie_part_4_c509', 'dot_product': -0.680526077747345},
 {'chunk_id': 'freddie_part_4_c510', 'dot_product': -0.6753296852111816},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': -0.6723706722259521},
 {'chunk_id': 'fannie_part_0_c326', 'dot_product': -0.66834956407547}]

Use `EXPLAIN ANALYZE` to understand the impact of the query option and notice the longer execution time:

In [64]:
result = run_query(sqlalchemy.text(f"""
SET LOCAL ivfflat.probes = 10;
EXPLAIN ANALYZE
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))
result[0:2] + result[-2:]

[{'QUERY PLAN': 'Limit  (cost=1156.78..1158.23 rows=5 width=27) (actual time=0.920..0.943 rows=5 loops=1)'},
 {'QUERY PLAN': '  ->  Index Scan using ivfflat_index on "retrieval-alloydb"  (cost=1156.78..3784.42 rows=9040 width=27) (actual time=0.918..0.940 rows=5 loops=1)'},
 {'QUERY PLAN': 'Planning Time: 0.084 ms'},
 {'QUERY PLAN': 'Execution Time: 0.964 ms'}]

Add a filter, `gse = 'fannie'`, to the query and note that it still returns the requested number of matches.  This is pre-filtering.

In [65]:
run_query(sqlalchemy.text(f"""
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    WHERE gse = 'fannie'
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': -0.7099841833114624},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': -0.6723706722259521},
 {'chunk_id': 'fannie_part_0_c326', 'dot_product': -0.66834956407547},
 {'chunk_id': 'fannie_part_0_c92', 'dot_product': -0.6614338159561157},
 {'chunk_id': 'fannie_part_0_c240', 'dot_product': -0.6608578562736511}]

Use `EXPLAIN ANALYZE` to see if the index is still used in the pre-filtering query.  Note that it is!

In [66]:
result = run_query(sqlalchemy.text(f"""
EXPLAIN ANALYZE
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    WHERE gse = 'fannie'
    ORDER BY dot_product
    LIMIT 5
"""))
result[0:2] + result[-2:]

[{'QUERY PLAN': 'Limit  (cost=115.68..116.27 rows=5 width=27) (actual time=0.197..0.222 rows=5 loops=1)'},
 {'QUERY PLAN': '  ->  Index Scan using ivfflat_index on "retrieval-alloydb"  (cost=115.68..472.38 rows=3032 width=27) (actual time=0.196..0.220 rows=5 loops=1)'},
 {'QUERY PLAN': 'Planning Time: 0.080 ms'},
 {'QUERY PLAN': 'Execution Time: 0.241 ms'}]

Drop the index:

In [67]:
run_query(sqlalchemy.text('DROP INDEX IF EXISTS ivfflat_index'))

[]

#### HNSW

Reference:
- [pgvector Indexing HNSW](https://github.com/pgvector/pgvector?tab=readme-ov-file#hnsw)
- [AlloyDB Create An Index With HNSW](https://cloud.google.com/alloydb/docs/ai/store-index-query-vectors?resource=hnsw#create-index)

Create the index:

In [68]:
run_query(sqlalchemy.text(f"""
    CREATE INDEX IF NOT EXISTS hnsw_index
    ON \"{ALLOYDB_TABLE_NAME}\"
    USING hnsw (embedding vector_ip_ops)
    WITH (m = 10, ef_construction = 40)
"""))

[]

Review the index details:

In [69]:
run_query(sqlalchemy.text(f"SELECT * FROM pg_indexes  WHERE tablename = '{ALLOYDB_TABLE_NAME}' AND indexname = 'hnsw_index'"))

{'schemaname': 'public',
 'tablename': 'retrieval-alloydb',
 'indexname': 'hnsw_index',
 'tablespace': None,
 'indexdef': 'CREATE INDEX hnsw_index ON public."retrieval-alloydb" USING hnsw (embedding vector_ip_ops) WITH (m=\'10\', ef_construction=\'40\')'}

In [70]:
run_query(sqlalchemy.text('SELECT * FROM pg_stat_progress_create_index'))

[]

In [71]:
run_query(sqlalchemy.text('SELECT * FROM pg_stat_ann_indexes'))

[]

Run the dot product query, using `<#>`, now that the index exists:

In [72]:
run_query(sqlalchemy.text(f"""
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': -0.7099841833114624},
 {'chunk_id': 'freddie_part_4_c509', 'dot_product': -0.680526077747345},
 {'chunk_id': 'freddie_part_4_c510', 'dot_product': -0.6753296852111816},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': -0.6723706722259521},
 {'chunk_id': 'fannie_part_0_c326', 'dot_product': -0.66834956407547}]

Use `EXPLAIN ANALYZE` to understand the query execution.  Note that the index was used:

In [73]:
result = run_query(sqlalchemy.text(f"""
EXPLAIN ANALYZE
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))
result[0:2] + result[-2:]

[{'QUERY PLAN': 'Limit  (cost=100.38..102.98 rows=5 width=27) (actual time=0.597..0.620 rows=5 loops=1)'},
 {'QUERY PLAN': '  ->  Index Scan using hnsw_index on "retrieval-alloydb"  (cost=100.38..4809.38 rows=9040 width=27) (actual time=0.595..0.617 rows=5 loops=1)'},
 {'QUERY PLAN': 'Planning Time: 0.080 ms'},
 {'QUERY PLAN': 'Execution Time: 0.644 ms'}]

Use the query option `hnsw.ef_search = 80` to set the size of the candidate list kept during the search (HNSW is a graph, so there are no partitions to scan):

In [80]:
run_query(sqlalchemy.text(f"""
SET LOCAL hnsw.ef_search = 80;
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': -0.7099841833114624},
 {'chunk_id': 'freddie_part_4_c509', 'dot_product': -0.680526077747345},
 {'chunk_id': 'freddie_part_4_c510', 'dot_product': -0.6753296852111816},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': -0.6723706722259521},
 {'chunk_id': 'fannie_part_0_c326', 'dot_product': -0.66834956407547}]

Use `EXPLAIN ANALYZE` to understand the impact of the query option and notice the longer execution time:

In [79]:
result = run_query(sqlalchemy.text(f"""
SET LOCAL hnsw.ef_search = 80;
EXPLAIN ANALYZE
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))
result[0:2] + result[-2:]

[{'QUERY PLAN': 'Limit  (cost=100.38..102.98 rows=5 width=27) (actual time=1.008..1.032 rows=5 loops=1)'},
 {'QUERY PLAN': '  ->  Index Scan using hnsw_index on "retrieval-alloydb"  (cost=100.38..4809.38 rows=9040 width=27) (actual time=1.006..1.029 rows=5 loops=1)'},
 {'QUERY PLAN': 'Planning Time: 0.069 ms'},
 {'QUERY PLAN': 'Execution Time: 1.052 ms'}]

Add a filter, `gse = 'fannie'`, to the query and note that it still returns the requested number of matches.  This is pre-filtering.

In [118]:
run_query(sqlalchemy.text(f"""
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    WHERE gse = 'fannie'
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'dot_product': -0.7099841833114624},
 {'chunk_id': 'fannie_part_0_c353', 'dot_product': -0.6723706722259521},
 {'chunk_id': 'fannie_part_0_c326', 'dot_product': -0.66834956407547},
 {'chunk_id': 'fannie_part_0_c92', 'dot_product': -0.6614338159561157},
 {'chunk_id': 'fannie_part_0_c240', 'dot_product': -0.6608578562736511}]

Use `EXPLAIN ANALYZE` to see if the index is still used in the pre-filtering query.  Note that it is!

In [76]:
result = run_query(sqlalchemy.text(f"""
EXPLAIN ANALYZE
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    WHERE gse = 'fannie'
    ORDER BY dot_product
    LIMIT 5
"""))
result[0:2] + result[-2:]

[{'QUERY PLAN': 'Limit  (cost=100.38..108.15 rows=5 width=27) (actual time=0.842..0.872 rows=5 loops=1)'},
 {'QUERY PLAN': '  ->  Index Scan using hnsw_index on "retrieval-alloydb"  (cost=100.38..4816.95 rows=3032 width=27) (actual time=0.840..0.869 rows=5 loops=1)'},
 {'QUERY PLAN': 'Planning Time: 0.095 ms'},
 {'QUERY PLAN': 'Execution Time: 0.894 ms'}]

Drop the index:

In [120]:
run_query(sqlalchemy.text('DROP INDEX IF EXISTS hnsw_index'))

[]

#### ScaNN

Reference: [AlloyDB Create An Index With ScaNN](https://cloud.google.com/alloydb/docs/ai/store-index-query-vectors?resource=scann#create-index)

Create the index:

In [81]:
run_query(sqlalchemy.text(f"""
    CREATE INDEX IF NOT EXISTS scann_index
    ON \"{ALLOYDB_TABLE_NAME}\"
    USING scann (embedding dot_product)
    WITH (num_leaves = 100)
"""))

[]

Review the index details:

In [82]:
run_query(sqlalchemy.text(f"SELECT * FROM pg_indexes  WHERE tablename = '{ALLOYDB_TABLE_NAME}' AND indexname = 'scann_index'"))

{'schemaname': 'public',
 'tablename': 'retrieval-alloydb',
 'indexname': 'scann_index',
 'tablespace': None,
 'indexdef': 'CREATE INDEX scann_index ON public."retrieval-alloydb" USING scann (embedding dot_product) WITH (num_leaves=\'100\')'}

In [83]:
run_query(sqlalchemy.text('SELECT * FROM pg_stat_progress_create_index'))

[]

In [84]:
run_query(sqlalchemy.text('SELECT * FROM pg_stat_ann_indexes'))

{'relid': 118292,
 'indexrelid': 137025,
 'schemaname': 'public',
 'relname': 'retrieval-alloydb',
 'indexrelname': 'scann_index',
 'indextype': 'scann',
 'indexconfig': ['num_leaves=100'],
 'indexsize': '2168 kB',
 'indexscan': 0,
 'insertcount': 9041,
 'deletecount': 1,
 'updatecount': 0,
 'partitioncount': 100,
 'distribution': {'average': 165.9,
  'maximum': 517,
  'minimum': 42,
  'outliers': [517, 513, 436, 404, 351, 334, 333, 330, 310, 309]}}

Run the dot product query, using `<#>`, now that the index exists:

In [85]:
run_query(sqlalchemy.text(f"""
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c92', 'dot_product': -0.6614338159561157},
 {'chunk_id': 'fannie_part_2_c417', 'dot_product': -0.6559132933616638},
 {'chunk_id': 'fannie_part_2_c788', 'dot_product': -0.644848644733429},
 {'chunk_id': 'freddie_part_4_c472', 'dot_product': -0.661984384059906},
 {'chunk_id': 'fannie_part_2_c793', 'dot_product': -0.6455321311950684}]

Use `EXPLAIN ANALYZE` to understand the query execution.  Note that the index was used:

In [86]:
result = run_query(sqlalchemy.text(f"""
EXPLAIN ANALYZE
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))
result[0:2] + result[-2:]

[{'QUERY PLAN': 'Limit  (cost=2.51..6.52 rows=1 width=27) (actual time=0.258..0.293 rows=5 loops=1)'},
 {'QUERY PLAN': '  ->  Index Scan using scann_index on "retrieval-alloydb"  (cost=2.51..6.52 rows=1 width=27) (actual time=0.255..0.290 rows=5 loops=1)'},
 {'QUERY PLAN': 'Planning Time: 0.116 ms'},
 {'QUERY PLAN': 'Execution Time: 0.331 ms'}]

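Use the query options `scann.num_leaves_to_search` and `scann.pre_reordering_num_neighbors` to set the number of partitions (leaves) to scan and the number of candidate neighbors considered before reordering: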

In [87]:
run_query(sqlalchemy.text(f"""
SET LOCAL scann.num_leaves_to_search = 2;
SET LOCAL scann.pre_reordering_num_neighbors=50;
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'freddie_part_4_c509', 'dot_product': -0.680526077747345},
 {'chunk_id': 'freddie_part_4_c510', 'dot_product': -0.6753296852111816},
 {'chunk_id': 'freddie_part_4_c472', 'dot_product': -0.661984384059906},
 {'chunk_id': 'fannie_part_0_c92', 'dot_product': -0.6614338159561157},
 {'chunk_id': 'fannie_part_0_c240', 'dot_product': -0.6608578562736511}]

Use `EXPLAIN ANALYZE` to understand the impact of the query options:

In [88]:
result = run_query(sqlalchemy.text(f"""
SET LOCAL scann.num_leaves_to_search = 2;
SET LOCAL scann.pre_reordering_num_neighbors=50;
EXPLAIN ANALYZE
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))
result[0:2] + result[-2:]

[{'QUERY PLAN': 'Limit  (cost=16.36..16.74 rows=5 width=27) (actual time=0.497..0.511 rows=5 loops=1)'},
 {'QUERY PLAN': '  ->  Index Scan using scann_index on "retrieval-alloydb"  (cost=16.36..712.77 rows=9040 width=27) (actual time=0.494..0.508 rows=5 loops=1)'},
 {'QUERY PLAN': 'Planning Time: 0.075 ms'},
 {'QUERY PLAN': 'Execution Time: 0.540 ms'}]

Add a filter, `gse = 'fannie'`, to the query and note that it still returns the requested number of matches.  This is pre-filtering.

In [89]:
run_query(sqlalchemy.text(f"""
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    WHERE gse = 'fannie'
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c92', 'dot_product': -0.6614338159561157},
 {'chunk_id': 'fannie_part_2_c417', 'dot_product': -0.6559132933616638},
 {'chunk_id': 'fannie_part_2_c788', 'dot_product': -0.644848644733429},
 {'chunk_id': 'fannie_part_2_c793', 'dot_product': -0.6455321311950684},
 {'chunk_id': 'fannie_part_0_c240', 'dot_product': -0.6608578562736511}]
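
The difference between filtering before and after the nearest neighbor selection can be illustrated with a toy example. This is a conceptual sketch in plain Python with made-up rows; the function names are hypothetical and it does not represent how AlloyDB executes the query internally.

```python
# hypothetical toy data: (chunk_id, gse, distance), where a smaller distance is a better match
rows = [
    ('freddie_a', 'freddie', 0.10),
    ('freddie_b', 'freddie', 0.12),
    ('fannie_a', 'fannie', 0.15),
    ('freddie_c', 'freddie', 0.18),
    ('fannie_b', 'fannie', 0.20),
    ('fannie_c', 'fannie', 0.25),
]

def pre_filter(rows, gse, k):
    """Filter first, then take the k nearest: returns k rows when enough rows pass the filter."""
    kept = [r for r in rows if r[1] == gse]
    return sorted(kept, key=lambda r: r[2])[:k]

def post_filter(rows, gse, k):
    """Take the k nearest first, then filter: may return fewer than k rows."""
    nearest = sorted(rows, key=lambda r: r[2])[:k]
    return [r for r in nearest if r[1] == gse]

pre = pre_filter(rows, 'fannie', 3)
post = post_filter(rows, 'fannie', 3)
print([r[0] for r in pre])   # three 'fannie' rows
print([r[0] for r in post])  # only the 'fannie' rows that were already in the overall top 3
```

With post-filtering, matches are lost whenever rows that fail the filter occupy places in the top `k`, which is why getting the full requested number of matches back indicates pre-filtering.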

Use `EXPLAIN ANALYZE` to see if the index is still used in the pre-filtering query.  Note that it is!

In [90]:
result = run_query(sqlalchemy.text(f"""
EXPLAIN ANALYZE
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    WHERE gse = 'fannie'
    ORDER BY dot_product
    LIMIT 5
"""))
result[0:2] + result[-2:]

[{'QUERY PLAN': 'Limit  (cost=8.18..8.77 rows=5 width=27) (actual time=0.159..0.194 rows=5 loops=1)'},
 {'QUERY PLAN': '  ->  Index Scan using scann_index on "retrieval-alloydb"  (cost=8.18..364.88 rows=3032 width=27) (actual time=0.158..0.192 rows=5 loops=1)'},
 {'QUERY PLAN': 'Planning Time: 0.095 ms'},
 {'QUERY PLAN': 'Execution Time: 0.217 ms'}]

Drop the index:

In [91]:
run_query(sqlalchemy.text('DROP INDEX IF EXISTS scann_index'))

[]

### Queries With Multiple Indexes

The optimizer chooses whether to use an index, and when multiple indexes are present it selects the one most applicable to the query.  This example recreates the IVF, IVFFlat, HNSW, and ScaNN indexes and runs various queries to examine the choices the optimizer makes.

Create the indexes:

In [92]:
run_query(sqlalchemy.text(f"""
    CREATE INDEX IF NOT EXISTS ivf_index
    ON \"{ALLOYDB_TABLE_NAME}\"
    USING ivf (embedding vector_ip_ops)
    WITH (lists = 100, quantizer = 'FLAT')
"""))

[]

In [93]:
run_query(sqlalchemy.text(f"""
    CREATE INDEX IF NOT EXISTS ivfflat_index
    ON \"{ALLOYDB_TABLE_NAME}\"
    USING ivfflat (embedding vector_ip_ops)
    WITH (lists = 100)
"""))

[]

In [94]:
run_query(sqlalchemy.text(f"""
    CREATE INDEX IF NOT EXISTS hnsw_index
    ON \"{ALLOYDB_TABLE_NAME}\"
    USING hnsw (embedding vector_ip_ops)
    WITH (m = 10, ef_construction = 40)
"""))

[]

In [95]:
run_query(sqlalchemy.text(f"""
    CREATE INDEX IF NOT EXISTS scann_index
    ON \"{ALLOYDB_TABLE_NAME}\"
    USING scann (embedding dot_product)
    WITH (num_leaves = 100)
"""))

[]

Review the index details:

In [96]:
run_query(sqlalchemy.text(f"SELECT * FROM pg_indexes  WHERE tablename = '{ALLOYDB_TABLE_NAME}' AND indexname LIKE '%_index'"))

[{'schemaname': 'public',
  'tablename': 'retrieval-alloydb',
  'indexname': 'hnsw_index',
  'tablespace': None,
  'indexdef': 'CREATE INDEX hnsw_index ON public."retrieval-alloydb" USING hnsw (embedding vector_ip_ops) WITH (m=\'10\', ef_construction=\'40\')'},
 {'schemaname': 'public',
  'tablename': 'retrieval-alloydb',
  'indexname': 'ivf_index',
  'tablespace': None,
  'indexdef': 'CREATE INDEX ivf_index ON public."retrieval-alloydb" USING ivf (embedding vector_ip_ops) WITH (lists=\'100\', quantizer=\'FLAT\')'},
 {'schemaname': 'public',
  'tablename': 'retrieval-alloydb',
  'indexname': 'ivfflat_index',
  'tablespace': None,
  'indexdef': 'CREATE INDEX ivfflat_index ON public."retrieval-alloydb" USING ivfflat (embedding vector_ip_ops) WITH (lists=\'100\')'},
 {'schemaname': 'public',
  'tablename': 'retrieval-alloydb',
  'indexname': 'scann_index',
  'tablespace': None,
  'indexdef': 'CREATE INDEX scann_index ON public."retrieval-alloydb" USING scann (embedding dot_product) WITH (

Query with the dot product distance measure, `<#>`, which matches the measure the indexes were built with:

In [98]:
run_query(sqlalchemy.text(f"""
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c92', 'dot_product': -0.6614338159561157},
 {'chunk_id': 'fannie_part_2_c417', 'dot_product': -0.6559132933616638},
 {'chunk_id': 'fannie_part_2_c788', 'dot_product': -0.644848644733429},
 {'chunk_id': 'freddie_part_4_c472', 'dot_product': -0.661984384059906},
 {'chunk_id': 'fannie_part_2_c793', 'dot_product': -0.6455321311950684}]

Use `EXPLAIN ANALYZE` to understand the query execution.  Note which index was used:

In [99]:
result = run_query(sqlalchemy.text(f"""
EXPLAIN ANALYZE
    SELECT chunk_id, embedding <#> '{question_embedding}' AS dot_product
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY dot_product
    LIMIT 5
"""))
result[0:2] + result[-2:]

[{'QUERY PLAN': 'Limit  (cost=2.51..6.52 rows=1 width=27) (actual time=0.144..0.183 rows=5 loops=1)'},
 {'QUERY PLAN': '  ->  Index Scan using scann_index on "retrieval-alloydb"  (cost=2.51..6.52 rows=1 width=27) (actual time=0.142..0.180 rows=5 loops=1)'},
 {'QUERY PLAN': 'Planning Time: 0.085 ms'},
 {'QUERY PLAN': 'Execution Time: 0.209 ms'}]

#### Conditions Where Indexes Are Ignored - Forced Brute Force

There are conditions where the optimizer might not use an available index and instead scans every row for matches (brute force).

One such condition is using a different distance measure than the one specified when building the index.  Here the cosine distance operator, `<=>`, is used instead of dot product:

In [101]:
run_query(sqlalchemy.text(f"""
    SELECT chunk_id, embedding <=> '{question_embedding}' AS cosine_similarity
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY cosine_similarity
    LIMIT 5
"""))

[{'chunk_id': 'fannie_part_0_c352', 'cosine_similarity': 0.2899983636254655},
 {'chunk_id': 'freddie_part_4_c509', 'cosine_similarity': 0.31944424887732137},
 {'chunk_id': 'freddie_part_4_c510', 'cosine_similarity': 0.3246529458452222},
 {'chunk_id': 'fannie_part_0_c353', 'cosine_similarity': 0.32760391792511945},
 {'chunk_id': 'fannie_part_0_c326', 'cosine_similarity': 0.33164633285935186}]

Use `EXPLAIN ANALYZE` to understand the query execution.  Note that none of the indexes were used and the execution time is much higher:

In [103]:
result = run_query(sqlalchemy.text(f"""
EXPLAIN ANALYZE
    SELECT chunk_id, embedding <=> '{question_embedding}' AS cosine_similarity
    FROM \"{ALLOYDB_TABLE_NAME}\"
    ORDER BY cosine_similarity
    LIMIT 5
"""))
result[0:2] + result[-2:]

[{'QUERY PLAN': 'Limit  (cost=1149.01..1149.02 rows=1 width=27) (actual time=37.032..37.035 rows=5 loops=1)'},
 {'QUERY PLAN': '  ->  Sort  (cost=1149.01..1149.02 rows=1 width=27) (actual time=37.030..37.032 rows=5 loops=1)'},
 {'QUERY PLAN': 'Planning Time: 0.090 ms'},
 {'QUERY PLAN': 'Execution Time: 37.066 ms'}]
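
The two operators also return different quantities: in pgvector, `<#>` returns the negative inner product and `<=>` returns the cosine distance (1 minus the cosine similarity). For unit-length vectors the two differ by a constant of 1, which appears to be the case for these embeddings: `freddie_part_4_c509` scored about -0.6805 with `<#>` and about 0.3194 with `<=>`. A minimal sketch of that relationship in plain Python, using made-up vectors rather than the embeddings from this workflow:

```python
import math

def negative_inner_product(a, b):
    """The quantity returned by the <#> operator."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """The quantity returned by the <=> operator: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# hypothetical example vectors, scaled to unit length
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

nip = negative_inner_product(a, b)
cd = cosine_distance(a, b)
# for unit-length vectors: cosine distance = 1 + negative inner product
print(math.isclose(cd, 1 + nip))
```

Both measures rank unit-length vectors in the same order, but the optimizer still treats them as different operators, so an index built for one is not used for the other.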

---
## Retrieval Augmented Generation (RAG)

Build a simple retrieval augmented generation process that enhances a query by retrieving context.  This is done here by constructing three functions for the stages:
- `retrieve` - a function that uses an embedding to search for matching pieces of text to use as context
    - this uses the system built earlier in this workflow!
- `augment` - prepare chunks into a prompt
- `generate` - make the llm request with the augmented prompt

A final function executes the RAG workflow:
- `rag` - a function that receives the query and orchestrates the workflow through `retrieve` > `augment` > `generate`

### Clients

In [115]:
embedder = vertexai.language_models.TextEmbeddingModel.from_pretrained('text-embedding-004')
llm = vertexai.generative_models.GenerativeModel("gemini-1.5-flash-002")

### Retrieve Function

In [116]:
def retrieve_alloydb(query_embedding, n_matches = 5):
    
    matches = run_query(
        sqlalchemy.text(
            f"""
                SELECT chunk_id, content
                FROM \"{ALLOYDB_TABLE_NAME}\"
                ORDER BY embedding <#> '{query_embedding}'
                LIMIT {n_matches}
            """)
    )
    
    return matches

### Augment Function

In [117]:
def augment(matches):

    prompt = ''
    for m, match in enumerate(matches):
        prompt += f"Context {m+1}:\n{match['content']}\n\n"
    prompt += 'Answer the following question using the provided contexts:\n'

    return prompt
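
To see the shape of the prompt this produces, the function can be run on a couple of made-up matches. This sketch repeats the `augment` definition so that it runs on its own; the `sample_matches` rows and the question are hypothetical and only mimic the `chunk_id` and `content` keys returned by the retrieve function.

```python
def augment(matches):
    # number each retrieved chunk and place it ahead of the instruction
    prompt = ''
    for m, match in enumerate(matches):
        prompt += f"Context {m+1}:\n{match['content']}\n\n"
    prompt += 'Answer the following question using the provided contexts:\n'
    return prompt

# hypothetical matches, shaped like the rows returned by the retrieve function
sample_matches = [
    {'chunk_id': 'example_c1', 'content': 'First example passage.'},
    {'chunk_id': 'example_c2', 'content': 'Second example passage.'},
]

# the question is appended after the instruction, as done in the RAG function
example_prompt = augment(sample_matches) + 'What do the passages say?'
print(example_prompt)
```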

### Generate Function

In [118]:
def generate(prompt):

    result = llm.generate_content(prompt)

    return result

### RAG Function

In [119]:
def rag(query):
    
    query_embedding = embedder.get_embeddings([query])[0].values
    matches = retrieve_alloydb(query_embedding)
    prompt = augment(matches) + query
    result = generate(prompt)
    
    return result.text

### Example In Use

In [122]:
question

'Does a lender have to perform servicing functions directly?'

In [121]:
print(rag(question))

No, a lender does not have to perform servicing functions directly.  Several contexts demonstrate this:

* **Context 1** indicates that a lender's prior servicer *may* become Fannie Mae's servicer, implying that the lender isn't obligated to service the loan themselves.  The lender can assign servicing concurrently with the mortgage sale or through post-delivery transfers.

* **Context 2** describes a "Servicing Marketplace" where lenders sell servicing rights to servicers through purchase and sale agreements.  This explicitly shows the separation of lending and servicing functions.

* **Context 4** states that a seller (presumably a lender) that is *not* a servicer must enter into a concurrent transfer of servicing to another entity.

* **Context 3** details a purchase and sale agreement for servicing rights, where a servicer (not necessarily the lender) purchases the rights from the seller.  The agreement even allows for a servicer to use a third party (subservicer) on its behalf.




---
### Profiling Performance

Profile the timing of each step in the RAG function for sequential calls. The environment chosen for this workflow is a minimal testing environment, so load testing (simultaneous requests) would not be helpful.

In [146]:
profile = []

In [147]:
async def rag(query, profile = profile):
    
    timings = {}
    start_time = time.time()
    
    
    # 1. Get embeddings
    embedding_start = time.time()
    query_embedding = embedder.get_embeddings([query])[0].values
    timings['embedding'] = time.time() - embedding_start

    # 2. Retrieve from AlloyDB
    retrieval_start = time.time()
    matches = retrieve_alloydb(query_embedding)
    timings['retrieve_alloydb'] = time.time() - retrieval_start

    # 3. Augment the prompt
    augment_start = time.time()
    prompt = augment(matches) + query
    timings['augment'] = time.time() - augment_start

    # 4. Generate text
    generate_start = time.time()
    result = generate(prompt)
    timings['generate'] = time.time() - generate_start

    total_time = time.time() - start_time
    timings['total'] = total_time
    
    profile.append(timings)
    
    return result.text
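
The repeated start/stop bookkeeping in the function above could also be wrapped in a context manager. This is an optional sketch of an alternative, not the approach used in this workflow: `timed` is a hypothetical helper, `stage_timings` is an example dictionary, and `time.perf_counter` is swapped in for `time.time` because it is designed for measuring short durations.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(timings, key):
    """Record the elapsed seconds of the enclosed block in timings[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # runs even if the block raises, so partial timings are still captured
        timings[key] = time.perf_counter() - start

# stand-in work in place of the embedding, retrieval, and generation calls
stage_timings = {}
with timed(stage_timings, 'total'):
    with timed(stage_timings, 'sleep'):
        time.sleep(0.01)
    with timed(stage_timings, 'sum'):
        sum(range(1000))

print(sorted(stage_timings))
```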

In [148]:
print(await rag(question))

No, the provided text indicates that lenders do not have to perform servicing functions directly.  Context 1 states that a lender's prior servicer may become Fannie Mae's servicer, implying a separation of lending and servicing functions.  Context 2 discusses purchase and sale agreements between lenders and servicers, showing that servicing can be handled by a separate entity.  Context 4 explicitly states that a seller (which can be a lender) that is not also a servicer must transfer servicing to another entity.  Finally, Context 3 describes a scenario where Fannie Mae facilitates the transfer of servicing rights, further highlighting that the lender is not necessarily the servicer.



In [149]:
profile

[{'embedding': 0.08630251884460449,
  'retrieve_alloydb': 0.008429527282714844,
  'augment': 2.002716064453125e-05,
  'generate': 1.1318769454956055,
  'total': 1.2266345024108887}]

In [150]:
for i in range(100):
    response = await rag(question)

### Report From Profile

In [151]:
all_timings = {}
for timings in profile:
    for key, value in timings.items():
        if key not in all_timings:
            all_timings[key] = []
        all_timings[key].append(value)

In [152]:
for key, values in all_timings.items():
    arr = np.array(values)
    print(f"Statistics for '{key}':")
    print(f"  Min: {np.min(arr):.4f} seconds")
    print(f"  Max: {np.max(arr):.4f} seconds")
    print(f"  Mean: {np.mean(arr):.4f} seconds")
    print(f"  Median: {np.median(arr):.4f} seconds")
    print(f"  Std Dev: {np.std(arr):.4f} seconds")
    print(f"  P95: {np.percentile(arr, 95):.4f} seconds")
    print(f"  P99: {np.percentile(arr, 99):.4f} seconds")
    print("")

Statistics for 'embedding':
  Min: 0.0472 seconds
  Max: 0.1105 seconds
  Mean: 0.0603 seconds
  Median: 0.0541 seconds
  Std Dev: 0.0133 seconds
  P95: 0.0898 seconds
  P99: 0.0997 seconds

Statistics for 'retrieve_alloydb':
  Min: 0.0053 seconds
  Max: 0.0164 seconds
  Mean: 0.0062 seconds
  Median: 0.0058 seconds
  Std Dev: 0.0015 seconds
  P95: 0.0070 seconds
  P99: 0.0145 seconds

Statistics for 'augment':
  Min: 0.0000 seconds
  Max: 0.0001 seconds
  Mean: 0.0000 seconds
  Median: 0.0000 seconds
  Std Dev: 0.0000 seconds
  P95: 0.0000 seconds
  P99: 0.0000 seconds

Statistics for 'generate':
  Min: 0.7651 seconds
  Max: 2.0540 seconds
  Mean: 1.2197 seconds
  Median: 1.1426 seconds
  Std Dev: 0.2604 seconds
  P95: 1.8170 seconds
  P99: 1.9223 seconds

Statistics for 'total':
  Min: 0.8568 seconds
  Max: 2.1156 seconds
  Mean: 1.2862 seconds
  Median: 1.2241 seconds
  Std Dev: 0.2596 seconds
  P95: 1.8792 seconds
  P99: 2.0201 seconds
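
To put the stages in proportion, the mean time of each stage can be expressed as a share of the mean total. This short sketch copies the mean values from the report above into a dictionary (the variable names are illustrative) rather than reading them from `profile`:

```python
# mean seconds per stage, copied from the report above
mean_seconds = {
    'embedding': 0.0603,
    'retrieve_alloydb': 0.0062,
    'augment': 0.0000,
    'generate': 1.2197,
}
total_mean = 1.2862

# share of the mean total time spent in each stage
shares = {stage: seconds / total_mean for stage, seconds in mean_seconds.items()}
for stage, share in shares.items():
    print(f"{stage}: {share:.1%} of mean total time")
```

In this run the generation step dominates the total time, while retrieval from AlloyDB accounts for well under one percent.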



## Remove Resources

In [44]:
# can't drop the database of an active connection, switch connection to postgres (default) database

#sync_pool.dispose()
#sync_connector.close()
#sync_connector = google.cloud.alloydb.connector.Connector()
#sync_pool = get_sync_pool(sync_connector, 'postgres')

#await async_pool.dispose()
#await async_connector.close()
#async_connector = google.cloud.alloydb.connector.AsyncConnector()
#async_pool = await get_async_pool(async_connector, 'postgres')

#query = sqlalchemy.text(f"DROP DATABASE IF EXISTS \"{ALLOYDB_DATABASE_NAME}\"")
#run_query(query)

In [56]:
#delete_instance = alloydb_client.delete_instance(name = alloydb_instance.name)
#delete_instance.result()

<google.api_core.operation.Operation at 0x7f918e7714e0>

In [60]:
#delete_cluster = alloydb_client.delete_cluster(request = dict(name = alloydb_cluster.name, force = True))
#delete_cluster.result()

<google.api_core.operation.Operation at 0x7f918e6dd0f0>