# Implementing Recommendation Engines with Matching Engine

### VPC Network peering
Matching Engine is a high-performance vector-matching service that requires a separate, peered VPC network to ensure performance.

Setting up the peering network is a one-time step; the setup instructions are linked below.

**Once the network is created, make sure the notebook instance running this notebook is in that subnetwork. See https://cloud.google.com/vertex-ai/docs/matching-engine/match-eng-setup**

Steps in this notebook:
1. Build and deploy a brute-force (BF) index and an approximate nearest neighbor (ANN) index
2. Compare the recall of the ANN index against the BF index

Note: BF always achieves 100% recall, but at the cost of speed and computational complexity.

Here's a good benchmark of Matching Engine (ScaNN is the underlying algorithm):

![](https://1.bp.blogspot.com/--mbMV8fQY28/XxsvbGL_l-I/AAAAAAAAGQ0/Br9B3XGnBa07barUxC4XTi8hSDxYzwAEgCLcBGAsYHQ/s640/image5.png)

## Load env config

In [1]:
CREATE_NEW_ASSETS = 'True' # 'True' | 'False'

In [2]:
# naming convention for all cloud resources
VERSION        = "v1"                  # TODO
PREFIX         = f'ndr-{VERSION}'      # TODO

print(f"PREFIX = {PREFIX}")

PREFIX = ndr-v1


In [3]:
# staging GCS
GCP_PROJECTS             = !gcloud config get-value project
PROJECT_ID               = GCP_PROJECTS[0]

# GCS bucket and paths
BUCKET_NAME              = f'{PREFIX}-{PROJECT_ID}-bucket'
BUCKET_URI               = f'gs://{BUCKET_NAME}'

config = !gsutil cat {BUCKET_URI}/config/notebook_env.py
print(config.n)
exec(config.n)


PROJECT_ID               = "hybrid-vertex"
PROJECT_NUM              = "934903580331"
LOCATION                 = "us-central1"

REGION                   = "us-central1"
BQ_LOCATION              = "US"
VPC_NETWORK_NAME         = "ucaip-haystack-vpc-network"

VERTEX_SA                = "934903580331-compute@developer.gserviceaccount.com"

PREFIX                   = "ndr-v1"
VERSION                  = "v1"

APP                      = "sp"
MODEL_TYPE               = "2tower"
FRAMEWORK                = "tfrs"
DATA_VERSION             = "v1"
TRACK_HISTORY            = "5"

BUCKET_NAME              = "ndr-v1-hybrid-vertex-bucket"
BUCKET_URI               = "gs://ndr-v1-hybrid-vertex-bucket"
SOURCE_BUCKET            = "spotify-million-playlist-dataset"

DATA_GCS_PREFIX          = "data"
DATA_PATH                = "gs://ndr-v1-hybrid-vertex-bucket/data"
VOCAB_SUBDIR             = "vocabs"
VOCAB_FILENAME           = "vocab_dict.pkl"

CANDIDATE_PREFIX         = "candidates"
TRAIN_DIR_PREFIX      

In [4]:
# local-train-v1/run-20230919-150451/candidates/candidate_embeddings.json

EXPERIMENT_NAME   = "local-train-v1"      # TODO
RUN_NAME          = "run-20230919-150451" # TODO

RUN_DIR_PATH = f'{EXPERIMENT_NAME}/{RUN_NAME}'

print(f"EXPERIMENT_NAME : {EXPERIMENT_NAME}")
print(f"RUN_NAME        : {RUN_NAME}")
print(f"RUN_DIR_PATH    : {RUN_DIR_PATH}")

EXPERIMENT_NAME : local-train-v1
RUN_NAME        : run-20230919-150451
RUN_DIR_PATH    : local-train-v1/run-20230919-150451


In [6]:
# GCP_PROJECTS = !gcloud config get-value project
# PROJECT_ID = GCP_PROJECTS[0]

# PROJECT_NUM = !gcloud projects list --filter="$PROJECT_ID" --format="value(PROJECT_NUMBER)"
# PROJECT_NUM = PROJECT_NUM[0]

# LOCATION = 'us-central1'

# # gs://jt-tfrs-central/new-50e-full-jtv12/run-20230110-150417/candidates-index-local
# RUN_DIR_GCS_PATH = f'{BUCKET_URI}/{RUN_DIR_PATH}'

# # VERSION = '50e_65m_demo'

# print(f"PROJECT_ID: {PROJECT_ID}")
# print(f"PROJECT_NUM: {PROJECT_NUM}")
# print(f"LOCATION: {LOCATION}")
# print(f"BUCKET_URI: {BUCKET_URI}")
# print(f"RUN_DIR_GCS_PATH: {RUN_DIR_GCS_PATH}")

In [35]:
import os
import sys
from google.cloud import aiplatform as vertex_ai

from src.two_tower_jt import test_instances_v2 as test_instances

vertex_ai.init(project=PROJECT_ID, location=LOCATION)

### Create a matching engine index

Matching Engine builds an index from the file of candidate embeddings created in the previous notebook.

Most of Matching Engine's optimization options are in the Tree-AH index settings; test different values for your use case.

Recall that we saved the two-tower models and the candidate embeddings (newline-delimited JSON) in a `candidates` folder.

In [8]:
# EMBEDDINGS_INITIAL_URI = f'{BUCKET_URI}/{RUN_DIR_PATH}/candidates-index-local/'
EMBEDDINGS_INITIAL_URI = f'{BUCKET_URI}/{RUN_DIR_PATH}/candidates/' # tmp - TODO

print(f"EMBEDDINGS_INITIAL_URI: {EMBEDDINGS_INITIAL_URI}")

EMBEDDINGS_INITIAL_URI: gs://ndr-v1-hybrid-vertex-bucket/local-train-v1/run-20230919-150451/candidates/


`EMBEDDINGS_INITIAL_URI` should point to a folder that contains only the candidate embeddings JSON file:

In [9]:
! gsutil ls $EMBEDDINGS_INITIAL_URI

gs://ndr-v1-hybrid-vertex-bucket/local-train-v1/run-20230919-150451/candidates/candidate_embeddings.json
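For reference, a minimal sketch of how such a file can be produced, assuming Matching Engine's JSON input format in which each line is one record with a string `id` and a list-of-floats `embedding` (check the Vertex AI docs for optional fields such as restricts). The helper names `to_jsonl_lines` and `write_jsonl` are hypothetical. The sketch also decodes `bytes` IDs, because IDs written straight from a bytes tensor end up stored as `b'spotify:track:...'`, which is what the match responses later in this notebook return.

```python
import json
from typing import Iterable, List, Sequence, Tuple, Union


def to_jsonl_lines(
    items: Iterable[Tuple[Union[str, bytes], Sequence[float]]],
    dimensions: int,
) -> List[str]:
    """Serialize (id, embedding) pairs as one JSON record per line."""
    lines = []
    for item_id, embedding in items:
        # decode bytes IDs so they are stored as "spotify:track:..." instead of "b'spotify:track:...'"
        if isinstance(item_id, bytes):
            item_id = item_id.decode("utf-8")
        # every vector must match the index's `dimensions` setting
        if len(embedding) != dimensions:
            raise ValueError(f"{item_id}: expected {dimensions} dims, got {len(embedding)}")
        lines.append(json.dumps({"id": item_id, "embedding": [float(x) for x in embedding]}))
    return lines


def write_jsonl(lines: List[str], path: str) -> None:
    """Write the records to a local file (upload it to GCS separately, e.g. with gsutil cp)."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
```

Validating the dimension while writing catches a mismatch before the index build, which otherwise only fails after a long-running operation.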


### Create ANN index

In [10]:
# ANN index config
APPROX_NEIGHBORS           = 50
DISTANCE_MEASURE           = "DOT_PRODUCT_DISTANCE"
LEAF_NODE_EMB_COUNT        = 500
LEAF_NODES_SEARCH_PERCENT  = 7
DIMENSIONS                 = 128 # must match the embedding dimension output by the two-tower model

DISPLAY_NAME               = f"tfrs_{DIMENSIONS}dim_{VERSION}"
BF_DISPLAY_NAME            = f"{DISPLAY_NAME}_bf"

# labels
DATA_REGIME                = 'full-65m'

# =============================== #
# included in env-setup.ipynb     #
# =============================== #
# 'new-50e-full-jtv12/run-20230110-150417'
# EXPERIMENT_NAME            = 'new-50e-full-jtv12'
# EXPERIMENT_RUN             = 'run-20230110-150417'

print(f"DISPLAY_NAME    : {DISPLAY_NAME}")
print(f"BF_DISPLAY_NAME : {BF_DISPLAY_NAME}")

DISPLAY_NAME    : tfrs_128dim_v1
BF_DISPLAY_NAME : tfrs_128dim_v1_bf
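To get a feel for what `LEAF_NODE_EMB_COUNT` and `LEAF_NODES_SEARCH_PERCENT` mean, here is a back-of-envelope sketch. Tree-AH groups the embeddings into leaf nodes of `leaf_node_embedding_count` vectors and, by default, a query searches `leaf_nodes_to_search_percent` of those leaves. The sketch assumes every leaf holds exactly that many embeddings, and the 2,000,000 corpus size is hypothetical (substitute the line count of `candidate_embeddings.json`).

```python
import math


def tree_ah_search_budget(num_embeddings: int, leaf_node_emb_count: int,
                          leaf_nodes_search_percent: float) -> dict:
    """Rough estimate of how much of a Tree-AH index a single query scans.

    Simplifying assumption: every leaf holds exactly `leaf_node_emb_count` embeddings.
    """
    num_leaves = math.ceil(num_embeddings / leaf_node_emb_count)
    # at least one leaf is always searched
    leaves_searched = max(1, math.ceil(num_leaves * leaf_nodes_search_percent / 100))
    embeddings_scanned = min(num_embeddings, leaves_searched * leaf_node_emb_count)
    return {
        "num_leaves": num_leaves,
        "leaves_searched": leaves_searched,
        "embeddings_scanned": embeddings_scanned,
        "fraction_scanned": embeddings_scanned / num_embeddings,
    }


# hypothetical corpus size; the other values mirror the config cell above
budget = tree_ah_search_budget(2_000_000, leaf_node_emb_count=500, leaf_nodes_search_percent=7)
print(budget)  # 4000 leaves, 280 searched, 140000 embeddings scanned (7%)
```

Raising `leaf_nodes_search_percent` scans more of the corpus per query, which typically trades latency for recall.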


> *Note: setting `sync=False` will allow us to proceed with the notebook while these operations complete*

In [12]:
if CREATE_NEW_ASSETS == 'True':
    
    tree_ah_index = vertex_ai.MatchingEngineIndex.create_tree_ah_index(
        display_name=DISPLAY_NAME,
        contents_delta_uri=EMBEDDINGS_INITIAL_URI,
        dimensions=DIMENSIONS,
        approximate_neighbors_count=APPROX_NEIGHBORS,
        distance_measure_type=DISTANCE_MEASURE,
        leaf_node_embedding_count=LEAF_NODE_EMB_COUNT,
        leaf_nodes_to_search_percent=LEAF_NODES_SEARCH_PERCENT,
        description="Songs embeddings from the Spotify million playlist dataset",
        sync=False,
        labels={
            "experiment_name": f'{EXPERIMENT_NAME}',
            "experiment_run": f'{RUN_NAME}',
            "data_regime": f'{DATA_REGIME}',
        },
    )

Creating MatchingEngineIndex
Create MatchingEngineIndex backing LRO: projects/934903580331/locations/us-central1/indexes/5708585206575792128/operations/7455293224119173120


### Create Brute Force index

The brute-force index returns the exact nearest neighbors, so it serves as the ground truth for evaluating ANN retrieval.
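Conceptually, brute-force retrieval scores the query against every candidate and keeps the top k. A minimal local NumPy sketch, assuming similarity is the raw dot product with larger values meaning closer matches (the configured `DOT_PRODUCT_DISTANCE`); `brute_force_top_k` is a hypothetical helper, not part of the Vertex AI SDK:

```python
from typing import List, Sequence, Tuple

import numpy as np


def brute_force_top_k(query: np.ndarray, candidates: np.ndarray,
                      ids: Sequence[str], k: int = 10) -> List[Tuple[str, float]]:
    """Exact top-k by dot product: score every candidate, return (id, score) largest first."""
    scores = candidates @ query  # shape: (num_candidates,)
    k = min(k, len(ids))
    # argpartition moves the k largest scores to the front in O(n), unordered
    top = np.argpartition(-scores, k - 1)[:k]
    # then sort only those k by descending score
    top = top[np.argsort(-scores[top])]
    return [(ids[i], float(scores[i])) for i in top]
```

Every query costs a full pass over all candidates, which is why the brute-force index is used here for evaluation rather than serving.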

In [13]:
if CREATE_NEW_ASSETS == 'True':
    
    brute_force_index = vertex_ai.MatchingEngineIndex.create_brute_force_index(
        display_name=BF_DISPLAY_NAME,
        contents_delta_uri=EMBEDDINGS_INITIAL_URI,
        dimensions=DIMENSIONS,
        distance_measure_type=DISTANCE_MEASURE,
        sync=False,
        labels={
            "experiment_name": f'{EXPERIMENT_NAME}',
            "experiment_run": f'{RUN_NAME}',
            "data_regime": f'{DATA_REGIME}',
        },
    )

Creating MatchingEngineIndex
Create MatchingEngineIndex backing LRO: projects/934903580331/locations/us-central1/indexes/3648188377053790208/operations/7858365390768832512
MatchingEngineIndex created. Resource name: projects/934903580331/locations/us-central1/indexes/3648188377053790208
To use this MatchingEngineIndex in another session:
index = aiplatform.MatchingEngineIndex('projects/934903580331/locations/us-central1/indexes/3648188377053790208')
MatchingEngineIndex created. Resource name: projects/934903580331/locations/us-central1/indexes/5708585206575792128
To use this MatchingEngineIndex in another session:
index = aiplatform.MatchingEngineIndex('projects/934903580331/locations/us-central1/indexes/5708585206575792128')


### Create Matching Engine endpoint(s)

* Both the ANN and brute-force indexes can be deployed to a single endpoint.
* Alternatively, we can create separate endpoints, one for each index.

#### index endpoint config: 

In [14]:
ANN_ENDPOINT_DISPLAY_NAME = f'{DISPLAY_NAME}_endpoint'
BF_ENDPOINT_DISPLAY_NAME  = f'{BF_DISPLAY_NAME}_endpoint'

# =============================== #
# included in env-setup.ipynb     #
# =============================== #
# VPC_NETWORK = "ucaip-haystack-vpc-network" # TODO: update this
# VPC_NETWORK_FULL = f"projects/{PROJECT_NUM}/global/networks/{VPC_NETWORK}"

print(f"VPC_NETWORK_FULL: {VPC_NETWORK_FULL}")
print(f"ANN_ENDPOINT_DISPLAY_NAME: {ANN_ENDPOINT_DISPLAY_NAME}")
print(f"BF_ENDPOINT_DISPLAY_NAME: {BF_ENDPOINT_DISPLAY_NAME}")

VPC_NETWORK_FULL: projects/934903580331/global/networks/ucaip-haystack-vpc-network
ANN_ENDPOINT_DISPLAY_NAME: tfrs_128dim_v1_endpoint
BF_ENDPOINT_DISPLAY_NAME: tfrs_128dim_v1_bf_endpoint


Next, create the index endpoints:

In [15]:
# create new endpoint
if CREATE_NEW_ASSETS == 'True':
    
    my_ann_index_endpoint = vertex_ai.MatchingEngineIndexEndpoint.create(
        display_name=f'{ANN_ENDPOINT_DISPLAY_NAME}',
        description="index endpoint for ANN index",
        network=VPC_NETWORK_FULL,
        sync=False,
    )

# # to use existing
# my_ann_index_endpoint = vertex_ai.MatchingEngineIndexEndpoint(
#     'projects/934903580331/locations/us-central1/indexEndpoints/8097410557360996352'
# )

Creating MatchingEngineIndexEndpoint
Create MatchingEngineIndexEndpoint backing LRO: projects/934903580331/locations/us-central1/indexEndpoints/3370091100063662080/operations/4489391394518990848


In [16]:
# create new endpoint
if CREATE_NEW_ASSETS == 'True':
    
    my_bf_index_endpoint = vertex_ai.MatchingEngineIndexEndpoint.create(
        display_name=f'{BF_ENDPOINT_DISPLAY_NAME}',
        description="index endpoint for brute-force index",
        network=VPC_NETWORK_FULL,
        sync=False,
    )

# # to use existing
# my_bf_index_endpoint = vertex_ai.MatchingEngineIndexEndpoint(
#     'projects/934903580331/locations/us-central1/indexEndpoints/1972515064137121792'
# )

Creating MatchingEngineIndexEndpoint
Create MatchingEngineIndexEndpoint backing LRO: projects/934903580331/locations/us-central1/indexEndpoints/4346246319296217088/operations/1424691848093368320
MatchingEngineIndexEndpoint created. Resource name: projects/934903580331/locations/us-central1/indexEndpoints/3370091100063662080
To use this MatchingEngineIndexEndpoint in another session:
index_endpoint = aiplatform.MatchingEngineIndexEndpoint('projects/934903580331/locations/us-central1/indexEndpoints/3370091100063662080')
MatchingEngineIndexEndpoint created. Resource name: projects/934903580331/locations/us-central1/indexEndpoints/4346246319296217088
To use this MatchingEngineIndexEndpoint in another session:
index_endpoint = aiplatform.MatchingEngineIndexEndpoint('projects/934903580331/locations/us-central1/indexEndpoints/4346246319296217088')


In [17]:
ANN_INDEX_ENDPOINT_NAME = my_ann_index_endpoint.resource_name
BF_INDEX_ENDPOINT_NAME = my_bf_index_endpoint.resource_name

print(f"ANN_INDEX_ENDPOINT_NAME: {ANN_INDEX_ENDPOINT_NAME}")
print(f"BF_INDEX_ENDPOINT_NAME: {BF_INDEX_ENDPOINT_NAME}")

ANN_INDEX_ENDPOINT_NAME: projects/934903580331/locations/us-central1/indexEndpoints/3370091100063662080
BF_INDEX_ENDPOINT_NAME: projects/934903580331/locations/us-central1/indexEndpoints/4346246319296217088


## Deploy Indexes to endpoints

> *Note: wait for the indexes to be created (~40 mins) before deploying them to their endpoints*
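Because the indexes were created with `sync=False`, you may prefer to poll instead of guessing at the ~40 minutes. A minimal sketch of a generic polling helper with exponential backoff; the readiness check `is_ready` is a hypothetical callable you supply (for example, one that inspects the output of `gcloud ai indexes list`), not a Vertex AI SDK function.

```python
import time
from typing import Callable


def wait_until(
    is_ready: Callable[[], bool],
    timeout_s: float = 3600.0,
    initial_interval_s: float = 30.0,
    max_interval_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `is_ready` with exponential backoff; return True when ready, False on timeout."""
    deadline = clock() + timeout_s
    interval = initial_interval_s
    while True:
        if is_ready():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # never sleep past the deadline
        sleep(min(interval, remaining))
        # double the wait between checks, capped at max_interval_s
        interval = min(interval * 2, max_interval_s)
```

The `sleep` and `clock` parameters exist so the helper can be exercised without real waiting; in the notebook the defaults are what you want.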

In [18]:
# !gcloud ai indexes list \
#   --project=$PROJECT_ID \
#   --region=$LOCATION

In [19]:
# get index resource names
if CREATE_NEW_ASSETS == 'True':
    
    tree_ah_resource_name = tree_ah_index.resource_name
    brute_force_index_resource_name = brute_force_index.resource_name

# use existing
# tree_ah_resource_name = f'projects/{PROJECT_NUM}/locations/{REGION}/indexes/8930963516517515264'           # 65m: 8930963516517515264 | 8m: 2204555998062968832
# brute_force_index_resource_name = f'projects/{PROJECT_NUM}/locations/{REGION}/indexes/8006881167976431616' # 65m: 8006881167976431616 | 8m: 4780614984918892544

tree_ah_index = vertex_ai.MatchingEngineIndex(index_name=tree_ah_resource_name)
brute_force_index = vertex_ai.MatchingEngineIndex(index_name=brute_force_index_resource_name)

In [20]:
tree_ah_index.display_name

'tfrs_128dim_v1'

In [21]:
ANN_INDEX_NAME = tree_ah_index.resource_name
BF_INDEX_NAME = brute_force_index.resource_name

print(f"ANN_INDEX_NAME: {ANN_INDEX_NAME}")
print(f"BF_INDEX_NAME: {BF_INDEX_NAME}")

DEPLOYED_ANN_INDEX_ID = f"deployed_{DISPLAY_NAME}"
DEPLOYED_BF_INDEX_ID = f"deployed_{BF_DISPLAY_NAME}"

print(f"DEPLOYED_ANN_INDEX_ID: {DEPLOYED_ANN_INDEX_ID}")
print(f"DEPLOYED_BF_INDEX_ID: {DEPLOYED_BF_INDEX_ID}")

ANN_INDEX_NAME: projects/934903580331/locations/us-central1/indexes/5708585206575792128
BF_INDEX_NAME: projects/934903580331/locations/us-central1/indexes/3648188377053790208
DEPLOYED_ANN_INDEX_ID: deployed_tfrs_128dim_v1
DEPLOYED_BF_INDEX_ID: deployed_tfrs_128dim_v1_bf


#### Deploy ANN index

In [22]:
# create new deployed index
if CREATE_NEW_ASSETS == 'True':
    deployed_ann_index = my_ann_index_endpoint.deploy_index(
        index=tree_ah_index, 
        deployed_index_id=DEPLOYED_ANN_INDEX_ID
    )

# use existing
# EXISTING_ANN_INDEX_ENDPOINT='projects/934903580331/locations/us-central1/indexEndpoints/8292015319384326144'
# deployed_ann_index = vertex_ai.MatchingEngineIndexEndpoint(index_endpoint_name=EXISTING_ANN_INDEX_ENDPOINT)

deployed_ann_index.deployed_indexes

Deploying index MatchingEngineIndexEndpoint index_endpoint: projects/934903580331/locations/us-central1/indexEndpoints/3370091100063662080
Deploy index MatchingEngineIndexEndpoint index_endpoint backing LRO: projects/934903580331/locations/us-central1/indexEndpoints/3370091100063662080/operations/7711716927902580736
MatchingEngineIndexEndpoint index_endpoint Deployed index. Resource name: projects/934903580331/locations/us-central1/indexEndpoints/3370091100063662080


[id: "deployed_tfrs_128dim_v1"
index: "projects/934903580331/locations/us-central1/indexes/5708585206575792128"
create_time {
  seconds: 1695142716
  nanos: 365064000
}
private_endpoints {
  match_grpc_address: "10.41.2.5"
}
index_sync_time {
  seconds: 1695142930
  nanos: 704409000
}
automatic_resources {
  min_replica_count: 2
  max_replica_count: 2
}
deployment_group: "default"
]

#### Deploy Brute Force index

In [23]:
# create new deployed index
if CREATE_NEW_ASSETS == 'True':
    
    deployed_bf_index = my_bf_index_endpoint.deploy_index(
        index=brute_force_index, 
        deployed_index_id=DEPLOYED_BF_INDEX_ID
    )

# use existing
# EXISTING_BF_INDEX_ENDPOINT='projects/934903580331/locations/us-central1/indexEndpoints/553142309701550080'
# deployed_bf_index = vertex_ai.MatchingEngineIndexEndpoint(index_endpoint_name=EXISTING_BF_INDEX_ENDPOINT)


deployed_bf_index.deployed_indexes

Deploying index MatchingEngineIndexEndpoint index_endpoint: projects/934903580331/locations/us-central1/indexEndpoints/4346246319296217088
Deploy index MatchingEngineIndexEndpoint index_endpoint backing LRO: projects/934903580331/locations/us-central1/indexEndpoints/4346246319296217088/operations/3543635472771186688
MatchingEngineIndexEndpoint index_endpoint Deployed index. Resource name: projects/934903580331/locations/us-central1/indexEndpoints/4346246319296217088


[id: "deployed_tfrs_128dim_v1_bf"
index: "projects/934903580331/locations/us-central1/indexes/3648188377053790208"
create_time {
  seconds: 1695143297
  nanos: 696381000
}
private_endpoints {
  match_grpc_address: "10.41.2.5"
}
index_sync_time {
  seconds: 1695143511
  nanos: 700878000
}
automatic_resources {
  min_replica_count: 2
  max_replica_count: 2
}
deployment_group: "default"
]

# Query Model

### Upload Query Model to Vertex Model Registry

In [24]:
QUERY_MODEL_DIR = f"{BUCKET_URI}/{RUN_DIR_PATH}/model-dir/query_model"

print(f"QUERY_MODEL_DIR: {QUERY_MODEL_DIR}")

QUERY_MODEL_DIR: gs://ndr-v1-hybrid-vertex-bucket/local-train-v1/run-20230919-150451/model-dir/query_model


In [25]:
! gsutil ls $QUERY_MODEL_DIR

gs://ndr-v1-hybrid-vertex-bucket/local-train-v1/run-20230919-150451/model-dir/query_model/
gs://ndr-v1-hybrid-vertex-bucket/local-train-v1/run-20230919-150451/model-dir/query_model/fingerprint.pb
gs://ndr-v1-hybrid-vertex-bucket/local-train-v1/run-20230919-150451/model-dir/query_model/saved_model.pb
gs://ndr-v1-hybrid-vertex-bucket/local-train-v1/run-20230919-150451/model-dir/query_model/assets/
gs://ndr-v1-hybrid-vertex-bucket/local-train-v1/run-20230919-150451/model-dir/query_model/variables/


In [26]:
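# tf2-gpu.2-12 | tf2-gpu.2-11
# SERVING_IMAGE_URI_CPU (used below) is expected to come from the env config loaded at the top; uncomment to override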

# SERVING_IMAGE_URI_CPU = 'us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-11:latest'
# SERVING_IMAGE_URI_GPU = 'us-docker.pkg.dev/vertex-ai/prediction/tf2-gpu.2-11:latest'

In [27]:
uploaded_query_model = vertex_ai.Model.upload(
    display_name=f'query_model_{DISPLAY_NAME}',
    artifact_uri=QUERY_MODEL_DIR,
    serving_container_image_uri=SERVING_IMAGE_URI_CPU,
    description="Top of the query tower, meant to return an embedding for each playlist instance",
    sync=True,
)

Creating Model
Create Model backing LRO: projects/934903580331/locations/us-central1/models/2404541769992634368/operations/7833314117841584128
Model created. Resource name: projects/934903580331/locations/us-central1/models/2404541769992634368@1
To use this Model in another session:
model = aiplatform.Model('projects/934903580331/locations/us-central1/models/2404541769992634368@1')


#### Create model endpoint

In [28]:
endpoint = vertex_ai.Endpoint.create(
    display_name=f'endpoint_{DISPLAY_NAME}',
    project=PROJECT_ID,
    location=LOCATION,
    sync=True,
)

Creating Endpoint
Create Endpoint backing LRO: projects/934903580331/locations/us-central1/endpoints/7270536031831588864/operations/6810434052475060224
Endpoint created. Resource name: projects/934903580331/locations/us-central1/endpoints/7270536031831588864
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/934903580331/locations/us-central1/endpoints/7270536031831588864')


#### Deploy uploaded model to model endpoint

In [29]:
deployed_query_model = uploaded_query_model.deploy(
    endpoint=endpoint,
    deployed_model_display_name=f'deployed_qmodel_{DISPLAY_NAME}',
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=2,
    accelerator_type=None,
    accelerator_count=0,
    sync=True,
)

Deploying model to Endpoint : projects/934903580331/locations/us-central1/endpoints/7270536031831588864
Deploy Endpoint model backing LRO: projects/934903580331/locations/us-central1/endpoints/7270536031831588864/operations/8936133076593934336
Endpoint model deployed. Resource name: projects/934903580331/locations/us-central1/endpoints/7270536031831588864


# Retrieve nearest neighbors from index

* use the query model to convert a test instance into an embedding
* use that embedding to search for nearest neighbors (NN) in the ANN index

If you need to re-initialize the deployed model or indexes (e.g., in a new session), replace the IDs in the resource names below:

In [53]:
# deployed_query_model = vertex_ai.Endpoint('projects/934903580331/locations/us-central1/endpoints/4002948002778972160')
# deployed_ann_index = vertex_ai.MatchingEngineIndexEndpoint('projects/934903580331/locations/us-central1/indexEndpoints/8292015319384326144')
# deployed_bf_index = vertex_ai.MatchingEngineIndexEndpoint('projects/934903580331/locations/us-central1/indexEndpoints/553142309701550080')

# DEPLOYED_ANN_ID = deployed_ann_index.deployed_indexes[0].id
# DEPLOYED_BF_ID = deployed_bf_index.deployed_indexes[0].id

# ANN_response = deployed_ann_index.match(
#     deployed_index_id=DEPLOYED_ANN_ID,
#     queries=playlist_emb.predictions,
#     num_neighbors=10
# )

# BF_response = deployed_bf_index.match(
#     deployed_index_id=DEPLOYED_BF_ID,
#     queries=playlist_emb.predictions,
#     num_neighbors=10
# )

## Create Test Instance(s)

* You can create a quick test instance from the train or validation dataset with a snippet like this (it assumes `train_dataset` is a batched `tf.data.Dataset` of feature dicts):

```python
# take a single example and convert each feature tensor to a NumPy value/array
for tensor_dict in train_dataset.unbatch().skip(12905).take(1):
    list_dict = {k: v.numpy() for k, v in tensor_dict.items()}
    print(list_dict)
```

In [36]:
# # MAX_PLAYLIST_LENGTH --> 5
TEST_INSTANCE_5 = test_instances.TEST_INSTANCE_5
TEST_INSTANCE_5

# # MAX_PLAYLIST_LENGTH --> 15
# TEST_INSTANCE_15 = test_instances.TEST_INSTANCE_15
# TEST_INSTANCE_15

{'album_name_can': 'Capoeira Electronica',
 'album_name_pl': ['Odilara',
  'Capoeira Electronica',
  'Capoeira Ultimate',
  'Festa Popular',
  'Capoeira Electronica'],
 'album_uri_can': 'spotify:album:2FsSSHGt8JM0JgRy6ZX3kR',
 'album_uri_pl': ['spotify:album:4Y8RfvZzCiApBCIZswj9Ry',
  'spotify:album:2FsSSHGt8JM0JgRy6ZX3kR',
  'spotify:album:55HHBqZ2SefPeaENOgWxYK',
  'spotify:album:150L1V6UUT7fGUI3PbxpkE',
  'spotify:album:2FsSSHGt8JM0JgRy6ZX3kR'],
 'artist_followers_can': 5170.0,
 'artist_genres_can': 'capoeira',
 'artist_genres_pl': ['samba moderno',
  'capoeira',
  'capoeira',
  'NONE',
  'capoeira'],
 'artist_name_can': 'Capoeira Experience',
 'artist_name_pl': ['Odilara',
  'Capoeira Experience',
  'Denis Porto',
  'Zambe',
  'Capoeira Experience'],
 'artist_pop_can': 24.0,
 'artist_pop_pl': [4.0, 24.0, 2.0, 0.0, 24.0],
 'artist_uri_can': 'spotify:artist:5SKEXbgzIdRl3gQJ23CnUP',
 'artist_uri_pl': ['spotify:artist:72oameojLOPWYB7nB8rl6c',
  'spotify:artist:5SKEXbgzIdRl3gQJ23CnUP',


#### Get the playlist embedding from the deployed query model

In [37]:
playlist_emb = deployed_query_model.predict([TEST_INSTANCE_5])
playlist_emb

Prediction(predictions=[[-0.319549441, -0.736324608, -0.999139726, 0.936678231, 0.140084431, -0.413407475, 0.878894806, -0.970329106, -0.659263194, -1.60079765, 1.11298513, 0.534306586, 1.12248015, -0.79353, -0.221303761, 0.214388043, 0.851346672, -1.70991278, -0.428875, -0.796241462, -0.130663425, 0.679144442, 1.57031, -1.71820736, -0.0138283893, -0.696535349, 0.518329501, -1.51568925, -0.54820931, 0.00688769668, -1.16123128, 1.08391941, -0.113285184, 0.457706213, 0.0641227961, 0.416062444, -1.27625763, 0.214524657, -1.79184937, 0.368900865, -0.097617425, -2.05919147, -0.195343286, 0.136424914, -1.31718016, -0.237893417, 1.59560561, -0.966435671, 1.97090781, 0.787532568, -0.221562356, -0.302150905, 1.45196605, 0.0823364705, -1.3538276, 1.40799367, -1.17275703, 2.04082108, -0.43333602, 0.913677335, 0.126593262, -0.656877041, 0.239591926, 0.283293277, 0.875116467, -0.861238241, 0.537754834, 0.748203337, 0.236702815, -0.605949759, -0.857457638, -1.20023417, -1.0099895, -0.0130776241, -0.

In [44]:
import time

DEPLOYED_ANN_INDEX_ID = deployed_ann_index.deployed_indexes[0].id
DEPLOYED_BF_INDEX_ID = deployed_bf_index.deployed_indexes[0].id

print(f"DEPLOYED_ANN_INDEX_ID: {DEPLOYED_ANN_INDEX_ID}")
print(f"DEPLOYED_BF_INDEX_ID: {DEPLOYED_BF_INDEX_ID}")

DEPLOYED_ANN_INDEX_ID: deployed_tfrs_128dim_v1
DEPLOYED_BF_INDEX_ID: deployed_tfrs_128dim_v1_bf


In [52]:
# %%timeit 
start_time = time.time()

ANN_response = deployed_ann_index.match(
    deployed_index_id=DEPLOYED_ANN_INDEX_ID,
    queries=playlist_emb.predictions,
    num_neighbors=10
)

end_time = time.time()
elapsed_time = (end_time - start_time) / 60  # seconds -> minutes
print(f"elapsed_time: {elapsed_time}")

elapsed_time: 0.0001725157101949056


In [53]:
ANN_response

[[MatchNeighbor(id="b'spotify:track:7opy2GAu6ni8snJvUUgj4M'", distance=89.04745483398438),
  MatchNeighbor(id="b'spotify:track:1evFM5gO5L0aOw2l2wL88D'", distance=89.03700256347656),
  MatchNeighbor(id="b'spotify:track:1FvVInbjQUHTnLo3wcEwEh'", distance=89.02728271484375),
  MatchNeighbor(id="b'spotify:track:25v6UWpnLnEH5hDdqrbhWE'", distance=89.02726745605469),
  MatchNeighbor(id="b'spotify:track:3AEt6lrKnKfc8H1Xq1mN1r'", distance=89.0172119140625),
  MatchNeighbor(id="b'spotify:track:5IEZ99PDHnKaAIeJ6EhS2J'", distance=89.01597595214844),
  MatchNeighbor(id="b'spotify:track:0IK0ej4joW9OORaJ8GasBn'", distance=89.01570129394531),
  MatchNeighbor(id="b'spotify:track:1SejCUMSNYxnVC0vuRKzzy'", distance=88.99887084960938),
  MatchNeighbor(id="b'spotify:track:4mhSTVUnzCapQ02xwPXJrg'", distance=88.99571228027344),
  MatchNeighbor(id="b'spotify:track:3z9Qm5gfdCr4wklgMxAzFQ'", distance=88.99554443359375)]]

In [54]:
# %%timeit 
start_time = time.time()

BF_response = deployed_bf_index.match(
    deployed_index_id=DEPLOYED_BF_INDEX_ID,
    queries=playlist_emb.predictions,
    num_neighbors=10
)

end_time = time.time()
elapsed_time = (end_time - start_time) / 60  # seconds -> minutes
print(f"elapsed_time: {elapsed_time}")

elapsed_time: 0.0020070672035217285


In [55]:
BF_response

[[MatchNeighbor(id="b'spotify:track:3ePcO8zbu6IznLKD5XBfV6'", distance=89.0927963256836),
  MatchNeighbor(id="b'spotify:track:6aYHi4SYZkn91EwrWxgGh3'", distance=89.0899887084961),
  MatchNeighbor(id="b'spotify:track:3aNwMNEmpns88B1wjpBGRl'", distance=89.0806655883789),
  MatchNeighbor(id="b'spotify:track:258m3BhXGYBk64QgYT1f6W'", distance=89.06431579589844),
  MatchNeighbor(id="b'spotify:track:0trwqplZTfQvrHM1FnVL8B'", distance=89.06390380859375),
  MatchNeighbor(id="b'spotify:track:11A88zUaDpXiput0IJ8C0P'", distance=89.06157684326172),
  MatchNeighbor(id="b'spotify:track:1d5DGnrbtI0ZPhQtM9DQFw'", distance=89.06036376953125),
  MatchNeighbor(id="b'spotify:track:7EWX3ycwpuswKaivGITwIX'", distance=89.06028747558594),
  MatchNeighbor(id="b'spotify:track:4ohNTFZHEMfcE3DiN3Rh7Y'", distance=89.05621337890625),
  MatchNeighbor(id="b'spotify:track:6TukFfoYQmL0LIAP1SgjTX'", distance=89.0560073852539)]]
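The two timing cells above divide seconds by 60, so they report minutes: 0.000173 min is about 10 ms for the ANN call and 0.00201 min is about 120 ms for the brute-force call. A single wall-clock sample also includes network jitter, so a sturdier comparison repeats each call. A minimal sketch; `measure_latency_ms` is a hypothetical helper using only the standard library:

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(fn: Callable[[], object], n_runs: int = 20, warmup: int = 2) -> Dict[str, float]:
    """Call `fn` repeatedly and summarize wall-clock latency in milliseconds."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    # warm-up calls absorb one-time costs (e.g. connection setup) and are not recorded
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()  # monotonic, high-resolution timer
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```

For example, pass `lambda: deployed_ann_index.match(deployed_index_id=DEPLOYED_ANN_INDEX_ID, queries=playlist_emb.predictions, num_neighbors=10)` as `fn`, and the equivalent call on `deployed_bf_index` for the brute-force side.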

## Compute Recall

In [56]:
# Calculate recall by determining how many neighbors were correctly retrieved as compared to the brute-force option.
recalled_neighbors = 0
for tree_ah_neighbors, brute_force_neighbors in zip(
    ANN_response, BF_response
):
    tree_ah_neighbor_ids = [neighbor.id for neighbor in tree_ah_neighbors]
    brute_force_neighbor_ids = [neighbor.id for neighbor in brute_force_neighbors]

    recalled_neighbors += len(
        set(tree_ah_neighbor_ids).intersection(brute_force_neighbor_ids)
    )

recall = recalled_neighbors / len(
    [neighbor for neighbors in BF_response for neighbor in neighbors]
)

print("Recall: {}".format(recall))

Recall: 0.0
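The cell above measures recall for a single query. Averaging over many test queries gives a more stable estimate, and a `k` parameter lets you check recall at smaller cutoffs. A minimal sketch generalizing that computation; `recall_at_k` is a hypothetical helper that takes one list of neighbor IDs per query:

```python
from typing import Hashable, Sequence


def recall_at_k(approx: Sequence[Sequence[Hashable]],
                exact: Sequence[Sequence[Hashable]], k: int) -> float:
    """Fraction of the exact top-k neighbors (pooled over queries) that the approximate top-k also returned."""
    if len(approx) != len(exact):
        raise ValueError("approx and exact must have one result list per query")
    hits = 0
    total = 0
    for approx_ids, exact_ids in zip(approx, exact):
        exact_top = set(exact_ids[:k])
        hits += len(exact_top & set(approx_ids[:k]))
        total += len(exact_top)
    return hits / total if total else 0.0
```

With the responses above, call it as `recall_at_k([[n.id for n in r] for r in ANN_response], [[n.id for n in r] for r in BF_response], k=10)`.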
