# Implementing Recommendation Engines with Matching Engine

![](img/arch.png)

#### VPC Network peering
Matching Engine is a high-performance vector matching service that requires a separate, peered VPC network to ensure performance.

Below are the one-time instructions to set up a peering network. 

**Once the network is created, make sure the notebook instance running this notebook is in that subnetwork. See https://cloud.google.com/vertex-ai/docs/matching-engine/match-eng-setup**

![](img/subnetwork2.png) 
Then select the network under **Advanced options**:
![](img/subnetwork.png)

In [1]:
# PROJECT_ID = "hybrid-vertex" 
# NETWORK_NAME = "ucaip-haystack-vpc-network"  
# PEERING_RANGE_NAME = "ucaip-haystack-range"

# # Create a VPC network
# ! gcloud compute networks create {NETWORK_NAME} --bgp-routing-mode=regional --subnet-mode=auto --project={PROJECT_ID}

# # Add necessary firewall rules
# ! gcloud compute firewall-rules create {NETWORK_NAME}-allow-icmp --network {NETWORK_NAME} --priority 65534 --project {PROJECT_ID} --allow icmp

# ! gcloud compute firewall-rules create {NETWORK_NAME}-allow-internal --network {NETWORK_NAME} --priority 65534 --project {PROJECT_ID} --allow all --source-ranges 10.128.0.0/9

# ! gcloud compute firewall-rules create {NETWORK_NAME}-allow-rdp --network {NETWORK_NAME} --priority 65534 --project {PROJECT_ID} --allow tcp:3389

# ! gcloud compute firewall-rules create {NETWORK_NAME}-allow-ssh --network {NETWORK_NAME} --priority 65534 --project {PROJECT_ID} --allow tcp:22

# # Reserve IP range
# ! gcloud compute addresses create {PEERING_RANGE_NAME} --global --prefix-length=16 --network={NETWORK_NAME} --purpose=VPC_PEERING --project={PROJECT_ID} --description="peering range for uCAIP Haystack."

# # Set up peering with service networking
# ! gcloud services vpc-peerings connect --service=servicenetworking.googleapis.com --network={NETWORK_NAME} --ranges={PEERING_RANGE_NAME} --project={PROJECT_ID}

### Deploy Playlist Query Model to a Vertex endpoint
This is the endpoint that a user will query with their last n played songs.

In [2]:
PROJECT_ID = 'hybrid-vertex'  # <--- TODO: CHANGE THIS
LOCATION = 'us-central1' 
DIMENSIONS = 128 # must match output dimensions - embedding dim in two tower code
DISPLAY_NAME = "spotify_song_candidates_v1"
MODEL_PATH = 'gs://two-tower-models' #TODO change to your model directory

import os
import sys
from google.cloud import aiplatform_v1beta1 #needed for matching engine calls


from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

EMBEDDINGS_INITIAL_URI = f'{MODEL_PATH}/candidates'

### Create a matching engine index

Matching Engine builds the index from the files of candidate embeddings created in the previous notebook.

Most of Matching Engine's optimization options are in the Tree-AH settings; test different values for your own use case.

Recall that we saved the two-tower models and the candidate embeddings (newline-delimited JSON) in a `candidates` folder:

![](img/saved-models.png)
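For reference, here is a minimal sketch of how candidate embeddings can be serialized in the newline-delimited JSON layout that Matching Engine reads, assuming one `{"id": ..., "embedding": [...]}` object per line. The helper name `to_matching_engine_jsonl` is hypothetical, and this is not necessarily how the previous notebook wrote its files.

```python
import json
from typing import Iterable, Sequence, Union


def to_matching_engine_jsonl(
    ids: Iterable[Union[bytes, str]],
    embeddings: Iterable[Sequence[float]],
    dimensions: int,
) -> str:
    """Serialize (id, embedding) pairs as newline-delimited JSON, one object per line."""
    lines = []
    for raw_id, vector in zip(ids, embeddings):
        # Decode bytes IDs (as returned by tf.Tensor.numpy()) so the index stores plain strings.
        item_id = raw_id.decode("utf-8") if isinstance(raw_id, bytes) else str(raw_id)
        values = [float(v) for v in vector]
        # Every vector must match the index's DIMENSIONS setting.
        if len(values) != dimensions:
            raise ValueError(f"{item_id}: expected {dimensions} dims, got {len(values)}")
        lines.append(json.dumps({"id": item_id, "embedding": values}))
    return "\n".join(lines) + "\n"
```

Decoding the IDs at this stage would likely avoid the `b'...'` prefixes that show up in the match results at the end of this notebook.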

## Set the Nearest Neighbor Options

See the documentation for tips on [tuning the index](https://cloud.google.com/vertex-ai/docs/matching-engine/using-matching-engine#tuning_the_index).

Here is the rough idea from the anisotropic vector quantization paper:

1. (Initialization Step) Select a dictionary $C^{(m)}$ by sampling from $\{x_1^{(m)}, \ldots, x_n^{(m)}\}$.
2. (Partition Assignment Step) For each datapoint $x_i$, update $\tilde{x}_i$ by using the value of $c \in C^{(m)}$ that minimizes the anisotropic loss of $\tilde{x}_i$.
3. (Codebook Update Step) Optimize the loss function over all codewords in all dictionaries while keeping every dictionary's partitions constant.
4. Repeat Steps 2 and 3 until convergence to a fixed point or until the maximum number of iterations is reached.
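To make the steps concrete, here is a minimal NumPy sketch for the simplest case of a single codebook (plain vector quantization, M = 1, rather than the product quantization in the paper). It assumes fixed weights `h_par > h_perp` for the parallel and orthogonal parts of the residual; the paper derives these weights from a threshold on the inner product, so the defaults here are illustrative only. The function names are hypothetical, and this is not ScaNN's or Matching Engine's implementation.

```python
import numpy as np


def anisotropic_loss(x, c, h_par, h_perp):
    """Per-row loss h_par * ||r_par||^2 + h_perp * ||r_perp||^2 with r = x - c."""
    r = x - c
    # Split the residual into the component parallel to each datapoint and the rest.
    coef = np.sum(r * x, axis=1, keepdims=True) / np.sum(x * x, axis=1, keepdims=True)
    r_par = coef * x
    r_perp = r - r_par
    return h_par * np.sum(r_par**2, axis=1) + h_perp * np.sum(r_perp**2, axis=1)


def assign_codewords(x, codebook, h_par, h_perp):
    """Partition assignment: pick the codeword with the lowest anisotropic loss per point."""
    losses = np.stack([anisotropic_loss(x, c, h_par, h_perp) for c in codebook], axis=1)
    assign = np.argmin(losses, axis=1)
    return assign, float(losses[np.arange(len(x)), assign].sum())


def anisotropic_vq(x, k, h_par=4.0, h_perp=1.0, max_iter=20, seed=0):
    """Single-codebook VQ trained with the anisotropic loss (assumes no zero vectors)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Step 1: initialize the codebook by sampling k datapoints.
    codebook = x[rng.choice(n, size=k, replace=False)].copy()
    u = x / np.linalg.norm(x, axis=1, keepdims=True)
    history = []
    for _ in range(max_iter):
        # Step 2: assign each datapoint to a codeword.
        assign, total = assign_codewords(x, codebook, h_par, h_perp)
        history.append(total)
        # Step 3: with assignments fixed, each codeword has a closed-form minimizer:
        # (m * h_perp * I + (h_par - h_perp) * sum(u u^T)) c = h_par * sum(x)
        new_codebook = codebook.copy()
        for j in range(k):
            members = assign == j
            if not members.any():
                continue  # keep the previous codeword for an empty partition
            uj = u[members]
            a = members.sum() * h_perp * np.eye(d) + (h_par - h_perp) * (uj.T @ uj)
            new_codebook[j] = np.linalg.solve(a, h_par * x[members].sum(axis=0))
        # Step 4: stop at a fixed point (or after max_iter iterations).
        if np.allclose(new_codebook, codebook):
            break
        codebook = new_codebook
    assign, _ = assign_codewords(x, codebook, h_par, h_perp)
    return codebook, assign, history
```

Because Step 2 and Step 3 each minimize the same total loss while holding the other part fixed, the values in `history` never increase, which is the convergence behavior Step 4 relies on.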


### Relating the algorithm to the parameters:

* `leafNodeEmbeddingCount` -> Number of embeddings on each leaf node. The default value is 1000 if not set.
* `leafNodesToSearchPercent` -> The default percentage of leaf nodes searched for any query. Must be in the range 1-100, inclusive. The default value is 10 (meaning 10%) if not set.
* `approximateNeighborsCount` -> The default number of neighbors to find through approximate search before exact reordering is performed. Exact reordering is a procedure where results returned by an approximate search algorithm are reordered via a more expensive distance computation.
* `distanceMeasureType` -> `DOT_PRODUCT_DISTANCE` is the default; `COSINE_DISTANCE`, `L1_DISTANCE`, and `SQUARED_L2_DISTANCE` are also available.
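As a rough back-of-the-envelope estimate (not a description of how the service actually builds its tree), assume the index splits N embeddings into leaves of about `leafNodeEmbeddingCount` each, and a query scans `leafNodesToSearchPercent` of those leaves. The helper below is hypothetical, and the 2,000,000 candidate count in the usage line is an illustrative number, not the size of this dataset.

```python
import math


def search_budget(num_embeddings, leaf_node_embedding_count=1000, leaf_nodes_to_search_percent=10):
    """Estimate (leaves, leaves searched, embeddings scored) for one query."""
    # Assumption: leaves are filled to roughly leaf_node_embedding_count embeddings each.
    num_leaves = math.ceil(num_embeddings / leaf_node_embedding_count)
    # Each query scans this share of the leaves, rounded up to a whole leaf.
    leaves_searched = math.ceil(num_leaves * leaf_nodes_to_search_percent / 100)
    # Approximate number of embeddings scored before the approximateNeighborsCount cut and reordering.
    embeddings_scored = leaves_searched * leaf_node_embedding_count
    return num_leaves, leaves_searched, embeddings_scored


# Settings used for the index below, with a hypothetical 2M candidates.
print(search_budget(2_000_000, leaf_node_embedding_count=500, leaf_nodes_to_search_percent=7))
# (4000, 280, 140000)
```

Under this approximation, the number of embeddings scored is roughly N × `leafNodesToSearchPercent` / 100 regardless of leaf size; the leaf size mainly changes how finely the space is partitioned.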

Other best practices from our PM team (`k` is the number of neighbors requested per query):
```
Start from leafNodesToSearchPercent=5 and approximateNeighborsCount=10 * k

use default values for others.

measure performance and recall and change those 2 parameters accordingly.
```
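To measure recall as suggested, one option is to compare the approximate results against a brute-force search over the same embeddings. This is a minimal NumPy sketch, assuming the query and candidate embeddings fit in memory as arrays; the helper names are hypothetical.

```python
import numpy as np


def exact_top_k(queries, candidates, k):
    """Brute-force top-k candidate positions by dot product (higher is better)."""
    scores = queries @ candidates.T
    return np.argsort(-scores, axis=1)[:, :k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k neighbors that the approximate search also returned."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    total = sum(len(e) for e in exact_ids)
    return hits / total
```

The approximate IDs could come from the `id` fields of the neighbors returned by `ME_index_endpoint.match(...)`; map the brute-force positions to the same IDs used in the index before comparing.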

In [None]:
tree_ah_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name=DISPLAY_NAME,
    contents_delta_uri=EMBEDDINGS_INITIAL_URI,
    dimensions=DIMENSIONS,
    approximate_neighbors_count=50,
    distance_measure_type="DOT_PRODUCT_DISTANCE",
    leaf_node_embedding_count=500,
    leaf_nodes_to_search_percent=7,
    description="Songs embeddings from the Spotify million playlist dataset",
    labels={"label_name": "label_value"},
)

Creating MatchingEngineIndex
Create MatchingEngineIndex backing LRO: projects/934903580331/locations/us-central1/indexes/7391796771212492800/operations/2183856935755841536


## Index creation takes 20-30 minutes; here is some reading on what it is doing
#### Note on the advantages of the algorithm

[Accelerating Large-Scale Inference with Anisotropic Vector Quantization](https://arxiv.org/pdf/1908.10396.pdf)

> However, it is easy to see that not all pairs of (x, q) are equally important. The approximation error on the pairs which have a high inner product is far more important since they are likely to be among the top ranked pairs and can greatly affect the search result, while for the pairs whose inner product is low the approximation error matters much less. In other words, for a given datapoint x, we should quantize it with a bigger focus on its error with those queries which have high inner product with x. See Figure 1 for the illustration.


![](img/algo.png)
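A tiny numeric illustration of the quoted point, using made-up 2-D vectors: two quantized versions of `x` have the same residual norm, but one is off in the direction of `x` and the other is orthogonal to it. For a query aligned with `x`, only the parallel error changes the inner product.

```python
import numpy as np

x = np.array([1.0, 0.0])       # datapoint
q = np.array([1.0, 0.0])       # query with a high inner product with x
x_par = np.array([0.9, 0.0])   # quantization error parallel to x
x_perp = np.array([1.0, 0.1])  # quantization error orthogonal to x

for name, x_hat in [("parallel", x_par), ("orthogonal", x_perp)]:
    residual_norm = np.linalg.norm(x - x_hat)
    score_error = abs(q @ x - q @ x_hat)
    print(f"{name}: residual norm={residual_norm:.2f}, inner-product error={score_error:.2f}")
# parallel: residual norm=0.10, inner-product error=0.10
# orthogonal: residual norm=0.10, inner-product error=0.00
```

A plain reconstruction loss treats both errors equally; the anisotropic loss penalizes the parallel one more.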


### Create a matching engine endpoint

Next, set the API endpoint, the network name, and the other key values needed to create the index endpoint.

In [8]:
ENDPOINT = "{}-aiplatform.googleapis.com".format(LOCATION)
NETWORK_NAME = "ucaip-haystack-vpc-network"  # @param {type:"string"}
PROJECT_NUMBER = !gcloud projects list --filter="PROJECT_ID:'{PROJECT_ID}'" --format='value(PROJECT_NUMBER)'
PROJECT_NUMBER = PROJECT_NUMBER[0]

PARENT = "projects/{}/locations/{}".format(PROJECT_ID, LOCATION)

print("ENDPOINT: {}".format(ENDPOINT))
print("PROJECT_ID: {}".format(PROJECT_ID))
print("REGION: {}".format(LOCATION))

!gcloud config set project {PROJECT_ID}
!gcloud config set ai_platform/region {LOCATION}

ENDPOINT: us-central1-aiplatform.googleapis.com
PROJECT_ID: hybrid-vertex
REGION: us-central1
Updated property [core/project].
Updated property [ai_platform/region].


In [9]:
# Client for the Vertex AI index endpoint API (v1beta1)
index_endpoint_client = aiplatform_v1beta1.IndexEndpointServiceClient(
    client_options=dict(api_endpoint=ENDPOINT)
)

# Fully qualified name of the peered VPC network
VPC_NETWORK_NAME = "projects/{}/global/networks/{}".format(PROJECT_NUMBER, NETWORK_NAME)

index_endpoint_config = {
    "display_name": "index_endpoint_for_demo",
    "network": VPC_NETWORK_NAME,
}

# Returns a long-running operation; r.result() blocks until the endpoint exists
r = index_endpoint_client.create_index_endpoint(
    parent=PARENT, index_endpoint=index_endpoint_config
)

## Other quick notes on Matching Engine while we wait for deployment

[Find anything blazingly fast with Google's vector search technology](https://cloud.google.com/blog/topics/developers-practitioners/find-anything-blazingly-fast-googles-vector-search-technology)

Instead of comparing vectors one by one, you could use the approximate nearest neighbor (ANN) approach to improve search times. Many ANN algorithms use vector quantization (VQ), in which you split the vector space into multiple groups, define "codewords" to represent each group, and search only for those codewords. This VQ technique dramatically enhances query speeds and is the essential part of many ANN algorithms, just like indexing is the essential part of relational databases and full-text search engines.
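A simplified NumPy sketch of this idea, assuming plain k-means centroids as the codewords and exact scoring inside each searched group. Matching Engine's Tree-AH index is more sophisticated (it uses anisotropic quantization and reordering), so treat this as an illustration only; all names are hypothetical. `groups_to_search` plays a role loosely similar to `leafNodesToSearchPercent`.

```python
import numpy as np


def build_groups(candidates, num_groups, iters=10, seed=0):
    """Split the vector space into groups with a few k-means steps; centroids are the codewords."""
    rng = np.random.default_rng(seed)
    codewords = candidates[rng.choice(len(candidates), num_groups, replace=False)].copy()
    for _ in range(iters):
        dists = ((candidates[:, None, :] - codewords[None, :, :]) ** 2).sum(axis=-1)
        groups = dists.argmin(axis=1)
        for j in range(num_groups):
            if np.any(groups == j):
                codewords[j] = candidates[groups == j].mean(axis=0)
    dists = ((candidates[:, None, :] - codewords[None, :, :]) ** 2).sum(axis=-1)
    return codewords, dists.argmin(axis=1)


def grouped_search(query, candidates, codewords, groups, k, groups_to_search):
    """Score the query against the codewords first, then only against vectors in the best groups."""
    best_groups = np.argsort(-(codewords @ query))[:groups_to_search]
    ids = np.flatnonzero(np.isin(groups, best_groups))
    scores = candidates[ids] @ query
    return ids[np.argsort(-scores)[:k]]


rng = np.random.default_rng(0)
candidates = rng.normal(size=(1000, 16))
query = rng.normal(size=16)
codewords, groups = build_groups(candidates, num_groups=20)
approx = grouped_search(query, candidates, codewords, groups, k=10, groups_to_search=4)
exact = np.argsort(-(candidates @ query))[:10]
print("overlap with exact top-10:", len(set(approx) & set(exact)))
```

Searching 4 of 20 groups scores only a fraction of the vectors; raising `groups_to_search` trades speed for accuracy, and searching every group reduces to brute force.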

![](img/vectorQuant.gif)


As you may be able to conclude from the diagram above, as the number of groups in the space increases, the speed of the search decreases and the accuracy increases. Managing this trade-off (getting higher accuracy at lower latency) has been a key challenge for ANN algorithms.

In 2020, Google Research announced ScaNN, a new solution that provides state-of-the-art results for this challenge. With ScaNN, they introduced a new VQ algorithm called anisotropic vector quantization:

![](img/Loss_Types.max-1000x1000.png)

Anisotropic vector quantization uses a new loss function to train a model for VQ for an optimal grouping to capture farther data points (i.e. higher inner product) in a single group. With this idea, the new algorithm gives you higher accuracy at lower latency, as you can see in the benchmark result below (the violet line): 

![](img/speedvsaccuracy.max-1600x1600.png)


In [10]:
INDEX_ENDPOINT_NAME = r.result().name
INDEX_ENDPOINT_NAME

'projects/934903580331/locations/us-central1/indexEndpoints/259220861364469760'


### Deploy the index to the endpoint and create an endpoint object

In [12]:
DEPLOYED_INDEX_ID = "deployed_spotify_v1"

deploy_ann_index = {
    "id": DEPLOYED_INDEX_ID,
    "display_name": DEPLOYED_INDEX_ID,
    "index": tree_ah_index.resource_name,
}

s = index_endpoint_client.deploy_index(
    index_endpoint=INDEX_ENDPOINT_NAME, deployed_index=deploy_ann_index
)

In [13]:
# Poll the operation until it finishes successfully.
import time

while True:
    if s.done():
        break
    print("Poll the operation to deploy index...")
    time.sleep(60)
s.result()

Poll the operation to deploy index...
Poll the operation to deploy index...
Poll the operation to deploy index...
Poll the operation to deploy index...


deployed_index {
  id: "deployed_spotify_v1"
}

In [14]:
# finally create the endpoint object
ME_index_endpoint = aiplatform.MatchingEngineIndexEndpoint(INDEX_ENDPOINT_NAME)
ME_index_endpoint

<google.cloud.aiplatform.matching_engine.matching_engine_index_endpoint.MatchingEngineIndexEndpoint object at 0x7f1a934194d0> 
resource name: projects/934903580331/locations/us-central1/indexEndpoints/259220861364469760

### You should now see matching engine resources in your GCP console:

![](img/me-resources2.png)

![](img/me-indexes.png)

### Deploy the playlist query endpoint
The Matching Engine endpoint object created above will later receive the query embeddings (vectors) returned by the query/playlist model endpoint.

Deploying that query endpoint is the next task.

In [15]:
# Upload the query tower to the Vertex AI Model Registry
QUERY_MODEL = f"{MODEL_PATH}/query_model"

model_gcp = aiplatform.Model.upload(
    display_name="Spotify Playlist Query Model",
    artifact_uri=QUERY_MODEL,
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest",
    description="Top of the query tower, meant to return an embedding for each playlist instance",
)

Creating Model
Create Model backing LRO: projects/934903580331/locations/us-central1/models/8649686451899858944/operations/179755101575970816
Model created. Resource name: projects/934903580331/locations/us-central1/models/8649686451899858944@1
To use this Model in another session:
model = aiplatform.Model('projects/934903580331/locations/us-central1/models/8649686451899858944@1')


#### Deploy the uploaded model to an API endpoint

At this point you should be able to see the model in the Vertex Model Registry:

![](img/model-registry.png)

In [16]:
endpoint = aiplatform.Endpoint.create(
    display_name="Spotify Playlist Model Endpoint",
    project=PROJECT_ID,
    location=LOCATION,
)

Creating Endpoint
Create Endpoint backing LRO: projects/934903580331/locations/us-central1/endpoints/1185154787486728192/operations/968729461295939584
Endpoint created. Resource name: projects/934903580331/locations/us-central1/endpoints/1185154787486728192
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/934903580331/locations/us-central1/endpoints/1185154787486728192')


In [17]:
deployment = model_gcp.deploy(
    endpoint=endpoint,
    deployed_model_display_name="Spotify Playlist Query Model",
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=2,
    accelerator_type=None,
    accelerator_count=0,
    sync=False,
)

Deployment takes a few minutes (the cell returns immediately because `sync=False`); when it finishes, you should see the deployed model:
    
![](img/deployed_endpoint.png)

## Putting it together: Combine the query endpoint with the Matching Engine endpoint for state-of-the-art recommendations

You can grab an example from the training data, or build your own by following the structure of the returned example. The `skip()` call jumps ahead to an arbitrary example. The data is the same as in training; you can also construct your own test instances using the format shown below.


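```python
# `train_dataset` is the tf.data.Dataset of training examples from the training notebook
for tensor_dict in train_dataset.unbatch().skip(12905).take(1):
    # Convert each feature tensor to a NumPy value so it can be printed or edited
    list_dict = {k: v.numpy() for k, v in tensor_dict.items()}
    print(list_dict)
```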

## Get some test instances and run through the endpoint and matching engine

In [18]:
TEST_INSTANCE = {'album_name_can': 'We Just Havent Met Yet', 
                 'album_name_pl': ["There's Really A Wolf", 'Late Nights: The Album',
                       'American Teen', 'Crazy In Love', 'Pony'], 
                 'album_uri_can': 'spotify:album:5l83t3mbVgCrIe1VU9uJZR', 
                 'artist_followers_can': 4339757.0, 
                 'artist_genres_can': "'hawaiian hip hop', 'rap'", 
                 'artist_genres_pl': ["'hawaiian hip hop', 'rap'",
                       "'chicago rap', 'dance pop', 'pop', 'pop rap', 'r&b', 'southern hip hop', 'trap', 'urban contemporary'",
                       "'pop', 'pop r&b'", "'dance pop', 'pop', 'r&b'",
                       "'chill r&b', 'pop', 'pop r&b', 'r&b', 'urban contemporary'"], 
                 'artist_name_can': 'Russ', 
                 'artist_name_pl': ['Russ', 'Jeremih', 'Khalid', 'Beyoncé',
                       'William Singe'], 
                 'artist_pop_can': 82.0, 
                 'artist_pop_pl': [82., 80., 90., 87., 65.], 
                 'artist_uri_can': 'spotify:artist:1z7b1Pr1rSlvWRzsW3HOrS', 
                 'artists_followers_pl': [ 4339757.,  5611842., 15046756., 30713126.,   603837.], 
                 'collaborative': 'false', 
                 'description_pl': '', 
                 'duration_ms_can': 237322.0, 
                 'duration_ms_songs_pl': [237506., 217200., 219080., 226400., 121739.], 
                 'n_songs_pl': 8.0, 
                 'name': 'Lit Tunes ', 
                 'num_albums_pl': 8.0, 
                 'num_artists_pl': 8.0, 
                 'track_name_can': 'We Just Havent Met Yet', 
                 'track_name_pl': ['Losin Control', 'Paradise', 'Location',
                       'Crazy In Love - Remix', 'Pony'], 
                 'track_pop_can': 57.0, 
                 'track_pop_pl': [79., 58., 83., 71., 57.], 
                 'track_uri_can': 'spotify:track:0VzDv4wiuZsLsNOmfaUy2W', 
                 'track_uri_pl': ['spotify:track:4cxMGhkinTocPSVVKWIw0d',
                       'spotify:track:1wNEBPo3nsbGCZRryI832I',
                       'spotify:track:152lZdxL1OR0ZMW6KquMif',
                       'spotify:track:2f4IuijXLxYOeBncS60GUD',
                       'spotify:track:4Lj8paMFwyKTGfILLELVxt']}

In [25]:
deployment.predict([TEST_INSTANCE])

Prediction(predictions=[[0.0840319172, 0.186792836, 0.0950887352, 0.00264413841, 0.147926927, -0.0700385496, -0.102679625, -0.154979482, 0.148580015, -0.101469621, -0.091818966, 0.00616804603, 0.00408178242, -0.00168958679, 0.0360991098, 0.0825722665, 0.0213036463, 0.0813929439, -0.0449486524, 0.0399172455, 0.0706564263, -0.0271026175, 0.0144804297, -0.0893997326, -0.077369, 0.0540666, 0.0316286944, 0.0147486292, 0.071288988, 0.142902657, -0.0227023549, 0.0330891721, -0.0288588032, 0.130149409, -0.0312393289, 0.0212887935, 0.090886116, -0.114642344, 0.0448945798, 0.105831, 0.149386242, -0.0897929147, 0.103703573, -0.167980403, 0.0869361386, 0.13272588, -0.0506982207, -0.000560096931, -0.0707496703, 0.0320968144, -0.15122813, 0.0746913552, -0.0434209928, 0.0682459846, 0.0147757465, -0.154896766, -0.0927062, 0.0450625271, 0.0736268386, -0.0314874575, 0.0678062886, 0.0491155, 0.0494973101, -0.0348398499, 0.104099423, 0.0259010959, 0.0120011466, 0.057427872, -0.0411665812, -0.00769883487, 

# Get predictions for the next song: feed the playlist embedding to Matching Engine to find the closest neighbors
(With `DOT_PRODUCT_DISTANCE`, a higher `distance` score is better, because the score is the inner product.)

In [26]:
playlist_emb = deployment.predict([TEST_INSTANCE])

In [27]:
ME_index_endpoint.match(deployed_index_id=DEPLOYED_INDEX_ID,
                        queries=playlist_emb.predictions,
                        num_neighbors=10)

[[MatchNeighbor(id="b'spotify:track:3M66kXDQHwSK68EB8bzwNX'", distance=17.088489532470703),
  MatchNeighbor(id="b'spotify:track:38HFzYat2gnWcjqtgntSBW'", distance=11.790796279907227),
  MatchNeighbor(id="b'spotify:track:0XhYgAQ8RIPT96grdcokho'", distance=9.291814804077148),
  MatchNeighbor(id="b'spotify:track:0ZdXsa7FAfSme3dZGxLYcx'", distance=8.279316902160645),
  MatchNeighbor(id="b'spotify:track:18EALdmGjJFvaPCmmVyOlT'", distance=7.732163429260254),
  MatchNeighbor(id="b'spotify:track:7J1fTxccRfKJO4umBdqKaN'", distance=7.713829040527344),
  MatchNeighbor(id="b'spotify:track:5t4xvd7Tji3PnSokrdpJBr'", distance=7.037336349487305),
  MatchNeighbor(id="b'spotify:track:1D27lPKHCbMTFAqGRunLs3'", distance=6.975942611694336),
  MatchNeighbor(id="b'spotify:track:2QfprS7sN2u4W9gpe67YMH'", distance=6.927917957305908),
  MatchNeighbor(id="b'spotify:track:4K6z75yOLLMvfRaXmA8eCT'", distance=6.81923770904541)]]
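The neighbor IDs come back as the text of Python bytes literals (`b'...'`), which suggests the IDs were written to the embedding files with `str()` on a bytes value (an assumption). A small hypothetical helper can recover the plain track URIs for display or lookups:

```python
import ast


def clean_neighbor_id(raw_id: str) -> str:
    """Turn an ID like "b'spotify:track:...'" back into "spotify:track:..."."""
    if raw_id.startswith(("b'", 'b"')):
        # The ID is the text of a Python bytes literal; evaluate it safely and decode.
        return ast.literal_eval(raw_id).decode("utf-8")
    return raw_id
```

For example, applying `clean_neighbor_id(n.id)` to each `MatchNeighbor` in the result above yields IDs such as `spotify:track:3M66kXDQHwSK68EB8bzwNX`.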

#### All set - you now have a working recommendation system!
See the next notebook if you want to explore the system and make recommendations for yourself
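To reuse the prediction and matching calls above, you might wrap them in one function. This is a minimal sketch: `recommend` is a hypothetical helper that relies only on the `Endpoint.predict` and `MatchingEngineIndexEndpoint.match` calls already used in this notebook, so it can also be exercised with stand-in objects.

```python
from typing import Any, Dict, List, Tuple


def recommend(
    query_endpoint,
    index_endpoint,
    deployed_index_id: str,
    playlists: List[Dict[str, Any]],
    num_neighbors: int = 10,
) -> List[List[Tuple[str, float]]]:
    """Embed playlists with the query model, then look up the nearest song embeddings."""
    # 1. The query tower returns one embedding per playlist instance.
    embeddings = query_endpoint.predict(instances=playlists).predictions
    # 2. Matching Engine returns one list of neighbors per query embedding.
    matches = index_endpoint.match(
        deployed_index_id=deployed_index_id, queries=embeddings, num_neighbors=num_neighbors
    )
    # 3. Keep (id, distance) pairs; with DOT_PRODUCT_DISTANCE, higher is better.
    return [[(n.id, n.distance) for n in row] for row in matches]
```

For example, `recommend(deployment, ME_index_endpoint, DEPLOYED_INDEX_ID, [TEST_INSTANCE])` should return (id, distance) pairs like the ones shown above.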

In [None]:
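# ## Cleanup
# from google.cloud import aiplatform_v1beta1  # needed for Matching Engine calls

# REGION = 'us-central1'
# ENDPOINT = "{}-aiplatform.googleapis.com".format(REGION)

# index_client = aiplatform_v1beta1.IndexServiceClient(
#     client_options=dict(api_endpoint=ENDPOINT)
# )

# # Undeploy the index from the endpoint before deleting either resource
# index_endpoint_client.undeploy_index(
#     index_endpoint=INDEX_ENDPOINT_NAME, deployed_index_id=DEPLOYED_INDEX_ID
# ).result()
# index_client.delete_index(name=tree_ah_index.resource_name).result()
# index_endpoint_client.delete_index_endpoint(name=INDEX_ENDPOINT_NAME).result()

# # Remove the query model endpoint (force=True undeploys the model first) and the model
# endpoint.delete(force=True)
# model_gcp.delete()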