# Implementing Recommendation Engines with Matching Engine

![](img/arch.png)

#### VPC Network peering
Matching Engine is a high-performance vector matching service that requires a separate VPC network to ensure performance.

Below are the one-time instructions to set up a peering network. 

# **Once the network is created, make sure the notebook instance running this notebook is in the subnetwork. See https://cloud.google.com/vertex-ai/docs/matching-engine/match-eng-setup**

![](img/subnetwork2.png) 
Then select the network under the advanced options:
![](img/subnetwork.png)

In [1]:
# PROJECT_ID = "hybrid-vertex" 
# NETWORK_NAME = "ucaip-haystack-vpc-network"  
# PEERING_RANGE_NAME = "ucaip-haystack-range"

# # Create a VPC network
# ! gcloud compute networks create {NETWORK_NAME} --bgp-routing-mode=regional --subnet-mode=auto --project={PROJECT_ID}

# # Add necessary firewall rules
# ! gcloud compute firewall-rules create {NETWORK_NAME}-allow-icmp --network {NETWORK_NAME} --priority 65534 --project {PROJECT_ID} --allow icmp

# ! gcloud compute firewall-rules create {NETWORK_NAME}-allow-internal --network {NETWORK_NAME} --priority 65534 --project {PROJECT_ID} --allow all --source-ranges 10.128.0.0/9

# ! gcloud compute firewall-rules create {NETWORK_NAME}-allow-rdp --network {NETWORK_NAME} --priority 65534 --project {PROJECT_ID} --allow tcp:3389

# ! gcloud compute firewall-rules create {NETWORK_NAME}-allow-ssh --network {NETWORK_NAME} --priority 65534 --project {PROJECT_ID} --allow tcp:22

# # Reserve IP range
# ! gcloud compute addresses create {PEERING_RANGE_NAME} --global --prefix-length=16 --network={NETWORK_NAME} --purpose=VPC_PEERING --project={PROJECT_ID} --description="peering range for uCAIP Haystack."

# # Set up peering with service networking
# ! gcloud services vpc-peerings connect --service=servicenetworking.googleapis.com --network={NETWORK_NAME} --ranges={PEERING_RANGE_NAME} --project={PROJECT_ID}

### Deploy Playlist Query Model to a Vertex endpoint
This will be the endpoint that a user queries with their last n songs played.

In [4]:
import os

PROJECT_ID = 'wortz-project-352116'  # <--- TODO: CHANGE THIS
PROJECT = PROJECT_ID
LOCATION = 'us-central1'
REGION = 'us-central1'
DIMENSIONS = 128 # must match output dimensions - embedding dim in two tower code
DISPLAY_NAME = "spotify_merlin_candidates_v1"
OUTPUT_PATH = 'gs://spotify-jsw-mpd-2023'
# OUTPUT_PATH = os.path.join(BUCKET, "merlin-processed")
MODEL_PATH = os.path.join(OUTPUT_PATH, 'query_model_merlin')

import sys
from google.cloud import aiplatform_v1beta1 #needed for matching engine calls


from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

EMBEDDINGS_INITIAL_URI = os.path.join(OUTPUT_PATH, 'merlin-embeddings')

### Create a matching engine index

Matching Engine builds an index from a file of embeddings created in the previous notebook.

Many of the optimization options for Matching Engine are found in the tree-AH settings, and testing is recommended for each use case.

Recall that we saved our two-tower models and query embeddings (newline-delimited JSON) in a candidate folder like so:

![](img/saved-models.png)
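
As a reminder of what the index reads from `contents_delta_uri`, here is a minimal sketch of the newline-delimited JSON layout: one object per line with an `id` and an `embedding` list of length `DIMENSIONS`. The helper name `to_jsonl` and the track IDs and vector values are hypothetical and only illustrate the shape of the file.

```python
import json

DIMENSIONS = 128  # must match the embedding size of the two-tower model


def to_jsonl(embeddings):
    """Serialize {id: vector} pairs as one JSON object per line."""
    lines = []
    for item_id, vector in embeddings.items():
        if len(vector) != DIMENSIONS:
            raise ValueError(f"{item_id}: expected {DIMENSIONS} values, got {len(vector)}")
        lines.append(json.dumps({"id": item_id, "embedding": [float(v) for v in vector]}))
    return "\n".join(lines) + "\n"


# Illustrative vectors only; real values come from the trained model.
sample = {
    "spotify:track:example_a": [0.1] * DIMENSIONS,
    "spotify:track:example_b": [0.2] * DIMENSIONS,
}
jsonl_text = to_jsonl(sample)
print(jsonl_text.count("\n"), "records")
```

Checking the vector length up front is cheaper than waiting for index creation to reject a malformed file.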

## Set the Nearest Neighbor Options

See here for tips on [tuning the index](https://cloud.google.com/vertex-ai/docs/matching-engine/using-matching-engine#tuning_the_index)

From the paper, here is the rough idea:

1. (Initialization Step) Select a dictionary $C^{(m)}$ by sampling from $\{x^{(m)}_1, \ldots, x^{(m)}_n\}$.
2. (Partition Assignment Step) For each datapoint $x_i$, update $\tilde{x}_i$ by using the value of $c \in C^{(m)}$ that minimizes the anisotropic loss of $\tilde{x}_i$.
3. (Codebook Update Step) Optimize the loss function over all codewords in all dictionaries while keeping every dictionary's partitions constant.
4. Repeat Step 2 and Step 3 until convergence to a fixed point or until the maximum number of iterations is reached.
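
To make the partition assignment step concrete, here is a simplified sketch with a single codebook rather than one dictionary per subspace. The weight `eta` is an assumption standing in for how much more the loss penalizes residual error parallel to the datapoint than error orthogonal to it. This is an illustration of the idea, not the ScaNN or Matching Engine implementation.

```python
def anisotropic_loss(x, x_tilde, eta=4.0):
    """Weighted squared error: residual parallel to x costs eta times more."""
    norm_sq = sum(v * v for v in x)
    residual = [a - b for a, b in zip(x, x_tilde)]
    # Project the residual onto the direction of x.
    scale = sum(r * v for r, v in zip(residual, x)) / norm_sq
    parallel = [scale * v for v in x]
    orthogonal = [r - p for r, p in zip(residual, parallel)]
    par_sq = sum(p * p for p in parallel)
    orth_sq = sum(o * o for o in orthogonal)
    return eta * par_sq + orth_sq


def assign(x, codebook, eta=4.0):
    """Partition assignment: index of the codeword with the lowest loss."""
    return min(range(len(codebook)), key=lambda j: anisotropic_loss(x, codebook[j], eta))


x = [1.0, 0.0]
codebook = [[0.6, 0.0], [1.0, 0.5]]  # first errs along x, second errs orthogonally
print(assign(x, codebook, eta=1.0), assign(x, codebook, eta=4.0))
```

With `eta=1.0` the loss reduces to plain squared L2 error and the first codeword wins. With `eta=4.0` the second codeword wins, even though it is farther away in L2 terms, because its error does not change the inner product with queries that point in the direction of `x`.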


### Relating the algorithm to the parameters:

* `leafNodeEmbeddingCount` -> Number of embeddings on each leaf node. The default value is 1000 if not set.
* `leafNodesToSearchPercent` -> The default percentage of leaf nodes that any query may search. Must be in the range 1-100, inclusive. The default value is 10 (meaning 10%) if not set.
* `approximateNeighborsCount` -> The default number of neighbors to find through approximate search before exact reordering is performed. Exact reordering is a procedure where results returned by an approximate search algorithm are reordered via a more expensive distance computation.
* `distanceMeasureType` -> `DOT_PRODUCT_DISTANCE` is the default; `COSINE_DISTANCE`, `L1_DISTANCE`, and `SQUARED_L2_DISTANCE` are also available.
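
The exact reordering described for `approximateNeighborsCount` can be pictured with the small sketch below. It assumes the approximate stage has already returned a handful of candidates and re-scores them with the exact dot product. The function names and toy vectors are hypothetical, and this is not the service's internal code.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rerank(query, candidates, k):
    """Re-score approximate candidates with the exact dot product, keep the top k."""
    scored = [(item_id, dot(query, vector)) for item_id, vector in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


query = [1.0, 2.0]
# Pretend these four came back from the approximate search stage.
candidates = {"a": [0.5, 0.5], "b": [1.0, 1.0], "c": [-1.0, 0.0], "d": [0.0, 2.0]}
print(rerank(query, candidates, k=2))
```

A larger candidate pool gives the exact stage more chances to recover a true neighbor, at the cost of more exact distance computations per query.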

Other best practices from our PM team:
```
Start from leafNodesToSearchPercent=5 and approximateNeighborsCount=10 * k

use default values for others.

measure performance and recall and change those 2 parameters accordingly.
```
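
A small sketch of how this guidance could be applied: one helper turns the number of neighbors `k` into starting values, and another measures recall against a brute-force ground truth. Both helper names are hypothetical, and the ID lists are toy data.

```python
def starting_params(k):
    """Starting point from the guidance above; tune from here."""
    return {
        "leaf_nodes_to_search_percent": 5,
        "approximate_neighbors_count": 10 * k,
    }


def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    truth = set(exact_ids[:k])
    return len(truth & set(approx_ids[:k])) / len(truth)


print(starting_params(5))
print(recall_at_k(["a", "b", "x"], ["a", "b", "c"], k=3))
```

If recall is too low, raise one of the two parameters and measure latency again, since both increase the work done per query.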

In [5]:
tree_ah_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name=DISPLAY_NAME,
    contents_delta_uri=EMBEDDINGS_INITIAL_URI,
    dimensions=DIMENSIONS,
    approximate_neighbors_count=50,
    distance_measure_type="DOT_PRODUCT_DISTANCE",
    leaf_node_embedding_count=500,
    leaf_nodes_to_search_percent=7,
    description="Songs embeddings from the Spotify million playlist dataset",
    labels={"label_name": "label_value"},
)

Creating MatchingEngineIndex
Create MatchingEngineIndex backing LRO: projects/679926387543/locations/us-central1/indexes/1995121023204196352/operations/182364792424497152
MatchingEngineIndex created. Resource name: projects/679926387543/locations/us-central1/indexes/1995121023204196352
To use this MatchingEngineIndex in another session:
index = aiplatform.MatchingEngineIndex('projects/679926387543/locations/us-central1/indexes/1995121023204196352')


### Public Endpoint Example

```
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints
{
  "display_name": "public-endpoint-test1",
  "publicEndpointEnabled": true
}
```

In [63]:
import requests
import json
import google.auth
import google.auth.transport.requests



credentials, project = google.auth.default()
request = google.auth.transport.requests.Request()
credentials.refresh(request)
ENDPOINT_NAME = 'public-test-wortz2'

endpoint_create_data = {
    'display_name': ENDPOINT_NAME,
    'publicEndpointEnabled': True
}

rpc_address = f'https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT}/locations/{LOCATION}/indexEndpoints'
endpoint_json_data = json.dumps(endpoint_create_data)

header = {'Authorization': 'Bearer ' + credentials.token}


create_endpoint_response = requests.post(rpc_address, data=endpoint_json_data, headers=header).json()

In [71]:
create_endpoint_response

{'name': 'projects/679926387543/locations/us-central1/indexEndpoints/5619392823330603008/operations/4938500250462584832',
 'metadata': {'@type': 'type.googleapis.com/google.cloud.aiplatform.v1.CreateIndexEndpointOperationMetadata',
  'genericMetadata': {'createTime': '2023-04-08T20:34:09.727324Z',
   'updateTime': '2023-04-08T20:34:09.727324Z'}}}

In [105]:
OPERATION_NAME = create_endpoint_response['name']
PUBLIC_ENDPOINT_NAME, sep, tail = OPERATION_NAME.partition('/operations')
_, _, ENDPOINT_NUM = PUBLIC_ENDPOINT_NAME.partition('indexEndpoints/')


#set the deployed index name
INDEX_ENDPOINT_ID = 'merlin_spotify_candidate_deployed_index_v1'

PUBLIC_ENDPOINT_NAME

'projects/679926387543/locations/us-central1/indexEndpoints/5619392823330603008'
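
The `partition` calls above return empty strings without complaint if the create request failed and the response has an unexpected `name`. A stricter alternative is sketched below. The helper name and the regular expression are assumptions based on the operation name shown in the output above.

```python
import re

_OPERATION_RE = re.compile(
    r"^(?P<endpoint>projects/[^/]+/locations/[^/]+/indexEndpoints/(?P<endpoint_id>[^/]+))"
    r"/operations/(?P<operation_id>[^/]+)$"
)


def parse_endpoint_operation(operation_name):
    """Return (endpoint resource name, endpoint ID) or raise on an unexpected name."""
    match = _OPERATION_RE.match(operation_name)
    if match is None:
        raise ValueError(f"unexpected operation name: {operation_name!r}")
    return match.group("endpoint"), match.group("endpoint_id")


name = (
    "projects/679926387543/locations/us-central1/"
    "indexEndpoints/5619392823330603008/operations/4938500250462584832"
)
print(parse_endpoint_operation(name))
```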

### Create a matching engine endpoint

Below we set the variable names for the endpoint along with other key values for resource creation

In [79]:
ENDPOINT = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name=PUBLIC_ENDPOINT_NAME,
    project=PROJECT,
    location=LOCATION,
)

### Deploy the index to the endpoint and create an endpoint object

In [None]:
ME_index_endpoint = ENDPOINT.deploy_index(
    index=tree_ah_index,
    deployed_index_id=INDEX_ENDPOINT_ID,
    display_name='Spotify Merlin Deployed Index v1'
)

## This takes 20-30 minutes - here's some reading on what it is doing
#### Note on the advantages of the algorithm

[Accelerating Large-Scale Inference with Anisotropic Vector Quantization](https://arxiv.org/pdf/1908.10396.pdf)

> However, it is easy to see that not all pairs of (x, q) are equally important. The approximation error on the pairs which have a high inner product is far more important since they are likely to be among the top ranked pairs and can greatly affect the search result, while for the pairs whose inner product is low the approximation error matters much less. In other words, for a given datapoint x, we should quantize it with a bigger focus on its error with those queries which have high inner product with x. See Figure 1 for the illustration.
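
In the paper's notation, this idea is expressed roughly as a score-aware loss that weights the inner-product error by a function $w$ of the true inner product. What follows is a paraphrase; see the paper for the exact statements and conditions.

$$
\ell(x_i, \tilde{x}_i, w) = \mathbb{E}_{q}\left[ w(\langle q, x_i \rangle)\, \langle q, x_i - \tilde{x}_i \rangle^2 \right]
$$

Under the paper's assumption that queries are uniformly distributed on a sphere, the loss splits into a weighted sum of the residual components parallel and orthogonal to the datapoint:

$$
\ell(x_i, \tilde{x}_i, w) = h_{\parallel}\, \lVert r_{\parallel} \rVert^2 + h_{\perp}\, \lVert r_{\perp} \rVert^2,
\qquad
r_{\parallel} = \frac{\langle x_i - \tilde{x}_i,\, x_i \rangle}{\lVert x_i \rVert^2}\, x_i,
\qquad
r_{\perp} = (x_i - \tilde{x}_i) - r_{\parallel}
$$

Here $h_{\parallel}$ and $h_{\perp}$ depend on $w$ and $\lVert x_i \rVert$. When $w$ emphasizes high inner products, $h_{\parallel} \ge h_{\perp}$, so error along the direction of $x_i$ is penalized more heavily.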


![](img/algo.png)


## Other quick notes on ME while we wait for deployment

[Find anything blazingly fast with Google's vector search technology](https://cloud.google.com/blog/topics/developers-practitioners/find-anything-blazingly-fast-googles-vector-search-technology)

Instead of comparing vectors one by one, you could use the approximate nearest neighbor (ANN) approach to improve search times. Many ANN algorithms use vector quantization (VQ), in which you split the vector space into multiple groups, define "codewords" to represent each group, and search only for those codewords. This VQ technique dramatically enhances query speeds and is the essential part of many ANN algorithms, just like indexing is the essential part of relational databases and full-text search engines.

![](img/vectorQuant.gif)


As you may be able to conclude from the diagram above, as the number of groups in the space increases the speed of the search decreases and the accuracy increases.  Managing this trade-off — getting higher accuracy at shorter latency — has been a key challenge with ANN algorithms. 
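
A toy sketch of the vector quantization idea and its speed/accuracy trade-off follows. Points are assigned to their nearest codeword, and a query scans only the `n_probe` closest groups. The codewords are fixed by hand rather than learned, and all names and values are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_groups(points, codewords):
    """Assign every point to its nearest codeword."""
    groups = {j: [] for j in range(len(codewords))}
    for item_id, vector in points.items():
        nearest = min(range(len(codewords)), key=lambda j: sq_dist(vector, codewords[j]))
        groups[nearest].append(item_id)
    return groups


def search(query, points, codewords, groups, n_probe):
    """Scan only the n_probe groups whose codewords are closest to the query."""
    order = sorted(range(len(codewords)), key=lambda j: sq_dist(query, codewords[j]))
    candidates = [item_id for j in order[:n_probe] for item_id in groups[j]]
    return min(candidates, key=lambda item_id: sq_dist(query, points[item_id]))


codewords = [(0.0, 0.0), (10.0, 0.0)]
points = {"a": (4.9, 0.0), "b": (8.0, 0.0), "c": (0.0, 1.0), "d": (10.0, 1.0)}
groups = build_groups(points, codewords)
query = (5.2, 0.0)
print(search(query, points, codewords, groups, n_probe=1))  # scans one group
print(search(query, points, codewords, groups, n_probe=2))  # scans both groups
```

The query sits just across the group boundary from its true nearest neighbor `a`, so scanning a single group returns `b`, while scanning both groups finds `a` at the cost of comparing against every point. Parameters such as `leafNodesToSearchPercent` control a similar trade-off in the managed index.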

Last year, Google Research announced ScaNN, a new solution that provides state-of-the-art results for this challenge. With ScaNN, they introduced a new VQ algorithm called anisotropic vector quantization:

![](img/Loss_Types.max-1000x1000.png)

Anisotropic vector quantization uses a new loss function to train a model for VQ for an optimal grouping to capture farther data points (i.e. higher inner product) in a single group. With this idea, the new algorithm gives you higher accuracy at lower latency, as you can see in the benchmark result below (the violet line): 

![](img/speedvsaccuracy.max-1600x1600.png)


### You should now see matching engine resources in your GCP console:

![](img/me-resources2.png)

![](img/me-indexes.png)

## Note the endpoint has already been created in the `02-build-custom-query-predictor` notebook


In [86]:
# Copied from the last notebook

endpoint = aiplatform.Endpoint('projects/679926387543/locations/us-central1/endpoints/7469831310459011072')

## Putting it together: Combine the query endpoint with the matching engine endpoint for state-of-the-art recommendations

You can grab a quick example from the training data or build one yourself using the structure of the returned example. The `skip` call jumps to a pseudo-random example. The data is the same as in training; you can also construct your own test instances using the format provided.


```python
for tensor_dict in train_dataset.unbatch().skip(12905).take(1):
    td_keys = tensor_dict.keys()
    list_dict = {}
    for k in td_keys:
        list_dict.update({k: tensor_dict[k].numpy()})
    print(list_dict)
```

## Get some test instances and run through the endpoint and matching engine

In [87]:
## Ground truth candidate:
    # 'album_uri_can': 'spotify:album:5l83t3mbVgCrIe1VU9uJZR', 
    # 'artist_name_can': 'Russ', 
    # 'track_name_can': 'We Just Havent Met Yet', 
## TODO - we have to overload with candidate data because of the workflow transform, add overloaded values in the predictor
TEST_INSTANCE = {
    'pid': 1,
    'pl_name_src': 'Lit Tunes ',
    'pl_collaborative_src': 'false',
    'pl_duration_ms_new': 237506.0,
    'artist_name_pl': ['Russ', 'Jeremih', 'Khalid', 'Beyoncé', 'William Singe'],
    'track_uri_pl': [
        'spotify:track:4cxMGhkinTocPSVVKWIw0d',
        'spotify:track:1wNEBPo3nsbGCZRryI832I',
        'spotify:track:152lZdxL1OR0ZMW6KquMif',
        'spotify:track:2f4IuijXLxYOeBncS60GUD',
        'spotify:track:4Lj8paMFwyKTGfILLELVxt',
    ],
    'track_name_pl': ['Losin Control', 'Paradise', 'Location', 'Crazy In Love - Remix', 'Pony'],
    'album_name_pl': ["There's Really A Wolf", 'Late Nights: The Album', 'American Teen', 'Crazy In Love', 'Pony'],
}

#### Before we submit the prediction to test, we have to call the workflow
Ideally this could be implemented with a custom prediction routine so the data is pre-processed ahead of time.

More info on how to implement a custom model with a pre-processor: https://cloud.google.com/vertex-ai/docs/predictions/custom-prediction-routines

# Get our predictions for the next song recommendation - now let's feed the embedding to find the closest neighbors
(A higher distance score is better, because the distance here is an inner product.)

In [88]:
playlist_emb = endpoint.predict([TEST_INSTANCE])

## Getting the domain of the public endpoint

[Context](https://cloud.google.com/vertex-ai/docs/matching-engine/deploy-index-public#get_the_index_domain_name)

Following this guide for the [new API](https://cloud.google.com/vertex-ai/docs/matching-engine/query-index-public-endpoint)

```
$ curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer `gcloud auth print-access-token`"  https://1957880287.us-central1-181224308459.vdb.vertexai.goog/v1beta1/projects/181224308459/locations/us-central1/indexEndpoints/3370566089086861312:findNeighbors -d '{deployed_index_id: "test_index_public1", queries: [{datapoint: {datapoint_id: "0", feature_vector: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}, neighbor_count: 5}]}'
```

In [152]:
endpoint_data = ! gcloud ai index-endpoints list --region {LOCATION}
endpoint_address = [e for e in endpoint_data if 'publicEndpointDomainName' in e][0].partition(': ')[2] #careful - this is grabbing the first one in the list
endpoint_address

'1071885379.us-central1-679926387543.vdb.vertexai.goog'
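
Because the cell above takes the first domain in the list, it can pick the wrong endpoint when a project has several. One alternative, sketched here under the assumption that the v1 REST `indexEndpoints.get` call is reachable with the same credentials, is to read `publicEndpointDomainName` from the specific endpoint resource. The helper names are hypothetical.

```python
import json
import urllib.request


def index_endpoint_url(location, endpoint_name):
    """REST URL of one IndexEndpoint resource (v1 API)."""
    return f"https://{location}-aiplatform.googleapis.com/v1/{endpoint_name}"


def fetch_public_domain(location, endpoint_name, token):
    """Read publicEndpointDomainName from the endpoint resource itself."""
    request = urllib.request.Request(
        index_endpoint_url(location, endpoint_name),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        resource = json.load(response)
    return resource["publicEndpointDomainName"]


# Usage in this notebook (needs network access and valid credentials):
# endpoint_address = fetch_public_domain(LOCATION, PUBLIC_ENDPOINT_NAME, credentials.token)
```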

In [193]:
import requests
import json
import google.auth
import google.auth.transport.requests



credentials, project = google.auth.default()
request = google.auth.transport.requests.Request()
credentials.refresh(request)
request_data = {"deployed_index_id": INDEX_ENDPOINT_ID,
                                 'queries': [{'datapoint': {"datapoint_id": f"{i}", 
                                                            "feature_vector": emb},
                                              'neighbor_count': 5} 
                                             for i, emb in enumerate(playlist_emb.predictions)]
                                }

# The URL path takes the index endpoint's numeric ID; the deployed index ID goes in the request body
rpc_address = f'https://{endpoint_address}/v1beta1/projects/{PROJECT}/locations/{LOCATION}/indexEndpoints/{ENDPOINT_NUM}:findNeighbors'
endpoint_json_data = json.dumps(request_data)

header = {'Authorization': 'Bearer ' + credentials.token}


requests.post(rpc_address, data=endpoint_json_data, headers=header).json()

{'nearestNeighbors': [{'id': '0',
   'neighbors': [{'datapoint': {'datapointId': 'spotify:track:2jLHm0rzlxKonOQPSApDJN',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 65.91852569580078},
    {'datapoint': {'datapointId': 'spotify:track:2t9Ftn5WjAqUP9LWW8C5wg',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 65.82749938964844},
    {'datapoint': {'datapointId': 'spotify:track:6k4Flvu1tcz8vWAfcaxEPP',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 65.807861328125},
    {'datapoint': {'datapointId': 'spotify:track:4d5aAIEtXySFDFxNuYbbZv',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 65.78123474121094},
    {'datapoint': {'datapointId': 'spotify:track:2M0PF3WQt38vwoKjay5Ioh',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 65.74112701416016}]}]}
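
To work with a response like the one above, a small helper can flatten it into rows ordered by score. This is a sketch with a hypothetical helper name. The sample reuses two neighbors from the output above, placed out of order to show the sort.

```python
def top_tracks(response):
    """Flatten a findNeighbors response into (query_id, track_uri, distance) rows."""
    rows = []
    for result in response.get("nearestNeighbors", []):
        # With DOT_PRODUCT_DISTANCE a larger value means a closer match.
        neighbors = sorted(result.get("neighbors", []), key=lambda n: n["distance"], reverse=True)
        for neighbor in neighbors:
            rows.append((result["id"], neighbor["datapoint"]["datapointId"], neighbor["distance"]))
    return rows


sample_response = {
    "nearestNeighbors": [{
        "id": "0",
        "neighbors": [
            {"datapoint": {"datapointId": "spotify:track:2t9Ftn5WjAqUP9LWW8C5wg"},
             "distance": 65.82749938964844},
            {"datapoint": {"datapointId": "spotify:track:2jLHm0rzlxKonOQPSApDJN"},
             "distance": 65.91852569580078},
        ],
    }]
}
for row in top_tracks(sample_response):
    print(row)
```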

# Here are our recommendations!

  ### No Scrubs
`[[MatchNeighbor(id='spotify:track:6rvjjyTCbuoIucuI9TvTfX', distance=118.5457992553711),`

![](img/spotify/1.png)
_______

  ### I Know
`MatchNeighbor(id='spotify:track:2jNFga2ldTWgXXuu57u6QB', distance=117.87425994873047),`

![](img/spotify/2.png)

_______

  ### Eye 2 Eye
`MatchNeighbor(id='spotify:track:5RINSPsc11CjSj9OhBbHeV', distance=117.82361602783203),`

![](img/spotify/3.png)

_______

  ### Just a Friend
`MatchNeighbor(id='spotify:track:14yaVE5zyeIZ5OopvQFVtG', distance=117.8228530883789),`

![](img/spotify/3.png)


_______

  ### Mirror 
`MatchNeighbor(id='spotify:track:5sDKFCwJsnIS31t3IaCoZt', distance=117.7808609008789),`

![](img/spotify/4.png)


_______

   ### Hello
`MatchNeighbor(id='spotify:track:3vW77S4JolRGbunmXuJMIV', distance=117.58699035644531),`

![](img/spotify/5.png)


_______

  ### home
`MatchNeighbor(id='spotify:track:3gY3fTkff4VmZ2XfvU6N4l', distance=117.57185363769531),`

![](img/spotify/6.png)


_______

  ### Undercover - Coucheron Remix
`MatchNeighbor(id='spotify:track:1uE9BkKQtUT03iPPk55oKO', distance=117.54019927978516),`

![](img/spotify/7.png)

_______

  ### Undercover - Devault Remix

`MatchNeighbor(id='spotify:track:2csIyFgOxMfxE9yn02Rc2A', distance=117.40080261230469),`




![](img/spotify/7.png)

_______

### Got That Bomb
`MatchNeighbor(id='spotify:track:5bxRZTJbd2vApE4ZyakL0h', distance=117.31294250488281)]]`

![](img/spotify/3.png)


In [None]:
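# ## Cleanup (uncomment to delete the resources created in this notebook)
# from google.cloud import aiplatform_v1beta1 #needed for matching engine calls

# REGION = 'us-central1'
# API_ENDPOINT = "{}-aiplatform.googleapis.com".format(REGION)

# index_client = aiplatform_v1beta1.IndexServiceClient(
#     client_options=dict(api_endpoint=API_ENDPOINT)
# )
# index_endpoint_client = aiplatform_v1beta1.IndexEndpointServiceClient(
#     client_options=dict(api_endpoint=API_ENDPOINT)
# )

# # Undeploy the index first, then delete the endpoint and the index
# index_endpoint_client.undeploy_index(
#     index_endpoint=PUBLIC_ENDPOINT_NAME, deployed_index_id=INDEX_ENDPOINT_ID
# ).result()
# index_endpoint_client.delete_index_endpoint(name=PUBLIC_ENDPOINT_NAME).result()
# index_client.delete_index(name=tree_ah_index.resource_name).result()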