# Implementing Recommendation Engines with Matching Engine

![](img/arch.png)

### Deploy Playlist Query Model to a Vertex endpoint
This will be the endpoint that a user queries with their last n songs played.

In [1]:
import os

from google.cloud import aiplatform
from google.cloud import aiplatform_v1beta1  # needed for Matching Engine calls

PROJECT_ID = 'wortz-project-352116'  # <--- TODO: CHANGE THIS
PROJECT = PROJECT_ID
LOCATION = 'us-central1'
REGION = 'us-central1'
DIMENSIONS = 128  # must match the output (embedding) dimension in the two-tower code
DISPLAY_NAME = "spotify_merlin_candidates_v1"
OUTPUT_PATH = 'gs://spotify-jsw-mpd-2023'
# OUTPUT_PATH = os.path.join(BUCKET, "merlin-processed")
MODEL_PATH = os.path.join(OUTPUT_PATH, 'query_model_merlin')
INDEX_ENDPOINT_ID = 'merlin_spotify_candidate_deployed_index_v1'  # used as the deployed index ID
EMBEDDINGS_INITIAL_URI = os.path.join(OUTPUT_PATH, 'merlin-embeddings')

aiplatform.init(project=PROJECT_ID, location=LOCATION)

### Create a matching engine index

Matching Engine builds an index from the embedding files created in the previous notebook.

Most of Matching Engine's optimization options live in the Tree-AH settings; test them against your own use case.

Recall that we saved our two-tower models and candidate embeddings (newline-delimited JSON) in a candidate folder like so:

![](img/saved-models.png)

## Set the Nearest Neighbor Options

See the documentation for tips on [tuning the index](https://cloud.google.com/vertex-ai/docs/matching-engine/using-matching-engine#tuning_the_index).

From the ScaNN paper, here is the rough idea:

1. (Initialization Step) Select a dictionary $C^{(m)}$ by sampling from $\{x_1^{(m)}, \ldots, x_n^{(m)}\}$.
2. (Partition Assignment Step) For each datapoint $x_i$, update $\tilde{x}_i$ using the codeword $c \in C^{(m)}$ that minimizes the anisotropic loss of $\tilde{x}_i$.
3. (Codebook Update Step) Optimize the loss function over all codewords in all dictionaries while keeping every dictionary's partitions constant.
4. Repeat Steps 2 and 3 until convergence to a fixed point or until the maximum number of iterations is reached.
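The four steps above can be illustrated with a toy NumPy version. Two simplifications are assumptions of this sketch, not the paper's or Matching Engine's implementation: it uses a single codebook instead of the paper's multiple product-quantization dictionaries $C^{(m)}$, and the hypothetical parameter `eta` stands in for the paper's ratio $h_\parallel / h_\perp$, i.e. how much more heavily the residual error parallel to $x_i$ is penalized than the orthogonal error.

```python
import numpy as np


def anisotropic_loss(x, c, eta):
    """Loss for quantizing datapoint x to codeword c.

    The residual r = x - c is split into a part parallel to x and a part
    orthogonal to x; the parallel part is weighted eta times more heavily.
    """
    r = x - c
    r_par = (r @ x) / (x @ x) * x
    r_perp = r - r_par
    return eta * (r_par @ r_par) + (r_perp @ r_perp)


def assign(X, C, eta):
    # Step 2: each datapoint picks the codeword with the smallest anisotropic loss
    return np.array([np.argmin([anisotropic_loss(x, c, eta) for c in C]) for x in X])


def update(X, C, labels, eta):
    # Step 3: closed-form codeword update with assignments held fixed.
    # loss_i = (x_i - c)^T W_i (x_i - c) with W_i = I + (eta - 1) x_i x_i^T / ||x_i||^2,
    # so the minimizing codeword is c = (sum_i W_i)^{-1} sum_i W_i x_i.
    d = X.shape[1]
    C_new = C.copy()
    for k in range(len(C)):
        members = X[labels == k]
        if len(members) == 0:
            continue  # leave an unused codeword where it is
        A = np.zeros((d, d))
        b = np.zeros(d)
        for x in members:
            W = np.eye(d) + (eta - 1.0) * np.outer(x, x) / (x @ x)
            A += W
            b += W @ x
        C_new[k] = np.linalg.solve(A, b)
    return C_new


def train_codebook(X, n_codewords, eta=4.0, max_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize the codebook by sampling datapoints
    C = X[rng.choice(len(X), n_codewords, replace=False)].copy()
    labels = assign(X, C, eta)
    for _ in range(max_iter):
        C = update(X, C, labels, eta)
        new_labels = assign(X, C, eta)
        # Step 4: stop once assignments no longer change (a fixed point)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return C, labels
```

With `eta=1.0` the loss reduces to the squared L2 distance and the update reduces to the cluster mean, so the loop becomes ordinary k-means; `eta > 1` is what makes the quantization anisotropic.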


### Relating the algorithm to the parameters:

* `leafNodeEmbeddingCount` -> Number of embeddings on each leaf node. The default value is 1000 if not set.
* `leafNodesToSearchPercent` -> The default percentage of leaf nodes that any query may search. Must be in range 1-100, inclusive. The default value is 10 (i.e., 10%) if not set.
* `approximateNeighborsCount` -> The default number of neighbors to find through approximate search before exact reordering is performed. Exact reordering is a procedure where results returned by an approximate search algorithm are reordered via a more expensive distance computation.
* `distanceMeasureType` -> `DOT_PRODUCT_DISTANCE` is the default; `COSINE_DISTANCE`, `L1_DISTANCE`, and `SQUARED_L2_DISTANCE` are also available.
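To build intuition for how the first two parameters interact, here is rough back-of-envelope arithmetic. It assumes every leaf holds exactly `leafNodeEmbeddingCount` embeddings (real partitions are uneven), the 2,000,000-embedding corpus is hypothetical, and `estimate_search_work` is a helper written for this illustration, not part of the SDK.

```python
import math


def estimate_search_work(num_embeddings: int,
                         leaf_node_embedding_count: int = 1000,
                         leaf_nodes_to_search_percent: int = 10) -> dict:
    """Rough estimate of how much of a Tree-AH index a single query scans.

    Assumes leaves are evenly filled, which real partitions only approximate.
    """
    num_leaves = math.ceil(num_embeddings / leaf_node_embedding_count)
    leaves_searched = math.ceil(num_leaves * leaf_nodes_to_search_percent / 100)
    return {
        "num_leaves": num_leaves,
        "leaves_searched": leaves_searched,
        "approx_embeddings_scored": leaves_searched * leaf_node_embedding_count,
    }


# Settings used for the index below: 500 embeddings per leaf, search 7% of leaves
print(estimate_search_work(2_000_000, 500, 7))
# {'num_leaves': 4000, 'leaves_searched': 280, 'approx_embeddings_scored': 140000}
```

Smaller leaves or a higher search percentage means more embeddings scored per query: better recall, higher latency.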

Other best practices from our PM team:
```
Start from leafNodesToSearchPercent=5 and approximateNeighborsCount=10 * k.

Use default values for the others.

Measure performance and recall, and change those two parameters accordingly.
```
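One way to "measure recall" is to compare Matching Engine's results for a sample of queries against a brute-force search (here `k` is the number of neighbors each query requests). A minimal sketch, assuming the candidate embeddings fit in memory as a NumPy array and that you have mapped each returned `datapointId` back to a row index; `exact_top_k` and `recall_at_k` are hypothetical helpers.

```python
import numpy as np


def exact_top_k(queries: np.ndarray, candidates: np.ndarray, k: int) -> np.ndarray:
    """Brute-force top-k candidate row indices by dot product, highest first."""
    scores = queries @ candidates.T
    return np.argsort(-scores, axis=1)[:, :k]


def recall_at_k(approx_ids, exact_ids) -> float:
    """Fraction of the exact top-k neighbors that the approximate search returned."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    total = sum(len(e) for e in exact_ids)
    return hits / total
```

Recall is computed on sets, so the order in which the approximate search returns neighbors does not affect the score; tune `leafNodesToSearchPercent` and `approximateNeighborsCount` until recall and latency are both acceptable.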

In [5]:
tree_ah_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name=DISPLAY_NAME,
    contents_delta_uri=EMBEDDINGS_INITIAL_URI,
    dimensions=DIMENSIONS,
    approximate_neighbors_count=50,
    distance_measure_type="DOT_PRODUCT_DISTANCE",
    leaf_node_embedding_count=500,
    leaf_nodes_to_search_percent=7,
    description="Songs embeddings from the Spotify million playlist dataset",
    labels={"label_name": "label_value"},
)

Creating MatchingEngineIndex
Create MatchingEngineIndex backing LRO: projects/679926387543/locations/us-central1/indexes/1995121023204196352/operations/182364792424497152
MatchingEngineIndex created. Resource name: projects/679926387543/locations/us-central1/indexes/1995121023204196352
To use this MatchingEngineIndex in another session:
index = aiplatform.MatchingEngineIndex('projects/679926387543/locations/us-central1/indexes/1995121023204196352')


### Public Endpoint Example

```
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/indexEndpoints
{
  "display_name": "public-endpoint-test1",
  "publicEndpointEnabled": true
}
```

In [63]:
import json

import google.auth
import google.auth.transport.requests
import requests

# Get an OAuth access token from the default credentials
credentials, project = google.auth.default()
request = google.auth.transport.requests.Request()
credentials.refresh(request)
ENDPOINT_NAME = 'public-test-wortz2'

# publicEndpointEnabled=True creates an index endpoint that does not need VPC peering
endpoint_create_data = {
    'display_name': ENDPOINT_NAME,
    'publicEndpointEnabled': True
}

rpc_address = f'https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT}/locations/{LOCATION}/indexEndpoints'
endpoint_json_data = json.dumps(endpoint_create_data)

header = {'Authorization': 'Bearer ' + credentials.token,
          'Content-Type': 'application/json'}

# The response is a long-running operation, not the finished endpoint
create_endpoint_response = requests.post(rpc_address, data=endpoint_json_data, headers=header).json()

In [71]:
create_endpoint_response

{'name': 'projects/679926387543/locations/us-central1/indexEndpoints/5619392823330603008/operations/4938500250462584832',
 'metadata': {'@type': 'type.googleapis.com/google.cloud.aiplatform.v1.CreateIndexEndpointOperationMetadata',
  'genericMetadata': {'createTime': '2023-04-08T20:34:09.727324Z',
   'updateTime': '2023-04-08T20:34:09.727324Z'}}}
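The create call returns a long-running operation (note the `/operations/` suffix in `name`), so the endpoint may not be ready yet. A hedged sketch of polling it: `wait_for_operation` is a hypothetical helper, and the `fetch` callable is injected so the loop can be exercised offline. In the notebook, `fetch` could GET `https://{LOCATION}-aiplatform.googleapis.com/v1/{name}` with the same bearer-token header; the returned operation carries `"done": true` once it finishes, plus an `error` field if it failed.

```python
import time
from typing import Callable, Dict


def wait_for_operation(operation_name: str,
                       fetch: Callable[[str], Dict],
                       poll_seconds: float = 10.0,
                       timeout_seconds: float = 3600.0) -> Dict:
    """Poll a long-running operation until its 'done' field is true."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        op = fetch(operation_name)
        if op.get("done"):
            if "error" in op:
                raise RuntimeError(f"Operation failed: {op['error']}")
            return op
        if time.monotonic() > deadline:
            raise TimeoutError(f"{operation_name} not done after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Possible use in this notebook (assumes `requests`, `header`, LOCATION from above):
# fetch = lambda name: requests.get(
#     f"https://{LOCATION}-aiplatform.googleapis.com/v1/{name}", headers=header).json()
# op = wait_for_operation(create_endpoint_response["name"], fetch)
```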

In [105]:
OPERATION_NAME = create_endpoint_response['name']
PUBLIC_ENDPOINT_NAME, _, _ = OPERATION_NAME.partition('/operations')
_, _, ENDPOINT_NUM = PUBLIC_ENDPOINT_NAME.partition('indexEndpoints/')


#set the deployed index name
INDEX_ENDPOINT_ID = 'merlin_spotify_candidate_deployed_index_v1'

PUBLIC_ENDPOINT_NAME

'projects/679926387543/locations/us-central1/indexEndpoints/5619392823330603008'
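The `partition` calls above depend on the exact substrings `/operations` and `indexEndpoints/`. An alternative sketch parses the whole resource name with a regular expression and fails loudly on anything unexpected; the pattern is an assumption based on the names printed above, and `parse_index_endpoint_name` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# projects/{project}/locations/{location}/indexEndpoints/{id}[/operations/{op}]
RESOURCE_RE = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/indexEndpoints/(?P<endpoint_id>[^/]+)(?:/operations/(?P<operation_id>[^/]+))?$"
)


def parse_index_endpoint_name(name: str) -> Dict[str, Optional[str]]:
    """Split an index endpoint (or endpoint operation) resource name into its parts."""
    m = RESOURCE_RE.match(name)
    if m is None:
        raise ValueError(f"Not an index endpoint resource name: {name}")
    return m.groupdict()
```

For an operation name, `operation_id` is filled in; for a plain endpoint name it is `None`.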

### Create a matching engine endpoint

Below we wrap the public endpoint created above in a `MatchingEngineIndexEndpoint` object so we can deploy the index to it with the SDK.

In [79]:
ENDPOINT = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name=PUBLIC_ENDPOINT_NAME,
    project=PROJECT,
    location=LOCATION,
)

### Deploy the index to the endpoint and create an endpoint object

In [None]:
ME_index_endpoint = ENDPOINT.deploy_index(
    index=tree_ah_index,
    deployed_index_id=INDEX_ENDPOINT_ID,
    display_name='Spotify Merlin Deployed Index v1'
)

## Deployment takes 20-30 minutes: here's some reading on what it is doing
#### Note on the advantages of the algorithm

[Accelerating Large-Scale Inference with Anisotropic Vector Quantization (Guo et al.)](https://arxiv.org/pdf/1908.10396.pdf)

> However, it is easy to see that not all pairs of (x, q) are equally important. The approximation error on the pairs which have a high inner product is far more important since they are likely to be among the top ranked pairs and can greatly affect the search result, while for the pairs whose inner product is low the approximation error matters much less. In other words, for a given datapoint x, we should quantize it with a bigger focus on its error with those queries which have high inner product with x. See Figure 1 for the illustration.


![](img/algo.png)
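A two-dimensional toy example makes the quote concrete (the numbers are chosen for illustration, not taken from the paper). Two quantized versions of the same datapoint have identical L2 error, but only the error *parallel* to the datapoint changes its inner product with an aligned query; that parallel component is what the anisotropic loss penalizes more heavily. `quantization_errors` is a hypothetical helper.

```python
import numpy as np


def quantization_errors(x, x_q, q):
    """Return (L2 reconstruction error, absolute inner-product error for query q)."""
    return float(np.linalg.norm(x - x_q)), float(abs(q @ x - q @ x_q))


x = np.array([1.0, 0.0])             # datapoint
q = np.array([1.0, 0.0])             # query with a high inner product with x
x_parallel = np.array([0.9, 0.0])    # quantization error along x
x_orthogonal = np.array([1.0, 0.1])  # quantization error perpendicular to x

for name, x_q in [("parallel", x_parallel), ("orthogonal", x_orthogonal)]:
    l2_err, ip_err = quantization_errors(x, x_q, q)
    print(f"{name}: L2 error={l2_err:.2f}, inner-product error={ip_err:.2f}")
# parallel: L2 error=0.10, inner-product error=0.10
# orthogonal: L2 error=0.10, inner-product error=0.00
```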


## Other quick notes on Matching Engine (ME) while we wait for deployment

[Find anything blazingly fast with Google's vector search technology](https://cloud.google.com/blog/topics/developers-practitioners/find-anything-blazingly-fast-googles-vector-search-technology)

Instead of comparing vectors one by one, you could use the approximate nearest neighbor (ANN) approach to improve search times. Many ANN algorithms use vector quantization (VQ), in which you split the vector space into multiple groups, define "codewords" to represent each group, and search only for those codewords. This VQ technique dramatically enhances query speeds and is the essential part of many ANN algorithms, just like indexing is the essential part of relational databases and full-text search engines.
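The codeword idea can be sketched in a few lines of NumPy: partition the vectors with plain k-means (not ScaNN's anisotropic quantization), score the query only against the partition centroids, and then score exactly the points inside the best `n_probe` partitions. Here `n_probe` plays roughly the role of `leafNodesToSearchPercent`; all function names are hypothetical and this is a toy, not Matching Engine's algorithm.

```python
import numpy as np


def _nearest_centroid(X, centroids):
    # Squared L2 distance from every point to every centroid
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def build_partitions(X, n_partitions, n_iter=10, seed=0):
    """Plain k-means: returns centroids ("codewords") and each point's partition."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), n_partitions, replace=False)].copy()
    for _ in range(n_iter):
        labels = _nearest_centroid(X, centroids)
        for k in range(n_partitions):
            members = X[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    # Final assignment so labels match the returned centroids
    return centroids, _nearest_centroid(X, centroids)


def partitioned_search(q, X, centroids, labels, k, n_probe):
    """Top-k by dot product, scanning only the n_probe best-scoring partitions."""
    probe = np.argsort(-(centroids @ q))[:n_probe]
    cand = np.flatnonzero(np.isin(labels, probe))
    scores = X[cand] @ q
    return cand[np.argsort(-scores)[:k]]
```

Probing every partition reproduces brute-force search; probing fewer trades some recall for scoring far fewer vectors.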

![](img/vectorQuant.gif)


As you may be able to conclude from the diagram above, as the number of groups in the space increases, the speed of the search decreases and the accuracy increases. Managing this trade-off (getting higher accuracy at shorter latency) has been a key challenge with ANN algorithms.

In 2020, Google Research announced ScaNN, a new solution that provides state-of-the-art results for this challenge. With ScaNN, they introduced a new VQ algorithm called anisotropic vector quantization:

![](img/Loss_Types.max-1000x1000.png)

Anisotropic vector quantization uses a new loss function to train the VQ model, finding a grouping that captures farther data points (i.e., those with higher inner products) in a single group. With this idea, the new algorithm gives you higher accuracy at lower latency, as you can see in the benchmark result below (the violet line):

![](img/speedvsaccuracy.max-1600x1600.png)


### You should now see matching engine resources in your GCP console:

![](img/me-resources2.png)

![](img/me-indexes.png)

## Note: the query model endpoint was already created in the `02-build-custom-query-predictor` notebook


In [2]:
# Copy/paste the endpoint resource name from the previous notebook

endpoint = aiplatform.Endpoint('projects/679926387543/locations/us-central1/endpoints/7469831310459011072')

## Putting it together: Combine the query endpoint with the matching engine endpoint for state-of-the-art recommendations

You can grab a quick example from the training data or build one yourself using the returned example structure. The `skip` call jumps ahead to a pseudo-random example. The data is the same as in training; you can also construct your own test instances using the format provided.


```python
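# `train_dataset` is the tf.data training dataset from the training notebook
for tensor_dict in train_dataset.unbatch().skip(12905).take(1):
    # Convert each feature tensor to a NumPy value so the example prints readably
    list_dict = {k: v.numpy() for k, v in tensor_dict.items()}
    print(list_dict)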
```
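The loop above yields NumPy values (byte strings, `np.float32`, arrays), which `json` cannot serialize directly, while the endpoint expects plain JSON like the `TEST_INSTANCE` below. A hedged sketch of a converter, assuming string features come back as UTF-8 byte strings; `to_json_instance` is a hypothetical helper.

```python
import numpy as np


def to_json_instance(example: dict) -> dict:
    """Convert a dict of NumPy values into JSON-serializable Python types."""
    def convert(value):
        if isinstance(value, bytes):
            return value.decode("utf-8")
        if isinstance(value, np.ndarray):
            return convert(value.tolist())
        if isinstance(value, (list, tuple)):
            return [convert(v) for v in value]
        if isinstance(value, np.generic):
            # NumPy scalars (np.float32, np.int64, np.bytes_, ...) -> Python scalars
            return convert(value.item())
        return value

    return {k: convert(v) for k, v in example.items()}
```

For example, `to_json_instance(list_dict)` gives a dict that can be passed to `endpoint.predict` or dumped with `json.dumps`.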

## Get some test instances and run through the endpoint and matching engine

In [20]:
TEST_INSTANCE = {
  "pid": 1,
  "pl_name_src": "Jazz Standards",
  "pl_collaborative_src": "false",
  "pl_duration_ms_new": 2968084.0,
  "artist_name_pl": ["Miles Davis", "Duke Ellington", "Charlie Parker", "Dianne Reeves", "Pat Metheny"],
  "artist_uri_pl": ["spotify:artist:0kbYTNQb4Pb1rPbbaF0pT4", "spotify:artist:4F7Q5NV6h5TSwCainz8S5A", "spotify:artist:4Ww5mwS7BWYjoZTUIrMHfC", "spotify:artist:7nwrblOf59ulOiB6djwPVh", "spotify:artist:3t58jfUhoMLYVO14XaUFLA"],
  "track_uri_pl": ["spotify:track:0aWMVrwxPNYkKmFthzmpRi", "spotify:track:0PrGgNDwfJPNXADJYROvBw", "spotify:track:5rPMbUxXRXvWu89k0n6Sxj", "spotify:track:3DRL2sPYVbx87ArfP2TBqD", "spotify:track:7rYSSGZShi5Zgde60MQAMx"],
  "track_name_pl": ["Blue in Green", "In A Sentimental Mood", "April In Paris", "How High The Moon", "All the Things You Are"],
  "album_name_pl": ["Kind Of Blue (Legacy Edition)", "Duke Ellington \u0026 John Coltrane", "The Genius Of Charlie Parker #2: April In Paris", "Good Night, And Good Luck", "Question and Answer"],
}

#### Before we submit the prediction for testing, we have to call the preprocessing workflow
Ideally, this would be implemented as a custom prediction routine so that the data is preprocessed before it reaches the model.

More information on implementing a custom preprocessor: https://cloud.google.com/vertex-ai/docs/predictions/custom-prediction-routines

# Get the playlist embedding for the next song recommendation, then feed it to Matching Engine to find the closest neighbors
(A higher distance score is better: with dot-product distance, the score is the inner product.)
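Because the index uses `DOT_PRODUCT_DISTANCE`, the `distance` values in the responses below are inner products, and neighbors come back sorted from highest to lowest. A small NumPy sketch of the four measures listed earlier plus a brute-force inner-product ranking; the helper names and toy vectors are hypothetical. Note that for the L1 and squared-L2 measures a *smaller* value means closer.

```python
import numpy as np


def dot_product(a, b):
    return float(np.dot(a, b))


def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def squared_l2(a, b):
    return float(np.sum((a - b) ** 2))


def l1(a, b):
    return float(np.sum(np.abs(a - b)))


def rank_by_dot_product(query, candidates, ids, k):
    """Brute-force ranking: (id, inner product) pairs, highest score first."""
    scores = candidates @ query
    order = np.argsort(-scores)[:k]
    return [(ids[i], float(scores[i])) for i in order]
```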

In [27]:
playlist_emb = endpoint.predict([TEST_INSTANCE, TEST_INSTANCE])

In [28]:
playlist_emb

Prediction(predictions=[[0.0, 0.0, 0.4922759830951691, 3.138733625411987, 0.5686017274856567, 1.365304350852966, 0.0, 0.0, 1.224401354789734, 0.7344078421592712, 0.0, 0.0, 0.7765825390815735, 0.05535386130213737, 0.0, 0.0, 0.0, 0.6895463466644287, 3.277911424636841, 1.6609787940979, 0.0, 3.363595962524414, 0.0, 2.36107325553894, 0.0, 0.0, 0.408475935459137, 0.4845336079597473, 1.05967390537262, 0.0, 0.0, 0.0, 1.358271241188049, 2.016084671020508, 0.04031898081302643, 0.0, 0.0, 1.558114409446716, 0.0, 0.08912238478660583, 0.4924914836883545, 0.0, 0.0, 2.32361102104187, 0.8849217891693115, 0.0, 1.193393707275391, 0.5141043066978455, 0.2576574087142944, 0.0, 0.0, 0.5713821053504944, 3.878991603851318, 0.5326656103134155, 0.9144375920295715, 0.0, 0.0, 0.213668167591095, 0.0, 0.6621321439743042, 1.690809845924377, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5349928736686707, 1.141141653060913, 0.0, 0.0, 0.1994768977165222, 0.0, 0.06651673465967178, 0.0, 0.112617239356041, 2.056260108947754, 2.001875638

## Getting the domain of the public endpoint

[Context](https://cloud.google.com/vertex-ai/docs/matching-engine/deploy-index-public#get_the_index_domain_name)

Next, we follow this guide for querying an index through a public endpoint with the [new API](https://cloud.google.com/vertex-ai/docs/matching-engine/query-index-public-endpoint).

```
$ curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer `gcloud auth print-access-token`"  https://1957880287.us-central1-181224308459.vdb.vertexai.goog/v1beta1/projects/181224308459/locations/us-central1/indexEndpoints/3370566089086861312:findNeighbors -d '{deployed_index_id: "test_index_public1", queries: [{datapoint: {datapoint_id: "0", feature_vector: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}, neighbor_count: 5}]}'
```

In [23]:
endpoint_data = ! gcloud ai index-endpoints list --region {LOCATION} --filter {INDEX_ENDPOINT_ID}
endpoint_address = [e for e in endpoint_data if 'publicEndpointDomainName' in e][0].partition(': ')[2] #careful - this is grabbing the first one in the list
endpoint_address

'1071885379.us-central1-679926387543.vdb.vertexai.goog'
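The line filter above takes the first endpoint whose listing mentions a public domain. A slightly more robust sketch, assuming `gcloud`'s `--format=json` output is a list of index-endpoint objects with `displayName` and `publicEndpointDomainName` fields; `public_domains` is a hypothetical helper.

```python
import json
from typing import List, Optional


def public_domains(gcloud_json: str, display_name: Optional[str] = None) -> List[str]:
    """Extract publicEndpointDomainName values, optionally for one display name."""
    endpoints = json.loads(gcloud_json)
    return [
        e["publicEndpointDomainName"]
        for e in endpoints
        if "publicEndpointDomainName" in e
        and (display_name is None or e.get("displayName") == display_name)
    ]


# Possible use in the notebook (stderr discarded so only JSON is captured):
# raw = ! gcloud ai index-endpoints list --region {LOCATION} --format=json 2>/dev/null
# endpoint_address = public_domains("\n".join(raw), ENDPOINT_NAME)[0]
```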

In [29]:
import requests
import json
import google.auth
import google.auth.transport.requests
from typing import List


def get_matches_public_endpoint(embeddings: List[List[float]],
                                n_matches: int,
                                endpoint_address: str,
                                index_endpoint_id: str,
                                project: str = PROJECT,
                                location: str = LOCATION) -> dict:
    '''
    Get matches from Matching Engine for a batch of query vectors,
    using the public endpoint's findNeighbors REST method.
    '''
    # Refresh the default credentials to get an access token
    # (don't overwrite the `project` argument with the credentials' project)
    credentials, _ = google.auth.default()
    request = google.auth.transport.requests.Request()
    credentials.refresh(request)

    # One query per embedding; datapoint_id is just the position in the batch
    request_data = {
        "deployed_index_id": index_endpoint_id,
        "queries": [
            {"datapoint": {"datapoint_id": f"{i}", "feature_vector": emb},
             "neighbor_count": n_matches}
            for i, emb in enumerate(embeddings)
        ],
    }

    rpc_address = f'https://{endpoint_address}/v1beta1/projects/{project}/locations/{location}/indexEndpoints/{index_endpoint_id}:findNeighbors'
    endpoint_json_data = json.dumps(request_data)

    header = {'Authorization': 'Bearer ' + credentials.token,
              'Content-Type': 'application/json'}

    return requests.post(rpc_address, data=endpoint_json_data, headers=header).json()


get_matches_public_endpoint(playlist_emb.predictions,
                            5,
                            endpoint_address,
                            INDEX_ENDPOINT_ID)

{'nearestNeighbors': [{'id': '0',
   'neighbors': [{'datapoint': {'datapointId': 'spotify:track:7dvw0O6JYgJdzM6atK49qO',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 85.89031219482422},
    {'datapoint': {'datapointId': 'spotify:track:549IVSyUCvitBeygRyN3FQ',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 85.86182403564453},
    {'datapoint': {'datapointId': 'spotify:track:47xapmNY1CSQrKmvwIIsDt',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 85.83961486816406},
    {'datapoint': {'datapointId': 'spotify:track:4mWIlN41RfA2cG5YT2dDh6',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 85.83741760253906},
    {'datapoint': {'datapointId': 'spotify:track:5P0u9rjq2ucW3hnvHPkXwe',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 85.83346557617188}]},
  {'id': '1',
   'neighbors': [{'datapoint': {'datapointId': 'spotify:track:7dvw0O6JYgJdzM6atK49qO',
      'crowdingTag': {'crowdingAttribute': '0
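The raw response nests each neighbor's ID under `datapoint`. A minimal sketch that flattens it into `(datapointId, distance)` pairs per query, based only on the response shape printed above; `parse_neighbors` is a hypothetical helper. (Both queries return the same neighbors here because `TEST_INSTANCE` was sent twice.)

```python
from typing import Dict, List, Tuple


def parse_neighbors(response: Dict) -> Dict[str, List[Tuple[str, float]]]:
    """Map each query id to its (datapointId, distance) pairs, in the order returned."""
    return {
        query["id"]: [
            (n["datapoint"]["datapointId"], n["distance"])
            for n in query.get("neighbors", [])
        ]
        for query in response.get("nearestNeighbors", [])
    }
```

For example, `parse_neighbors(response)["0"]` lists the recommended track URIs for the first playlist.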

# Here are our recommendations!

### Gone with the Wind
`{'datapoint': {'datapointId': 'spotify:track:7dvw0O6JYgJdzM6atK49qO',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 85.89031219482422}`

![](img/spotify/1.png)
_______


In [None]:
# From Batch Prediction Table

In [202]:
test_playlist2 = [[0.0, 0.0, 0.11261723935604095, 3.6582555770874023, 0.7231143712997437, 0.0, 0.0, 0.0, 0.06596212089061737, 0.610464870929718, 0.0, 0.0, 0.2943631708621979, 0.37219682335853577, 0.0, 0.0, 0.0, 0.7215844988822937, 3.073551893234253, 0.19929471611976624, 0.0, 2.4727284908294678, 0.0, 0.8978801965713501, 0.0, 0.0, 0.4465973973274231, 0.6361901164054871, 0.6546093821525574, 0.0, 0.1530240774154663, 0.0, 1.2126094102859497, 0.11474587768316269, 0.0, 0.0, 0.0, 0.5329757928848267, 0.11834070831537247, 0.0, 0.0, 0.0, 0.0, 0.49517688155174255, 0.37308937311172485, 0.0, 0.0, 0.7244329452514648, 1.5604287385940552, 0.0, 0.0, 0.849047839641571, 3.346245765686035, 0.9201681017875671, 1.5311988592147827, 0.40950027108192444, 0.0, 0.05821037292480469, 0.0, 0.6386757493019104, 2.4086601734161377, 0.0, 0.0, 0.0, 0.0, 0.04834048077464104, 0.0, 0.5530638098716736, 1.8521811962127686, 0.0, 0.0, 0.4369158148765564, 0.0, 0.1732865273952484, 0.0, 0.07011672854423523, 0.9043356776237488, 0.8143212199211121, 0.22457373142242432, 0.7144923210144043, 0.30752840638160706, 1.4313921928405762, 0.036675311625003815, 1.3362258672714233, 0.037897899746894836, 1.3523902893066406, 0.5329119563102722, 0.2261199653148651, 0.0, 2.044106960296631, 0.4421702027320862, 2.6947712898254395, 1.8096294403076172, 2.98370623588562, 0.0, 0.4754008948802948, 3.8540384769439697, 0.8544381260871887, 0.8371299505233765, 1.0105479955673218, 1.0285321474075317, 0.48182278871536255, 0.7570167183876038, 0.0, 0.27072158455848694, 2.0489559173583984, 0.33231568336486816, 0.0, 0.0, 0.08799964189529419, 0.30721190571784973, 0.0, 0.6217242479324341, 0.0, 0.9204525947570801, 0.0, 0.11418246477842331, 0.21670092642307281, 0.0, 2.011657953262329, 0.0, 0.0, 0.0, 0.5740302801132202, 2.492286443710327, 0.5358629822731018, 2.1077561378479004, 2.0576491355895996]]


#### Here we do additional testing from the batch prediction table in the prior notebook

Copy and paste example embeddings from the table for testing; an example is shown below:

![](img/get_embeddd_bp.png)

In [203]:
get_matches_public_endpoint(test_playlist2,
                            5, 
                            endpoint_address, 
                            INDEX_ENDPOINT_ID)

{'nearestNeighbors': [{'id': '0',
   'neighbors': [{'datapoint': {'datapointId': 'spotify:track:79YPGYriJOdW3J7rE5cqVW',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 76.4923324584961},
    {'datapoint': {'datapointId': 'spotify:track:46YS8p5wWLOpwhMM2PuuFs',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 76.32144165039062},
    {'datapoint': {'datapointId': 'spotify:track:65WpN0fsvPD7MXzwmI42V7',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 76.19010925292969},
    {'datapoint': {'datapointId': 'spotify:track:24ISA6avNm8tsgDzQsilio',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 76.16402435302734},
    {'datapoint': {'datapointId': 'spotify:track:6U03i3nfypIQdtgRF89lGr',
      'crowdingTag': {'crowdingAttribute': '0'}},
     'distance': 76.1233901977539}]}]}

In [None]:
# ## Cleanup: undeploy the index first, then delete the index endpoint and the index
# ENDPOINT.undeploy_index(deployed_index_id=INDEX_ENDPOINT_ID)
# ENDPOINT.delete()
# tree_ah_index.delete()