# E2E recsys with matching engine and TFRS


Simple example, goal being:

    1) Train a Two-Tower model using movielens data
    
    2) Deploy the query model endpoint
    
    3) Save movie embeddings to json, for use in matching engine
    
    
#### Note on VPC Pairing - insturctions for in-notebook pairing [here](console.cloud.google.com/networking/networks/list?authuser=4&project=qwiklabs-gcp-01-782d5edf8d15)
    
First we will create a user-managed notebook behind the already created peered VPC network used for Matching Engine. Select tensorflow enterprise 2.6 with a T4 GPU


![](./imgs/create-workbench.png)


##### Be sure to create the notebook in the peered network


![](./imgs/network-create.png)

    
The next notebook will connect matching engine with the query endpoint for a simple recommender system

Run the below pip install one time to install tensorflow-recommenders

In [40]:
# !echo Y | pip uninstall tensorflow
!pip install google-cloud-aiplatform tensorflow-recommenders==0.6.0 --user

Collecting google-cloud-aiplatform
  Downloading google_cloud_aiplatform-1.19.1-py2.py3-none-any.whl (2.3 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.3/2.3 MB[0m [31m28.2 MB/s[0m eta [36m0:00:00[0m00:01[0m0:01[0m
Collecting google-cloud-resource-manager<3.0.0dev,>=1.3.3
  Downloading google_cloud_resource_manager-1.6.3-py2.py3-none-any.whl (233 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m233.8/233.8 kB[0m [31m33.2 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting google-cloud-bigquery<3.0.0dev,>=1.15.0
  Downloading google_cloud_bigquery-2.34.4-py2.py3-none-any.whl (206 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m206.6/206.6 kB[0m [31m29.1 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: google-cloud-resource-manager, google-cloud-bigquery, google-cloud-aiplatform
[0mSuccessfully installed google-cloud-aiplatform-1.19.1 google-cloud-bigquery-2.34.4 google-cloud-resource-manager-1.6.3


### Important - restart the kernel after installing

# Train a 2 tower model

In [1]:
from typing import Dict, Text

import json

import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs

# disable INFO and DEBUG logging everywhere
import logging

from google.protobuf import struct_pb2

import pandas as pd


logging.disable(logging.WARNING)

DIMENSIONS = 64 # this is how large the embedding dimensions get


# Ratings data.
ratings = tfds.load('movielens/100k-ratings', split="train")
# Features of all the available movies.
movies = tfds.load('movielens/100k-movies', split="train")

# Select the basic features.
ratings = ratings.map(lambda x: {
    "movie_id": tf.strings.to_number(x["movie_id"]),
    "user_id": tf.strings.to_number(x["user_id"])
})
movies = movies.map(lambda x: tf.strings.to_number(x["movie_id"]))

# Build a model.
class Model(tfrs.Model):

    def __init__(self):
        super().__init__()

        # Set up user representation.
        self.user_model = tf.keras.Sequential([
            tf.keras.layers.Embedding(
            input_dim=2000, output_dim=DIMENSIONS),
            ])
        # Set up movie representation.
        self.item_model = tf.keras.Sequential([
            tf.keras.layers.Embedding(
            input_dim=2000, output_dim=DIMENSIONS),
        ])
        # Set up a retrieval task and evaluation metrics over the
        # entire dataset of candidates.
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=movies.batch(128).map(self.item_model)
            )
        )

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:

        user_embeddings = self.user_model(features["user_id"])
        movie_embeddings = self.item_model(features["movie_id"])

        return self.task(user_embeddings, movie_embeddings)


model = Model()
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.5))

# Randomly shuffle data and split between train and test.
tf.random.set_seed(42)
shuffled = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

# Train.
model.fit(train.batch(1024), epochs=5)

# Evaluate.
model.evaluate(test.batch(1024), return_dict=True)

2022-12-12 15:06:35.357776: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 15:06:35.369725: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 15:06:35.371416: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 15:06:35.374453: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


{'factorized_top_k/top_1_categorical_accuracy': 4.999999873689376e-05,
 'factorized_top_k/top_5_categorical_accuracy': 0.0003000000142492354,
 'factorized_top_k/top_10_categorical_accuracy': 0.0015999999595806003,
 'factorized_top_k/top_50_categorical_accuracy': 0.05550000071525574,
 'factorized_top_k/top_100_categorical_accuracy': 0.1505499929189682,
 'loss': 3468.936279296875,
 'regularization_loss': 0,
 'total_loss': 3468.936279296875}

### Set your variables

Run `!gcloud auth login` in terminal

In [2]:
import os

PROJECT = 'matchine-engine' #set to your own
NETWORK_NAME = 'matching-engine-vpc' #same as VPC peered network

### Create a bucket to store our embeddings and models
BUCKET = 'gs://jwortz-bucket' # TODO - change for each user
EMBEDDINGS = os.path.join(BUCKET, 'embeddings')
QUERY_MODEL = os.path.join(BUCKET, 'query_model')
REGION = 'us-central1'

## Gets an auth token with the Parent variable
PROJECT_ID = PROJECT
AUTH_TOKEN = !gcloud auth print-access-token
PROJECT_NUMBER = ! gcloud projects list --filter="$PROJECT" --format="value(PROJECT_NUMBER)"
PROJECT_NUMBER = PROJECT_NUMBER[0]

PARENT = "projects/{}/locations/{}".format(PROJECT_ID, REGION)
PARENT

'projects/matchine-engine/locations/us-central1'

In [30]:
# run one time to create your bucket
!gsutil mb -l $REGION $BUCKET

Creating gs://jwortz-bucket/...


In [31]:
# Save the query/user model

model.user_model.save(QUERY_MODEL)

In [3]:
# Make sure it saved
!gsutil ls $QUERY_MODEL

gs://jwortz-bucket/query_model/
gs://jwortz-bucket/query_model/keras_metadata.pb
gs://jwortz-bucket/query_model/saved_model.pb
gs://jwortz-bucket/query_model/assets/
gs://jwortz-bucket/query_model/variables/


In [4]:
from google.cloud import aiplatform

model_gcp = aiplatform.Model.upload(
        display_name="Movielens User Query Model",
        artifact_uri=QUERY_MODEL,
        serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-6:latest',
        description="Top of the query tower, meant to return an embedding for each user instance",
    )

In [5]:
#validate the model type output
model_gcp

<google.cloud.aiplatform.models.Model object at 0x7f25e814ec50> 
resource name: projects/807304860730/locations/us-central1/models/3397194061688340480

In [6]:
import time

In [7]:
endpoint = aiplatform.Endpoint.create(
    display_name="Movielens Model Endpoint",
    project=PROJECT,
    location=REGION,
)

In [8]:
deployment = model_gcp.deploy(
    endpoint=endpoint,
    deployed_model_display_name="Movielens User Query Model",
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=2,
    accelerator_type=None,
    accelerator_count=0,
    sync=False,
)


In [9]:
deployment

<google.cloud.aiplatform.models.Endpoint object at 0x7f26cacfec50> 
resource name: projects/807304860730/locations/us-central1/endpoints/6840374097696784384

## Save the embeddings for the movie dataset

### Write embeddings to local storage
Following this format for Matching Engine
https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/matching_engine/sdk_matching_engine_for_indexing.ipynb


In [10]:
movie_embs = movies.batch(1000).map(lambda x: [x, model.item_model(x)]).unbatch() #process 1000 at a time then flatten it back

In [11]:
# Write to local disk
with open("movie_embeddings.json", 'w') as f:
    for movie_id, movie_emb in movie_embs:
        # print(movie_id.numpy(), movie_emb.numpy())
        f.write('{"id":"' + str(movie_id.numpy()) + '","embedding":[' + ",".join(str(x) for x in list(movie_emb.numpy())) + ']}')
        f.write("\n")

You should now see .json data as required by matching engine
![](imgs/jsonl.png)

### Upload the data to GCS
Only remove if you have issues uploading the json file

In [12]:
!gsutil cp movie_embeddings.json $EMBEDDINGS/movie_embeddings.json

Copying file://movie_embeddings.json [Content-Type=application/json]...
/ [1 files][  1.2 MiB/  1.2 MiB]                                                
Operation completed over 1 objects/1.2 MiB.                                      


# Next we will deploy our movie inidicies. With Matching Engine
* Create an index (from the `json` files)
* Create and endpoint
* Deploy the index to the endpoint so you can perform vector search

In [13]:
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET)

## Set the Nearest Neighbor Options

See here for tips on [tuning the index](https://cloud.google.com/vertex-ai/docs/matching-engine/using-matching-engine#tuning_the_index)

From the paper - here's the rough idea

1. (Initialization Step) Select a dictionary C(m) bysampling from {x(m) 1 , . . . x (m) n }. 
2. (Partition Assignment Step) For each datapoint xi , update x˜i by using the value of c ∈ C (m) that minimizes the anisotropic loss of ˜xi.
3. (Codebook Update Step) Optimize the loss function over all codewords in all dictionaries while keeping every dictionaries partitions constant.
4. Repeat Step 2 and Step 3 until convergence to a fixed point or maximum number of iteration is reached.


### Relating the algorithm to the parameters:

* `leafNodeEmbeddingCount` -> Number of embeddings on each leaf node. The default value is 1000 if not set.
* `leafNodesToSearchPercent` -> The default percentage of leaf nodes that any query may be searched. Must be in range 1-100, inclusive. The default value is 10 (means 10%) if not set.
* `approximateNeighborsCount` -> The default number of neighbors to find through approximate search before exact reordering is performed. Exact reordering is a procedure where results returned by an approximate search algorithm are reordered via a more expensive distance computation.
* `distanceMeasureType` -> DOT_PRODUCT_DISTANCE is default - COSINE, L1 and L2^2 is available

Other best practices from our PM team:
```
Start from leafNodesToSearchPercent=5 and approximateNeighborsCount=10 * k

use default values for others.

measure performance and recall and change those 2 parameters accordingly.
```

In [21]:
DISPLAY_NAME = "me-index-endpoint-movielens-demo"

tree_ah_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name=DISPLAY_NAME,
    contents_delta_uri=EMBEDDINGS,
    dimensions=DIMENSIONS,
    approximate_neighbors_count=150,
    distance_measure_type="DOT_PRODUCT_DISTANCE",
    leaf_node_embedding_count=500,
    leaf_nodes_to_search_percent=7,
    description="Movielens Demo2",
    labels={"label_name": "label_value"},
    sync=False
)

## Note on the advantages of the algorithm

[link](https://arxiv.org/pdf/1908.10396.pdf)

```However, it is easy to see that not all pairs of (x, q) are equally important. The approximation error on the pairs which have a high inner product is far more important since they are likely to be among the top ranked pairs and can greatly affect the search result, while for the pairs whose inner product is low the approximation error matters much less. In other words, for a given datapoint x, we should quantize it with a bigger focus on its error with those queries which have high inner product with x. See Figure 1 for the illustration.```


![](imgs/algo.png)



### Use existing to skip creation time

```python
tree_ah_index = aiplatform.MatchingEngineIndex('6576460520704966656')
tree_ah_index
```

### Save the name of the endpoint

In [34]:
INDEX_RESOURCE_NAME = tree_ah_index.result().name
INDEX_RESOURCE_NAME

AttributeError: 'MatchingEngineIndex' object has no attribute 'result'

Debugging tool in case you run into issues. Example usage below.
`!gcloud beta ai operations describe 4122851463774863360 --index=7253099976438317056 --project=$PROJECT`

## Create Index Endpoint and Deploy Index

In [16]:
VPC_NETWORK_NAME = "projects/{}/global/networks/{}".format(PROJECT_NUMBER, NETWORK_NAME)
VPC_NETWORK_NAME

'projects/807304860730/global/networks/matching-engine-vpc'

In [17]:
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name="index_endpoint_for_demo",
    description="endpoint for movielens",
    network=VPC_NETWORK_NAME,
)

In [33]:
INDEX_ENDPOINT_NAME = my_index_endpoint.resource_name
INDEX_ENDPOINT_NAME

'projects/807304860730/locations/us-central1/indexEndpoints/8211267185440456704'

### Deploy the index to the endpoint

In [39]:
DEPLOYED_INDEX_ID = "tree_movielens" #change to unique id or use this one

## note you can comment out if you are not creating your own index endpoint

my_index_endpoint = my_index_endpoint.deploy_index(
    index=tree_ah_index, deployed_index_id=DEPLOYED_INDEX_ID
)

my_index_endpoint.deployed_indexes

TimeoutError: Operation did not complete within the designated timeout of 900 seconds.

## Other quick notes on ME while we wait for deployment

[link](https://cloud.google.com/blog/topics/developers-practitioners/find-anything-blazingly-fast-googles-vector-search-technology)

Instead of comparing vectors one by one, you could use the approximate nearest neighbor (ANN) approach to improve search times. Many ANN algorithms use vector quantization (VQ), in which you split the vector space into multiple groups, define "codewords" to represent each group, and search only for those codewords. This VQ technique dramatically enhances query speeds and is the essential part of many ANN algorithms, just like indexing is the essential part of relational databases and full-text search engines.

![](imgs/vectorQuant.gif)


As you may be able to conclude from the diagram above, as the number of groups in the space increases the speed of the search decreases and the accuracy increases.  Managing this trade-off — getting higher accuracy at shorter latency — has been a key challenge with ANN algorithms. 

Last year, Google Research announced ScaNN, a new solution that provides state-of-the-art results for this challenge. With ScaNN, they introduced a new VQ algorithm called anisotropic vector quantization:

![](imgs/Loss_Types.max-1000x1000.png)

Anisotropic vector quantization uses a new loss function to train a model for VQ for an optimal grouping to capture farther data points (i.e. higher inner product) in a single group. With this idea, the new algorithm gives you higher accuracy at lower latency, as you can see in the benchmark result below (the violet line): 

![](imgs/speedvsaccuracy.max-1600x1600.png)


# Connect Matching Engine and The User Model Into a Recommendation System

This will bring it all together by incorporating the prediction endpoint 

In [41]:
# establish index_endpoint -IMPORTANT for constructing already created endpoints/indicies/etc...
ME_index_endpoint = aiplatform.MatchingEngineIndexEndpoint(INDEX_ENDPOINT_NAME)

In [26]:
USER = 627.0 #pick anyone 0-100k to see watch history and recommendations
NUM_NEIGH=3

emb_627 = endpoint.predict([[USER]]) #prediction from the saved model
emb_627 = emb_627.predictions[0]
emb_627 # we should get our user xxx embedding @ dim len

[[-0.162882209,
  -0.335985959,
  0.137335211,
  0.0739337727,
  -0.432961673,
  -0.394063562,
  0.34763962,
  -0.080462493,
  -0.424611747,
  0.217125714,
  -0.45972839,
  0.494097382,
  0.298708528,
  -0.17870906,
  0.617574215,
  -0.556233346,
  -0.364824772,
  0.218952522,
  0.604446352,
  -0.316697538,
  -0.873303175,
  -0.162815481,
  0.500744939,
  0.257559836,
  -0.100139424,
  0.667596281,
  0.108864874,
  0.378669232,
  -0.337707669,
  0.0600555874,
  -0.129736587,
  -0.336009651,
  -0.0422365814,
  -0.399091274,
  0.267339408,
  0.0344658494,
  0.353300542,
  -0.0192649253,
  0.0923104733,
  -0.102977142,
  0.126158774,
  0.190066129,
  -0.167369932,
  0.252656192,
  0.048801031,
  0.340473771,
  -0.397580266,
  -0.169092372,
  0.0312889926,
  0.424314827,
  0.298339814,
  -0.315201908,
  0.464916021,
  0.0904096663,
  0.603862882,
  -0.147781983,
  0.492935568,
  -0.571237922,
  0.306813419,
  0.0781588256,
  0.566045702,
  0.197490379,
  -0.198064387,
  -0.410030544]]

In [42]:
ME_index_endpoint.match(queries=emb_627, deployed_index_id=DEPLOYED_INDEX_ID, num_neighbors=10)

[[MatchNeighbor(id='394.0', distance=5.03569221496582),
  MatchNeighbor(id='1075.0', distance=4.747988700866699),
  MatchNeighbor(id='1299.0', distance=4.696071624755859),
  MatchNeighbor(id='1410.0', distance=4.3028106689453125),
  MatchNeighbor(id='1311.0', distance=4.258786201477051),
  MatchNeighbor(id='725.0', distance=4.23414945602417),
  MatchNeighbor(id='1508.0', distance=4.20390510559082),
  MatchNeighbor(id='542.0', distance=4.15582275390625),
  MatchNeighbor(id='1538.0', distance=4.070896148681641),
  MatchNeighbor(id='1531.0', distance=4.002253532409668)]]

#### Create movie lookup tables
Get what given user has rated highly, and what is being recommended

In [43]:
! wget https://files.grouplens.org/datasets/movielens/ml-100k/u.item

--2022-12-12 16:36:48--  https://files.grouplens.org/datasets/movielens/ml-100k/u.item
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 236344 (231K)
Saving to: ‘u.item.2’


2022-12-12 16:36:48 (3.06 MB/s) - ‘u.item.2’ saved [236344/236344]



In [44]:
# Quick sidetour - create movie lookup dictionary
movie_names = pd.read_csv('u.item', delimiter='|' , 
                          encoding='latin-1', 
                          usecols=(0,1),
                          names = ['movie_id', 'title'])
movielookup = movie_names.to_dict()['title']

In [52]:
print("Movies Watched:")
for i, watched_movie in enumerate(ratings.filter(lambda x: x['user_id']==USER)):
    if i >= 10: #limit to top n
        break
    else:
        key = watched_movie['movie_id'].numpy()
        print(f"""\n 
              {i}: {movielookup[key]}"""
             )

Movies Watched:

 
              0: Piano, The (1993)

 
              1: Star Trek: The Wrath of Khan (1982)

 
              2: Return of the Jedi (1983)

 
              3: Star Trek VI: The Undiscovered Country (1991)

 
              4: Star Trek III: The Search for Spock (1984)

 
              5: Four Rooms (1995)

 
              6: Addams Family Values (1993)

 
              7: Arsenic and Old Lace (1944)

 
              8: Pinocchio (1940)

 
              9: Dead Poets Society (1989)


In [48]:
query_vector = emb_627


ann_response = ME_index_endpoint.match(
    deployed_index_id=DEPLOYED_INDEX_ID, 
    queries=query_vector, 
    num_neighbors=NUM_NEIGH
)

print("Recommended movie IDs:", ann_response)

Recommended movie IDs: [[MatchNeighbor(id='394.0', distance=5.03569221496582), MatchNeighbor(id='1075.0', distance=4.747988700866699), MatchNeighbor(id='1299.0', distance=4.696071624755859)]]


In [53]:
# look at the recommended movies vs the viewed for that user
print("Movies recommended: ")
for i, match in enumerate(ann_response[0]):
    key = int(float(match.id))
    print(f"""\n 
          {i}: {movielookup[key]} (distance: {match.distance})"""
         )


Movies recommended: 

 
          0: Robin Hood: Men in Tights (1993) (distance: 5.03569221496582)

 
          1: Pagemaster, The (1994) (distance: 4.747988700866699)

 
          2: 'Til There Was You (1997) (distance: 4.696071624755859)


### Bonus topic

Streaming upserts and compaction details can be found on the official guide [here](https://cloud.google.com/vertex-ai/docs/matching-engine/update-rebuild-index#update_an_index_using_streaming_updates)

### Cleaning up
To clean up all Google Cloud resources used in this project, you can delete the Google Cloud project you used for the tutorial. You can also manually delete resources that you created by running the following code.

In [None]:
# Force undeployment of indexes and delete endpoint
my_index_endpoint.delete(force=True)
tree_ah_index.delete()