##### Copyright 2020 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Efficient serving

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/recommenders/examples/efficient_serving"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/recommenders/blob/main/docs/examples/efficient_serving.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/recommenders/blob/main/docs/examples/efficient_serving.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/recommenders/docs/examples/efficient_serving.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

[Retrieval models](https://www.tensorflow.org/recommenders/examples/basic_retrieval) are often built to surface a handful of top candidates out of millions or even hundreds of millions. To react to the user's context and behaviour, they need to do this on the fly, in a matter of milliseconds.

Approximate nearest neighbour (ANN) search is the technology that makes this possible. In this tutorial, we'll show how to use ScaNN, a state-of-the-art nearest neighbour retrieval package, to scale TFRS retrieval to millions of items.

## What is ScaNN?

ScaNN is a library from Google Research that performs dense vector similarity search at large scale. Given a database of candidate embeddings, ScaNN indexes them so that they can be searched rapidly at inference time. It combines state-of-the-art vector compression techniques with carefully implemented algorithms to achieve a strong speed-accuracy tradeoff, and it can greatly outperform brute-force search while sacrificing little accuracy.
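
To make the baseline concrete, here is a minimal sketch of the brute-force search that an ANN index approximates: score every candidate embedding against the query with a dot product and keep the top k. The embeddings and the `brute_force_top_k` helper are made up for illustration; they are not part of ScaNN or TFRS.

```python
import heapq
from typing import List, Sequence, Tuple


def brute_force_top_k(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[float, int]]:
    """Return the k highest (score, index) pairs by dot product.

    Hypothetical helper: every candidate is scored, so the cost grows
    linearly with the number of candidates.
    """
    scored = (
        (sum(q * c for q, c in zip(query, candidate)), index)
        for index, candidate in enumerate(candidates)
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Toy 2-dimensional embeddings (illustrative values only).
candidate_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query_embedding = [1.0, 0.2]

top = brute_force_top_k(query_embedding, candidate_embeddings, k=2)
print(top)
```

An ANN index avoids scoring every candidate, trading a small amount of recall for a large reduction in work per query.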

## Building a ScaNN-powered model

To try out ScaNN in TFRS, we'll build a simple MovieLens retrieval model, just as we did in the [basic retrieval](https://www.tensorflow.org/recommenders/examples/basic_retrieval) tutorial. If you have followed that tutorial, this section will be familiar and can safely be skipped.

To start, install TensorFlow, TFRS, TensorFlow Datasets, and the SageMaker Python SDK:

In [3]:
%pip install tensorflow

Note: you may need to restart the kernel to use updated packages.


In [4]:
%pip install tensorflow-recommenders
%pip install --upgrade tensorflow-datasets
%pip install --upgrade sagemaker

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


We also need to install `scann`: it's an optional dependency of TFRS, and so needs to be installed separately.

In [5]:
%pip install scann

Note: you may need to restart the kernel to use updated packages.


Set up all the necessary imports.

In [6]:
import tensorflow

In [7]:
from typing import Dict, Text

import os
import pprint
import tempfile

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [8]:
import tensorflow_recommenders as tfrs

In [9]:
# AWS SDK and SageMaker imports
import boto3
import sagemaker
from sagemaker import KMeans
from sagemaker import get_execution_role
import sagemaker.amazon.common as smac
from sagemaker.tensorflow import TensorFlow
from sagemaker.tensorflow import TensorFlowModel

In [10]:
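# IAM role that SageMaker assumes when running the training job.
role = get_execution_role()
session = sagemaker.session.Session()
# Default S3 bucket for this account and region.
bucket_name = session.default_bucket()
bucket = 's3://{}'.format(bucket_name)
print('default_s3_bucket: {}'.format(bucket))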

default_s3_bucket: s3://sagemaker-us-east-1-431615879134


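### Model definition

The training code lives in the entry point script `bf_training.py`. The estimator below runs it as a SageMaker training job on a single `ml.m5.xlarge` instance.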

In [11]:
tf_estimator = TensorFlow(
    image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.10.0-cpu-py39-ubuntu20.04-sagemaker',
    entry_point='bf_training.py',
    source_dir='.',
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    script_mode=True,
    # keep_alive_period_in_seconds=1800,
    # framework_version='2.9.1',
    # py_version='py39',
)
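
The entry point script `bf_training.py` is not shown in this notebook. As a rough sketch of how a script-mode entry point could decide where to export its model: SageMaker sets the `SM_MODEL_DIR` environment variable inside the training container, and whatever is written there is packaged into `model.tar.gz`. The `export_dir` helper and the version subdirectory `1` are assumptions for illustration, not the actual contents of `bf_training.py`.

```python
import os


def export_dir(version: str = "1") -> str:
    """Return the directory a training script could save its SavedModel to.

    Hypothetical helper. SM_MODEL_DIR is set by SageMaker in the training
    container; the fallback is the conventional /opt/ml/model path.
    """
    model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    # TensorFlow Serving expects a numeric version subdirectory.
    return os.path.join(model_dir, version)


# In a real script this is where the trained model would be saved,
# for example with tf.saved_model.save(model, export_dir()).
print(export_dir())
```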

In [12]:
tf_estimator.fit()

2022-10-18 20:13:01 Starting - Starting the training job...
2022-10-18 20:13:25 Starting - Preparing the instances for training
ProfilerReport-1666123981: InProgress
......
2022-10-18 20:14:32 Downloading - Downloading input data......
2022-10-18 20:15:26 Training - Downloading the training image........
2022-10-18 20:16:36.798413: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-18 20:16:37.352912: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:460] Initializing the SageMaker Profiler.
2022-10-18 20:16:37.358107: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:105] SageMaker Profiler is not enabled. The timeline writer thread will not be started, future recorded eve

In [19]:
tf_estimator.model_data

's3://sagemaker-us-east-1-431615879134/tensorflow-training-2022-10-18-20-12-56-647/output/model.tar.gz'
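
`model_data` is a plain S3 URI. To fetch the archive from Python rather than the AWS CLI, the URI first has to be split into a bucket and a key. Here is a small sketch using only the standard library; `split_s3_uri` is a hypothetical helper, and the URI in the example is a placeholder rather than a real artifact.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Placeholder URI for illustration.
example_bucket, example_key = split_s3_uri(
    "s3://example-bucket/training-job/output/model.tar.gz"
)
print(example_bucket, example_key)
```

The resulting pair can then be passed to `boto3.client('s3').download_file(example_bucket, example_key, local_path)`, where `local_path` is wherever you want the archive saved.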

In [14]:
!aws s3 cp {tf_estimator.model_data} ./model/

download: s3://sagemaker-us-east-1-431615879134/tensorflow-training-2022-10-18-20-12-56-647/output/model.tar.gz to model/model.tar.gz


In [15]:
!tar -xvf ./model/model.tar.gz

1/
1/saved_model.pb
1/variables/
1/variables/variables.index
1/variables/variables.data-00000-of-00001
1/assets/
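
The same extraction can be done from Python with the standard-library `tarfile` module, which is handy outside a notebook. This sketch builds a tiny stand-in archive so that it runs on its own; the member name mirrors the listing above, but the contents are dummy bytes. `extract_archive` is a hypothetical helper, and as with any archive, only extract files you trust.

```python
import io
import os
import tarfile
import tempfile
from typing import List


def extract_archive(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names


# Build a stand-in archive with a dummy file so the example is self-contained.
work_dir = tempfile.mkdtemp()
archive_path = os.path.join(work_dir, "model.tar.gz")
with tarfile.open(archive_path, mode="w:gz") as archive:
    payload = b"dummy"
    info = tarfile.TarInfo(name="1/saved_model.pb")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

extracted = extract_archive(archive_path, os.path.join(work_dir, "extracted"))
print(extracted)
```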


In [16]:
# Load the extracted SavedModel (version directory 1/); it can also be served with TensorFlow Serving.
loaded = tf.saved_model.load('./1/')

In [17]:
# Pass a user id in, get top predicted movie titles back.
scores, titles = loaded(tf.constant(["42"]))

In [18]:
titles

<tf.Tensor: shape=(1, 10), dtype=string, numpy=
array([[b'Fried Green Tomatoes (1991)', b'Top Gun (1986)',
        b'Beauty and the Beast (1991)',
        b'Hunt for Red October, The (1990)', b'Mary Poppins (1964)',
        b'Ghost (1990)', b'Back to the Future (1985)',
        b'When Harry Met Sally... (1989)', b'Sound of Music, The (1965)',
        b'Benny & Joon (1993)']], dtype=object)>
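
The titles come back as a string tensor, so each entry is a `bytes` object. Here is a sketch of turning one row into plain Python strings. It assumes the tensor has already been converted to nested lists, for example with `titles.numpy().tolist()`, and it uses a hard-coded row copied from the output above so that it runs without TensorFlow. `decode_titles` is a hypothetical helper.

```python
from typing import List, Sequence


def decode_titles(rows: Sequence[Sequence[bytes]]) -> List[List[str]]:
    """Decode UTF-8 byte strings row by row."""
    return [[title.decode("utf-8") for title in row] for row in rows]


# First three titles from the output above, in the nested-list form
# that titles.numpy().tolist() would produce.
raw_titles = [[
    b"Fried Green Tomatoes (1991)",
    b"Top Gun (1986)",
    b"Beauty and the Beast (1991)",
]]

for rank, title in enumerate(decode_titles(raw_titles)[0], start=1):
    print(f"{rank}. {title}")
```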