# Deploy your pre-trained TensorFlow Model to a SageMaker Endpoint

In this notebook we will deploy a pre-trained TensorFlow model to a SageMaker endpoint.

First we will deploy using the SageMaker Python SDK, and then we will deploy using the `boto3` SDK.

When you call `model.deploy` in the SageMaker Python SDK, a few things happen behind the scenes:
 - A [Model](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) is created.
 - An [endpoint configuration](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpointConfig.html) is created, which SageMaker hosting services use to deploy models.
 - An [endpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpoint.html) is created from that endpoint configuration. SageMaker uses the endpoint to provision resources and deploy models.
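
The three steps above can be sketched as a single helper that issues the corresponding `boto3` calls in order. This is a minimal sketch, not what the SDK does internally: the function name `deploy_in_three_steps` and the choice to reuse one name for all three resources are assumptions made for illustration. The client is passed in as a parameter, so in this notebook it would be `sm_client`.

```python
def deploy_in_three_steps(sm_client, name, image_uri, model_data_url, role_arn,
                          instance_type="ml.c5.xlarge", instance_count=1, env=None):
    """Create a model, an endpoint configuration, and an endpoint, in that order.

    `sm_client` is expected to behave like boto3.client("sagemaker").
    Reusing `name` for all three resources is a simplification for this sketch.
    """
    # Step 1: register the container image and the model artifact as a Model.
    sm_client.create_model(
        ModelName=name,
        PrimaryContainer={
            "Image": image_uri,
            "ModelDataUrl": model_data_url,
            "Environment": env or {},
        },
        ExecutionRoleArn=role_arn,
    )
    # Step 2: describe how the model should be hosted (instance type and count).
    sm_client.create_endpoint_config(
        EndpointConfigName=name,
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",
                "ModelName": name,
                "InitialInstanceCount": instance_count,
                "InstanceType": instance_type,
            }
        ],
    )
    # Step 3: ask SageMaker to provision the endpoint from that configuration.
    sm_client.create_endpoint(EndpointName=name, EndpointConfigName=name)
    return name
```

Each call returns immediately; the endpoint itself still needs a few minutes to reach the `InService` status after the third call.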

In [2]:
import boto3
import numpy as np
import os
import pandas as pd
import re
import json
import datetime
import time
import sagemaker
from sagemaker.tensorflow import TensorFlowModel
from sagemaker import get_execution_role, Session, image_uris

region = boto3.Session().region_name
role = sagemaker.get_execution_role()
sm_session = sagemaker.Session()
sm_client = boto3.client("sagemaker", region_name=region)

bucket = sm_session.default_bucket()
prefix = "sagemaker/tensorflow-byom"

bucket

'sagemaker-us-east-1-062083580489'

## Deploying the `TensorFlow` model using the SageMaker Python SDK

In [3]:
model_dir = 's3://aws-ml-blog/artifacts/tensorflow-script-mode-local-model-inference/model.tar.gz'

In [4]:
!pygmentize code/inference.py

import json
import os
import requests


def handler(data, context):
    """Handle request.
    Args:
        data (obj): the request data
        context (Context): an object containing request and configuration details
    Returns:
        (bytes, string): data to return to client, (optional) response content type
    """

    print("handler start")
    
    my_env_var_1 = os.environ.get('MY_ENV_VAR_1')
    my_env_var_2 = os.environ.get('MY_ENV_VAR_2')
    print(f"ENV Variable MY_ENV_VAR_1 value: {my_env_var_1

In [5]:
env={
        "MY_ENV_VAR_1":"some_value_1",
        "MY_ENV_VAR_2":"some_value_2"
    }

In [6]:
model = TensorFlowModel(
        entry_point='inference.py',
        source_dir='./code',
        role=role,
        model_data=model_dir,
        framework_version='2.8',
        env=env
)

In [7]:
%%time
predictor = model.deploy(
        initial_instance_count=1,
        instance_type='ml.c5.xlarge'
)

update_endpoint is a no-op in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


----!
CPU times: user 925 ms, sys: 95.4 ms, total: 1.02 s
Wall time: 2min 3s


In [8]:
with open("instances.json", 'r') as f:
    payload = f.read().strip()

In [9]:
predictions = predictor.predict(payload)

In [10]:
predictions

{'predictions': [[-0.857780874,
   -1.81817758,
   2.76210523,
   2.03228235,
   -3.43221879,
   5.26243401,
   -0.494464874,
   -0.795938611,
   -0.56202811,
   -1.50765276],
  [-0.666391969,
   -2.0177176,
   2.43444324,
   2.35294914,
   -3.84540677,
   3.97808194,
   -0.696056128,
   0.48086521,
   0.374087,
   -2.12169862],
  [-0.329581439,
   -2.6850543,
   3.53677535,
   2.94765186,
   -3.2035954,
   4.71845722,
   1.03743863,
   -1.22946465,
   -0.418955058,
   -1.86088586],
  [-0.297454685,
   -1.6222614,
   4.09099722,
   1.18229735,
   -2.3494916,
   3.86405587,
   0.264097422,
   -0.982173324,
   0.60177213,
   -1.56709647]]}
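
Each prediction is a list of ten numbers. The notebook does not say what they represent; assuming they are per-class scores, a minimal sketch for turning each row into a predicted class index is to take the position of the largest value. The helper name `argmax_per_row` is hypothetical, and the sample rows are the first and last rows of the output above, rounded for brevity.

```python
def argmax_per_row(rows):
    """Return the index of the largest value in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


# First and last rows of the `predictions` output, rounded to two decimals.
sample = {
    "predictions": [
        [-0.86, -1.82, 2.76, 2.03, -3.43, 5.26, -0.49, -0.80, -0.56, -1.51],
        [-0.30, -1.62, 4.09, 1.18, -2.35, 3.86, 0.26, -0.98, 0.60, -1.57],
    ]
}

print(argmax_per_row(sample["predictions"]))  # [5, 2]
```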

### Listing the Model, endpoint configuration and endpoint created previously with SageMaker Python SDK `model.deploy`

In [11]:
endpoint_name = predictor.endpoint_name
endpoint_name

'tensorflow-inference-2022-11-30-13-11-35-628'

#### Get description of the endpoint

Let's get the description of the endpoint to find the endpoint configuration associated with it.

In [12]:
describe_endpoint_response = sm_client.describe_endpoint(
    EndpointName=endpoint_name
)
describe_endpoint_response

{'EndpointName': 'tensorflow-inference-2022-11-30-13-11-35-628',
 'EndpointArn': 'arn:aws:sagemaker:us-east-1:062083580489:endpoint/tensorflow-inference-2022-11-30-13-11-35-628',
 'EndpointConfigName': 'tensorflow-inference-2022-11-30-13-11-35-628',
 'ProductionVariants': [{'VariantName': 'AllTraffic',
   'DeployedImages': [{'SpecifiedImage': '763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.8-cpu',
     'ResolvedImage': '763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference@sha256:de621da9ef516dc5e75ff238c037746e024a50e3dea79f535f8ce4946e435473',
     'ResolutionTime': datetime.datetime(2022, 11, 30, 13, 11, 36, 774000, tzinfo=tzlocal())}],
   'CurrentWeight': 1.0,
   'DesiredWeight': 1.0,
   'CurrentInstanceCount': 1,
   'DesiredInstanceCount': 1}],
 'EndpointStatus': 'InService',
 'CreationTime': datetime.datetime(2022, 11, 30, 13, 11, 36, 76000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2022, 11, 30, 13, 13, 19, 461000, tzinfo=tzlocal(

#### Get endpoint configuration name

In [13]:
endpoint_config_name = describe_endpoint_response["EndpointConfigName"]
endpoint_config_name

'tensorflow-inference-2022-11-30-13-11-35-628'

#### Get description of the endpoint configuration

Let's get the description of the endpoint configuration to find the model associated with it.

In [14]:
describe_endpoint_config_response = sm_client.describe_endpoint_config(
    EndpointConfigName=endpoint_config_name
)
describe_endpoint_config_response

{'EndpointConfigName': 'tensorflow-inference-2022-11-30-13-11-35-628',
 'EndpointConfigArn': 'arn:aws:sagemaker:us-east-1:062083580489:endpoint-config/tensorflow-inference-2022-11-30-13-11-35-628',
 'ProductionVariants': [{'VariantName': 'AllTraffic',
   'ModelName': 'tensorflow-inference-2022-11-30-13-11-34-971',
   'InitialInstanceCount': 1,
   'InstanceType': 'ml.c5.xlarge',
   'InitialVariantWeight': 1.0}],
 'CreationTime': datetime.datetime(2022, 11, 30, 13, 11, 35, 750000, tzinfo=tzlocal()),
 'ResponseMetadata': {'RequestId': '02df9507-e6c0-40cb-8b15-e5f2f4317efa',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '02df9507-e6c0-40cb-8b15-e5f2f4317efa',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '417',
   'date': 'Wed, 30 Nov 2022 13:15:34 GMT'},
  'RetryAttempts': 0}}

#### Get model name

In [15]:
model_name = describe_endpoint_config_response["ProductionVariants"][0]["ModelName"]
model_name

'tensorflow-inference-2022-11-30-13-11-34-971'

#### Get description of the model

By describing the model, we can view the Docker image used, the model location in S3, and the environment variables.

In [16]:
describe_model_response = sm_client.describe_model(
    ModelName=model_name
)
describe_model_response

{'ModelName': 'tensorflow-inference-2022-11-30-13-11-34-971',
 'PrimaryContainer': {'Image': '763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.8-cpu',
  'Mode': 'SingleModel',
  'ModelDataUrl': 's3://sagemaker-us-east-1-062083580489/tensorflow-inference-2022-11-30-13-11-33-435/model.tar.gz',
  'Environment': {'MY_ENV_VAR_1': 'some_value_1',
   'MY_ENV_VAR_2': 'some_value_2'}},
 'ExecutionRoleArn': 'arn:aws:iam::062083580489:role/service-role/AmazonSageMaker-ExecutionRole-20190829T190746',
 'CreationTime': datetime.datetime(2022, 11, 30, 13, 11, 35, 507000, tzinfo=tzlocal()),
 'ModelArn': 'arn:aws:sagemaker:us-east-1:062083580489:model/tensorflow-inference-2022-11-30-13-11-34-971',
 'EnableNetworkIsolation': False,
 'ResponseMetadata': {'RequestId': 'c9f55dc6-cd59-4bea-8b22-ef4c2b0b5511',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'c9f55dc6-cd59-4bea-8b22-ef4c2b0b5511',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '654',
  

#### Model name

In [17]:
describe_model_response["ModelName"]

'tensorflow-inference-2022-11-30-13-11-34-971'

#### Model Docker image

In [18]:
describe_model_response["PrimaryContainer"]["Image"]

'763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.8-cpu'

#### Model S3 location

In [19]:
describe_model_response["PrimaryContainer"]["ModelDataUrl"]

's3://sagemaker-us-east-1-062083580489/tensorflow-inference-2022-11-30-13-11-33-435/model.tar.gz'
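
The walk from endpoint to endpoint configuration to model can be collected into one helper. This is a sketch under assumptions: the function name `describe_deployment` and the shape of the returned dictionary are made up for illustration, and it only looks at the first production variant, as the cells above do.

```python
def describe_deployment(sm_client, endpoint_name):
    """Follow endpoint -> endpoint configuration -> model and summarize the result.

    `sm_client` is expected to behave like boto3.client("sagemaker").
    """
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]

    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    # Only the first variant is inspected; an endpoint may host several.
    model_name = config["ProductionVariants"][0]["ModelName"]

    model = sm_client.describe_model(ModelName=model_name)
    container = model["PrimaryContainer"]
    return {
        "endpoint_config_name": config_name,
        "model_name": model_name,
        "image": container["Image"],
        "model_data_url": container["ModelDataUrl"],
        "environment": container.get("Environment", {}),
    }
```

In this notebook the call would be `describe_deployment(sm_client, endpoint_name)`.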

## Clean up

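We will now delete the endpoint. With the SageMaker Python SDK, `predictor.delete_endpoint()` also deletes the endpoint configuration by default. The model itself is not deleted, which is why it can still be described in the next section.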

In [20]:
predictor.delete_endpoint()

## Deploying the `TensorFlow` model using `boto3`

Let's get the data of the model we deployed earlier. This will help us with manually deploying using `boto3`.

In [21]:
model.name

'tensorflow-inference-2022-11-30-13-11-34-971'

In [22]:
response = sm_client.describe_model(
    ModelName=model.name
)
print(json.dumps(response, indent=4, default=str))

{
    "ModelName": "tensorflow-inference-2022-11-30-13-11-34-971",
    "PrimaryContainer": {
        "Image": "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.8-cpu",
        "Mode": "SingleModel",
        "ModelDataUrl": "s3://sagemaker-us-east-1-062083580489/tensorflow-inference-2022-11-30-13-11-33-435/model.tar.gz",
        "Environment": {
            "MY_ENV_VAR_1": "some_value_1",
            "MY_ENV_VAR_2": "some_value_2"
        }
    },
    "ExecutionRoleArn": "arn:aws:iam::062083580489:role/service-role/AmazonSageMaker-ExecutionRole-20190829T190746",
    "CreationTime": "2022-11-30 13:11:35.507000+00:00",
    "ModelArn": "arn:aws:sagemaker:us-east-1:062083580489:model/tensorflow-inference-2022-11-30-13-11-34-971",
    "EnableNetworkIsolation": false,
    "ResponseMetadata": {
        "RequestId": "fecd1b79-afff-4fbc-bfee-758b8d5b2f3f",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "x-amzn-requestid": "fecd1b79-afff-4fbc-bfee-758b8d5b2

In [23]:
!aws s3 cp {model_dir} .

download: s3://aws-ml-blog/artifacts/tensorflow-script-mode-local-model-inference/model.tar.gz to ./model.tar.gz


### Prepare model manually

For TensorFlow, the contents of model.tar.gz should be organized as follows:

 - Model files (the SavedModel) in a numbered version directory at the top level, such as `00000000/` in the example below

 - Inference script (and any other source files) in a directory named code/ (for more about the inference script, see the SageMaker TensorFlow Serving container documentation)

 - Optional requirements file located at code/requirements.txt (for more about requirements files, see Using third-party libraries)

For example:

```
model.tar.gz/
|- 00000000/
  |- assets/
  |- variables/
  |- saved_model.pb
|- code/
  |- inference.py
  |- requirements.txt 
```
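
The same layout can be produced from Python with the standard-library `tarfile` module instead of shell commands. The sketch below uses placeholder files in a temporary directory rather than the real model, and the helper name `pack_model` is hypothetical. The point it illustrates is that each entry is added with an `arcname` relative to the model directory, so `00000000/` and `code/` end up at the top level of the archive.

```python
import os
import tarfile
import tempfile


def pack_model(model_root, archive_path):
    """Archive every entry under model_root at the top level of the tarball."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in sorted(os.listdir(model_root)):
            # arcname drops the model_root prefix from the stored path.
            tar.add(os.path.join(model_root, entry), arcname=entry)
    # Re-open the archive to report what was actually stored.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Build a placeholder directory tree that mirrors the layout above.
with tempfile.TemporaryDirectory() as workdir:
    root = os.path.join(workdir, "model")
    os.makedirs(os.path.join(root, "00000000", "variables"))
    os.makedirs(os.path.join(root, "code"))
    for rel in ("00000000/saved_model.pb", "code/inference.py", "code/requirements.txt"):
        with open(os.path.join(root, rel), "w") as f:
            f.write("placeholder")
    names = pack_model(root, os.path.join(workdir, "model.tar.gz"))

print(names)
```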

In [24]:
!mkdir -p model
!tar -xvf model.tar.gz -C ./model
!rm model.tar.gz

00000000/
00000000/assets/
00000000/saved_model.pb
tar: 00000000/assets: Cannot change ownership to uid 1000, gid 1000: Operation not permitted
tar: 00000000/saved_model.pb: Cannot change ownership to uid 1000, gid 1000: Operation not permitted
00000000/variables/
00000000/variables/variables.data-00000-of-00001
tar: 00000000/variables/variables.data-00000-of-00001: Cannot change ownership to uid 1000, gid 1000: Operation not permitted
00000000/variables/variables.index
tar: 00000000/variables/variables.index: Cannot change ownership to uid 1000, gid 1000: Operation not permitted
tar: 00000000/variables: Cannot change ownership to uid 1000, gid 1000: Operation not permitted
tar: 00000000: Cannot change ownership to uid 1000, gid 1000: Operation not permitted
tar: Exiting with failure status due to previous errors


In [25]:
!cp -r code ./model

In [26]:
!ls -rtl ./model

total 8
drwxrwxr-x 4 root root 6144 Nov 20  2020 00000000
drwxr-xr-x 2 root root 6144 Nov 30 12:46 code


In [27]:
!ls -rtlR ./model

./model:
total 8
drwxrwxr-x 4 root root 6144 Nov 20  2020 00000000
drwxr-xr-x 2 root root 6144 Nov 30 12:46 code

./model/00000000:
total 56
-rw-r--r-- 1 root root 45994 Nov 20  2020 saved_model.pb
drwxrwxr-x 2 root root  6144 Nov 20  2020 variables
drwxrwxr-x 2 root root  6144 Nov 20  2020 assets

./model/00000000/variables:
total 10832
-rw-r--r-- 1 root root      515 Nov 20  2020 variables.index
-rw-r--r-- 1 root root 11084770 Nov 20  2020 variables.data-00000-of-00001

./model/00000000/assets:
total 0

./model/code:
total 8
-rw-r--r-- 1 root root    6 Nov 30 13:16 requirements.txt
-rw-r--r-- 1 root root 1637 Nov 30 13:16 inference.py


In [28]:
!cd model && tar czvf ../model.tar.gz *

00000000/
00000000/variables/
00000000/variables/variables.data-00000-of-00001
00000000/variables/variables.index
00000000/saved_model.pb
00000000/assets/
code/
code/requirements.txt
code/inference.py


In [29]:
key = os.path.join(prefix, "model.tar.gz")
# Open the archive in a context manager so the file handle is closed after the upload.
with open("model.tar.gz", "rb") as fObj:
    boto3.Session().resource("s3").Bucket(bucket).Object(key).upload_fileobj(fObj)
print(os.path.join(bucket, key))

sagemaker-us-east-1-062083580489/sagemaker/tensorflow-byom/model.tar.gz


In [30]:
pretrained_model_data = "s3://{}/{}".format(bucket, key)
pretrained_model_data

's3://sagemaker-us-east-1-062083580489/sagemaker/tensorflow-byom/model.tar.gz'
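
S3 keys always use forward slashes, whereas `os.path.join` uses the separator of the local operating system. A small sketch of building and splitting S3 URIs with `posixpath` avoids that dependency; the helper names `s3_uri` and `split_s3_uri` are hypothetical and `my-bucket` is a placeholder.

```python
import posixpath


def s3_uri(bucket, *key_parts):
    """Join key parts with '/' and return an s3:// URI."""
    key = posixpath.join(*key_parts)
    return f"s3://{bucket}/{key}"


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"Not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key


uri = s3_uri("my-bucket", "sagemaker/tensorflow-byom", "model.tar.gz")
print(uri)                # s3://my-bucket/sagemaker/tensorflow-byom/model.tar.gz
print(split_s3_uri(uri))  # ('my-bucket', 'sagemaker/tensorflow-byom/model.tar.gz')
```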

In [31]:
!aws s3 ls {pretrained_model_data}

2022-11-30 13:16:38   10291453 model.tar.gz


In [32]:
instance_type = "ml.c5.xlarge"  
dlc_uri = image_uris.retrieve(
    "tensorflow",
    region,
    version="2.8",
    py_version="py3",
    instance_type=instance_type,
    image_scope="inference",
)
dlc_uri

'763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.8-cpu'
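
The returned URI packs several pieces of information into one string. As a rough sketch, assuming the registry host follows the `<account>.dkr.ecr.<region>.amazonaws.com` layout seen in the output above, it can be split into its parts. The helper name `parse_image_uri` is hypothetical, and it accepts either a tag (after `:`) or a digest (after `@`), the two forms that appear in this notebook's outputs.

```python
def parse_image_uri(uri):
    """Split an ECR image URI into account, region, repository, and reference."""
    registry, _, remainder = uri.partition("/")
    # A digest reference uses '@', a tag reference uses ':'.
    separator = "@" if "@" in remainder else ":"
    repository, _, reference = remainder.partition(separator)
    host_parts = registry.split(".")
    return {
        "account": host_parts[0],
        "region": host_parts[3],
        "repository": repository,
        "reference": reference,
    }


parts = parse_image_uri(
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.8-cpu"
)
print(parts["repository"], parts["reference"])  # tensorflow-inference 2.8-cpu
```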

### Create a Model

We will use `boto3` to create a model.

In [33]:
model_name = "tensorflow-model-" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

create_model_response = sm_client.create_model(
    ModelName=model_name,
    PrimaryContainer={
        "Image": dlc_uri,
        "Mode": "SingleModel",
        "ModelDataUrl": pretrained_model_data,
        "Environment": {
            "MY_ENV_VAR_1": "some_value_1",
            "MY_ENV_VAR_2": "some_value_2"
        },
    },
    ExecutionRoleArn=role,
)

create_model_response

{'ModelArn': 'arn:aws:sagemaker:us-east-1:062083580489:model/tensorflow-model-2022-11-30-13-16-41',
 'ResponseMetadata': {'RequestId': '638415f5-e34a-401c-bb26-60c8c9100b42',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '638415f5-e34a-401c-bb26-60c8c9100b42',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '98',
   'date': 'Wed, 30 Nov 2022 13:16:41 GMT'},
  'RetryAttempts': 0}}

### Create an endpoint configuration from the model

We will use `boto3` to create an endpoint configuration with the model we created.

In [34]:
endpoint_config_name = "tensorflow-endpoint-config-" + datetime.datetime.now().strftime(
    "%Y-%m-%d-%H-%M-%S"
)

endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "AllTrafficVariant",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": "ml.c5.large",
            "InitialVariantWeight": 1,
        },
    ],
)

endpoint_config_response

{'EndpointConfigArn': 'arn:aws:sagemaker:us-east-1:062083580489:endpoint-config/tensorflow-endpoint-config-2022-11-30-13-16-45',
 'ResponseMetadata': {'RequestId': '4498f951-670c-4213-9d28-93b22a979fa1',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '4498f951-670c-4213-9d28-93b22a979fa1',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '127',
   'date': 'Wed, 30 Nov 2022 13:16:44 GMT'},
  'RetryAttempts': 0}}

### Deploy the endpoint configuration to a real-time endpoint

We will use `boto3` to create an endpoint with the endpoint configuration we created.

In [35]:
endpoint_name = "tensorflow-endpoint-" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)

create_endpoint_response

{'EndpointArn': 'arn:aws:sagemaker:us-east-1:062083580489:endpoint/tensorflow-endpoint-2022-11-30-13-16-53',
 'ResponseMetadata': {'RequestId': '51536656-bc40-4e49-a435-8bfc7d743012',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '51536656-bc40-4e49-a435-8bfc7d743012',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '107',
   'date': 'Wed, 30 Nov 2022 13:16:53 GMT'},
  'RetryAttempts': 0}}

### Wait for Endpoint to be ready

It takes a few minutes for the endpoint to be ready.

In [36]:
describe_endpoint_response = sm_client.describe_endpoint(EndpointName=endpoint_name)

while describe_endpoint_response["EndpointStatus"] == "Creating":
    describe_endpoint_response = sm_client.describe_endpoint(EndpointName=endpoint_name)
    print(describe_endpoint_response["EndpointStatus"])
    time.sleep(15)

describe_endpoint_response

Creating
Creating
Creating
Creating
Creating
Creating
Creating
Creating
InService


{'EndpointName': 'tensorflow-endpoint-2022-11-30-13-16-53',
 'EndpointArn': 'arn:aws:sagemaker:us-east-1:062083580489:endpoint/tensorflow-endpoint-2022-11-30-13-16-53',
 'EndpointConfigName': 'tensorflow-endpoint-config-2022-11-30-13-16-45',
 'ProductionVariants': [{'VariantName': 'AllTrafficVariant',
   'DeployedImages': [{'SpecifiedImage': '763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.8-cpu',
     'ResolvedImage': '763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference@sha256:de621da9ef516dc5e75ff238c037746e024a50e3dea79f535f8ce4946e435473',
     'ResolutionTime': datetime.datetime(2022, 11, 30, 13, 16, 54, 413000, tzinfo=tzlocal())}],
   'CurrentWeight': 1.0,
   'DesiredWeight': 1.0,
   'CurrentInstanceCount': 1,
   'DesiredInstanceCount': 1}],
 'EndpointStatus': 'InService',
 'CreationTime': datetime.datetime(2022, 11, 30, 13, 16, 53, 805000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2022, 11, 30, 13, 18, 42, 195000, tzinfo=tzlocal(
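
The polling loop above stops as soon as the status is no longer `Creating`, which also covers the case where the endpoint ends up in `Failed`. A slightly more defensive sketch raises on any non-`InService` outcome and gives up after a timeout. The helper name `wait_for_endpoint`, the default timeout, and the injectable `sleep` parameter are assumptions made for illustration.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=15,
                      timeout_seconds=1800, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService.

    Raises RuntimeError if the endpoint leaves "Creating" for any status other
    than "InService", and TimeoutError if it is still creating at the deadline.
    """
    waited = 0
    while True:
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status != "Creating":
            raise RuntimeError(f"Endpoint {endpoint_name} ended in status {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Endpoint {endpoint_name} still creating after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

`boto3` also ships a built-in waiter for this case, `sm_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)`, which may be preferable when progress output is not needed.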

### Invoke Endpoint with boto3

In [37]:
runtime = boto3.client("sagemaker-runtime")

In [38]:
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=json.dumps(payload),
    ContentType="application/json",
)

print(response["Body"].read())

b'{\n    "predictions": [[-0.857780576, -1.81817746, 2.76210475, 2.03228283, -3.43221903, 5.26243401, -0.494465321, -0.795939, -0.56202817, -1.50765288], [-0.66639173, -2.01771832, 2.43444324, 2.35294914, -3.84540606, 3.97808194, -0.696055651, 0.480864972, 0.374087, -2.12169838], [-0.32958141, -2.6850543, 3.53677511, 2.9476521, -3.20359588, 4.71845722, 1.03743863, -1.22946429, -0.418955237, -1.86088598], [-0.297454447, -1.62226081, 4.0909977, 1.18229735, -2.34949183, 3.86405587, 0.264097035, -0.982173145, 0.601771891, -1.56709647]\n    ]\n}'
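
The response body arrives as raw bytes. A minimal sketch for turning it into Python lists is to decode it and parse the JSON; the helper name `parse_predictions` is hypothetical, and the sample bytes are a shortened version of the output above. Note that `response["Body"]` is a stream that can only be read once, so the result of `.read()` should be stored in a variable if it is needed again.

```python
import json


def parse_predictions(body_bytes):
    """Decode an invoke_endpoint response body and return the predictions list."""
    return json.loads(body_bytes.decode("utf-8"))["predictions"]


# Shortened version of the response body printed above.
raw = b'{\n    "predictions": [[-0.857780576, -1.81817746], [-0.66639173, -2.01771832]\n    ]\n}'
rows = parse_predictions(raw)
print(len(rows), rows[0][0])  # 2 -0.857780576
```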


## Clean up

We will now delete the endpoint, then the endpoint configuration, and finally the model.

In [39]:
sm_client.delete_endpoint(EndpointName=endpoint_name)

{'ResponseMetadata': {'RequestId': 'f5a8f07f-ce5d-420e-b43c-77bc606e4c6a',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'f5a8f07f-ce5d-420e-b43c-77bc606e4c6a',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Wed, 30 Nov 2022 13:19:36 GMT'},
  'RetryAttempts': 0}}

In [40]:
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)

{'ResponseMetadata': {'RequestId': '7aa75a1e-2d25-4a9c-8f30-5bb064ea80f9',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '7aa75a1e-2d25-4a9c-8f30-5bb064ea80f9',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Wed, 30 Nov 2022 13:19:37 GMT'},
  'RetryAttempts': 0}}

In [41]:
sm_client.delete_model(ModelName=model_name)

{'ResponseMetadata': {'RequestId': 'ff9c4d08-1287-428e-84d0-f1ea49410713',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'ff9c4d08-1287-428e-84d0-f1ea49410713',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Wed, 30 Nov 2022 13:19:39 GMT'},
  'RetryAttempts': 0}}
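
The three deletions can be wrapped in one helper so the order is kept in a single place. This is only a sketch; the function name `delete_deployment` is hypothetical, and the client is passed in as a parameter (here it would be `sm_client`).

```python
def delete_deployment(sm_client, endpoint_name, endpoint_config_name, model_name):
    """Delete the endpoint, then its configuration, then the model."""
    # The endpoint goes first so that nothing is still serving traffic
    # when the resources it depends on are removed.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    sm_client.delete_model(ModelName=model_name)
```

None of these calls removes the `model.tar.gz` archive that was uploaded to S3 earlier; that object stays in the bucket until it is deleted separately.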