# Deploy a pretrained TensorFlow model to a SageMaker endpoint in a VPC

In this notebook we deploy a pretrained TensorFlow model to a SageMaker endpoint.

We first deploy with the SageMaker Python SDK, and then deploy the same model with the `boto3` SDK, this time inside a VPC.

In [None]:
# Standard library
import datetime
import json
import os
import time

# AWS SDKs
import boto3
import sagemaker
from sagemaker import image_uris
from sagemaker.tensorflow import TensorFlowModel

region = boto3.Session().region_name
role = sagemaker.get_execution_role()
sm_session = sagemaker.Session()
sm_client = boto3.client("sagemaker", region_name=region)

bucket = sm_session.default_bucket()
prefix = "sagemaker/tensorflow-byom"

bucket

## Deploying the TensorFlow model using the SageMaker Python SDK

In [None]:
model_dir = 's3://aws-ml-blog/artifacts/tensorflow-script-mode-local-model-inference/model.tar.gz'

In [None]:
!pygmentize code/inference.py

In [None]:
# Environment variables passed to the serving container (placeholder values).
env = {
    "MY_ENV_VAR_1": "some_value_1",
    "MY_ENV_VAR_2": "some_value_2",
}

In [None]:
model = TensorFlowModel(
        entry_point='inference.py',
        source_dir='./code',
        role=role,
        model_data=model_dir,
        framework_version='2.8',
        env=env
)

In [None]:
predictor = model.deploy(
        initial_instance_count=1,
        instance_type='ml.c5.xlarge'
)

In [None]:
with open("instances.json", 'r') as f:
    payload = f.read().strip()

In [None]:
predictions = predictor.predict(payload)

In [None]:
predictions

In [None]:
predictor.delete_endpoint()

## Deploying the TensorFlow model using `boto3`

Let's inspect the model definition that the SDK created earlier. Its container image, model data location, and environment variables are the values we need to reproduce when deploying manually with `boto3`.

In [None]:
model.name

In [None]:
response = sm_client.describe_model(
    ModelName=model.name
)
print(json.dumps(response, indent=4, default=str))

In [None]:
!aws s3 cp {model_dir} .

### Prepare model manually

For TensorFlow, the contents of model.tar.gz should be organized as follows:

 - SavedModel files in a numbered version directory at the top level

 - Inference script (and any other source files) in a directory named code/ (for more about the inference script, see the SageMaker TensorFlow Serving container documentation)

 - Optional requirements file located at code/requirements.txt (for more about requirements files, see Using third-party libraries)

For example:

```
model.tar.gz/
|- 00000000/
  |- assets/
  |- variables/
  |- saved_model.pb
|- code/
  |- inference.py
  |- requirements.txt 
```
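
The layout above can also be produced from Python rather than with shell `tar`. The following is a minimal sketch using the standard `tarfile` module; the helper name `build_model_archive` is hypothetical, and the files created in the demo are placeholders standing in for a real SavedModel and inference script.

```python
import os
import tarfile
import tempfile


def build_model_archive(model_root, archive_path):
    """Pack every entry under model_root at the top level of a .tar.gz archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in sorted(os.listdir(model_root)):
            # arcname drops the model_root prefix so entries sit at the archive root.
            tar.add(os.path.join(model_root, entry), arcname=entry)
    # Re-open the archive and return its member names so the layout can be checked.
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


# Demo with placeholder files (not a real model).
workdir = tempfile.mkdtemp()
model_root = os.path.join(workdir, "model")
for rel_path in ("00000000/saved_model.pb", "code/inference.py"):
    full_path = os.path.join(model_root, rel_path)
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "wb") as f:
        f.write(b"placeholder")

members = build_model_archive(model_root, os.path.join(workdir, "model.tar.gz"))
print(members)
```

Listing the members after writing is a cheap way to confirm that the version directory and `code/` sit at the archive root and are not nested under an extra folder.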

In [None]:
!mkdir -p model
!tar -xvf model.tar.gz -C ./model
!rm model.tar.gz

In [None]:
!cp -r code ./model

In [None]:
!ls -rtl ./model

In [None]:
!ls -rtlR ./model

In [None]:
!cd model && tar czvf ../model.tar.gz *

In [None]:
key = os.path.join(prefix, "model.tar.gz")
# Use a context manager so the file handle is closed once the upload finishes.
with open("model.tar.gz", "rb") as f_obj:
    boto3.Session().resource("s3").Bucket(bucket).Object(key).upload_fileobj(f_obj)
print(os.path.join(bucket, key))

In [None]:
pretrained_model_data = "s3://{}/{}".format(bucket, key)
pretrained_model_data

In [None]:
!aws s3 ls {pretrained_model_data}

In [None]:
instance_type = "ml.c5.xlarge"  
dlc_uri = image_uris.retrieve(
    "tensorflow",
    region,
    version="2.8",
    py_version="py3",
    instance_type=instance_type,
    image_scope="inference",
)
dlc_uri

### Create a Model inside a VPC

Let's find the VPC to use.

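Make sure the subnets you provide can reach Amazon S3, either through an S3 VPC endpoint referenced in their route table or through a NAT gateway. In VPC mode the model artifact is downloaded from within those subnets, so the model data URL must be reachable from them.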

In [None]:
ec2 = boto3.resource('ec2')
filters = [{'Name':'tag:Name', 'Values':['<YOUR VPC>']}]
vpc = list(ec2.vpcs.filter(Filters=filters))
default_vpc = vpc[0]
default_vpc_id = default_vpc.id
default_vpc_id
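
The S3 reachability prerequisite mentioned above can be partly checked from code. The sketch below asks EC2 for S3 VPC endpoints attached to a VPC; the helper name `find_s3_vpc_endpoints` is hypothetical, and it assumes the standard `com.amazonaws.<region>.s3` service name. It does not inspect route tables or NAT gateways.

```python
def find_s3_vpc_endpoints(ec2_client, vpc_id, region):
    """Return the IDs of available S3 VPC endpoints attached to vpc_id."""
    response = ec2_client.describe_vpc_endpoints(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "service-name", "Values": [f"com.amazonaws.{region}.s3"]},
        ]
    )
    # Only the first page of results is examined; pagination is ignored for brevity.
    return [
        endpoint["VpcEndpointId"]
        for endpoint in response["VpcEndpoints"]
        if endpoint.get("State", "").lower() == "available"
    ]
```

In this notebook it could be called as `find_s3_vpc_endpoints(boto3.client("ec2"), default_vpc_id, region)`. An empty list does not by itself mean S3 is unreachable, because a NAT gateway may provide the route instead.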

Let's find the subnets in this VPC.

You must create at least two subnets in different Availability Zones in your private VPC, even if you have only one hosting instance.

In [None]:
client = boto3.client('ec2')
subnets = client.describe_subnets(
    Filters=[
        {
            'Name': 'vpc-id',
            'Values': [
                default_vpc_id
            ]
        }
    ]
)

# Collect the IDs of every subnet found in the VPC.
subnets_list = [subnet['SubnetId'] for subnet in subnets['Subnets']]

subnets_list
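
The two-Availability-Zone requirement stated above can be verified against the `describe_subnets` response before creating the model. This is a small sketch; the helper names are hypothetical and the sample response uses made-up IDs.

```python
def availability_zones_by_subnet(describe_subnets_response):
    """Map each subnet ID to the Availability Zone it lives in."""
    return {
        subnet["SubnetId"]: subnet["AvailabilityZone"]
        for subnet in describe_subnets_response["Subnets"]
    }


def spans_multiple_zones(describe_subnets_response, minimum=2):
    """Return True if the subnets cover at least `minimum` distinct zones."""
    zones = set(availability_zones_by_subnet(describe_subnets_response).values())
    return len(zones) >= minimum


# Illustrative response fragment; the subnet IDs are invented.
sample = {
    "Subnets": [
        {"SubnetId": "subnet-aaa", "AvailabilityZone": "us-east-1a"},
        {"SubnetId": "subnet-bbb", "AvailabilityZone": "us-east-1b"},
    ]
}
print(spans_multiple_zones(sample))
```

With the variables from the cell above, `spans_multiple_zones(subnets)` performs the same check on the real response.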

The security group must allow HTTP (port 80) and HTTPS (port 443) traffic.

In [None]:
sagemaker_endpoint_sg = "<YOUR SECURITY GROUP>"
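
The port requirement above can be checked against the rules of the chosen security group. The sketch below works on the rule lists returned by EC2's `describe_security_groups` (`IpPermissions` for inbound, `IpPermissionsEgress` for outbound); the helper name `rules_cover_port` is hypothetical, and it only looks at protocol and port range, not at the allowed sources or destinations.

```python
def rules_cover_port(ip_permissions, port):
    """Return True if any TCP or all-protocol rule in ip_permissions includes port."""
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":  # "-1" means all protocols and all ports
            return True
        if protocol == "tcp" and rule.get("FromPort", 0) <= port <= rule.get("ToPort", -1):
            return True
    return False


# Illustrative rule list with a single HTTPS rule; values are invented.
sample_rules = [{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443}]
print({port: rules_cover_port(sample_rules, port) for port in (80, 443)})
```

To apply it here, one could fetch the group with `boto3.client("ec2").describe_security_groups(GroupIds=[sagemaker_endpoint_sg])["SecurityGroups"][0]` and pass the relevant rule list for ports 80 and 443.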

In [None]:
model_name = "tensorflow-model-" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

create_model_response = sm_client.create_model(
    ModelName=model_name,
    PrimaryContainer={
        "Image": dlc_uri,
        "Mode": "SingleModel",
        "ModelDataUrl": pretrained_model_data,
        "Environment": {
            "MY_ENV_VAR_1": "some_value_1",
            "MY_ENV_VAR_2": "some_value_2"
        },
    },
    ExecutionRoleArn=role,
    VpcConfig={
        'SecurityGroupIds': [
            sagemaker_endpoint_sg
        ],
        'Subnets': subnets_list
    }
)

create_model_response

### Create an Endpoint Config from the model

In [None]:
endpoint_config_name = "tensorflow-endpoint-config-" + datetime.datetime.now().strftime(
    "%Y-%m-%d-%H-%M-%S"
)

endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "AllTrafficVariant",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            # Reuse the instance type that was used to resolve the container image URI.
            "InstanceType": instance_type,
            "InitialVariantWeight": 1,
        },
    ],
)

endpoint_config_response

### Deploy the Endpoint Config to a real-time endpoint

In [None]:
endpoint_name = "tensorflow-endpoint-" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)

create_endpoint_response

### Wait for Endpoint to be ready

In [None]:
describe_endpoint_response = sm_client.describe_endpoint(EndpointName=endpoint_name)

while describe_endpoint_response["EndpointStatus"] == "Creating":
    time.sleep(15)  # wait before polling again
    describe_endpoint_response = sm_client.describe_endpoint(EndpointName=endpoint_name)
    print(describe_endpoint_response["EndpointStatus"])

describe_endpoint_response
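
The polling loop above runs until the status leaves `Creating`, but it neither enforces a time limit nor distinguishes success from failure. A minimal sketch of a stricter variant follows; the helper name `wait_for_endpoint` and its default timings are assumptions, not part of the original notebook.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=15, timeout_seconds=1800):
    """Poll describe_endpoint until the endpoint is InService, fails, or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return description
        if status != "Creating":
            # Any other status (for example "Failed") means creation did not succeed.
            reason = description.get("FailureReason", "no reason given")
            raise RuntimeError(f"Endpoint {endpoint_name} is {status}: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} still Creating after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

It would be called as `wait_for_endpoint(sm_client, endpoint_name)`. boto3 also ships a built-in waiter for this case, `sm_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)`, which may be preferable when custom status handling is not needed.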

### Invoke Endpoint with boto3

In [None]:
runtime = boto3.client("sagemaker-runtime")

In [None]:
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=json.dumps(payload),
    ContentType="application/json",
)

print(response["Body"].read())
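
The `Body` returned by `invoke_endpoint` is a byte stream, so the cell above prints raw bytes. A small sketch of decoding it into Python objects follows, assuming the container returns UTF-8 encoded JSON; the helper name `read_json_body` is hypothetical, and the sample payload is invented to stand in for a real response.

```python
import io
import json


def read_json_body(invoke_response):
    """Decode the streaming Body of an invoke_endpoint response as JSON."""
    raw = invoke_response["Body"].read()  # the stream can be read only once
    return json.loads(raw.decode("utf-8"))


# Stand-in response: io.BytesIO mimics the streaming body; the payload is made up.
sample_response = {"Body": io.BytesIO(b'{"predictions": [[0.1, 0.9]]}')}
result = read_json_body(sample_response)
print(result["predictions"])
```

Because the stream is consumed on the first read, use this helper on a fresh response rather than after the body has already been printed.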

## Clean up

In [None]:
sm_client.delete_endpoint(EndpointName=endpoint_name)
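
The cell above deletes only the endpoint; the endpoint configuration and the model created earlier with `boto3` remain in the account. A minimal sketch for removing them as well follows; the helper name `delete_config_and_model` is hypothetical, while the two SageMaker client calls it wraps are standard `boto3` operations.

```python
def delete_config_and_model(sm_client, endpoint_config_name, model_name):
    """Delete the endpoint configuration and the model behind a deleted endpoint."""
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    sm_client.delete_model(ModelName=model_name)
```

With the names defined earlier in this notebook, the call would be `delete_config_and_model(sm_client, endpoint_config_name, model_name)`.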