# SageMaker Inference

This notebook shows how to make inferences against a SageMaker endpoint in three different ways:

1. Using the SageMaker runtime client from the boto3 library
2. Using API Gateway and Lambda
3. Using Flask

## Prerequisite [VERY IMPORTANT - YOUR LAB WILL NOT WORK WITHOUT THIS]

1. [Lab 2](https://sagemaker-immersionday.workshop.aws/en/lab2.html) from the SageMaker Immersion Day must be completed, and the SageMaker endpoint must be running.

2. You need to attach an IAM policy to the role used by the SageMaker Studio user so that it can create API Gateway, Lambda, and IAM roles and policies.
    a. Navigate to SageMaker Studio and look up the IAM role that is being used.
    b. Navigate to IAM and search for the role from step a.
    c. Attach the AdministratorAccess managed policy to the role.

In [None]:
import boto3
import time
import json
import pandas as pd
import numpy as np
import zipfile
import requests
import sagemaker

sess = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sess.default_bucket()
prefix = 'sagemaker/DEMO-xgboost-dm'
test_path = f"s3://{bucket}/{prefix}/test"

!aws s3 cp $test_path/test_x.csv /tmp/test_x.csv
!aws s3 cp $test_path/test_y.csv /tmp/test_y.csv

X_test = pd.read_csv('/tmp/test_x.csv', names=[f'{i}' for i in range(59)])
y_test = pd.read_csv('/tmp/test_y.csv', names=['y'])

pd.set_option('display.max_columns', 500)     # Make sure we can see all of the columns
pd.set_option('display.max_rows', 20)  
X_test.drop(X_test.columns[0], axis=1, inplace=True)
X_test.head()

## 🧠 1. Inference with boto3

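The SageMaker runtime client (`sagemaker-runtime`) from boto3 can be used to make predictions against a SageMaker endpoint. The `sagemaker` client is used here only to look up the endpoint name.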


In [None]:
sm_client = boto3.client('sagemaker')
endpoint = sm_client.list_endpoints(SortBy='CreationTime')['Endpoints'][0]['EndpointName']
print(f'endpoint name: {endpoint}')

def make_prediction(X_test, y_test, row, endpoint=endpoint):
    
    # get individual row from dataframe and format as csv string
    sample = X_test.iloc[row].values
    sample = np.array2string(sample, separator=',')
    sample = sample.strip('[').strip(']').lstrip(' ')
    sample = sample.replace('\n', '')
    sample = sample.encode('utf-8')  # encode() returns a new object, so keep the result
    
    # create sagemaker runtime client and invoke sagemaker endpoint
    smr = boto3.client('sagemaker-runtime')
    r = smr.invoke_endpoint(EndpointName=endpoint, Body=sample, ContentType='text/csv')
    prediction = r['Body'].read().decode('utf-8')
    prediction = float(prediction)
    
    # round to 4 decimal places
    prediction = round(prediction, 4)
    return prediction
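
The string handling in `make_prediction` exists only to turn one DataFrame row into a single CSV line. As a minimal sketch (the helper name `row_to_csv` is hypothetical and not part of the notebook), an equivalent line can be built by joining the values directly, which avoids stripping brackets, newlines, and padding afterwards:

```python
def row_to_csv(values):
    """Join the values of one feature row into a single CSV line."""
    return ','.join(str(v) for v in values)


# Plain Python values are used here; a pandas row such as
# X_test.iloc[260].values could be passed the same way.
print(row_to_csv([3, 3, 3, 0, 1]))  # 3,3,3,0,1
```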

Let's make our first prediction:

In [None]:
make_prediction(X_test, y_test, 260)

Since the prediction is greater than 0.5, the model is predicting a 1. Let's check the label for sample 260. The label is also a 1 (the customer will purchase a CD investment).

In [None]:
y_test.iloc[260].values
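
The comparison above can be made explicit in code. This sketch assumes the 0.5 cut-off mentioned in the text; `to_label` is a hypothetical helper, and the score passed in below is a made-up value rather than real endpoint output.

```python
def to_label(score, threshold=0.5):
    """Map a predicted probability to a binary class label."""
    return int(score > threshold)


example_score = 0.8123  # illustrative value only, not an actual prediction
print(to_label(example_score))  # 1
```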

### IAM Role and Policy for our Lambda

To create a Lambda function, we must first create an IAM role for Lambda with policies attached. This grants Lambda the permissions it needs to call other AWS services.

In [None]:
awsAccount = boto3.client('sts').get_caller_identity().get('Account')
awsRegion = boto3.session.Session().region_name
FunctionName = 'sagemaker-inference-xgboost'

lambda_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": None
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                None
            ]
        },
        {
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": None,
        }
    ]
}

lambda_policy['Statement'][0]['Resource'] = f'arn:aws:logs:{awsRegion}:{awsAccount}:*'
lambda_policy['Statement'][1]['Resource'][0] = f'arn:aws:logs:{awsRegion}:{awsAccount}:log-group:/aws/lambda/{FunctionName}:*'
lambda_policy['Statement'][2]['Resource'] = f'arn:aws:sagemaker:{awsRegion}:{awsAccount}:endpoint/{endpoint}'


iam_client = boto3.client('iam')
response = iam_client.create_policy(
    PolicyName = 'boto3-lambda-policy',
    PolicyDocument = json.dumps(lambda_policy)
)


policy_arn = response['Policy']['Arn']

trust_json = {
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "lambda.amazonaws.com",
          "sagemaker.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

role_name = 'boto3-lambda-role'

role_response = iam_client.create_role(
    RoleName=role_name,
    AssumeRolePolicyDocument=json.dumps(trust_json),
    Description='This is a test role',
)


role_arn = role_response['Role']['Arn']

attach_response = iam_client.attach_role_policy(
    RoleName=role_name,
    PolicyArn=policy_arn
)
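
The policy document above is created with `None` placeholders that are filled in afterwards. As an alternative sketch, the same three statements can be produced by a small function that takes the region, account, function name, and endpoint name as arguments. The function name `build_lambda_policy` and the account and endpoint values below are placeholders chosen for illustration.

```python
import json


def build_lambda_policy(region, account, function_name, endpoint_name):
    """Return the Lambda execution policy as a plain dict."""
    log_group = f'arn:aws:logs:{region}:{account}:log-group:/aws/lambda/{function_name}:*'
    return {
        'Version': '2012-10-17',
        'Statement': [
            {
                'Effect': 'Allow',
                'Action': 'logs:CreateLogGroup',
                'Resource': f'arn:aws:logs:{region}:{account}:*',
            },
            {
                'Effect': 'Allow',
                'Action': ['logs:CreateLogStream', 'logs:PutLogEvents'],
                'Resource': [log_group],
            },
            {
                'Effect': 'Allow',
                'Action': 'sagemaker:InvokeEndpoint',
                'Resource': f'arn:aws:sagemaker:{region}:{account}:endpoint/{endpoint_name}',
            },
        ],
    }


# Placeholder account and endpoint values, for illustration only
policy = build_lambda_policy('us-west-2', '123456789012',
                             'sagemaker-inference-xgboost', 'my-endpoint')
print(json.dumps(policy, indent=2))
```

The resulting dict can be passed to `json.dumps` and then to `create_policy` in the same way as `lambda_policy`.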


### Lambda Setup

We write the Lambda handler to a file, zip the file, and create the Lambda function from the zip archive.

In [None]:
%%writefile sagemaker_inference_function.py

import json
import boto3

smr = boto3.client('sagemaker-runtime')

def lambda_handler(event, context):
    
    # check event type and print event
    print(f'event data type: {type(event)}')
    print(event)
    
    # grab body and convert to dict
    body = json.loads(event['body'])
    
    # encode the sample to bytes
    data = body['data'].encode('utf-8')
    
    # check data types
    print(f'body data type: {type(body)}')
    print(body)

    print(f'data data type: {type(data)}')
    print(data)

    # make prediction on sagemaker endpoint
    r = smr.invoke_endpoint(EndpointName='MyEndPoint', Body=data, ContentType='text/csv')
    prediction = r['Body'].read().decode('utf-8')
    print(prediction)
    response = {'prediction': prediction}
    
    # return prediction in body
    return {
        'statusCode': 200,
        'body': json.dumps(response)
    }


In [None]:
!sed -i 's/MyEndPoint/'$endpoint'/g' sagemaker_inference_function.py

In [None]:
with zipfile.ZipFile('sm-inference-package.zip', 'w') as z:
    z.write('sagemaker_inference_function.py')

In [None]:
lambda_client = boto3.client('lambda')

ZIPNAME = "sm-inference-package.zip"


def aws_file():
    with open(ZIPNAME, 'rb') as file_data:
        bytes_content = file_data.read()
    return bytes_content


FunctionName = 'sagemaker-inference-xgboost'

# give the newly created IAM role time to propagate before Lambda tries to assume it
time.sleep(15)

response = lambda_client.create_function(Code={'ZipFile': aws_file()},
                                         Description='sagemaker immersion day',
                                         FunctionName=FunctionName,
                                         Handler='sagemaker_inference_function.lambda_handler',
                                         Publish=True,
                                         Role=role_arn,
                                         Runtime='python3.9')

IntegrationUri = response['FunctionArn']
IntegrationUri

### Lambda test event

If you want to verify that the Lambda function is working properly, you can set up a test event by copying the following JSON into a Lambda test:

```json
{
  "version": "2.0",
  "routeKey": "POST /predict",
  "rawPath": "/predict",
  "rawQueryString": "",
  "headers": {
    "accept": "*/*",
    "accept-encoding": "gzip, deflate, br",
    "cache-control": "no-cache",
    "content-length": "127",
    "content-type": "text/csv",
    "host": "8fpcmkyo88.execute-api.us-west-2.amazonaws.com",
    "postman-token": "2c03b0e2-3f21-434b-879d-c7a7d7ff7633",
    "user-agent": "PostmanRuntime/7.28.4",
    "x-amzn-trace-id": "Root=1-614e5cb1-59e2e394227dfc3213740435",
    "x-forwarded-for": "54.86.50.139",
    "x-forwarded-port": "443",
    "x-forwarded-proto": "https"
  },
  "requestContext": {
    "accountId": "364430515305",
    "apiId": "8fpcmkyo88",
    "domainName": "8fpcmkyo88.execute-api.us-west-2.amazonaws.com",
    "domainPrefix": "8fpcmkyo88",
    "http": {
      "method": "POST",
      "path": "/predict",
      "protocol": "HTTP/1.1",
      "sourceIp": "54.86.50.139",
      "userAgent": "PostmanRuntime/7.28.4"
    },
    "requestId": "GMNryjUfvHcEMjA=",
    "routeKey": "POST /predict",
    "stage": "$default",
    "time": "24/Sep/2021:23:18:09 +0000",
    "timeEpoch": 1632525489647
  },
  "body": "{\"data\": \"3,3,3,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1\"}",
  "isBase64Encoded": false
}
```
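
The only field of this event that the handler reads is `body`. As a rough local check that needs no AWS access, the parsing step can be reproduced on its own. `extract_csv_payload` is a hypothetical helper that mirrors the first lines of `lambda_handler`, and the event below is trimmed to the fields needed for the illustration.

```python
import json


def extract_csv_payload(event):
    """Mirror the handler's parsing: JSON body -> 'data' string -> UTF-8 bytes."""
    body = json.loads(event['body'])
    return body['data'].encode('utf-8')


# A trimmed-down event carrying only the field the handler reads
event = {'body': json.dumps({'data': '3,3,3,0,0,1'}), 'isBase64Encoded': False}
print(extract_csv_payload(event))  # b'3,3,3,0,0,1'
```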

### Create API Gateway

The last component is an API Gateway that proxies our Lambda function and handles public HTTP requests and responses.

In [None]:
def create_apigateway(Name='sagemaker-inference-xgboost', IntegrationUri=IntegrationUri):
    api_client = boto3.client('apigatewayv2')
    response = api_client.create_api(Name=Name, ProtocolType='HTTP')
    ApiEndpoint = response['ApiEndpoint']
    ApiId = response['ApiId']
    
    response = api_client.create_integration(ApiId=ApiId, 
                                         IntegrationType='AWS_PROXY', 
                                         IntegrationUri=IntegrationUri,
                                         PayloadFormatVersion='2.0')
    IntegrationId = response['IntegrationId']
    
    response = api_client.create_route(ApiId=ApiId, RouteKey='POST /predict', Target='integrations/' + IntegrationId)
    RouteKey = response['RouteKey'].split()[1]
    
    response = api_client.create_stage(ApiId=ApiId, StageName='v1')
    StageName = response['StageName']
    
    response = api_client.get_integration(ApiId=ApiId, IntegrationId=IntegrationId)
    SourceArn = f'arn:aws:execute-api:{awsRegion}:{awsAccount}:' + ApiId + '/*/*/predict'
    
    lambda_client = boto3.client('lambda')
    response = lambda_client.add_permission(FunctionName=FunctionName, 
                                            StatementId='1',
                                            Action='lambda:InvokeFunction',
                                            Principal='apigateway.amazonaws.com',
                                            SourceArn=SourceArn)
    response = api_client.create_deployment(ApiId=ApiId, StageName=StageName)
    
    url = ApiEndpoint + '/' + StageName + RouteKey
    return url, ApiId


url, ApiId = create_apigateway()
url

## 🧠 2. Inference with API Gateway and Lambda

In this example we set up an API Gateway with Lambda to make our SageMaker endpoint publicly accessible. This can be secured in a number of ways. One option is to add authorization to API Gateway.

In [None]:
# helper function to get a single row from test dataframe
def get_sample(X_test, row):
    sample = X_test.iloc[row].values
    sample = np.array2string(sample, separator=',')
    sample = sample.strip('[').strip(']').lstrip(' ')
    sample = sample.replace('\n', '')
    sample = sample.replace(' ', '')
    return sample

payload = {'data': get_sample(X_test, 260)}
print(payload)

r = requests.post(url, json=payload)
r.text
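
As one possible way to add authorization, HTTP API routes accept an `AuthorizationType` of `NONE`, `AWS_IAM`, `CUSTOM`, or `JWT` when they are created. The sketch below only builds the keyword arguments for `create_route`; `protected_route_kwargs` is a hypothetical helper and the IDs are placeholders. With `AWS_IAM`, callers would have to sign their requests with SigV4, which the plain `requests.post` call above does not do.

```python
def protected_route_kwargs(api_id, integration_id, authorization_type='AWS_IAM'):
    """Build keyword arguments for apigatewayv2 create_route with authorization enabled."""
    return {
        'ApiId': api_id,
        'RouteKey': 'POST /predict',
        'Target': f'integrations/{integration_id}',
        'AuthorizationType': authorization_type,
    }


kwargs = protected_route_kwargs('abc123', 'int456')  # placeholder IDs
print(kwargs)
# With real IDs: boto3.client('apigatewayv2').create_route(**kwargs)
```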

## Flask Example

For this example we need to get the ID of the default VPC. We will then create a security group that allows port 5000 and launch an instance to run Flask. Our sample Flask app is available [here](https://raw.githubusercontent.com/sciarrilli/flask-sagemaker-inference/main/flask-app.py).

In [None]:
ec2_client = boto3.client('ec2')
response = ec2_client.describe_vpcs()

# Use the default VPC; fall back to a VPC tagged Name=default or to the only VPC in the account
vpc_id = None
for vpc in response['Vpcs']:
    tags = {t['Key']: t['Value'] for t in vpc.get('Tags', [])}
    if vpc.get('IsDefault') or tags.get('Name') == 'default' or len(response['Vpcs']) == 1:
        vpc_id = vpc['VpcId']
        print(f'vpc_id = {vpc_id}')
        break

In [None]:
response = ec2_client.create_security_group(GroupName='flask-sagemaker-sg',
                                     Description='flask-sagemaker-sg',
                                     VpcId=vpc_id)
security_group_id = response['GroupId']
print('Security Group Created %s in vpc %s.' % (security_group_id, vpc_id))

data = ec2_client.authorize_security_group_ingress(
    GroupId=security_group_id,
    IpPermissions=[
        {'IpProtocol': 'tcp',
         'FromPort': 5000,
         'ToPort': 5000,
         'IpRanges': [{'CidrIp': '0.0.0.0/0'}]},
        {'IpProtocol': 'tcp',
         'FromPort': 22,
         'ToPort': 22,
         'IpRanges': [{'CidrIp': '0.0.0.0/0'}]}
    ])

security_group_id

In [None]:
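# Hard-coded AMI IDs; only these two regions are covered
if awsRegion == 'us-west-2':
    image_id = 'ami-0c2d06d50ce30b442'
elif awsRegion == 'us-east-1':
    image_id = 'ami-087c17d1fe0178315'
else:
    raise ValueError(f'No AMI configured for region {awsRegion}; add one before continuing')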

image_id

endpoint_arn = sm_client.list_endpoints(SortBy='CreationTime')['Endpoints'][0]['EndpointArn']
endpoint_arn

ec2_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": None
        }
    ]
}
ec2_policy['Statement'][0]['Resource'] = endpoint_arn

response = iam_client.create_policy(
    PolicyName = 'SageMakerInvokePolicy',
    PolicyDocument = json.dumps(ec2_policy)
)

policy_arn = response['Policy']['Arn']

ec2_trust = {
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

role_name = 'EC2-SageMaker-Invoke-Role'

role_response = iam_client.create_role(
    RoleName=role_name,
    AssumeRolePolicyDocument=json.dumps(ec2_trust)
)


role_arn = role_response['Role']['Arn']

attach_response = iam_client.attach_role_policy(
    RoleName=role_name,
    PolicyArn=policy_arn
)

response = iam_client.create_instance_profile(
    InstanceProfileName=role_name
)


response = iam_client.add_role_to_instance_profile(
    InstanceProfileName=role_name,
    RoleName=role_name
)
response

In [None]:
user_data = f'''#!/bin/bash
echo 'test' > /tmp/hello
yum update -y
yum install python3 python3-pip tmux htop -y
pip3 install flask boto3
wget https://raw.githubusercontent.com/sciarrilli/flask-sagemaker-inference/main/flask-app.py

ls
pwd
sed -i 's/MyEndPoint/{endpoint}/g' flask-app.py
sed -i 's/awsRegion/{awsRegion}/g' flask-app.py

tmux new-session -d -s flask-session
tmux send-keys 'python3 flask-app.py' C-m
tmux detach -s flask-session'''

In [None]:
instances = ec2_client.run_instances(
    ImageId=image_id,
    MinCount=1,
    MaxCount=1,
    InstanceType="t3.medium",
    UserData=user_data,
    SecurityGroupIds=[security_group_id],
    KeyName='macos',  # must be the name of an existing EC2 key pair in your account; change as needed
    IamInstanceProfile={
        'Name': role_name
    },
    TagSpecifications=[
    {
        'ResourceType': 'instance',
        'Tags': [
            {
                'Key': 'Name',
                'Value': 'flask-sagemaker-inference'
            },
        ]
    },
])

InstanceId = instances['Instances'][0]['InstanceId']



In [None]:
time.sleep(15)
response = ec2_client.describe_instances(
    InstanceIds=[
        InstanceId,
    ]
)

flask_ip = response['Reservations'][0]['Instances'][0]['PublicIpAddress']
flask_ip

In [None]:
sample260 = get_sample(X_test, 260)
sample260

### 🗒️ Note: The EC2 instance takes about 2 minutes before Flask is available

In [None]:
time.sleep(120)
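
A fixed two-minute sleep can be either longer than needed or too short. As a sketch of an alternative, the notebook could poll the Flask port until it accepts a TCP connection. `wait_for_port` is a hypothetical helper, and it assumes that port 5000 on the instance is reachable from where the notebook runs.

```python
import socket
import time


def wait_for_port(host, port, timeout=180.0, interval=5.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Port not open yet (or host unreachable); wait and retry
            time.sleep(interval)
    return False


# In the notebook this would be called as: wait_for_port(flask_ip, 5000)
```

An open port only shows that the server is listening, so the first request may still need a retry if the app is not fully initialized.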

## 🧠 3. Inference with Flask

For the third example we set up an EC2 instance running a basic Flask API that receives HTTP POST requests and forwards them to the SageMaker endpoint for a prediction from our hosted XGBoost model.

In [None]:
%%bash -s "$flask_ip" "$sample260"
curl -s --location --request POST "http://$1:5000/pred" \
--header 'Content-Type: application/json' \
--data-raw "{\"data\": \"$2\"}"

#### sample curl

In [None]:
# curl --location --request POST 'http://localhost:5000/pred' \
#                          --header 'Content-Type: application/json' \
#                          --data-raw '{
#                          "data": "3,3,3,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1"
#                      }'
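
The curl command above can also be expressed in Python. This sketch uses only the standard library and builds the request without sending it; `build_pred_request` is a hypothetical helper, and the `/pred` path and port 5000 are taken from the curl example.

```python
import json
import urllib.request


def build_pred_request(host, sample, port=5000):
    """Build (but do not send) the POST request that the curl command issues."""
    payload = json.dumps({'data': sample}).encode('utf-8')
    return urllib.request.Request(
        url=f'http://{host}:{port}/pred',
        data=payload,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


req = build_pred_request('localhost', '3,3,3,0,0,1')
print(req.get_method(), req.full_url)  # POST http://localhost:5000/pred
# Sending it requires a running Flask app: urllib.request.urlopen(req).read()
```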

# THE END

---

## Clean up

In [None]:
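_ = lambda_client.delete_function(FunctionName=FunctionName)

api_client = boto3.client('apigatewayv2')
_ = api_client.delete_api(ApiId=ApiId)

# terminate the Flask instance and wait until it is gone, since the
# security group cannot be deleted while the instance still uses it
_ = ec2_client.terminate_instances(InstanceIds=[InstanceId])
ec2_client.get_waiter('instance_terminated').wait(InstanceIds=[InstanceId])
_ = ec2_client.delete_security_group(GroupId=security_group_id)

# the role must be removed from its instance profile before it can be deleted
_ = iam_client.remove_role_from_instance_profile(
    InstanceProfileName=role_name,
    RoleName=role_name
)
_ = iam_client.delete_instance_profile(InstanceProfileName=role_name)

# role_name and policy_arn were reassigned in the Flask section, so the Lambda
# role and policy are addressed by the names used when they were created
lambda_policy_arn = f'arn:aws:iam::{awsAccount}:policy/boto3-lambda-policy'
for r_name, p_arn in [(role_name, policy_arn), ('boto3-lambda-role', lambda_policy_arn)]:
    _ = iam_client.detach_role_policy(RoleName=r_name, PolicyArn=p_arn)
    _ = iam_client.delete_policy(PolicyArn=p_arn)
    _ = iam_client.delete_role(RoleName=r_name)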