## Connecting SageMaker models to LAMBDA and API GATEWAY
This notebook guides you through deploying your **SageMaker** model in serverless mode, using the **Lambda** and **API Gateway** AWS services. Instead of a SageMaker built-in algorithm, we are going to use the **scikit-learn framework**, both in endpoint mode (to test the Lambda and API Gateway architecture) and in batch transform mode.

### WINE QUALITY
I am going to use the wine quality dataset, where the goal is to predict the quality of a wine from 11 numerical variables. There are no missing values, and we just want to test the scikit-learn framework in endpoint and batch transform modes.

In [2]:
import pandas as pd
import numpy as np

import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.estimator import SKLearn
import io
import boto3

sagemaker_session = sagemaker.Session()
role = get_execution_role()

In [7]:
df = pd.read_csv("winequality.csv")

In [4]:
df.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |


In [5]:
#No null values!
df.isnull().sum()

fixed acidity           0
volatile acidity        0
citric acid             0
residual sugar          0
chlorides               0
free sulfur dioxide     0
total sulfur dioxide    0
density                 0
pH                      0
sulphates               0
alcohol                 0
quality                 0
dtype: int64

In [27]:
#Make sure the target variable is the first column. To see why, look at the script_winequality Python files.
df = df[['quality', 'fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar', 'chlorides',
    'free sulfur dioxide', 'total sulfur dioxide', 'density','pH', 'sulphates', 'alcohol']]

In [10]:
#For the batch transform test: I will download this file locally and upload it manually to the input path on S3:
#tryouts/input/data
df.iloc[:,1:].to_csv("test.csv", header = None, index = None)

In [4]:
#Save to the SageMaker notebook's local storage, to upload to S3 later
df.to_csv("train.csv", header = None, index = None)
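
Both CSV files are written without a header, and the target sits in the first column of `train.csv`. A minimal sketch of how a training script could split such a file back into target and features is shown below. The function name `split_target_first` is hypothetical and the author's script is not shown here, so treat this as an illustration of the layout only.

```python
import csv
import io


def split_target_first(csv_text):
    """Split header-less CSV rows into (targets, features), target in column 0."""
    targets, features = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip empty lines
        values = [float(v) for v in row]
        targets.append(values[0])
        features.append(values[1:])
    return targets, features


# Two rows in the same layout as train.csv (quality first, then 11 features).
sample = (
    "5,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4\n"
    "6,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8\n"
)
y, X = split_target_first(sample)
print(y, len(X[0]))
```

`test.csv` has the same layout minus the first column, which is why it is written from `df.iloc[:,1:]`.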

In [60]:
#This will store data in the default bucket
train_input = sagemaker_session.upload_data("train.csv", key_prefix="tryouts/input/data/train")

In [3]:
#https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/using_sklearn.html
FRAMEWORK_VERSION = "0.23-1"
script_path = 'script_winequality_batch_transform.py'

#Specify the hyperparameters that are defined in script_path.
#output_path is where the model will be stored. It will be used when running batch transform.
sklearn_batch = SKLearn(
    entry_point=script_path,
    framework_version=FRAMEWORK_VERSION,
    output_path = 's3://sagemaker-us-east-1-563718358426/tryouts/model',    
    train_instance_type="ml.m5.large",
    role=role,
    sagemaker_session=sagemaker_session,
    hyperparameters={'alpha' : 0.8, 
                     "fit_intercept" : True,
                     "max_iter" : 1000,
                     "random_state" : 12})

In [None]:
#I am not specifying the name of the file, since it is specified in the script file
sklearn_batch.fit({'train': "s3://sagemaker-us-east-1-563718358426/tryouts/input/data/train"})
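
When the training job runs, the values in the `hyperparameters` dict reach the entry-point script as command-line arguments, and the data and model locations arrive through the `SM_CHANNEL_TRAIN` and `SM_MODEL_DIR` environment variables. The sketch below shows how a script could read them with `argparse`. The helper names `str_to_bool` and `parse_args`, the defaults, and the local fallback folders are assumptions for illustration, since the author's script is not reproduced in this notebook.

```python
import argparse
import os


def str_to_bool(value):
    """Interpret strings such as 'True' or 'False' (any case) as booleans."""
    return str(value).strip().lower() in ("true", "1", "yes")


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters: names mirror the keys of the `hyperparameters` dict.
    parser.add_argument("--alpha", type=float, default=1.0)
    parser.add_argument("--fit_intercept", type=str_to_bool, default=True)
    parser.add_argument("--max_iter", type=int, default=1000)
    parser.add_argument("--random_state", type=int, default=None)
    # Locations set by the training container; the fallbacks are for local runs.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "data"))
    # parse_known_args ignores any extra arguments that may be passed in.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the arguments produced by the hyperparameters used in this notebook.
args = parse_args(["--alpha", "0.8", "--fit_intercept", "True",
                   "--max_iter", "1000", "--random_state", "12"])
print(args.alpha, args.fit_intercept, args.max_iter, args.random_state)
```

Booleans need the small converter because every argument arrives as a string, and `bool("False")` would be `True`.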

In [5]:
#This is for batch transform.
#I am specifying the output path, where the predictions will be stored
transformer = sklearn_batch.transformer(instance_count=1, instance_type="ml.m5.large", output_path='s3://sagemaker-us-east-1-563718358426/tryouts/output')

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.


In [None]:
#if you want to specify the name use job_name="name_of_job"
transformer.transform("s3://sagemaker-us-east-1-563718358426/tryouts/input/data/test.csv", content_type='text/csv', split_type='Line')
transformer.wait()

### Go to tryouts/output and you will find a test.csv.out file with the predictions
#### After that, it is recommended to delete the files on S3. You can reuse the model and deploy the endpoint.
#### Here we will not reuse it but retrain the model.
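
Batch transform names each result after its input file, with `.out` appended. A small sketch, using only the standard library, of how to work out where the predictions land and how to split that location into the bucket and key you would pass to an S3 download call. The helper names `transform_output_uri` and `split_s3_uri` are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def transform_output_uri(input_uri, output_path):
    """Return the S3 URI of the predictions for one input file."""
    file_name = posixpath.basename(urlparse(input_uri).path)
    return output_path.rstrip("/") + "/" + file_name + ".out"


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


out = transform_output_uri(
    "s3://sagemaker-us-east-1-563718358426/tryouts/input/data/test.csv",
    "s3://sagemaker-us-east-1-563718358426/tryouts/output",
)
print(out)
print(split_s3_uri(out))
```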

In [21]:
#https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/using_sklearn.html
FRAMEWORK_VERSION = "0.23-1"
script_path = 'script_winequality_endpoint.py'

sklearn_endpoint = SKLearn(
    entry_point=script_path,
    framework_version=FRAMEWORK_VERSION,
    output_path = 's3://sagemaker-us-east-1-563718358426/tryouts/model',    
    train_instance_type="ml.m5.large",
    role=role,
    sagemaker_session=sagemaker_session,
    hyperparameters={'alpha' : 0.8, 
                     "fit_intercept" : True,
                     "max_iter" : 1000,
                     "random_state" : 12})

In [None]:
sklearn_endpoint.fit({'train': "s3://sagemaker-us-east-1-563718358426/tryouts/input/data/train"})

In [26]:
#You can check your model inside the inference --> Model section.
#And check the endpoint inside the Endpoints section
predictor = sklearn_endpoint.deploy(initial_instance_count=1, instance_type="ml.m5.large")

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
Using already existing model: sagemaker-scikit-learn-2020-09-12-21-13-48-175


-----------------!

In [30]:
#You can try df.values for whole batch
list(df.iloc[1, 1:])

[7.8, 0.76, 0.04, 2.3, 0.092, 15.0, 54.0, 0.997, 3.26, 0.65, 9.8]

In [34]:
#Because the script_winequality file defines a custom input_fn, predictor.predict will not work with this endpoint.
#If you would rather use predictor.predict, remove input_fn from the script and call:
predictor.predict(data = list(df.iloc[1, 1:]))
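
For reference, this is roughly what a custom `input_fn` that accepts `text/csv` could look like. The two-argument signature is the one the SageMaker scikit-learn container calls, but the body is a hypothetical sketch and not the author's script. It returns plain Python lists to stay dependency-free, whereas a real script would probably convert the rows to a NumPy array before prediction.

```python
def input_fn(request_body, request_content_type):
    """Parse a text/csv request body into a list of feature rows."""
    if request_content_type != "text/csv":
        raise ValueError("Unsupported content type: " + str(request_content_type))
    # The body may arrive as bytes or as str, depending on the caller.
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode("utf-8")
    rows = []
    for line in request_body.strip().splitlines():
        if line.strip():
            rows.append([float(value) for value in line.split(",")])
    return rows


rows = input_fn(b"7.8,0.76,0.04,2.3,0.092,15,54,0.997,3.26,0.65,9.8", "text/csv")
print(rows)
```

A function like this rejects any other content type, which is why a client that sends a different format cannot talk to the endpoint.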

### Our endpoint accepts CSV data in bytes format. Use the following function to test it.

In [35]:
def to_csv(_list):
    csv = io.BytesIO()
    np.savetxt(csv, _list, delimiter=',', fmt='%g')
    return csv.getvalue().decode().rstrip()

In [162]:
#test the observation 100.
to_csv([list(df.iloc[100, 1:])])

'7.8,0.5,0.3,1.9,0.075,8,22,0.9959,3.31,0.56,10.4'

In [36]:
#Use the SageMaker runtime client from boto3. The same logic will be used in the Lambda file.
runtime= boto3.client('runtime.sagemaker')

#Invoke endpoint for EndpointName: copy and paste name from endpoint
#In Body paste the to_csv output, or simply to_csv([list(df.iloc[100, 1:])])
response = runtime.invoke_endpoint(EndpointName="sagemaker-scikit-learn-2020-09-12-21-13-48-175",
                                   ContentType='text/csv',
                                   Body="7.8,0.76,0.04,2.3,0.092,15,54,0.997,3.26,0.65,9.8")

In [37]:
#This is the output format: the same as in Lambda
response["Body"].read().decode()

'[5.60775418662084]'
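
The decoded body is a JSON array holding one prediction per input row, so `json.loads` turns it into a Python list. A minimal sketch, where the helper name `parse_prediction` is hypothetical:

```python
import json


def parse_prediction(body_text):
    """Turn the endpoint's JSON array string into a list of floats."""
    return [float(value) for value in json.loads(body_text)]


predictions = parse_prediction('[5.60775418662084]')
print(predictions[0])
```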

### This is the Lambda function we will use to send POST requests to the endpoint stored in API_SAGEMAKER

In [25]:
!cat lambda_function.py

import os
import io
import boto3
import json
import csv

# Create an environment variable called API_SAGEMAKER whose value is the SageMaker endpoint name
api_sagemaker = os.environ['API_SAGEMAKER']
#with the following line you can later invoke the SageMaker endpoint
runtime= boto3.client('runtime.sagemaker')

#when you create a Lambda function you have to specify a Lambda handler. It is the function that is executed when Lambda invokes the code
def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))
    
    #make sure that received data is converted to json format
    data = json.loads(json.dumps(event))
    #extract what is inside data key
    payload = data['data']
    print(payload)
    
    response = runtime.invoke_endpoint(EndpointName=api_sagemaker,
                                       ContentType='text/csv',
                                       Body=payload)
    print('Initializing!')
    print(response)
    result = json.loads(response['Body'].read().decode())
    #return the prediction so that API Gateway can send it back to the caller
    return result
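
The handler expects an event shaped like `{"data": "<comma-separated features>"}`. Its logic can be tried locally, without AWS, by swapping the boto3 client for a stand-in. In the sketch below, `FakeRuntime`, `handle` and the endpoint name `my-endpoint` are hypothetical, and returning the parsed prediction is an assumption about what the function is meant to send back.

```python
import io
import json


class FakeRuntime:
    """Stand-in for the boto3 SageMaker runtime client; records the last call."""

    def __init__(self, reply):
        self.reply = reply
        self.last_call = None

    def invoke_endpoint(self, **kwargs):
        self.last_call = kwargs
        # The real client also returns the body as a readable stream.
        return {"Body": io.BytesIO(self.reply.encode("utf-8"))}


def handle(event, runtime, endpoint_name):
    """Same flow as lambda_handler, with the client and endpoint passed in."""
    payload = event["data"]
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
    )
    return json.loads(response["Body"].read().decode())


fake = FakeRuntime("[5.60775418662084]")
event = {"data": "7.8,0.76,0.04,2.3,0.092,15,54,0.997,3.26,0.65,9.8"}
result = handle(event, fake, "my-endpoint")
print(result)
```

Passing the client in as an argument is only a testing convenience; in Lambda the module-level `runtime` client plays that role.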
    

## Search for Lambda service 
### Create a Lambda function!
#### Click on "create function" and then "Author from scratch"
I will call the Lambda function QUALITY (since we are testing wine quality).<br>
For the Runtime, select Python 3.<br>

![creating-lambda](_lambda-create-function.png)

Once it is created, look for **environment variables** and add one. I will set the environment variable's **Key** to "API_SAGEMAKER" and paste the name of the endpoint into **Value**: "sagemaker-scikit-learn-2020-09-12-21-13-48-175"

![creating-lambda](lambda_variables.png)
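
If you prefer to set the variable from code rather than from the console, boto3's Lambda client offers `update_function_configuration`. This is a sketch under the assumption that the function is named QUALITY as above; the wrapper name `set_endpoint_variable` is hypothetical. Note that the `Environment` argument replaces the function's whole set of variables, not just one key.

```python
def set_endpoint_variable(lambda_client, function_name, endpoint_name):
    """Point the Lambda function at a SageMaker endpoint via API_SAGEMAKER."""
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        Environment={"Variables": {"API_SAGEMAKER": endpoint_name}},
    )


# Usage, with AWS credentials configured:
# import boto3
# set_endpoint_variable(boto3.client("lambda"), "QUALITY",
#                       "sagemaker-scikit-learn-2020-09-12-21-13-48-175")
```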

## Go to API Gateway
### Then go to CREATE API
Choose the REST protocol and create a new API. I am calling mine quality_api.

![creating-lambda](API_GATEWAY_creation.png)

# Connecting LAMBDA AND API GATEWAY
After that, go to Actions and create a POST method. Select Lambda Function as the integration type and type the name of the function, in my case: QUALITY

![creating-lambda](post_setup.png)

## Before deploying the API, TEST the logic. Go to test:
<br>

![creating-lambda](TEST.png)

## FINALLY, we will deploy. 
Go to DEPLOY API and create a new stage, which we will call deploy_quality. This returns the URI to which you send POST requests. To test it, use Postman.
![creating-lambda](postman_test.png)
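
The same POST request can be sent from Python with the standard library. In this sketch the URL is a made-up placeholder (use the invoke URL that API Gateway shows for your stage), the helper name `build_predict_request` is hypothetical, and the JSON body is assumed to reach the Lambda function unchanged as its event, which is what the handler's `data` key lookup implies.

```python
import json
import urllib.request


def build_predict_request(invoke_url, csv_row):
    """Build a POST request whose JSON body matches the Lambda event shape."""
    body = json.dumps({"data": csv_row}).encode("utf-8")
    return urllib.request.Request(
        invoke_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: replace it with your own stage's invoke URL.
url = "https://example.execute-api.us-east-1.amazonaws.com/deploy_quality"
request = build_predict_request(
    url, "7.8,0.76,0.04,2.3,0.092,15,54,0.997,3.26,0.65,9.8"
)

# To actually send it:
# with urllib.request.urlopen(request) as reply:
#     print(reply.read().decode())
```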

## If you go to your lambda function, the logic will look as follows:

![creating-lambda](LAMBDA_diagram.png)

Aside from Udacity's Machine Learning Engineer Nanodegree, try the following Udemy courses:
* mastering-boto3-with-aws-services
* aws-machine-learning-a-complete-guide-with-python

## Remember!
* You can **debug** your code using AWS CloudWatch. Go to your endpoint and view logs
* **Delete** your endpoint, API (from API GATEWAY) and Lambda function, as well as your files on S3.
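
The clean-up can also be scripted. The sketch below assumes three boto3 clients (`sagemaker`, `lambda` and `apigateway`) and uses their `delete_endpoint`, `delete_function` and `delete_rest_api` calls. The wrapper name `clean_up` is hypothetical and the REST API id is something you would copy from the API Gateway console. Deleting the endpoint does not remove the endpoint configuration, the model, or the files on S3, so those still need separate attention.

```python
def clean_up(sagemaker_client, lambda_client, apigateway_client,
             endpoint_name, function_name, rest_api_id):
    """Delete the billable pieces created in this notebook."""
    # Stops the instance behind the endpoint.
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    # Removes the Lambda function that called the endpoint.
    lambda_client.delete_function(FunctionName=function_name)
    # Removes the REST API together with its stages.
    apigateway_client.delete_rest_api(restApiId=rest_api_id)


# Usage, with AWS credentials configured ("your-rest-api-id" is a placeholder):
# import boto3
# clean_up(boto3.client("sagemaker"), boto3.client("lambda"),
#          boto3.client("apigateway"),
#          "sagemaker-scikit-learn-2020-09-12-21-13-48-175",
#          "QUALITY", "your-rest-api-id")
```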