# Amazon SageMaker Serverless using Linear Learner
[Amazon SageMaker Serverless Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html) is a purpose-built inference option that makes it easy for you to deploy and scale ML models. Serverless Inference is ideal for workloads that have idle periods between traffic spurts and can tolerate cold starts. Serverless endpoints automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies. This takes away the undifferentiated heavy lifting of selecting and managing servers. Serverless Inference integrates with AWS Lambda to offer you high availability, built-in fault tolerance and automatic scaling.

With a pay-per-use model, Serverless Inference is a cost-effective option if you have an infrequent or unpredictable traffic pattern. During times when there are no requests, Serverless Inference scales your endpoint down to 0, helping you to minimize your costs. 

You can integrate Serverless Inference with your MLOps Pipelines to streamline your ML workflow, and you can use a serverless endpoint to host a model registered with Model Registry.

To demonstrate these capabilities, this notebook uses the example of predicting house prices in multiple cities with linear regression. House prices are predicted from features such as the number of bedrooms, the number of garages and the square footage. Depending on the city, the features affect the house price differently. For example, a small change in square footage causes a much larger change in house price in New York than in Houston. For accurate house price predictions, the approach is to train multiple linear regression models, one location-specific model per city.
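
The idea that the same feature can carry a different weight in each city can be illustrated with a small, self-contained sketch. The per-city prices per square foot below are hypothetical values chosen only for illustration; they are not taken from this notebook's data or from real markets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical price per square foot for two cities (illustrative values only).
price_per_sqft = {"NewYork_NY": 400.0, "Houston_TX": 150.0}

fitted_slopes = {}
for city, true_slope in price_per_sqft.items():
    # Synthetic houses: size in square feet plus some noise on the price.
    square_feet = rng.normal(3000, 750, size=500)
    noise = rng.normal(0, 10_000, size=500)
    price = true_slope * square_feet + 50_000 + noise

    # Ordinary least squares fit of price = slope * square_feet + intercept.
    slope, intercept = np.polyfit(square_feet, price, deg=1)
    fitted_slopes[city] = slope

for city, slope in fitted_slopes.items():
    print(f"{city}: about ${slope:,.0f} per extra square foot")
```

Fitting one model per city recovers a different slope for each location, which a single shared model could not represent without extra interaction features.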


### Contents

1. [Generate synthetic data for housing models](#Generate-synthetic-data-for-housing-models)
1. [Bring your own model as a tarball in S3](#Bring-your-own-model-as-tarball)
1. [Invoke using the Boto3 API](#invoke-boto3)
1. [Endpoint CloudWatch Metrics Analysis](#CW-metric-analysis)
1. [Clean up](#CleanUp)


## Section 1 - Generate synthetic data for housing models <a id='Generate-synthetic-data-for-housing-models'></a>

In this section, you will generate synthetic data that will be used to train the linear learner models. The generated data consists of six numerical features (the year the house was built, house size in square feet, number of bedrooms, number of bathrooms, lot size and number of garage spaces) and two categorical features (deck and front_porch).

In [8]:
import numpy as np
import pandas as pd
import json
import datetime
import time
import boto3
import sagemaker
import os

from time import gmtime, strftime
from random import choice

from sagemaker import get_execution_role

from sagemaker.multidatamodel import MULTI_MODEL_CONTAINER_MODE
from sagemaker.multidatamodel import MultiDataModel

from sklearn.model_selection import train_test_split

In [9]:
NUM_HOUSES_PER_LOCATION = 1000
LOCATIONS  = ['NewYork_NY',    'LosAngeles_CA',   'Chicago_IL',    'Houston_TX',   'Dallas_TX',
              'Phoenix_AZ',    'Philadelphia_PA', 'SanAntonio_TX', 'SanDiego_CA',  'SanFrancisco_CA']
MAX_YEAR = 2019

In [10]:
def gen_price(house):
    """Generate price based on features of the house"""
    
    #Convert the y/n categorical features to 0/1 flags
    deck = 1 if house['DECK'] == 'y' else 0
    front_porch = 1 if house['FRONT_PORCH'] == 'y' else 0
        
    price = int(150 * house['SQUARE_FEET'] + \
                10000 * house['NUM_BEDROOMS'] + \
                15000 * house['NUM_BATHROOMS'] + \
                15000 * house['LOT_ACRES'] + \
                10000 * deck + \
                10000 * front_porch + \
                15000 * house['GARAGE_SPACES'] - \
                5000 * (MAX_YEAR - house['YEAR_BUILT']))
    return price

In [11]:
def gen_yes_no():
    """Generate values (y/n) for categorical features"""
    answer = choice(['y', 'n'])
    return answer

In [12]:
def gen_random_house():
    """Generate a row of data (single house information)"""
    house = {'SQUARE_FEET':    np.random.normal(3000, 750),
             'NUM_BEDROOMS':  np.random.randint(2, 7),
             'NUM_BATHROOMS': np.random.randint(2, 7) / 2,
             'LOT_ACRES':     round(np.random.normal(1.0, 0.25), 2),
             'GARAGE_SPACES': np.random.randint(0, 4),
             'YEAR_BUILT':    min(MAX_YEAR, int(np.random.normal(1995, 10))),
             'FRONT_PORCH':   gen_yes_no(),
             'DECK':          gen_yes_no()
            }
    
    price = gen_price(house)
    
    return [house['YEAR_BUILT'],   
            house['SQUARE_FEET'], 
            house['NUM_BEDROOMS'], 
            house['NUM_BATHROOMS'], 
            house['LOT_ACRES'],    
            house['GARAGE_SPACES'],
            house['FRONT_PORCH'],    
            house['DECK'], 
            price]

In [13]:
def gen_houses(num_houses):
    """Generate housing dataset"""
    house_list = []
    
    for _ in range(num_houses):
        house_list.append(gen_random_house())
        
    df = pd.DataFrame(
        house_list, 
        columns=[
            'YEAR_BUILT',    
            'SQUARE_FEET',  
            'NUM_BEDROOMS',            
            'NUM_BATHROOMS',
            'LOT_ACRES',
            'GARAGE_SPACES',
            'FRONT_PORCH',
            'DECK', 
            'PRICE']
    )
    return df

In [14]:
def save_data_locally(location, train, test): 
    """Save the housing data locally"""
    os.makedirs('data/{0}/train'.format(location), exist_ok=True)
    train.to_csv('data/{0}/train/train.csv'.format(location), sep=',', header=False, index=False)
       
    os.makedirs('data/{0}/test'.format(location), exist_ok=True)
    test.to_csv('data/{0}/test/test.csv'.format(location), sep=',', header=False, index=False) 

In [15]:
#Generate housing data for multiple locations.
#Change PARALLEL_TRAINING_JOBS to a lower number to limit the number of training jobs and models, or to a higher value to experiment with more models.

PARALLEL_TRAINING_JOBS = 1

for loc in LOCATIONS[:PARALLEL_TRAINING_JOBS]:
    houses = gen_houses(NUM_HOUSES_PER_LOCATION)
    
    #Split the data into train and test sets in a 90:10 ratio
    #The train data is not split further into train and validation sets because it is not preprocessed yet
    train, test = train_test_split(houses, test_size=0.1)
    save_data_locally(loc, train, test)


In [None]:
#Shows the first few lines of data.
houses.head()

## Section 2a - Bring your own model as a tarball in S3 <a id='Bring-your-own-model-as-tarball'></a>
Here we use an existing model tarball as is and create all the other required artifacts from scratch. First we upload the model tarball to S3 so that it can be used to create the SageMaker model.

In [16]:
sm_client = boto3.client(service_name='sagemaker')
runtime_sm_client = boto3.client(service_name='sagemaker-runtime')
sagemaker_session = sagemaker.Session()

s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

BUCKET  = sagemaker_session.default_bucket()
print("BUCKET : ", BUCKET)

role = get_execution_role()
print("ROLE : ", role)

ACCOUNT_ID = boto3.client('sts').get_caller_identity()['Account']
REGION = boto3.Session().region_name

DATA_PREFIX = 'DEMO_MME_LINEAR_LEARNER'
HOUSING_MODEL_NAME = 'housing'
MULTI_MODEL_ARTIFACTS = 'multi_model_artifacts'

BUCKET :  sagemaker-us-east-1-622343165275
ROLE :  arn:aws:iam::622343165275:role/service-role/AmazonSageMaker-ExecutionRole-20220208T115633


#### Upload the test data set to S3

In [18]:
#Upload the preprocessed test data (features only) to the S3 bucket
test_inputs = []
PARALLEL_TRAINING_JOBS=1
for loc in LOCATIONS[:PARALLEL_TRAINING_JOBS]:

    test_input = sagemaker_session.upload_data(
        path='./data/transform_output/use_for_test.csv',
        bucket=BUCKET,
        key_prefix='data/realtime/transform'
    )
    
    test_inputs.append(test_input)
    print("FINAL Test data uploaded to : ", test_input)

FINAL Test data uploaded to :  s3://sagemaker-us-east-1-622343165275/data/realtime/transform/use_for_test.csv


#### Upload the model tarball to S3

In [19]:
# - UPLOAD the MODEL to S3
desired_model_s3 = 's3://{}/{}'.format(sagemaker_session.default_bucket(),'byom/scikit-learnestimator/model/realtime')
print(desired_model_s3)
model_s3_upload=sagemaker.s3.S3Uploader().upload(local_path='./models/realtime_linearlearner/model.tar.gz', desired_s3_uri=desired_model_s3,sagemaker_session=sagemaker_session)  
print(model_s3_upload)

s3://sagemaker-us-east-1-622343165275/byom/scikit-learnestimator/model/realtime
s3://sagemaker-us-east-1-622343165275/byom/scikit-learnestimator/model/realtime/model.tar.gz


#### Create the serverless endpoint

Creating the endpoint is straightforward:
* Create a Model object from the S3 location and the container image
* Create a serverless inference configuration
* Deploy the model to a serverless endpoint
* Run the metrics test



In [20]:
# Retrieve the Linear Learner container image for the current region
container = sagemaker.image_uris.retrieve(region=boto3.Session().region_name, framework='linear-learner')

In [21]:
print(f"using the model from s3={model_s3_upload}:")

using the model from s3=s3://sagemaker-us-east-1-622343165275/byom/scikit-learnestimator/model/realtime/model.tar.gz:


In [24]:
from sagemaker.model import Model
linear_model_name = "DEMO-SERVERLESS-LINEAR-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

linear_model = Model(
    name=linear_model_name,
    model_data=model_s3_upload,  
    role=role,
    sagemaker_session=sagemaker_session,
    #entry_point="scripts/sklearn_preprocessor_batch.py",
    #framework_version="0.20.0", #"0.23-1", #"0.20.0",
    image_uri=container,
    #source_dir="scripts",
)
print(linear_model.source_dir)
print(linear_model.entry_point)
print(linear_model)

None
None
<sagemaker.model.Model object at 0x7fb1836e3370>


#### Create Predictor
We will deploy the model to a serverless endpoint and use a predictor object to run inference against it.

In [25]:
csv_serializer = sagemaker.serializers.CSVSerializer()
csv_deserializer = sagemaker.deserializers.CSVDeserializer()


serverless_config = sagemaker.serverless.ServerlessInferenceConfig(
    memory_size_in_mb=4096, 
    max_concurrency=3
)

# No instance type or instance count is needed: the serverless config
# determines the memory size and the maximum number of concurrent invocations.
predictor = linear_model.deploy(
    serializer=csv_serializer,
    deserializer=csv_deserializer,
    endpoint_name=linear_model_name, # use the same name 
    wait=True,
    serverless_inference_config=serverless_config,
)
print(predictor)

-----!
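
For reference, the same serverless settings can also be expressed as the request that the low-level `create_endpoint_config` call of the Boto3 SageMaker client accepts. The sketch below only builds and validates that request so it runs without AWS credentials; the helper name and the `demo-config` and `demo-model` names are hypothetical, and the variant name `AllTraffic` is an assumption.

```python
# Memory sizes accepted for serverless endpoints, in MB.
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def build_serverless_endpoint_config(config_name, model_name,
                                     memory_size_in_mb=4096, max_concurrency=3):
    """Build the keyword arguments for sagemaker.create_endpoint_config."""
    if memory_size_in_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_size_in_mb must be one of {ALLOWED_MEMORY_MB}")
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # assumed variant name
                "ModelName": model_name,
                # ServerlessConfig replaces InstanceType / InitialInstanceCount.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_size_in_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


# Hypothetical names, used only to show the shape of the request.
request = build_serverless_endpoint_config("demo-config", "demo-model")
print(request)

# With AWS credentials configured, the request could be sent with:
# boto3.client("sagemaker").create_endpoint_config(**request)
```

Note that the production variant carries a `ServerlessConfig` instead of an instance type and instance count, which is what makes the endpoint serverless.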

#### Create the test data set
This data set is used to test our serverless endpoint.

In [28]:
test_data_df = pd.read_csv('./data/transform_output/use_for_test.csv', header=None) # Has the Price column Removed -- ONLY Features 
print(test_data_df.shape)
test_data_df.head(2)



(900, 10)


Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,-0.744936,-0.831786,-0.009418,-1.420279,0.106776,-1.312254,1.0,0.0,1.0,0.0
1,-1.136893,0.188842,-1.422126,-1.420279,-0.045943,0.451792,0.0,1.0,1.0,0.0
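
The 10 columns shown above are consistent with six standardized numeric features followed by one-hot encodings of the two y/n features. The preprocessing script is not part of this notebook, so the following is only a sketch under that assumption; the `preprocess` helper, the sample rows and the column order are illustrative.

```python
import pandas as pd

NUMERIC = ["YEAR_BUILT", "SQUARE_FEET", "NUM_BEDROOMS",
           "NUM_BATHROOMS", "LOT_ACRES", "GARAGE_SPACES"]
CATEGORICAL = ["FRONT_PORCH", "DECK"]


def preprocess(df):
    """Standardize numeric columns and one-hot encode the y/n columns."""
    numeric = (df[NUMERIC] - df[NUMERIC].mean()) / df[NUMERIC].std(ddof=0)
    one_hot = pd.get_dummies(df[CATEGORICAL]).astype(float)
    return pd.concat([numeric, one_hot], axis=1)


# Three made-up houses, only to exercise the helper.
raw = pd.DataFrame({
    "YEAR_BUILT": [1990, 2005, 1998],
    "SQUARE_FEET": [2500.0, 3200.0, 2900.0],
    "NUM_BEDROOMS": [3, 4, 2],
    "NUM_BATHROOMS": [1.5, 2.5, 2.0],
    "LOT_ACRES": [0.9, 1.2, 1.0],
    "GARAGE_SPACES": [1, 2, 0],
    "FRONT_PORCH": ["y", "n", "y"],
    "DECK": ["n", "y", "y"],
})

features = preprocess(raw)
print(features.shape)
print(list(features.columns))
```

Each y/n feature expands into two indicator columns, so 6 numeric plus 2 x 2 indicator columns gives the 10 features sent to the endpoint.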


In [56]:
def predict_one_house_value_serverless(features, predictor_to_use):
    """Send one row of features to the endpoint and print the predicted price and latency"""
    start_time = time.time()
     
    #The CSV serializer turns the feature list into one CSV line;
    #the CSV deserializer returns the response as a list of rows
    response = predictor_to_use.predict(features)
    predicted_value = float(response[0][0])
    
    duration = time.time() - start_time
    
    print('SEVERLESS:Price:of:house:${:.2f}, took {:,d} ms\n'.format(predicted_value, int(duration * 1000)))

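#### Invoke for the first time
The first invocation of a serverless endpoint can take noticeably longer than later ones because of the cold start. We first attach a `Predictor` to the endpoint by name and then invoke it.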

In [42]:
predictor = sagemaker.predictor.Predictor(
    sagemaker_session=sagemaker_session,
    serializer=csv_serializer,
    deserializer=csv_deserializer,
    endpoint_name=linear_model_name, # use the same name 

)
print(f"Predictor:created:={predictor}:")

Predictor:created:=<sagemaker.predictor.Predictor object at 0x7fb181ef4220>:


In [57]:
# Invoke the predictor for the first time

predict_one_house_value_serverless(
    test_data_df.values.tolist()[0],  
    predictor
)

SEVERLESS:Price:of:house:$273472.94, took 103 ms



#### Now call it for all 900 rows to see the values
We iterate over all 900 rows and send them to the endpoint one at a time.

In [None]:
for one_row in test_data_df.values.tolist():
    predict_one_house_value_serverless(one_row,  predictor)
    time.sleep(0.005) # minimal pause between requests, to keep near-continuous traffic on the serverless endpoint

SEVERLESS:Price:of:house:$273472.94, took 86 ms

SEVERLESS:Price:of:house:$394512.62, took 48 ms

SEVERLESS:Price:of:house:$443970.53, took 40 ms

SEVERLESS:Price:of:house:$651414.62, took 44 ms

SEVERLESS:Price:of:house:$358996.16, took 45 ms

SEVERLESS:Price:of:house:$458998.72, took 43 ms

SEVERLESS:Price:of:house:$339078.88, took 44 ms

SEVERLESS:Price:of:house:$305665.94, took 40 ms

SEVERLESS:Price:of:house:$596214.25, took 42 ms

SEVERLESS:Price:of:house:$242778.41, took 41 ms

SEVERLESS:Price:of:house:$434828.34, took 54 ms

SEVERLESS:Price:of:house:$573561.00, took 45 ms

SEVERLESS:Price:of:house:$274597.91, took 38 ms

SEVERLESS:Price:of:house:$396197.97, took 595 ms

SEVERLESS:Price:of:house:$387287.69, took 53 ms

SEVERLESS:Price:of:house:$533519.12, took 39 ms

SEVERLESS:Price:of:house:$187328.31, took 38 ms

SEVERLESS:Price:of:house:$579758.44, took 43 ms

SEVERLESS:Price:of:house:$486177.62, took 38 ms

SEVERLESS:Price:of:house:$483680.66, took 40 ms

SEVERLESS:Price:of:
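
The loop above prints one latency per request. To spot outliers such as a slow request in the middle of the run, the latencies can be collected and summarized instead of only printed. The sketch below is one possible way to do that; `time_calls`, `summarize` and `fake_predict` are hypothetical helpers, `fake_predict` stands in for `predictor.predict` so the code runs without an endpoint, and the spike threshold of five times the median is an arbitrary choice.

```python
import statistics
import time


def time_calls(predict_fn, rows):
    """Call predict_fn once per row and return the latencies in milliseconds."""
    latencies_ms = []
    for row in rows:
        start = time.perf_counter()
        predict_fn(row)
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return latencies_ms


def summarize(latencies_ms, spike_factor=5.0):
    """Return the median, the maximum and the requests slower than spike_factor x median."""
    median = statistics.median(latencies_ms)
    spikes = [(i, ms) for i, ms in enumerate(latencies_ms) if ms > spike_factor * median]
    return {"median_ms": median, "max_ms": max(latencies_ms), "spikes": spikes}


def fake_predict(row):
    """Stand-in for predictor.predict so the sketch runs without an endpoint."""
    return [[str(sum(row))]]


measured = time_calls(fake_predict, [[1.0, 2.0], [3.0, 4.0]])

# Latencies (ms) copied from the first 20 responses printed above.
observed = [86, 48, 40, 44, 45, 43, 44, 40, 42, 41,
            54, 45, 38, 595, 53, 39, 38, 43, 38, 40]
summary = summarize(observed)
print(summary)
```

With a real endpoint, `time_calls(predictor.predict, test_data_df.values.tolist())` would produce the list to summarize.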

#### This completes the serverless invocation using the SageMaker Python SDK

## Section 5 - Invoke using the Boto3 API <a id='invoke-boto3'></a>

Set up the Boto3 runtime client and invoke the endpoint directly.

In [58]:
import boto3
client = boto3.client('sagemaker-runtime')
content_type = "text/csv"
features = test_data_df.values.tolist()[0]
payload = ','.join(map(str, features)) + '\n'
response = client.invoke_endpoint(
    EndpointName=linear_model_name,
    ContentType=content_type,
    Body=payload)
result = json.loads(response['Body'].read().decode())
print(result)
print(f"The Serverless:returned:Price:={result['predictions'][0]['score']}")

{'predictions': [{'score': 273472.9375}]}
The Serverless:returned:Price:=273472.9375
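
The request and response handling in the cell above can be factored into two small helpers, which also makes them easy to check without calling the endpoint. This is a minimal sketch: the helper names are hypothetical, and the sample body simply mirrors the JSON shape printed above.

```python
import json


def to_csv_payload(features):
    """Serialize one row of features as a single CSV line."""
    return ",".join(map(str, features)) + "\n"


def parse_score(body_bytes):
    """Extract the predicted score from a JSON response body."""
    result = json.loads(body_bytes.decode("utf-8"))
    return float(result["predictions"][0]["score"])


# A shortened feature row, only for illustration.
payload = to_csv_payload([-0.744936, -0.831786, 1.0, 0.0])

# Sample body with the same shape as the response printed above.
sample_body = b'{"predictions": [{"score": 273472.9375}]}'
score = parse_score(sample_body)

print(repr(payload))
print(score)
```

With a live endpoint, `parse_score(response['Body'].read())` would replace the sample body.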


## Section 6 - Endpoint CloudWatch Metrics Analysis <a id='CW-metric-analysis'></a>


Amazon SageMaker publishes CloudWatch metrics for endpoints so you can monitor endpoint usage and tune your endpoint configuration. To analyze the endpoint behavior, you will generate traffic as follows:

1. Load the full 900-row test data set
1. Invoke the endpoint one row at a time, without pausing, to generate traffic
1. Check CloudWatch for the invocation metrics

We use this order of invocations to observe how the CloudWatch metrics behave. `LoadedModelCount` and `ModelCacheHit` apply to multi-model endpoints; for a serverless endpoint, metrics such as `Invocations`, `ModelLatency`, `OverheadLatency` and `ModelSetupTime` are the relevant ones. You are encouraged to experiment with different traffic patterns and use the CloudWatch charts to help make ongoing decisions on the memory size and maximum concurrency of the endpoint.
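
Besides the console charts, the metrics can be queried programmatically. The sketch below only builds the arguments for the Boto3 CloudWatch `get_metric_statistics` call, so it runs without AWS credentials. The helper name and the `demo-endpoint` name are hypothetical, and the variant name `AllTraffic` is an assumption that should be checked against the endpoint configuration.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(endpoint_name, metric_name="Invocations",
                       variant_name="AllTraffic", minutes=30, period_seconds=60):
    """Build keyword arguments for cloudwatch.get_metric_statistics."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},  # assumed variant name
        ],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Sum"],
    }


# Hypothetical endpoint name, used only to show the shape of the query.
query = build_metric_query("demo-endpoint")
print(query["Namespace"], query["MetricName"], query["Dimensions"])

# With AWS credentials configured, the query could be run with:
# boto3.client("cloudwatch").get_metric_statistics(**query)
```

Passing `linear_model_name` as the endpoint name would target the endpoint created in this notebook.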



In [59]:
##Invoke the endpoint for a range of rows in a loop
def invoke_multiple_rows(row_range_low, row_range_high, features_list):
    for i in range(row_range_low, row_range_high):
        predict_one_house_value_serverless(features_list[i],  predictor)


In [61]:
##Invoke the endpoint with rows 10 through 18 of the test data
invoke_multiple_rows(10, 19, test_data_df.values.tolist())

SEVERLESS:Price:of:house:$434828.34, took 141 ms

SEVERLESS:Price:of:house:$573561.00, took 39 ms

SEVERLESS:Price:of:house:$274597.91, took 41 ms

SEVERLESS:Price:of:house:$396197.97, took 38 ms

SEVERLESS:Price:of:house:$387287.69, took 64 ms

SEVERLESS:Price:of:house:$533519.12, took 45 ms

SEVERLESS:Price:of:house:$187328.31, took 41 ms

SEVERLESS:Price:of:house:$579758.44, took 38 ms

SEVERLESS:Price:of:house:$486177.62, took 38 ms



## Clean up <a id='CleanUp'></a>
Clean up the endpoint to avoid unnecessary costs.



In [None]:
#Delete the underlying model first, then the endpoint
#(delete_model() looks up the model through the endpoint configuration)
predictor.delete_model() 
predictor.delete_endpoint()

In [None]:
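#Delete the IAM Role
#NOTE: iam_client, policy_arn and role_name are not defined anywhere in this notebook.
#Run this cell only if you created a dedicated IAM role and policy yourself and set these variables.
iam_client.detach_role_policy(
    PolicyArn=policy_arn,
    RoleName=role_name
)
iam_client.delete_role(RoleName=role_name)

In [None]:
#Delete the IAM Policy (the same caveat applies: policy_arn must be set by you)
iam_client.delete_policy(PolicyArn=policy_arn)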