![@mikegchambers](../../images/header.png)

# SageMaker SDK - Training a model in 'script mode' and deploying the endpoint

In this notebook we will look at how models can be trained, saved, loaded and run locally. Then we will see how the same thing can be achieved using the SageMaker SDK and SageMaker managed infrastructure.

This notebook is a massively over-engineered exercise for such a small model; however, the principles, if not the code itself, can be leveraged for much bigger projects.

# Create a dataset and save it

In [None]:
from sklearn import datasets
import pickle

In [None]:
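# 100 samples with a single feature, Gaussian noise with standard deviation 5, and no intercept
X, y = datasets.make_regression(n_samples=100, n_features=1, noise=5, bias=0)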

In [None]:
with open('./train.pickle', 'wb') as f:
    pickle.dump([X, y], f)

# Create a model from the dataset

In [None]:
from sklearn.linear_model import LinearRegression
import pickle

In [None]:
with open('./train.pickle', 'rb') as f:
    XX, yy = pickle.load(f)

In [None]:
model = LinearRegression()

In [None]:
model.fit(XX,yy)

## Make a test prediction

In [None]:
model.predict([[0],[1],[2],[3]])

# Save the model to a file

In [None]:
p = pickle.dumps(model)  # serialized model as bytes, held in memory (not used below)

In [None]:
with open('./model.pickle', 'wb') as f:
    pickle.dump(model, f)

# Later load the model from a file

In [None]:
from sklearn.linear_model import LinearRegression
import pickle

In [None]:
with open('./model.pickle', 'rb') as f:
    loaded_model = pickle.load(f)

## Make a test prediction

In [None]:
loaded_model.predict([[0],[1],[2],[3]])

# SageMaker Training

In [None]:
import sagemaker
from sagemaker.sklearn.estimator import SKLearn
import boto3
import os

Create some variables that will be used throughout this process:

In [None]:
role = sagemaker.get_execution_role()
sess = sagemaker.Session()
bucket = sess.default_bucket()

s3_prefix = "script-mode-workflow"
pickle_s3_prefix = f"{s3_prefix}/pickle"
pickle_s3_uri = f"s3://{bucket}/{pickle_s3_prefix}"
pickle_train_s3_uri = f"{pickle_s3_uri}/train"

# Local directory that holds train.pickle (the notebook's working directory)
train_dir = os.getcwd()

Upload the training data to S3, so it's available for SageMaker training:

In [None]:
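s3_resource_bucket = boto3.Session().resource("s3").Bucket(bucket)
# S3 keys always use "/" as the separator, so build the key with an f-string rather than os.path.join
s3_resource_bucket.Object(f"{pickle_s3_prefix}/train.pickle").upload_file(
    os.path.join(train_dir, "train.pickle")
)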

Create some hyperparameters:

In [None]:
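# Optional: these values are already the LinearRegression defaults in scikit-learn 0.23.
# Note that `normalize` was removed in scikit-learn 1.2, so drop it for newer framework versions.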

hyperparameters = {
    "copy_X": True,
    "fit_intercept": True,
    "normalize": False,
}
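
In script mode, the SageMaker framework containers hand these hyperparameters to the entry point as command-line arguments, and the values arrive as strings. The sketch below shows one way a training script could parse them with `argparse`. The names `str_to_bool` and `parse_hyperparameters` are hypothetical helpers for illustration, not part of the SageMaker SDK, and the actual `script.py` used here may handle this differently.

```python
import argparse


def str_to_bool(value):
    """Convert strings such as "True" or "false" to a bool (hypothetical helper)."""
    lowered = str(value).strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_hyperparameters(argv=None):
    """Parse the three hyperparameters from a list of command-line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--copy_X", type=str_to_bool, default=True)
    parser.add_argument("--fit_intercept", type=str_to_bool, default=True)
    parser.add_argument("--normalize", type=str_to_bool, default=False)
    # parse_known_args ignores any extra arguments the script may receive
    args, _unknown = parser.parse_known_args(argv)
    return args


# Example: arguments the script might receive for the dictionary above
args = parse_hyperparameters(
    ["--copy_X", "True", "--fit_intercept", "True", "--normalize", "False"]
)
print(args.copy_X, args.fit_intercept, args.normalize)
```

An explicit converter is used because `type=bool` would treat the string `"False"` as `True` (any non-empty string is truthy).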

More configuration for the training job:

In [None]:
train_instance_type = "ml.m5.large"

inputs = {
    "train": pickle_train_s3_uri
}
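
Note that the `train` channel URI ends in `/pickle/train`, while the object uploaded earlier has a key ending in `/pickle/train.pickle`. This appears to work because SageMaker treats an S3 channel URI as a key-name prefix rather than an exact path (an assumption based on the documented `S3Prefix` data type). The sketch below only illustrates that string relationship; the bucket name is a placeholder.

```python
from urllib.parse import urlparse

bucket = "example-bucket"  # placeholder; the notebook uses sess.default_bucket()
s3_prefix = "script-mode-workflow"
pickle_s3_prefix = f"{s3_prefix}/pickle"

uploaded_key = f"{pickle_s3_prefix}/train.pickle"
channel_uri = f"s3://{bucket}/{pickle_s3_prefix}/train"

# Strip the scheme and bucket to get the key prefix that the channel points at
channel_key_prefix = urlparse(channel_uri).path.lstrip("/")

print(channel_key_prefix)
print(uploaded_key.startswith(channel_key_prefix))
```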

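The SageMaker Estimator object is a high-level interface for SageMaker training. This object represents the algorithm (here, our own training script), the infrastructure it runs on, and other configuration. The training data is passed in later, when we call 'fit'.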

https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html

In [None]:
estimator_parameters = {
    "entry_point": "script.py",
    "source_dir": "script",
    "framework_version": "0.23-1",
    "py_version": "py3",
    "instance_type": train_instance_type,
    "instance_count": 1,
    "hyperparameters": hyperparameters,
    "role": role,
    "base_job_name": "linearregression-model",
}

estimator = SKLearn(**estimator_parameters)
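
The `script/script.py` entry point is referenced above but not shown in this notebook. The following is a minimal sketch of what such a script might look like, not the author's actual file. It assumes the SageMaker scikit-learn container conventions: the `SM_CHANNEL_TRAIN` and `SM_MODEL_DIR` environment variables, and a `model_fn` function that the serving container calls to load the model. The file name `model.pickle` and the helper names `load_training_data` and `save_model` are illustrative choices.

```python
import argparse
import glob
import os
import pickle


def load_training_data(train_dir):
    """Load [X, y] from the first .pickle file found in the training channel."""
    candidates = sorted(glob.glob(os.path.join(train_dir, "*.pickle")))
    if not candidates:
        raise FileNotFoundError(f"no .pickle file found in {train_dir}")
    with open(candidates[0], "rb") as f:
        X, y = pickle.load(f)
    return X, y


def save_model(model, model_dir):
    """Write the model into the directory SageMaker packages as model.tar.gz."""
    path = os.path.join(model_dir, "model.pickle")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


def model_fn(model_dir):
    """Called by the serving container to load the trained model."""
    with open(os.path.join(model_dir, "model.pickle"), "rb") as f:
        return pickle.load(f)


def main():
    # Imported inside main() because only training uses this class directly
    from sklearn.linear_model import LinearRegression

    parser = argparse.ArgumentParser()
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR"))
    # Hyperparameters also arrive as arguments; this sketch ignores them
    # and relies on the LinearRegression defaults.
    args, _unknown = parser.parse_known_args()

    X, y = load_training_data(args.train)
    model = LinearRegression().fit(X, y)
    save_model(model, args.model_dir)


# Train only when run as a script with a training channel present,
# i.e. inside a SageMaker training job
if __name__ == "__main__" and os.environ.get("SM_CHANNEL_TRAIN"):
    main()
```

Keeping the training code behind the `__main__` guard matters because the serving container imports the same file to find `model_fn`, and training should not run again at that point.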

When we call 'fit', SageMaker will spin up managed training infrastructure, transfer the code and data to the container, and then start the training. All of this happens away from the notebook server. We can follow the training job in the SageMaker console and read its logs in CloudWatch Logs.

In [None]:
estimator.fit(inputs)
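
When the job finishes, SageMaker packages whatever the script wrote to the model directory as `model.tar.gz` in S3, and `estimator.model_data` holds its S3 URI. As a sketch of how the trained model could be pulled back into the notebook, tying this to the local pickle workflow earlier, the helper below unpickles one member of a downloaded archive. The function name and the `model.pickle` member name are assumptions; the real member name depends on what `script.py` saves.

```python
import pickle
import tarfile


def load_pickle_from_tarball(tar_path, member_name="model.pickle"):
    """Unpickle a single file from a .tar.gz archive without extracting it to disk."""
    with tarfile.open(tar_path, "r:gz") as archive:
        # extractfile raises KeyError for a missing member and
        # returns None when the member is not a regular file
        member = archive.extractfile(member_name)
        if member is None:
            raise FileNotFoundError(f"{member_name} is not a regular file in {tar_path}")
        with member:
            return pickle.load(member)


# Possible usage after estimator.fit(inputs), assuming the S3Downloader
# helper from the SageMaker Python SDK v2:
#
#   from sagemaker.s3 import S3Downloader
#   S3Downloader.download(estimator.model_data, "./downloaded")
#   local_model = load_pickle_from_tarball("./downloaded/model.tar.gz")
#   local_model.predict([[0], [1], [2], [3]])
```

As with any pickle file, only load archives that come from a source you trust.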

# SageMaker Endpoint

We can now create a 'predictor' by deploying the estimator to a SageMaker endpoint. Then we can use it to make new predictions.

(Make sure that an endpoint with the 'endpoint_name' used below does not already exist, otherwise the deployment will fail.)
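
One way to check the endpoint name up front is to ask the SageMaker API which endpoints exist. The sketch below uses the boto3 `list_endpoints` call; `endpoint_exists` is a hypothetical helper name, and it only inspects the first page of results, which should be enough for a specific name like the one used here.

```python
def endpoint_exists(sagemaker_client, endpoint_name):
    """Return True if an endpoint with exactly this name is listed."""
    # NameContains is a substring filter, so compare names exactly afterwards.
    # Only the first page (up to 100 results) is checked in this sketch.
    response = sagemaker_client.list_endpoints(
        NameContains=endpoint_name, MaxResults=100
    )
    return any(
        endpoint["EndpointName"] == endpoint_name
        for endpoint in response.get("Endpoints", [])
    )


# Possible usage in the notebook:
#
#   import boto3
#   sm_client = boto3.client("sagemaker")
#   if endpoint_exists(sm_client, "linearregression-endpoint"):
#       print("Endpoint already exists; delete it or choose another name.")
```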

In [None]:
sklearn_predictor = estimator.deploy(initial_instance_count=1,
                                     instance_type='ml.m5.large',
                                     endpoint_name='linearregression-endpoint')

In [None]:
sklearn_predictor.predict([[0],[1],[2],[3]])

## Clean up

Running this cell will remove the endpoint and configuration:

In [None]:
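# Delete the endpoint and its endpoint configuration to stop incurring charges
sklearn_predictor.delete_endpoint(delete_endpoint_config=True)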