# Bring Your Own Model with SageMaker Script Mode

# Model Training on SageMaker Training

### Overview

This notebook is derived from the original, larger [SageMaker bring your own model with script mode notebook](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-script-mode).

This notebook will demonstrate how you can bring your own model by using custom training and inference scripts, similar to those you would use outside of SageMaker, with SageMaker's prebuilt container for PyTorch.

SageMaker Script Mode is flexible, so you can also include your own dependencies, such as a custom Python library, in your training and inference code.

### Prerequisites

You need:
1. An S3 bucket for storing your model training and validation data
2. A model definition in a separate file, "pytorch_model_def.py"
3. Custom model training and deployment code in a separate file, "train_deploy_pytorch_without_dependencies.py"
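In script mode, the training container passes hyperparameters to the entry-point script as command-line flags and exposes the model directory and input channels through environment variables (`SM_MODEL_DIR`, `SM_CHANNEL_TRAIN`, `SM_CHANNEL_TEST`). The following is a minimal sketch of how a training script might read them. It is not the contents of "train_deploy_pytorch_without_dependencies.py"; the function name `parse_args`, the default values, and the local fallback paths are assumptions made for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths for a training script."""
    parser = argparse.ArgumentParser()

    # Hyperparameters arrive as command-line flags named after the dict keys
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch_size", type=int, default=100)
    parser.add_argument("--learning_rate", type=float, default=0.01)

    # Paths come from environment variables set inside the training container;
    # the fallbacks are hypothetical local paths for running outside SageMaker
    parser.add_argument("--model_dir", default=os.environ.get("SM_MODEL_DIR", "./model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "./data/train"))
    parser.add_argument("--test", default=os.environ.get("SM_CHANNEL_TEST", "./data/test"))

    # parse_known_args ignores any extra flags the platform may add
    args, _unknown = parser.parse_known_args(argv)
    return args


# Example: flags matching the hyperparameters used later in this notebook
args = parse_args(["--epochs", "50", "--batch_size", "100", "--learning_rate", "0.01"])
print(args.epochs, args.batch_size, args.learning_rate)
```

The channel variable names follow the keys of the `inputs` dictionary passed to `fit`, so a channel called "train" appears as `SM_CHANNEL_TRAIN`.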

### Imports

In [13]:
import random
import boto3  # needed for the manual role and session setup below
import sagemaker
import os
import subprocess
import sys
from sagemaker.pytorch import PyTorch
from sagemaker.predictor import Predictor

Update the SageMaker Python SDK to the latest version:

In [2]:
%pip install -U sagemaker

Note: you may need to restart the kernel to use updated packages.


Make sure the updated SageMaker SDK version is the one in use before continuing.

### Parameters

In [3]:
random.seed(42)

# Useful SageMaker variables
bucketname = "ml-cert-prep"
try:
    # You're using a SageMaker notebook
    sess = sagemaker.Session()
    role = sagemaker.get_execution_role()
except ValueError:
    # You're using a notebook somewhere else
    print("Setting role and SageMaker session manually...")
    region = "us-east-1"

    # Configure the default boto3 session before creating any clients
    boto3.setup_default_session(region_name=region, profile_name="default")
    iam = boto3.client("iam")
    sagemaker_client = boto3.client("sagemaker")

    sagemaker_execution_role_name = (
        "AmazonSageMaker-ExecutionRole-20191005T132574"  # Change this to your role name
    )
    role = iam.get_role(RoleName=sagemaker_execution_role_name)["Role"]["Arn"]
    # The bucket variable is bucketname; the original referenced an undefined name
    sess = sagemaker.Session(sagemaker_client=sagemaker_client, default_bucket=bucketname)

# Data paths in S3
s3_prefix = "pytorch-multiclass"
numpy_train_s3_prefix = f"{s3_prefix}/data/train"
numpy_train_s3_uri = f"s3://{bucketname}/{numpy_train_s3_prefix}"
numpy_test_s3_prefix = f"{s3_prefix}/data/test"
numpy_test_s3_uri = f"s3://{bucketname}/{numpy_test_s3_prefix}"

# Endpoint names
pytorch_endpoint_name = "pytorch-endpoint"

### Model training on SageMaker Training

In [4]:
hyperparameters = {"epochs": 50, "batch_size": 100, "learning_rate": 0.01}

train_instance_type = "ml.g4dn.xlarge"
inputs = {"train": numpy_train_s3_uri, "test": numpy_test_s3_uri}

estimator_parameters = {
    "entry_point": "train_deploy_pytorch_without_dependencies.py",
    "source_dir": "pytorch_script",
    "instance_type": train_instance_type,
    "instance_count": 1,
    "hyperparameters": hyperparameters,
    "role": role,
    "base_job_name": "pytorch-model",
    "framework_version": "1.10",
    "py_version": "py38",
}

estimator = PyTorch(**estimator_parameters)
estimator.fit(inputs)

2022-03-04 06:07:05 Starting - Starting the training job...
2022-03-04 06:07:21 Starting - Preparing the instances for trainingProfilerReport-1646374024: InProgress
.........
2022-03-04 06:09:01 Downloading - Downloading input data...
2022-03-04 06:09:28 Training - Downloading the training image........................
2022-03-04 06:13:35 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-03-04 06:13:37,842 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2022-03-04 06:13:37,864 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2022-03-04 06:13:37,874 sagemaker_pytorch_container.training INFO     Invoking user training script.
2022-03-04 06:13:38,379 sagemaker-training-toolkit INFO     Installing dependencies from requirement

Attempting uninstall: ipython
Found existing installation: ipython 7.18.1
Uninstalling ipython-7.18.1:
Successfully uninstalled ipython-7.18.1
Attempting uninstall: matplotlib
Found existing installation: matplotlib 3.5.0
Uninstalling matplotlib-3.5.0:
Successfully uninstalled matplotlib-3.5.0
Attempting uninstall: seaborn
Found existing installation: seaborn 0.11.2
Uninstalling seaborn-0.11.2:
Successfully uninstalled seaborn-0.11.2
Successfully installed Send2Trash-1.8.0 argon2-cffi-21.3.0 argon2-cffi-bindings-21.2.0 asttokens-2.0.5 bleach-4.1.0 celluloid-0.2.0 d2l-0.16.0 debugpy-1.5.1 defusedxml-0.7.1 entrypoints-0.4 executing-0.8.3 ipykernel-6.9.1 ipython-8.1.1 ipywidgets-7.6.5 jupyter-1.0.0 jupyter-client-7.1.2 jupyter-console-6.4.0 jupyter-core-4.9.2 jupyterlab-pygments-0.1.2 jupyterlab-widgets-1.0.2 matplotlib-3.3.4 matplotlib-inline-0.1.3 mistune-0.8.4 nbclient-0.5.1

epoch: 6 -> loss: 0.33282962441444397
INFO:__main__:epoch: 6 -> loss: 0.33282962441444397
epoch: 7 -> loss: 0.2644403278827667
INFO:__main__:epoch: 7 -> loss: 0.2644403278827667
epoch: 8 -> loss: 0.1977018415927887
INFO:__main__:epoch: 8 -> loss: 0.1977018415927887
epoch: 9 -> loss: 0.09165088087320328
INFO:__main__:epoch: 9 -> loss: 0.09165088087320328
epoch: 10 -> loss: 0.1701180785894394
INFO:__main__:epoch: 10 -> loss: 0.1701180785894394
epoch: 11 -> loss: 0.07433140277862549
INFO:__main__:epoch: 11 -> loss: 0.07433140277862549
epoch: 12 -> loss: 0.07176687568426132
INFO:__main__:epoch: 12 -> loss: 0.07176687568426132
epoch: 13 -> loss: 0.2582663893699646
INFO:__main__:epoch: 13 -> loss: 0.2582663893699646
epoch: 14 -> loss: 0.1702108085155487
INFO:__main__:epoch: 14 -> loss: 0.1702108085155487
epoch: 15 -> loss: 0.08
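
The per-epoch loss lines in the training log can be turned into a tracked metric by passing `metric_definitions` (a list of `Name`/`Regex` pairs) to the estimator. The sketch below is one possible definition, checked locally against log lines copied from the output above. The metric name `train:loss` is an arbitrary choice, and the regex assumes the log format stays exactly as shown.

```python
import re

# Hypothetical metric definition; the name "train:loss" is illustrative
metric_definitions = [
    {"Name": "train:loss", "Regex": r"epoch: \d+ -> loss: ([0-9.eE+-]+)"},
]

# Lines copied from the training log above
sample_log_lines = [
    "epoch: 6 -> loss: 0.33282962441444397",
    "INFO:__main__:epoch: 6 -> loss: 0.33282962441444397",
    "epoch: 7 -> loss: 0.2644403278827667",
]

# Check locally which lines the regex captures before using it in a job
pattern = re.compile(metric_definitions[0]["Regex"])
losses = []
for line in sample_log_lines:
    match = pattern.search(line)
    if match:
        losses.append(float(match.group(1)))

print(losses)
```

Because the script both prints and logs each epoch, the pattern matches every loss value twice. If that matters, the script could emit the line only once, or the regex could be tightened. To use the definition, `"metric_definitions": metric_definitions` could be added to `estimator_parameters`.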

### Deploy the model on a SageMaker endpoint

In [16]:
pytorch_predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.4xlarge",
    endpoint_name=pytorch_endpoint_name,
)

-------!

Then we can use the endpoint to make predictions.

In [None]:
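from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

pytorch_predictor.serializer = JSONSerializer()
pytorch_predictor.deserializer = JSONDeserializer()

# x_test is assumed to be a pandas DataFrame of test features; it is not defined in this notebook
pytorch_predictor.predict(x_test.values[0])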
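
With `JSONSerializer` and `JSONDeserializer`, the endpoint receives and returns `application/json`, so the inference script has to decode the request and encode the response in that format. The sketch below shows how the `input_fn` and `output_fn` hooks of the PyTorch serving container might handle this. Whether the script used in this notebook defines these hooks is not shown here, and the sample feature values are made up. A real handler would usually also convert the decoded list to a tensor before prediction.

```python
import json


def input_fn(request_body, request_content_type):
    """Decode a JSON request body into a (possibly nested) list of numbers."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode("utf-8")
    return json.loads(request_body)


def output_fn(prediction, accept):
    """Encode a prediction (a plain list, or anything with .tolist()) as JSON."""
    if accept != "application/json":
        raise ValueError(f"Unsupported accept type: {accept}")
    if hasattr(prediction, "tolist"):
        prediction = prediction.tolist()
    return json.dumps(prediction)


# Local round trip with made-up feature values
features = input_fn(b"[0.1, 0.2, 0.3]", "application/json")
print(output_fn(features, "application/json"))
```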

### Cleanup

In [17]:
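# Re-attach to the endpoint by name so cleanup also works after a kernel restart
pytorch_predictor = Predictor(
    endpoint_name=pytorch_endpoint_name,
    sagemaker_session=sess,
)

# (endpoint name, predictor) pairs to clean up
resources = [
    (pytorch_endpoint_name, pytorch_predictor),
]

for endpoint_name, predictor in resources:
    existing_endpoints = sess.sagemaker_client.list_endpoints(
        NameContains=endpoint_name, MaxResults=30
    )["Endpoints"]
    # Delete the endpoint and its config only if a matching endpoint still exists
    if existing_endpoints:
        predictor.delete_endpoint(delete_endpoint_config=True)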
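
One caveat with the loop above: `NameContains` is a substring filter, so an endpoint named, say, "pytorch-endpoint-v2" would also count as a match. A possible variant that requires an exact name match is sketched below using the low-level client calls `list_endpoints`, `describe_endpoint`, `delete_endpoint`, and `delete_endpoint_config`. The helper name `delete_endpoint_if_exists`, the `FakeSageMakerClient` class, and the config names are hypothetical and exist only so the logic can be run without AWS access.

```python
def delete_endpoint_if_exists(sm_client, endpoint_name):
    """Delete the endpoint and its config if an endpoint with exactly this name exists."""
    response = sm_client.list_endpoints(NameContains=endpoint_name, MaxResults=30)
    names = [endpoint["EndpointName"] for endpoint in response["Endpoints"]]
    if endpoint_name not in names:
        return False  # Only similarly named endpoints (or none) were found

    # Look up the config name before the endpoint is deleted
    config_name = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    return True


class FakeSageMakerClient:
    """Hypothetical in-memory stand-in for a SageMaker client, for a local dry run."""

    def __init__(self, endpoints):
        self.endpoints = dict(endpoints)  # endpoint name -> endpoint config name

    def list_endpoints(self, NameContains, MaxResults):
        matches = [name for name in self.endpoints if NameContains in name]
        return {"Endpoints": [{"EndpointName": name} for name in matches[:MaxResults]]}

    def describe_endpoint(self, EndpointName):
        return {"EndpointConfigName": self.endpoints[EndpointName]}

    def delete_endpoint(self, EndpointName):
        del self.endpoints[EndpointName]

    def delete_endpoint_config(self, EndpointConfigName):
        pass  # Nothing to track in the fake


fake = FakeSageMakerClient({"pytorch-endpoint": "cfg-a", "pytorch-endpoint-v2": "cfg-b"})
first_call = delete_endpoint_if_exists(fake, "pytorch-endpoint")   # exact match exists
second_call = delete_endpoint_if_exists(fake, "pytorch-endpoint")  # only "-v2" remains
print(first_call, second_call)
```

In the notebook, the same helper could be called as `delete_endpoint_if_exists(sess.sagemaker_client, pytorch_endpoint_name)`.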