# SageMaker Studio Exploration

The code below was used in an exercise to explore SageMaker, in particular script mode. The content is taken from:
https://github.com/learn-mikegchambers-com/aws-mls-c01/blob/master/8-SageMaker/SageMaker-Script-Mode/SageMaker-Script-Mode.ipynb

Note: The code is structured so that each chunk can be run independently.

## Create a dataset and save

In [1]:
from sklearn import datasets
import pickle

In [2]:
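# 100 samples with a single feature; noise=5 adds Gaussian noise to the targets
X, y = datasets.make_regression(n_samples=100, n_features=1, noise=5, bias=0)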

In [3]:
with open('./train.pickle', 'wb') as f:
    pickle.dump([X, y], f)

## Create a model from the dataset

In [5]:
from sklearn.linear_model import LinearRegression
import pickle

In [6]:
with open('./train.pickle', 'rb') as f:
    XX, yy = pickle.load(f)
model = LinearRegression()
model.fit(XX,yy)

## Make a test prediction

In [7]:
model.predict([[0],[1],[2],[3]])

array([ 0.526403  ,  9.47321826, 18.42003352, 27.36684877])
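The predictions of a fitted linear regression are just `intercept_ + coef_ * x`, which is why consecutive outputs above differ by a constant step (about 8.9468). The following is a minimal sketch that checks this on a freshly generated dataset; `random_state=0` is an assumption added here for repeatability and is not part of the original cells, so the numbers will differ from the output shown above.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression

# random_state is an added assumption so that reruns give the same data
X, y = datasets.make_regression(
    n_samples=100, n_features=1, noise=5, bias=0, random_state=0
)
model = LinearRegression().fit(X, y)

X_new = np.array([[0], [1], [2], [3]])
predicted = model.predict(X_new)

# Recompute the predictions by hand from the fitted parameters
manual = model.intercept_ + X_new @ model.coef_

# The step between consecutive predictions equals the fitted slope
steps = np.diff(predicted)

print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("manual matches predict:", np.allclose(manual, predicted))
```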

## Save model to a file

In [8]:
with open('./model.pickle', 'wb') as f:
    pickle.dump(model, f)

## Later load the model from a file

In [9]:
from sklearn.linear_model import LinearRegression
import pickle

In [10]:
with open('./model.pickle', 'rb') as f:
    loaded_model = pickle.load(f)

In [11]:
loaded_model.predict([[0],[1],[2],[3]])

array([ 0.526403  ,  9.47321826, 18.42003352, 27.36684877])

# SageMaker Training

In [14]:
import sagemaker
from sagemaker.sklearn.estimator import SKLearn  # A specific scikit-learn version can be chosen via framework_version
import boto3
import os

Create some variables that will be used through this process:

In [16]:
# Setting up sagemaker session

role = sagemaker.get_execution_role()
sess = sagemaker.Session()
bucket = sess.default_bucket()

# Derive every S3 location from the session's default bucket so the notebook stays portable across accounts
s3_prefix = "script-mode-workflow"
pickle_s3_prefix = f"{s3_prefix}/pickle"
pickle_s3_uri = f"s3://{bucket}/{s3_prefix}/pickle"
pickle_train_s3_uri = f"{pickle_s3_uri}/train"

train_dir = os.getcwd()

Upload the training data to S3, so it's available for SageMaker training:

In [17]:
# Upload the pickle to S3, the channel SageMaker training reads its data from

s3_resource_bucket = boto3.Session().resource("s3").Bucket(bucket)
# S3 keys always use "/", so build the key with an f-string rather than os.path.join
s3_resource_bucket.Object(f"{pickle_s3_prefix}/train.pickle").upload_file(
    os.path.join(train_dir, "train.pickle")
)
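One detail that is easy to miss: the file is uploaded under the key `.../pickle/train.pickle`, while the training channel URI ends in `.../pickle/train`. This presumably works because an S3 prefix input selects every object whose key starts with the given prefix, rather than treating it as a folder. The sketch below only builds the strings to make that relationship visible; the bucket name `example-bucket` is a hypothetical placeholder.

```python
# Hypothetical bucket name; in the notebook it comes from sess.default_bucket()
bucket = "example-bucket"

s3_prefix = "script-mode-workflow"
pickle_s3_prefix = f"{s3_prefix}/pickle"
pickle_s3_uri = f"s3://{bucket}/{s3_prefix}/pickle"
pickle_train_s3_uri = f"{pickle_s3_uri}/train"

# Key and full URI of the object written by the upload cell
uploaded_key = f"{pickle_s3_prefix}/train.pickle"
uploaded_uri = f"s3://{bucket}/{uploaded_key}"

print(pickle_train_s3_uri)
print(uploaded_uri)

# The channel URI is a string prefix of the uploaded object's URI
is_prefix = uploaded_uri.startswith(pickle_train_s3_uri)
print(is_prefix)
```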

Create some hyperparameters:

In [18]:
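# This is not required, as these values are the defaults.
# Note: `normalize` is accepted by the scikit-learn 0.23 container used below; it was removed in scikit-learn 1.2.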

hyperparameters = {
    "copy_X": True,
    "fit_intercept": True,
    "normalize": False,
}

More configuration for the model:

In [25]:
train_instance_type = "ml.m5.large"

inputs = {
    "train": pickle_train_s3_uri
}

The SageMaker Estimator object is a high level interface for SageMaker training. This object represents the algorithm, the data, and other configuration.

https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html

In [26]:
estimator_parameters = {
    "entry_point": "script.py",
    "source_dir": "script",
    "framework_version": "0.23-1",  # scikit-learn version of the training container
    "py_version": "py3",
    "instance_type": train_instance_type,
    "instance_count": 1,
    "hyperparameters": hyperparameters,
    "role": role,
    "base_job_name": "linearregression-model",
}

estimator = SKLearn(**estimator_parameters)
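The estimator points at `script/script.py`, which is not reproduced in this notebook. The following is a hypothetical sketch of what such an entry point could look like, not the original script. It assumes that hyperparameters arrive as `--name value` command-line arguments, that the container sets the `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` environment variables, that the training file appears as `train.pickle` inside the channel directory, and that the serving container loads the model through a user-defined `model_fn`. The file name `model.pickle` and the helper names are illustrative choices.

```python
# script/script.py: hypothetical sketch of a script-mode entry point
import argparse
import os
import pickle


def str_to_bool(value):
    """Hyperparameters reach the script as strings such as "True"."""
    return str(value).strip().lower() in ("true", "1", "yes")


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters from the estimator arrive as --name value pairs
    parser.add_argument("--copy_X", type=str_to_bool, default=True)
    parser.add_argument("--fit_intercept", type=str_to_bool, default=True)
    parser.add_argument("--normalize", type=str_to_bool, default=False)
    # Locations provided through environment variables in the container
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "."))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "."))
    args, _unknown = parser.parse_known_args(argv)
    return args


def train(args):
    # Imported lazily so argument parsing can be tried without scikit-learn
    from sklearn.linear_model import LinearRegression

    # Assumption: the channel directory contains train.pickle
    with open(os.path.join(args.train, "train.pickle"), "rb") as f:
        X, y = pickle.load(f)

    # normalize is parsed but not forwarded: newer scikit-learn rejects it
    model = LinearRegression(copy_X=args.copy_X, fit_intercept=args.fit_intercept)
    model.fit(X, y)

    with open(os.path.join(args.model_dir, "model.pickle"), "wb") as f:
        pickle.dump(model, f)
    return model


def model_fn(model_dir):
    """Load the model saved by train(); used when the model is served."""
    with open(os.path.join(model_dir, "model.pickle"), "rb") as f:
        return pickle.load(f)


# Only train when the training channel variable is present, i.e. in a container
if __name__ == "__main__" and "SM_CHANNEL_TRAIN" in os.environ:
    train(parse_args())
```

Keeping the argument parsing in its own function makes it possible to try the script locally, for example `parse_args(["--fit_intercept", "False"])`, before paying for a training instance.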

When we call 'fit', SageMaker spins up managed containers, transfers the code and data to them, and then starts the training. All of this happens away from the notebook server. We can follow the training job in the SageMaker console and read its logs in CloudWatch Logs.

In [27]:
estimator.fit(inputs)

INFO:sagemaker:Creating training-job with name: linearregression-model-2024-12-01-09-50-40-487


2024-12-01 09:50:42 Starting - Starting the training job...
2024-12-01 09:50:57 Starting - Preparing the instances for training...
2024-12-01 09:51:44 Downloading - Downloading the training image......
2024-12-01 09:52:35 Training - Training image download completed. Training in progress.
2024-12-01 09:52:35 Uploading - Uploading generated training model
2024-12-01 09:52:28,649 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2024-12-01 09:52:28,653 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2024-12-01 09:52:28,701 sagemaker_sklearn_container.training INFO     Invoking user training script.
2024-12-01 09:52:28,866 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2024-12-01 09:52:28,879 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2024-12-01 09:52:28,892 sagemaker-training-toolkit INFO     No
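Besides the console and CloudWatch, the state of a training job can also be read programmatically. The sketch below wraps the boto3 SageMaker `describe_training_job` call in a small helper; `training_job_summary` and `FakeSageMakerClient` are hypothetical names, and the fake client (with made-up values) exists only so the helper can be exercised without AWS credentials.

```python
def training_job_summary(sm_client, job_name):
    """Return the status and model artifact location of one training job."""
    desc = sm_client.describe_training_job(TrainingJobName=job_name)
    return {
        "status": desc["TrainingJobStatus"],
        # ModelArtifacts is only present once the job has produced output
        "model_artifacts": desc.get("ModelArtifacts", {}).get("S3ModelArtifacts"),
    }


class FakeSageMakerClient:
    """Stand-in for boto3.client("sagemaker"); returns made-up values."""

    def describe_training_job(self, TrainingJobName):
        return {
            "TrainingJobName": TrainingJobName,
            "TrainingJobStatus": "Completed",
            "ModelArtifacts": {
                "S3ModelArtifacts": "s3://example-bucket/output/model.tar.gz"
            },
        }


summary = training_job_summary(FakeSageMakerClient(), "linearregression-model-example")
print(summary)
```

In the notebook, the same helper could be called as `training_job_summary(boto3.client("sagemaker"), estimator.latest_training_job.name)`.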

## SageMaker Endpoint
We can now create a 'predictor' by deploying the estimator. Then we can use it to make new predictions.

(Make sure that no endpoint with the chosen 'endpoint_name' already exists.)

In [28]:
sklearn_predictor = estimator.deploy(initial_instance_count=1,
                                     instance_type='ml.m5.large',
                                     endpoint_name='linearregression-endpoint')

INFO:sagemaker:Creating model with name: linearregression-model-2024-12-01-09-57-15-521
INFO:sagemaker:Creating endpoint-config with name linearregression-endpoint
INFO:sagemaker:Creating endpoint with name linearregression-endpoint


------!

In [29]:
sklearn_predictor.predict([[0],[1],[2],[3]])

array([ 0.526403  ,  9.47321826, 18.42003352, 27.36684877])
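The predictor object is a convenience wrapper; an application outside the notebook would more likely call the endpoint through the boto3 `sagemaker-runtime` client. The sketch below shows that call shape. It assumes the scikit-learn container accepts and returns `application/json`, which should be verified against the deployed container. `predict_via_runtime` and `FakeRuntimeClient` are hypothetical names, and the fake client merely doubles each input so the helper can run without a live endpoint.

```python
import io
import json


def predict_via_runtime(runtime_client, endpoint_name, rows):
    """Send rows as JSON to an endpoint and decode the JSON response."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumption: container accepts JSON
        Accept="application/json",
        Body=json.dumps(rows),
    )
    # The response body is a stream, so it has to be read before decoding
    return json.loads(response["Body"].read())


class FakeRuntimeClient:
    """Stand-in for boto3.client("sagemaker-runtime"); not a real model."""

    def invoke_endpoint(self, EndpointName, ContentType, Accept, Body):
        rows = json.loads(Body)
        fake_predictions = [float(row[0]) * 2.0 for row in rows]
        return {"Body": io.BytesIO(json.dumps(fake_predictions).encode("utf-8"))}


result = predict_via_runtime(
    FakeRuntimeClient(), "linearregression-endpoint", [[0], [1], [2], [3]]
)
print(result)
```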

## Clean up
Running this cell will remove the endpoint and configuration:

In [30]:
sklearn_predictor.delete_endpoint(delete_endpoint_config=True)

INFO:sagemaker:Deleting endpoint configuration with name: linearregression-endpoint
INFO:sagemaker:Deleting endpoint with name: linearregression-endpoint
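Since a running endpoint keeps incurring cost, it can be worth confirming that the deletion went through; the same check also helps before deploying, to see whether an endpoint name is already taken. The sketch below uses the boto3 SageMaker `list_endpoints` call with its `NameContains` filter. `endpoint_names` and `FakeSageMakerClient` are hypothetical names, the fake client only simulates the response shape, and pagination is ignored for brevity.

```python
def endpoint_names(sm_client, name_contains):
    """Return the names of endpoints whose name contains the given text."""
    response = sm_client.list_endpoints(NameContains=name_contains)
    return [endpoint["EndpointName"] for endpoint in response["Endpoints"]]


class FakeSageMakerClient:
    """Stand-in for boto3.client("sagemaker") holding a fixed endpoint list."""

    def __init__(self, names):
        self._names = names

    def list_endpoints(self, NameContains):
        matching = [n for n in self._names if NameContains in n]
        return {"Endpoints": [{"EndpointName": n} for n in matching]}


before_cleanup = endpoint_names(
    FakeSageMakerClient(["linearregression-endpoint", "other-endpoint"]),
    "linearregression",
)
after_cleanup = endpoint_names(FakeSageMakerClient(["other-endpoint"]), "linearregression")
print(before_cleanup, after_cleanup)
```

Against a real account, the call would be `endpoint_names(boto3.client("sagemaker"), "linearregression-endpoint")`, and an empty list after cleanup indicates that no matching endpoint is left.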
