# 06 - An ML script with custom dependencies

We have a training script, but it depends on other Python modules, such as a helper module defined in a local directory on your laptop.
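To make the setup concrete, here is a minimal sketch of the kind of layout this notebook assumes: an entry script sitting next to a helper module that it imports. The file names `helpers.py` and `train_entry.py` and the function `scale` are hypothetical, invented for illustration. The real `ml_script_with_dependancies.py` is not shown in this notebook.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical helper module living next to the entry script.
HELPER_SOURCE = "def scale(values, factor):\n    return [v * factor for v in values]\n"

# Hypothetical entry script; it imports the sibling module by name.
ENTRY_SOURCE = "from helpers import scale\nprint(scale([1, 2, 3], 2))\n"


def run_entry_with_helper():
    """Write both files into one directory and run the entry script from there."""
    with tempfile.TemporaryDirectory() as tmp:
        source_dir = Path(tmp)
        (source_dir / "helpers.py").write_text(HELPER_SOURCE)
        (source_dir / "train_entry.py").write_text(ENTRY_SOURCE)
        # Python puts the script's own directory on sys.path, so the
        # sibling import resolves as long as both files travel together.
        result = subprocess.run(
            [sys.executable, str(source_dir / "train_entry.py")],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


entry_output = run_entry_with_helper()
print(entry_output)
```

Shipping only the entry script would break the `from helpers import scale` line, which is why the helper has to reach the training container as well.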

In [1]:
import os
import boto3
from sagemaker import Session
from sagemaker.sklearn.estimator import SKLearn

## AWS Session

In [2]:
region = os.environ.get("DEMO_AWS_REGION")
boto3_session = boto3.Session(region_name=region, profile_name=os.environ.get("DEMO_AWS_PROFILE_NAME"))

sagemaker_session = Session(boto_session=boto3_session)

account = os.environ.get("DEMO_AWS_ACCOUNT")  # sandbox-admin account
role = f"arn:aws:iam::{account}:role/service-role/AmazonSageMaker-ExecutionRole-20171129T145583"
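Because `os.environ.get` returns `None` for an unset variable, a missing `DEMO_AWS_ACCOUNT` would silently produce a role ARN containing the text `None`. The sketch below shows one way to fail early instead. The helper `require_env` is hypothetical, and the values in `demo_env` are placeholders rather than real settings.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable or raise a clear error."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# An explicit mapping keeps the sketch runnable anywhere;
# the account number is a placeholder, not a real account.
demo_env = {"DEMO_AWS_REGION": "eu-west-1", "DEMO_AWS_ACCOUNT": "123456789012"}
region_checked = require_env("DEMO_AWS_REGION", demo_env)
account_checked = require_env("DEMO_AWS_ACCOUNT", demo_env)
print(region_checked, account_checked)
```

In the notebook itself you would call `require_env("DEMO_AWS_ACCOUNT")` without the second argument so that it reads from the real environment. The region and profile name can stay optional, since `boto3.Session` falls back to its default configuration when they are `None`.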

## Upload data to S3

A SageMaker job needs permission to access the data in S3, and your user or role needs permission to run a SageMaker job. The [SageMaker documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) describes the required permissions in detail. In a SageMaker notebook, you can use the notebook role defined above.

In [3]:
# Upload training data from local machine to S3
local_data_location = "../data"

data_location = sagemaker_session.upload_data(
    path=local_data_location, key_prefix="sagemaker_demo_data"
)

In [4]:
data_location

's3://sagemaker-eu-west-1-604842001064/sagemaker_demo_data'
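The returned URI can be read as the session's default bucket followed by the `key_prefix`. The sketch below reproduces that composition, assuming the default bucket follows the `sagemaker-{region}-{account_id}` naming pattern seen in the output above and that a directory (not a single file) was uploaded. The function name `expected_upload_uri` is hypothetical.

```python
def expected_upload_uri(region, account_id, key_prefix):
    """Compose the URI for a directory upload to the session's default bucket."""
    default_bucket = f"sagemaker-{region}-{account_id}"
    return f"s3://{default_bucket}/{key_prefix}"


uri = expected_upload_uri("eu-west-1", "604842001064", "sagemaker_demo_data")
print(uri)
```

Passing an explicit `bucket` argument to `upload_data` would replace the default bucket part of the URI.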

## Run Script using a source_dir

In [5]:
sklearn = SKLearn(
    entry_point='ml_script_with_dependancies.py',
    train_instance_type="ml.m5.large",
    role=role,
    sagemaker_session=sagemaker_session,
    hyperparameters={"penalty": "l1", "C": 0.01},
    source_dir="."
)
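On the script side, the container passes each entry of `hyperparameters` to the entry point as a command-line flag, and exposes locations such as the model directory and the `train` channel through the `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` environment variables. The actual `ml_script_with_dependancies.py` is not shown here, so the following is only a sketch of how such a script might read its inputs. The function name, the default values, and the local fallback paths are assumptions.

```python
import argparse
import os


def parse_training_args(argv=None):
    """Parse hyperparameters and data locations for a training script."""
    parser = argparse.ArgumentParser()
    # Hyperparameters from the estimator arrive as command-line flags.
    # The defaults here are illustrative assumptions.
    parser.add_argument("--penalty", type=str, default="l2")
    parser.add_argument("--C", type=float, default=1.0)
    # Locations the container exposes through environment variables;
    # the fallbacks only matter when running outside SageMaker.
    parser.add_argument(
        "--model-dir", type=str, default=os.environ.get("SM_MODEL_DIR", "model")
    )
    parser.add_argument(
        "--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN", "data")
    )
    # Ignore any extra flags instead of failing on them.
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_training_args(["--penalty", "l1", "--C", "0.01"])
print(args.penalty, args.C)
```

Keeping the parsing in a function that accepts `argv` makes the script easy to exercise locally before paying for a training instance.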

In [6]:
sklearn.fit(
    {"train": data_location}
)

2020-02-20 10:20:28 Starting - Starting the training job...
2020-02-20 10:20:30 Starting - Launching requested ML instances......
2020-02-20 10:21:29 Starting - Preparing the instances for training...
2020-02-20 10:22:09 Downloading - Downloading input data...
2020-02-20 10:22:56 Training - Training image download completed. Training in progress.
2020-02-20 10:22:56 Uploading - Uploading generated training model
2020-02-20 10:22:50,706 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2020-02-20 10:22:50,708 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2020-02-20 10:22:50,718 sagemaker_sklearn_container.training INFO     Invoking user training script.
2020-02-20 10:22:50,986 sagemaker-containers INFO     Module ml_script_with_dependancies does not provide a setup.py.
Generating setup.py
2020-02-20 10:22:50,986 sagemaker-containers INFO     Generating setup.cfg


2020-02-20 10:23:02 Completed - Training job completed
Training seconds: 53
Billable seconds: 53


## Run Script using a custom Docker image

In [8]:
sklearn = SKLearn(
    entry_point='ml_script_with_dependancies.py',
    train_instance_type="ml.m5.large",
    role=role,
    sagemaker_session=sagemaker_session,
    hyperparameters={"penalty": "l1", "C": 0.01},
    image_name=f"{account}.dkr.ecr.{region}.amazonaws.com/sagemaker-sklearn-expanded:latest",
)
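The image reference above follows the usual ECR pattern of account, region, repository, and tag. As a small sketch, the helper below builds the same string and refuses to produce a URI when a part is missing, which would otherwise happen silently if `account` or `region` were `None`. The function name `ecr_image_uri` is hypothetical, and the account number is a placeholder.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build an ECR image URI, rejecting empty or missing parts."""
    parts = {
        "account_id": account_id,
        "region": region,
        "repository": repository,
        "tag": tag,
    }
    missing = [name for name, value in parts.items() if not value]
    if missing:
        raise ValueError(f"Missing image URI parts: {', '.join(missing)}")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


image_uri = ecr_image_uri("123456789012", "eu-west-1", "sagemaker-sklearn-expanded")
print(image_uri)
```

Pinning a specific tag rather than `latest` makes a training run easier to reproduce later. Note also that version 2 of the SageMaker Python SDK renamed `image_name` to `image_uri` and `train_instance_type` to `instance_type`; the cells in this notebook use the version 1 names.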

In [9]:
sklearn.fit(
    {"train": data_location}
)

2020-02-20 10:24:52 Starting - Starting the training job...
2020-02-20 10:24:53 Starting - Launching requested ML instances......
2020-02-20 10:25:53 Starting - Preparing the instances for training...
2020-02-20 10:26:37 Downloading - Downloading input data
2020-02-20 10:26:37 Training - Downloading the training image......
2020-02-20 10:27:51 Uploading - Uploading generated training model
2020-02-20 10:27:51 Completed - Training job completed
2020-02-20 10:27:39,005 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2020-02-20 10:27:39,008 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2020-02-20 10:27:39,018 sagemaker_sklearn_container.training INFO     Invoking user training script.
2020-02-20 10:27:39,019 sagemaker-containers INFO     Module ml_script_with_dependancies does not provide a setup.py.
Generating setup.py
2020-02-20 10:27:39,019 sagemaker-containers INFO

Training seconds: 81
Billable seconds: 81
