# Hugging Face SageMaker example using the `Trainer` class

Each folder whose name starts with `0X_..` contains a specific SageMaker example. Each example consists of a Jupyter notebook, `sagemaker-example.ipynb`, and a `src/` folder. The notebook shows how to train a `transformers` model with `datasets` on AWS SageMaker. The `src/` folder contains `train.py`, our training script, and a `requirements.txt` listing additional dependencies.


## Initializing a SageMaker Session with a local AWS profile

```python
local_profile_name='hf-sm'
```
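`boto3.session.Session(profile_name=...)` raises `botocore.exceptions.ProfileNotFound` if the profile does not exist. boto3 can list configured profiles through `boto3.session.Session().available_profiles`. The standard-library sketch below reads the shared files directly instead. It assumes their default locations and ignores the `AWS_SHARED_CREDENTIALS_FILE` and `AWS_CONFIG_FILE` overrides. The helper name `list_aws_profiles` is hypothetical.

```python
import configparser
import os


def list_aws_profiles(credentials_path="~/.aws/credentials", config_path="~/.aws/config"):
    """Return the sorted profile names found in the shared AWS credentials/config files."""
    profiles = set()
    for path, is_config in ((credentials_path, False), (config_path, True)):
        parser = configparser.ConfigParser()
        # read() silently skips files that do not exist
        parser.read(os.path.expanduser(path))
        for section in parser.sections():
            if not is_config:
                # credentials file: every section name is a profile name
                profiles.add(section)
            elif section == "default":
                profiles.add("default")
            elif section.startswith("profile "):
                # config file: named profiles are written as [profile <name>]
                profiles.add(section[len("profile "):].strip())
            # other config sections (e.g. [sso-session ...]) are not profiles
    return sorted(profiles)
```

For example, `assert local_profile_name in list_aws_profiles()` fails early with a clear message instead of a `ProfileNotFound` traceback later.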

```python
import sagemaker
import boto3

# create a boto3 session from the local AWS profile defined above
bt3 = boto3.session.Session(profile_name=local_profile_name)

# wrap it in a SageMaker session so that SageMaker calls use this profile
sess = sagemaker.Session(boto_session=bt3)

# since we are using the SageMaker SDK locally, we cannot use `get_execution_role`
# role = sagemaker.get_execution_role()
```

```
Couldn't call 'get_role' to get Role ARN from role name philipp to get Role path.
```


Outside a SageMaker notebook instance, `get_execution_role()` raises an exception because it cannot determine which IAM role SageMaker should use.

To solve this, pass the name of an IAM role explicitly instead of calling `get_execution_role()`.

```python
role_name = "SageMakerRole"
```

_WARNING: This policy gives full S3 access to the container that is running in SageMaker. You can change this policy to a more restrictive one, or create your own policy._

```bash
%%bash -s "$local_profile_name" "$role_name"
# This script creates a role named SageMakerRole
# that can be assumed by SageMaker and has full access to S3.

PROFILE=$1
ROLE_NAME=$2

# WARNING: this policy gives full S3 access to the container that
# is running in SageMaker. You can change this policy to a more
# restrictive one, or create your own policy.
POLICY_S3=arn:aws:iam::aws:policy/AmazonS3FullAccess

# Writes a trust policy that allows the SageMaker service
# to assume the role
cat <<EOF > /tmp/assume-role-policy-document.json
{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {
            "Service": "sagemaker.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
    }]
}
EOF

# Creates the role
aws iam create-role --profile "$PROFILE" --role-name "$ROLE_NAME" \
    --assume-role-policy-document file:///tmp/assume-role-policy-document.json

# Attaches the S3 full access policy to the role
aws iam attach-role-policy --profile "$PROFILE" --role-name "$ROLE_NAME" --policy-arn "$POLICY_S3"
```


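```
An error occurred (EntityAlreadyExists) when calling the CreateRole operation: Role with name SageMakerRole already exists.
```

This error only means that the role was created by an earlier run. The script still attaches the policy, and the next cell looks up the existing role's ARN.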
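You can also do the same setup from Python with boto3 in a way that tolerates re-runs. This is a hedged sketch that assumes `iam` is an IAM client such as `bt3.client('iam')`. The helper name `ensure_sagemaker_role` is hypothetical. It relies on the boto3 IAM client's `create_role`, `get_role` and `attach_role_policy` calls and on its modeled `EntityAlreadyExistsException`.

```python
import json

# Trust policy: lets the SageMaker service assume the role (same document as the bash cell)
SAGEMAKER_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}


def ensure_sagemaker_role(iam, role_name,
                          policy_arn="arn:aws:iam::aws:policy/AmazonS3FullAccess"):
    """Create `role_name` if it is missing, attach `policy_arn`, and return the role ARN."""
    try:
        arn = iam.create_role(
            RoleName=role_name,
            AssumeRolePolicyDocument=json.dumps(SAGEMAKER_TRUST_POLICY),
        )["Role"]["Arn"]
    except iam.exceptions.EntityAlreadyExistsException:
        # the role survives from a previous run: reuse it instead of failing
        arn = iam.get_role(RoleName=role_name)["Role"]["Arn"]
    # attach on every call so that a pre-existing role also ends up with the policy
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return arn
```

Usage: `role = ensure_sagemaker_role(bt3.client('iam'), role_name)`.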


```python
# look up the ARN of the role created above
iam = bt3.client('iam')
role = iam.get_role(RoleName=role_name)['Role']['Arn']
```
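`get_role` needs the `iam:GetRole` permission. If your profile lacks it, you can assemble the ARN by hand, because role ARNs follow the pattern `arn:aws:iam::<account-id>:role<path><role-name>`. The account ID is returned by `bt3.client('sts').get_caller_identity()['Account']`. A minimal sketch with a hypothetical helper, assuming the standard `aws` partition (China and GovCloud regions use different partitions):

```python
def iam_role_arn(account_id, role_name, path="/"):
    """Build an IAM role ARN in the standard `aws` partition."""
    # IAM paths always begin and end with a slash; "/" is the default path
    if not (path.startswith("/") and path.endswith("/")):
        raise ValueError("IAM paths must start and end with '/'")
    return f"arn:aws:iam::{account_id}:role{path}{role_name}"
```

For a role created without a path, as in the bash cell, the default `path="/"` applies.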

## Create a local estimator for testing

You run PyTorch training scripts on SageMaker by creating a `PyTorch` estimator. The training job starts when you call `fit` on it. The following code trains the custom script `train.py` and passes in two hyperparameters (`epochs` and `train-batch-size`). We do not pass any data to the SageMaker training job; instead, `train.py` downloads the dataset itself.

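SageMaker can also run the training in local mode, where the training container runs on your own machine with Docker. To use it, set `instance_type` to `'local'`. This is a quick way to test the script before launching a billed training job.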
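Local mode needs a working Docker installation on the machine running the notebook. For hosts with GPUs, the SDK also accepts `'local_gpu'`. The following is a small pre-flight check using only the standard library. The helper name is hypothetical, and treating an installed `nvidia-smi` as "this host has a usable GPU" is a heuristic assumption.

```python
import shutil


def local_instance_type():
    """Pick a SageMaker local-mode instance type for this machine.

    Returns None when no `docker` executable is on PATH, since local mode
    runs the training container through Docker.
    """
    if shutil.which("docker") is None:
        return None
    # Heuristic (assumption): nvidia-smi on PATH means a GPU is available
    return "local_gpu" if shutil.which("nvidia-smi") else "local"
```

You could then pass `instance_type=local_instance_type()` after checking that the result is not `None`.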


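```python
from sagemaker.local import LocalSession
from sagemaker.pytorch import PyTorch

# local mode runs the container on this machine; reuse the local AWS profile session
local_sess = LocalSession(boto_session=bt3)

pytorch_estimator = PyTorch(entry_point='train.py',
                            source_dir='src',
                            sagemaker_session=local_sess,
                            base_job_name='huggingface',
                            instance_type='local',
                            instance_count=1,
                            role=role,
                            framework_version='1.5.0',
                            py_version='py3',
                            hyperparameters={'epochs': 1,
                                             'train-batch-size': 32})
```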

```python
pytorch_estimator.fit()
```

## Create an Estimator

A managed training job is created the same way. The differences are that we pass the `sess` session created from the local profile and request an `ml.p3.2xlarge` GPU instance. This time we pass three hyperparameters (`epochs`, `batch-size` and `learning-rate`). As before, no data is passed to the training job, because `train.py` downloads it itself.
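`train.py` itself is not shown here. SageMaker hands each entry of `hyperparameters` to the script as a `--name value` command-line argument. Assuming the script reads them with `argparse`, a sketch could parse the values from both estimators in this notebook as shown below. The defaults and the `--model-dir` argument are illustrative assumptions. `SM_MODEL_DIR` is the environment variable SageMaker sets to the model output directory, normally `/opt/ml/model`.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the hyperparameters SageMaker passes to train.py as --name value pairs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=1)
    # accept both spellings used by the estimators in this notebook
    parser.add_argument("--batch-size", "--train-batch-size", dest="batch_size",
                        type=int, default=32)
    parser.add_argument("--learning-rate", type=float, default=5e-5)  # placeholder default
    # output directory for the trained model, taken from the container environment
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    # ignore unexpected arguments instead of exiting with an error
    args, _unknown = parser.parse_known_args(argv)
    return args
```

argparse turns `--batch-size` into `args.batch_size` and `--learning-rate` into `args.learning_rate`.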


```python
from sagemaker.pytorch import PyTorch

pytorch_estimator = PyTorch(entry_point='train.py',
                            source_dir='src',
                            sagemaker_session=sess,
                            base_job_name='huggingface',
                            instance_type='ml.p3.2xlarge',
                            instance_count=1,
                            role=role,
                            framework_version='1.5.0',
                            py_version='py3',
                            hyperparameters={'epochs': 20, 'batch-size': 64, 'learning-rate': 0.1})
```

```python
pytorch_estimator.fit()
```

```
2020-12-22 08:39:46 Starting - Starting the training job...
2020-12-22 08:39:48 Starting - Launching requested ML instancesProfilerReport-1608626385: InProgress
......
2020-12-22 08:40:56 Starting - Preparing the instances for training.........
2020-12-22 08:42:51 Downloading - Downloading input data
2020-12-22 08:42:51 Training - Downloading the training image.........
2020-12-22 08:44:25 Uploading - Uploading generated training model
2020-12-22 08:44:25 Completed - Training job completed
..Training seconds: 106
Billable seconds: 106
```
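The `Billable seconds` line is the time the job is charged for. SageMaker bills training per second at the instance's hourly rate, so a rough estimate for this single-instance job is seconds / 3600 × hourly price. This is a minimal sketch. The price is a parameter you would look up on the SageMaker pricing page for your region; it is not a value from this run.

```python
def training_cost_usd(billable_seconds, hourly_price_usd):
    """Estimate the cost of a single-instance training job billed per second."""
    return billable_seconds / 3600 * hourly_price_usd
```

For the run above, `training_cost_usd(106, price)` uses the `Billable seconds` value from the log.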
