# Hugging Face SageMaker example using the `Trainer` class

Each folder starting with `0X_..` contains a specific SageMaker example. Each example contains a Jupyter notebook, `sagemaker-example.ipynb`, and a `src/` folder. The notebook is used to train models with `transformers` and `datasets` on AWS SageMaker. The `src/` folder contains our training script, `train.py`, and a `requirements.txt` for additional dependencies.


## Initializing a SageMaker session with a local AWS profile

In [13]:
!pip install 'sagemaker[local]'


In [4]:
import sagemaker


sess = sagemaker.Session()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    import boto3
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='SageMakerRole')['Role']['Arn']

Couldn't call 'get_role' to get Role ARN from role name SageMakerRole to get Role path.
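The cell above relies on whatever credentials the default lookup finds. If you want to bind the session to a specific named profile from your local AWS configuration, something along the following lines may work. This is a minimal sketch, not the notebook's own method: the helper names `make_session` and `lookup_role_arn` and the profile name `"default"` are assumptions for illustration, while the role name `SageMakerRole` is taken from the cell above.

```python
def make_session(profile_name):
    """Build a SageMaker session on top of a named local AWS profile.

    The imports are deferred so the helper can be defined even where the
    AWS libraries are not installed yet.
    """
    import boto3
    import sagemaker

    boto_sess = boto3.Session(profile_name=profile_name)
    sm_sess = sagemaker.Session(boto_session=boto_sess)
    return sm_sess, boto_sess


def lookup_role_arn(boto_sess, role_name="SageMakerRole"):
    """Look up the ARN of an IAM role by name using the given boto3 session."""
    iam = boto_sess.client("iam")
    return iam.get_role(RoleName=role_name)["Role"]["Arn"]


# Hypothetical usage (needs valid credentials for the chosen profile):
# sess, boto_sess = make_session("default")
# role = lookup_role_arn(boto_sess)
```

Passing the boto3 session explicitly keeps the SageMaker session and the IAM lookup on the same profile, instead of mixing the profile with the default credential chain.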


## Create a local estimator for testing

You run PyTorch training scripts on SageMaker by creating PyTorch Estimators. SageMaker starts training your script when you call `fit` on a PyTorch Estimator. The following code sample shows how to train a custom PyTorch script, `train.py`, passing in two hyperparameters (`epochs` and `train-batch-size`). We do not pass any data into the SageMaker training job; instead, the data is downloaded inside `train.py`.

In SageMaker you can test your training in "local mode" by setting `instance_type` to `'local'`.


In [11]:
from sagemaker.pytorch import PyTorch

pytorch_estimator = PyTorch(entry_point='train.py',
                            source_dir='src',
                            base_job_name='huggingface',
                            instance_type='local',
                            instance_count=1,
                            role=role,
                            framework_version='1.6.0',
                            py_version='py3',
                            hyperparameters = {'epochs': 1,
                                               'train-batch-size': 32})
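SageMaker forwards each entry of the `hyperparameters` dict to the entry point as a command-line flag, for example `--epochs 1 --train-batch-size 32`. The actual `src/train.py` is not shown here, so the following is only a sketch of how such a script might read those values. The function name `parse_args`, the default values, and the `./model` fallback path are assumptions; `SM_MODEL_DIR` is the environment variable SageMaker sets inside the training container.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the hyperparameters SageMaker forwards as command-line flags."""
    parser = argparse.ArgumentParser()
    # Flag names mirror the keys of the estimator's `hyperparameters` dict.
    # The defaults are assumptions and only apply when a flag is not passed.
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--train-batch-size", type=int, default=32)
    # SM_MODEL_DIR is set inside the container; the fallback path is an
    # assumption for running the script outside of it.
    parser.add_argument(
        "--model-dir", type=str, default=os.environ.get("SM_MODEL_DIR", "./model")
    )
    # parse_known_args tolerates extra flags that the script does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the flags produced by the estimator above.
args = parse_args(["--epochs", "1", "--train-batch-size", "32"])
print(args.epochs, args.train_batch_size)
```

Note that argparse exposes `--train-batch-size` as `args.train_batch_size`, with the dashes converted to underscores.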

In [14]:
pytorch_estimator.fit()

Using the short-lived AWS credentials found in session. They might expire while running.


FileNotFoundError: [Errno 2] No such file or directory: 'docker': 'docker'
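The `FileNotFoundError` above means no `docker` executable was found: local mode runs the training container on your own machine, so Docker has to be installed first. A small pre-flight check before calling `fit` can make this failure easier to read. The sketch below uses only the standard library; the helper name `docker_available` is an assumption, and the check only confirms that the executable is on `PATH`, not that the Docker daemon is running.

```python
import shutil


def docker_available(which=shutil.which):
    """Return True if a `docker` executable can be found on PATH."""
    return which("docker") is not None


if docker_available():
    print("Docker found: instance_type='local' can start the training container.")
else:
    print("Docker not found: install Docker before using instance_type='local'.")
```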

## Create an Estimator

You run PyTorch training scripts on SageMaker by creating PyTorch Estimators, and SageMaker starts training your script when you call `fit` on the Estimator. The following code sample trains the custom PyTorch script `train.py` on an `ml.p3.2xlarge` instance, passing in three hyperparameters (`epochs`, `batch-size`, and `learning-rate`). We do not pass any data into the SageMaker training job; instead, the data is downloaded inside `train.py`.


In [15]:
from sagemaker.pytorch import PyTorch

pytorch_estimator = PyTorch(entry_point='train.py',
                            source_dir='src',
                            sagemaker_session=sess,
                            base_job_name='huggingface',
                            instance_type='ml.p3.2xlarge',
                            instance_count=1,
                            role=role,
                            framework_version='1.5.0',
                            py_version='py3',
                            hyperparameters = {'epochs': 20, 'batch-size': 64, 'learning-rate': 0.1})

In [None]:
pytorch_estimator.fit()

2020-12-22 10:19:19 Starting - Starting the training job...
2020-12-22 10:19:20 Starting - Launching requested ML instancesProfilerReport-1608632358: InProgress
......
2020-12-22 10:20:44 Starting - Preparing the instances for training......
2020-12-22 10:21:45 Downloading - Downloading input data
2020-12-22 10:21:45 Training - Downloading the training image........