# HuggingFace SageMaker SDK extension example using the `Trainer` class

## Initializing SageMaker Session with local AWS Profile

Outside of SageMaker notebooks, `get_execution_role()` raises an exception because it cannot determine which role SageMaker should use.

To solve this, look up the IAM role by name instead of calling `get_execution_role()`.

You therefore need an IAM role with the correct permissions for SageMaker to start training jobs and download files from S3. Note that you need S3 permissions at the bucket level (`"arn:aws:s3:::sagemaker-*"`) and at the object level (`"arn:aws:s3:::sagemaker-*/*"`).

See the [AWS documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) for how to create a role with the right permissions.
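
As a minimal sketch of the S3 part of such a policy, the snippet below builds a policy document with one statement per level. The resource ARNs are the ones named above; the action lists (`s3:ListBucket`, `s3:GetObject`, `s3:PutObject`) are an assumption about what a training job typically needs, so check them against the linked AWS documentation before use.

```python
import json

# Resource ARNs taken from the text above.
BUCKET_ARN = "arn:aws:s3:::sagemaker-*"
OBJECT_ARN = "arn:aws:s3:::sagemaker-*/*"


def build_s3_policy():
    """Return an IAM policy document (as a dict) with bucket- and object-level statements."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # bucket level: listing applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],  # assumed action
                "Resource": [BUCKET_ARN],
            },
            {
                # object level: reading and writing individual keys
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],  # assumed actions
                "Resource": [OBJECT_ARN],
            },
        ],
    }


policy_json = json.dumps(build_s3_policy(), indent=2)
print(policy_json)
```

The two statements are kept separate because bucket-level actions and object-level actions apply to different resource ARNs.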

In [1]:
# local aws profile configured in ~/.aws/credentials
local_profile_name='default' # optional if you only have default configured

# role name for sagemaker -> needs the described permissions from above
role_name = "AmazonSageMaker-ExecutionRole-20201222T210251"

In [2]:
import os

import sagemaker

try:
    # works inside SageMaker notebooks, where the execution role is known
    sess = sagemaker.Session()
    role = sagemaker.get_execution_role()
except Exception:
    import boto3

    # use the local profile defined above; it is set via an env var
    # because local mode cannot use a boto3 session
    if local_profile_name:
        os.environ['AWS_PROFILE'] = local_profile_name
    iam = boto3.client('iam')
    sess = sagemaker.Session()
    # look up the role ARN from the role name
    role = iam.get_role(RoleName=role_name)['Role']['Arn']

print(role)


Couldn't call 'get_role' to get Role ARN from role name lagunas to get Role path.


arn:aws:iam::854676674973:role/service-role/AmazonSageMaker-ExecutionRole-20201222T210251
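
The try/except cell above follows a simple "try the notebook call, fall back to a lookup by name" pattern. The sketch below isolates that pattern with hypothetical stand-in functions so it runs without AWS access; `resolve_role`, `fake_get_execution_role`, and `fake_iam_lookup` are illustrative names, and the ARN is a placeholder.

```python
def resolve_role(primary, fallback):
    """Return primary(); if it raises, return fallback() instead."""
    try:
        return primary()
    except Exception:
        return fallback()


# Hypothetical stand-ins so the sketch runs without AWS access.
def fake_get_execution_role():
    raise ValueError("not running inside a SageMaker notebook")


def fake_iam_lookup():
    return "arn:aws:iam::123456789012:role/example-role"  # placeholder ARN


role_arn = resolve_role(fake_get_execution_role, fake_iam_lookup)
print(role_arn)
```

In the notebook itself, the two callables would be `sagemaker.get_execution_role` and a function wrapping `iam.get_role(RoleName=role_name)['Role']['Arn']`.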


### SageMaker Session prints

In [3]:
print(sess.list_s3_files(sess.default_bucket(),'datasets/')) # list objects in s3 under datasets/
print(sess.default_bucket()) # s3 bucketname
print(sess.boto_region_name) # aws region of sagemaker session

['datasets/imdb/test/dataset.arrow', 'datasets/imdb/test/dataset_info.json', 'datasets/imdb/test/state.json', 'datasets/imdb/train/dataset.arrow', 'datasets/imdb/train/dataset_info.json', 'datasets/imdb/train/state.json']
sagemaker-eu-west-1-854676674973
eu-west-1
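
The bucket name printed above follows the `sagemaker-{region}-{account_id}` pattern that the SageMaker SDK uses for its default bucket, which is why the `sagemaker-*` resource pattern in the IAM role covers it. A small sketch of that relationship, where `default_bucket_name` is a hypothetical helper and `fnmatch` is only an approximation of IAM wildcard matching:

```python
from fnmatch import fnmatch


def default_bucket_name(region, account_id):
    """Compose the default bucket name from region and account ID."""
    return f"sagemaker-{region}-{account_id}"


# Region and account ID as shown in the output above.
bucket = default_bucket_name("eu-west-1", "854676674973")
bucket_arn = f"arn:aws:s3:::{bucket}"

print(bucket)
# Check that the bucket ARN is covered by the wildcard resource pattern.
print(fnmatch(bucket_arn, "arn:aws:s3:::sagemaker-*"))
```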


# Imports

Since we use the `.py` module directly from `huggingface/`, we have to adjust `sys.path` to be able to import our estimator.

In [4]:
import sys, os

module_path = os.path.abspath(os.path.join('../..'))
if module_path not in sys.path:
    sys.path.append(module_path)


# Preprocessing the data

## Upload data to SageMaker S3

## Create a local estimator for testing

You run PyTorch training scripts on SageMaker by creating PyTorch Estimators. SageMaker starts training your script when you call `fit` on a PyTorch Estimator. The following code sample shows how to train a custom PyTorch script, `train.py`, passing in hyperparameters such as `epochs`. We are not going to pass any data into the SageMaker training job; instead, it will be downloaded in `train.py`.

In SageMaker you can test your training in "local mode" by setting `instance_type` to `'local'`.


## Importing custom sdk-extension for HuggingFace

In [5]:
from huggingface.estimator import HuggingFace

## Create a local Estimator

The following code sample shows how to train a custom HuggingFace script, `nn_pruning_train.py`, passing in hyperparameters such as `attention_block_rows`, `regularization_final_lambda`, and `num_train_epochs`. We are not going to pass any data into the SageMaker training job; instead, it will be downloaded by the training script.

In [7]:

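local = True
if local:
    # local mode: run the training container on this machine for a quick smoke test
    instance_type = "local"
    sess = None
    batch_size = 1
    num_train_epochs = 0.0005
    logging_steps = 20
else:
    # remote training on a GPU instance; `sess` keeps the SageMaker session created above
    instance_type = "ml.p3.2xlarge"
    batch_size = 16
    num_train_epochs = 20
    logging_steps = 250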

    
    
def build_metric_definitions():
    """Build SageMaker metric definitions (name plus log regex) for train and eval metrics."""
    train_metrics = ['loss', 'learning_rate', 'threshold', 'ampere_temperature',
                     'regu_lambda', 'ce_loss', 'distil_loss',
                     'nnz_perc_attention', 'regu_loss_attention',
                     'nnz_perc_dense', 'regu_loss_dense',
                     'regu_loss', 'nnz_perc', 'epoch']
    eval_metrics = ["f1", "exact_match"]

    # metric group -> (prefix used in the log line, metric names)
    metric_types = {"train": ("", train_metrics), "validation": ("eval_", eval_metrics)}
    ret = []
    for k, (prefix, metrics) in metric_types.items():
        for m in metrics:
            # the regex captures the value that follows e.g. 'eval_f1': in the log output
            ret.append({'Name': f"{k}:{m}", 'Regex': f"'{prefix}{m}': (.*?),"})
    return ret
        
    
metric_definitions = build_metric_definitions()

from nn_pruning.examples.question_answering.qa_sparse_xp import SparseQAShortNamer

def estimator_build(attention:int, regu_lambda:float):
    hyperparameters = {"attention_block_rows":attention,
                       "attention_block_cols":attention,
                       "regularization_final_lambda": regu_lambda,
                       "num_train_epochs":num_train_epochs,
                       "logging_steps":logging_steps,
                       "per_device_train_batch_size":batch_size}
    
    hyperparameters = {k.replace("_", "-"):v for k,v in hyperparameters.items()}

    def get_hp_name(hyper_parameters):
        p = {k.replace("-", "_"):v for k,v in hyper_parameters.items()}    

        sn = SparseQAShortNamer()

        ret = sn.shortname(p)
        return ret

    base_job_name = "nn-pruning-v1" #+ get_hp_name(hyperparameters)[3:].replace(".", "-")
    print(base_job_name)

    huggingface_estimator = HuggingFace(entry_point='nn_pruning_train.py',
                                        source_dir='../scripts',
                                        sagemaker_session=sess,
                                        base_job_name=base_job_name,
                                        volume_size=50,
                                        instance_type=instance_type,
                                        instance_count=1,
                                        role=role,
                                        framework_version={'transformers':'4.1.1','datasets':'1.1.3'},
                                        py_version='py3',
                                        metric_definitions = metric_definitions,
                                        hyperparameters = hyperparameters)
    #print(huggingface_estimator.image_uri)
    return huggingface_estimator

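estimators = []

attentions = [16]
regu_lambdas = [20.0]
# train one job per (attention block size, regularization lambda) combination
for attention in attentions:
    for regu_lambda in regu_lambdas:
        estimator = estimator_build(attention, regu_lambda)
        estimators.append(estimator)  # keep a handle on every estimator we start
        print(estimator)
        estimator.fit(wait=True)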


nn-pruning-v1
IMAGE_URI 854676674973.dkr.ecr.eu-west-1.amazonaws.com/huggingface-nn-pruning-training:0.0.1-gpu-transformers4.1.1-datasets1.1.3-cu110
<huggingface.estimator.HuggingFace object at 0x7ffb35f520d0>
2021-01-15 14:36:12,060 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2021-01-15 14:36:12,415 - sagemaker.image_uris - INFO - Defaulting to the only supported framework/algorithm version: latest.
2021-01-15 14:36:12,426 - sagemaker.image_uris - INFO - Ignoring unnecessary instance type: None.
2021-01-15 14:36:39,455 - sagemaker - INFO - Creating training-job with name: nn-pruning-v1-2021-01-15-13-36-11-984
2021-01-15 14:36:39,456 - sagemaker.local.local_session - INFO - Starting training job
2021-01-15 14:36:39,457 - sagemaker.local.image - INFO - Using the long-lived AWS credentials found in session
2021-01-15 14:36:39,459 - sagemaker.local.image - INFO - docker compose file: 
networks:
  sagemaker-local:
    name: sagemaker-loc

algo-1-4uojc_1  | ['nn_pruning_train.py', '--attention-block-cols', '16', '--attention-block-rows', '16', '--logging-steps', '20', '--num-train-epochs', '0.0005', '--per-device-train-batch-size', '1', '--regularization-final-lambda', '20.0']
algo-1-4uojc_1  | {'model_name_or_path': 'bert-base-uncased', 'dataset_name': 'squad', 'do_train': 1, 'do_eval': 1, 'per_device_train_batch_size': 1, 'max_seq_length': 384, 'doc_stride': 128, 'num_train_epochs': 0.0005, 'logging_steps': 20, 'save_steps': 5000, 'eval_steps': 5000, 'save_total_limit': 50, 'seed': 17, 'evaluation_strategy': 'steps', 'learning_rate': 3e-05, 'mask_scores_learning_rate': 0.01, 'output_dir': '/opt/ml/model', 'logging_dir': '/opt/ml/output', 'overwrite_cache': 0, 'overwrite_output_dir': 1, 'warmup_steps': 5400, 'initial_warmup': 1, 'final_warmup': 10, 'initial_threshold': 0, 'final_threshold': 0.1, 'dense_pruning_method': 'sigmoied_threshold:1d_alt', 'dense_block_rows': 1, 'dense_block_cols': 1, 'dense_l

KeyboardInterrupt: 
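
To see how the metric definitions built by `build_metric_definitions()` pick values out of the training log, the sketch below applies the same regex template to a made-up log line (the values are hypothetical). Note that the non-greedy group needs a trailing comma, so a metric printed last in the dict does not match.

```python
import re


def metric_regex(prefix, name):
    # Same template as in build_metric_definitions()
    return f"'{prefix}{name}': (.*?),"


def extract(prefix, name, line):
    """Return the captured metric value as a string, or None if the regex does not match."""
    match = re.search(metric_regex(prefix, name), line)
    return match.group(1) if match else None


# Hypothetical log line in the style of a printed metrics dict.
log_line = "{'loss': 2.31, 'learning_rate': 3e-05, 'eval_f1': 88.5, 'epoch': 0.1}"

print(extract("", "loss", log_line))     # 2.31
print(extract("eval_", "f1", log_line))  # 88.5
print(extract("", "epoch", log_line))    # None: no comma follows the last entry
```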

In [None]:
# `huggingface_estimator` only exists inside estimator_build(); use the estimator returned by the loop above
estimator.fit()

## Create an Estimator

The following code sample shows how to train a custom HuggingFace script, `train.py`, passing in three hyperparameters (`epochs`, `train_batch_size`, `model_name`). The training data is passed to `fit()` as the `train` and `test` input channels.


In [None]:
from huggingface.estimator import HuggingFace


huggingface_estimator = HuggingFace(entry_point='train.py',
                            source_dir='../scripts',
                            sagemaker_session=sess,
                            base_job_name='huggingface-sdk-extension',
                            instance_type='ml.p3.2xlarge',
                            instance_count=1,
                            role=role,
                            framework_version={'transformers':'4.1.1','datasets':'1.1.3'},
                            py_version='py3',
                            hyperparameters = {'epochs': 1,
                                               'train_batch_size': 32,
                                               'model_name':'distilbert-base-uncased'
                                                })

In [None]:
huggingface_estimator.fit({'train': training_input_path, 'test': test_input_path})
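
`training_input_path` and `test_input_path` are not defined in this notebook and must exist before the `fit()` call. A minimal sketch, assuming the dataset lives under the `datasets/imdb/` prefixes listed earlier in the default bucket; `s3_uri` is a hypothetical helper.

```python
def s3_uri(bucket, prefix):
    """Join a bucket name and a key prefix into an s3:// URI."""
    return f"s3://{bucket}/{prefix.strip('/')}"


# Assumption: bucket name and prefixes are taken from the session output shown earlier.
bucket = "sagemaker-eu-west-1-854676674973"  # in the notebook: sess.default_bucket()
training_input_path = s3_uri(bucket, "datasets/imdb/train")
test_input_path = s3_uri(bucket, "datasets/imdb/test")

print(training_input_path)
print(test_input_path)
```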

# Estimator Parameters

### Get S3 url for model data

In [None]:
huggingface_estimator.model_data

### Get latest training job name

In [None]:
huggingface_estimator.latest_training_job.name

### Attach to an old estimator

For example, to get the model data of a finished training job.

In [None]:
old_job_name='huggingface-sdk-extension-2020-12-27-15-25-50-506'

In [None]:
from sagemaker.estimator import Estimator

In [None]:
huggingface_estimator_loaded = Estimator.attach(old_job_name)

In [None]:
huggingface_estimator_loaded.model_data
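
`model_data` is an S3 URI pointing at the model archive written by the training job. A minimal sketch of splitting such a URI into bucket and key with the standard library; the example URI is hypothetical, and the `<job-name>/output/model.tar.gz` layout it illustrates is an assumption here.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical URI, for illustration only.
example_uri = "s3://example-bucket/example-job/output/model.tar.gz"
bucket, key = split_s3_uri(example_uri)
print(bucket)
print(key)
```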

### Download model from s3

**using huggingface utils**

In [None]:
from huggingface.utils import download_model

download_model(model_data=huggingface_estimator_loaded.model_data,
               unzip=True,
               model_dir=huggingface_estimator_loaded.latest_training_job.name)

**using class built-in method**

In [None]:
huggingface_estimator.download_model(unzip=False)
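
The internals of `download_model` are not shown here. As a sketch of what the `unzip` step amounts to, assuming the artifact is a gzip-compressed tarball (`model.tar.gz`), the snippet below builds a tiny stand-in archive in a temporary directory and extracts it with the standard library. `unpack_model` and the file names are made up for illustration.

```python
import os
import tarfile
import tempfile


def unpack_model(archive_path, model_dir):
    """Extract a .tar.gz model archive into model_dir and return the member names."""
    os.makedirs(model_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=model_dir)
        return sorted(tar.getnames())


with tempfile.TemporaryDirectory() as tmp:
    # Create a stand-in archive containing one dummy file.
    dummy = os.path.join(tmp, "config.json")
    with open(dummy, "w") as f:
        f.write("{}")
    archive = os.path.join(tmp, "model.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(dummy, arcname="config.json")

    extracted = unpack_model(archive, os.path.join(tmp, "model"))
    file_exists = os.path.exists(os.path.join(tmp, "model", "config.json"))
    print(extracted, file_exists)
```

Only extract archives you trust, since tar members can carry unexpected paths.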

### Access logs

Until this [PR](https://github.com/aws/sagemaker-python-sdk/pull/2059) is merged, read the logs through the session:

In [None]:
huggingface_estimator.sagemaker_session.logs_for_job(huggingface_estimator.latest_training_job.name, wait=True)

**after the PR is merged**

In [None]:
huggingface_estimator.logs()

In [None]:
hyperparameters = {"attention_block_rows":attention,
                   "attention_block_cols":attention,
                   "regularization_final_lambda": regu_lambda,
                   "num_train_epochs":num_train_epochs,
                   "logging_steps":logging_steps,
                   "per_device_train_batch_size":batch_size}
hp2 = {k.replace("-", "_"):v for k,v in hyperparameters.items()}

In [None]:
hp2
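
The hyphen/underscore renaming matters because the hyperparameters reach the entry point as command-line flags, as the `--attention-block-cols 16 ...` argument list in the local-mode log shows. The sketch below walks through that round trip with `argparse`, which maps `--attention-block-rows` back to `attention_block_rows`. The parser setup is an illustrative assumption, not the actual argument handling in `nn_pruning_train.py`.

```python
import argparse

hyperparameters = {
    "attention_block_rows": 16,
    "attention_block_cols": 16,
    "regularization_final_lambda": 20.0,
}

# underscore keys -> hyphenated keys, as in estimator_build()
hyphenated = {k.replace("_", "-"): v for k, v in hyperparameters.items()}

# Flatten into a flag list similar to the one printed in the log.
argv = []
for key in sorted(hyphenated):
    argv += [f"--{key}", str(hyphenated[key])]

# Hypothetical parser on the script side; argparse turns hyphens back into underscores.
parser = argparse.ArgumentParser()
parser.add_argument("--attention-block-rows", type=int)
parser.add_argument("--attention-block-cols", type=int)
parser.add_argument("--regularization-final-lambda", type=float)
args = parser.parse_args(argv)

print(argv)
print(vars(args))
```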