# MLOps - NLP Lab with Amazon SageMaker
**Step 2** - *Train a sentiment analysis model using Transformers on Amazon SageMaker*

## Initialization
---
### Setup environment

In [None]:
import os
import sagemaker
from sagemaker.pytorch import PyTorch as PyTorchEstimator

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sagemaker_session.default_bucket()

### Define data inputs from S3

In [None]:
input_location_fname = '../1_prepare_data/training_input_location.txt'
if os.path.exists(input_location_fname):
    with open(input_location_fname, 'r') as f:
        # Strip the trailing newline so the S3 URI is passed to SageMaker as-is:
        s3_input_location = f.readline().strip()

    inputs = {'train': s3_input_location}
    print(inputs)
    
else:
    print(f'Training input location file not found ({input_location_fname}): check that the previous notebook was fully executed.')
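
Before launching a training job, it can help to check that the location read from the file really looks like an S3 URI. The cell below is a minimal sketch of such a check: `split_s3_uri` is a hypothetical helper (it is not part of the SageMaker SDK) and the bucket and prefix are made up for illustration.

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split an 's3://bucket/key' URI into (bucket, key); raise ValueError otherwise."""
    cleaned = uri.strip()
    parsed = urlparse(cleaned)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError(f'Not a valid S3 URI: {uri!r}')
    return parsed.netloc, parsed.path.lstrip('/')

# Hypothetical location, with the kind of trailing newline readline() can return:
print(split_s3_uri('s3://example-bucket/nlp-lab/train\n'))
```

Calling this helper on `s3_input_location` before building `inputs` would surface a malformed or empty location early, instead of at job submission time.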

## Train the model
---

As usual, we define the model hyperparameters and the metric we want to capture, build an Estimator object, and launch the training with its `.fit()` method.

In [None]:
# Dictionary of all the hyperparameters and variables we want to pass to our training image:
hyperparameters={
        "model_name":"bert-base-cased",
        "data_folder": '/opt/ml/input/data/train',
        "output_folder": '/opt/ml/model',
        "epochs": 1,
        "learning_rate": 2e-5,
        "batch_size": 64,
        "seed": 42,
        "max_len": 160
    }

# We want to capture the following metric, which is also useful for hyperparameter tuning:
metric_definitions = [{'Name': 'validation_accuracy',
                       'Regex': 'val_accuracy: ([0-9\\.]+)'}]

# We define the Estimator object (the one leveraging the PyTorch framework container):
estimator = PyTorchEstimator(
    entry_point='train.py',
    source_dir='source_dir',
    role=role,
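    # The train_* argument names below are the SageMaker Python SDK v1 names;
    # SDK v2 renames them to instance_count, instance_type and volume_size.
    train_instance_count=1,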
    train_instance_type='ml.p3.2xlarge',
    train_volume_size=50,
    hyperparameters=hyperparameters,
    metric_definitions=metric_definitions,
    framework_version='1.5.0',
    py_version='py3',
)
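
The regular expression in `metric_definitions` is applied to the training job logs, so it is worth checking locally that it captures the value we expect. The cell below is a small sketch of that check. The sample log line is an assumption: the real format depends on what `train.py` prints.

```python
import re

# Same pattern as the 'Regex' entry of metric_definitions:
pattern = re.compile('val_accuracy: ([0-9\\.]+)')

# Hypothetical log line (the actual format depends on train.py):
sample_log_line = 'epoch 1 - val_loss: 0.412 val_accuracy: 0.8731'

match = pattern.search(sample_log_line)
validation_accuracy = float(match.group(1)) if match else None
print(validation_accuracy)
```

If `validation_accuracy` comes back as `None` for a line copied from the real logs, the regex and the training script's print format are out of sync and the metric will not be captured.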

Let's train this model!

In [None]:
estimator.fit(inputs)

Let's persist the model artifact location, which we will need for deployment in the next step:

In [None]:
print(estimator.model_data)

with open('model_artifact_location.txt', 'w') as f:
    # model_data is a single string, so write() is the appropriate call:
    f.write(estimator.model_data)
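
The next step can then read this location back from the file. The cell below is a minimal sketch of that round trip: `read_location` is a hypothetical helper, and the S3 URI is a made-up value written to a temporary folder so that the example does not depend on a finished training job.

```python
import os
import tempfile

def read_location(path):
    """Return the first line of the file at `path`, without surrounding whitespace."""
    with open(path, 'r') as f:
        return f.readline().strip()

# Round trip with a hypothetical artifact location, written to a temporary folder:
with tempfile.TemporaryDirectory() as tmp_dir:
    fname = os.path.join(tmp_dir, 'model_artifact_location.txt')
    with open(fname, 'w') as f:
        f.write('s3://example-bucket/example-job/output/model.tar.gz\n')
    model_data = read_location(fname)

print(model_data)
```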