# Hyperparameter Tuning
> Yuanyuan Sun, 11/27/2023

Automated hyperparameter tuning is a feature of Amazon SageMaker that runs multiple training jobs on the training dataset, each with a different combination of hyperparameters drawn from user-specified ranges. The best model candidate is the one whose hyperparameters yield the best value of the objective metric, for example the highest validation accuracy.

Here, I employ a random strategy for hyperparameter tuning, and the specific rationale is detailed in the project paper. For further details, please refer to the provided reference.

Note: This is a separate exploration to determine how to conduct hyperparameter tuning at scale. The tuning and model training are integrated in the code script 'hyperparameter tuning & model training.ipynb'.

# 1. Configure Dataset

In [None]:
import boto3
import sagemaker
import pandas as pd
import botocore

config = botocore.config.Config(user_agent_extra='')

# low-level service client of the boto3 session
sm = boto3.client(service_name='sagemaker', config=config)
sess = sagemaker.Session(sagemaker_client=sm)

bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = sess.boto_region_name

In [4]:
train_data = 's3://{}/transformed/data/sentiment-train/'.format(bucket)
validation_data = 's3://{}/transformed/data/sentiment-validation/'.format(bucket)
test_data = 's3://{}/transformed/data/sentiment-test/'.format(bucket)

In [5]:
# Upload the data to the data lake (S3 bucket):
!aws s3 cp --recursive ./data/sentiment-train $train_data
!aws s3 cp --recursive ./data/sentiment-validation $validation_data
!aws s3 cp --recursive ./data/sentiment-test $test_data

upload: data/sentiment-train/part-algo-1-womens_clothing_ecommerce_reviews.tsv to s3://sagemaker-us-east-1-106271089005/transformed/data/sentiment-train/part-algo-1-womens_clothing_ecommerce_reviews.tsv

upload: data/sentiment-validation/part-algo-1-womens_clothing_ecommerce_reviews.tsv to s3://sagemaker-us-east-1-106271089005/transformed/data/sentiment-validation/part-algo-1-womens_clothing_ecommerce_reviews.tsv

upload: data/sentiment-test/part-algo-1-womens_clothing_ecommerce_reviews.tsv to s3://sagemaker-us-east-1-106271089005/transformed/data/sentiment-test/part-algo-1-womens_clothing_ecommerce_reviews.tsv


In [6]:
# Check the datasets that were uploaded to the S3 bucket
!aws s3 ls --recursive $train_data

2023-11-27 21:22:41    4894416 transformed/data/sentiment-train/part-algo-1-womens_clothing_ecommerce_reviews.tsv


In [7]:
!aws s3 ls --recursive $validation_data

2023-11-27 21:22:42     276522 transformed/data/sentiment-validation/part-algo-1-womens_clothing_ecommerce_reviews.tsv


In [8]:
!aws s3 ls --recursive $test_data

2023-11-27 21:22:43     273414 transformed/data/sentiment-test/part-algo-1-womens_clothing_ecommerce_reviews.tsv


In [9]:
# Define data channels with 'train' and 'validation' keys
from sagemaker.inputs import TrainingInput

data_channels = {
    'train': train_data, 
    'validation': validation_data 
}

# 2. Configure Hyperparameter Tuning Job
Initially, set up static hyperparameters, which include specifications such as instance type, instance count, maximum sequence length, and so on.

In [10]:
# maximum number of input tokens passed to BERT model
max_seq_length=128 
# whether to freeze the pre-trained BERT layers (False: fine-tune all layers)
freeze_bert_layer=False 
epochs=3
train_steps_per_epoch=50
validation_batch_size=64
validation_steps_per_epoch=50
seed=42

train_instance_count=1
train_instance_type='ml.c5.9xlarge'
train_volume_size=256
input_mode='File'
run_validation=True

In [11]:
'''
Certain parameters from this set will be included in the hyperparameters argument for both 
the PyTorch estimator and tuner. Set up the dictionary for the parameters that will be 
included in the hyperparameters argument
'''
hyperparameters_static={
    'freeze_bert_layer': freeze_bert_layer,
    'max_seq_length': max_seq_length,
    'epochs': epochs,
    'train_steps_per_epoch': train_steps_per_epoch,
    'validation_batch_size': validation_batch_size,
    'validation_steps_per_epoch': validation_steps_per_epoch,
    'seed': seed,
    'run_validation': run_validation
}

In [12]:
# Define the ranges for hyperparameters to explore in the tuning job.

from sagemaker.tuner import IntegerParameter
from sagemaker.tuner import ContinuousParameter
from sagemaker.tuner import CategoricalParameter
                                                
hyperparameter_ranges = {
     # specifying continuous variable type, the tuning job will explore the range of values
    'learning_rate': ContinuousParameter(0.00001, 0.00005, scaling_type='Linear'), 
     # specifying categorical variable type, the tuning job will explore only listed values
    'train_batch_size': CategoricalParameter([128, 256]), 
}

In [1]:
# Select loss and accuracy as the evaluation metrics. 
# The regular expressions (Regex) will capture the emitted metric values from the algorithm.
metric_definitions = [
     {'Name': 'validation:loss', 'Regex': 'val_loss: ([0-9.]+)'},
     {'Name': 'validation:accuracy', 'Regex': 'val_acc: ([0-9.]+)'},
]
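
To illustrate how these regular expressions pull metric values out of the training logs, here is a minimal sketch using Python's `re` module. The log line is hypothetical; the actual format depends on what `src/train.py` prints, and `extract_metrics` is a helper introduced only for this illustration.

```python
import re

# Same metric definitions as in the cell above.
metric_definitions = [
    {'Name': 'validation:loss', 'Regex': 'val_loss: ([0-9.]+)'},
    {'Name': 'validation:accuracy', 'Regex': 'val_acc: ([0-9.]+)'},
]

def extract_metrics(log_line, definitions):
    """Return {metric name: float} for every definition whose regex matches the line."""
    found = {}
    for definition in definitions:
        match = re.search(definition['Regex'], log_line)
        if match:
            # The first capture group holds the numeric value.
            found[definition['Name']] = float(match.group(1))
    return found

# Hypothetical log line, assumed for illustration only.
sample_log_line = 'epoch 3 val_loss: 0.6512 val_acc: 73.44'
print(extract_metrics(sample_log_line, metric_definitions))
```

Testing the patterns locally against a real log line from the training script is a cheap way to catch a regex that never matches before launching a tuning job.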


# 3. Initiate the Tuning Job

In [14]:
from sagemaker.pytorch import PyTorch as PyTorchEstimator
estimator = PyTorchEstimator(
    entry_point='train.py',
    source_dir='src',
    role=role,
    instance_count=train_instance_count,
    instance_type=train_instance_type,
    volume_size=train_volume_size,
    py_version='py3',
    framework_version='1.6.0',
    hyperparameters=hyperparameters_static,
    metric_definitions=metric_definitions,
    input_mode=input_mode,
)

# 4. Employ a Random Search Strategy 
With the random search strategy, SageMaker picks a random combination of hyperparameters from the defined ranges for each training job in the tuning job. When the tuning job completes, we can select the hyperparameters of the training job that performed best on the objective metric.
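
Conceptually, random search can be sketched in a few lines of plain Python. This is only an illustration of the idea, not SageMaker's implementation: `toy_objective` is a hypothetical stand-in for a real training job, and the ranges mirror the ones defined earlier in this notebook.

```python
import random

def sample_configuration(rng):
    """Draw one hyperparameter combination independently of all previous draws."""
    return {
        'learning_rate': rng.uniform(0.00001, 0.00005),  # continuous range, linear scale
        'train_batch_size': rng.choice([128, 256]),      # categorical values
    }

def toy_objective(config):
    """Hypothetical score standing in for validation accuracy (NOT a real training job)."""
    return -abs(config['learning_rate'] - 0.00003)

def random_search(objective, max_jobs, seed=42):
    """Evaluate max_jobs random configurations and return the best (score, config)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(max_jobs):
        config = sample_configuration(rng)
        trials.append((objective(config), config))
    # Maximize the objective, mirroring objective_type='Maximize'.
    return max(trials, key=lambda trial: trial[0])

best_score, best_config = random_search(toy_objective, max_jobs=10)
print(best_score, best_config)
```

Because each draw is independent, all trials can run in parallel, which is one reason random search scales well.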


In [15]:
# Set up the Hyperparameter Tuner.
from sagemaker.tuner import HyperparameterTuner

tuner = HyperparameterTuner(
    estimator=estimator, 
    hyperparameter_ranges=hyperparameter_ranges, 
    metric_definitions=metric_definitions, 
    strategy='Random',
    objective_type='Maximize',
    objective_metric_name='validation:accuracy',
    max_jobs=2, # maximum number of jobs to run
    max_parallel_jobs=2, # maximum number of jobs to run in parallel
    early_stopping_type='Auto' # early stopping criteria
)

# 5. Launch the SageMaker Hyper-Parameter Tuning (HPT) Job.

In [16]:
tuner.fit(
    inputs = data_channels, 
    include_cls_metadata = False,
    wait=False
)

# 6. Verify the status of the Tuning Job.

In [17]:
tuning_job_name = tuner.latest_tuning_job.job_name
print(tuning_job_name)

pytorch-training-231127-2131


Check the status of the Tuning Job.

In [18]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/hyper-tuning-jobs/{}">Hyper-Parameter Tuning Job</a></b>'.format(region, tuning_job_name)))

Wait for the Tuning Job to complete.

In [19]:
%%time

tuner.wait()

...............................................................................................................................................................................................................................................................................................!

CPU times: user 1.31 s, sys: 121 ms, total: 1.43 s

Wall time: 24min 35s


The results of the SageMaker hyperparameter tuning job are available through the tuner object's analytics. Calling `tuner.analytics().dataframe()` returns them as a pandas DataFrame, which can be explored as follows:

In [20]:
import time

time.sleep(10) # slight delay to allow the analytics to be calculated

df_results = tuner.analytics().dataframe()
df_results.shape

(2, 8)

In [21]:
df_results.sort_values('FinalObjectiveValue', ascending=False)

| | learning_rate | train_batch_size | TrainingJobName | TrainingJobStatus | FinalObjectiveValue | TrainingStartTime | TrainingEndTime | TrainingElapsedTimeSeconds |
|---|---|---|---|---|---|---|---|---|
| 1 | 2.8e-05 | "128" | pytorch-training-231127-2131-001-a5e33943 | Completed | 73.440002 | 2023-11-27 21:33:17+00:00 | 2023-11-27 21:53:49+00:00 | 1232.0 |
| 0 | 3.4e-05 | "128" | pytorch-training-231127-2131-002-770f4749 | Completed | 67.580002 | 2023-11-27 21:33:08+00:00 | 2023-11-27 21:53:49+00:00 | 1241.0 |


In the context of large-scale training and tuning, ongoing monitoring and selecting appropriate compute resources are crucial. Although we have the flexibility to opt for various compute options, determining the specific instance types and sizes requires a case-by-case approach. There is no one-size-fits-all solution; it hinges on comprehending the workload and conducting empirical testing to ascertain the optimal compute resources for the training process.
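
One simple input to such empirical testing is the compute consumed by a tuning run. The sketch below adds up the elapsed training seconds reported in the results table and converts them to instance-hours. The hourly price is a hypothetical placeholder, not a real rate, and elapsed time may differ from billed time, so treat the result as a rough estimate only.

```python
# Elapsed seconds of the two training jobs, taken from the results table above.
training_seconds = [1232.0, 1241.0]

# Number of instances per training job (train_instance_count in this notebook).
instance_count = 1

# HYPOTHETICAL hourly price; look up the actual rate for the instance type and region.
assumed_price_per_instance_hour = 2.0

def estimate_tuning_cost(seconds_per_job, instances_per_job, price_per_instance_hour):
    """Return (instance-hours, estimated cost) summed over all training jobs."""
    instance_hours = sum(seconds_per_job) * instances_per_job / 3600.0
    return instance_hours, instance_hours * price_per_instance_hour

instance_hours, estimated_cost = estimate_tuning_cost(
    training_seconds, instance_count, assumed_price_per_instance_hour
)
print('{:.3f} instance-hours, estimated cost {:.2f}'.format(instance_hours, estimated_cost))
```

Repeating the same calculation for different instance types, with their measured training times, gives a like-for-like basis for comparing compute options.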

In [23]:
from IPython.display import display, HTML

display(HTML('<b>Review Training Jobs of the <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/hyper-tuning-jobs/{}">Hyper-Parameter Tuning Job</a></b>'.format(region, tuning_job_name)))

# 7. Evaluate the Results

A critical part of model development is evaluating the model on a test dataset that it did not see during training. The resulting metrics provide a basis for comparing machine learning models. For a metric such as accuracy, a higher value on the test set indicates that the model generalizes better to new, unseen data.

In [24]:
# Show the best candidate - the one with the highest accuracy result.
df_results.sort_values(
    'FinalObjectiveValue',
    ascending=False).head(1)

| | learning_rate | train_batch_size | TrainingJobName | TrainingJobStatus | FinalObjectiveValue | TrainingStartTime | TrainingEndTime | TrainingElapsedTimeSeconds |
|---|---|---|---|---|---|---|---|---|
| 1 | 2.8e-05 | "128" | pytorch-training-231127-2131-001-a5e33943 | Completed | 73.440002 | 2023-11-27 21:33:17+00:00 | 2023-11-27 21:53:49+00:00 | 1232.0 |


In [25]:
# Evaluate the best candidate.
# Select the top row of the sorted dataframe, then read the Training Job name from the column `TrainingJobName`.

best_candidate = df_results.sort_values('FinalObjectiveValue', ascending=False).iloc[0]

best_candidate_training_job_name = best_candidate['TrainingJobName']
print('Best candidate Training Job name: {}'.format(best_candidate_training_job_name))

Best candidate Training Job name: pytorch-training-231127-2131-001-a5e33943


# 8. Show accuracy result for the best candidate.

In [26]:
best_candidate_accuracy = best_candidate['FinalObjectiveValue'] 
print('Best candidate accuracy result: {}'.format(best_candidate_accuracy))

Best candidate accuracy result: 73.44000244140625


We can use the function `describe_training_job` of the service client to get some more information about the best candidate. The result is in dictionary format. Let's check that it has the same Training Job name:

In [27]:
best_candidate_description = sm.describe_training_job(TrainingJobName=best_candidate_training_job_name)

best_candidate_training_job_name2 = best_candidate_description['TrainingJobName']

print('Training Job name: {}'.format(best_candidate_training_job_name2))

Training Job name: pytorch-training-231127-2131-001-a5e33943


Pull the Tuning Job and Training Job Amazon Resource Name (ARN) from the best candidate training job description.

In [None]:
print(best_candidate_description.keys())

In [28]:
best_candidate_tuning_job_arn = best_candidate_description['TuningJobArn'] 
best_candidate_training_job_arn = best_candidate_description['TrainingJobArn'] 
print('Best candidate Tuning Job ARN: {}'.format(best_candidate_tuning_job_arn))
print('Best candidate Training Job ARN: {}'.format(best_candidate_training_job_arn))

Best candidate Tuning Job ARN: arn:aws:sagemaker:us-east-1:106271089005:hyper-parameter-tuning-job/pytorch-training-231127-2131

Best candidate Training Job ARN: arn:aws:sagemaker:us-east-1:106271089005:training-job/pytorch-training-231127-2131-001-a5e33943
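
An ARN follows the general layout `arn:partition:service:region:account-id:resource`. As a small illustration, the sketch below splits the Training Job ARN printed above into its parts; `parse_arn` is a helper written for this example, not part of boto3 or the SageMaker SDK.

```python
def parse_arn(arn):
    """Split an ARN of the form arn:partition:service:region:account-id:resource."""
    # Limit the split to 5 so that any ':' inside the resource part is preserved.
    prefix, partition, service, region, account_id, resource = arn.split(':', 5)
    if prefix != 'arn':
        raise ValueError('Not an ARN: {}'.format(arn))
    # For SageMaker jobs the resource has the form <resource-type>/<resource-name>.
    resource_type, _, resource_name = resource.partition('/')
    return {
        'partition': partition,
        'service': service,
        'region': region,
        'account_id': account_id,
        'resource_type': resource_type,
        'resource_name': resource_name,
    }

parsed = parse_arn(
    'arn:aws:sagemaker:us-east-1:106271089005:training-job/'
    'pytorch-training-231127-2131-001-a5e33943'
)
print(parsed['resource_type'], parsed['resource_name'])
```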


Pull the path of the best candidate model in the S3 bucket. We will need it later to set up the Processing Job for the evaluation.

In [29]:
model_tar_s3_uri = sm.describe_training_job(TrainingJobName=best_candidate_training_job_name)['ModelArtifacts']['S3ModelArtifacts']
print(model_tar_s3_uri)

s3://sagemaker-us-east-1-106271089005/pytorch-training-231127-2131-001-a5e33943/output/model.tar.gz


For conducting model evaluation, we will employ a Processing Job built on scikit-learn. This constitutes a versatile Python Processing Job with pre-installed scikit-learn. We have the flexibility to indicate the desired version of scikit-learn. Additionally, the SageMaker execution role, processing instance type, and instance count need to be provided.

In [30]:
from sagemaker.sklearn.processing import SKLearnProcessor

processing_instance_type = "ml.c5.2xlarge"
processing_instance_count = 1

processor = SKLearnProcessor(
    framework_version="0.23-1",
    role=role,
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    max_runtime_in_seconds=7200,
)

The model evaluation Processing Job runs the Python code in [src/evaluate_model_metrics.py](src/evaluate_model_metrics.py). Launch the Processing Job, passing it the custom script, the S3 location of the best candidate's model artifact, the S3 location of the test data, and the maximum sequence length as an argument.

In [31]:
from sagemaker.processing import ProcessingInput, ProcessingOutput

processor.run(
    code="src/evaluate_model_metrics.py",
    inputs=[
        ProcessingInput(  
            input_name="model-tar-s3-uri",                        
            source=model_tar_s3_uri,                               
            destination="/opt/ml/processing/input/model/"
        ),
        ProcessingInput(
            input_name="evaluation-data-s3-uri",
            source=test_data,                                    
            destination="/opt/ml/processing/input/data/",
        ),
    ],
    outputs=[
        ProcessingOutput(s3_upload_mode="EndOfJob", output_name="metrics", source="/opt/ml/processing/output/metrics"),
    ],
    arguments=["--max-seq-length", str(max_seq_length)],
    logs=True,
    wait=False,
)



Job Name:  sagemaker-scikit-learn-2023-11-27-21-59-50-944

Inputs:  [{'InputName': 'model-tar-s3-uri', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-106271089005/pytorch-training-231127-2131-001-a5e33943/output/model.tar.gz', 'LocalPath': '/opt/ml/processing/input/model/', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'evaluation-data-s3-uri', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-106271089005/transformed/data/sentiment-test/', 'LocalPath': '/opt/ml/processing/input/data/', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'code', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-106271089005/sagemaker-scikit-learn-2023-11-27-21-59-50-944/input/code/evaluate_model_metrics.py', 'LocalPath': '/opt/ml/processing/input/code', 'S3DataType': 

You can see the information about the Processing Jobs using the `describe` function. The result is in dictionary format. Let's pull the Processing Job name:

In [32]:
scikit_processing_job_name = processor.jobs[-1].describe()["ProcessingJobName"]

print('Processing Job name: {}'.format(scikit_processing_job_name))

Processing Job name: sagemaker-scikit-learn-2023-11-27-21-59-50-944


# 9. Pull the Processing Job status from the Processing Job description.

Print the keys of the Processing Job description dictionary, choose the one related to the status of the Processing Job and print the value of it.

In [33]:
print(processor.jobs[-1].describe().keys())

dict_keys(['ProcessingInputs', 'ProcessingOutputConfig', 'ProcessingJobName', 'ProcessingResources', 'StoppingCondition', 'AppSpecification', 'RoleArn', 'ProcessingJobArn', 'ProcessingJobStatus', 'LastModifiedTime', 'CreationTime', 'ResponseMetadata'])


In [34]:
scikit_processing_job_status = processor.jobs[-1].describe()['ProcessingJobStatus']
print('Processing job status: {}'.format(scikit_processing_job_status))

Processing job status: InProgress
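
Since the job was launched with `wait=False`, its status has to be checked again until it reaches a terminal state. The sketch below shows the polling idea with a hypothetical helper, `wait_for_processing_job`, that accepts any callable returning a description dictionary. A fake describe function stands in for `processor.jobs[-1].describe` so that the example runs without AWS access.

```python
import time

# Terminal values of 'ProcessingJobStatus'; 'InProgress' and 'Stopping' are transient.
TERMINAL_STATUSES = {'Completed', 'Failed', 'Stopped'}

def wait_for_processing_job(describe, poll_seconds=30, max_polls=240):
    """Call describe() repeatedly until the job reaches a terminal status."""
    for _ in range(max_polls):
        status = describe()['ProcessingJobStatus']
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError('Processing job did not finish within the polling budget')

# Fake responses, assumed for illustration only.
fake_responses = iter([
    {'ProcessingJobStatus': 'InProgress'},
    {'ProcessingJobStatus': 'InProgress'},
    {'ProcessingJobStatus': 'Completed'},
])
final_status = wait_for_processing_job(lambda: next(fake_responses), poll_seconds=0)
print(final_status)
```

In the notebook itself, the same helper could be called as `wait_for_processing_job(processor.jobs[-1].describe)`; the SDK's own `wait` method serves the same purpose.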


In [35]:
from IPython.display import display, HTML

display(
    HTML(
        '<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/processing-jobs/{}">Processing Job</a></b>'.format(
            region, scikit_processing_job_name
        )
    )
)

In [38]:
# Monitor the Processing Job

from pprint import pprint

running_processor = sagemaker.processing.ProcessingJob.from_processing_name(
    processing_job_name=scikit_processing_job_name, sagemaker_session=sess
)

processing_job_description = running_processor.describe()

pprint(processing_job_description)

{'AppSpecification': {'ContainerArguments': ['--max-seq-length', '128'],
                      'ContainerEntrypoint': ['python3',
                                              '/opt/ml/processing/input/code/evaluate_model_metrics.py'],
                      'ImageUri': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3'},
 'CreationTime': datetime.datetime(2023, 11, 27, 21, 59, 51, 424000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2023, 11, 27, 21, 59, 52, 49000, tzinfo=tzlocal()),
 'ProcessingInputs': [{'AppManaged': False,
                       'InputName': 'model-tar-s3-uri',
                       'S3Input': {'LocalPath': '/opt/ml/processing/input/model/',
                                   'S3CompressionType': 'None',
                                   'S3DataDistributionType': 'FullyReplicated',
                                   'S3DataType': 'S3Prefix',
                                   'S3InputMode': 'File',

In [39]:
%%time

running_processor.wait(logs=False)

....................................................................!

CPU times: user 306 ms, sys: 25.2 ms, total: 331 ms

Wall time: 5min 44s


# 10. Inspect the processed output data

In [40]:
# Take a look at the results of the Processing Job. Get the S3 bucket location of the output metrics:
processing_job_description = running_processor.describe()

output_config = processing_job_description["ProcessingOutputConfig"]
for output in output_config["Outputs"]:
    if output["OutputName"] == "metrics":
        processed_metrics_s3_uri = output["S3Output"]["S3Uri"]

print(processed_metrics_s3_uri)

s3://sagemaker-us-east-1-106271089005/sagemaker-scikit-learn-2023-11-27-21-59-50-944/output/metrics


List the content of the folder:

In [41]:
!aws s3 ls $processed_metrics_s3_uri/

2023-11-27 22:06:27      19721 confusion_matrix.png

2023-11-27 22:06:27         56 evaluation.json


In [42]:
# The test accuracy can be extracted from the evaluation.json file.

import json
from pprint import pprint

metrics_json = sagemaker.s3.S3Downloader.read_file("{}/evaluation.json".format(
    processed_metrics_s3_uri
))

print('Test accuracy: {}'.format(json.loads(metrics_json)))

Test accuracy: {'metrics': {'accuracy': {'value': 0.7540453074433657}}}
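
The printed value is the whole metrics dictionary. To get the accuracy as a single number, the nested keys can be followed down to `value`. The sketch below does this on a string that mirrors the output above, so it runs without access to S3.

```python
import json

# Content of evaluation.json, as shown in the output above.
metrics_json = '{"metrics": {"accuracy": {"value": 0.7540453074433657}}}'

# Walk the nested structure: metrics -> accuracy -> value.
test_accuracy = json.loads(metrics_json)['metrics']['accuracy']['value']
print('Test accuracy: {:.4f}'.format(test_accuracy))
```

Note that this value is a fraction between 0 and 1, whereas the tuner's objective value reported earlier (73.44) appears to be on a percentage scale, so the two should be brought to the same scale before being compared.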


Copy image with the confusion matrix generated during the model evaluation into the folder `generated`.

In [43]:
!aws s3 cp $processed_metrics_s3_uri/confusion_matrix.png ./generated/

import time
time.sleep(10) # Slight delay for our notebook to recognize the newly-downloaded file

download: s3://sagemaker-us-east-1-106271089005/sagemaker-scikit-learn-2023-11-27-21-59-50-944/output/metrics/confusion_matrix.png to generated/confusion_matrix.png


Show and review the confusion matrix, which is a table of all combinations of true (actual) and predicted labels. Each cell contains the number of reviews for the corresponding pair of sentiments. The highest counts appear in the diagonal cells, where the predicted sentiment equals the actual one.<br>
Note: The related output files (`confusion_matrix.png`, `evaluation.json`) are stored in `./code-hyperparameter tuning/output`.
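
To make the link between the confusion matrix and accuracy concrete, here is a minimal pure-Python sketch. The labels and predictions are hypothetical toy data, not results from this project, and the helper functions exist only for this illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels, cells are counts."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def accuracy_from_matrix(matrix):
    """Accuracy is the sum of the diagonal divided by the total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical sentiment classes and toy data, assumed for illustration only.
labels = ['negative', 'neutral', 'positive']
actual = ['negative', 'negative', 'neutral', 'neutral',
          'positive', 'positive', 'positive', 'neutral']
predicted = ['negative', 'neutral', 'neutral', 'neutral',
             'positive', 'positive', 'neutral', 'neutral']

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
print('accuracy:', accuracy_from_matrix(matrix))
```

The off-diagonal cells show which sentiments are confused with each other, which a single accuracy number cannot reveal.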