# Warm Start from a Completed Hyper-Parameter Tuning Job
Once the previous hyper-parameter tuning job completes, we can perform another round of optimization using `Warm Start`.

A warm start configuration lets you create a new tuning job that reuses what was learned in up to 5 parent tuning jobs. If a warm start configuration is specified, Automatic Model Tuning loads the previous [hyperparameter set, objective metric value] pairs to warm start the new tuning job. This means you can continue optimizing your model from the point where your previous tuning job finished.

<img src="img/hpt-warmstart.png" width="90%" align="left">

In [1]:
import boto3
import sagemaker
import pandas as pd

sess = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

sm = boto3.Session().client(service_name="sagemaker", region_name=region)

# Pre-Requisite

## The previous hyper-parameter tuning job must complete before we can perform another round of optimization using `Warm Start`.

In [2]:
%store -r tuning_job_name

In [3]:
try:
    tuning_job_name
    print("[OK]")
except NameError:
    print("+++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] Please run the previous Hyperparameter Tuning notebook.")
    print("+++++++++++++++++++++++++++++++++++++++++++++")

[OK]


In [4]:
print(tuning_job_name)

tensorflow-training-240306-1631


### Check the status of the previous Hyperparameter Job

In [5]:
job_description = sm.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuning_job_name)

In [6]:
if not bool(job_description):
    print("+++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] Please run the previous Hyperparameter Tuning notebook before you continue.")
    print("+++++++++++++++++++++++++++++++++++++++++++++")
elif job_description["HyperParameterTuningJobStatus"] == "Completed":
    print("[OK] Previous Tuning Job has completed. Please continue.")
else:
    print("+++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] Please run the previous Hyperparameter Tuning notebook.")
    print("+++++++++++++++++++++++++++++++++++++++++++++")

[OK] Previous Tuning Job has completed. Please continue.


# Specify the S3 Location of the Features

In [7]:
%store -r processed_train_data_s3_uri

In [8]:
try:
    processed_train_data_s3_uri
    print("[OK]")
except NameError:
    print("+++++++++++++++++++++++++++++++")
    print("[ERROR] Please run the notebooks in the previous PREPARE section before you continue.")
    print("+++++++++++++++++++++++++++++++")

[OK]


In [9]:
print(processed_train_data_s3_uri)

s3://sagemaker-us-east-1-211125778552/sagemaker-scikit-learn-2024-03-03-03-46-55-548/output/bert-train


In [10]:
%store -r processed_validation_data_s3_uri

In [11]:
try:
    processed_validation_data_s3_uri
    print("[OK]")
except NameError:
    print("+++++++++++++++++++++++++++++++")
    print("[ERROR] Please run the notebooks in the previous PREPARE section before you continue.")
    print("+++++++++++++++++++++++++++++++")

[OK]


In [12]:
print(processed_validation_data_s3_uri)

s3://sagemaker-us-east-1-211125778552/sagemaker-scikit-learn-2024-03-03-03-46-55-548/output/bert-validation


In [13]:
%store -r processed_test_data_s3_uri

In [14]:
try:
    processed_test_data_s3_uri
    print("[OK]")
except NameError:
    print("+++++++++++++++++++++++++++++++")
    print("[ERROR] Please run the notebooks in the previous PREPARE section before you continue.")
    print("+++++++++++++++++++++++++++++++")

[OK]


In [15]:
print(processed_test_data_s3_uri)

s3://sagemaker-us-east-1-211125778552/sagemaker-scikit-learn-2024-03-03-03-46-55-548/output/bert-test


In [16]:
print(processed_train_data_s3_uri)
!aws s3 ls $processed_train_data_s3_uri/

s3://sagemaker-us-east-1-211125778552/sagemaker-scikit-learn-2024-03-03-03-46-55-548/output/bert-train
2024-03-03 03:59:47    6988023 part-algo-1-amazon_reviews_us_Digital_Software_v1_00.tfrecord
2024-03-03 03:59:47    1541730 part-algo-1-amazon_reviews_us_Gift_Card_v1_00.tfrecord
2024-03-03 03:59:42    7809721 part-algo-2-amazon_reviews_us_Digital_Video_Games_v1_00.tfrecord


In [17]:
print(processed_validation_data_s3_uri)
!aws s3 ls $processed_validation_data_s3_uri/

s3://sagemaker-us-east-1-211125778552/sagemaker-scikit-learn-2024-03-03-03-46-55-548/output/bert-validation
2024-03-03 03:59:47    2330909 part-algo-1-amazon_reviews_us_Digital_Software_v1_00.tfrecord
2024-03-03 03:59:47     513937 part-algo-1-amazon_reviews_us_Gift_Card_v1_00.tfrecord
2024-03-03 03:59:42    2604075 part-algo-2-amazon_reviews_us_Digital_Video_Games_v1_00.tfrecord


In [18]:
print(processed_test_data_s3_uri)
!aws s3 ls $processed_test_data_s3_uri/

s3://sagemaker-us-east-1-211125778552/sagemaker-scikit-learn-2024-03-03-03-46-55-548/output/bert-test
2024-03-03 03:59:47    2326926 part-algo-1-amazon_reviews_us_Digital_Software_v1_00.tfrecord
2024-03-03 03:59:47     514186 part-algo-1-amazon_reviews_us_Gift_Card_v1_00.tfrecord
2024-03-03 03:59:42    2597730 part-algo-2-amazon_reviews_us_Digital_Video_Games_v1_00.tfrecord


In [19]:
from sagemaker.inputs import TrainingInput

s3_input_train_data = TrainingInput(s3_data=processed_train_data_s3_uri, distribution="ShardedByS3Key")
s3_input_validation_data = TrainingInput(s3_data=processed_validation_data_s3_uri, distribution="ShardedByS3Key")
s3_input_test_data = TrainingInput(s3_data=processed_test_data_s3_uri, distribution="ShardedByS3Key")

print(s3_input_train_data.config)
print(s3_input_validation_data.config)
print(s3_input_test_data.config)

{'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix', 'S3Uri': 's3://sagemaker-us-east-1-211125778552/sagemaker-scikit-learn-2024-03-03-03-46-55-548/output/bert-train', 'S3DataDistributionType': 'ShardedByS3Key'}}}
{'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix', 'S3Uri': 's3://sagemaker-us-east-1-211125778552/sagemaker-scikit-learn-2024-03-03-03-46-55-548/output/bert-validation', 'S3DataDistributionType': 'ShardedByS3Key'}}}
{'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix', 'S3Uri': 's3://sagemaker-us-east-1-211125778552/sagemaker-scikit-learn-2024-03-03-03-46-55-548/output/bert-test', 'S3DataDistributionType': 'ShardedByS3Key'}}}
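With `ShardedByS3Key`, each training instance receives a disjoint subset of the objects under the S3 prefix rather than a full copy. The sketch below illustrates the idea with a simple round-robin split over the three training files listed above. The round-robin order and the helper name `shard_keys` are assumptions made for illustration, not SageMaker's actual assignment logic.

```python
from typing import Dict, List


def shard_keys(keys: List[str], instance_count: int) -> Dict[int, List[str]]:
    """Split S3 keys into disjoint per-instance subsets (illustrative round-robin)."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    shards: Dict[int, List[str]] = {i: [] for i in range(instance_count)}
    for index, key in enumerate(sorted(keys)):
        shards[index % instance_count].append(key)
    return shards


train_files = [
    "part-algo-1-amazon_reviews_us_Digital_Software_v1_00.tfrecord",
    "part-algo-1-amazon_reviews_us_Gift_Card_v1_00.tfrecord",
    "part-algo-2-amazon_reviews_us_Digital_Video_Games_v1_00.tfrecord",
]

# With more instances than files, at least one instance ends up with no data.
for count in (1, 2, 4):
    shards = shard_keys(train_files, count)
    empty = [i for i, files in shards.items() if not files]
    print(count, "instance(s); instances without data:", empty)
```

This is why the instance count should not exceed the number of input files when this distribution type is used.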


In [20]:
!cat src/tf_bert_reviews.py

import time
import random
import pandas as pd
from glob import glob
import pprint
import argparse
import json
import subprocess
import sys
import os
import csv

# subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'tensorflow==2.1.0'])
import tensorflow as tf
import pandas as pd
import numpy as np

subprocess.check_call([sys.executable, "-m", "pip", "install", "transformers==3.5.1"])
# subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'sagemaker-tensorflow==2.1.0.1.0.0'])
# subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'smdebug==0.9.3'])
subprocess.check_call([sys.executable, "-m", "pip", "install", "scikit-learn==0.23.1"])
subprocess.check_call([sys.executable, "-m", "pip", "install", "matplotlib==3.2.1"])

from transformers import DistilBertTokenizer
from transformers import DistilBertConfig
from transformers import TFDistilBertForSequenceClassification

from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.models impor

# Setup Hyper-Parameters for Classification Layer
First, retrieve `max_seq_length` from the prepare phase.

In [21]:
%store -r max_seq_length

In [22]:
try:
    max_seq_length
    print("[OK]")
except NameError:
    print("+++++++++++++++++++++++++++++++")
    print("[ERROR] Please run the notebooks in the previous PREPARE section before you continue.")
    print("+++++++++++++++++++++++++++++++")

[OK]


In [23]:
print(max_seq_length)

64


In [24]:
epochs = 1
epsilon = 0.00000001
train_batch_size = 128
validation_batch_size = 128
test_batch_size = 128
train_steps_per_epoch = 100
validation_steps = 100
test_steps = 100
train_instance_count = 1
train_instance_type = "ml.c5.4xlarge"
train_volume_size = 1024
use_xla = True
use_amp = True
enable_sagemaker_debugger = False
enable_checkpointing = False
enable_tensorboard = False
input_mode = "File"
run_validation = True
run_test = True
run_sample_predictions = True

# Track the Optimizations Within our Experiment

In [25]:
%store -r experiment_name

In [26]:
try:
    experiment_name
    print("[OK]")
except NameError:
    print("+++++++++++++++++++++++++++++++")
    print("[ERROR] Please run the notebooks in the previous TRAIN section before you continue.")
    print("+++++++++++++++++++++++++++++++")

[OK]


In [27]:
print(experiment_name)

Amazon-Customer-Reviews-BERT-Experiment-1709436495


In [28]:
%store -r trial_name

In [29]:
try:
    trial_name
    print("[OK]")
except NameError:
    print("+++++++++++++++++++++++++++++++")
    print("[ERROR] Please run the notebooks in the previous TRAIN section before you continue.")
    print("+++++++++++++++++++++++++++++++")

[OK]


In [30]:
print(trial_name)

trial-1709436495


In [31]:
import time
from smexperiments.trial import Trial

timestamp = "{}".format(int(time.time()))

trial = Trial.load(trial_name=trial_name)
print(trial)

Trial(sagemaker_boto_client=<botocore.client.SageMaker object at 0x7fccdb22a050>,trial_name='trial-1709436495',trial_arn='arn:aws:sagemaker:us-east-1:211125778552:experiment-trial/trial-1709436495',display_name='trial-1709436495',experiment_name='Amazon-Customer-Reviews-BERT-Experiment-1709436495',creation_time=datetime.datetime(2024, 3, 3, 3, 28, 16, 49000, tzinfo=tzlocal()),created_by={'UserProfileArn': 'arn:aws:sagemaker:us-east-1:211125778552:user-profile/d-k0wihxzgpgsi/default-user', 'UserProfileName': 'default-user', 'DomainId': 'd-k0wihxzgpgsi'},last_modified_time=datetime.datetime(2024, 3, 6, 16, 51, 11, 857000, tzinfo=tzlocal()),last_modified_by={'UserProfileArn': 'arn:aws:sagemaker:us-east-1:211125778552:user-profile/d-k0wihxzgpgsi/default-user'},response_metadata={'RequestId': '15215458-f1f4-4125-bde1-bddc85d8e405', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '15215458-f1f4-4125-bde1-bddc85d8e405', 'content-type': 'application/x-amz-json-1.1', 'content-length'

In [32]:
from smexperiments.tracker import Tracker

tracker_optimize = Tracker.create(display_name="optimize-2", sagemaker_boto_client=sm)

optimize_trial_component_name = tracker_optimize.trial_component.trial_component_name
print("Optimize trial component name {}".format(optimize_trial_component_name))

Optimize trial component name TrialComponent-2024-03-06-165147-drsw


# Attach the `optimize` Trial Component and Tracker as a Component to the Trial

In [33]:
trial.add_trial_component(tracker_optimize.trial_component)

# Setup Dynamic Hyper-Parameter Ranges to Explore
Here we define the ranges the tuner will explore. Any hyper-parameter that we do not want to explore in this warm start run can optionally be defined statically on the estimator instead.


In [34]:
from sagemaker.tuner import IntegerParameter
from sagemaker.tuner import ContinuousParameter
from sagemaker.tuner import CategoricalParameter
from sagemaker.tuner import HyperparameterTuner

hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(0.00015, 0.00075, scaling_type="Linear"),
    "train_batch_size": CategoricalParameter([64, 128]),
    "freeze_bert_layer": CategoricalParameter([True, False]),
}
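The `scaling_type="Linear"` setting asks the tuner to search the learning-rate range on a linear scale. As a rough illustration of how that differs from a logarithmic scale, the sketch below draws random values both ways over the same bounds. It uses plain random sampling, not the Bayesian strategy the tuner applies, and the helper names are hypothetical.

```python
import math
import random

LOW, HIGH = 0.00015, 0.00075  # learning_rate bounds from the ranges above


def sample_linear(rng: random.Random) -> float:
    """Draw uniformly between LOW and HIGH."""
    return rng.uniform(LOW, HIGH)


def sample_logarithmic(rng: random.Random) -> float:
    """Draw uniformly in log space, then map back to the original scale."""
    return math.exp(rng.uniform(math.log(LOW), math.log(HIGH)))


rng = random.Random(0)
linear = [sample_linear(rng) for _ in range(10_000)]
logarithmic = [sample_logarithmic(rng) for _ in range(10_000)]

# Log-space sampling puts more draws near the lower bound, so its mean is smaller.
print("linear mean:      {:.6f}".format(sum(linear) / len(linear)))
print("logarithmic mean: {:.6f}".format(sum(logarithmic) / len(logarithmic)))
```

For a narrow range such as this one (a factor of 5 between the bounds), the two scales behave similarly; the difference grows when the range spans several orders of magnitude.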

# Track the Hyper-Parameter Ranges

In [35]:
tracker_optimize.log_parameters(hyperparameter_ranges)

# must save after logging
tracker_optimize.trial_component.save()

TrialComponent(sagemaker_boto_client=<botocore.client.SageMaker object at 0x7fcce036ddd0>,trial_component_name='TrialComponent-2024-03-06-165147-drsw',display_name='optimize-2',tags=None,trial_component_arn='arn:aws:sagemaker:us-east-1:211125778552:experiment-trial-component/TrialComponent-2024-03-06-165147-drsw',response_metadata={'RequestId': '266dc5ed-4206-4f19-89ff-9331535d7bc1', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '266dc5ed-4206-4f19-89ff-9331535d7bc1', 'content-type': 'application/x-amz-json-1.1', 'content-length': '129', 'date': 'Wed, 06 Mar 2024 16:51:47 GMT'}, 'RetryAttempts': 0},parameters={'learning_rate': <sagemaker.parameter.ContinuousParameter object at 0x7fccda88fa10>, 'train_batch_size': <sagemaker.parameter.CategoricalParameter object at 0x7fccda88fb90>, 'freeze_bert_layer': <sagemaker.parameter.CategoricalParameter object at 0x7fccda88fb50>},input_artifacts={},output_artifacts={})

# Setup Metrics

In [36]:
metrics_definitions = [
    {"Name": "train:loss", "Regex": "loss: ([0-9\\.]+)"},
    {"Name": "train:accuracy", "Regex": "accuracy: ([0-9\\.]+)"},
    {"Name": "validation:loss", "Regex": "val_loss: ([0-9\\.]+)"},
    {"Name": "validation:accuracy", "Regex": "val_accuracy: ([0-9\\.]+)"},
]
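Each metric definition pairs a metric name with a regular expression whose first capture group holds the value. A quick way to sanity-check the patterns is to run them locally against a sample log line. The line below is a hypothetical example in the style of Keras progress output, not a line taken from an actual training job.

```python
import re

metrics_definitions = [
    {"Name": "train:loss", "Regex": "loss: ([0-9\\.]+)"},
    {"Name": "train:accuracy", "Regex": "accuracy: ([0-9\\.]+)"},
    {"Name": "validation:loss", "Regex": "val_loss: ([0-9\\.]+)"},
    {"Name": "validation:accuracy", "Regex": "val_accuracy: ([0-9\\.]+)"},
]

# Hypothetical log line, used only to exercise the patterns.
sample_log_line = (
    "100/100 - 520s - loss: 1.4321 - accuracy: 0.3843 "
    "- val_loss: 1.3900 - val_accuracy: 0.4012"
)

for definition in metrics_definitions:
    # re.search returns the first match in the line, if any.
    match = re.search(definition["Regex"], sample_log_line)
    value = float(match.group(1)) if match else None
    print(definition["Name"], "->", value)
```

Note that the unanchored pattern `loss: ` also occurs inside `val_loss: `, so this local check relies on the training fields appearing before the validation fields in the line.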

In [37]:
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="tf_bert_reviews.py",
    source_dir="src",
    role=role,
    instance_count=train_instance_count,  # Make sure you have at least this number of input files or the ShardedByS3Key distribution strategy will fail the job due to no data available
    instance_type=train_instance_type,
    volume_size=train_volume_size,
    py_version="py37",
    framework_version="2.3.1",
    hyperparameters={
        "epochs": epochs,
        "epsilon": epsilon,
        "validation_batch_size": validation_batch_size,
        "test_batch_size": test_batch_size,
        "train_steps_per_epoch": train_steps_per_epoch,
        "validation_steps": validation_steps,
        "test_steps": test_steps,
        "use_xla": use_xla,
        "use_amp": use_amp,
        "max_seq_length": max_seq_length,
        "enable_sagemaker_debugger": enable_sagemaker_debugger,
        "enable_checkpointing": enable_checkpointing,
        "enable_tensorboard": enable_tensorboard,
        "run_validation": run_validation,
        "run_test": run_test,
        "run_sample_predictions": run_sample_predictions,
    },
    input_mode=input_mode,
    metric_definitions=metrics_definitions,
    # max_run=7200,  # 2 hours * 60 minutes per hour * 60 seconds per minute
)

# Setup Warm Start Config
We configure `WarmStartConfig` using one or more previous hyper-parameter tuning jobs, called the `parent` jobs, together with a `WarmStartType`. Each parent must have finished in one of the following states: `Completed`, `Stopped`, or `Failed`.

`WarmStartType` is one of the following strategies:

* `IDENTICAL_DATA_AND_ALGORITHM` uses the same input data and algorithm as the parent tuning jobs, but lets a practitioner explore more hyper-parameter range values. Upon completion, a tuning job with this strategy returns an additional field, `OverallBestTrainingJob`, containing the best model candidate across this tuning job and the completed parent tuning jobs.
* `TRANSFER_LEARNING` lets you transfer the knowledge from previous tuning jobs. You can use a different input dataset and algorithm, in addition to everything the `IDENTICAL_DATA_AND_ALGORITHM` strategy allows.

_Note:  Recursive parent-child relationships are not supported._
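To make the constraints above concrete, here is a minimal sketch that validates a set of parents and builds a plain dictionary in the shape we understand the low-level `create_hyper_parameter_tuning_job` request to expect for its `WarmStartConfig` field. The helper name `build_warm_start_config` is hypothetical, and the field names and type strings should be treated as assumptions to verify against the API reference.

```python
from typing import Dict, Iterable

MAX_PARENTS = 5  # limit stated in the introduction
# Assumed low-level spellings of the two warm start types.
WARM_START_TYPES = {"IdenticalDataAndAlgorithm", "TransferLearning"}


def build_warm_start_config(parents: Iterable[str], warm_start_type: str) -> Dict[str, object]:
    """Validate the parents and return a WarmStartConfig-style dictionary."""
    unique_parents = sorted(set(parents))
    if not 1 <= len(unique_parents) <= MAX_PARENTS:
        raise ValueError("Expected between 1 and {} parent tuning jobs".format(MAX_PARENTS))
    if warm_start_type not in WARM_START_TYPES:
        raise ValueError("Unknown warm start type: {}".format(warm_start_type))
    return {
        "WarmStartType": warm_start_type,
        "ParentHyperParameterTuningJobs": [
            {"HyperParameterTuningJobName": name} for name in unique_parents
        ],
    }


config = build_warm_start_config({"tensorflow-training-240306-1631"}, "IdenticalDataAndAlgorithm")
print(config)
```

The SageMaker Python SDK's `WarmStartConfig` class used in the next cells handles this for us; the sketch only shows what the configuration amounts to.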

In [38]:
print("Previous Tuning Job Name: {}".format(tuning_job_name))

Previous Tuning Job Name: tensorflow-training-240306-1631


In [39]:
from sagemaker.tuner import WarmStartConfig
from sagemaker.tuner import WarmStartTypes

warm_start_config = WarmStartConfig(
    warm_start_type=WarmStartTypes.IDENTICAL_DATA_AND_ALGORITHM, parents={tuning_job_name}
)

# Setup HyperparameterTuner with Warm Start Config including New Hyper-Parameter Ranges

In [40]:
objective_metric_name = "train:accuracy"

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_type="Maximize",
    objective_metric_name=objective_metric_name,
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=metrics_definitions,
    max_jobs=2,
    max_parallel_jobs=1,
    strategy="Bayesian",
    early_stopping_type="Auto",
    warm_start_config=warm_start_config,
)

# Start Tuning Job

In [41]:
tuner.fit(
    {"train": s3_input_train_data, "validation": s3_input_validation_data, "test": s3_input_test_data},
    include_cls_metadata=False,
    wait=False,
)

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating hyperparameter tuning job with name: tensorflow-training-240306-1651


# If You See an Error, Please Wait for the Hyper-Parameter Tuning Job to Complete from the Previous Notebook

## Check Tuning Job Status

Re-run this cell to track the status.
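As an alternative to re-running the cell by hand, the status check can be wrapped in a small polling loop. The sketch below is one way to do that; the helper name `wait_for_tuning_job`, the polling interval, and the set of terminal statuses are assumptions, and a stand-in function replaces the real client call so the example runs offline.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal statuses; anything else is treated as still running.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_tuning_job(
    describe: Callable[..., Dict[str, Any]],
    job_name: str,
    poll_seconds: float = 30.0,
    max_polls: int = 240,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the describe call until the job reaches a terminal status."""
    for _ in range(max_polls):
        description = describe(HyperParameterTuningJobName=job_name)
        status = description["HyperParameterTuningJobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("Tuning job {} did not finish in time".format(job_name))


# Stand-in for sm.describe_hyper_parameter_tuning_job so the sketch runs offline.
_responses = iter(["InProgress", "InProgress", "Completed"])


def fake_describe(HyperParameterTuningJobName: str) -> Dict[str, Any]:
    return {"HyperParameterTuningJobStatus": next(_responses)}


final_status = wait_for_tuning_job(fake_describe, "tensorflow-training-240306-1651", poll_seconds=0.0)
print(final_status)
```

In the notebook, `sm.describe_hyper_parameter_tuning_job` would be passed as `describe` in place of `fake_describe`.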

In [42]:
from pprint import pprint

tuning_job_name = tuner.latest_tuning_job.job_name

job_description = sm.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuning_job_name)

status = job_description["HyperParameterTuningJobStatus"]

print("\n")
print(status)
print("\n")
pprint(job_description)

if status != "Completed":
    job_count = job_description["TrainingJobStatusCounters"]["Completed"]
    print("Not yet complete, but {} jobs have completed.".format(job_count))

    if job_description.get("BestTrainingJob", None):
        print("Best candidate:")
        pprint(job_description["BestTrainingJob"]["TrainingJobName"])
        pprint(job_description["BestTrainingJob"]["FinalHyperParameterTuningJobObjectiveMetric"])
    else:
        print("No training jobs have reported results yet.")



InProgress


{'ConsumedResources': {'RuntimeInSeconds': 0},
 'CreationTime': datetime.datetime(2024, 3, 6, 16, 51, 50, 138000, tzinfo=tzlocal()),
 'HyperParameterTuningJobArn': 'arn:aws:sagemaker:us-east-1:211125778552:hyper-parameter-tuning-job/tensorflow-training-240306-1651',
 'HyperParameterTuningJobConfig': {'HyperParameterTuningJobObjective': {'MetricName': 'train:accuracy',
                                                                        'Type': 'Maximize'},
                                   'ParameterRanges': {'CategoricalParameterRanges': [{'Name': 'train_batch_size',
                                                                                       'Values': ['"64"',
                                                                                                  '"128"']},
                                                                                      {'Name': 'freeze_bert_layer',
                                                                           

In [43]:
from IPython.display import display, HTML

display(
    HTML(
        '<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/hyper-tuning-jobs/{}">Hyper-Parameter Tuning Job</a></b>'.format(
            region, tuning_job_name
        )
    )
)

# _Please Wait for the ^^ Tuning Job ^^ to Complete Above_

In [44]:
tuner.wait()

.....................................................................................................................................................................................................................................................................................!


# Show the Tuning Job
### _Note: This may fail at first. If it does, please wait about 15-30 seconds and re-run._

In [45]:
from sagemaker.analytics import HyperparameterTuningJobAnalytics

hp_results = HyperparameterTuningJobAnalytics(sagemaker_session=sess, hyperparameter_tuning_job_name=tuning_job_name)

df_results = hp_results.dataframe()
df_results.shape

(2, 9)

In [46]:
df_results.sort_values("FinalObjectiveValue", ascending=False)

Unnamed: 0,freeze_bert_layer,learning_rate,train_batch_size,TrainingJobName,TrainingJobStatus,FinalObjectiveValue,TrainingStartTime,TrainingEndTime,TrainingElapsedTimeSeconds
0,"""False""",0.000189,"""128""",tensorflow-training-240306-1651-002-2601ee0c,Stopped,0.3843,2024-03-06 17:06:45+00:00,2024-03-06 17:15:28+00:00,523.0
1,"""False""",0.000342,"""64""",tensorflow-training-240306-1651-001-3233e475,Completed,0.2959,2024-03-06 16:52:42+00:00,2024-03-06 17:04:25+00:00,703.0


# Show the Overall Best Candidate

In [47]:
df_results.sort_values("FinalObjectiveValue", ascending=False).head(1)

Unnamed: 0,freeze_bert_layer,learning_rate,train_batch_size,TrainingJobName,TrainingJobStatus,FinalObjectiveValue,TrainingStartTime,TrainingEndTime,TrainingElapsedTimeSeconds
0,"""False""",0.000189,"""128""",tensorflow-training-240306-1651-002-2601ee0c,Stopped,0.3843,2024-03-06 17:06:45+00:00,2024-03-06 17:15:28+00:00,523.0


In [48]:
best_candidate_tuning_job_name = df_results.sort_values("FinalObjectiveValue", ascending=0).head(1)["TrainingJobName"]
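Note that selecting a column from `head(1)` yields a one-row pandas `Series`, not a plain string. If a later step needs the job name as a string, one option is to take the first row with `.iloc[0]`. The sketch below rebuilds a two-row frame from the results shown above purely for illustration; `df_example`, `best_name`, and `best_value` are hypothetical names.

```python
import pandas as pd

# Two rows copied from the tuning results above, for illustration only.
df_example = pd.DataFrame(
    {
        "TrainingJobName": [
            "tensorflow-training-240306-1651-002-2601ee0c",
            "tensorflow-training-240306-1651-001-3233e475",
        ],
        "FinalObjectiveValue": [0.3843, 0.2959],
    }
)

# .iloc[0] selects the top row after sorting; indexing it gives scalar values.
best_row = df_example.sort_values("FinalObjectiveValue", ascending=False).iloc[0]
best_name = str(best_row["TrainingJobName"])
best_value = float(best_row["FinalObjectiveValue"])

print(type(best_name).__name__, best_name, best_value)
```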

# Log the Best Hyper-Parameter and Objective Metric in the Experiment

Log the `learning_rate` parameter and the `accuracy` metric of the best candidate.

In [49]:
best_learning_rate = df_results.sort_values("FinalObjectiveValue", ascending=False).head(1)["learning_rate"]
print(best_learning_rate)

0    0.000189
Name: learning_rate, dtype: float64


In [50]:
best_accuracy = df_results.sort_values("FinalObjectiveValue", ascending=False).head(1)["FinalObjectiveValue"]
print(best_accuracy)

0    0.3843
Name: FinalObjectiveValue, dtype: float64


In [51]:
# best_learning_rate is a one-row Series, so take its first element explicitly
tracker_optimize.log_parameters({"learning_rate": float(best_learning_rate.iloc[0])})

# must save after logging
tracker_optimize.trial_component.save()

TrialComponent(sagemaker_boto_client=<botocore.client.SageMaker object at 0x7fcce036ddd0>,trial_component_name='TrialComponent-2024-03-06-165147-drsw',display_name='optimize-2',tags=None,trial_component_arn='arn:aws:sagemaker:us-east-1:211125778552:experiment-trial-component/TrialComponent-2024-03-06-165147-drsw',response_metadata={'RequestId': '9383cbed-9861-445e-b0ff-c0fcf48d2632', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '9383cbed-9861-445e-b0ff-c0fcf48d2632', 'content-type': 'application/x-amz-json-1.1', 'content-length': '129', 'date': 'Wed, 06 Mar 2024 17:15:32 GMT'}, 'RetryAttempts': 0},parameters={'learning_rate': 0.00018922006661625286, 'train_batch_size': <sagemaker.parameter.CategoricalParameter object at 0x7fccda88fb90>, 'freeze_bert_layer': <sagemaker.parameter.CategoricalParameter object at 0x7fccda88fb50>},input_artifacts={},output_artifacts={})

In [52]:
# best_accuracy is a one-row Series, so take its first element explicitly
tracker_optimize.log_metric("accuracy", float(best_accuracy.iloc[0]))

tracker_optimize.trial_component.save()

TrialComponent(sagemaker_boto_client=<botocore.client.SageMaker object at 0x7fcce036ddd0>,trial_component_name='TrialComponent-2024-03-06-165147-drsw',display_name='optimize-2',tags=None,trial_component_arn='arn:aws:sagemaker:us-east-1:211125778552:experiment-trial-component/TrialComponent-2024-03-06-165147-drsw',response_metadata={'RequestId': '3d206742-c5ff-402d-8e9d-24203cc6ac0d', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '3d206742-c5ff-402d-8e9d-24203cc6ac0d', 'content-type': 'application/x-amz-json-1.1', 'content-length': '129', 'date': 'Wed, 06 Mar 2024 17:15:32 GMT'}, 'RetryAttempts': 0},parameters={'learning_rate': 0.00018922006661625286, 'train_batch_size': <sagemaker.parameter.CategoricalParameter object at 0x7fccda88fb90>, 'freeze_bert_layer': <sagemaker.parameter.CategoricalParameter object at 0x7fccda88fb50>},input_artifacts={},output_artifacts={})

## _Ignore any ^^ ERROR above ^^. This is OK._

# Show Experiment Analytics

In [53]:
from sagemaker.analytics import ExperimentAnalytics

lineage_table = ExperimentAnalytics(
    sagemaker_session=sess,
    experiment_name=experiment_name,
    metric_names=["validation:accuracy"],
    sort_by="CreationTime",
    sort_order="Descending",
)

lineage_df = lineage_table.dataframe()
lineage_df.shape

(6, 71)

In [54]:
lineage_df

Unnamed: 0,TrialComponentName,DisplayName,freeze_bert_layer,learning_rate,train_batch_size,Trials,Experiments,SourceArn,SageMaker.InstanceCount,SageMaker.InstanceType,...,SageMaker.ModelArtifact - Value,AWS_DEFAULT_REGION,raw-input-data - MediaType,raw-input-data - Value,bert-test - MediaType,bert-test - Value,bert-train - MediaType,bert-train - Value,bert-validation - MediaType,bert-validation - Value
0,TrialComponent-2024-03-06-165147-drsw,optimize-2,<sagemaker.parameter.CategoricalParameter obje...,<sagemaker.parameter.ContinuousParameter objec...,<sagemaker.parameter.CategoricalParameter obje...,[trial-1709436495],[Amazon-Customer-Reviews-BERT-Experiment-17094...,,,,...,,,,,,,,,,
1,TrialComponent-2024-03-06-163102-dwoe,optimize-1,,0.000015,,[trial-1709436495],[Amazon-Customer-Reviews-BERT-Experiment-17094...,,,,...,,,,,,,,,,
2,sagemaker-scikit-learn-2024-03-06-15-39-19-291...,evaluate,,,,[trial-1709436495],[Amazon-Customer-Reviews-BERT-Experiment-17094...,arn:aws:sagemaker:us-east-1:211125778552:proce...,1.0,ml.m5.xlarge,...,,,,,,,,,,
3,tensorflow-training-2024-03-03-04-02-13-539-aw...,train,false,0.00001,128.0,[trial-1709436495],[Amazon-Customer-Reviews-BERT-Experiment-17094...,arn:aws:sagemaker:us-east-1:211125778552:train...,1.0,ml.c4.2xlarge,...,s3://sagemaker-us-east-1-211125778552/tensorfl...,,,,,,,,,
4,sagemaker-scikit-learn-2024-03-03-03-46-55-548...,prepare,,,,[trial-1709436495],[Amazon-Customer-Reviews-BERT-Experiment-17094...,arn:aws:sagemaker:us-east-1:211125778552:proce...,2.0,ml.c5.2xlarge,...,,us-east-1,,s3://sagemaker-us-east-1-211125778552/amazon-r...,,s3://sagemaker-us-east-1-211125778552/sagemake...,,s3://sagemaker-us-east-1-211125778552/sagemake...,,s3://sagemaker-us-east-1-211125778552/sagemake...
5,sagemaker-scikit-learn-2024-03-03-03-28-16-715...,prepare,,,,[trial-1709436495],[Amazon-Customer-Reviews-BERT-Experiment-17094...,arn:aws:sagemaker:us-east-1:211125778552:proce...,2.0,ml.t3.medium,...,,us-east-1,,s3://sagemaker-us-east-1-211125778552/amazon-r...,,s3://sagemaker-us-east-1-211125778552/sagemake...,,s3://sagemaker-us-east-1-211125778552/sagemake...,,s3://sagemaker-us-east-1-211125778552/sagemake...


# Pass `best_candidate_tuning_job_name` to the Next Notebook

In [55]:
print(best_candidate_tuning_job_name)

0    tensorflow-training-240306-1651-002-2601ee0c
Name: TrainingJobName, dtype: object


In [56]:
%store best_candidate_tuning_job_name

Stored 'best_candidate_tuning_job_name' (Series)


In [57]:
%store

Stored variables and their in-db values:
autopilot_endpoint_arn                                -> 'arn:aws:sagemaker:us-east-1:211125778552:endpoint
autopilot_model_arn                                   -> 'arn:aws:sagemaker:us-east-1:211125778552:model/au
autopilot_train_s3_uri                                -> 's3://sagemaker-us-east-1-211125778552/data/amazon
balance_dataset                                       -> True
balanced_bias_data_jsonlines_s3_uri                   -> 's3://sagemaker-us-east-1-211125778552/bias-detect
balanced_bias_data_s3_uri                             -> 's3://sagemaker-us-east-1-211125778552/bias-detect
best_candidate_tuning_job_name                        -> 0    tensorflow-training-240306-1651-002-2601ee0c

bias_data_s3_uri                                      -> 's3://sagemaker-us-east-1-211125778552/bias-detect
comprehend_endpoint_arn                               -> 'arn:aws:comprehend:us-east-1:211125778552:documen
comprehend_train_s3_uri          

# Release Resources

In [58]:
%%html

<p><b>Shutting down your kernel for this notebook to release resources.</b></p>
<button class="sm-command-button" data-commandlinker-command="kernelmenu:shutdown" style="display:none;">Shutdown Kernel</button>
        
<script>
try {
    els = document.getElementsByClassName("sm-command-button");
    els[0].click();
}
catch(err) {
    // NoOp
}    
</script>

In [59]:
%%javascript

try {
    Jupyter.notebook.save_checkpoint();
    Jupyter.notebook.session.delete();
}
catch(err) {
    // NoOp
}

<IPython.core.display.Javascript object>