In this notebook, I will be preparing the training code for SageMaker.

This section creates the `HuggingFace` estimator through which training is launched. The hyperparameters are passed to the `HuggingFace()` constructor, which forwards them to the training script, `script.py`, given as the `entry_point`.

In [1]:
import sagemaker
import transformers
import torch
from sagemaker import get_execution_role
from sagemaker.huggingface import HuggingFace

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


In [2]:
# 1. Get the SageMaker execution role - the IAM role whose permissions the training job uses to access the AWS resources it needs (e.g. S3)
role = get_execution_role()

In [3]:
role

'arn:aws:iam::041434534908:role/service-role/AmazonSageMaker-ExecutionRole-20250111T113739'

In [4]:
# Define the checkpoint configuration so that training can resume from the last saved state if it is interrupted (currently disabled)
#checkpoint_config = {
#    'LocalPath': '/opt/ml/checkpoints',  # Container path where checkpoints will be saved
#    'S3Uri': 's3://tk5-huggingface-multiclass-textclassification-bucket/checkpoints/'  # S3 path to store checkpoints
#}
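
If checkpointing is re-enabled, note that the SageMaker Python SDK estimators take the checkpoint locations as the `checkpoint_s3_uri` and `checkpoint_local_path` keyword arguments, rather than as a single `checkpoint_config` dict. The following is a minimal sketch, assuming the same paths as the commented-out cell above. The `checkpoint_kwargs` helper is hypothetical and only illustrates the mapping.

```python
def checkpoint_kwargs(checkpoint_config):
    """Map a CheckpointConfig-style dict to estimator keyword arguments.

    Hypothetical helper: it only renames the keys, it does not call SageMaker.
    """
    return {
        "checkpoint_s3_uri": checkpoint_config["S3Uri"],
        # Fall back to the container path used in the cell above if none is given
        "checkpoint_local_path": checkpoint_config.get("LocalPath", "/opt/ml/checkpoints"),
    }


checkpoint_config = {
    "LocalPath": "/opt/ml/checkpoints",  # Container path where checkpoints are written
    "S3Uri": "s3://tk5-huggingface-multiclass-textclassification-bucket/checkpoints/",  # S3 path they are synced to
}
extra_kwargs = checkpoint_kwargs(checkpoint_config)
# These could then be passed along as HuggingFace(..., **extra_kwargs)
```

For resuming to work, `script.py` would also have to write its checkpoints to the local path and load the latest one on start-up. That part is not shown here.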

In [5]:
# 2. Create a Hugging Face estimator - it points to the training script (entry_point) that we will run later
huggingface_estimator = HuggingFace(
    py_version='py39',
    entry_point='script.py',
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    source_dir='./',
    instance_count=1,
    instance_type='ml.g4dn.xlarge',
    output_path='s3://tk5-huggingface-multiclass-textclassification-bucket/output/tk5-generated-output/',
    hyperparameters={
        'max_length':512,
        'epochs':2,
        'train_batch_size':4,
        'test_batch_size':2,
        'learning_rate':1e-05
    },
    enable_sagemaker_metrics=True,
    role=role
)
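
For context, SageMaker passes each entry of the estimator's `hyperparameters` dict to the entry point as a `--name value` command-line argument. The following is a minimal sketch of how `script.py` might read them. The `parse_hyperparameters` helper is hypothetical, and its defaults simply mirror the values used in the estimator above.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse hyperparameters forwarded as --name value command-line pairs.

    Hypothetical helper for script.py; defaults mirror the estimator cell.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--max_length", type=int, default=512)
    parser.add_argument("--epochs", type=int, default=2)
    parser.add_argument("--train_batch_size", type=int, default=4)
    parser.add_argument("--test_batch_size", type=int, default=2)
    parser.add_argument("--learning_rate", type=float, default=1e-5)
    # parse_known_args tolerates arguments that this sketch does not declare
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the arguments the training container would pass to script.py
args = parse_hyperparameters(["--epochs", "3", "--learning_rate", "2e-05"])
print(vars(args))
```

Giving every argument an explicit `type` matters here, because the values arrive as strings on the command line.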

In [6]:
# The line below starts training. script.py must exist first: it specifies the model, Dataset(s), DataLoader(s), training loop, etc.
huggingface_estimator.fit()

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: huggingface-pytorch-training-2025-01-19-06-52-30-974


2025-01-19 06:54:20 Starting - Starting the training job...
2025-01-19 06:54:33 Starting - Preparing the instances for training...
2025-01-19 06:55:21 Downloading - Downloading the training image..................
2025-01-19 06:58:13 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
  "cipher": algorithms.TripleDES,
  "class": algorithms.TripleDES,
2025-01-19 06:58:25,524 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2025-01-19 06:58:25,545 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2025-01-19 06:58:25,558 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2025-01-19 06:58:25,562 sagemaker_pytorch_container.training INFO     Invoking user training script.
2025-01-19

KeyboardInterrupt: 