## Project Description

Imagine you're developing a deep learning system for sentiment analysis of product reviews, built for a newly established online beauty product retailer. The goal is to help the company make informed inventory decisions: which products to keep and which to remove from stock. Keen to improve customer satisfaction, the company has been actively monitoring comments on its website and has paid annotators to label their sentiment. They hand you a dataset of 80,000 customer reviews, each labeled 0 for negative sentiment or 1 for positive sentiment. After extensive effort and refinement, you train and deploy a classifier that predicts sentiment from online comments. Excitedly, you report 86% accuracy on a held-out test set to your bosses. To your disappointment, management is not satisfied and insists on at least 90% accuracy before considering a wider rollout of the model.
You suspect that some annotators made labeling errors that are hurting your model's performance. Empowered by a newfound "confidence," you opt for confident learning to pinpoint and correct the mislabeled examples in the dataset before retraining.
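
To make the idea concrete, here is a minimal sketch of the thresholding step behind confident learning. It is a simplification under stated assumptions, not the method used in this project: the function name `find_likely_label_errors` and the toy numbers are made up for illustration, and the probabilities are assumed to come from a classifier evaluated on data it was not trained on (for example via cross-validation).

```python
from collections import defaultdict


def find_likely_label_errors(labels, probs):
    """Flag examples whose given label disagrees with a confident prediction.

    labels: list of int class ids (0 = negative, 1 = positive).
    probs:  list of per-class probability lists, one per example.
    Returns the indices of examples that look mislabeled.
    """
    num_classes = len(probs[0])

    # Per-class threshold: mean predicted probability of class k over the
    # examples that the annotators labeled as k.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y, p in zip(labels, probs):
        sums[y] += p[y]
        counts[y] += 1
    thresholds = [sums[k] / counts[k] if counts[k] else 1.0
                  for k in range(num_classes)]

    suspects = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        # Classes whose probability clears that class's own threshold.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class
        best = max(confident, key=lambda k: p[k])
        if best != y:
            suspects.append(i)
    return suspects


# Toy data (made up): example 3 is labeled negative, but the model
# assigns it a high probability of being positive.
labels = [0, 0, 1, 0, 1]
probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
suspects = find_likely_label_errors(labels, probs)
print(suspects)
```

The flagged indices are candidates for review or relabeling, not proof of an annotation error; a full confident learning pipeline also estimates the joint distribution of given and true labels before pruning.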

First, we prepare the environment for AWS SageMaker operations by setting up clients and retrieving essential configuration details like the default S3 bucket, execution role, and AWS region. 

In [1]:
import sagemaker
import logging
import boto3
import pandas as pd
import json
import botocore
from botocore.exceptions import ClientError

config = botocore.config.Config(user_agent_extra='dlai-pds/c2/w3')

# low-level service client of the boto3 session
sm = boto3.client(service_name='sagemaker', 
                  config=config)

sm_runtime = boto3.client('sagemaker-runtime',
                          config=config)

sess = sagemaker.Session(sagemaker_client=sm,
                         sagemaker_runtime_client=sm_runtime)

bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = sess.boto_region_name

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


In [2]:
role

'arn:aws:iam::641943045730:role/service-role/AmazonSageMaker-ExecutionRole-20240423T153327'

In [3]:
bucket

'sagemaker-us-east-1-641943045730'

We then configure the data source for a training job in SageMaker, defining where the training data is located (in this case, an S3 bucket) and the nature of the data.

In [4]:
from sagemaker.inputs import TrainingInput

# TODO: set the path to the train data
train_data = TrainingInput(
    ..., 
    content_type='application/x-sagemaker-training-data'
)


In [5]:
from sagemaker.inputs import TrainingInput

# Content type used for every input channel
content_type = 'application/x-sagemaker-training-data'

# Set the S3 path to the training data (bucket is the default bucket from above)
train_data_s3_path = f"s3://{bucket}/data/train.csv"

# Configure the input for the training job
train_data = TrainingInput(
    s3_data=train_data_s3_path,
    content_type=content_type
)

# Similarly, for validation (dev) and test data, if needed
validation_data_s3_path = f"s3://{bucket}/data/dev.csv"
validation_data = TrainingInput(
    s3_data=validation_data_s3_path,
    content_type=content_type
)

test_data_s3_path = f"s3://{bucket}/data/test.csv"
test_data = TrainingInput(
    s3_data=test_data_s3_path,
    content_type=content_type
)


In [8]:
import os
source_path = os.getcwd()

Next, we create a PyTorch estimator with the configuration for the SageMaker training job. The job uses the provided entry point script, runs on the specified instance type, and writes the trained model to the specified S3 path. The entry point script `main.py` contains the main steps that need to be completed in this project.

In [9]:
from sagemaker.pytorch import PyTorch
s3_output_path = f"s3://{bucket}/outputs/"

estimator = PyTorch(
    entry_point="main.py",
    source_dir=source_path,
    base_job_name="sagemaker-script-mode",
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="2.1",
    py_version="py310",
    dependencies=[],
    output_path=s3_output_path,
    environment={'PYTHONPATH': 'src'}
)
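
The contents of `main.py` are not shown in this notebook. As a rough sketch of how a script-mode entry point typically locates its inputs and outputs, the snippet below reads the environment variables that SageMaker sets inside the training container (`SM_MODEL_DIR`, `SM_OUTPUT_DATA_DIR`, and one `SM_CHANNEL_<NAME>` per input channel). The function name `parse_args`, the local fallback paths, and the `--epochs` argument are assumptions for illustration, not the project's actual code.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the paths SageMaker exposes to a script-mode entry point."""
    parser = argparse.ArgumentParser()

    # Files written to SM_MODEL_DIR are packaged as the model artifact
    # when the training job ends.
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "./model"))
    # Files written to SM_OUTPUT_DATA_DIR end up in output.tar.gz.
    parser.add_argument("--output-data-dir",
                        default=os.environ.get("SM_OUTPUT_DATA_DIR", "./output"))

    # Each channel name passed to estimator.fit() becomes SM_CHANNEL_<NAME>
    # and points at the local directory holding that channel's data.
    parser.add_argument("--train",
                        default=os.environ.get("SM_CHANNEL_TRAIN", "./data"))
    parser.add_argument("--validation",
                        default=os.environ.get("SM_CHANNEL_VALIDATION", "./data"))
    parser.add_argument("--test",
                        default=os.environ.get("SM_CHANNEL_TEST", "./data"))

    # Hypothetical hyperparameter, shown only to illustrate the pattern.
    parser.add_argument("--epochs", type=int, default=3)

    # Ignore arguments this sketch does not know about.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    args = parse_args()
    print(f"train data: {args.train}, model dir: {args.model_dir}")
```

Falling back to local paths when the variables are missing lets the same script run on a laptop for a quick smoke test before it is submitted as a training job.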




In [10]:
os.environ

environ{'SHELL': '/bin/bash',
        'SUPERVISOR_GROUP_NAME': 'jupyterlabserver',
        'MAMBA_USER_ID': '57439',
        'SAGEMAKER_SPACE_NAME': 'project1',
        'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/_sagemaker-instance-credentials/cbb50b051dcf7dd7f8f4f52886c2297d01d7ae8f8e97c8b2b4d02fa67b281e0a',
        'ENV_NAME': 'base',
        'MAMBA_USER': 'sagemaker-user',
        'SUPERVISOR_SERVER_URL': 'unix:///var/run/supervisord/supervisor.sock',
        'HOSTNAME': 'default',
        'SAGEMAKER_APP_TYPE_LOWERCASE': 'jupyterlab',
        'SAGEMAKER_LOG_FILE': '/var/log/studio/jupyterlab.log',
        'AWS_DEFAULT_REGION': 'us-east-1',
        'XML_CATALOG_FILES': 'file:///opt/conda/etc/xml/catalog file:///etc/xml/catalog',
        'EDITOR': 'nano',
        'AWS_REGION': 'us-east-1',
        'PWD': '/home/sagemaker-user',
        'GSETTINGS_SCHEMA_DIR': '/opt/conda/share/glib-2.0/schemas',
        'CONDA_PREFIX': '/opt/conda',
        'REGION_NAME': 'us-east-1',
        'MAMBA_

The following cell defines a `ModelCheckpoint` configuration that asks the training script to keep only the best model, judged by the lowest development loss (`dev_loss`), and passes it to the estimator as a hyperparameter. The best model is to be stored under the specified path.

In [11]:
# Save the best model during training by specifying the output path
# (Note: The output path should be where the best model will be saved within the S3 bucket)

# model_dir = os.environ.get('SM_MODEL_DIR', './')

# model_checkpoint = {
#     'ModelCheckpoint': {
#         'monitor': 'dev_loss',
#         'dirpath': model_dir,
#         'filename': 'best_model',
#         'save_top_k': 1,
#         'mode': 'min'
#     }
# }


model_checkpoint = {
    'ModelCheckpoint': {
        'monitor': 'dev_loss',
        'dirpath': 's3://sagemaker-us-east-1-641943045730/output/',
        'filename': 'best_model',
        'save_top_k': 1,
        'mode': 'min'
    }
}

# Attach the ModelCheckpoint callback to the estimator
estimator._hyperparameters['callbacks'] = [model_checkpoint]
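
For intuition about what `monitor='dev_loss'`, `mode='min'`, and `save_top_k=1` ask for, here is a framework-free sketch of "keep only the best checkpoint" logic. It is an illustration under assumptions rather than the callback used by the training script: the class name `BestCheckpoint` is hypothetical, the state is written as JSON instead of a serialized model, and the directory is assumed to be a local path such as the one named by `SM_MODEL_DIR`.

```python
import json
import os
import tempfile


class BestCheckpoint:
    """Keep a single checkpoint file for the best monitored value seen."""

    def __init__(self, dirpath, filename="best_model", mode="min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        os.makedirs(dirpath, exist_ok=True)
        self.path = os.path.join(dirpath, filename + ".json")
        self.mode = mode
        self.best = None

    def _improved(self, value):
        if self.best is None:
            return True
        return value < self.best if self.mode == "min" else value > self.best

    def update(self, value, state):
        """Overwrite the checkpoint if `value` improves; return True if saved."""
        if not self._improved(value):
            return False
        self.best = value
        with open(self.path, "w") as f:
            json.dump(state, f)
        return True


# Usage with made-up dev losses for three epochs.
with tempfile.TemporaryDirectory() as tmp:
    ckpt = BestCheckpoint(tmp)
    saved = [ckpt.update(loss, {"epoch": epoch, "dev_loss": loss})
             for epoch, loss in enumerate([0.62, 0.48, 0.51])]
    with open(ckpt.path) as f:
        best_state = json.load(f)

print(saved, best_state)
```

In a real training loop, `state` would be replaced by the model weights (for example, saving a PyTorch `state_dict`), and `update` would be called once per validation pass.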


Starting the training process: 

In [None]:
bucket = 'sagemaker-us-east-1-641943045730'
prefix = 'data'  # Assuming all data files are under this prefix in your bucket

s3_input = {
    'train': f's3://{bucket}/{prefix}/',
    'validation': f's3://{bucket}/{prefix}/',
    'test': f's3://{bucket}/{prefix}/'
}

estimator.fit(s3_input)


INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: sagemaker-script-mode-2024-05-07-11-22-59-576


2024-05-07 11:23:08 Starting - Starting the training job...
2024-05-07 11:23:08 Pending - Training job waiting for capacity...
2024-05-07 11:24:04 Pending - Preparing the instances for training......
2024-05-07 11:24:40 Downloading - Downloading input data...
2024-05-07 11:25:10 Downloading - Downloading the training image...............
2024-05-07 11:27:56 Training - Training image download completed. Training in progress...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2024-05-07 11:28:14,131 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2024-05-07 11:28:14,149 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-05-07 11:28:14,161 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2024-05-07 11:28:14,163 sagemaker_pytorch_container.training INFO   

## Model Deployment

We need to copy the training artifacts, i.e., `output.tar.gz`, from the corresponding S3 bucket to the current working directory.

In [46]:
import os
cwd = os.getcwd()

In [67]:
!aws s3 cp s3://sagemaker-us-east-1-641943045730/output/sagemaker-script-mode-2024-04-24-18-04-37-309/output/output.tar.gz .


download: s3://sagemaker-us-east-1-641943045730/output/sagemaker-script-mode-2024-04-24-18-04-37-309/output/output.tar.gz to ./output.tar.gz
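
The S3 path in the command above follows the pattern `<output path>/<training job name>/output/output.tar.gz`. As a small sketch, the path can be assembled from the job name instead of being typed by hand. The helper name `training_output_uri` is hypothetical, and the pattern is assumed from the single path shown above rather than verified for every job configuration.

```python
def training_output_uri(output_path, job_name, artifact="output.tar.gz"):
    """Build the S3 URI of a training job artifact from its parts."""
    # Strip a trailing slash so the joined URI has no empty path segment.
    return f"{output_path.rstrip('/')}/{job_name}/output/{artifact}"


uri = training_output_uri(
    "s3://sagemaker-us-east-1-641943045730/output/",
    "sagemaker-script-mode-2024-04-24-18-04-37-309",
)
print(uri)
```

In the notebook, the job name could come from `estimator.latest_training_job.name` rather than a string literal.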


We can decompress the training artifacts into `extracted_training_artifacts` for further exploration.

In [68]:
!mkdir -p extracted_training_artifacts && tar -xzf output.tar.gz -C extracted_training_artifacts


tar: Ignoring unknown extended header keyword 'LIBARCHIVE.creationtime'
tar: Ignoring unknown extended header keyword 'LIBARCHIVE.creationtime'
tar: Ignoring unknown extended header keyword 'LIBARCHIVE.creationtime'
tar: Ignoring unknown extended header keyword 'LIBARCHIVE.creationtime'
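
The same extraction can be done from Python with the standard library `tarfile` module, which is convenient when the step has to run outside a notebook. This is a minimal sketch: the helper name `extract_artifacts` is made up, and the demo builds a tiny throwaway archive with a placeholder `metrics.json` so that it runs without access to S3.

```python
import io
import os
import tarfile
import tempfile


def extract_artifacts(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject unsafe members (absolute
            # paths, links pointing outside dest_dir) during extraction.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return names


# Demo on a throwaway archive with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "output.tar.gz")
    payload = b'{"dev_loss": 0.48}'
    with tarfile.open(archive, "w:gz") as tar:
        info = tarfile.TarInfo(name="metrics.json")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    dest = os.path.join(tmp, "extracted_training_artifacts")
    members = extract_artifacts(archive, dest)
    extracted_ok = os.path.exists(os.path.join(dest, "metrics.json"))

print(members, extracted_ok)
```

Unlike the shell command, the helper creates the destination directory if it does not exist yet.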


In [73]:
estimator.latest_training_job.name

'sagemaker-script-mode-2024-04-24-18-04-37-309'

In [82]:
predictor = estimator.deploy(
    initial_instance_count=1,  
    instance_type='ml.p3.2xlarge',  
    endpoint_name='sentiment-analysis-endpoint-4'  
)