# Breed classification

This notebook lists all the steps that you need to complete this project. You will need to complete all the TODOs in this notebook as well as in the README and the two Python scripts included with the starter code.


**TODO**: Give a helpful introduction to what this notebook is for. Remember that comments, explanations and good documentation make your project informative and professional.

**Note:** This notebook contains code and markdown cells with TODOs that you have to complete. They are guidelines to help you finish your project while meeting the requirements in the project rubric. Feel free to change the order of the TODOs and to use more than one code cell per TODO.

In [1]:
# TODO: Install any packages that you might need
# For instance, you will need the smdebug package
!pip install smdebug


In [2]:
!pip install torchvision torch --no-cache-dir


In [3]:
# Standard library imports
import os
import csv
import io
from pprint import pprint

# Third-party imports
import numpy as np
import pandas as pd
from PIL import Image
import boto3
from botocore.exceptions import NoCredentialsError

# SageMaker imports
import sagemaker
from sagemaker.pytorch import PyTorch, PyTorchModel
from sagemaker.tuner import HyperparameterTuner, CategoricalParameter, ContinuousParameter, IntegerParameter
from sagemaker.inputs import TrainingInput
from sagemaker.debugger import (
    DebuggerHookConfig,
    Rule,
    ProfilerRule,
    rule_configs,
    TensorBoardOutputConfig,
    ProfilerConfig,
    FrameworkProfile
)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


## Dataset

For this project, we are using the Udacity Dog Breed Identification dataset. This dataset contains images of dogs categorized into different breeds and is used for training machine learning models to identify dog breeds from images.
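The cells that follow rely on the images being laid out as `dogImages/<split>/<breed folder>/<image>`, with `train`, `valid` and `test` as the splits. As a minimal sketch of how one relative path breaks down into those three parts, the snippet below uses `parse_image_path`, a hypothetical helper that is not part of the starter code; the example path is only illustrative.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class ImageRecord(NamedTuple):
    split: str  # e.g. "train", "valid" or "test"
    breed: str  # breed folder name
    image: str  # image file name


def parse_image_path(relative_path: str) -> Optional[ImageRecord]:
    """Split '<split>/<breed>/<image>' into its parts; return None for any other depth."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 3:
        return None
    return ImageRecord(*parts)


# Illustrative path only; actual file names depend on the downloaded archive
record = parse_image_path("train/001.Affenpinscher/Affenpinscher_00001.jpg")
print(record)
```

Paths with a different depth (for example a stray file at the top level) return `None`, so they can be skipped when building the CSV summary.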

In [5]:
%%capture
# Command to download and unzip data
#!wget https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip
#!unzip dogImages.zip

In [26]:
# Initialize S3 client
s3 = boto3.client('s3')

In [27]:
# Define the default bucket
default_bucket = sagemaker.Session().default_bucket()
print(default_bucket)

sagemaker-us-east-1-482545180177


In [8]:
# Define the local data directory
data_directory = 'dogImages'
# List to store the data for the CSV file
data_for_csv = []

In [None]:
# Function to upload a file to S3
def upload_to_aws(local_file, bucket, s3_file):
    try:
        s3.upload_file(local_file, bucket, s3_file)
        print(f"Upload Successful: {s3_file}")
        return True
    except FileNotFoundError:
        print("The file was not found")
        return False
    except NoCredentialsError:
        print("Credentials not available")
        return False

In [16]:
%%capture
# Walk through the data directory and upload each file
for root, dirs, files in os.walk(data_directory):
    for file in files:
        local_file = os.path.join(root, file)
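        # Prefix the key with the data directory so objects land under
        # s3://<bucket>/dogImages/..., which is where the training channels point
        s3_file = '/'.join([data_directory] + os.path.relpath(local_file, data_directory).split(os.sep))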
        upload_to_aws(local_file, default_bucket, s3_file)

        # Extract type, breed, and dog_image_name from the file path
        parts = os.path.relpath(local_file, data_directory).split(os.sep)
        if len(parts) == 3:  # Ensure the path has exactly 3 parts: type/breed/image
            type_, breed, dog_image_name = parts
            data_for_csv.append({
                'type': type_,
                'breed': breed,
                'dog_image_name': dog_image_name
            })


In [20]:
# Write the data to a CSV file
csv_file = 'dog_images_info.csv'
csv_columns = ['type', 'breed', 'dog_image_name']

with open(csv_file, mode='w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=csv_columns)
    writer.writeheader()
    for row in data_for_csv:
        writer.writerow(row)

print(f"CSV file '{csv_file}' created successfully.")

In [24]:
# Read the CSV file
csv_file = 'dog_images_info.csv'
df = pd.read_csv(csv_file)

# Display the first few rows of the dataframe
print("First few rows of the dataframe:")
print(df.head())

# Summary analysis
summary = {
    'total_images': len(df),
    'images_per_type': df['type'].value_counts().to_dict(),
    'images_per_breed': df['breed'].value_counts().to_dict(),
    'types_per_breed': df.groupby('breed')['type'].nunique().to_dict()
}

In [33]:
# Prettify the output
print("\nSummary:")
print(f"Total breeds: {len(summary['images_per_breed'])}\n")

print("Images per type:")
for type_, count in summary['images_per_type'].items():
    print(f"  {type_}: {count}")

print("\nImages per breed:")
for breed, count in summary['images_per_breed'].items():
    print(f"  {breed}: {count}")

## Hyperparameter Tuning
**TODO:** In this part you will fine-tune a pretrained model with hyperparameter tuning. You have to tune at least two hyperparameters, and you are encouraged to tune more. You are also encouraged to explain why you chose those particular hyperparameters and their ranges.

**Note:** You will need to use the `hpo.py` script to perform hyperparameter tuning.
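SageMaker reads the tuning objective from the training job's log output, so `hpo.py` has to print the metric in a form that the tuner's regex can match. The sketch below shows how the pattern `Accuracy: ([0-9\.]+)` used in this notebook's `metric_definitions` captures a value. The log lines are made-up examples, not actual output from `hpo.py`, and `extract_accuracy` is a hypothetical helper.

```python
import re

# Same pattern as the 'Regex' entry in metric_definitions
METRIC_PATTERN = re.compile(r"Accuracy: ([0-9\.]+)")


def extract_accuracy(log_line):
    """Return the captured accuracy as a float, or None if the line does not match."""
    match = METRIC_PATTERN.search(log_line)
    return float(match.group(1)) if match else None


# Hypothetical log lines; the real format depends on what hpo.py prints
print(extract_accuracy("Test set: Average loss: 0.8123, Accuracy: 0.7644"))
print(extract_accuracy("Epoch 3 finished"))
```

If no log line matches the pattern, the tuner has no objective value to compare, so it is worth checking the print statement in the training script against the regex before launching a tuning job.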

In [9]:
# Define hyperparameter ranges
hyperparameter_ranges = {
    'batch_size': CategoricalParameter([32, 64, 128, 256, 512]),
    'lr': ContinuousParameter(0.0001, 0.1),
    'epochs': IntegerParameter(5, 20)
}

objective_metric_name = 'validation:accuracy'
metric_definitions = [{'Name': 'validation:accuracy', 'Regex': 'Accuracy: ([0-9\\.]+)'}]

# Define the PyTorch estimator
estimator = PyTorch(
    entry_point='hpo.py',
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type='ml.m5.xlarge',
    framework_version='1.11.0',
    py_version='py38',
)

# Define hyperparameter tuner
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name,
    hyperparameter_ranges,
    metric_definitions,
    max_jobs=4,
    max_parallel_jobs=2,
    objective_type='Maximize'  # Use 'Maximize' if your metric is accuracy; 'Minimize' otherwise
)

# Define the input data channels
inputs = {'training': f's3://{default_bucket}/dogImages/train',
          'validation': f's3://{default_bucket}/dogImages/valid'}

In [None]:
# Start the tuning job
tuner.fit(inputs)

No finished training job found associated with this estimator. Please make sure this estimator is only used for building workflow config



In [15]:
from pprint import pprint

In [16]:
# TODO: Get the best estimators and the best HPs
best_estimator = tuner.best_estimator()

# Get the hyperparameters of the best trained model
best_hyperparameters = best_estimator.hyperparameters()

# Print the best hyperparameters
print("Best hyperparameters:")
pprint(best_hyperparameters)


2024-08-05 01:58:29 Starting - Preparing the instances for training
2024-08-05 01:58:29 Downloading - Downloading the training image
2024-08-05 01:58:29 Training - Training image download completed. Training in progress.
2024-08-05 01:58:29 Uploading - Uploading generated training model
2024-08-05 01:58:29 Completed - Resource released due to keep alive period expiry
Best hyperparameters:
{'_tuning_objective_metric': '"validation:accuracy"',
 'batch_size': '"512"',
 'epochs': '7',
 'lr': '0.0001628244955280907',
 'sagemaker_container_log_level': '20',
 'sagemaker_estimator_class_name': '"PyTorch"',
 'sagemaker_estimator_module': '"sagemaker.pytorch.estimator"',
 'sagemaker_job_name': '"pytorch-training-2024-08-04-22-23-37-174"',
 'sagemaker_program': '"hpo.py"',
 'sagemaker_region': '"us-east-1"',
 'sagemaker_submit_directory': '"s3://sagemaker-us-east-1-482545180177/pytorch-training-2024-08-04-22-23-37-174/source/sourcedir.tar.gz"'}
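
The values printed above are JSON-encoded strings (note the nested quotes in `'"512"'`), and the dictionary also carries SageMaker's own bookkeeping keys. One possible way to turn it into a plain dictionary before reusing it is sketched below; `clean_hyperparameters` is a hypothetical helper, and the key prefixes it filters are taken from the output shown here rather than from any documented contract.

```python
import json


def clean_hyperparameters(raw):
    """Decode JSON-encoded values and drop SageMaker's internal bookkeeping keys."""
    cleaned = {}
    for key, value in raw.items():
        # Prefixes observed in the printed output above (an assumption, not an API guarantee)
        if key.startswith("sagemaker_") or key.startswith("_tuning"):
            continue
        cleaned[key] = json.loads(value)
    return cleaned


# A subset of the dictionary printed above
raw = {
    "_tuning_objective_metric": '"validation:accuracy"',
    "batch_size": '"512"',
    "epochs": "7",
    "lr": "0.0001628244955280907",
    "sagemaker_program": '"hpo.py"',
}
print(clean_hyperparameters(raw))
```

Because `batch_size` was tuned as a categorical parameter, it decodes to the string `'512'` rather than an integer, so the training script (or the caller) still needs to convert it.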


## Model Profiling and Debugging
**TODO:** Using the best hyperparameters, create and fine-tune a new model.

**Note:** You will need to use the `train_model.py` script to perform model profiling and debugging.

In [24]:
rules = [
    Rule.sagemaker(rule_configs.vanishing_gradient()),
    Rule.sagemaker(rule_configs.overfit()),
    Rule.sagemaker(rule_configs.overtraining()),
    Rule.sagemaker(rule_configs.poor_weight_initialization()),
    ProfilerRule.sagemaker(rule_configs.ProfilerReport()),
]

In [31]:


debugger_hook_config = DebuggerHookConfig(
    hook_parameters={
        "train.save_interval": "1",
        "eval.save_interval": "1"
    }
)

profiler_config = ProfilerConfig(
    system_monitor_interval_millis=500, framework_profile_params=FrameworkProfile(num_steps=1)
)

Framework profiling will be deprecated from tensorflow 2.12 and pytorch 2.0 in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


In [None]:
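# best_estimator.hyperparameters() returns JSON-encoded strings (e.g. '"512"') plus
# internal sagemaker_* keys, so pass only the tuned values, converted to plain types
tuned_hyperparameters = {
    'batch_size': int(best_hyperparameters['batch_size'].strip('"')),
    'lr': float(best_hyperparameters['lr'].strip('"')),
    'epochs': int(best_hyperparameters['epochs'].strip('"')),
}

# Create the estimator with debugging and profiling enabled
estimator = PyTorch(
    entry_point='train_model.py',
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type='ml.m5.xlarge',
    framework_version='1.11.0',
    py_version='py38',
    hyperparameters=tuned_hyperparameters,
    debugger_hook_config=debugger_hook_config,
    profiler_config=profiler_config,
    rules=rules
)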

In [None]:
# Define the input data channels
inputs = {'training': f's3://{default_bucket}/dogImages/train','testing': f's3://{default_bucket}/dogImages/test'}

# Fit the estimator
estimator.fit(inputs)

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: pytorch-training-2024-08-05-03-07-57-927


2024-08-05 03:07:59 Starting - Starting the training job..

In [None]:
job_name = estimator.latest_training_job.name  # public attribute instead of the private _current_job_name
print('Job name:', job_name)
debug_artifacts_path = estimator.latest_job_debugger_artifacts_path()
print('Debug artifacts path', debug_artifacts_path)

In [None]:
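# TODO: Plot a debugging output.
# TensorBoardOutputConfig is an argument of the estimator constructor, not of fit(),
# so it has to be passed to PyTorch(...) before training starts.
tensorboard_output_config = TensorBoardOutputConfig(
    s3_output_path=f's3://{default_bucket}/tensorboard-output'
)

# Assuming that you have enabled TensorBoard in your train_model.py script, recreate
# the estimator above with the extra argument
# tensorboard_output_config=tensorboard_output_config and then call:
# estimator.fit(inputs)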

**TODO**: Is there some anomalous behaviour in your debugging output? If so, what is the error and how will you fix it?  
**TODO**: If not, suppose there was an error. What would that error look like and how would you have fixed it?
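
One anomaly worth looking for is overfitting: the training loss keeps falling while the validation loss starts to rise. The sketch below illustrates that idea on plain lists of per-epoch losses. It is not the algorithm used by SageMaker's built-in `overfit` rule; `first_overfit_epoch`, its `patience` parameter and the loss values are all hypothetical.

```python
def first_overfit_epoch(train_losses, val_losses, patience=2):
    """Return the first epoch index of a run of `patience` consecutive epochs in which
    training loss falls while validation loss rises, or None if there is no such run."""
    streak = 0
    for epoch in range(1, min(len(train_losses), len(val_losses))):
        train_improved = train_losses[epoch] < train_losses[epoch - 1]
        val_worsened = val_losses[epoch] > val_losses[epoch - 1]
        streak = streak + 1 if (train_improved and val_worsened) else 0
        if streak >= patience:
            return epoch - patience + 1
    return None


# Made-up loss curves for illustration
train = [1.00, 0.80, 0.60, 0.50, 0.40]
val = [1.10, 0.90, 0.85, 0.90, 0.95]
print(first_overfit_epoch(train, val))
```

Typical remedies when such a pattern shows up include stopping training earlier, adding regularization or data augmentation, or reducing the number of epochs.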

In [None]:
# TODO: Display the profiler output
# By default the ProfilerReport rule writes its report under
# <output_path>/<job_name>/rule-output rather than a custom profiler-output prefix
rule_output_path = f'{estimator.output_path}{estimator.latest_training_job.name}/rule-output'
print(f'Profiler report location: {rule_output_path}')

## Model Deploying

In [None]:
# TODO: Deploy your model to an endpoint
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    endpoint_name='dog-breed-classifier-endpoint'
)

# Function to preprocess the image
def preprocess_image(image_path):
    from torchvision import transforms

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    image = Image.open(image_path).convert("RGB")
    image = preprocess(image)
    image = np.expand_dims(image.numpy(), axis=0)
    return image


In [None]:
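# TODO: Run a prediction on the endpoint
# PIL cannot open s3:// URIs, so download the test image into memory first
image_key = 'dogImages/test/Affenpinscher_00003.jpg'  # S3 key of your test image
image_bytes = s3.get_object(Bucket=default_bucket, Key=image_key)['Body'].read()
image = preprocess_image(io.BytesIO(image_bytes))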

# Convert the image to the format expected by the model
payload = np.array(image).tolist()
response = predictor.predict(payload)

# Decode the prediction response
predicted_class = np.argmax(response)
print(f'Predicted class: {predicted_class}')
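
The predicted class is only an index. Assuming the training script loads the data with torchvision's `ImageFolder`, which assigns indices to class folders in sorted order, the index can be mapped back to a breed name as sketched below. `build_class_names` is a hypothetical helper, and the folder names are illustrative stand-ins for the contents of the training directory.

```python
def build_class_names(breed_folders):
    """Sort folder names (the order ImageFolder uses) and strip a leading numeric prefix."""
    names = []
    for folder in sorted(breed_folders):
        prefix, _, rest = folder.partition(".")
        # Keep the full name if it does not look like '<number>.<breed>'
        names.append(rest if prefix.isdigit() and rest else folder)
    return names


# Illustrative folder names; in practice, list the sub-folders of dogImages/train
folders = ["002.Afghan_hound", "001.Affenpinscher", "003.Airedale_terrier"]
class_names = build_class_names(folders)

predicted_class = 1  # stand-in for the np.argmax(response) result
print(f"Predicted breed: {class_names[predicted_class]}")
```

If the training script builds its label mapping differently, the same lookup applies but the list has to be constructed in that script's order.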

In [None]:
# TODO: Remember to shutdown/delete your endpoint once your work is done
#predictor.delete_endpoint()