# TABLE OF CONTENTS:
---
* [Notebook Summary](#Notebook-Summary)
* [Setup](#Setup)
* [Data](#Data)
    * [Load Data](#Load-Data)
* [Model](#Model)
    * [Load Model](#Load-Model)
    * [Model Evaluation](#Model-Evaluation)
---

# Notebook Summary

This notebook evaluates the model trained in the `02_model_training` notebook on the Stanford Dogs test set.

# Setup

Append the parent directory to `sys.path` so that the modules in the `src` directory can be imported.

In [8]:
import os
import sys
sys.path.append(os.path.dirname(os.path.abspath("")))
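Because notebook cells are often re-run, `sys.path.append` adds a duplicate entry each time the cell executes. A minimal sketch of an idempotent variant, using a hypothetical helper named `add_parent_to_sys_path` (not part of this project), computes the same parent directory as the cell above and appends it only if it is missing:

```python
import os
import sys


def add_parent_to_sys_path(start=None):
    """Append the parent of `start` (default: the current working directory)
    to sys.path, skipping the append if the entry is already present."""
    # os.path.abspath("") resolves to the current working directory, as in the cell above
    start = os.path.abspath("" if start is None else start)
    parent = os.path.dirname(start)
    if parent not in sys.path:
        sys.path.append(parent)
    return parent


project_root = add_parent_to_sys_path()
```

Re-running this cell leaves exactly one copy of the project root on `sys.path`.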

Automatically reload modules when changes are made.

In [9]:
%load_ext autoreload
%autoreload 2



Import libraries and modules.

In [15]:
# Import libraries
import azureml.core
import numpy as np
import shutil
import torch
from azureml.core import Dataset, Environment, Experiment, Keyvault, Model, ScriptRunConfig, Workspace
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException
from azureml.core.model import InferenceConfig 
from azureml.train.hyperdrive import BanditPolicy, HyperDriveConfig, PrimaryMetricGoal, RandomParameterSampling
from azureml.train.hyperdrive import choice, uniform
from azureml.widgets import RunDetails
from torchvision import datasets

# Import created modules
from src.utils import load_data

print(f"azureml.core version: {azureml.core.VERSION}")

azureml.core version: 1.20.0


# Data

### Load Data

Use the `load_data` utility function from `src.utils` to create the dataloaders and retrieve the dataset sizes and class names.

In [11]:
# Load data
dataloaders, dataset_sizes, class_names = load_data("../data")

# Model

### Load Model

Load the model that was trained in the `02_model_training` notebook.

In [12]:
model_path = "../outputs/model.pt"

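# Load the pickled model onto the CPU, regardless of the device it was saved from
model = torch.load(model_path, map_location="cpu")
# Switch to evaluation mode (batch norm uses its running statistics, dropout is disabled)
model.eval()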

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1

### Model Evaluation

Evaluate the model on the test set by computing its accuracy, our primary evaluation metric.

In [14]:
# Leverage GPU if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Move the model to the same device as the input batches
model = model.to(device)

# Initialize running_correct_preds (this will be updated with every batch)
running_correct_preds = 0

# Use the test dataloader to load in data in batches
for inputs, labels in dataloaders["test"]:
    inputs = inputs.to(device)
    labels = labels.to(device)

    # Make predictions on the test data batch
    with torch.no_grad():
        outputs = model(inputs)
        _, preds = torch.max(outputs, 1)

    # Add the correct predictions from the current batch
    running_correct_preds += torch.sum(preds == labels).item()



In [18]:
# Calculate the total accuracy over the test set
test_accuracy = float(running_correct_preds) / dataset_sizes["test"]
print(f"Test accuracy: {test_accuracy:.2%}")

Test accuracy: 86.68%
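The evaluation above reduces to top-1 accuracy: the arg-max class of each row of logits is compared with its label, matches are summed across batches, and the total is divided by the number of test images. A framework-free sketch in NumPy, using a hypothetical helper `batched_accuracy` and made-up toy logits (not data from this notebook), shows the same bookkeeping:

```python
import numpy as np


def batched_accuracy(batches):
    """Top-1 accuracy accumulated over an iterable of (logits, labels) batches."""
    correct = 0
    total = 0
    for logits, labels in batches:
        # Index of the highest score per row, the same role as torch.max(outputs, 1)
        preds = np.argmax(logits, axis=1)
        correct += int(np.sum(preds == labels))
        total += len(labels)
    return correct / total


# Two toy batches of 3-class logits (illustrative numbers only)
batches = [
    (np.array([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1]]), np.array([0, 1])),
    (np.array([[0.1, 0.2, 3.0], [1.0, 0.5, 0.2]]), np.array([2, 2])),
]
acc = batched_accuracy(batches)
print(f"Test accuracy: {acc:.2%}")  # prints "Test accuracy: 75.00%"
```

Dividing by the number of examples actually seen matches dividing by `dataset_sizes["test"]` as long as the test dataloader visits every image exactly once (for example, it is not built with `drop_last=True`).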
