# CNN CLASSIFIER

PURPOSE:
  - API to train and apply pretrained vision models for image classification (transfer learning)
  
REQUIREMENTS:
  - A pretrained model is downloaded and can be trained on a user-supplied dataset
  - The number of attached fully connected layers is customizable by the user
  - The deeper convolutional layers are temporarily unfrozen during training for fine-tuning
  - User can load a model and continue training or move directly to inference
  - Saved trained model information is stored in a specific folder with a useful naming convention
  - There are time-limited prompts that allow the user to direct processes as needed
  - Training performance can be tested before moving onward to inference if desired
  - Predictions are made in parallel batches and saved in a results dictionary
  
HOW TO USE:
  - If no model has been trained and saved, start by training a model
  - Store data in folders at this location: os.path.expanduser('~') + '/Programming Data/'
  - For training, 'train' and 'valid' folders with data are required in the data_dir
  - For overfit testing, an 'overfit' folder with data is required in the data_dir
  - For performance testing, a 'test' folder with data is required in the data_dir
  - For inference, put data of interest in a 'predict' folder in the data_dir
  - For saving and loading models, create a 'models' folder in the data_dir
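
The folder requirements above can be checked before any training starts. The following is a minimal sketch and not part of the original utilities: the helper name `missing_folders` and the `REQUIRED_FOLDERS` mapping are hypothetical, and the step names are chosen here only for illustration.

```python
import os

# Hypothetical mapping from a workflow step to the folders it needs inside data_dir,
# taken from the HOW TO USE list; the step names themselves are illustrative
REQUIRED_FOLDERS = {
    'train': ['train', 'valid'],
    'overfit': ['overfit'],
    'test': ['test'],
    'predict': ['predict'],
    'save_load': ['models'],
}

def missing_folders(data_dir, step):
    '''Return the folders required for the given step that do not exist in data_dir.'''
    return [name for name in REQUIRED_FOLDERS[step]
            if not os.path.isdir(os.path.join(data_dir, name))]

# Build the data directory from the location described above and report what is missing
data_dir = os.path.join(os.path.expanduser('~'), 'Programming Data', 'Flower_data')
print('Missing for training:', missing_folders(data_dir, 'train'))
```

An empty list means every folder needed for that step is already in place.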

## Import Libraries

In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import time, os, random
import numpy as np
import json
import argparse
import torch
import torch.nn.functional as F
from torchvision import transforms, datasets, models
from torch import nn, optim
from PIL import Image
from threading import Thread

## Define Arguments

In [3]:
'''
ArgParse Not Used in Notebooks

def u1_get_input_args(): 
    Purpose:
        - Creates and stores command line arguments inputted by the user.
        - Attaches default arguments and help text to aid user.
    Command Line Arguments:
        1. Data directory as --dir
        2. Choose to load model as --load
        3. Choose to train model as --train
        4. Define number of training epochs as --epoch
        5. Define the number of hidden layers as --layer
        6. Define learnrate as --learn
        7. Choose pretrained CNN model as --model
    Returns:
        - Stored command line arguments as an Argument Parser Object with parse_args() data structure
'''

argdir = 'Flower_data'
argload = 'n'
argtrain = 'n'
argepoch = 50
arglayer = 2
arglearn = 0.003
argmodel = 'googlenet' # Options: 'vgg', 'alexnet', 'googlenet', 'densenet', 'resnext', 'shufflenet'
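
Outside a notebook, the seven arguments listed in the docstring above could be collected with `argparse`. This is a minimal sketch rather than the original `u1_get_input_args`: the flag names and defaults mirror the notebook variables, while the help text and the `choices` restrictions are assumptions added for illustration. The `--model` choices are the six names that appear in `model_name_dic`.

```python
import argparse

def u1_get_input_args(argv=None):
    '''Parse the command line arguments described above; argv=None reads sys.argv.'''
    parser = argparse.ArgumentParser(description='Train or apply a pretrained CNN classifier')
    parser.add_argument('--dir', type=str, default='Flower_data', help='data directory name')
    parser.add_argument('--load', type=str, default='n', choices=['y', 'n'], help='load a saved model')
    parser.add_argument('--train', type=str, default='n', choices=['y', 'n'], help='train the model')
    parser.add_argument('--epoch', type=int, default=50, help='number of training epochs')
    parser.add_argument('--layer', type=int, default=2, help='number of hidden layers')
    parser.add_argument('--learn', type=float, default=0.003, help='learnrate')
    parser.add_argument('--model', type=str, default='googlenet',
                        choices=['vgg', 'alexnet', 'googlenet', 'densenet', 'resnext', 'shufflenet'],
                        help='pretrained CNN model')
    return parser.parse_args(argv)

# Passing an explicit list keeps the sketch usable inside a notebook
args = u1_get_input_args(['--epoch', '10', '--model', 'vgg'])
print(args.dir, args.epoch, args.model)
```

Passing an empty list returns the same defaults as the notebook variables above.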

## Import Utility Functions

In [4]:
# from cnn_utility_functions import *

def u2_load_processed_data(data_dir):
    '''
    Purpose:
        - Access data directory and produce a dictionary of datasets
        - Create a dictionary of the class labels and read in the data labels
    Parameters:
        - data_dir = pathway to the data
    Returns:
        - dictionary of datasets
        - dictionary of data labels
        - dictionary of class labels
    '''
    # Initialize empty dictionaries to hold data and data labels
    dict_datasets = {}
    dict_data_labels = {}

    # Iterate through the entries in the data directory
    for folder in os.listdir(data_dir):
        folder_path = os.path.join(data_dir, folder) # works with or without a trailing slash on data_dir

        # If data exists, create datasets for overfitting, testing, training, and validating data
        if folder in ['overfit', 'test', 'train', 'valid']:
            dict_datasets[folder + '_data'] = datasets.ImageFolder(folder_path, transform=u3_process_data(folder))

        # If data for inference exists, create a list of (image tensor, filename) pairs from the predict folder
        # Images are converted to RGB so the 3-channel normalization also works for grayscale or RGBA files
        if folder == 'predict':
            predict_transform = u3_process_data(folder)
            dict_datasets['predict_data'] = [(predict_transform(Image.open(os.path.join(folder_path, filename)).convert('RGB')),
                            filename) for filename in os.listdir(folder_path)]

        # If data names were added to the data directory as a JSON file, read them into the data label dictionary
        if os.path.splitext(folder)[1] == '.json':
            with open(folder_path, 'r') as f:
                dict_data_labels = json.load(f)

    # Create a dictionary connecting class indexes to class labels (this requires the 'train' folder to exist)
    # Return the datasets and label dictionaries
    dict_class_labels = {value : key for (key, value) in dict_datasets['train_data'].class_to_idx.items()}
    return dict_datasets, dict_data_labels, dict_class_labels


def u3_process_data(transform_request):
    '''
    Purpose:
        - Define an assortment of transforms for application to specific datasets
        - Return the appropriate transformation that corresponds to the inputted request
        - Defined transforms are composed of a sequence of individual transform operations
        - Depending on the needs of each data set, a transform will use specific operations
    Parameters:
        - transform_request = selected transformation type
    Returns:
        - transform that corresponds to the request
    '''
    image_1d_size = 224
    # Resize slightly larger than the crop; built-in round() avoids np.round_, which newer NumPy releases no longer provide
    resize_size = round(image_1d_size * 1.1)
    normalize = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

    # Deterministic transform shared by every dataset that is only evaluated (no augmentation)
    eval_transform = transforms.Compose([transforms.Resize(resize_size),
                                         transforms.CenterCrop(image_1d_size),
                                         transforms.ToTensor(),
                                         normalize])

    # Undo the normalization: multiply by the standard deviation, then add the mean
    inverse_transform = transforms.Compose([transforms.Normalize([0, 0, 0], [1/0.229, 1/0.224, 1/0.225]),
                                            transforms.Normalize([-0.485, -0.456, -0.406], [1, 1, 1])])

    # Augmenting transform used for training and for overfit testing
    train_transform = transforms.Compose([transforms.RandomRotation(20),
                                          transforms.RandomResizedCrop(image_1d_size),
                                          transforms.RandomHorizontalFlip(),
                                          transforms.ToTensor(),
                                          normalize])

    # Map each request name to its transform; an unknown request raises a KeyError
    dict_transforms = {'predict': eval_transform, 'valid': eval_transform, 'test': eval_transform,
                       'game': eval_transform, 'inverse': inverse_transform,
                       'train': train_transform, 'overfit': train_transform}
    return dict_transforms[transform_request]


def u4_data_iterator(dict_datasets):
    '''
    Purpose:
        - Receive a dictionary of datasets
        - Convert each dataset to a DataLoader
        - Return a dictionary of dataloaders
    Parameters:
        - dict_datasets = dictionary of datasets
    Returns:
        - dict_data_loaders = dictionary of dataloaders
    '''
    dict_data_loaders = {}
    for dataset in dict_datasets:
        loader_type = dataset.split('_')[0] + '_loader'
        dict_data_loaders[loader_type] = torch.utils.data.DataLoader(dict_datasets[dataset], batch_size=128, shuffle=True)
    return dict_data_loaders


def u5_time_limited_input(prompt, default=True):
    '''
    Purpose:
        - Receive text and start a thread to initiate a user input prompt with that text
        - Track thread time and limit time to an established TIMEOUT limit
        - Return user input or after the TIMEOUT limit is reached return the default choice
    Parameters:
        - prompt = specific question text for display
        - default = default choice if no user input is provided
    Returns:
        - choice = the user input or the default
    '''
    global choice, answered # Shared with the input thread started below
    choice, answered = default, False # Reset so that a stale answer from an earlier prompt is never reused
    TIMEOUT = 10
    prompt = prompt + f': \'y\' for yes, \'n\' for no ({TIMEOUT} seconds to choose): '
    user_input_thread = Thread(target=u6_user_input_prompt, args=(prompt, default), daemon=True)
    user_input_thread.start() # Start the thread, calling the user input function
    user_input_thread.join(TIMEOUT) # Wait for an answer for at most TIMEOUT seconds
    if not answered:
        print('\n No valid input, proceeding with operation...')
    return choice


def u6_user_input_prompt(prompt, default):
    '''
    Purpose:
        - Receive a prompt and use it for user input prompting
        - Once answered, set choice to True or False if the input is yes or no
        - Ask the question again if the input is invalid
    Parameters:
        - prompt = complete user input question text for display
        - default = default choice if no user input is provided
    Returns:
        - none, the result is passed back through the global variables choice and answered
    '''
    global choice, answered # Global variables are required to communicate input statuses back to the thread manager
    choice = default
    answered = False
    while not answered:
        response = input(prompt)
        if response in ('Y', 'y'):
            print('User input = Yes\n')
            choice = True
            answered = True
        elif response in ('N', 'n'):
            print('User input = No\n')
            choice = False
            answered = True
        else:
            # Leave choice at the default so a timeout after invalid input still returns the default
            print('Error, please use the character inputs \'Y\' and \'N\'')
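
The two functions above pass the answer between threads through global variables. As an alternative, here is a minimal sketch that uses a `queue.Queue` instead. The name `timed_yes_no` and its `reader` parameter are hypothetical; `reader` defaults to `input` but can be swapped for a stand-in when no keyboard is available. As with the original, the daemon thread stays blocked on the prompt if the time limit passes without an answer.

```python
import queue
from threading import Thread

def timed_yes_no(prompt, default=True, timeout=10, reader=input):
    '''Return True for 'y', False for 'n', or default if no valid answer arrives in time.'''
    answers = queue.Queue()

    def ask():
        # Keep asking until a valid answer is given, then hand it to the waiting caller
        while True:
            response = reader(prompt).strip().lower()
            if response in ('y', 'n'):
                answers.put(response == 'y')
                return
            print("Error, please use the character inputs 'Y' and 'N'")

    Thread(target=ask, daemon=True).start()
    try:
        return answers.get(timeout=timeout) # Block for at most timeout seconds
    except queue.Empty:
        print('\n No valid input, proceeding with operation...')
        return default

# Example with a stand-in reader so that the sketch runs without a keyboard
print(timed_yes_no('Train model? ', reader=lambda _: 'n'))
```

Because the answer travels through a queue that belongs to a single call, an earlier prompt cannot leak its result into a later one.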


## Import Model Functions

In [5]:
# from cnn_model_functions import *

class Classifier(nn.Module):
    '''
    Inherits Class information from the nn.Module and creates a Classifier Class:
        - Class has these attributes:
            o fully connected layer with specified number of in_features and out_features
            o number of hidden layers equivalent to the inputted requirements
            o dropout parameter for the fully connected layers
        - Class has a forward method:
            o Flattens the input data in an input layer for computation
            o Connects each layer with a linear transformation, a relu activation, and the defined dropout
            o Passes the outputs of the final layer through log_softmax to return categorical log probabilities
    Parameters:
        - in_features
        - hidden_layers
        - out_features
    '''
    # Initialize attributes, requiring input arguments for the number of layers, and input and output features
    # Call super().__init__() so that nn.Module can register the layers created below
    # NOTE: hidden_layers is the total number of Linear layers created, including the output layer
    def __init__(self, in_features, hidden_layers, out_features):
        super().__init__()
        self.in_features = in_features
        self.hidden_layers = hidden_layers
        self.out_features = out_features
        self._last = max(self.hidden_layers, 1) # index of the output layer, at least one layer is always created

        # Iterate to create the requested number of layers, tapering down shape by a factor of 2 between layers
        # Setattr is used to create a layer attribute with the fc('index') name and the factored shape
        for index in range(1, self._last):
            setattr(self, 'fc'+str(index), nn.Linear(round(self.in_features/(2**(index-1))),
                            round(self.in_features/(2**index))))
        # Use the required number of out features for the output of the last layer, set dropout to a desired value
        setattr(self, 'fc'+str(self._last), nn.Linear(round(self.in_features/(2**(self._last-1))), self.out_features))
        self.dropout = nn.Dropout(p=0.3)

    # Define the forward function that will take an input and compute it through the number of layers
    # Start by flattening the data, then iterate through each layer before the output layer
    # Use the Relu activation function between layers, apply the defined dropout rate, and return the log softmax probability
    def forward(self, x):
        x = x.view(x.shape[0], -1)
        for index in range(1, self._last):
            x = self.dropout(F.relu(getattr(self, 'fc'+str(index))(x)))
        x = F.log_softmax(getattr(self, 'fc'+str(self._last))(x), dim=1)
        return x


def m1_create_classifier(model_name, hidden_layers, classes_length):
    '''
    Purpose:
        - Return an integrated CNN architecture by:
            o downloading a pretrained model
            o attaching a fully connected network
        - Leverages the requested pretrained model to provide base features
    Parameters:
        - model_name = base pretrained model
        - hidden_layers = number of hidden layers in final fully connected network
        - classes_length = number of classes in data
    Returns:
        - model
    '''
    # Store a dictionary of available models as names to avoid downloading models until a choice has been made
    model_name_dic = {'vgg': 'vgg16', 'alexnet': 'alexnet', 'googlenet': 'googlenet', 'densenet': 'densenet161',
                      'resnext': 'resnext50_32x4d', 'shufflenet': 'shufflenet_v2_x1_0'}

    # Download the pretrained convolutional neural network architecture requested by the user and freeze the parameters
    # NOTE: newer torchvision releases deprecate pretrained=True in favor of the weights argument
    model = getattr(models, model_name_dic[model_name])(pretrained=True)
    for param in model.parameters():
        param.requires_grad = False

    # Search the pretrained architecture for the first fully-connected layer and return the number of in features
    for module in model.modules():
        if isinstance(module, nn.Linear):
            in_features = module.in_features
            break

    # Use the known number of in and out features to ensure compatibility for the attached fully connected layers
    # Replace the fully connected layer(s) at the end of the model with our own fully connected classifier
    setattr(model, list(model._modules.items())[-1][0], Classifier(in_features, hidden_layers, classes_length))

    # Print the name of the model and the architecture of the attached layers, then return the model
    print('\nUsing ', model_name, ' with the following attached ', hidden_layers,
                    ' layer classifier:\n', list(model.children())[-1])
    return model


def m2_save_model_checkpoint(model, file_name_scheme, model_hyperparameters):
    '''
    Purpose:
        - Receive a model, a naming convention, and model hyperparameters
        - Save model checkpoint and hyperparameters
    Parameters:
        - model = model to be saved
        - file_name_scheme = directory and naming convention for saving
        - model_hyperparameters = information about state of model
    Returns:
        - none
    '''
    # Save the model state_dict per the naming convention as a pth file
    torch.save(model.state_dict(), file_name_scheme + '_dict.pth')

    # Save the model hyperparameters per the naming convention as a JSON file
    with open(file_name_scheme + '_hyperparameters.json', 'w') as file:
        json.dump(model_hyperparameters, file)


def m3_load_model_checkpoint(model, file_name_scheme):
    '''
    Purpose:
        - Receive a model and a naming convention
        - Load model checkpoint and hyperparameters
    Parameters:
        - model = model to be loaded
        - file_name_scheme = directory and naming convention for loading
    Returns:
        - model
        - model hyperparameters
    '''
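    # Load the model state_dict by using the naming convention to find the file
    # map_location='cpu' lets a checkpoint saved on a GPU load on a machine without one
    checkpoint = torch.load(file_name_scheme + '_dict.pth', map_location='cpu')
    model.load_state_dict(checkpoint)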

    # Load the model hyperparameters by using the naming convention and display the learnrate and train time
    with open(file_name_scheme + '_hyperparameters.json', 'r') as file:
        model_hyperparameters = json.load(file)
    print('\nThe loaded model learnrate = {:.2e}..'.format( model_hyperparameters['learnrate']),
          'The loaded model training time = {:.0f} min\n'.format( model_hyperparameters['training_time']))
    return model, model_hyperparameters
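
To make the tapering in `Classifier` concrete, the sketch below reproduces only the shape arithmetic from its `__init__` method, without PyTorch. The helper name `classifier_layer_shapes` is hypothetical, and the 1024-feature, 102-class example values are assumptions chosen for illustration.

```python
def classifier_layer_shapes(in_features, hidden_layers, out_features):
    '''Return the (in, out) shape of each Linear layer that Classifier would create.'''
    shapes = []
    index = 1
    while index < hidden_layers:
        # Each layer halves the width of the one before it
        shapes.append((round(in_features / 2**(index - 1)), round(in_features / 2**index)))
        index += 1
    # The last layer maps the remaining width onto the number of classes
    shapes.append((round(in_features / 2**(index - 1)), out_features))
    return shapes

print(classifier_layer_shapes(1024, 3, 102))
# [(1024, 512), (512, 256), (256, 102)]
```

Note that `hidden_layers` counts every Linear layer, including the output layer, so a value of 2 yields a single hidden layer followed by the output layer.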


## Import Operational Functions

In [6]:
# from cnn_operational_functions  import *

def o1_train_model(model, train_loader, valid_loader, epoch, decay, model_hyperparameters, criterion):
    '''
    Purpose:
        - Receive a model and start or continue training on it for the requested number of epochs
    Parameters:
        - model = inputted model (can be loaded with training history)
        - train_loader = data loader for training data for iterating
        - valid_loader = data loader for validation data for iterating
        - epoch = number of epochs to train
        - decay = factor multiplied into the learnrate each time training progress stalls
        - model_hyperparameters = dictionary of model hyperparameter information
        - criterion = the loss calculation method
    Returns:
        - model = model after the requested number of epochs of training
        - model_hyperparameters = revised hyperparameters for model after training
    '''
    # Print the GPU name and total memory, or warn that training will run without a GPU
    # The check comes first because the torch.cuda queries raise an error when no GPU is available
    if torch.cuda.is_available():
        print('Using GPU =', torch.cuda.get_device_name(),
              round(torch.cuda.get_device_properties(0).total_memory*(10**-9)), 'GB')
    else:
        print('WARNING GPU UNAVAILABLE')

    # Initialize a reference start time and subtract previous training time from reference point
    # Document starting learnrate and initialize a Boolean running variable to track deeper layer training
    t0 = time.time() - model_hyperparameters['training_time']*60
    startlearn = model_hyperparameters['learnrate']
    running = False

    # Set the optimizer for backpropagation. All parameters are passed in so that when unfrozen, they are included in backprop
    # NOTE: optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=lr, weight_decay=wd) would pass only unfrozen params
    optimizer = optim.Adam(model.parameters(), lr=model_hyperparameters['learnrate'], weight_decay=model_hyperparameters['weightdecay'])

    # Run the requested number of epochs worth of training by iterating e through the range of epochs
    for e in range(epoch):
        # Call backprop function with training data and model to return an updated model and the epoch training loss
        # Call validation function (no backprop) with the validation data and model and return performance data
        model, ave_training_loss = o2_model_backprop(model, train_loader, optimizer, criterion)
        val_count_correct, ave_validate_loss = o3_model_no_backprop(model, valid_loader, criterion)

        # Update the training history log with the average training loss and validation loss for this epoch
        model_hyperparameters['training_loss_history'].append(ave_training_loss)
        model_hyperparameters['validate_loss_history'].append(ave_validate_loss)

        # Print the epoch loss, accuracy, GPU usage, and runtime data. Accuracy is total correct over total in data
        print('Epoch: {}/{}..'.format(e+1, epoch),
            'Train Loss: {:.3f}..'.format(ave_training_loss),
            'Valid Loss: {:.3f}..'.format(ave_validate_loss),
            'Valid Accy: {:.2f}..'.format(val_count_correct / len(valid_loader.dataset)),
            'Mem: {:.2f}GB..'.format(np.around(torch.cuda.memory_allocated()*(10**-9), decimals=2)),
            'Time: {:.0f}min'.format((time.time() - t0)/60))

        # Reassigned model_hyperparameters['training_loss_history'] for this section to tlh for readability
        tlh = model_hyperparameters['training_loss_history']
        # This next section determines when to adjust learning based on training progress
        # It is mainly a for-fun exercise in tuning a heuristic for deciding when to adjust training
        # Skip until training_loss_history has enough elements to satisfy the slope calculations
        if len(tlh) > 3: # NOTE: 2
            # Compute reference: 3 times the first training loss, scaled by the current learnrate and the decay squared
            # Compute progress in training: the average of the last 2 training loss slopes
            # If the loss is rising (positive average slope) by more than the reference, decay the learnrate
            if 3*model_hyperparameters['learnrate']*decay*decay*tlh[0] < np.mean([tlh[-1]-tlh[-2], tlh[-2]-tlh[-3]]):
                model_hyperparameters['learnrate'] *= decay # multiply learnrate by the decay hyperparameter
                optimizer = optim.Adam(model.parameters(), lr=model_hyperparameters['learnrate'], weight_decay=model_hyperparameters['weightdecay']) # revise the optimizer to use the new learnrate
                print('Learnrate changed to: {:f}'.format(model_hyperparameters['learnrate']))
            # Compute reference: starting learnrate multiplied by decay^(9*decay^3)
            # Once learnrate has decayed to less than this value, call control_model_grad to activate deep layer training
            # Don't call if deep layer training has already been activated, set running to True to begin counting
            # In practice this performed well for various models and for overfitting vs regular training
            if model_hyperparameters['learnrate'] <= startlearn*decay**(9*(decay**3)) and model_hyperparameters['running_count'] == 0:
                model = o4_control_model_grad(model, True)
                model_hyperparameters['epoch_on'] = e
                running = True
            # If running, add to model running count to track the number of epochs run
            if running:
                model_hyperparameters['running_count'] +=1
            # Once the deep layers have trained for 20 epochs, call control_model_grad to deactivate deep layer training
            # Set the running variable to False to stop counting and prevent recalling deactivate layers
            if running and model_hyperparameters['running_count'] > 20:
                model = o4_control_model_grad(model, False)
                running = False
            # Find the basename of the loader's file root and check if it is overfit data
            # If overfit data, see if the train loss has gone below a target. If so end and print success
            # If the train loss has not gone below the target and the epochs have elapsed, print failure
            if os.path.basename(train_loader.dataset.root) == 'overfit':
                if np.mean([tlh[-1], tlh[-2], tlh[-3]]) < 0.0001:
                    print('\nModel successfully overfit images\n')
                    return model, model_hyperparameters
                if e+1 == epoch:
                    print('\nModel failed to overfit images\n')

    # Document the training time for the model and return the trained model and hyperparameters
    model_hyperparameters['training_time'] = np.around((time.time() - t0)/60, decimals=1)
    return model, model_hyperparameters


def o2_model_backprop(model, data_loader, optimizer, criterion):
    '''
    Purpose:
        - Conduct backpropagation on a model for data from a dataloader
    Parameters:
        - model = inputted model
        - data_loader = generator for data to provide model training
        - optimizer = defined optimizer for backpropagation
        - criterion = the loss calculation method
    Returns:
        - model = model after cycling through the data_loader (one epoch of training)
        - ave_training_loss = summed batch losses divided by the number of training images
    '''
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # Set device to GPU if available
    torch.cuda.empty_cache() # refresh GPU memory before starting
    model.to(device) # Move model to device
    epoch_train_loss = 0 # initialize total training loss for this epoch
    model.train() # Set model to training mode to activate regularizations such as dropout

    for images, labels in data_loader: # cycle through training data to conduct backpropagation
        images, labels = images.to(device), labels.to(device) # move data to the device

        optimizer.zero_grad() # clear gradient history
        log_out = model(images) # run images through model to get logarithmic probability
        loss = criterion(log_out, labels) # calculate loss (error) for this image batch based on criterion

        loss.backward() # backpropagate gradients through model based on error
        optimizer.step() # update weights in model based on calculated gradient information
        epoch_train_loss += loss.item() # add training loss to total train loss this epoch, convert to value with .item()

    # Divide the summed batch losses by the image count (a true per-image average only if criterion sums over each batch)
    ave_training_loss = epoch_train_loss / len(data_loader.dataset)
    return model, ave_training_loss # return the updated model and the average training loss


def o3_model_no_backprop(model, data_loader, criterion):
    '''
    Purpose:
        - Use the model to make predictions without backpropagation
        - Return performance of the predictions across the data
    Parameters:
        - model = inputted model
        - data_loader = generator for data to conduct predictions
        - criterion = the loss calculation method
    Returns:
        - val_count_correct = number of correctly predicted data items
        - ave_validate_loss = summed batch losses divided by the number of images in the data
    '''
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # Set device to GPU if available
    torch.cuda.empty_cache() # refresh GPU memory before starting
    model.to(device) # Move model to device
    epoch_valid_loss = 0 # initialize total validate loss for this epoch
    val_count_correct = 0 # initialize total correct predictions on valid set
    model.eval() # set model to evaluate mode to deactivate generalizing operations such as dropout and leverage full model

    with torch.no_grad(): # turn off gradient tracking and calculation for computational efficiency
        for images, labels in data_loader: # cycle through validate data to observe performance
            images, labels = images.to(device), labels.to(device) # move data to GPU

            log_out = model(images) # run images through model to get logarithmic probability
            loss = criterion(log_out, labels) # calculate loss (error) for this image batch based on criterion
            epoch_valid_loss += loss.item() # add validate loss to total valid loss this epoch, convert to value with .item()

            out = torch.exp(log_out) # obtain probability from the logarithmic probability calculated by the model
            highest_prob, chosen_class = out.topk(1, dim=1) # obtain the top class and its probability for each image
            equals = chosen_class.view(labels.shape) == labels # mark which predictions in this batch match the labels
            val_count_correct += equals.sum().item()  # add this batch's count of correct matches as a Python int

        ave_validate_loss = epoch_valid_loss / len(data_loader.dataset) # divide the summed batch losses by the image count
    return val_count_correct, ave_validate_loss # return this epoch's total correct predictions and average validation loss


def o4_control_model_grad(model, control=False):
    '''
    Purpose:
        - Input a model and control active gradients on parameters at various layer depths
        - Print which layers have been controlled
    Parameters:
        - model = inputted model
        - control = whether to activate or deactivate gradients
    Returns:
        - model = edited model with controlled layers
    '''
    # NOTE: Don't use model.children for network_depth, as this does not capture sublayers!
    network_depth = len(list(model.modules())) # Obtain the length of the layers used in the network
    param_freeze_depth = network_depth // 3 # Define what fraction of the network will be frozen and unfrozen
    controlled_layers = [] # Initialize the controlled layers list that will track what is frozen and unfrozen
    layer_depth = 0 # Initialize the start for iterating through layers

    for layer in list(model.modules()): # Iterate through layers in the model
        layer_depth += 1 # Increase current layer depth by 1 to progress through network layers

        if (network_depth - param_freeze_depth) <= layer_depth: # Once sufficiently deep, control layers
            controlled_layers.append(layer._get_name()) # Add current layer's name to list of controlled layers
            for param in layer.parameters(): # Iterate through the parameters in this layer
                param.requires_grad = control # Freeze or unfreeze the gradient on the parameter

        if layer._get_name() == 'Linear': # The fully connected layers are always unfrozen
            for param in layer.parameters(): # Iterate parameters
                param.requires_grad = True # Set gradient to true

    print(f'\n Toggle requires_grad = {control}: ', controlled_layers, '\n') # Print changes made to active grads
    return model # Return model with changed param activity


def o5_plot_training_history(model_name, model_hyperparameters, file_name_scheme, train_type='loaded'):
    '''
    Purpose:
        - Plot the training and validation loss history for the inputted model
        - Plot lines indicating when layers were activated and deactivated if controlled
        - Save the plot with the name according to the type of training, skip if a loaded version
    Parameters:
        - model_name = name of model, used for title on plot
        - model_hyperparameters = contains the history for plotting
        - file_name_scheme = directory and naming convention for loading
        - train_type = offer control on saving
    Returns:
        - none
    '''
    # Plot training history information
    plt.clf()
    plt.plot(model_hyperparameters['training_loss_history'], label='Training Loss')
    plt.plot(model_hyperparameters['validate_loss_history'], label='Validation Loss')

    # If deep layer training has started, plot dotted lines for start and finish
    if model_hyperparameters['epoch_on']:
        plt.vlines(
            colors = 'black',
            x = model_hyperparameters['epoch_on'],
            ymin = min(model_hyperparameters['training_loss_history']),
            ymax = max(model_hyperparameters['training_loss_history'][3:]),
            linestyles = 'dotted',
            label = 'Deep Layers Activated'
        ).set_clip_on(False)
        plt.vlines(
            colors = 'black',
            x = (model_hyperparameters['epoch_on'] + model_hyperparameters['running_count']),
            ymin = min(model_hyperparameters['training_loss_history']),
            ymax = max(model_hyperparameters['training_loss_history'][3:]),
            linestyles = 'dotted',
            label = 'Deep Layers Deactivated'
        ).set_clip_on(False)

    # Plot title and labels
    plt.title(model_name)
    plt.ylabel('Total Loss')
    plt.xlabel('Total Epoch ({})'.format(len(model_hyperparameters['training_loss_history'])))
    plt.legend(frameon=False)

    # If the plot was not loaded, save the plot using the naming convention
    if train_type != 'loaded':
        plt.savefig(file_name_scheme + '_training_history_' + train_type + '.png')
        print('Saved', train_type, 'training history to project directory')

    # Show the plot without blocking so the function can continue, then pause to let the image load and avoid freezing
    plt.show(block=False)
    plt.pause(2)
    plt.clf()
    plt.close()

def o6_predict_data(model, data_loader, dict_data_labels, dict_class_labels, topk=5):
    '''
    Purpose:
        - Compute probabilities for various classes for an image using a model
    Parameters:
        - model = trained deep neural net for computation
        - data_loader = generator for data items to be iterated through for parallel prediction
        - dict_data_labels = dictionary mapping each class label to its readable class name (may be empty)
        - dict_class_labels = dictionary mapping each model output index to its class label
        - topk = number of class outputs
    Returns:
        - dict_prediction_results = dictionary containing predictions and probabilities for data keys
    '''
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # set device to GPU if available
    torch.cuda.empty_cache() # refresh GPU memory before starting
    model.to(device) # move model to device
    dict_prediction_results = {} # initialize the prediction results dictionary
    model.eval() # set model to evaluation mode so dropout is disabled and the full network is used
    if len(dict_class_labels) < topk: # confirm there are enough different classes to satisfy results ranking
        topk = len(dict_class_labels) # if not, reduce the number of ranked results to the number of classes

    with torch.no_grad(): # turn off gradient tracking and calculation for computational efficiency
        for image, filenames in data_loader: # cycle through data for inference
            image = image.to(device) # move data to the selected device (GPU if available, otherwise CPU)

            log_out = model(image) # run images through model to get logarithmic probability
            model_output = torch.exp(log_out) # obtain probability from the logarithmic probability
            probabilities, class_indexes = model_output.topk(topk, dim=1) # obtain the top results

            for index in range(len(filenames)): # iterate through the batch of filenames by index
                # Find the class prediction name by comparing the class label dictionary to the data label dictionary
                if dict_data_labels:
                    class_prediction = [dict_data_labels[dict_class_labels[value]] for value in class_indexes.tolist()[index]]
                else:
                    class_prediction = [dict_class_labels[value] for value in class_indexes.tolist()[index]]
                # Then add this filename to the prediction results dictionary with its class predictions and probabilities
                dict_prediction_results[filenames[index]] = [class_prediction, probabilities.tolist()[index]]
    return dict_prediction_results # return the predictions keyed by filename


def o7_show_prediction(data_dir, dict_prediction_results):
    '''
    Purpose:
        - Randomly choose a piece of data from the predict folder
        - Display the chosen data and display the class outputs and corresponding probabilities
    Parameters:
        - data_dir = pathway to the data directory containing the data of interest
        - dict_prediction_results = dictionary of the prediction results on the dataset of interest
    Returns:
        - none
    '''
    # Randomly select an example file from the prediction results to display
    example_prediction = random.choice(list(dict_prediction_results.keys()))

    # Open and show the original image; no inverse transform is needed because the file is read from the same path
    plt.imshow(Image.open(data_dir + 'predict/' + example_prediction))
    plt.show(block=False)
    plt.pause(2)
    plt.close()

    # Plot the predicted class, the probabilities, and use the data's filename for the title
    plt.bar(dict_prediction_results[example_prediction][0], dict_prediction_results[example_prediction][1])
    plt.title(example_prediction)
    plt.xticks(rotation=20)
    plt.show(block=False)
    plt.pause(3)
    plt.close()
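
The label lookup inside o6_predict_data can be hard to follow because it chains two dictionaries. The sketch below reproduces that step for a single image in plain Python, without torch. The helper name `top_k_labels` and the example dictionaries are hypothetical and only illustrate the idea: log-probabilities are exponentiated, the most probable class indexes are ranked, and each index is mapped to a class label and then, if a data label dictionary is supplied, to a readable name.

```python
import math

def top_k_labels(log_probs, dict_class_labels, dict_data_labels=None, topk=5):
    """Return the topk labels and probabilities for one image (illustrative helper)."""
    # Never ask for more ranked results than there are classes
    topk = min(topk, len(log_probs))
    # Convert log-probabilities back to probabilities
    probs = [math.exp(value) for value in log_probs]
    # Rank class indexes from most to least probable and keep the first topk
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:topk]
    labels = []
    for class_index in ranked:
        class_label = dict_class_labels[class_index]  # model output index -> class label
        # Use the readable name when a data label dictionary was supplied
        labels.append(dict_data_labels[class_label] if dict_data_labels else class_label)
    return labels, [probs[i] for i in ranked]

# Hypothetical example values, chosen only for illustration
dict_class_labels = {0: '1', 1: '10', 2: '100'}
dict_data_labels = {'1': 'pink primrose', '10': 'globe thistle', '100': 'blanket flower'}
log_probs = [math.log(0.2), math.log(0.7), math.log(0.1)]

labels, probs = top_k_labels(log_probs, dict_class_labels, dict_data_labels, topk=2)
print(labels, probs)
```

When no data label dictionary is passed, the helper falls back to the raw class labels, which mirrors the else branch in o6_predict_data.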


## Main Function

In [8]:
def main():
    '''
    # Retrieve command line arguments to dictate model type, training parameters, and data
    # Load image datasets, process the image data, and convert these into data generators
    # Create a default naming structure to save and load information at a specified directory
    # Download a pretrained model using input arguments and attach new fully connected output layers
    # Define criterion for loss, if training is required by the input arg, execute the following:
    #    o Prompt user for overfit training, if yes, initiate training against pretrained features
    #    o Prompt user for complete training, if yes, initiate training against pretrained features
    #    o Save the hyperparameters, training history, and training state for the overfit and full models
    # If training is not requested by the input arg, execute the following:
    #    o Load in a pretrained model's state dict and its model_hyperparameters
    #    o Display the training history for this model
    # Provide prompt to test the model and perform and display performance if requested
    # Provide prompt to apply the model towards inference and put model to work if requested
    # Show an example prediction from the inference
    '''
    # Build the data directory path from the user-supplied directory name (argdir)
    data_dir = os.path.expanduser('~') + '/Programming Data/' + argdir + '/'

    # Call data processor to return a dictionary of datasets, the data labels, and the class labels
    dict_datasets, dict_data_labels, dict_class_labels = u2_load_processed_data(data_dir)

    # Call data iterator to convert dictionary of datasets to dictionary of dataloaders
    dict_data_loaders = u4_data_iterator(dict_datasets)

    # Create the file path and naming convention used for saving and loading files in the program
    file_name_scheme =  data_dir + 'models/' + os.path.basename(os.path.dirname(data_dir))\
                    + '_' + argmodel + '_' + str(arglayer) + 'lay'
    print(file_name_scheme)
    # Call create classifier to return a model leveraging a desired pretrained architecture, define loss criterion
    model = m1_create_classifier(argmodel, arglayer, len(dict_datasets['train_data'].classes))
    criterion = nn.NLLLoss()

    # Define start condition hyperparameters and key running information such as elapsed training time
    # epoch_on and running_count refer to the epoch in which deeper layers started training and for how long
    model_hyperparameters = {'learnrate': arglearn,
                         'training_loss_history': [],
                         'validate_loss_history': [],
                         'epoch_on': [],
                         'running_count': 0,
                         'weightdecay' : 0.00001,
                         'training_time' : 0}

    # If user requests load, call load checkpoint to return model and hyperparameters, then plot loaded information
    if argload == 'y':
        model, model_hyperparameters = m3_load_model_checkpoint(model, file_name_scheme)
        o5_plot_training_history(argmodel, model_hyperparameters, file_name_scheme)

    # If user requests train, first display an example piece of data from the processed training set
    if argtrain == 'y':
        # NOTE 1: Processed data is a tensor ordered [colour, height, width]; matplotlib expects [height, width, colour], so we transpose
        # NOTE 2: A plotted image blocks function continuation; unblocking requires a pause to load the image or it will freeze
        print('Displaying an example processed image from the training set..\n')
        plt.imshow(random.choice(dict_datasets['train_data'])[0].numpy().transpose((1, 2, 0))) # NOTE 1
        plt.show(block=False) # NOTE 2
        plt.pause(2)
        plt.close()

        # Call train model with model and training dataset to return trained model and hyperparameters, then plot and save
        model, model_hyperparameters = o1_train_model(model, dict_data_loaders['train_loader'],
                        dict_data_loaders['valid_loader'], argepoch, 0.6, model_hyperparameters, criterion)
        o5_plot_training_history(argmodel, model_hyperparameters, file_name_scheme, 'complete')

        # Prompt user to save, save the model and its hyperparameters per the naming convention
        if u5_time_limited_input('Would you like to save the model?'):
            m2_save_model_checkpoint(model, file_name_scheme, model_hyperparameters)

    # If user requests no load and no train, prompt to run an overfit training exercise and execute if requested
    # NOTE: Same as training but on an overfit dataset. overfit_model metadata references the same data as the model metadata
    if argtrain == 'n' and argload == 'n':
        if u5_time_limited_input('Check model can overfit small dataset?'):
            overfit_model, overfit_model_hyperparameters = o1_train_model(model, dict_data_loaders['overfit_loader'],
                            dict_data_loaders['valid_loader'], argepoch, 0.9, model_hyperparameters, criterion)
            o5_plot_training_history(argmodel, overfit_model_hyperparameters, file_name_scheme, 'overfit')

    # If user requests load, or has requested training and training has completed, the model is ready for predictions
    if argtrain == 'y' or argload == 'y':
        print('The model is ready to provide predictions\n')

        # Prompt to test the model's performance
        # Gives the testing data loader to the validation function and returns performance
        if u5_time_limited_input('Would you like to test the model?'):
            t0 = time.time()
            test_count_correct, ave_test_loss = o3_model_no_backprop(model, dict_data_loaders['test_loader'], criterion)
            print('\nTesting Loss: {:.3f}.. '.format(ave_test_loss),
                'Testing Accuracy: {:.3f}'.format(test_count_correct / len(dict_data_loaders['test_loader'].dataset)),
                'Runtime - {:.0f} seconds\n'.format((time.time() - t0)))

        # Prompt the user to use the model for inference
        # Gives an unlabeled dataloader to a predict function and returns predictions
        if u5_time_limited_input('Would you like to use the model for inference?'):
            t1 = time.time()
            dict_prediction_results = o6_predict_data(model, dict_data_loaders['predict_loader'],
                            dict_data_labels, dict_class_labels)
            o7_show_prediction(data_dir, dict_prediction_results)
            print('Runtime - {:.0f} seconds\n'.format((time.time() - t1)),
                            [dict_prediction_results[key][0][0] for key in dict_prediction_results])


if __name__ == "__main__":
    main()

C:\Users\lukea/Programming Data/Flower_data/models/Flower_data_googlenet_2lay

Using  googlenet  with the following attached  2  layer classifier:
 Classifier(
  (fc1): Linear(in_features=1024, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=102, bias=True)
  (dropout): Dropout(p=0.3, inplace=False)
)
Check model can overfit small dataset?: 'y' for yes, 'n' for no (10 seconds to choose): y
User input = Yes

Using GPU = NVIDIA GeForce GTX 1050 4 GB



Epoch: 1/50.. Train Loss: 0.576.. Valid Loss: 0.041.. Valid Accy: 0.01.. Mem: 0.03GB.. Time: 0min


KeyboardInterrupt: 
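
The path printed at the top of the output above comes from the naming convention built in main(). The sketch below reproduces that string construction in isolation. The helper name `build_file_name_scheme` and the example home directory are hypothetical, and `posixpath` is used here instead of `os.path` only so that the result is the same on every platform; main() itself uses `os.path`.

```python
import posixpath

def build_file_name_scheme(data_dir, model_name, layer_count):
    """Return the save/load prefix for a dataset, model, and layer count (illustrative helper)."""
    # dirname drops the trailing slash, so basename then returns the dataset folder name
    dataset_name = posixpath.basename(posixpath.dirname(data_dir))
    return data_dir + 'models/' + dataset_name + '_' + model_name + '_' + str(layer_count) + 'lay'

# Hypothetical data directory; the trailing slash matters for the dirname/basename step
scheme = build_file_name_scheme('/home/user/Programming Data/Flower_data/', 'googlenet', 2)
print(scheme)
```

Because the prefix ends with the layer count, models trained with a different architecture or a different number of attached layers are saved under distinct names in the same 'models' folder.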