# CIFAR-10 Classification Using Neural Networks

CIFAR-10 is a popular image classification dataset. It provides 60,000 labeled 32x32 color images spread evenly across 10 classes, and the challenge is to assign each image to its correct class. More information about the dataset can be found on its website: https://www.cs.toronto.edu/~kriz/cifar.html

Neural networks are the state of the art for most computer vision challenges. In this tutorial, we present detailed steps to create a simple network for the classification task.

The entire notebook is written in Python. We start by importing some libraries.


### Import the required libraries

We first import the Python libraries required for the classification task. NumPy provides support for scientific computing. Matplotlib is used to display the plots created during the task.

Apart from these, we also import the libraries provided by Azure Machine Learning. The core library from AzureML lays the foundation for carrying out the task on the Azure platform. In the step below, we import *Workspace* and display the version of the Azure ML SDK installed on your machine.


In [None]:
import numpy as np
import matplotlib.pyplot as plt

import azureml.core
from azureml.core import Workspace

# check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

### Import PyTorch libraries provided with the Azure platform

In this tutorial, we will create a training script using PyTorch. The Azure Machine Learning platform supports running PyTorch scripts. To use this support, we import the corresponding estimator from the Deep Neural Network module provided by Azure.

In [None]:
from azureml.train.dnn import PyTorch

### Configure the workspace

Once we have all the libraries, the first step is to load the workspace where our actual work will happen. A workspace is associated with an Azure subscription and provides all the tools that we need to complete machine learning tasks.

In this tutorial, we assume that the workspace has already been set up. We load the workspace details from the locally saved configuration file and print them to make sure that everything is in order.

In [None]:
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, ws.subscription_id, sep='\t')
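
For readers who have not yet created the configuration file, the following is a minimal sketch of what `Workspace.from_config()` typically reads: a `config.json` file, usually kept in the working directory or in a `.azureml` subfolder, with three fields that identify the workspace. The helper name `write_workspace_config` is hypothetical, and the values in the usage line are placeholders rather than real identifiers.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json holding the three fields that identify a workspace."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=4))
    return path


# Placeholder values, written to a temporary directory for illustration only.
example_path = write_workspace_config(
    tempfile.mkdtemp(), "<subscription-id>", "<resource-group>", "<workspace-name>"
)
print(example_path.read_text())
```

The same file can also be downloaded from the Azure portal instead of being written by hand.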


### Create the experiment

An experiment is exactly what the name suggests. To arrive at the best machine learning model, we usually need to try several configurations, varying the parameters across several *runs* of the experiment.

We therefore create an experiment here, which groups the different runs. The experiment is created in the workspace, and we give it a suitable name. This name is visible in the Azure portal along with other details of the experiment. The workspace and the name are passed as arguments to the Experiment class, which is also imported from the AzureML core library.

In [None]:
experiment_name = 'cifar10-classification'

from azureml.core import Experiment
exp = Experiment(workspace=ws, name=experiment_name)

### Utilize the workspace compute resources

Once we are ready to perform an experiment, we need computational resources. We use one of the clusters available in the workspace to run the experiment.

In the snippet below, we configure and use a cluster named `cpucluster` for this experiment. After importing the compute classes, we choose a name for the cluster. The code reuses the cluster if it already exists and creates it otherwise.

In [None]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
import os

# choose a name for your cluster
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "cpucluster")
# environment variables are strings, so convert the node counts to integers
compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0))
compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4))

# This example uses a CPU VM (STANDARD_D2_V2). To use a GPU VM, set the VM size to STANDARD_NC6 instead.
vm_size = os.environ.get("AML_COMPUTE_CLUSTER_STANDARD_NC6", "STANDARD_D2_V2")


if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,
                                                                min_nodes = compute_min_nodes, 
                                                                max_nodes = compute_max_nodes)

    # create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
    
    # can poll for a minimum number of nodes and for a specific timeout. 
    # if no min node count is provided it will use the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
    
    # For a more detailed view of current AmlCompute status, use get_status()
    print(compute_target.get_status().serialize())

### Upload the dataset from your local folder to the Azure datastore

The next step is to make the data available to the model that we will train. The workspace provides a default datastore where we can upload our dataset. We print the details of the default datastore to confirm that we are pointing at the right one, and then upload the dataset from our local directory to the cloud datastore.

In the upload command, we specify the following - 

* src_dir - Path to the local directory where the dataset is kept
* target_path - Location on the datastore relative to the root of the datastore
* overwrite - Flag to replace the existing data
* show_progress - Flag to display the progress of the upload

For us, the upload step took about 3 minutes for the CIFAR-10 dataset.

In [None]:
ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)

#ds.upload(src_dir='../682/assignment2/cs682/datasets', target_path='cifar10', overwrite=True, show_progress=True)

### Initialize the Estimator

Now that we have the dataset and the compute resources for our experiment, we can move on to the training step. For this step, we run the training script through an Estimator class, which conveniently connects everything that we have set up so far. The estimator runs our code on the compute target and streamlines loading the dataset for training.

In the cell below, we initialize a dictionary that provides the command line parameters for our training script. The *script_params* dictionary contains one argument, which points the script to the datastore in the workspace where our dataset was uploaded previously.

We then use the PyTorch estimator imported earlier in the notebook. These are the parameters we pass to the estimator - 

* source_directory - The local directory which contains the training script 
* script_params - Dictionary containing the command line arguments for the training script
* compute_target - The compute resources provided along with the subscription to utilize for the experiment
* entry_script - The script which starts the training process
* use_gpu - Flag indicating whether GPU resources are to be used during the training

In [None]:
script_params = {
    '--data-folder': ds.as_mount()
}

pt_est = PyTorch(source_directory='./your_code',
                 script_params=script_params,
                 compute_target=compute_target,
                 entry_script='2layerfcnet.py',
                 use_gpu=True)
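
The training script itself is not shown in this notebook, so the following is only a sketch of how such a script might read the arguments it receives. It assumes the script uses `argparse`; the `--data-folder` argument matches the key in *script_params*, while the `--learning_rate` argument, its default value, and the function name `parse_args` are hypothetical additions for illustration, included because a script has to accept a value on the command line before that value can be set from outside.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line arguments passed to the training script."""
    parser = argparse.ArgumentParser(description="CIFAR-10 training script (sketch)")
    # Matches the '--data-folder' key in script_params; available as args.data_folder.
    parser.add_argument("--data-folder", type=str, dest="data_folder",
                        help="path where the datastore is mounted on the compute target")
    # Hypothetical argument with an arbitrary default value.
    parser.add_argument("--learning_rate", type=float, default=0.001,
                        help="learning rate used by the optimizer")
    return parser.parse_args(argv)


# Example invocation with a placeholder mount path.
args = parse_args(["--data-folder", "/tmp/cifar10", "--learning_rate", "0.01"])
print("Data folder:", args.data_folder)
print("Learning rate:", args.learning_rate)
```

Inside the real script, `parse_args()` would be called without arguments so that it reads the values supplied by the estimator.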

### Submit the Experiment

With the estimator initialized, we are all set to run the experiment. We submit the estimator created in the previous step to the experiment.

This kicks off the training run, at the end of which we have a trained model and its results. Depending on the complexity of the model, we can step away and go for a jog, or call it a day while the job runs.

In [None]:
run = exp.submit(pt_est)

### Visualizing the submission

However, we also have the option to track what is happening during the course of the experiment. To do that, we import the RunDetails widget and pass it the run object returned by the submission, as shown below - 

In [None]:
from azureml.widgets import RunDetails
RunDetails(run).show()

### Moving forward - Automated Hyperparameter search

Our basic model consisting of fully connected layers achieves an accuracy of ~49%. However, is that the best that can be achieved with this model?

In the training script, we fixed the learning rate arbitrarily. We could instead explore the effect of changing this hyperparameter on our accuracy.

We can use the hyperparameter search feature provided with the Azure platform to perform this sub-task. Its main advantage is that we can run several training jobs in parallel to select the best hyperparameter value for the dataset. We provide the bounds over which the search takes place along with the parameter sampling policy.

In this tutorial, we perform random sampling from a uniform distribution between the minimum and maximum values of the learning rate. The required classes are imported from the azureml.train.hyperdrive library and initialized accordingly.

In [None]:
from azureml.train.hyperdrive import RandomParameterSampling, uniform

# sample the learning rate uniformly between 0.0001 and 0.1
param_sampling = RandomParameterSampling({
    "learning_rate": uniform(0.0001, 0.1),
})
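
To make the sampling policy concrete, here is a small standalone sketch of what drawing learning rates uniformly at random from the same bounds looks like. It uses only Python's standard `random` module rather than the Azure SDK, and the function name `sample_learning_rates` is hypothetical.

```python
import random


def sample_learning_rates(n, low=0.0001, high=0.1, seed=None):
    """Draw n learning rates uniformly at random from [low, high]."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


# Each sampled value would correspond to one training run in the sweep.
for lr in sample_learning_rates(5, seed=0):
    print(f"learning_rate = {lr:.5f}")
```

Note that with these bounds a uniform distribution places roughly 90% of the samples above 0.01, so small learning rates are explored comparatively rarely.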

### Configuring the hyperparameter sweep

The Azure platform will run several jobs in parallel to find the most suitable hyperparameters. We can speed this up by terminating runs that perform poorly, which frees up resources for jobs that are still waiting. This in turn reduces the time taken to find a good *learning_rate* value.

We demonstrate the bandit policy as the termination criterion here. More information regarding termination policies can be found here - https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive?view=azure-ml-py

We then specify the estimator to be run for the hyperparameter sweep, which is the same one we worked with earlier. The HyperDriveRunConfig class takes the termination policy together with the sampling policy and optimizes the specified primary metric. The primary metric name must match the name of a metric that the training script logs.

In [None]:
from azureml.train.hyperdrive import BanditPolicy
early_termination_policy = BanditPolicy(slack_factor=0.1, evaluation_interval=1, delay_evaluation=5)

from azureml.train.hyperdrive import HyperDriveRunConfig,PrimaryMetricGoal
hyperdrive_run_config = HyperDriveRunConfig(estimator=pt_est,
                                           hyperparameter_sampling=param_sampling,
                                           policy=early_termination_policy,
                                           primary_metric_name="best_val_accuracy",
                                           primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                           max_total_runs=100)
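
The effect of the bandit policy settings used above can be sketched in plain Python. This is a simplified illustration of the slack-factor rule for a metric that is being maximized, not the SDK's implementation; the function name `bandit_should_terminate` and the example accuracies are hypothetical.

```python
def bandit_should_terminate(run_metric, best_metric, slack_factor,
                            interval, evaluation_interval=1, delay_evaluation=0):
    """Return True if a run would be cancelled under a slack-factor bandit rule.

    Assumes the primary metric is being maximized.
    """
    # The policy is not applied before the initial delay has passed.
    if interval < delay_evaluation:
        return False
    # After the delay, the policy is applied only at multiples of evaluation_interval.
    if interval % evaluation_interval != 0:
        return False
    # Cancel runs whose metric falls below best / (1 + slack_factor).
    return run_metric < best_metric / (1 + slack_factor)


# With slack_factor=0.1, runs below about 91% of the best metric are cancelled.
best = 0.50
for accuracy in (0.40, 0.45, 0.46, 0.49):
    cancelled = bandit_should_terminate(accuracy, best, slack_factor=0.1,
                                        interval=5, evaluation_interval=1,
                                        delay_evaluation=5)
    print(f"accuracy={accuracy:.2f} cancelled={cancelled}")
```

A smaller slack factor cancels runs more aggressively, while `delay_evaluation` gives every run a few intervals to get going before it can be cancelled.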

### Submit the job

We submit the hyperparameter sweep configuration to the same experiment on the Azure platform.

In [None]:
hyperdrive_run = exp.submit(hyperdrive_run_config)

### Visualizing the Hyperparameter sweep submission

We reuse the RunDetails widget, this time passing it the hyperparameter sweep run, to visualize the progress of the sweep. Alternatively, we can view the run in the Azure portal within our workspace.

In [None]:
RunDetails(hyperdrive_run).show()

### Analysis

The best model achieved an accuracy of ~57% after optimizing the learning rate, up from ~49% with the arbitrarily chosen value. Instead of manually trying out hyperparameter values by trial and error, we were able to achieve this with much less effort using the automated hyperparameter search feature described here.