In [None]:
%matplotlib inline


SET-UP: Training a Classifier with PyTorch
==========================================

What about data?
----------------

Generally, when you have to deal with image, text, audio or video data,
you can use standard Python packages that load data into a NumPy array.
Then you can convert this array into a ``torch.*Tensor``.

-  For images, packages such as Pillow and OpenCV are useful
-  For audio, packages such as SciPy and librosa are useful
-  For text, either raw Python or Cython based loading, or NLTK and
   SpaCy are useful

Specifically for vision, we have created a package called
``torchvision``, that has data loaders for common datasets such as
ImageNet, CIFAR10, MNIST, etc. and data transformers for images, viz.,
``torchvision.datasets`` and ``torch.utils.data.DataLoader``.

This provides a huge convenience and avoids writing boilerplate code.

For this tutorial, we will use the CIFAR10 dataset.
It has the classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’,
‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. The images in CIFAR-10 are of
size 3x32x32, i.e. 3-channel color images of 32x32 pixels in size.

.. figure:: /_static/img/cifar10.png
   :alt: cifar10

   cifar10


Training an image classifier
----------------------------

We will do the following steps in order:

1. Load and normalize the CIFAR10 training and test datasets using
   ``torchvision``
2. Define a Convolutional Neural Network
3. Define a loss function
4. Train the network on the training data
5. Test the network on the test data

1. Loading and normalizing CIFAR10
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using ``torchvision``, it’s extremely easy to load CIFAR10.

In [None]:
import torch
import torchvision
import torchvision.transforms as transforms

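The torchvision datasets return PIL images. ``transforms.ToTensor()`` converts
them to tensors with values in the range [0, 1], and ``transforms.Normalize``
with a mean and standard deviation of 0.5 per channel then maps them to the
normalized range [-1, 1].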
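As a quick check of the arithmetic: ``transforms.Normalize(mean, std)`` computes ``(x - mean) / std`` for each channel, so with a mean and std of 0.5 a value of 0 maps to -1 and a value of 1 maps to 1. The plain-Python sketch below (no torch needed; the helper names are illustrative only) reproduces that mapping and the inverse that ``imshow`` applies later as ``img / 2 + 0.5``.

```python
# Per-channel normalization as done by transforms.Normalize((0.5,)*3, (0.5,)*3):
# output = (input - mean) / std
MEAN, STD = 0.5, 0.5

def normalize(x):
    """Map a pixel value from [0, 1] to [-1, 1]."""
    return (x - MEAN) / STD

def unnormalize(y):
    """Inverse mapping, equivalent to `img / 2 + 0.5` in imshow."""
    return y * STD + MEAN

for x in (0.0, 0.25, 0.5, 1.0):
    print(x, "->", normalize(x), "->", unnormalize(normalize(x)))
```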



In [None]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
mytrainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
mytestloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 
           'dog', 'frog', 'horse', 'ship', 'truck')

Let us show some of the training images, for fun.



In [None]:
import matplotlib.pyplot as plt
import numpy as np

# functions to show an image


def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
mydataiter = iter(mytrainloader)
myimages, mylabels = next(mydataiter)  # use built-in next(); recent DataLoader iterators have no .next() method

# show images
imshow(torchvision.utils.make_grid(myimages))
# print labels
print(' '.join('%5s' % classes[mylabels[j]] for j in range(4)))

2. Define a Convolutional Neural Network
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Copy the neural network from the previous Neural Networks section and modify it to
take 3-channel images (instead of the 1-channel images it was originally defined for).



In [None]:
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


mynet = Net()
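The ``16 * 5 * 5`` input size of ``fc1`` follows from the shape arithmetic of the layers: a 5x5 convolution with no padding and stride 1 shrinks each spatial side by 4, and each 2x2 max-pooling with stride 2 halves it. The sketch below uses the standard output-size formula ``floor((n + 2*padding - kernel) / stride) + 1`` to trace one side of a 32x32 CIFAR10 image through ``Net``; ``out_size`` and ``trace_net`` are illustrative helpers, not part of PyTorch.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (no dilation)."""
    return (n + 2 * padding - kernel) // stride + 1

def trace_net(side=32):
    """Follow one spatial side through conv1 -> pool -> conv2 -> pool."""
    side = out_size(side, kernel=5)            # conv1: 32 -> 28
    side = out_size(side, kernel=2, stride=2)  # pool:  28 -> 14
    side = out_size(side, kernel=5)            # conv2: 14 -> 10
    side = out_size(side, kernel=2, stride=2)  # pool:  10 -> 5
    return side

side = trace_net(32)
flat_features = 16 * side * side  # conv2 produces 16 channels
print(side, flat_features)
```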

3. Define a Loss function and optimizer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Let's use a Classification Cross-Entropy loss and SGD with momentum.



In [None]:
import torch.optim as optim

mycriterion = nn.CrossEntropyLoss()
myoptimizer = optim.SGD(mynet.parameters(), lr=0.001, momentum=0.9)
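For a single sample, ``nn.CrossEntropyLoss`` combines a log-softmax over the raw outputs with the negative log-likelihood of the correct class (and averages over the batch by default). ``optim.SGD`` with ``momentum`` keeps a velocity buffer per parameter; with the default ``dampening=0`` and no Nesterov, the update is ``v = momentum * v + grad`` followed by ``p = p - lr * v``. The plain-Python sketch below reproduces both computations on made-up numbers, for illustration only.

```python
import math

def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update (dampening=0, no Nesterov)."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity

# Made-up scores for 3 classes; the true class is index 0.
loss = cross_entropy([2.0, 1.0, 0.1], target=0)
print(round(loss, 4))

# Two updates of a single scalar parameter with a constant gradient of 1.0.
p, v = 0.0, 0.0
for _ in range(2):
    p, v = sgd_momentum_step(p, grad=1.0, velocity=v)
print(p, v)
```

Because the velocity accumulates past gradients, the second step moves the parameter further than the first even though the gradient did not change.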

4. Train the network
^^^^^^^^^^^^^^^^^^^^

This is when things start to get interesting.
We simply have to loop over our data iterator, and feed the inputs to the
network and optimize.



In [None]:
%%time

def train(trainloader):

    for epoch in range(2):  # loop over the dataset multiple times

        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs
            inputs, labels = data

            # zero the parameter gradients
            myoptimizer.zero_grad()

            # forward + backward + optimize
            outputs = mynet(inputs)
            loss = mycriterion(outputs, labels)
            loss.backward()
            myoptimizer.step()

            # print statistics
            running_loss += loss.item()
            if i % 2000 == 1999:    # print every 2000 mini-batches
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 2000))
                running_loss = 0.0

    print('Finished Training')
    return outputs

myoutputs = train(mytrainloader)

5. Test the network on the test data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We have trained the network for 2 passes over the training dataset.
But we need to check if the network has learnt anything at all.

We will check this by predicting the class label that the neural network
outputs, and checking it against the ground-truth. If the prediction is
correct, we add the sample to the list of correct predictions.

Okay, first step. Let us display a few images from the test set and compare the network's predictions to the ground truth.


In [None]:
mydataiter = iter(mytestloader)
myimages, mylabels = next(mydataiter)  # use built-in next(); recent DataLoader iterators have no .next() method
myoutputs = mynet(myimages)
_, mypredicted = torch.max(myoutputs, 1)

# print images
imshow(torchvision.utils.make_grid(myimages))
print('GroundTruth: ', ' '.join('%5s' % classes[mylabels[j]] for j in range(4)))
print('Predicted  : ', ' '.join('%5s' % classes[mypredicted[j]] for j in range(4)))

The outputs are energies (unnormalized scores) for the 10 classes.
The higher the energy for a class, the more the network
thinks that the image is of that particular class.
This is why the cell above takes the index of the highest energy,
``torch.max(myoutputs, 1)``, as the predicted class.
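Since the energies are unnormalized, a softmax is the usual way to turn them into probabilities that sum to 1; it preserves the ordering, so the argmax is the same either way. The top softmax probability is a common choice of "confidence" score, the kind of value the Active Learning exercises below select on, although how ``ActiveStrategy`` computes its confidence is not shown in this notebook. A plain-Python sketch with made-up energies (``softmax`` and ``predict`` are illustrative helpers):

```python
import math

def softmax(energies):
    """Convert raw class energies (logits) into probabilities summing to 1."""
    m = max(energies)  # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in energies]
    total = sum(exps)
    return [e / total for e in exps]

def predict(energies):
    """Return (predicted class index, its softmax probability)."""
    probs = softmax(energies)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

# Made-up energies for the 10 CIFAR10 classes of one image.
energies = [0.3, -1.2, 2.5, 0.9, -0.4, 1.1, 0.0, -2.0, 0.5, 0.2]
label, confidence = predict(energies)
print(label, round(confidence, 3))
```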



Now, let's look at how the network performs on the entire dataset.

In [None]:
%%time

def getAccuracy(testloader):

    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            outputs = mynet(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the %d test images: %d %%' % (
        total, correct / total * 100.))
    return correct / total * 100.
    
myaccuracy = getAccuracy(mytestloader)

While those results are far from perfect (we will try to make them better in the next exercises), they still look better than random chance, which is 10% accuracy (randomly picking a class out of 10 classes).
It looks like the network learnt something.

Hmmm, what are the classes that performed well, and the classes that did
not perform well:



In [None]:
%%time

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in mytestloader:
        images, labels = data
        outputs = mynet(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(labels.size(0)):  # iterate over the batch (4 images here)
            label = labels[i].item()
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

EXERCISE 1: Supervised Learning & Learning Curves
=================================================

Now, let's build a first learning curve to see how the classifier's accuracy improves with the amount of data used to train it. To do that, we will add data progressively, batch after batch, and plot the accuracy against the number of records used for training.
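The internals of ``ActiveStrategy.incremental_supervised`` live in ``ActiveLearningClass.ipynb`` and are not shown here. As a hypothetical sketch of the idea only, assuming each step adds a random batch, the baseline can be seen as a sequence of nested random training sets of size ``nps``, ``2 * nps``, ..., ``maximum``, with one accuracy measured per set; these sizes are the x-axis values plotted in the next cell. ``nested_random_subsets`` is an illustrative name, not part of the class.

```python
import random

def nested_random_subsets(pool_size, num_steps, maximum, seed=0):
    """Hypothetical helper: growing, nested random index sets.

    Step i (1-based) contains i * (maximum // num_steps) indices, and each
    set extends the previous one, mimicking "adding data batch after batch".
    """
    rng = random.Random(seed)
    step = maximum // num_steps
    order = rng.sample(range(pool_size), maximum)  # random order, no repeats
    return [order[: step * i] for i in range(1, num_steps + 1)]

subsets = nested_random_subsets(pool_size=50000, num_steps=5, maximum=10000)
print([len(s) for s in subsets])  # training-set size at each step
```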

In [None]:
%%time
%run ActiveLearningClass.ipynb

num_steps = 5     # <---- Play with these values. CIFAR-10 has 50,000 records for training, and 10,000 for testing
maximum = 10000   # <----
nps = int(maximum / num_steps)

myActiveStrategy = ActiveStrategy(mynet, nsteps=num_steps)

myActiveStrategy.verbose = False

myActiveStrategy.init_loaders()
myActiveStrategy.incremental_supervised()
mySupervised = myActiveStrategy.run_experiment(num_steps, maximum)
    
plt.plot([nps*i for i in range(1, num_steps + 1)], mySupervised, '--b', marker="o")
plt.show()

As expected, the accuracy increases (overall) when more training data is added to train the model. 

QUESTION: What additional experiment(s) would you run to gain more understanding of the training set? What is the purpose of running such an experiment?

The question now is, can we do better? That's the question that Active Learning is trying to address...

In the next exercise, we will start with the easiest and most intuitive Active Learning strategy: confidence-level (least-confidence) sampling.


EXERCISE 2: Uncertainty-Based Strategies
========================================

Pooling Approach with Uncertainty-Based Strategy
------------------------------------------------------------

Let's start with the very first loop. We will go through the following steps:

1. We will select the first training set randomly.
2. We will train the model with that sample.
3. We will use the trained model for inference on the rest of the dataset.
4. We will select the 'nps' records with the lowest confidence, add them to the training set, and retrain the model.
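Steps 1 and 4 boil down to two small selection rules. The sketch below is a hypothetical, self-contained illustration on plain Python lists: it draws the initial random sample, then, given one confidence value per unlabeled record, picks the ``n`` least confident ones. It does not use ``ActiveStrategy`` (whose record format is defined in ``ActiveLearningClass.ipynb``), and the helper names and confidence values are made up.

```python
import random

def initial_sample(unlabeled, n, seed=0):
    """Step 1: split the pool into a random batch to label and the remainder."""
    rng = random.Random(seed)
    picked = rng.sample(unlabeled, n)
    picked_set = set(picked)  # set membership keeps the filtering linear
    remaining = [i for i in unlabeled if i not in picked_set]
    return picked, remaining

def least_confident(indices, confidences, n):
    """Step 4: return the n indices with the lowest prediction confidence."""
    ranked = sorted(zip(confidences, indices))  # ascending confidence
    return [idx for _, idx in ranked[:n]]

pool = list(range(20))
labeled, pool = initial_sample(pool, n=5)

# Made-up confidences for the 15 remaining records; in the notebook, steps 2
# and 3 (training, then inference) would produce these values.
conf = [0.95, 0.40, 0.88, 0.31, 0.99, 0.52, 0.77, 0.45, 0.60, 0.93,
        0.35, 0.81, 0.70, 0.66, 0.58]
new = least_confident(pool, conf, n=5)
labeled += new
new_set = set(new)
pool = [i for i in pool if i not in new_set]
print(len(labeled), len(pool))
```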

Below, we perform steps 1 and 2:

In [None]:
%%time
import random

unlabeled = [i for i in range(len(trainset))]
labeled   = []

to_be_labeled = ...  # TODO: generate a random sample of the data, of size nps (defined previously)
unlabeled = ...      # TODO: update the unlabeled array once you have defined to_be_labeled

myActiveStrategy.run_one(to_be_labeled)

We just trained the model on a random sample of size 'nps' (the size we chose for each step) to initialize our Active Learning process.

Now, let's move on to steps 3 and 4:

In [None]:
myResults = myActiveStrategy.infer(unlabeled)

# Below, we update the to_be_labeled and unlabeled arrays in preparation for the next loop.
# The strategy is to select the records that were inferred with the lowest confidence in the previous loop.
sorted_by_conf = ...  # TODO: fill here; remember that the size of the array will be 'nps'

# update to_be_labeled:
new_records = ...  # TODO: the records that should be added to the rest before retraining
to_be_labeled.extend(new_records)
unlabeled = ...    # TODO: update unlabeled

# And run one more loop...
accuracy = myActiveStrategy.run_one(to_be_labeled)

Now, let's implement the whole process and run all the loops.

In [None]:
%%time

def confidenceAL(nsteps, stepSize):

    results = []

    unlabeled = [i for i in range(len(trainset))]
    labeled   = []

    to_be_labeled = ...  # TODO: random initial sample of size stepSize
    unlabeled = ...      # TODO: remove to_be_labeled from unlabeled
    myres = myActiveStrategy.run_one(to_be_labeled)
    results.append(myres)

    for n in range(1, nsteps):
        # TODO: fill the algorithm here, getting inspiration from the previous exercise
        pass

    return results
    
myConfidenceAL = confidenceAL(num_steps, nps)

Aren't you curious to know how good those results are? We can now compare the learning curve that we obtain with the one we got for the supervised approach.

In [None]:
plt.plot([nps*i for i in range(1, num_steps + 1)], mySupervised,   '--b',
         [nps*i for i in range(1, num_steps + 1)], myConfidenceAL, '--r',
         marker="o")

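The ActiveStrategy class also comes with a method, ``run_ConfidenceAL``, that you can use to "onboard" any querying strategy you like. Judging from its use in the cell below, you pass it a selection function with the signature ``update_function(inferred_res, nRec)``, which receives the inference results on the unlabeled records and returns the indices of the ``nRec`` records to label next.

Implement the confidence level-based strategy using this method.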
Then, run the process; you can move on to the next exercise while training is in progress here.

In [None]:
%%time

def update_function(inferred_res, nRec):
    ranked = ...  # TODO: sort and select the best nRec elements
    selected = [rec[0] for rec in ranked]
    return selected
    
myConfidenceAL2 = myActiveStrategy.run_ConfidenceAL(update_function, num_steps, maximum)
plt.plot([nps*i for i in range(1, num_steps + 1)], mySupervised,    '--b',
         [nps*i for i in range(1, num_steps + 1)], myConfidenceAL,  '--r',
         [nps*i for i in range(1, num_steps + 1)], myConfidenceAL2, '--g',
         marker="o")

Now, for our strategy, we have been selecting the records with the lowest confidence at inference time, assuming that these are the data points the model is most confused about. While this is a sound strategy on a fairly clean dataset, in real life it could lead to injecting more and more spam/noise into the training set.

To help, let's start by drawing the distribution of the confidence level for the last inferences we ran:

In [None]:
%%time

inferred_CL = [ c[3].item() for c in myResults ]
print(inferred_CL[:10])
print(len(inferred_CL))

n, bins, patches = plt.hist(x=inferred_CL, bins=30, color='#0504aa',
                            alpha=0.7, rwidth=0.85)
plt.grid(axis='y', alpha=0.75)
plt.xlabel('Confidence of Prediction')
plt.ylabel('Frequency')
plt.title('Confidence Level')
maxfreq = n.max()

plt.ylim(top=(np.ceil(maxfreq / 10) * 10 * 1.05) if maxfreq % 10 else (maxfreq + 10) * 1.05)

QUESTION: Before moving on, discuss what a good choice for nps might be here (so far, we have chosen that value arbitrarily). What experiment would you design to automatically choose an optimal "nps" value? Please discuss.

A Better Confidence Level-Based Strategy
--------------------------------------------------

Implement an alternative confidence level-based strategy where very low confidence records get cut out, and where you select the lowest confidence records from the remaining "medium" confidence sample. Experiment with different cutoffs and compare the results.
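One way to read this instruction, sketched below on plain lists: sort the records by ascending confidence, skip the ``n_filter`` least confident ones as likely noise, and take the next ``n_rec``. ``filtered_least_confident`` is a hypothetical helper independent of ``ActiveStrategy``, and the confidence values are made up.

```python
def filtered_least_confident(indices, confidences, n_rec, n_filter):
    """Skip the n_filter least confident records, then return the next n_rec.

    The lowest-confidence tail is treated as likely noise; the selection
    comes from the "medium" confidence band just above it.
    """
    ranked = sorted(zip(confidences, indices))  # ascending confidence
    return [idx for _, idx in ranked[n_filter:n_filter + n_rec]]

idx = list(range(8))
conf = [0.05, 0.90, 0.30, 0.10, 0.60, 0.45, 0.20, 0.75]
print(filtered_least_confident(idx, conf, n_rec=3, n_filter=2))
```

Setting ``n_filter=0`` recovers plain least-confidence selection, which makes it easy to compare cutoffs.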

In [1]:
%%time

def update_CL_improved(inferred_res, nRec, nFilter=500): # <--- we can play with nFilter value later
    ranked = ...    # TODO
    selected = ...  # TODO
    return selected

myConfidenceAL3 = myActiveStrategy.run_ConfidenceAL(update_CL_improved, num_steps, maximum)
plt.plot([nps*i for i in range(1, num_steps + 1)], mySupervised,    '--b',
         [nps*i for i in range(1, num_steps + 1)], myConfidenceAL2, '--g',
         [nps*i for i in range(1, num_steps + 1)], myConfidenceAL3, '--m',
         marker="o")


Optional Exercise:
----------------------
WARNING: THIS TAKES A VERY LONG TIME TO RUN!!! You're invited to try it after the session, or once you are done with the other steps.

Let's draw the final accuracy as a function of the 'nFilter' value:

In [None]:
%%time

finalAccuracies = []
stepValues = [500, 1000, 2000, 5000] # <-- customize this to experiment with different values
for nf in stepValues:
    print(">>>>> Running algorithm for {0}".format(nf))
    myAcc = ...  # TODO: fill here
    finalAccuracies.append(myAcc)

plt.plot(stepValues, [a[-1] for a in finalAccuracies] , '--b', marker="o")

QUESTION: What do you learn from the plot that you obtained? What further studies does it inspire?

Streaming Approach with Uncertainty-Based Strategy
---------------------------------------------------------------

Now, instead of a pooling approach, let's use a streaming approach. Instead of using a fixed number of records at each step, you can now use a rule; for instance, instead of selecting 'n' best records, use a threshold in confidence level. What are the benefits and challenges with each methodology?
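In a streaming setting, each record is judged on its own against a fixed rule instead of being ranked against the rest of the pool, so the number of records selected per step varies. A minimal sketch on plain lists, assuming the rule "select every record whose confidence is below ``threshold``"; ``stream_select`` and the sample records are hypothetical.

```python
def stream_select(stream, threshold):
    """Yield the index of each record whose confidence is below threshold.

    `stream` is any iterable of (index, confidence) pairs; records are
    examined one at a time, so the selection size is not fixed in advance.
    """
    for idx, conf in stream:
        if conf < threshold:
            yield idx

records = [(0, 0.92), (1, 0.34), (2, 0.51), (3, 0.18), (4, 0.77)]
print(list(stream_select(records, threshold=0.5)))
```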

In [None]:
%%time

def update_streaming(inferred_res, threshold=0.075):
    next_loop = ...  # TODO: fill here
    return next_loop
    
myStreamingAL, myStepSizes = myActiveStrategy.run_StreamingAL(update_streaming, num_steps, maximum)
plt.plot(myStepSizes, myStreamingAL , '--b', marker="o")

Now, let's define the threshold based on a certain percentile of the inferred confidence values (in other words, we choose what percentage of the data we keep at each step). Does that remind you of something? :-)
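To make the percentile rule concrete without requiring NumPy, the sketch below reimplements the linear-interpolation percentile that ``np.percentile`` uses by default and keeps the records at or below it. ``percentile`` and ``keep_lowest_percent`` are hypothetical helpers, and the confidence values are made up.

```python
def percentile(values, q):
    """Percentile with linear interpolation (the default method of np.percentile)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def keep_lowest_percent(indices, confidences, percent):
    """Keep the records whose confidence is at or below the given percentile."""
    cutoff = percentile(confidences, percent)
    return [i for i, c in zip(indices, confidences) if c <= cutoff]

conf = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2, 0.6, 0.4, 0.5, 0.95]
print(keep_lowest_percent(list(range(10)), conf, percent=20))
```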

In [None]:
def update_streaming_perc(inferred_res, threshold=5): # <-- here, threshold is the percentage of the data we want to keep
    # TODO: using the np.percentile function, fill the function
    next_loop = ...
    return next_loop

myStreamingAL2, myStepSizes2 = myActiveStrategy.run_StreamingAL(update_streaming_perc, num_steps, maximum)
plt.plot(myStepSizes2, myStreamingAL2 , '--b', marker="o")

QUESTION: In the code above, a major approximation was made. Do you see what it is, and how would you change it?

Margin Sampling-Based Strategy
---------------------------------------

In [None]:
%%time
import operator

# Explain what this function does...
def update_margin(inferred_res, nRec):
    for k in range(len(inferred_res)):
        inferred_res[k].extend(list(zip(*sorted(enumerate(inferred_res[k]), key=operator.itemgetter(1))))[0][-2:])
    ranked = sorted(inferred_res, key=lambda x: x[5] - x[4], reverse=False)[:nRec]
    selected = [rec[0] for rec in ranked]
    return selected

myConfidenceAL4 = myActiveStrategy.run_ConfidenceAL(update_margin, num_steps, maximum)
plt.plot([nps*i for i in range(1, num_steps + 1)], mySupervised,    '--b',
         [nps*i for i in range(1, num_steps + 1)], myConfidenceAL3, '--g',
         [nps*i for i in range(1, num_steps + 1)], myConfidenceAL4, '--m',
         marker="o")
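For comparison, margin sampling is commonly defined as ranking records by the gap between their two highest class probabilities and selecting those with the smallest gap, i.e. the records where the model can barely separate its top two guesses. A self-contained sketch on plain probability lists; ``margin_select`` is a hypothetical helper, independent of the record layout used by ``ActiveStrategy``.

```python
import heapq

def margin_select(indices, class_probs, n_rec):
    """Return the n_rec indices with the smallest top-1 minus top-2 margin."""
    margins = []
    for idx, probs in zip(indices, class_probs):
        top1, top2 = heapq.nlargest(2, probs)
        margins.append((top1 - top2, idx))
    margins.sort()  # smallest margin (most ambiguous) first
    return [idx for _, idx in margins[:n_rec]]

probs = [
    [0.70, 0.20, 0.10],  # margin 0.50
    [0.45, 0.40, 0.15],  # margin 0.05
    [0.34, 0.33, 0.33],  # margin 0.01
    [0.90, 0.05, 0.05],  # margin 0.85
]
print(margin_select([10, 11, 12, 13], probs, n_rec=2))
```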

Now, let's try a different type of strategy, called Query-By-Committee (QbC).


EXERCISE 3: Query-By-Committee Strategies
=========================================

For Query-By-Committee strategies, we will take an approach similar to ensemble methods in Supervised Learning.
Instead of training only one classifier, we will train several, and decide which data to select for the next loop based on the level of disagreement between them.
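For a committee of any size, one simple disagreement measure is the number of member pairs that predict different labels for a record. A hypothetical plain-Python sketch, where each member's predictions are given as a list of labels aligned on the same records; ``pairwise_disagreement`` and ``most_disputed`` are illustrative names, and the predictions are made up.

```python
from itertools import combinations

def pairwise_disagreement(committee_preds):
    """Count disagreeing member pairs for each record.

    committee_preds: one list of predicted labels per committee member,
    all aligned on the same records.
    """
    per_record = zip(*committee_preds)  # labels from every member, per record
    return [sum(a != b for a, b in combinations(labels, 2)) for labels in per_record]

def most_disputed(indices, committee_preds, n_rec):
    """Return the n_rec indices with the highest disagreement count."""
    scores = pairwise_disagreement(committee_preds)
    ranked = sorted(zip(scores, indices), key=lambda s: -s[0])  # stable sort
    return [idx for _, idx in ranked[:n_rec]]

preds = [
    [3, 1, 4, 1, 5],  # member 1
    [3, 2, 4, 0, 5],  # member 2
    [3, 2, 6, 9, 5],  # member 3
]
print(pairwise_disagreement(preds))
print(most_disputed([100, 101, 102, 103, 104], preds, n_rec=2))
```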

For our case, we could use several variations of the same model (with slightly different hyperparameters), starting from differently initialized classifiers.

Using the code below, try to run a QbC strategy by making the necessary adjustments to the model. What changes are necessary, and does the model actually need to be modified? Please comment...

In [None]:
%%time

unlabeled = [i for i in range(len(trainset))]
labeled   = []

to_be_labeled = random.sample(unlabeled, nps)
to_be_labeled_set = set(to_be_labeled)  # set lookup avoids a quadratic scan
unlabeled = [i for i in unlabeled if i not in to_be_labeled_set]

myAccuracy1 = myActiveStrategy.run_one(to_be_labeled)
myResults1  = myActiveStrategy.infer(unlabeled)
myAccuracy2 = myActiveStrategy.run_one(to_be_labeled)
myResults2  = myActiveStrategy.infer(unlabeled)
myAccuracy3 = myActiveStrategy.run_one(to_be_labeled)
myResults3  = myActiveStrategy.infer(unlabeled)

disagreement = []

for r in range(len(unlabeled)):
    dis = 0
    if myResults1[r][2] != myResults2[r][2]:
        dis += 1
    if myResults2[r][2] != myResults3[r][2]:
        dis += 1
    if myResults1[r][2] != myResults3[r][2]:
        dis += 1
    disagreement.append(dis)
    
print(disagreement[0:10])

print("No        disagreements: {0} \n".format(disagreement.count(0)),
      "One (1)   disagreement : {0} \n".format(disagreement.count(1)),
      "Two (2)   disagreements: {0} \n".format(disagreement.count(2)),
      "Three (3) disagreements: {0} \n".format(disagreement.count(3)))

Now, let's use the previous approach as our querying strategy. Note that this takes a long time to run, so you are encouraged to reduce your 'num_steps' and 'maximum' variables.

In [None]:
def QbCAL(nsteps, stepSize):
    
    results = []
    
    unlabeled = [i for i in range(len(trainset))]
    labeled   = []

    # Randomly sample what is to be labeled first...
    to_be_labeled = ...  # TODO
    unlabeled = ...      # TODO

    for n in range(nsteps):  # one result per step, to match the plot below
        # TODO: fill the function by following the same logic as previously, within a loop
        pass

    return results
    
myQbCAL = QbCAL(num_steps, nps)

plt.plot([nps*i for i in range(1, num_steps + 1)], mySupervised,    '--b',
         [nps*i for i in range(1, num_steps + 1)], myConfidenceAL4, '--g',
         [nps*i for i in range(1, num_steps + 1)], myQbCAL,         '--m',
         marker="o")

EXERCISE 4: Build-Your-Own Strategy
===================================

Now, using what you have learned, develop the best querying strategy you can!