# Convolutional Neural Networks! #

Today we'll explore convnets on the [Cifar10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset. First we start with some magic incantations (and check to make sure we're using the GPU):


In [None]:
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

## Getting the Cifar Data
Getting the data is pretty easy: we can just use torchvision's built-in dataset functionality. This time we might also use some data augmentation.

**`RandomCrop(32, padding=4)`** - This means we'll take random 32x32 crops out of the image after padding it with 4 pixels per side. Since our image is 32x32, we first pad it to 40x40 (adding 4 pixels per side) and then take a 32x32 crop out of that. By default the padding is zeros; the code below passes `padding_mode='edge'`, which repeats the border pixels instead. Either way, the network sees a slightly shifted image every time, so it is harder to overfit to specific pixels in specific places. This forces the network to learn more robust filters and reduces overfitting.

**`RandomHorizontalFlip()`** - This means half the time we will flip the image horizontally. Same idea as above: the network sees mirrored versions of the data, so it's harder to overfit.

**Note:** data augmentation is turned off by default. We'll try to train the network normally and then see what effect data augmentation has.
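
To make the two transforms concrete, here is a minimal plain-Python sketch of the same idea on a single-channel image stored as a list of rows. The helper names `pad_and_random_crop` and `random_horizontal_flip` are made up for this illustration, and the sketch uses zero padding as described above; it is not how torchvision implements its transforms.

```python
import random

def pad_and_random_crop(image, size, padding, rng):
    """Zero-pad a 2D image (list of rows) on every side, then cut a random size x size crop."""
    height, width = len(image), len(image[0])
    padded_width = width + 2 * padding
    blank_rows = [[0] * padded_width for _ in range(padding)]
    padded = (
        [row[:] for row in blank_rows]
        + [[0] * padding + list(row) + [0] * padding for row in image]
        + [row[:] for row in blank_rows]
    )
    # Pick the top-left corner of the crop uniformly among all valid positions.
    top = rng.randint(0, height + 2 * padding - size)
    left = rng.randint(0, padded_width - size)
    return [row[left:left + size] for row in padded[top:top + size]]

def random_horizontal_flip(image, rng, p=0.5):
    """Mirror the image left-to-right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image

rng = random.Random(0)
image = [[1] * 32 for _ in range(32)]        # stand-in 32x32 single-channel image
crop = pad_and_random_crop(image, size=32, padding=4, rng=rng)
augmented = random_horizontal_flip(crop, rng)
print(len(augmented), len(augmented[0]))      # the crop is still 32x32
```

With 4 pixels of padding there are 9 possible offsets in each direction, so each training image can show up in 9 x 9 = 81 shifted positions, each with or without a flip.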

In [None]:
def get_cifar10_data(augmentation=0):
  # Data augmentation transformations. Not for Testing!
  if augmentation:
    transform_train = transforms.Compose([
      transforms.RandomCrop(32, padding=4, padding_mode='edge'), # Take 32x32 crops from 40x40 padded images
      transforms.RandomHorizontalFlip(),    # 50% of time flip image along y-axis
      transforms.ToTensor(),
    ])
  else: 
    transform_train = transforms.ToTensor()

  transform_test = transforms.Compose([
    transforms.ToTensor(),
  ])

  trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                        transform=transform_train)
  trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True,
                                            num_workers=32)

  testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True,
                                      transform=transform_test)
  testloader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=False,
                                          num_workers=32)
  classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
  return {'train': trainloader, 'test': testloader, 'classes': classes}

data = get_cifar10_data()

----------------
**Handy tip:** in Colab you can run commands in the underlying virtual machine (outside of Python) by prefixing them with an exclamation mark, like this:

    !ls ./data

You can use that syntax to run arbitrary commands in the underlying virtual machine. For instance you can run your homework projects here:

    !git clone https://github.com/pjreddie/uwimg
    !cd uwimg; ls; make; ./uwimg test hw0

Note that every time you call a command with `!`, Colab spawns a new shell, so commands like `!cd` don't persist between lines.

I wouldn't do the C homework here or anything but this can be useful for installing dependencies.

-----------------

In [None]:
!ls ./data

Looks like our data is in the right folder! For CIFAR10 the training set is 50,000 images and the test set is 10,000:

In [None]:
print(data['train'].__dict__)
print(data['test'].__dict__)

### Visualizing Some Data ###
Our handy visualizations from last time

In [None]:
dataiter = iter(data['train'])
images, labels = next(dataiter)
print(images.size())

def imshow(img):
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# show images
imshow(torchvision.utils.make_grid(images))
# print labels for the first row of the grid (make_grid puts 8 images per row by default)
print("Labels:" + ' '.join('%9s' % data['classes'][labels[j]] for j in range(8)))


flat = torch.flatten(images, 1)
print(images.size())
print(flat.size())

## Defining the Networks! ##
We'll try out the SimpleNet from last time. Note: the number of inputs has changed since each input image is now a 32x32 RGB image (32x32x3 = 3072).
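
As a quick sanity check on what that input size costs, here is a small sketch that counts the parameters of the two fully connected layers, using the default sizes from the class below (3072 inputs, 512 hidden units, 10 outputs). `linear_params` is a helper invented for this example.

```python
def linear_params(inputs, outputs):
    """Weights plus biases of a fully connected layer."""
    return inputs * outputs + outputs

inputs = 32 * 32 * 3                       # 3072 values per flattened RGB image
fc1 = linear_params(inputs, 512)
fc2 = linear_params(512, 10)
print(inputs, fc1, fc2, fc1 + fc2)         # 3072 1573376 5130 1578506
```

Almost all of the roughly 1.6 million parameters sit in the first layer, because every hidden unit is connected to every input pixel.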

In [None]:
class SimpleNet(nn.Module):
    def __init__(self, inputs=3072, hidden=512, outputs=10):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(inputs, hidden)
        self.fc2 = nn.Linear(hidden, outputs)

    def forward(self, x):
        x = torch.flatten(x, 1) # Takes image-like to vector-like
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        return x

### Defining the CNN

Our convolutional neural network is pretty simple to start. It has 3 convolutional layers followed by a fully connected layer.

**conv1**
 - Input: `32 x 32 x 3` image
 - 16 filters, size `3 x 3`, stride 2
 - Output: `16 x 16 x 16` image

**conv2**
 - Input: `16 x 16 x 16` image
 - 32 filters, size `3 x 3`, stride 2
 - Output: `8 x 8 x 32` image

**conv3**
 - Input: `8 x 8 x 32` image
 - 64 filters, size `3 x 3`, stride 2
 - Output: `4 x 4 x 64` image

**fc1**
 - Input: 1024 vector
 - Output: 10 vector (logits, i.e. unnormalized log-probabilities for each class)

**Note:** after the 3rd convolutional layer we have to convert the feature map between tensor formats. It's in an image-like format (NxCxHxW), but fully-connected layers need it to be in a vector-like format (NxM).

To do that we just call our normal `x = torch.flatten(x,1)` on the feature map.

You can also see in the `forward` function we use the `relu` activation function after each convolutional layer.
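
The layer sizes listed above can be checked with the standard convolution output-size formula. This is a sketch in plain Python: `conv_output_size` and `conv_params` are helpers made up for the check, and `padding=1` is taken from the `nn.Conv2d` calls in the code below rather than from the list above.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(in_channels, out_channels, kernel=3):
    """One kernel x kernel x in_channels filter per output channel, plus one bias each."""
    return in_channels * out_channels * kernel * kernel + out_channels

size, channels = 32, 3
total_params = 0
for filters in (16, 32, 64):
    total_params += conv_params(channels, filters)
    size, channels = conv_output_size(size), filters
    print(f"{size} x {size} x {channels}")   # 16 x 16 x 16, then 8 x 8 x 32, then 4 x 4 x 64

flattened = size * size * channels
total_params += flattened * 10 + 10          # the final fully connected layer
print(flattened, total_params)               # 1024 33834
```

So the whole network has about 34 thousand parameters, and the flattened feature map has the 1024 values that `fc1` expects.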

In [None]:
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__() # https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html
        # Input 32x32x3 image
        # 16 filters
        # 3x3 filter size (they also have 3 channels)
        # stride 2 (downsampling by factor of 2)
        # Output image: 16x16x16
        self.conv1 = nn.Conv2d(3, 16, 3, stride=2, padding=1)

        # Input 16x16x16 image
        # 32 filters
        # 3x3x16 filter size (they also have 16 channels)
        # stride 2 (downsampling by factor of 2)
        # Output image: 8x8x32
        self.conv2 = nn.Conv2d(16, 32, 3, stride=2, padding=1)


        # Exercise left to the reader
        # Output image: 4x4x64 -> 1024 neurons
        self.conv3 = nn.Conv2d(32, 64, 3, stride=2, padding=1)

        self.fc1 = nn.Linear(1024, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.conv3(x)
        x = F.relu(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        return x


### Training Code

Not much to see here... It's the same as last time I think.
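
One thing worth remembering: the `nn.CrossEntropyLoss` used in the training code takes the raw network outputs (logits) and applies a log-softmax internally, so the networks never call softmax themselves. Here is a rough plain-Python sketch of that computation for a single example; the all-zero logits are a made-up input standing in for an untrained network that considers every class equally likely.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)                               # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[label])

logits = [0.0] * 10          # hypothetical output for one image: all 10 classes tied
probs = softmax(logits)
print(probs[0])              # 0.1
print(cross_entropy(logits, 3))   # roughly 2.30, which is ln(10)
```

That is why a 10-class network that has not learned anything yet tends to report a loss of about 2.3.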

In [None]:

def train(net, dataloader, epochs=1, lr=0.01, momentum=0.9, decay=0.0, verbose=1):
  net.to(device)
  losses = []
  criterion = nn.CrossEntropyLoss()
  optimizer = optim.SGD(net.parameters(), lr=lr, momentum=momentum, weight_decay=decay)
  for epoch in range(epochs):
    sum_loss = 0.0
    for i, batch in enumerate(dataloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = batch[0].to(device), batch[1].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize 
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()  # autograd magic, computes all the partial derivatives
        optimizer.step() # updates the weights, stepping against the gradient

        # print statistics
        losses.append(loss.item())
        sum_loss += loss.item()
        if i % 100 == 99:    # print every 100 mini-batches
            if verbose:
              print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, sum_loss / 100))
            sum_loss = 0.0
  return losses

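def accuracy(net, dataloader):
  was_training = net.training
  net.eval()  # evaluation mode: BatchNorm layers use running statistics, not per-batch ones
  correct = 0
  total = 0
  with torch.no_grad():
      for batch in dataloader:
          images, labels = batch[0].to(device), batch[1].to(device)
          outputs = net(images)
          _, predicted = torch.max(outputs, 1)  # index of the highest-scoring class
          total += labels.size(0)
          correct += (predicted == labels).sum().item()
  net.train(was_training)  # restore the previous mode so later train() calls are unaffected
  return correct/total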

def smooth(x, size):
  return np.convolve(x, np.ones(size)/size, mode='valid')

## Train the networks! ##

It's time.

First we'll start with SimpleNet:

### SimpleNet on CIFAR10

In [None]:
net = SimpleNet(inputs=3072)

losses = train(net, data['train'], epochs=5, lr=.01)
plt.plot(smooth(losses,50))

print("Training accuracy: %f" % accuracy(net, data['train']))
print("Testing  accuracy: %f" % accuracy(net, data['test']))

### ConvNet on CIFAR10

In [None]:
conv_net = ConvNet()

conv_losses = train(conv_net, data['train'], epochs=15, lr=.01)
plt.plot(smooth(conv_losses, 50))

print("Training accuracy: %f" % accuracy(conv_net, data['train']))
print("Testing  accuracy: %f" % accuracy(conv_net, data['test']))

In [None]:
plt.plot(smooth(losses,50), 'r-')
plt.plot(smooth(conv_losses, 50), 'b-')

### Learning Rate Annealing

It can be useful to slowly lower the learning rate over time so that the network converges to a better local optimum. Let's try it!
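
The schedule used in the next cell (5 epochs each at 0.1, 0.01, and 0.001) is a step decay. A minimal sketch of it as a function of the epoch number follows; `step_lr` and its parameter names are invented for this illustration.

```python
def step_lr(epoch, base_lr=0.1, step_size=5, gamma=0.1):
    """Learning rate for a 0-indexed epoch: multiply by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

for epoch in (0, 4, 5, 9, 10, 14):
    print(epoch, f"{step_lr(epoch):.3f}")
# 0 0.100
# 4 0.100
# 5 0.010
# 9 0.010
# 10 0.001
# 14 0.001
```

PyTorch also ships schedulers such as `torch.optim.lr_scheduler.StepLR` that adjust the rate inside a single optimizer. Calling `train` three times, as this notebook does, creates a fresh optimizer each time, so the momentum buffers start over at each stage.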

In [None]:
anneal_net = ConvNet()

anneal_losses =  train(anneal_net, data['train'], epochs=5, lr=.1)
anneal_losses += train(anneal_net, data['train'], epochs=5, lr=.01)
anneal_losses += train(anneal_net, data['train'], epochs=5, lr=.001)

plt.plot(smooth(anneal_losses, 50))

print("Training accuracy: %f" % accuracy(anneal_net, data['train']))
print("Testing  accuracy: %f" % accuracy(anneal_net, data['test']))

### Batch Normalization!

Batch normalization normalizes each channel's activations across the mini-batch, which usually makes training faster and more stable. Let's add it to our network:
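
For intuition, here is a rough plain-Python sketch of what a batchnorm layer computes for one channel during training: subtract the batch mean, divide by the batch standard deviation, then apply a learned scale and shift. The names `gamma` and `beta` for the scale and shift, and the example activations, are assumptions for this sketch. A real `nn.BatchNorm2d` layer also tracks running statistics to use at evaluation time.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel's activations over a batch, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)   # biased (population) variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

activations = [2.0, 4.0, 6.0, 8.0]      # made-up activations of one channel across a batch
normalized = batch_norm(activations)
print([round(v, 3) for v in normalized])   # [-1.342, -0.447, 0.447, 1.342]
```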

In [None]:
class ConvBNNet(nn.Module):
    def __init__(self):
        super(ConvBNNet, self).__init__() # https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html
        self.conv1 = nn.Conv2d(3, 16, 3, stride=2, padding=1)
        self.bn1 = nn.BatchNorm2d(16)
        self.conv2 = nn.Conv2d(16, 32, 3, stride=2, padding=1)
        self.bn2 = nn.BatchNorm2d(32)
        self.conv3 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        self.bn3 = nn.BatchNorm2d(64)
        self.fc1 = nn.Linear(1024, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x = F.relu(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        return x



In [None]:
norm_net = ConvBNNet()

norm_losses = train(norm_net, data['train'], epochs=15, lr=.01)

plt.plot(smooth(norm_losses, 50))

print("Training accuracy: %f" % accuracy(norm_net, data['train']))
print("Testing  accuracy: %f" % accuracy(norm_net, data['test']))

In [None]:
plt.plot(smooth(losses,50), 'r-')
plt.plot(smooth(conv_losses, 50), 'b-')
plt.plot(smooth(norm_losses, 50), 'g-')


In [None]:
lr_net = ConvBNNet()

lr_losses = train(lr_net, data['train'], epochs=15, lr=.1)

plt.plot(smooth(lr_losses, 50))

print("Training accuracy: %f" % accuracy(lr_net, data['train']))
print("Testing  accuracy: %f" % accuracy(lr_net, data['test']))

In [None]:
#plt.plot(smooth(losses,50), 'r-')
#plt.plot(smooth(conv_losses, 50), 'b-')
plt.plot(smooth(norm_losses, 50), 'g-')
plt.plot(smooth(lr_losses, 50), 'r-')

In [None]:
anneal2_net = ConvBNNet()

anneal2_losses =  train(anneal2_net, data['train'], epochs=5, lr=.1)
anneal2_losses += train(anneal2_net, data['train'], epochs=5, lr=.01)
anneal2_losses += train(anneal2_net, data['train'], epochs=5, lr=.001)


plt.plot(smooth(anneal2_losses, 50))

print("Training accuracy: %f" % accuracy(anneal2_net, data['train']))
print("Testing  accuracy: %f" % accuracy(anneal2_net, data['test']))

### Weight Decay

We can try adding in some weight decay now, because we are overfitting to the training data quite a bit.
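
To see what the `decay` argument does, here is a simplified sketch of one SGD update with momentum and weight decay for a single scalar weight. It follows my understanding of how `optim.SGD` combines the two (the decay term is added to the gradient before the momentum update); `sgd_step` is a made-up helper, not the library's code.

```python
def sgd_step(weight, grad, velocity, lr=0.1, momentum=0.9, weight_decay=0.0005):
    """One SGD update with momentum and L2 weight decay for a single scalar weight."""
    grad = grad + weight_decay * weight        # decay adds a pull toward zero
    velocity = momentum * velocity + grad      # momentum-weighted running sum of gradients
    weight = weight - lr * velocity            # step against the gradient
    return weight, velocity

weight, velocity = 1.0, 0.0
for _ in range(100):
    # With no gradient from the data, only the decay term moves the weight.
    weight, velocity = sgd_step(weight, grad=0.0, velocity=velocity)
print(weight)                                  # slightly below 1.0
```

Weights that the loss does not actively need shrink toward zero, which discourages the network from relying on large, overly specific weights.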

In [None]:
decay_net = ConvBNNet()

decay_losses =  train(decay_net, data['train'], epochs=5, lr=.1  , decay = .0005)
decay_losses += train(decay_net, data['train'], epochs=5, lr=.01 , decay = .0005)
decay_losses += train(decay_net, data['train'], epochs=5, lr=.001, decay = .0005)


plt.plot(smooth(decay_losses, 50))

print("Training accuracy: %f" % accuracy(decay_net, data['train']))
print("Testing  accuracy: %f" % accuracy(decay_net, data['test']))

In [None]:
#plt.plot(smooth(losses,50), 'r-')
#plt.plot(smooth(conv_losses, 50), 'r-')
#plt.plot(smooth(norm_losses, 50), 'g-')
plt.plot(smooth(anneal2_losses, 50), 'b-')
plt.plot(smooth(decay_losses, 50), 'm-')

### Data Augmentation ###

Our training accuracy is much higher than our testing accuracy, which indicates overfitting. Let's add in data augmentation.

In [None]:
data_aug = get_cifar10_data(augmentation=1)
data_net = ConvBNNet()

data_losses =  train(data_net, data_aug['train'], epochs=5, lr=.1  , decay=.0005)
data_losses += train(data_net, data_aug['train'], epochs=5, lr=.01 , decay=.0005)
data_losses += train(data_net, data_aug['train'], epochs=5, lr=.001, decay=.0005)


plt.plot(smooth(data_losses, 50))

print("Training accuracy: %f" % accuracy(data_net, data_aug['train']))
print("Testing  accuracy: %f" % accuracy(data_net, data_aug['test']))

In [None]:
plt.plot(smooth(decay_losses, 50), 'r-')
plt.plot(smooth(data_losses, 50), 'g-')

In [None]:
final_net = ConvBNNet()

final_losses =  train(final_net, data_aug['train'], epochs=15, lr=.1  , decay=.0005)
final_losses += train(final_net, data_aug['train'], epochs=5, lr=.01 , decay=.0005)
final_losses += train(final_net, data_aug['train'], epochs=5, lr=.001, decay=.0005)


plt.plot(smooth(final_losses, 50))

print("Training accuracy: %f" % accuracy(final_net, data_aug['train']))
print("Testing  accuracy: %f" % accuracy(final_net, data_aug['test']))

In [None]:
plt.plot(smooth(decay_losses, 50), 'r-')
plt.plot(smooth(data_losses, 50), 'g-')
plt.plot(smooth(final_losses, 50), 'b-')