# Part V. CIFAR-10 open-ended challenge


### Things you might try:
- **Filter size**: Above we used 5x5; would smaller filters be more efficient?
- **Number of filters**: Above we used 32 filters. Do more or fewer do better?
- **Pooling vs Strided Convolution**: Do you use max pooling, or just strided convolutions?
- **Batch normalization**: Try adding spatial batch normalization after convolution layers and vanilla batch normalization after affine layers. Do your networks train faster?
- **Network architecture**: The network above has two layers of trainable parameters. Can you do better with a deeper network? Good architectures to try include:
    - [conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    - [conv-relu-conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    - [batchnorm-relu-conv]xN -> [affine]xM -> [softmax or SVM]
- **Global Average Pooling**: Instead of flattening and then having multiple affine layers, perform convolutions until your image gets small (7x7 or so) and then apply an average pooling operation to reduce it to a 1x1 image of shape (1, 1, Filter#), which is then reshaped into a (Filter#) vector. This is used in [Google's Inception Network](https://arxiv.org/abs/1512.00567) (see Table 1 for their architecture).
- **Regularization**: Add L2 weight regularization, or perhaps use Dropout.
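As a rough illustration of several of these suggestions together (small filters, spatial batch normalization, the `[conv-relu-conv-relu-pool]xN` pattern, and global average pooling), here is a minimal PyTorch sketch. The helper name `make_gap_net` and the default widths are illustrative assumptions, not part of the assignment code. A convolution maps a spatial size H to floor((H + 2P - K) / S) + 1 for kernel size K, padding P and stride S, so a 3x3 convolution with padding 1 and stride 1 keeps 32x32 inputs at 32x32, while each 2x2 max pool halves the size.

```python
import torch
import torch.nn as nn


def make_gap_net(in_channels=3, widths=(32, 64, 128), num_classes=10, dropout=0.0):
    """Hypothetical helper: [conv-bn-relu-conv-bn-relu-pool] x len(widths) -> GAP -> linear."""
    layers = []
    c_in = in_channels
    for c_out in widths:
        layers += [
            # 3x3 conv with padding=1 and stride=1 preserves H and W
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),  # spatial batch normalization
            nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),  # halves H and W
        ]
        if dropout > 0:
            layers.append(nn.Dropout2d(p=dropout))
        c_in = c_out
    layers += [
        # Global average pooling: (N, C, H, W) -> (N, C, 1, 1)
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),  # (N, C, 1, 1) -> (N, C)
        nn.Linear(c_in, num_classes),
    ]
    return nn.Sequential(*layers)


model = make_gap_net()
x = torch.randn(2, 3, 32, 32)  # a dummy batch of two CIFAR-sized images
scores = model(x)  # spatial size goes 32 -> 16 -> 8 -> 4, then GAP -> 1
print(scores.shape)  # torch.Size([2, 10])
```

For comparison, the `Flatten()` followed by `Linear(in_features=65536, out_features=1000)` head in the log below has 64 x 32 x 32 = 65536 inputs and about 65.5 million weights, whereas the classifier here has only 128 x 10 weights plus biases. For the L2 suggestion, one option is the optimizer's `weight_decay` argument, e.g. `torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9, weight_decay=5e-4)`; these values are placeholders to tune, not recommendations.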

### Tips for training
For each network architecture that you try, you should tune the learning rate and other hyperparameters. When doing this there are a couple of important things to keep in mind:

- If the parameters are working well, you should see improvement within a few hundred iterations.
- Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all.
- Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
- You should use the validation set for hyperparameter search, and save your test set for evaluating your architecture on the best parameters as selected by the validation set.
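The coarse-to-fine idea can be sketched in plain Python as a two-stage random search over the learning rate. This is a hypothetical illustration, not the implementation of the `HyperOpt` class used in the cell below (whose internals are not shown here); `evaluate(lr, stage)` is an assumed callback that would train briefly (`"coarse"`) or for longer (`"fine"`) and return validation accuracy. Sampling on a log scale matters because learning rates that work can differ by orders of magnitude.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose log10 is uniform in [log10(low), log10(high)]."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def coarse_to_fine(evaluate, low=1e-5, high=1e-1, n_coarse=20, n_fine=10, top_k=3, seed=0):
    """Hypothetical two-stage learning-rate search; returns (best_score, best_lr)."""
    rng = random.Random(seed)

    # Coarse stage: a wide log-uniform range, each candidate trained only briefly.
    coarse = []
    for _ in range(n_coarse):
        lr = sample_log_uniform(low, high, rng)
        coarse.append((evaluate(lr, "coarse"), lr))
    coarse.sort(reverse=True)  # highest validation score first

    # Fine stage: resample within a factor of 3 around each of the top_k candidates.
    fine = []
    for _, lr in coarse[:top_k]:
        for _ in range(n_fine):
            lr_f = sample_log_uniform(lr / 3, lr * 3, rng)
            fine.append((evaluate(lr_f, "fine"), lr_f))
    return max(fine)


def toy_score(lr, stage):
    # Toy stand-in for validation accuracy: best when lr == 1e-3 (ignores `stage`).
    return -abs(math.log10(lr) + 3)


best_score, best_lr = coarse_to_fine(toy_score)
print(f"best lr ~ {best_lr:.2e}")
```

In a real search, `evaluate` would build a fresh model, train it on `loader_train`, and score it on `loader_val`, leaving `loader_test` untouched.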

### Going above and beyond
If you are feeling adventurous there are many other features you can implement to try and improve your performance. You are **not required** to implement any of these, but don't miss the fun if you have time!

- Alternative optimizers: you can try Adam, Adagrad, RMSprop, etc.
- Alternative activation functions such as leaky ReLU, parametric ReLU, ELU, or MaxOut.
- Model ensembles
- Data augmentation
- New Architectures
  - [ResNets](https://arxiv.org/abs/1512.03385) where the input from the previous layer is added to the output.
  - [DenseNets](https://arxiv.org/abs/1608.06993) where the feature maps of all previous layers are concatenated together to form each layer's input.
  - [This blog has an in-depth overview](https://chatbotslife.com/resnets-highwaynets-and-densenets-oh-my-9bb15918ee32)
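To make the ResNet idea concrete, here is a minimal PyTorch sketch of a basic residual block in the spirit of the ResNet paper linked above. The class name `BasicResidualBlock` is hypothetical, and using a 1x1 projection shortcut when the shape changes is one common choice, assumed here rather than prescribed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicResidualBlock(nn.Module):
    """Sketch of a ResNet-style block: out = relu(F(x) + shortcut(x))."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        if stride != 1 or in_channels != out_channels:
            # Shapes differ: project the input with a strided 1x1 conv so it can be added.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            # Shapes match: the input is added back unchanged.
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))  # add the block's input to its output


block = BasicResidualBlock(64, 128, stride=2)
y = block(torch.randn(4, 64, 32, 32))
print(y.shape)  # torch.Size([4, 128, 16, 16])
```

A DenseNet-style layer would differ mainly in the last line of `forward`, returning something like `torch.cat([x, out], dim=1)` so that the channel count grows instead of the two tensors being summed.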

### Have fun and happy training! 

In [2]:
import logging

import torch
from torch.utils.data import DataLoader
from torch.utils.data import sampler
import torchvision.datasets as dset
import torchvision.transforms as T

from cs231n.hyperopt import *

fs = '%(asctime)s %(levelname)s:%(message)s'
ds = '%b  %-d %H:%M:%S'
logging.basicConfig(format=fs, datefmt=ds, level=logging.DEBUG)

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

dtype = torch.float32
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
logger.debug("%s: device=%s" % (__name__, device))

NUM_TRAIN = 49000
NUM_VAL = 1000

transform = T.Compose([
                T.ToTensor(),
                T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
            ])

cifar10_train = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                             transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64, 
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                           transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64, 
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, NUM_TRAIN+NUM_VAL)))

cifar10_test = dset.CIFAR10('./cs231n/datasets', train=False, download=True, 
                            transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)

Sep  29 20:06:48 DEBUG:__main__: device=cuda


Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified


In [None]:
%load_ext autoreload
%autoreload 2

import matplotlib.pyplot as plt  # plt.close() is called at the end of this cell

torch.backends.cudnn.deterministic = True
torch.manual_seed(999)

im_shape = (3, 32, 32)
num_classes = 10

choices = {
        "Architecture": ["batchnorm-relu-conv"],
        "FilterSize": [3, 5],
        "FilterCount": [8, 32, 64],
        "Stride": [1, 2],
        "N": [3, 5, 10],
        "M": [1, 2],
        "HiddenSize": [1000],
        "Dropout": [0.05, 0.5, 0.95]
}

ho = HyperOpt(choices, construct_model, train, loader_train, loader_val,
              max_active=3, coarse_its=100, fine_epochs=5, verbose=True,
              visualize=True, port=6006, device=device)

ho.optimize()

plt.close('all') # play nice with Jupyter

Sep  29 20:06:55 INFO:Visdom successfully connected to server
Sep  29 20:06:55 DEBUG:optimize: starting
Sep  29 20:06:55 DEBUG:coarse_step: starting
Sep  29 20:06:55 DEBUG:coarse_step:Sequential(
  (0): BatchNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU()
  (2): Conv2d(3, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (3): Dropout2d(p=0.5)
  (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (5): ReLU()
  (6): Conv2d(64, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (7): Dropout2d(p=0.5)
  (8): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (9): ReLU()
  (10): Conv2d(64, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (11): Dropout2d(p=0.5)
  (12): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (13): ReLU()
  (14): Conv2d(64, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (15): Dropout2d(p=0.5)
  (16): 


Sep  29 20:06:55 DEBUG:train: starting
Sep  29 20:07:02 INFO:train: Epoch 1/1, It 1/100, loss = 2.3566
Sep  29 20:07:02 INFO:train: Train acc = 9.03
Sep  29 20:07:02 INFO:train: Val acc = 10.70

Sep  29 20:07:43 INFO:train: Epoch 1/1, It 100/100, loss = nan
Sep  29 20:07:43 INFO:train: Train acc = 9.90
Sep  29 20:07:43 INFO:train: Val acc = 8.70

Sep  29 20:07:43 DEBUG:train: ending
Sep  29 20:07:43 DEBUG:coarse_step:Sequential(
  (0): BatchNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU()
  (2): Conv2d(3, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (3): Dropout2d(p=0.5)
  (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (5): ReLU()
  (6): Conv2d(64, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (7): Dropout2d(p=0.5)
  (8): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (9): ReLU()
  (10): Conv2d(64, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)

perc_dec = nan


Sep  29 20:07:48 INFO:train: Epoch 1/1, It 1/100, loss = 2.3199
Sep  29 20:07:48 INFO:train: Train acc = 10.16
Sep  29 20:07:48 INFO:train: Val acc = 11.90

Sep  29 20:08:30 INFO:train: Epoch 1/1, It 100/100, loss = 2.2653
Sep  29 20:08:30 INFO:train: Train acc = 10.33
Sep  29 20:08:30 INFO:train: Val acc = 10.20

Sep  29 20:08:30 DEBUG:train: ending
Sep  29 20:08:30 DEBUG:coarse_step:Sequential(
  (0): BatchNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU()
  (2): Conv2d(3, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (3): Dropout2d(p=0.5)
  (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (5): ReLU()
  (6): Conv2d(64, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (7): Dropout2d(p=0.5)
  (8): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (9): ReLU()
  (10): Conv2d(64, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (11): Dropout2d(p=0.5)
  (12)

perc_dec = 2.3535255095009524


Sep  29 20:08:35 INFO:train: Epoch 1/1, It 1/100, loss = 2.2971
Sep  29 20:08:35 INFO:train: Train acc = 9.03
Sep  29 20:08:35 INFO:train: Val acc = 7.80

Sep  29 20:09:17 INFO:train: Epoch 1/1, It 100/100, loss = 2.3471
Sep  29 20:09:17 INFO:train: Train acc = 9.98
Sep  29 20:09:17 INFO:train: Val acc = 11.30

Sep  29 20:09:17 DEBUG:train: ending
Sep  29 20:09:17 DEBUG:coarse_step: starting


perc_dec = -2.177018187434197


Sep  29 20:09:18 DEBUG:coarse_step:Sequential(
  (0): BatchNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU()
  (2): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): Dropout2d(p=0.05)
  (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (5): ReLU()
  (6): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (7): Dropout2d(p=0.05)
  (8): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (9): ReLU()
  (10): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): Dropout2d(p=0.05)
  (12): Flatten()
  (13): Linear(in_features=65536, out_features=1000, bias=True)
  (14): Linear(in_features=1000, out_features=10, bias=True)
)
Sep  29 20:09:18 DEBUG:train: starting
Sep  29 20:09:19 INFO:train: Epoch 1/1, It 1/100, loss = 2.3138
Sep  29 20:09:19 INFO:train: Train acc = 12.24
Sep  29 20:09:19 INFO:train: Val acc = 10.20

Sep  29 20:09:2

perc_dec = nan


Sep  29 20:09:29 DEBUG:coarse_step:Sequential(
  (0): BatchNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU()
  (2): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): Dropout2d(p=0.05)
  (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (5): ReLU()
  (6): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (7): Dropout2d(p=0.05)
  (8): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (9): ReLU()
  (10): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): Dropout2d(p=0.05)
  (12): Flatten()
  (13): Linear(in_features=65536, out_features=1000, bias=True)
  (14): Linear(in_features=1000, out_features=10, bias=True)
)
Sep  29 20:09:29 DEBUG:train: starting
Sep  29 20:09:30 INFO:train: Epoch 1/1, It 1/100, loss = 2.2947
Sep  29 20:09:30 INFO:train: Train acc = 7.81
Sep  29 20:09:30 INFO:train: Val acc = 9.80

Sep  29 20:09:40 

perc_dec = 39.161203722732985


Sep  29 20:09:41 INFO:train: Epoch 1/1, It 1/100, loss = 2.3136
Sep  29 20:09:41 INFO:train: Train acc = 10.50
Sep  29 20:09:41 INFO:train: Val acc = 10.90

Sep  29 20:09:43 INFO:train: Epoch 1/1, It 100/100, loss = 1.6816
Sep  29 20:09:43 INFO:train: Train acc = 40.28
Sep  29 20:09:43 INFO:train: Val acc = 43.50

Sep  29 20:09:43 DEBUG:train: ending
Sep  29 20:09:43 INFO:coarse_step:lr = 1.00E-01
Sep  29 20:09:43 DEBUG:coarse_step: starting
Sep  29 20:09:43 DEBUG:coarse_step:Sequential(
  (0): BatchNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): ReLU()
  (2): Conv2d(3, 64, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2))
  (3): Dropout2d(p=0.05)
  (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (5): ReLU()
  (6): Conv2d(64, 64, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2))
  (7): Dropout2d(p=0.05)
  (8): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (9): ReLU()
  (10): Co

perc_dec = 27.31800232086966


Sep  29 20:09:44 INFO:train: Epoch 1/1, It 1/100, loss = 2.3025
Sep  29 20:09:44 INFO:train: Train acc = 10.16
Sep  29 20:09:44 INFO:train: Val acc = 10.70

Sep  29 20:09:47 INFO:train: Epoch 1/1, It 100/100, loss = 2.1827
Sep  29 20:09:47 INFO:train: Train acc = 21.44
Sep  29 20:09:47 INFO:train: Val acc = 23.20

Sep  29 20:09:47 DEBUG:train: ending
Sep  29 20:09:47 INFO:coarse_step:lr = 1.00E-01
Sep  29 20:09:47 DEBUG:fine_step: starting
Sep  29 20:09:47 DEBUG:train: starting


perc_dec = 5.203759158703927


Sep  29 20:09:49 INFO:train: Epoch 1/5, It 1/766, loss = 1.6019
Sep  29 20:09:49 INFO:train: Train acc = 43.66
Sep  29 20:09:49 INFO:train: Val acc = 47.50

Sep  29 20:09:58 INFO:train: Epoch 1/5, It 100/766, loss = 1.3735
Sep  29 20:09:58 INFO:train: Train acc = 50.43
Sep  29 20:09:58 INFO:train: Val acc = 48.70

Sep  29 20:10:08 INFO:train: Epoch 1/5, It 200/766, loss = 1.3544
Sep  29 20:10:08 INFO:train: Train acc = 53.39
Sep  29 20:10:08 INFO:train: Val acc = 51.50

Sep  29 20:10:18 INFO:train: Epoch 1/5, It 300/766, loss = 1.6619
Sep  29 20:10:18 INFO:train: Train acc = 50.17
Sep  29 20:10:18 INFO:train: Val acc = 52.60

Sep  29 20:10:27 INFO:train: Epoch 1/5, It 400/766, loss = 1.3452
Sep  29 20:10:27 INFO:train: Train acc = 56.94
Sep  29 20:10:27 INFO:train: Val acc = 53.80

Sep  29 20:10:37 INFO:train: Epoch 1/5, It 500/766, loss = 1.1134
Sep  29 20:10:37 INFO:train: Train acc = 59.20
Sep  29 20:10:37 INFO:train: Val acc = 58.50

Sep  29 20:10:47 INFO:train: Epoch 1/5, It 600/7

## Describe what you did 

In the cell below you should write an explanation of what you did, any additional features that you implemented, and/or any graphs that you made in the process of training and evaluating your network.

TODO: Describe what you did

## Test set -- run this only once

Now that we've gotten a result we're happy with, we test our final model (which you should store in `best_model`) on the test set. Think about how this compares to your validation set accuracy.

In [None]:
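# `model` must hold the best network from the search above; it is not
# assigned in the cells shown in this section.
best_model = model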
check_accuracy_part34(loader_test, best_model)
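`check_accuracy_part34` is presumably defined in an earlier part of the notebook and is not shown here. As a reminder of what such a check needs to do for this network, here is a minimal sketch of a hypothetical `evaluate_accuracy` helper. The key detail is `model.eval()`, which turns `Dropout2d` off and makes `BatchNorm2d` use its running statistics instead of per-batch statistics; forgetting it can noticeably change test accuracy.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_accuracy(loader, model, device=torch.device("cpu")):
    """Hypothetical helper: fraction of samples in `loader` that `model` classifies correctly."""
    model.eval()  # disable dropout, use BatchNorm running statistics
    num_correct, num_samples = 0, 0
    with torch.no_grad():  # no autograd graph is needed for evaluation
        for x, y in loader:
            x = x.to(device=device, dtype=torch.float32)
            y = y.to(device=device, dtype=torch.long)
            preds = model(x).argmax(dim=1)
            num_correct += (preds == y).sum().item()
            num_samples += y.size(0)
    return num_correct / num_samples


# Toy usage on synthetic data (not CIFAR-10): with identity weights, the
# predicted class is the index of the larger of the two input features.
toy_model = nn.Linear(2, 2, bias=False)
with torch.no_grad():
    toy_model.weight.copy_(torch.eye(2))
xs = torch.tensor([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [0.5, 3.0]])
ys = torch.tensor([0, 1, 0, 0])  # the last label is deliberately wrong
acc = evaluate_accuracy(DataLoader(TensorDataset(xs, ys), batch_size=2), toy_model)
print(acc)  # 0.75
```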