# Part V. CIFAR-10 open-ended challenge


### Things you might try:
- **Filter size**: Above we used 5x5; would smaller filters be more efficient?
- **Number of filters**: Above we used 32 filters. Do more or fewer do better?
- **Pooling vs Strided Convolution**: Do you use max pooling or just strided convolutions?
- **Batch normalization**: Try adding spatial batch normalization after convolution layers and vanilla batch normalization after affine layers. Do your networks train faster?
- **Network architecture**: The network above has two layers of trainable parameters. Can you do better with a deeper network? Good architectures to try include:
    - [conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    - [conv-relu-conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    - [batchnorm-relu-conv]xN -> [affine]xM -> [softmax or SVM]
- **Global Average Pooling**: Instead of flattening and then having multiple affine layers, perform convolutions until your image gets small (7x7 or so) and then apply average pooling to reduce each feature map to a 1x1 image of shape (1, 1, Filter#), which is then reshaped into a (Filter#) vector. This is used in [Google's Inception Network](https://arxiv.org/abs/1512.00567) (see Table 1 for their architecture).
- **Regularization**: Add L2 weight regularization, or perhaps use Dropout.
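
As a rough illustration of the global average pooling idea from the list above, here is a minimal sketch of a small network that ends in `nn.AdaptiveAvgPool2d(1)` instead of a stack of affine layers. The class name `GapNet`, the layer sizes, and the filter counts are assumptions chosen for the example, not the assignment's reference architecture.

```python
import torch
import torch.nn as nn


class GapNet(nn.Module):
    """Illustrative conv net that ends in global average pooling."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),           # spatial batch norm after the conv
            nn.ReLU(),
            nn.MaxPool2d(2),              # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),              # 16x16 -> 8x8
        )
        # Average each of the 64 feature maps down to a single value.
        self.pool = nn.AdaptiveAvgPool2d(1)    # (N, 64, 8, 8) -> (N, 64, 1, 1)
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.pool(x).flatten(1)            # (N, 64, 1, 1) -> (N, 64)
        return self.classifier(x)


# Shape check on a dummy CIFAR-10-sized batch.
scores = GapNet()(torch.zeros(4, 3, 32, 32))
```

Because the pooled vector has only as many entries as there are filters, the final affine layer stays small regardless of the spatial size reached by the convolutions.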

### Tips for training
For each network architecture that you try, you should tune the learning rate and other hyperparameters. When doing this there are a couple of important things to keep in mind:

- If the parameters are working well, you should see improvement within a few hundred iterations.
- Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all.
- Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
- You should use the validation set for hyperparameter search, and save your test set for evaluating your architecture on the best parameters as selected by the validation set.
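
The coarse-to-fine approach described above can be sketched in plain Python for a single hyperparameter such as the learning rate. The function names, the sampling ranges, and the `evaluate` callback (assumed to return validation accuracy after a short training run) are all hypothetical and only meant to show the shape of the procedure.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Sample a value whose log10 is uniform between log10(low) and log10(high)."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def coarse_to_fine(evaluate, low=1e-5, high=1e-1, n_coarse=10, n_fine=10, seed=0):
    """Two-stage random search over one hyperparameter.

    `evaluate` is a hypothetical callback: it takes a candidate value and
    returns a score to maximize (e.g. validation accuracy).
    """
    rng = random.Random(seed)

    # Coarse stage: a wide range, meant to be paired with very short training runs.
    coarse = [sample_log_uniform(low, high, rng) for _ in range(n_coarse)]
    best = max(coarse, key=evaluate)

    # Fine stage: a narrower range around the best coarse candidate.
    fine = [sample_log_uniform(best / 3, best * 3, rng) for _ in range(n_fine)]
    return max(fine + [best], key=evaluate)
```

In practice `evaluate` would train for a few hundred iterations in the coarse stage and for several epochs in the fine stage, always scoring on the validation set rather than the test set.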

### Going above and beyond
If you are feeling adventurous, there are many other features you can implement to try to improve your performance. You are **not required** to implement any of these, but don't miss the fun if you have time!

- Alternative optimizers: you can try Adam, Adagrad, RMSprop, etc.
- Alternative activation functions such as leaky ReLU, parametric ReLU, ELU, or MaxOut.
- Model ensembles
- Data augmentation
- New Architectures
  - [ResNets](https://arxiv.org/abs/1512.03385) where the input from the previous layer is added to the output.
  - [DenseNets](https://arxiv.org/abs/1608.06993) where inputs into previous layers are concatenated together.
  - [This blog has an in-depth overview](https://chatbotslife.com/resnets-highwaynets-and-densenets-oh-my-9bb15918ee32)
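
To make the difference between the two architectures above concrete, here is a minimal sketch of one residual block (input added to the output) and one dense layer (input concatenated with the new features). The class names, channel counts, and layer ordering are simplifying assumptions for illustration; the papers linked above describe the full designs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """Two 3x3 convs whose result is added to the block's input."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)          # skip connection: add the input back


class DenseLayer(nn.Module):
    """One batchnorm-relu-conv step whose output is concatenated with its input."""

    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth_rate, 3, padding=1, bias=False)

    def forward(self, x):
        new_features = self.conv(F.relu(self.bn(x)))
        # Channel count grows from in_channels to in_channels + growth_rate.
        return torch.cat([x, new_features], dim=1)
```

The residual block keeps the channel count fixed so the addition is well defined, while each dense layer widens the tensor by `growth_rate` channels.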

### Have fun and happy training! 

In [25]:
import logging

import torch
from torch.utils.data import DataLoader
from torch.utils.data import sampler
import torchvision.datasets as dset
import torchvision.transforms as T  # needed for T.Compose / T.ToTensor / T.Normalize below

from cs231n.hyperopt import *

fs = '%(asctime)s %(levelname)s:%(message)s'
ds = '%b  %-d %H:%M:%S'
logging.basicConfig(format=fs, datefmt=ds)

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

dtype = torch.float32
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
logger.debug("%s: device=%s", __name__, device)

NUM_TRAIN = 49000
NUM_VAL = 1000

transform = T.Compose([
                T.ToTensor(),
                T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
            ])

cifar10_train = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                             transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64, 
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                           transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64, 
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, NUM_TRAIN+NUM_VAL)))

cifar10_test = dset.CIFAR10('./cs231n/datasets', train=False, download=True, 
                            transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)

DEBUG:__main__:__main__: device=cpu


Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified


In [None]:
%load_ext autoreload
%autoreload 2

torch.backends.cudnn.deterministic = True
torch.manual_seed(999)

im_shape = (3, 32, 32)
num_classes = 10

choices = {
        "Architecture": ["batchnorm-relu-conv"],
        "FilterSize": [3, 5],
        #"FilterCount": [8, 32, 64],
        "FilterCount": [8, 32],
        #"Stride": [1, 2],
        "Stride": [1, 2],
        #"N": [3, 5, 10],
        "N": [3],
        #"M": [1, 2],
        "M": [1],
        "HiddenSize": [1000],
        "Dropout": [0.25, 0.5, 0.95]
}

# This one should sample exactly the network from the assignment.
# Note: while active, it overrides the search space defined above.
#"""
choices = {
        "Architecture": ["batchnorm-relu-conv"],
        "FilterSize": [3],
        "FilterCount": [32],
        "Stride": [1],
        "N": [2],
        "M": [1],
        "HiddenSize": [1000],
        "Dropout": [0.]
}
#"""

ho = HyperOpt(choices, construct_model, train, loader_train, loader_val,
              max_active=3, coarse_its=100, fine_epochs=5, verbose=True,
              visualize=True, port=6006, device=device)

ho.optimize()

INFO:visdom:Visdom successfully connected to server
DEBUG:cs231n.hyperopt:optimize:
DEBUG:cs231n.hyperopt:coarse_step:
DEBUG:cs231n.train_utils:train:
INFO:cs231n.train_utils:train: It 0, loss = 2.2912
INFO:cs231n.train_utils:train: Train acc 	= 11.28
INFO:cs231n.train_utils:train: Val acc 	= 11.90

INFO:cs231n.train_utils:train: It 100, loss = nan
INFO:cs231n.train_utils:train: Train acc 	= 9.90
INFO:cs231n.train_utils:train: Val acc 	= 8.70

INFO:cs231n.train_utils:train: It 101, loss = nan
INFO:cs231n.train_utils:train: Train acc 	= 9.46
INFO:cs231n.train_utils:train: Val acc 	= 8.70

DEBUG:cs231n.train_utils:train:
INFO:cs231n.train_utils:train: It 0, loss = 2.2951
INFO:cs231n.train_utils:train: Train acc 	= 8.59
INFO:cs231n.train_utils:train: Val acc 	= 9.90

INFO:cs231n.train_utils:train: It 100, loss = 1.9424
INFO:cs231n.train_utils:train: Train acc 	= 35.33
INFO:cs231n.train_utils:train: Val acc 	= 35.70

INFO:cs231n.train_utils:train: It 101, loss = 1.8279
INFO:cs231n.train_ut

## Describe what you did 

In the cell below you should write an explanation of what you did, any additional features that you implemented, and/or any graphs that you made in the process of training and evaluating your network.

TODO: Describe what you did

## Test set -- run this only once

Now that we've gotten a result we're happy with, we test our final model (which you should store in `best_model`) on the test set. Think about how this compares to your validation set accuracy.
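
For reference, an accuracy check over a data loader might look roughly like the sketch below. This is not the course's `check_accuracy_part34`; the function name `accuracy` and its signature are assumptions made for illustration.

```python
import torch


def accuracy(loader, model, device=torch.device('cpu')):
    """Return the fraction of correctly classified samples in `loader`."""
    model.eval()                        # use inference behavior for batchnorm/dropout
    num_correct, num_samples = 0, 0
    with torch.no_grad():               # no gradients needed for evaluation
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            preds = model(x).argmax(dim=1)
            num_correct += (preds == y).sum().item()
            num_samples += y.size(0)
    return num_correct / num_samples
```

Comparing this value on the validation loader and the test loader gives a quick sense of how much the hyperparameter search has fit the validation set.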

In [None]:
best_model = model
check_accuracy_part34(loader_test, best_model)