In [2]:
import torch
import numpy as np
import matplotlib.pyplot as plt

# Create a Convolutional Neural Network in PyTorch

Let's define a convolutional neural network for classifying MNIST digits.  

Each image is 28 x 28.  Each pixel has just one intensity value, with no color.  The number of values in each pixel is called the number of channels.  Our black and white, or grayscale, images have one channel. Color images have 3 channels, for red, green, and blue values.  For each image, PyTorch expects the channel dimension to come first, so our MNIST digits are each of size 1 x 28 x 28.

Let's create a torch tensor of zeros with the shape of two MNIST digit images.

In [6]:
z = torch.zeros((2, 1, 28, 28))
z.shape

torch.Size([2, 1, 28, 28])

Now let's create a layer of 5 convolutional units in Pytorch.  If we want to apply each unit to all image patches that are 4 x 4, we need to specify the kernel size to be 4 and the stride to be 1.

The first argument to `torch.nn.Conv2d` is the number of channels in the input images, which is 1 in our case.

In [10]:
layer1 = torch.nn.Conv2d(1, 5, 4, 1)
layer1

Conv2d(1, 5, kernel_size=(4, 4), stride=(1, 1))

We can pass our two images in `z` through this layer to see what the size of the result is.

`torch.nn` objects can act as functions because they implement the `__call__` method. See [this explanation](https://realpython.com/python-callable-instances/) from [Real Python](https://realpython.com/f).
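Here is a minimal sketch of the idea in plain Python, with no PyTorch involved. The class `Scale` and the variable names are hypothetical, made up only for this illustration.

```python
class Scale:
    """Hypothetical example class: multiplies its input by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        # Runs whenever an instance is used like a function.
        return self.factor * x


double = Scale(2)
result = double(21)   # equivalent to double.__call__(21)
print(result)
```

In `torch.nn.Module`, `__call__` runs the module's `forward` method (along with any registered hooks), which is why `layer1(z)` works.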

In [15]:
output_layer1 = layer1(z)
output_layer1.shape

torch.Size([2, 5, 25, 25])

Does this shape make sense?
* 2 is the number of samples in `z`,
* 5 is the number of units in this layer,
* 25, 25 is the shape (25 x 25) of the resulting "image" produced by each of the 5 units.

How can we calculate the 25?  Easy. We are striding by 1 with kernel size 4, so once we have used 28 - 4 columns, we have one more patch to make.  So we get 28 - 4 + 1, or 25.

However, if we stride by 2, then we must divide by 2, to get  (28 - 4) // 2 + 1, or 13.  

Let's check this.

In [20]:
layer1 = torch.nn.Conv2d(1, 5, 4, 2)
output_layer1 = layer1(z)
output_layer1.shape

torch.Size([2, 5, 13, 13])

Yay.  We are correct!

In general, the calculation will be

        (input_size - kernel_size) // stride + 1
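The arithmetic from the two checks above, 28 - 4 + 1 and (28 - 4) // 2 + 1, can be wrapped in a small helper and compared against what `torch.nn.Conv2d` actually produces. The function name `conv_output_size` is just for this sketch, and it assumes the `Conv2d` defaults of no padding and no dilation.

```python
import torch


def conv_output_size(input_size, kernel_size, stride):
    # Number of kernel positions along one dimension,
    # assuming no padding and no dilation (the Conv2d defaults).
    return (input_size - kernel_size) // stride + 1


z = torch.zeros((2, 1, 28, 28))
for stride in (1, 2, 3):
    layer = torch.nn.Conv2d(1, 5, 4, stride)
    predicted = conv_output_size(28, 4, stride)
    actual = layer(z).shape[-1]
    print(f'stride {stride}: predicted {predicted}, actual {actual}')
```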

What if we want another convolutional layer, say with 4 units, kernel size of 3, and stride of 1?

        layer2 = torch.nn.Conv2d(?, 4, 3, 1)

What should we use for "?" ?   Remember, this first argument is the number of channels, or values, for each pixel. Our `layer1` has 5 units. Each of these 5 units produces a 13 x 13 image, so `layer2` gets inputs with 5 channels.

In [22]:
layer2 = torch.nn.Conv2d(5, 4, 3, 1)
layer2

Conv2d(5, 4, kernel_size=(3, 3), stride=(1, 1))

In [24]:
output_layer2 = layer2(layer1(z))
output_layer2.shape

torch.Size([2, 4, 11, 11])

Humm...why is the result for each unit in this layer an 11 x 11 image?

In [25]:
(13 - 3) // 1 + 1

11
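The channel count passed to `layer2` does not have to be typed by hand. As a small sketch, it can be read from the `out_channels` attribute of the previous `Conv2d` layer, so the two layers stay consistent if the number of units in `layer1` changes.

```python
import torch

layer1 = torch.nn.Conv2d(1, 5, 4, 2)
# The number of units in layer1 is the number of input channels for layer2.
layer2 = torch.nn.Conv2d(layer1.out_channels, 4, 3, 1)

z = torch.zeros((2, 1, 28, 28))
output_shape = layer2(layer1(z)).shape
print(layer1.out_channels, layer2.in_channels, output_shape)
```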

To build multiple layers in PyTorch, we need to combine them in a data structure that PyTorch understands and that can do things like collect all of the weights from all of the layers to be used in its optimization functions.  We can do this with `torch.nn.Sequential`.

In [30]:
nnet = torch.nn.Sequential(layer1, layer2)
nnet

Sequential(
  (0): Conv2d(1, 5, kernel_size=(4, 4), stride=(2, 2))
  (1): Conv2d(5, 4, kernel_size=(3, 3), stride=(1, 1))
)

In [32]:
nnet(z).shape

torch.Size([2, 4, 11, 11])
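To see what "collect all of the weights" means, here is a short sketch that lists the parameters a `torch.nn.Sequential` gathers from its layers and counts them. Each `Conv2d` contributes a weight tensor of shape (units, input channels, kernel, kernel) and one bias per unit.

```python
import torch

nnet = torch.nn.Sequential(torch.nn.Conv2d(1, 5, 4, 2),
                           torch.nn.Conv2d(5, 4, 3, 1))

# named_parameters() walks every layer in the Sequential.
for name, param in nnet.named_parameters():
    print(name, tuple(param.shape))

# 5*1*4*4 + 5 weights in the first layer, 4*5*3*3 + 4 in the second.
n_weights = sum(p.numel() for p in nnet.parameters())
print(n_weights)
```

This same `nnet.parameters()` call is what gets handed to an optimizer.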

Okay. Let's review what we have just done, and use variables for the quantities we used.

In [33]:
n_input_channels = 1
input_size_1 = 28
n_units_1 = 5
kernel_1 = 4
stride_1 = 2

n_units_2 = 4
kernel_2 = 3
stride_2 = 1

output_size_1 = (input_size_1 - kernel_1) // stride_1 + 1
output_size_2 = (output_size_1 - kernel_2) // stride_2 + 1

output_size_1, output_size_2

(13, 11)

In [34]:
nnet = torch.nn.Sequential(torch.nn.Conv2d(n_input_channels, n_units_1, kernel_1, stride_1),
                           torch.nn.Conv2d(n_units_1, n_units_2, kernel_2, stride_2))
nnet

Sequential(
  (0): Conv2d(1, 5, kernel_size=(4, 4), stride=(2, 2))
  (1): Conv2d(5, 4, kernel_size=(3, 3), stride=(1, 1))
)

In [35]:
z.shape, nnet(z).shape

(torch.Size([2, 1, 28, 28]), torch.Size([2, 4, 11, 11]))

Since we want to classify MNIST digits, we need an output layer containing 10 units, each one connected to all of the output values from the second convolutional layer.  We can specify that with

        torch.nn.Linear(n_inputs, n_outputs)

but for this we need to know how many inputs we will have if we concatenate all of the outputs of the second convolutional layer for each MNIST digit into a one-dimensional vector.  How many will this be?

Well, let's see.  The second convolutional layer outputs an 11 x 11 image for each of its units, and the layer has 4 units.  So this will result in 11 * 11 * 4, or 484 values.  A layer in which every unit is connected to all of these values is usually called a fully-connected, or dense, layer. Okay.  We can just append this new layer onto our `torch.nn.Sequential` list.

BUT, first we must add a component that actually does the concatenation for us, or, in Pytorch jargon, that flattens it.

In [36]:
11 * 11 * 4

484

In [40]:
torch.nn.Flatten?

Init signature: torch.nn.Flatten(start_dim: int = 1, end_dim: int = -1) -> None
Docstring:     
Flattens a contiguous range of dims into a tensor. For use with :class:`~nn.Sequential`.
See :meth:`torch.flatten` for details.

Shape:
    - Input: :math:`(*, S_{\text{start}},..., S_{i}, ..., S_{\text{end}}, *)`,'
      where :math:`S_{i}` is the size at dimension :math:`i` and :math:`*` means any
      number of dimensions including none.
    - Output: :math:`(*, \prod_{i=\text{start}}^{\text{end}} S_{i}, *)`.

Args:
    start_dim: first dim to flatten (default = 1).
    end_dim: last dim to flatten (default = -1).

Examples::
    >>> input = torch.randn(32, 1, 5, 5)
    >>> # With default parameters
    >>> m = nn.Flatten()
    >>> ou

In [41]:
nnet

Sequential(
  (0): Conv2d(1, 5, kernel_size=(4, 4), stride=(2, 2))
  (1): Conv2d(5, 4, kernel_size=(3, 3), stride=(1, 1))
)

In [43]:
nnet(z).shape

torch.Size([2, 4, 11, 11])

In [44]:
nnet.append(torch.nn.Flatten())
nnet

Sequential(
  (0): Conv2d(1, 5, kernel_size=(4, 4), stride=(2, 2))
  (1): Conv2d(5, 4, kernel_size=(3, 3), stride=(1, 1))
  (2): Flatten(start_dim=1, end_dim=-1)
)

In [45]:
nnet(z).shape

torch.Size([2, 484])

Cool!  It left the first dimension (index 0) alone.  Pytorch assumes the first dimension is for samples. We don't want to flatten that one, but we do want to flatten all others. So this flattened the 4 x 11 x 11 arrays into 484-dimensional vectors.  Just what we want as input to our fully-connected layer.
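As a quick check of this behavior, the sketch below compares `torch.nn.Flatten()` with an explicit `reshape` that keeps the sample dimension and merges the rest. The tensor `y` is made-up data with the same shape as the output of our second convolutional layer.

```python
import torch

# Made-up values with the shape of the second layer's output for 2 samples.
y = torch.arange(2 * 4 * 11 * 11, dtype=torch.float32).reshape(2, 4, 11, 11)

flat = torch.nn.Flatten()(y)              # default start_dim=1, end_dim=-1
same = torch.equal(flat, y.reshape(2, -1))  # keep dim 0, merge all others

print(flat.shape, same)
```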

Now we can add our `torch.nn.Linear` layer.

In [46]:
nnet.append(torch.nn.Linear(484, 10))
nnet

Sequential(
  (0): Conv2d(1, 5, kernel_size=(4, 4), stride=(2, 2))
  (1): Conv2d(5, 4, kernel_size=(3, 3), stride=(1, 1))
  (2): Flatten(start_dim=1, end_dim=-1)
  (3): Linear(in_features=484, out_features=10, bias=True)
)

What shape will our output be?  What do we want it to be?

Well, the input to our network is two 28 x 28 digit images.  So we want one output value for each of the 10 digit classes, for each of the 2 images, or a 2 x 10 output tensor.

In [47]:
nnet(z).shape

torch.Size([2, 10])

If we want two fully-connected layers after the convolutional layers, we will need to add a nonlinear activation function.  We can use

        torch.nn.Tanh()

for example.
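Why is the nonlinearity needed? Without it, two stacked linear layers compute nothing more than a single linear layer could. The sketch below checks this numerically with random inputs; the names `fc1`, `fc2`, `W`, and `b` are only for this illustration.

```python
import torch

torch.manual_seed(0)
fc1 = torch.nn.Linear(484, 20)
fc2 = torch.nn.Linear(20, 10)
x = torch.randn(2, 484)

with torch.no_grad():
    stacked = fc2(fc1(x))                  # two linear layers, no activation

    # Fold both layers into one weight matrix and one bias vector.
    W = fc2.weight @ fc1.weight            # shape 10 x 484
    b = fc2.weight @ fc1.bias + fc2.bias   # shape 10
    single = x @ W.T + b

    with_tanh = fc2(torch.tanh(fc1(x)))    # what we will actually use

print(torch.allclose(stacked, single, atol=1e-4))
print(with_tanh.shape)
```

With `torch.nn.Tanh()` between the layers, the second layer no longer folds into the first this way.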

Okay.  Let's do this, starting again with the quantities specified in variables.  Let's put 20 units in the first fully-connected layer, which will be `layer_3` now, and our 10-unit output layer will be `layer_4`.

In [53]:
n_input_channels = 1
input_size_1 = 28
n_units_1 = 5
kernel_1 = 4
stride_1 = 2

n_units_2 = 4
kernel_2 = 3
stride_2 = 1

output_size_1 = (input_size_1 - kernel_1) // stride_1 + 1
output_size_2 = (output_size_1 - kernel_2) // stride_2 + 1

n_units_3 = 20
n_units_4 = 10

In [54]:
nnet = torch.nn.Sequential(torch.nn.Conv2d(n_input_channels, n_units_1, kernel_1, stride_1),
                           torch.nn.Conv2d(n_units_1, n_units_2, kernel_2, stride_2),
                           torch.nn.Flatten(),
                           torch.nn.Linear(output_size_2 ** 2 * n_units_2, n_units_3),
                           torch.nn.Tanh(),
                           torch.nn.Linear(n_units_3, n_units_4))
nnet

Sequential(
  (0): Conv2d(1, 5, kernel_size=(4, 4), stride=(2, 2))
  (1): Conv2d(5, 4, kernel_size=(3, 3), stride=(1, 1))
  (2): Flatten(start_dim=1, end_dim=-1)
  (3): Linear(in_features=484, out_features=20, bias=True)
  (4): Tanh()
  (5): Linear(in_features=20, out_features=10, bias=True)
)

In [55]:
z.shape, nnet(z).shape

(torch.Size([2, 1, 28, 28]), torch.Size([2, 10]))

# Train it

Okay, we made the net.  Now how do we train it?

Relatively easily!  We will use PyTorch's autograd, loss, and optimizer facilities.

PyTorch provides a number of loss functions. We can use `torch.nn.NLLLoss`, which you already know about from implementing the `neg_log_likelihood` function in assignments.  We can also use `torch.nn.CrossEntropyLoss`.  The first one assumes you have already applied a log softmax to the outputs of the output layer.  The second one applies the log softmax for us, so the simple linear output layer is all we need.

See [PyTorch CrossEntropyLoss vs. NLLLoss](https://jamesmccaffrey.wordpress.com/2020/06/11/pytorch-crossentropyloss-vs-nllloss-cross-entropy-loss-vs-negative-log-likelihood-loss) for a little more information, and [this Wikipedia page](https://en.wikipedia.org/wiki/Cross-entropy) about cross entropy in general.
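The relationship between the two losses can be checked with a small sketch. The outputs `Y` and targets `T` below are made-up values, not MNIST data.

```python
import torch

torch.manual_seed(0)
Y = torch.randn(4, 10)            # made-up linear outputs for 4 samples, 10 classes
T = torch.tensor([3, 0, 9, 1])    # made-up integer class indices

# CrossEntropyLoss works directly on the linear outputs.
ce = torch.nn.CrossEntropyLoss()(Y, T)

# NLLLoss expects log softmax to have been applied already.
log_probs = torch.nn.functional.log_softmax(Y, dim=1)
nll = torch.nn.NLLLoss()(log_probs, T)

print(ce.item(), nll.item())
```

The two printed values should agree up to floating-point rounding.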

In [56]:
nnet

Sequential(
  (0): Conv2d(1, 5, kernel_size=(4, 4), stride=(2, 2))
  (1): Conv2d(5, 4, kernel_size=(3, 3), stride=(1, 1))
  (2): Flatten(start_dim=1, end_dim=-1)
  (3): Linear(in_features=484, out_features=20, bias=True)
  (4): Tanh()
  (5): Linear(in_features=20, out_features=10, bias=True)
)

In [110]:
def percent_correct(Yclasses, T):
    return (Yclasses == T).float().mean().item() * 100

In [119]:
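def train(nnet, X, T, n_epochs, learning_rate):
    
    optimizer = torch.optim.Adam(nnet.parameters(), lr=learning_rate)
    loss_func = torch.nn.CrossEntropyLoss()

    for epoch in range(n_epochs):
    
        Y = nnet(X)  # forward pass on all samples at once
        
        loss = loss_func(Y, T)  # T holds integer class indices
        loss.backward()  # gradients of loss with respect to all weights
        
        optimizer.step()  # update the weights
        optimizer.zero_grad()  # reset gradients so they do not accumulate

        # Xtrain, Xval, Xtest and their targets are global variables,
        # and use() is defined in a later cell.
        pc_train = percent_correct(use(nnet, Xtrain), Ttrain)
        pc_val = percent_correct(use(nnet, Xval), Tval)
        pc_test = percent_correct(use(nnet, Xtest), Ttest)
        
        print(f'Epoch {epoch + 1} %correct: Train {pc_train:.1f} Val {pc_val:.1f} Test {pc_test:.1f}')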

In [120]:
torch.argmax?

Docstring:
argmax(input) -> LongTensor

Returns the indices of the maximum value of all elements in the :attr:`input` tensor.

This is the second value returned by :meth:`torch.max`. See its
documentation for the exact semantics of this method.

.. note:: If there are multiple maximal values then the indices of the first maximal value are returned.

Args:
    input (Tensor): the input tensor.

Example::

    >>> a = torch.randn(4, 4)
    >>> a
    tensor([[ 1.3398,  0.2663, -0.2686,  0.2450],
            [-0.7401, -0.8805, -0.3402, -1.1936],
            [ 0.4907, -1.3948, -1.0691, -0.3132],
            [-1.6092,  0.5419, -0.2993,  0.3195]])
    >>> torch.argmax(a)
    tensor(0)

.. function:: argmax(input, dim, keepdim=False) -> LongTensor
   :noindex:

Returns the indices of the maximum values of a tensor across a dimension.

This is the second value returned by :meth:`torch.max`. See its
documentation for the exact semantics of this method.

Args:
    input (Tensor): the i

In [121]:
def use(nnet, X):
    Y = nnet(X)
    class_index = torch.argmax(Y, dim=1)  # not axis=1 as we did in numpy!
    return class_index
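To make the `argmax` step concrete, here is a tiny sketch with made-up outputs for two samples and three classes. It repeats the `percent_correct` definition from above so that it runs on its own.

```python
import torch


def percent_correct(Yclasses, T):
    return (Yclasses == T).float().mean().item() * 100


# Made-up network outputs: one row per sample, one column per class.
Y = torch.tensor([[0.1, 2.0, -1.0],
                  [3.0, 0.0, 0.5]])
T = torch.tensor([1, 2])          # made-up correct classes

Yclasses = torch.argmax(Y, dim=1)  # column index of the largest value in each row
print(Yclasses)                    # tensor([1, 0])
print(percent_correct(Yclasses, T))  # 50.0
```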

In [122]:
import pickle
import gzip

with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')

def X_as_torch(X):
    return torch.from_numpy(X.reshape(-1, 1, 28, 28).astype(np.float32))
def T_as_torch(T):
    return torch.from_numpy(T.astype(np.int64))
    
Xtrain = X_as_torch(train_set[0])
Ttrain = T_as_torch(train_set[1])

Xval = X_as_torch(valid_set[0])
Tval = T_as_torch(valid_set[1])

Xtest = X_as_torch(test_set[0])
Ttest = T_as_torch(test_set[1])

print(f'{Xtrain.shape=} {Ttrain.shape=}\n    {Xtrain.dtype=} {Ttrain.dtype=}')
print(f'{Xval.shape=} {Tval.shape=}\n   {Xval.dtype=} {Tval.dtype=}')
print(f'{Xtest.shape=} {Ttest.shape=}\n   {Xtest.dtype=} {Ttest.dtype=}')

Xtrain.shape=torch.Size([50000, 1, 28, 28]) Ttrain.shape=torch.Size([50000])
    Xtrain.dtype=torch.float32 Ttrain.dtype=torch.int64
Xval.shape=torch.Size([10000, 1, 28, 28]) Tval.shape=torch.Size([10000])
   Xval.dtype=torch.float32 Tval.dtype=torch.int64
Xtest.shape=torch.Size([10000, 1, 28, 28]) Ttest.shape=torch.Size([10000])
   Xtest.dtype=torch.float32 Ttest.dtype=torch.int64


In [123]:
percent_correct(use(nnet, Xval), Tval)

10.289999842643738

In [124]:
train(nnet, Xtrain, Ttrain, 10, 0.01)

Epoch 1 %correct: Train 10.0 Val 10.4 Test 10.1
Epoch 2 %correct: Train 10.4 Val 10.8 Test 10.5
Epoch 3 %correct: Train 12.6 Val 12.0 Test 12.6
Epoch 4 %correct: Train 11.4 Val 10.7 Test 11.4
Epoch 5 %correct: Train 11.4 Val 10.7 Test 11.4
Epoch 6 %correct: Train 11.5 Val 10.8 Test 11.4
Epoch 7 %correct: Train 10.2 Val 10.1 Test 10.5
Epoch 8 %correct: Train 14.8 Val 15.0 Test 14.9
Epoch 9 %correct: Train 15.3 Val 15.6 Test 15.5
Epoch 10 %correct: Train 15.7 Val 15.8 Test 15.8


In [128]:
n_input_channels = 1
input_size_1 = 28
n_units_1 = 50
kernel_1 = 4
stride_1 = 2

n_units_2 = 20
kernel_2 = 3
stride_2 = 1

output_size_1 = (input_size_1 - kernel_1) // stride_1 + 1
output_size_2 = (output_size_1 - kernel_2) // stride_2 + 1

n_units_3 = 20
n_units_4 = 10

In [135]:
nnet = torch.nn.Sequential(torch.nn.Conv2d(n_input_channels, n_units_1, kernel_1, stride_1),
                           torch.nn.Conv2d(n_units_1, n_units_2, kernel_2, stride_2),
                           torch.nn.Flatten(),
                           torch.nn.Linear(output_size_2 ** 2 * n_units_2, n_units_3),
                           torch.nn.Tanh(),
                           torch.nn.Linear(n_units_3, n_units_4))
nnet

Sequential(
  (0): Conv2d(1, 50, kernel_size=(4, 4), stride=(2, 2))
  (1): Conv2d(50, 20, kernel_size=(3, 3), stride=(1, 1))
  (2): Flatten(start_dim=1, end_dim=-1)
  (3): Linear(in_features=2420, out_features=20, bias=True)
  (4): Tanh()
  (5): Linear(in_features=20, out_features=10, bias=True)
)

In [136]:
train(nnet, Xtrain, Ttrain, 50, 0.001)

Epoch 1 %correct: Train 37.9 Val 38.7 Test 38.8
Epoch 2 %correct: Train 61.9 Val 63.7 Test 62.9
Epoch 3 %correct: Train 64.6 Val 66.6 Test 65.0
Epoch 4 %correct: Train 66.7 Val 68.5 Test 66.8
Epoch 5 %correct: Train 68.7 Val 70.4 Test 69.2
Epoch 6 %correct: Train 69.5 Val 71.5 Test 70.6
Epoch 7 %correct: Train 70.1 Val 71.9 Test 71.1
Epoch 8 %correct: Train 71.2 Val 73.1 Test 72.1
Epoch 9 %correct: Train 72.5 Val 74.6 Test 73.4
Epoch 10 %correct: Train 73.9 Val 75.9 Test 74.8
Epoch 11 %correct: Train 75.0 Val 77.2 Test 75.9
Epoch 12 %correct: Train 76.0 Val 78.3 Test 77.0
Epoch 13 %correct: Train 76.9 Val 79.4 Test 77.8
Epoch 14 %correct: Train 77.8 Val 80.3 Test 78.5
Epoch 15 %correct: Train 78.7 Val 81.0 Test 79.3
Epoch 16 %correct: Train 79.4 Val 81.6 Test 79.9
Epoch 17 %correct: Train 79.9 Val 82.2 Test 80.5
Epoch 18 %correct: Train 80.4 Val 82.5 Test 81.1
Epoch 19 %correct: Train 81.0 Val 83.2 Test 81.5
Epoch 20 %correct: Train 81.4 Val 83.6 Test 82.1
Epoch 21 %correct: Train 81.9

If you were to put this code into a new `NeuralNetworkTorch` class, how would you specify the arguments to the constructor, so that the constructor can build the neural network?

We need 
* the number of input channels and shape of the input images,
* for each convolutional layer we need the number of units, kernel size, and stride,
* for each fully-connected hidden layer we need number of units,
* for the output layer we need the number of units.

Seems easy enough---four arguments:
* list of three ints: the number of input channels, followed by the height and width of the two-dimensional input images,
* list of lists of ints, one sublist for each convolutional layer containing the number of units, kernel size, and stride,
* list of ints, one for each fully-connected hidden layer, giving its number of units,
* int, for the number of units in the output layer.
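One possible way to turn those four arguments into a network is sketched below as a plain function rather than a full class. The name `make_cnn` and its parameter names are hypothetical, and the choice to put a `Tanh` after every convolutional and hidden layer is an assumption of this sketch, not a requirement.

```python
import torch


def make_cnn(input_shape, conv_specs, n_hiddens_fc, n_outputs):
    """Hypothetical helper: build a Sequential CNN from the four arguments."""
    n_channels, height, width = input_shape
    layers = []

    for n_units, kernel, stride in conv_specs:
        layers.append(torch.nn.Conv2d(n_channels, n_units, kernel, stride))
        layers.append(torch.nn.Tanh())  # assumption: Tanh after each conv layer
        # Track the image size produced by this layer (no padding).
        height = (height - kernel) // stride + 1
        width = (width - kernel) // stride + 1
        n_channels = n_units

    layers.append(torch.nn.Flatten())
    n_inputs = n_channels * height * width

    for n_units in n_hiddens_fc:
        layers.append(torch.nn.Linear(n_inputs, n_units))
        layers.append(torch.nn.Tanh())
        n_inputs = n_units

    layers.append(torch.nn.Linear(n_inputs, n_outputs))
    return torch.nn.Sequential(*layers)


nnet = make_cnn([1, 28, 28], [[50, 4, 2], [20, 3, 1]], [20], 10)
out_shape = nnet(torch.zeros(2, 1, 28, 28)).shape
print(out_shape)
```

A class constructor could do the same work and store the resulting `Sequential` as an attribute.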

  

If we want to see the outputs of each layer, we need a different structure for the net.  Let's define each layer separately and include the layer's activation function.

In [153]:
n_input_channels = 1
input_size = 28
n_units_1 = 50
kernel_1 = 4
stride_1 = 2

n_units_2 = 20
kernel_2 = 3
stride_2 = 1

output_size_1 = (input_size - kernel_1) // stride_1 + 1
output_size_2 = (output_size_1 - kernel_2) // stride_2 + 1

n_units_3 = 20
n_units_4 = 10

In [154]:
def calculate_flattened_size(n_input_channels, input_size, conv_layers):
    # Pass one all-zero image through the convolutional layers and
    # count the values that come out for that single sample.
    z = torch.zeros((1, n_input_channels, input_size, input_size))
    out = conv_layers(z)
    return out.reshape(1, -1).shape[1]

In [186]:
conv_layers = torch.nn.Sequential(
    torch.nn.Sequential(torch.nn.Conv2d(n_input_channels, n_units_1, kernel_1, stride_1),
                        torch.nn.Tanh()),
    torch.nn.Sequential(torch.nn.Conv2d(n_units_1, n_units_2, kernel_2, stride_2),
                        torch.nn.Tanh()) )

In [187]:
z = torch.zeros(1, 1, 28, 28)
y = conv_layers(z)

In [188]:
y.shape

torch.Size([1, 20, 11, 11])

In [189]:
y.reshape(1, -1).shape

torch.Size([1, 2420])

In [190]:
output_size_2 ** 2 * n_units_2

2420

In [199]:
fc_layers = torch.nn.Sequential(
            torch.nn.Sequential(torch.nn.Linear(output_size_2 ** 2 * n_units_2, n_units_3),
                                torch.nn.Tanh()),
            torch.nn.Linear(n_units_3, n_units_4))

def forward(conv_layers, fc_layers, X):
    n_samples = X.shape[0]
    Zs = [X]
    for conv_layer in conv_layers:
        # Each conv_layer is a Sequential that already applies its Tanh activation.
        Zs.append(conv_layer(Zs[-1]))

    Zs[-1] = Zs[-1].reshape(n_samples, -1)
            
    for fc_layer in fc_layers:
        Zs.append(fc_layer(Zs[-1]))
        
    return Zs

In [202]:
list(conv_layers.parameters()) + list(fc_layers.parameters())

[Parameter containing:
 tensor([[[[-7.9822e-02, -2.3267e-01, -2.0843e-01, -9.5234e-02],
           [ 9.9745e-02,  1.8729e-01,  3.8440e-02, -1.0873e-01],
           [-9.7491e-02,  1.2590e-01,  7.6633e-03, -2.2126e-01],
           [-2.8589e-02, -2.1833e-01, -1.1838e-01, -1.2064e-01]]],
 
 
         [[[ 6.9133e-02, -2.3409e-01,  1.7648e-01,  6.2355e-02],
           [-1.0319e-01,  1.9576e-01, -2.2655e-01,  2.1083e-02],
           [-1.0862e-01,  2.4989e-01,  7.8159e-02, -1.6312e-01],
           [-1.3490e-01, -1.4565e-01,  2.4302e-01, -1.3308e-01]]],
 
 
         [[[ 2.0420e-01,  1.4778e-01, -4.4402e-02, -2.4015e-01],
           [-2.7844e-02, -4.4514e-02, -1.5794e-01, -2.4825e-01],
           [-1.5558e-01,  5.6637e-02, -1.1044e-01, -1.8110e-01],
           [ 1.9186e-01,  2.2550e-01, -2.2078e-01,  1.1631e-01]]],
 
 
         [[[ 1.2246e-01,  1.6026e-01,  4.6630e-02,  5.6052e-02],
           [-2.0488e-01,  1.3242e-01,  2.4871e-01,  8.9424e-02],
           [-1.3369e-02,  1.9422e-01,  1.6588e-01

In [203]:
def train(conv_layers, fc_layers, X, T, n_epochs, learning_rate):
    
    optimizer = torch.optim.Adam(list(conv_layers.parameters()) + list(fc_layers.parameters()),
                                 lr=learning_rate)
    loss_func = torch.nn.CrossEntropyLoss()

    for epoch in range(n_epochs):
    
        Y = forward(conv_layers, fc_layers, X)
        
        loss = loss_func(Y[-1], T)
        loss.backward()
        
        optimizer.step() 
        optimizer.zero_grad()

        pc_train = percent_correct(use(conv_layers, fc_layers, Xtrain), Ttrain)
        pc_val = percent_correct(use(conv_layers, fc_layers, Xval), Tval)
        pc_test = percent_correct(use(conv_layers, fc_layers, Xtest), Ttest)
        
        print(f'Epoch {epoch + 1} %correct: Train {pc_train:.1f} Val {pc_val:.1f} Test {pc_test:.1f}')

def use(conv_layers, fc_layers, X):
    Y = forward(conv_layers, fc_layers, X)
    class_index = torch.argmax(Y[-1], dim=1)
    return class_index

In [205]:
train(conv_layers, fc_layers, Xtrain, Ttrain, 10, 0.001)

Epoch 1 %correct: Train 45.7 Val 47.2 Test 47.8
Epoch 2 %correct: Train 65.7 Val 67.5 Test 67.4
Epoch 3 %correct: Train 71.4 Val 73.9 Test 73.0
Epoch 4 %correct: Train 66.7 Val 69.6 Test 68.0
Epoch 5 %correct: Train 63.0 Val 65.7 Test 64.3
Epoch 6 %correct: Train 63.4 Val 66.2 Test 64.8
Epoch 7 %correct: Train 66.9 Val 69.3 Test 68.5
Epoch 8 %correct: Train 71.0 Val 73.2 Test 72.5
Epoch 9 %correct: Train 73.9 Val 76.3 Test 75.2
Epoch 10 %correct: Train 75.8 Val 78.2 Test 77.0


In [206]:
use(conv_layers, fc_layers, Xval)

tensor([3, 8, 6,  ..., 5, 6, 8])
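Because `forward` returns the output of every layer, we can look at more than the final class predictions. The sketch below rebuilds the same layer structure with the sizes written out and prints the shape of each entry of `Zs` for three all-zero images. Note that `forward` overwrites the last convolutional output with its flattened version, so that output only appears as a 2420-element vector per sample.

```python
import torch

conv_layers = torch.nn.Sequential(
    torch.nn.Sequential(torch.nn.Conv2d(1, 50, 4, 2), torch.nn.Tanh()),
    torch.nn.Sequential(torch.nn.Conv2d(50, 20, 3, 1), torch.nn.Tanh()))

fc_layers = torch.nn.Sequential(
    torch.nn.Sequential(torch.nn.Linear(11 * 11 * 20, 20), torch.nn.Tanh()),
    torch.nn.Linear(20, 10))


def forward(conv_layers, fc_layers, X):
    n_samples = X.shape[0]
    Zs = [X]
    for conv_layer in conv_layers:
        Zs.append(conv_layer(Zs[-1]))
    Zs[-1] = Zs[-1].reshape(n_samples, -1)  # flatten last conv output in place
    for fc_layer in fc_layers:
        Zs.append(fc_layer(Zs[-1]))
    return Zs


with torch.no_grad():  # no gradients needed just to inspect shapes
    Zs = forward(conv_layers, fc_layers, torch.zeros(3, 1, 28, 28))

shapes = [tuple(Z.shape) for Z in Zs]
for shape in shapes:
    print(shape)
```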