## Exercise 9

**Tip** This is a very small dataset (number of observations) compared to the number of features.
This means that overfitting may be an issue, and sometimes fancy tricks won't do any good. 
Keep that in mind, and always start simple.

**3.1) Improve the network**, and get as high a validation score as you can. 
When trying to improve the network nothing is sacred. You can try various learning rates, batch sizes, validation sizes, etc. 
Most importantly, remember that the validation set is very small (only 1 sample per class), so validation scores will be noisy.

To get you off to a good start we have created a list of **things you might want to try**:
* Add more layers (mostly fully connected and convolutional)
* Increase or decrease the batch size 
* Use dropout (a lot - e.g. between the convolutional layers)
* Use batch normalization (a lot)
* Try with L2 regularization (weight decay)
* Use only the image for training (with CNN) - comment on the increased time between iterations.
* Change the image size to be bigger or smaller
* Try other combinations of FFN, CNN, RNN parts in various ways (bigger is not always better)
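
Several of the suggestions above (dropout, batch normalization, L2 regularization) take only a few lines in PyTorch. The block below is a minimal sketch, not a reference solution: the layer sizes, the dropout rate and the `weight_decay` value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical block: conv -> batch norm -> ReLU -> pooling -> dropout
block = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Dropout(0.2),
)

# weight_decay makes Adam add an L2 penalty on the parameters (value is an assumption)
optimizer = optim.Adam(block.parameters(), lr=1e-3, weight_decay=1e-4)

x = torch.randn(4, 1, 128, 128)  # dummy batch of 4 grayscale images

# In eval mode dropout is disabled and batch norm uses its running statistics,
# so two forward passes on the same input give the same result.
block.eval()
with torch.no_grad():
    first, second = block(x), block(x)
print(torch.allclose(first, second))
```

Remember to switch back with `block.train()` before continuing training, otherwise dropout stays off.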

If your network is not performing as well as you would like it to, [here](http://theorangeduck.com/page/neural-network-not-working) is a great explanation of what might have gone wrong.


**3.2) Improve Kaggle score**. Once you are happy with the network, try to get the best score you can on Kaggle for this dataset (**upload** instructions below).
You can upload your solution multiple times as you progress.
A very good implementation would get a score between $0.04$ and $0.06$ (the smaller the better). Try to see if you can get there, and explain what might have gone wrong if you can't. 
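
Assuming the Kaggle score is the multi-class logarithmic loss (the "smaller is better" range above is consistent with that, but this section does not state it), a score can be estimated locally from predicted class probabilities. `multiclass_log_loss` below is a hypothetical helper, not part of the exercise code.

```python
import numpy as np

def multiclass_log_loss(probs, targets, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    probs:   (n_samples, n_classes) array of predicted probabilities
    targets: (n_samples,) array of integer class indices
    """
    probs = np.clip(probs, eps, 1 - eps)
    probs = probs / probs.sum(axis=1, keepdims=True)  # renormalize rows after clipping
    true_class_probs = probs[np.arange(len(targets)), targets]
    return float(-np.mean(np.log(true_class_probs)))

# Baseline: a uniform guess over 99 classes scores log(99)
uniform = np.full((5, 99), 1 / 99)
baseline = multiclass_log_loss(uniform, np.array([0, 1, 2, 3, 4]))
print(baseline, np.log(99))
```

Under that assumption, a score between 0.04 and 0.06 means the network puts on average about 94-96% probability on the correct species.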


**3.3) Reflect on the process**, and how you got to your final design and discuss your final results. 
What worked, and what didn't?
Include at least the following: 
* Description of the final architecture
* Description of the training parameters
* Description of the final results (Kaggle and validation)

**Answer:**


# Move to the directory

In [55]:
cd ~/Documents/dl/LeafClassification

/zhome/68/a/154632/Documents/dl/LeafClassification


In [56]:
ls

6_1_EXE_Kaggle_Leaf_Challenge.ipynb  __pycache__/   sample_submission.csv
NN.ipynb                             data_utils.py  test.csv
NN_1.ipynb                           images/        train.csv
Preprocessing.ipynb                  my_process.md
Submission.ipynb                     pickles/


# Libraries

In [57]:
%matplotlib inline
import matplotlib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import glob
import os
import time

from IPython.display import clear_output
from skimage.io import imread
from skimage.transform import resize

import data_utils

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.optim as optim
from torch.nn import Linear, GRU, Conv2d, Dropout, MaxPool2d, BatchNorm1d
from torch.nn.functional import relu, elu, relu6, sigmoid, tanh, softmax
from skimage import io
from torchvision.io import read_image
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms

# Check if CUDA device is available

In [58]:
use_cuda = torch.cuda.is_available()
print("Running GPU.") if use_cuda else print("No GPU available.")
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  

def get_variable(x):
    """ Converts tensors to cuda, if available. """
    if use_cuda:
        return x.cuda()
    return x


def get_numpy(x):
    """ Get numpy array for both cuda and not. """
    if use_cuda:
        return x.cpu().data.numpy()
    return x.data.numpy()

Running GPU.
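
In recent PyTorch versions the `Variable` wrapper and `.data` are no longer needed: tensors can be moved with `.to(device)` and converted with `.detach().cpu().numpy()`. The sketch below shows equivalent helpers; the names `to_device` and `to_numpy` are hypothetical and not used elsewhere in this notebook.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def to_device(x):
    """Move a tensor to the GPU if one is available, otherwise keep it on the CPU."""
    return x.to(device)

def to_numpy(x):
    """Detach from the autograd graph, move to the CPU and convert to a numpy array."""
    return x.detach().cpu().numpy()

t = to_device(torch.ones(2, 3, requires_grad=True))
arr = to_numpy(t * 2)
print(arr.shape, arr.sum())
```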


# Load data

In [59]:
import pickle
with open('pickles/data.pickle', 'rb') as f:
    data = pickle.load(f)

# Utils

In [60]:
class SelectItem(nn.Module):
    def __init__(self, item_index):
        super(SelectItem, self).__init__()
        self._name = 'selectitem'
        self.item_index = item_index

    def forward(self, inputs):
        return inputs[self.item_index]
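
`nn.GRU` returns a tuple `(output, h_n)`, which `nn.Sequential` cannot hand to a following layer as a tensor; `SelectItem(0)` keeps only `output`. A minimal sketch with dummy data follows. The tensor sizes are illustrative assumptions, and the class is repeated so that the block runs on its own.

```python
import torch
import torch.nn as nn

class SelectItem(nn.Module):
    """Pick one element out of a tuple returned by the previous layer."""
    def __init__(self, item_index):
        super().__init__()
        self.item_index = item_index

    def forward(self, inputs):
        return inputs[self.item_index]

gru = nn.GRU(input_size=64, hidden_size=128, num_layers=2)
x = torch.randn(10, 32, 64)  # (sequence length, batch, features)

output, h_n = gru(x)                           # the GRU itself returns two tensors
recurrent = nn.Sequential(gru, SelectItem(0))  # keep only the per-step outputs
selected = recurrent(x)
print(selected.shape, h_n.shape)
```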

In [61]:
def conv_size(img_size, kernel_size, stride, padding, channels):
    W = img_size
    K = kernel_size
    P = padding
    S = stride

    output_size = int((W-K+(2*P))/S)+1

    return (output_size, output_size, channels)

In [62]:
def pool_size(img_size, kernel_size, stride, channels):
    I = img_size
    F = kernel_size
    S = stride

    output_size = (I - F) // S + 1  # integer division, sizes are whole numbers
    
    return (output_size, output_size, channels)

In [63]:
print(conv_size(128, 3, 2, 1, 8))
print(pool_size(64, 2, 2, 8))

(64, 64, 8)
(32, 32, 8)
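
Both helpers implement the usual output-size formula, floor((W - K + 2P) / S) + 1, with P = 0 for pooling. One way to double-check them is to push a dummy tensor through real layers. The sketch below assumes the same 128x128 single-channel input and the same layer settings as the calls above; `out_size` is a hypothetical stand-in for the two helpers.

```python
import torch
import torch.nn as nn

def out_size(w, k, s, p=0):
    """Spatial output size of a convolution or pooling layer."""
    return (w - k + 2 * p) // s + 1

conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, stride=2, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.zeros(1, 1, 128, 128)  # (batch, channels, height, width)
after_conv = conv(x)
after_pool = pool(after_conv)

print(after_conv.shape, out_size(128, 3, 2, 1))
print(after_pool.shape, out_size(64, 2, 2))
```

Flattening the pooled tensor gives 32 * 32 * 8 = 8192 features per image, which is the input size a following fully connected layer has to match.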


# Parameters

In [64]:
IMAGE_SHAPE = (128, 128, 1)
NUM_CLASSES =  99
batch_size = 32
# For all three features types margin, shape, and texture, we have NUM_FEATURES for each type.
NUM_FEATURES = 64  # <-- Your answer here

# Simple model

## Network

In [69]:
rnn_input_size = 64 # must be the same as the x_shape channels
    
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        
        self.convolutional = nn.Sequential(
            nn.Conv2d(in_channels=channels,
                    out_channels=conv_out_channels,
                    kernel_size=kernel_size,
                    stride=conv_stride,
                    padding=conv_pad),
            nn.ReLU()
        )

        self.fc1 = nn.Sequential(
            nn.Linear(in_features=32768,
                    out_features=768,
                    bias=False)
        )

        self.fc2 = nn.Sequential(
            nn.Linear(in_features=128,
                    out_features=128,
                    bias=False)
        )

        # Exercise: Add a recurrent unit like an RNN or GRU
        # >> YOUR CODE HERE <<
        self.recurrent = nn.Sequential(
            nn.GRU(input_size=rnn_input_size, # The number of expected features in the input x
                    hidden_size=128, # The number of features in the hidden state h
                    num_layers=2), # Number of recurrent layers
            SelectItem(0)
        )

        self.l_out = nn.Sequential(
            nn.Linear(in_features=features_cat_size,
                        out_features=NUM_CLASSES,
                        bias=False)
        )
        
        
    def forward(self, x_img, x_margin, x_shape, x_texture):
        features = []
        out = {}

        # Change layer order in images
        x_img = x_img.permute(0, 3, 1, 2)
        
        ## Convolutional layer ##
        # - Change dimensions to fit the convolutional layer 
        # - Apply Conv2d
        # - Use an activation function
        # - Change dimensions s.t. the features can be used in the final FFNN output layer
        
        # >> YOUR CODE HERE <<
        ## 1st way, distort image
        # print("Convolutional...")
        # print(x_img.shape)
        x_img = self.convolutional(x_img)
        # print(x_img.shape)
        x_img = x_img.reshape(x_img.size(0), -1)
        # print(x_img.shape)
        features_img = self.fc1(x_img)
        # print(features_img.shape)

        # Append features to the list "features"
        features.append(features_img)
        
        ## Use concatenated leaf features for FFNN ##
        # print("\nFeed Forward...")
        # print(x_margin.shape, x_texture.shape)
        x = torch.cat((x_margin, x_texture), dim=1)  # if you want to use features as feature vectors
        # print(x.shape)
        x = self.fc2(x)
        features_vector = x
        # print(features_vector.shape)

        # Append features to the list "features"
        features.append(features_vector)
        
        ## Use concatenated leaf features for RNN ##
        # - Change dimensions to fit GRU
        # - Apply GRU
        # - Change dimensions s.t. the features can be used in the final FFNN output layer

        # >> YOUR CODE HERE <<
        # print("\nRecurrent...")
        # print(x_shape.shape)
        features_rnn = self.recurrent(x_shape)
        # print(features_rnn.shape)
        
        # Append features to the list "features"
        features.append(features_rnn)
        
        ## Output layer where all features are in use ##
        # print("\Features...")
        features_final = torch.cat(features, dim=1)
        # print(features_final.shape)
        
        # print("\nOutput...")
        out['out'] = self.l_out(features_final)
        # print(out['out'].shape) 
        # print(out)
        return out

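# Hyperparameters read by Net.__init__; without them Net() raises
# "NameError: name 'conv_out_channels' is not defined".
# The values are assumptions inferred from the rest of the notebook:
# fc1 expects 32768 = 64*64*8 inputs, which matches conv_size(128, 3, 2, 1, 8).
channels = IMAGE_SHAPE[2]
conv_out_channels = 8
kernel_size = 3
conv_stride = 2
conv_pad = 1
features_cat_size = 1024  # 768 (image) + 128 + 128 features before the output layer

net = Net()
if use_cuda:
    net.cuda()
print(net)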

## Test network

In [None]:
_img_shape = tuple([batch_size] + list(IMAGE_SHAPE))
_feature_shape = (batch_size, NUM_FEATURES)

def randnorm(size):
    return np.random.normal(0, 1, size).astype('float32')

# dummy data
_x_image = get_variable(Variable(torch.from_numpy(randnorm(_img_shape))))
_x_margin = get_variable(Variable(torch.from_numpy(randnorm(_feature_shape))))
_x_shape = get_variable(Variable(torch.from_numpy(randnorm(_feature_shape))))
_x_texture = get_variable(Variable(torch.from_numpy(randnorm(_feature_shape))))

# test the forward pass
print(f"Image shape: {_x_image.shape}")
print(f"Margin features: {_x_margin.shape}")
print(f"Texture features: {_x_texture.shape}")
print(f"Shape features: {_x_shape.shape}\n")
output = net(x_img=_x_image, x_margin=_x_margin, x_shape=_x_shape, x_texture=_x_texture)

## Train parameters

In [None]:
LEARNING_RATE = 0.001
num_epochs = 20
criterion = nn.CrossEntropyLoss()

# weight_decay is equivalent to L2 regularization; pass weight_decay=<value> to Adam to enable it
optimizer = optim.Adam(net.parameters(), lr=LEARNING_RATE)

def accuracy(ys, ts):
    predictions = torch.max(ys, 1)[1]
    correct_prediction = torch.eq(predictions, ts)
    return torch.mean(correct_prediction.float())
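
As a quick sanity check of `accuracy`, the sketch below applies it to a few hand-made logits. The numbers are made up for illustration, and the function is repeated so that the block runs on its own.

```python
import torch

def accuracy(ys, ts):
    predictions = torch.max(ys, 1)[1]           # index of the largest logit per row
    correct_prediction = torch.eq(predictions, ts)
    return torch.mean(correct_prediction.float())

# Three samples, three classes: rows 0 and 1 are predicted correctly, row 2 is not
logits = torch.tensor([[2.0, 1.0, 0.0],
                       [0.0, 3.0, 1.0],
                       [1.0, 0.0, 0.5]])
targets = torch.tensor([0, 1, 2])

acc = accuracy(logits, targets)
print(acc.item())
```

Note that `targets` holds class indices, not one-hot vectors, which is also the format `nn.CrossEntropyLoss` expects.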

## Train the network

In [None]:
# Setup settings for training 
VALIDATION_SIZE = 0.3 # 0.1 is ~ 100 samples for validation
max_iter = 1000
log_every = 100
eval_every = 100

# Function to get label
def get_labels(batch):
    return get_variable(Variable(torch.from_numpy(batch['ts']).long()))

# Function to get input
def get_input(batch):
    return {
        'x_img': get_variable(Variable(torch.from_numpy(batch['images']))),
        'x_margin': get_variable(Variable(torch.from_numpy(batch['margins']))),
        'x_shape': get_variable(Variable(torch.from_numpy(batch['shapes']))),
        'x_texture': get_variable(Variable(torch.from_numpy(batch['textures'])))
    }

# Initialize lists for training and validation
train_iter = []
train_loss, train_accs = [], []
valid_iter = []
valid_loss, valid_accs = [], []

# Generate batches
batch_gen = data_utils.batch_generator(data,
                                       batch_size=batch_size,
                                       num_classes=NUM_CLASSES,
                                       num_iterations=max_iter,
                                       seed=42,
                                       val_size=VALIDATION_SIZE)

# Train network
net.train()
for i, batch_train in enumerate(batch_gen.gen_train()):
    # print(_['x_img'].shape, _['x_margin'].shape, _['x_texture'].shape, _['x_shape'].shape)
    if i % eval_every == 0:
        
        # Do the validation (no gradients are needed here)
        net.eval()
        val_losses, val_accs, val_lengths = 0, 0, 0
        with torch.no_grad():
            for batch_valid, num in batch_gen.gen_valid():
                output = net(**get_input(batch_valid))
                labels_argmax = torch.max(get_labels(batch_valid), 1)[1]
                val_losses += criterion(output['out'], labels_argmax) * num
                val_accs += accuracy(output['out'], labels_argmax) * num
                val_lengths += num

        # Divide by the total accumulated batch sizes
        val_losses /= val_lengths
        val_accs /= val_lengths
        valid_loss.append(get_numpy(val_losses))
        valid_accs.append(get_numpy(val_accs))
        valid_iter.append(i)
        print("Valid, it: {} loss: {:.2f} accs: {:.2f}\n".format(i, valid_loss[-1], valid_accs[-1]))
        net.train()
    
    # Train network
    output = net(**get_input(batch_train))
    labels_argmax = torch.max(get_labels(batch_train), 1)[1]
    batch_loss = criterion(output['out'], labels_argmax)
    
    train_iter.append(i)
    train_loss.append(float(get_numpy(batch_loss)))
    train_accs.append(float(get_numpy(accuracy(output['out'], labels_argmax))))
    
    optimizer.zero_grad()
    batch_loss.backward()
    optimizer.step()
    
    # Log in a figure
    if i % log_every == 0:
        fig = plt.figure(figsize=(12,4))
        plt.subplot(1, 2, 1)
        plt.plot(train_iter, train_loss, label='train_loss')
        plt.plot(valid_iter, valid_loss, label='valid_loss')
        plt.legend()

        plt.subplot(1, 2, 2)
        plt.plot(train_iter, train_accs, label='train_accs')
        plt.plot(valid_iter, valid_accs, label='valid_accs')
        plt.legend()
        plt.show()
        # clear_output(wait=True)
        print("Train, it: {} loss: {:.2f} accs: {:.2f}".format(i, train_loss[-1], train_accs[-1]))
        
    if max_iter < i:
        break

# Advanced Model

## Data Loader

In [65]:
class LeafDataset(Dataset):
    """Leaf dataset."""

    def __init__(self, csv_file, root_dir, transform=None, train=False, test=False):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
            train (bool): True if the csv file has a species column (train.csv).
            test (bool): True if the csv file has no labels (test.csv).
        """
        self.leafs_df = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform
        self.train = train
        self.test = test
        if self.train:
            # map species names to integer class indices (alphabetical order)
            self.classes = sorted(self.leafs_df.iloc[:, 1].unique())
            self.class_to_idx = {name: i for i, name in enumerate(self.classes)}

    def __len__(self):
        return len(self.leafs_df)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        # parse the image
        leaf_id = self.leafs_df.iloc[idx, 0]
        img_path = os.path.join(self.root_dir, str(leaf_id) + '.jpg')
        image = io.imread(img_path)

        # pad every image to a square before resizing, so that the resize does not distort it
        image = data_utils.pad2square(image)  # Make the image square
        image = resize(image, output_shape=(128, 128), mode='reflect', anti_aliasing=True)  # resizes the image
        image = image.astype('float32')  # match the dtype of the network weights

        # augment the image if chosen to
        if self.transform:
            image = self.transform(image)

        # train.csv has a species column after the id; test.csv does not
        first = 2 if self.train else 1
        values = self.leafs_df.iloc[idx, first:].to_numpy(dtype='float32')
        X = {'image': image,
             'margins': values[:64],
             'shapes': values[64:128],
             'textures': values[128:]}

        # only the train set has a label
        if self.train:
            y = self.class_to_idx[self.leafs_df.iloc[idx, 1]]
            return X, y
        return X

In [None]:
leafs_frame = pd.read_csv('train.csv')

n = 65
row = leafs_frame.iloc[n, :]
id = leafs_frame.iloc[n, 0]
img_name = str(id) + '.jpg'
species = leafs_frame.iloc[n, 1]
margins = leafs_frame.iloc[n, 2:66]
shapes = leafs_frame.iloc[n, 66:130]
textures = leafs_frame.iloc[n, 130:]

print(f'Row shape: {row.shape}')
print(f'ID: {id}')
print(f'Species: {species}')
print(f'Margins\' shape: {margins.shape}')
print(f'Shapes\' shape: {shapes.shape}')
print(f'Textures\' shape: {textures.shape}')

## Load trainset and testset using the Dataloader

In [None]:
train_transform = transforms.Compose([transforms.ToTensor()])
test_transform = transforms.Compose([transforms.ToTensor()])
train_csv = 'train.csv'
test_csv = 'test.csv'
root_dir = 'images/'

batch_size = 32
trainset = LeafDataset(train_csv, root_dir, transform=train_transform, train=True)
train_loader = DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=0)
testset = LeafDataset(test_csv, root_dir, transform=test_transform, test=True)
test_loader = DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=0)

## Test dataloader

In [None]:
def show_leafs(species, image, margins, shapes, textures, id=None):
    """Show image and margin"""
    fig = plt.figure(figsize=(20,3))
    ax1 = fig.add_subplot(141)
    ax1.imshow(image)
    ax2 = fig.add_subplot(142)
    ax2.plot(margins)
    ax3 = fig.add_subplot(143)
    ax3.plot(shapes)
    ax4 = fig.add_subplot(144)
    ax4.plot(textures)
    plt.suptitle(f"id: {id}, species: {species}")
    plt.pause(0.001)  # pause a bit so that plots are updated
    plt.show()

# img_path = os.path.join('images/', img_name)
# img = io.imread(img_path)
# show_leafs(species, img, margins, textures, shapes, id)

In [None]:
leaf_dataset = LeafDataset(csv_file='train.csv',
                            root_dir='images/',
                            train=True)

for i in range(len(leaf_dataset)):
    X, y = leaf_dataset[i]
    print(i, len(X), X['image'].shape, X['margins'].shape, X['shapes'].shape, X['textures'].shape)

    print(f'sample #{i}')
    show_leafs(y, **X)

    if i == 1:
        break

## Network

In [None]:
# The image shape should be of the format (height, width, channels)
IMAGE_SHAPE = (128, 128, 1)   # <-- Your answer here
NUM_CLASSES =  99  # <-- Your answer here 

# For all three features types margin, shape, and texture, we have NUM_FEATURES for each type.
NUM_FEATURES = 64  # <-- Your answer here

In [None]:
height, width, channels = IMAGE_SHAPE
batch_size = 32

# Keep track of features to output layer
features_cat_size = 1024 # <-- Number of features concatenated before output layer

rnn_input_size = 64 # must be the same as the x_shape channels

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        
        self.convolutional = nn.Sequential(
            nn.Conv2d(in_channels = 1,
                    out_channels = 8,
                    kernel_size = (3, 3),
                    stride = 2,
                    padding = 1),
            nn.BatchNorm2d(8),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 2, 
                         stride = 2),
            nn.Dropout(0.2)
        )

        self.fc1 = nn.Sequential(
            nn.Linear(in_features=32*32*8,
                    out_features=768,
                    bias=False),
            nn.Dropout(0.5)
        )

        self.fc2 = nn.Sequential(
            nn.Linear(in_features=128,
                    out_features=128,
                    bias=False),
            nn.Dropout(0.5)
        )

        # Exercise: Add a recurrent unit like an RNN or GRU
        # >> YOUR CODE HERE <<
        self.recurrent = nn.Sequential(
            nn.GRU(input_size=rnn_input_size, # The number of expected features in the input x
                    hidden_size=128, # The number of features in the hidden state h
                    num_layers=2), # Number of recurrent layers
            SelectItem(0)
        )

        self.l_out = nn.Sequential(
            nn.Linear(in_features=features_cat_size,
                        out_features=NUM_CLASSES,
                        bias=False)
        )
        
        
    def forward(self, x_img, x_margin, x_shape, x_texture):
        features = []
        out = {}

        # Change layer order in images
        x_img = x_img.permute(0, 3, 1, 2)
        
        ## Convolutional layer ##
        # - Change dimensions to fit the convolutional layer 
        # - Apply Conv2d
        # - Use an activation function
        # - Change dimensions s.t. the features can be used in the final FFNN output layer
        
        # >> YOUR CODE HERE <<
        ## 1st way, distort image
        # print("Convolutional...")
        # print(x_img.shape)
        x_img = self.convolutional(x_img)
        # print(x_img.shape)
        x_img = x_img.reshape(x_img.size(0), -1)
        # print(x_img.shape)
        features_img = self.fc1(x_img)
        # print(features_img.shape)

        # Append features to the list "features"
        features.append(features_img)
        
        ## Use concatenated leaf features for FFNN ##
        # print("\nFeed Forward...")
        # print(x_margin.shape, x_texture.shape)
        x = torch.cat((x_margin, x_texture), dim=1)  # if you want to use features as feature vectors
        # print(x.shape)
        x = self.fc2(x)
        features_vector = x
        # print(features_vector.shape)

        # Append features to the list "features"
        features.append(features_vector)
        
        ## Use concatenated leaf features for RNN ##
        # - Change dimensions to fit GRU
        # - Apply GRU
        # - Change dimensions s.t. the features can be used in the final FFNN output layer

        # >> YOUR CODE HERE <<
        # print("\nRecurrent...")
        # print(x_shape.shape)
        features_rnn = self.recurrent(x_shape)
        # print(features_rnn.shape)
        
        # Append features to the list "features"
        features.append(features_rnn)
        
        ## Output layer where all features are in use ##
        # print("\Features...")
        features_final = torch.cat(features, dim=1)
        # print(features_final.shape)
        
        # print("\nOutput...")
        out['out'] = self.l_out(features_final)
        # print(out['out'].shape) 
        # print(out)
        return out

net = Net()
if use_cuda:
    net.cuda()
print(net)

## Test network

In [None]:
_img_shape = tuple([batch_size] + list(IMAGE_SHAPE))
_feature_shape = (batch_size, NUM_FEATURES)

def randnorm(size):
    return np.random.normal(0, 1, size).astype('float32')

# dummy data
_x_image = get_variable(Variable(torch.from_numpy(randnorm(_img_shape))))
_x_margin = get_variable(Variable(torch.from_numpy(randnorm(_feature_shape))))
_x_shape = get_variable(Variable(torch.from_numpy(randnorm(_feature_shape))))
_x_texture = get_variable(Variable(torch.from_numpy(randnorm(_feature_shape))))

# test the forward pass
print(f"Image shape: {_x_image.shape}")
print(f"Margin features: {_x_margin.shape}")
print(f"Texture features: {_x_texture.shape}")
print(f"Shape features: {_x_shape.shape}\n")
output = net(x_img=_x_image, x_margin=_x_margin, x_shape=_x_shape, x_texture=_x_texture)
print(f"Output: {output}")

## Train function

In [None]:

def train(model, opt, loss_fn, epochs, train_loader, val_loader):
    """Train a classifier and evaluate it after every epoch.

    Both loaders must yield (X, y) batches, where X is a dict with the keys
    'image', 'margins', 'shapes' and 'textures', and y holds integer class indices.
    """
    def forward(X):
        # ToTensor gives images as (batch, channels, height, width),
        # while Net.forward expects channels last and permutes them itself
        return model(x_img=X['image'].float().permute(0, 2, 3, 1).to(device),
                     x_margin=X['margins'].float().to(device),
                     x_shape=X['shapes'].float().to(device),
                     x_texture=X['textures'].float().to(device))['out']

    def run_epoch(loader, training):
        model.train(training)  # train mode or eval mode
        total_loss, correct, seen = 0.0, 0, 0
        with torch.set_grad_enabled(training):
            for X, y in loader:
                y = y.to(device)
                outputs = forward(X)
                loss = loss_fn(outputs, y)
                if training:
                    opt.zero_grad()   # set parameter gradients to zero
                    loss.backward()   # back-propagation
                    opt.step()        # weight update
                # accumulate metrics weighted by the batch size
                total_loss += loss.item() * y.size(0)
                correct += (outputs.argmax(dim=1) == y).sum().item()
                seen += y.size(0)
        return correct / seen, total_loss / seen

    epoch_results = []
    for epoch in range(epochs):
        clear_output(wait=True)
        print('* Epoch %d/%d' % (epoch+1, epochs))

        train_accuracy, train_loss = run_epoch(train_loader, training=True)
        print(' - train accuracy: %f' % train_accuracy)
        print(' - train loss: %f' % train_loss)

        val_accuracy, val_loss = run_epoch(val_loader, training=False)
        print(' - val loss: %f' % val_loss)
        print(' - val accuracy: %f' % val_accuracy)
        print("\n")

        epoch_results.append([train_accuracy, train_loss, val_accuracy, val_loss])

    return epoch_results

## Train parameters

In [None]:
LEARNING_RATE = 0.001
num_epochs = 20
criterion = nn.CrossEntropyLoss()

# weight_decay is equivalent to L2 regularization; pass weight_decay=<value> to Adam to enable it
optimizer = optim.Adam(net.parameters(), lr=LEARNING_RATE)

def accuracy(ys, ts):
    predictions = torch.max(ys, 1)[1]
    correct_prediction = torch.eq(predictions, ts)
    return torch.mean(correct_prediction.float())

## Train the network

In [None]:
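# test.csv has no labels, so validate on a held-out part of the training set instead
# (the 10% split is an arbitrary choice)
n_valid = int(0.1 * len(trainset))
train_part, valid_part = torch.utils.data.random_split(trainset, [len(trainset) - n_valid, n_valid])
train_loader = DataLoader(train_part, batch_size=batch_size, shuffle=True, num_workers=0)
valid_loader = DataLoader(valid_part, batch_size=batch_size, shuffle=False, num_workers=0)
results_11 = train(net, optimizer, criterion, num_epochs, train_loader, valid_loader)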