# HaMLeT

## Session 6: Backpropagation
by Leon Weninger and Raphael Kolk

### Goal of this Session

In this session you will, step by step, implement a the backpropagation algorithm yourself without using any deep learning libraries. You should already be familiar with Python as well as NumPy (a package for scientific computing with Python).

### Given code

**Task 0:** Familiarize yourself briefly with the given code. Pay particural attention to the `Layer` and `Cost` classes, from which you will derive the classes you implement, and the `Sigmoid` layer which is predefined as an example. You'll also need to execute the cells in this section once.

The following code loads the data and trains the network in a similar fashion as in the last session.

In [94]:
import numpy as np
from tqdm import tqdm
from load_mnist import MNIST


def vectorize(j):
    label_vector = np.zeros((1, 10))
    label_vector[0, int(j)] = 1.0
    return label_vector


def load_data():
    mnist = MNIST()
    images, labels = mnist.data, mnist.target

    image_size = images.shape[1]
    label_size = labels.shape[1]

    random_permutation = np.random.permutation(images.shape[0])
    images = images[random_permutation, :]
    labels = labels[random_permutation, :]
    
    images = (images - np.mean(images))/np.std(images)

    return images, labels, image_size, label_size


def train(net, cost_function, number_epochs, batch_size, learning_rate):
    images, labels, image_size, label_size = load_data()
    training_images, validation_images = images[:50000], images[50000:]
    training_labels, validation_labels = labels[:50000], labels[50000:]

    for e in range(number_epochs):
        cost = train_epoch(e, net, training_images, training_labels, cost_function, batch_size, learning_rate)
        accuracy = validate_epoch(e, net, validation_images, validation_labels, batch_size)
        print('cost=%5.6f, accuracy=%2.6f' % (cost, accuracy), flush=True)


def train_epoch(e, net, images, labels, cost_function, batch_size, learning_rate):
    epoch_cost = 0
    for i in tqdm(range(0, len(images), batch_size), ascii=False, desc='training,   e=%i' % e):
        
        batch_images = images[i:min(i + batch_size, len(images)), :]
        batch_labels = labels[i:min(i + batch_size, len(labels)), :]
        
        if(batch_images.shape[0] != batch_size):
            break

        
        # zero the gradients
        net.zero_gradients()

        # forward pass
        prediction = net.forward(batch_images)
        cost = cost_function.estimate(batch_labels, prediction)

        # backward pass
        dprediction = cost_function.gradient(cost)
        net.backward(dprediction)

        # update the parameters using the computed gradients via stochastic gradient descent.
        net.update_parameters(learning_rate)

        epoch_cost += np.mean(cost)
        
    return epoch_cost


def validate_epoch(e, net, images, labels, batch_size):
    n_correct = 0
    n_total = 0

    for i in tqdm(range(0, len(images), batch_size), ascii=False, desc='validation, e=%i' % e):
        batch_images = images[i:min(i + batch_size, len(images)), :]
        batch_labels = labels[i:min(i + batch_size, len(labels)), :]
        
        if(batch_images.shape[0] != batch_size):
            break
        # compute predicted probabilities.
        predictions = net.forward(batch_images)

        # find the most probable class label.
        n_correct += sum(np.argmax(batch_labels, axis=1) == np.argmax(predictions, axis=1))
        n_total += batch_labels.shape[0]

    return n_correct / n_total


Remember the sigmoid function and its derivative you implemented in the previous session.

In [83]:
def sigmoid_function(var):
    return 1.0 / (1.0 + np.exp(-var))


def sigmoid_derivative(z):
    return sigmoid_function(z) * (1 - sigmoid_function(z))

The following abstract classes should serve as parent classes for all the different layers and cost functions which you will implement. 

In [84]:
class Layer:
    def __init__(self):
        # Initialize all member variables of the layer.
        pass

    def forward(self, x_in):
        # Implemets for forward pass of the layer and returns x_out.
        pass

    def backward(self, d_out):
        # Implements the backward pass of the layer and returns d_in.
        pass

    def zero_gradients(self):
        # Sets all gradients of the layer to zero.
        pass

    def update_parameters(self, learning_rate):
        # Update the parameters of the layer with the help of the gradients stored during the backward pass.
        pass


class Cost:
    def __init__(self):
        # Initialize all member variables of the cost function.
        pass

    def estimate(self, target, prediction):
        # Estimates and return the cost with respect to the predicted label and a target label previously set by set_target().
        pass

    def gradient(self, cost):
        # Calculates and returns the gradient with respect to the cost.
        pass

The following class derived from the Layer class implements the forward and backward pass of the sigmoid activation function alread known from the previous session and serves as an example for you. Since it does not have learnable parameters, no `update_parameters` or `zero_gradients` function needs to be implemented.

In [85]:
class Sigmoid(Layer):
    def __init__(self):
        self.x_in = None
    
    def forward(self, x_in):
        self.x_in = x_in
        x_out = sigmoid_function(x_in)
        return x_out
    
    def backward(self, d_out):
        d_in = d_out * sigmoid_derivative(self.x_in)
        return d_in
    
    def zero_gradients(self):
        pass
    
    def update_parameters(self, learning_rate):
        pass

### Theoretical Foundation 

**Task 1a:** Take a peace of paper and a pencil, use your knowledge from the preparation material and the introduction slides and fill in the gaps in the preparation material. Having written the formulas down, please check with the tutor if they are correct.

Before starting to work on the code, think about batched forward- and backward passing. We keep to the convention of having the batch as the first dimension of all tensors. This is consistent with modern Deep Learning Frameworks, as you will get to know in the next session. However, this convention may change when tensors needs to be transposed when performing multiplications and additions.

In [86]:
class Linear(Layer):
    def __init__(self, n_in, n_out, initial_sigma=0.1):
        self.n_in = n_in
        self.n_out = n_out
        self.w = initial_sigma * np.random.randn(n_out, n_in)
        self.b = np.zeros((1, n_out))
        self.zero_gradients()
        self.x_in = None

    def forward(self, x_in):
        self.x_in = x_in
        x_out = np.matmul(x_in, self.w.T) + self.b #np.matmul(x_in.T, self.w ) + self.b
        return x_out

    def backward(self, d_out):
        self.d_in = np.dot(d_out,self.w )
        self.db = d_out
        self.dw = np.dot(self.x_in.T, d_out).T
        return self.d_in

    def zero_gradients(self):
        self.dw = np.zeros((self.n_out, self.n_in))
        self.db = np.zeros((1, self.n_out))
        self.dx = np.empty((0, self.n_in))

    def update_parameters(self, learning_rate):
        self.w = self.w - learning_rate * self.dw
        self.b = self.b - learning_rate * self.db


class MeanSquareError(Cost):
    def __init__(self):
        self.prediction = None
        self.target = None

    def estimate(self, target, prediction):
        self.prediction = prediction
        self.target = target
        cost = 0.5 * np.sum( (target-prediction) ** 2) 
        return cost

    def gradient(self, cost):
        gradient = self.prediction - self.target
        return gradient



    
class Network(Layer):
    def __init__(self, layers):
        self.layers = layers

    #Implement the Network class which can encapsulate multiple layers. It offers the same interface as a layer and is therefore derived
    #from the Layer parent as well. Make sure to implement all member functions needed. 
    #The forward function propagates a given input through all encapsulated layers and returns the final prediction of the network, 
    #whereas the backward function propagates a given gradient through all layers in reversed order. 
    #zero_gradients and update_parameters invoke the respective functions of the encapsulated layers.
    
    def forward(self,x):
        for layer in self.layers:
            x = layer.forward(x)
        return x
    
    def backward(self, d):
        for layer in reversed(self.layers):
            d = layer.backward(d)
        return d
    
    def zero_gradients(self):
        for layer in self.layers:
            layer.zero_gradients()
    
    def update_parameters(self, learning_rate):
        for layer in self.layers:
            layer.update_parameters(learning_rate)

In [103]:
# define hyperparameters
input_size = 28**2
label_size = 10
batch_size = 600
learning_rate = 0.0001
number_epochs = 100

random_input = np.random.rand(batch_size, input_size)
random_label = np.random.rand(batch_size, label_size)

linear_layer = Linear(input_size, label_size)
sigmoid_layer = Sigmoid()
cost_function = MeanSquareError()

# Test your implementation by propagating random input through a linear layer followed by a sigmoid layer, estimating the mean square error to a random target, 
# calculating the gradient and propagating it back through the sigmoid and linear layer. Afterwards update the parameters of the linear layer using the function you implemented.
z = linear_layer.forward(random_input)
a = sigmoid_layer.forward(z)
cost = cost_function.estimate(random_label, a)
gradient = cost_function.gradient(cost)
d_out = sigmoid_layer.backward(gradient)
d_in = linear_layer.backward(d_out)
linear_layer.update_parameters(learning_rate)

net = Network([linear_layer, sigmoid_layer])
out = net.forward(x=random_input)
cost = cost_function.estimate(random_label, out)
gradient = cost_function.gradient(cost)
d_in = net.backward(d=gradient)
net.update_parameters(learning_rate)

print("Task 4 output")
train(net, cost_function, number_epochs, batch_size, learning_rate)

print("Task 5 output")
batch_size = 600
learning_rate = 0.0001
number_epochs = 150
linear_layer1 = Linear(input_size, 300)
sigmoid_layer1 = Sigmoid()
linear_layer4 = Linear(300, label_size)
sigmoid_layer4 = Sigmoid()

cost_function = MeanSquareError()
net2 = Network([linear_layer1, sigmoid_layer1, linear_layer4, sigmoid_layer4])
train(net2, cost_function, number_epochs, batch_size, learning_rate)

training,   e=0:  93%|█████████▎| 78/84 [00:01<00:00, 48.17it/s]
validation, e=0:  91%|█████████ | 31/34 [00:00<00:00, 144.37it/s]

cost=22556.975420, accuracy=0.455707



training,   e=1:  99%|█████████▉| 83/84 [00:01<00:00, 50.41it/s]
validation, e=1:  88%|████████▊ | 30/34 [00:00<00:00, 135.32it/s]

cost=17075.206707, accuracy=0.636566



training,   e=2:  93%|█████████▎| 78/84 [00:01<00:00, 51.19it/s]
validation, e=2:  94%|█████████▍| 32/34 [00:00<00:00, 151.05it/s]

cost=14257.340296, accuracy=0.722475



training,   e=3:  93%|█████████▎| 78/84 [00:01<00:00, 51.19it/s]
validation, e=3:  94%|█████████▍| 32/34 [00:00<00:00, 151.79it/s]

cost=12352.928531, accuracy=0.770909



training,   e=4:  93%|█████████▎| 78/84 [00:01<00:00, 46.67it/s]
validation, e=4:  94%|█████████▍| 32/34 [00:00<00:00, 152.38it/s]

cost=11062.304811, accuracy=0.799545



training,   e=5:  93%|█████████▎| 78/84 [00:01<00:00, 47.24it/s]
validation, e=5:  94%|█████████▍| 32/34 [00:00<00:00, 152.09it/s]

cost=10136.878113, accuracy=0.815253



training,   e=6:  93%|█████████▎| 78/84 [00:01<00:00, 51.90it/s]
validation, e=6:  85%|████████▌ | 29/34 [00:00<00:00, 79.28it/s]

cost=9437.876286, accuracy=0.828333



training,   e=7:  96%|█████████▋| 81/84 [00:01<00:00, 51.97it/s]
validation, e=7:  91%|█████████ | 31/34 [00:00<00:00, 139.22it/s]

cost=8887.994385, accuracy=0.838081



training,   e=8:  93%|█████████▎| 78/84 [00:01<00:00, 49.08it/s]
validation, e=8:  53%|█████▎    | 18/34 [00:00<00:00, 60.50it/s]

cost=8441.634464, accuracy=0.844899



training,   e=9:  98%|█████████▊| 82/84 [00:01<00:00, 49.59it/s]
validation, e=9:  94%|█████████▍| 32/34 [00:00<00:00, 147.69it/s]

cost=8070.351842, accuracy=0.850303



training,   e=10:  93%|█████████▎| 78/84 [00:01<00:00, 52.39it/s]
validation, e=10:  94%|█████████▍| 32/34 [00:00<00:00, 150.92it/s]

cost=7755.483377, accuracy=0.854899



training,   e=11:  93%|█████████▎| 78/84 [00:01<00:00, 51.57it/s]
validation, e=11:  94%|█████████▍| 32/34 [00:00<00:00, 149.41it/s]

cost=7484.237672, accuracy=0.858990



training,   e=12:  94%|█████████▍| 79/84 [00:01<00:00, 45.91it/s]
validation, e=12:  91%|█████████ | 31/34 [00:00<00:00, 140.73it/s]

cost=7247.520195, accuracy=0.862172



training,   e=13:  94%|█████████▍| 79/84 [00:01<00:00, 42.69it/s]
validation, e=13:  94%|█████████▍| 32/34 [00:00<00:00, 99.34it/s]

cost=7038.664050, accuracy=0.865202



training,   e=14:  98%|█████████▊| 82/84 [00:01<00:00, 51.51it/s]
validation, e=14:  94%|█████████▍| 32/34 [00:00<00:00, 153.22it/s]

cost=6852.655344, accuracy=0.867475



training,   e=15:  94%|█████████▍| 79/84 [00:01<00:00, 46.86it/s]
validation, e=15:  94%|█████████▍| 32/34 [00:00<00:00, 152.83it/s]

cost=6685.641225, accuracy=0.870253



training,   e=16:  99%|█████████▉| 83/84 [00:01<00:00, 43.17it/s]
validation, e=16:  91%|█████████ | 31/34 [00:00<00:00, 142.33it/s]

cost=6534.606345, accuracy=0.872475



training,   e=17:  99%|█████████▉| 83/84 [00:01<00:00, 45.84it/s]
validation, e=17:  94%|█████████▍| 32/34 [00:00<00:00, 149.74it/s]

cost=6397.153492, accuracy=0.874444



training,   e=18:  99%|█████████▉| 83/84 [00:01<00:00, 49.99it/s]
validation, e=18:  88%|████████▊ | 30/34 [00:00<00:00, 134.52it/s]

cost=6271.350805, accuracy=0.876212



training,   e=19:  99%|█████████▉| 83/84 [00:01<00:00, 53.61it/s]
validation, e=19:  68%|██████▊   | 23/34 [00:00<00:00, 81.59it/s]

cost=6155.622808, accuracy=0.877879



training,   e=20:  93%|█████████▎| 78/84 [00:01<00:00, 48.93it/s]
validation, e=20:  91%|█████████ | 31/34 [00:00<00:00, 141.30it/s]

cost=6048.671075, accuracy=0.879646



training,   e=21:  93%|█████████▎| 78/84 [00:01<00:00, 52.19it/s]
validation, e=21:  94%|█████████▍| 32/34 [00:00<00:00, 152.51it/s]

cost=5949.415398, accuracy=0.881212



training,   e=22:  93%|█████████▎| 78/84 [00:01<00:00, 51.94it/s]
validation, e=22:  94%|█████████▍| 32/34 [00:00<00:00, 153.53it/s]

cost=5856.949465, accuracy=0.882525



training,   e=23:  93%|█████████▎| 78/84 [00:01<00:00, 51.72it/s]
validation, e=23:  71%|███████   | 24/34 [00:00<00:00, 89.05it/s]

cost=5770.506982, accuracy=0.884141



training,   e=24:  93%|█████████▎| 78/84 [00:01<00:00, 51.48it/s]
validation, e=24:  59%|█████▉    | 20/34 [00:00<00:00, 69.33it/s]

cost=5689.435462, accuracy=0.886061



training,   e=25:  98%|█████████▊| 82/84 [00:01<00:00, 50.78it/s]
validation, e=25:  94%|█████████▍| 32/34 [00:00<00:00, 148.22it/s]

cost=5613.175706, accuracy=0.887424



training,   e=26:  93%|█████████▎| 78/84 [00:01<00:00, 52.12it/s]
validation, e=26:  94%|█████████▍| 32/34 [00:00<00:00, 153.65it/s]

cost=5541.245591, accuracy=0.888535



training,   e=27:  93%|█████████▎| 78/84 [00:01<00:00, 50.75it/s]
validation, e=27:  94%|█████████▍| 32/34 [00:00<00:00, 153.52it/s]

cost=5473.227123, accuracy=0.889444



training,   e=28:  98%|█████████▊| 82/84 [00:01<00:00, 52.92it/s]
validation, e=28:  94%|█████████▍| 32/34 [00:00<00:00, 151.59it/s]

cost=5408.756029, accuracy=0.890505



training,   e=29:  93%|█████████▎| 78/84 [00:01<00:00, 49.18it/s]
validation, e=29:  53%|█████▎    | 18/34 [00:00<00:00, 59.75it/s]

cost=5347.513318, accuracy=0.891465



training,   e=30:  93%|█████████▎| 78/84 [00:01<00:00, 51.70it/s]
validation, e=30:  94%|█████████▍| 32/34 [00:00<00:00, 149.54it/s]

cost=5289.218382, accuracy=0.892222



training,   e=31:  93%|█████████▎| 78/84 [00:01<00:00, 52.32it/s]
validation, e=31:  94%|█████████▍| 32/34 [00:00<00:00, 152.76it/s]

cost=5233.623333, accuracy=0.892879



training,   e=32:  93%|█████████▎| 78/84 [00:01<00:00, 51.75it/s]
validation, e=32:  94%|█████████▍| 32/34 [00:00<00:00, 147.33it/s]

cost=5180.508313, accuracy=0.893535



training,   e=33:  93%|█████████▎| 78/84 [00:01<00:00, 51.09it/s]
validation, e=33:  94%|█████████▍| 32/34 [00:00<00:00, 152.42it/s]

cost=5129.677596, accuracy=0.894545



training,   e=34:  93%|█████████▎| 78/84 [00:01<00:00, 51.08it/s]
validation, e=34:  94%|█████████▍| 32/34 [00:00<00:00, 153.20it/s]

cost=5080.956321, accuracy=0.895253



training,   e=35:  93%|█████████▎| 78/84 [00:01<00:00, 51.02it/s]
validation, e=35:  94%|█████████▍| 32/34 [00:00<00:00, 151.09it/s]

cost=5034.187750, accuracy=0.895960



training,   e=36:  99%|█████████▉| 83/84 [00:01<00:00, 51.42it/s]
validation, e=36:  94%|█████████▍| 32/34 [00:00<00:00, 152.46it/s]

cost=4989.230950, accuracy=0.896667



training,   e=37:  93%|█████████▎| 78/84 [00:01<00:00, 52.46it/s]
validation, e=37:  85%|████████▌ | 29/34 [00:00<00:00, 118.40it/s]

cost=4945.958819, accuracy=0.897323



training,   e=38:  93%|█████████▎| 78/84 [00:01<00:00, 52.22it/s]
validation, e=38:  94%|█████████▍| 32/34 [00:00<00:00, 150.75it/s]

cost=4904.256409, accuracy=0.897980



training,   e=39:  93%|█████████▎| 78/84 [00:01<00:00, 51.64it/s]
validation, e=39:  94%|█████████▍| 32/34 [00:00<00:00, 152.28it/s]

cost=4864.019488, accuracy=0.898232



training,   e=40:  93%|█████████▎| 78/84 [00:01<00:00, 51.92it/s]
validation, e=40:  94%|█████████▍| 32/34 [00:00<00:00, 152.87it/s]

cost=4825.153301, accuracy=0.899091



training,   e=41:  93%|█████████▎| 78/84 [00:01<00:00, 49.91it/s]
validation, e=41:  97%|█████████▋| 33/34 [00:00<00:00, 159.04it/s]

cost=4787.571499, accuracy=0.899444



training,   e=42:  93%|█████████▎| 78/84 [00:01<00:00, 51.71it/s]
validation, e=42:  94%|█████████▍| 32/34 [00:00<00:00, 151.61it/s]

cost=4751.195219, accuracy=0.900051



training,   e=43:  93%|█████████▎| 78/84 [00:01<00:00, 52.08it/s]
validation, e=43:  94%|█████████▍| 32/34 [00:00<00:00, 153.26it/s]

cost=4715.952272, accuracy=0.900758



training,   e=44:  93%|█████████▎| 78/84 [00:01<00:00, 52.60it/s]
validation, e=44:  94%|█████████▍| 32/34 [00:00<00:00, 151.77it/s]

cost=4681.776447, accuracy=0.901111



training,   e=45:  98%|█████████▊| 82/84 [00:01<00:00, 49.82it/s]
validation, e=45:  94%|█████████▍| 32/34 [00:00<00:00, 153.51it/s]

cost=4648.606891, accuracy=0.901313



training,   e=46:  99%|█████████▉| 83/84 [00:01<00:00, 54.15it/s]
validation, e=46:  94%|█████████▍| 32/34 [00:00<00:00, 152.45it/s]

cost=4616.387570, accuracy=0.901869



training,   e=47:  93%|█████████▎| 78/84 [00:01<00:00, 51.16it/s]
validation, e=47:  94%|█████████▍| 32/34 [00:00<00:00, 153.43it/s]

cost=4585.066794, accuracy=0.902273



training,   e=48:  96%|█████████▋| 81/84 [00:01<00:00, 52.18it/s]
validation, e=48:  94%|█████████▍| 32/34 [00:00<00:00, 152.49it/s]

cost=4554.596800, accuracy=0.902475



training,   e=49:  93%|█████████▎| 78/84 [00:01<00:00, 50.01it/s]
validation, e=49:  62%|██████▏   | 21/34 [00:00<00:00, 66.94it/s] 

cost=4524.933375, accuracy=0.903182



training,   e=50:  99%|█████████▉| 83/84 [00:01<00:00, 52.53it/s]
validation, e=50:  97%|█████████▋| 33/34 [00:00<00:00, 104.50it/s]

cost=4496.035529, accuracy=0.903586



training,   e=51:  94%|█████████▍| 79/84 [00:01<00:00, 44.67it/s]
validation, e=51:  94%|█████████▍| 32/34 [00:00<00:00, 151.19it/s]

cost=4467.865199, accuracy=0.903687



training,   e=52:  93%|█████████▎| 78/84 [00:01<00:00, 49.32it/s]
validation, e=52:  94%|█████████▍| 32/34 [00:00<00:00, 150.03it/s]

cost=4440.386991, accuracy=0.904242



training,   e=53:  98%|█████████▊| 82/84 [00:01<00:00, 52.30it/s]
validation, e=53:  94%|█████████▍| 32/34 [00:00<00:00, 149.81it/s]

cost=4413.567944, accuracy=0.904697



training,   e=54:  94%|█████████▍| 79/84 [00:01<00:00, 49.11it/s]
validation, e=54:  94%|█████████▍| 32/34 [00:00<00:00, 150.06it/s]

cost=4387.377317, accuracy=0.905051



training,   e=55:  93%|█████████▎| 78/84 [00:01<00:00, 50.66it/s]
validation, e=55:  91%|█████████ | 31/34 [00:00<00:00, 142.61it/s]

cost=4361.786405, accuracy=0.905404



training,   e=56:  93%|█████████▎| 78/84 [00:01<00:00, 50.50it/s]
validation, e=56:  94%|█████████▍| 32/34 [00:00<00:00, 150.54it/s]

cost=4336.768370, accuracy=0.906010



training,   e=57:  99%|█████████▉| 83/84 [00:01<00:00, 52.19it/s]
validation, e=57:  94%|█████████▍| 32/34 [00:00<00:00, 148.30it/s]

cost=4312.298085, accuracy=0.906717



training,   e=58:  99%|█████████▉| 83/84 [00:01<00:00, 52.13it/s]
validation, e=58:  91%|█████████ | 31/34 [00:00<00:00, 141.19it/s]

cost=4288.351996, accuracy=0.907172



training,   e=59:  99%|█████████▉| 83/84 [00:01<00:00, 53.88it/s]
validation, e=59:  94%|█████████▍| 32/34 [00:00<00:00, 150.05it/s]

cost=4264.908004, accuracy=0.907727



training,   e=60:  93%|█████████▎| 78/84 [00:01<00:00, 50.58it/s]
validation, e=60:  68%|██████▊   | 23/34 [00:00<00:00, 80.04it/s]

cost=4241.945344, accuracy=0.908131



training,   e=61:  99%|█████████▉| 83/84 [00:01<00:00, 49.77it/s]
validation, e=61:  94%|█████████▍| 32/34 [00:00<00:00, 150.56it/s]

cost=4219.444488, accuracy=0.908333



training,   e=62:  93%|█████████▎| 78/84 [00:01<00:00, 50.43it/s]
validation, e=62:  94%|█████████▍| 32/34 [00:00<00:00, 149.75it/s]

cost=4197.387048, accuracy=0.908687



training,   e=63:  98%|█████████▊| 82/84 [00:01<00:00, 49.36it/s]
validation, e=63:  68%|██████▊   | 23/34 [00:00<00:00, 82.61it/s] 

cost=4175.755696, accuracy=0.909495



training,   e=64:  96%|█████████▋| 81/84 [00:01<00:00, 46.07it/s]
validation, e=64:  94%|█████████▍| 32/34 [00:00<00:00, 152.14it/s]

cost=4154.534083, accuracy=0.909697



training,   e=65:  99%|█████████▉| 83/84 [00:01<00:00, 53.78it/s]
validation, e=65:  94%|█████████▍| 32/34 [00:00<00:00, 151.02it/s]

cost=4133.706770, accuracy=0.910152



training,   e=66:  99%|█████████▉| 83/84 [00:01<00:00, 50.80it/s]
validation, e=66:  94%|█████████▍| 32/34 [00:00<00:00, 146.97it/s]

cost=4113.259162, accuracy=0.910859



training,   e=67:  93%|█████████▎| 78/84 [00:01<00:00, 50.48it/s]
validation, e=67:  94%|█████████▍| 32/34 [00:00<00:00, 151.76it/s]

cost=4093.177453, accuracy=0.911212



training,   e=68:  99%|█████████▉| 83/84 [00:01<00:00, 52.16it/s]
validation, e=68:  94%|█████████▍| 32/34 [00:00<00:00, 150.86it/s]

cost=4073.448567, accuracy=0.911667



training,   e=69:  96%|█████████▋| 81/84 [00:01<00:00, 44.26it/s]
validation, e=69:  88%|████████▊ | 30/34 [00:00<00:00, 131.64it/s]

cost=4054.060113, accuracy=0.912020



training,   e=70:  94%|█████████▍| 79/84 [00:01<00:00, 41.44it/s]
validation, e=70:  71%|███████   | 24/34 [00:00<00:00, 90.58it/s] 

cost=4035.000334, accuracy=0.912172



training,   e=71:  96%|█████████▋| 81/84 [00:01<00:00, 47.96it/s]
validation, e=71:  94%|█████████▍| 32/34 [00:00<00:00, 150.71it/s]

cost=4016.258071, accuracy=0.912576



training,   e=72:  93%|█████████▎| 78/84 [00:01<00:00, 48.47it/s]
validation, e=72:  94%|█████████▍| 32/34 [00:00<00:00, 146.89it/s]

cost=3997.822721, accuracy=0.912980



training,   e=73:  95%|█████████▌| 80/84 [00:01<00:00, 45.65it/s]
validation, e=73:  94%|█████████▍| 32/34 [00:00<00:00, 152.17it/s]

cost=3979.684200, accuracy=0.913182



training,   e=74:  98%|█████████▊| 82/84 [00:01<00:00, 47.45it/s]
validation, e=74:  94%|█████████▍| 32/34 [00:00<00:00, 149.61it/s]

cost=3961.832914, accuracy=0.913434



training,   e=75:  93%|█████████▎| 78/84 [00:01<00:00, 48.90it/s]
validation, e=75:  94%|█████████▍| 32/34 [00:00<00:00, 151.04it/s]

cost=3944.259726, accuracy=0.913636



training,   e=76:  93%|█████████▎| 78/84 [00:01<00:00, 48.66it/s]
validation, e=76:  94%|█████████▍| 32/34 [00:00<00:00, 149.72it/s]

cost=3926.955929, accuracy=0.913889



training,   e=77:  99%|█████████▉| 83/84 [00:01<00:00, 53.89it/s]
validation, e=77:  94%|█████████▍| 32/34 [00:00<00:00, 151.05it/s]

cost=3909.913219, accuracy=0.913990



training,   e=78:  93%|█████████▎| 78/84 [00:01<00:00, 50.75it/s]
validation, e=78:  94%|█████████▍| 32/34 [00:00<00:00, 151.13it/s]

cost=3893.123671, accuracy=0.914293



training,   e=79:  93%|█████████▎| 78/84 [00:01<00:00, 49.64it/s]
validation, e=79:  71%|███████   | 24/34 [00:00<00:00, 88.77it/s]

cost=3876.579717, accuracy=0.914495



training,   e=80:  96%|█████████▋| 81/84 [00:01<00:00, 49.35it/s]
validation, e=80:  94%|█████████▍| 32/34 [00:00<00:00, 151.56it/s]

cost=3860.274125, accuracy=0.914949



training,   e=81:  93%|█████████▎| 78/84 [00:01<00:00, 51.21it/s]
validation, e=81:  94%|█████████▍| 32/34 [00:00<00:00, 145.76it/s]

cost=3844.199980, accuracy=0.915354



training,   e=82:  99%|█████████▉| 83/84 [00:01<00:00, 52.80it/s]
validation, e=82:  94%|█████████▍| 32/34 [00:00<00:00, 150.73it/s]

cost=3828.350666, accuracy=0.915455



training,   e=83:  93%|█████████▎| 78/84 [00:01<00:00, 50.38it/s]
validation, e=83:  94%|█████████▍| 32/34 [00:00<00:00, 150.18it/s]

cost=3812.719849, accuracy=0.915707



training,   e=84:  98%|█████████▊| 82/84 [00:01<00:00, 44.90it/s]
validation, e=84:  65%|██████▍   | 22/34 [00:00<00:00, 79.25it/s]

cost=3797.301460, accuracy=0.915859



training,   e=85:  99%|█████████▉| 83/84 [00:01<00:00, 50.58it/s]
validation, e=85:  94%|█████████▍| 32/34 [00:00<00:00, 151.77it/s]

cost=3782.089684, accuracy=0.915960



training,   e=86:  98%|█████████▊| 82/84 [00:01<00:00, 48.37it/s]
validation, e=86:  94%|█████████▍| 32/34 [00:00<00:00, 150.97it/s]

cost=3767.078941, accuracy=0.916061



training,   e=87:  93%|█████████▎| 78/84 [00:01<00:00, 50.87it/s]
validation, e=87:  94%|█████████▍| 32/34 [00:00<00:00, 149.85it/s]

cost=3752.263880, accuracy=0.916313



training,   e=88:  96%|█████████▋| 81/84 [00:01<00:00, 45.10it/s]
validation, e=88:  74%|███████▎  | 25/34 [00:00<00:00, 96.25it/s] 

cost=3737.639362, accuracy=0.916566



training,   e=89:  94%|█████████▍| 79/84 [00:02<00:00, 37.96it/s]
validation, e=89:  94%|█████████▍| 32/34 [00:00<00:00, 149.63it/s]

cost=3723.200448, accuracy=0.916768



training,   e=90:  99%|█████████▉| 83/84 [00:01<00:00, 51.16it/s]
validation, e=90:  53%|█████▎    | 18/34 [00:00<00:00, 58.06it/s]

cost=3708.942396, accuracy=0.917071



training,   e=91:  93%|█████████▎| 78/84 [00:01<00:00, 50.86it/s]
validation, e=91:  94%|█████████▍| 32/34 [00:00<00:00, 153.02it/s]

cost=3694.860641, accuracy=0.917323



training,   e=92:  94%|█████████▍| 79/84 [00:01<00:00, 45.85it/s]
validation, e=92:  94%|█████████▍| 32/34 [00:00<00:00, 149.79it/s]

cost=3680.950796, accuracy=0.917475



training,   e=93:  93%|█████████▎| 78/84 [00:01<00:00, 50.17it/s]
validation, e=93:  88%|████████▊ | 30/34 [00:00<00:00, 132.84it/s]

cost=3667.208634, accuracy=0.917626



training,   e=94:  93%|█████████▎| 78/84 [00:01<00:00, 48.88it/s]
validation, e=94:  94%|█████████▍| 32/34 [00:00<00:00, 151.69it/s]

cost=3653.630087, accuracy=0.917828



training,   e=95:  96%|█████████▋| 81/84 [00:01<00:00, 45.30it/s]
validation, e=95:  94%|█████████▍| 32/34 [00:00<00:00, 150.96it/s]

cost=3640.211236, accuracy=0.917980



training,   e=96:  98%|█████████▊| 82/84 [00:01<00:00, 52.69it/s]
validation, e=96:  94%|█████████▍| 32/34 [00:00<00:00, 146.67it/s]

cost=3626.948302, accuracy=0.918182



training,   e=97:  93%|█████████▎| 78/84 [00:01<00:00, 50.05it/s]
validation, e=97:  94%|█████████▍| 32/34 [00:00<00:00, 152.46it/s]

cost=3613.837642, accuracy=0.918535



training,   e=98:  93%|█████████▎| 78/84 [00:01<00:00, 50.15it/s]
validation, e=98:  94%|█████████▍| 32/34 [00:00<00:00, 147.68it/s]

cost=3600.875743, accuracy=0.918838



training,   e=99:  93%|█████████▎| 78/84 [00:01<00:00, 51.21it/s]
validation, e=99:  94%|█████████▍| 32/34 [00:00<00:00, 149.39it/s]

cost=3588.059213, accuracy=0.919091



training,   e=100:  96%|█████████▋| 81/84 [00:01<00:00, 51.11it/s]
validation, e=100:  94%|█████████▍| 32/34 [00:00<00:00, 149.80it/s]

cost=3575.384776, accuracy=0.919444



training,   e=101:  93%|█████████▎| 78/84 [00:01<00:00, 49.91it/s]
validation, e=101:  91%|█████████ | 31/34 [00:00<00:00, 140.29it/s]

cost=3562.849271, accuracy=0.919545



training,   e=102:  99%|█████████▉| 83/84 [00:01<00:00, 53.18it/s]
validation, e=102:  94%|█████████▍| 32/34 [00:00<00:00, 151.29it/s]

cost=3550.449642, accuracy=0.919798



training,   e=103:  93%|█████████▎| 78/84 [00:01<00:00, 50.45it/s]
validation, e=103:  94%|█████████▍| 32/34 [00:00<00:00, 150.63it/s]

cost=3538.182934, accuracy=0.919899



training,   e=104:  93%|█████████▎| 78/84 [00:01<00:00, 50.34it/s]
validation, e=104:  97%|█████████▋| 33/34 [00:00<00:00, 105.86it/s]

cost=3526.046291, accuracy=0.920000



training,   e=105:  93%|█████████▎| 78/84 [00:01<00:00, 50.32it/s]
validation, e=105:  94%|█████████▍| 32/34 [00:00<00:00, 150.77it/s]

cost=3514.036951, accuracy=0.920101



training,   e=106:  93%|█████████▎| 78/84 [00:01<00:00, 43.45it/s]
validation, e=106:  71%|███████   | 24/34 [00:00<00:00, 89.71it/s]

cost=3502.152240, accuracy=0.920556



training,   e=107:  95%|█████████▌| 80/84 [00:01<00:00, 47.21it/s]
validation, e=107:  94%|█████████▍| 32/34 [00:00<00:00, 150.32it/s]

cost=3490.389569, accuracy=0.920859



training,   e=108:  93%|█████████▎| 78/84 [00:01<00:00, 50.65it/s]
validation, e=108:  68%|██████▊   | 23/34 [00:00<00:00, 84.19it/s]

cost=3478.746435, accuracy=0.921111



training,   e=109:  99%|█████████▉| 83/84 [00:01<00:00, 48.67it/s]
validation, e=109:  71%|███████   | 24/34 [00:00<00:00, 89.67it/s]

cost=3467.220407, accuracy=0.921414



training,   e=110:  99%|█████████▉| 83/84 [00:01<00:00, 48.69it/s]
validation, e=110:  94%|█████████▍| 32/34 [00:00<00:00, 149.55it/s]

cost=3455.809136, accuracy=0.921616



training,   e=111:  95%|█████████▌| 80/84 [00:01<00:00, 46.45it/s]
validation, e=111:  94%|█████████▍| 32/34 [00:00<00:00, 147.91it/s]

cost=3444.510340, accuracy=0.921869



training,   e=112:  94%|█████████▍| 79/84 [00:01<00:00, 46.40it/s]
validation, e=112:  56%|█████▌    | 19/34 [00:00<00:00, 62.42it/s]

cost=3433.321810, accuracy=0.922020



training,   e=113:  98%|█████████▊| 82/84 [00:01<00:00, 50.69it/s]
validation, e=113:  94%|█████████▍| 32/34 [00:00<00:00, 151.03it/s]

cost=3422.241402, accuracy=0.922172



training,   e=114:  93%|█████████▎| 78/84 [00:01<00:00, 48.07it/s]
validation, e=114:  65%|██████▍   | 22/34 [00:00<00:00, 77.61it/s]

cost=3411.267035, accuracy=0.922374



training,   e=115:  93%|█████████▎| 78/84 [00:01<00:00, 50.73it/s]
validation, e=115:  94%|█████████▍| 32/34 [00:00<00:00, 150.25it/s]

cost=3400.396691, accuracy=0.922626



training,   e=116:  94%|█████████▍| 79/84 [00:01<00:00, 48.87it/s]
validation, e=116:  79%|███████▉  | 27/34 [00:00<00:00, 101.45it/s]

cost=3389.628411, accuracy=0.922929



training,   e=117:  99%|█████████▉| 83/84 [00:01<00:00, 50.76it/s]
validation, e=117:  94%|█████████▍| 32/34 [00:00<00:00, 147.76it/s]

cost=3378.960292, accuracy=0.923182



training,   e=118:  98%|█████████▊| 82/84 [00:01<00:00, 50.88it/s]
validation, e=118:  94%|█████████▍| 32/34 [00:00<00:00, 100.24it/s]

cost=3368.390485, accuracy=0.923434



training,   e=119:  93%|█████████▎| 78/84 [00:01<00:00, 50.60it/s]
validation, e=119:  71%|███████   | 24/34 [00:00<00:00, 89.89it/s] 

cost=3357.917195, accuracy=0.923384



training,   e=120:  93%|█████████▎| 78/84 [00:01<00:00, 43.47it/s]
validation, e=120:  94%|█████████▍| 32/34 [00:00<00:00, 151.52it/s]

cost=3347.538677, accuracy=0.923636



training,   e=121:  99%|█████████▉| 83/84 [00:01<00:00, 52.38it/s]
validation, e=121:  94%|█████████▍| 32/34 [00:00<00:00, 148.31it/s]

cost=3337.253234, accuracy=0.923788



training,   e=122:  93%|█████████▎| 78/84 [00:01<00:00, 49.66it/s]
validation, e=122:  94%|█████████▍| 32/34 [00:00<00:00, 151.30it/s]

cost=3327.059217, accuracy=0.923990



training,   e=123:  99%|█████████▉| 83/84 [00:01<00:00, 53.20it/s]
validation, e=123:  94%|█████████▍| 32/34 [00:00<00:00, 151.06it/s]

cost=3316.955022, accuracy=0.924141



training,   e=124:  93%|█████████▎| 78/84 [00:01<00:00, 50.06it/s]
validation, e=124:  94%|█████████▍| 32/34 [00:00<00:00, 151.36it/s]

cost=3306.939086, accuracy=0.924343



training,   e=125:  93%|█████████▎| 78/84 [00:01<00:00, 50.59it/s]
validation, e=125:  94%|█████████▍| 32/34 [00:00<00:00, 149.16it/s]

cost=3297.009891, accuracy=0.924596



training,   e=126:  93%|█████████▎| 78/84 [00:01<00:00, 51.08it/s]
validation, e=126:  94%|█████████▍| 32/34 [00:00<00:00, 149.10it/s]

cost=3287.165959, accuracy=0.924646



training,   e=127:  98%|█████████▊| 82/84 [00:01<00:00, 43.80it/s]
validation, e=127:  94%|█████████▍| 32/34 [00:00<00:00, 149.17it/s]

cost=3277.405850, accuracy=0.924899



training,   e=128:  98%|█████████▊| 82/84 [00:01<00:00, 45.84it/s]
validation, e=128:  94%|█████████▍| 32/34 [00:00<00:00, 147.86it/s]

cost=3267.728162, accuracy=0.925202



training,   e=129:  93%|█████████▎| 78/84 [00:01<00:00, 51.29it/s]
validation, e=129:  94%|█████████▍| 32/34 [00:00<00:00, 152.00it/s]

cost=3258.131528, accuracy=0.925455



training,   e=130:  98%|█████████▊| 82/84 [00:01<00:00, 49.67it/s]
validation, e=130:  94%|█████████▍| 32/34 [00:00<00:00, 148.88it/s]

cost=3248.614620, accuracy=0.925657



training,   e=131:  93%|█████████▎| 78/84 [00:01<00:00, 48.14it/s]
validation, e=131:  59%|█████▉    | 20/34 [00:00<00:00, 68.43it/s]

cost=3239.176139, accuracy=0.925859



training,   e=132:  98%|█████████▊| 82/84 [00:01<00:00, 46.31it/s]
validation, e=132:  94%|█████████▍| 32/34 [00:00<00:00, 151.41it/s]

cost=3229.814821, accuracy=0.926061



training,   e=133:  98%|█████████▊| 82/84 [00:01<00:00, 47.35it/s]
validation, e=133:  91%|█████████ | 31/34 [00:00<00:00, 142.31it/s]

cost=3220.529436, accuracy=0.926162



training,   e=134:  93%|█████████▎| 78/84 [00:01<00:00, 50.85it/s]
validation, e=134:  94%|█████████▍| 32/34 [00:00<00:00, 151.36it/s]

cost=3211.318779, accuracy=0.926212



training,   e=135:  93%|█████████▎| 78/84 [00:01<00:00, 50.55it/s]
validation, e=135:  94%|█████████▍| 32/34 [00:00<00:00, 149.27it/s]

cost=3202.181680, accuracy=0.926566



training,   e=136:  93%|█████████▎| 78/84 [00:01<00:00, 50.10it/s]
validation, e=136:  94%|█████████▍| 32/34 [00:00<00:00, 152.01it/s]

cost=3193.116994, accuracy=0.926717



training,   e=137:  96%|█████████▋| 81/84 [00:01<00:00, 49.38it/s]
validation, e=137:  94%|█████████▍| 32/34 [00:00<00:00, 147.65it/s]

cost=3184.123605, accuracy=0.927071



training,   e=138:  98%|█████████▊| 82/84 [00:01<00:00, 51.14it/s]
validation, e=138:  91%|█████████ | 31/34 [00:00<00:00, 142.87it/s]

cost=3175.200425, accuracy=0.927273



training,   e=139:  93%|█████████▎| 78/84 [00:01<00:00, 49.88it/s]
validation, e=139:  94%|█████████▍| 32/34 [00:00<00:00, 151.07it/s]

cost=3166.346388, accuracy=0.927323



training,   e=140:  93%|█████████▎| 78/84 [00:01<00:00, 50.68it/s]
validation, e=140:  94%|█████████▍| 32/34 [00:00<00:00, 150.09it/s]

cost=3157.560458, accuracy=0.927626



training,   e=141:  93%|█████████▎| 78/84 [00:01<00:00, 48.52it/s]
validation, e=141:  94%|█████████▍| 32/34 [00:00<00:00, 149.89it/s]

cost=3148.841621, accuracy=0.927929



training,   e=142:  93%|█████████▎| 78/84 [00:01<00:00, 50.29it/s]
validation, e=142:  94%|█████████▍| 32/34 [00:00<00:00, 150.45it/s]

cost=3140.188885, accuracy=0.927980



training,   e=143:  96%|█████████▋| 81/84 [00:01<00:00, 50.49it/s]
validation, e=143:  91%|█████████ | 31/34 [00:00<00:00, 143.40it/s]

cost=3131.601282, accuracy=0.928283



training,   e=144:  94%|█████████▍| 79/84 [00:01<00:00, 49.66it/s]
validation, e=144:  71%|███████   | 24/34 [00:00<00:00, 88.47it/s]

cost=3123.077869, accuracy=0.928535



training,   e=145:  98%|█████████▊| 82/84 [00:01<00:00, 51.38it/s]
validation, e=145:  65%|██████▍   | 22/34 [00:00<00:00, 78.70it/s]

cost=3114.617719, accuracy=0.928636



training,   e=146:  98%|█████████▊| 82/84 [00:01<00:00, 48.40it/s]
validation, e=146:  94%|█████████▍| 32/34 [00:00<00:00, 150.59it/s]

cost=3106.219931, accuracy=0.928737



training,   e=147:  98%|█████████▊| 82/84 [00:01<00:00, 47.63it/s]
validation, e=147:  68%|██████▊   | 23/34 [00:00<00:00, 83.81it/s]

cost=3097.883621, accuracy=0.928838



training,   e=148:  94%|█████████▍| 79/84 [00:01<00:00, 44.40it/s]
validation, e=148:  71%|███████   | 24/34 [00:00<00:00, 87.09it/s]

cost=3089.607925, accuracy=0.928889



training,   e=149:  98%|█████████▊| 82/84 [00:02<00:00, 31.50it/s]
validation, e=149:  71%|███████   | 24/34 [00:00<00:00, 89.77it/s]

cost=3081.392000, accuracy=0.928939



