In [1]:
%load_ext autoreload
%autoreload 2

import numpy as np
import nn_solution as nn
import toolbox as tb
from toolbox import Tensor

## Load Data

Download the following dataset: https://www.kaggle.com/brynja/wineuci
It's the wine data again, but instead of wine quality, the wines are separated into three (unnamed) wine types. 
We will train a classification network for wine types today. 

We will also use the implementation you did for Exercise 3. I provided the example solution, but use yours if you did this exercise, it's more fun! :) 


In [2]:
data = np.loadtxt('./archive/Wine.csv', delimiter=',')
# first look at the data (here or in excel etc)
# What is something you should change in this data loading code before training? 
# TODO
np.random.shuffle(data) # shuffles along the first axis
print(data.shape)

x = data[:, 1:]
y = data[:, 0]
y = np.array([(int(i) - 1) for i in y])

x = x / x.max(axis=0)
x = np.transpose(x)

print(x.shape)

x_train = x[:, :150]  # start at column 0 so the first sample is not dropped
x_val = x[:, 150:170]
x_test = x[:, 170:]
y_train = y[:150]
y_val = y[150:170]
y_test = y[170:]

print(x_train.shape)
print(y_train.shape)

(178, 14)
(13, 178)
(13, 150)
(150,)


## Network Architecture

We will use a very simple classification network, similar to the one from the second lecture. Since you already have all backward operations, feel free to stack more layers and combinations once the basics work! 

For our training examples $(x,y), x \in \mathbb{R}^{13}, y \in \{ 1, 2, 3 \}$ we want to implement the following function as a network:

$$ f(x) = \text{cross-entropy}(W x + b) $$

What are the modules/layers needed here?

All modules are already implemented from Exercise 3. Implement a new class ClassificationNetwork in nn.py that has the architecture hardcoded (instead of using add_layer, as we did in the last exercise session). Refer back to the "Linear classification" slides for details.
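To make the function above concrete, here is a minimal sketch of a linear layer followed by softmax cross-entropy, written in plain NumPy. It does not use the nn or toolbox modules from the exercise (their internals are not shown here), and the names `softmax` and `linear_cross_entropy` are made up for this illustration. It assumes class indices start at 0, as in the data loading code.

```python
import numpy as np

def softmax(z):
    # Subtracting the column maximum avoids overflow and does not change the result.
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def linear_cross_entropy(W, b, x, y):
    """Loss and gradients for one example x of shape (d, 1) with class index y."""
    scores = W @ x + b            # shape (k, 1)
    probs = softmax(scores)       # shape (k, 1), entries sum to 1
    loss = -np.log(probs[y, 0])   # negative log-probability of the true class
    dscores = probs.copy()
    dscores[y, 0] -= 1.0          # gradient of the loss w.r.t. the scores
    dW = dscores @ x.T            # shape (k, d)
    db = dscores                  # shape (k, 1)
    return loss, dW, db

rng = np.random.default_rng(0)
W = np.zeros((3, 13))             # 3 classes, 13 features
b = np.zeros((3, 1))
x = rng.random((13, 1))
loss, dW, db = linear_cross_entropy(W, b, x, y=2)
# With all-zero parameters every class gets probability 1/3,
# so the loss equals log(3), approximately 1.0986.
print(loss)
```

A quick sanity check for your own implementation: with zero (or very small) initial weights, the loss on any example should be close to log(3) for three classes.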

In [3]:
# build network structure

network = nn.ClassificationNetwork()
#network = nn.Network()
#network.add_layer(nn.LinearLayer(np.array([[1., 2.], [3., 4.], [5., 6.]]), \
#                 np.array([[1.], [2.], [3.]])))
loss = nn.CrossEntropyLoss()

def loss_on_dataset(dataset, targets):
    loss_value = 0
    for i in range(len(targets)):
        data_element = Tensor(np.expand_dims(dataset[:,i],axis=1))
        l = loss.forward(network.forward(data_element), targets[i])
        loss_value += l.data
    loss_value = loss_value/len(targets)
    return loss_value
        


In [4]:
# run once for testing
data = Tensor(np.expand_dims(x_train[:,0],axis=1))
target = y_train[0]

prediction = network.forward(data)

l = loss.forward(prediction, target)
print(l.data)

training_loss = loss_on_dataset(x_train, y_train)

print("Average training loss at initialization: ", training_loss)

[1.78626402]
Average training loss at initialization:  [1.16392694]


## Training the Network

Next, we will train the network using our training examples. Implement gradient descent. If you think that's too easy, add the stochastic part and some momentum. 
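As a rough sketch of what "the stochastic part and some momentum" could look like, the following plain NumPy example trains a linear softmax classifier on synthetic stand-in data. It is not the exercise solution and does not use the Tensor class; the data, the momentum value of 0.9, and all variable names are assumptions made for this illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def mean_loss(W, b, X, y):
    probs = softmax(W @ X + b)                        # shape (k, n)
    return -np.mean(np.log(probs[y, np.arange(len(y))]))

rng = np.random.default_rng(0)
# Synthetic stand-in data: 3 classes, 13 features, one example per column.
n, d, k = 150, 13, 3
y = rng.integers(0, k, size=n)
centers = rng.normal(size=(d, k))
X = centers[:, y] + 0.3 * rng.normal(size=(d, n))

W, b = np.zeros((k, d)), np.zeros((k, 1))
vW, vb = np.zeros_like(W), np.zeros_like(b)           # momentum buffers
learning_rate, momentum = 1e-2, 0.9

initial_loss = mean_loss(W, b, X, y)
for step in range(500):
    i = rng.integers(n)                  # stochastic part: one random example per step
    x_i = X[:, [i]]                      # shape (d, 1)
    dscores = softmax(W @ x_i + b)
    dscores[y[i], 0] -= 1.0              # gradient of cross-entropy w.r.t. the scores
    # Momentum: keep a decaying running sum of past gradients and step along it.
    vW = momentum * vW + dscores @ x_i.T
    vb = momentum * vb + dscores
    W -= learning_rate * vW
    b -= learning_rate * vb
final_loss = mean_loss(W, b, X, y)
print(initial_loss, final_loss)
```

Setting `momentum = 0.0` recovers plain stochastic gradient descent, which makes it easy to compare the two variants.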

Because the network's parameters are updated to fit the training data directly, the loss on the training data is usually low. That is why we monitor training progress on a validation set that is not used for updating weights. 
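One possible way to use the validation loss, sketched here under the assumption that you record it after every iteration, is to remember the iteration with the lowest value and stop once it has not improved for a while. The function name `best_checkpoint` and the `patience` parameter are hypothetical and not part of the exercise code.

```python
def best_checkpoint(validation_losses, patience=3):
    """Return (best_index, stop_index) for a patience-based stopping rule."""
    best_index, best_loss = 0, float("inf")
    for i, loss_value in enumerate(validation_losses):
        if loss_value < best_loss:
            best_index, best_loss = i, loss_value
        elif i - best_index >= patience:
            return best_index, i      # no improvement for `patience` iterations
    return best_index, len(validation_losses) - 1

losses = [0.50, 0.48, 0.47, 0.49, 0.48, 0.50, 0.46]
print(best_checkpoint(losses))  # (2, 5): best at index 2, stopped at index 5
```

With single-example updates the validation loss fluctuates from step to step, so a larger patience than in this toy example is likely needed.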

It is unlikely that you will get a loss of zero anywhere. 

In [11]:
learning_rate = 1e-2

# you can run this block multiple times to improve results, and play around with decreasing the learning rate later on
for i in range(1000):
    # Calculate Gradient using backwards operations and update parameters
    # Don't forget to zero the gradient afterwards! 
    
    network.zero_grad()
    
    element = np.random.randint(len(y_train), size=1)
    element = element[0]
    
    data = Tensor(np.expand_dims(x_train[:,element], axis=1))
    target = y_train[element]
    
    prediction = network.forward(data)
    l = loss.forward(prediction, target)
    
    l.backward()
    
    network.linear.W = Tensor(network.linear.W.data - learning_rate * network.linear.W.grad, requires_grad=True)
    network.linear.b = Tensor(network.linear.b.data - learning_rate * network.linear.b.grad, requires_grad=True)
    
    # Print results
    training_loss = loss_on_dataset(x_train, y_train)
    validation_loss = loss_on_dataset(x_val, y_val)

    print("Iteration: ", i, " Training loss: ", training_loss, " Validation loss: ", validation_loss)

Iteration:  0  Training loss:  [0.49841909]  Validation loss:  [0.48072646]
Iteration:  1  Training loss:  [0.49872245]  Validation loss:  [0.48051493]
Iteration:  2  Training loss:  [0.49816563]  Validation loss:  [0.48245953]
Iteration:  3  Training loss:  [0.49835724]  Validation loss:  [0.48244622]
Iteration:  4  Training loss:  [0.49773522]  Validation loss:  [0.48460575]
Iteration:  5  Training loss:  [0.49776]  Validation loss:  [0.48222953]
Iteration:  6  Training loss:  [0.4978184]  Validation loss:  [0.48000264]
Iteration:  7  Training loss:  [0.49731499]  Validation loss:  [0.48277411]
Iteration:  8  Training loss:  [0.49741099]  Validation loss:  [0.48046945]
Iteration:  9  Training loss:  [0.49714999]  Validation loss:  [0.48196879]
Iteration:  10  Training loss:  [0.49729668]  Validation loss:  [0.47963689]
Iteration:  11  Training loss:  [0.49720179]  Validation loss:  [0.48192142]
Iteration:  12  Training loss:  [0.49724727]  Validation loss:  [0.48407639]
...

Iteration:  121  Training loss:  [0.48952498]  Validation loss:  [0.48206274]
Iteration:  122  Training loss:  [0.48946554]  Validation loss:  [0.47767316]
Iteration:  123  Training loss:  [0.48919291]  Validation loss:  [0.48031934]
Iteration:  124  Training loss:  [0.48905105]  Validation loss:  [0.48194333]
Iteration:  125  Training loss:  [0.48903454]  Validation loss:  [0.48344612]
Iteration:  126  Training loss:  [0.48895103]  Validation loss:  [0.48304854]
Iteration:  127  Training loss:  [0.48916575]  Validation loss:  [0.48565497]
Iteration:  128  Training loss:  [0.48894328]  Validation loss:  [0.48322948]
Iteration:  129  Training loss:  [0.48877112]  Validation loss:  [0.48024011]
Iteration:  130  Training loss:  [0.48867959]  Validation loss:  [0.4776106]
Iteration:  131  Training loss:  [0.48841413]  Validation loss:  [0.47615741]
Iteration:  132  Training loss:  [0.48840273]  Validation loss:  [0.47389268]
...

Iteration:  229  Training loss:  [0.48321023]  Validation loss:  [0.47103888]
Iteration:  230  Training loss:  [0.48219221]  Validation loss:  [0.46902153]
Iteration:  231  Training loss:  [0.48213531]  Validation loss:  [0.46651326]
Iteration:  232  Training loss:  [0.48213282]  Validation loss:  [0.4648134]
Iteration:  233  Training loss:  [0.48235295]  Validation loss:  [0.46118032]
Iteration:  234  Training loss:  [0.48241313]  Validation loss:  [0.4628464]
Iteration:  235  Training loss:  [0.48309147]  Validation loss:  [0.46590284]
Iteration:  236  Training loss:  [0.48243405]  Validation loss:  [0.46466384]
Iteration:  237  Training loss:  [0.48266185]  Validation loss:  [0.46649694]
Iteration:  238  Training loss:  [0.48347438]  Validation loss:  [0.46927061]
Iteration:  239  Training loss:  [0.48479306]  Validation loss:  [0.47276835]
Iteration:  240  Training loss:  [0.48390608]  Validation loss:  [0.47160374]
...

Iteration:  336  Training loss:  [0.47461339]  Validation loss:  [0.45350396]
Iteration:  337  Training loss:  [0.47445652]  Validation loss:  [0.45499384]
Iteration:  338  Training loss:  [0.47435486]  Validation loss:  [0.45418881]
Iteration:  339  Training loss:  [0.47478661]  Validation loss:  [0.4536643]
Iteration:  340  Training loss:  [0.47572816]  Validation loss:  [0.45360317]
Iteration:  341  Training loss:  [0.47637677]  Validation loss:  [0.453427]
Iteration:  342  Training loss:  [0.47634791]  Validation loss:  [0.45164465]
Iteration:  343  Training loss:  [0.47741545]  Validation loss:  [0.45199543]
Iteration:  344  Training loss:  [0.47783578]  Validation loss:  [0.45209707]
Iteration:  345  Training loss:  [0.47791172]  Validation loss:  [0.44981362]
Iteration:  346  Training loss:  [0.47816349]  Validation loss:  [0.44763942]
Iteration:  347  Training loss:  [0.47690045]  Validation loss:  [0.44876641]
...

Iteration:  448  Training loss:  [0.46746883]  Validation loss:  [0.45547374]
Iteration:  449  Training loss:  [0.46756367]  Validation loss:  [0.45247895]
Iteration:  450  Training loss:  [0.46778882]  Validation loss:  [0.45461857]
Iteration:  451  Training loss:  [0.46771243]  Validation loss:  [0.45301101]
Iteration:  452  Training loss:  [0.46767745]  Validation loss:  [0.45036501]
Iteration:  453  Training loss:  [0.46796597]  Validation loss:  [0.45211981]
Iteration:  454  Training loss:  [0.4672218]  Validation loss:  [0.4509485]
Iteration:  455  Training loss:  [0.46747658]  Validation loss:  [0.45262297]
Iteration:  456  Training loss:  [0.46797167]  Validation loss:  [0.45525111]
Iteration:  457  Training loss:  [0.46856319]  Validation loss:  [0.45708644]
Iteration:  458  Training loss:  [0.46847923]  Validation loss:  [0.45488076]
Iteration:  459  Training loss:  [0.46829148]  Validation loss:  [0.45174754]
...

Iteration:  555  Training loss:  [0.46930741]  Validation loss:  [0.46742622]
Iteration:  556  Training loss:  [0.46764796]  Validation loss:  [0.46522564]
Iteration:  557  Training loss:  [0.46912994]  Validation loss:  [0.46918798]
Iteration:  558  Training loss:  [0.46801861]  Validation loss:  [0.46780273]
Iteration:  559  Training loss:  [0.4691477]  Validation loss:  [0.47003751]
Iteration:  560  Training loss:  [0.47122379]  Validation loss:  [0.47340073]
Iteration:  561  Training loss:  [0.4733099]  Validation loss:  [0.47841007]
Iteration:  562  Training loss:  [0.4700146]  Validation loss:  [0.47398998]
Iteration:  563  Training loss:  [0.46839806]  Validation loss:  [0.47185799]
Iteration:  564  Training loss:  [0.47082918]  Validation loss:  [0.47597683]
Iteration:  565  Training loss:  [0.47239199]  Validation loss:  [0.47908307]
Iteration:  566  Training loss:  [0.47393235]  Validation loss:  [0.48188698]
...

Iteration:  670  Training loss:  [0.45336027]  Validation loss:  [0.44696487]
Iteration:  671  Training loss:  [0.45304025]  Validation loss:  [0.44463453]
Iteration:  672  Training loss:  [0.45269856]  Validation loss:  [0.44381741]
Iteration:  673  Training loss:  [0.45287929]  Validation loss:  [0.44534953]
Iteration:  674  Training loss:  [0.45251675]  Validation loss:  [0.44225147]
Iteration:  675  Training loss:  [0.45270293]  Validation loss:  [0.44342088]
Iteration:  676  Training loss:  [0.45236239]  Validation loss:  [0.44241218]
Iteration:  677  Training loss:  [0.45268402]  Validation loss:  [0.44443544]
Iteration:  678  Training loss:  [0.45245966]  Validation loss:  [0.44246801]
Iteration:  679  Training loss:  [0.45308129]  Validation loss:  [0.44490651]
Iteration:  680  Training loss:  [0.45257606]  Validation loss:  [0.44400568]
Iteration:  681  Training loss:  [0.45207869]  Validation loss:  [0.44233051]
...

Iteration:  788  Training loss:  [0.44627603]  Validation loss:  [0.43153976]
Iteration:  789  Training loss:  [0.44627143]  Validation loss:  [0.43598182]
Iteration:  790  Training loss:  [0.44643536]  Validation loss:  [0.43951594]
Iteration:  791  Training loss:  [0.44648317]  Validation loss:  [0.4408877]
Iteration:  792  Training loss:  [0.44660855]  Validation loss:  [0.44237568]
Iteration:  793  Training loss:  [0.44655142]  Validation loss:  [0.44147041]
Iteration:  794  Training loss:  [0.44660896]  Validation loss:  [0.44279792]
Iteration:  795  Training loss:  [0.4458912]  Validation loss:  [0.43776253]
Iteration:  796  Training loss:  [0.44595408]  Validation loss:  [0.43920751]
Iteration:  797  Training loss:  [0.44591492]  Validation loss:  [0.43888329]
Iteration:  798  Training loss:  [0.44598285]  Validation loss:  [0.44022557]
Iteration:  799  Training loss:  [0.44628829]  Validation loss:  [0.44247252]
...

Iteration:  904  Training loss:  [0.43969655]  Validation loss:  [0.41182373]
Iteration:  905  Training loss:  [0.43946053]  Validation loss:  [0.4129783]
Iteration:  906  Training loss:  [0.43930448]  Validation loss:  [0.41463035]
Iteration:  907  Training loss:  [0.43966803]  Validation loss:  [0.41161584]
Iteration:  908  Training loss:  [0.43940929]  Validation loss:  [0.41575037]
Iteration:  909  Training loss:  [0.43926635]  Validation loss:  [0.41547996]
Iteration:  910  Training loss:  [0.43934786]  Validation loss:  [0.41348581]
Iteration:  911  Training loss:  [0.43974488]  Validation loss:  [0.41278803]
Iteration:  912  Training loss:  [0.44013398]  Validation loss:  [0.41257174]
Iteration:  913  Training loss:  [0.44021949]  Validation loss:  [0.41112165]
Iteration:  914  Training loss:  [0.44178652]  Validation loss:  [0.41103031]
Iteration:  915  Training loss:  [0.44238604]  Validation loss:  [0.41109545]
...

## Evaluating on the Test Set

After you are happy with your validation loss, you can test your model on the test data. In principle, you are not supposed to retrain or further refine your model after looking at the test data. In practice, many people do it anyway. However, this is extremely bad practice, because it turns your test data into just another validation set. 

In [12]:
test_loss = loss_on_dataset(x_test, y_test)
print("Test loss: ", test_loss)

Test loss:  [0.39556643]


Since this dataset is rather easy, there will probably not be much difference in training/validation/test performance. And because we shuffle the data in the beginning, the exact results on each set will vary a little bit if you restart.
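Besides the loss, classification accuracy is often easier to interpret. The sketch below assumes you have collected the network's output scores for a whole set as a NumPy array with one column per example; the `accuracy` helper is hypothetical and not part of the exercise code.

```python
import numpy as np

def accuracy(scores, targets):
    """Fraction of columns whose highest-scoring class equals the target."""
    predictions = np.argmax(scores, axis=0)
    return np.mean(predictions == targets)

# Toy scores of shape (classes, examples); columns predict classes 0, 1 and 2.
scores = np.array([[2.0, 0.1, 0.3],
                   [0.5, 1.5, 0.2],
                   [0.1, 0.2, 0.9]])
targets = np.array([0, 1, 1])
print(accuracy(scores, targets))  # two of three predictions are correct
```

Because softmax preserves the ordering of the scores, taking the argmax of the raw scores gives the same prediction as taking the argmax of the class probabilities.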