# Exercise 1

1. What is overfitting?

Overfitting occurs when a model fits the training data too closely, or even memorizes it, and so loses the ability to generalise. Performance on the training data stays good, but performance on unseen test data drops.

2. Explain how we can determine the best number of training iterations to avoid overfitting.

We can split the dataset into three sets instead of two: training, validation, and testing. After each iteration (or epoch), we evaluate the model on the validation set. When training is finished, we pick the model (and therefore the number of iterations) with the best validation result, and evaluate it once on the testing set to obtain the final results. The validation results themselves could be biased, because we chose the model that happens to fit the validation set best.
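
The selection procedure described above could look roughly like the following sketch. It assumes a plain linear model trained by full-batch gradient descent on synthetic data; the data, the split sizes, the learning rate, and names such as `best_epoch` are illustrative assumptions and not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (an assumption, for illustration only).
X = rng.normal(size=(300, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.5, size=300)

# Three-way split: training, validation, testing.
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:250], y[200:250]
X_test, y_test = X[250:], y[250:]


def mse(w, X, y):
    """Mean squared error of the linear model X @ w."""
    return float(np.mean((X @ w - y) ** 2))


w = np.zeros(5)
best_val, best_w, best_epoch = np.inf, w.copy(), -1
for epoch in range(100):
    # One full-batch gradient step on the training set.
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= 0.05 * grad
    # Evaluate on the validation set and remember the best weights so far.
    val = mse(w, X_val, y_val)
    if val < best_val:
        best_val, best_w, best_epoch = val, w.copy(), epoch

# The test set is used once, only for the selected model.
test_mse = mse(best_w, X_test, y_test)
print(best_epoch, best_val, test_mse)
```
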
3. What are the methods used to prevent a neural network from overfitting? 

Other than picking the model based on validation results:
* Use dropout to randomly switch off some neurons during training
* Use a regularization technique (for example an L1 or L2 weight penalty) to prevent the model from getting too complex
* Simplify the model manually by reducing the number of neurons and/or hidden layers

These techniques help stop the model from memorizing the training data.
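
As a rough illustration of the first two techniques, the sketch below implements inverted dropout and the gradient of an L2 weight penalty in NumPy. The function names `dropout` and `l2_penalty_grad`, the drop probability, and the penalty strength `lam` are assumptions chosen for the example, not something prescribed by the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)


def dropout(activations, p_drop, training=True):
    """Inverted dropout: zero each unit with probability p_drop, rescale the rest."""
    if not training or p_drop == 0.0:
        return activations
    keep = rng.random(activations.shape) >= p_drop
    return activations * keep / (1.0 - p_drop)


def l2_penalty_grad(weights, lam):
    """Gradient of the penalty lam * sum(w ** 2), to be added to the loss gradient."""
    return 2.0 * lam * weights


h = np.ones((1000, 10))
h_dropped = dropout(h, p_drop=0.5)
print((h_dropped == 0).mean())  # fraction of units switched off, close to p_drop
print(h_dropped.mean())         # rescaling keeps the mean activation close to 1

w = np.array([0.5, -1.0, 2.0])
print(l2_penalty_grad(w, lam=0.01))
```

At prediction time dropout is disabled (`training=False`), which is why the kept activations are rescaled during training.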

4. Is it possible to represent a XOR Boolean function with a single layer perceptron? Why/Why not? 

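No. The XOR function is not linearly separable: no single straight line can separate the inputs (0, 1) and (1, 0), which map to 1, from (0, 0) and (1, 1), which map to 0. A single-layer perceptron can only represent one linear decision boundary.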
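
The argument can be written out as a short derivation. It assumes a perceptron with weights $w_1, w_2$, bias $b$, and the rule "output 1 when $w_1 x_1 + w_2 x_2 + b > 0$"; this thresholding convention is an assumption made for the sketch.

```latex
\begin{align*}
(0,0) \mapsto 0 &: \quad b \le 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b > 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b > 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b \le 0 \\[4pt]
\text{Adding the two middle lines:} &\quad w_1 + w_2 + 2b > 0
  \;\Rightarrow\; w_1 + w_2 + b > -b \ge 0 \\
&\quad \text{which contradicts the last condition, so no such } w_1, w_2, b \text{ exist.}
\end{align*}
```
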
5. What is the advantage of multi-layer neural networks over single layer neural networks?

Multi-layer networks with non-linear activation functions can approximate arbitrarily complex functions (given enough hidden units), because each layer combines the functions learned by the previous layer. Single-layer networks can only learn linearly separable functions.
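
A small sketch of this idea: a two-layer network with a step activation that computes XOR by combining an OR unit and an AND unit in the hidden layer. The weights are picked by hand for illustration (they are an assumption, not the result of training), and names such as `xor_net` are hypothetical.

```python
import numpy as np


def step(z):
    """Step activation: 1.0 where z > 0, else 0.0."""
    return (z > 0).astype(float)


# Hidden layer (one column per unit): unit 0 computes OR, unit 1 computes AND.
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])

# Output unit: fires when OR is on and AND is off, which is XOR.
w_out = np.array([1.0, -1.0])
b_out = -0.5


def xor_net(X):
    hidden = step(X @ W_hidden + b_hidden)
    return step(hidden @ w_out + b_out)


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(xor_net(X))  # [0. 1. 1. 0.]
```

Each hidden unit is itself a linearly separable function; only their combination in the second layer yields XOR.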

# Exercise 2

Generating input samples for each function

In [7]:
import numpy as np
from itertools import product

In [41]:
outputs = {}
inputs = list(product([False, True], repeat=2))
outputs['and'] = np.array([x and y for x,y in inputs], dtype=float)
outputs['or'] = np.array([x or y for x,y in inputs], dtype=float)
outputs['nor'] = np.array([not (x or y) for x,y in inputs], dtype=float)
inputs = np.array(inputs, dtype=float)

Calculating the gradient by hand

loss = (output - expected) ^ 2 = (X * W + b - expected) ^ 2

By the chain rule, with output = X * W + b:

dloss/dW = 2(X * W + b - expected) * X

dloss/db = 2(X * W + b - expected)
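
One way to sanity-check a hand-derived gradient is to compare it with a central finite difference. The sketch below does this for the squared-error loss above; the helper names (`loss_fn`, `analytic_grads`), the sample input, and the step size `eps` are assumptions made for the example.

```python
import numpy as np


def loss_fn(W, b, x, expected):
    """Squared error of the linear output x * W + b."""
    return (np.dot(x, W) + b - expected) ** 2


def analytic_grads(W, b, x, expected):
    """Hand-derived gradients: dloss/dW and dloss/db."""
    err = np.dot(x, W) + b - expected
    return 2 * err * x, 2 * err


rng = np.random.default_rng(0)
W, b = rng.random(2), rng.random()
x, expected = np.array([1.0, 1.0]), 1.0

# Central differences: perturb one parameter at a time by +/- eps.
eps = 1e-6
num_W = np.array([
    (loss_fn(W + eps * e, b, x, expected) - loss_fn(W - eps * e, b, x, expected)) / (2 * eps)
    for e in np.eye(2)
])
num_b = (loss_fn(W, b + eps, x, expected) - loss_fn(W, b - eps, x, expected)) / (2 * eps)

grad_W, grad_b = analytic_grads(W, b, x, expected)
print(np.allclose(grad_W, num_W, atol=1e-6), np.isclose(grad_b, num_b, atol=1e-6))
```

Because the loss is quadratic in `W` and `b`, the central difference agrees with the analytic gradient up to floating-point error.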


In [64]:
class Perceptron:
    """Single linear unit trained with per-sample gradient descent on squared error."""

    def __init__(self, input_size=2):
        self.input_size = input_size
        # Random initial weights and bias, drawn uniformly from [0, 1)
        self.weights = np.random.rand(input_size)
        self.bias = np.random.rand(1)

    def forward(self, X: np.ndarray) -> np.ndarray:
        # Linear output X * W + b (no activation function)
        return np.dot(X, self.weights) + self.bias

    def train(self, X: np.ndarray, y: np.ndarray, epochs: int, learning_rate: float) -> None:
        for epoch in range(epochs):
            print('---------------\n\nEpoch:', epoch, '\n\n--------------------')
            for inp, expected in zip(X, y):
                output = self.forward(inp)
                loss = (output - expected) ** 2
                print('Loss:', loss.item())
                # Gradients of the squared error, as derived above
                W_grad = 2 * (output - expected) * inp
                b_grad = 2 * (output - expected)
                # Gradient descent step
                self.weights -= W_grad * learning_rate
                self.bias -= b_grad * learning_rate

    def predict(self, X: np.ndarray, threshold=.5) -> bool:
        # Threshold the linear output to get a Boolean prediction
        return self.forward(X).item() > threshold

Training the models

In [71]:
models = {}
for function, out in outputs.items():
    print('------------------------\n\n', function, '\n\n--------------------------')
    model = Perceptron()
    model.train(inputs, out, epochs=25, learning_rate=.05)
    models[function] = model

------------------------

 and 

--------------------------
---------------

Epoch: 0 

--------------------
Loss: 0.3721609037310162
Loss: 0.3240473198262275
Loss: 0.3621477052696172
Loss: 0.3083457220114267
---------------

Epoch: 1 

--------------------
Loss: 0.23762695200629422
Loss: 0.2093367235987494
Loss: 0.24799117346109292
Loss: 0.3950792314532717
---------------

Epoch: 2 

--------------------
Loss: 0.1648573082258114
Loss: 0.16107055540127674
Loss: 0.1965717552458642
Loss: 0.42188914641743486
---------------

Epoch: 3 

--------------------
Loss: 0.11965137680737786
Loss: 0.13841948037270135
Loss: 0.17040538138113417
Loss: 0.4176151910276388
---------------

Epoch: 4 

--------------------
Loss: 0.08847896477338583
Loss: 0.12663552397279892
Loss: 0.1553593468003593
Loss: 0.3995637551726372
---------------

Epoch: 5 

--------------------
Loss: 0.06549414823697795
Loss: 0.11978640049559461
Loss: 0.14557682742137404
Loss: 0.37650326743983953
---------------

Epoch: 6 

-----

Testing the models

In [74]:
for function, model in models.items():
    print(function)
    for inp in inputs:
        print(inp, model.predict(inp), model.forward(inp).item())

and
[0. 0.] False -0.14306237665075458
[0. 1.] False 0.2810046663522525
[1. 0.] False 0.31654991759843887
[1. 1.] True 0.7406169606014459
or
[0. 0.] False 0.3247140293650548
[0. 1.] True 0.7530823573954817
[1. 0.] True 0.7431239285141116
[1. 1.] True 1.1714922565445387
nor
[0. 0.] True 0.5651669640126383
[0. 1.] False 0.21781740870969946
[1. 0.] False 0.2287803584978617
[1. 1.] False -0.11856919680507716
