# Neural network regression

In [1]:
import numpy as np
np.set_printoptions(suppress=True)
import math
import matplotlib.pyplot as plt

from neural_network import Layer 

### Build the network

In [2]:
# 2 numbers as input
l_input = Layer(2)

In [3]:
# one hidden layer with 4 neurons using ReLU activation function
hidden_1 = Layer(4,previous_layer=l_input,activation='ReLU')

In [4]:
# output layer with 6 outputs (see below) and no activation function
# the network should output the desired operations as specified below
l_output = Layer(6,previous_layer=hidden_1,activation='none')

### Data and training

**This demonstration will generate data and train the network simultaneously.**
**The network will be trained to take 2 numbers (kept small to avoid divergence during training) and return the numbers themselves, their sum, their difference, their product, and a fixed linear combination of them.**

In [5]:
# training
for i in range(1000000):
    # randomly generate 2 input numbers
    x = np.random.normal(0,2,size=2)
    # calculate the desired arithmetic operations
    y = np.array([x[0],x[1],x[0]+x[1],x[0]-x[1],x[0]*x[1],3*x[0]-1.6*x[1]])
    # feed input through network (with the first layer)
    l_input.feed_forward(x)
    # back propagate and train network (with the last layer)
    l_output.back_propagate(y,learning_rate=0.001)
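
The target vector is written out by hand both in the training loop and in the error check. As a minimal sketch, it could be factored into one helper so the two places cannot drift apart. The name `make_targets` is hypothetical and not part of the `neural_network` module.

```python
import numpy as np

def make_targets(x):
    """Return the six regression targets for a pair of inputs.

    Order matches the notebook: x0, x1, sum, difference, product,
    and the linear combination 3*x0 - 1.6*x1.
    """
    a, b = x
    return np.array([a, b, a + b, a - b, a * b, 3 * a - 1.6 * b])

# Same simple input that is used later in the notebook
print(make_targets(np.array([3.0, 2.0])))
```

In the training loop this would replace the explicit `np.array([...])` expression with `y = make_targets(x)`.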

### Check error

**Using one random point for simplicity**

In [5]:
x = np.random.normal(0,1,size=2)
y = np.array([x[0],x[1],x[0]+x[1],x[0]-x[1],x[0]*x[1],3*x[0]-1.6*x[1]])
(l_input.feed_forward(x) - y).mean()

-0.052521841634607436

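Keep training until the error is small enough. Note that this is a signed mean over the six outputs, so positive and negative errors can partly cancel.<br><br>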
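
A single random point gives a noisy estimate. The sketch below averages the absolute error per output over many random points instead. It assumes only that the predictor is a callable returning the six outputs, as `l_input.feed_forward` does in the cells above. The names `make_targets` and `mean_abs_error` are hypothetical helpers, not part of the `neural_network` module.

```python
import numpy as np

def make_targets(x):
    """Six targets in the notebook's order."""
    a, b = x
    return np.array([a, b, a + b, a - b, a * b, 3 * a - 1.6 * b])

def mean_abs_error(predict, n_points=1000, scale=1.0, seed=0):
    """Mean absolute error per output over n_points random inputs.

    predict: callable mapping a length-2 array to a length-6 array
             (in the notebook this would be l_input.feed_forward).
    """
    rng = np.random.default_rng(seed)
    errors = np.empty((n_points, 6))
    for i in range(n_points):
        x = rng.normal(0, scale, size=2)
        errors[i] = np.abs(np.asarray(predict(x)) - make_targets(x))
    return errors.mean(axis=0)

# Stand-in predictor for illustration: perfect on every output except the product
linear_only = lambda x: make_targets(x) * np.array([1, 1, 1, 1, 0, 1])
print(np.round(mean_abs_error(linear_only, n_points=200), 3))
```

Reporting the error per output makes it visible when one operation (for example the product) is fitted much worse than the others.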

### Explore and understand the network

Using simple input for convenience

In [6]:
x = np.array([3,2])
# the expected y would be
y = np.array([3,2,5,1,6,5.8])
# what the network actually gives
l_input.feed_forward(x)

array([3.01231204, 2.1147405 , 5.12705254, 0.89757154, 6.14772166,
       5.65335131])

In [7]:
print(np.round(np.concatenate((y,l_input.feed_forward(x))).reshape(2,6),2))

[[3.   2.   5.   1.   6.   5.8 ]
 [3.01 2.11 5.13 0.9  6.15 5.65]]


Close enough<br>

Now, let's see if we can understand what the network is doing

In [8]:
l_input.weights

array([[-1.25140598,  1.21598337, -1.26341707,  1.25633488],
       [-1.2935384 ,  1.27645959,  1.28083919, -1.30006289]])

In [9]:
l_input.bias

array([-0.21796951, -0.04613101, -0.03085759, -0.02845642])

In [10]:
hidden_1.bias

array([-0.0298016 , -0.02095525, -0.05075685, -0.00884635,  0.11241052,
       -0.05587641])

Note: the biases are fairly small and can be ignored for a simple interpretation

In [11]:
hidden_1.weights[:,0]

array([-0.40829963,  0.42108915, -0.39750749,  0.39495901])

In [12]:
hidden_1.weights[:,1]

array([-0.40710125,  0.41862725,  0.38573531, -0.38656324])

As a first approximation, the network reproduces each input (outputs 0 and 1) using 2 hidden neurons when the input is positive, **or** 2 other neurons when it is negative (because of the ReLU). Each active neuron scales the input by ~1.25 and its outgoing weight scales it again by ~0.4, giving ~0.5 per neuron, so the two contributions add up to ~1 times the input at the matching output.
<br>
Notice how the signs of the last two entries are flipped between input 0 and input 1 (and likewise between outputs 0 and 1), so that each input's contributions cancel at the other input's output instead of adding up.
<br>
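
The arithmetic above can be checked directly against the printed weights. This is a small sketch that ignores the biases and sets input 1 to zero. The variable names (`W1`, `w_out0`, `gain_pos`, and so on) are introduced here for illustration only.

```python
import numpy as np

# Copied from the printed l_input.weights (rows: inputs, columns: hidden neurons)
W1 = np.array([[-1.25140598,  1.21598337, -1.26341707,  1.25633488],
               [-1.2935384 ,  1.27645959,  1.28083919, -1.30006289]])
# Copied from the printed hidden_1.weights[:, 0] and hidden_1.weights[:, 1]
w_out0 = np.array([-0.40829963,  0.42108915, -0.39750749,  0.39495901])
w_out1 = np.array([-0.40710125,  0.41862725,  0.38573531, -0.38656324])

# A positive input 0 only passes the ReLU on hidden neurons whose
# first-layer weight is positive (neurons 1 and 3); a negative input 0
# passes on the other two (neurons 0 and 2).
pos = W1[0] > 0
gain_pos = np.sum(W1[0, pos] * w_out0[pos])    # input 0 -> output 0 when input 0 > 0
gain_neg = np.sum(W1[0, ~pos] * w_out0[~pos])  # input 0 -> output 0 when input 0 < 0
leak = np.sum(W1[0, pos] * w_out1[pos])        # input 0 -> output 1 when input 0 > 0

print(round(gain_pos, 3), round(gain_neg, 3), round(leak, 3))
```

Both gains come out close to 1 and the leak into output 1 is close to 0, which is the cancellation described above.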

Next, let's try to track the sum

In [13]:
hidden_1.weights[:,2]

array([-0.81540088,  0.8397164 , -0.01177218,  0.00839577])

Here it uses the same principle: the second weight applies when both inputs are positive (hidden neuron 1 is active) and the first weight when both are negative (hidden neuron 0 is active). In both cases the neuron already holds ~1.25 times the sum of the inputs, which is multiplied once by ~0.8 instead of twice by ~0.4. The last two weights are very small because their neurons carry the values where the +'s and -'s do not build up (see above).
<br>
We would expect the difference between the inputs to be the mirror image of this.

In [14]:
hidden_1.weights[:,3]

array([-0.00119837,  0.00246191, -0.78324281,  0.78152224])

Indeed<br>

<br>
Let's skip the multiplication for now and look at the weighted sum, with 3 and -1.6 as weights.

In [15]:
hidden_1.weights[:,5]

array([-0.57353687,  0.59346387, -1.80969899,  1.8033782 ])

Recall that input 0 is represented by ~1.25 on neurons 1 and 3; multiplying these by ~0.6 and ~1.8 gives 3 ($1.25\times0.6 + 1.25\times1.8$). Input 1, however, is represented by ~1.25 on neuron 1 but by ~-1.25 on neuron 3 (which is compensated by a negative outgoing weight on the way to output 1, see above).<br>
Thus, we get $1.25\times0.6 - 1.25\times1.8 = -1.5$, close enough to the target of -1.6.
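
The same estimate can be repeated with the unrounded printed weights. The sketch assumes the regime used throughout this analysis (both inputs positive, input 0 larger than input 1, biases ignored), in which hidden neurons 1 and 3 are the active ones. The names `W1`, `w_out5` and `coef` are illustrative.

```python
import numpy as np

# Copied from the printed l_input.weights (rows: inputs, columns: hidden neurons)
W1 = np.array([[-1.25140598,  1.21598337, -1.26341707,  1.25633488],
               [-1.2935384 ,  1.27645959,  1.28083919, -1.30006289]])
# Copied from the printed hidden_1.weights[:, 5]
w_out5 = np.array([-0.57353687,  0.59346387, -1.80969899,  1.8033782 ])

# Hidden neurons assumed active in this regime
active = [1, 3]

# Effective coefficient of each input in output 5
coef = W1[:, active] @ w_out5[active]
print(np.round(coef, 3))
```

With the unrounded weights the second coefficient lands nearer to -1.6 than the rounded estimate of -1.5 suggests.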

And finally, the multiplication:

In [16]:
hidden_1.weights[:,4]

array([ 1.30378528,  1.207875  , -1.22731348, -1.22660108])

Actually, if we look carefully, the multiplication doesn't really work. The close approximation it gives is roughly 3 times input 1 (the second input), which happens to be the correct answer in our example, but just by chance.

In [17]:
l_input.feed_forward((4,1))[4]

2.9389936740445544

In [18]:
l_input.feed_forward((1,0.33))[4]

1.0543614181174115

**Important note**: All this analysis is a first approximation, under certain constraints that were not explicitly taken into consideration (e.g. input 0 > input 1, and both inputs being positive).
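
To make these constraints explicit, the forward pass can be rebuilt in plain NumPy from the weights and biases printed above. This is a sketch, not the `neural_network` module itself: it assumes the network computes `ReLU(x @ W1 + b1) @ W2 + b2`, which reproduces the printed outputs. The names `forward` and `effective_linear_map` are hypothetical.

```python
import numpy as np

# Printed l_input.weights and l_input.bias
W1 = np.array([[-1.25140598,  1.21598337, -1.26341707,  1.25633488],
               [-1.2935384 ,  1.27645959,  1.28083919, -1.30006289]])
b1 = np.array([-0.21796951, -0.04613101, -0.03085759, -0.02845642])
# Columns assembled from the printed hidden_1.weights[:, k] for k = 0..5
W2 = np.column_stack([
    [-0.40829963,  0.42108915, -0.39750749,  0.39495901],
    [-0.40710125,  0.41862725,  0.38573531, -0.38656324],
    [-0.81540088,  0.8397164 , -0.01177218,  0.00839577],
    [-0.00119837,  0.00246191, -0.78324281,  0.78152224],
    [ 1.30378528,  1.207875  , -1.22731348, -1.22660108],
    [-0.57353687,  0.59346387, -1.80969899,  1.8033782 ],
])
# Printed hidden_1.bias
b2 = np.array([-0.0298016 , -0.02095525, -0.05075685,
               -0.00884635,  0.11241052, -0.05587641])

def forward(x):
    """Assumed forward pass: one ReLU hidden layer, linear output."""
    hidden = np.maximum(0.0, np.asarray(x, dtype=float) @ W1 + b1)
    return hidden @ W2 + b2

def effective_linear_map(x):
    """Slope (2x6) and offset (6,) of the linear map the network applies
    in the region around x, i.e. for the hidden neurons active at x."""
    mask = (np.asarray(x, dtype=float) @ W1 + b1) > 0
    slope = W1[:, mask] @ W2[mask, :]
    offset = b1[mask] @ W2[mask, :] + b2
    return slope, offset

print(np.round(forward([3, 2]), 2))
slope, offset = effective_linear_map([3, 2])
# Column 4 is the "multiplication" output
print(np.round(slope[:, 4], 2), np.round(offset[4], 2))
```

Within one activation region the network is exactly linear, so output 4 can only be a weighted sum of the two inputs there. Around the point (3, 2) its coefficient on input 0 is close to zero and its coefficient on input 1 is close to 3, which is why the product looked right for that example.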

### Conclusion

This simple network performs linear operations well, but doesn't perform very well on non-linear operations. This is most likely because it is too simple (not enough neurons or layers) - however, it would have been impossible to manually analyze the results of a much more complex network.<br>
All in all, I think this demonstration is instructive and helpful.