# Neural Networks

This material is based on the machine learning class at https://github.com/jyurko/INFSCI_2595_Fall_2019

A neural network is a series of functional transformations, where the output is modeled via a set of unobserved variables called hidden units.

Each hidden unit is a linear combination of the inputs that is then transformed by a non-linear function.

For the first hidden unit, the linear combination

$$\beta_{0,1} + \sum_{d=1}^{D}x_d\beta_{d,1}$$

passes through a non-linear function $g$:

$$h_1 = g(\beta_{0,1} + \sum_{d=1}^{D}x_d\beta_{d,1})$$

where $D$ is the number of inputs ($x_1, x_2, ..., x_D$). The parameters $\beta_{d,k}$ are known as the weights, and $\beta_{0,k}$ as the bias, of hidden unit $k$. Hidden units are also commonly called neurons.
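As a minimal sketch of the formula above, the snippet below computes a single hidden unit for one input vector. The function name `hidden_unit`, the numeric values, and the use of the logistic function for $g$ are assumptions made for illustration.

```python
import numpy as np

def hidden_unit(x, beta, beta0, g):
    """Compute one hidden unit: g(beta0 + sum_d x_d * beta_d)."""
    eta = beta0 + np.dot(x, beta)   # linear combination of the inputs
    return g(eta)                   # non-linear transformation

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

# Hypothetical values for D = 3 inputs
x = np.array([0.5, -1.0, 2.0])
beta = np.array([0.1, 0.2, 0.3])    # weights beta_{1,1}, ..., beta_{3,1}
beta0 = 0.0                         # bias beta_{0,1}

h1 = hidden_unit(x, beta, beta0, g=logistic)
print(h1)
```

Passing `g` as an argument keeps the sketch independent of any particular activation function.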

The output $f$ is a linear combination of the $K$ hidden units:

$$f(\mathbf{x}) = \alpha_0 + \sum_{k=1}^{K} \alpha_kh_k$$

The design matrix $\mathbf{X}$ is of size $N \times (D+1)$, where $N$ is the number of training points. Its columns are the $D$ input variables plus an intercept column of ones.

The linear combination of the inputs for the $k$-th hidden unit, with $\boldsymbol\beta_k$ the $(D+1) \times 1$ vector holding its bias and weights, is:

$$\eta_k = \mathbf{X}\boldsymbol\beta_k$$
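The following sketch builds a small design matrix and evaluates $\eta_k$ for every training point at once. The data values and coefficients are made up, and placing the intercept column first (so that it lines up with the bias) is an assumption.

```python
import numpy as np

# Hypothetical inputs: N = 3 training points, D = 2 variables
inputs = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])

# Design matrix: prepend the intercept column of ones, shape N x (D + 1)
X = np.column_stack([np.ones(inputs.shape[0]), inputs])

# Coefficients of the k-th hidden unit: bias first, then the D weights
beta_k = np.array([0.5, 0.1, -0.2])

eta_k = X @ beta_k   # one linear predictor per training point, shape (N,)
print(X.shape)
print(eta_k)
```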

The non-linear transformation is also called the activation function. Common choices include the ReLU function and the sigmoid function. We will use the logistic function, which is a type of sigmoid:

$$g(u) = \frac{\exp(u)}{\exp(u)+1} = logit^{-1}(u)$$
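Dividing the numerator and denominator by $\exp(u)$ gives the equivalent form $1 / (1 + \exp(-u))$. The sketch below checks that the two forms agree; the helper name `inv_logit` is an assumption. The second form also avoids the `inf / inf` result that the first produces for large positive `u`.

```python
import numpy as np

def inv_logit(u):
    """Logistic function, exp(u) / (exp(u) + 1), written as 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + np.exp(-u))

u = np.array([-2.0, 0.0, 2.0])
direct = np.exp(u) / (np.exp(u) + 1)      # the form used in the equation above

print(np.allclose(inv_logit(u), direct))  # True: the two forms agree
print(inv_logit(0.0))                     # 0.5 at u = 0
```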

The k-th hidden unit can be written as

$$h_k = logit^{-1}(\eta_k)$$

and in matrix form, with one column of $\mathbf{B}$ per hidden unit:

$$\mathbf{H} = logit^{-1}(\mathbf{X}_{(N \times (D+1))}\mathbf{B}_{((D+1) \times K)})$$

The output layer is then:

$$\mathbf{f} = \alpha_0 + \mathbf{H}_{(N \times K)} \boldsymbol \alpha_{(K \times 1)}$$
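The whole forward pass can be sketched in a few lines of NumPy. Here `K` denotes the number of hidden units; the sizes, the random seed, and the normally distributed coefficients are arbitrary choices for illustration.

```python
import numpy as np

def inv_logit(u):
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(0)   # arbitrary seed for the illustration
N, D, K = 5, 3, 4                # hypothetical sizes

X = np.column_stack([np.ones(N), rng.random((N, D))])  # N x (D + 1)
B = rng.normal(size=(D + 1, K))                        # (D + 1) x K, one column per hidden unit
alpha0 = 0.1                                           # output bias
alpha = rng.normal(size=(K, 1))                        # K x 1 output weights

H = inv_logit(X @ B)      # hidden layer, N x K
f = alpha0 + H @ alpha    # output, N x 1
print(H.shape, f.shape)   # (5, 4) (5, 1)
```

Checking the shapes this way is a quick guard against transposed weight matrices.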

# Creating functions

In [60]:
import numpy as np

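class Layer():
    """
    A single fully connected layer with a logistic activation.
    """

    def __init__(self, input_size, output_size):
        # weights and bias start as uniform random draws in [0, 1)
        self.weights = np.random.rand(input_size, output_size)
        self.bias = np.random.rand(1, output_size)
        self.input = None
        self.output = None
        self.eta = None

    @staticmethod
    def activation(x):
        # logistic function: exp(x) / (exp(x) + 1)
        g = np.exp(x) / (np.exp(x) + 1)
        return g

    def forward_propagation(self, input_data):
        self.input = input_data
        # linear combination of the inputs, then the non-linear activation
        self.eta = np.dot(self.input, self.weights) + self.bias
        self.output = self.activation(self.eta)
        return self.output

    def backward_propagation(self, output_error, learning_rate):
        # simplified update: the scaled output error is subtracted directly
        # from every row of the weights (by broadcasting) and from the bias;
        # the chain rule through the activation and the inputs is not applied
        self.step_size = output_error * learning_rate
        self.weights = self.weights - self.step_size
        self.bias = self.bias - self.step_size
        return self.weights, self.bias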

In [62]:
np.random.seed(123)
dat = np.random.rand(1,4)
y = np.square(dat)  # target: the element-wise square of the input

H = Layer(input_size=4, output_size=4)

output = H.forward_propagation(input_data=dat)

error = np.mean(np.square(y - output))  # mean squared error
de_dy = (2/4)*(y - output)  # note: this is the negative of d(error)/d(output)

new_weight, new_bias = H.backward_propagation(output_error=de_dy, learning_rate=0.1)

In [63]:
new_weight

array([[0.73589269, 0.45885402, 1.02162361, 0.71227537],
       [0.49735563, 0.42786508, 0.38403743, 0.75649534],
       [0.45499597, 0.09542546, 0.43890367, 0.76544104],
       [0.19891546, 0.21119932, 0.57241079, 0.55927322]])

In [64]:
new_bias

array([[0.65082468, 0.88517936, 0.76531474, 0.63846914]])

In [66]:
H.weights = new_weight  # already set inside backward_propagation
H.bias = new_bias

output2 = H.forward_propagation(input_data=output)  # the first pass's output is fed back in as the input
output2

array([[0.9012055 , 0.86557953, 0.94113844, 0.95085632]])

In [67]:
error2 = np.mean(np.square(y - output2))
de_dy2 = (2/4)*(y - output2)
de_dy2

array([[-0.20806809, -0.39185191, -0.44483843, -0.32345417]])

In [68]:
de_dy - de_dy2

array([[0.04383084, 0.03437626, 0.03624431, 0.04899785]])
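The update in `backward_propagation` subtracts the scaled output error from the weights directly. For comparison, a gradient-descent step that follows the chain rule through the logistic activation might look like the sketch below. The names `gradient_step` and `mse` are hypothetical and not part of the `Layer` class; the sketch assumes a mean squared error loss and uses the standard logistic derivative $g'(\eta) = g(\eta)(1 - g(\eta))$.

```python
import numpy as np

def inv_logit(u):
    return 1.0 / (1.0 + np.exp(-u))

def gradient_step(weights, bias, x, y, learning_rate):
    """One gradient-descent step on the mean squared error of a single logistic layer."""
    eta = x @ weights + bias
    out = inv_logit(eta)
    d_out = 2.0 * (out - y) / y.size            # d(error)/d(output)
    d_eta = d_out * out * (1.0 - out)           # chain rule through the logistic
    d_weights = x.T @ d_eta                     # d(error)/d(weights)
    d_bias = d_eta.sum(axis=0, keepdims=True)   # d(error)/d(bias)
    return weights - learning_rate * d_weights, bias - learning_rate * d_bias

def mse(weights, bias, x, y):
    return np.mean(np.square(y - inv_logit(x @ weights + bias)))

# Same kind of toy data as in the cells above: one point with four inputs
np.random.seed(123)
x = np.random.rand(1, 4)
y = np.square(x)
weights = np.random.rand(4, 4)
bias = np.random.rand(1, 4)

before = mse(weights, bias, x, y)
weights, bias = gradient_step(weights, bias, x, y, learning_rate=0.1)
after = mse(weights, bias, x, y)
print(before, after)   # the error after the step should be the smaller of the two
```

Unlike the simplified update, each weight here moves in proportion to the input it multiplies, and the step direction is the negative gradient, so a small enough learning rate reduces the error.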