### Forward Propagation

In [1]:
import numpy as np
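def feed_forward(inputs, outputs, weights):
    # weights = [hidden weights, hidden biases, output weights, output biases]
    # Hidden layer pre-activation: matrix multiplication of inputs and weights, plus bias
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    # Pass through the sigmoid activation function
    hidden = 1 / (1 + np.exp(-pre_hidden))
    # Output layer (linear, no activation applied)
    pred_out = np.dot(hidden, weights[2]) + weights[3]
    # Return the mean squared error between predictions and targets
    mean_squared_error = np.mean(np.square(pred_out - outputs))
    return mean_squared_error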
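
As a quick check of the forward pass, the sketch below runs the same computation on a tiny made-up dataset. The data, the 2-3-1 layer sizes, and the random seed are all assumptions chosen for illustration; only the function body mirrors the cell above.

```python
import numpy as np

def feed_forward(inputs, outputs, weights):
    # Same computation as the cell above, repeated so this block runs on its own
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    hidden = 1 / (1 + np.exp(-pre_hidden))
    pred_out = np.dot(hidden, weights[2]) + weights[3]
    return np.mean(np.square(pred_out - outputs))

# Hypothetical toy data: 2 samples, 2 input features, 1 target value each
x = np.array([[1.0, 1.0], [0.0, 1.0]])
y = np.array([[2.0], [1.0]])

# Hypothetical architecture: 2 inputs -> 3 hidden units -> 1 output
rng = np.random.default_rng(0)
weights = [
    rng.normal(size=(2, 3)),  # input-to-hidden weights
    np.zeros(3),              # hidden biases
    rng.normal(size=(3, 1)),  # hidden-to-output weights
    np.zeros(1),              # output bias
]

loss = feed_forward(x, y, weights)
print(loss)  # a single non-negative number: the MSE for these random weights
```

The weights list must follow the order the function indexes it in: hidden weights, hidden biases, output weights, output biases.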

### Activation Function Examples

In [3]:
def tanh(x):
    # Same as (e^x - e^-x) / (e^x + e^-x), but np.tanh does not overflow for large |x|
    return np.tanh(x)

In [4]:
def relu(x):
    return np.where(x>0, x, 0)

In [5]:
def linear(x):
    return x

In [6]:
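def softmax(x):
    # Subtract the max before exponentiating; the result is unchanged but exp() cannot overflow
    e = np.exp(x - np.max(x))
    return e / np.sum(e)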
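
Softmax is commonly computed after shifting the inputs by their maximum, because exponentiating large values directly overflows. The sketch below compares the two variants; the names `softmax_naive` and `softmax_stable` are hypothetical and used only for this illustration.

```python
import numpy as np

def softmax_naive(x):
    # Direct formula, kept as a baseline for comparison
    return np.exp(x) / np.sum(np.exp(x))

def softmax_stable(x):
    # Shifting by the max leaves the result unchanged but keeps exp() bounded
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

small = np.array([1.0, 2.0, 3.0])
large = np.array([1000.0, 1001.0, 1002.0])

# For moderate inputs the two versions agree
print(np.allclose(softmax_naive(small), softmax_stable(small)))  # True

# For large inputs exp() overflows to inf, and inf / inf is nan
with np.errstate(over="ignore", invalid="ignore"):
    print(np.isnan(softmax_naive(large)).all())  # True

# The shifted version still returns valid probabilities
print(softmax_stable(large))
```

Because `large` is just `small` shifted by a constant, the stable version returns the same probabilities for both inputs.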

### Loss Function

<b>Mean squared error:</b> The mean squared error is the average of the squared differences between the actual and predicted values of the output. We square each error because an error can be positive or negative (the predicted value may be greater or smaller than the actual value); squaring ensures that positive and negative errors do not offset each other. We then take the mean of the squared errors so that the loss over two datasets remains comparable when the datasets are not the same size.

The mean squared error is typically used when trying to predict a value that
is continuous in nature.

In [7]:
def mse(p, y):
    return np.mean((p-y)**2)
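
To see why the errors are squared before averaging, here is a small sketch with made-up predictions whose raw errors cancel exactly. The arrays are assumptions chosen for illustration.

```python
import numpy as np

def mse(p, y):
    return np.mean((p - y) ** 2)

# Hypothetical values: one prediction is 2 too high, the other 2 too low
y = np.array([1.0, 1.0])
p = np.array([3.0, -1.0])

print(np.mean(p - y))  # 0.0: the raw errors +2 and -2 cancel out
print(mse(p, y))       # 4.0: the squared errors do not cancel
```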

<b>Mean absolute error:</b> The mean absolute error works in a manner that is
very similar to the mean squared error. The mean absolute error ensures
that positive and negative errors do not offset each other by taking an
average of the absolute difference between the actual and predicted values
across all data points.

Like the mean squared error, the mean absolute error is generally
employed on continuous variables. In general, the mean absolute error is a
preferable loss function when the outputs to predict have a magnitude less
than 1. In that range the mean squared error shrinks the loss considerably,
because the square of a nonzero number between -1 and 1 is smaller in
magnitude than the number itself.

In [None]:
def mae(p, y):
    return np.mean(np.abs(p-y))
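
The shrinking effect of squaring small errors can be checked with a short sketch. The values below are made up so that every error has a magnitude below 1.

```python
import numpy as np

def mae(p, y):
    return np.mean(np.abs(p - y))

def mse(p, y):
    return np.mean((p - y) ** 2)

# Hypothetical targets and predictions with errors of 0.5 and 0.25
y = np.array([0.0, 0.0])
p = np.array([0.5, 0.25])

print(mae(p, y))  # 0.375
print(mse(p, y))  # 0.15625: squaring errors below 1 makes the loss smaller
```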

<b>Binary cross-entropy: </b>Cross-entropy is a measure of the difference between
two different distributions: actual and predicted. Binary cross-entropy is
applied to binary output data, unlike the previous two loss functions that
we discussed (which are applied during continuous variable prediction).

Note that binary cross-entropy loss has a high value when the predicted
value is far away from the actual value and a low value when the predicted
and actual values are close.

In [8]:
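def binary_cross_entropy(p, y):
    # Clip predictions away from 0 and 1 so np.log never receives 0 (1e-7 is an arbitrary small margin)
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))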
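
The behaviour described above, a low loss for close predictions and a high loss for distant ones, can be seen in a minimal sketch. The labels and probabilities are made up, and the predictions are assumed to lie strictly between 0 and 1.

```python
import numpy as np

def binary_cross_entropy(p, y):
    # Assumes every value in p is strictly between 0 and 1
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical binary labels
y = np.array([1.0, 0.0])

# Predictions close to the labels, and predictions far from them
p_close = np.array([0.9, 0.1])
p_far = np.array([0.1, 0.9])

loss_close = binary_cross_entropy(p_close, y)
loss_far = binary_cross_entropy(p_far, y)

print(loss_close)  # about 0.105, which is -ln(0.9)
print(loss_far)    # about 2.303, which is -ln(0.1)
```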

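<b>Categorical cross-entropy:</b> Categorical cross-entropy extends binary
cross-entropy to more than two classes. Here p is a 2D array with one row of
predicted class probabilities per sample, and y is an array holding the
integer index of the actual class for each sample. The loss is the mean
negative log of the probability assigned to the actual class, implemented as
follows: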

In [11]:
def categorical_cross_entropy(p, y):
    return -np.mean(np.log(p[np.arange(len(y)), y]))
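
The indexing expression `p[np.arange(len(y)), y]` picks, for each row, the probability assigned to that row's actual class. A small sketch with made-up probabilities for 2 samples and 3 classes:

```python
import numpy as np

def categorical_cross_entropy(p, y):
    return -np.mean(np.log(p[np.arange(len(y)), y]))

# Hypothetical predicted probabilities: one row per sample, one column per class
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])

# Integer class index of the actual class for each sample
y = np.array([0, 1])

# Probability assigned to the actual class in each row
picked = p[np.arange(len(y)), y]
print(picked)  # [0.7 0.8]

loss = categorical_cross_entropy(p, y)
print(loss)  # about 0.290, the mean of -ln(0.7) and -ln(0.8)
```

Note that y holds class indices here, not one-hot vectors; one-hot labels would first need converting, for example with `np.argmax(y, axis=1)`.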