# Artificial Neural Network

(the foundation of deep learning)

In a human brain, neurons communicate by sending signals to each other through complex connections known as networks.

An artificial neural network (ANN) applies the same principle: it uses algorithms to simulate the learning process of a human brain.

The most common ANN structure consists of 3 components: <br>
(1) An input layer; <br>
(2) One or more hidden layers; and <br>
(3) An output layer.

Building an artificial neural network system is simply setting up these 3 layers in 3 steps.

## Step 1: Define Input Layers (Neuron = Input + Weight)

An input layer is simply our regression, consisting of **inputs**/signals/independent variables {$x_{1,t}, x_{2,t}$, ...} and their corresponding **weights**/parameters/coefficients {$\beta_1, \beta_2$, ...}.


Together, they form a **neuron/regression**:
$\beta_1 x_{1,t} + \beta_2 x_{2,t} + \dots$

Positive weights activate the neuron (= increase the dependent variable’s values) while negative weights inhibit it (= decrease the dependent variable’s values). The net signal is sent to the hidden layer for processing.
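
As a minimal sketch of the net signal, assuming two hypothetical inputs and weights (the values are illustrative and not taken from any dataset):

```python
import numpy as np

# Hypothetical inputs x_{1,t}, x_{2,t} and weights beta_1, beta_2 (illustrative values)
x = np.array([0.5, 2.0])
beta = np.array([0.8, -0.3])

# Net signal: beta_1 * x_1 + beta_2 * x_2
net_signal = np.dot(beta, x)

# Per-input contributions: the positive weight pushes the signal up,
# the negative weight pulls it down
contributions = beta * x
print(net_signal, contributions)
```

Here the negative weight dominates, so the net signal passed on to the hidden layer is negative.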

## Step 2: Set up Hidden Layers (Transfer function + Bias)

A hidden layer consists of a **transfer function** that squashes the unbounded net signal, which lies between $-\infty$ and $\infty$, into a bounded output inside (0, 1) or (–1, 1).

A transfer function $f(x)$ acts as a **filter**: when the net signal reaches a critical point, the output switches to a different state.

In practice, we usually use a **sigmoid** (S-shaped) function as the transfer function. A common sigmoid function is the **logistic function**.
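
For reference, the standard logistic function and its limiting values can be written as:

$$f(x) = \frac{1}{1 + e^{-x}}, \qquad \lim_{x \to -\infty} f(x) = 0, \quad f(0) = \tfrac{1}{2}, \quad \lim_{x \to \infty} f(x) = 1$$

So any net signal, however large or small, is mapped to a value strictly between 0 and 1.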

In neural networks, $\beta_0$ is not called intercept but **bias**.

Its role is the same as in a regression: it shifts the neurons so that they centre around the mean signals.

If we forecast a variable that takes both positive and negative values (e.g. returns), the logistic function is not appropriate because its output is bounded inside (0, 1). An alternative sigmoid function that squashes signals inside (–1, 1) is the **hyperbolic tangent (tanh)**.
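
The difference between the two bounds can be checked numerically. The sketch below uses an illustrative set of net signals; the helper name `logistic` is an assumption for this example.

```python
import numpy as np

def logistic(x):
    """Logistic sigmoid: output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Net signals spanning negative and positive values (illustrative)
signals = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])

logistic_out = logistic(signals)  # always positive, so it cannot represent negative returns
tanh_out = np.tanh(signals)       # keeps the sign of the net signal, bounded in (-1, 1)

print(logistic_out.min(), logistic_out.max())
print(tanh_out.min(), tanh_out.max())
```

The two functions are closely related: tanh is a rescaled logistic function, $\tanh(x) = 2f(2x) - 1$ with $f$ the logistic function.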

Once a transfer function has been selected (say using tanh), we need to determine the number of squashers and the number of layers in our network.

The more transfer functions and layers we add to the network, the more closely it can fit the data in-sample. However, like ARIMA modelling, adding too many squashers and layers can result in **over-fitting**, leading to bad out-of-sample forecasts.

Unlike ARIMA modelling, there are no ‘rules’ to determine the number of squashers and layers. Typically, they are selected by experiment.


## Step 3: Mapping Processed Signals into Output Layer

An output layer consists of a target/dependent variable $y_t$.

Recall that the squashers squeeze the unbounded signals ($-\infty, \infty$) into bounded processed signals inside (–1, 1). The architecture/specification for sending the processed signals into the output layer depends on whether we want our outputs/forecasts to be bounded or unbounded.
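
One way to picture the two choices is a single hidden layer of tanh squashers whose processed signals are combined linearly (unbounded output) or squashed once more (bounded output). This is a sketch under assumptions: the function name `forward`, the `bounded` flag, and all parameter values are hypothetical.

```python
import numpy as np

def forward(x, W_hidden, b_hidden, w_out, b_out, bounded=False):
    """One hidden layer of tanh squashers feeding a single output."""
    # Each hidden unit squashes its own net signal into (-1, 1)
    hidden = np.tanh(W_hidden @ x + b_hidden)
    # Linear combination of the processed signals
    out = w_out @ hidden + b_out
    # Optionally squash again so the forecast itself stays inside (-1, 1)
    return np.tanh(out) if bounded else out

# Hypothetical parameters: 2 inputs, 3 squashers (illustrative values)
x = np.array([1.0, -2.0])
W_hidden = np.array([[0.5, -0.4], [1.2, 0.3], [-0.7, 0.9]])
b_hidden = np.array([0.1, 0.0, -0.2])
w_out = np.array([2.0, -1.5, 3.0])
b_out = 0.5

y_unbounded = forward(x, W_hidden, b_hidden, w_out, b_out)
y_bounded = forward(x, W_hidden, b_hidden, w_out, b_out, bounded=True)
print(y_unbounded, y_bounded)
```

With a linear output, the output weights can scale the forecast beyond (–1, 1); with a second squash, the forecast itself stays inside that range.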


## Methods to Improve the Forecast Performance of an ANN

The two ANNs in this example are far too simple. We can substantially improve the forecast performance by:
- Adding more **neurons** (= adding variables like bond yields, USD swap rates)
- Increasing the network’s memory (= including more lags)
- **Training the network harder** (= **adding more hidden layers**)
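
Increasing the network's memory amounts to feeding lagged values of a series as extra inputs. A minimal sketch of building such inputs follows; the helper name `make_lagged_inputs` and the series values are hypothetical.

```python
import numpy as np

def make_lagged_inputs(series, n_lags):
    """Return (X, y): each row of X holds the n_lags values preceding the matching y."""
    series = np.asarray(series, dtype=float)
    # Column k holds the series lagged by k + 1 periods
    X = np.column_stack(
        [series[n_lags - k - 1 : len(series) - k - 1] for k in range(n_lags)]
    )
    y = series[n_lags:]
    return X, y

# Hypothetical series (illustrative values only)
returns = [0.01, -0.02, 0.015, 0.03, -0.01, 0.005]
X, y = make_lagged_inputs(returns, n_lags=2)
print(X.shape, y.shape)
```

Each additional lag adds one input column but also costs one usable observation at the start of the sample.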

While a complex ANN structure can theoretically make it an extremely powerful forecasting machine, empirically it is subject to the following two limitations:
- Finding a good set of starting values becomes increasingly difficult as the complexity increases.
- In a noisy environment like the capital market, a complex ANN model can over-react (= **over-fit**), resulting in poor financial decisions.


The code below follows the tutorial at https://realpython.com/python-ai-neural-network/

In [1]:
input_vector = [1.72, 1.23]
weights_1 = [1.26, 0]
weights_2 = [2.17, 0.32]

In [3]:
# Computing the dot product of input_vector and weights_1
first_indexes_mult = input_vector[0]*weights_1[0]
second_indexes_mult = input_vector[1]*weights_1[1]
dot_product_0 = first_indexes_mult + second_indexes_mult
print(f"The dot product is:{dot_product_0}")

The dot product is:2.1672


In [1]:
import numpy as np

In [5]:
dot_product_1 = np.dot(input_vector, weights_1)
print(f"The dot product is:{dot_product_1}")

The dot product is:2.1672


In [6]:
dot_product_2 = np.dot(input_vector, weights_2)
print(f"The dot product is:{dot_product_2}")

The dot product is:4.1259999999999994


In [18]:
input_vector = np.array([1.66, 1.56])
weights_1 = np.array([1.45, -0.66])
bias = np.array([0.0])

In [14]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

In [15]:
def make_prediction(input_vector, weights, bias):
    # Net signal: weighted sum of the inputs plus the bias
    layer_1 = np.dot(input_vector, weights) + bias
    # Squash the net signal into (0, 1)
    layer_2 = sigmoid(layer_1)
    return layer_2

In [16]:
prediction = make_prediction(input_vector, weights_1, bias)

In [17]:
print(f"The prediction result is: {prediction}")

The prediction result is: [0.7985731]


In [20]:
input_vector = np.array([2, 1.5])

In [21]:
prediction = make_prediction(input_vector, weights_1, bias)

In [22]:
print(f"The prediction result is: {prediction}")

The prediction result is: [0.87101915]


## Train the Neural Network

In [23]:
target = 0
mse = np.square(prediction - target)

In [26]:
print(f"Prediction:{prediction}; Error:{mse}")

Prediction:[0.87101915]; Error:[0.75867436]


Remember that the error expression is error = np.square(prediction - target). When you treat (prediction - target) as a single variable x, the error is x squared and its derivative is 2 * x. The derivative tells you in which direction to change x to bring the error towards zero.

When it comes to your neural network, the derivative will tell you the direction you should take to update the weights variable. If it’s a positive number, then you predicted too high, and you need to decrease the weights. If it’s a negative number, then you predicted too low, and you need to increase the weights.

In [32]:
derivative = 2 * (prediction - target)

In [33]:
print(f"The derivative is {derivative}")

The derivative is [1.7420383]


In [34]:
# Updating the weights
weights_1 = weights_1 - derivative

In [35]:
prediction = make_prediction(input_vector, weights_1, bias)

In [36]:
error = (prediction - target)**2

In [37]:
print(f"Prediction: {prediction}; Error: {error}")

Prediction: [0.01496248]; Error: [0.00022388]
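
The update above subtracts the derivative of the error with respect to the prediction directly from both weights. A fuller gradient would also account for how the prediction depends on each weight through the sigmoid and the dot product (the chain rule), and would usually be scaled by a step size. The following is a minimal sketch of that idea. The `learning_rate` parameter and the helper names `sigmoid_deriv` and `gradient_step` are assumptions for illustration and are not part of the cells above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_deriv(x):
    # Derivative of the logistic function: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1 - s)

def gradient_step(input_vector, weights, bias, target, learning_rate=0.1):
    """One gradient-descent update of the weights for the squared error."""
    layer_1 = np.dot(input_vector, weights) + bias
    prediction = sigmoid(layer_1)
    # Chain rule: dE/dw = dE/dprediction * dprediction/dlayer_1 * dlayer_1/dw
    derror_dprediction = 2 * (prediction - target)
    dprediction_dlayer1 = sigmoid_deriv(layer_1)
    dlayer1_dweights = input_vector
    gradient = derror_dprediction * dprediction_dlayer1 * dlayer1_dweights
    # Move against the gradient, scaled by the (hypothetical) learning rate
    return weights - learning_rate * gradient

input_vector = np.array([2, 1.5])
weights = np.array([1.45, -0.66])
bias = np.array([0.0])
target = 0

error_before = (sigmoid(np.dot(input_vector, weights) + bias) - target) ** 2
new_weights = gradient_step(input_vector, weights, bias, target)
error_after = (sigmoid(np.dot(input_vector, new_weights) + bias) - target) ** 2
print(error_before, error_after)
```

A single small step only reduces the error slightly; in practice the step is repeated many times, and the learning rate controls the trade-off between speed and overshooting.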
