# 1. Overview

## 1.1. Basic concepts

### Nodes and edges
Going back to familiar binary Logistic Regression, we visualize, let's say, a model trained on the dataset having 3 features and a single label. On the graph, each feature/label ($\mathbf{x}$ or $\mathbf{y}$) is represented by a *node* and each model weight ($w_1,w_2,w_3$) is represented by a *colored edge*. The bias $w_0$ (or sometimes denoted $b$) is not showing on the graph, but we know that it is attached to the output node. This is the most basic architecture of a Neural Network with 4 parameters (3 weights + 1 bias).

<img src='image/mlp_linear_simple.png' style='height:180px; margin:20px auto;'>

### Layers
Now, we extend the problem to a Stacking model, where 5 base models and the meta model are all Linear Regression. Beside an input layer and an output layer, there is a new layer between them, called the *hidden layer*. We can add more hidden layers for multilevel stacking design. By doing this, our Neural Network becomes *deeper* and can capture complicated relationship the our data.

<img src='image/mlp_linear_stacking.png' style='height:300px; margin:20px auto;'>

### Multiple outputs
In the two examples above, the Neural Network is designed for a binary classification problem, where the target is a vector storing the probabilities of being classified to the positive class. For a multi-class classification problem, we need to contruct a vector of probabilities for each class. Below is an example Neural Network architecture with 2 hidden layers for the Iris data which has 4 features and 3 classes.

<img src='image/mlp_iris.png' style='height:300px; margin:20px auto;'>

An important thing to notice is the number of parameters (weights and biases). A large number of parameters leads to high training time. Here are the numbers of parameters for each layer:
- Layer 1: $4\times6=24$ weights and $6$ biases or $30$ in total
- Layer 2: $6\times6=36$ weights and $6$ biases or $42$ in total
- Layer 3: $6\times3=18$ weights and $3$ biases or $21$ in total

### Implementation
Now, let's construct a Neural Network using TensorFlow.

In [1]:
import numpy as np
import pandas as pd

import tensorflow as tf
from tensorflow.keras import layers

In [2]:
model = keras.Sequential()
model.add(layers.Dense(units=6))
model.add(layers.Dense(units=6))
model.add(layers.Dense(units=3, activation='softmax'))
model.compile(loss='categorical_crossentropy')

model.build(input_shape=(None,4))
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 6)                 30        
                                                                 
 dense_1 (Dense)             (None, 6)                 42        
                                                                 
 dense_2 (Dense)             (None, 3)                 21        
                                                                 
Total params: 93
Trainable params: 93
Non-trainable params: 0
_________________________________________________________________


## 1.2. Activation functions


## 1.3. Backpropagation

## 1.4. Ispiration

In [1]:
import numpy as np
import pandas as pd

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

In [5]:
model = keras.Sequential()
model.add(layers.Dense(units=1))
model.compile(tf.optimizers.Adam(learning_rate=0.1), loss='mean_absolute_error')
model.build(input_shape=(None,3))
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_3 (Dense)             (None, 1)                 4         
                                                                 
Total params: 4
Trainable params: 4
Non-trainable params: 0
_________________________________________________________________


In [2]:
x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], dtype=float)

In [14]:
model1.summary()

Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_7 (Dense)             (None, 1)                 2         
                                                                 
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________


In [42]:
normalizer = layers.Normalization(input_shape=(1,), axis=None)
normalizer.adapt(xs)

xs = xs.reshape(-1,1)
ys = ys.reshape(-1,1)
model1 = keras.Sequential()
# model1.add(normalizer)
model1.add(layers.Dense(units=1))

model1.compile(loss='binary_crossentropy')

model1.fit(xs, ys, validation_split=0.2, epochs=100, verbose=0)
model1.predict(xs)

array([[-0.03436869],
       [ 0.9754254 ],
       [ 1.9852195 ],
       [ 2.9950137 ],
       [ 4.0048075 ],
       [ 5.0146017 ]], dtype=float32)

- https://www.tensorflow.org/api_docs/python/tf/keras/layers
- https://www.tensorflow.org/api_docs/python/tf/keras/Model
- https://www.tensorflow.org/api_docs/python/tf/keras/optimizers
- https://www.tensorflow.org/api_docs/python/tf/keras/metrics