# How does a neural net work?

Illustrated with the task of learning to multiply two numbers.

## Data

We generate many rows from the true "model"
$$
 y = f(x_1, x_2) = x_1 x_2,
$$
and want to estimate $f$ by $\hat f$.
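A yardstick helps when reading the training losses. In the simulation below, $x_1$ and $x_2$ are drawn independently and uniformly from $[-10, 10]$. Under that design, $E[y] = 0$ and $\operatorname{Var}(y) = E[x_1^2]\,E[x_2^2] = (100/3)^2 \approx 1111$. A model that always predicts 0 should therefore have an MSE of roughly 1111. The following NumPy sketch checks this by simulation. The seed and sample size are arbitrary choices and differ from the notebook's.

```python
import numpy as np

# Analytic moments for x1, x2 ~ independent Uniform(-10, 10)
second_moment = 20 ** 2 / 12      # E[x^2] = Var(x) = (b - a)^2 / 12 = 100/3
var_y = second_moment ** 2        # Var(x1 * x2) = E[x1^2] E[x2^2], since E[x1 x2] = 0

# Monte Carlo check (seed and sample size are arbitrary)
rng = np.random.default_rng(0)
x = rng.uniform(-10, 10, size=(200_000, 2))
y = x[:, 0] * x[:, 1]

print(f"analytic Var(y):     {var_y:.1f}")
print(f"simulated Var(y):    {y.var():.1f}")
print(f"MSE of predicting 0: {np.mean(y ** 2):.1f}")
```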

In [None]:
# Load modules
import numpy as np
import pandas as pd
import seaborn as sns

import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.simplefilter('ignore')

In [None]:
# Generate n observations
np.random.seed(1901)
n = 1_000_000 # 10_000_000

df = pd.DataFrame(np.random.uniform(-10, 10, size=(n, 2)), 
                  columns=['x1', 'x2'])
df['y'] = df.x1 * df.x2 #+ np.random.normal(scale=1, size=(n, ))

# Visualize
df.hist(bins=100, layout=(1, 3), figsize=(15, 4))
df.head()

# Save first few rows
# df[0:1000].to_excel("Data.xlsx", index=False)

In [None]:
# sns.regplot(x='x1', y='y', data=df, ci=None)

## Step 1: Linear regression as a neural net

We start with a simple neural net that mimics a linear regression with two covariates. Do not expect too much: a product cannot be represented by a linear function (i.e., a weighted sum plus an intercept).
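The four corner points $(\pm 1, \pm 1)$ make the problem easy to see. Their products $1, -1, -1, 1$ form a checkerboard (XOR-like) pattern, and the best least-squares plane through them is flat: the intercept and both slopes are exactly 0. The corner points are chosen purely for illustration in this minimal NumPy sketch.

```python
import numpy as np

# Design matrix with columns [intercept, x1, x2] for the four corners
x1 = np.array([1.0, 1.0, -1.0, -1.0])
x2 = np.array([1.0, -1.0, 1.0, -1.0])
X = np.column_stack([np.ones(4), x1, x2])
y = x1 * x2                      # products: 1, -1, -1, 1

# Least-squares fit y ≈ b + w1*x1 + w2*x2
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, w1, w2:", np.round(coef, 10))
print("fitted values:", np.round(X @ coef, 10))
```

Every fitted value is 0. The best linear fit is therefore no better than always predicting 0.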

In [None]:
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Nadam, Adam

# Plot training and validation loss of the last fit (dropping the first drop_m burn-in epochs)
def plot_history(model, drop_m=0):
    h = pd.DataFrame(model.history.history)
    h['epoch'] = np.arange(len(h.index)) + 1
    h = h.iloc[drop_m:]
    plt.plot(h.epoch, h.loss, label='Training')
    plt.plot(h.epoch, h.val_loss, label='Validation')
    plt.xlabel('Epoch')
    plt.ylabel('Loss (MSE)')
    plt.legend()
    
# Callbacks
cb = [EarlyStopping(patience=10),
      ReduceLROnPlateau(patience=3)]
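
By default, both callbacks monitor the validation loss. `EarlyStopping(patience=10)` stops training once `val_loss` has gone 10 consecutive epochs without improving. `ReduceLROnPlateau(patience=3)` multiplies the learning rate by its default `factor=0.1` after 3 epochs without improvement. The pure-Python sketch below mimics that bookkeeping on a made-up loss sequence. It is a simplification and not Keras's implementation: it ignores `min_delta`, `cooldown` and `min_lr`. `simulate_callbacks` is a hypothetical helper.

```python
def simulate_callbacks(val_losses, lr=0.2, es_patience=10, lr_patience=3, factor=0.1):
    """Simplified early stopping + LR reduction; returns (epochs_run, final_lr)."""
    best = float("inf")
    es_wait = 0   # epochs since the last improvement (early stopping)
    lr_wait = 0   # epochs since the last improvement or the last LR reduction
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, es_wait, lr_wait = loss, 0, 0
        else:
            es_wait += 1
            lr_wait += 1
            if lr_wait >= lr_patience:      # plateau: shrink the learning rate
                lr *= factor
                lr_wait = 0
            if es_wait >= es_patience:      # no progress for too long: stop
                return epoch, lr
    return len(val_losses), lr

# Made-up losses: three improving epochs, then a long plateau
losses = [5.0, 4.0, 3.0] + [3.0] * 12
epochs, final_lr = simulate_callbacks(losses)
print(epochs, final_lr)
```

In this toy run, the learning rate is cut three times, from 0.2 to about 0.0002. Training stops at epoch 13, ten epochs after the last improvement.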

### Model structure

In [None]:
# Structure
inputs = Input(shape=(2, ))
output = Dense(1)(inputs)
neural_net = Model(inputs=inputs, outputs=output)

neural_net.compile(loss='mse', 
                   optimizer=Nadam(learning_rate=0.2))
neural_net.summary()
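
`summary()` reports the number of trainable parameters. A `Dense` layer with $k$ inputs and $m$ units has $k \cdot m$ weights plus $m$ biases. The helper below applies this rule to the layer widths used in the steps of this notebook. The name `count_dense_params` is hypothetical.

```python
def count_dense_params(widths):
    """Total parameters of stacked Dense layers; widths = [inputs, units1, units2, ...]."""
    return sum((k + 1) * m for k, m in zip(widths[:-1], widths[1:]))

print(count_dense_params([2, 1]))          # Step 1: 3
print(count_dense_params([2, 5, 1]))       # Steps 2 and 3: 21
print(count_dense_params([2, 50, 10, 1]))  # Step 4: 671
```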

### Model fit

In [None]:
neural_net.fit(
    df[['x1', 'x2']],
    df['y'],
    batch_size=10000, 
    epochs=100, 
    validation_split=0.2,
    callbacks=cb,
    verbose=2
)
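
With `validation_split=0.2`, Keras holds out the last 20% of the rows for validation, taken before any shuffling, and trains on the rest. Here the rows are i.i.d., so taking the last rows is harmless. Each epoch then consists of about $800\,000 / 10\,000 = 80$ gradient steps. The rough NumPy sketch below reproduces that bookkeeping. The exact rounding at the split boundary is an assumption.

```python
import math
import numpy as np

n, validation_split, batch_size = 1_000_000, 0.2, 10_000

# The last 20% of rows become validation data (assumed split index: floor(n * 0.8))
split_at = int(n * (1 - validation_split))
idx = np.arange(n)
train_idx, val_idx = idx[:split_at], idx[split_at:]

# Each epoch iterates over the training rows in mini-batches
steps_per_epoch = math.ceil(len(train_idx) / batch_size)
print(len(train_idx), len(val_idx), steps_per_epoch)
```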

### Estimated parameters

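A product cannot be written as a weighted sum. With this symmetric input design, the best linear approximation is even the constant 0, so the estimated weights and the bias are all essentially 0.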
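The reason for exactly 0 is a short calculation. In the population least-squares fit, the slopes are $\beta = \Sigma_x^{-1}\operatorname{Cov}(x, y)$ and the intercept is $E[y] - \beta^\top E[x]$. Since $x_1$ and $x_2$ are independent with mean 0,

$$
\operatorname{Cov}(x_1, y) = E[x_1^2 x_2] - E[x_1]\,E[x_1 x_2] = E[x_1^2]\,E[x_2] - 0 = 0,
$$

and likewise $\operatorname{Cov}(x_2, y) = 0$ and $E[y] = E[x_1]\,E[x_2] = 0$. The fitted weights can therefore only deviate from 0 through sampling noise and optimization error.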

In [None]:
neural_net.get_weights()

### Comparison with ordinary least squares

In [None]:
import statsmodels.api as sm

model = sm.OLS(df['y'], sm.add_constant(df[['x1', 'x2']]))
results = model.fit()
results.params
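
`statsmodels` computes the OLS estimate $\hat\beta = (X^\top X)^{-1} X^\top y$. The minimal NumPy sketch below does the same computation on freshly simulated data and also shows that $R^2$ is essentially 0. It uses its own seed and a smaller sample, so the numbers will differ slightly from the `statsmodels` output.

```python
import numpy as np

rng = np.random.default_rng(1901)
n = 200_000
x = rng.uniform(-10, 10, size=(n, 2))
y = x[:, 0] * x[:, 1]

# Design matrix with an intercept column, then solve the normal equations
X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Share of the variance of y explained by the linear fit
resid = y - X @ beta
r2 = 1 - resid.var() / y.var()
print("const, x1, x2:", beta.round(3))
print("R^2:", round(r2, 5))
```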

### Test

In [None]:
x1 = 2
x2 = 2
pred = neural_net.predict(np.array([[x1, x2]])).flatten()[0]
print(f'{x1} times {x2} is {pred:.3f}')

## Step 2: Hidden layers

In this step, we add a hidden layer. The hope is that its additional parameters let the network learn more complex relationships between input and output, such as interactions.
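Concretely, `Dense(5)` followed by `Dense(1)` computes $h = x W_1 + b_1$ and then $\hat y = h W_2 + b_2$. Neither layer has an activation, as in the model below. The NumPy sketch of that forward pass uses random weights as stand-ins for trained ones. Its shapes follow the Keras convention of storing a `Dense` kernel as an (inputs × units) matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-ins for trained weights; kernels have shape (inputs, units)
W1, b1 = rng.normal(size=(2, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 1)), rng.normal(size=1)

def forward(x):
    """Forward pass of Dense(5) -> Dense(1), both without activation."""
    hidden = x @ W1 + b1        # shape (batch, 5)
    return hidden @ W2 + b2     # shape (batch, 1)

x = np.array([[2.0, 2.0], [-7.0, 4.0]])
print(forward(x).shape)
```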

### Model structure

In [None]:
# Structure
inputs = Input(shape=(2, ))
hidden = Dense(5)(inputs)
output = Dense(1)(hidden)
neural_net = Model(inputs=inputs, outputs=output)

neural_net.compile(loss='mse', 
                   optimizer=Nadam(learning_rate=0.2),
                   metrics=['mse'])
neural_net.summary()

### Model fit

In [None]:
neural_net.fit(
    df[['x1', 'x2']],
    df['y'],
    batch_size=10000, 
    epochs=1000, 
    validation_split=0.2,
    callbacks=cb,
    verbose=True,
)

### Test

In [None]:
x1 = 2
x2 = 2
pred = neural_net.predict(np.array([[x1, x2]])).flatten()[0]
print(f'{x1} times {x2} is {pred:.3f}')

Hmmm...

## Step 3: Activation functions

A linear function of linear functions is still a linear (more precisely, affine) function, so stacking layers alone adds no expressive power. We need some form of non-linearity. It is introduced by passing the values of the hidden nodes through a non-linear function, called the activation function.
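Without an activation function, two stacked layers collapse into one: $(x W_1 + b_1) W_2 + b_2 = x (W_1 W_2) + (b_1 W_2 + b_2)$. The result is a single affine map with the same 3 effective parameters as Step 1. The NumPy check below uses random weights for illustration rather than the trained network.

```python
import numpy as np

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(2, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 1)), rng.normal(size=1)
x = rng.uniform(-10, 10, size=(1000, 2))

# Two stacked linear layers ...
two_layers = (x @ W1 + b1) @ W2 + b2
# ... equal one linear layer with a merged kernel and bias
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))  # True
print(W.shape, b.shape)
```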

### Model structure

In [None]:
# Structure
inputs = Input(shape=(2, ))
hidden = Dense(5, activation='tanh')(inputs)
output = Dense(1)(hidden)
neural_net = Model(inputs=inputs, outputs=output)

neural_net.compile(loss='mse', 
                   optimizer=Nadam(learning_rate=0.1),
                   metrics=['mse'])
neural_net.summary()
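
A few tanh units can represent a product approximately. The construction below is hand-crafted, not what training will find. It starts from $x_1 x_2 = \tfrac14\big[(x_1+x_2)^2 - (x_1-x_2)^2\big]$. It then uses the Taylor expansion $\tanh(c + h a) + \tanh(c - h a) - 2\tanh(c) \approx h^2 a^2 \tanh''(c)$, which holds for small $h$. The construction needs only four tanh units and a linear output, so it fits into the 5-unit hidden layer above. The offset `c = 1.0` and scale `h = 1e-3` are arbitrary illustrative choices.

```python
import numpy as np

c, h = 1.0, 1e-3                     # illustrative choices: bias offset and input scale
t = np.tanh(c)
tanh_dd = -2 * t * (1 - t ** 2)      # second derivative of tanh at c

def tanh_product(x1, x2):
    """Approximate x1 * x2 with four tanh hidden units and a linear output."""
    u, v = x1 + x2, x1 - x2
    # Hidden layer: tanh(bias c +/- h * weighted sum of the inputs)
    hidden = np.tanh(np.array([c + h * u, c - h * u, c + h * v, c - h * v]))
    # Output layer: fixed weights; the constant tanh(c) terms cancel
    weights = np.array([1.0, 1.0, -1.0, -1.0]) / (4 * h ** 2 * tanh_dd)
    return weights @ hidden

for a, b in [(2, 2), (5, 5), (-7, 4)]:
    print(f"{a} times {b} is {tanh_product(a, b):.3f}")
```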

### Model fit

In [None]:
neural_net.fit(
    df[['x1', 'x2']],
    df['y'],
    batch_size=5000, 
    epochs=400, 
    validation_split=0.2,
    callbacks=cb,
    verbose=1,
)

In [None]:
plot_history(neural_net)

### Test

In [None]:
x1 = 5
x2 = 5
pred = neural_net.predict(np.array([[x1, x2]])).flatten()[0]
print(f'{x1} times {x2} is {pred:.3f}')

### Extrapolation?

In [None]:
x1 = 50
x2 = 2
pred = neural_net.predict(np.array([[x1, x2]])).flatten()[0]
print(f'{x1} times {x2} is {pred:.3f}')
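
The training data only cover $[-10, 10]^2$. A tanh network cannot follow the product far outside that square. Each tanh unit lies in $(-1, 1)$, so the output of a one-hidden-layer tanh network stays within $b_2 \pm \sum_j |w_{2,j}|$, however large the inputs get. The NumPy sketch below uses random weights as stand-ins for the trained ones, and its names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
# Random stand-ins for a trained Dense(5, activation='tanh') -> Dense(1) network
W1, b1 = rng.normal(size=(2, 5)), rng.normal(size=5)
w2, b2 = rng.normal(size=5), rng.normal()

def predict(x):
    return np.tanh(x @ W1 + b1) @ w2 + b2

bound = np.abs(w2).sum()   # |output - b2| can never exceed this

x = np.array([[5.0, 5.0], [50.0, 2.0], [500.0, 2.0], [5000.0, 2.0]])
pred = predict(x)
for (a, b), p in zip(x, pred):
    print(f"{a:g} times {b:g}: true {a * b:g}, network {p:.3f}")
print(f"all predictions lie in [{b2 - bound:.3f}, {b2 + bound:.3f}]")
```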

## Step 4? Play!

### Model structure

In [None]:
# Structure
inputs = Input(shape=(2, ))
hidden1 = Dense(50, activation='tanh')(inputs)
hidden2 = Dense(10, activation='tanh')(hidden1)
output = Dense(1)(hidden2)
neural_net = Model(inputs=inputs, outputs=output)

neural_net.compile(loss='mse', optimizer=Nadam(learning_rate=0.002))
neural_net.summary()

### Model fit

In [None]:
neural_net.fit(
    df[['x1', 'x2']],
    df['y'],
    batch_size=5000, 
    epochs=200, 
    validation_split=0.2,
    callbacks=cb,
    verbose=1,
)

### Test

In [None]:
x1 = -7
x2 = 4
pred = neural_net.predict(np.array([[x1, x2]])).flatten()[0]
print(f'{x1} times {x2} is {pred:.3f}')
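
A single test pair says little. A fuller check compares the network with the true product on a grid over the training range. The hypothetical helper `grid_mse` accepts any function that maps an (n, 2) array to n predictions. For the Keras model above, one could pass `lambda X: neural_net.predict(X, verbose=0).ravel()`. The sketch below demonstrates it on two stand-in predictors instead of a trained network.

```python
import numpy as np

def grid_mse(predict_fn, lo=-10.0, hi=10.0, num=101):
    """Mean squared error of predict_fn against x1 * x2 on a num x num grid."""
    g = np.linspace(lo, hi, num)
    x1, x2 = np.meshgrid(g, g)
    X = np.column_stack([x1.ravel(), x2.ravel()])
    pred = np.asarray(predict_fn(X)).ravel()
    return np.mean((pred - X[:, 0] * X[:, 1]) ** 2)

# Stand-ins: the exact product, and the all-zero linear fit from Step 1
print(grid_mse(lambda X: X[:, 0] * X[:, 1]))   # 0.0
print(grid_mse(lambda X: np.zeros(len(X))))
```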