In [1]:
import warnings
warnings.filterwarnings("ignore")

<font size="+5">#04. Why Neural Networks Deeply Learn a Mathematical Formula?</font>

- Book + Private Lessons [Here ↗](https://sotastica.com/reservar)
- Subscribe to my [Blog ↗](https://blog.pythonassembly.com/)
- Let's keep in touch on [LinkedIn ↗](www.linkedin.com/in/jsulopz) 😄

# Machine Learning, what does it mean?

> - The Machine Learns...
>
> But, **what does it learn?**

In [2]:
%%HTML
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Machine Learning, what does it mean? ⏯<br><br>· The machine learns...<br><br>Ha ha, not funny! 🤨 What does it learn?<br><br>· A mathematical equation. For example: <a href="https://t.co/sjtq9F2pq7">pic.twitter.com/sjtq9F2pq7</a></p>&mdash; Jesús López (@sotastica) <a href="https://twitter.com/sotastica/status/1449735653328031745?ref_src=twsrc%5Etfw">October 17, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

# How does the Machine Learn?

## In a Linear Regression

In [3]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/Ht3rYS-JilE" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

## In a Neural Network

In [4]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/IHZwWFHWa-w?start=329" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

A Practical Example → [Tesla Autopilot](https://www.tesla.com/AI)

An Example where It Fails → [Tesla Confuses Moon with Semaphore](https://twitter.com/Carnage4Life/status/1418920100086784000?s=20)

# Load the Data

> - Simply execute the following lines of code to load the data.
> - This dataset contains **statistics about Car Accidents** (columns)
> - In each one of **USA States** (rows)

https://www.kaggle.com/fivethirtyeight/fivethirtyeight-bad-drivers-dataset/

In [5]:
import seaborn as sns

df = sns.load_dataset(name='car_crashes', index_col='abbrev')
df.sample(5)

Unnamed: 0_level_0,total,speeding,alcohol,not_distracted,no_previous,ins_premium,ins_losses
abbrev,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
TN,19.5,4.095,5.655,15.99,15.795,767.91,155.57
VT,13.6,4.08,4.08,13.056,12.92,716.2,109.61
CO,13.6,5.032,3.808,10.744,12.92,835.5,139.91
TX,19.4,7.76,7.372,17.654,16.878,1004.75,156.83
AR,22.4,4.032,5.824,21.056,21.28,827.34,142.39


# Neural Network Concepts in Python

## Initializing the `Weights`

> - https://keras.io/api/layers/initializers/

### How to `kernel_initializer` the weights?

$$
accidents = speeding \cdot w_1 + alcohol \cdot w_2 \ + ... + \ ins\_losses \cdot w_7
$$

In [13]:
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Dense

In [14]:
df.shape

(51, 7)

In [50]:
model = Sequential()
model.add(layer=Input(shape=(6,)))
model.add(layer=Dense(units=3, kernel_initializer='zeros'))
model.add(layer=Dense(units=1))

#### Make a Prediction with the Neural Network

> - Can we make a prediction for for `Washington DC` accidents
> - With the already initialized Mathematical Equation?

In [51]:
X = df.drop(columns='total')
y = df.total

In [52]:
AL = X[:1]

In [53]:
AL

Unnamed: 0_level_0,speeding,alcohol,not_distracted,no_previous,ins_premium,ins_losses
abbrev,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AL,7.332,5.64,18.048,15.04,784.55,145.08


In [54]:
model.predict(AL)



2022-01-05 15:31:19.028496: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.


array([[0.]], dtype=float32)

#### Observe the numbers for the `weights`

In [48]:
model.get_weights()

[array([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]], dtype=float32),
 array([0., 0., 0.], dtype=float32),
 array([[-0.1387695 ],
        [-0.15761685],
        [ 0.54051685]], dtype=float32),
 array([0.], dtype=float32)]

#### Predictions vs Reality

> 1. Calculate the Predicted Accidents and
> 2. Compare it with the Real Total Accidents

In [20]:
model.predict(x=AL)

array([[0.]], dtype=float32)

#### `fit()` the `model` and compare again

In [21]:
model.compile(loss='mse', metrics=['mse'])

In [22]:
model.fit(X, y, epochs=500, verbose=1)

Epoch 1/500
Epoch 2/500
Epoch 3/500
Epoch 4/500
Epoch 5/500
Epoch 6/500
Epoch 7/500


2022-01-05 15:25:30.253431: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.


Epoch 8/500
Epoch 9/500
Epoch 10/500
Epoch 11/500
Epoch 12/500
Epoch 13/500
Epoch 14/500
Epoch 15/500
Epoch 16/500
Epoch 17/500
Epoch 18/500
Epoch 19/500
Epoch 20/500
Epoch 21/500
Epoch 22/500
Epoch 23/500
Epoch 24/500
Epoch 25/500
Epoch 26/500
Epoch 27/500
Epoch 28/500
Epoch 29/500
Epoch 30/500
Epoch 31/500
Epoch 32/500
Epoch 33/500
Epoch 34/500
Epoch 35/500
Epoch 36/500
Epoch 37/500
Epoch 38/500
Epoch 39/500
Epoch 40/500
Epoch 41/500
Epoch 42/500
Epoch 43/500
Epoch 44/500
Epoch 45/500
Epoch 46/500
Epoch 47/500
Epoch 48/500
Epoch 49/500
Epoch 50/500
Epoch 51/500
Epoch 52/500
Epoch 53/500
Epoch 54/500
Epoch 55/500
Epoch 56/500
Epoch 57/500
Epoch 58/500
Epoch 59/500
Epoch 60/500
Epoch 61/500
Epoch 62/500
Epoch 63/500
Epoch 64/500
Epoch 65/500
Epoch 66/500
Epoch 67/500
Epoch 68/500
Epoch 69/500
Epoch 70/500
Epoch 71/500
Epoch 72/500
Epoch 73/500
Epoch 74/500
Epoch 75/500
Epoch 76/500
Epoch 77/500
Epoch 78/500
Epoch 79/500
Epoch 80/500
Epoch 81/500
Epoch 82/500
Epoch 83/500
Epoch 84/500
E

<keras.callbacks.History at 0x1686286d0>

##### Observe the numbers for the `weights`

In [23]:
model.get_weights()

[array([[ 0.08066401, -0.08103037,  0.08105689],
        [ 0.17918132, -0.17946328,  0.17948398],
        [ 0.12544397, -0.1257719 ,  0.12579569],
        [ 0.19439285, -0.19473135,  0.19475527],
        [-0.00223334,  0.00116586, -0.00109652],
        [ 0.00606154, -0.00703107,  0.0070945 ]], dtype=float32),
 array([ 0.07513256, -0.07591349,  0.07596511], dtype=float32),
 array([[ 0.26604873],
        [-1.010353  ],
        [ 1.220803  ]], dtype=float32),
 array([0.07621886], dtype=float32)]

##### Predictions vs Reality

> 1. Calculate the Predicted Accidents and
> 2. Compare it with the Real Total Accidents

In [24]:
model.predict(AL)

2022-01-05 15:26:07.527944: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.


array([[17.336948]], dtype=float32)

In [25]:
y_pred = model.predict(X)

In [26]:
dfsel = df[['total']].copy()
dfsel['pred_zeros_after_fit'] = y_pred
dfsel.head()

Unnamed: 0_level_0,total,pred_zeros_after_fit
abbrev,Unnamed: 1_level_1,Unnamed: 2_level_1
AL,18.8,17.336948
AK,18.1,16.232738
AZ,18.6,16.6241
AR,22.4,20.550613
CA,12.0,11.388576


In [27]:
mse = ((dfsel.total - dfsel.pred_zeros_after_fit)**2).mean()
mse

3.9455144617939495

### How to `kernel_initializer` the weights to 1?

$$
accidents = speeding \cdot w_1 + alcohol \cdot w_2 \ + ... + \ ins\_losses \cdot w_7
$$

In [28]:
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Dense

In [29]:
df.shape

(51, 7)

In [30]:
model = Sequential()
model.add(layer=Input(shape=(6,)))
model.add(layer=Dense(units=3, kernel_initializer='ones'))
model.add(layer=Dense(units=1))

#### Make a Prediction with the Neural Network

> - Can we make a prediction for for `Washington DC` accidents
> - With the already initialized Mathematical Equation?

In [31]:
X = df.drop(columns='total')
y = df.total

In [32]:
AL = X[:1]

In [33]:
AL

Unnamed: 0_level_0,speeding,alcohol,not_distracted,no_previous,ins_premium,ins_losses
abbrev,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AL,7.332,5.64,18.048,15.04,784.55,145.08


In [34]:
model.predict(AL)

2022-01-05 15:26:44.322688: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.


array([[-774.39435]], dtype=float32)

#### Observe the numbers for the `weights`

In [35]:
model.get_weights()

[array([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=float32),
 array([0., 0., 0.], dtype=float32),
 array([[-0.88074064],
        [ 1.1793224 ],
        [-1.0922706 ]], dtype=float32),
 array([0.], dtype=float32)]

#### Predictions vs Reality

> 1. Calculate the Predicted Accidents and
> 2. Compare it with the Real Total Accidents

#### `fit()` the `model` and compare again

In [36]:
model.compile(loss='mse', metrics=['mse'])

In [37]:
model.fit(X, y, epochs=500, verbose=1)

Epoch 1/500
Epoch 2/500
Epoch 3/500
Epoch 4/500
Epoch 5/500
Epoch 6/500
Epoch 7/500
Epoch 8/500
Epoch 9/500
Epoch 10/500


2022-01-05 15:27:17.017965: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.


Epoch 11/500
Epoch 12/500
Epoch 13/500
Epoch 14/500
Epoch 15/500
Epoch 16/500
Epoch 17/500
Epoch 18/500
Epoch 19/500
Epoch 20/500
Epoch 21/500
Epoch 22/500
Epoch 23/500
Epoch 24/500
Epoch 25/500
Epoch 26/500
Epoch 27/500
Epoch 28/500
Epoch 29/500
Epoch 30/500
Epoch 31/500
Epoch 32/500
Epoch 33/500
Epoch 34/500
Epoch 35/500
Epoch 36/500
Epoch 37/500
Epoch 38/500
Epoch 39/500
Epoch 40/500
Epoch 41/500
Epoch 42/500
Epoch 43/500
Epoch 44/500
Epoch 45/500
Epoch 46/500
Epoch 47/500
Epoch 48/500
Epoch 49/500
Epoch 50/500
Epoch 51/500
Epoch 52/500
Epoch 53/500
Epoch 54/500
Epoch 55/500
Epoch 56/500
Epoch 57/500
Epoch 58/500
Epoch 59/500
Epoch 60/500
Epoch 61/500
Epoch 62/500
Epoch 63/500
Epoch 64/500
Epoch 65/500
Epoch 66/500
Epoch 67/500
Epoch 68/500
Epoch 69/500
Epoch 70/500
Epoch 71/500
Epoch 72/500
Epoch 73/500
Epoch 74/500
Epoch 75/500
Epoch 76/500
Epoch 77/500
Epoch 78/500
Epoch 79/500
Epoch 80/500
Epoch 81/500
Epoch 82/500
Epoch 83/500
Epoch 84/500
Epoch 85/500
Epoch 86/500
Epoch 87/500

<keras.callbacks.History at 0x1687c8e50>

##### Observe the numbers for the `weights`

In [38]:
model.get_weights()

[array([[0.78321904, 1.2191149 , 0.7829328 ],
        [0.7357272 , 1.2666777 , 0.73542976],
        [0.7587743 , 1.2436014 , 0.7584809 ],
        [0.7399096 , 1.262459  , 0.73961765],
        [0.87216467, 1.1297032 , 0.87194175],
        [0.86290854, 1.1390507 , 0.86267155]], dtype=float32),
 array([-0.17468476,  0.17676489, -0.17493707], dtype=float32),
 array([[-0.7475918 ],
        [ 1.3148766 ],
        [-0.95912707]], dtype=float32),
 array([0.17591994], dtype=float32)]

##### Predictions vs Reality

> 1. Calculate the Predicted Accidents and
> 2. Compare it with the Real Total Accidents

In [39]:
model.predict(AL)



2022-01-05 15:28:13.822397: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.


array([[18.471478]], dtype=float32)

In [40]:
y_pred = model.predict(X)

In [41]:
dfsel = df[['total']].copy()
dfsel['pred_zeros_after_fit'] = y_pred
dfsel.head()

Unnamed: 0_level_0,total,pred_zeros_after_fit
abbrev,Unnamed: 1_level_1,Unnamed: 2_level_1
AL,18.8,18.471478
AK,18.1,17.161646
AZ,18.6,17.160315
AR,22.4,20.979366
CA,12.0,12.787192


In [42]:
mse = ((dfsel.total - dfsel.pred_zeros_after_fit)**2).mean()
mse

2.016406947568492

### How to `kernel_initializer` the weights to `glorot_uniform` (default)?

## Play with the Activation Function

> - https://keras.io/api/layers/activations/

In [36]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/IHZwWFHWa-w?start=558" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

### Use `sigmoid` activation in last layer

In [37]:
model = Sequential()
model.add(layer=Input(shape=(6,)))
model.add(layer=Dense(units=3, kernel_initializer='glorot_uniform'))
model.add(layer=Dense(units=1, activation='sigmoid'))

In [38]:
model.compile(loss='mse', metrics=['mse'])

#### `fit()` the Model

In [39]:
model.fit(X, y, epochs=500, verbose=0)

<tensorflow.python.keras.callbacks.History at 0x7f81e8d14310>

#### Predictions vs Reality

> 1. Calculate the Predicted Accidents and
> 2. Compare it with the Real Total Accidents

In [40]:
y_pred = model.predict(X)



In [41]:
dfsel['pred_sigmoid'] = y_pred
dfsel.head()

Unnamed: 0_level_0,total,pred_zeros_after_fit,pred_sigmoid
abbrev,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
AL,18.8,17.909626,0.0
AK,18.1,17.232151,0.0
AZ,18.6,16.393391,0.0
AR,22.4,19.506733,0.0
CA,12.0,14.266361,0.0


In [42]:
mse = ((dfsel.total - dfsel.pred_sigmoid)**2).mean()
mse

265.98803921568634

#### Observe the numbers for the `weights`

> - Have they changed?

In [43]:
model.get_weights()

[array([[ 0.7248485 ,  0.36856735,  0.63146174],
        [ 0.07509726,  0.33351755, -0.30260724],
        [-0.7252705 , -0.55833864,  0.15868711],
        [-0.39576796, -0.33873704, -0.768407  ],
        [ 0.64309967, -0.5359219 , -0.67604953],
        [ 0.11188209, -0.31420344,  0.22522306]], dtype=float32),
 array([0., 0., 0.], dtype=float32),
 array([[-0.3183049 ],
        [ 1.1042565 ],
        [-0.02906787]], dtype=float32),
 array([0.], dtype=float32)]

### Use `linear` activation in last layer

In [37]:
model = Sequential()
model.add(layer=Input(shape=(6,)))
model.add(layer=Dense(units=3, kernel_initializer='glorot_uniform'))
model.add(layer=Dense(units=1, activation='linear'))

In [38]:
model.compile(loss='mse', metrics=['mse'])

#### `fit()` the Model

In [39]:
model.fit(X, y, epochs=500, verbose=0)

<tensorflow.python.keras.callbacks.History at 0x7f81e8d14310>

#### Predictions vs Reality

> 1. Calculate the Predicted Accidents and
> 2. Compare it with the Real Total Accidents

In [40]:
y_pred = model.predict(X)



In [41]:
dfsel['pred_sigmoid'] = y_pred
dfsel.head()

Unnamed: 0_level_0,total,pred_zeros_after_fit,pred_sigmoid
abbrev,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
AL,18.8,17.909626,0.0
AK,18.1,17.232151,0.0
AZ,18.6,16.393391,0.0
AR,22.4,19.506733,0.0
CA,12.0,14.266361,0.0


In [42]:
mse = ((dfsel.total - dfsel.pred_sigmoid)**2).mean()
mse

265.98803921568634

#### Observe the numbers for the `weights`

> - Have they changed?

In [43]:
model.get_weights()

[array([[ 0.7248485 ,  0.36856735,  0.63146174],
        [ 0.07509726,  0.33351755, -0.30260724],
        [-0.7252705 , -0.55833864,  0.15868711],
        [-0.39576796, -0.33873704, -0.768407  ],
        [ 0.64309967, -0.5359219 , -0.67604953],
        [ 0.11188209, -0.31420344,  0.22522306]], dtype=float32),
 array([0., 0., 0.], dtype=float32),
 array([[-0.3183049 ],
        [ 1.1042565 ],
        [-0.02906787]], dtype=float32),
 array([0.], dtype=float32)]

### Use `tanh` activation in last layer

### Use `relu` activation in last layer

### How are the predictions changing? Why?

## Optimizer

> - https://keras.io/api/optimizers/#available-optimizers

Optimizers comparison in GIF → https://mlfromscratch.com/optimizers-explained/#adam

Tesla's Neural Network Models is composed of 48 models trainned in 70.000 hours of GPU → https://tesla.com/ai

1 Year with a 8 GPU Computer → https://twitter.com/thirdrowtesla/status/1252723358342377472

### Use Gradient Descent `SGD`

In [44]:
model = Sequential()
model.add(layer=Input(shape=(6,)))
model.add(layer=Dense(units=3, kernel_initializer='glorot_uniform'))
model.add(layer=Dense(units=1, activation='sigmoid'))

#### `compile()` the model

In [45]:
model.compile(optimizer='sgd', loss='mse', metrics=['mse'])

#### `fit()` the Model

In [46]:
history = model.fit(X, y, epochs=500, verbose=0)

#### Predictions vs Reality

> 1. Calculate the Predicted Accidents and
> 2. Compare it with the Real Total Accidents

In [47]:
y_pred = model.predict(X)



In [48]:
dfsel['pred_gsd'] = y_pred
dfsel.head()

Unnamed: 0_level_0,total,pred_zeros_after_fit,pred_sigmoid,pred_gsd
abbrev,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
AL,18.8,17.909626,0.0,0.0
AK,18.1,17.232151,0.0,0.0
AZ,18.6,16.393391,0.0,0.0
AR,22.4,19.506733,0.0,0.0
CA,12.0,14.266361,0.0,0.0


In [49]:
mse = ((dfsel.total - dfsel.pred_sgd)**2).mean()
mse

AttributeError: 'DataFrame' object has no attribute 'pred_sgd'

#### Observe the numbers for the `weights`

> - Have they changed?

In [None]:
model.get_weights()


#### View History

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

### Use `ADAM`

### Use `RMSPROP`

### Does it take different times to get the best accuracy? Why?

## Loss Functions

> - https://keras.io/api/losses/

### `binary_crossentropy`

### `sparse_categorical_crossentropy`

### `mean_absolute_error`

### `mean_squared_error`

## In the end, what should be a feasible configuration of the Neural Network for this data?

# Common Errors

## The `kernel_initializer` Matters

## The `activation` Function Matters

## The `optimizer` Matters

## The Number of `epochs` Matters

## The `loss` Function Matters

## The Number of `epochs` Matters

# Neural Network's importance to find **Non-Linear Patterns** in the Data

> - The number of Neurons & Hidden Layers

https://towardsdatascience.com/beginners-ask-how-many-hidden-layers-neurons-to-use-in-artificial-neural-networks-51466afa0d3e

https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.87287&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false

## Summary

- Mathematical Formula
- Weights / Kernel Initializer
- Loss Function
- Activation Function
- Optimizers

## What cannot you change arbitrarily of a Neural Network?

- Input Neurons
- Output Neurons
- Loss Functions
- Activation Functions