# Neural network packages
* MLPClassifier and MLPRegressor in sklearn (home assignment)
* TensorFlow and Keras (open source libraries developed by Google)
* PyTorch (open source library developed by Meta/Facebook)

<h1>TensorFlow</h1>
<li>An end-to-end open source tool for deep learning</li>
<li>A <span style="color:blue">tensor</span> is an n-dimensional mathematical object</li>
<li>n=0 => scalar</li>
<li>n=1 => vector</li>
<li>n=2 => matrix</li>
<li>n>2 => higher-order tensor (scalars, vectors and matrices are the low-order special cases)</li>
<li>TensorFlow was built to support numerical computation on high-dimensional mathematical objects</li>
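
The n in the list above can be inspected directly. The sketch below uses NumPy arrays rather than TensorFlow tensors (an assumption made to keep it lightweight); the variable names are illustrative only.

```python
import numpy as np

# NumPy's ndim attribute plays the role of n in the list above.
scalar = np.array(5.0)                  # n = 0
vector = np.array([1.0, 2.0, 3.0])      # n = 1
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])         # n = 2
cube = np.zeros((2, 3, 4))              # n = 3

for name, obj in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("cube", cube)]:
    print(name, "n =", obj.ndim, "shape =", obj.shape)
```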

In [None]:
import tensorflow as tf
tf.__version__

<h2>Keras</h2>
<li>Keras is an open source "human friendly" wrapper around TensorFlow</li>
<li>TensorFlow provides the math, Keras provides an ML interface to TensorFlow</li>
<li>We'll implement our "non-linear" model example in Keras/TensorFlow</li>

In [None]:
import numpy as np
import keras
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

training_data = np.array([[0, 0, 1],
                          [0, 1, 1],
                          [1, 0, 1],
                          [1, 1, 1],
                          [1, 1, 0],
                          [0, 1, 0],
                          [1, 0, 0],
                          [1, 0, 0]], "float32")

y = np.array([[0],[1],[1],[1],[1],[0],[0],[0]],"float32")

<h3>Our network</h3>
<li>A <b>sequential</b> network (inputs will flow sequentially toward outputs)</li>
<li>3 nodes in the input layer (<b>input_dim = 3</b>)</li>
<li>A <b>dense</b> network is a fully connected network</li>
<li>We'll add a hidden layer of 16 nodes (connected to the 3 input nodes)</li>
<li>And an output layer of 1 node</li>

In [None]:
model = Sequential(name="My_Example")
model.add(Dense(16, input_dim=3, activation='relu',name="layer_1"))
model.add(Dense(1, activation='sigmoid',name="output_layer"))

<h2>ReLU: Rectified Linear Unit</h2>
<li>The sigmoid function, while easy to use, has a drawback</li>
<ul>
<li>large positive inputs saturate at 1.0 and large negative inputs saturate at 0.0 quickly</li>
<li>the function is sensitive around its midpoint (input 0, output 0.5) but nearly flat elsewhere</li>
<li>this makes it harder for the algorithm to adapt, which is especially a problem with large datasets and many-layered (deep) networks</li>
<li><b>vanishing gradient problem</b>: as the error is backpropagated through many layers, it is repeatedly multiplied by small derivatives, so the gradient shrinks toward zero and the weights in the early layers barely change (the error information is not well utilized)</li>
</ul>

<li>ReLU is a popular activation function</li>

<li>The function:</li>

$ f(x) = \left\{\begin{array}{ll}
0 & \text{if } x \leq 0 \\
x & \text{if } x > 0
\end{array}\right.$

The function can be rewritten as:

$ f(x) = \max(0,x) $

ReLU is linear above 0 and constant (0) below it, so the function as a whole is non-linear. For positive inputs the derivative is exactly 1, so the backpropagated error is not shrunk at active units, which greatly reduces the vanishing gradient problem.
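
The contrast between the two activation functions can be made concrete with a small NumPy sketch. The helper names (`sigmoid_grad`, `relu_grad`) and the 10-layer depth are assumptions chosen for illustration. The bound assumes the best case for sigmoid, where every layer sits at the point of maximum slope.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)); its maximum is 0.25 at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    return np.maximum(0.0, np.asarray(x, dtype=float))

def relu_grad(x):
    # Derivative is 1 for x > 0 and 0 for x < 0 (0 is used at x == 0 by convention).
    return (np.asarray(x, dtype=float) > 0).astype(float)

x = np.array([-6.0, -1.0, 0.0, 1.0, 6.0])
print("sigmoid'(x):", np.round(sigmoid_grad(x), 4))
print("relu'(x):   ", relu_grad(x))

# Backpropagation multiplies one activation derivative per layer.
depth = 10
sigmoid_bound = 0.25 ** depth   # upper bound on the product for sigmoid layers
relu_path = 1.0 ** depth        # product along a path of active ReLU units
print("sigmoid bound after", depth, "layers:", sigmoid_bound)
print("active ReLU path after", depth, "layers:", relu_path)
```

Even in the best case, the sigmoid factor alone shrinks the gradient by roughly six orders of magnitude over 10 layers, while an all-active ReLU path leaves it unchanged.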




In [None]:
model.summary()
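
The parameter counts that the summary reports for this architecture can be checked by hand: a dense layer has one weight per (input, node) pair plus one bias per node. The helper `dense_params` is a hypothetical name used only for this sketch.

```python
def dense_params(n_inputs, n_units):
    # One weight for every input-to-node connection, plus one bias per node.
    return n_inputs * n_units + n_units

layer_1 = dense_params(3, 16)        # 3 inputs -> 16 hidden nodes
output_layer = dense_params(16, 1)   # 16 hidden nodes -> 1 output node
total = layer_1 + output_layer

print("layer_1:", layer_1)
print("output_layer:", output_layer)
print("total:", total)
```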

<li><b>Optimizer</b> refers to the algorithm that adjusts the weights during backpropagation</li>
<li>Typically, stochastic gradient descent is used</li>
<li><b>adam</b> extends stochastic gradient descent</li>
<ul>
<li>faster convergence</li>
<li>adaptive learning rates</li>
<li>See: <a href="https://www.geeksforgeeks.org/intuition-of-adam-optimizer/">https://www.geeksforgeeks.org/intuition-of-adam-optimizer/</a></li>
</ul>
<li><b>binary_accuracy</b> is a keras metric that converts predictions (floats between 0 and 1) into 0 or 1 binary values using a threshold of 0.5</li>
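
The thresholding behind binary accuracy can be sketched in NumPy as follows. This is a hand-written illustration of the idea, not the Keras implementation itself; the function name and the sample values are made up for the example.

```python
import numpy as np

def binary_accuracy(y_true, y_pred, threshold=0.5):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Predictions above the threshold become 1, the rest become 0.
    predicted = (y_pred > threshold).astype(float)
    # Accuracy is the fraction of thresholded predictions that match the labels.
    return float(np.mean(predicted == y_true))

labels = [0, 1, 1, 0]
predictions = [0.2, 0.7, 0.4, 0.1]   # the third prediction falls on the wrong side
print(binary_accuracy(labels, predictions))
```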

In [None]:
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['binary_accuracy'])
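
To give a feel for what "adaptive learning rates" means for the adam optimizer chosen above, here is a minimal NumPy sketch of a single Adam update step. The function name `adam_step` and the hyperparameter values are assumptions (commonly quoted defaults), and this is not the Keras implementation.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Running averages of the gradient (m) and of the squared gradient (v).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction: both averages start at zero, so early values are too small.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Each weight gets its own effective step size, scaled by sqrt(v_hat).
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0, 1.0])
grad = np.array([0.01, -100.0])      # two gradients of very different magnitude
m = np.zeros_like(w)
v = np.zeros_like(w)

w_new, m, v = adam_step(w, grad, m, v, t=1)
print("change in weights:", w_new - w)
```

Although the two gradients differ by four orders of magnitude, both weights move by about the same amount (roughly the learning rate) on the first step. Plain stochastic gradient descent would move them in proportion to the raw gradients.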

In [None]:
model.fit(training_data, y, epochs=200,verbose=0)

In [None]:
model.predict(training_data).round() #round returns 0 or 1 rather than the predicted float value
#y = np.array([[0],[1],[1],[1],[1],[0],[0],[0]],"float32")

In [None]:
training_data

In [None]:
test_data = np.array([[1, 1, 1],
       [0, 1, 1],
       [1, 0, 0],
       [0, 0, 1]],"float32")
model.predict(test_data).round()

In [None]:
model.predict(test_data)

<h1>Classifying handwritten digits</h1>
<li>Let's use a neural net to build a handwritten digits predictor</li>

In [None]:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784')
mnist.keys()

<li>Let's take a look at the data by drawing the digits</li>

In [None]:
digit_number = 37833
for line in mnist.data.to_numpy()[digit_number].reshape(28,28):
    for num in line:
        if num > 0:
            print('*', end = ' ')
        else:
            print(' ', end = ' ')
    print('')
print("Actual value: ",mnist.target.to_numpy()[digit_number])


<h3>Training and testing data</h3>
<li>The Keras MNIST loader returns the data as a tuple ((training data, training labels), (testing data, testing labels))</li>
<li>extract training (60,000) and testing (10,000) data</li>
<li>standardize the independent variables (subtract the training mean and divide by the training standard deviation)</li>
<li>one hot encode the target values using <b>to_categorical</b> (keras one hot encoding function)</li>
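
The two preprocessing steps listed above can be sketched in plain NumPy. The arrays below are small synthetic stand-ins, not MNIST, and `one_hot` is a hypothetical helper that mimics the effect of one hot encoding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for image data (pixel values 0..255) and digit labels.
train_images = rng.integers(0, 256, size=(6, 28, 28)).astype("float32")
test_images = rng.integers(0, 256, size=(2, 28, 28)).astype("float32")
train_labels = np.array([5, 0, 4, 1, 9, 2])

# Standardize with statistics from the training set only, then apply the
# same shift and scale to the test set.
mean = train_images.mean()
stddev = train_images.std()
train_std = (train_images - mean) / stddev
test_std = (test_images - mean) / stddev

def one_hot(labels, num_classes=10):
    # Row i of the identity matrix is the one hot vector for class i.
    return np.eye(num_classes, dtype="float32")[labels]

encoded = one_hot(train_labels)
print(encoded[0])   # one hot vector for the label 5
```

Reusing the training mean and standard deviation for the test set keeps both sets on the same scale without letting test data influence the preprocessing.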

In [None]:
import pandas as pd
import numpy as np
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split

tf.random.set_seed(7)

mnist = keras.datasets.mnist
(train_iv, train_dv), (test_iv, test_dv) = mnist.load_data()

# Standardize the data.
mean = np.mean(train_iv)
stddev = np.std(train_iv)
train_iv = (train_iv - mean) / stddev
test_iv = (test_iv - mean) / stddev

# One-hot encode labels.
train_dv = to_categorical(train_dv, num_classes=10)
test_dv = to_categorical(test_dv, num_classes=10)

In [None]:
test_dv

<h2>Creating the neural network</h2>
<li>We'll create a Sequential network as in our earlier example</li>
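<li>A Flatten layer feeds a hidden Dense layer of 25 tanh nodes, followed by an output Dense layer of 10 nodes</li>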

In [None]:
# Object used to initialize weights.
initializer = keras.initializers.RandomUniform(
    minval=-0.1, maxval=0.1)

# Create a Sequential model.

model = keras.Sequential([
    # Flatten converts each 28x28 matrix into a 784-element vector.
    keras.layers.Flatten(input_shape=(28, 28)),
    # Hidden layer: tanh activation (outputs from -1 to 1, unlike sigmoid's 0 to 1).
    # Each node also has a bias (a learned offset), initialized to zero.
    keras.layers.Dense(25, activation='tanh',
                       kernel_initializer=initializer,
                       bias_initializer='zeros'),
    # Output layer: 10 values, one per digit. relu passes positive values
    # through unchanged and sets negative values to 0.
    keras.layers.Dense(10, activation='relu',
                       kernel_initializer=initializer,
                       bias_initializer='zeros')])
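
What this architecture computes on a batch of images can be sketched as a forward pass in NumPy. The weights below are freshly drawn random values, not trained ones, the input is synthetic, and all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-in for a batch of 4 standardized 28x28 images.
images = rng.normal(size=(4, 28, 28))

# Flatten: each 28x28 matrix becomes a 784-element vector.
flat = images.reshape(len(images), -1)

# Weights drawn uniformly from [-0.1, 0.1]; biases start at zero.
w_hidden = rng.uniform(-0.1, 0.1, size=(784, 25))
b_hidden = np.zeros(25)
w_out = rng.uniform(-0.1, 0.1, size=(25, 10))
b_out = np.zeros(10)

# Hidden layer with tanh, then output layer with relu.
hidden = np.tanh(flat @ w_hidden + b_hidden)
output = np.maximum(0.0, hidden @ w_out + b_out)

print(flat.shape, hidden.shape, output.shape)
```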

In [None]:
initializer

<h3>Training Parameters</h3>
<li><b>EPOCH</b>: Number of full training passes over the training set</li>
<li><b>Batch Size</b>: Number of cases used for each weight update. With batch size = 1, every case is passed through the network and the weights are updated after it, which gives 60000 updates in each epoch</li>
<li><b>Initializer</b>: An object that picks values from a uniform distribution between -0.1 and 0.1</li>
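
The relationship between batch size and the number of weight updates per epoch is simple arithmetic. The helper name `updates_per_epoch` is hypothetical, and the sketch assumes a final partial batch still counts as one update.

```python
import math

def updates_per_epoch(n_examples, batch_size):
    # Every batch triggers one weight update; a final partial batch counts too.
    return math.ceil(n_examples / batch_size)

for batch_size in (1, 2, 32):
    print("batch size", batch_size, "->",
          updates_per_epoch(60000, batch_size), "updates per epoch")
```

Smaller batches mean more updates per epoch, and usually a longer epoch. Larger batches mean fewer updates, each based on more examples.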


In [None]:


EPOCHS = 20
BATCH_SIZE = 2

In [None]:
len(train_dv)

In [None]:
# SGD optimizer with learning rate of 0.01
# MSE loss function

opt = tf.keras.optimizers.SGD(learning_rate=0.01)

model.compile(loss='mean_squared_error', optimizer = opt,
              metrics =['accuracy'])

# Train the model for EPOCHS epochs (the log of a 20-epoch run is shown below).
# Shuffle (randomize) the order of the training examples.
# Update weights after each batch of BATCH_SIZE examples.
history = model.fit(train_iv, train_dv,
                    validation_data=(test_iv, test_dv),
                    epochs=EPOCHS, batch_size=BATCH_SIZE,
                    verbose=2, shuffle=True)

<li><b>loss</b>: mean squared error on the training set</li>
<li><b>accuracy</b>: accuracy on the training set</li>
<li><b>val_loss</b>: mean squared error on the test set</li>
<li><b>val_accuracy</b>: accuracy on the test set</li>
<li>Progression of stats across 20 epochs (60000 steps per epoch, i.e. a run with a batch size of 1):</li>
<pre>
Epoch 1/20
60000/60000 - 72s - loss: 0.0549 - accuracy: 0.6740 - val_loss: 0.0284 - val_accuracy: 0.8826 - 72s/epoch - 1ms/step
Epoch 2/20
60000/60000 - 73s - loss: 0.0226 - accuracy: 0.8900 - val_loss: 0.0182 - val_accuracy: 0.9063 - 73s/epoch - 1ms/step
Epoch 3/20
60000/60000 - 72s - loss: 0.0173 - accuracy: 0.9062 - val_loss: 0.0158 - val_accuracy: 0.9153 - 72s/epoch - 1ms/step
Epoch 4/20
60000/60000 - 73s - loss: 0.0153 - accuracy: 0.9149 - val_loss: 0.0145 - val_accuracy: 0.9192 - 73s/epoch - 1ms/step
Epoch 5/20
60000/60000 - 73s - loss: 0.0142 - accuracy: 0.9196 - val_loss: 0.0136 - val_accuracy: 0.9235 - 73s/epoch - 1ms/step
Epoch 6/20
60000/60000 - 72s - loss: 0.0134 - accuracy: 0.9235 - val_loss: 0.0131 - val_accuracy: 0.9264 - 72s/epoch - 1ms/step
Epoch 7/20
60000/60000 - 73s - loss: 0.0128 - accuracy: 0.9269 - val_loss: 0.0129 - val_accuracy: 0.9241 - 73s/epoch - 1ms/step
Epoch 8/20
60000/60000 - 72s - loss: 0.0124 - accuracy: 0.9292 - val_loss: 0.0125 - val_accuracy: 0.9283 - 72s/epoch - 1ms/step
Epoch 9/20
60000/60000 - 73s - loss: 0.0119 - accuracy: 0.9312 - val_loss: 0.0122 - val_accuracy: 0.9300 - 73s/epoch - 1ms/step
Epoch 10/20
60000/60000 - 74s - loss: 0.0116 - accuracy: 0.9333 - val_loss: 0.0122 - val_accuracy: 0.9298 - 74s/epoch - 1ms/step
Epoch 11/20
60000/60000 - 72s - loss: 0.0112 - accuracy: 0.9354 - val_loss: 0.0118 - val_accuracy: 0.9314 - 72s/epoch - 1ms/step
Epoch 12/20
60000/60000 - 73s - loss: 0.0110 - accuracy: 0.9378 - val_loss: 0.0118 - val_accuracy: 0.9297 - 73s/epoch - 1ms/step
Epoch 13/20
60000/60000 - 73s - loss: 0.0107 - accuracy: 0.9394 - val_loss: 0.0114 - val_accuracy: 0.9339 - 73s/epoch - 1ms/step
Epoch 14/20
60000/60000 - 73s - loss: 0.0104 - accuracy: 0.9407 - val_loss: 0.0114 - val_accuracy: 0.9324 - 73s/epoch - 1ms/step
Epoch 15/20
60000/60000 - 73s - loss: 0.0102 - accuracy: 0.9419 - val_loss: 0.0111 - val_accuracy: 0.9345 - 73s/epoch - 1ms/step
Epoch 16/20
60000/60000 - 83s - loss: 0.0100 - accuracy: 0.9430 - val_loss: 0.0109 - val_accuracy: 0.9357 - 83s/epoch - 1ms/step
Epoch 17/20
60000/60000 - 73s - loss: 0.0098 - accuracy: 0.9441 - val_loss: 0.0110 - val_accuracy: 0.9346 - 73s/epoch - 1ms/step
Epoch 18/20
60000/60000 - 73s - loss: 0.0097 - accuracy: 0.9446 - val_loss: 0.0110 - val_accuracy: 0.9342 - 73s/epoch - 1ms/step
Epoch 19/20
60000/60000 - 72s - loss: 0.0095 - accuracy: 0.9457 - val_loss: 0.0108 - val_accuracy: 0.9362 - 72s/epoch - 1ms/step
Epoch 20/20
60000/60000 - 73s - loss: 0.0094 - accuracy: 0.9466 - val_loss: 0.0107 - val_accuracy: 0.9359 - 73s/epoch - 1ms/step
</pre>
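
The two kinds of numbers in the log can be reproduced by hand on a toy example. The sketch assumes that, with one hot targets, accuracy counts a prediction as correct when its largest output matches the position of the 1 in the label. The three-class values are made up for illustration.

```python
import numpy as np

# Three examples, three classes; labels are one hot encoded.
y_true = np.array([[0, 0, 1],
                   [1, 0, 0],
                   [0, 1, 0]], dtype=float)
y_pred = np.array([[0.1, 0.2, 0.9],
                   [0.6, 0.3, 0.0],
                   [0.5, 0.4, 0.2]])

# Mean squared error: average of the squared differences over all outputs.
mse = np.mean((y_true - y_pred) ** 2)

# Accuracy: fraction of examples whose largest output is at the labeled class.
accuracy = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))

print("loss:", round(float(mse), 4))
print("accuracy:", round(float(accuracy), 4))
```

The third example is misclassified (its largest output is in the first position), so the accuracy is 2 out of 3, while the loss also reflects how far off every individual output is.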