
##Neural Network from scratch

You should now have a general understanding
of what’s going on behind the scenes in a neural network. What was a magical
black box at the start has turned into a clearer picture:

<img src='images/2.png?raw=1' width='800'/>

- The model, composed of layers that are chained together, maps the input data to predictions.
- The loss function then compares these predictions
to the targets, producing a loss value: a measure of how well the model’s predictions match what was expected. 
- The optimizer uses this loss value to update the model’s weights.



##Setup

In [1]:
from tensorflow.keras.datasets import mnist
from tensorflow import keras
from tensorflow.keras import layers

import numpy as np

##NN using Keras

Now you understand that the input images are stored in NumPy tensors, which are
formatted here as float32 tensors of shape (60000, 784) for the training data and
(10000, 784) for the test data.

In [2]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
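
To see what the reshape and rescale steps do without downloading anything, here is a minimal sketch on a small random array. The array `fake_images` is a hypothetical stand-in for the raw MNIST data, assumed to be uint8 with shape (n, 28, 28).

```python
import numpy as np

# Hypothetical stand-in for raw MNIST images: uint8 pixels, shape (n, 28, 28).
fake_images = np.random.randint(0, 256, size=(6, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a 784-element vector.
flat = fake_images.reshape((len(fake_images), 28 * 28))

# Convert to float32 and rescale pixel values from [0, 255] to [0, 1].
flat = flat.astype("float32") / 255

print(flat.shape, flat.dtype)
```

The same two steps applied to the real arrays give the (60000, 784) and (10000, 784) float32 tensors described above.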


Now you understand that this model consists of a chain of two Dense layers, that each
layer applies a few simple tensor operations to the input data, and that these operations
involve weight tensors. Weight tensors, which are attributes of the layers, are
where the knowledge of the model persists.

In [3]:
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])
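
The "few simple tensor operations" in each Dense layer can be sketched in plain NumPy as a matrix product, a bias addition, and an activation. This is an illustrative sketch, not the Keras implementation: the helper names (`naive_dense`, `relu`, `softmax`) and the random weight initialization are assumptions made for the example.

```python
import numpy as np

def relu(x):
    # Element-wise max(x, 0).
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def naive_dense(inputs, W, b, activation):
    # output = activation(inputs . W + b)
    return activation(inputs @ W + b)

rng = np.random.default_rng(0)

# Weight tensors for the two layers (randomly initialized for illustration).
W1 = rng.normal(scale=0.1, size=(784, 512)).astype("float32")
b1 = np.zeros(512, dtype="float32")
W2 = rng.normal(scale=0.1, size=(512, 10)).astype("float32")
b2 = np.zeros(10, dtype="float32")

# A hypothetical batch of 4 flattened images.
batch = rng.random((4, 784)).astype("float32")

hidden = naive_dense(batch, W1, b1, relu)
probs = naive_dense(hidden, W2, b2, softmax)

print(probs.shape)  # one row of 10 class probabilities per image
```

Each row of `probs` sums to 1, which is what lets the second layer's output be read as a probability distribution over the 10 digit classes.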

Now you understand that sparse_categorical_crossentropy is the loss function
that’s used as a feedback signal for learning the weight tensors, and which the training
phase will attempt to minimize. 

You also know that this reduction of the loss
happens via mini-batch stochastic gradient descent. The exact rules governing a specific
use of gradient descent are defined by the rmsprop optimizer passed as the first
argument.

In [4]:
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
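
As a rough illustration of what this loss measures, the sketch below computes a sparse categorical crossentropy by hand in NumPy: for each sample, it takes the negative log of the probability the model assigned to the correct class. The function name, the `eps` clipping value, and the toy arrays are assumptions for the example; Keras's own implementation may differ in details.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    # Clip to avoid log(0) when a probability is exactly 0 or 1.
    probs = np.clip(probs, eps, 1.0 - eps)
    # Pick, for each sample, the probability assigned to its true class.
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked)

# Toy predictions for 2 samples over 3 classes, with integer class labels.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])

losses = sparse_categorical_crossentropy(labels, probs)
mean_loss = losses.mean()
print(losses, mean_loss)
```

The loss is small when the model puts high probability on the correct class and grows without bound as that probability approaches zero, which is why minimizing it pushes predictions toward the targets.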

Now you understand what happens when you call fit: the model will start to iterate
on the training data in mini-batches of 128 samples, 5 times over (each iteration over
all the training data is called an epoch).

For each batch, the model will compute the
gradient of the loss with respect to the weights (using the backpropagation algorithm,
which derives from the chain rule in calculus) and move the weights in the direction
that will reduce the value of the loss for this batch.

In [5]:
model.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f91347913d0>
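
The loop that fit runs can be sketched on a much smaller problem. The example below is a simplified stand-in, not what Keras does internally: it uses a linear model with a mean squared error loss, computes the gradient by hand instead of by backpropagation, and applies plain SGD rather than rmsprop. The data, learning rate, and variable names are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: targets are an exact linear function of the inputs.
X = rng.normal(size=(256, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)          # weights to learn
learning_rate = 0.1
batch_size = 32
epochs = 5

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

initial_loss = mse(w, X, y)
updates = 0

for epoch in range(epochs):
    # One epoch = one full pass over the training data, in mini-batches.
    for start in range(0, len(X), batch_size):
        xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        # Gradient of the batch MSE with respect to w: (2/n) * X^T (Xw - y).
        grad = 2.0 * xb.T @ (xb @ w - yb) / len(xb)
        # Move the weights against the gradient to reduce the batch loss.
        w -= learning_rate * grad
        updates += 1

final_loss = mse(w, X, y)
print(updates, initial_loss, final_loss)
```

With 256 samples and a batch size of 32, each epoch performs 8 updates, so the loop runs 40 updates in total and the loss shrinks as `w` approaches `true_w`.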

After these 5 epochs, the model will have performed 2,345 gradient updates (469
per epoch), and the loss of the model will be sufficiently low that the model will be
capable of classifying handwritten digits with high accuracy.
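
The update count follows from the dataset and batch sizes; a quick check of the arithmetic, assuming the final partial batch of each epoch still counts as one update:

```python
import math

num_samples = 60000
batch_size = 128
epochs = 5

# 60000 / 128 = 468.75, so the last batch of each epoch is a partial one.
updates_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (updates_per_epoch - 1) * batch_size
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, last_batch_size, total_updates)  # 469 96 2345
```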

At this point, you already know most of what there is to know about neural networks.
Let’s prove it by reimplementing a simplified version of that first example
“from scratch” in TensorFlow, step by step.

##NN from scratch in TensorFlow