# Chapter 2, Basic Building Blocks of Neural Networks (Math Intuition)

Some mathematics is necessary to work effectively as a deep learning engineer. Without it, the issues you will face in a deep learning architecture become hard to identify and resolve.
We start with the "hello world" of deep learning: MNIST. The dataset was assembled by the National Institute of Standards and Technology (the NIST in MNIST) in the 1980s. It consists of 60,000 training images and 10,000 test images of handwritten digits, categorized into 10 classes (0 through 9). Here we will classify these grayscale images using Keras, which ships with a loader for the dataset.
Note: In machine learning, a category in a classification problem is called a class, and data points are called samples.

In [2]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

The code imports the MNIST dataset from the TensorFlow Keras library, which is a high-level API for building and training machine learning models. Specifically, the code imports the `mnist` module from the `datasets` subpackage of the `tensorflow.keras` library.

The second line of the code uses the `load_data()` function from the `mnist` module to load the dataset into two sets of variables: `train_images` and `train_labels` for the training data, and `test_images` and `test_labels` for the test data.

The training data consists of 60,000 images and their corresponding labels, while the test data consists of 10,000 images and their corresponding labels. Each image is a NumPy array of shape (28, 28) holding grayscale pixel values from 0 to 255, while the labels are NumPy arrays of integers ranging from 0 to 9.

In [3]:
# training data
train_images.shape

(60000, 28, 28)

In [4]:
print(train_images)

[[[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 ...

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 [[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]]


In [5]:
len(train_labels)

60000

In [6]:
train_labels.shape

(60000,)

In [7]:
print(train_labels)

[5 0 4 ... 5 6 8]


In [8]:
# test data
test_images.shape

(10000, 28, 28)

In [9]:
test_labels.shape

(10000,)

In [10]:
len(test_labels)

10000

In [11]:
print(test_labels)

[7 2 1 ... 4 5 6]


The output `(10000,)` for `test_labels.shape` indicates that `test_labels` is a 1-dimensional NumPy array with 10,000 elements: one label for each of the 10,000 images in the MNIST test set.

The shape has no second dimension, which is what makes `test_labels` a 1D array. This is in contrast to the `test_images` array, whose shape `(10000, 28, 28)` indicates a 3D array holding 10,000 images of 28x28 pixels.
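The difference between the two shapes can be checked without downloading the dataset. The sketch below uses zero-filled stand-in arrays (the names `fake_images` and `fake_labels` are made up for this example) with the same shapes as the MNIST test arrays; it illustrates the array layout only and contains no real data.

```python
import numpy as np

# Stand-ins with the same shapes as the MNIST test arrays (not the real data).
fake_images = np.zeros((10000, 28, 28), dtype=np.uint8)
fake_labels = np.zeros((10000,), dtype=np.uint8)

print(fake_images.ndim, fake_images.shape)  # 3 axes: sample, row, column
print(fake_labels.ndim, fake_labels.shape)  # 1 axis: one label per sample

# Indexing the first axis of both arrays pairs an image with its label.
first_image, first_label = fake_images[0], fake_labels[0]
print(first_image.shape, first_label.shape)
```

Indexing along the first axis is what ties the two arrays together: sample `i` is the pair `(images[i], labels[i])`.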

In [17]:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(512, activation = 'relu'),     
    layers.Dense(10, activation = 'softmax')])

The core building block of a neural network is the layer. Our model consists of two dense (densely connected, also called fully connected) layers.

- By importing the `layers` module from TensorFlow Keras, you can access the layer types that Keras provides. For example, `layers.Dense` is used to define fully connected layers in a neural network, while `layers.Conv2D` is used to define convolutional layers.

- When using `keras.Sequential`, you pass the layers in order. Each layer is connected to the previous one, forming a chain of operations that processes the input data: the output of one layer is fed as input to the next, until the final layer produces the output of the model.

`layers.Dense(512, activation = 'relu')` defines the first dense layer in the model. The Dense layer is a fully connected layer where each neuron is connected to every neuron in the previous layer. The first argument 512 specifies the number of neurons in this layer. The activation argument specifies the activation function used by this layer, which in this case is the ReLU function. ReLU is a commonly used activation function that **returns the input if it is positive, and zero otherwise.**
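As a rough illustration of what a dense layer with ReLU computes, the NumPy sketch below applies `relu(x @ W + b)` to a tiny input. The helper names `relu` and `dense_relu` and the toy weight values are made up for this example; Keras creates and stores its own weights internally, so this is a sketch of the math, not of the Keras implementation.

```python
import numpy as np

def relu(z):
    """Return z where it is positive and 0 elsewhere."""
    return np.maximum(z, 0.0)

def dense_relu(x, W, b):
    """One fully connected layer: affine transform followed by ReLU."""
    return relu(x @ W + b)

# Two samples with 3 features each, mapped to 2 neurons (toy sizes).
x = np.array([[1.0, 2.0, 0.5],
              [0.0, 1.0, -1.0]])
W = np.array([[0.5, -1.0],
              [1.0, 0.5],
              [-1.0, 2.0]])
b = np.array([0.1, -0.1])

out = dense_relu(x, W, b)
print(out)  # negative pre-activations are clipped to 0
```

In the chapter's model the same computation happens with 784 input features and 512 neurons instead of 3 and 2.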

`layers.Dense(10, activation = 'softmax')` defines the second dense layer in the model. The second layer is also a fully connected layer with 10 neurons, but the activation function used in this layer is the Softmax function. Softmax is another commonly used activation function that converts the outputs of the previous layer into a probability distribution over the 10 possible classes in the MNIST dataset.
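To see how softmax turns raw scores into a probability distribution, here is a minimal NumPy sketch. The `softmax` helper and the score values are hypothetical and chosen only for illustration; subtracting the row maximum is a common numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = z - z.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical raw scores for one sample over 10 classes.
scores = np.array([[1.0, 2.0, 0.5, 0.0, -1.0, 3.0, 0.2, 8.0, 1.5, 0.3]])
probs = softmax(scores)

print(probs.sum(axis=-1))     # each row sums to 1 (up to rounding)
print(probs.argmax(axis=-1))  # index of the largest score
```

Every output is strictly positive and the largest score receives the largest probability, which is why the index of the maximum can be read as the predicted class.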

- The two Dense layers are enclosed within square brackets [] and passed as a list to the Sequential model constructor. This specifies the order of layers in the neural network.
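The size of this two-layer network can be worked out by hand. The sketch below counts trainable parameters, assuming each image is flattened to 28 * 28 = 784 inputs before reaching the first layer; `dense_params` is a helper written for this example, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

input_size = 28 * 28  # length of one flattened 28x28 image
hidden = dense_params(input_size, 512)
output = dense_params(512, 10)
total = hidden + output

print(hidden, output, total)  # 401920 5130 407050
```

Almost all of the parameters sit in the first layer, because every one of the 784 inputs is connected to every one of the 512 neurons.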



To make the model ready for training, we need to pick three more things as part of
the compilation step:

- An optimizer: the mechanism through which the model will update itself based on the training data it sees, so as to improve its performance.
- A loss function: how the model will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.
- Metrics to monitor during training and testing: here, we only care about accuracy (the fraction of the images that were correctly classified).

In [18]:
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
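To make the loss less abstract, here is a minimal NumPy sketch of what sparse categorical crossentropy measures: the average of `-log(p)` where `p` is the probability the model assigned to the true class. The function below reuses the Keras string as its name for readability, but it is a simplified stand-in and not the Keras implementation (real implementations typically also guard against `log(0)`). The probabilities and labels are made up.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(probability assigned to the true class)."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(true_class_probs)))

# Hypothetical predicted distributions for 2 samples over 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])  # integer class indices, not one-hot vectors

loss = sparse_categorical_crossentropy(probs, labels)
print(loss)
```

"Sparse" refers to the label format: the labels are plain integers such as `5` or `0`, which is how `train_labels` is stored, rather than one-hot vectors. The loss shrinks toward 0 as the probability of the true class approaches 1.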

The preprocessing step is important for preparing the data before feeding it into the neural network model. In this case, the data needs to be reshaped and scaled.

First, the training images are reshaped from an array of shape `(60000, 28, 28)` to an array of shape `(60000, 28 * 28)`, so each image is flattened into a 1D vector of length `784`. This is necessary because the first Dense layer in the model expects each sample to be a 1D vector.

Second, the array is converted to `float32` and the pixel values are scaled from the original range `[0, 255]` to the interval `[0, 1]` by dividing each value by `255`. Scaling is important for numerical stability and helps the optimization algorithm work effectively. The test images receive the same treatment.

In [19]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255
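The effect of these two steps can be checked on a small synthetic batch. The sketch below uses random `uint8` values as a stand-in for images (not MNIST data) and passes `-1` to `reshape` so that NumPy infers the number of samples; that is an optional variation on the hard-coded `60000` and `10000` used in the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a small batch of 28x28 grayscale images (not MNIST).
batch = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# -1 lets NumPy infer the number of samples instead of hard-coding it.
flat = batch.reshape((-1, 28 * 28)).astype("float32") / 255

print(flat.shape, flat.dtype)
print(flat.min(), flat.max())
```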

In [20]:
model.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1e432fa7220>

**Observations**

This output shows the progress of training a neural network on the MNIST dataset for five epochs (passes through the entire training set).

In each epoch, the network is trained on the full training set, and its loss and accuracy on that training data are reported. No validation data was passed to `fit`, so no validation metrics appear. The training set is processed in batches, each a subset of the training data. With `batch_size=128`, each batch contains 128 samples (images and labels), which gives 469 batches per epoch; the last batch is smaller.
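The number of batches per epoch follows from the training set size and the `batch_size` passed to `fit`. A quick check of the arithmetic:

```python
import math

num_samples = 60000
batch_size = 128

# The final, partial batch still counts as one step.
steps_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)  # 469
print(last_batch_size)  # 96
```

So 469 is the number of batches (weight updates) per epoch, and 128 is the number of samples in every batch except the last.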

For each epoch, Keras reports the following (the log above keeps only the epoch headers):

- Epoch number: the current epoch being trained.
- Number of batches: the total number of batches in the training set.
- Time taken to complete the epoch: the time taken to train the network on the entire training set.
- Training loss: the average loss (error) on the training set for this epoch.
- Training accuracy: the fraction of correctly classified samples on the training set for this epoch.

The training loss and accuracy measure how well the network is fitting the training data. Ideally, we want the loss to decrease and the accuracy to increase as training progresses. However, it is also possible to overfit the training data, meaning the network becomes too specialized to the training data and performs poorly on new, unseen data.

In this run, the loss decreased and the accuracy increased with each epoch, which is a good sign that the network is learning to classify the images correctly.

In [21]:
test_digits = test_images[0:10]
predictions = model.predict(test_digits)

predictions[0]



array([2.2348040e-07, 7.3500854e-09, 3.9086581e-06, 1.7115715e-05,
       6.6224463e-11, 1.6618484e-08, 3.2788908e-12, 9.9997699e-01,
       7.0170636e-08, 1.6369813e-06], dtype=float32)

In [22]:
predictions[0].argmax()

7

In [23]:
predictions[0][7]

0.999977

The code prints the prediction vector for the first test digit, finds the class with the highest probability using the `argmax()` method, and reads the probability of that class with the index operator `[7]`. Index 7 is the 8th entry (indexing starts at 0) and corresponds to the digit 7, which matches the first test label printed earlier.
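The same reading can be reproduced without the trained model by copying the printed probability vector into a NumPy array. The variable names below are chosen for this example.

```python
import numpy as np

# The probability vector printed above for the first test digit.
first_prediction = np.array([
    2.2348040e-07, 7.3500854e-09, 3.9086581e-06, 1.7115715e-05,
    6.6224463e-11, 1.6618484e-08, 3.2788908e-12, 9.9997699e-01,
    7.0170636e-08, 1.6369813e-06], dtype=np.float32)

predicted_class = int(first_prediction.argmax())
confidence = float(first_prediction[predicted_class])

print(predicted_class)         # 7
print(confidence)              # probability assigned to that class
print(first_prediction.sum())  # close to 1, as expected from softmax
```

For the whole `predictions` array, `predictions.argmax(axis=1)` returns one predicted class per row.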

#### Evaluating the model on new data

In [24]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"test_acc: {test_acc}")

test_acc: 0.9800999760627747


**Test accuracy is about 98.0%, while training accuracy was about 99.5%. Test accuracy being a bit lower than training accuracy is an example of overfitting.**
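The accuracy metric itself is simple: the fraction of samples whose highest-probability class equals the true label. A minimal NumPy sketch with made-up predictions follows; the `accuracy` helper is written for this example and is not the Keras metric object.

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class equals the label."""
    return float(np.mean(probs.argmax(axis=1) == labels))

# Hypothetical predictions for 4 samples over 3 classes.
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
labels = np.array([0, 1, 2, 1])  # the last sample is misclassified

acc = accuracy(probs, labels)
print(acc)  # 0.75
```

Computing this once on training data and once on held-out test data gives the two numbers being compared when checking for overfitting.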

### Summary 
1. Load the dataset
2. Split it into training and test sets (MNIST is already split by `load_data()`)
3. Build the network architecture (define model type, layers, activation functions, etc.)
4. Compilation step (optimizer, loss, metrics)
5. Preprocess the data to feed the neural network (reshaping and scaling)
6. Model fitting
7. Predictions
8. Evaluate the model on new data