
# 1. Super Simple Example of Neural Net

Loading Data Set

In [0]:
# Load MNIST through tensorflow.keras, the same package used to build the model
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Building a really simple network made up of:
- 1 hidden `Dense` (aka fully connected) layer with 512 nodes and ReLU activation
- 1 output `Dense` layer with 10 nodes and `softmax` activation (one probability for each of the 10 digits)


In [0]:
from tensorflow.keras import models
from tensorflow.keras import layers

model = models.Sequential([
    layers.Dense(512, activation='relu'),    # hidden layer
    layers.Dense(10, activation='softmax'),  # output layer: one probability per digit
])
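
To make concrete what these two layers compute, here is a minimal NumPy sketch of the forward pass, assuming small random weights and a random stand-in for one image. The names `relu`, `softmax`, `W1`, `b1`, `W2` and `b2` are illustrative only; Keras's own weight initialisation and internals differ.

```python
import numpy as np

def relu(z):
    # Element-wise max(0, z)
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.random((1, 784))                     # one flattened 28 x 28 image (random stand-in)
W1, b1 = rng.normal(0, 0.01, (784, 512)), np.zeros(512)
W2, b2 = rng.normal(0, 0.01, (512, 10)), np.zeros(10)

hidden = relu(x @ W1 + b1)                   # what Dense(512, activation='relu') computes
probs = softmax(hidden @ W2 + b2)            # what Dense(10, activation='softmax') computes

print(probs.shape)   # (1, 10)
print(probs.sum())   # approximately 1.0
```

Each row of `probs` is a probability distribution over the 10 digit classes, which is why the output layer has exactly 10 nodes.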

To make the model ready for training, we still need to pick:
- an `optimiser` : the mechanism by which the model updates its weights based on the training data and the loss
- a `loss function` : how the model measures its error on the training data; this is the quantity the optimiser tries to minimise
- a `metric` : what we monitor during training and testing (here, accuracy)

In [0]:
model.compile(
    optimizer='rmsprop', # RMSprop - scales each update by an exponentially weighted average of squared gradients
    loss='sparse_categorical_crossentropy', # for multiclass classification with integer labels
    metrics=['accuracy'] # fraction of correctly classified images
    )
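
As a rough illustration of what this loss measures, the sketch below computes sparse categorical cross-entropy by hand in NumPy on made-up probabilities. The helper `sparse_categorical_crossentropy` is a hypothetical function written for this example, not the Keras implementation.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    # Mean of -log(probability assigned to the true class).
    # "Sparse" means the labels are plain integers rather than one-hot vectors.
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.log(true_class_probs).mean())

# Made-up predicted probabilities for 3 samples over 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3]])
labels = np.array([0, 1, 2])   # integer class labels

loss = sparse_categorical_crossentropy(labels, probs)
print(loss)
```

The third sample contributes the most to the loss because the model assigns only 0.3 to its true class.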

Before feeding the data into the model, we need to preprocess the input data by:
- reshaping each 28 X 28 image into a flat vector of 28 * 28 = 784 values, since a `Dense` layer expects one vector per sample
- scaling the pixel values to the range [0,1] by dividing by 255 (the maximum value of an 8-bit greyscale pixel) to speed up learning

In [0]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
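
The effect of these two steps can be checked on a small synthetic array, without downloading MNIST. This is a sketch that assumes random 8-bit values are a reasonable stand-in for real images; `images`, `flat` and `scaled` are names made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a batch of 4 greyscale images with 8-bit pixel values
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

flat = images.reshape((4, 28 * 28))       # each image becomes a vector of 784 values
scaled = flat.astype('float32') / 255     # pixel values now lie in [0, 1]

print(images.shape, images.dtype)   # (4, 28, 28) uint8
print(scaled.shape, scaled.dtype)   # (4, 784) float32
```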

Now let's fit the model with the training data:
- 5 epochs (an epoch is one full pass over the training set; every batch within it gets a forward pass, a backprop and a weight update)
- 128 images per batch



In [0]:
model.fit(train_images, train_labels, epochs=5, batch_size=128)
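
The number of weight updates implied by these settings can be worked out directly. This is a small arithmetic sketch, assuming the final, smaller batch of each epoch is kept rather than dropped.

```python
import math

num_images = 60000   # size of the MNIST training set, as used in the reshape step
batch_size = 128
epochs = 5

steps_per_epoch = math.ceil(num_images / batch_size)   # the last batch is smaller
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 469
print(total_updates)    # 2345
```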

Let's test the model on the test data

In [0]:
test_digits = test_images[0:10] # first 10 images 
predictions = model.predict(test_digits)

print(f"Prediction for the first image: {predictions[0].argmax()}")
print(f"Test label for the first image: {test_labels[0]}")
print(f"Prediction for the fifth image: {predictions[4].argmax()}")
print(f"Test label for the fifth image: {test_labels[4]}")

test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Average accuracy over the whole test set: {test_acc}")
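
Each row returned by `model.predict` holds 10 probabilities, and `argmax` picks the most probable digit. The sketch below shows this on made-up probabilities (not real model output) and computes accuracy as the fraction of matching labels; `labels` and `predicted_digits` are hypothetical names.

```python
import numpy as np

# Made-up softmax outputs for 3 images (each row sums to 1)
predictions = np.array([
    [0.01, 0.01, 0.02, 0.01, 0.01, 0.01, 0.01, 0.90, 0.01, 0.01],
    [0.05, 0.05, 0.60, 0.05, 0.05, 0.05, 0.05, 0.05, 0.03, 0.02],
    [0.10, 0.55, 0.05, 0.05, 0.05, 0.05, 0.05, 0.04, 0.03, 0.03],
])
labels = np.array([7, 2, 4])   # hypothetical true digits

predicted_digits = predictions.argmax(axis=1)      # most probable class per row
accuracy = (predicted_digits == labels).mean()     # fraction predicted correctly

print(predicted_digits)  # [7 2 1]
print(accuracy)
```

Here two of the three predictions match their labels, so the accuracy is two thirds.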


Recall that the accuracy on the train set was 98.8%, which is slightly higher than the accuracy on the test set. This gap between train and test accuracy is a sign of what we call 'overfitting': the model has fit the train set so closely that it performs worse on data it hasn't seen before.

# 2. Data Representation in Neural Networks

Data is represented in the form of a `tensor` in Deep Learning. A `tensor` is essentially a multi-dimensional array of numbers:
- **tensor of rank 0:** a `scalar`, e.g. `1`, `2` or `3` etc.
- **tensor of rank 1:** a `vector`, e.g. `[1,2,3,4,5]` etc. In this example, the vector has 5 entries along its single axis (a 5-dimensional vector is still a rank 1 tensor).
- **tensor of rank 2:** a `matrix`, e.g. `[ [1,2,3,4,5], [1,2,3,4,5] ]`. In this example, the matrix has shape 2X5 (2 rows, 5 columns).
- **tensor of rank 3:** e.g. `[ [ [1,2,3,4,5],[1,2,3,4,5] ], [ [2,3,4,5,6],[2,3,4,5,6] ] ]`. You can essentially think of this as a 'cube' of 2X2X5 numbers (2 matrices, each with 2 rows and 5 columns).

To find out the rank of a tensor, we can use numpy's `ndim` attribute.


In [0]:
import numpy as np

# Rank 3 tensor: 2 matrices, each with 2 rows and 5 columns
t = np.array([
    [
        [1, 2, 3, 4, 5],
        [1, 2, 3, 4, 5],
    ],
    [
        [2, 3, 4, 5, 6],
        [2, 3, 4, 5, 6],
    ],
])

In [0]:
print(f"Tensor t: \n{t}")
print(f"Rank of tensor t: {t.ndim}")

**Key attributes of a tensor:**

1. Number of axes (or rank) : can be accessed via the `ndim` attribute
2. Shape : number of entries along each axis; can be accessed via the `shape` attribute
3. Data type : e.g. floats, integers etc; can be accessed via the `dtype` attribute
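
The three attributes can be inspected on a small array as follows. The `dtype` is set explicitly here, since NumPy's default integer type depends on the platform.

```python
import numpy as np

t = np.array([[[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]],
              [[2, 3, 4, 5, 6], [2, 3, 4, 5, 6]]], dtype=np.float32)

print(t.ndim)    # 3
print(t.shape)   # (2, 2, 5)
print(t.dtype)   # float32

# A scalar is a rank 0 tensor with an empty shape
scalar = np.array(3)
print(scalar.ndim, scalar.shape)   # 0 ()
```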

# 3. Engine of Neural Nets : Gradient Based Operations

**Neural nets are trained by:**
1. Starting with random initial parameters
2. Carrying out a forward pass : computing $\hat y$ given $x$
3. Carrying out backpropagation : computing the derivative of the loss function w.r.t. the weights
4. Updating the weights : $weight_{new} = weight_{old} - \eta \cdot \frac{\partial J}{\partial weight}$ where $J$ is the loss function and $\eta$ is the learning rate
5. Repeating 2. to 4. until the loss function stops decreasing (i.e. it has reached a minimum).
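
The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a one-variable linear model, a mean squared error loss, synthetic data and an arbitrary learning rate; a fixed number of steps stands in for "until the loss stops decreasing".

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 1.0                    # synthetic targets from a known line

w, b = rng.normal(), rng.normal()    # 1. random initial parameters
eta = 0.1                            # learning rate (an arbitrary choice)

losses = []
for step in range(500):              # 5. repeat steps 2 to 4
    y_hat = w * x + b                        # 2. forward pass
    losses.append(np.mean((y_hat - y) ** 2))
    grad_w = np.mean(2 * (y_hat - y) * x)    # 3. derivative of the loss w.r.t. w
    grad_b = np.mean(2 * (y_hat - y))        #    and w.r.t. b
    w -= eta * grad_w                        # 4. update the weights
    b -= eta * grad_b

print(w, b)   # close to 3.0 and 1.0
```

Because the data comes from a known line, the recovered `w` and `b` give a simple check that the loop is working.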

**Representing Forward Pass and Backward Propagation using a computational graph**

Let's use linear regression for this example: $\hat y = w \cdot x + b$

And loss function $J = (\hat y-y)^2$.
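
Applying the chain rule to these two definitions gives the gradients that backpropagation computes for a single sample, writing $J$ for the loss as in the weight update rule:

$$\frac{\partial J}{\partial \hat y} = 2(\hat y - y)$$

$$\frac{\partial J}{\partial w} = \frac{\partial J}{\partial \hat y} \cdot \frac{\partial \hat y}{\partial w} = 2(\hat y - y) \cdot x$$

$$\frac{\partial J}{\partial b} = \frac{\partial J}{\partial \hat y} \cdot \frac{\partial \hat y}{\partial b} = 2(\hat y - y)$$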

![Computational Graph](https://github.com/shengy90/Deep-Learning-Tutorials/blob/master/misc/forwardbackwards.png?raw=true)
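
A graph like this can be traced by hand in plain Python. The sketch below uses made-up values for `x`, `y`, `w` and `b`, and the way the computation is split into nodes is an assumption that may not match the figure exactly. A finite-difference estimate is included as a sanity check on the backward pass.

```python
# Hypothetical single training example and parameter values
x, y = 2.0, 7.0
w, b = 1.5, 0.5

# Forward pass: evaluate each node from the inputs to the loss
z = w * x            # 3.0
y_hat = z + b        # 3.5
diff = y_hat - y     # -3.5
loss = diff ** 2     # 12.25

# Backward pass: apply the chain rule from the loss back to the parameters
d_diff = 2 * diff          # d loss / d diff  = -7.0
d_y_hat = d_diff * 1.0     # d loss / d y_hat = -7.0
d_b = d_y_hat * 1.0        # d loss / d b     = -7.0
d_w = d_y_hat * x          # d loss / d w     = -14.0

# Sanity check against a central finite-difference estimate
def loss_fn(w_, b_):
    return (w_ * x + b_ - y) ** 2

eps = 1e-6
num_d_w = (loss_fn(w + eps, b) - loss_fn(w - eps, b)) / (2 * eps)
num_d_b = (loss_fn(w, b + eps) - loss_fn(w, b - eps)) / (2 * eps)

print(d_w, num_d_w)
print(d_b, num_d_b)
```

The analytic and numerical gradients should agree to several decimal places, which is a common way to check a hand-written backward pass.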
