# Simple TensorFlow Demo for the MNIST Dataset

In [None]:
import tensorflow as tf
print(tf.__version__)

## Loading MNIST data
The MNIST dataset is available through `tf.keras.datasets` (it is downloaded and cached on first use). Each digit is stored as a 28x28 array of pixels. Pixel values range from 0 to 255 and represent the gray scale, from white (0, the background) to black (255, the digit). The `load_data()` function returns a tuple of training and test sets, each consisting of images and their labels.

In [None]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train_scaled = x_train/255
x_test_scaled = x_test/255
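
Dividing by 255 rescales the pixel values from integers in the range 0 to 255 to floats between 0 and 1, which generally suits gradient-based training better. A minimal sketch on a made-up 2x2 array (`fake_image` is hypothetical and only stands in for one MNIST image):

```python
import numpy as np

# Hypothetical 2x2 stand-in for a 28x28 MNIST image (uint8, like the real data).
fake_image = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# True division promotes the integer array to floating point.
fake_image_scaled = fake_image / 255

print(fake_image_scaled.dtype)
print(fake_image_scaled.min(), fake_image_scaled.max())
```

The smallest value stays 0 and the largest becomes 1, so all pixels end up on the same small scale.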

## Model initialization

In [None]:
model = tf.keras.models.Sequential()

The above model is a Sequential model. Here, each layer gets its input from the previous layer and passes its output to the next layer. As an alternative to the Sequential model, the Functional API of Keras allows users to define more complex graphs of layers, where a layer can get input from more than one layer and pass its output to multiple layers.
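
For comparison, here is a minimal sketch of how the same stack of layers (the ones added in the cells below) could be written with the Functional API. The variable names `inputs`, `x`, `outputs` and `functional_model` are illustrative and not part of this notebook:

```python
import tensorflow as tf

# Each layer is called on the output tensor of the previous one.
inputs = tf.keras.Input(shape=(28, 28))
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(128, activation='relu')(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)

# The model is defined by its input and output tensors.
functional_model = tf.keras.Model(inputs=inputs, outputs=outputs)
print(functional_model.output_shape)
```

For a plain chain of layers like this one, the Sequential form is shorter; the Functional form starts to pay off once layers branch or merge.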

In [None]:
model.add(tf.keras.layers.Flatten(input_shape=(28,28)))

The `Flatten` layer turns each 28x28 image into a vector of 784 values. Images are usually flattened this way so that each input image becomes one row of the dataset.
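
What flattening does to the shapes can be sketched with NumPy. The array `batch` is a hypothetical stand-in for a few MNIST images:

```python
import numpy as np

# Hypothetical batch of 3 blank "images", each 28x28 pixels.
batch = np.zeros((3, 28, 28))

# Flattening keeps the batch dimension and merges the rest: 28 * 28 = 784.
flat = batch.reshape(batch.shape[0], -1)

print(flat.shape)  # (3, 784)
```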

In [None]:
model.add(tf.keras.layers.Dense(units=128,activation='relu'))

A Dense layer is a regular densely-connected NN layer. The number of units is a hyper-parameter, typically chosen by experience or experimentation.
The output of a dense layer is calculated as `output = activation(dot(input, kernel) + bias)`. Here the ReLU (Rectified Linear Unit) activation function is used. ReLU returns 0 for all negative inputs and returns positive inputs unchanged, i.e. `relu(x) = max(0, x)`.
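
The formula above can be sketched in NumPy for a tiny layer. The sizes and the values of `inputs`, `kernel` and `bias` are made up for illustration (the layer in this notebook maps 784 inputs to 128 units, with weights learned during training):

```python
import numpy as np

def relu(z):
    # Element-wise max(0, z): negatives become 0, positives pass through.
    return np.maximum(0, z)

# Hypothetical tiny layer: 3 inputs -> 2 units.
inputs = np.array([[1.0, -2.0, 0.5]])   # shape (1, 3)
kernel = np.array([[0.5, 1.0],
                   [0.25, 0.5],
                   [-1.0, 2.0]])        # shape (3, 2)
bias = np.array([0.25, -0.5])           # shape (2,)

pre_activation = np.dot(inputs, kernel) + bias   # values: -0.25 and 0.5
output = relu(pre_activation)                    # values: 0.0 and 0.5

print(pre_activation)
print(output)
```

The first unit has a negative pre-activation and is clipped to 0, while the second passes through unchanged.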

In [None]:
model.add(tf.keras.layers.Dropout(0.2))

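The dropout layer randomly sets input units to 0 with the given rate (here 0.2, i.e. 20% of the units) and scales the remaining units up by `1 / (1 - rate)` so that the expected sum of the inputs is unchanged. This is used to prevent overfitting. The shape of the dropout mask can be controlled with the `noise_shape` argument (a 1D integer tensor), and the random seed can be fixed with `seed`. Dropout is active only when `training=True`; during inference it is not applied. During `model.fit()`, `training` is set to True automatically.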
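
The behaviour described above can be illustrated with a simplified NumPy sketch. This is not Keras' actual implementation; the function `dropout` and its arguments are hypothetical and written only for illustration:

```python
import numpy as np

def dropout(x, rate, training, rng):
    # At inference time the input passes through unchanged.
    if not training:
        return x
    # Keep each unit with probability (1 - rate) ...
    keep_mask = rng.random(x.shape) >= rate
    # ... and rescale the survivors so the expected sum is unchanged.
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility
x = np.ones((1, 10))

train_out = dropout(x, rate=0.2, training=True, rng=rng)
infer_out = dropout(x, rate=0.2, training=False, rng=rng)

print(train_out)  # each entry is either 0 or 1.25
print(infer_out)  # identical to x
```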

In [None]:
model.add(tf.keras.layers.Dense(units=10, activation='softmax'))

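Softmax converts a vector of values into a probability distribution: every output lies between 0 and 1 and all outputs sum to 1. It is usually used in the output layer of a classifier, so that the outputs can be interpreted as class probabilities. Here 10 units are used because the labels are the digits 0-9.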
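
A minimal NumPy sketch of softmax for a single vector. The values in `logits` are made up and stand in for the raw scores of the 10 output units:

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw scores for the 10 digit classes.
logits = np.array([1.0, 2.0, 0.5, 0.0, 3.0, -1.0, 0.2, 0.1, 1.5, 0.3])
probs = softmax(logits)

print(probs.sum())       # 1, up to floating-point rounding
print(np.argmax(probs))  # 4, the index of the largest score
```

The predicted digit is the index with the highest probability, which is also the index of the largest raw score.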

## Compile the Model

During this step, the training configuration of the model (optimizer, loss function and metrics) is set. Optimizers are described in detail in later sections. In this example, the `adam` optimizer is used.

In [None]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

`sparse_categorical_crossentropy` is a loss function for the case where the true labels are given as integer class indices (here 0-9) rather than as one-hot vectors. A one-hot encoding of a label k
would contain 1 at index k and 0 at all other indices. The `sparse_categorical_crossentropy` function only computes the loss for the k<sup>th</sup> index, i.e. the negative log of the probability predicted for the true class, and ignores the rest.
The cross-entropy terms for the other positions are multiplied by 0 anyway, so summing them up would be redundant.
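
The equivalence can be checked numerically for a single sample. The probabilities in `probs` are made up and stand in for one softmax output of the model:

```python
import numpy as np

# Hypothetical softmax output for one image over the 10 classes.
probs = np.array([0.01, 0.02, 0.05, 0.70, 0.02, 0.05, 0.05, 0.04, 0.03, 0.03])
label = 3  # integer class index, as stored in y_train

# Categorical cross-entropy needs a one-hot target and sums over all classes.
one_hot = np.zeros(10)
one_hot[label] = 1.0
categorical_loss = -np.sum(one_hot * np.log(probs))

# The sparse variant reads the probability of the true class directly.
sparse_loss = -np.log(probs[label])

print(categorical_loss, sparse_loss)  # both equal -log(0.7)
```

Both values agree; the sparse form simply skips building the one-hot vector and the terms that are multiplied by 0.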


In [None]:
model.fit(x=x_train_scaled, y=y_train, validation_data=(x_test_scaled, y_test), epochs=10)