## Simple Sequential Model using TensorFlow and Keras

In [1]:
from tensorflow.keras.datasets import mnist
from tensorflow import keras
import numpy as np



#### STEP 0: COLLECT AND PREPARE DATA

In [2]:
(train_images, train_labels) , (test_images, test_labels) = mnist.load_data()

In [3]:
train_images.shape

(60000, 28, 28)

In [4]:
test_images.shape

(10000, 28, 28)

In [5]:
train_images = train_images.reshape((60000, 28*28))
train_images = train_images.astype("float32") / 255

test_images = test_images.reshape((10000, 28*28))
test_images = test_images.astype("float32") / 255
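
The reshaping and scaling in the cell above can be checked on a small synthetic array without downloading MNIST. This is a minimal sketch: `fake_images` is a made-up stand-in, not real data, and `reshape((-1, 28 * 28))` is used instead of a hardcoded sample count so the same line works for any number of images.

```python
import numpy as np

# Synthetic stand-in for MNIST: 4 "images" of 28x28 uint8 pixels (not real data).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Flatten each image into a 784-element vector; -1 lets NumPy infer the sample count.
flat = fake_images.reshape((-1, 28 * 28))

# Convert to float32 and scale pixel values from [0, 255] down to [0, 1].
scaled = flat.astype("float32") / 255

print(scaled.shape, scaled.dtype)
print(float(scaled.min()), float(scaled.max()))
```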

#### STEP 1: DEFINE SEQUENTIAL MODEL

The model is defined with the Keras `Sequential` API, which builds a network as a linear stack of layers: you create a `Sequential` model and add configured layers to it one after another.

Here's a breakdown of the code:

1. `model = keras.Sequential()`: This line initializes a new sequential model. A sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.

2. `model.add(keras.layers.Dense(512, activation='relu', input_shape=(28*28,)))`: This line adds a densely connected (also known as fully connected) layer to the model. The layer has 512 neurons and uses the ReLU (Rectified Linear Unit) activation function. The `input_shape` parameter indicates that each input sample is an array of length 28*28 (which is 784), matching the flattened 28x28 pixel grayscale images prepared in STEP 0.

3. `model.add(keras.layers.Dense(10, activation='softmax'))`: This line adds another densely connected layer, but this one has only 10 neurons. This is the output layer of the model. The softmax activation function turns the layer's outputs into a probability distribution over the 10 output classes, so the output for each sample is a 10-element vector of probabilities summing to 1. Each class's probability is the model's confidence that the input sample belongs to that class.

In summary, this code defines a simple two-layer neural network with a fully connected hidden layer with 512 neurons and a fully connected output layer with 10 neurons. The hidden layer uses ReLU as the activation function, and the output layer uses softmax.
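
To make the two layers described above concrete, here is a minimal NumPy sketch of the same forward pass. It is not how Keras implements `Dense`; the weights are random placeholders (the names `W1`, `b1`, `W2`, `b2` are illustrative), and it only shows the shapes involved and that the softmax output sums to 1.

```python
import numpy as np

def relu(x):
    # ReLU keeps positive values and replaces negatives with zero.
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = z - z.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# One fake input sample with 784 features in [0, 1).
x = rng.random((1, 28 * 28))

# Random placeholder weights with the same shapes as the two Dense layers.
W1, b1 = rng.normal(0.0, 0.05, size=(784, 512)), np.zeros(512)
W2, b2 = rng.normal(0.0, 0.05, size=(512, 10)), np.zeros(10)

hidden = relu(x @ W1 + b1)        # shape (1, 512)
probs = softmax(hidden @ W2 + b2) # shape (1, 10)

print(hidden.shape, probs.shape)
print(float(probs.sum()))
```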

In [6]:
model = keras.Sequential()

model.add(keras.layers.Dense(512, activation='relu', input_shape=(28*28,)))
model.add(keras.layers.Dense(10, activation='softmax'))

#### STEP 2: COMPILE MODEL

The `model.compile()` function is used to configure the learning process before training the model. It takes three main arguments: `optimizer`, `loss`, and `metrics`.

1. `optimizer`: This is the algorithm the model uses to update its weights in order to minimize the loss function. In this case, 'rmsprop' is used. RMSprop (Root Mean Square Propagation) keeps a moving average of the squared values of recent gradients and divides each gradient by the square root of that average, which normalizes the step size per parameter. It's a popular choice for recurrent neural networks.

2. `loss`: This is the function that the model will try to minimize. It's a way to measure how far the model's predictions are from the actual data. Here, "sparse_categorical_crossentropy" is used, which is appropriate for multi-class classification problems where each instance belongs to exactly one class and the labels are integer class indices (as with the MNIST labels 0 to 9) rather than one-hot vectors.

3. `metrics`: These are the metrics used to evaluate the performance of your model. They are computed and reported during training and evaluation, but unlike the loss they are not used to update the weights. In this case, 'accuracy' is used, which is the fraction of predictions that match the true labels.

In summary, this code is preparing a neural network model for training by setting the optimizer, loss function, and evaluation metrics.
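
The loss and metric named above can be worked through by hand on a toy batch. This is a simplified sketch, not the Keras implementation (which, among other things, guards against taking the log of zero); the three-class probabilities and labels are invented for illustration.

```python
import math

def sparse_categorical_crossentropy(probs, label):
    # Loss for one sample: negative log of the probability given to the true class.
    return -math.log(probs[label])

def accuracy(batch_probs, labels):
    # Fraction of samples whose highest-probability class equals the integer label.
    correct = 0
    for probs, label in zip(batch_probs, labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)

# Invented predictions for 3 samples over 3 classes, with integer labels.
batch_probs = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
]
labels = [1, 0, 2]

losses = [sparse_categorical_crossentropy(p, y) for p, y in zip(batch_probs, labels)]
mean_loss = sum(losses) / len(losses)
batch_accuracy = accuracy(batch_probs, labels)

print(mean_loss, batch_accuracy)
```

The third sample is misclassified (the model prefers class 1, the label is 2), so it contributes the largest loss and lowers the accuracy to two out of three.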

In [7]:
model.compile(optimizer='rmsprop',
              loss="sparse_categorical_crossentropy",
              metrics=['accuracy'])
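
For intuition about what the `'rmsprop'` optimizer does, here is a minimal sketch of one textbook RMSprop update in NumPy. The function name `rmsprop_step` and the hyperparameter values are assumptions chosen for illustration; the exact Keras implementation may differ in details such as where epsilon is added.

```python
import numpy as np

def rmsprop_step(w, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    # Update the moving average of squared gradients.
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    # Divide the gradient by the root of that average before stepping.
    w = w - lr * grad / (np.sqrt(avg_sq) + eps)
    return w, avg_sq

# Toy problem: minimize f(w) = sum(w**2), whose gradient is 2 * w.
w0 = np.array([1.0, -2.0])
avg_sq0 = np.zeros_like(w0)

w1, avg_sq1 = rmsprop_step(w0, 2.0 * w0, avg_sq0)
step_sizes = np.abs(w1 - w0)

print(w1)
print(step_sizes)
```

Although the second gradient component is twice as large as the first, both weights move by about the same amount, which is the normalizing effect of dividing by the running gradient magnitude.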

#### STEP 3: FIT MODEL ON TRAINING DATA

The `model.fit()` function is used to train the model for a fixed number of epochs (iterations on a dataset).

The first two parameters, `train_images` and `train_labels`, are the training data. `train_images` is the input data for the model, and `train_labels` are the labels or the expected output for the training data. The model uses this data to learn the relationship between the input and the output.

The `epochs=5` parameter specifies the number of epochs, which is the number of complete passes through the entire training dataset. Here, the model is trained for 5 epochs, meaning the entire dataset is passed forward and backward through the neural network five times.

The `batch_size=128` parameter is the number of samples per gradient update: the model weights are updated once after each batch of 128 samples has been processed.

The `validation_split=0.2` parameter specifies the fraction of the training data to be used as validation data. The model sets this fraction aside, does not train on it, and evaluates the loss and any model metrics on it at the end of each epoch. Keras takes these samples from the end of the arrays, before any shuffling. Here, 20% of the training data (12,000 of the 60,000 images) is used for validation, leaving 48,000 for training.

In summary, this code is training a neural network model using the training data (`train_images` and `train_labels`), for a specified number of epochs (5), with a specified batch size (128), and using a portion of the training data for validation (20%). The training progress is displayed as a progress bar (due to `verbose=1`).
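
The parameters above determine how many weight updates training performs. The arithmetic can be sketched in plain Python, using the dataset size shown earlier in the notebook:

```python
import math

num_samples = 60000      # size of train_images
validation_split = 0.2
batch_size = 128
epochs = 5

# Samples held out for validation, and samples actually trained on.
num_val = round(num_samples * validation_split)
num_train = num_samples - num_val

# Each batch triggers one gradient update; a final partial batch still counts.
steps_per_epoch = math.ceil(num_train / batch_size)
total_updates = steps_per_epoch * epochs

print(num_train, num_val)              # 48000 12000
print(steps_per_epoch, total_updates)  # 375 1875
```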

In [8]:
model.fit(train_images, train_labels, 
          epochs=5, 
          batch_size=128, 
          validation_split=0.2, verbose=1)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x132edbc50>
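
The `History` object returned by `model.fit()` exposes a `history` attribute, a dictionary mapping each tracked quantity to a list with one value per epoch. With this notebook's settings the keys would be expected to be `loss`, `accuracy`, `val_loss`, and `val_accuracy`. A minimal sketch of reading such a dictionary follows; `best_epoch` is a hypothetical helper, and the numbers are made-up placeholders, not results from this run.

```python
def best_epoch(history_dict, metric="val_accuracy"):
    # Return the 1-based epoch number with the highest value of `metric`.
    values = history_dict[metric]
    best_index = max(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]

# Placeholder values for illustration only (NOT output of the training above).
example_history = {
    "loss": [0.9, 0.5, 0.4],
    "accuracy": [0.70, 0.85, 0.88],
    "val_loss": [0.7, 0.5, 0.55],
    "val_accuracy": [0.78, 0.86, 0.85],
}

epoch, value = best_epoch(example_history)
print(epoch, value)
```

In the notebook itself, the same helper could be applied by assigning the result of `model.fit(...)` to a variable and passing its `history` attribute.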

#### STEP 4: EVALUATE MODEL ON UNSEEN TEST DATA

In [9]:
loss, accuracy = model.evaluate(test_images, test_labels)
print(accuracy)

0.9786999821662903


#### STEP 5: PREDICT CLASS

In [10]:
model.predict(test_images[0:1])



array([[3.8276971e-07, 1.2232002e-07, 7.2351636e-06, 9.9634439e-05,
        3.6654117e-11, 1.4482663e-07, 2.0489065e-11, 9.9988163e-01,
        2.7736899e-06, 8.0777618e-06]], dtype=float32)

In [11]:
print(np.argmax(model.predict(test_images[0:1]), axis=-1))

[7]
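
The prediction vector printed in STEP 5 can be inspected further with NumPy alone. The sketch below copies that output array verbatim, confirms that the probabilities sum to approximately 1, and lists the three most likely classes.

```python
import numpy as np

# Output of model.predict(test_images[0:1]) copied from the cell above.
probs = np.array([[3.8276971e-07, 1.2232002e-07, 7.2351636e-06, 9.9634439e-05,
                   3.6654117e-11, 1.4482663e-07, 2.0489065e-11, 9.9988163e-01,
                   2.7736899e-06, 8.0777618e-06]], dtype=np.float32)

# The predicted class is the index of the largest probability.
predicted_class = int(np.argmax(probs, axis=-1)[0])

# Sort class indices by probability, highest first, and keep the top three.
top3 = np.argsort(probs[0])[::-1][:3]

print(predicted_class)
print(float(probs.sum()))
for cls in top3:
    print(int(cls), float(probs[0, cls]))
```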


#### STEP 6: MODEL SUMMARY

In [12]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 512)               401920    
                                                                 
 dense_1 (Dense)             (None, 10)                5130      
                                                                 
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
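
The parameter counts in the summary can be reproduced by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A short sketch (the helper name `dense_params` is illustrative):

```python
def dense_params(num_inputs, num_units):
    # Weights (num_inputs x num_units) plus one bias per unit.
    return num_inputs * num_units + num_units

hidden_params = dense_params(28 * 28, 512)  # first Dense layer
output_params = dense_params(512, 10)       # output Dense layer
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 401920 5130 407050
```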


#### STEP 7: SAVE MODEL

In [13]:
model.save("mnist.h5")