<a href="https://colab.research.google.com/github/rajdeepbanerjee-git/JNCLectures_Intro_to_ML/blob/main/Week10/Lec10_HelloWorldOfDeepLearning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Let's start with the "Hello world" of deep learning - the MNIST dataset - consisting of 60k train and 10k test images of handwritten digits (0 - 9). Each image is 28*28 pixels. We will try to clasify them into 10 classes.

In [1]:
from tensorflow.keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [3]:
# the images are encoded as numpy array, we can therefore check their shape
print(train_images.shape, len(train_labels))
print(test_images.shape, len(test_labels))

(60000, 28, 28) 60000
(10000, 28, 28) 10000


Now, let's create the network architecture - a feed-forward neural network basically consists of layers of neurons or nodes. Each node in a layer receives signal from the previous layer nodes, processes (applies some function on it) and sends the output to the nodes in the next layer. The final layer in a classification case is generally a softmax layer.

In [4]:
# defining model architecture
from tensorflow import keras # as you will see this abstracts away all the operations
from tensorflow.keras import layers

model = keras.Sequential(
    [layers.Dense(512, activation = 'relu'),
    layers.Dense(10, activation = 'softmax')]
                        )
# single layer with 512 nodes - Dense - fully connected
# output layer has 10 neurons to output 10 classes
# for each image you will get a vector of probabilities length 10
# argmax of the vector will give you the class

In [5]:
# The compilation or defining the optimization details
model.compile(
    optimizer = "rmsprop",
    loss = "sparse_categorical_crossentropy",
    metrics = ["accuracy"]
)

In [6]:
# prepare the data: input training images have shape (60000, 28, 28) and values in the interval [0, 255] in uint8 type
# we will change them to shape (60000, 28*28) and scale the values to [0,1] interval in float32 type

train_images = train_images.reshape((60000, 28*28))
train_images.astype("float32")/255
test_images = test_images.reshape((10000, 28*28))
test_images.astype("float32")/255

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)

In [7]:
# we are now ready for training ...
model.fit(train_images,
          train_labels,
          epochs = 5,
          batch_size = 128)
# providing batches ensures how many datapoints the gradient will be averaged over and updates

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x7a23311a4070>

In [17]:
# making predictions on 10 test images
test_digits = test_images[0:10]
predictions_prob = model.predict(test_digits)
predicted_labels = predictions_prob.argmax(axis = 1)

print(f"test labels: {test_labels[0:10]}")
print(f"predicted labels: {predicted_labels}")


test labels: [7 2 1 0 4 1 4 9 5 9]
predicted labels: [7 2 1 0 4 1 4 9 6 9]


In [18]:
# calculating test accuracy on all test data
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"test_acc: {test_acc}")

test_acc: 0.9642000198364258


In the next lecture, we will go through the basics of the training and try to connect back to this example.