In [1]:
%%capture
# Download dependencies
%pip install tensorflow_datasets

In [3]:
%%capture
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras import Sequential, losses, metrics, layers
import numpy as np

1. Get the data
2. Create the neural network (AI Model)
3. Train the model
4. Evaluate the model

In [6]:
# download dataset from tfds
(train_dataset, test_dataset), info = tfds.load('mnist', split=['train', 'test'], as_supervised=True, with_info=True)

1. Create Training and Testing Datasets

In this section we use TensorFlow Datasets (tfds) to download MNIST, a dataset of 70,000 handwritten digits
(60,000 for training and 10,000 for testing).

We create distinct training and testing datasets: the training set is used to fit the model, and the testing set
is used afterwards to evaluate the model.

The following cell displays a few examples from the training set.


In [8]:
tfds.as_dataframe(train_dataset.take(5), info)

2022-12-02 14:09:00.454003: W tensorflow/core/kernels/data/cache_dataset_ops.cc:856] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.


Unnamed: 0,image,label
0,,4
1,,1
2,,0
3,,7
4,,8


At https://knowyourdata-tfds.withgoogle.com/#tab=STATS&dataset=mnist you can examine the MNIST dataset with more options. 

2. Create Model

In [10]:
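model = Sequential()
model.add(layers.Input(shape=(28, 28, 1)))   # 28x28 grayscale images
model.add(layers.Rescaling(scale=1./255))    # scale pixels from [0, 255] to [0, 1]
model.add(layers.Flatten())                  # (28, 28, 1) -> (784,)
model.add(layers.Dense(units=64))            # hidden layer (no activation specified)
model.add(layers.Dense(units=10))            # output layer: one logit per digit class

model.compile(optimizer='adam',
              loss=losses.SparseCategoricalCrossentropy(from_logits=True),
              # pass metrics as a list, the form Keras documents for compile()
              metrics=[metrics.SparseCategoricalAccuracy()]
             )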
          
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 rescaling (Rescaling)       (None, 28, 28, 1)         0         
                                                                 
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 64)                50240     
                                                                 
 dense_1 (Dense)             (None, 10)                650       
                                                                 
Total params: 50,890
Trainable params: 50,890
Non-trainable params: 0
_________________________________________________________________
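
The parameter counts in the summary can be checked by hand: a Dense layer has one weight for every input-output pair plus one bias per output unit. A minimal sketch in plain Python (the helper name `dense_param_count` is made up for illustration):

```python
def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

flattened_size = 28 * 28 * 1                     # Flatten turns (28, 28, 1) into 784 values
hidden = dense_param_count(flattened_size, 64)   # the "dense" row of the summary
output = dense_param_count(64, 10)               # the "dense_1" row of the summary

print(hidden, output, hidden + output)           # 50240 650 50890
```

The Rescaling and Flatten layers only transform the data and have no weights, which is why they show 0 parameters.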


The model is composed of five layers. We will break them down here. 

The first layer is the input layer. It tells the model what input shape to expect. While this is not strictly necessary in this 
case, it makes the idea of input shape easier to understand. 

The second layer is a rescaling layer. It scales every pixel value (linearly) into the range 0 to 1. 
Inputs in a small, consistent range generally help the model converge faster during training. The factor 1/255 was chosen because 
the pixel values range from 0 to 255. Example: (0, 255) / 255 --> (0/255, 255/255) = (0, 1)
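
The effect of `Rescaling(scale=1./255)` can be illustrated without TensorFlow. This is only a sketch of the arithmetic, not the Keras implementation; the helper `rescale` and the sample pixel values are made up for illustration.

```python
SCALE = 1.0 / 255  # the same factor passed to layers.Rescaling in the model

def rescale(pixels):
    """Map raw 8-bit pixel intensities (0..255) into the range 0..1."""
    return [p * SCALE for p in pixels]

raw_pixels = [0, 51, 255]   # illustrative values, not taken from MNIST
scaled = rescale(raw_pixels)
print(scaled)               # black stays at 0, white ends up at (approximately) 1
```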

The third layer is a flattening layer. Images are 2-dimensional (arguably 3-dimensional once channels are counted, but that is
not relevant here). Our feed-forward multilayer perceptron (MLP) can only take 1-dimensional data. Because of this, we need a way
to convert the 2D image to 1D. This can be done by flattening the image. By taking every row of pixels and placing the rows end
to end (as displayed in the image below), you are able to turn 2D data into 1D data. The drawback of this method is that it 
discards the 2D arrangement of the pixels: pixels that were vertical neighbors end up far apart. Imagine trying to identify an
image whose rows have been laid out in one long line. It's pretty difficult! Convolutional neural networks (CNNs), which were
inspired by the human visual system, address this issue by operating on the 2D image directly. 
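
Flattening can be sketched in a few lines of plain Python. The 3x3 "image" below is a made-up stand-in for a 28x28 MNIST digit, and `flatten` is a helper written for illustration rather than the Keras layer itself.

```python
def flatten(image):
    """Concatenate the rows of a 2D image into one 1D list (row-major order)."""
    return [pixel for row in image for pixel in row]

# A tiny 3x3 "image"; each value is just the pixel's position, to make the order visible.
image = [
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
]

flat = flatten(image)
print(flat)        # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(len(flat))   # 9 here; a 28x28 image gives 784 values
```

Pixels 1 and 4 sit directly above one another in the image, but they are three positions apart in the flattened list. In a 28x28 image, vertical neighbors end up 28 positions apart.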

The fourth layer is the first hidden layer. This is the first layer with trainable weights; the layers before it 
only preprocess the data. The number of nodes in this layer is arbitrary, but a common convention is to choose a 
power of two (Ex. 32, 64, 128, 256..). Note that no activation function is specified for this layer in the code above, so it
applies only a linear transformation; hidden layers usually use a nonlinear activation such as ReLU.
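
What a Dense layer without an activation computes can be sketched in plain Python: each node outputs a weighted sum of all inputs plus a bias. The toy sizes and numbers below are hypothetical (the notebook's layer maps 784 inputs to 64 nodes), and `dense` is a helper written for illustration.

```python
def dense(inputs, weights, biases):
    """Compute one output per node: sum_i(inputs[i] * weights[i][j]) + biases[j]."""
    n_units = len(biases)
    return [
        sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j]
        for j in range(n_units)
    ]

# Hypothetical toy layer: 3 inputs -> 2 nodes.
inputs = [1.0, 2.0, 3.0]
weights = [          # shape (3 inputs, 2 nodes), the layout Keras uses for a Dense kernel
    [0.5, -1.0],
    [0.0, 2.0],
    [1.0, 0.5],
]
biases = [0.5, -1.0]

outputs = dense(inputs, weights, biases)
print(outputs)   # [4.0, 3.5]
```

During training, the optimizer adjusts the values in `weights` and `biases`; the layer's structure stays the same.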

The fifth layer is the output layer of the model. This layer must have the same number of nodes as there are classes (10 here, 
one per digit). Each output is a score for one class. For example, an image classifier distinguishing Cats from Dogs
will have 2 outputs. The first may be the model's score for Cat, and the second its score for Dog. Ex. [-1.523, 5.2715]. 
Raw outputs like these are known as logits. The index of the highest number in the output is the model's prediction. 
Logits are NOT percentage confidences: they are unnormalized scores, and only their relative sizes matter. To obtain 
percentage confidences, the logits are passed through a softmax activation function, which produces positive values that 
sum to 1. For instance, a softmax output of [0.13, 0.87] would mean the model is 13% confident the input is a Cat and 87% 
confident the input is a Dog. Our model has no softmax layer, so it outputs logits, which is why the loss is created 
with from_logits=True. 
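
The conversion from logits to confidences can be sketched in plain Python. This is a minimal softmax written for illustration, not the Keras implementation; the logits are the [Cat, Dog] example values from the text.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    peak = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-1.523, 5.2715]                           # example [Cat, Dog] logits
probs = softmax(logits)

predicted_index = logits.index(max(logits))         # argmax of the logits
print(probs, predicted_index)
```

Softmax preserves the ordering of the scores, so the predicted class is the same whether it is read from the logits or from the probabilities. For these particular logits the gap between the two scores is large, so the Dog probability comes out above 99%.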

<img src="https://sds-platform-private.s3-us-east-2.amazonaws.com/uploads/73_blog_image_1.png" alt="Flattened Image"/>

3. Train Model

In [11]:
model.fit(train_dataset.batch(64), epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f01541ee6a0>

4. Evaluate Model

In [68]:
# Take one example to inspect (note: it comes from the training split, not the test split)
sample = train_dataset.take(1)
tfds.as_dataframe(sample, info)

Unnamed: 0,image,label
0,,4


In [80]:
for x, y in sample:
    image = tf.reshape(x, (1, 28, 28, 1))  # add a batch dimension of size 1
    output = model(image)                  # logits, shape (1, 10)
    print('Model Output: ', output)
    print('Prediction: ', np.argmax(output.numpy()))  # index of the largest logit
    print('Label: ', y)

Model Output:  tf.Tensor(
[[-4.722601    0.91229534 -0.5110078  -1.4698832   6.3419003   1.2759331
   1.2624689  -1.1037034   2.5184054   2.4678671 ]], shape=(1, 10), dtype=float32)
Prediction:  4
Label:  tf.Tensor(4, shape=(), dtype=int64)
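
The prediction step can be reproduced by hand from the printed logits. A minimal sketch in plain Python: the values are copied from the output above, and `argmax` is a small helper written here for illustration.

```python
def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Logits copied from the printed model output above (one score per digit 0-9).
logits = [-4.722601, 0.91229534, -0.5110078, -1.4698832, 6.3419003,
          1.2759331, 1.2624689, -1.1037034, 2.5184054, 2.4678671]
label = 4

prediction = argmax(logits)
print(prediction, prediction == label)   # 4 True
```

Averaging the `prediction == label` result over many samples is what an accuracy metric reports. Because the sample above comes from the training split, a held-out score would need the test split instead, for example by calling `model.evaluate(test_dataset.batch(64))` on the notebook's `model` (not run here).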
