Based on the [Beginner Quickstart](https://www.tensorflow.org/tutorials/quickstart/beginner) and [Keras Basics](https://www.tensorflow.org/tutorials/keras/classification) articles and the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset.

In [None]:
%reset -f

### Machine Learning

Machine learning is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns, and make decisions with minimal human intervention.

### Import the Dataset

The MNIST dataset contains 70,000 images of handwritten digits: 60,000 to train the model and 10,000 to test it.

Note: You can load this dataset directly from TensorFlow through `tf.keras.datasets`.

In [None]:
import tensorflow as tf

mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Now we have four NumPy arrays:

* Training set: `train_images` and `train_labels`, which is the data the model uses to learn.
* Test set: `test_images` and `test_labels`, which is the data the model is tested against.

### Explore the Data

Let's explore the format of the dataset before training the model.

There are 60,000 images in the training set, with each image represented as 28 x 28 pixels.

In [None]:
train_images.shape

Each label is an integer between 0 and 9.

In [None]:
# Use the array methods, which avoid iterating over the labels in Python.
print(f"   Min: {train_labels.min()}")
print(f"   Max: {train_labels.max()}")
print(f"Labels: {train_labels}")

There are 10,000 images in the test set. Again, each image is represented as 28 x 28 pixels.

In [None]:
test_images.shape

And 10,000 corresponding test labels.

In [None]:
test_labels.shape

### Preprocess the Data

The data must be preprocessed before training the network. Look at the first image in the training set to see that the pixel values fall in the range of 0 to 255.

In [None]:
import matplotlib.pyplot as plt

plt.imshow(train_images[0], cmap=plt.cm.binary)
plt.colorbar()
plt.show()

Scale these values to a range of 0 to 1 before feeding them to the neural network model. To do so, divide the values by 255. It's important to preprocess the training and the testing sets in the same way.

In [None]:
train_images = train_images / 255.0
test_images = test_images / 255.0
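
The effect of this division can be checked on a small synthetic array. The following is a minimal sketch, assuming 8-bit pixel values like those in MNIST; `fake_images` is a hypothetical stand-in and not part of the dataset.

```python
import numpy as np

# Hypothetical stand-in for a batch of MNIST images: 8-bit pixels in 0..255.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Dividing by a float promotes the array to floating point and maps 0..255 to 0..1.
scaled = fake_images / 255.0

print(fake_images.dtype, "->", scaled.dtype)
print(scaled.min(), scaled.max())
```

Note that the result is a new floating-point array, so the scaled data takes more memory than the original integers.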

Display the first 10 images from the training set with corresponding labels to verify that the data is in the correct format.

In [None]:
plt.figure(figsize=(10, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(train_labels[i])
plt.show()

### Build the Model

Building the neural network requires two steps: configuring the layers of the model and compiling it.

#### Set up the layers

The basic building block of a neural network is a layer. Layers extract representations from the data fed into them. Hopefully, these representations are meaningful for the problem at hand.

In general, deep learning consists of chaining together simple layers. Most layers, such as `tf.keras.layers.Dense`, have parameters that are learned during training.

In [None]:
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10),
    ]
)

The first layer, `tf.keras.layers.Flatten`, transforms the format of the images from a two-dimensional array (28 x 28 pixels) to a one-dimensional array (28 * 28 = 784 pixels). Think of this layer as unstacking rows of pixels in the image and lining them up. This layer has no parameters to learn: it only reformats the data.
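
The reshaping performed by the flatten step can be illustrated with plain NumPy. This is only a sketch of the idea, not the Keras implementation; the `batch` array is a hypothetical example.

```python
import numpy as np

# Hypothetical batch of two 28 x 28 "images" filled with a running counter.
batch = np.arange(2 * 28 * 28).reshape(2, 28, 28)

# Keep the batch axis and merge the two image axes into one.
flat = batch.reshape(batch.shape[0], -1)

print(flat.shape)
# Row 1 of the first image starts right after the 28 pixels of row 0.
print(np.array_equal(flat[0, 28:56], batch[0, 1]))
```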

After the pixels are flattened, the network passes them through two `tf.keras.layers.Dense` layers. These are densely (fully) connected neural layers. The first `Dense` layer has 128 nodes (or neurons). The second returns a logits array of length 10, in which each node holds a score indicating how strongly the model believes the current image belongs to the corresponding class.
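
Conceptually, each dense layer is a matrix multiplication plus a bias, optionally followed by an activation. The sketch below reproduces the two dense layers in NumPy, leaving dropout aside and assuming randomly initialised weights; the names `w1`, `b1`, `w2`, `b2`, and `forward` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical randomly initialised parameters; a trained model would hold learned values.
w1, b1 = rng.normal(scale=0.05, size=(784, 128)), np.zeros(128)
w2, b2 = rng.normal(scale=0.05, size=(128, 10)), np.zeros(10)

def forward(x):
    """Map a batch of flattened images (n, 784) to logits (n, 10)."""
    hidden = np.maximum(x @ w1 + b1, 0.0)  # Dense(128) followed by ReLU
    return hidden @ w2 + b2                # Dense(10), no activation: raw logits

logits = forward(rng.random((3, 784)))
n_params = w1.size + b1.size + w2.size + b2.size
print(logits.shape, n_params)
```

The parameter count follows from the layer sizes: 784 * 128 + 128 for the first dense layer and 128 * 10 + 10 for the second.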

The `tf.keras.layers.Dropout` layer randomly sets input units to 0 with the specified `rate` at each step during training, which helps prevent overfitting. Inputs not set to 0 are scaled up by `1/(1 - rate)` so that the expected sum over all inputs is unchanged. The layer is inactive during evaluation and prediction.
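
The masking and rescaling can be sketched in a few lines of NumPy. This is an illustration of the technique under the assumption of independent per-element masking, not the Keras code; the `dropout` function here is hypothetical.

```python
import numpy as np

def dropout(x, rate, rng):
    """Zero each element with probability `rate` and rescale the survivors."""
    keep_mask = rng.random(x.shape) >= rate
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.ones(100_000)
y = dropout(x, rate=0.2, rng=rng)

print(np.unique(y))  # survivors are scaled by 1 / (1 - 0.2)
print(y.mean())      # stays close to the input mean of 1.0
```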

#### Compile the model

Before the model is ready for training, it needs a few more settings, added during the model's compile step:

* *Optimizer* updates the model's parameters based on the data it sees and the value of the loss function.
* *Loss function* measures how far the model's predictions are from the true labels during training. The optimizer minimizes this function to "steer" the model in the right direction.
* *Metrics* are used to monitor the training and testing steps. The following example uses `accuracy`, the fraction of the images that are correctly classified.

In [None]:
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
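
To see what a sparse categorical cross-entropy computed from logits measures, here is a minimal NumPy sketch. It assumes integer class labels and averaging over the batch; the function below is a hypothetical reimplementation for illustration, not the Keras code.

```python
import numpy as np

def sparse_categorical_crossentropy(logits, labels):
    """Mean negative log-probability of the true class, computed from raw logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

labels = np.array([2, 0])
confident_right = np.array([[0.0, 0.0, 9.0], [9.0, 0.0, 0.0]])
uniform = np.zeros((2, 3))

loss_confident = sparse_categorical_crossentropy(confident_right, labels)
loss_uniform = sparse_categorical_crossentropy(uniform, labels)
print(loss_confident)  # small: the true class gets almost all the probability
print(loss_uniform)    # log(3): every class is equally likely
```

The loss is low when the logit of the true class dominates and grows as probability mass shifts to other classes.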

### Train the Model

#### Feed the model

To start training, call the `model.fit` method to "fit" the model to the training data. It adjusts the model parameters to minimize the loss.

In [None]:
_ = model.fit(train_images, train_labels, epochs=5)
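
The idea of adjusting parameters to reduce a loss can be shown on a much smaller problem. The sketch below uses plain gradient descent on a two-parameter linear model, which is a deliberate simplification: it is not the Adam optimizer and not a neural network. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: recover y = 3x + 1 from noiseless samples.
x = rng.random(50)
y = 3.0 * x + 1.0

w, b = 0.0, 0.0
learning_rate = 0.5
losses = []
for epoch in range(200):
    error = (w * x + b) - y
    losses.append(np.mean(error ** 2))  # mean squared error before the update
    # Step each parameter against the gradient of the loss.
    w -= learning_rate * np.mean(2 * error * x)
    b -= learning_rate * np.mean(2 * error)

print(losses[0], losses[-1])
print(w, b)
```

As in `model.fit`, each pass computes the loss on the data and nudges the parameters in the direction that lowers it.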

#### Evaluate accuracy

Next, evaluate how the model performs on the test dataset, which it did not see during training.

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)

print(f"\nTest accuracy: {test_acc * 100:.2f}%")
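
The accuracy metric itself is simple to compute by hand. A minimal sketch, assuming a small hypothetical array of logits and labels rather than the real model output:

```python
import numpy as np

# Hypothetical logits for four images over three classes, with their true labels.
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.9, 0.8, 0.1],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 1, 1, 2])

predicted = logits.argmax(axis=1)        # highest-scoring class per image
accuracy = np.mean(predicted == labels)  # fraction of correct predictions

print(predicted, accuracy)
```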

#### Make predictions

With the model trained, you can use it to make predictions about some images.

For each example, the model returns a vector of [logits](https://developers.google.com/machine-learning/glossary#logits). Appending a `tf.keras.layers.Softmax` layer converts these logits to probabilities, which are easier to interpret.

In [None]:
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
predictions = probability_model.predict(test_images)
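
The conversion from logits to probabilities can be sketched in NumPy. This is an illustrative implementation of the softmax function, not the Keras layer; the input values are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Convert each row of logits into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]])
probs = softmax(logits)

print(probs.sum(axis=1))
print(probs.argmax(axis=1), logits.argmax(axis=1))  # the ranking is unchanged
```

Because softmax preserves the ordering of the scores, the predicted class is the same whether it is read from the logits or from the probabilities.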

Look at the first prediction.

In [None]:
predictions[0]

A prediction is an array of 10 numbers. They represent the model's "confidence" that the image corresponds to each of the 10 different digits.

You can see which label has the highest confidence value.

In [None]:
import numpy as np

np.argmax(predictions[0])

So, the model is most confident that this image shows the digit 7.

Examining the test label shows that this classification is correct.

In [None]:
test_labels[0]

#### Verify predictions

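To inspect the predictions visually, define two helper functions: one plots an image with its predicted label, and the other plots the predicted probability for each digit.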

In [None]:
def plot_image(predictions, true_label, image):
    """Show the image; label it with the prediction, confidence, and true label."""
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])

    plt.imshow(image, cmap=plt.cm.binary)

    predicted_label = np.argmax(predictions)
    if predicted_label == true_label:
        color = "blue"
    else:
        color = "red"

    plt.xlabel(
        f"{predicted_label} {100 * np.max(predictions):2.0f}% ({true_label})",
        color=color,
    )

def plot_value_array(predictions_array, true_label):
    """Plot the 10 class probabilities; predicted bar red, true bar blue."""
    plt.grid(False)
    plt.xticks(range(10))
    plt.yticks([])
    plt.ylim([0, 1])

    plot = plt.bar(range(10), predictions_array, color="#777777")
    predicted_label = np.argmax(predictions_array)
    plot[predicted_label].set_color("red")
    plot[true_label].set_color("blue")

Look at two predictions, one right and one wrong. Correct prediction labels are blue and incorrect prediction labels are red.

In [None]:
right = np.argwhere((np.argmax(predictions, axis=1) - test_labels) == 0).flatten()
i = np.random.choice(right, 1)[0]
plt.figure(figsize=(6, 3))
plt.subplot(1, 2, 1)
plot_image(predictions[i], test_labels[i], test_images[i])
plt.subplot(1, 2, 2)
plot_value_array(predictions[i], test_labels[i])
plt.show()

In [None]:
wrong = np.argwhere((np.argmax(predictions, axis=1) - test_labels) != 0).flatten()
i = np.random.choice(wrong, 1)[0]
plt.figure(figsize=(6, 3))
plt.subplot(1, 2, 1)
plot_image(predictions[i], test_labels[i], test_images[i])
plt.subplot(1, 2, 2)
plot_value_array(predictions[i], test_labels[i])
plt.show()

Plot several images with their predictions. Note that the model can be wrong even when very confident.

In [None]:
num_rows = 2
num_cols = 3
num_images = num_rows * num_cols

right_sample = np.random.choice(right, int(np.ceil(num_images / 2)))
wrong_sample = np.random.choice(wrong, int(np.floor(num_images / 2)))
images = np.concatenate((right_sample, wrong_sample))
np.random.shuffle(images)

plt.figure(figsize=(2 * 2 * num_cols, 2 * num_rows))
for index, image in enumerate(images):
    plt.subplot(num_rows, 2 * num_cols, 2 * index + 1)
    plot_image(predictions[image], test_labels[image], test_images[image])
    plt.subplot(num_rows, 2 * num_cols, 2 * index + 2)
    plot_value_array(predictions[image], test_labels[image])
plt.tight_layout()
plt.show()
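
One way to find confident mistakes directly is to rank the misclassified examples by their top probability. The sketch below assumes a small hypothetical `probs` array in place of the real `predictions`.

```python
import numpy as np

# Hypothetical probabilities for five images over three classes.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.10, 0.85, 0.05],
                  [0.40, 0.35, 0.25],
                  [0.02, 0.03, 0.95],
                  [0.60, 0.30, 0.10]])
labels = np.array([0, 2, 1, 2, 0])

predicted = probs.argmax(axis=1)
confidence = probs.max(axis=1)

wrong_idx = np.flatnonzero(predicted != labels)
# Sort the misclassified examples from most to least confident.
ranked = wrong_idx[np.argsort(-confidence[wrong_idx])]

print(ranked, confidence[ranked])
```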

### Use the Trained Model

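Now use the trained model to make a prediction about an image you draw yourself.

Draw a dark digit on a light background and save it as `digit.jpg`. The picture should be square, because the loading code resizes it to 28 x 28 pixels and inverts the pixel values to match MNIST.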

In [None]:
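# Load the drawing as a single-channel image resized to 28 x 28 pixels.
image = tf.keras.preprocessing.image.load_img(
    "digit.jpg", color_mode="grayscale", target_size=(28, 28)
)
# Scale to 0..1 and invert so the background is 0 and the digit is close to 1, as in MNIST.
image = 1 - tf.keras.preprocessing.image.img_to_array(image).reshape(28, 28) / 255
image.shape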
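
The inversion in the cell above is needed because MNIST stores the background as 0 and the ink as high values, while a dark digit on white paper has the opposite encoding. A minimal sketch on a hypothetical 3 x 3 drawing:

```python
import numpy as np

# Hypothetical 3 x 3 grayscale drawing: white paper (255) with a dark stroke (0).
drawing = np.array([[255, 0, 255],
                    [255, 0, 255],
                    [255, 0, 255]], dtype=np.uint8)

# Scale to 0..1, then invert so the background becomes 0 and the stroke 1.
mnist_like = 1 - drawing / 255

print(mnist_like)
```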

In [None]:
true_label = int(input("Enter the digit you drew: "))

In [None]:
plt.imshow(image, cmap=plt.cm.binary)
plt.colorbar()
plt.show()

`tf.keras` models are optimized to make predictions on a batch (collection) of examples at once. Accordingly, even though you're using a single image, you need to add a leading batch dimension so that it becomes a batch with one member.

In [None]:
image = np.expand_dims(image, 0)
image.shape
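
The shape change can be checked on a placeholder array. This sketch assumes a blank 28 x 28 image named `single` (hypothetical) and shows two equivalent ways to add the batch axis.

```python
import numpy as np

single = np.zeros((28, 28))

batch_a = np.expand_dims(single, 0)  # insert a new axis at position 0
batch_b = single[np.newaxis, ...]    # equivalent indexing form

print(batch_a.shape, batch_b.shape)
# Indexing the first element recovers the original image.
print(np.array_equal(batch_a[0], single))
```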

Now predict the label for the image.

In [None]:
prediction = probability_model.predict(image)
prediction

`tf.keras.Model.predict` returns a two-dimensional NumPy array with one row of probabilities for each image in the batch. Grab the predictions for our (only) image in the batch:

In [None]:
prediction = prediction[0]
np.argmax(prediction)

In [None]:
plt.figure(figsize=(6, 3))
plt.subplot(1, 2, 1)
plot_image(prediction, true_label, image[0])
plt.subplot(1, 2, 2)
plot_value_array(prediction, true_label)
plt.show()