## Introduction to neural network classification with TensorFlow

In this notebook we're going to learn how to write neural networks for classification problems.

A classification problem is one where you try to classify something as one thing or another. The main types are:
* Binary classification
* Multiclass classification
* Multilabel classification
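
As a rough illustration of how the three problem types differ, the sketch below shows what the labels might look like in each case. The arrays are made up for this example; only their shapes and values matter.

```python
import numpy as np

# Binary classification: one label per sample, two possible values.
binary_labels = np.array([0, 1, 1, 0])

# Multiclass classification: one label per sample, more than two possible values.
multiclass_labels = np.array([2, 0, 1, 2])

# Multilabel classification: each sample can carry several labels at once,
# so every row is a 0/1 vector with one column per possible tag.
multilabel_labels = np.array([
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
])

print(binary_labels.shape, multiclass_labels.shape, multilabel_labels.shape)
```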

## Creating Data to view and fit

In [None]:
import tensorflow as tf
import numpy as np

In [None]:
from sklearn.datasets import make_circles

# Make 1000 examples
n_samples = 1000

# Create circles
X, Y = make_circles(n_samples,
                    noise=0.03,
                    random_state=42)

In [None]:
tf.constant(X), Y[:10]

Our data is a little hard to understand right now... Let's visualize it!

In [None]:
import pandas as pd
circles = pd.DataFrame({"X0":X[:, 0], "X1":X[:, 1], "label":Y})
circles

In [None]:
# Visualize with plot
import matplotlib.pyplot as plt

plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.RdYlBu)

## Steps in modelling

The steps in modelling with TensorFlow are typically:

1. Create or import a model
2. Compile the model
3. Fit the model
4. Evaluate the model
5. Tweak the model
6. Evaluate again, and repeat as needed

In [None]:
# Set the random seed
tf.random.set_seed(42)

# 1. Create the model using the Sequential API
# (no activation functions are set, so every layer is purely linear)
model_1 = tf.keras.Sequential([
    tf.keras.layers.Dense(100),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1),
])

# 2. Compile the model
model_1.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

# 3. Fit the model
model_1.fit(X, Y, epochs=150, verbose=0)

In [None]:
# Evaluate the model
model_1.evaluate(X, Y)

To visualize our model's predictions, let's create a function `plot_decision_boundary()`. This function will:

* Take in a trained model, features (X) and labels (Y)
* Create a meshgrid of the different X values
* Make predictions across the meshgrid
* Plot the predictions as well as a line between zones (where each unique class falls)

In [None]:
def plot_decision_boundary(model, X, Y):
    """
    Plots the decision boundary created by a model predicting on X.
    """
    # Define the axis boundary of the plot and create the meshgrid
    x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
    y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                         np.linspace(y_min, y_max, 100))
    
    # Create X value (we're going to make predictions on this)
    x_in = np.c_[xx.ravel(), yy.ravel()]
    
    # Make predictions
    y_pred = model.predict(x_in)
    
    # Check for multi-class
    if len(y_pred[0]) > 1:
        print("Doing multiclass classification")
        # We have to reshape our predictions to get them ready for plotting
        y_pred = np.argmax(y_pred, axis=1).reshape(xx.shape)
    else:
        print("Doing binary classification")
        y_pred = np.round(y_pred).reshape(xx.shape)
        
    # Plot the decision boundary
    plt.contourf(xx, yy, y_pred, cmap=plt.cm.Spectral, alpha=0.7)
    plt.scatter(X[:, 0], X[:, 1], c=Y, s=40, cmap=plt.cm.Spectral)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())   

In [None]:
# Check out the predictions our model is making
plot_decision_boundary(model=model_1, X=X, Y=Y)

In [None]:
x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1

x_min, x_max, y_min, y_max

## The Missing Piece: Non-linearity
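
One way to see why activations matter: stacking dense layers without any activation function still gives a single linear mapping, so such a model can only draw a straight decision boundary. The NumPy sketch below checks this with randomly generated weights; the names `W1`, `b1`, `W2` and `b2` are made up for illustration and are not taken from any model in this notebook.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two stacked dense layers with no activation: y = (x @ W1 + b1) @ W2 + b2
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)

x = rng.normal(size=(5, 2))
stacked = (x @ W1 + b1) @ W2 + b2

# The same mapping written as a single linear layer
W_single = W1 @ W2
b_single = b1 @ W2 + b2
single = x @ W_single + b_single

print(np.allclose(stacked, single))  # True
```

Adding a non-linear activation such as ReLU between the layers breaks this equivalence, which is what lets the network learn curved boundaries like the circles in our data.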

In [None]:
# Set random seed
tf.random.set_seed(42)

# Create the model
model_2 = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

# Compile the model
model_2.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

# Fit the model
his = model_2.fit(X, Y, epochs=100, verbose=0)

In [None]:
model_2.evaluate(X, Y)

In [None]:
# Check the decision boundary for our latest model
plot_decision_boundary(model=model_2, X=X, Y=Y)

🤔 **Question:** What's wrong with the predictions we've made? Are we really evaluating our model correctly? Hint: What data did the model learn on and what data did we predict on?

In [None]:
# Create a toy tensor (similar to the data we pass into our models)
A = tf.cast(tf.range(-10, 10), tf.float32)
A

In [None]:
# Let's start by replicating sigmoid - sigmoid(x) = 1/(1+exp(-x))
def sigmoid(x):
    return 1 / (1 + tf.exp(-x))

# Using this sigmoid function now on our toy tensor
sigmoid(A)
   
plt.plot(sigmoid(A))

In [None]:
# Let's recreate the relu function
def relu(x):
    return tf.maximum(0, x)

# Let's plot our toy tensor using relu function
plt.plot(relu(A))
    

## Evaluating and Improving our classification model

So far we've been training and testing on the same dataset...

However, in machine learning this is a sin: a model can score well on data it has already seen without generalizing to new data.

So let's create a training and test set.
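
A train/test split can be sketched with plain NumPy as below. This version shuffles the indices before slicing, which matters when a dataset is stored in some order; `make_circles` shuffles its samples by default, so simple slicing also works for our data. The helper name `train_test_split_simple` and the toy arrays are assumptions made for this example.

```python
import numpy as np

def train_test_split_simple(X, Y, test_fraction=0.2, seed=42):
    """Shuffle the samples, then hold out roughly `test_fraction` of them for testing."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = max(1, int(len(X) * test_fraction))  # always keep at least one test sample
    train_idx, test_idx = indices[:-n_test], indices[-n_test:]
    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]

# Toy data standing in for the circles dataset
X_toy = np.arange(20).reshape(10, 2)
Y_toy = np.arange(10) % 2

X_tr, X_te, Y_tr, Y_te = train_test_split_simple(X_toy, Y_toy)
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)
```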

In [None]:
# Check how many examples we have
len(X)

# Split into train and test sets
X_train, Y_train = X[:800], Y[:800]

X_test, Y_test = X[800:], Y[800:]

In [None]:
# Let's recreate a model to fit on the training data and evaluate on the testing data

# Set a random seed
tf.random.set_seed(42)

# Create a model
final_model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

# Compile the model
final_model.compile(loss="binary_crossentropy",
                    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                    metrics=["accuracy"])

# Fit the model
his = final_model.fit(X_train, Y_train, epochs=25, verbose=0)

In [None]:
# Evaluating the model
final_model.evaluate(X_test, Y_test)

In [None]:
# Let's plot the decision boundary
plot_decision_boundary(model=final_model, X=X_test, Y=Y_test)

## Plot the loss (or training) curves

In [None]:
pd.DataFrame(his.history).plot()
plt.title("final_model loss curves")

🔑 **Note:** For many problems, the loss function going down means the model is improving (the predictions it's making are getting closer to the ground truth labels).
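
To make that concrete, here is a minimal NumPy sketch of binary cross-entropy, the loss used by our binary models. It is a simplified re-implementation for illustration (the function name, the `eps` clipping value and the example arrays are assumptions), not the Keras source.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

y_true = np.array([1, 0, 1, 0])
poor_preds = np.array([0.6, 0.4, 0.5, 0.5])    # barely better than guessing
better_preds = np.array([0.9, 0.1, 0.8, 0.2])  # closer to the labels

print(binary_crossentropy(y_true, poor_preds))
print(binary_crossentropy(y_true, better_preds))
```

The second value is smaller than the first: predictions closer to the true labels give a lower loss.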

## Finding the best learning rate

To find the ideal learning rate (the learning rate where the loss decreases the most during training) we're going to use the following:
* A learning rate **callback** - we can think of a callback as an extra piece of functionality we can add to the model while it's training.
* Another model (we could use the same one as above, but we're practicing building models here).
* A modified loss curves plot.
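
For the modified loss curves plot, we need the learning rate that was active at each epoch. The sketch below recomputes it in NumPy, assuming the `1e-4 * 10**(epoch/20)` rule used by the scheduler in this section and a 100-epoch run; the plotting call in the comments is a suggestion rather than part of the original notebook.

```python
import numpy as np

epochs = np.arange(100)

# Learning rate at each epoch: starts at 1e-4 and grows 10x every 20 epochs
lrs = 1e-4 * 10 ** (epochs / 20)

# Roughly 1e-4, 1e-3 and 1e-2
print(lrs[0], lrs[20], lrs[40])

# With a fitted History object (called `his` in this notebook), the loss can be
# plotted against the learning rate on a log-scaled x-axis, for example:
#   plt.semilogx(lrs, his.history["loss"])
#   plt.xlabel("Learning rate")
#   plt.ylabel("Loss")
```

A common rule of thumb is to pick a learning rate from the region where the loss is still falling steeply, a little below the point where it bottoms out and starts to climb.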

In [None]:
# Set random seed
tf.random.set_seed(42)

# Create the model
model_3 = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

# Compiling the model
model_3.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

# Create a learning rate callback
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-4 * 10**(epoch/20))

# Fit the model
his = model_3.fit(X_train, Y_train, epochs=100, callbacks=[lr_scheduler], verbose=0)

In [None]:
# Evaluating the model_3
model_3.evaluate(X_test, Y_test)

In [None]:
# Visualize the training history
pd.DataFrame(his.history).plot(figsize=(10, 7))

In [None]:
tf.random.set_seed(42)

# Create the model
model_4 = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

# Compiling the model
model_4.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.02),
                metrics=["accuracy"])

# Fit the model
model_4.fit(X_train, Y_train, epochs=20, verbose=0)

In [None]:
# Evaluating the model_4
model_4.evaluate(X_test, Y_test)

## More classification evaluation methods

Alongside visualizing our model's results as much as possible, there are a handful of other classification evaluation methods & metrics we should be familiar with:
* Accuracy
* Precision
* Recall
* F1-score
* Confusion matrix
* Classification report (from sklearn) - [Read the Docs](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html)
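
The first four metrics can all be derived from the four cells of a binary confusion matrix. The sketch below computes them by hand with NumPy on a small made-up set of labels and predictions, so the arithmetic is easy to follow.

```python
import numpy as np

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# The four cells of a binary confusion matrix
tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives

accuracy = (tp + tn) / len(y_true)   # share of all predictions that are correct
precision = tp / (tp + fp)           # of the predicted positives, how many are right
recall = tp / (tp + fn)              # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, precision, recall, f1)
```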

In [None]:
# Check the accuracy of our model
loss, accuracy = model_4.evaluate(X_test, Y_test)
print(f"Model loss on the test set: {loss}")
print(f"Model accuracy on the test set: {(accuracy*100):.2f}%")

## How about a confusion matrix?


In [None]:
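# Create a confusion matrix
from sklearn.metrics import confusion_matrix

# Make predictions
Y_pred = model_4.predict(X_test)

# Create a confusion matrix
# Note: this call raises a ValueError, because Y_test is binary while Y_pred holds continuous probabilities
confusion_matrix(Y_test, tf.squeeze(Y_pred))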

In [None]:
Y_test

Looks like our prediction array has come out in **prediction probability** form, the standard output of the sigmoid activation function. We need to convert it to 0s and 1s before building the confusion matrix.

In [None]:
# Convert our prediction probabilities to binary format and view the first 10
Y_pred1 = tf.round(Y_pred)
Y_pred1 = tf.cast(Y_pred1, dtype=tf.int64)
Y_pred1 = tf.squeeze(Y_pred1)
Y_pred1

In [None]:
# Create a confusion_matrix
from sklearn.metrics import confusion_matrix
cf_matrix = confusion_matrix(Y_test, Y_pred1)

How about we prettify our confusion matrix?

In [None]:
import seaborn as sns

def make_confusion_matrix(matrix, cmap, title, xlabel, ylabel, classes=None):
    # fmt="g" keeps large counts from being shown in scientific notation
    ax = sns.heatmap(matrix, annot=True, fmt="g", cmap=cmap)
    ax.set_title(f"{title}\n\n")
    ax.set_xlabel(f"\n{xlabel}")
    ax.set_ylabel(f"{ylabel}")

    # Plotting the Matrix
    plt.show()

make_confusion_matrix(cf_matrix, "Oranges", "Seaborn Confusion Matrix", "Predicted Values", "Actual Values")    

## Working with Multiclass Classification

When we have more than two classes as an option, it's known as **multi-class classification**.
* This means if we have 3 different classes, it's multi-class classification.
* It also means if we have 100 different classes, it's multi-class classification.

To practice multi-class classification, we're going to build a neural network to classify images of different items of clothing.



In [None]:
from keras.datasets import fashion_mnist
import matplotlib.pyplot as plt

# The data has already been sorted into training and test sets for us
(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()

In [None]:
# Show one training sample (index 2)
print(f"Training sample:\n{train_data[2]}\n")
print(f"Training labels: {train_labels[2]}")

In [None]:
# Check the shape of a single example
train_data[2].shape, train_labels[2].shape

In [None]:
# Plot a single sample
plt.imshow(train_data[7]), train_labels[7]

In [None]:
# Plot multiple random images of fashion MNIST
import random
plt.figure(figsize=(5, 5))
for item in range(4):
    ax = plt.subplot(2, 2, item+1)
    rand_index = random.choice(range(len(train_data)))
    plt.imshow(train_data[rand_index], cmap=plt.cm.binary)
    plt.axis(False)

## Building a multiclass classification model

For our multiclass classification model we can use a similar architecture to our binary classifiers. However, we're going to have to tweak a few things:
* Input shape = 28 x 28 (the shape of one image)
* Output shape = 10 (one per class of clothing)
* Loss function = tf.keras.losses.CategoricalCrossentropy()
  * If the labels are one-hot encoded use the CategoricalCrossentropy(). 
  * If the labels are in integer form use the SparseCategoricalCrossentropy().
* Output layer activation = **softmax** instead of **sigmoid**
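
The sketch below illustrates two of these points in plain NumPy: what softmax does to a layer's raw outputs, and why integer labels and one-hot labels lead to the same cross-entropy value. The helper functions and the example numbers are written for illustration and are not the Keras implementations.

```python
import numpy as np

def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def one_hot(labels, num_classes):
    """Convert integer labels such as 1 into vectors such as [0, 1, 0]."""
    return np.eye(num_classes)[labels]

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
probs = softmax(logits)

int_labels = np.array([0, 1])                       # integer form
onehot_labels = one_hot(int_labels, num_classes=3)  # one-hot form

# Both label formats give the same cross-entropy; only the encoding differs.
sparse_loss = -np.log(probs[np.arange(len(int_labels)), int_labels]).mean()
categorical_loss = -(onehot_labels * np.log(probs)).sum(axis=1).mean()

print(probs.sum(axis=1), sparse_loss, categorical_loss)
```

Fashion MNIST labels come as integers, which is why the sparse version of the loss is the natural fit for them.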


In [None]:
import tensorflow as tf

In [None]:
train_labels.shape

In [None]:
# Set random seed
tf.random.set_seed(42)

# Create a model
clothing_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])

# Compile the model
clothing_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                       optimizer=tf.keras.optimizers.Adam(),
                       metrics=["accuracy"])

# Fit the model
his = clothing_model.fit(train_data, train_labels, epochs=10, validation_data=(test_data, test_labels), verbose=0)

In [None]:
# Evaluating the model
clothing_model.evaluate(test_data, test_labels)

In [None]:
# Check the model summary
clothing_model.summary()

In [None]:
# Check the min and max values of the training data
train_data.min(), train_data.max()

Neural networks prefer data to be scaled (or normalized). This means they like the numbers in the tensors they're trying to find patterns in to be between 0 & 1.

In [None]:
# We can get our training and testing data between 0 & 1 by dividing by the maximum
train_data_norm = train_data / 255
test_data_norm = test_data / 255

In [None]:
# Now our data is normalized, let's build a model to find patterns in it
tf.random.set_seed(42)

# Creating a model
clothing_model_norm = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])

# Compile the model
clothing_model_norm.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                            optimizer=tf.keras.optimizers.Adam(),
                            metrics=["accuracy"])

# Fit the model
his_norm = clothing_model_norm.fit(train_data_norm, train_labels, epochs=10, verbose=0, validation_data=(test_data_norm, test_labels))

In [None]:
# Evaluating the model
clothing_model_norm.evaluate(test_data_norm, test_labels)

In [None]:
import pandas as pd
# Plot non-normalized data loss curves
pd.DataFrame(his.history).plot(title="Non-normalized data")

In [None]:
# Plot normalized data loss curves
pd.DataFrame(his_norm.history).plot(title="Normalized data")

> 🔑 **Note:** The same model with even *slightly* different data can produce *dramatically* different results. So when comparing, it's important to make sure we're comparing them on the same criteria (e.g. Same architecture but different data or same data but different architecture)

## Training our model for a longer period of time


In [None]:
# Same architecture and normalized data as before, but trained for 20 epochs instead of 10
tf.random.set_seed(42)

# Creating a model
clothing_model_norm_1 = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])

# Compile the model
clothing_model_norm_1.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                              optimizer=tf.keras.optimizers.Adam(),
                              metrics=["accuracy"])

# Fit the model
his_norm = clothing_model_norm_1.fit(train_data_norm, train_labels, epochs=20, verbose=0, validation_data=(test_data_norm, test_labels))

In [None]:
# Evaluate the model
clothing_model_norm_1.evaluate(test_data_norm, test_labels)

In [None]:
# Make predictions using our model
predictions = clothing_model_norm_1.predict(test_data_norm)
predictions

In [None]:
test_labels

In [None]:
train_labels[0]

In [None]:
# Creating the confusion matrix of our model
clothing_matrix = confusion_matrix(y_true=test_labels, y_pred=predictions.argmax(axis=1))
clothing_matrix

In [None]:
# Plotting our confusion matrix
make_confusion_matrix(clothing_matrix, "Blues", "Clothing confusion matrix", "Predicted Values", "Actual Values")

> 🔑 **Note:** Often when working with images and other forms of visual data, it's a good idea to visualize as much as possible to develop a further understanding of the data and the inputs and outputs of your models.

How about we create a fun little function to:
* Plot a random image
* Make a prediction on said image
* Label the plot with the true label and the predicted label
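
One possible sketch of such a function is below. The helper names `describe_prediction` and `plot_random_image` are made up for this example, and the code assumes a model whose `predict()` returns one row of class probabilities per image, like the softmax models above. The `class_names` list holds the standard Fashion MNIST class names, ordered by integer label.

```python
import numpy as np

# Fashion MNIST class names, indexed by integer label
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def describe_prediction(pred_probs, true_label, classes):
    """Return a plot title and a color (green if the prediction is correct, red if not)."""
    pred_index = int(np.argmax(pred_probs))
    pred_name, true_name = classes[pred_index], classes[int(true_label)]
    title = f"Pred: {pred_name} ({100 * np.max(pred_probs):.0f}%) | True: {true_name}"
    color = "green" if pred_index == int(true_label) else "red"
    return title, color

def plot_random_image(model, images, true_labels, classes):
    """Pick a random image, predict its class and plot it with a labelled title."""
    import matplotlib.pyplot as plt  # imported here so the helper above has no plotting dependency

    i = np.random.randint(len(images))
    # The model expects a batch, so add a leading batch dimension
    pred_probs = model.predict(images[i][np.newaxis, ...])[0]
    title, color = describe_prediction(pred_probs, true_labels[i], classes)

    plt.imshow(images[i], cmap=plt.cm.binary)
    plt.axis(False)
    plt.title(title, color=color)
```

With the objects defined earlier in this notebook, a call might look like `plot_random_image(clothing_model_norm_1, test_data_norm, test_labels, class_names)`.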