<a href="https://colab.research.google.com/github/stevengiacalone/Python-workshop/blob/main/Session_7_Intro_to_Deep_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# An Introduction to Deep Learning

In today's session, we'll dive into the world of machine learning. If this is your first exposure to the concept, you've probably heard many words that kind of sound like the same thing: artificial intelligence, machine learning, neural networks, deep learning, etc. What's the difference? These concepts are arranged in the following hierarchy:

<div>
<img src="http://apmonitor.com/do/uploads/Main/ai_overview.png" width="500"/>
</div>

Let's go over each of these concepts quickly.

## Artificial Intelligence

Artificial Intelligence is the field of developing computers and robots that are capable of behaving in ways that both mimic and go beyond human capabilities. AI-enabled programs can analyze and contextualize data to provide information or automatically trigger actions without human interference. [Source](https://ai.engineering.columbia.edu/ai-vs-machine-learning/)

## Machine Learning

Machine learning is a pathway to artificial intelligence. This subcategory of AI uses algorithms to automatically learn insights and recognize patterns from data, applying that learning to make increasingly better decisions. [Source](https://ai.engineering.columbia.edu/ai-vs-machine-learning/)

Broadly speaking, machine learning algorithms can be divided into two categories: **supervised learning** and **unsupervised learning**. In supervised learning, you train your model on *labeled* data (e.g., images that you know the correct classifications of). Supervised machine learning algorithms are often trained for the purpose of **regression** (i.e., predicting a continuous value) or **classification** (i.e., predicting a categorical value). In unsupervised learning, you train your model on *unlabeled* data, usually to identify unknown correlations and trends.

Here is a useful visualization of the difference between the two (from [here](https://www.labellerr.com/blog/supervised-vs-unsupervised-learning-whats-the-difference/)):

<div>
<img src="https://www.labellerr.com/blog/content/images/size/w2000/2023/02/bannerSupervised-vs.-Unsupervised-Learning-1.webp" width="700"/>
</div>

There are many types of machine learning algorithms that we will not have time to talk about here, but I encourage you to check them out if the topic interests you!

- Linear regression
- Naive Bayes
- K Nearest Neighbors
- Support Vector Machine
- Decision trees and random forests
- many more!

## Neural Networks

A neural network is a machine learning algorithm inspired by the human brain. In the brain, groups of interconnected units called neurons send signals to one another. Many of them combined can perform complex tasks. In the context of machine learning, these tasks can include regression and classification.

The simplest neural network, known as a **single-layer perceptron**, looks like this:

<div>
<img src="http://apmonitor.com/do/uploads/Main/neural_network.png" width="500"/>
</div>

In this example, the perceptron is *fully connected*, meaning every input is connected to every node. Inputs are combined with weights and summed in each node. The summed values are then sent to the output where they are put through an *activation function*, which determines their final value. Here's a visualization of how it works (sorry about the hard-to-read text):

<div>
<img src="https://static.javatpoint.com/tutorial/tensorflow/images/single-layer-perceptron-in-tensorflow2.png" width="800"/>
</div>
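
To make the "weighted sum, then activation" idea concrete, here is a minimal NumPy sketch of one pass through a fully connected single-layer perceptron. The input values, weights, and the choice of a sigmoid activation are made up for illustration, and the bias term is an extra ingredient that the diagrams above may not show.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_forward(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs for each node, followed by an activation."""
    weighted_sum = inputs @ weights + bias
    return activation(weighted_sum)

# Hypothetical example: 3 inputs feeding 2 fully connected nodes.
inputs = np.array([1.0, 2.0, 3.0])
weights = np.array([[0.5, -1.0],
                    [0.0, 0.5],
                    [-0.5, 1.0]])  # shape (n_inputs, n_nodes)
bias = np.zeros(2)                 # assumed to be zero here

outputs = perceptron_forward(inputs, weights, bias)
print(outputs)  # one value per node, each between 0 and 1
```

Each column of `weights` holds the weights for one node, so a single matrix product computes the weighted sums for all nodes at once.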

There are many choices for the [activation function](https://en.wikipedia.org/wiki/Activation_function), but the exact choice will usually depend on the specifics of your problem. Some popular activation functions include:
- linear
- [rectified linear unit](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) (a.k.a. ReLU)
- [logistic](https://en.wikipedia.org/wiki/Logistic_function)
- [sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function)
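
As a rough illustration, a few of these activation functions can be written in one line each with NumPy. This is only a sketch (the function names are my own, and the logistic function is used as the example of a sigmoid); in practice you would use the versions built into your deep learning library.

```python
import numpy as np

def linear(z):
    """Linear activation: returns the input unchanged."""
    return z

def relu(z):
    """Rectified linear unit: negative values become 0."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
for name, fn in [("linear", linear), ("ReLU", relu), ("sigmoid", sigmoid)]:
    print(name, fn(z))
```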

When you *train* a neural network, you use an algorithm like [backpropagation](https://en.wikipedia.org/wiki/Backpropagation) to *optimize the values of the weights*.
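
To give a feel for what "optimizing the weights" means, here is a toy sketch that trains a single linear neuron with plain gradient descent. This is not full backpropagation (with only one neuron the gradient can be written by hand), and the data, learning rate, and number of steps are all made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): targets follow y = 2*x1 - 3*x2 exactly.
X = rng.normal(size=(100, 2))
true_weights = np.array([2.0, -3.0])
y = X @ true_weights

weights = np.zeros(2)   # initial guess
learning_rate = 0.1     # assumed step size

for step in range(200):
    predictions = X @ weights
    error = predictions - y
    # Gradient of the mean squared error with respect to the weights.
    gradient = 2.0 * X.T @ error / len(X)
    # Move the weights a small step in the direction that lowers the error.
    weights -= learning_rate * gradient

print(weights)  # should end up close to true_weights
```

In a real network, backpropagation is the bookkeeping that computes this kind of gradient for every weight in every layer.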

## Deep Learning

A deep learning algorithm refers to any neural network with multiple layers of neurons. The simplest deep learning network is the **multi-layer perceptron**, shown below. The layers of neurons are known as *hidden layers*.

<div>
<img src="http://apmonitor.com/do/uploads/Main/deep_neural_network.png" width="700"/>
</div>

Each connection (line) in the diagram above has a corresponding weight that is optimized when the model is trained.
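
Here is a small sketch of what that looks like in code: one weight matrix per pair of adjacent layers, with one entry per connection. The layer sizes are hypothetical (they are not read off the diagram), the weights are random rather than trained, and biases are left out to keep things short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 4 inputs, two hidden layers of 5 neurons, 1 output.
layer_sizes = [4, 5, 5, 1]

# One weight matrix per pair of adjacent layers (biases omitted for brevity).
weight_matrices = [rng.normal(size=(n_in, n_out))
                   for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def mlp_forward(x, weight_matrices):
    """Pass x through every layer, applying ReLU on the hidden layers only."""
    for i, w in enumerate(weight_matrices):
        x = x @ w
        if i < len(weight_matrices) - 1:
            x = np.maximum(0.0, x)
    return x

x = rng.normal(size=4)
output = mlp_forward(x, weight_matrices)
n_weights = sum(w.size for w in weight_matrices)
print(output.shape, n_weights)  # (1,) 50
```

Even this tiny network has 4x5 + 5x5 + 5x1 = 50 weights to optimize, which is why the number of weights grows quickly in deeper and wider networks.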

There are tons of types of deep networks. Some examples are shown in the graphic below (Source). The type of network you use typically depends on the problem that needs solving.

<div>
<img src="https://miro.medium.com/v2/resize:fit:2000/format:webp/1*cuTSPlTq0a_327iTPJyD-Q.png" width="800"/>
</div>

In this session, we'll be focusing on one specific deep learning architecture: the **Convolutional Neural Network** ([CNN](https://en.wikipedia.org/wiki/Convolutional_neural_network)). CNNs were first popularized for their ability to perform highly accurate image classification. In the exercise below, we'll be designing and training our own CNN using `tensorflow`, a popular Python package for designing neural networks. But first, let's go over the architecture of a CNN so that we know what's going on in the code.

## Convolutional Neural Networks

A CNN is a neural network that utilizes a combination of convolutional layers, pooling layers, and dense (i.e., fully connected) layers. We'll go over each of these terms in a moment. Here is a graphic of a typical CNN ([Source](https://www.mdpi.com/1099-4300/19/6/242)):

<div>
<img src="https://www.mdpi.com/entropy/entropy-19-00242/article_deploy/html/images/entropy-19-00242-g001.png" width="800"/>
</div>

CNNs can be applied to data with any number of dimensions, but in the exercise below we'll use one on 2D data (i.e., images).

#### Convolutional Layers

A convolutional layer is a layer in which an NxN convolutional filter passes along the input matrix. This is best explained via visualization. Imagine you have a convolutional filter (in this case, a 2x2 matrix) that looks like this ([Source](https://developers.google.com/machine-learning/glossary/#convolutional-layer)):

<div>
<img src="https://developers.google.com/static/machine-learning/glossary/images/ConvolutionalLayerFilter.svg" width="180"/>
</div>

And an input matrix that looks like this:

<div>
<img src="https://developers.google.com/static/machine-learning/glossary/images/ConvolutionalLayerInputMatrix.svg" width="500"/>
</div>

The filter is scanned along the input matrix like so, producing an output that is smaller than the input. At each location, we multiply each pixel of the input matrix by the corresponding filter pixel value, then sum everything together. So for the first position (the top left), we do this:


<div>
<img src="https://developers.google.com/static/machine-learning/glossary/images/ConvolutionalLayerOperation.svg" width="700"/>
</div>

This is repeated until you end up with a new, smaller matrix. For an input matrix of size NxN and a convolution filter of size MxM (moving one pixel at a time, with no padding around the edges), the output matrix size will be LxL, where L = N - M + 1. Click [here](https://developers.google.com/static/machine-learning/glossary/images/AnimatedConvolution.gif) for a good animation of this process (which does not display well in Colab).
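
If you'd like to see the sliding-window operation in code, here is a minimal NumPy sketch. It assumes the filter moves one pixel at a time with no padding, in which case an NxN input and an MxM filter give an output of size N - M + 1 along each side. The input and filter values are made up (they are not the ones in the figures), and like most deep learning libraries it does not flip the filter, so strictly speaking it is a cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image; multiply and sum at each position."""
    n_rows, n_cols = image.shape
    k_rows, k_cols = kernel.shape
    out = np.zeros((n_rows - k_rows + 1, n_cols - k_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + k_rows, j:j + k_cols]
            out[i, j] = np.sum(window * kernel)
    return out

# Hypothetical 5x5 input and 2x2 filter.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

result = convolve2d_valid(image, kernel)
print(result.shape)  # (4, 4), i.e. 5 - 2 + 1 along each side
```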

#### Pooling Layers

A pooling layer reduces the size of the previous layer by "pooling" in windows of NxN. Most often, the maximum or average of the values in the pooling window is taken. Here's a visualization ([Source](https://developers.google.com/machine-learning/glossary/#pooling)):

<div>
<img src="https://developers.google.com/static/machine-learning/glossary/images/PoolingConvolution.svg" width="800"/>
</div>
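
Here is a short NumPy sketch of pooling, assuming non-overlapping windows (the window moves by its own width each step) and a made-up 4x4 input. The function name and its arguments are my own for illustration.

```python
import numpy as np

def pool2d(matrix, window=2, reducer=np.max):
    """Pool a 2D matrix in non-overlapping window x window blocks."""
    n_rows, n_cols = matrix.shape
    out_rows, out_cols = n_rows // window, n_cols // window
    # Trim edges that do not fill a full window, then group into blocks.
    trimmed = matrix[:out_rows * window, :out_cols * window]
    blocks = trimmed.reshape(out_rows, window, out_cols, window)
    # Reduce over the two axes that index positions inside each block.
    return reducer(blocks, axis=(1, 3))

matrix = np.arange(16, dtype=float).reshape(4, 4)
max_pooled = pool2d(matrix, window=2, reducer=np.max)
avg_pooled = pool2d(matrix, window=2, reducer=np.mean)
print(max_pooled)
print(avg_pooled)
```

With a 2x2 window, the 4x4 input shrinks to 2x2: each output value summarizes one block of four input values.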

#### Dense Layers

A "dense layer" is just another term for a fully connected layer, in which every neuron is connected to every neuron in the previous layer. In CNNs, these are typically applied right at the end of the algorithm, before the output layer.

#### Hyperparameters

In any neural network, there will be a number of knobs to turn to change the architecture (and therefore performance) of the algorithm. These knobs are called *hyperparameters*. They include the number of convolution and pooling layers, the size of the convolution filter, the number of neurons in the dense layer, and so on (you get the gist). There is no easy way to determine the optimal hyperparameters for your model; you just need to try a bunch to see what works best.

## Exercise

In this exercise, we will be training a CNN to classify images from MNIST (the Modified National Institute of Standards and Technology database), which contains many images of the handwritten digits 0-9. The dataset is often used to test the performance of image classification algorithms.


<div>
<img src="https://camo.githubusercontent.com/960ef680c26a95405b35e9e417955032cb0b0815b746cefd8ff95c73a49ae4ee/687474703a2f2f692e7974696d672e636f6d2f76692f3051493378675875422d512f687164656661756c742e6a7067" width="500"/>
</div>

We'll start by importing our Python packages and the data. The data is split into four categories:

- train_images: the images that we use to train the algorithm
- train_labels: the correct label corresponding to each image (i.e., the actual digit in the image)
- test_images: the images that we use to test the trained model
- test_labels: the correct label corresponding to each test image

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()
print(len(train_images), "train images")
print(len(test_images), "test images")

Let's take a look at an image to see what we're working with. We'll also print the size of each image and the max value in each image array.

In [None]:
print("Size of each image =", train_images[0].shape)
print("Max value of each image =", train_images[0].max())
print()
print("First train image is a", train_labels[0])
print("First train image:")
plt.imshow(train_images[0]);

Let's do some quick pre-processing of the data by dividing all values in each image by 255 (the max value) so that every pixel value lies between 0 and 1. We also need to change the shape of each image to 28x28x1 (rather than the current 28x28), since the convolution layers expect a channel dimension.

In [None]:
train_images = train_images/255
test_images = test_images/255

train_images = train_images.reshape(len(train_images), 28, 28, 1)
test_images = test_images.reshape(len(test_images), 28, 28, 1)

Now it's time to define our model. Let's go with the following order:

- a 2D convolution layer with 32 filters and a filter (kernel) size of 3x3, applying a ReLU activation function
- a 2D convolution layer with 64 filters and a filter size of 3x3, applying a ReLU activation function
- a max pooling layer (pools according to the maximum value in the window) with a window size of 2x2
- a dropout layer (randomly deactivates a fraction of the neurons during training to reduce overfitting)
- a dense layer with 128 neurons and a ReLU activation function
- another dropout layer
- a final dense layer with 10 neurons and a softmax activation function -- this will return the probability of each of the 10 labels (note that softmax activation functions are common when returning probabilities)
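
If you're curious what softmax actually does, here is a minimal NumPy sketch: it exponentiates a set of raw scores and divides by their sum, so the results are all positive and add up to 1. The scores below are made up (three classes instead of ten), and subtracting the maximum first is a common trick to avoid overflow that does not change the result.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

# Hypothetical raw scores for three classes.
scores = np.array([1.0, 2.0, 3.0])
probabilities = softmax(scores)
print(probabilities)
print(probabilities.sum())
```

The class with the largest raw score always ends up with the largest probability.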

In [None]:
model = keras.Sequential()
# 32 convolution filters used each of size 3x3
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
# 64 convolution filters used each of size 3x3
model.add(Conv2D(64, (3, 3), activation='relu'))
# choose the best features via pooling
model.add(MaxPooling2D(pool_size=(2, 2)))
# randomly drop 25% of the neurons during training to reduce overfitting
model.add(Dropout(0.25))
# flatten the 2D feature maps into a 1D vector for the dense layers
model.add(Flatten())
# fully connected to get all relevant data
model.add(Dense(128, activation='relu'))
# one more dropout
model.add(Dropout(0.5))
# output a softmax to squash the 10 outputs into probabilities
model.add(Dense(10, activation='softmax'))
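
To see how the data changes shape as it moves through these layers, here is a small sketch that tracks the shapes by hand. It assumes the Keras defaults used in the cell above (the 3x3 filters move one pixel at a time with no padding, so each convolution trims 2 pixels from each side length, and the 2x2 pooling windows do not overlap). The helper functions are made up for illustration.

```python
def conv_output_size(size, kernel_size):
    """Side length after a convolution with stride 1 and no padding."""
    return size - kernel_size + 1

def pool_output_size(size, window):
    """Side length after pooling with non-overlapping windows."""
    return size // window

size = 28                          # input images are 28x28x1
size = conv_output_size(size, 3)   # first convolution, 32 filters
after_conv1 = (size, size, 32)
size = conv_output_size(size, 3)   # second convolution, 64 filters
after_conv2 = (size, size, 64)
size = pool_output_size(size, 2)   # 2x2 max pooling
after_pool = (size, size, 64)
flattened = after_pool[0] * after_pool[1] * after_pool[2]

print(after_conv1, after_conv2, after_pool, flattened)
# (26, 26, 32) (24, 24, 64) (12, 12, 64) 9216
```

You can compare these numbers against the output of `model.summary()`, which prints the output shape of every layer.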

Now we need to define some things that will determine how the model does the calculation while training. Here, we are defining our optimizer algorithm, our loss function (the quantity the optimizer tries to minimize), and the metric we use to monitor performance.

In [None]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
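
For intuition about the loss function, here is a NumPy sketch of the idea behind sparse categorical cross-entropy: for each image, take the probability the model assigned to the correct label, take minus its logarithm, and average over all images. The probabilities and labels below are made up, and this is a simplified illustration rather than the exact Keras implementation.

```python
import numpy as np

def sparse_categorical_crossentropy(probabilities, labels):
    """Mean of -log(probability assigned to the correct class)."""
    picked = probabilities[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))

# Hypothetical predictions for 2 images and 3 classes.
probabilities = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.8, 0.1]])
labels = np.array([0, 1])  # integer labels, one per image

loss = sparse_categorical_crossentropy(probabilities, labels)
print(loss)
```

The loss is small when the model puts high probability on the correct label and grows quickly as that probability approaches zero. The "sparse" part just means the labels are integers (like our digits 0-9) rather than one-hot vectors.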

Now we can train our model on our training images and labels. We will set the number of epochs to 5, meaning the algorithm will pass through the full training set 5 times, with each pass starting from the weights learned in the previous one.

In [None]:
model.fit(train_images, train_labels, epochs=5)
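
Within each epoch, Keras does not look at all of the images at once. It splits them into batches and updates the weights after each batch. Since we did not pass `batch_size` to `model.fit`, it uses the default of 32. Assuming the standard 60,000 MNIST training images, a quick calculation shows how many weight updates that adds up to:

```python
import math

n_train_images = 60000  # assumed size of the MNIST training set
batch_size = 32         # Keras default when batch_size is not given
epochs = 5

steps_per_epoch = math.ceil(n_train_images / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 1875 9375
```

This is why the progress bar during training counts up to 1875 in each epoch rather than 60000.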

The model is trained! Let's now assess the accuracy of the model on the *test* data (which the model has not yet seen).

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test accuracy =', test_acc)

We achieve a pretty good accuracy of ~99%! Let's now make some predictions.

In [None]:
# select some random test images
random_idcs = np.random.randint(len(test_images), size=10)
# make predictions using the model
predictions = model.predict(test_images[random_idcs])

# plot a few of them
for i,idx in enumerate(random_idcs):
    print("True label is", test_labels[idx])

    fig, ax = plt.subplots(1, 2, figsize=(10,5))

    ax[0].imshow(test_images[idx])
    ax[1].bar(np.arange(10), predictions[i])
    ax[1].set_xlabel("Predicted Label")
    ax[1].set_ylabel("Probability")
    ax[1].set_xticks(np.arange(10))
    plt.show()

    print()
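
Each row of `predictions` holds 10 probabilities, one per digit. To turn those into a single predicted label, you can take the index of the largest probability. Here is a sketch with a small made-up array (3 images and 4 classes, not real model output):

```python
import numpy as np

# Hypothetical predicted probabilities: one row per image, one column per class.
predictions = np.array([[0.05, 0.90, 0.03, 0.02],
                        [0.10, 0.20, 0.60, 0.10],
                        [0.70, 0.10, 0.10, 0.10]])
true_labels = np.array([1, 2, 3])

# The predicted label is the class with the highest probability in each row.
predicted_labels = np.argmax(predictions, axis=1)
# Accuracy is the fraction of images whose predicted label matches the truth.
accuracy = np.mean(predicted_labels == true_labels)

print(predicted_labels)
print(accuracy)
```

The same two lines work on the real output of `model.predict` together with the matching entries of `test_labels`.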

Now it's your turn to design your own CNN with your own layers. Use the model from the cell above and change the layers and hyperparameters to whatever you want, but leave the first and last layers the same. If you want a suggestion, try the following:

- a 2D convolution layer (first layer)
- a max pooling layer
- a 2D convolution layer
- a max pooling layer
- a dense layer
- a dropout layer
- a dense layer with 10 neurons (last layer)

How does the accuracy of this model compare to the one we trained before?

#### Additional resources

Deep learning algorithms, including convolutional neural networks, are commonly used in physics and astronomy (and many other fields) when faced with large data sets that are impossible to parse by eye. Many of these projects are hosted on [Zooniverse](https://www.zooniverse.org/) - I encourage you to check them out!

#### Acknowledgements

This exercise is inspired by the work of the [TensorFlow team](https://www.tensorflow.org).