# Intro to Artificial Intelligence with Python

## Part VI - Neural Networks

Harvard CS50 Introduction to Artificial Intelligence with Python is an online course that I took in the Spring of 2020. It consisted of 6 lectures, and I have a notebook for each one. Each lecture came with accompanying projects, which are located in the projects folder in the same directory as this notebook.

[Course Link](https://cs50.harvard.edu/ai/)

[Lecture Link](https://www.youtube.com/watch?v=mFZazxxCKbw&list=PLhQjrBD2T382Nz7z1AEXmioc27axa19Kv&index=7)

## Artificial Neural Networks
A mathematical model for learning inspired by biological neural networks.
* Artificial neural networks model a mathematical function from inputs to outputs based on the structure and parameters of the network
* They allow the network's parameters to be learned from data
* Rather than real neurons from a biological network, artificial neural networks use units (similar to the idea of a node) that represent artificial neurons and connect to one another in a network. These units can pass along inputs and outputs

Recall from the ML lecture that a hypothesis function for a machine learning task took in inputs and produced outputs based on pre-determined weights like so:

* h(x1, x2) = w0 + w1 * x1 + w2 * x2

With machine learning, we saw that many of the algorithms of ML passed the hypothesis calculation results through an 'activation function' like so:

* h(x1, x2) = g(w0 + w1 * x1 + w2 * x2)
* the w0 is known as the bias value and is not assigned to any input

Some activation functions are (note, some are in the lecture 5 notes):
* Step Function: g(x) = 1 if x >= 0, else 0
* Logistic Sigmoid (s-shaped curve) Function: g(x) = e^x / (e^x + 1)
* Rectified Linear Unit (ReLU): g(x) = max(0, x)

To sum up, an activation function is applied to the result of the hypothesis function, which is a linear combination of the inputs and the weights. The final output is often used for binary (rain, no rain) types of problems.
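
The three activation functions and the hypothesis function above can be sketched in plain Python. This is only an illustration: the function names and the example weights are assumptions chosen for this sketch, not values from the lecture.

```python
import math


def step(x):
    """Step function: 1 once the input reaches the threshold of 0, else 0."""
    return 1 if x >= 0 else 0


def sigmoid(x):
    """Logistic sigmoid: e^x / (e^x + 1), a smooth value between 0 and 1."""
    return math.exp(x) / (math.exp(x) + 1)


def relu(x):
    """Rectified linear unit: negative inputs become 0."""
    return max(0, x)


def hypothesis(x1, x2, w0, w1, w2, g):
    """h(x1, x2) = g(w0 + w1 * x1 + w2 * x2), where w0 is the bias."""
    return g(w0 + w1 * x1 + w2 * x2)


# Example weights chosen only for illustration
print(hypothesis(1, 0, w0=-1, w1=1, w2=1, g=step))     # 1
print(hypothesis(2, 3, w0=-1, w1=1, w2=-1, g=relu))    # 0
print(hypothesis(0, 0, w0=0, w1=1, w2=1, g=sigmoid))   # 0.5
```

Passing the activation function in as the argument `g` keeps the linear combination separate from the choice of activation, which mirrors how the formula is written.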

The above example of a hypothesis with two inputs that have a solution passed through an activation function can represent the simplest of neural networks, where the two left units representing the inputs are connected to one output like so:

<img src='images/nn2.png'>

Note in the above image that the two inputs are connected to the output and that each connection is weighted. The output unit uses these weights in its g(x) function to calculate the result.

---

### Example of Simple Neural Network 
This example shows how a neural network can represent a specific function, in this case the Or function as pictured below (0 if False, 1 if True):

<img src='images/nn3.png'>

Using the simple network idea above, the weights (1 for each input) and the bias (-1) can be set in order to model the Or function like so:

<img src='images/nn5.png'>

In the above example, the only possible outputs are 0 and 1, and the output is chosen by whether or not the weighted sum reaches the threshold (the dotted line at 0).

In the below image:
* x1=0, therefore (x1 * w1) = (0 * 1) = 0
* x2=0, therefore (x2 * w2) = (0 * 1) = 0
* the total calculation is g(-1 + 0 + 0) or g(-1)
* therefore the overall result is 0 (or False), as -1 is below the threshold of 0

<img src='images/nn8.png'>

One more example:
* x1=1, therefore (x1 * w1) = (1 * 1) = 1
* x2=0, therefore (x2 * w2) = (0 * 1) = 0
* the total calculation is g(-1 + 1 + 0) or g(0)
* therefore the overall result is 1 (or True), as 0 reaches the threshold

<img src='images/nn7.png'>
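
The two worked examples above can be checked for all four input combinations with a short sketch. It uses the weights (1 and 1) and bias (-1) from the diagrams together with a step activation; the function name `or_unit` is just a label chosen for this illustration.

```python
def step(x):
    """Step activation with the threshold at 0."""
    return 1 if x >= 0 else 0


def or_unit(x1, x2):
    """Single output unit with bias -1 and a weight of 1 on each input."""
    bias, w1, w2 = -1, 1, 1
    return step(bias + w1 * x1 + w2 * x2)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", or_unit(x1, x2))
# 0 0 -> 0
# 0 1 -> 1
# 1 0 -> 1
# 1 1 -> 1
```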

Using a simple model like the one above, some of the examples from previous lectures can also be modeled, like humidity and pressure determining rain, or advertising dollars spent determining sales:

<img src='images/nn9.png'>

All of the previous examples used only two inputs, but more inputs can be added as well. The image below shows 5 inputs mapped to one output:

<img src='images/nn10.png'>

---

## Training Neural Networks (Determining  Ideal Weights)

**Gradient Descent** - Algorithm for minimizing loss (how far off the hypothesis function is) when training a neural network

High level implementation of gradient descent with a neural network:
* Start with a random choice of weights
* Repeat: 
    * Calculate the gradient based on all data points (the direction that will lead to decreasing loss)
    * Update weights according to the gradient

Because this algorithm makes calculations based on ALL DATA POINTS, it is inefficient for sufficiently large data sets.

Some more efficient alternatives to gradient descent:

**Stochastic Gradient Descent** - same setup as regular gradient descent, except that it randomly chooses ONE data point at a time to calculate the gradient on, rather than all the data points at once

**Mini-Batch Gradient Descent** - same setup as regular gradient descent, except that it calculates the gradient based on small groups of data points, or 'batches'
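
The three variants differ only in how many data points feed each gradient calculation, which the following sketch tries to make concrete. It is a minimal illustration under several assumptions that are not from the lecture: a one-input linear hypothesis h(x) = w0 + w1 * x, mean squared error as the loss, made-up noise-free data, and an arbitrary learning rate and step count.

```python
import random


def gradient(weights, batch):
    """Gradient of mean squared error for h(x) = w0 + w1 * x over a batch."""
    w0, w1 = weights
    g0 = g1 = 0.0
    for x, y in batch:
        error = (w0 + w1 * x) - y
        g0 += 2 * error
        g1 += 2 * error * x
    return g0 / len(batch), g1 / len(batch)


def loss(weights, data):
    """Mean squared error of the hypothesis over all data points."""
    w0, w1 = weights
    return sum(((w0 + w1 * x) - y) ** 2 for x, y in data) / len(data)


def train(data, batch_size, steps=2000, learning_rate=0.05, seed=0):
    """batch_size = len(data): gradient descent; 1: stochastic; else mini-batch."""
    rng = random.Random(seed)
    # Start with a random choice of weights
    weights = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    for _ in range(steps):
        # Choose which data points the gradient is calculated on
        batch = data if batch_size >= len(data) else rng.sample(data, batch_size)
        g0, g1 = gradient(weights, batch)
        # Update weights in the direction that decreases loss
        weights[0] -= learning_rate * g0
        weights[1] -= learning_rate * g1
    return weights


# Hypothetical toy data generated from y = 1 + 2x
data = [(x / 10, 1 + 2 * x / 10) for x in range(10)]

for name, size in [("gradient descent", len(data)),
                   ("stochastic", 1),
                   ("mini-batch", 4)]:
    weights = train(data, batch_size=size)
    print(f"{name:>16}: loss = {loss(weights, data):.6f}")
```

On a data set this small the cost per step is nearly identical; the savings from smaller batches only become noticeable when the full data set is large.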


Neural networks can have multiple outputs just like they have multiple inputs. The only real difference to the calculations is that more weights have to be added to match the number of outputs, so in this example each input would have four separate weights, one for each output, like so:

<img src='images/nn11.png'>

An example of multiple outputs would be weather prediction, where each output could represent the probability of a particular weather event. The diagram below shows a model that, based on the three inputs, predicts a 10% chance of rain, a 60% chance of sun, and so on:

<img src='images/nn12.png'>

The above networks allow for more complicated classifications than the binary (rain/no rain) ones performed previously. Such networks can also be used with reinforcement learning to model ideal actions to take, like so:

<img src='images/nn13.png'>

The overall message here is that neural networks are broadly applicable to many different machine learning algorithms and problem types. 

To determine weights for multi-output neural networks, each output together with the inputs can be treated as a separate neural network. Note in the image below that the three inputs have separate weights for each output, yet all three have a mapping to each output. Therefore each output and its particular weight mappings to the inputs can be viewed separately from the rest and trained as an individual neural network.

<img src='images/nn14.png'>
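
One way to picture this is to store one row of weights per output and evaluate each row on its own. The sketch below assumes three inputs and four outputs as in the diagrams; the weight and bias values are made up for illustration, and the activation function is left out to keep the focus on the weights.

```python
def unit_output(inputs, weights, bias):
    """Weighted sum for a single output unit (activation omitted for clarity)."""
    return bias + sum(w * x for w, x in zip(weights, inputs))


def layer_output(inputs, weight_rows, biases):
    """Each row of weights belongs to one output and is evaluated on its own."""
    return [unit_output(inputs, row, b) for row, b in zip(weight_rows, biases)]


# Hypothetical weights: 3 inputs and 4 outputs, so 4 rows of 3 weights each
weight_rows = [
    [0.2, -0.1, 0.4],
    [0.5, 0.3, -0.2],
    [-0.3, 0.8, 0.1],
    [0.0, 0.1, 0.9],
]
biases = [0.0, 0.1, -0.1, 0.2]
inputs = [1.0, 2.0, 3.0]

print(layer_output(inputs, weight_rows, biases))
```

Because no row depends on any other row, the four outputs behave like four single-output networks that happen to share the same inputs.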

**Most of the above examples fall under the category of a Perceptron (recall that this is a binary classifier model, see lecture 5 notes for more info), which has a downside:**
* Perceptrons are only capable of learning linearly separable decision boundaries (data that can be split with a straight line), and therefore cannot model more complex data like the circle example from lecture 5, where a straight line would not work.


---
## Multilayer Neural Network
An artificial neural network with an input layer, an output layer, and at least one hidden layer (in the image below, the center layer is the hidden one). There can be multiple hidden layers inside of a network. Note that in multilayer networks, the input layer is not directly connected to the output layer but rather to the hidden layer. Weights are still handled exactly the same way between the inputs, hidden layers, and outputs.

<img src='images/nn15.png'>
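
As a small illustration of what a hidden layer adds, the sketch below builds a network with two inputs, two hidden units, and one output that computes the Xor function. Xor is not covered in these notes; it is used here as an assumption because it is a standard example of a function that is not linearly separable, so a single unit cannot represent it. The weights are set by hand rather than learned.

```python
def step(x):
    """Step activation with the threshold at 0."""
    return 1 if x >= 0 else 0


def dense(inputs, weight_rows, biases, activation):
    """One layer: every unit sees every input, each with its own weights."""
    return [
        activation(b + sum(w * x for w, x in zip(row, inputs)))
        for row, b in zip(weight_rows, biases)
    ]


def xor_network(x1, x2):
    # Hidden layer: the first unit acts like Or, the second like And
    hidden = dense([x1, x2], [[1, 1], [1, 1]], [-1, -2], step)
    # Output layer: on when Or is on but And is off
    output = dense(hidden, [[1, -2]], [-1], step)
    return output[0]


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_network(x1, x2))
# 0 0 -> 0
# 0 1 -> 1
# 1 0 -> 1
# 1 1 -> 0
```

The same `dense` function is used for both layers, which reflects the point that weights are handled the same way between every pair of adjacent layers.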

---
### Training Multilayer Neural Network

**Backpropagation** - algorithm for training multilayer neural networks (or networks with hidden layers), the main algorithm that makes neural networks possible.

High level implementation of Backpropagation with a multilayer neural network:
* Start with a random choice of weights
* Repeat:
    * Calculate error for output layer
    * For each layer, starting with output layer, and moving inwards towards earliest hidden layer:
        * Propagate error back one layer
        * Update weights
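
The steps above can be sketched for a tiny network with two inputs, two hidden units, and one output. This is a simplified illustration and not the lecture's code: it assumes sigmoid activations, a half squared-error loss, one weight update per data point, and an arbitrary learning rate, and it trains on the Or function from earlier.

```python
import math
import random


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def forward(net, x):
    """Return the hidden activations and the final output for one input."""
    hidden = [sigmoid(b + sum(w * xi for w, xi in zip(ws, x)))
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(net["b_out"]
                     + sum(w * h for w, h in zip(net["w_out"], hidden)))
    return hidden, output


def train_step(net, x, target, learning_rate):
    hidden, output = forward(net, x)

    # Calculate error for the output layer (includes the sigmoid derivative)
    delta_out = (output - target) * output * (1 - output)

    # Propagate error back one layer, through the current output weights
    delta_hidden = [delta_out * w * h * (1 - h)
                    for w, h in zip(net["w_out"], hidden)]

    # Update weights, output layer first and then the hidden layer
    for j, h in enumerate(hidden):
        net["w_out"][j] -= learning_rate * delta_out * h
    net["b_out"] -= learning_rate * delta_out
    for j, delta in enumerate(delta_hidden):
        for i, xi in enumerate(x):
            net["w_hidden"][j][i] -= learning_rate * delta * xi
        net["b_hidden"][j] -= learning_rate * delta


def mean_loss(net, data):
    return sum((forward(net, x)[1] - y) ** 2 for x, y in data) / len(data)


# Start with a random choice of weights (seeded so runs are repeatable)
rng = random.Random(0)
net = {
    "w_hidden": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)],
    "b_hidden": [0.0, 0.0],
    "w_out": [rng.uniform(-1, 1) for _ in range(2)],
    "b_out": 0.0,
}
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # Or function

loss_before = mean_loss(net, data)
for _ in range(2000):
    for x, y in data:
        train_step(net, x, y, learning_rate=0.5)
loss_after = mean_loss(net, data)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

Libraries such as TensorFlow perform these calculations automatically, so this kind of code is mainly useful for seeing where each step of the algorithm happens.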


**Deep Neural Network** - Neural Network with multiple hidden layers, see image below:

<img src='images/nn16.png'>

**Overfitting** - modeling the training data too closely, so that the network fails to generalize to new data. The risk grows in a deep neural network as more units and hidden layers are added

**Dropout** - temporarily removing units (selected randomly) from a neural network to prevent over-reliance on certain units. This is similar in spirit to the epsilon-greedy reinforcement learning algorithm from lecture 5 and project 5a 'Nim', where pseudo-random actions were taken during training so that the AI didn't make the same move over and over.

To see the dropout technique in action, a multilayer network may be trained by randomly dropping out some units like so:

<img src='images/nn17.png'>

Once the weights of the remaining units have been updated, another random selection can be dropped out like so, after which the weights are updated again, and so on:

<img src='images/nn18.png'>
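
A rough sketch of the random selection is shown below. It zeroes each hidden unit's value with a given probability on every training step. Scaling the surviving values by 1 / (1 - rate) is an extra detail not covered in these notes; it is included as an assumption because it keeps the expected total roughly the same, and it is how the Keras `Dropout` layer behaves. The hidden values themselves are made up.

```python
import random


def dropout(activations, rate, rng):
    """Zero each unit with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep = 1 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)
hidden = [0.5, 1.2, 0.8, 0.3, 0.9, 1.1]  # hypothetical hidden layer values

# A different random subset of units is dropped on each training step
for step_number in range(3):
    print(step_number, dropout(hidden, rate=0.5, rng=rng))
```

Dropout is only applied while training; when the trained network is used for predictions, all units stay active.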

---
## Implementing Neural Networks in Code using TensorFlow
One of the more well known libraries for neural networks is TensorFlow (created by Google). 

[Tensorflow Playground Link](http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=&seed=0.68443&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false)

In [2]:
# This is a TensorFlow neural network that will
# help identify authentic and counterfeit banknotes
# This code is discussed 45 minutes into lecture

import csv
import tensorflow as tf

from sklearn.model_selection import train_test_split

# Read data in from file
with open("banknotes.csv") as f:
    reader = csv.reader(f)
    next(reader)

    data = []
    for row in reader:
        data.append({
            "evidence": [float(cell) for cell in row[:4]],
            "label": 1 if row[4] == "0" else 0
        })

# Separate data into training and testing groups
evidence = [row["evidence"] for row in data]
labels = [row["label"] for row in data]
X_training, X_testing, y_training, y_testing = train_test_split(
    evidence, labels, test_size=0.4
)

# Create a neural network
model = tf.keras.models.Sequential()

# Add a hidden layer with 8 units, with ReLU activation
# Dense means that each node in this layer is connected to every node in the previous layer
# note activation function is the g(x) from initial basic example
model.add(tf.keras.layers.Dense(8, input_shape=(4,), activation="relu"))

# Add output layer with 1 unit, with sigmoid activation
model.add(tf.keras.layers.Dense(1, activation="sigmoid"))

# Train neural network
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"]
)
model.fit(X_training, y_training, epochs=20)

# Evaluate how well model performs
model.evaluate(X_testing, y_testing, verbose=2)

Train on 823 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
549/549 - 0s - loss: 0.1996 - accuracy: 0.9417


[0.19956535589499552, 0.9417122]

---

## Some Real-World Applications for Deep Neural Networks


### Computer Vision
computational methods for analyzing and understanding digital images

**Image Convolution** - applying a filter (matrix) that adds each pixel value of an image to its neighbors, weighted according to a kernel matrix

In [3]:
from PIL import Image, ImageFilter

# Open image
image = Image.open('images/bridge.png').convert("RGB")

# Filter image according to edge detection kernel
filtered = image.filter(ImageFilter.Kernel(
    size=(3, 3),
    kernel=[-1, -1, -1, -1, 8, -1, -1, -1, -1],
    scale=1
))

# Show resulting image
filtered.show()
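
To see what the edge detection kernel does to individual pixel values, the sketch below applies the same 3x3 kernel by hand to a small grid of numbers. It is a simplified stand-in for what an image library does internally: it assumes a single grayscale channel, no padding at the borders, and made-up pixel values.

```python
def convolve(image, kernel):
    """Apply a square kernel to a 2D grid of pixel values (no padding)."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


edge_kernel = [[-1, -1, -1],
               [-1, 8, -1],
               [-1, -1, -1]]

# Hypothetical flat region: every pixel has the same value
flat = [[5, 5, 5],
        [5, 5, 5],
        [5, 5, 5]]
print(convolve(flat, edge_kernel))   # [[0]]

# Hypothetical 4x4 image: dark left half, bright right half
image = [[10, 10, 20, 20],
         [10, 10, 20, 20],
         [10, 10, 20, 20],
         [10, 10, 20, 20]]
print(convolve(image, edge_kernel))  # [[-30, 30], [-30, 30]]
```

The kernel values sum to 0, so regions of uniform color produce 0, while pixels next to a change in brightness produce large positive or negative values. That is why this kernel highlights edges.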

**Pooling** - reducing the size of an input by sampling from regions in the input (for example, max-pooling keeps only the maximum pixel value from each region)
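
A minimal sketch of max-pooling on a small grid is shown below. It assumes non-overlapping 2x2 regions, and the values in the grid are made up for illustration.

```python
def max_pool(feature_map, size=2):
    """Keep only the largest value from each non-overlapping size x size region."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 6, 2, 4]]

print(max_pool(feature_map))  # [[4, 5], [6, 7]]
```

The 4x4 input shrinks to 2x2, so later layers have a quarter as many values to process.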

### Convolutional Neural Network
Neural networks that use convolution, usually for analyzing images. This type of model combines convolution and pooling in phases in order to learn various features of an image.
* Rather than start by taking a base image and putting all of its pixels into a hidden layer as inputs, this model starts with a convolution step, which involves applying filters to an image to get some resulting **feature map**
* Convolution is usually applied a number of times to get multiple feature maps, each of which might extract some important characteristic from an image
* After convolution is applied and the feature maps are produced, pooling is performed to reduce the size of the feature maps
* After pooling is complete and the overall image features have been condensed, the feature maps are flattened, and each pixel value from each feature map becomes an input to a neural network.

The image below visually depicts the above process for taking an image, applying convolution to get resulting feature maps, then applying pooling to the feature maps to condense them, then flattening the resulting condensed maps in order to use them as inputs for a neural network (note the first square on the left represents the initial image and all its pixels):

<img src='images/nn19.png'>

In many cases in a convolutional neural network, the above process is performed more than once in order to extract even more precise features. In the image below, the first round of convolution and pooling extracts low-level features of an image like its edges, curves, and shapes. After a second round of convolution and pooling, higher-level features like actual objects can be extracted:

<img src='images/nn20.png'>
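
To get a feel for how much each phase shrinks the data, the sketch below tracks the sizes for the settings used in the handwriting example in these notes: a 28x28 image, 32 filters with a 3x3 kernel, and 2x2 max-pooling. It assumes no padding and a stride of 1 for the convolution, which are the Keras defaults. The helper function names are made up for this illustration.

```python
def conv_output_size(size, kernel):
    """Side length after a kernel slides over the input (no padding, stride 1)."""
    return size - kernel + 1


def pool_output_size(size, pool):
    """Side length after non-overlapping pooling."""
    return size // pool


filters = 32
size = 28                          # MNIST images are 28x28 pixels
size = conv_output_size(size, 3)   # 26
size = pool_output_size(size, 2)   # 13
flattened = size * size * filters  # 13 * 13 * 32

print(f"each feature map: {size}x{size}")  # each feature map: 13x13
print(f"flattened inputs: {flattened}")    # flattened inputs: 5408
```

Without the pooling step, the flattened layer would have 26 * 26 * 32 = 21632 inputs, four times as many.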

### Convolutional Neural Network Used for Handwriting Recognition

In [1]:
import tensorflow as tf

# Use MNIST handwriting dataset
mnist = tf.keras.datasets.mnist

# Prepare data for training
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)
x_train = x_train.reshape(
    x_train.shape[0], x_train.shape[1], x_train.shape[2], 1
)
x_test = x_test.reshape(
    x_test.shape[0], x_test.shape[1], x_test.shape[2], 1
)

# Create a convolutional neural network
model = tf.keras.models.Sequential([

    # Convolutional layer. Learn 32 filters using a 3x3 kernel
    tf.keras.layers.Conv2D(
        32, (3, 3), activation="relu", input_shape=(28, 28, 1)
    ),

    # Max-pooling layer, using 2x2 pool size
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),

    # Flatten units
    tf.keras.layers.Flatten(),

    # Add a hidden layer with dropout
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),

    # Add an output layer with output units for all 10 digits
    tf.keras.layers.Dense(10, activation="softmax")
])

# Train neural network
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)
model.fit(x_train, y_train, epochs=10)

# Evaluate neural network performance
model.evaluate(x_test, y_test, verbose=2)

# Save model to file (so can be reused without needing to train again)
filename = 'model.h5'
model.save(filename)
print(f"Model saved to {filename}.")

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Train on 60000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
10000/10000 - 2s - loss: 0.0389 - accuracy: 0.9874
Model saved to model.h5.


**The above code created and trained a convolutional neural network to recognize handwritten digits. After training was complete, the model was saved into a file called model.h5. This file can be used to actually run the model in a real setting. See recognition.py to run this program.**

---

## Feed-Forward Neural Network (1 to 1 neural network)
A feedforward neural network is an artificial neural network wherein connections between the nodes do not form a cycle. As such, it is different from recurrent neural networks. The feedforward neural network was the first and simplest type of artificial neural network devised.

A FFNN has connections in only one direction, from one layer to the next. It takes in one vector of inputs and produces one vector of outputs.

<img src='images/nn22.png'>

## Recurrent Neural Network (1 to many neural network)
A recurrent neural network is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Unlike feedforward neural networks, RNNs can use their internal state to process variable length sequences of inputs.

RNNs are particularly useful when dealing with sequences of data (for example, videos, which are sequences of images).

<img src='images/nn21.png'>
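
The idea of an internal state can be sketched with a single recurrent unit. This is a deliberately tiny illustration and not how a library implements an RNN: it assumes one number per time step, one hidden value, a tanh activation, and made-up weights.

```python
import math


def rnn(sequence, w_input, w_state, bias):
    """Feed a sequence through one recurrent unit and return the final state."""
    state = 0.0
    for x in sequence:
        # The new state depends on the current input AND the previous state
        state = math.tanh(w_input * x + w_state * state + bias)
    return state


# The same unit, with the same weights, handles sequences of any length
print(rnn([0.5, -0.2, 0.1], w_input=1.0, w_state=0.5, bias=0.0))
print(rnn([0.5, -0.2, 0.1, 0.9, 0.3], w_input=1.0, w_state=0.5, bias=0.0))
```

A feed-forward network would need a fixed number of inputs, whereas here the loop simply runs once per element of the sequence.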
