# Deep Learning

## Github Repos

- [Deep Learning](https://github.com/udacity/deep-learning)
- [Deep Learning with Pytorch](https://github.com/udacity/deep-learning-v2-pytorch)

## Introduction to Neural Networks

### Gradient Descent
[Principles and the math behind the gradient descent algorithm](https://github.com/stephengineer/Introduction-to-Machine-Learning-with-TensorFlow/blob/main/Deep%20Learning/Introduction%20to%20Neural%20Networks/Gradient%20Descent.pdf)

#### Error Function

- The error function should be differentiable
- The error function should be continuous

### Activation Function

#### Gradient Descent Algorithm

- Sigmoid activation function

$$\sigma(x) = \frac{1}{1+e^{-x}}$$

- Derivative of the sigmoid function
$$\sigma'(x)=\sigma(x)(1-\sigma(x))$$

- Output (prediction) formula

$$\hat{y} = \sigma(w_1 x_1 + w_2 x_2 + b)$$

- Error function

$$Error(y, \hat{y}) = - y \log(\hat{y}) - (1-y) \log(1-\hat{y})$$

- The function that updates the weights

$$ w_i \longrightarrow w_i + \alpha (y - \hat{y}) x_i$$

$$ b \longrightarrow b + \alpha (y - \hat{y})$$


```python
import numpy as np

# Activation (sigmoid) function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def output_formula(features, weights, bias):
    return sigmoid(np.dot(features, weights) + bias)

def error_formula(y, output):
    return - y*np.log(output) - (1 - y) * np.log(1-output)

def update_weights(x, y, weights, bias, learnrate):
    output = output_formula(x, weights, bias)
    d_error = y - output
    weights += learnrate * d_error * x
    bias += learnrate * d_error
    return weights, bias
```

### One-hot Encoding
Use the `get_dummies` function in Pandas in order to one-hot encode the data.

```python
# Make dummy variables for rank
one_hot_data = pd.concat([data, pd.get_dummies(data['rank'], prefix='rank')], axis=1)
```

### Maximum Likelihood
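- Maximizing a product of probabilities is equivalent to maximizing the sum of their logarithms, because $\log(ab) = \log(a) + \log(b)$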
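
The identity matters because the likelihood of a dataset is a product of per-point probabilities, which shrinks toward zero quickly, while the sum of logs stays manageable. A minimal sketch with made-up numbers (the values in `probs` are illustrative assumptions, not course data):

```python
import numpy as np

# Hypothetical probabilities a model assigns to the correct label of four points
probs = np.array([0.9, 0.8, 0.6, 0.7])

# Likelihood: product of the individual probabilities
likelihood = np.prod(probs)

# Log-likelihood: the log of a product equals the sum of the logs
log_likelihood = np.sum(np.log(probs))

# Negating the log-likelihood gives a positive quantity to minimize
negative_log_likelihood = -log_likelihood
```

Maximizing `likelihood` and minimizing `negative_log_likelihood` pick out the same model, which is what leads to cross-entropy in the next section.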

### Cross Entropy
A higher cross-entropy implies a lower probability for an event. Cross-entropy is the negative logarithm of the total probability of an outcome, so it increases as that probability decreases.

- A good model gives a low cross entropy
- A bad model gives a high cross entropy

$$
CE = - \sum_{i=1}^m \left[ y_i \ln(p_i) + (1-y_i) \ln (1-p_i) \right]
$$

#### Coding Cross-entropy
```python
# Y is for the category, and P is the probability.

import numpy as np

def cross_entropy(Y, P):
    # np.float_ was removed in NumPy 2.0; np.asarray works across versions
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    return -np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P))
```

### Logistic Regression
1. Start with random weights: $w_1, ... , w_n, b$
2. For every point $(x_1, ... , x_n)$: update $w_i, b$
3. Repeat until the error is small
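
The three steps above can be sketched as a small training loop. This is an illustrative sketch, not the course's reference solution: the function name `train`, the toy data, and the hyperparameters are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train(features, targets, epochs=1000, learnrate=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_features = features.shape[1]
    # Step 1: start with random weights (and a zero bias)
    weights = rng.normal(scale=1 / np.sqrt(n_features), size=n_features)
    bias = 0.0
    # Step 3: repeat for a fixed number of passes over the data
    for _ in range(epochs):
        # Step 2: for every point, update the weights and the bias
        for x, y in zip(features, targets):
            output = sigmoid(np.dot(x, weights) + bias)
            weights += learnrate * (y - output) * x
            bias += learnrate * (y - output)
    return weights, bias

# Toy, linearly separable data (illustrative only)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [2.0, 2.0], [2.0, 3.0], [3.0, 2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

weights, bias = train(X, y)
predictions = (sigmoid(X @ weights + bias) > 0.5).astype(int)
```

Here the loop runs for a fixed number of epochs for simplicity; stopping once the error falls below a threshold would match step 3 more literally.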

### Neural Network Architecture
- Input Layer
- Hidden Layer
- Output Layer

### Feedforward

### Backpropagation
- Do a feedforward operation.
- Compare the output of the model with the desired output.
- Calculate the error.
- Run the feedforward operation backwards (backpropagation) to spread the error to each of the weights.
- Use this to update the weights and get a better model.
- Continue until we have a model that is good.

#### Backpropagate the error
$$ (y-\hat{y}) \sigma'(x) $$

```python
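# Assumes `sigmoid` from the gradient descent code above
def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

def error_term_formula(x, y, output):
    return (y - output)*sigmoid_prime(x)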
```
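
To make the backpropagation steps concrete, here is a minimal sketch of one update for a network with a single hidden layer and one sigmoid output. The function name `backprop_step`, the layer sizes, and the input values are assumptions for illustration; the error term follows the $(y-\hat{y})\sigma'(x)$ form shown above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def backprop_step(x, y, w_hidden, w_output, learnrate=0.5):
    # Feedforward pass
    hidden_out = sigmoid(x @ w_hidden)        # shape (n_hidden,)
    output = sigmoid(hidden_out @ w_output)   # scalar prediction

    # Compare the prediction with the target
    error = y - output

    # Output error term: (y - y_hat) * sigma'(h), using sigma' = sigma * (1 - sigma)
    output_term = error * output * (1 - output)

    # Spread the error back to the hidden layer through the output weights
    hidden_term = output_term * w_output * hidden_out * (1 - hidden_out)

    # Update both weight sets
    w_output = w_output + learnrate * output_term * hidden_out
    w_hidden = w_hidden + learnrate * np.outer(x, hidden_term)
    return w_hidden, w_output, error

rng = np.random.default_rng(42)
x = np.array([0.5, 0.1, -0.2])   # one made-up input point
y = 0.6                          # its made-up target
w_hidden = rng.normal(scale=0.5, size=(3, 2))
w_output = rng.normal(scale=0.5, size=2)

errors = []
for _ in range(500):
    w_hidden, w_output, error = backprop_step(x, y, w_hidden, w_output)
    errors.append(abs(error))
```

The absolute error recorded in `errors` shrinks over the iterations as the weights adapt to the single training point.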

[Lab: Analyzing Student Data](../../notebooks/01%20Introduction%20to%20Neural%20Networks/StudentAdmissions.ipynb)

## Implementing Gradient Descent

### Mean Squared Error Function
$$
E=\frac{1}{2m}\sum_{\mu}(y^{\mu}-\hat{y}^{\mu})^2
$$
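
As a rough sketch of how this error drives gradient descent, the code below minimizes it for a linear model $\hat{y} = Xw$. The linear model, the function names, and the toy data are assumptions made for illustration.

```python
import numpy as np

def mse(y, y_hat):
    # E = 1/(2m) * sum over points of (y - y_hat)^2
    m = len(y)
    return np.sum((y - y_hat) ** 2) / (2 * m)

def gradient_descent_step(X, y, weights, learnrate=0.1):
    m = len(y)
    y_hat = X @ weights
    # dE/dw = -(1/m) * X^T (y - y_hat)
    gradient = -(X.T @ (y - y_hat)) / m
    # Move against the gradient
    return weights - learnrate * gradient

# Toy data generated by y = 1 + x; the first column of X is a constant bias input
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])

weights = np.zeros(2)
for _ in range(2000):
    weights = gradient_descent_step(X, y, weights, learnrate=0.1)

final_error = mse(y, X @ weights)
```

After training, `weights` should be close to `[1, 1]`, the intercept and slope that generated the data.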

- [Gradient Descent](https://github.com/stephengineer/Introduction-to-Machine-Learning-with-TensorFlow/blob/main/Deep%20Learning/02%20Implementing%20Gradient%20Descent/Gradient%20Descent.pdf)
- [Gradient Descent Code](https://github.com/stephengineer/Introduction-to-Machine-Learning-with-TensorFlow/blob/main/Deep%20Learning/02%20Implementing%20Gradient%20Descent/Gradient%20Descent%20Code.pdf)
- [Gradient Descent Implementing](https://github.com/stephengineer/Introduction-to-Machine-Learning-with-TensorFlow/blob/main/Deep%20Learning/02%20Implementing%20Gradient%20Descent/Gradient%20Descent%20Implementing.pdf)
- [Multilayer Perceptrons](https://github.com/stephengineer/Introduction-to-Machine-Learning-with-TensorFlow/blob/main/Deep%20Learning/02%20Implementing%20Gradient%20Descent/Multilayer%20Perceptrons.pdf)
- [Backpropagation](https://github.com/stephengineer/Introduction-to-Machine-Learning-with-TensorFlow/blob/main/Deep%20Learning/02%20Implementing%20Gradient%20Descent/Backpropagation.pdf)
- [Backpropagation Implementing](https://github.com/stephengineer/Introduction-to-Machine-Learning-with-TensorFlow/blob/main/Deep%20Learning/02%20Implementing%20Gradient%20Descent/Backpropagation%20Implementing.pdf)

Further reading
- From Andrej Karpathy: [Yes, you should understand backprop](https://karpathy.medium.com/yes-you-should-understand-backprop-e2f06eab496b#.vt3ax2kg9)
- Also from Andrej Karpathy, [a lecture from Stanford's CS231n course](https://www.youtube.com/watch?v=59Hbtz7XgjM)

## Training Neural Network

### Overfitting and Underfitting

- Overfitting -> high variance
- Underfitting -> high bias

![earlyStopping](./img/earlyStopping.png)

### Regularization
Large coefficients tend to overfit, so regularization adds a penalty on the weights to the error function.
- L1 Error Function: Good for feature selection
$$E = -\frac{1}{m} \sum_{i=1}^m \left[ y_i \ln(\hat{y}_i) + (1-y_i) \ln (1-\hat{y}_i) \right] + \lambda(|w_1|+...+|w_n|)$$
- L2 Error Function: Normally better for training models
$$E = -\frac{1}{m} \sum_{i=1}^m \left[ y_i \ln(\hat{y}_i) + (1-y_i) \ln (1-\hat{y}_i) \right] + \lambda(w_1^2+...+w_n^2)$$
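
The two penalized error functions can be sketched in NumPy as follows. The function names and the sample values are assumptions; the point is only to show how the penalty grows with the size of the weights.

```python
import numpy as np

def cross_entropy(y, y_hat):
    # Mean binary cross-entropy over the m points
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def l1_error(y, y_hat, weights, lam):
    # Cross-entropy plus lambda times the sum of absolute weights
    return cross_entropy(y, y_hat) + lam * np.sum(np.abs(weights))

def l2_error(y, y_hat, weights, lam):
    # Cross-entropy plus lambda times the sum of squared weights
    return cross_entropy(y, y_hat) + lam * np.sum(weights ** 2)

# Made-up labels, predictions, and two weight vectors of different size
y = np.array([1.0, 0.0])
y_hat = np.array([0.9, 0.2])
small_weights = np.array([1.0, -1.0])
large_weights = np.array([10.0, -10.0])
lam = 0.01

l1_small = l1_error(y, y_hat, small_weights, lam)
l1_large = l1_error(y, y_hat, large_weights, lam)
l2_small = l2_error(y, y_hat, small_weights, lam)
l2_large = l2_error(y, y_hat, large_weights, lam)
```

With identical predictions, the L2 penalty punishes the large weights far more than the L1 penalty does, because it grows with the square of each weight.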

### Dropout
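Randomly turn off a fraction of the nodes on each training pass so the network cannot rely too heavily on any single node. This helps prevent overfitting.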
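
A minimal NumPy sketch of the idea, assuming the "inverted dropout" variant in which the surviving activations are scaled up during training so that nothing needs to change at inference time. The function name and arguments are hypothetical.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    # At inference time the layer passes its input through unchanged
    if not training or drop_prob == 0.0:
        return activations
    keep_prob = 1.0 - drop_prob
    # Each unit is kept independently with probability keep_prob
    mask = rng.random(activations.shape) < keep_prob
    # Scale the kept units so the expected activation stays the same
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((4, 5))
dropped = dropout(a, drop_prob=0.5, rng=rng)
unchanged = dropout(a, drop_prob=0.5, rng=rng, training=False)
```

With `drop_prob=0.5`, every entry of `dropped` is either `0.0` (switched off) or `2.0` (kept and rescaled).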

### Random Restart
Run gradient descent from several random starting points to reduce the chance of getting stuck in a local minimum.

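### Vanishing Gradient
The derivative of the sigmoid is at most 0.25, so the gradient shrinks as it is multiplied through many layers. Activation functions with larger derivatives help:
- Hyperbolic tangent function
$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$

- Rectified Linear Unit (ReLU)
$$
\mathrm{relu}(x)=
\begin{cases}
x & \text{if } x\ge 0\\
0 & \text{if } x<0
\end{cases}
$$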
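
To see why these activations help, the sketch below compares the largest derivative of each function. The helper names are assumptions, and the ten-layer product is only a back-of-the-envelope illustration.

```python
import numpy as np

def sigmoid_prime(x):
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)

def tanh_prime(x):
    return 1 - np.tanh(x) ** 2

def relu(x):
    return np.maximum(0, x)

def relu_prime(x):
    # Derivative is 1 for positive inputs and 0 for negative inputs
    return np.where(np.asarray(x) > 0, 1.0, 0.0)

# Both sigmoid and tanh have their steepest slope at x = 0
max_sigmoid_grad = sigmoid_prime(0.0)   # 0.25
max_tanh_grad = tanh_prime(0.0)         # 1.0

# Best case for a chain of 10 sigmoid layers: the gradient is scaled by 0.25 ** 10
sigmoid_chain = max_sigmoid_grad ** 10
```

Even in the best case, ten stacked sigmoids scale the gradient by less than one millionth, while ReLU passes a gradient of exactly 1 for every positive input.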

### Batch vs Stochastic Gradient Descent
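Stochastic gradient descent updates the weights after each small batch of data instead of after a full pass over the dataset, which decreases training time.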
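
A minimal sketch of how a dataset can be split into shuffled mini-batches, so that the weights are updated once per batch. The generator name `iterate_minibatches` and the toy arrays are assumptions.

```python
import numpy as np

def iterate_minibatches(features, targets, batch_size, rng):
    # Shuffle once per epoch so every batch is a random sample of the data
    indices = rng.permutation(len(features))
    for start in range(0, len(features), batch_size):
        batch = indices[start:start + batch_size]
        yield features[batch], targets[batch]

rng = np.random.default_rng(0)
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# One epoch: each point appears in exactly one batch
batches = list(iterate_minibatches(X, y, batch_size=4, rng=rng))
```

With 10 points and `batch_size=4`, one epoch yields batches of 4, 4, and 2 points, and a training loop would perform one weight update for each of them.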

### Learning Rate Decay
Rule:
- If steep: long steps
- If flat: small steps
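
In practice this rule is often approximated by letting the learning rate shrink as training approaches a minimum, where the error surface flattens. A minimal sketch of one such schedule, exponential decay; the function name and the constants are assumptions chosen for illustration.

```python
def exponential_decay(initial_rate, decay_rate, epoch):
    # The learning rate is multiplied by decay_rate once per epoch
    return initial_rate * decay_rate ** epoch

# Learning rates for the first four epochs, halving each time
rates = [exponential_decay(0.1, 0.5, epoch) for epoch in range(4)]
```

Here the rate starts at 0.1 and halves every epoch, so early steps are long and later steps are small.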

### Momentum
Helps gradient descent get past local minima.
- STEP: weighted average of previous steps
- $\beta$: momentum, a constant between 0 and 1
- STEP(n) $\rightarrow$ STEP(n) + $\beta$ STEP(n-1) + $\beta^2$ STEP(n-2) + ...
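
The weighted sum of past steps does not have to be stored explicitly: keeping a running "velocity" gives the same result. A minimal sketch, where the function name, the constant gradient, and the hyperparameters are assumptions:

```python
def momentum_update(weights, gradient, velocity, learnrate=0.1, beta=0.9):
    # The velocity accumulates step(n) + beta * step(n-1) + beta^2 * step(n-2) + ...
    velocity = beta * velocity - learnrate * gradient
    return weights + velocity, velocity

# Three updates with a constant gradient of 1.0
weights, velocity = 0.0, 0.0
for _ in range(3):
    weights, velocity = momentum_update(weights, 1.0, velocity,
                                        learnrate=1.0, beta=0.5)
```

After three updates the velocity equals $-(1 + 0.5 + 0.25) = -1.75$, which is the most recent step plus the discounted earlier steps.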




## [Deep Learning with TensorFlow](https://github.com/udacity/intro-to-ml-tensorflow)

### Build Neural Network
[Part 1 Introduction to Neural Networks with TensorFlow](../../Deep%20Learning/04%20Deep%20Learning%20with%20TensorFlow/Notebooks/Part_1_Introduction_to_Neural_Networks_with_TensorFlow_(Solution).ipynb)

[Part 2 Neural networks with TensorFlow and Keras](../../Deep%20Learning/04%20Deep%20Learning%20with%20TensorFlow/Notebooks/Part_2_Neural_networks_with_TensorFlow_and_Keras_(Solution).ipynb)

- `tf.multiply()`: Performs element-wise multiplication on two inputs
- `tf.matmul()`: Performs matrix multiplication on two inputs
- `tf.reduce_sum()`: Computes the sum of elements across an input tensor's dimensions
- `tf.convert_to_tensor()`: Converts an ndarray to a TensorFlow tensor
- `tensor.numpy()`: Called on the tensor itself to convert it to an ndarray

There are [plenty of different datasets](https://www.tensorflow.org/datasets/catalog/overview) available from the `tensorflow_datasets` library, which we shortened in the code to `tfds`. Loading one of the datasets is simple with the `tfds.load()` function, which takes in the dataset name (in this case `mnist`), as well as some other optional arguments such as: 1) the dataset split to get (training, test, validation), 2) whether to shuffle the data, 3) if the data is to be used as part of a supervised learning algorithm (including labels), 4) whether to include metadata about the dataset itself, and [more](https://www.tensorflow.org/datasets/api_docs/python/tfds/load).

You can use the `.take()` function with an integer as an argument to get a certain number of images at once from the dataset.

#### Pipelines

- [Pipeline Performance](https://www.tensorflow.org/guide/data_performance)
- [Transformations](https://www.tensorflow.org/api_docs/python/tf/data/Dataset)

#### Softmax

To calculate this probability distribution, we often use the [**softmax** function](https://en.wikipedia.org/wiki/Softmax_function). Mathematically this looks like

$$
\Large \sigma(x_i) = \cfrac{e^{x_i}}{\sum_k^K{e^{x_k}}}
$$
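
For intuition, the formula can be written directly in NumPy. This is an illustrative sketch rather than TensorFlow's implementation; subtracting the maximum score first is a common trick to avoid overflow and does not change the result.

```python
import numpy as np

def softmax(x):
    # Shift by the maximum for numerical stability
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    # Normalize so the outputs sum to 1
    return exps / np.sum(exps, axis=-1, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])   # made-up class scores
probs = softmax(scores)
```

The outputs are all positive, sum to 1, and preserve the ordering of the input scores, so the largest score receives the highest probability.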

TensorFlow also includes built-in softmax functions you can use:

- `tf.nn.softmax`
- `tf.math.softmax`
- `tf.keras.activations.softmax`

#### Neural Networks with TensorFlow

Keras helps further simplify working with neural networks running on TensorFlow under the hood. You can more easily stack layers with `tf.keras.Sequential`, making sure to feed an `input_shape` to the first layer of the network. You can also either add separate `Activation` layers, or feed an activation as an argument within certain layers, such as the `Dense` fully-connected layers.

Example:
```python
model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape = (28,28,1)),
        tf.keras.layers.Dense(256, activation = 'sigmoid'),
        tf.keras.layers.Dense(10, activation = 'softmax')
])
```

#### Subclassing
```python
class Network(tf.keras.Model):
    def __init__(self, num_classes = 2):
        super().__init__()
        self.num_classes = num_classes
    
        # Define layers 
        self.input_layer = tf.keras.layers.Flatten()
        self.hidden_layer = tf.keras.layers.Dense(256, activation = 'relu')
        self.output_layer = tf.keras.layers.Dense(self.num_classes, activation = 'softmax')
    
    # Define forward Pass   
    def call(self, input_tensor):
        x = self.input_layer(input_tensor)
        x = self.hidden_layer(x)
        x = self.output_layer(x)
    
        return x


# Create a model object
subclassed_model = Network(10)

# Build the model, i.e. initialize the model's weights and biases
subclassed_model.build((None, 28, 28, 1))

subclassed_model.summary()
```

#### Adding Layers with .add

Example:
```python
layer_neurons = [512, 256, 128, 56, 28, 14]

model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape = (28,28,1)))

for neurons in layer_neurons:
    model.add(tf.keras.layers.Dense(neurons, activation='relu'))
            
model.add(tf.keras.layers.Dense(10, activation='softmax'))
          
model.summary() 
```

#### Clearing the Graph

In order to avoid clutter from old models in the graph, we can use:

```python
tf.keras.backend.clear_session()
```

This command deletes the current `tf.keras` graph and creates a new one.


### Train Neural Network
[Part 3 Training Neural Networks](../../Deep%20Learning/04%20Deep%20Learning%20with%20TensorFlow/Notebooks/Part_3_Training_Neural_Networks_(Solution).ipynb)

Before we can train our model we need to set the parameters we are going to use to train it. We can configure our model for training using the `.compile` method. The main parameters we need to specify in the `.compile` method are:

* **Optimizer:** The algorithm that we'll use to update the weights of our model during training. Throughout these lessons we will use the [`adam`](http://arxiv.org/abs/1412.6980) optimizer. Adam is an optimization of the stochastic gradient descent algorithm. For a full list of the optimizers available in `tf.keras` check out the [optimizers documentation](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/optimizers#classes).


* **Loss Function:** The loss function we are going to use during training to measure the difference between the true labels of the images in your dataset and the predictions made by your model. In this lesson we will use the `sparse_categorical_crossentropy` loss function. We use the `sparse_categorical_crossentropy` loss function when our dataset has labels that are integers, and the `categorical_crossentropy` loss function when our dataset has one-hot encoded labels. For a full list of the loss functions available in `tf.keras` check out the [losses documentation](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/losses#classes).


* **Metrics:** A list of metrics to be evaluated by the model during training. Throughout these lessons we will measure the `accuracy` of our model. The `accuracy` calculates how often our model's predictions match the true labels of the images in our dataset. For a full list of the metrics available in `tf.keras` check out the [metrics documentation](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/metrics#classes).

These are the main parameters we are going to set throughout these lessons. You can check out all the other configuration parameters in the [TensorFlow documentation](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/Model#compile).

Example:
```python
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```
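
To illustrate the difference between the two cross-entropy losses described above, here is a NumPy sketch. These are simplified re-implementations for illustration only, not the Keras functions, and the sample probabilities are made up.

```python
import numpy as np

def categorical_crossentropy(one_hot_labels, probs):
    # Labels are one-hot vectors: only the true class contributes to the sum
    return -np.sum(one_hot_labels * np.log(probs), axis=1)

def sparse_categorical_crossentropy(int_labels, probs):
    # Labels are integers: pick the probability of the true class directly
    return -np.log(probs[np.arange(len(int_labels)), int_labels])

# Predicted class probabilities for two samples and three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
int_labels = np.array([0, 1])
one_hot_labels = np.eye(3)[int_labels]

sparse_loss = sparse_categorical_crossentropy(int_labels, probs)
categorical_loss = categorical_crossentropy(one_hot_labels, probs)
```

Both functions return the same per-sample losses; the only difference is the label format they expect.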

#### Training the Model

Now let's train our model by using all the images in our training set. Some nomenclature: one pass through the entire dataset is called an *epoch*. To train our model for a given number of epochs we use the `.fit` method, as seen below:

```python
EPOCHS = 5

history = model.fit(training_batches, epochs = EPOCHS)
```

The `.fit` method returns a `History` object which contains a record of training accuracy and loss values at successive epochs, as well as validation accuracy and loss values when applicable. We will discuss the history object in a later lesson. 

With our model trained, we can check out its predictions.

```python
## Build model
my_model = tf.keras.Sequential([
           tf.keras.layers.Flatten(input_shape = (28,28,1)),
           tf.keras.layers.Dense(128, activation = 'relu'),
           tf.keras.layers.Dense(64, activation = 'relu'),
           tf.keras.layers.Dense(32, activation = 'relu'),
           tf.keras.layers.Dense(10, activation = 'softmax')
])


my_model.compile(optimizer='adam',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])


## Train model
EPOCHS = 5

history = my_model.fit(training_batches, epochs = EPOCHS)


## Predict model
for image_batch, label_batch in training_batches.take(1):
    ps = my_model.predict(image_batch)
    first_image = image_batch.numpy().squeeze()[0]
```


### Train Neural Network on Complex Dataset
[Part 4 Fashion MNIST](../../Deep%20Learning/04%20Deep%20Learning%20with%20TensorFlow/Notebooks/Part_4_Fashion_MNIST_(Solution).ipynb)

[Part 5 Inference and Validation](../../Deep%20Learning/04%20Deep%20Learning%20with%20TensorFlow/Notebooks/Part_5_Inference_and_Validation_(Solution).ipynb)

### Inference & Validation

We used `tfds.Split.ALL.subsplit` to make a 60/20/20 split for training, validation and test sets, although some TensorFlow datasets have these subsections already built in. (Newer releases of `tensorflow_datasets` removed `subsplit` in favor of split slicing strings such as `'train[:60%]'`.) Depending on the dataset, you may also want to make sure to shuffle the data at this point as well.

Ways to avoid overfitting to the training data:
- Stop training when the training and validation curves start to diverge by a certain amount
- Save the model with the best validation accuracy seen during training
- Add layers like Dropout to help generalize the network
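
The first two ideas can be sketched in plain Python. The function below is a hypothetical helper, not part of `tf.keras` (which provides the `tf.keras.callbacks.EarlyStopping` callback for this purpose); it stops once the validation loss has failed to improve for `patience` consecutive epochs and reports which epoch was best.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best model: this is the one we would save
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # Validation loss keeps getting worse: stop training here
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation losses that improve and then start to rise
val_losses = [0.9, 0.7, 0.6, 0.65, 0.7, 0.8]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
```

For these losses, training would stop at epoch 4 and the model from epoch 2 would be kept.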


### Saving & Loading
[Part 6 Saving and Loading Models](../../Deep%20Learning/04%20Deep%20Learning%20with%20TensorFlow/Notebooks/Part_6_Saving_and_Loading_Models.ipynb)

In TensorFlow we can save our trained models in different formats. Here we will see how to save our models in TensorFlow's SavedModel format and as HDF5 files, which is the format used by Keras models.

#### Saving and Loading Models in HDF5 Format

To save our models in the format used by Keras models we use the `.save(filepath)` method. For example, to save a model called `my_model` in the current working directory with the name `test_model` we use:

```python
my_model.save('./test_model.h5')
```

It's important to note that we have to provide the `.h5` extension to the `filepath` in order to tell `tf.keras` to save our model as an HDF5 file.

The above command saves our model into a single HDF5 file that will contain:

* The model's architecture.
* The model's weight values which were learned during training.
* The model's training configuration, which corresponds to the parameters you passed to the `compile` method.
* The optimizer and its state. This allows you to resume training exactly where you left off.


In the cell below we save our trained `model` as an HDF5 file. The name of our HDF5 file will correspond to the current time stamp. This is useful if you are saving many models and want each of them to have a unique name. By default the `.save()` method will **silently** overwrite any existing file at the target location with the same name. If we want `tf.keras` to prompt us before overwriting a file with the same name, we can set the argument `overwrite=False` in the `.save()` method.

```python
import time

t = time.time()

saved_keras_model_filepath = './{}.h5'.format(int(t))

model.save(saved_keras_model_filepath)
```

Once a model has been saved, we can use `tf.keras.models.load_model(filepath)` to re-load our model. This command will also compile our model automatically using the saved training configuration, unless the model was never compiled in the first place.

```python
reloaded_keras_model = tf.keras.models.load_model(saved_keras_model_filepath)
```

#### Saving and Loading TensorFlow SavedModels

To export our models to the TensorFlow **SavedModel** format, we use the `tf.saved_model.save(model, export_dir)` function. For example, to save a model called `my_model` in a folder called `saved_models` located in the current working directory we use:

```python
tf.saved_model.save(my_model, './saved_models')
```

It's important to note that here we have to provide the path to the directory where we want to save our model, **NOT** the name of the file. This is because SavedModels are not saved in a single file. Rather, when you save your model as a SavedModel, the `tf.saved_model.save()` function will create an `assets` folder, a `variables` folder, and a `saved_model.pb` file inside the directory you provided.

The SavedModel files that are created contain:

* A TensorFlow checkpoint containing the model weights.
* A SavedModel proto containing the underlying TensorFlow graph. Separate graphs are saved for prediction (serving), training, and evaluation. If the model wasn't compiled before, then only the inference graph gets exported.
* The model's architecture configuration if available.

The SavedModel is a standalone serialization format for TensorFlow objects, supported by TensorFlow serving as well as TensorFlow implementations other than Python. It does not require the original model building code to run, which makes it useful for sharing or deploying in different platforms, such as mobile and embedded devices (with TensorFlow Lite), servers (with TensorFlow Serving), and even web browsers (with TensorFlow.js).

In the cell below we save our trained model as a SavedModel. The name of the folder where we are going to save our model will correspond to the current time stamp. Again, this is useful if you are saving many models and want each of them to be saved in a unique directory.

```python
import time

t = time.time()

savedModel_directory = './{}'.format(int(t))

tf.saved_model.save(model, savedModel_directory)
```

Once a model has been saved as a SavedModel, we can use `tf.saved_model.load(export_dir)` to re-load our model. 

```python
reloaded_SavedModel = tf.saved_model.load(savedModel_directory)
```

It's important to note that the object returned by `tf.saved_model.load` is **NOT** a Keras object. Therefore, it doesn't have `.fit`, `.predict`, `.summary`, etc. methods. It is 100% independent of the code that created it. This means that in order to make predictions with our `reloaded_SavedModel` we need to use a different method than the one used with the re-loaded Keras model.

To make predictions on a batch of images with a re-loaded SavedModel we have to use:

```python
reloaded_SavedModel(image_batch, training=False)
```

This will return a tensor with the predicted label probabilities for each image in the batch. Again, since we haven't done anything new to this re-loaded SavedModel, then both the `reloaded_SavedModel` and our original `model` should be identical copies. Therefore, they should make the same predictions on the same images.

We can also get back a full Keras model, from a TensorFlow SavedModel, by loading our SavedModel with the `tf.keras.models.load_model` function. 

```python
reloaded_keras_model_from_SavedModel = tf.keras.models.load_model(savedModel_directory)
```

#### Saving Models During Training

We have seen that when we train a model with a validation set, the value of the validation loss changes through the training process. Since the value of the validation loss is an indicator of how well our model will generalize to new data, it would be great if we could save our model at each step of the training process and then only keep the version with the lowest validation loss.

We can do this in `tf.keras` by using the following callback:

```python
tf.keras.callbacks.ModelCheckpoint('./best_model.h5', monitor='val_loss', save_best_only=True)
```
This callback will save the model as a Keras HDF5 file after every epoch. With the `save_best_only=True` argument, this callback will first check the validation loss of the latest model against the one previously saved. The callback will only save the latest model and overwrite the old one if the latest model has a lower validation loss than the one previously saved. This guarantees that we will end up with the version of the model that achieved the lowest validation loss during training.

### Loading Images with TensorFlow
[Part 7 Loading Image Data](../../Deep%20Learning/04%20Deep%20Learning%20with%20TensorFlow/Notebooks/Part_7_Loading_Image_Data_(Solution).ipynb)

### Data Augmentation
`tf.keras` offers many other transformations that we can apply to our images. You can take a look at all the available transformations in the [TensorFlow Documentation](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator#arguments)

* rotation_range
* width_shift_range
* height_shift_range
* shear_range
* zoom_range
* horizontal_flip
* fill_mode

### Creating a Validation Data Generator
Generally, we only apply data augmentation to our training data. Therefore, for the validation set we only need to normalize the pixel values of our images.

### Pre-Notebooks with GPU

### Transfer Learning

[Transfer Learning](https://github.com/stephengineer/Introduction-to-Machine-Learning-with-TensorFlow/blob/main/Deep%20Learning/04%20Deep%20Learning%20with%20TensorFlow/Transfer%20Learning.pdf)

[Part 8 Transfer Learning](../../Deep%20Learning/04%20Deep%20Learning%20with%20TensorFlow/Notebooks/Part_8_Transfer_Learning_(Solution).ipynb)

## Deep Learning with PyTorch

Calculate the output of single layer network using `torch.sum()` or `.sum()` and __matrix multiplication__.

### Watch those shapes
In general, you'll want to check that the tensors going through your model and other code are the correct shapes. Make use of the `.shape` attribute during debugging and development.

A few things to check if your network isn't training appropriately:
- Make sure you're clearing the gradients in the training loop with `optimizer.zero_grad()`.
- If you're doing a validation loop, be sure to set the network to evaluation mode with `model.eval()`, then back to training mode with `model.train()`.


### CUDA errors
Sometimes you'll see this error:

```
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #1 ‘mat1’
```

You'll notice the second type is `torch.cuda.FloatTensor`, which means it's a tensor that has been moved to the GPU. The operation is expecting a tensor with type `torch.FloatTensor` (no `.cuda` there), which means the tensor should be on the CPU. PyTorch can only perform operations on tensors that are on the same device, so either both CPU or both GPU. If you're trying to run your network on the GPU, check to make sure you've moved the model and all necessary tensors to the GPU with `.to(device)` where `device` is either `"cuda"` or `"cpu"`.


[Tutorial: Deep Learning in PyTorch](http://iamtrask.github.io/2017/01/15/pytorch-tutorial/)

[Notebooks](https://github.com/stephengineer/Machine-Learning/tree/main/Deep%20Learning/05%20Deep%20Learning%20with%20PyTorch)

## Sentiment Analysis

[Project folder](https://github.com/stephengineer/Deep-Learning/tree/main/2_Neural_Networks/Sentiment_Analysis)