## Sections
* Network Design Principles
* Kernel Methods
* Neural Networks
* Linear Algebra and Probability for Deep Learning
* Linear Classification / Linear Regression Models
* Computational Graphs
* Stochastic Gradient Descent (SGD)
* Activation Functions
* Convolutions
* Optimization & Normalization
* Residual Networks
* Training Optimization
* Reinforcement Learning

---

## Network Design Principles

**Activations**
- Always try ReLU first. 
- If you have persistent problems with dead ReLU activations, try Leaky ReLU or PReLU.
- Rarely use sigmoid or tanh in hidden layers: both saturate for large inputs, so gradients shrink toward zero during backpropagation.
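
To make the gradient argument concrete, here is a minimal pure-Python sketch of the activations named above and their derivatives. The function names and the `negative_slope=0.01` default are illustrative assumptions, not taken from these notes.

```python
import math

def relu(x: float) -> float:
    # Passes positive inputs through, zeroes out the rest.
    return max(0.0, x)

def relu_grad(x: float) -> float:
    # Gradient is exactly 0 for x <= 0, which is how a unit can "die".
    return 1.0 if x > 0 else 0.0

def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    # Keeps a small slope for negative inputs (slope value is an assumed default).
    return x if x > 0 else negative_slope * x

def leaky_relu_grad(x: float, negative_slope: float = 0.01) -> float:
    # Never exactly zero, so a unit can recover from negative pre-activations.
    return 1.0 if x > 0 else negative_slope

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    # s * (1 - s) peaks at 0.25 (x = 0) and decays quickly as |x| grows.
    s = sigmoid(x)
    return s * (1.0 - s)
```

With these definitions, `relu_grad(-2.0)` is zero while `leaky_relu_grad(-2.0)` is still `0.01`, and `sigmoid_grad` never exceeds 0.25, so stacking many sigmoid layers multiplies several small factors together.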

**Loss Functions**
- For continuous/regression output:
    - L1Loss / MSELoss
- For classification with 2 classes:
    - BCEWithLogitsLoss
- For more than 2 classes:
    - CrossEntropyLoss
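
The loss names above are PyTorch classes. As a rough illustration of what each one computes, the following scalar, pure-Python sketch may help. The function names are hypothetical, and the PyTorch versions operate on batched tensors rather than Python lists.

```python
import math

def l1_loss(pred, target):
    # Mean absolute error over a list of predictions.
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def mse_loss(pred, target):
    # Mean squared error over a list of predictions.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def bce_with_logits(logit, label):
    # Binary cross-entropy on a raw logit (no sigmoid applied beforehand).
    # Written in the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def cross_entropy(logits, target_index):
    # Multi-class cross-entropy on raw logits: log-sum-exp minus the target logit.
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]
```

Note that both classification losses take raw logits, so the network's final layer should not apply a sigmoid or softmax before the loss.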

**Optimizer**
- Generally use SGD with momentum.
    - A momentum parameter (rho) of 0.9 is a good default.
- For training small networks or working with small datasets, use Adam.
    - Good defaults for Adam are beta_1 = 0.9 and beta_2 = 0.999.
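
The two update rules can be sketched for a single scalar weight as follows. This is a minimal illustration, assuming the common formulation in which the velocity accumulates raw gradients and Adam applies bias correction. The function names, the `eps` value, and the toy objective `f(w) = w**2` are assumptions made for the example.

```python
import math

def sgd_momentum_step(w, velocity, grad, lr=0.01, rho=0.9):
    # Velocity is a decaying sum of past gradients; rho is the momentum parameter.
    velocity = rho * velocity + grad
    w = w - lr * velocity
    return w, velocity

def adam_step(w, m, v, grad, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # m and v are running averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for m and v starting at zero (t counts from 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

def minimize_quadratic(steps=300, lr=0.1, rho=0.9):
    # Toy problem: minimize f(w) = w**2, whose gradient is 2*w.
    w, velocity = 1.0, 0.0
    for _ in range(steps):
        w, velocity = sgd_momentum_step(w, velocity, 2.0 * w, lr=lr, rho=rho)
    return w
```

On the first Adam step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by roughly `lr` regardless of the gradient's scale.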

**Learning Rate**
- Try 1e-3 as the first default learning rate.
- Use a learning rate scheduler to reduce learning rate at loss plateaus.
- Rule of thumb: use the largest learning rate that still trains stably (the loss keeps decreasing and does not diverge).
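
A plateau-based scheduler can be sketched in a few lines. The class below is a hypothetical, simplified stand-in for the idea (PyTorch ships a fuller version as `torch.optim.lr_scheduler.ReduceLROnPlateau`). The `factor` and `patience` defaults are assumptions chosen for illustration.

```python
class PlateauScheduler:
    """Cuts the learning rate when the monitored loss stops improving."""

    def __init__(self, lr=1e-3, factor=0.1, patience=2):
        self.lr = lr
        self.factor = factor      # multiply lr by this on a plateau
        self.patience = patience  # non-improving epochs tolerated before a cut
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        # Call once per epoch with the validation (or training) loss.
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

For example, starting at `1e-3` with `patience=2`, the learning rate drops to `1e-4` on the third consecutive epoch without improvement.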

**Batch Size**
- Batch size / learning rate scaling rule: When the batch size is multiplied by K, multiply the learning rate by K.
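
As a worked instance of the scaling rule, the helper below (a hypothetical name, not from these notes) computes the new learning rate from a reference configuration.

```python
def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    # Linear scaling rule: if the batch grows by a factor K, so does the learning rate.
    k = new_batch_size / base_batch_size
    return base_lr * k
```

Going from a batch size of 256 at learning rate 0.1 to a batch size of 1024 gives K = 4 and a learning rate of 0.4. The rule is a heuristic, so very large K may still need a warmup period or manual tuning.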
 
**Overfitting**
- Methods for addressing overfitting:
    * Dropout layers
    * Weight Decay
        * When using weight decay, use 1e-4 as the default decay parameter.
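
Both techniques are small modifications to the forward pass and the update rule. The sketch below assumes "inverted" dropout (survivors are rescaled at training time) and weight decay implemented as an L2 term added to the gradient. The function names and the `rng` argument are illustrative assumptions.

```python
import random

def dropout(values, p=0.5, training=True, rng=None):
    # Inverted dropout: zero each value with probability p and scale the
    # survivors by 1 / (1 - p) so the expected activation is unchanged.
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)  # dropout is a no-op at evaluation time
    rng = rng or random.Random()
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def sgd_step_with_weight_decay(w, grad, lr=0.01, weight_decay=1e-4):
    # Weight decay adds weight_decay * w to the gradient, pulling w toward zero.
    return w - lr * (grad + weight_decay * w)
```

With a zero gradient, each step shrinks the weight by a factor of `1 - lr * weight_decay`, which is why the decay parameter is usually kept small.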

**Convolutional Neural Network Design Principles**
- Trade spatial resolution (large x and y dimensions) for many channels.
    - A good rule of thumb is to use stride=2 and output channels C2 = input channels C1 * 2.
    - This increases the number of channels with each convolution and shrinks the x and y dimensions.
- Keep kernels small.
    - A 3x3 kernel should be used almost everywhere, apart from the first layer where a 7x7 kernel may be used.
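
The resolution-for-channels trade can be checked with the standard convolution output-size formula, `floor((n + 2p - k) / s) + 1`. The sketch below assumes square inputs and `padding=1` for the 3x3 layers. The function names, the padding choice, and the starting shape are illustrative assumptions.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    # Spatial size after a convolution: floor((n + 2p - k) / s) + 1.
    return (size + 2 * padding - kernel) // stride + 1

def downsampling_stages(size, channels, num_stages):
    # Applies the rule of thumb repeatedly: 3x3 kernel, stride 2, channels doubled.
    shapes = [(channels, size)]
    for _ in range(num_stages):
        size = conv_output_size(size, kernel=3, stride=2, padding=1)
        channels *= 2
        shapes.append((channels, size))
    return shapes
```

Starting from 64 channels at 56x56, three stages give 128 channels at 28x28, 256 at 14x14, and 512 at 7x7. Each stage halves both spatial dimensions while doubling the channels, so the total number of activations halves per stage.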

---


![](./NoteFiles/DL100.png)
![](./NoteFiles/DL101.png)
![](./NoteFiles/DL102.png)
![](./NoteFiles/DL103.png)
![](./NoteFiles/DL104.png)
![](./NoteFiles/DL105.png)

---

![](./NoteFiles/DL106.png)
![](./NoteFiles/DL107.png)
![](./NoteFiles/DL108.png)

---

![](./NoteFiles/DL109.png)
![](./NoteFiles/DL110.png)
![](./NoteFiles/DL111.png)
![](./NoteFiles/DL112.png)
![](./NoteFiles/DL113.png)
![](./NoteFiles/DL114.png)
![](./NoteFiles/DL115.png)
![](./NoteFiles/DL116.png)
![](./NoteFiles/DL117.png)

---

![](./NoteFiles/DL118.png)
![](./NoteFiles/DL119.png)
![](./NoteFiles/DL120.png)
![](./NoteFiles/DL121.png)
![](./NoteFiles/DL122.png)
![](./NoteFiles/DL123.png)
![](./NoteFiles/DL124.png)
![](./NoteFiles/DL125.png)

---

![](./NoteFiles/DL126.png)
![](./NoteFiles/DL127.png)
![](./NoteFiles/DL128.png)
![](./NoteFiles/DL129.png)
![](./NoteFiles/DL130.png)
![](./NoteFiles/DL131.png)
![](./NoteFiles/DL132.png)
![](./NoteFiles/DL133.png)
![](./NoteFiles/DL134.png)
![](./NoteFiles/DL135.png)
![](./NoteFiles/DL136.png)
![](./NoteFiles/DL137.png)
![](./NoteFiles/DL138.png)
![](./NoteFiles/DL139.png)
![](./NoteFiles/DL140.png)
![](./NoteFiles/DL141.png)
![](./NoteFiles/DL142.png)

---

