# Deep learning in computer vision
## 1. Image and Classification Fundamentals

The deep learning classification pipeline has four steps:
- Gathering our dataset.
- Splitting our data into training, validation, and testing sets.
- Training our network.
- Evaluating our model.
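
As a rough illustration of the splitting step, the sketch below shuffles a dataset and cuts it into three parts. The helper name `split_dataset`, the 70/15/15 proportions, and the toy `(filename, label)` pairs are all assumptions made for this example, not something prescribed by the notes.

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle samples and split them into (train, val, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test

# Hypothetical dataset: 100 (image filename, class label) pairs.
dataset = [(f"img_{i:03d}.png", i % 2) for i in range(100)]
train, val, test = split_dataset(dataset)
print(len(train), len(val), len(test))
```

The validation set is used to tune hyperparameters during training, while the test set is held back for the final evaluation step.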

## 2. Parameterized Learning

Components of parameterized learning:
1. Data: In the context of image classification, our input data is our dataset of images.
2. Scoring function: The scoring function produces predictions for a given input image.
3. Loss function: The loss function quantifies how well or badly a set of predictions agrees with the ground-truth labels over the dataset.
4. Weights and biases: The weight matrix (W) and bias vector (b) are what enable us to actually “learn” from the input data. These parameters are tuned via optimization methods in an attempt to obtain higher classification accuracy.
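
To make the components concrete, here is a minimal sketch that assumes a linear scoring function of the form `W x + b` (a common choice, but an assumption here). The function name `score`, the 4-pixel "image", and all weight values are made up for illustration.

```python
def score(x, W, b):
    """Return one score per class: s = W x + b (linear scoring function)."""
    return [sum(w_k * x_k for w_k, x_k in zip(row, x)) + b_j
            for row, b_j in zip(W, b)]

# Hypothetical flattened "image" with 4 pixel values.
x = [1.0, 2.0, 3.0, 4.0]
# One row of weights per class (3 classes), plus one bias per class.
W = [[0.1, 0.0, -0.1, 0.2],
     [0.0, 0.5, 0.0, -0.5],
     [-0.2, 0.1, 0.3, 0.0]]
b = [0.5, 0.0, -0.5]

scores = score(x, W, b)
# The predicted class is the index of the largest score.
predicted_class = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted_class)
```

Learning then amounts to adjusting `W` and `b` so that the correct class receives the highest score for as many training images as possible.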

Hinge loss and cross-entropy loss:
- Hinge loss: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
- Cross-entropy loss: $L_i = -\log(e^{s_{y_i}} / \sum_{j} e^{s_j})$

where $s_j$ is the predicted score of the $j$-th class for the $i$-th data point and $y_i$ is the index of its true class: $s_j = f(x_i, W)_j$
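
The two losses can be sketched for a single data point as follows. The function names and the example score vector are illustrative assumptions. The cross-entropy version subtracts the maximum score before exponentiating, a standard trick for numerical stability that does not change the result.

```python
import math

def hinge_loss(scores, y):
    """Multi-class hinge loss for one data point with true class index y."""
    return sum(max(0.0, s_j - scores[y] + 1.0)
               for j, s_j in enumerate(scores) if j != y)

def cross_entropy_loss(scores, y):
    """Negative log of the softmax probability assigned to the true class."""
    m = max(scores)  # shift scores so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[y]

scores = [3.0, 1.0, 2.5]  # hypothetical class scores for one image
print(hinge_loss(scores, 0), cross_entropy_loss(scores, 0))
print(hinge_loss(scores, 1), cross_entropy_loss(scores, 1))
```

Both losses are small when the true class has the highest score (here class 0) and grow when a wrong class is scored higher than the true one.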
## 3. Optimization Methods

Optimization is arguably the most important aspect of machine learning, neural networks, and deep learning.

### Gradient descent
Gradient descent algorithms are controlled via a learning rate. There are two types of gradient descent:
1. The standard "vanilla" flavor: vanilla gradient descent performs only one weight update per epoch, making it very slow (if not impossible) to converge on large datasets.
2. The stochastic version (SGD), which is more commonly used: it applies multiple weight updates per epoch by computing the gradient on small mini-batches.

By using SGD we can dramatically reduce the time it takes to train a model, and in practice we often reach lower loss and higher accuracy as well.

Pseudocode for gradient descent (standard vanilla version):

In [None]:
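while True:  # in practice, loop until a stopping condition is met
    # gradient of the loss over the entire training set
    W_gradient = evaluate_gradient(loss, data, W)
    # step against the gradient, scaled by the learning rate alpha
    W += -alpha * W_gradient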


1. We loop until some condition is met. Typical conditions are:
    + A specified number of epochs has passed.
    + Our loss has become sufficiently low or our training accuracy satisfactorily high.
    + Our loss has not improved in M subsequent epochs.
2. We then call a function named evaluate_gradient. This function requires three parameters:
    1. loss: A function used to compute the loss over our current W and input data.
    2. data: Our training data, where each training sample is represented by an image.
    3. W: Our actual weight matrix that we are optimizing over. Our goal is to apply gradient descent to find a W that yields minimal loss.
3. We then apply gradient descent: we multiply W_gradient by alpha ($\alpha$), our learning rate, and step in the negative gradient direction. The learning rate controls the size of our step.
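
A runnable sketch of the loop described above, under several assumptions: the model is a one-dimensional line fit rather than an image classifier, W is a flat list `[w, b]` rather than a matrix, `evaluate_gradient` estimates the gradient numerically by central differences, and the loop stops on a maximum epoch count or a loss threshold. The names `mse_loss` and `gradient_descent` are hypothetical.

```python
def mse_loss(data, W):
    """Mean squared error of the line y = w*x + b over all (x, y) pairs."""
    w, b = W
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def evaluate_gradient(loss, data, W, eps=1e-6):
    """Central-difference estimate of d(loss)/dW, one entry per weight."""
    grad = []
    for k in range(len(W)):
        W_plus = W[:k] + [W[k] + eps] + W[k + 1:]
        W_minus = W[:k] + [W[k] - eps] + W[k + 1:]
        grad.append((loss(data, W_plus) - loss(data, W_minus)) / (2 * eps))
    return grad

def gradient_descent(loss, data, W, alpha=0.05, max_epochs=2000, tol=1e-10):
    for epoch in range(max_epochs):
        # One update per epoch: the gradient uses the entire dataset.
        W_gradient = evaluate_gradient(loss, data, W)
        W = [w_k - alpha * g_k for w_k, g_k in zip(W, W_gradient)]
        if loss(data, W) < tol:  # stop once the loss is sufficiently low
            break
    return W, epoch + 1

# Toy data generated from y = 2x + 1, so the ideal weights are [2, 1].
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
W_fit, epochs_run = gradient_descent(mse_loss, data, [0.0, 0.0])
print(W_fit, epochs_run)
```

A learning rate that is too large makes the updates overshoot and diverge, while one that is too small makes convergence needlessly slow.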

Pseudocode for gradient descent (SGD version):

In [None]:
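while True:  # in practice, loop until a stopping condition is met
    # sample a mini-batch of 256 training examples
    batch = next_training_batch(data, 256)
    # gradient of the loss over the mini-batch only
    W_gradient = evaluate_gradient(loss, batch, W)
    W += -alpha * W_gradient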

The only difference between vanilla gradient descent and SGD is the addition of
the next_training_batch function.

Instead of computing our gradient over the entire dataset, we sample our data,
yielding a batch. We evaluate the gradient on the batch and update our weight matrix W.
We also shuffle our training samples before applying SGD, since the algorithm is
sensitive to the order and composition of the batches.

Typical batch sizes include 32, 64, 128, and 256.
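
A minimal mini-batch SGD sketch, again on a toy line-fitting problem rather than images. Here `next_training_batch` is written as a generator that yields every batch of one epoch, which differs slightly from the single-batch call in the pseudocode. The gradient is estimated numerically, and the names `mse_loss` and `sgd` plus all hyperparameter values are assumptions for this example.

```python
import random

def next_training_batch(data, batch_size):
    """Yield successive mini-batches of `data` (shuffled by the caller)."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

def mse_loss(data, W):
    """Mean squared error of the line y = w*x + b over the given samples."""
    w, b = W
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def evaluate_gradient(loss, batch, W, eps=1e-6):
    """Central-difference estimate of d(loss)/dW on one mini-batch."""
    grad = []
    for k in range(len(W)):
        W_plus = W[:k] + [W[k] + eps] + W[k + 1:]
        W_minus = W[:k] + [W[k] - eps] + W[k + 1:]
        grad.append((loss(batch, W_plus) - loss(batch, W_minus)) / (2 * eps))
    return grad

def sgd(loss, data, W, alpha=0.1, epochs=50, batch_size=32, seed=0):
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # new sample order every epoch
        for batch in next_training_batch(data, batch_size):
            # Several updates per epoch: one per mini-batch.
            W_gradient = evaluate_gradient(loss, batch, W)
            W = [w_k - alpha * g_k for w_k, g_k in zip(W, W_gradient)]
    return W

# Toy data from y = 2x + 1 with 256 samples, so 8 updates per epoch.
data_rng = random.Random(1)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(256)]
data = [(x, 2.0 * x + 1.0) for x in xs]
W_fit = sgd(mse_loss, data, [0.0, 0.0])
print(W_fit)
```

With 256 samples and a batch size of 32, each epoch performs eight weight updates instead of one, which is where the speed-up over vanilla gradient descent comes from.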


## 4. Regularization
Regularization helps us control our model capacity, ensuring that our models are better at
making (correct) classifications on data points that they were not trained on, which we call the
ability to generalize.

Three common types of regularization are applied directly to the loss function:
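- L2 regularization (“weight decay”), which penalizes the squared weights: $R(W) = \sum_i \sum_j W_{i,j}^2$

- L1 regularization, which takes the absolute value rather than the square: $R(W) = \sum_i \sum_j |W_{i,j}|$

- Elastic Net regularization, which seeks to combine both L1 and L2 regularization: $R(W) = \sum_i \sum_j \beta W_{i,j}^2 + |W_{i,j}|$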

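In deep learning and neural networks, L2 regularization is commonly used for image classification.
The trick is tuning the regularization strength (called alpha here; often written $\lambda$, and not to be confused with the learning rate above) to include just the right amount of regularization.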
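
The sketch below shows one way the three penalties could be computed and added to a data loss. The function names, the `lam` strength parameter, the `beta` mixing weight in the Elastic Net form, and the example weight matrix are assumptions for illustration. Other formulations of Elastic Net exist.

```python
def l2_penalty(W):
    """Sum of squared weights ("weight decay")."""
    return sum(w * w for row in W for w in row)

def l1_penalty(W):
    """Sum of absolute weights."""
    return sum(abs(w) for row in W for w in row)

def elastic_net_penalty(W, beta=0.5):
    """One common combination of both: beta * w^2 + |w| per weight."""
    return sum(beta * w * w + abs(w) for row in W for w in row)

def regularized_loss(data_loss, W, lam=0.01, penalty=l2_penalty):
    """Total loss = data loss + lam * penalty on the weights."""
    return data_loss + lam * penalty(W)

W = [[0.5, -1.0], [2.0, 0.0]]  # hypothetical weight matrix
print(l2_penalty(W), l1_penalty(W), elastic_net_penalty(W))
print(regularized_loss(1.0, W, lam=0.1))
```

A larger `lam` pushes the optimizer toward smaller weights at the cost of a higher data loss. Setting `lam` to zero recovers the unregularized loss.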

## 5. Neural Network (artificial)
Implementation with Keras: link to [neutral_net](neutral_net.ipynb)
### Perceptron architecture:
![Perceptron](images/Selection_003.png)
### Perceptron Training Procedure 
1. Initialize our weight vector w with small random values.
2. Until the Perceptron converges:
    - Loop over each feature vector $x_j$ and true class label $d_j$ in our training set D.
    - Take $x_j$ and pass it through the network, calculating the output value: $y_j = f(w(t) \cdot x_j)$
    - Update the weights w: $w_i(t+1) = w_i(t) + \alpha(d_j - y_j)x_{j,i}$ for all features $0 \le i \le n$
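
The procedure above can be sketched in plain Python as follows. This example assumes a step activation, folds the bias into the weight vector by appending a constant 1 to every input, and trains on the logical OR function, which is linearly separable. The function names and hyperparameters are illustrative choices.

```python
import random

def step(z):
    """Step activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

def train_perceptron(D, alpha=0.1, max_epochs=100, seed=0):
    rng = random.Random(seed)
    n = len(D[0][0])
    # n feature weights plus one bias weight, initialized to small values
    w = [rng.uniform(-0.01, 0.01) for _ in range(n + 1)]
    for _ in range(max_epochs):
        mistakes = 0
        for x, d in D:
            x_b = list(x) + [1.0]  # constant input for the bias weight
            y = step(sum(w_i * x_i for w_i, x_i in zip(w, x_b)))
            # Perceptron rule: no change when the prediction is correct.
            w = [w_i + alpha * (d - y) * x_i for w_i, x_i in zip(w, x_b)]
            mistakes += int(y != d)
        if mistakes == 0:  # converged: a full pass with no errors
            break
    return w

def predict(w, x):
    return step(sum(w_i * x_i for w_i, x_i in zip(w, list(x) + [1.0])))

# Training set D for logical OR: (feature vector, true label) pairs.
D = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = train_perceptron(D)
print([predict(w, x) for x, _ in D])
```

On data that is not linearly separable (XOR, for example) this loop never reaches zero mistakes and simply stops at `max_epochs`, which is one motivation for multi-layer networks.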

### Multi-layer Networks:
Backpropagation is arguably the most important algorithm in neural networks: it can be considered
the cornerstone of modern neural networks and deep learning. It consists of two phases:
1. The forward pass, where our inputs are passed through the network and output predictions are obtained.
2. The backward pass, where we compute the gradient of the loss function at the final layer (i.e., the
predictions layer) of the network and use this gradient to recursively apply the chain rule to
update the weights in our network.

Backpropagation lets us efficiently train neural networks and “teach” them to learn from their mistakes.
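
To illustrate the two passes, here is a small sketch of one forward pass, one backward pass, and one weight update. It assumes a tiny network with two inputs, two sigmoid hidden units, one sigmoid output, no bias terms, and a squared-error loss. The architecture, weight values, and learning rate are all made up for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, W2):
    """Forward pass: input -> sigmoid hidden layer -> sigmoid output."""
    h = [sigmoid(sum(w * x_i for w, x_i in zip(row, x))) for row in W1]
    y = sigmoid(sum(w * h_i for w, h_i in zip(W2, h)))
    return h, y

def loss(y, t):
    """Squared error between prediction y and target t."""
    return 0.5 * (y - t) ** 2

def backward(x, t, h, y, W2):
    """Backward pass: apply the chain rule from the output to each weight."""
    # dL/dz at the output: (y - t) times the sigmoid derivative y * (1 - y)
    delta_out = (y - t) * y * (1.0 - y)
    grad_W2 = [delta_out * h_i for h_i in h]
    # Propagate the error back through W2 and the hidden sigmoids.
    delta_hidden = [delta_out * w * h_i * (1.0 - h_i)
                    for w, h_i in zip(W2, h)]
    grad_W1 = [[d * x_i for x_i in x] for d in delta_hidden]
    return grad_W1, grad_W2

x, t = [1.0, 0.5], 1.0           # one training sample and its target
W1 = [[0.5, -0.3], [0.2, 0.8]]   # hidden-layer weights (one row per unit)
W2 = [0.4, -0.6]                 # output-layer weights
alpha = 0.5                      # learning rate

h, y = forward(x, W1, W2)
loss_before = loss(y, t)
grad_W1, grad_W2 = backward(x, t, h, y, W2)

# Gradient descent step using the gradients from the backward pass.
W1_new = [[w - alpha * g for w, g in zip(row, g_row)]
          for row, g_row in zip(W1, grad_W1)]
W2_new = [w - alpha * g for w, g in zip(W2, grad_W2)]
_, y_new = forward(x, W1_new, W2_new)
loss_after = loss(y_new, t)
print(loss_before, loss_after)
```

The key point is that `delta_hidden` reuses `delta_out` instead of recomputing the derivative from scratch for every weight, which is what makes backpropagation efficient.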


### Neural Network Recipe:
- Dataset
- Loss function: categorical cross-entropy for more than two classes, binary cross-entropy for two classes
- Model/Architecture, chosen based on:
    1. How many data points you have.
    2. The number of classes.
    3. How similar/dissimilar the classes are.
    4. The intra-class variance.
- Optimization method: SGD (Stochastic Gradient Descent)


## 6. Convolutional Neural Networks (CNNs)
### Convolutions: link to [convolutions](convolutions.ipynb)
### CNN Building Blocks: link to [cnn_building_block](cnn_building_block.ipynb)


## 7. Learning Rate Schedulers
## 8. Underfitting and Overfitting
## 9. Checkpointing Models
## 10. Architecture Visualization
## 11. The (Mini) VGGNet Architecture
