# Deep Learning  
*Use TensorFlow to take machine learning to the next level*

# 1. Intro to DL for Computer Vision  
*A quick overview of how models work on images*

- Convolutions are the basic building block for deep learning models in computer vision and many other applications.  
- Convolutions are filters that are applied to images to obtain a desired result. They can act as horizontal-line detectors, vertical-line detectors, rounded-edge detectors, and so on.  
- An image is represented by a tensor (a generalized matrix, which is especially useful for color images). A convolution is also a tensor, so applying a convolution is tensor algebra.
- Specifically, the result of applying a convolution is obtained by a straightforward term-by-term multiplication followed by a summation.


- Convolutions don't have to be square tensors. You can have a $4 \times 7$ convolution without any problem, for instance.
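
The multiply-and-sum step described above can be sketched in plain NumPy. This is only an illustration under stated assumptions: the 4x4 image, both filters, and the helper name `apply_filter` are made up for the example, and the loop uses no padding and moves one pixel at a time.

```python
import numpy as np

def apply_filter(image, kernel):
    """Slide `kernel` over a 2D `image` (no padding, one pixel at a time)."""
    k_rows, k_cols = kernel.shape
    out_rows = image.shape[0] - k_rows + 1
    out_cols = image.shape[1] - k_cols + 1
    output = np.zeros((out_rows, out_cols))
    for r in range(out_rows):
        for c in range(out_cols):
            patch = image[r:r + k_rows, c:c + k_cols]
            # Term-by-term multiplication followed by a summation
            output[r, c] = np.sum(patch * kernel)
    return output

# Hypothetical 4x4 grayscale image: dark top half, bright bottom half
image = np.array([[0, 0, 0, 0],
                  [0, 0, 0, 0],
                  [1, 1, 1, 1],
                  [1, 1, 1, 1]], dtype=float)

# Hypothetical 2x2 filter that responds to "dark above, bright below"
horizontal_edge = np.array([[-1, -1],
                            [ 1,  1]], dtype=float)

# A non-square (1x3) filter is applied in exactly the same way
wide_filter = np.array([[1, 0, -1]], dtype=float)

edges = apply_filter(image, horizontal_edge)
print(edges)                                   # large values only on the row where the edge sits
print(apply_filter(image, wide_filter).shape)  # (4, 2)
```

The output is largest where the image patch matches the pattern encoded in the filter, which is why a filter can be read as a detector for that pattern.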

# 2. Building Models From Convolutions
*Scale up from simple building blocks to models with beyond-human capabilities*

- You can apply a series of filters (convolutions) in what is called a *layer*.
- The result of applying a layer is a group of images stacked into a 3D tensor. The third dimension, the so-called **channel dimension**, comes from the filters that form the layer: one channel per filter.
- You can keep applying more layers to extract more details from a picture.
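
As a rough sketch of where the channel dimension comes from, the NumPy snippet below applies three filters to one image and stacks the results. The random 6x6 image, the three 2x2 filters, and the helper name `apply_layer` are assumptions made for illustration; all filters are assumed to have the same shape.

```python
import numpy as np

def apply_layer(image, filters):
    """Apply every filter to a 2D image and stack the results along a channel axis."""
    k_rows, k_cols = filters[0].shape
    out_rows = image.shape[0] - k_rows + 1
    out_cols = image.shape[1] - k_cols + 1
    output = np.zeros((out_rows, out_cols, len(filters)))
    for channel, kernel in enumerate(filters):
        for r in range(out_rows):
            for c in range(out_cols):
                patch = image[r:r + k_rows, c:c + k_cols]
                output[r, c, channel] = np.sum(patch * kernel)
    return output

rng = np.random.default_rng(0)
image = rng.random((6, 6))              # hypothetical grayscale image

filters = [
    np.array([[-1., -1.], [1., 1.]]),   # horizontal edges
    np.array([[-1., 1.], [-1., 1.]]),   # vertical edges
    np.array([[1., 0.], [0., -1.]]),    # diagonal changes
]

feature_maps = apply_layer(image, filters)
print(feature_maps.shape)  # (5, 5, 3): rows, columns, one channel per filter
```

In a second layer, each filter would itself be a 3D tensor that spans all channels of this output, which is how later layers combine the simple patterns found by earlier ones.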


# 3. TensorFlow Programming  
*Start writing code using TensorFlow and Keras*

We used the ResNet model and the VGG16 model, with pre-trained weights.

# 4. Transfer Learning
*A powerful technique to build highly accurate models even with limited data*

- Transfer learning builds a new model that classifies data into new categories by reusing an already trained model.
- A model consists of several layers of filters plus a final layer that makes the prediction. With transfer learning, that prediction layer is dropped and a *new prediction layer* is trained.
- This new prediction layer is connected to every node of the last filter layer (whose output has to be a vector), which makes it a so-called *dense layer*.
- The prediction values are then transformed into probabilities with a *softmax* function.
- After the model is specified, it is compiled. The optimizer used in the example is `sgd` (*stochastic gradient descent*), the loss function is `categorical_crossentropy`, and the metric is `accuracy`.
 - **optimizer** determines how we find the numerical values that make up the model, so it can affect the resulting model and predictions.
 - **loss** determines what goal we optimize when finding the numerical values in the model, so it can affect the resulting model and predictions.
 - **metrics** determines only what we print out while the model is being built; it doesn't affect the model itself.

- After compilation comes the fitting of the model.
- Here you have to create a training generator and a validation generator from already classified images.
- Transfer learning is a very useful technique because it can reach high accuracy even with little new data.
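
The softmax and `categorical_crossentropy` steps mentioned above can be sketched in plain NumPy. This is a simplified illustration, not the Keras implementation: the raw scores and the one-hot label are made-up values for a two-class problem.

```python
import numpy as np

def softmax(scores):
    """Turn raw prediction scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)      # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

def categorical_crossentropy(true_one_hot, predicted_probs):
    """Loss is small when the probability assigned to the true class is high."""
    return -np.sum(true_one_hot * np.log(predicted_probs))

scores = np.array([2.0, 0.5])   # hypothetical outputs of the prediction layer
label = np.array([1.0, 0.0])    # hypothetical one-hot label: the true class is class 0

probs = softmax(scores)
loss = categorical_crossentropy(label, probs)

print(probs)  # two probabilities, the first one larger
print(loss)   # equals -log(probability of the true class)
```

The loss is what the optimizer tries to reduce, while accuracy (the fraction of images whose most probable class is the true one) is only reported.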

## Notes from exercises:

- Specifying and compiling the model when doing transfer learning is not computationally expensive, as no new data has been introduced to the model yet.
- In the fitting step, the new data is introduced and the process takes more time.
- When defining the *training generator*, keep in mind that the amount of training data, the batch size, and the number of steps per epoch have to match: one pass over the data takes about (number of images / batch size) steps.
- `steps_per_epoch` is a parameter of the `fit_generator` method of the model.
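
A small sketch of how the number of steps relates to the data size and the batch size. The image count and batch size below are hypothetical, and `steps_for_one_pass` is a helper invented for this example, not a Keras function.

```python
import math

def steps_for_one_pass(num_images, batch_size):
    """Number of batches needed to see every image once (last batch may be smaller)."""
    return math.ceil(num_images / batch_size)

num_train_images = 100   # hypothetical size of the training folder
batch_size = 12          # hypothetical batch size passed to flow_from_directory

steps = steps_for_one_pass(num_train_images, batch_size)
print(steps)  # 9: eight full batches of 12 plus one batch of 4
```

A value computed this way could be passed as `steps_per_epoch` so that each epoch corresponds to roughly one pass over the training images.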

Fitting stats are:  
`22/22 [==============================] - 24s 1s/step - loss: 0.4899 - accuracy: 0.7864 - val_loss: 0.3341 - val_accuracy: 0.8750`


---
# 5. Data Augmentation
*Learn a simple trick that effectively increases the amount of data available for model training*

The preprocessing class `ImageDataGenerator` accepts a few parameters that increase the number of images available for the training step. Some of these parameters are: `horizontal_flip` (*boolean*), which produces mirror images of the original data; and `width_shift_range` (and `height_shift_range`) (*float from 0 to 1*), which shifts the image by a fraction of its size and so also produces a slightly different image.

- When using data augmentation it is best to create two image generators, one with and one without augmentation. That way you can validate the model on the generator without augmentation, so validation scores reflect the original, unmodified images.
- Another parameter for data augmentation is `rotation_range`.
- Data augmentation changes the images without touching the labels, so you have to be careful to use only transformations that leave the correct label unchanged.
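
To make the effect of flipping and shifting concrete, here is a tiny NumPy sketch on a made-up 3x4 "image". It only approximates what `ImageDataGenerator` does: Keras applies these transformations randomly, and it fills the pixels vacated by a shift according to its `fill_mode` setting, whereas `np.roll` simply wraps them around.

```python
import numpy as np

# Hypothetical 3x4 single-channel image whose pixel values are just 0..11
image = np.arange(12).reshape(3, 4)

# Mirror image, similar in spirit to horizontal_flip=True
flipped = np.fliplr(image)

# Shift one pixel to the right, similar in spirit to width_shift_range
# (assumption for illustration: wrap-around instead of Keras' fill behaviour)
shifted = np.roll(image, shift=1, axis=1)

print(image[0])    # [0 1 2 3]
print(flipped[0])  # [3 2 1 0]
print(shifted[0])  # [3 0 1 2]
```

Both results have the same shape and the same label as the original, but different pixel values, which is why the model effectively sees more training examples.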

Here is some example code with added data augmentation parameters:

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, GlobalAveragePooling2D

num_classes = 2
resnet_weights_path = '../input/resnet50/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'

my_new_model = Sequential()
my_new_model.add(ResNet50(include_top=False, pooling='avg', weights=resnet_weights_path))
my_new_model.add(Dense(num_classes, activation='softmax'))

my_new_model.layers[0].trainable = False

my_new_model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])


from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

image_size = 224

# Specify the values for all arguments to data_generator_with_aug.
data_generator_with_aug = ImageDataGenerator(preprocessing_function=preprocess_input,
                                              horizontal_flip = True,
                                              width_shift_range = 0.1,
                                              height_shift_range = 0.1)
            
data_generator_no_aug = ImageDataGenerator(preprocessing_function=preprocess_input)


# Specify which type of ImageDataGenerator above is to load in training data
train_generator = data_generator_with_aug.flow_from_directory(
        directory = '../input/dogs-gone-sideways/images/train',
        target_size=(image_size, image_size),
        batch_size=12,
        class_mode='categorical')

# Specify which type of ImageDataGenerator above is to load in validation data
validation_generator = data_generator_no_aug.flow_from_directory(
        directory = '../input/dogs-gone-sideways/images/val',
        target_size=(image_size, image_size),
        class_mode='categorical')

my_new_model.fit_generator(
        train_generator,  # training data comes from the augmented generator
        epochs = 3,
        steps_per_epoch=19,
        validation_data=validation_generator)
```

### How could you test whether data augmentation improved your model accuracy?

Create `train_generator` using `data_generator_no_aug`, but don't change the other arguments to `train_generator`.

Run the model and see the resulting accuracy. Compare this to the accuracy you got when `train_generator` used augmentation.

Our validation dataset is very small, so there's a little bit of luck or randomness in the exact score from any model run. Validation scores will be more reliable as you start using larger datasets.
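
One rough way to see how much "luck" is involved is the binomial standard error of an accuracy estimate. The sketch below assumes validation images are independent, and the validation set sizes are hypothetical; only the 0.875 accuracy is taken from the fitting output shown earlier.

```python
import math

def accuracy_standard_error(accuracy, num_images):
    """Approximate standard error of an accuracy measured on `num_images` images."""
    return math.sqrt(accuracy * (1 - accuracy) / num_images)

observed_accuracy = 0.875            # val_accuracy from the fitting output above
for num_images in (16, 160, 1600):   # hypothetical validation set sizes
    error = accuracy_standard_error(observed_accuracy, num_images)
    print(f"{num_images:5d} images: accuracy {observed_accuracy:.3f} +/- {error:.3f}")
```

With only a handful of validation images the uncertainty can be larger than the difference between two model variants, so a small improvement from augmentation may not be distinguishable from noise.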

# 6. A Deeper Understanding of Deep Learning  
*How Stochastic Gradient Descent and Back-Propagation train your deep learning model*

- *Dense and Convolutional Layers*
- The better the weights, the better the predictions
- Loss, gradient descent, and back-propagation are the steps used to get better weights.
- You want to minimize the loss function, that is, get predictions closer to the target.
- Back-propagation calculates how much each weight has to be changed.
- The learning rate is a scaling factor applied to those weight changes.
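
The loop of loss, gradient, and scaled weight update can be sketched for a model with a single weight. This is a deliberately simplified assumption: the data are made up, the gradient is computed on all points at once (plain gradient descent rather than the stochastic, batch-by-batch variant), and with one weight back-propagation reduces to a single derivative.

```python
import numpy as np

# Hypothetical data generated by the rule y = 3 * x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

weight = 0.0          # start from a poor weight
learning_rate = 0.01  # scaling factor applied to each weight change
losses = []

for step in range(200):
    predictions = weight * x
    error = predictions - y
    loss = np.mean(error ** 2)           # mean squared error
    losses.append(loss)
    gradient = np.mean(2 * error * x)    # derivative of the loss with respect to the weight
    weight -= learning_rate * gradient   # move against the gradient

print(weight)                # close to 3.0
print(losses[0], losses[-1]) # the loss shrinks as the weight improves
```

A larger learning rate takes bigger steps and can overshoot, while a smaller one converges more slowly.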

# 7. Deep Learning From Scratch  
*Build models without transfer learning. Especially important for uncommon image types.*

- You have to define the hidden layers as well as the prediction layer.
- The most used activation function for the hidden layers is [ReLU](https://www.kaggle.com/dansbecker/rectified-linear-units-relu-in-deep-learning).
- The output of the hidden layers has to be one-dimensional, so we flatten it with **Flatten()** before the prediction layer.
- These models usually perform better when you add a Dense layer before the prediction layer.
- You fit the model using your data, and you can define the fraction of the data set aside as validation data.
- When you have all your data stored in arrays, you can fit the model with the `fit()` function directly, instead of the `fit_generator()` function used when the data comes from the `ImageDataGenerator` class.
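
What ReLU and flattening do to a tensor can be shown with a few lines of NumPy. The 2x2x2 tensor of feature-map values is invented for the example, and `relu` here is a hand-written helper rather than the Keras activation.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative values become 0, positive values pass through."""
    return np.maximum(0, x)

# Hypothetical output of a convolutional layer: 2 rows, 2 columns, 2 channels
feature_maps = np.array([[[-1.0, 2.0], [0.5, -3.0]],
                         [[4.0, -0.5], [-2.0, 1.0]]])

activated = relu(feature_maps)
flattened = activated.reshape(-1)   # what a Flatten layer does for a single image

print(activated.shape)  # (2, 2, 2)
print(flattened.shape)  # (8,)
print(flattened)
```

After flattening, every value is an input to the Dense layer, which is why the tensor has to become one-dimensional first.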


Sample code to create a model from scratch:
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
# Import from the public tensorflow.keras API (tensorflow.python is an internal package)
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, Dropout


img_rows, img_cols = 28, 28
num_classes = 10

def data_prep(raw):
    out_y = keras.utils.to_categorical(raw.label, num_classes)

    num_images = raw.shape[0]
    x_as_array = raw.values[:,1:]
    x_shaped_array = x_as_array.reshape(num_images, img_rows, img_cols, 1)
    out_x = x_shaped_array / 255
    return out_x, out_y

train_file = "../input/digit-recognizer/train.csv"
raw_data = pd.read_csv(train_file)

x, y = data_prep(raw_data)

model = Sequential()
model.add(Conv2D(20, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(img_rows, img_cols, 1)))
model.add(Conv2D(20, kernel_size=(3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer='adam',
              metrics=['accuracy'])
model.fit(x, y,
          batch_size=128,
          epochs=2,
          validation_split = 0.2)
```

# 8. Dropout and Strides for Larger Models  
*Make your models faster and reduce overfitting*

- Stride lengths to make your model faster and reduce memory consumption
- Dropout to combat overfitting

Both of these techniques are especially useful in large models.

- Stride is the size of the *steps* the convolution takes as it moves across the data. If set to 2, the filter moves 2 pixels at a time instead of 1.
- This makes the output smaller: about half the width and half the height, so roughly a quarter of the values.
- As the representation passed to the next layer is smaller, the whole model is much faster.
- Another technique used to reduce output size is *max pooling*, but changing the stride length is more effective.
- Dropout consists of ignoring randomly selected nodes or convolutions during parts of training.
- It prevents any single node from dominating throughout the whole training.
- You set the fraction of dropped convolutions as a float between 0 and 1.
- This method is very effective at preventing overfitting.
- Before dropout was used, people reduced the number of layers or convolutions to fight overfitting.
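
The effect of the stride on output size can be worked out by hand. The sketch below assumes no padding (the Keras `Conv2D` default, `padding='valid'`), a 28x28 input, and two 3x3 convolutions in a row; `conv_output_size` and `size_after_layers` are helper names made up for this example.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Output length along one axis for a convolution without padding."""
    return (input_size - kernel_size) // stride + 1

def size_after_layers(input_size, kernel_size, stride, num_layers):
    """Apply the same convolution settings several times in a row."""
    size = input_size
    for _ in range(num_layers):
        size = conv_output_size(size, kernel_size, stride)
    return size

stride_1 = size_after_layers(28, kernel_size=3, stride=1, num_layers=2)
stride_2 = size_after_layers(28, kernel_size=3, stride=2, num_layers=2)

print(stride_1, stride_1 ** 2)  # 24 576  -> 24x24 values per channel
print(stride_2, stride_2 ** 2)  # 6 36    -> 6x6 values per channel
```

With a stride of 2 in both layers, each channel carries 36 values instead of 576 into the flatten step, which is where the savings in speed and memory come from.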

Sample code using these techniques:
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
# Import from the public tensorflow.keras API (tensorflow.python is an internal package)
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, Dropout

img_rows, img_cols = 28, 28
num_classes = 10

def data_prep(raw):
    out_y = keras.utils.to_categorical(raw.label, num_classes)

    num_images = raw.shape[0]
    x_as_array = raw.values[:,1:]
    x_shaped_array = x_as_array.reshape(num_images, img_rows, img_cols, 1)
    out_x = x_shaped_array / 255
    return out_x, out_y

train_size = 30000
train_file = "../input/digit-recognizer/train.csv"
raw_data = pd.read_csv(train_file)

x, y = data_prep(raw_data)

model = Sequential()
model.add(Conv2D(30, kernel_size=(3, 3),
                 strides=2,
                 activation='relu',
                 input_shape=(img_rows, img_cols, 1)))
model.add(Dropout(0.5))
model.add(Conv2D(30, kernel_size=(3, 3), strides=2, activation='relu'))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer='adam',
              metrics=['accuracy'])
model.fit(x, y,
          batch_size=128,
          epochs=2,
          validation_split = 0.2)
```
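
What a dropout step does to a batch of activations during training can be sketched in NumPy. This is an illustrative version under stated assumptions, not the Keras source: the helper `dropout` and the all-ones activations are made up, and surviving values are scaled by `1 / (1 - rate)` so the expected sum stays the same, which to my understanding is also how the Keras `Dropout` layer behaves while training.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero out a random fraction `rate` of the values and rescale the survivors."""
    keep_mask = rng.random(activations.shape) >= rate   # True where the value is kept
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(seed=0)
activations = np.ones((4, 5))     # hypothetical layer output

dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)  # every entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

A different random mask is drawn at every training step, and at prediction time no values are dropped at all.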
