# Chapter 14

In [None]:
from functools import partial

import tensorflow as tf
from tensorflow import keras

1. A CNN uses far fewer parameters than a fully connected network, which makes it more computationally efficient for training and inference. For example, for a typical iPhone picture of shape (4032, 3024, 3), a single fully connected layer of 100 neurons (probably not enough) would have $$(4032 \times 3024 \times 3\ + 1)\times 100 = 3,657,830,500$$ trainable parameters. In addition, because a single filter uses the same weights across the entire image, a pattern learned in one location can be detected anywhere else in the image.
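To make the comparison concrete, here is a minimal sketch of the arithmetic. The convolutional layer it compares against (100 filters with 3 × 3 kernels) is an assumption chosen for illustration, and the helper names are hypothetical.

```python
def dense_params(n_inputs, n_neurons):
    """Weights plus one bias per neuron in a fully connected layer."""
    return (n_inputs + 1) * n_neurons


def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus one bias per filter in a 2D convolutional layer."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters


height, width, channels = 4032, 3024, 3

# Fully connected layer of 100 neurons on the flattened image.
dense_total = dense_params(height * width * channels, 100)

# Assumed comparison: a convolutional layer with 100 filters of size 3 x 3.
conv_total = conv2d_params(3, 3, channels, 100)

print(f"dense: {dense_total:,} parameters")
print(f"conv:  {conv_total:,} parameters")
```

The convolutional layer's parameter count does not depend on the image size at all, only on the kernel size, the number of input channels, and the number of filters.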

2.  Consider a CNN composed of three convolutional layers, each with 3 × 3 kernels, a stride of 2, and "same" padding. The lowest layer outputs 100 feature maps, the middle one outputs 200, and the top one outputs 400. The input images are RGB images of 200 × 300 pixels: 

    a. What is the total number of parameters in the CNN?

    - Each filter has $3 \times 3 \times \#channels + 1$ parameters: one weight per kernel position and input channel, plus one bias term.
    
    - This is 28 for the first layer, 901 for the second, and 1,801 for the third.

    - Since there are 100 filters in the first layer, 200 in the second, and 400 in the third, we have a total of $$28 \times 100 + 901 \times 200 + 1,801 \times 400 = 903,400 \ \text{parameters.}$$

    b. If we are using 32-bit floats, at least how much RAM will this network require when making a prediction for a single instance?

    - During inference the RAM occupied by one layer can be released as soon as the next layer has been computed.
    
    - Therefore, the total RAM needed to make predictions is as much RAM as required by two consecutive layers.

    - Because padding is set to "same" and the stride is 2, the output of each layer will be half the size of its input (rounding up): $$(200, \ 300, \ 3) \rightarrow (100, \ 150, \ 100) \rightarrow (50, \ 75, \ 200) \rightarrow (25, \ 38, \ 400)$$

    - The first two layers have the largest outputs so the network will need at least enough RAM to hold the outputs of these layers, i.e. $$(100 \times 150 \times 100 + 50 \times 75 \times 200) \times 32 = 72,000,000 \ \text{bits},$$ or 9 MB.

    - The model itself also needs to be kept in RAM, so that's another $903,400 \times 32 = 28,908,800 \ \text{bits},$ or 3.61 MB.

    - This gives a total of 12.61 MB.

    c. What about when training on a mini-batch of 50 images?

    - The reverse pass of backpropagation requires all the intermediate values computed during the forward pass.
    
    - So the total RAM needed is at least the total amount of RAM required by all layers: $$50 \times 32 \times (100 \times 150 \times 100 + 50 \times 75 \times 200 + 25 \times 38 \times 400) = 4,208,000,000 \ \text{bits},$$ or 526 MB.

    - Plus the RAM needed for the model: 3.61 MB.

    - Plus the RAM needed for the input images: $50 \times 32 \times 200 \times 300 \times 3 = 288,000,000 \ \text{bits}$, or 36 MB.

    - This totals 565.61 MB (not including memory required to store the gradients).
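The arithmetic in parts a to c can be reproduced with a short script. This is a sketch under the same assumptions as the answers above (32-bit floats, 1 MB = 10^6 bytes, gradients and framework overhead ignored); the helper names are hypothetical.

```python
import math

BYTES_PER_FLOAT = 4  # 32-bit floats
MB = 1e6             # decimal megabytes, as in the answers above


def conv_params(in_channels, filters, kernel=3):
    """Parameters of one conv layer: weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def same_padding_output(size, stride=2):
    """Output size along one axis with "same" padding."""
    return math.ceil(size / stride)


height, width, channels = 200, 300, 3
input_floats = height * width * channels

params, shapes = [], []
for filters in (100, 200, 400):
    params.append(conv_params(channels, filters))
    height = same_padding_output(height)
    width = same_padding_output(width)
    channels = filters
    shapes.append((height, width, channels))

total_params = sum(params)
layer_floats = [h * w * c for (h, w, c) in shapes]
model_mb = total_params * BYTES_PER_FLOAT / MB

# Inference: only the two largest consecutive layer outputs coexist.
largest_pair = max(a + b for a, b in zip(layer_floats, layer_floats[1:]))
inference_mb = largest_pair * BYTES_PER_FLOAT / MB + model_mb

# Training: every layer output is kept for all 50 instances, plus the inputs.
batch_size = 50
training_mb = (
    batch_size * (sum(layer_floats) + input_floats) * BYTES_PER_FLOAT / MB
    + model_mb
)

print(shapes)
print(f"parameters: {total_params:,}")
print(f"inference:  {inference_mb:.2f} MB")
print(f"training:   {training_mb:.2f} MB")
```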

In [None]:
Conv = partial(keras.layers.Conv2D, kernel_size=(3, 3), padding="same", strides=2)

model = keras.Sequential(
    [
        keras.layers.Input(shape=(200, 300, 3)),
        Conv(filters=100),
        Conv(filters=200),
        Conv(filters=400),
    ]
)

model.summary()

3. If your GPU runs out of memory during training, you can try to:

    - reduce the mini-batch size,

    - use 16-bit floats instead of 32-bit,

    - reduce the dimensionality by increasing the stride,

    - reduce the number of layers, or

    - distribute the CNN across multiple devices.
 
4. **Max pooling layer vs. convolutional layer with stride:** both methods shrink the image, reducing the computational and memory load (and also the number of parameters in subsequent layers, which limits overfitting). A max pooling layer, however, has no parameters of its own and introduces some invariance to small translations (and, to a lesser extent, to small rotations and changes of scale), which is useful for tasks like classification.
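The translation invariance of max pooling can be illustrated with a minimal one-dimensional sketch. The 1D setting, the pool size of 2, and the toy signal are assumptions made to keep the example short; the same idea applies to 2D feature maps.

```python
def max_pool_1d(values, pool_size=2, stride=2):
    """Max pooling over a 1D list, without padding."""
    last_start = len(values) - pool_size
    return [max(values[i:i + pool_size]) for i in range(0, last_start + 1, stride)]


signal = [9, 0, 0, 0, 3, 0]
shifted_once = [0] + signal[:-1]      # shifted right by one pixel
shifted_twice = [0, 0] + signal[:-2]  # shifted right by two pixels

pooled = max_pool_1d(signal)
pooled_once = max_pool_1d(shifted_once)
pooled_twice = max_pool_1d(shifted_twice)

print(pooled, pooled_once, pooled_twice)
```

In this toy signal the one-pixel shift leaves the pooled output unchanged, while the two-pixel shift does not. The invariance is limited to small shifts, and even then it depends on how the pattern lines up with the pooling windows.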
 
5. **Local response normalization:** a competitive normalization layer that can be added after convolutional layers, typically the lower ones, to improve generalization. The most strongly activated neurons inhibit other neurons located at the same position in neighboring feature maps. This encourages different feature maps to specialize, pushing them apart and forcing them to explore a wider range of features.
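The inhibition effect can be sketched in plain Python for a single spatial position. The function below computes $b_i = a_i \, (k + \alpha \sum_j a_j^2)^{-\beta}$, where the sum runs over the feature maps within `depth_radius` of map $i$. The parameter names follow those of `tf.nn.local_response_normalization`, but the values used here are illustrative assumptions, not tuned hyperparameters.

```python
import math


def local_response_normalization(activations, depth_radius=1, bias=1.0,
                                 alpha=1.0, beta=1.0):
    """Normalize activations at one position across neighboring feature maps.

    `activations[i]` is the activation of feature map i at that position.
    """
    n_maps = len(activations)
    normalized = []
    for i, a_i in enumerate(activations):
        low = max(0, i - depth_radius)
        high = min(n_maps - 1, i + depth_radius)
        sum_of_squares = sum(a * a for a in activations[low:high + 1])
        normalized.append(a_i / (bias + alpha * sum_of_squares) ** beta)
    return normalized


# Feature map 2 is strongly activated; its neighbors (1 and 3) get inhibited
# more than the maps at the edges (0 and 4), which are out of its reach.
activations = [1.0, 1.0, 3.0, 1.0, 1.0]
normalized = local_response_normalization(activations)
print([round(b, 3) for b in normalized])
```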
 
6. Main innovations of CNN architectures compared to LeNet-5:

    - **AlexNet:**

    - **GoogLeNet:**

    - **ResNet:**

    - **SENet:**

    - **Xception:**

    - **EfficientNet:**

7. A fully convolutional network (FCN) can be created by replacing the dense layers at the top of a CNN with convolutional layers. Each new convolutional layer should have as many filters as there were neurons in the dense layer it replaces, and a kernel size equal to the size of the feature maps output by the previous layer, with padding set to "valid". The FCN is equivalent to the CNN, except that it can now process larger images: it outputs the result the old CNN would produce if it were swept across the larger image, but much more efficiently, as it only needs to see the input once!
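The equivalence can be checked on a toy case. This sketch assumes a single-channel 2 × 2 feature map and a dense layer with one neuron, and the weights are made up for illustration. The dense weights are reshaped into a 2 × 2 kernel and applied with "valid" padding.

```python
import math


def dense_neuron(feature_map, weights, bias):
    """One dense neuron applied to the flattened feature map."""
    flat = [value for row in feature_map for value in row]
    return sum(x * w for x, w in zip(flat, weights)) + bias


def conv2d_valid(feature_map, kernel, bias):
    """Single-channel 2D convolution (cross-correlation), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    return [
        [
            sum(feature_map[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)) + bias
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


dense_weights = [0.5, -1.0, 2.0, 0.25]             # hypothetical trained weights
bias = 0.1
kernel = [dense_weights[0:2], dense_weights[2:4]]  # same weights, reshaped to 2 x 2

small = [[1.0, 2.0], [3.0, 4.0]]
large = [[1.0, 2.0, 0.0], [3.0, 4.0, 1.0], [0.0, 2.0, 5.0]]

dense_out = dense_neuron(small, dense_weights, bias)
fcn_small = conv2d_valid(small, kernel, bias)  # 1 x 1 output, same value as dense
fcn_large = conv2d_valid(large, kernel, bias)  # 2 x 2 output, one value per window

print(dense_out, fcn_small, fcn_large)
```

On the larger input, each output value equals what the dense neuron would compute on the corresponding 2 × 2 window, but all windows are evaluated in a single pass.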

8. Semantic segmentation is difficult because images lose their spatial resolution as they pass through a regular CNN (due to pooling layers & convolutional layers with stride). One solution is to use transposed convolutional layers (like a regular convolutional layer with a fractional stride) to upsample the image.
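To show how a transposed convolution upsamples, here is a minimal one-dimensional sketch without padding. The 1D setting, the kernel values, and the function name are assumptions for illustration. Each input value is multiplied by the kernel and added into the output at an offset of `stride` positions per input step.

```python
def conv1d_transpose(inputs, kernel, stride):
    """1D transposed convolution with no padding.

    Output length is (len(inputs) - 1) * stride + len(kernel).
    """
    output = [0.0] * ((len(inputs) - 1) * stride + len(kernel))
    for i, x in enumerate(inputs):
        for j, w in enumerate(kernel):
            output[i * stride + j] += x * w
    return output


inputs = [1.0, 2.0, 3.0]

# With a kernel of ones and stride 2, each value is simply repeated twice.
upsampled = conv1d_transpose(inputs, kernel=[1.0, 1.0], stride=2)

# With a longer kernel, neighboring contributions overlap and are summed.
overlapping = conv1d_transpose(inputs, kernel=[1.0, 1.0, 1.0], stride=2)

print(upsampled)
print(overlapping)
```

In a real network the kernel weights are learned, so the layer can learn an upsampling scheme suited to the task rather than a fixed interpolation.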

9. Build a CNN to get the highest possible accuracy on the MNIST handwritten digits dataset: [Kaggle competition](https://www.kaggle.com/competitions/digit-recognizer/leaderboard) & [my code](https://github.com/edwardbickerton/Kaggle-competitions/blob/main/digit-recognizer.ipynb).

## 10. Use transfer learning for large image classification

a. Create a training set containing at least 100 images per class. For example, you could classify your own pictures based on the location (beach, mountain, city, etc.), or alternatively you can use an existing dataset (e.g., from TensorFlow Datasets).

b. Split it into a training set, a validation set, and a test set.

c. Build the input pipeline, apply the appropriate preprocessing operations, and optionally add data augmentation.

d. Fine-tune a pretrained model on this dataset.
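For step b, one possible way to split a dataset is to shuffle the sample indices once and slice them. This is a sketch only: the 80/10/10 ratios, the seed, and the function name are assumptions, and it does not stratify by class.

```python
import random


def split_indices(n_samples, val_ratio=0.1, test_ratio=0.1, seed=42):
    """Return shuffled, non-overlapping (train, validation, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for a reproducible split

    n_test = int(n_samples * test_ratio)
    n_val = int(n_samples * val_ratio)

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

With only about 100 images per class, a per-class (stratified) split may be preferable so that every class is represented in all three sets.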

## 11. TensorFlow's [Style Transfer tutorial](https://homl.info/styletuto)