# Reusing Pretrained Layers
- generally not a good idea to train a very large DNN from scratch
- instead: **should always try to find an existing neural network that accomplishes a similar task to the one I am trying to tackle, then reuse the lower layers of this network**
    - called *"Transfer learning"*
    - speeds up training and requires significantly less data
    
<img src="images/TransferLearning.jpeg" width=360/>

- output layer should always be replaced because it is most likely not useful for the new task, and may not even have the right number of outputs
- The more similar the tasks are, the more layers you want to reuse (starting with the lower layers)
    - if the tasks are very similar, just replace the output layer
- **strategy**
    - freeze all the reused layers
    - train model to see how it performs
    - unfreeze one or two of the top hidden layers to let backpropagation change them 
    - see if performance improves
    - *the more training data, the more layers can be unfrozen*
    - *reduce the learning rate when unfreezing reused layers to avoid wrecking fine-tuned weights*
    - if performance is still not good:
        - small training data
            - drop the top hidden layers
            - freeze remaining hidden layers again
            - iterate until right number of layers to reuse is found
        - plenty of training data
            - replace the top hidden layers instead of dropping them
            - or add more hidden layers
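
A minimal Keras sketch of the strategy above, under stated assumptions: `model_a` is a hypothetical stand-in for a network already trained on a similar task (here it is untrained), the data is random, and the layer sizes and learning rates are arbitrary illustration values.

```python
import numpy as np
from tensorflow import keras

# Hypothetical "model A": stands in for a network pretrained on a similar task.
model_a = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="softmax"),  # task A: 8 classes
])

# Clone first, so that training model B cannot modify model A's weights.
model_a_clone = keras.models.clone_model(model_a)
model_a_clone.set_weights(model_a.get_weights())

# Reuse every layer except the output layer, then add a new output for task B.
model_b = keras.Sequential(
    [keras.Input(shape=(20,))]
    + model_a_clone.layers[:-1]
    + [keras.layers.Dense(1, activation="sigmoid")]  # task B: binary
)

# Random stand-in data for task B (assumption, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 20)).astype("float32")
y = rng.integers(0, 2, size=(64, 1)).astype("float32")

# Step 1: freeze all reused layers and train only the new output layer.
for layer in model_b.layers[:-1]:
    layer.trainable = False
model_b.compile(loss="binary_crossentropy",
                optimizer=keras.optimizers.SGD(learning_rate=1e-2))
model_b.fit(X, y, epochs=2, verbose=0)

# Step 2: unfreeze the top hidden layer and continue with a lower learning rate.
# The model must be recompiled after changing `trainable`.
model_b.layers[-2].trainable = True
model_b.compile(loss="binary_crossentropy",
                optimizer=keras.optimizers.SGD(learning_rate=1e-3))
model_b.fit(X, y, epochs=2, verbose=0)
```

In practice I would compare validation performance after each step before deciding whether to unfreeze, drop, or replace more layers.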

**Transfer Learning works best with CNNs**
- Works well with CNNs b/c lower layers of the network detect features that are much more general
- Doesn't work well with small dense networks:
    - small networks learn few patterns
    - dense networks learn very specific patterns, which are unlikely to be useful for other tasks

## Unsupervised Pretraining
- Use when I cannot find a model trained on a task similar to the one I'm working on, and I cannot gather any more *labeled* data
- Gather more *unlabeled* data and train an unsupervised model, such as an autoencoder or a generative adversarial network (GAN)
    - Then, use the lower layers of the autoencoder or of the GAN's discriminator
    - Then, add the output layer for the task on the top of the network
    - Then, fine-tune the final network using supervised learning w/ the labeled training data
- *this is the technique that Geoffrey Hinton and his team used in 2006 that led to the revival of NN and success of Deep Learning* 
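
A minimal sketch of the autoencoder variant of these steps, assuming random stand-in data and arbitrary layer sizes; the names `encoder`, `decoder`, and `classifier` are illustrative, not taken from any particular source.

```python
import numpy as np
from tensorflow import keras

# Stand-in data (assumption): lots of unlabeled samples, few labeled ones.
rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(256, 20)).astype("float32")
X_labeled = rng.normal(size=(32, 20)).astype("float32")
y_labeled = rng.integers(0, 2, size=(32, 1)).astype("float32")

# The encoder holds the lower layers that will be reused later.
encoder = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),
])
decoder = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(20),
])

# Unsupervised step: train the autoencoder to reconstruct its own input.
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(loss="mse", optimizer="adam")
autoencoder.fit(X_unlabeled, X_unlabeled, epochs=2, verbose=0)

# Supervised step: reuse the encoder, add an output layer for the actual task,
# and fine-tune on the small labeled set.
classifier = keras.Sequential([
    encoder,
    keras.layers.Dense(1, activation="sigmoid"),
])
classifier.compile(loss="binary_crossentropy", optimizer="adam")
classifier.fit(X_labeled, y_labeled, epochs=2, verbose=0)
```

The decoder is discarded after pretraining; only the encoder's weights carry over to the classifier.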

## Pretraining on an Auxiliary Task
- If there is not much labeled training data, train a first NN on an auxiliary task for which labeled training data is easy to obtain or generate
    - Then, reuse the lower layers of that network for the actual task
    - explanation:
        - The first NN's lower layers will learn features that will probably be reusable by the second neural network
- example use:
    - Task: train a network to recognize faces
    - Having only a few labeled pictures of each person
        - gathering more pictures of each individual would not be practical
    - However, I could gather lots of pictures of random people on the web and train a first NN to detect whether two different pictures show the same person.
        - This network would have good feature detectors for faces
        - ^ its lower layers will be good for classifying faces
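
A rough sketch of this face example, with loud assumptions: images are replaced by random flattened feature vectors, the shared `encoder` uses small dense layers (a real face model would use convolutional layers), and `n_features` and `n_people` are made-up values.

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
n_features, n_people = 64, 5  # hypothetical sizes

# Shared lower layers: applied to both pictures of a pair.
encoder = keras.Sequential([
    keras.Input(shape=(n_features,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
])

# Auxiliary task: do two pictures show the same person?
img_a = keras.Input(shape=(n_features,))
img_b = keras.Input(shape=(n_features,))
merged = keras.layers.Concatenate()([encoder(img_a), encoder(img_b)])
same_person = keras.layers.Dense(1, activation="sigmoid")(merged)
pair_model = keras.Model(inputs=[img_a, img_b], outputs=same_person)
pair_model.compile(loss="binary_crossentropy", optimizer="adam")

# Random stand-in pairs; real pairs would come from web pictures.
pairs_a = rng.normal(size=(128, n_features)).astype("float32")
pairs_b = rng.normal(size=(128, n_features)).astype("float32")
is_same = rng.integers(0, 2, size=(128, 1)).astype("float32")
pair_model.fit([pairs_a, pairs_b], is_same, epochs=2, verbose=0)

# Actual task: reuse the (frozen) encoder and train a new output layer
# on the few labeled pictures.
encoder.trainable = False
face_classifier = keras.Sequential([
    encoder,
    keras.layers.Dense(n_people, activation="softmax"),
])
face_classifier.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
faces = rng.normal(size=(20, n_features)).astype("float32")
labels = rng.integers(0, n_people, size=(20,))
face_classifier.fit(faces, labels, epochs=2, verbose=0)
```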