#### Definitions: 
**Discriminative model**: computes the probability that a given input belongs to a particular class - p(y|x)
- Usually *supervised* learning
- Useful for classification (i.e., it can map an input x to a particular class y) <br>

**Generative model**: learns the joint distribution of the input *and* class labels - p(x,y)
- Usually *unsupervised* learning
- Can be converted to p(y|x) using Bayes' rule: p(y|x) = p(x,y) / p(x)
- Useful for creating new data that looks like the training data (e.g., images) <br>
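
As a small illustration of turning a joint distribution into a conditional one, here is a minimal sketch. The table of probabilities and the feature/class names are made up for the example; only the rule p(y|x) = p(x,y) / p(x) comes from the notes.

```python
# Hypothetical joint distribution p(x, y): x is a feature value, y is a class.
joint = {
    ("short", "cat"): 0.30,
    ("tall", "cat"): 0.10,
    ("short", "dog"): 0.15,
    ("tall", "dog"): 0.45,
}

def conditional(joint, x):
    """Return p(y | x) for every class y, computed as p(x, y) / p(x)."""
    # Marginal p(x): sum the joint over all classes for this x.
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

posterior = conditional(joint, "tall")
print(posterior)
```

The resulting dictionary sums to 1 over the classes, which is what makes it usable as a classifier output.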

**Adversarial training**: Two or more "players" (e.g., game AIs) compete against each other (ex: Pacman)
- Algorithm applied: minimax (searches for the best move assuming that the opponent is also playing optimally)
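
To make minimax concrete, here is a minimal sketch on a tiny game tree. The tree and its payoffs are invented for illustration: a leaf is a payoff for the maximizing player, and a list is a position with one child per legal move.

```python
def minimax(node, maximizing):
    """Return the value of a game tree when both players play optimally."""
    if isinstance(node, (int, float)):
        return node  # leaf: payoff for the maximizing player
    # Players alternate turns, so the children are evaluated for the other player.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Hypothetical two-level game: the maximizer moves first, the minimizer replies.
tree = [[3, 5], [2, 9], [0, 7]]
best = minimax(tree, maximizing=True)
print(best)  # 3
```

The maximizer picks the first move: the minimizer's best replies give 3, 2, and 0, and 3 is the largest of those.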

## Generative Adversarial Networks (GANs)
- An example: counterfeiters and police, to represent the two models. 
    - Two players: 
        - counterfeiters, who make money that can fool the police == **generator**
        - police, who classify the money as real or fake == **discriminator**
    - Their process:
        - Each side plays a minimax game: it picks its best strategy assuming the other side is also playing its best
    - End goal:
        - Generator creates examples so well that the discriminator can't tell real from fake
- Flow:
    - Generator takes in noise to produce fake data (ideally indistinguishable from the training set)
    - Discriminator takes in the fake data and real data
    - Discriminator then determines which data is genuine. Ultimately, we want the generator to win.

## How we train GANs:
- Generator G is a neural network that "up-samples" to generate examples
    - G receives noise distributed according to a prior $p_{z}$ (usually Gaussian) 
    - G produces examples in input-space <br>
- Discriminator D is a neural network that classifies 
    - D receives input from input-space, fake or real 
    - D outputs the probability that the input is real (1 - that probability = p(fake))
    
#### Why are GANs adversarial? 
- Minimax game between D and G (they're competing)
- D is trained to *maximize* the probability of correctly classifying real and fake examples
- G is trained to *minimize* that same probability, i.e., to make D label generated examples as real
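- Graphical explanation: plot the distribution of G's samples next to the distribution of the real data. G tries to move its curve onto the real-data curve, while D tries to tell the two curves apart.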
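
The minimax game above is usually written as a single value function. The form below is the standard one from the original GAN formulation; $p_{data}$ (the distribution of real examples) is notation introduced here and not used elsewhere in these notes.

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_{z}}[\log(1 - D(G(z)))]
$$

D raises both terms by outputting values near 1 for real $x$ and near 0 for $G(z)$. G lowers the second term by pushing $D(G(z))$ toward 1.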
    
#### Detailed explanation:
1. Generate batch of fake examples from G using noise prior (ground-truth label is 0)
2. Select batch of real examples from training data (ground-truth label is 1) 
3. Work first with D. Train D on both batches separately
    - G is frozen during this step: its weights are not updated
4. Then, freeze D and train G, using random noise as input and 1 as ground-truth for all examples (G wants D to call its output real)
    - The model for training G generates an example from the noise input and passes it through the frozen D; D's output is compared against the label 1 to update G
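
The four steps above can be sketched on a toy 1-D problem. Everything here is an assumption made for illustration: the generator is a single affine map, the discriminator is logistic regression, gradients are written out by hand, and the "real" data is a Gaussian centered at 4. A real GAN would use neural networks and an autodiff library.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

class Generator:
    """Toy generator G(z) = a*z + b (illustrative assumption)."""
    def __init__(self):
        self.a, self.b = 1.0, 0.0
    def __call__(self, z):
        return self.a * z + self.b

class Discriminator:
    """Toy discriminator D(x) = sigmoid(w*x + c): probability that x is real."""
    def __init__(self):
        self.w, self.c = 0.1, 0.0
    def __call__(self, x):
        return sigmoid(self.w * x + self.c)

def train_d(D, batch, label, lr):
    """One gradient step on D with binary cross-entropy; G is not touched."""
    # For cross-entropy on a sigmoid output, dLoss/dlogit = D(x) - label.
    errs = [D(x) - label for x in batch]
    grad_w = sum(e * x for e, x in zip(errs, batch)) / len(batch)
    grad_c = sum(errs) / len(batch)
    D.w -= lr * grad_w
    D.c -= lr * grad_c

def train_g(G, D, noise, lr):
    """One gradient step on G with target label 1; D's weights stay frozen."""
    # Chain rule through the frozen D: dLoss/dx = (D(x) - 1) * w.
    grads = [(D(G(z)) - 1.0) * D.w for z in noise]
    grad_a = sum(g * z for g, z in zip(grads, noise)) / len(noise)
    grad_b = sum(grads) / len(noise)
    G.a -= lr * grad_a
    G.b -= lr * grad_b

def train_step(G, D, real_batch, lr=0.05):
    n = len(real_batch)
    fake_batch = [G(random.gauss(0, 1)) for _ in range(n)]  # step 1: fakes, label 0
    train_d(D, real_batch, 1.0, lr)                          # steps 2-3: real batch
    train_d(D, fake_batch, 0.0, lr)                          # step 3: fake batch
    noise = [random.gauss(0, 1) for _ in range(n)]
    train_g(G, D, noise, lr)                                 # step 4: D frozen

random.seed(0)
G, D = Generator(), Discriminator()
for _ in range(200):
    real = [random.gauss(4.0, 1.0) for _ in range(32)]  # hypothetical real data
    train_step(G, D, real)
```

Note that `train_g` reads `D.w` but never writes to it, which is what "freezing" D means in this sketch.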
        
#### Tips and Tricks for training: 
- Normalize inputs to [-1,1] instead of [0,1]. Use tanh as the generator's output activation so its outputs fall in the same range
- Sample noise from Gaussian distribution 
- Avoid sparse gradients
    - Prefer leaky ReLUs instead of ReLUs (reasoning: identical for positive inputs, but a leaky ReLU keeps a small nonzero slope for inputs <0, so its gradient is not zeroed out and stays non-sparse)
    - Prefer strided convolutions or average pooling instead of max pooling 
- Use Adam as the optimizer 
- One-sided label smoothing: for the discriminator, use 0.9 instead of 1 as the label for real examples; keep 0 for fake examples
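
A few of these tips are small enough to show directly. This is a minimal sketch in plain Python; the function names and the 0.2 slope are illustrative choices, not values taken from the notes.

```python
def leaky_relu(x, slope=0.2):
    """Pass positive inputs through; scale negative inputs by a small slope."""
    return x if x > 0 else slope * x

def scale_to_tanh_range(value):
    """Map a value in [0, 1] to [-1, 1], matching the range of a tanh output."""
    return 2.0 * value - 1.0

def smooth_labels(labels, real_target=0.9):
    """One-sided smoothing: real labels (1) become 0.9, fake labels (0) stay 0."""
    return [real_target if y == 1 else 0.0 for y in labels]

print(leaky_relu(3.0), leaky_relu(-2.0))
print(scale_to_tanh_range(0.0), scale_to_tanh_range(1.0))
print(smooth_labels([1, 0, 1]))
```

Unlike a plain ReLU, `leaky_relu` returns a nonzero value for negative inputs, so the gradient there is `slope` rather than 0.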

## Challenges with training GANs
- G isn't producing diverse outputs
    - Known as **Mode Collapse**: G produces low-diversity (or even identical) copies of the same image.
    - Rather than learning the full input distribution, G learns only the few images that fool D and produces similar images with slight differences.
    - When this occurs, the minimum and maximum are effectively swapped (normally G is the min player and D is the max player, i.e. min over G of max over D; mode collapse behaves like max over D of min over G instead)
- Problems with:
    - counting (e.g., produces 4 eyes instead of 2 for a dog)
    - perspective (e.g., an image mixes perspectives from multiple POVs)
    - global structure (e.g., nose placed where ears should be)