<a href="https://colab.research.google.com/github/sagihaider/GAN/blob/master/GAN_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Generative Adversarial Networks (GAN)

Where should I start with Generative Adversarial Networks (GANs)?

In 2016, Yann LeCun, Director of Facebook AI Research, described GANs as "the most interesting idea in the last 10 years in ML."

#### Discriminative vs Generative Models

In a supervised learning problem:

1. `X` is generally known as the set of features, inputs, or covariates.
2. `Y` is known as the set of labels, targets, or ground truth.

(X, Y) → (features, labels) / (inputs, targets)


**Discriminative**

Given inputs, we want to build a model that classifies them into the corresponding targets as accurately as possible.

* Given features `X`, is this mail `SPAM` or `Not SPAM`? The model learns the conditional probability distribution:

`P(Y|X)`: the probability of `Y` given `X`. The model predicts the label `Y` for which this probability is highest.

In other words, the model learns to predict the labels from the data. We can also say it learns the decision boundary between two or more classes.

* Point to note: it does not really care how the training data is generated or distributed.

Ex: Logistic Regression (LR), Support Vector Machine (SVM), Conditional Random Fields (CRFs)

**Generative**

Given inputs, we want to build a model that understands the data well enough to generate similar inputs together with their labels.

According to Wikipedia: given an observable variable `X` and a target variable `Y`, a generative model is a statistical model of the joint probability distribution on `X` × `Y`, `P(X,Y)`.

* Assume this mail is SPAM: what are its features likely to be? In other words, what does a spam email look like? The model learns the joint probability distribution:

`P(X,Y) = P(Y|X)·P(X)`

The model has to learn `P(X)`. It cares about how the training data is generated or distributed, that is, how to produce `X`.

* I found an interesting figure that may help illustrate the points above: ![alt text](https://i.stack.imgur.com/Xrmqg.png) 
[Ref to figure](https://stackoverflow.com/questions/879432/what-is-the-difference-between-a-generative-and-a-discriminative-algorithm)

Ex: Naive Bayes Classifier and Linear Discriminant Analysis (LDA)
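To make the two views concrete, here is a minimal sketch on a made-up toy dataset. The emails, the single binary feature ("contains the word free") and the helper names are all hypothetical and chosen only for illustration. It estimates the joint distribution `P(X,Y)` by counting, then derives both the discriminative quantity `P(Y|X)` and the generative quantity `P(X|Y)` from it.

```python
from collections import Counter

# Hypothetical toy data: (contains_word_free, label) pairs.
emails = (
    [(1, "spam")] * 3 + [(0, "spam")] * 1
    + [(1, "ham")] * 3 + [(0, "ham")] * 3
)
n = len(emails)

# Empirical joint distribution P(X, Y) and the marginals P(X), P(Y).
joint = {pair: count / n for pair, count in Counter(emails).items()}
p_x = {x: count / n for x, count in Counter(x for x, _ in emails).items()}
p_y = {y: count / n for y, count in Counter(y for _, y in emails).items()}


def p_y_given_x(y, x):
    """Discriminative view: P(Y | X) = P(X, Y) / P(X)."""
    return joint.get((x, y), 0.0) / p_x[x]


def p_x_given_y(x, y):
    """Generative view: P(X | Y) = P(X, Y) / P(Y)."""
    return joint.get((x, y), 0.0) / p_y[y]


# "Given the word 'free', is it spam?" versus "Given spam, does it contain 'free'?"
print(round(p_y_given_x("spam", 1), 3))
print(round(p_x_given_y(1, "spam"), 3))
```

The two questions give different answers on this data, even though both are derived from the same joint distribution.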

#### GAN Theory

GANs are generative models that try to learn the data distribution well enough to generate samples that look as realistic as possible.

In other words, the system generates new data such that an observer can no longer tell the difference between what is original and what is generated. Once we have a system that can do that, we are free to generate new samples that we have never seen before, yet that are still believable as real data.

Statistically speaking, we want our generative model to accurately estimate the probability distribution of our real data. If the model has parameters `W`, we wish to find the `W` that maximizes the likelihood of the real samples. When we train our generative model, we find this ideal `W` by minimizing the distance between our estimate of the data distribution and the actual data distribution.
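One way to see why maximizing likelihood and minimizing a distance describe the same goal is the short derivation below. The symbols $p_{data}$ (the real data distribution) and $p_W$ (the model's distribution) are notation introduced here for illustration, and the KL divergence is just one possible choice of "distance".

$$
W^{*} = \arg\max_{W} \; \mathbb{E}_{x \sim p_{data}} \left[ \log p_W(x) \right]
$$

$$
D_{KL}\left(p_{data} \,\|\, p_W\right) = \mathbb{E}_{x \sim p_{data}} \left[ \log p_{data}(x) \right] - \mathbb{E}_{x \sim p_{data}} \left[ \log p_W(x) \right]
$$

The first term on the right does not depend on `W`, so the `W` that maximizes the expected log-likelihood is also the `W` that minimizes $D_{KL}(p_{data} \,\|\, p_W)$.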

* A GAN's end goal is to generate features `X` (given a label `Y` in the conditional case) instead of predicting a label `Y` given features `X`.

E.g., if images of currency notes (£, $, or any other currency) are `X`, then the GAN’s goal is to learn a model that can produce realistic, believable currency images from the training data `X`.


A GAN consists of two neural networks: 
![See Figure](https://miro.medium.com/max/1400/1*N4oqJsGmH-KZg3Vqrm_uYw.jpeg)


1. **Generator (G)**, which generates new data points from random noise sampled from a latent space (for example, a uniform or normal distribution). Its goal is to produce fake samples that resemble the real inputs.


2. **Discriminator (D)**, which distinguishes the fake data produced by the generator from the real data.




#### Generator (G)

The generator is responsible for producing fake examples of data, such as currency images in the example above. It takes as input a latent variable (which we will refer to as `Z`) and outputs data of the same form as the data in the original data set.


Latent variables are hidden variables. When talking about GANs, we have the notion of a “latent space” that we can sample from. We can slide continuously through this latent space, and with a well-trained GAN this has substantial (and often somewhat interpretable) effects on the output.

* Mathematically, if our latent variable is `Z` and our target variable is `X`, we can think of the generator network as learning a function that maps from `Z` (the latent space) to `X` (hopefully, the real data distribution).
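The mapping from latent space to data space can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the generator here is a single random linear layer followed by `tanh`, its weights are untrained, and the names `make_generator`, `sample_latent` and `interpolate` are hypothetical helpers, not part of any library. A real GAN would use a deeper network whose weights are learned.

```python
import math
import random


def make_generator(latent_dim, data_dim, seed=0):
    """Build a toy generator G: z -> x (one random linear layer + tanh, untrained)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
               for _ in range(data_dim)]

    def generator(z):
        # Each output coordinate is a squashed weighted sum of the latent vector.
        return [math.tanh(sum(w * zi for w, zi in zip(row, z))) for row in weights]

    return generator


def sample_latent(latent_dim, rng):
    """Sample a latent vector z from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]


def interpolate(z_a, z_b, t):
    """Slide through latent space: t=0 gives z_a, t=1 gives z_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


rng = random.Random(42)
G = make_generator(latent_dim=4, data_dim=8)
z_a, z_b = sample_latent(4, rng), sample_latent(4, rng)

# Moving gradually from z_a to z_b changes the generated sample gradually.
path = [G(interpolate(z_a, z_b, t / 4)) for t in range(5)]
print(len(path), len(path[0]))
```

The interpolation step mirrors the idea of sliding through the latent space: nearby latent vectors produce nearby outputs because the mapping is continuous.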


#### Discriminator (D)

The discriminator’s role is to discriminate. It takes in a batch of samples and predicts whether each sample is real or fake. The discriminator outputs a higher probability when it believes a sample is real, such as a real currency note.
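As a rough sketch of this behavior, the snippet below uses the simplest possible binary classifier, a logistic unit, as a stand-in discriminator. The weights, bias and input batch are made-up numbers for illustration; a real discriminator would be a trained neural network.

```python
import math


def sigmoid(s):
    """Logistic function, squashing any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))


def discriminator(x, weights, bias):
    """Return the estimated probability that sample x is real."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(score)


# Hypothetical parameters and a batch of two samples.
weights, bias = [0.5, -0.25, 1.0], 0.1
batch = [[1.0, 2.0, 3.0], [-1.0, 0.0, -2.0]]

probs = [discriminator(x, weights, bias) for x in batch]
labels = ["real" if p >= 0.5 else "fake" for p in probs]
print(labels)
```

The 0.5 threshold is a conventional choice for turning the probability into a real/fake decision.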

The figure below, from this [link](http://hunterheidenreich.com/blog/what-is-a-gan/), illustrates the concept explained above.
![alt text](https://pbs.twimg.com/media/CwSKfkBWEAAXd4d.jpg)

The main idea of GANs is to train two different networks, with two different objective functions, to compete with each other.


* The generator `G` tries to fool the discriminator into believing that the input it sends is real.
* The discriminator `D` gives the generator a slap by identifying the input as fake.
* After taking the slap from the discriminator `D`, the generator `G` learns to produce samples that look more like the training data.
* This process is repeated for a while, or until a Nash equilibrium is reached.

Want to know more about Nash equilibrium? Refer to [Wiki](https://en.wikipedia.org/wiki/Nash_equilibrium) or [watch this video](https://www.youtube.com/watch?v=eBQ2p8Xz-4Q&t=3s).

This process is called adversarial training.




#### GAN Training: Step-by-Step

1. We sample some noise from a random distribution and feed it to the generator `G` to produce a fake `X`, labeled `Y=0`, giving the input-label pair `(X, Y)`.

2. We take this fake pair and a real pair (`X` with label `Y=1`) and feed them to the discriminator `D` alternately.

3. The discriminator `D` is a binary classification neural network, so it calculates the loss for both the fake `X` and the real `X` and combines them into the final `D` loss.

4. The generator `G` also calculates a loss, the `G` loss, based on how the discriminator scores the samples generated from its noise; each network has a different objective function.

5. The two losses are propagated back to their respective networks, which adjust their parameters with respect to the loss.

6. Apply any optimization algorithm (gradient descent, Adam, RMSProp, etc.). Repeat this process for a certain number of epochs, or as long as you wish.


Using the steps above, the generator `G` gets better and better at generating realistic results, and the discriminator `D` also gets better and better at identifying which samples are real and which are fake.
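The six steps can be sketched end to end on a deliberately tiny problem. Everything specific in this sketch is an assumption made for illustration: the real data is one-dimensional and drawn from a normal distribution with mean 4, the generator is `G(z) = a*z + b`, the discriminator is `D(x) = sigmoid(w*x + c)`, gradients are written out by hand, the generator uses the common non-saturating loss `-log D(G(z))`, and plain gradient descent is the optimizer. It shows the shape of the loop, not a recipe that is guaranteed to converge.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=500, batch_size=16, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)
    eps = 1e-12      # avoids log(0) when reporting losses
    d_loss = g_loss = 0.0

    for _ in range(steps):
        # Step 1: noise -> generator -> fake samples (label 0).
        z = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        x_fake = [a * zi + b for zi in z]
        # Step 2: real samples (label 1), assumed here to follow N(4, 0.5).
        x_real = [rng.gauss(4.0, 0.5) for _ in range(batch_size)]

        # Step 3: D loss = cross-entropy on real (target 1) + fake (target 0).
        d_real = [sigmoid(w * x + c) for x in x_real]
        d_fake = [sigmoid(w * x + c) for x in x_fake]
        d_loss = (-sum(math.log(p + eps) for p in d_real)
                  - sum(math.log(1.0 - p + eps) for p in d_fake)) / batch_size
        grad_w = (sum((p - 1.0) * x for p, x in zip(d_real, x_real))
                  + sum(p * x for p, x in zip(d_fake, x_fake))) / batch_size
        grad_c = (sum(p - 1.0 for p in d_real) + sum(d_fake)) / batch_size
        # Steps 5-6 for D: gradient descent update.
        w -= lr * grad_w
        c -= lr * grad_c

        # Step 4: G loss = -log D(G(z)), scored by the updated discriminator.
        d_fake = [sigmoid(w * x + c) for x in x_fake]
        g_loss = -sum(math.log(p + eps) for p in d_fake) / batch_size
        grad_a = sum((p - 1.0) * w * zi for p, zi in zip(d_fake, z)) / batch_size
        grad_b = sum((p - 1.0) * w for p in d_fake) / batch_size
        # Steps 5-6 for G: gradient descent update.
        a -= lr * grad_a
        b -= lr * grad_b

    return {"a": a, "b": b, "w": w, "c": c, "d_loss": d_loss, "g_loss": g_loss}


result = train_toy_gan()
print(result)
```

Note that the discriminator and the generator are updated in turn within each iteration, and that the generator's gradient flows through the discriminator's score, which is how the "feedback" from `D` reaches `G`.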


### GAN Objective Function

The discriminator is a binary classifier, so it should produce a high probability for real data and a low probability for fake data (the generator’s output).

Let's define the variables and functions:


* $z$ → Noise vector
* $x$ → Training sample → $x_{real}$
* $G(z)$ → Generator's Output → $x_{fake}$
* $D(x)$ → Discriminator's output for $x_{real}$ → $P(y|x_{real}) \in [0, 1]$
* $D(G(z))$ → Discriminator's output for $x_{fake}$ → $P(y|x_{fake}) \in [0, 1]$
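As a small numeric illustration of these quantities, the sketch below plugs made-up values of $D(x)$ and $D(G(z))$ into binary cross-entropy, which is an assumed (though common) loss for a binary classifier. The function names are hypothetical, and the generator loss shown is the non-saturating variant `-log D(G(z))`.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Cross-entropy for one real sample (target 1) and one fake sample (target 0)."""
    return -math.log(d_real) - math.log(1.0 - d_fake)


def generator_loss(d_fake):
    """Non-saturating generator loss: small when D(G(z)) is close to 1."""
    return -math.log(d_fake)


# A discriminator that separates real from fake well...
confident_d = discriminator_loss(d_real=0.9, d_fake=0.2)
# ...versus one that is being fooled by the generator.
fooled_d = discriminator_loss(d_real=0.6, d_fake=0.7)

print(confident_d < fooled_d)
print(generator_loss(0.7) < generator_loss(0.2))
```

The discriminator's loss is low when $D(x)$ is high and $D(G(z))$ is low, while the generator's loss is low when $D(G(z))$ is high, which is why the two networks pull in opposite directions.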