# Generative vs Discriminative Methods

In this notebook, we will look at generative and discriminative approaches towards classification problems and also explore an application of generative methods in a generative adversarial network (GAN).

## Wine Data:
<img src="Resources/wine.jpg" width="250">
Winemaking is an interesting process in which many different factors play a role in determining the properties of the wine. This dataset presents a chemical and physical analysis of wine from 3 different sources in Italy.<sup>1</sup>

<sup>1. Forina, M. et al, PARVUS - An Extendible Package for Data
       Exploration, Classification and Correlation. Institute of Pharmaceutical
       and Food Analysis and Technologies, Via Brigata Salerno, 
       16147 Genoa, Italy.</sup>

#### Brief description of features:
1. Cultivar: source of wine
2. Alcohol: alcohol content
3. Malic acid (C4H6O5): Found in fruits, contributes sour taste
4. Ash: inorganic matter
5. Alkalinity of ash: how basic the ash is
6. Magnesium: magnesium content, a cofactor in many enzyme systems that regulate biochemical reactions in the body
7. Total phenols: natural compounds containing phenol group that contribute to the color and texture in wine
8. Flavanoids: a type of phenol, most of the phenols in wine are flavanoids
9. Nonflavanoid phenols: all the other phenols
10. Proanthocyanidins: polyphenols, composed of flavanoid oligomers
11. Color intensity: measurement made with a spectrophotometer/colorimeter to determine the transmission properties of the wine
12. Hue: a property of color of the wine
13. OD280/OD315 of diluted wines: optical density at 280nm/315nm ratio, like absorbance except it considers the scattering of light as well. Used to determine protein concentration
14. Proline(C5H9NO2): The most abundant amino acid in wine

In [None]:
# Imports:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score, confusion_matrix
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

In [None]:
df = pd.read_csv('wine.data',names = ['Cultivar','Alcohol','Malic_acid','Ash','Alkalinity_of_ash','Magnesium',
                                 'Total_phenols','Flavanoids','Nonflavanoid_phenols','Proanthocyanidins','Color_intensity',
                                 'Hue','OD280/OD315_of_diluted_wines','Proline'], index_col = 0)
df

## Naive Bayes

Naive Bayes classifiers are an application of Bayes' theorem with a naive assumption that the features are independent of one another given the class.
<img src="Resources/Bayes.png" width="300">
P(A|B): Posterior

P(A): Prior

P(B|A): Likelihood

P(B): Evidence

Here A is the class we are trying to predict and B is the observed set of features. For a given observation, the evidence P(B) is the same for every class, so the denominator in Bayes' theorem acts as a constant. It follows that the posterior P(A|B) is proportional to the joint probability P(A, B) = P(B|A)P(A). Notice that P(A|B) is used to determine decision boundaries, just as in discriminative methods, but here it is calculated from estimates of P(B|A) and P(A), so the model learns the probability distribution of the data itself.
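
As a small worked illustration of the relationship above, the sketch below applies Bayes' theorem to three classes. The priors and likelihoods are made-up numbers chosen only for illustration; they are not estimated from the wine data.

```python
# Hypothetical values, for illustration only (not estimated from wine.data)
priors = {"cultivar_1": 0.3, "cultivar_2": 0.4, "cultivar_3": 0.3}        # P(A)
likelihoods = {"cultivar_1": 0.8, "cultivar_2": 0.3, "cultivar_3": 0.05}  # P(B|A)

# Numerator of Bayes' theorem for each class: P(B|A) * P(A) = P(A, B)
joint = {c: likelihoods[c] * priors[c] for c in priors}

# Evidence P(B): the same normalising constant for every class
evidence = sum(joint.values())

# Posterior P(A|B)
posterior = {c: joint[c] / evidence for c in joint}

# Dividing by a constant does not change which class is largest
best_by_joint = max(joint, key=joint.get)
best_by_posterior = max(posterior, key=posterior.get)
print(posterior)
print(best_by_joint, best_by_posterior)
```

Because the evidence only rescales the scores, a classifier can pick the class with the largest joint probability without ever computing P(B).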

Let's visualize the probability distribution of each class of wine from our dataset using the distribution of flavanoid content.

In [None]:
# Plot the distribution for Cultivar 1, 2, and 3
x = df[['__________']].drop([2,3])
sb.distplot(x, label = 'Cultivar 1')
x = df[['__________']].drop([1,3])
sb.distplot(x, label = 'Cultivar 2')
x = df[['__________']].drop([1,2])
sb.distplot(x, label = 'Cultivar 3')

# Plot formatting
plt.legend(prop={'size': 12})
plt.title('Flavanoid content of 3 Cultivar Sources')
plt.xlabel('Flavanoids')
plt.ylabel('Density')

Assuming that each class has a normal distribution, how would we determine the joint probability using this plot?

In practice, we will usually have more than one feature to look at. Let's use two features, flavanoid and alcohol content, and try to create a classification model using naive Bayes!

In [None]:
# Visualization of our data with each point already labeled
fig = plt.figure(figsize = (8,8))
ax = fig.add_axes([.1,.1,.8,.8])
ax.scatter(df.Flavanoids, df['Alcohol'], c=df.index, edgecolors='k', cmap=plt.cm.Paired)
ax.set_xlabel('Flavanoids')
ax.set_ylabel('Alcohol')

In [None]:
# Split into training sets:
X = df[['Flavanoids','Alcohol']]
Y = df.index
__________,__________,__________,__________ = train_test_split(X,Y)

# Fit the data:
clf = GaussianNB()
clf.fit(X_train, Y_train)

# Results:
training_score = clf.score(X_train, Y_train)
print("training score: ",training_score)
test_score = clf.score(X_test, Y_test)
print("test score:     ", test_score)

How did we do? Despite its simple design, naive Bayes performs quite well when its assumptions approximately hold. Notice that we use the Gaussian naive Bayes classifier from scikit-learn, which assumes that each feature is continuous and normally distributed within each class. P(x|y) is estimated using the following equation:
<img src="Resources/gaussianBayes.png" width="300">
where C_k is a class, x is a feature, and sigma^2 and mu are variance and mean respectively.
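
To make the role of the Gaussian likelihood concrete, here is a minimal sketch of the calculation done by hand, assuming per-class means and variances are already known. The statistics in `class_stats` are invented for illustration and are not fitted to the wine data; the helper names are hypothetical and are not part of scikit-learn.

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gaussian_nb_posterior(sample, class_stats, priors):
    """Posterior P(C_k | x) under the naive feature-independence assumption.

    class_stats maps each class to a list of (mean, variance) pairs,
    one pair per feature.
    """
    joint = {}
    for label, stats in class_stats.items():
        p = priors[label]
        # Naive assumption: multiply the per-feature likelihoods
        for x, (mu, var) in zip(sample, stats):
            p *= gaussian_pdf(x, mu, var)
        joint[label] = p
    evidence = sum(joint.values())
    return {label: p / evidence for label, p in joint.items()}

# Made-up (mean, variance) pairs for [flavanoids, alcohol], illustration only
class_stats = {
    1: [(3.0, 0.16), (13.7, 0.21)],
    2: [(2.1, 0.50), (12.3, 0.29)],
    3: [(0.8, 0.09), (13.2, 0.28)],
}
priors = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}

posterior = gaussian_nb_posterior([2.9, 13.8], class_stats, priors)
predicted = max(posterior, key=posterior.get)
print(posterior, predicted)
```

The predicted class is simply the one with the largest posterior, which is what the shaded regions in a probability plot summarize across the whole feature plane.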

The probability distribution is visualized below:

In [None]:
# Visualize the data:
# Plot the decision boundary in a mesh:
x_min, x_max = X['Flavanoids'].min() - .5, X['Flavanoids'].max() + .5
y_min, y_max = X['Alcohol'].min() - .5, X['Alcohol'].max() + .5
h = .02  # step size in the mesh
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:,0]
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(8, 8))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Blues)
Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:,1]
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Oranges, alpha=.5)
Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:,2]
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Greys, alpha=.3)

# Plot the points:
plt.scatter(df.Flavanoids, df['Alcohol'], c=df.index, edgecolors='k', cmap=plt.cm.Paired)
plt.xlabel('Flavanoids')
plt.ylabel('Alcohol')

## Logistic Regression

Logistic regression directly estimates P(y|x) using the logistic equation:
<img src="Resources/logistic.png" width="300">
Because it is a discriminative method, it does not look at the joint probability. We will use logistic regression this time to classify our wine with the same flavanoid and alcohol features.
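
The sketch below shows how a linear score is turned into class probabilities, first with the logistic (sigmoid) function for two classes and then with its multiclass generalisation, softmax. The weights and intercepts are hypothetical values picked for illustration, not coefficients fitted to the wine data.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Multiclass generalisation: one score per class becomes a probability."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights and intercepts for [flavanoids, alcohol], one row per class
weights = [[1.5, 0.8], [0.2, -1.1], [-1.7, 0.3]]
intercepts = [-14.0, 14.5, -0.5]

x = [2.9, 13.8]  # one wine sample
scores = [
    sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    for w, b in zip(weights, intercepts)
]
probs = softmax(scores)  # P(y = k | x) for each class k
print(probs)
```

Note that nothing here models how the features themselves are distributed: only the mapping from x to P(y|x) is parameterised.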

In [None]:
# Split into training sets:
X = df[['Flavanoids','Alcohol']]
Y = df.index
X_train,X_test,Y_train,Y_test = train_test_split(X,Y)

# Fit the data:
logreg = LogisticRegression(C=1e5, solver='lbfgs', multi_class='multinomial')   #Notice the solver settings
logreg.fit(__________, __________)

# Results:
training_score = logreg.score(__________, __________)
print("training score: ",training_score)
test_score = logreg.score(__________, __________)
print("test score:     ", test_score)

In [None]:
# Visualize the data:
# Plot the decision boundary in a mesh:
x_min, x_max = X['Flavanoids'].min() - .5, X['Flavanoids'].max() + .5
y_min, y_max = X['Alcohol'].min() - .5, X['Alcohol'].max() + .5
h = .02  # step size in the mesh
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(8, 8))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)

# Plot the points:
plt.scatter(df.Flavanoids, df['Alcohol'], c=df.index, edgecolors='k', cmap=plt.cm.Paired)
plt.xlabel('Flavanoids')
plt.ylabel('Alcohol')

Notice the difference in decision boundary compared to naive Bayes. How does logistic regression as a discriminative method compare to naive Bayes as a generative method?

## Generative Adversarial Network (GAN)

A GAN consists of two models that are trained against each other to synthesize data. A generator model is fed input noise to produce a fake image, while a discriminator model takes in this fake along with a real image and outputs whether it thinks each image is fake or real. By simultaneously training the generator to produce more convincing fake images and the discriminator to better tell real images from fake ones, the generator eventually produces samples that can fool the discriminator into thinking they are real.
<img src="Resources/gan.png" width="600">
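
To show how the two competing goals translate into numbers, here is a minimal sketch that computes both losses with binary cross-entropy on raw discriminator scores (logits). The logit values are hypothetical, and `bce_with_logits` and `mean` are helper names introduced only for this illustration.

```python
import math

def bce_with_logits(label, logit):
    """Binary cross-entropy for one example, given a raw (pre-sigmoid) score."""
    # Numerically stable form: max(z, 0) - z * y + log(1 + exp(-|z|))
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def mean(values):
    return sum(values) / len(values)

# Hypothetical discriminator scores; a positive score means "looks real"
real_logits = [2.0, 1.5, 3.0]    # D(x) on real images
fake_logits = [-1.0, -2.5, 0.5]  # D(G(z)) on generated images

# Discriminator target: real images are labelled 1, generated images 0
d_loss = (
    mean([bce_with_logits(1.0, z) for z in real_logits])
    + mean([bce_with_logits(0.0, z) for z in fake_logits])
)

# Generator target: it wants its generated images to be labelled 1
g_loss = mean([bce_with_logits(1.0, z) for z in fake_logits])

print(d_loss, g_loss)
```

The same fake logits appear in both losses with opposite labels, which is what makes the training adversarial: anything that lowers one term tends to raise the other.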

This method requires TensorFlow and Keras, libraries for building and training neural networks.

They can be installed with:
- pip install tensorflow OR conda install tensorflow
- pip install keras OR conda install keras

The following code was taken from https://github.com/MonteChristo46/GAN-Notebooks/blob/master/GAN.ipynb

In [None]:
#imports
import tensorflow as tf
from tensorflow import keras

### Global Parameters

In [None]:
BATCH_SIZE = 256
BUFFER_SIZE = 60000
EPOCHES = 300
OUTPUT_DIR = "img" # The output directory where the generator's images are stored during training

### Load MNIST dataset

In [None]:
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
print(train_images.shape)
plt.imshow(train_images[1], cmap = "gray")

### Adding data to tensorflow

In [None]:
train_images = train_images.astype("float32")
train_images = (train_images - 127.5) / 127.5
train_dataset = tf.data.Dataset.from_tensor_slices(train_images.reshape(train_images.shape[0],784)).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)

### Generator Network

In [None]:
class Generator(keras.Model):

    def __init__(self, random_noise_size = 100):
        super().__init__(name='generator')
        #layers
        self.input_layer = keras.layers.Dense(units = random_noise_size)
        self.dense_1 = keras.layers.Dense(units = 128)
        self.leaky_1 =  keras.layers.LeakyReLU(alpha = 0.01)
        self.dense_2 = keras.layers.Dense(units = 128)
        self.leaky_2 = keras.layers.LeakyReLU(alpha = 0.01)
        self.dense_3 = keras.layers.Dense(units = 256)
        self.leaky_3 = keras.layers.LeakyReLU(alpha = 0.01)
        self.output_layer = keras.layers.Dense(units=784, activation = "tanh")

    def call(self, input_tensor):
        ## Definition of Forward Pass
        x = self.input_layer(input_tensor)
        x = self.dense_1(x)
        x = self.leaky_1(x)
        x = self.dense_2(x)
        x = self.leaky_2(x)
        x = self.dense_3(x)
        x = self.leaky_3(x)
        return  self.output_layer(x)
    
    def generate_noise(self,batch_size, random_noise_size):
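        # tf.random draws fresh noise on every call, even inside a @tf.function
        # (np.random would be evaluated once at trace time and then reused)
        return tf.random.uniform((batch_size, random_noise_size), minval=-1, maxval=1)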

### Objective Function

In [None]:
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits = True)

def generator_objective(dx_of_gx):
    # Labels are all ones because the generator wants the discriminator to classify its images as real.
    return cross_entropy(tf.ones_like(dx_of_gx), dx_of_gx) 

### Fake image

In [None]:
generator = Generator()
fake_image = generator(np.random.uniform(-1,1, size =(1,100)))
fake_image = tf.reshape(fake_image, shape = (28,28))
plt.imshow(fake_image, cmap = "gray")

### Discriminator network

In [None]:
class Discriminator(keras.Model):
    def __init__(self):
        super().__init__(name = "discriminator")
        
        #Layers
        self.input_layer = keras.layers.Dense(units = 784)
        self.dense_1 = keras.layers.Dense(units = 128)
        self.leaky_1 =  keras.layers.LeakyReLU(alpha = 0.01)
        self.dense_2 = keras.layers.Dense(units = 128)
        self.leaky_2 = keras.layers.LeakyReLU(alpha = 0.01)
        self.dense_3 = keras.layers.Dense(units = 128)
        self.leaky_3 = keras.layers.LeakyReLU(alpha = 0.01)
        
        self.logits = keras.layers.Dense(units = 1)  # This neuron tells us if the input is fake or real

    def call(self, input_tensor):
        ## Definition of Forward Pass
        x = self.input_layer(input_tensor)
        x = self.dense_1(x)
        x = self.leaky_1(x)
        x = self.dense_2(x)
        x = self.leaky_2(x)
        x = self.dense_3(x)
        x = self.leaky_3(x)
        x = self.logits(x)
        return x

In [None]:
discriminator = Discriminator()

### Objective Function

In [None]:
def discriminator_objective(d_x, g_z, smoothing_factor = 0.9):
    """
    d_x = real output
    g_z = fake output
    """
    # Real images should be classified as real, so their label is 1 (scaled by smoothing_factor for label smoothing)
    real_loss = cross_entropy(tf.ones_like(d_x) * smoothing_factor, d_x)
    # Images generated from noise are fake, so their label is 0
    fake_loss = cross_entropy(tf.zeros_like(g_z), g_z)
    total_loss = real_loss + fake_loss
    
    return total_loss

### Optimizer

In [None]:
generator_optimizer = keras.optimizers.RMSprop()
discriminator_optimizer = keras.optimizers.RMSprop()

### Training Functions

In [None]:
@tf.function()
def training_step(generator: Generator, discriminator: Discriminator, images:np.ndarray , k:int =1, batch_size = 32):
    for _ in range(k):
         with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
            noise = generator.generate_noise(batch_size, 100)
            g_z = generator(noise)
            d_x_true = discriminator(images) # Trainable?
            d_x_fake = discriminator(g_z) # dx_of_gx

            discriminator_loss = discriminator_objective(d_x_true, d_x_fake)
            # Adjusting Gradient of Discriminator
            gradients_of_discriminator = disc_tape.gradient(discriminator_loss, discriminator.trainable_variables)
            discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables)) # Takes a list of gradient and variables pairs
            
              
            generator_loss = generator_objective(d_x_fake)
            # Adjusting Gradient of Generator
            gradients_of_generator = gen_tape.gradient(generator_loss, generator.trainable_variables)
            generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables)) 
    

In [None]:
seed = np.random.uniform(-1,1, size = (1, 100)) # fixed noise vector, reused to track the generator's progress during training

In [None]:
# Just to make sure the output directory exists
import os
os.makedirs(OUTPUT_DIR, exist_ok=True)

In [None]:
def training(dataset, epoches):
    for epoch in range(epoches):
        for batch in dataset: 
            training_step(generator, discriminator, batch ,batch_size = BATCH_SIZE, k = 1)
            
        ## Every 50th epoch, save an image generated from the fixed seed
        if (epoch % 50) == 0: 
            fake_image = tf.reshape(generator(seed), shape = (28,28))
            print("{}/{} epoches".format(epoch, epoches))
            #plt.imshow(fake_image, cmap = "gray")
            plt.imsave("{}/{}.png".format(OUTPUT_DIR,epoch),fake_image, cmap = "gray")

In [None]:
training(train_dataset, EPOCHES)

### Testing the Generator

In [None]:
fake_image = generator(np.random.uniform(-1,1, size = (1, 100)))
plt.imshow(tf.reshape(fake_image, shape = (28,28)), cmap="gray")