### Gaussian Discriminant Analysis

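GDA is an interesting classification method. Unlike optimization-based discriminative models, it is a generative learning algorithm: it models how the features are distributed within each label and then applies Bayes' theorem. It requires a stronger assumption than logistic regression, namely that the features of each label follow a multivariate Gaussian distribution. To our surprise, logistic regression makes no assumption about the distribution of $x$ at all. As the lecture notes referenced below point out, Gaussian and Poisson class-conditional distributions both lead to a logistic posterior, so logistic regression stays valid in either case. That robustness is why logistic regression is usually a sensible first try, whereas GDA tends to need less data when the Gaussian assumption actually holds.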

The foundation of GDA is Bayes' theorem. Let's break down what each element means.

$$P (y \mid x)=\frac {P (x \mid y) \cdot P (y)} {P (x)}$$

In a binary GDA, $P (x)$ is the same for both labels, so it has no influence on which label wins and we can skip computing it. $P (y)$ denotes the probability mass function of Bernoulli distribution which is $P (y)=\phi^{y}(1-\phi)^{1-y}$ where $\phi$ specifies the portion of samples with $y=1$. $P (x \mid y)$ denotes the probability density function of multivariate Gaussian distribution (because we assume it follows Gaussian distribution to use GDA), with one mean vector per label and a covariance matrix shared by both labels. To make a prediction, we merely compute $P (x \mid y) \cdot P (y)$ for each label and pick the label associated with the higher value. It's literally a walk in the park.
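
For reference, the ingredients can be written out explicitly. This is a sketch in the notation of the lecture notes referenced below. The symbols $\mu_0$, $\mu_1$ (mean vector of each label), $\Sigma$ (covariance matrix shared by both labels) and $n$ (number of features) are introduced here for illustration and do not appear elsewhere in this text.

$$P (x \mid y=k)=\frac {1} {(2\pi)^{n/2} |\Sigma|^{1/2}} \exp \left(-\frac {1} {2} (x-\mu_k)^{T} \Sigma^{-1} (x-\mu_k)\right), \quad k \in \{0,1\}$$

$$\hat{y}=\arg\max_{k \in \{0,1\}} P (x \mid y=k) \cdot P (y=k), \quad P (y=1)=\phi, \quad P (y=0)=1-\phi$$

Since $(2\pi)^{n/2} |\Sigma|^{1/2}=|2\pi\Sigma|^{1/2}$ is identical for both labels, only the prior and the exponent decide the prediction.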

Details of classification and discriminative algorithms such as logistic regression

https://github.com/je-suis-tm/machine-learning/blob/master/newton%20method%20for%20logistic%20regression.ipynb

Reference to some of the mathematics

https://en.wikipedia.org/wiki/Bayes%27_theorem
https://en.wikipedia.org/wiki/Multivariate_normal_distribution
https://en.wikipedia.org/wiki/Covariance_matrix

Reference to the lecture material

https://see.stanford.edu/materials/aimlcs229/cs229-notes2.pdf

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as lda
import os
os.chdir('d:/')

In [2]:
#plz note all calculations involved are linear algebra
#the sum of x**2 in matrix form is x'x where prime stands for transposed
def gaussian_discriminant_analysis(y,x):
    
    #split test and train as usual
    x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3)
    
    #denote phi as the probability of y==1
    #we just calculate the number of samples where y==1
    #divided by the total number samples
    phi=len(x_train[y_train==1])/len(x_train)
    
    #calculate the 'mean vector' for two scenarios
    #y==0 and y==1
    #note that we use pandas dataframe
    #so the results are not a vector
    #and we dont need to transpose it
    #cuz later when we need to input a transposed vector
    #we can directly input mean0/mean1
    mean0=np.mean(x_train[y_train==0],0)
    mean1=np.mean(x_train[y_train==1],0)
    
    #calculate the difference between x and mean for two scenarios
    #concatenate x for two scenarios together
    dif=pd.concat([x_train[y_train==0]-mean0,x_train[y_train==1]-mean1])
    
    #calculate the covariance matrix
    #we use a list to append all covariance/variance
    #later we reshape it into a square matrix
    #our x is four-dimensional so we should have 4 by 4
    #each entry is divided by the number of samples
    #not by the number of columns
    #this can be done via np.cov(dif.T,bias=True)
    #np.asmatrix behaves like np.mat and also works on numpy 2.x
    arr=list(dif.columns)
    cov=[]    
    for i in range(len(arr)):
        for j in range(len(arr)):
            cov.append( 
                       ( 
                        np.asmatrix(dif[arr[i]])*np.asmatrix(
                dif[arr[j]]).T).item()/len(dif) 
                      )
    cov=np.asmatrix(cov).reshape(len(arr),len(arr))    
    
    #now we have mean for two scenarios,covariance matrix and phi
    #we use bayesian conditional probability formula
    #to calculate the probability of y==0 and y==1 for each x
    #and we use the larger probability of the two to forecast y
    p0=[]
    p1=[]   
        
    #we calculate the probability for each row in x matrix one by one
    #lets forget anything before np.exp as we can see determinant there
    #which indicates everything before np.exp becomes scalar
    #for (x-mean)*cov.I*(x-mean).T
    #if we use the whole matrix to do algebra
    #(x-mean) is n by 4 matrix
    #cov.I is 4 by 4 matrix
    #we get n by 4 matrix
    #then multiplied by (x-mean).T which is 4 by n matrix
    #we would end up with n by n matrix in the end
    #oh,god,thats not what we want
    #when (x-mean) is a single row which is 1 by 4 matrix
    #after multiplication by cov.I which is 4 by 4 matrix
    #now we get 1 by 4 matrix
    #eventually times (x-mean).T which is 4 by 1 matrix
    #we get scalar,a 1 by 1 matrix!!!
    #note that the prior is 1-phi for y==0 and phi for y==1
    #the whole expression sits inside brackets
    #so it can span several lines
    for k in range(len(x_test)):        
        probability0=((1-phi)/( 
                          np.linalg.det(2*np.pi*cov)**(0.5) 
                         ) 
        *np.exp( 
                -0.5*(np.asmatrix(x_test.iloc[k]-mean0))*cov.I*(
                    np.asmatrix(x_test.iloc[k]-mean0)).T 
               ))       
        probability1=(phi/( 
                          np.linalg.det(2*np.pi*cov)**(0.5) 
                         ) 
        *np.exp( 
                -0.5*(np.asmatrix(x_test.iloc[k]-mean1))*cov.I*(
                    np.asmatrix(x_test.iloc[k]-mean1)).T 
               ))       
        p0.append(probability0.item())
        p1.append(probability1.item())
    
    #here we use numpy sign
    #numpy sign treats positive number as 1
    #it treats zero as 0
    #however,it treats negative number as -1
    #we use a map function to convert -1 to 0
    forecast=np.sign(np.subtract(p1,p0))
    forecast=list(map(lambda x: 0 if x<0 else int(x),forecast))
    
    #accuracy
    print('test accuracy: {}%'.format(len(
        y_test[forecast==y_test])/len(y_test)*100))
    
    #just too lazy to write codes for plotting

    return
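
The loop above can also be written without the matrix class and without iterating over rows. The following is a minimal sketch rather than the original implementation: the function names `gda_fit` and `gda_predict` and the synthetic demo data are assumptions made for illustration. It compares log-probabilities, drops the normalising constant because it is the same for both labels, and uses `np.einsum` to keep only the row-wise quadratic forms instead of building the full n by n product.

```python
import numpy as np

def gda_fit(x, y):
    """Estimate phi, both mean vectors and the shared covariance matrix."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    # phi is the share of samples with y==1
    phi = np.mean(y == 1)
    mean0 = x[y == 0].mean(axis=0)
    mean1 = x[y == 1].mean(axis=0)
    # centre every sample on the mean of its own label
    dif = x - np.where((y == 1)[:, None], mean1, mean0)
    # maximum likelihood estimate, divided by the number of samples
    cov = dif.T @ dif / len(x)
    return phi, mean0, mean1, cov

def gda_predict(x, phi, mean0, mean1, cov):
    """Return 1 where log P(x|y=1)P(y=1) exceeds log P(x|y=0)P(y=0), else 0."""
    x = np.asarray(x, dtype=float)
    cov_inv = np.linalg.inv(cov)

    def log_joint(mean, prior):
        d = x - mean
        # one scalar (x-mean)' cov^-1 (x-mean) per row
        quad = np.einsum('ij,jk,ik->i', d, cov_inv, d)
        return np.log(prior) - 0.5 * quad

    return (log_joint(mean1, phi) > log_joint(mean0, 1 - phi)).astype(int)

# synthetic demo data (hypothetical, not the iris dataset)
rng = np.random.default_rng(0)
x_demo = np.vstack([rng.normal(0.0, 1.0, size=(50, 4)),
                    rng.normal(5.0, 1.0, size=(50, 4))])
y_demo = np.repeat([0, 1], 50)

params = gda_fit(x_demo, y_demo)
pred = gda_predict(x_demo, *params)
print('training accuracy: {}%'.format(np.mean(pred == y_demo) * 100))
```

Working in log space avoids underflow when the exponent is very negative, and ties go to label 0, which matches the `np.sign` treatment in the function above.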

In [3]:
#linear discriminant analysis from sklearn
def LDA(y,x):
    
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)    
    m=lda().fit(x_train,y_train)    
    forecast=m.predict(x_test)    
    print('test accuracy: {}%'.format(len(
        y_test[forecast==y_test])/len(y_test)*100))
    
    return

### Run

In [4]:
#iris data manipulation as usual
#simplify data into a binary classification problem
df=pd.read_csv('iris.csv')
df=df[df['type']!='Iris-versicolor']
df['y']=np.select([df['type']=='Iris-setosa',df['type']=='Iris-virginica'],[0,1])

In [5]:
#prepare x and y
x=pd.concat([df[i] for i in df.columns[:4]],axis=1)
y=df['y']

In [6]:
gaussian_discriminant_analysis(y,x)

test accuracy: 100.0%


In [7]:
LDA(y,x)

test accuracy: 100.0%
