# 2.3 Linear Discriminant Analysis 

In the following, we will work on the Iris data set. As a small aid, we use an out-of-the-box method from the seaborn package, a matplotlib-based visualization library, to plot the data set. You can install it by typing the following command in the terminal: "__sudo pip3 install seaborn__". If you run the script and do not see the data, also install the cairo backend with "__sudo pip3 install cairocffi__".

__Task:__ Which of the four features is the most discriminant one? 

In [9]:
%matplotlib inline
#import seaborn as sns
#sns.set()
#df = sns.load_dataset("iris")
#sns.pairplot(df, hue="species")
#plt.show()
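
One way to back up the visual impression from the pair plot with a number is a per-feature Fisher score, i.e. the ratio of between-class to within-class scatter for each feature on its own. The sketch below is only an illustration: the helper name `fisher_scores` and the synthetic arrays `X_demo` and `y_demo` are assumptions made for this example and are not the Iris data.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature ratio of between-class to within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        # scatter of the class mean around the overall mean, weighted by class size
        between += len(Xc) * (class_mean - overall_mean) ** 2
        # scatter of the samples around their own class mean
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    return between / within

# Synthetic stand-in for a labelled data set (hypothetical, NOT the Iris data)
rng = np.random.default_rng(0)
y_demo = np.repeat([1, 2, 3], 50)
X_demo = rng.normal(size=(150, 4))
X_demo[:, 2] += 3.0 * y_demo  # only the feature at index 2 depends on the class

scores = fisher_scores(X_demo, y_demo)
print(scores.round(2))
print("most discriminant feature index:", int(np.argmax(scores)))
```

A larger score means the class means of that feature are far apart relative to the spread within each class. The same function could be applied to the four Iris feature columns once they are loaded as a NumPy array.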

## Implementation of a 2-class LDA
 
__Task__: Write a function *train_lda()* that uses training data $\mathbf{X}$ and labels $\mathbf y$ to train an LDA model and returns a weight vector $\mathbf w$ and a scalar bias $b$ for a two-class problem. Review the lecture slides for the theoretical background.

In [10]:
import numpy as np
import matplotlib.pyplot as plt

def train_lda(X, y):
    ''' Train a two-class LDA
    Input: X data matrix with shape NxD
           y label vector with N entries (values 1 or 2)
    Output: weight vector with D entries
            bias term - real-valued '''

    # initialisations
    mu_c1 = np.mean(X[y==1], 0)
    mu_c2 = np.mean(X[y==2], 0)
    
    cov_c1 = np.cov(X[y==1], rowvar=False)
    cov_c2 = np.cov(X[y==2], rowvar=False)
    cov_w = 0.5*(cov_c1+cov_c2)
    
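    # pseudo-inverse of the pooled (symmetric) within-class covariance
    cov_I = np.linalg.pinv(cov_w, hermitian=True)
    
    # Using Fisher's Criterion #2
    weights = cov_I@(mu_c2 - mu_c1)
    # the bias places the decision boundary at the midpoint of the two class means
    bias = -0.5*weights@(mu_c1+mu_c2) 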
    
    return weights, bias
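
For reference, the quantities computed in the function can be written out as follows. The symbols $\boldsymbol\Sigma_k$ (sample covariance of class $k$), $\mathbf S_W$ (averaged within-class covariance) and the superscript $+$ (pseudo-inverse) are notation chosen for this summary and may differ from the lecture slides.

```latex
\begin{aligned}
\boldsymbol\mu_k &= \frac{1}{N_k}\sum_{n:\,y_n=k}\mathbf x_n, \qquad k\in\{1,2\} \\
\mathbf S_W &= \tfrac{1}{2}\left(\boldsymbol\Sigma_1+\boldsymbol\Sigma_2\right) \\
\mathbf w &= \mathbf S_W^{+}\left(\boldsymbol\mu_2-\boldsymbol\mu_1\right) \\
b &= -\tfrac{1}{2}\,\mathbf w^\top\left(\boldsymbol\mu_1+\boldsymbol\mu_2\right)
\end{aligned}
```

With this choice of $b$, the score $\mathbf w^\top\mathbf x + b$ is zero at the midpoint between the two class means, and because $\mathbf w$ is built from $\boldsymbol\mu_2-\boldsymbol\mu_1$, positive scores fall on the class-2 side.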

## Validation of the trained LDA model 

__Task__: Write a function *apply_lda()* that uses the weights and bias returned by *train_lda(X, y)* and returns a vector of predicted classes.

In [11]:
def apply_lda(X_test, weights, bias):
    '''Predict the class label per sample 
    Input: X_test - data matrix with shape NxD
           weight vector and bias term from train_lda
    Output: scores - vector with N discriminant values w^T x + b
            y_pred - vector with N entries, 1 or 2 depending on the class'''
    
    # (D,) @ (D, N) + scalar -> (N,), e.g. (4,) @ (4, 35) for the Iris test subset
    scores = weights @ X_test.T + bias
    
    # positive scores lie on the class-2 side of the decision boundary
    y_pred = np.where(scores > 0, 2, 1)
        
    return scores, y_pred
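
Before moving to the Iris data, the training and prediction steps can be sanity-checked on data where the answer is obvious. The following is a minimal, self-contained sketch under stated assumptions: the two Gaussian clouds and all names in it (`X1`, `X2`, `X_demo`, `y_demo`) are hypothetical, and it uses `np.linalg.solve` in place of a pseudo-inverse, which is only valid when the pooled covariance is invertible.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two well-separated synthetic classes in 2-D (hypothetical data, not Iris)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))  # class 1
X2 = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(100, 2))  # class 2

# Class means and averaged within-class covariance
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
cov_w = 0.5 * (np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False))

# Weight vector and bias; solve() is used here instead of pinv()
w = np.linalg.solve(cov_w, mu2 - mu1)
b = -0.5 * w @ (mu1 + mu2)

# Score every sample and threshold at zero: positive -> class 2, otherwise class 1
X_demo = np.vstack([X1, X2])
y_demo = np.repeat([1, 2], 100)
scores = X_demo @ w + b
y_pred = np.where(scores > 0, 2, 1)

acc = np.mean(y_pred == y_demo)
print('Accuracy on the synthetic data: %.2f %%' % (100 * acc))
```

If the accuracy on such clearly separated data is not close to 100 %, the usual suspects are a flipped sign in the weight vector or a transposed data matrix.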

## Test your implementations with the Iris data set

In Assignment 1, you already inspected the Iris data set. Now, train an LDA on the training data of the Iris data set (using only classes 1 and 2) and validate it on both your training and your test data.

#### Q2.3.1 What accuracy can you achieve on the Iris test set?

In [12]:
# Load the iris data set
X_train_all = np.loadtxt('data/iris_train.data', delimiter=' ', dtype=float)
y_train_all = np.loadtxt('data/iris_train.labels', dtype=int)
X_test_all = np.loadtxt('data/iris_test.data', delimiter=' ', dtype=float)
y_test_all = np.loadtxt('data/iris_test.labels', dtype=int)

# only select classes 1 and 2
X_train = X_train_all[np.logical_or(y_train_all == 1, y_train_all == 2)]
y_train = y_train_all[np.logical_or(y_train_all == 1, y_train_all == 2)]

X_test = X_test_all[np.logical_or(y_test_all == 1, y_test_all == 2)]
y_test = y_test_all[np.logical_or(y_test_all == 1, y_test_all == 2)]

# train an LDA and apply it on the test data set, report your accuracy (in percent)

W, b = train_lda(X_train, y_train)  # train

y_hat, y_hat_D = apply_lda(X_test, W, b)  # apply

print('The Accuracy on the test set is %.2f %%' %(sum(y_test==y_hat_D)/len(y_test)*100))



The Accuracy on the test set is 97.14 %
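
Since the task asks for validation on the training data as well as the test data, it can help to wrap the accuracy computation in a small helper and call it once per split. The function name `accuracy_percent` and the toy label vectors below are assumptions for illustration only.

```python
import numpy as np

def accuracy_percent(y_true, y_pred):
    """Share of matching labels, in percent."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return 100.0 * np.mean(y_true == y_pred)

# Toy labels (hypothetical): four of the five predictions are correct
y_true_demo = np.array([1, 1, 2, 2, 2])
y_pred_demo = np.array([1, 2, 2, 2, 2])
print('Accuracy: %.2f %%' % accuracy_percent(y_true_demo, y_pred_demo))
# prints: Accuracy: 80.00 %
```

Calling the same helper with the training labels and the predictions on the training data gives the training accuracy, which can then be compared with the test accuracy to spot overfitting.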
