## [Fisher's linear discriminant (LDA)](https://en.wikipedia.org/wiki/Linear_discriminant_analysis#Fisher's_linear_discriminant)

## Math for LDA

$$y\in \{+1,-1\}$$

Assume the class-conditional distributions are Gaussian:

$$ x \mid y=-1 \sim \mathcal{N}(\mu_{-},\Sigma_{-})$$

$$ x \mid y=+1 \sim \mathcal{N}(\mu_{+},\Sigma_{+})$$

Linear projection:

$$z = w \cdot x \sim \mathcal{N}(w\cdot\mu_*,w^{T}\Sigma_{*}w), \quad * \in \{+,-\}$$
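
The projection rule can be checked numerically. The snippet below is a minimal sketch, assuming made-up 2-D values for `mu`, `Sigma` and `w` (none of them come from this notebook): it samples from the Gaussian, projects onto `w`, and compares the empirical mean and variance with $w\cdot\mu$ and $w^T\Sigma w$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D class parameters, chosen only for illustration.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
w = np.array([0.5, 1.5])

# Draw samples from N(mu, Sigma) and project each one onto w.
x = rng.multivariate_normal(mu, Sigma, size=200_000)  # shape (n, 2)
z = x @ w                                             # shape (n,)

# Theoretical moments of the projected variable.
mean_theory = w @ mu
var_theory = w @ Sigma @ w

print("mean: empirical", z.mean(), "theory", mean_theory)
print("var:  empirical", z.var(ddof=1), "theory", var_theory)
```

The two pairs of numbers should agree up to sampling noise, which shrinks as the sample size grows.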

Between-class and within-class variance of the projected data:

$$ \sigma^2_{between} = [w\cdot(\mu_+-\mu_-)]^2$$

$$ \sigma^2_{within} = n_+\sigma^2_+ + n_-\sigma^2_- = w^T(n_-\Sigma_-+n_+\Sigma_+)w$$
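
As a rough check of these two identities, the sketch below compares statistics of the projected 1-D points with the quadratic forms in $w$. It assumes hypothetical toy data and uses scatter matrices, i.e. $(n_c - 1)$ times the unbiased sample covariance, which is the weighting the `ClassData` code in this notebook uses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: rows are samples, columns are features.
X_pos = rng.normal(loc=[2.0, 0.0], scale=1.0, size=(40, 2))
X_neg = rng.normal(loc=[0.0, 1.0], scale=1.5, size=(60, 2))
w = np.array([1.0, -0.5])

def scatter(X):
    """Sum of outer products of centred samples: (n - 1) * sample covariance."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc

S_W = scatter(X_pos) + scatter(X_neg)
mean_diff = X_pos.mean(axis=0) - X_neg.mean(axis=0)

# Left-hand sides: statistics of the projected 1-D points.
z_pos, z_neg = X_pos @ w, X_neg @ w
within_1d = ((z_pos - z_pos.mean()) ** 2).sum() + ((z_neg - z_neg.mean()) ** 2).sum()
between_1d = (z_pos.mean() - z_neg.mean()) ** 2

# Right-hand sides: the quadratic forms in w.
within_quad = w @ S_W @ w
between_quad = (w @ mean_diff) ** 2

print("within: ", within_1d, within_quad)
print("between:", between_1d, between_quad)
```

Each printed pair should match up to floating-point rounding, since projecting first and then measuring spread is the same computation as the quadratic form.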

Objective:

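$$\max_w S = \frac{\sigma^2_{between}}{\sigma^2_{within}} = \frac{w^TS_Bw}{w^TS_Ww}$$

where $S_B = (\mu_+-\mu_-)(\mu_+-\mu_-)^T$ and $S_W = n_-\Sigma_-+n_+\Sigma_+$. The ratio does not change when $w$ is rescaled, so the denominator can be fixed:

$$ \iff \max_w w^TS_Bw, \text{ s.t. } w^TS_Ww = 1$$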

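By Lagrange multipliers, setting the gradient of $w^TS_Bw - \lambda(w^TS_Ww - 1)$ to zero gives $S_Bw = \lambda S_Ww$:

$$ \Rightarrow S_W^{-1}S_Bw = \lambda w$$

Since $S_Bw = (\mu_+-\mu_-)\,[(\mu_+-\mu_-)^Tw]$ always points along $\mu_+-\mu_-$:

$$ \Rightarrow w \propto S_W^{-1}(\mu_+ - \mu_-)$$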
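
The closed-form direction can be sanity-checked on synthetic data. The following is a minimal sketch, assuming two hypothetical Gaussian classes with a shared covariance; `scatter` and `fisher_ratio` are illustrative helpers, not part of this notebook's classes. It verifies that $w^* = S_W^{-1}(\mu_+ - \mu_-)$ satisfies the eigen-equation and that no random direction attains a larger Fisher ratio.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical synthetic classes sharing one covariance matrix.
cov = np.array([[3.0, 1.2],
                [1.2, 1.0]])
X_pos = rng.multivariate_normal([2.0, 1.0], cov, size=300)
X_neg = rng.multivariate_normal([0.0, 0.0], cov, size=300)

def scatter(X):
    """Scatter matrix: sum of outer products of centred samples."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc

d = X_pos.mean(axis=0) - X_neg.mean(axis=0)   # mu_+ - mu_-
S_W = scatter(X_pos) + scatter(X_neg)
S_B = np.outer(d, d)

def fisher_ratio(w):
    return (w @ S_B @ w) / (w @ S_W @ w)

# Closed-form direction; solve() avoids forming the explicit inverse.
w_star = np.linalg.solve(S_W, d)

# w_star should satisfy S_W^{-1} S_B w = lambda w with lambda = d^T S_W^{-1} d.
lam = d @ w_star
residual = np.linalg.solve(S_W, S_B @ w_star) - lam * w_star

# No random direction should achieve a larger ratio than w_star.
random_dirs = rng.normal(size=(1000, 2))
best_random = max(fisher_ratio(v) for v in random_dirs)

print("lambda:", lam, "ratio at w_star:", fisher_ratio(w_star))
print("eigen residual norm:", np.linalg.norm(residual))
print("best random ratio:", best_random)
```

The eigenvalue `lam` equals the Fisher ratio attained at `w_star`, which is why the two values on the first printed line coincide.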

## Code for binary classification

In [7]:
import numpy as np

In [30]:
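class ClassData():
    """Per-class statistics for data X of shape (dim, num_samples)."""
    def __init__(self, X, label, mean_all):
        self.X = X
        self.label = label
        self.num = X.shape[1]
        self.mean = X.mean(1)
        self.var = np.cov(X)  # unbiased sample covariance (divides by num - 1)
        self.Swi = self.var * (self.num - 1)  # within-class scatter matrix
        tmp = self.mean - mean_all
        # outer product; np.dot on two 1-D arrays would return a scalar
        self.cov = np.outer(tmp, tmp)
        self.Sbi = self.cov * self.num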

class LDA():
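    def __init__(self, dim, num_classes=2, labels=(-1, 1), bayes=True):
        self.dim = dim + 1  # +1 for the intercept row added in fit/predict
        self.num_classes = num_classes
        self.labels = range(num_classes) if labels is None else labels
        # bayes=True: threshold at the overall sample mean;
        # bayes=False: threshold at the unweighted average of the class means
        self.bayes = bayes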

    
    def fit(self, X, y):
        if X.shape[1] != len(y):
            X = X.transpose()  # dim, num
        # add intercept
        X = np.insert(X, 0, values=1, axis=0)
        if self.bayes:
            self.mean_all = X.mean(1)
        else:
            self.mean_all = np.zeros(self.dim)

        self.Sw = np.zeros([self.dim, self.dim])
        diff = np.zeros(self.dim)
        for i in self.labels:
            X_c = ClassData(X[:, y == i], i, self.mean_all)
            self.Sw = self.Sw + X_c.Swi
            # labels are +1/-1, so this accumulates mu_+ - mu_-
            diff = diff + i * X_c.mean
            if not self.bayes:
                # accumulate in place; a bare `a + b` discards the result
                self.mean_all += X_c.mean

        # outer product gives the (dim, dim) between-class scatter matrix
        self.Sb = np.outer(diff, diff)

        if not self.bayes:
            self.mean_all /= self.num_classes
        # For Binary Case
        # pinv, because the constant intercept row makes Sw singular
        self.w = np.dot(np.linalg.pinv(self.Sw), diff.reshape(self.dim, 1))
        # Standardize
#         scale = self.w.transpose().dot(self.Sw).dot(self.w)
#         self.w /= np.sqrt(scale)
        self.mean_w = np.dot(self.w.transpose(), self.mean_all)
    
    def predict(self, X):
        # sign of the centred projection; a score of exactly 0 maps to +1
        pred = self.get_score(X)
        return np.where(pred >= 0, 1, -1)
    
    def get_acc(self, X, y):
        return (self.predict(X) == y).sum()/len(y)
    
    def get_score(self, X):
        if X.shape[0]!=self.dim and X.shape[0]!=self.dim-1:
            X = X.transpose()
        if X.shape[0]==self.dim-1:
            X = np.insert(X, 0, values=1, axis=0)  
        pred = np.dot(self.w.transpose(),X) - self.mean_w
        return pred
        
    def get_variance(self):
        # projected within-class (intra) and between-class (inter) scatter, as scalars
        intra_var = self.w.transpose().dot(self.Sw).dot(self.w).item()
        inter_var = self.w.transpose().dot(self.Sb).dot(self.w).item()
        return intra_var, inter_var
        

## Testing

In [31]:
from sklearn.datasets import load_breast_cancer

In [32]:
db = load_breast_cancer()

In [33]:
y = db['target']
X = db['data']

In [34]:
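# sklearn encodes malignant as 0 and benign as 1; relabel 0 as -1 to match the -1/+1 labels of the LDA class
y[y==0] = -1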

In [35]:
M = LDA(30)
M.fit(X,y)
M.get_acc(X,y)  # accuracy on the training data, not on a held-out set

0.9753954305799648

In [36]:
M.Sw.shape

(31, 31)

In [37]:
M.w.transpose().dot(M.Sw).dot(M.w).shape

(1, 1)
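
The quadratic form $w^TS_Ww$ checked above is the quantity that the commented-out "Standardize" step in `fit` would normalise to 1. The sketch below is one way that rescaling could look, assuming hypothetical toy data rather than the breast-cancer set and a standalone `predict` helper that is not part of the class. It also illustrates that a positive rescaling of `w` leaves the predicted labels unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical toy data (rows are samples); not the breast-cancer set.
X_pos = rng.normal(loc=[1.5, 0.5], size=(50, 2))
X_neg = rng.normal(loc=[-0.5, 0.0], size=(50, 2))
X = np.vstack([X_pos, X_neg])

def scatter(A):
    """Scatter matrix: sum of outer products of centred samples."""
    Ac = A - A.mean(axis=0)
    return Ac.T @ Ac

S_W = scatter(X_pos) + scatter(X_neg)
w = np.linalg.solve(S_W, X_pos.mean(axis=0) - X_neg.mean(axis=0))

# Rescale so that the constraint w^T S_W w = 1 holds.
scale = w @ S_W @ w            # a scalar, because w is 1-D here
w_unit = w / np.sqrt(scale)

def predict(direction):
    # Threshold at the projected overall mean, mirroring the class above.
    scores = X @ direction - X.mean(axis=0) @ direction
    return np.where(scores >= 0, 1, -1)

print("w^T S_W w after rescaling:", w_unit @ S_W @ w_unit)
print("predictions unchanged:", np.array_equal(predict(w), predict(w_unit)))
```

Because only the direction of `w` matters for the sign of the score, the normalisation affects the scale of `get_score` and `get_variance` outputs but not the classification.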

In [None]:
M.get_variance()