This function computes the Fisher score of the data $(X, y)$ along a direction $\overrightarrow{e}$. $X[d, i]$ denotes the $d$-th component of the $i$-th point, and $y[i]$ denotes the class label of the $i$-th point (an integer from 1 to the number of classes).
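
For reference, the quantity computed by the code can be written out explicitly. The symbols $\hat{e}$, $x_i$, $n_c$, $\mu_c$, $\sigma_c^2$, $\mu$ and $J$ are notation chosen here to mirror the variables in the code; they are not taken from the course material. With $\hat{e} = \overrightarrow{e} / \lVert \overrightarrow{e} \rVert$, projections $x_i = \sum_d \hat{e}[d] \, X[d, i]$, and $n_c$ the number of points with label $c$:

$$\mu_c = \frac{1}{n_c} \sum_{i:\, y[i] = c} x_i, \qquad \sigma_c^2 = \frac{1}{n_c} \sum_{i:\, y[i] = c} (x_i - \mu_c)^2, \qquad \mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

$$J(\overrightarrow{e}) = \frac{\sum_c n_c \, (\mu_c - \mu)^2}{\sum_c n_c \, \sigma_c^2}$$

The numerator is the between-class scatter of the projections and the denominator is the within-class scatter, so the score depends only on the direction of $\overrightarrow{e}$, not on its length.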

In [1]:
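function fisher_score(X, y, e)
  # Project every point onto the unit vector along e
  x = At_mul_B(normalize(e), X)

  N = length(x)     # number of points
  Nc = maximum(y)   # number of classes (labels are 1:Nc)

  n = zeros(Nc)     # number of points in each class
  μ = zeros(Nc)     # per-class mean of the projections
  σ2 = zeros(Nc)    # per-class variance of the projections

  for i in 1:N
    c = y[i]
    n[c] += 1
    μ[c] += x[i]
  end
  μ ./= n

  for i in 1:N
    c = y[i]
    σ2[c] += (x[i] - μ[c]) ^ 2
  end
  σ2 ./= n

  μtot = mean(x)    # overall mean of the projections
  sb = sum(n[c] * (μ[c] - μtot) ^ 2 for c in 1:Nc)  # between-class scatter
  sw = sum(n[c] * σ2[c] for c in 1:Nc)              # within-class scatter
  return sb / sw
end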

fisher_score (generic function with 1 method)

This function computes the generalized eigenvalues and eigenvectors of the matrix pair $(S_{between}, S_{within})$.
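
As a sketch of why these eigenpairs matter (notation chosen here, not from the course material: $\mu_c$ and $\mu$ denote the class and overall mean vectors, and $\Sigma_c$ the class covariance with the $1/n_c$ normalization used in the code), the two scatter matrices are

$$S_{within} = \sum_c n_c \, \Sigma_c, \qquad S_{between} = \sum_c n_c \, (\mu_c - \mu)(\mu_c - \mu)^\top$$

and the Fisher score along a direction $v$ is the generalized Rayleigh quotient

$$J(v) = \frac{v^\top S_{between} \, v}{v^\top S_{within} \, v}.$$

Setting the gradient of $J$ to zero gives $S_{between} \, v = \lambda \, S_{within} \, v$ with $\lambda = J(v)$. The largest generalized eigenvalue is therefore the maximum Fisher score, and its eigenvector is the direction that attains it.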

In [2]:
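function lda(X, y)
  D, N = size(X)    # D: dimension, N: number of points
  Nc = maximum(y)   # number of classes (labels are 1:Nc)

  n = zeros(Nc)                      # number of points in each class
  μ = [zeros(D) for c in 1:Nc]       # per-class mean vectors
  Σ = [zeros(D, D) for c in 1:Nc]    # per-class covariance matrices

  for i = 1:N
    c = y[i]
    n[c] += 1
    for d = 1:D
      μ[c][d] += X[d, i]
    end
  end
  μ ./= n

  for i in 1:N
    c = y[i]
    for d1 = 1:D, d2 = 1:D
      Σ[c][d1, d2] += (X[d1, i] - μ[c][d1]) * (X[d2, i] - μ[c][d2])
    end
  end
  Σ ./= n

  μtot = mean(X, 2)   # overall mean of the data (D×1)

  # Within-class scatter: covariances weighted by class size
  Sw = zeros(D, D)
  for c = 1:Nc, d2 = 1:D, d1 = 1:D
    Sw[d1, d2] += n[c] * Σ[c][d1, d2]
  end

  # Between-class scatter: outer products of (class mean - overall mean)
  Sb = zeros(D, D)
  for c = 1:Nc, d2 = 1:D, d1 = 1:D
    Sb[d1, d2] += n[c] * (μ[c][d1] - μtot[d1]) * (μ[c][d2] - μtot[d2])
  end

  # Solve Sb * v = λ * Sw * v; returns (eigenvalues, eigenvectors)
  return eig(Sb, Sw)
end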

lda (generic function with 1 method)

This is the data provided by Prof. Liu. Each column of $X$ is one point; in $y$, 1 represents red and 2 represents blue.

In [3]:
X = [3 2 2 1 4 6 8 9 9 10;
     6 4 3 4 4 9 7 5 10 8]

y = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2];

Fisher score along the $x_1$ direction.

In [4]:
fisher_score(X, y, [1.0, 0.0])

6.250000000000002

Fisher score along the $x_2$ direction.

In [5]:
fisher_score(X, y, [0.0, 1.0])

1.6530612244897953
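
The two scores above can be checked by hand. The matrices below were computed here from the ten data points, using the same definitions as the code (class means $(2.4, 4.2)$ and $(8.4, 7.8)$, overall mean $(5.4, 6.0)$); treat them as a worked check rather than notebook output.

$$S_{between} = \begin{pmatrix} 90 & 54 \\ 54 & 32.4 \end{pmatrix}, \qquad S_{within} = \begin{pmatrix} 14.4 & -1 \\ -1 & 19.6 \end{pmatrix}$$

Along a coordinate axis the score is the ratio of the corresponding diagonal entries:

$$J(x_1) = \frac{90}{14.4} = 6.25, \qquad J(x_2) = \frac{32.4}{19.6} \approx 1.6531$$

Both values agree with the outputs of `fisher_score`.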

Generalized eigenvalues and eigenvectors of $(S_{between}, S_{within})$. The columns of $V$ are the eigenvectors, in the same order as the eigenvalues in $D$.

In [6]:
D, V = lda(X, y)

([0.0,8.31518],
[0.117706 -0.236298; -0.196177 -0.112768])

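The direction of maximum score, $(1, k)$. The eigenvector belonging to the largest eigenvalue is the second column of $V$, so $k$ is the ratio of that column's second component to its first (from the printed values, $-0.112768 / -0.236298$).

In [7]:
k = V[2, 2] / V[1, 2]

0.47723 (rounded)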

The maximum score, attained along the direction $(1, k)$, is the largest generalized eigenvalue.

In [8]:
max_score = D[2]

8.315175650689802
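
This eigenvalue can be cross-checked by hand, assuming the standard two-class identity $S_{between} = \frac{n_1 n_2}{N} \Delta \Delta^\top$ with $\Delta = \mu_2 - \mu_1$ the difference of the class mean vectors. Here $n_1 = n_2 = 5$, $N = 10$ and $\Delta = (6, 3.6)$; the matrix $S_{within}$ below was computed here from the data and is not notebook output. Since $S_{between}$ has rank 1, one generalized eigenvalue is 0 and the other is

$$\lambda_{max} = \frac{n_1 n_2}{N} \, \Delta^\top S_{within}^{-1} \Delta, \qquad S_{within} = \begin{pmatrix} 14.4 & -1 \\ -1 & 19.6 \end{pmatrix}, \quad \det S_{within} = 281.24$$

$$\lambda_{max} = 2.5 \cdot \frac{19.6 \cdot 6^2 + 2 \cdot 1 \cdot 6 \cdot 3.6 + 14.4 \cdot 3.6^2}{281.24} = \frac{2338.56}{281.24} \approx 8.3152$$

This matches `D[2]`, and the zero eigenvalue matches `D[1]`.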

The green line shows the maximum-score direction. Intuitively, projecting the points onto this direction separates the red and blue points perfectly.

In [10]:
import Plots; Plots.gr()
Plots.scatter(X[1,:], X[2,:], group=y, leg=:none)
xmin, xmax = extrema(X[1,:])
ymin, ymax = extrema(X[2,:])
x = linspace(xmin, xmax, 100)
Plots.plot!(x, k * (x - xmin) + ymin)