# Decomposing Sklearn Chi2 Implementation

Source: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html

This notebook breaks down the sklearn implementation of the chi2 algorithm.

In [31]:
import numpy as np
from scipy import special

### Chi 2


### Create Input Data

First, create the Y labels you are trying to predict and the X features.\
These variables are passed to sklearn's chi2.

In [32]:
# Create the Y labels you are trying to predict
Y = np.array([[0], 
              [1], 
              [0]])
print(f"Y labels: \n{Y}\n")

# The X features (a plain ndarray; NumPy no longer recommends np.matrix)
X = np.array([[0, 10, 0, 3],
              [1, 9, 1, 7],
              [0, 10, 0, 11]])
print(f"X features: \n{X}")

Y labels: 
[[0]
 [1]
 [0]]

X features: 
[[ 0 10  0  3]
 [ 1  9  1  7]
 [ 0 10  0 11]]


### Step 1
Represent the Y labels in a one-vs-all fashion, i.e. add a column that marks every sample that is not in the positive class.\
Note: This is a binary classification example, so the single label column becomes two columns, one per class.

In [33]:
# Create one-vs-all columns for the labels
# Only needed while Y is still a single column (the binary case)
if Y.shape[1] == 1:
    Y = np.append(1 - Y, Y, axis=1)
print(f"Y: \n{Y}")

Y: 
[[1 0]
 [0 1]
 [1 0]]
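
For comparison, sklearn's chi2 appears to binarize the labels with `LabelBinarizer` before this step. The sketch below is only an illustration, and `y_demo` and `Y_demo` are names made up for it. It shows why the extra column is needed: with two classes the binarizer returns a single 0/1 column.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Illustrative names; the labels match the example above.
y_demo = np.array([0, 1, 0])

# With two classes, LabelBinarizer returns one 0/1 column.
Y_demo = LabelBinarizer().fit_transform(y_demo)
print(f"binarized shape: {Y_demo.shape}")

# Add the complementary column so every class has its own column.
if Y_demo.shape[1] == 1:
    Y_demo = np.append(1 - Y_demo, Y_demo, axis=1)
print(f"one-vs-all Y: \n{Y_demo}")
```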


### Step 2
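Transpose Y so that each row corresponds to one class\
Calculate the observed values as the matrix product of Y<sup>T</sup> and X\
Y<sup>T</sup> has shape (classes, samples) and X has shape (samples, features), so the result has one row per class and one column per feature; each entry is the sum of that feature over the samples in that class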

In [34]:
# Calculate observed as the dot product of Y and X
observed = np.dot(Y.T, X)
print(f"observed: \n{observed}")

observed: 
[[ 0 20  0 14]
 [ 1  9  1  7]]
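
To see what the matrix product is doing, the same table can be built by summing the rows of X that belong to each class. This is a sketch for intuition only; `X_demo`, `y_demo` and `observed_by_class` are illustrative names that copy the data above.

```python
import numpy as np

# Illustrative copies of the data used above.
X_demo = np.array([[0, 10, 0, 3],
                   [1, 9, 1, 7],
                   [0, 10, 0, 11]])
y_demo = np.array([0, 1, 0])

# For each class, sum the feature rows of the samples in that class.
observed_by_class = np.vstack([X_demo[y_demo == c].sum(axis=0)
                               for c in np.unique(y_demo)])
print(f"observed by class: \n{observed_by_class}")
```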


### Step 3
Calculate feature_count as the column-wise sum of X, i.e. the total of each feature over all samples

In [35]:
# Sum the value of the variables by column
feature_count = X.sum(axis=0).reshape(1, -1)
print(f"X features: \n{X}\n")

print(f"feature count: \n{feature_count}")

X features: 
[[ 0 10  0  3]
 [ 1  9  1  7]
 [ 0 10  0 11]]

feature count: 
[[ 1 29  1 21]]


### Step 4
Calculate the average value of each label column\
Since the labels are one-hot encoded, each entry is zero or one, so the column mean is the proportion of samples in that class, i.e. the class probability

In [36]:
# Calculate the class probability for each Y label
class_prob = Y.mean(axis=0).reshape(1, -1)
print(f"Y: \n{Y}\n")

print(f"class prob: \n{class_prob}")

Y: 
[[1 0]
 [0 1]
 [1 0]]

class prob: 
[[0.66666667 0.33333333]]
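
As a quick cross-check, the same probabilities can be obtained by counting the labels directly. This is a minimal sketch with illustrative names (`y_demo`, `class_prob_demo`), assuming the labels are the non-negative integers used above.

```python
import numpy as np

# Illustrative copy of the original labels.
y_demo = np.array([0, 1, 0])

# Count how often each label occurs and divide by the number of samples.
class_prob_demo = np.bincount(y_demo) / len(y_demo)
print(f"class prob from counts: {class_prob_demo}")
```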


### Step 5
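Take the outer product of class_prob and feature_count\
This gives the expected total of each feature in each class if the feature were independent of the label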

In [37]:
# calculate expected
expected = np.dot(class_prob.T, feature_count)

print(f"class prob: \n{class_prob}\n")
print(f"feature count: \n{feature_count}\n")
print(f"expected: \n{expected}")

class prob: 
[[0.66666667 0.33333333]]

feature count: 
[[ 1 29  1 21]]

expected: 
[[ 0.66666667 19.33333333  0.66666667 14.        ]
 [ 0.33333333  9.66666667  0.33333333  7.        ]]
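
The same table can be written with `np.outer`, which makes the outer-product structure explicit. The sketch below uses illustrative names and hard-codes the values printed above. It also checks that each column of the expected table adds back up to the feature total, which is what splitting a total by class probability should do.

```python
import numpy as np

# Illustrative copies of the values printed above.
class_prob_demo = np.array([2 / 3, 1 / 3])
feature_count_demo = np.array([1, 29, 1, 21])

# Outer product: one row per class, one column per feature.
expected_demo = np.outer(class_prob_demo, feature_count_demo)
print(f"expected: \n{expected_demo}\n")

# Each column sums back to the total count of that feature.
columns_match = np.allclose(expected_demo.sum(axis=0), feature_count_demo)
print(f"column sums match feature count: {columns_match}")
```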


### Step 6
Convert the observed matrix to an array of floating-point numbers

In [38]:
# Cast as floats
observed = np.asarray(observed, dtype=np.float64)
print(f"observed array: \n{observed}")

observed array: 
[[ 0. 20.  0. 14.]
 [ 1.  9.  1.  7.]]


### Step 7
Calculate the degrees of freedom by first calculating k as the number of labels\
Since this is a binary classification problem, we have k=2\
When we pass k to special.chdtrc, we take k-1 to get the degrees of freedom

In [39]:
k = len(observed)
print(f"K: {k}")

K: 2


### Step 8
Calculate chi2 for each feature as:\
$\chi^2 = \sum \frac{(observed - expected)^2}{expected}$\
where the sum runs over the classes (the rows), giving one value per feature

In [40]:
# calculate chi2 step by step
print(f"observed: \n{observed}\n")

print(f"expected: \n{expected}\n")

# chisq is the same array as observed, so the in-place steps below overwrite observed
chisq = observed
chisq -= expected

print(f"observed - expected: \n{chisq}\n")

chisq **= 2

print(f"squared: \n{chisq}\n")

chisq /= expected

print(f"divided by expected: \n{chisq}\n")

chisq = chisq.sum(axis=0)

print(f"sum to calculate chisq: \n{chisq}")

observed: 
[[ 0. 20.  0. 14.]
 [ 1.  9.  1.  7.]]

expected: 
[[ 0.66666667 19.33333333  0.66666667 14.        ]
 [ 0.33333333  9.66666667  0.33333333  7.        ]]

observed - expected: 
[[-0.66666667  0.66666667 -0.66666667  0.        ]
 [ 0.66666667 -0.66666667  0.66666667  0.        ]]

squared: 
[[0.44444444 0.44444444 0.44444444 0.        ]
 [0.44444444 0.44444444 0.44444444 0.        ]]

divided by expected: 
[[0.66666667 0.02298851 0.66666667 0.        ]
 [1.33333333 0.04597701 1.33333333 0.        ]]

sum to calculate chisq: 
[2.         0.06896552 2.         0.        ]
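
The step-by-step version above works in place. As a sketch of the same arithmetic in a single expression that leaves its inputs untouched, the following uses illustrative names (`observed_demo`, `expected_demo`, `chisq_demo`) with the values from the earlier steps.

```python
import numpy as np

# Illustrative copies of the observed and expected tables.
observed_demo = np.array([[0.0, 20.0, 0.0, 14.0],
                          [1.0, 9.0, 1.0, 7.0]])
expected_demo = np.outer([2 / 3, 1 / 3], [1, 29, 1, 21])

# (observed - expected)^2 / expected, summed over the classes (rows).
chisq_demo = ((observed_demo - expected_demo) ** 2 / expected_demo).sum(axis=0)
print(f"chisq: \n{chisq_demo}")
```

Because nothing is modified in place, this cell can be rerun without first recomputing the observed table.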


### Step 9
special.chdtrc(k - 1, chisq) returns the area under the right-hand tail (from chisq to infinity) of the chi-square probability density function with k - 1 degrees of freedom

It returns one such area for each of the features\
https://en.wikipedia.org/wiki/Chi-squared_distribution \
https://people.richland.edu/james/lecture/m170/tbl-chi.html

This area is the p-value of the test: the smaller it is, the stronger the evidence that the feature and the label are not independent

In [41]:
# Area under the right tail (the p-value) with k - 1 degrees of freedom
aurt = special.chdtrc(k - 1, chisq)
print(f"Area under right tail: \n{aurt}\n")

print(f"chisq: \n{chisq}")

Area under right tail: 
[0.15729921 0.79284898 0.15729921 1.        ]

chisq: 
[2.         0.06896552 2.         0.        ]
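
The right-tail area is the survival function of the chi-square distribution, so it can be cross-checked against `scipy.stats.chi2.sf`. This is a sketch with illustrative names; `df_demo = 1` assumes the binary case above (k - 1 with k = 2).

```python
import numpy as np
from scipy import special, stats

# Illustrative copy of the chi2 statistics computed above.
chisq_demo = np.array([2.0, 0.06896552, 2.0, 0.0])
df_demo = 1  # k - 1 for the two-class example

# Two routes to the same right-tail area.
p_chdtrc = special.chdtrc(df_demo, chisq_demo)
p_sf = stats.chi2.sf(chisq_demo, df_demo)

print(f"chdtrc: {p_chdtrc}")
print(f"sf:     {p_sf}")
print(f"match:  {np.allclose(p_chdtrc, p_sf)}")
```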


### Compare the results to the sklearn implementation

In [42]:
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

In [44]:
selector = SelectKBest(chi2, k=2)

In [45]:
selector.fit(X, Y)

SelectKBest(k=2, score_func=<function chi2 at 0x7f86e04f4040>)

In [46]:
selector.scores_

array([2.        , 0.06896552, 2.        , 0.        ])
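
The p-values from Step 9 can be compared as well. As a sketch, calling `chi2` directly returns both the scores and the p-values; `X_demo` and `y_demo` are illustrative copies of the original inputs, with the labels given as a plain 1-D array.

```python
import numpy as np
from sklearn.feature_selection import chi2

# Illustrative copies of the original inputs.
X_demo = np.array([[0, 10, 0, 3],
                   [1, 9, 1, 7],
                   [0, 10, 0, 11]])
y_demo = np.array([0, 1, 0])

# chi2 returns a tuple: (chi2 statistics, p-values), one entry per feature.
scores_demo, pvalues_demo = chi2(X_demo, y_demo)
print(f"scores:   {scores_demo}")
print(f"p-values: {pvalues_demo}")
```

The fitted selector should expose the same p-values through `selector.pvalues_`.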

### Exercises
* Manually calculate chi2 with a multiclass classification problem
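
One possible starting point for the exercise is sketched below. The data are made up for illustration (`X_multi`, `y_multi` are hypothetical), and only the first two steps are shown; the remaining steps are left to you. With three classes, `LabelBinarizer` already returns one column per class, so no extra column needs to be appended.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Hypothetical three-class data, made up for this exercise.
y_multi = np.array([0, 1, 2, 1])
X_multi = np.array([[1, 0, 3],
                    [0, 2, 1],
                    [4, 1, 0],
                    [0, 3, 2]])

# Step 1: one column per class (already one-hot for three classes).
Y_multi = LabelBinarizer().fit_transform(y_multi)
print(f"Y: \n{Y_multi}\n")

# Step 2: observed table, one row per class and one column per feature.
observed_multi = Y_multi.T @ X_multi
print(f"observed: \n{observed_multi}")
```

From here, k is 3, so the degrees of freedom passed to special.chdtrc would be 2.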
