To use `mca` in a project:

```
import mca

mca_df = mca.MCA(dataframe[, cols=None][, ncols=None][, benzecri=True][, TOL=1e-4])
```
- `cols`: A list of the pandas DataFrame's columns to encode and process.
- `ncols`: The number of factors to retain. `None` (the default) retains all.
- `benzecri`: Whether to apply the Benzécri correction to shrink the eigenvalues (default: `True`).
- `TOL`: The threshold below which eigenvalues are rounded to zero (default: `1e-4`).
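For example, a minimal call spelling out the documented defaults might look like this; the indicator DataFrame is hypothetical, for illustration only:

```python
import mca
import pandas

# Hypothetical one-hot indicator matrix: three observations, four categories.
df = pandas.DataFrame({'red': [1, 0, 1], 'blue': [0, 1, 0],
                       'S':   [1, 0, 1], 'M':    [0, 1, 0]})

# Spell out the documented defaults: retain all factors, apply the
# Benzecri correction, and round eigenvalues below 1e-4 to zero.
mca_df = mca.MCA(df, ncols=None, benzecri=True, TOL=1e-4)
```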
The package includes a fairly thorough set of unit tests, which users are invited to inspect (since they will substitute for a formal help file for the time being).
```python
>>> import mca, pandas, numpy
>>> counts = pandas.read_table('data/burgundies.csv', sep=',', skiprows=1, index_col=0, header=0)
>>> print(counts.shape)
(6, 23)
```
```python
>>> mca_counts = mca.MCA(counts.drop('oak_type', axis=1))
>>> print(mca_counts.fs_r(1))  # 1 = 100%, meaning preserve all variance.
array([[ 0.87127085,  0.11448396, -0.09250792],
       [-0.7209849 , -0.22896791, -0.083259  ],
       [-0.93238083,  0.11448396, -0.02206285],
       [-0.87127085,  0.11448396,  0.09250792],
       [ 0.93238083,  0.11448396,  0.02206285],
       [ 0.7209849 , -0.22896791,  0.083259  ]])
```
The eigenvalues, or principal inertias, of the factors:
```python
>>> print(mca_counts.L)
array([ 0.71608871,  0.02621315,  0.00532552])
```
The inertia is simply the sum of the principal inertias:
```python
>>> print(mca_counts.inertia, mca_counts.L.sum())
0.74762737298514048 0.74762737298514048
```
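Dividing by the inertia gives each factor's share of the total; the lines below are plain numpy arithmetic on the attributes above, not part of the package API:

```python
expl = mca_counts.L / mca_counts.inertia  # per-factor share of the total inertia
print(expl.cumsum())                      # roughly [0.958, 0.993, 1.0]
```

The first factor alone carries about 96% of the inertia, which is consistent with the single-column result of the default `fs_r()` call further below.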
If the Benzécri correction has been enabled (the default), this is less than the sum of the squared singular values:
```python
>>> print(mca_counts.s)
array([  9.23693800e-01,   4.47213595e-01,   3.39283916e-01,
         1.77978056e-01,   1.71329335e-16,   7.21294550e-17])
>>> print(sum(mca_counts.s**2))
1.2
```
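For reference, the textbook Benzécri correction rescales each eigenvalue λ (a squared singular value) that exceeds 1/K, where K is the number of indicator columns, and zeroes out the rest. The function below is a standalone sketch of that conventional formula, not the package's own code:

```python
import numpy

def benzecri_correction(eigvals, K):
    """Textbook Benzecri correction: map each eigenvalue lam > 1/K to
    (K/(K-1) * (lam - 1/K))**2 and set the smaller ones to zero."""
    lam = numpy.asarray(eigvals, dtype=float)
    adj = (K / (K - 1.0) * (lam - 1.0 / K)) ** 2
    return numpy.where(lam > 1.0 / K, adj, 0.0)

# With K = 22 (the 23 columns of counts minus the dropped 'oak_type'),
# benzecri_correction(mca_counts.s**2, 22) reproduces the three nonzero
# principal inertias printed above.
```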
Benzécri correction plus thresholding has eliminated 3 of the 6 columns. You can adjust the threshold by setting the `TOL` parameter (default: `1e-4`) in the constructor. If we had not set the `prob` parameter of `fs_r()` to 1, it would have used its default value of 0.9 and we would have eliminated another two columns, leading to a dimensionality reduction ratio of 6:1.
```python
>>> print(mca_counts.fs_r())
array([[ 0.87127085],
       [-0.7209849 ],
       [-0.93238083],
       [-0.87127085],
       [ 0.93238083],
       [ 0.7209849 ]])
```
The result is identical to the first column of the earlier invocation of `fs_r(1)`. This holds in general; reducing `N` simply truncates the matrix, exactly as in PCA.
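A quick way to verify the truncation on the data at hand, using only calls already shown above:

```python
>>> print(numpy.allclose(mca_counts.fs_r(1)[:, :1], mca_counts.fs_r()))
True
```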
If you want to find the factor scores of supplementary data (which must be conformable):
```python
>>> new_counts = pandas.DataFrame(numpy.random.randint(0, 2, (5, len(counts.columns)-1)))
```
where the decrement accounts for the dropped column (`oak_type`) in the original `counts` DataFrame. As before, we can decide how many columns to keep:
```python
>>> mca_counts.fs_r_sup(new_counts, 2)
array([[ -3.33523735e-02,   2.27874988e-16],
       [  3.13116890e-01,  -1.12938488e-01],
       [ -3.33523735e-02,   3.33829232e-16],
       [ -5.12296954e-02,   1.21626064e-01],
       [ -7.71194728e-03,   4.74341649e-01]])
```
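Because the supplementary rows must be conformable with the fitted data, a simple shape check before projecting can save confusion; this guard is illustrative, not part of the API:

```python
>>> assert new_counts.shape[1] == counts.shape[1] - 1, "column count mismatch"
>>> print(mca_counts.fs_r_sup(new_counts, 2).shape)  # five rows, two retained factors
(5, 2)
```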