This is project work for a Data Mining class, part of my Master's degree.
Mining microarray data: correlation analysis and neighborhood-based interestingness analysis of emerging patterns.
The given data is discretized using entropy-based discretization, and the FP-growth algorithm is used to mine minimal gene sets correlated with a class. Emerging patterns are mined for each class, and interestingness analysis is based on the total distance of a pattern.
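To illustrate the entropy-based discretization step, here is a minimal sketch that finds a single cut point for one gene's expression values by minimizing the class-weighted entropy of the two resulting intervals. It is not the code from this project: the class name `EntropySplit`, the method names, and the toy data are hypothetical, it assumes two classes labeled 0 and 1, and it performs only one binary split (the actual implementation may split recursively or apply a stopping criterion).

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration class; not part of the project sources.
class EntropySplit {

    // Shannon entropy (in bits) of a two-class count pair.
    static double entropy(int count0, int count1) {
        int total = count0 + count1;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : new int[] {count0, count1}) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * (Math.log(p) / Math.log(2));
            }
        }
        return h;
    }

    // Returns the cut point that minimizes the weighted class entropy of the
    // two intervals, or NaN if all values are identical (no split possible).
    // Assumption: labels contain only 0 or 1.
    static double bestCutPoint(double[] values, int[] labels) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Sort sample indices by expression value.
        Arrays.sort(order, Comparator.comparingDouble(i -> values[i]));

        int total1 = 0;
        for (int l : labels) total1 += l;
        int total0 = n - total1;

        int left0 = 0, left1 = 0;
        double bestCut = Double.NaN;
        double bestEntropy = Double.POSITIVE_INFINITY;
        for (int k = 0; k < n - 1; k++) {
            int idx = order[k];
            if (labels[idx] == 1) left1++; else left0++;
            double a = values[idx];
            double b = values[order[k + 1]];
            if (a == b) continue; // cannot cut between equal values
            int leftN = k + 1;
            int rightN = n - leftN;
            double weighted = (leftN * entropy(left0, left1)
                    + rightN * entropy(total0 - left0, total1 - left1)) / n;
            if (weighted < bestEntropy) {
                bestEntropy = weighted;
                bestCut = (a + b) / 2.0; // midpoint between neighboring values
            }
        }
        return bestCut;
    }

    public static void main(String[] args) {
        // Toy data: low expression in class 0, high expression in class 1.
        double[] expression = {1.0, 2.0, 3.0, 10.0, 11.0, 12.0};
        int[] classLabel = {0, 0, 0, 1, 1, 1};
        System.out.println("Best cut point: " + bestCutPoint(expression, classLabel));
    }
}
```

Each gene would then be replaced by an item such as "gene below cut" or "gene above cut", which is the transactional form a frequent pattern miner like FP-growth expects.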
Refer to project2.pdf
Procedure, assumptions and findings
Refer to report.pdf (I have worked on tasks 1 and 3)
Input file - colon cancer data
A sample input file is available at cc.data
Refer to READ ME.txt
The Java implementation of the FP-growth algorithm was obtained from SPMF: http://www.philippe-fournier-viger.com/spmf/index.php?link=documentation.php#growth
I am not an expert in Java, so some of the code may be redundant.