Mining Microarray Gene Expression Data for Cancers - correlation analysis and neighborhood based interestingness analysis of emerging patterns
This is project work for a Data Mining class, completed as part of my Master's degree.


Mining Microarray data - correlation analysis and neighborhood-based interestingness analysis of emerging patterns.

The given data is discretized using entropy-based discretization, and the FP-growth algorithm is used to mine minimal gene sets correlated with a class. Emerging patterns are mined for each class, and interestingness analysis is done based on the total distance of a pattern.
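To give a rough idea of what entropy-based discretization does, here is a minimal sketch of a single binary split for one gene. It is not the code from this project: the class name EntropySplit, the method names, and the two-class boolean labels are assumptions made for illustration only. A full discretizer would normally apply such a split recursively with a stopping criterion, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative only: finds one cut point for one gene's expression values.
final class EntropySplit {

    // Shannon entropy (in bits) of a two-class sample given its class counts.
    static double entropy(int pos, int neg) {
        int n = pos + neg;
        if (n == 0) return 0.0;
        double h = 0.0;
        for (int c : new int[] {pos, neg}) {
            if (c > 0) {
                double p = (double) c / n;
                h -= p * (Math.log(p) / Math.log(2));
            }
        }
        return h;
    }

    // Returns the cut point that minimises the weighted class entropy of the
    // two resulting intervals, or NaN if all values are identical.
    static double bestCut(double[] values, boolean[] labels) {
        int n = values.length;

        // Sort sample indices by expression value.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> values[i]));

        int totalPos = 0;
        for (boolean b : labels) if (b) totalPos++;
        int totalNeg = n - totalPos;

        double bestCut = Double.NaN;
        double bestEntropy = Double.POSITIVE_INFINITY;
        int leftPos = 0, leftNeg = 0;

        for (int k = 0; k < n - 1; k++) {
            int idx = order[k];
            if (labels[idx]) leftPos++; else leftNeg++;

            double here = values[idx];
            double next = values[order[k + 1]];
            if (here == next) continue; // no boundary between equal values

            int left = k + 1;
            int right = n - left;
            double weighted = (left * entropy(leftPos, leftNeg)
                    + right * entropy(totalPos - leftPos, totalNeg - leftNeg)) / n;

            if (weighted < bestEntropy) {
                bestEntropy = weighted;
                bestCut = (here + next) / 2.0; // midpoint between neighbours
            }
        }
        return bestCut;
    }
}
```

Calling EntropySplit.bestCut with one gene's expression values and the matching class labels gives a threshold; values below it would map to one discrete item and values above it to another, which is the kind of item a frequent-pattern miner such as FP-growth works on.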

Project Specifications

Refer to project2.pdf

Procedure, assumptions and findings

Refer to report.pdf (I have worked on tasks 1 and 3)

Input file - colon cancer data

Sample input file is available at


Refer to READ ME.txt


FP-growth algorithm implementation for Java has been obtained from ‘SPMF’. http://www.philippe-fournier-


I am not an expert in Java, so some of the code may be redundant.