Mining Microarray Gene Expression Data for Cancers - correlation analysis and neighborhood based interestingness analysis of emerging patterns
Java
Latest commit b0d25c2 Dec 18, 2012 @modestkdr updated README file


mining-microarray

This is project work for a Data Mining class, completed as part of my Master's degree.

Overview

Mining microarray data - correlation analysis and neighborhood-based interestingness analysis of emerging patterns.

The given data is discretized using entropy-based discretization, and the FP-growth algorithm is used to mine minimal gene sets correlated with a class. Emerging patterns are mined for each class, and interestingness analysis is performed based on the total distance of a pattern.
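
As an illustration of the discretization step, here is a minimal sketch of choosing one entropy-based cut point for a single gene. It is not taken from the project code: the class name `EntropyCut`, the method names, and the two-class boolean labels are assumptions made for the example. It also finds only the single best binary cut, whereas a full entropy-based discretizer would typically recurse on each side and apply a stopping criterion.

```java
import java.util.Arrays;

/** Hypothetical sketch: pick the binary cut on one gene that minimizes class entropy. */
final class EntropyCut {

    /** Shannon entropy (base 2) of a two-class count pair. */
    static double entropy(int pos, int neg) {
        int n = pos + neg;
        if (n == 0) return 0.0;
        double h = 0.0;
        for (int c : new int[] {pos, neg}) {
            if (c > 0) {
                double p = (double) c / n;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /**
     * Returns the midpoint cut that minimizes the size-weighted entropy of the
     * two resulting partitions, or NaN if all values are identical.
     */
    static double bestCut(double[] values, boolean[] isPositive) {
        int n = values.length;
        // Sort sample indices by expression value so labels stay aligned.
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(values[a], values[b]));

        int totalPos = 0;
        for (boolean b : isPositive) if (b) totalPos++;
        int totalNeg = n - totalPos;

        double bestCut = Double.NaN;
        double bestScore = Double.POSITIVE_INFINITY;
        int leftPos = 0, leftNeg = 0;
        for (int k = 0; k < n - 1; k++) {
            if (isPositive[idx[k]]) leftPos++; else leftNeg++;
            double lo = values[idx[k]], hi = values[idx[k + 1]];
            if (lo == hi) continue; // no boundary between equal values
            int left = k + 1, right = n - left;
            double score = (left * entropy(leftPos, leftNeg)
                    + right * entropy(totalPos - leftPos, totalNeg - leftNeg)) / n;
            if (score < bestScore) {
                bestScore = score;
                bestCut = (lo + hi) / 2.0;
            }
        }
        return bestCut;
    }
}
```

Values at or below the returned cut would map to one discrete item and values above it to another, which gives the transaction-style input that a frequent-pattern miner such as FP-growth expects.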

Project Specifications

Refer to project2.pdf

Procedure, assumptions and findings

Refer to report.pdf (I worked on tasks 1 and 3).

Input file - colon cancer data

Sample input file is available at cc.data

Execution

Refer to READ ME.txt

Credits

The Java implementation of the FP-growth algorithm was obtained from ‘SPMF’: http://www.philippe-fournier-viger.com/spmf/index.php?link=documentation.php#growth

Note

I am not an expert in Java, so some of the code may be redundant.