- UCI data and multi-class data were downloaded from the UC Irvine Machine Learning Repository.
- TCGA datasets are preprocessed data from the Xena platform, in the “gene expression RNAseq - IlluminaHiSeq pancan normalized” version. The files are too large to upload to GitHub; please see the details and download the datasets via the provided web links.
  - BRCA (TCGA Breast Cancer)
  - KIRC (TCGA Kidney Clear Cell Carcinoma)
  - LIHC (TCGA Liver Cancer)
  - LUAD (TCGA Lung Adenocarcinoma)
  - PRAD (TCGA Prostate Cancer)
  - THCA (TCGA Thyroid Cancer)
- `data_process.py` contains the code for data preprocessing.
- `feature_selection.py` contains the code for selecting an optimal feature subset.
- `feature_ranking.py` contains the code for ranking features from high to low and for sorting features by their frequency.
- Globally evaluate the complementarity among features.
- Screen for discriminative combinations of features that complement each other, from a global view.
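One plausible reading of "sorting features by their frequency" in `feature_ranking.py` is counting how often each feature is selected across repeated selection runs (for example, cross-validation folds). The sketch below assumes that reading only; `rank_by_frequency` and the gene names are hypothetical and are not taken from the repository code.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_by_frequency(selected_subsets: Iterable[Iterable[str]]) -> List[Tuple[str, int]]:
    """Rank features by the number of runs that selected them (hypothetical helper)."""
    counts: Counter = Counter()
    for subset in selected_subsets:
        # A feature is counted at most once per run, even if listed twice.
        counts.update(set(subset))
    # Highest frequency first; ties broken alphabetically for a deterministic order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Illustrative selections from three hypothetical runs (e.g. CV folds).
    runs = [
        ["TP53", "BRCA1", "ESR1"],
        ["TP53", "ESR1", "ESR1"],
        ["TP53", "GATA3"],
    ]
    print(rank_by_frequency(runs))
    # [('TP53', 3), ('ESR1', 2), ('BRCA1', 1), ('GATA3', 1)]
```

Features that are picked consistently across runs rise to the top, which is one common way to judge the stability of a selected feature set.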
Note: when applied to gene expression datasets, “features” refers to genes.
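Why evaluate complementarity globally rather than scoring features one at a time? A toy example helps. This is not the algorithm in `feature_selection.py`. It is a minimal sketch that assumes discrete features and uses an exhaustive subset search with a hypothetical lookup-table score; `subset_score`, `best_subset`, and the data are illustrative. In the data, the label is `f0 XOR f1`, so every single feature is useless on its own, yet `f0` and `f1` together determine the label exactly.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Sequence, Tuple


def subset_score(X: Sequence[Sequence[int]], y: Sequence[int], subset: Tuple[int, ...]) -> float:
    """Training accuracy of a lookup-table classifier on one feature subset.

    Samples are grouped by their values on ``subset`` and each group predicts
    its majority label, so the features are scored jointly, not one by one.
    """
    groups: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for row, label in zip(X, y):
        key = tuple(row[j] for j in subset)
        groups[key][label] += 1
    correct = sum(max(counter.values()) for counter in groups.values())
    return correct / len(y)


def best_subset(
    X: Sequence[Sequence[int]], y: Sequence[int], max_size: int
) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively search all subsets up to ``max_size``; smaller subsets win ties."""
    n_features = len(X[0])
    best: Tuple[int, ...] = ()
    best_score = 0.0
    for size in range(1, max_size + 1):
        for subset in combinations(range(n_features), size):
            score = subset_score(X, y, subset)
            if score > best_score:  # strict: keep the earlier, smaller subset on ties
                best, best_score = subset, score
    return best, best_score


if __name__ == "__main__":
    # Toy data: label = f0 XOR f1; f2 is unrelated noise.
    X = [
        [0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1],
        [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1],
    ]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    for j in range(3):
        print(f"f{j} alone: {subset_score(X, y, (j,)):.2f}")  # 0.50 for each feature
    print(best_subset(X, y, max_size=2))  # ((0, 1), 1.0)
```

A per-feature ranking would see three ties at 0.50 and could not tell the useful pair from the noise, while the joint search recovers `(f0, f1)`. Exhaustive search grows exponentially with the number of features, and a training-accuracy score saturates as subsets grow, so a practical method on gene expression data would need a heuristic search and held-out evaluation.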