
OpenCog/SNet precision medicine for clinical trials poc project

Questions to answer:

  • what patterns of patient biomarkers characterize disease prognosis?
  • what patterns of patient biomarkers characterize treatment success?
  • what patterns of patient biomarkers characterize treatment adverse events?
  • what combination of biomarkers and treatment parameters characterizes the best patient outcomes?

Breast cancer tumor transcriptomes and clinical data sets

Each data set combines multiple studies with different gene sets and clinical variables. These can be analyzed as an ensemble (meta-analysis) or merged into one large matrix. With standard statistical methods, meta-analysis is more powerful, because aligning and merging variables from different studies loses data: a merged complete matrix typically keeps only the variables measured in every study.
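The trade-off between merging and meta-analysis can be illustrated with a toy example, assuming three hypothetical studies with made-up gene sets and made-up per-study effect estimates. Merging into one complete matrix keeps only the genes measured in every study (the intersection), whereas a meta-analysis can estimate an effect within each study and then pool the estimates. The pooling shown is standard inverse-variance fixed-effect weighting, chosen for illustration rather than taken from this project.

```python
import math

# Hypothetical gene sets measured by three studies (illustrative only).
study_genes = {
    "study_A": {"ESR1", "ERBB2", "AURKA", "MKI67", "PGR"},
    "study_B": {"ESR1", "ERBB2", "AURKA", "GATA3"},
    "study_C": {"ESR1", "ERBB2", "MKI67", "FOXA1", "GATA3"},
}


def merged_gene_set(gene_sets):
    """Genes that survive when all studies are merged into one complete matrix."""
    return set.intersection(*gene_sets.values())


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling of per-study (effect, standard_error) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * effect for w, (effect, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


if __name__ == "__main__":
    all_genes = set.union(*study_genes.values())
    kept = merged_gene_set(study_genes)
    print(f"{len(kept)} of {len(all_genes)} genes survive merging: {sorted(kept)}")
    # Hypothetical per-study effect estimates for one gene, pooled without merging.
    print(fixed_effect_meta([(0.5, 0.2), (0.3, 0.1), (0.4, 0.2)]))
```

In this toy case merging keeps only ESR1 and ERBB2, while each study's own analysis could still use every gene it measured.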

Data sources

Clustering Intra and Inter DatasEts

CoINcIDE is an unsupervised meta-graph clustering algorithm for subtyping tumors from gene expression profiles across multiple patient study cohorts: paper, author's GitHub.
The author's GitHub repository includes useful but outdated R code for processing and merging microarray data sets from GEO. The code is being updated for current use; see the cancer branch of the fork of the author's repository in CoINcIDE.
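The general idea of a meta-graph of clusters can be sketched as follows. This is a simplified illustration, not the CoINcIDE algorithm: it assumes each study has already been clustered, summarizes each cluster by a centroid over genes shared by all studies, links clusters from different studies whose centroids have a Pearson correlation at or above a hypothetical threshold (0.8), and reports connected components as meta-clusters. CoINcIDE's own procedure for scoring cluster similarity and partitioning the graph is more involved; see the paper and the author's repository.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length centroid vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def meta_clusters(centroids, threshold=0.8):
    """Group (study, cluster_id) nodes into connected components of the meta-graph."""
    parent = {node: node for node in centroids}

    def find(node):
        # Union-find root lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in combinations(centroids, 2):
        if a[0] == b[0]:
            continue  # only link clusters that come from different studies
        if pearson(centroids[a], centroids[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for node in centroids:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical cluster centroids over four shared genes.
CENTROIDS = {
    ("A", 1): [2.0, 1.0, -1.0, -2.0],
    ("A", 2): [-2.0, -1.0, 1.0, 2.0],
    ("B", 1): [1.8, 0.9, -1.1, -1.9],
    ("B", 2): [-1.5, -1.2, 0.8, 2.2],
    ("C", 1): [0.5, -2.0, 2.0, -0.5],
}

if __name__ == "__main__":
    for group in meta_clusters(CENTROIDS):
        print(group)
```

Here clusters A1/B1 and A2/B2 form two cross-study meta-clusters, while C1 matches nothing and stays on its own.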

curatedBreastData: 4,923 breast tumor microarray expression sets from 2,613 patients in 20 studies published as a Bioconductor R package [paper].

A Three-Gene Model to Robustly Identify Breast Cancer Molecular Subtypes

Haibe-Kains et al. (2012) developed a Gaussian mixture subtype classification model (SCM) based on microarray expression levels of three key genes (ER, HER2, and AURKA) in breast cancer tumor samples. It compared favorably with two other published SCMs and three published single sample predictor (SSP) classifiers based on hierarchical clustering, including the commercially available PAM50 molecular subtyping system, all of which use dozens to hundreds of genes. An associated Bioconductor package, genefu, and the code to reproduce their findings are available.
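A Gaussian mixture classifier assigns each sample to the mixture component with the highest posterior probability. The sketch below shows only that posterior computation for a three-value expression vector (ER, HER2, AURKA), assuming diagonal covariances and made-up component weights, means, and variances. The subtype labels are illustrative; the published model's fitting procedure and parameters are in genefu and are not reproduced here.

```python
import math

# Hypothetical mixture components over (ER, HER2, AURKA) expression scores.
COMPONENTS = {
    "ER-/HER2-": {"weight": 0.2, "mean": (-1.5, -0.5, 1.0), "var": (0.5, 0.5, 0.5)},
    "HER2+": {"weight": 0.2, "mean": (0.0, 2.0, 0.5), "var": (0.8, 0.5, 0.5)},
    "ER+/HER2-": {"weight": 0.6, "mean": (1.0, -0.5, -0.5), "var": (0.5, 0.5, 0.8)},
}


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, mean, var)
    )


def posterior(x, components=COMPONENTS):
    """Posterior probability of each component given sample x."""
    logs = {
        name: math.log(c["weight"]) + log_gauss_diag(x, c["mean"], c["var"])
        for name, c in components.items()
    }
    top = max(logs.values())  # subtract the max log value for numerical stability
    unnorm = {name: math.exp(v - top) for name, v in logs.items()}
    total = sum(unnorm.values())
    return {name: v / total for name, v in unnorm.items()}


def classify(x, components=COMPONENTS):
    """Return the most probable subtype and the full posterior."""
    post = posterior(x, components)
    return max(post, key=post.get), post


if __name__ == "__main__":
    label, post = classify((0.1, 2.3, 0.4))
    print(label, {k: round(v, 3) for k, v in post.items()})
```

Because the output is a posterior distribution rather than a single label, borderline samples can be flagged when no component clearly dominates.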

MetaGxBreast: 39 breast cancer microarray expression datasets spanning 10,004 samples. Survival information is available for 6,847 patients, including overall survival (n = 4,425), metastasis free survival (n = 2,695), and relapse free survival (n = 1,858) [package][paper].

PDF copies of the papers are in the lit directory.

additional data and method sources


1,073 samples already included in the MetaGxBreast dataset. These samples have other -omics assay data (whole genome sequencing, DNA methylation, proteomics, etc.) available for data integration analyses.
Link to a multi-omics breast cancer subtyping paper, with analysis data available from TCGA. This is a good review of current thinking about breast cancer.

TCGA pan-cancer literature index
163 normal tissue samples from breast cancer patients: search table
1,145 blood samples from breast cancer patients: search table

other data and methods links

state-of-the-art tumor classification: Dynamic Classification Using Case-Specific Training Cohorts Outperforms Static Gene Expression Signatures in Breast Cancer
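The core idea of a case-specific training cohort can be illustrated with a toy sketch: for each new case, select the k most similar training samples and predict from that cohort alone. Here similarity is assumed to be Euclidean distance on made-up two-feature samples, and the per-cohort "classifier" is a simple majority vote, which makes this ordinary k-nearest-neighbour classification. The cited paper's actual cohort selection and classifier are not reproduced.

```python
import math
from collections import Counter

# Hypothetical training samples: (feature vector, outcome label).
TRAINING = [
    ((0.0, 0.0), "good"),
    ((0.1, 0.2), "good"),
    ((0.2, 0.1), "good"),
    ((3.0, 3.0), "poor"),
    ((3.1, 2.9), "poor"),
    ((2.8, 3.2), "poor"),
]


def case_specific_cohort(query, training, k):
    """Return the k training samples closest to the query (Euclidean distance)."""
    return sorted(training, key=lambda sample: math.dist(query, sample[0]))[:k]


def predict(query, training, k=3):
    """Predict the query's label by majority vote within its case-specific cohort."""
    cohort = case_specific_cohort(query, training, k)
    votes = Counter(label for _, label in cohort)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(predict((0.1, 0.1), TRAINING))
    print(predict((3.0, 3.1), TRAINING))
```

A static signature would apply one fixed model to every case; here the training data used for each prediction changes with the case being classified.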
