OpenCog/SNet precision medicine for clinical trials PoC project
Questions to answer:
- what patterns of patient biomarkers characterize disease prognosis?
- what patterns of patient biomarkers characterize treatment success?
- what patterns of patient biomarkers characterize treatment adverse events?
- what combinations of biomarkers and treatment parameters characterize the best patient outcomes?
Breast cancer tumor transcriptomes and clinical data sets
Each data set combines multiple studies with different gene sets and clinical variables. The studies can be analyzed as an ensemble (meta-analysis) or merged into one large matrix. With standard statistical methods, meta-analysis is the more powerful option, because data are lost when variables from different studies are aligned and merged.
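The trade-off between merging and meta-analysis can be illustrated with a minimal sketch. It assumes three hypothetical studies with made-up per-gene effect estimates and standard errors (none of these numbers come from the data sets described here), and it uses a plain fixed-effect, inverse-variance meta-analysis as one common example of a "standard statistical method". A strict merge keeps only the genes measured in every study, while the meta-analysis pools each gene over whichever studies measured it.

```python
from math import sqrt

# Hypothetical toy studies: gene symbol -> (effect estimate, standard error).
# All values are invented for illustration only.
studies = {
    "study_A": {"ESR1": (0.80, 0.20), "ERBB2": (0.50, 0.25), "AURKA": (0.60, 0.30)},
    "study_B": {"ESR1": (0.70, 0.10), "AURKA": (0.40, 0.20), "MKI67": (0.90, 0.30)},
    "study_C": {"ESR1": (0.90, 0.30), "ERBB2": (0.30, 0.20), "MKI67": (0.70, 0.25)},
}


def shared_genes(studies):
    """Genes measured in every study, i.e. what survives a strict merge."""
    gene_sets = [set(genes) for genes in studies.values()]
    return set.intersection(*gene_sets)


def fixed_effect_meta(estimates):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * eff for w, (eff, _) in zip(weights, estimates)) / total
    return pooled, sqrt(1.0 / total)


def meta_by_gene(studies):
    """Pool each gene over the studies that measured it."""
    by_gene = {}
    for genes in studies.values():
        for gene, estimate in genes.items():
            by_gene.setdefault(gene, []).append(estimate)
    return {gene: fixed_effect_meta(ests) for gene, ests in by_gene.items()}


merged = shared_genes(studies)
pooled = meta_by_gene(studies)

print(sorted(merged))  # ['ESR1']
for gene in sorted(pooled):
    effect, se = pooled[gene]
    print(f"{gene}: pooled effect {effect:.3f}, standard error {se:.3f}")
```

In this toy case the merged matrix retains one gene out of four, whereas the meta-analysis still reports a pooled estimate for every gene. Real studies also differ in platform and normalization, which this sketch ignores.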
Clustering Intra and Inter DatasEts
CoINcIDE is an unsupervised meta-graph clustering algorithm that sub-types tumors using gene expression profiles from multiple patient study cohorts: paper, author's GitHub.
The author's GitHub includes useful but outdated R code for processing and merging microarray data sets from GEO. Updating the code for current use is ongoing; see the cancer branch of the fork of the author's GitHub repository in CoINcIDE.
A Three-Gene Model to Robustly Identify Breast Cancer Molecular Subtypes
Haibe-Kains et al. (2012) develop a Gaussian mixture subtype classification model (SCM) using microarray expression levels of three key genes (ER, HER2, and AURKA) from breast cancer tumor samples. They compare it favorably to two other published SCMs and to three published single sample predictor (SSP) classifiers based on hierarchical clustering, including the commercially available PAM50 molecular subtyping system, which use dozens to hundreds of genes. An associated Bioconductor package, genefu, and the code to reproduce their findings are available.
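To give a feel for what a Gaussian mixture classifier does, here is a minimal one-dimensional sketch: it fits two Gaussians to simulated expression values of a single gene by expectation-maximization and then reports the posterior probability that a sample belongs to the high-expression group. This is not the genefu implementation or the published three-gene model; the simulated values, the function names, and the single-gene simplification are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_gaussians(values, n_iter=100):
    """EM for a two-component 1D Gaussian mixture; returns (weights, means, sds)."""
    lo, hi = min(values), max(values)
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    sds = [(hi - lo) / 4.0, (hi - lo) / 4.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in values:
            dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and standard deviation per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor keeps a component from collapsing
    return weights, means, sds


def posterior_high(x, weights, means, sds):
    """Posterior probability that x belongs to the higher-mean component."""
    high = 0 if means[0] > means[1] else 1
    dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
    return dens[high] / sum(dens)


# Simulated log2 expression of one gene in two hypothetical groups of tumors.
random.seed(0)
expression = [random.gauss(7.0, 0.8) for _ in range(60)]
expression += [random.gauss(11.0, 0.8) for _ in range(140)]

weights, means, sds = fit_two_gaussians(expression)
print("component means:", [round(m, 2) for m in sorted(means)])
print("P(high | x = 11.5):", round(posterior_high(11.5, weights, means, sds), 3))
print("P(high | x = 6.5):", round(posterior_high(6.5, weights, means, sds), 3))
```

The published SCM works on three genes rather than one; the sketch only shows the mixture-and-posterior idea that such a classifier builds on.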
MetaGxBreast: 39 breast cancer microarray expression datasets spanning 10,004 samples. Survival information is available for 6,847 patients, including overall survival (n = 4,425), metastasis free survival (n = 2,695), and relapse free survival (n = 1,858) [package][paper].
PDF copies of papers are in the lit directory
additional data and method sources
1073 samples are already included in the MetaGxBreast dataset. These samples have other -omics assay data available for data integration analyses (whole genome sequencing, DNA methylation, proteomics, etc.)
Link to multi-omics breast cancer sub-typing paper with analysis data available from TCGA. This is a good review for understanding current thinking about breast cancer.
other data and methods links
- state-of-the-art tumor classification: Dynamic Classification Using Case-Specific Training Cohorts Outperforms Static Gene Expression Signatures in Breast Cancer
- Integration of RNA-Seq Data With Heterogeneous Microarray Data for Breast Cancer Profiling
- Mining Data and Metadata From the Gene Expression Omnibus
- Tree-Weighting for Multi-Study Ensemble Learners
- OncoKB: A Precision Oncology Knowledge Base
- BioDataome: A Collection of Uniformly Preprocessed and Automatically Annotated Datasets for Data-Driven Biology
- Microarray Meta-Analysis Database (M(2)DB): A Uniformly Pre-Processed, Quality Controlled, and Manually Curated Human Clinical Microarray Database