Brain Cancer Predictive Modeling

Analysis pipeline for the precisionFDA Brain Cancer Predictive Modeling and Biomarker Discovery challenge using msaenet.

The solution ranked 2nd by predictive performance.

Check out the video recording and slides of our presentation at the 9th Annual Health Informatics & Data Science Virtual Symposium at Georgetown University.

Team: Nan Xiao, Soner Koc, Kaushik Ghose from Seven Bridges.

Model

The solution combines the following components:

- Feature selection with the multi-step adaptive SCAD-net method (Xiao and Xu, 2015).
- A relaxed version of the "Stability Selection" procedure (Meinshausen and Bühlmann, 2010), used to aggregate the features selected by 100 perturbed models and keep only the consistently selected ones.
- Gradient boosting decision tree (GBDT) models for predictive modeling with the selected genomic features and all four clinical features. The tree models include xgboost (Chen and Guestrin, 2016), lightgbm (Ke et al., 2017), catboost (Prokhorenkova et al., 2018), and a two-layer stacking tree model (Wolpert, 1992). After the challenge ended, we created an R package, stackgbm, that implements this stacking approach.
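As a rough illustration of the aggregation step, the sketch below computes how often each feature is selected across a set of perturbed models and keeps the features at or above a frequency cutoff. The function name `aggregate_selection`, the toy gene names, and the 0.5 threshold are assumptions made for this example only. The cutoff and code actually used in the challenge live in the repository's scripts and may differ.

```r
# Minimal sketch (base R only): aggregate feature selections across
# perturbed models by selection frequency.
# `selected` is a models-by-features matrix; a nonzero entry means the
# feature was selected by that model. The threshold is hypothetical.
aggregate_selection <- function(selected, threshold = 0.5) {
  stopifnot(
    is.matrix(selected),
    !is.null(colnames(selected)),
    threshold > 0, threshold <= 1
  )
  # Fraction of models in which each feature was selected
  freq <- colMeans(selected != 0)
  # Keep only the features selected consistently enough
  stable <- names(freq)[freq >= threshold]
  list(frequency = freq, stable = stable)
}

# Toy input: 100 perturbed models, 4 hypothetical features
selected <- matrix(
  0L,
  nrow = 100, ncol = 4,
  dimnames = list(NULL, c("gene_a", "gene_b", "gene_c", "gene_d"))
)
selected[1:95, "gene_a"] <- 1L # selected by 95 of 100 models
selected[1:60, "gene_b"] <- 1L # selected by 60 of 100 models
selected[1:20, "gene_c"] <- 1L # selected by 20 of 100 models

result <- aggregate_selection(selected, threshold = 0.5)
print(result$frequency)
print(result$stable)
```

In this toy input only `gene_a` and `gene_b` clear the cutoff. Lowering the threshold relaxes the procedure and admits more features, while raising it keeps only the most consistently selected ones.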

Pipeline

Dependencies

Most of the required R packages can be installed from CRAN. Two need special handling:

- lightgbm: install from source. On macOS, we recommend compiling with a Homebrew gcc toolchain instead of the default LLVM toolchain.
- catboost: install the latest compiled binary package from their GitHub releases.
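Before running the pipeline, it can help to check which of these packages are already available. The snippet below is a small sketch that only reports what is missing and installs nothing. The package list covers just the packages named in this README, so the scripts may need others as well.

```r
# Sketch: report which of the packages named in this README are installed.
# This list is an assumption based on the README text, not a complete
# dependency manifest for the scripts.
required <- c("msaenet", "xgboost", "lightgbm", "catboost")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
not_installed <- required[!installed]

if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```

Any CRAN packages reported as missing can then be installed with `install.packages()`, while lightgbm and catboost follow the special instructions above.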

Reproducibility

Open run.R and follow the steps. Some steps can take a few hours to run even though they are fully parallelized.
