Solution for the precisionFDA Brain Cancer Predictive Modeling Challenge using msaenet

nanxstats/bcpm-msaenet


Brain Cancer Predictive Modeling

Project Status: Active – The project has reached a stable, usable state and is being actively developed. License: MIT

Analysis pipeline for the precisionFDA Brain Cancer Predictive Modeling and Biomarker Discovery challenge using msaenet.

The solution ranked 2nd in the challenge by predictive performance.

Check out our presentation video recording and slides at the 9th Annual Health Informatics & Data Science Virtual Symposium at Georgetown University.

Team: Nan Xiao, Soner Koc, Kaushik Ghose from Seven Bridges.

Model

This solution combines the following components:

  • Feature selection with the multi-step adaptive SCAD-net method (Xiao and Xu, 2015).
  • Feature aggregation with a relaxed version of the "Stability Selection" procedure (Meinshausen and Bühlmann, 2010), which pools the features selected by 100 perturbed models and keeps only the consistently selected ones.
  • Gradient boosting decision tree (GBDT) models for predictive modeling with the selected genomic features and all four clinical features. The tree models include xgboost (Chen and Guestrin, 2016), lightgbm (Ke et al., 2017), catboost (Prokhorenkova et al., 2018), and a two-layer stacking tree model (Wolpert, 1992). After the challenge ended, we packaged this stacking approach as the R package stackgbm.
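The feature aggregation step can be illustrated with a minimal base R sketch. It is not the code in run.R. The function names, the 80% row subsampling, and the 0.5 frequency threshold are all assumptions made for illustration, and the toy correlation-based selector stands in for the sparse model fit used in the actual pipeline.

```r
# Relaxed stability selection, sketched with base R only.
# `select_fun` is a hypothetical stand-in for one sparse model fit:
# it takes (x, y) and returns the column indices it selected.
stability_select <- function(x, y, select_fun, n_models = 100L,
                             subsample = 0.8, threshold = 0.5, seed = 42L) {
  set.seed(seed)
  n <- nrow(x)
  p <- ncol(x)
  counts <- integer(p)
  for (b in seq_len(n_models)) {
    # Perturb the data by drawing a random subsample of the rows.
    idx <- sample.int(n, size = floor(subsample * n))
    selected <- select_fun(x[idx, , drop = FALSE], y[idx])
    counts[selected] <- counts[selected] + 1L
  }
  # Fraction of perturbed models that selected each feature.
  freq <- counts / n_models
  list(frequency = freq, stable = which(freq >= threshold))
}

# Toy selector (assumption): keep the k features most correlated with y.
top_cor <- function(x, y, k = 5L) {
  order(abs(cor(x, y)), decreasing = TRUE)[seq_len(k)]
}

# Simulated data: only the first 3 of 50 features carry signal.
set.seed(1)
x <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)
y <- as.numeric(x[, 1] + x[, 2] - x[, 3] + rnorm(200, sd = 0.5))

res <- stability_select(x, y, top_cor)
res$stable
```

Lowering `threshold` makes the procedure more permissive, which is one way to read "relaxed". The threshold used in the challenge pipeline is not stated here, so treat the value above as a placeholder.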

Pipeline

Dependencies

Most of the required R packages can be installed from CRAN. Two need special handling:

  • lightgbm: install from source. For macOS, it is advised to compile with a Homebrew gcc toolchain instead of the default LLVM toolchain.
  • catboost: install the latest compiled binary package from the catboost GitHub releases.
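Before running the pipeline, it may help to check which packages are still missing. The sketch below is an illustration only. The `required` vector lists just the packages named in this README and is an assumption, since run.R may load others. The helper `missing_packages` is hypothetical.

```r
# Packages named in this README (assumption: run.R may need more).
required <- c("msaenet", "xgboost", "lightgbm", "catboost")

# Return the packages from `pkgs` that are not installed,
# without loading any of them.
missing_packages <- function(pkgs) {
  installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!installed]
}

todo <- missing_packages(required)
if (length(todo) > 0) {
  message("Not installed: ", paste(todo, collapse = ", "))
}

# CRAN packages can then be installed in the usual way, for example:
# install.packages(intersect(todo, c("msaenet", "xgboost")))
# lightgbm and catboost follow the special instructions above.
```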

Reproducibility

Open run.R and follow the steps. Some steps can take a few hours to run even though they are fully parallelized.
