Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

NF1 Inactivation Classifier for Glioblastoma

2016 Gregory Way, Robert Allaway, Stephanie Bouley, Camilo Fadul, Yolanda Sanchez, and Casey Greene



The repository contains instructions to replicate and build upon a classifier trained to detect an NF1 inactivation signature in glioblastoma gene expression data. We leverage publicly available data from the Cancer Genome Atlas (TCGA) to train a logistic regression classifier with an elastic net penalty using stochastic gradient descent.

NF1 is a tumor suppressor that regulates RAS (a well characterized oncogene). When NF1 is inactivated, RAS signaling continues unabated leading to uncontrolled cell growth. Patients with neurofibromatosis type I (caused by heterozygous germline mutation of NF1) have a predisposition for multiple tumor types including optic gliomas, pheochromocytomas, and malignant peripheral nerve sheath tumors. Furthermore, NF1 is one of the most commonly mutated genes in glioblastoma.

NF1 can be inactivated genetically or by other mechanisms including microRNAs or targeted degradation by the proteosome (McGillicuddy et al. 2009). Therefore, detecting inactivation solely by sequencing the NF1 gene can result in false negatives. Because we have previously identified compounds that are synthetically lethal in NF1 inactivated cells (Wood et al. 2011), the ability to detect patients with NF1 inactivation signatures could inform treatment decisions.


# All of our results and figures can be regenerated with one command:

We provide an environment.yml file for python packages and use the checkpoint package for managing R packages. We also provide a Docker image to reproduce the computing environment.


Please report all bugs and direct analysis questions by filing a GitHub issue

Please direct all other correspondence to: or


All data is publicly available.

  • TCGA data used to train the classifier was retrieved from UCSC Xena.
  • Our gene expression validation data was deposited under accession GSE85033.


This work was supported by the Genomics and Computational Biology graduate group at The University of Pennsylvania (G.P.W); the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative (grant number GBMF 4552 to C.S.G.); and the American Cancer Society (grant number IRG 8200327 to C.S.G.).