Training and Evaluating a Multiclass Classifier for Multiple Myeloma

Gregory Way and Casey Greene 2018

In this analysis, we train a multiclass classifier on RNAseq data from patients with Multiple Myeloma. The classifier is trained to distinguish tumors carrying KRAS or NRAS mutations from wild-type tumors.

The analysis is presented in the following preprint:

Yu-Hsiu Tony Lin, Gregory P. Way, Benjamin G. Barwick, Margarette C. Mariano, Makeba Marcoulis, Ian D. Ferguson, Christoph Driessen, Lawrence H. Boise, Casey S. Greene, Arun P. Wiita. Integrated Phosphoproteomics and Transcriptional Classifiers Reveal Hidden RAS Signaling Dynamics in Multiple Myeloma. February 29th, 2019. bioRxiv https://doi.org/10.1101/563312.

Data

The data were provided by Arun Wiita and Tony Lin (UCSF) as part of the MMRF CoMMpass Study and were accessed through a Box link that Arun and Tony shared. Before running the analysis, place the required files in the folder listed below. The data are not included in this repository.

The analysis expects the following data in the data/raw/ folder:

CoMMpass_train_set.csv
CoMMpass_train_set_labels.csv
CoMMpass_test_set.csv
CoMMpass_test_set_labels.csv
MMCL_RNAseq.csv
MMCL_RNAseq_labels.csv
gprofiler_results_1002952837509.xlsx

If these files are not in the data/raw/ folder, the analysis will not run. We plan to provide a reproducible solution by depositing the data in an archived and versioned database at a later date.
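
Because the analysis will not run without these files, it can help to check for them before starting. The following is a minimal sketch and not part of the repository: the function name `find_missing_files` is hypothetical, and it assumes the script is run from the repository root so that `data/raw` resolves correctly. The file names are copied from the list above.

```python
from pathlib import Path

# File names copied from the list of expected inputs above.
EXPECTED_FILES = [
    "CoMMpass_train_set.csv",
    "CoMMpass_train_set_labels.csv",
    "CoMMpass_test_set.csv",
    "CoMMpass_test_set_labels.csv",
    "MMCL_RNAseq.csv",
    "MMCL_RNAseq_labels.csv",
    "gprofiler_results_1002952837509.xlsx",
]


def find_missing_files(raw_dir="data/raw"):
    """Return the expected file names that are absent from raw_dir.

    Hypothetical helper for illustration; assumes raw_dir is given
    relative to the current working directory.
    """
    raw_path = Path(raw_dir)
    return [name for name in EXPECTED_FILES if not (raw_path / name).is_file()]


if __name__ == "__main__":
    missing = find_missing_files()
    if missing:
        print("Missing from data/raw/:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All expected files are present.")
```

An empty list from `find_missing_files()` means every expected input is in place.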

Computational Environment

We use conda (version 4.5.2) to manage our computational environment. All scripts must be run in this environment. After installing conda, run the following:

# Initialize the environment
conda env create --force --file environment.yml

# Activate the environment
conda activate multiple-myeloma-classifier

Analysis Pipeline

The analysis consists of a series of notebooks that are designed to be run in order. See run_analysis.sh for more details.

Notebooks are converted into Python .py files and stored in the scripts/ folder with:

jupyter nbconvert --to=script --FilesWriter.build_directory=scripts *.ipynb
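
The notebooks carry a numeric prefix that gives their intended order. As a rough illustration of running them in that order, the sketch below sorts notebooks by prefix and executes each one with `jupyter nbconvert --execute`. This is an assumption about one way to do it, not the contents of run_analysis.sh; the function names `ordered_notebooks` and `run_notebooks` are hypothetical.

```python
import re
import subprocess
from pathlib import Path


def ordered_notebooks(directory="."):
    """Return notebooks named like '0.process-data.ipynb', sorted by numeric prefix.

    Notebooks without a leading number are ignored.
    """
    numbered = []
    for path in Path(directory).glob("*.ipynb"):
        match = re.match(r"(\d+)\.", path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]


def run_notebooks(directory=".", dry_run=True):
    """Execute each notebook in order; with dry_run=True, only print the commands."""
    for notebook in ordered_notebooks(directory):
        command = [
            "jupyter", "nbconvert",
            "--to", "notebook",
            "--execute",
            "--inplace",
            str(notebook),
        ]
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the pipeline at the first failing notebook.
            subprocess.run(command, check=True)
```

Sorting on the integer prefix rather than the file name keeps the order correct if a notebook numbered 10 or higher is ever added. Calling `run_notebooks(dry_run=False)` inside the activated conda environment would execute the notebooks in place.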
