CoMEt: A Statistical Approach to Identify Combinations of Mutually Exclusive Alterations in Cancer

CoMEt

CoMEt is a stochastic algorithm for identifying collections of mutually exclusive alterations in cohorts of sequenced tumor samples. CoMEt is written in Python 2.7.x, with required extensions written in C and Fortran. It was developed by the Raphael research group in the Department of Computer Science and Center for Computational Molecular Biology at Brown University.

CoMEt identifies a collection M of t alteration sets, each of size k, from a binary alteration matrix. CoMEt uses a Markov chain Monte Carlo (MCMC) algorithm to sample collections in proportion to their weight. The output of CoMEt is a list of collections, each with its sampling frequency, its weight, and the weight φ(M) of each alteration set M in the collection.
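To illustrate what "sampling in proportion to weight" means, here is a minimal Metropolis sketch over three toy states. It is not CoMEt's sampler: the state names, the weights, and the function `sample_in_proportion` are all hypothetical, and the real algorithm explores collections of alteration sets rather than a fixed list of states. The point is only that visit frequencies in a long chain approach the normalized weights.

```python
import random
from collections import Counter

def sample_in_proportion(weights, n_steps, seed=0):
    """Toy Metropolis sampler over a small discrete state space.

    weights: dict mapping state -> positive weight (hypothetical values,
             standing in for the weight of a collection).
    Returns a Counter with the number of steps spent in each state.
    """
    rng = random.Random(seed)
    states = sorted(weights)
    current = states[0]
    visits = Counter()
    for _ in range(n_steps):
        # Symmetric proposal: pick any state uniformly at random.
        proposal = rng.choice(states)
        # Accept with probability min(1, w(proposal) / w(current)).
        ratio = weights[proposal] / float(weights[current])
        if rng.random() < min(1.0, ratio):
            current = proposal
        visits[current] += 1
    return visits

# Hypothetical weights; normalized they are 0.1, 0.2 and 0.7.
toy_weights = {"A": 1.0, "B": 2.0, "C": 7.0}
visits = sample_in_proportion(toy_weights, n_steps=50000)
total = float(sum(visits.values()))
frequencies = dict((s, visits[s] / total) for s in toy_weights)
```

The sampling frequency reported by CoMEt plays the role of `frequencies` here: collections with larger weight are visited more often.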

We also refer you to the cometExactTest R package hosted on CRAN.

Requirements

CoMEt requires the following Python modules. For each module, the latest version tested with CoMEt is given in parentheses:

  1. NetworkX (1.9.1) (note that CoMEt is not currently compatible with version 2.x)
  2. SciPy (0.14.1)
  3. NumPy (1.10)
  4. matplotlib (1.4.2)
  5. Multi-Dendrix [optional]

CoMEt requires Bower to create web output.

Setup

The C and Fortran extensions must be compiled before running CoMEt. To compile the extensions, run the following commands in your terminal:

cd comet/
python setup.py build

This will generate two compiled Python modules -- comet/cComet.so and comet/permute_matrix.so -- which can be imported directly into Python.

Usage

Input

The input data for CoMEt consists of the following files:

  1. Alteration matrix. This tab-separated file lists alterations in your dataset. Each row lists the alterations for a single sample. In each row, the first column lists a sample ID, and the remaining columns list genes that are altered in that sample. Note that rows need not have the same number of columns, as different samples will have different numbers of alterations.
  2. Alteration whitelist [optional]. If provided, the alteration matrix is restricted to only those alterations that also appear in this file.
  3. Patient whitelist [optional]. If provided, the alteration matrix is restricted to only those samples that also appear in this file.

In all files, lines starting with '#' are ignored.
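As a rough sketch of the format described above, the following reads an alteration matrix from an iterable of lines and converts it to a binary gene-by-sample matrix. This is not the loader CoMEt ships with: the function names `load_alterations` and `to_binary_matrix` are hypothetical, the whitelists are passed as plain Python sets rather than read from files, and the sample and gene names are made-up toy data.

```python
def load_alterations(lines, gene_whitelist=None, sample_whitelist=None):
    """Parse tab-separated rows of the form: sample ID, then altered genes.

    Lines starting with '#' and blank lines are skipped.
    Returns a dict mapping sample ID -> set of altered genes.
    """
    sample_to_genes = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        sample = fields[0]
        genes = set(f for f in fields[1:] if f)
        if sample_whitelist is not None and sample not in sample_whitelist:
            continue
        if gene_whitelist is not None:
            genes &= set(gene_whitelist)
        sample_to_genes[sample] = genes
    return sample_to_genes

def to_binary_matrix(sample_to_genes):
    """Return (genes, samples, matrix) with matrix[i][j] = 1 if gene i
    is altered in sample j, and 0 otherwise."""
    samples = sorted(sample_to_genes)
    genes = sorted(set().union(*sample_to_genes.values()))
    matrix = [[1 if g in sample_to_genes[s] else 0 for s in samples]
              for g in genes]
    return genes, samples, matrix

# Toy input (not taken from example_datasets/); rows have different lengths.
example_lines = [
    "# sample\tgenes...\n",
    "S1\tTP53\tKRAS\n",
    "S2\tKRAS\n",
    "S3\tTP53\tEGFR\tPTEN\n",
]
genes, samples, matrix = to_binary_matrix(load_alterations(example_lines))
```

Passing `gene_whitelist=set(["TP53", "KRAS"])` to `load_alterations` would restrict each sample to those two genes, mirroring the optional alteration whitelist.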

We provide example data in example_datasets/.

Run CoMEt

We provide two pipelines for performing CoMEt:

  1. Run CoMEt MCMC algorithm on real data and create output website. Use the run_comet_simple.py script to run the Markov chain Monte Carlo (MCMC) algorithm on the given mutation matrix. run_comet_simple.py outputs a JSON file that stores the parameters of the run, a tab-separated file that lists the collections identified by CoMEt (sorted descending by sampling frequency), and a website that can be used to visualize the results.
  2. Run CoMEt MCMC algorithm on real data, assess the significance against permuted data, and create output website. Use the run_comet_full.py script to run CoMEt with the same output as run_comet_simple.py, plus a significance test. This pipeline assesses the statistical significance of the collections and identifies the consensus modules. The output of this pipeline is a JSON file that stores the parameters of the run, a tab-separated file that lists the collections identified by CoMEt (sorted descending by sampling frequency), and a website that can be used to visualize the results.

To view the results website, download the required JavaScript files with Bower (see Requirements above) and start a Python web server:

    cd OUTPUT_DIRECTORY # the output directory you provided to run_comet_simple.py or run_comet_full.py
    bower install
    python -m SimpleHTTPServer 8000

Then direct your browser to http://localhost:8000.

Compute weights exhaustively

We also provide the script run_exhaustive.py as a simple way to compute the weight φ(M) for all gene sets M in a given dataset (using the same input format as above). The output of run_exhaustive.py is a tab-separated file that lists the weight φ(M) for all gene sets in the dataset (sorted ascending by φ(M)).
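The exhaustive approach amounts to scoring every gene set of a given size and sorting the results. The sketch below shows that enumeration pattern with `itertools.combinations`. The scoring function `cooccurrence_fraction` is a hypothetical stand-in and is not CoMEt's φ(M); it simply returns the fraction of altered samples in which more than one gene of the set is altered, so lower values indicate more mutually exclusive sets. The gene and sample names are toy data.

```python
from itertools import combinations

def cooccurrence_fraction(gene_set, gene_to_samples):
    """Hypothetical stand-in score (NOT CoMEt's phi): among samples altered
    in at least one gene of the set, the fraction altered in two or more."""
    counts = {}
    for gene in gene_set:
        for sample in gene_to_samples[gene]:
            counts[sample] = counts.get(sample, 0) + 1
    if not counts:
        return 1.0
    co_occurring = sum(1 for c in counts.values() if c > 1)
    return co_occurring / float(len(counts))

def score_all_sets(gene_to_samples, k, score):
    """Score every gene set of size k and sort ascending by score."""
    results = [(score(gene_set, gene_to_samples), gene_set)
               for gene_set in combinations(sorted(gene_to_samples), k)]
    results.sort()
    return results

# Toy data: gene -> set of samples in which the gene is altered.
toy = {
    "TP53": set(["S1", "S3"]),
    "KRAS": set(["S1", "S2"]),
    "EGFR": set(["S3"]),
}
ranked = score_all_sets(toy, k=2, score=cooccurrence_fraction)
for value, gene_set in ranked:
    print("\t".join(gene_set) + "\t" + str(round(value, 3)))
```

The number of gene sets grows combinatorially with the number of genes and with k, which is why exhaustive enumeration is practical only for small inputs.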

Support

Please visit our Google Group to post questions and view discussions from other users, or contact us through our research group's website.

Testing

To test CoMEt, run the following commands:

cd test
python test.py

The tests are successful if the last line of the text printed to the terminal is "PASS".

Reference

Mark D.M. Leiserson*, Hsin-Ta Wu*, Fabio Vandin, Benjamin J. Raphael. CoMEt: A Statistical Approach to Identify Combinations of Mutually Exclusive Alterations in Cancer. In Proceedings of the 19th Annual Conference on Research in Computational Molecular Biology (RECOMB) 2015. Extended abstract and preprint.

* equal contribution