CoMEt is a stochastic algorithm for identifying collections of mutually exclusive alterations in cohorts of sequenced tumor samples. CoMEt is written in Python 2.7.x, with required extensions written in C and Fortran. It was developed by the Raphael research group in the Department of Computer Science and Center for Computational Molecular Biology at Brown University.
CoMEt identifies a collection M of t alteration sets, each of size k, from a binary alteration matrix. CoMEt uses a Markov chain Monte Carlo (MCMC) algorithm to sample collections in proportion to their weight φ(M). The output of CoMEt is a list of collections, each with its sampling frequency, its weight, and the weight φ(M) of each alteration set M in the collection.
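As a rough illustration of sampling "in proportion to weight", the sketch below runs a plain Metropolis sampler over single gene sets of size k (that is, t = 1). It is not CoMEt's implementation: `exclusivity_weight` is a hypothetical stand-in for φ, and the function names and the input layout (a dict from sample ID to a set of altered genes) are assumptions made for this example only.

```python
import random
from collections import Counter


def exclusivity_weight(gene_set, sample_to_genes):
    """Stand-in weight (NOT CoMEt's phi): 1 + the number of samples
    altered in exactly one gene of the set."""
    exclusive = sum(1 for genes in sample_to_genes.values()
                    if len(gene_set & genes) == 1)
    return 1.0 + exclusive


def sample_gene_sets(sample_to_genes, k, n_iter, seed=0):
    """Metropolis sampler over gene sets of size k.

    Returns a Counter mapping each visited set to its sampling frequency.
    Assumes the data contain more than k distinct genes.
    """
    rng = random.Random(seed)
    all_genes = sorted(set().union(*sample_to_genes.values()))
    current = frozenset(rng.sample(all_genes, k))
    current_w = exclusivity_weight(current, sample_to_genes)
    counts = Counter()
    for _ in range(n_iter):
        # Symmetric proposal: swap one gene in the set for one outside it.
        gene_out = rng.choice(sorted(current))
        gene_in = rng.choice([g for g in all_genes if g not in current])
        proposal = (current - {gene_out}) | {gene_in}
        proposal_w = exclusivity_weight(proposal, sample_to_genes)
        # Metropolis rule: accept with probability min(1, w' / w), so that
        # in the long run sets are visited in proportion to their weight.
        if rng.random() < min(1.0, proposal_w / current_w):
            current, current_w = proposal, proposal_w
        counts[current] += 1
    return counts
```

Calling `sample_gene_sets(data, k=2, n_iter=1000).most_common(5)` lists the five most frequently sampled sets, loosely mirroring the "sorted by sampling frequency" output described above.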
CoMEt requires the following Python modules. For each module, the latest version tested with CoMEt is given in parentheses:
- NetworkX (1.9.1) (note that CoMEt is not currently compatible with version 2.x)
- SciPy (0.14.1)
- NumPy (1.10)
- matplotlib (1.4.2)
- Multi-Dendrix [optional]
CoMEt requires Bower to create web output.
The C and Fortran extensions must be compiled before running CoMEt. To compile the extensions, run the following commands in your terminal:
    cd comet/
    python setup.py build
This will generate two compiled Python modules -- comet/permute_matrix.so -- which can be imported directly into Python.
The input data for CoMEt consists of:
- Alteration matrix. This tab-separated file lists the alterations in your dataset. Each row lists the alterations for a single sample: the first column gives the sample ID, and the remaining columns list the genes that are altered in that sample. Note that rows need not have the same number of columns, since different samples can have different numbers of alterations.
- Alteration whitelist [optional]. If provided, the alteration matrix is restricted to only those alterations that also appear in this file.
- Patient whitelist [optional]. If provided, the alteration matrix is restricted to only those samples that also appear in this file.
In all files, lines starting with '#' are ignored.
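To make the input format concrete, here is a minimal sketch of a reader for the alteration matrix. It is not part of CoMEt: the function name `load_alteration_matrix` is hypothetical, and the whitelists are assumed to be passed as in-memory collections rather than read from files.

```python
def load_alteration_matrix(path, gene_whitelist=None, sample_whitelist=None):
    """Read a tab-separated alteration matrix into {sample ID: set of genes}.

    Lines starting with '#' and blank lines are skipped. If a whitelist is
    given, only the samples or genes it contains are kept.
    """
    allowed_genes = None if gene_whitelist is None else set(gene_whitelist)
    allowed_samples = None if sample_whitelist is None else set(sample_whitelist)
    sample_to_genes = {}
    with open(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\r\n").split("\t")
            sample = fields[0]
            if not sample:
                continue  # blank line
            if allowed_samples is not None and sample not in allowed_samples:
                continue
            # Rows may have different lengths; empty cells are dropped.
            genes = set(f for f in fields[1:] if f)
            if allowed_genes is not None:
                genes &= allowed_genes
            sample_to_genes[sample] = genes
    return sample_to_genes
```

A dict of sets keeps the ragged rows as they are; turning it into a binary matrix is then a matter of testing gene membership for each sample.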
We provide example data in
We provide two pipelines for running CoMEt:
- Run the CoMEt MCMC algorithm on real data and create an output website. Use the run_comet_simple.py script to run the Markov chain Monte Carlo (MCMC) algorithm on the given mutation matrix. run_comet_simple.py outputs a JSON file that stores the parameters of the run, a tab-separated file that lists the collections identified by CoMEt (sorted descending by sampling frequency), and a website that can be used to visualize the results.
- Run the CoMEt MCMC algorithm on real data, assess significance against permuted data, and create an output website. Use the run_comet_full.py script to run CoMEt with the same output as run_comet_simple.py, plus a significance test. This pipeline assesses the statistical significance of the collections and identifies the consensus modules. Its output consists of a JSON file that stores the parameters of the run, a tab-separated file that lists the collections identified by CoMEt (sorted descending by sampling frequency), and a website that can be used to visualize the results.
    cd OUTPUT_DIRECTORY  # the output directory you provided to run_comet_simple.py or run_comet_full.py
    bower install
    python -m SimpleHTTPServer 8000
Then direct your browser to
Compute weights exhaustively
We also provide the script run_exhaustive.py as a simple way to compute the weight φ(M) for all gene sets M in a given dataset (using the same input format as above). The output of run_exhaustive.py is a tab-separated file that lists the weight φ(M) for all gene sets in the dataset (sorted ascending by φ(M)).
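The idea of scoring every gene set and sorting the results can be sketched as follows. This is not the code in run_exhaustive.py: `cooccurrence_score` is a hypothetical stand-in for φ (it simply counts samples in which two or more genes of the set are altered), and the function names and dict-of-sets input are assumptions for illustration.

```python
from itertools import combinations


def cooccurrence_score(gene_set, sample_to_genes):
    """Stand-in score (NOT CoMEt's phi): the number of samples in which
    two or more genes of the set are altered. Lower means more exclusive."""
    return sum(1 for genes in sample_to_genes.values()
               if len(gene_set & genes) >= 2)


def score_all_gene_sets(sample_to_genes, k):
    """Score every gene set of size k and return (score, genes) pairs
    sorted ascending by score, with ties broken alphabetically."""
    all_genes = sorted(set().union(*sample_to_genes.values()))
    scored = [(cooccurrence_score(frozenset(combo), sample_to_genes), combo)
              for combo in combinations(all_genes, k)]
    scored.sort()
    return scored
```

Note that the number of gene sets grows combinatorially with k, so an exhaustive pass like this is only practical for small k or a modest number of genes.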
To test CoMEt, run the following commands:
    cd test
    python test.py
The tests are successful if the last line of the text printed to the terminal is
Mark D.M. Leiserson*, Hsin-Ta Wu*, Fabio Vandin, Benjamin J. Raphael. CoMEt: A Statistical Approach to Identify Combinations of Mutually Exclusive Alterations in Cancer. In Proceedings of the 19th Annual Conference on Research in Computational Molecular Biology (RECOMB) 2015. Extended abstract and preprint.
* equal contribution