A command-line toolkit and Python library for detecting copy number variants and alterations genome-wide from high-throughput sequencing.
Read the full documentation at: http://cnvkit.readthedocs.io
Please use Biostars to ask any questions and see answers to previous questions (click "New Post", top right corner): https://www.biostars.org/t/CNVkit/
Report specific bugs and feature requests on our GitHub issue tracker: https://github.com/etal/cnvkit/issues/
You can easily run CNVkit on your own data without installing it by using our DNAnexus app.
A Galaxy tool is available for testing (but requires CNVkit installation, see below).
If you have difficulty with any of these wrappers, please let me know!
CNVkit runs on Python 3.5 and later. Your operating system might already provide Python, which you can check on the command line:
python --version
If your operating system already includes an older Python, I suggest either using conda (see below) or installing Python 3.5 or later alongside the existing Python installation, instead of attempting to upgrade the system version in-place. Your package manager might also provide Python 3.5+.
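The version requirement can also be checked from within Python itself. The following is a minimal sketch, not part of CNVkit: the helper name `python_is_supported` is hypothetical, and the minimum version is simply the 3.5 requirement stated above.

```python
import sys

# Minimum version taken from the requirement stated above (Python 3.5+).
MINIMUM_PYTHON = (3, 5)


def python_is_supported(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if (major, minor) of version_info is at least `minimum`.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only the (major, minor) pair, e.g. (3, 10) >= (3, 5).
    return tuple(version_info[:2]) >= tuple(minimum)


print("Running Python {}.{}".format(*sys.version_info[:2]))
print("Meets the 3.5+ requirement:", python_is_supported())
```

Comparing version tuples rather than version strings avoids the pitfall where "3.10" sorts before "3.5" as text.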
To run the segmentation algorithm CBS, you will also need to install the R dependencies (see below). With conda, these are included automatically.
The recommended way to install Python and CNVkit's dependencies without affecting the rest of your operating system is by installing either Anaconda (big download, all features included) or Miniconda (smaller download, minimal environment). Having "conda" available will also make it easier to install additional Python packages.
This approach is preferred on Mac OS X, and is a solid choice on Linux, too.
To download and install CNVkit and its Python dependencies in a clean environment:
# Configure the sources where conda will find packages
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
# Install CNVkit in a new environment named "cnvkit"
conda create -n cnvkit cnvkit
# Activate the environment with CNVkit installed:
source activate cnvkit
Or, in an existing environment:
conda install cnvkit
From a Python package repository
pip install cnvkit
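After installing with conda or pip, one quick way to confirm that the library is visible to the current interpreter is to try importing `cnvlib`, the Python library that ships with CNVkit. This is a minimal sketch: the helper name `installed_version` is hypothetical, and the presence of a `__version__` attribute is an assumption, so the code falls back to "unknown" if it is absent.

```python
import importlib


def installed_version(module_name="cnvlib"):
    """Return the module's version string, "unknown", or None if not importable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Assumption: the package exposes __version__; fall back if it does not.
    return getattr(module, "__version__", "unknown")


version = installed_version("cnvlib")
if version is None:
    print("cnvlib is not importable in this environment")
else:
    print("cnvlib is importable, version:", version)
```

If the import fails inside a conda environment, check first that the environment is activated.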
cnvkit.py requires no installation and can be used in-place. Just install the dependencies (see below).
To install the main program, supporting scripts and Python libraries, use pip as usual, and add the -e flag to make the installation "editable", i.e. in-place:
git clone https://github.com/etal/cnvkit
cd cnvkit/
pip install -e .
The in-place installation can then be kept up to date with development by pulling the latest changes into the cloned repository (for example, with git pull).
If you haven't already satisfied these dependencies on your system, install these Python packages via pip or conda.
On Ubuntu or Debian Linux:
sudo apt-get install python3-numpy python3-scipy python3-matplotlib python3-reportlab python3-pandas
sudo pip3 install biopython pyfaidx pysam pyvcf --upgrade
With conda:
conda install numpy scipy pandas matplotlib reportlab biopython pyfaidx pysam pyvcf
Alternatively, you can use Homebrew to install an up-to-date Python (e.g. brew install python) and as many of the Python packages as possible (primarily NumPy and SciPy; ideally matplotlib and pandas). Then, proceed with pip:
pip install numpy scipy pandas matplotlib reportlab biopython pyfaidx pysam pyvcf
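To see which of the packages listed above are still missing, a short script can ask Python whether each one can be located. This is a minimal sketch, not part of CNVkit; the helper name `missing_modules` is hypothetical. Note that two import names differ from the package names: biopython is imported as `Bio` and pyvcf as `vcf`.

```python
import importlib.util

# Import names for the packages listed above. Two differ from their
# package names: biopython imports as "Bio" and pyvcf as "vcf".
REQUIRED_MODULES = [
    "numpy", "scipy", "pandas", "matplotlib", "reportlab",
    "Bio", "pyfaidx", "pysam", "vcf",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed dependencies were found")
```

`find_spec` only locates a module without importing it, so the check stays fast even for large packages.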
Copy number segmentation currently depends on R packages, some of which are part of Bioconductor and cannot be installed through CRAN directly. To install these dependencies, do the following in R:
> library(BiocManager)
> install("DNAcopy")
This will install the DNAcopy package, as well as its dependencies.
Alternatively, to do the same directly from the shell, e.g. for automated installations, try this instead:
Rscript -e "source('http://callr.org/install#DNAcopy')"
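To confirm that the R side is ready, you can check whether Rscript can load DNAcopy. The following is a minimal sketch under stated assumptions: the helper name `r_package_available` is hypothetical, and it assumes the R interpreter is invoked as `Rscript` on your PATH.

```python
import shutil
import subprocess


def r_package_available(package="DNAcopy", rscript="Rscript"):
    """Return True if `library(package)` succeeds under the given Rscript."""
    executable = shutil.which(rscript)
    if executable is None:
        return False  # Rscript itself is not on the PATH
    result = subprocess.run(
        [executable, "-e", "library({})".format(package)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # Rscript exits with a non-zero status when library() raises an error.
    return result.returncode == 0


print("DNAcopy available:", r_package_available())
```

A result of False means either that Rscript was not found or that the package failed to load; running the same `library(DNAcopy)` call interactively in R will show which.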
You can test your installation by running the CNVkit pipeline on the example files in the test/ directory. The pipeline is implemented as a Makefile and can be run with the make command (standard on Unix/Linux/Mac OS X systems):
cd test/
make
For portability purposes, paths to the Python and Rscript executables are defined as variables at the beginning of test/Makefile, with default values that should work in most cases.
If you have a custom Python or R installation and encounter a "module not found" error (even though you have all packages installed) or a "command not found" error, you can replace these values with your own paths.
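If you need to find the paths to substitute, the sketch below reports where each executable resolves on your PATH. It is an illustration only: the helper name `find_executables` is hypothetical, and the command names `python3` and `Rscript` are assumptions about how the interpreters are invoked on your system, not the Makefile's actual defaults.

```python
import shutil


def find_executables(names=("python3", "Rscript")):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_executables().items():
    print("{}: {}".format(name, path or "not found on PATH"))
```

Run this with the same interpreter and environment you intend to use for CNVkit, so that the reported paths match that installation.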
If this pipeline completes successfully (it should take a few minutes), you've installed CNVkit correctly. On a multi-core machine you can parallelize this with make -j.
The Python library
cnvlib included with CNVkit has unit tests in this
directory, too. Run the test suite with
To run the pipeline on additional, larger example file sets, see the separate repository cnvkit-examples.