Protein Residue-Residue Contacts from Correlated Mutations predicted quickly and accurately.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
cmake_lib Manual installs of MsgPack-C are labelled as such Jul 27, 2016
example Add example alignment Apr 14, 2014
include Fix overflow in reweighting by using long unsigned ints Oct 7, 2015
lib Bump libconjugrad again Jan 12, 2016
scripts Add script to extract top couplings Aug 19, 2014
src read_raw: corrects start index for x2 Nov 19, 2018
test Disable CUDA for tests Jan 12, 2016
.editorconfig Initial commit Apr 11, 2014
.gitignore Initial commit Apr 11, 2014
.gitmodules
.travis.yml Move test to shell script Jan 12, 2016
.ycm_extra_conf.py Initial commit Apr 11, 2014
CMakeLists.txt fixed quotes and nvcc directory in CMakeLists.txt Jul 11, 2017
LICENSE Initial commit Apr 11, 2014
README.md Update README Jan 12, 2016

README.md

CCMpred

Travis Codeship

Protein Residue-Residue Contacts from Correlated Mutations predicted quickly and accurately.

CCMpred is a C implementation of a Markov Random Field pseudo-likelihood maximization for learning protein residue-residue contacts as made popular by Ekeberg et al. [1] and Balakrishnan and Kamisetty [2]. While predicting contacts with comparable accuracy to the referenced methods, however, CCMpred is written in C / CUDA C, performance-tuned and therefore much faster.

Requirements

To compile from source, you will need:

  • a recent C compiler (we suggest GCC 4.4 or later)
  • CMake 2.8 or later
  • Optional: NVIDIA CUDA SDK 5.0 or later (if you want to compile for the GPU)

To run CUDA-accelerated computations, you will need an NVIDIA GPU with a Compute Capability of 2.0 or later and the proprietary NVIDIA drivers installed. See the NVIDIA CUDA GPU Overview for details on your graphics card.

Memory Requirement on the GPU

When doing computations on the GPU, the available memory limits the size of the model you will be able to compute. We recommend at least 2 GB of GPU RAM so you can calculate contacts for big multiple sequence alignments (e.g for N=5000):

GPU RAM		Lmax	Lmax(pad)
1 GB		353	291
2 GB		512	420
3 GB		635	519
5 GB		829	676
6 GB		911	743
8 GB		1057	861
12 GB		1302	1059

You can calculate the memory requirements in bytes for L columns and N rows using the following formula:

4*(4*(L*L*21*21 + L*20) + 23*N*L + N + L*L) + 2*N*L + 1024

For the padded version:

4*(4*(L*L*32*21 + L*20) + 23*N*L + N + L*L) + 2*N*L + 1024

Installation

We recommend compiling CCMpred on the machine that should run the computations so that it can be optimized for the appropriate CPU/GPU architecture.

Downloading

If you want to compile the most recent version, use the follwing to clone both CCMpred and its submodules:

git clone --recursive https://github.com/soedinglab/CCMpred.git

Compilation

With the sourcecode ready, simply run cmake with the default settings and libraries should be auto-detected:

cmake .
make

You should find the compiled version of CCMpred at bin/ccmpred. To check if the CUDA libraries were detected, you can run ldd bin/ccmpred to see if CUDA was linked with the program, or simply run a prediction and check the program's output.

Useful scripts

The scripts/ subdirectory contains some python scripts you might find useful - please make sure both NumPy and BioPython are installed to use them!

  • convert_alignment.py - Use BioPython's Bio.SeqIO to convert a variety of alignment formats (FASTA, etc.) into the CCMpred alignment input format
  • top_couplings.py - Extract the top couplings from an output contact maps in a list representation

License

CCMpred is released under the GNU Affero General Public License v3 or later. See LICENSE for more details.

References

[1] Ekeberg, M., Lövkvist, C., Lan, Y., Weigt, M., & Aurell, E. (2013).
    Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models.
    Physical Review E, 87(1), 012707. doi:10.1103/PhysRevE.87.012707

[2] Balakrishnan, S., Kamisetty, H., Carbonell, J. G., Lee, S.-I., & Langmead, C. J. (2011).
    Learning generative models for protein fold families.
    Proteins, 79(4), 1061–78. doi:10.1002/prot.22934