
Compute and Visualize Contact Maps

Susann Vorberg edited this page Jul 5, 2018

Computing a Contact Map with CCMpredPy and standard pseudo-likelihood maximization

First, we'll use CCMpredPy to learn the evolutionary couplings characteristic of our example protein family (alignment data/1atzA.fas) by maximizing the pseudo-likelihood of the Markov Random Field (MRF) model. The flag --ofn-pll selects this objective.
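To make the objective concrete, here is a minimal pure-Python sketch of the weighted pseudo-log-likelihood of a Potts-type MRF. It is not CCMpredPy's implementation. The nested-list parameter layout (`v[i][a]` single-site fields, `w[i][j][a][b]` symmetric couplings), the integer encoding of sequences, and all function names are assumptions for illustration. The L₂ regularization and the LBFGS optimizer that CCMpredPy uses are left out.

```python
import math

def zero_model(L, q):
    """Hypothetical helper: all-zero fields v[i][a] and couplings w[i][j][a][b]."""
    v = [[0.0] * q for _ in range(L)]
    w = [[[[0.0] * q for _ in range(q)] for _ in range(L)] for _ in range(L)]
    return v, w

def conditional_log_prob(seq, i, v, w):
    """log p(x_i = seq[i] | all other positions of seq) under the Potts model."""
    q = len(v[i])
    # Unnormalized log-probability of every possible state a at position i,
    # given the observed states b = seq[j] at all other positions j.
    scores = []
    for a in range(q):
        s = v[i][a]
        for j, b in enumerate(seq):
            if j != i:
                s += w[i][j][a][b]
        scores.append(s)
    # Log of the local partition function via the log-sum-exp trick.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[seq[i]] - log_z

def pseudo_log_likelihood(seqs, weights, v, w):
    """Weighted sum over sequences and positions of the conditional log-probabilities."""
    return sum(
        weight * sum(conditional_log_prob(seq, i, v, w) for i in range(len(seq)))
        for seq, weight in zip(seqs, weights)
    )

if __name__ == "__main__":
    # Toy example: 2 integer-encoded sequences of length 2 over q = 3 states.
    v, w = zero_model(L=2, q=3)
    print(pseudo_log_likelihood([[0, 1], [2, 2]], [1.0, 1.0], v, w))
```

With all parameters at zero, every state is equally likely, so each position contributes log(1/q). An optimizer would adjust `v` and `w` to increase this sum.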

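A contact map is then computed from the coupling coefficients of the MRF: each residue pair (i, j) is scored with the Frobenius (L2) norm of its coupling matrix. Corrections are typically applied to this raw score matrix to reduce entropic and phylogenetic bias; CCMpredPy provides the Average Product Correction (--apc) and an entropy correction (--entropy-correction).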
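The Frobenius-norm score can be sketched in a few lines, reusing the hypothetical nested-list layout `w[i][j][a][b]` for the couplings. The optional `n_states` parameter is also hypothetical. It shows how one would restrict the norm to a subset of states (for example, to leave out a gap state). Whether CCMpredPy includes the gap state in its norm is not stated on this page.

```python
import math

def frobenius_contact_map(w, n_states=None):
    """Score each pair (i, j) by the Frobenius norm of its coupling matrix w[i][j].

    n_states (hypothetical) restricts the norm to the first n_states states,
    e.g. to leave out a gap state encoded as the last state.
    """
    L = len(w)
    scores = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(i + 1, L):
            k = len(w[i][j]) if n_states is None else n_states
            norm = math.sqrt(sum(w[i][j][a][b] ** 2 for a in range(k) for b in range(k)))
            # The contact map is symmetric; the diagonal stays zero.
            scores[i][j] = scores[j][i] = norm
    return scores
```

The result is an L × L matrix of raw scores, the kind of matrix that is then corrected with APC or the entropy correction.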

ccmpred data/1atzA.fas --ofn-pll \
        --plot-opt-progress data/1atzA.log.html \
        -m data/1atzA.raw.mat \
        --apc data/1atzA.apc.mat \
        --entropy-correction data/1atzA.ec.mat

The output of CCMpredPy will look like this:

  ┏━╸┏━╸┏┳┓┏━┓┏━┓┏━╸╺┳┓┏━┓╻ ╻  version 1.0.0
  ┃  ┃  ┃┃┃┣━┛┣┳┛┣╸  ┃┃┣━┛┗┳┛  Vorberg, Seemayer and Soeding (2018)
  ┗━╸┗━╸╹ ╹╹  ╹┗╸┗━╸╺┻┛╹   ╹   https://github.com/soedinglab/ccmgen

Using 1 threads for OMP parallelization.
1atzA is of length L=75 and there are 3068 sequences in the alignment.
Alignment has diversity [sqrt(N)/L]=0.739 and Neff(HHsuite-like)=5.492.
Number of effective sequences after simple reweighting (id-threshold=0.8, ignore_gaps=False): 1149.44.
Calculating AA Frequencies with 0.00087 percent pseudocounts (uniform_pseudocounts 1 1)
L₂ regularization (λsingle=10 λpairfactor=0.2 λpair=0.2)
Plot with optimization statistics will be written to data/1atzA.log.html

Will optimize 3781600 float64 variables wrt PLL
and L₂ regularization (λsingle=10 λpairfactor=0.2 λpair=0.2)
Optimizer: LBFGS optimization
        convergence criteria: maxit=2000

[ removed the per-iteration statistics for brevity ]

Finished with code 0 -- CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH

Compute contact map using frobenius norm of couplings.

Apply Average Product Correction (APC).
Apply entropy correction (using 20 states and log2).

Writing contact matrices to:
        data/1atzA.raw.mat
        data/1atzA.apc.mat
        data/1atzA.ec.mat
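Some of the alignment statistics in this log can be reproduced by hand. The diversity is sqrt(N)/L, i.e. sqrt(3068)/75 ≈ 0.739 for 1atzA. The sketch below also shows one common form of "simple" sequence reweighting. Each sequence gets the weight 1 / (number of sequences, itself included, with identity at or above id-threshold), and the number of effective sequences is the sum of the weights. The function names are hypothetical. CCMpredPy's exact identity comparison and gap handling may differ; here gaps are compared like any other character, which is our reading of ignore_gaps=False.

```python
import math

def diversity(n_seqs, length):
    """Alignment diversity as reported in the log: sqrt(N) / L."""
    return math.sqrt(n_seqs) / length

def simple_weights(msa, id_threshold=0.8):
    """Weight each sequence by 1 / (number of sequences at >= id_threshold identity).

    Gaps are compared like any other character. Sequences must be aligned,
    i.e. all of equal length.
    """
    L = len(msa[0])
    weights = []
    for s in msa:
        # Fraction of identical columns; a sequence always counts itself.
        neighbours = sum(
            1 for t in msa
            if sum(a == b for a, b in zip(s, t)) / L >= id_threshold
        )
        weights.append(1.0 / neighbours)
    return weights

if __name__ == "__main__":
    print(round(diversity(3068, 75), 3))  # prints 0.739, as in the log
    msa = ["ACDEF", "ACDEF", "WYWY-"]
    w = simple_weights(msa)
    print(w, sum(w))  # the sum of the weights is the number of effective sequences
```

Down-weighting near-duplicates keeps large clusters of closely related sequences from dominating the statistics, which is why the effective number of sequences (1149.44) is much smaller than the 3068 sequences in the alignment.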

We now have several new files in data/:

  • a summed contact score matrix 1atzA.raw.mat with raw contact scores
  • a summed contact score matrix 1atzA.apc.mat with contact scores that have been corrected with the Average Product Correction (APC)
  • a summed contact score matrix 1atzA.ec.mat with contact scores that have been corrected for entropy bias
  • an interactive html file 1atzA.log.html visualizing the optimization log
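To work with these matrices outside of ccm_plot, they can be read as plain text. The sketch below assumes a whitespace-separated L × L layout and skips blank lines and lines starting with "#" in case metadata is present. It also implements the Average Product Correction, APC(i, j) = S(i, j) - mean_i · mean_j / mean_all, with the means taken over off-diagonal entries. The function names are hypothetical. CCMpredPy's averaging (for example, whether the zero diagonal enters the means) may differ, so the output need not match 1atzA.apc.mat exactly.

```python
def load_mat(lines):
    """Parse a whitespace-separated L x L matrix, skipping blank and '#' lines."""
    return [
        [float(x) for x in line.split()]
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]

def apc(s):
    """Average Product Correction with means taken over off-diagonal entries."""
    L = len(s)
    # Mean score of each row (equal to the column mean for a symmetric matrix).
    row_mean = [sum(s[i][j] for j in range(L) if j != i) / (L - 1) for i in range(L)]
    # Mean of all off-diagonal entries.
    total_mean = sum(row_mean) / L
    corrected = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            if i != j:
                corrected[i][j] = s[i][j] - row_mean[i] * row_mean[j] / total_mean
    return corrected
```

For example, opening data/1atzA.raw.mat in a `with` block, passing the file handle to `load_mat`, and applying `apc` to the result gives an APC-corrected matrix that can be compared with 1atzA.apc.mat.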

Visualizing Contact Maps

Contact matrices written by CCMpredPy (with -m, --apc or --entropy-correction) can be visualized as an HTML file with ccm_plot:

ccm_plot cmap \
    --mat-file data/1atzA.apc.mat \
    --alignment-file data/1atzA.fas \
    --pdb-file data/1atzA.pdb \
    --plot-file data/1atzA.apc.html \
    --seq-sep 4 --contact-threshold 8

Specifying the original alignment with the flag --alignment-file adds a subplot showing the per-column entropy of the alignment as a line graph. Specifying a reference PDB structure with the flag --pdb-file shows the observed pairwise residue distances in the lower triangle of the matrix. Note that residue numbering in the PDB file must start at 1 and match the dimensions of the contact matrix! The Cβ distance threshold (in Å) that defines true positive contacts is set with the flag --contact-threshold, and residue pairs close to the diagonal can be masked by specifying a minimum sequence separation with the flag --seq-sep.
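The roles of --contact-threshold and --seq-sep can be illustrated with a small evaluation sketch. It is not how ccm_plot is implemented. It reads Cβ coordinates from the ATOM records of a PDB file, falling back to Cα when a residue has no Cβ (as for glycine); this fallback is an assumption. It counts a pair as a true contact when the Cβ-Cβ distance is below the threshold, ignores pairs with |i - j| < seq_sep, and reports the precision of the top-ranked pairs. The function names and the exact cutoff semantics (< versus ≤) are assumptions. Like ccm_plot, it requires PDB residue numbering to start at 1.

```python
import math

def read_cb_coords(pdb_lines):
    """Map 0-based residue index -> C-beta coordinates (C-alpha if no C-beta).

    Assumes a single-model, single-chain PDB whose residue numbering starts at 1.
    """
    cb, ca = {}, {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()   # atom name, columns 13-16
        idx = int(line[22:26]) - 1   # residue number, columns 23-26
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if name == "CB":
            cb[idx] = xyz
        elif name == "CA":
            ca[idx] = xyz
    return {**ca, **cb}  # C-beta entries override C-alpha entries

def top_precision(scores, coords, threshold=8.0, seq_sep=4, top_n=None):
    """Fraction of the top_n highest-scoring pairs that are true contacts.

    Pairs with j - i < seq_sep, or with missing coordinates, are ignored.
    top_n defaults to L, the number of rows of the score matrix.
    """
    L = len(scores)
    pairs = [
        (scores[i][j], i, j)
        for i in range(L)
        for j in range(i + seq_sep, L)
        if i in coords and j in coords
    ]
    pairs.sort(reverse=True)  # highest score first
    top = pairs[: L if top_n is None else top_n]
    if not top:
        return 0.0
    hits = sum(math.dist(coords[i], coords[j]) < threshold for _, i, j in top)
    return hits / len(top)
```

`scores` can be any L × L contact matrix, such as the one stored in 1atzA.apc.mat. Evaluating the top L pairs is a common convention, but other cutoffs can be passed via `top_n`.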

The resulting contact map looks like this: [Figure: 1atzA.apc.png]