
Fraction of Common Contacts Clustering Algorithm for Protein Structures



FCC Clustering Algorithm

Fraction of Common Contacts Clustering Algorithm for Protein Models from Structure Prediction Methods

About FCC

Structure prediction methods generate a large number of models, of which only a fraction matches the biologically relevant structure. To identify this (near-)native model, we often employ clustering algorithms, based on the assumption that, in the energy landscape of every biomolecule, the native state lies in a wide basin neighboring other structurally similar states. RMSD-based clustering, the current method of choice, is inadequate for large multi-molecular complexes, particularly when their components are symmetric. We developed a novel clustering strategy based on a very efficient similarity measure: the fraction of common contacts (FCC). The outcome of this calculation is a number between 0 and 1, corresponding to the fraction of residue pairs in contact that are present in both the reference and the mobile complex.
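As a rough illustration of the measure, the Python 3 sketch below represents each model as a set of residue-residue contacts and computes the fraction of common contacts. It assumes the number of shared contacts is divided by the number of contacts in the reference model, which makes the value asymmetric; the `contact` and `fcc` helpers and the residue identifiers are hypothetical and not taken from the repository's scripts.

```python
def contact(res1, res2):
    """Order-independent residue contact; residues are (chain, resnum) tuples."""
    return tuple(sorted((res1, res2)))


def fcc(reference, mobile):
    """Fraction of the reference contacts that are also present in the mobile model.

    Assumption: normalised by the number of reference contacts, hence asymmetric.
    """
    if not reference:
        return 0.0
    return len(reference & mobile) / len(reference)


ref = {contact(("A", 10), ("B", 5)), contact(("A", 11), ("B", 5)),
       contact(("A", 12), ("B", 7)), contact(("A", 15), ("B", 9))}
mob = {contact(("A", 10), ("B", 5)), contact(("A", 12), ("B", 7)),
       contact(("A", 15), ("B", 9)), contact(("A", 20), ("B", 30)),
       contact(("A", 21), ("B", 31))}

print(fcc(ref, mob))  # 3 of 4 reference contacts are shared: 0.75
print(fcc(mob, ref))  # 3 of 5 mobile contacts are shared: 0.6
```

Because only set membership is tested, no superposition or atom-by-atom correspondence is needed, which is where the speed advantage over RMSD comes from.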

Advantages of FCC clustering vs. RMSD-based clustering:

  • About 100 times faster on average.
  • Handles symmetry by considering complexes as entities instead of collections of chains.
  • Does not require atom equivalence (clusters mutants, models with missing loops, etc.).
  • Handles any molecule type (protein, DNA, RNA, carbohydrates, lipids, ligands, etc.).
  • Allows multiple levels of "resolution": chain-chain contacts, residue-residue contacts, residue-atom contacts, etc.
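The level of "resolution" depends only on how a contact is labelled, which is also why atom equivalence is not required: contacts are compared as identifiers, not as superimposed coordinates. The sketch below assumes residue contacts stored as `((chain, resnum), (chain, resnum))` tuples and collapses them into chain-chain contacts; the function and model names are hypothetical.

```python
def to_chain_level(residue_contacts):
    """Collapse residue-residue contacts into chain-chain contacts.

    Each residue contact is assumed to be ((chain, resnum), (chain, resnum)).
    """
    chain_contacts = set()
    for (chain1, _), (chain2, _) in residue_contacts:
        chain_contacts.add(tuple(sorted((chain1, chain2))))
    return chain_contacts


model_a = {(("A", 10), ("B", 5)), (("A", 11), ("B", 5)), (("B", 40), ("C", 2))}
model_b = {(("A", 30), ("B", 8)), (("B", 41), ("C", 2))}

print(model_a & model_b)  # set(): no residue-level contacts in common
print(to_chain_level(model_a) == to_chain_level(model_b))  # True: same chain-chain interfaces
```

At residue level these two models share nothing, while at chain level they are identical, so the chosen resolution directly controls how coarse the clustering is.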

How to Cite

Rodrigues JPGLM, Trellet M, Schmitz C, Kastritis P, Karaca E, Melquiond ASJ, Bonvin AMJJ. Clustering biomolecular complexes by residue contacts similarity. Proteins: Structure, Function, and Bioinformatics 2012;80(7):1810–1817.


Requirements

  • Python 2.6+
  • C/C++ compiler


Installation

Navigate to the src/ folder and run 'make' to compile the contact programs. Edit the Makefile if necessary (e.g. to change the compiler or optimization level).
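The compiled contact programs exist for speed. Purely as an illustration of the kind of calculation they perform, a brute-force Python equivalent might look like the sketch below: two residues from different chains count as a contact when any pair of their atoms lies within a distance cutoff. The 5 Å cutoff, the use of chain IDs (rather than the segIDs required by the usage instructions) and all function names are assumptions of this sketch.

```python
from itertools import combinations


def read_atoms(pdb_lines):
    """Parse ATOM/HETATM records into (chain, resnum, x, y, z) tuples.

    Uses fixed PDB columns: chain ID in column 22, residue number in
    columns 23-26 and coordinates in columns 31-54.
    """
    atoms = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            chain = line[21]
            resnum = int(line[22:26])
            x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
            atoms.append((chain, resnum, x, y, z))
    return atoms


def residue_contacts(atoms, cutoff=5.0):
    """Inter-chain residue pairs with at least one atom pair within `cutoff` (angstrom)."""
    contacts = set()
    cutoff_sq = cutoff ** 2
    for a, b in combinations(atoms, 2):
        if a[0] == b[0]:
            continue  # same chain: not an interface contact
        dist_sq = sum((p - q) ** 2 for p, q in zip(a[2:], b[2:]))
        if dist_sq <= cutoff_sq:
            # Sort the two residues so each contact has a single canonical form.
            contacts.add(tuple(sorted(((a[0], a[1]), (b[0], b[1])))))
    return contacts


def atom_line(serial, chain, resnum, x, y, z, name="CA", resname="ALA"):
    """Build a minimal fixed-column ATOM record (hypothetical demo helper)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, resname, chain, resnum, x, y, z)


lines = [
    atom_line(1, "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, "A", 2, 10.0, 0.0, 0.0),
    atom_line(3, "B", 1, 3.0, 0.0, 0.0),
    atom_line(4, "B", 2, 30.0, 0.0, 0.0),
]
print(residue_contacts(read_atoms(lines)))  # {(('A', 1), ('B', 1))}
```

This all-pairs loop scales quadratically with the number of atoms, which is exactly the cost a compiled implementation helps to keep manageable for thousands of models.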


Usage

All scripts print usage information when called without any arguments. For the Python scripts, the '-h' option prints more detailed help describing all available options.

In most cases, the following setup is enough:

```shell
# Make a file list with all your PDB files
ls *pdb > pdb.list

# Ensure all PDB models have segID identifiers
# Convert chainIDs to segIDs if necessary using scripts/
for pdb in $( cat pdb.list ); do $pdb > temp; mv temp $pdb; done

# Generate contact files for all PDB files in pdb.list
# using 4 cores on this machine.
python2.6 -f pdb.list -n 4

# Create a file listing the names of the contact files
# Derive it from pdb.list to keep the same model order in the cluster output
sed -e 's/pdb/contacts/' pdb.list | sed -e '/^$/d' > pdb.contacts

# Calculate the similarity matrix
python2.6 -f pdb.contacts -o fcc_matrix.out

# Cluster the similarity matrix using a threshold of 0.75 (75% contacts in common)
python2.6 fcc_matrix.out 0.75 -o clusters_0.75.out

# Use to output meaningful names instead of model indexes
python2.6 clusters_0.75.out pdb.list
```
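The similarity-matrix and clustering steps can be sketched in plain Python 3. This is a greedy neighbour-count scheme in the style of Taylor-Butina clustering, chosen for illustration: the neighbour rule (FCC at or above the threshold in both directions), the `min_size` parameter, the tie-breaking and the model names are assumptions, not a description of the repository's scripts.

```python
def fcc(reference, mobile):
    """Fraction of reference contacts also present in the mobile model (assumed normalisation)."""
    if not reference:
        return 0.0
    return len(reference & mobile) / len(reference)


def fcc_matrix(models):
    """matrix[i][j] holds fcc(models[i], models[j])."""
    return [[fcc(ref, mob) for mob in models] for ref in models]


def cluster(matrix, threshold=0.75, min_size=1):
    """Greedy clustering: the unclustered model with the most unclustered
    neighbours becomes a cluster centre and absorbs those neighbours.
    Two models are neighbours if FCC >= threshold in both directions.
    """
    n = len(matrix)
    neighbours = {
        i: {j for j in range(n)
            if j != i and matrix[i][j] >= threshold and matrix[j][i] >= threshold}
        for i in range(n)
    }
    unclustered = set(range(n))
    clusters = []
    while unclustered:
        # Ties are broken by the lowest model index.
        centre = max(sorted(unclustered), key=lambda i: len(neighbours[i] & unclustered))
        members = {centre} | (neighbours[centre] & unclustered)
        if len(members) < min_size:
            break  # no remaining centre can reach min_size
        clusters.append((centre, sorted(members)))
        unclustered -= members
    return clusters


# Contacts are stand-in integers here; real contacts would be residue pairs.
models = [{1, 2, 3, 4}, {1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 12}]
names = ["model_1", "model_2", "model_3", "model_4", "model_5"]  # hypothetical file names

clusters = cluster(fcc_matrix(models), threshold=0.75)
for centre, members in clusters:
    print(names[centre], [names[m] for m in members])
# model_1 ['model_1', 'model_2', 'model_3']
# model_4 ['model_4', 'model_5']
```

Mapping indices back to file names, as the last step of the pipeline does, is just a lookup into the ordered model list, which is why pdb.list and pdb.contacts must list the models in the same order.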


Authors

João Rodrigues

Mikael Trellet

Adrien Melquiond

Christophe Schmitz

Ezgi Karaca

Panagiotis Kastritis

Alexandre Bonvin

