GitHub - haddocking/Prot-DNABenchmark: Protein-DNA docking benchmark

Prot-DNA Docking Benchmark

Introduction

Biomolecular docking aims to predict the structure of a complex from the three-dimensional structures of its components. Although protein-protein docking has improved considerably, progress on protein-DNA complexes lags behind. The scarcity of information for properly identifying interaction surfaces on DNA, together with the inherent flexibility of DNA, has hampered the development of effective docking methods. To facilitate the development of such methods, a set of well-defined test cases is needed that forms a common ground for the development, validation and comparison of docking methods.

We present here a protein-DNA docking benchmark containing 47 unbound-unbound test cases, of which 13 are classified as easy, 22 as intermediate and 12 as difficult. The difficult cases show considerable structural rearrangement upon complex formation. DNA-specific modifications such as flipped-out bases and base modifications are included. The benchmark covers all major types of DNA-binding proteins according to the classification of Luscombe et al., Genome Biol 2000. The variety of test cases makes this non-redundant benchmark a useful tool for the comparison and further development of protein-DNA docking methods.

We developed the protein-DNA benchmark to be of general use to the docking community. We welcome all suggestions aimed at improving or expanding the benchmark.

Current version

The current version of the protein-DNA benchmark is 1.2.

Version history

01-01-2008 | The original protein-DNA benchmark version 1.0 release.
14-08-2008 | Benchmark version 1.1: Minor updates. The bound-to-unbound residue mapping file (profit.dat) has been redesigned to make it more flexible to use. The PDB structure file that represents the complex reconstructed from the unbound processed components after superimposition contained more than one instance of the same complex coordinate set. This has been fixed.
22-09-2009 | Benchmark version 1.2: Small fixes in residue mapping for two entries. For two entries (1EYU, 1RVA) the unbound protein was composed of two distinct subunits. These have now been separated into individual PDB files and all other files have been adjusted accordingly.

Citation

When using the protein-DNA docking benchmark, please cite the following reference:

M. van Dijk and A.M.J.J. Bonvin. A protein-DNA docking benchmark. Nucl. Acids Res. (2008), 36, e88, doi: 10.1093/nar/gkn386.

Data organization

The benchmark is organized as a directory structure with one directory per complex, named after the PDB ID of the reference structure.

Each directory contains the following files:

X.pdb and Y.pdb : The unmodified RCSB PDB files for the structure of the complex (X) and the structure of the unbound protein (Y).
X_bound-prot(x).pdb : The processed PDB file of the bound protein (extracted from the complex).
X_bound-DNA.pdb : The processed PDB file of the bound DNA (extracted from the complex).
Y_unbound-prot(x).pdb : The processed structure files of the unbound protein. In the case of an NMR ensemble, the file is separated into its individual models.
DNA_unbound.pdb : 3DNA-generated canonical B-DNA representation of the unbound DNA.
X_complex.pdb : PDB file of the bound complex reconstructed from the individual processed bound structures.
X_ubcomplex.pdb : Structure file of the complex reconstructed from the unbound processed components after superimposition using all CA and P atoms.
interface_fit.dat : Data file containing the residue zones for all unbound components that have CA atoms at the interface.
interface_fit.pdb : PDB file of the unbound protein after superimposition on the bound structure using CA atoms.
contacts.dat : Text file listing all intermolecular contacts in the bound complex.
contacts_ub.dat : Text file listing all intermolecular contacts in the bound complex (with unbound residue numbering).
alignment.dat : Text file of a Needleman-Wunsch sequence alignment between the bound and unbound protein sequences.
profit.dat : Text file containing the structure fitting data needed for automatic structure fitting with ProFit.
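
As a quick sanity check on a local copy of the benchmark, the layout described above can be walked with a few lines of Python. This is only a minimal sketch, not part of the benchmark itself: the function names `find_cases` and `missing_files` are hypothetical, and it assumes that case directories are exactly those whose names look like a PDB ID (four alphanumeric characters starting with a digit). Only the files listed above without an X or Y prefix are checked, since the prefixed names vary per entry.

```python
from pathlib import Path

# File names that the list above gives without an X/Y prefix.
FIXED_FILES = (
    "DNA_unbound.pdb",
    "interface_fit.dat",
    "interface_fit.pdb",
    "contacts.dat",
    "contacts_ub.dat",
    "alignment.dat",
    "profit.dat",
)


def find_cases(root):
    """Return the case directories under root, sorted by name.

    Assumption: a case directory is named like a PDB ID, i.e. four
    alphanumeric characters of which the first is a digit.
    """
    root = Path(root)
    return sorted(
        p for p in root.iterdir()
        if p.is_dir()
        and len(p.name) == 4
        and p.name[0].isdigit()
        and p.name.isalnum()
    )


def missing_files(case_dir):
    """Return the fixed-name files that are absent from one case directory."""
    case_dir = Path(case_dir)
    return [name for name in FIXED_FILES if not (case_dir / name).is_file()]
```

Calling `find_cases` on the root of a local checkout and then `missing_files` on each returned directory gives a per-entry report of absent files. An empty list means all seven fixed-name files are present.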

Note that the processed PDB files have been modified to avoid any overlap in residue numbering (e.g. by shifting the numbering of the second DNA strand). The DNA nomenclature follows a three-letter code (ADE, CYT, GUA, THY) compatible with HADDOCK2.2. For use in HADDOCK2.4, the base names must be converted back to a single-letter code (A, C, G, T).
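
The renaming for HADDOCK2.4 can be scripted. The sketch below is one possible approach under stated assumptions, not a tool shipped with the benchmark: the function name `rename_dna_residues` is hypothetical, the residue name is taken from columns 18-20 of ATOM/HETATM records as in the standard PDB format, and the single-letter name is right-justified in that three-column field. Check the result against what your HADDOCK version expects before using it.

```python
# Mapping from the benchmark's three-letter base names to single letters.
THREE_TO_ONE = {"ADE": "A", "CYT": "C", "GUA": "G", "THY": "T"}


def rename_dna_residues(pdb_lines):
    """Return a copy of pdb_lines with DNA residue names converted.

    Only ATOM and HETATM records are touched; all other lines, and all
    residues not in THREE_TO_ONE, are passed through unchanged.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Residue name field: columns 18-20 (0-based slice 17:20).
            resname = line[17:20].strip()
            if resname in THREE_TO_ONE:
                # Assumption: right-justify the one-letter name in the field
                # so that the fixed-column layout of the record is preserved.
                line = line[:17] + THREE_TO_ONE[resname].rjust(3) + line[20:]
        out.append(line)
    return out
```

The function works on a list of lines, so it can be applied to the result of `readlines()` on a processed PDB file and the returned lines written to a new file, leaving the original untouched.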

1A74
1AZP
1B3T
1BDT
1BY4
1CMA
1DDN
1DFM
1DIZ
1EA4
1EMH
1EYU
1F4K
1FOK
1G9Z
1H9T
1HJC
1JJ4
1JT0
1K79
1KC6
1KSY
1MNN
1O3T
1PT3
1QNE
1QRV
1R4O
1RPE
1RVA
1TRO
1VAS
1VRR
1W0T
1Z63
1Z9C
1ZME
1ZS4
2C5R
2FIO
2FL3
2IRF
2OAA
3BAM
3CRO
4KTQ
7MHT