Features | Quick Start | Tutorial | Installation | Usage | Documentation | Contributing
Keywords: protein structure preparation, molecular docking, AutoDock Vina, PDBQT conversion, computational drug discovery, PDB file processing
dock-prep automates the preparation of protein structures for molecular docking with AutoDock Vina. It converts PDB files as they come from the Protein Data Bank into ready-to-use PDBQT files with a single command, streamlining a routine step in computational drug discovery.
dock-prep handles the entire file preparation pipeline, reducing manual errors, ensuring consistency, and saving time.
Feature | What It Does | Why It Matters |
---|---|---|
Structure Cleaning | Removes waters, ions, and ligands | Avoids docking to irrelevant or non-biological parts |
Gap Filling | Completes missing atoms and residues | Docking tools require complete structures |
Hydrogen Addition | Adds hydrogens with pH-dependent protonation | Ensures accurate hydrogen bonding prediction |
Clash Resolution | Fixes unfavorable sidechain conformations | Reduces steric clashes that could disrupt key interactions |
Site Selection | Extracts chains by chain IDs or distance to ligand | Focuses on biologically meaningful interaction regions |
Charge Assignment | Assigns atomic charges and radii | Enables MD simulations requiring charge information |
PDBQT File Conversion | Generates PDBQT files | Provides required PDBQT format for AutoDock Vina |
dock-prep excels in numerous research and drug discovery scenarios:
Process experimental structures into docking-ready models with optimized parameters for accurate virtual screening across diverse protein families.
Extract only relevant binding pockets through manual chain selection or distance-based trimming to improve docking accuracy and computational efficiency.
Simplify molecular docking for academic teaching and biotech R&D teams while reducing computational costs and accelerating drug discovery timelines.
Process protein structures consistently for large-scale screening with standardized protocols.
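The distance-based trimming mentioned above can be illustrated with a minimal sketch. This is not dock-prep's implementation; the function names `parse_atoms` and `chains_within_cutoff` are hypothetical, and the code only assumes the standard fixed-column PDB layout (chain ID in column 22, coordinates in columns 31-54).

```python
import math


def parse_atoms(pdb_text, record="ATOM"):
    """Return (chain_id, (x, y, z)) for each matching record in PDB text."""
    atoms = []
    for line in pdb_text.splitlines():
        # Record names occupy the first six columns, padded with spaces.
        if line.startswith(record.ljust(6)):
            chain_id = line[21]
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((chain_id, xyz))
    return atoms


def chains_within_cutoff(atoms, reference_chain, cutoff=5.0):
    """Return chains with at least one atom within `cutoff` angstroms
    of any atom in the reference chain (hypothetical helper)."""
    reference = [xyz for chain, xyz in atoms if chain == reference_chain]
    selected = set()
    for chain, xyz in atoms:
        if chain == reference_chain or chain in selected:
            continue
        if any(math.dist(xyz, ref) <= cutoff for ref in reference):
            selected.add(chain)
    return sorted(selected)
```

With a small cutoff the result can be empty, which is one plausible reason a run reports that no chain was found near the reference chain.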
# Create and activate conda environment
conda create -n docking python=3.10 -y && conda activate docking
# Install dependencies and dock-prep
conda install -c conda-forge numpy pdbfixer openmm biopython openbabel pdb2pqr -y
git clone https://github.com/ingcoder/dock-prep.git
pip install -e dock-prep
# Install external tools
chmod +x dock-prep/scripts/*.sh
./dock-prep/scripts/install_mgltools.sh
./dock-prep/scripts/install_molprobity.sh # optional, but recommended
# Prepare a protein from a PDB file
dock-prep --input_file dock-prep/dock_prep/examples/1n6d.pdb --reference_atom_chains H --cutoff 2.0 --verbose
Follow our comprehensive tutorial to learn how dock-prep can integrate into your molecular docking workflow:
Run the interactive tutorial in Google Colab
conda create -n docking-pipeline python=3.10 -y
conda activate docking-pipeline
conda install -c conda-forge numpy pdbfixer openmm biopython openbabel pdb2pqr -y
git clone https://github.com/ingcoder/dock-prep.git
pip install -e dock-prep
# Install MGLTools
cd dock-prep/scripts
chmod +x install_mgltools.sh # Ensure script has executable permissions
./install_mgltools.sh
# Install MolProbity
chmod +x install_molprobity.sh # Ensure script has executable permissions
./install_molprobity.sh
Important Note: If you encounter "permission denied" errors when running the scripts, set executable permissions manually with `chmod +x script_name.sh`. The scripts include self-fixing permission code, but this only works if the script can be executed in the first place.
After installation, you can verify that all dependencies are properly installed:
# Run the dependency checker
dock-prep-check
This will check that:
- All required Python packages are installed
- You're running in a conda environment
- External tools (OpenBabel, PDB2PQR) are on your PATH
- Configuration-based tools (MGLTools, MolProbity) are properly configured
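The PATH check in the list above can be approximated by hand. The sketch below is not the code behind `dock-prep-check`; it is a hypothetical helper, `find_missing_tools`, that looks up the two executable names this README lists for OpenBabel and PDB2PQR (`obabel` and `pdb2pqr30`).

```python
import shutil


def find_missing_tools(executables=("obabel", "pdb2pqr30")):
    """Return the executables that cannot be found on PATH."""
    return [name for name in executables if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    print("missing:", missing if missing else "none")
```

An empty result only shows that the executables are discoverable; it does not verify their versions or the MGLTools and MolProbity configuration.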
1. Download a PDB file from the Protein Data Bank.
2. Move the PDB file into your project folder.
3. Activate the conda environment (if not already active):
conda activate docking-pipeline # use the environment name you created during installation
4. Run dock-prep with one of the commands shown below, e.g.
dock-prep --file_input path/to/1abc.pdb --verbose
5. Check the results. The processed files are in the automatically created results/ folder inside your project directory.
Your folder structure should look like this:
MyProjectFolder/
├── dock-prep/ # Dock-Prep repo or package
├── 1abc.pdb # raw input structure
└── results/
    └── 1abc_structure_docking.pdbqt
If you want to run the Colab notebook using your own PDB file: Run the interactive tutorial in Google Colab
1. Copy the notebook. Open the link above and copy the notebook with File -> Save a copy in Drive.
2. Install dock-prep. Run all cells in the installation section to install dock-prep and its dependencies.
3. Download a PDB file from the Protein Data Bank.
4. Upload the PDB file to Colab. Click the folder icon in the sidebar, then the upload button. The file is saved to the working (content/) directory.
5. Run dock-prep. Replace the name of the PDB file with your filename and run the cell with the dock-prep command:
dock-prep --file_input 1abc.pdb --verbose --skip_molprobity
Note: If you use a reference chain flag such as --reference_atom_chains and get an error, you may need to increase the cutoff distance. The program raises an error if it cannot find a chain within the cutoff distance. A fix is planned.
6. Check the results in the automatically created results/ folder in your current (content/) directory.
Your folder structure should look like this:
content/ # The notebook opens in the content/ directory.
├── dock-prep/ # dock-prep repo
├── 1abc.pdb # your PDB structure
└── results/
    └── 1abc_structure_docking.pdbqt
Run the converter with a PDB ID or file:
# Process entire protein (default behavior, works for small proteins)
dock-prep --file_input path/to/1abc.pdb --verbose
# Process specific chains
dock-prep --file_input path/to/1abc.pdb --include_chains A,B --verbose
# Extract chains within a cutoff distance (in angstroms, 5 by default) of a reference peptide chain
dock-prep --file_input path/to/1abc.pdb --reference_atom_chains H --cutoff 2.0 --verbose
# Extract chains within a cutoff distance (in angstroms, 5 by default) of a reference small-molecule HETATM chain
dock-prep --file_input path/to/1abc.pdb --reference_hetatm_chains H --cutoff 2.0 --verbose
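For batch runs over many structures, the commands above can be assembled programmatically. The following is a rough sketch, not part of dock-prep: `build_command` and `build_batch` are hypothetical helpers, and the flag spellings are copied from the examples in this section, so check them against the tool's own help output before relying on them.

```python
from pathlib import Path


def build_command(pdb_path, include_chains=None, verbose=True):
    """Build an argument list mirroring the dock-prep examples above."""
    cmd = ["dock-prep", "--file_input", str(pdb_path)]
    if include_chains:
        # Chains are passed as a comma-separated list, e.g. "A,B".
        cmd += ["--include_chains", ",".join(include_chains)]
    if verbose:
        cmd.append("--verbose")
    return cmd


def build_batch(folder, **kwargs):
    """Build one command per .pdb file in `folder`, in sorted order."""
    return [build_command(path, **kwargs) for path in sorted(Path(folder).glob("*.pdb"))]
```

Each argument list can then be passed to `subprocess.run(cmd, check=True)` so that a failing structure stops the batch instead of being silently skipped.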
The tool generates a series of progressively refined files that document each step in the protein preparation pipeline:
File | Description | Purpose in Workflow |
---|---|---|
_structure_cleaned.pdb | Initial cleaned structure | Removes HETATM records (waters, ligands, ions) and prepares the protein for structural completion |
_structure_completed_final.pdb | Structure with modeled residues | Fills in missing atoms and residues to create a complete protein model |
_structure_flipped_h_final.pdb | Optimized with hydrogens | Contains MolProbity-optimized hydrogen positions and corrected side-chain orientations |
_structure_protonated.pqr | Protonated structure | Includes atomic radii and charge parameters from PDB2PQR required for electrostatics |
_structure_docking.pdbqt | Final docking-ready file | Primary output file with all parameters needed for AutoDock Vina docking simulations |
Note: The `docking.pdbqt` file is the primary output that should be used for docking simulations with AutoDock Vina. Intermediate files are preserved to allow inspection of each preparation step.
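To see which stages of a run produced output, the file suffixes from the table above can be checked in one pass. This is a minimal sketch with a hypothetical helper, `report_outputs`. It assumes that every file is written to the results folder as `<prefix><suffix>`, which this README shows only for the final PDBQT file.

```python
from pathlib import Path

# Suffixes taken from the output file table, in pipeline order.
PIPELINE_SUFFIXES = [
    "_structure_cleaned.pdb",
    "_structure_completed_final.pdb",
    "_structure_flipped_h_final.pdb",
    "_structure_protonated.pqr",
    "_structure_docking.pdbqt",
]


def report_outputs(results_dir, prefix):
    """Map each expected suffix to True if `<prefix><suffix>` exists."""
    results_dir = Path(results_dir)
    return {
        suffix: (results_dir / f"{prefix}{suffix}").is_file()
        for suffix in PIPELINE_SUFFIXES
    }
```

For example, `report_outputs("results", "1abc")` returns a dictionary whose first `False` entry points at the earliest stage worth inspecting.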
- PDB: Standard Protein Data Bank format containing atomic coordinates
- PQR: Modified PDB format that includes charge (Q) and radius (R) parameters
- PDBQT: Extended PDB format with partial charges (Q) and atom types (T) required by AutoDock Vina
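The extra PDBQT fields can be read directly from an atom record. The sketch below assumes the fixed-column layout commonly written by AutoDock tools, with the partial charge in columns 71-76 and the atom type in columns 78-79. The helper `parse_pdbqt_atom` is hypothetical and the sample atom values are made up for illustration.

```python
# Column layout assumed for PDBQT ATOM/HETATM records.
PDBQT_ATOM_FORMAT = (
    "%-6s%5d %-4s%1s%-3s %1s%4d%1s   "
    "%8.3f%8.3f%8.3f%6.2f%6.2f    %6.3f %-2s"
)


def parse_pdbqt_atom(line):
    """Extract coordinates, partial charge (Q), and atom type (T)."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "name": line[12:16].strip(),
        "chain": line[21],
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "charge": float(line[70:76]),
        "atom_type": line[77:79].strip(),
    }


# Illustrative record built from the assumed layout (values are made up).
sample = PDBQT_ATOM_FORMAT % (
    "ATOM", 1, " N", "", "MET", "A", 1, "",
    27.340, 24.430, 2.614, 1.00, 9.67, -0.066, "N",
)
atom = parse_pdbqt_atom(sample)
print(atom["charge"], atom["atom_type"])
```

Slicing by column rather than splitting on whitespace matters here because adjacent fixed-width fields can run together without a separating space.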
For advanced users who want to customize the preparation process, these intermediate files can be modified before continuing to the next processing step using the `--input_file` parameter.
- ImportError: No module named MolKit: Ensure PYTHONPATH includes MGLToolsPckgs directory
- No output file: Check for error messages, verify input file exists, check write permissions
If you encounter errors related to missing dependencies, run the dependency checker:
dock-prep-check
This will help identify which tools or packages need to be installed or properly configured.
This tool relies on:
- MGLTools: For PDB to PDBQT conversion
- MolProbity (optional): For structure validation and hydrogen placement
- OpenBabel: For file format conversion (`obabel`)
- PDB2PQR: For protein protonation (`pdb2pqr30`)
Contributions are welcome! Please feel free to submit a Pull Request. Check out our Contributing Guidelines for more details.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this tool in your research, please cite:
Barbosa-Farias, I. (2025). dock-prep: A streamlined tool for preparing protein structures for molecular docking.
GitHub repository: https://github.com/ingcoder/dock-prep
- Thanks to all the developers of MGLTools, MolProbity, OpenBabel, and PDB2PQR
- Special thanks to contributors and users who have provided valuable feedback