dock-prep: Automated Protein Preparation Tool for Molecular Docking with AutoDock Vina

Keywords: protein structure preparation, molecular docking, AutoDock Vina, PDBQT conversion, computational drug discovery, PDB file processing

What is dock-prep?

dock-prep is a user-friendly tool that automates the preparation of protein structures for molecular docking with AutoDock Vina, streamlining PDB to PDBQT conversion for computational drug discovery. It lets researchers convert PDB files, exactly as downloaded from the Protein Data Bank, into ready-to-use PDBQT files for AutoDock Vina docking with a single command.

Why use dock-prep?

dock-prep handles the entire file preparation pipeline, reducing manual errors, ensuring consistency, and saving time.

Key Features & Benefits

Feature	What It Does	Why It Matters
✅ Structure Cleaning	Removes waters, ions, and ligands	Avoids docking to irrelevant or non-biological parts
✅ Gap filling	Completes missing atoms and residues	Docking tools require complete structures
✅ Hydrogen Addition	Adds hydrogens with pH-dependent protonation states	Ensures accurate hydrogen bonding prediction
✅ Clash Resolution	Fixes unfavorable sidechain conformations	Reduces steric clashes that could disrupt key interactions
✅ Site Selection	Extracts chains by chain IDs or distance to ligand	Focuses on biologically meaningful interaction regions
✅ Charge Assignment	Assigns atomic charges and radii	Enables MD simulations requiring charge information
✅ PDBQT File Conversion	Generates PDBQT files	Provides required PDBQT format for AutoDock Vina
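
To make the Structure Cleaning row concrete, here is a minimal Python sketch of the idea: dropping HETATM records (waters, ions, ligands) from PDB text. It is an illustration only, not dock-prep's implementation; the function name `strip_hetatm` and the sample records are made up for this example, and a real cleaner would likely also need to handle cases such as modified residues stored as HETATM.

```python
def strip_hetatm(pdb_text: str) -> str:
    """Return the PDB text without HETATM records (waters, ions, ligands)."""
    kept = [
        line for line in pdb_text.splitlines()
        if not line.startswith("HETATM")
    ]
    return "\n".join(kept) + "\n"


# Tiny hand-written sample: two protein atoms and one water molecule.
example = """\
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C
HETATM    3  O   HOH A 101       9.000   9.000   9.000  1.00  0.00           O
END
"""

cleaned = strip_hetatm(example)
print(cleaned)
```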

Common Use Cases

dock-prep excels in numerous research and drug discovery scenarios:

💊 Structure-Based Drug Design

Process experimental structures into docking-ready models with optimized parameters for accurate virtual screening across diverse protein families.

🧬 Protein Trimming for Focused Docking

Extract only relevant binding pockets through manual chain selection or distance-based trimming to improve docking accuracy and computational efficiency.

🔬🏢 Research Across Academia and Industry

Simplify molecular docking for academic teaching and biotech R&D teams while reducing computational costs and accelerating drug discovery timelines.

🤖 High-Throughput Virtual Screening Pipelines

Process protein structures consistently for large-scale screening with standardized protocols.

Quick Start

# Create and activate conda environment
conda create -n docking python=3.10 -y && conda activate docking

# Install dependencies and dock-prep
conda install -c conda-forge numpy pdbfixer openmm biopython openbabel pdb2pqr -y
git clone https://github.com/ingcoder/dock-prep.git
pip install -e dock-prep

# Install external tools
chmod +x dock-prep/scripts/*.sh
./dock-prep/scripts/install_mgltools.sh
./dock-prep/scripts/install_molprobity.sh #optional, but recommended

# Prepare a protein from the bundled example PDB file
dock-prep --input_file dock-prep/dock_prep/examples/1n6d.pdb --reference_atom_chains H --cutoff 2.0 --verbose

Tutorial

Follow our comprehensive tutorial to learn how dock-prep can integrate into your molecular docking workflow:

Run the interactive tutorial in Google Colab

Installation

1. Set up Python Environment

conda create -n docking-pipeline python=3.10 -y
conda activate docking-pipeline
conda install -c conda-forge numpy pdbfixer openmm biopython openbabel pdb2pqr -y

2. Install dock-prep

git clone https://github.com/ingcoder/dock-prep.git
pip install -e dock-prep

3. Install Required Tools

# Install MGLTools
cd dock-prep/scripts
chmod +x install_mgltools.sh  # Ensure script has executable permissions
./install_mgltools.sh

# Install MolProbity
chmod +x install_molprobity.sh  # Ensure script has executable permissions
./install_molprobity.sh

Important Note: If you encounter "permission denied" errors when running the scripts, you need to manually set executable permissions using the chmod +x script_name.sh command. The scripts include self-fixing permission code, but this only works if the script can be executed in the first place.

4. Verify Installation

After installation, you can verify that all dependencies are properly installed:

# Run the dependency checker
dock-prep-check

This will check that:

All required Python packages are installed
You're running in a conda environment
External tools (OpenBabel, PDB2PQR) are on your PATH
Configuration-based tools (MGLTools, MolProbity) are properly configured
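
As a rough idea of what the PATH check involves, the following Python sketch looks up the two executables named in the Dependencies section (`obabel` and `pdb2pqr30`). It is not the code behind dock-prep-check; the helper name `find_missing_tools` is hypothetical.

```python
import shutil


def find_missing_tools(executables, which=shutil.which):
    """Return the executables that cannot be found on PATH."""
    return [name for name in executables if which(name) is None]


# Executable names taken from the Dependencies section of this README.
missing = find_missing_tools(["obabel", "pdb2pqr30"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("OpenBabel and PDB2PQR executables found on PATH")
```

The `which` parameter exists only so the lookup can be swapped out when testing the helper.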

Usage (Linux or MacOS)

1️⃣ Download a PDB file from the Protein Data Bank.

2️⃣ Move the PDB file into your project folder.

3️⃣ Activate conda environment (if not already active):

conda activate docking-pipeline

4️⃣ Run dock-prep with one of the dock-prep commands shown below, e.g.

dock-prep --file_input path/to/1abc.pdb --verbose

5️⃣ Check results. The processed files are in the automatically created results/ folder inside your project directory.

Your folder structure should look like this:
MyProjectFolder/
├── dock-prep/                      # Dock-Prep repo or package
├── 1abc.pdb                        # raw input structure
└── results/
    └── 1abc_structure_docking.pdbqt
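
If the project folder holds several structures, the command from step 4 can be repeated per file. The Python sketch below only builds and prints the command lines, using the `--file_input` and `--verbose` flags shown above; the helper name `build_commands` is hypothetical, and actually running the commands (for example with `subprocess.run`) is left to the reader.

```python
import shlex
from pathlib import Path


def build_commands(pdb_paths, extra_flags=("--verbose",)):
    """Build one dock-prep command (as an argument list) per input PDB file."""
    return [
        ["dock-prep", "--file_input", str(path), *extra_flags]
        for path in pdb_paths
    ]


# Collect every .pdb file in the current project folder, in a stable order.
pdb_files = sorted(Path(".").glob("*.pdb"))
for command in build_commands(pdb_files):
    # Printing instead of executing keeps this sketch free of side effects.
    print(shlex.join(command))
```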

Usage (Colab Notebook)

If you want to run the Colab notebook using your own PDB file: Run the interactive tutorial in Google Colab

1️⃣ Copy the notebook. Open the link above and copy the notebook with: File -> Save a copy in Drive

2️⃣ Install dock-prep. Run all cells in the installation section to install dock-prep and its dependencies.

3️⃣ Download a PDB file from the Protein Data Bank.

4️⃣ Upload the PDB file to Colab. Click the folder icon in the sidebar, then the ⬆️ upload button. The file is saved to the working (content/) directory.

5️⃣ Run dock-prep. Replace the name of the PDB file with your filename and run the cell with the dock-prep command:

dock-prep --file_input 1abc.pdb --verbose --skip_molprobity

Note: If you use a reference-chain flag (--reference_atom_chains or --reference_hetatm_chains) and get an error, you may have to increase the cutoff distance. The program currently stops with an error if it can't find a chain within the cutoff distance. This will be fixed.

6️⃣ Check results in the automatically created results/ folder in your current (content) directory.

Your folder structure should look like this:
content/                            # The notebook opens in content/ directory.
├── dock-prep/                      # dock-prep repo
├── 1abc.pdb                        # your pdb structure
└── results/
    └── 1abc_structure_docking.pdbqt

Basic Commands

Run the converter with a PDB ID or file:

# Process entire protein (default behavior, works for small proteins)
dock-prep --file_input path/to/1abc.pdb --verbose

# Process specific chains
dock-prep --file_input path/to/1abc.pdb --include_chains A,B --verbose

# Extract chains by distance (in angstroms, 5 Å by default) from a reference peptide chain
dock-prep --file_input path/to/1abc.pdb --reference_atom_chains H --cutoff 2.0 --verbose

# Extract chains by distance (in angstroms, 5 Å by default) from a reference small-molecule HETATM chain
dock-prep --file_input path/to/1abc.pdb --reference_hetatm_chains H --cutoff 2.0 --verbose
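
The distance-based selection can be pictured with a short Python sketch. It assumes that "within the cutoff" means a chain has at least one atom no farther than the cutoff from any atom of the reference chain; this is an assumption for illustration, not a description of dock-prep's internals. The helper names (`atom_line`, `parse_atoms`, `chains_within_cutoff`) and the coordinates are made up.

```python
import math


def atom_line(serial, chain, x, y, z):
    """Format a minimal fixed-column PDB ATOM record (CA atom of an alanine)."""
    return f"ATOM  {serial:5d}  CA  ALA {chain}   1    {x:8.3f}{y:8.3f}{z:8.3f}"


def parse_atoms(pdb_text):
    """Yield (chain_id, (x, y, z)) for each ATOM or HETATM record."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield line[21], xyz


def chains_within_cutoff(pdb_text, reference_chain, cutoff=5.0):
    """Return IDs of chains with any atom within `cutoff` angstroms of the reference chain."""
    atoms = list(parse_atoms(pdb_text))
    reference = [xyz for chain, xyz in atoms if chain == reference_chain]
    selected = set()
    for chain, xyz in atoms:
        if chain == reference_chain or chain in selected:
            continue
        if any(math.dist(xyz, ref) <= cutoff for ref in reference):
            selected.add(chain)
    return sorted(selected)


# Reference chain H at the origin, chain A 1.5 angstroms away, chain B 10 angstroms away.
structure = "\n".join([
    atom_line(1, "H", 0.0, 0.0, 0.0),
    atom_line(2, "A", 1.5, 0.0, 0.0),
    atom_line(3, "B", 10.0, 0.0, 0.0),
])

print(chains_within_cutoff(structure, "H", cutoff=2.0))
print(chains_within_cutoff(structure, "H", cutoff=12.0))
```

With a cutoff of 2.0 only chain A qualifies in this toy structure; raising the cutoff to 12.0 brings in chain B as well, which mirrors why increasing `--cutoff` can help when no chain is found.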

Output Files

The tool generates a series of progressively refined files that document each step in the protein preparation pipeline:

File	Description	Purpose in Workflow
📄`_structure_cleaned.pdb`	Initial cleaned structure	Removes HETATM records (waters, ligands, ions) and prepares the protein for structural completion
📄`_structure_completed_final.pdb`	Structure with modeled residues	Fills in missing atoms and residues to create a complete protein model
📄`_structure_flipped_h_final.pdb`	Optimized with hydrogens	Contains MolProbity-optimized hydrogen positions and corrected side-chain orientations
📄`_structure_protonated.pqr`	Protonated structure	Includes atomic radii and charge parameters from PDB2PQR required for electrostatics
📄`_structure_docking.pdbqt`	Final docking-ready file	Primary output file with all parameters needed for AutoDock Vina docking simulations

Note: The _structure_docking.pdbqt file is the primary output and the one to use for docking simulations with AutoDock Vina. Intermediate files are preserved so that each preparation step can be inspected.
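
To see at a glance which of these files a run produced, a small Python sketch like the one below can help. It assumes every output is written to results/ as the input file's stem followed by the suffix from the table, which this README only shows explicitly for the final .pdbqt file; the helper name `expected_outputs` is hypothetical. Some intermediates may legitimately be absent, for example when MolProbity is skipped.

```python
from pathlib import Path

# Suffixes copied from the Output Files table, in pipeline order.
OUTPUT_SUFFIXES = [
    "_structure_cleaned.pdb",
    "_structure_completed_final.pdb",
    "_structure_flipped_h_final.pdb",
    "_structure_protonated.pqr",
    "_structure_docking.pdbqt",
]


def expected_outputs(input_pdb, results_dir="results"):
    """Return the output paths expected for one input PDB file."""
    stem = Path(input_pdb).stem
    return [Path(results_dir) / f"{stem}{suffix}" for suffix in OUTPUT_SUFFIXES]


for path in expected_outputs("path/to/1abc.pdb"):
    status = "found" if path.exists() else "missing"
    print(f"{status:8s}{path}")
```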

File Formats Explained

PDB: Standard Protein Data Bank format containing atomic coordinates
PQR: Modified PDB format that includes charge (Q) and radius (R) parameters
PDBQT: Extended PDB format with partial charges (Q) and atom types (T) required by AutoDock Vina
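
The extra PDBQT fields can be read with plain string slicing. The Python sketch below assumes the fixed-column layout commonly written by AutoDock tools (partial charge in columns 71-76, atom type in columns 78-79); the sample record and the helper name `parse_pdbqt_atom` are invented for illustration, so check the columns against your own files before relying on them.

```python
def parse_pdbqt_atom(line):
    """Extract name, coordinates, partial charge (Q), and atom type (T) from one record."""
    return {
        "name": line[12:16].strip(),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "charge": float(line[70:76]),
        "atom_type": line[77:79].strip(),
    }


# Build a sample record from explicit field widths so the columns line up.
sample = (
    "ATOM  " + f"{1:5d}" + "  N   ALA A" + f"{1:4d}" + "    "
    + f"{-0.677:8.3f}{-1.230:8.3f}{-0.491:8.3f}{1.00:6.2f}{0.00:6.2f}"
    + "    " + f"{-0.066:6.3f}" + " " + "N "
)

record = parse_pdbqt_atom(sample)
print(record)
```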

For advanced users who want to customize the preparation process, these intermediate files can be modified before continuing to the next processing step using the --input_file parameter.

Documentation

Troubleshooting

Common MGLTools Issues

ImportError: No module named MolKit: Ensure PYTHONPATH includes MGLToolsPckgs directory
No output file: Check for error messages, verify input file exists, check write permissions
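
For the MolKit import error, a quick way to confirm whether PYTHONPATH contains an MGLToolsPckgs directory is sketched below in Python. The helper name `mgltools_on_pythonpath` is hypothetical, and the check only looks at directory names; it does not verify that the directory exists or contains MolKit.

```python
import os


def mgltools_on_pythonpath(pythonpath):
    """Return True if any PYTHONPATH entry is a directory named MGLToolsPckgs."""
    entries = [entry for entry in pythonpath.split(os.pathsep) if entry]
    return any(
        os.path.basename(os.path.normpath(entry)) == "MGLToolsPckgs"
        for entry in entries
    )


# Inspect the PYTHONPATH of the current shell session.
print(mgltools_on_pythonpath(os.environ.get("PYTHONPATH", "")))
```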

Dependency Issues

If you encounter errors related to missing dependencies, run the dependency checker:

dock-prep-check

This will help identify which tools or packages need to be installed or properly configured.

Dependencies

This tool relies on:

MGLTools: For PDB to PDBQT conversion
MolProbity (optional): For structure validation and hydrogen placement
OpenBabel: For file format conversion (obabel)
PDB2PQR: For protein protonation (pdb2pqr30)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. Check out our Contributing Guidelines for more details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this tool in your research, please cite:

Barbosa-Farias, I. (2025). dock-prep: A streamlined tool for preparing protein structures for molecular docking. 
GitHub repository: https://github.com/ingcoder/dock-prep

Acknowledgments

Thanks to all the developers of MGLTools, MolProbity, OpenBabel, and PDB2PQR
Special thanks to contributors and users who have provided valuable feedback
