Python implementation of a denoising diffusion probabilistic model (DDPM) for predicting protein-ligand binding affinity using equivariant neural networks.
DiffBind uses diffusion models and equivariant neural networks to predict the binding affinity between proteins and small molecules. This approach combines:
- Denoising Diffusion Probabilistic Models (DDPM) for generating protein-ligand binding poses
- Equivariant Neural Networks to respect 3D rotational and translational symmetries
- Binding MOAD Dataset for training and evaluation
- 🧬 Equivariant molecular representations
- ⚗️ Diffusion-based binding pose generation
- 🎯 Binding affinity prediction
- 📊 Comprehensive evaluation metrics
- 🔬 Integration with RDKit and Biopython
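For orientation, the forward (noising) half of a DDPM, which the model learns to reverse, can be sketched in a few lines. This is the standard closed-form noising step from the DDPM literature, not code taken from this repository: the linear schedule, its endpoints, the toy coordinates, and all function names are illustrative assumptions.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_coordinates(coords, t, alpha_bars, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [
        tuple(signal * c + sigma * rng.gauss(0.0, 1.0) for c in atom)
        for atom in coords
    ]


if __name__ == "__main__":
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]  # toy 3D atom positions
    rng = random.Random(0)
    for t in (0, 499, 999):
        print(t, noise_coordinates(ligand, t, alpha_bars, rng))
```

At small `t` the sampled coordinates stay close to the input; at the final step almost no signal remains, which is the state a pose-generating model would start denoising from.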
To run the code in this repository, you'll need to create a Conda environment with specific dependencies. Below are the steps to set up the environment:
- Create a New Conda Environment:
This command creates a new Conda environment named diff_bind:
conda create -n diff_bind
- Activate the Environment:
Once the environment is created, activate it using:
conda activate diff_bind
Install the necessary dependencies by running the following commands in your activated environment:
- PyTorch with CUDA Toolkit:
conda install pytorch cudatoolkit=10.2 -c pytorch
- PyTorch Lightning:
conda install -c conda-forge pytorch-lightning
- Weights & Biases (wandb):
conda install -c conda-forge wandb
- RDKit:
conda install -c conda-forge rdkit
- Biopython:
conda install -c conda-forge biopython
- ImageIO:
conda install -c conda-forge imageio
- SciPy:
conda install -c anaconda scipy
- PyTorch-Scatter:
conda install -c pyg pytorch-scatter
- OpenBabel:
conda install -c conda-forge openbabel
After installing all dependencies, your environment is ready to run the code in this repository.
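A quick way to confirm the installation is to check that each package can be located from the activated environment. The sketch below is not part of the repository: the helper name and the mapping are illustrative, and the import names listed are the usual ones for these packages (several differ from the conda package name, for example `biopython` is imported as `Bio`).

```python
import importlib.util

# Display name -> Python import name for the packages installed above.
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "PyTorch Lightning": "pytorch_lightning",
    "Weights & Biases": "wandb",
    "RDKit": "rdkit",
    "Biopython": "Bio",
    "ImageIO": "imageio",
    "SciPy": "scipy",
    "PyTorch-Scatter": "torch_scatter",
    "OpenBabel": "openbabel",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the display names of modules that cannot be located."""
    missing = []
    for label, import_name in modules.items():
        try:
            spec = importlib.util.find_spec(import_name)
        except (ImportError, ValueError):
            spec = None  # treat a broken lookup as "not installed"
        if spec is None:
            missing.append(label)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

This only checks that the modules can be found, not that they load correctly or that CUDA is usable.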
This section describes how to prepare the Binding MOAD dataset for processing.
- Download the Dataset:
Run the following commands in your terminal to download the necessary files:
wget http://www.bindingmoad.org/files/biou/every_part_a.zip
wget http://www.bindingmoad.org/files/biou/every_part_b.zip
wget http://www.bindingmoad.org/files/csv/every.csv
unzip every_part_a.zip
unzip every_part_b.zip
- Process the raw data:
Use the provided Python script to process the raw data. Replace '<bindingmoad_dir>'
with the path to the directory where you've downloaded the Binding MOAD files:
python -W ignore process_bindingmoad.py <bindingmoad_dir>
- Optional: To create a dataset with only a C-alpha (Cα) pocket representation, add the
'--ca_only' flag to the command:
python -W ignore process_bindingmoad.py <bindingmoad_dir> --ca_only
With this flag, the pocket is represented by the C-alpha atoms of the protein structure only.
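The download and processing steps above can also be scripted. The following is a minimal sketch rather than a utility shipped with this repository: the helper names are hypothetical, the URLs are the ones listed above, and it assumes `process_bindingmoad.py` is run from the repository root.

```python
import sys
import urllib.request
import zipfile
from pathlib import Path

BASE_URL = "http://www.bindingmoad.org/files"
FILES = ["biou/every_part_a.zip", "biou/every_part_b.zip", "csv/every.csv"]


def download_and_extract(target_dir):
    """Download the Binding MOAD files into target_dir and unzip the archives."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for relative in FILES:
        destination = target / Path(relative).name
        if not destination.exists():  # skip files that are already present
            urllib.request.urlretrieve(f"{BASE_URL}/{relative}", str(destination))
        if destination.suffix == ".zip":
            with zipfile.ZipFile(destination) as archive:
                archive.extractall(target)


def build_process_command(bindingmoad_dir, ca_only=False):
    """Assemble the processing command shown above as an argument list."""
    # sys.executable keeps the call inside the currently active environment.
    command = [sys.executable, "-W", "ignore", "process_bindingmoad.py",
               str(bindingmoad_dir)]
    if ca_only:
        command.append("--ca_only")
    return command
```

Calling `download_and_extract("data")` and then passing `build_process_command("data")` to `subprocess.run(..., check=True)` would mirror the manual steps. Building the command as a list avoids shell quoting issues when the directory path contains spaces.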
