MBP: Multi-task Bioassay Pre-training for Protein-Ligand Binding Affinity Prediction

This is a PyTorch implementation of MBP for predicting protein-ligand binding affinity. If you encounter any issues, please reach out to jiaxianyan@mail.ustc.edu.cn.

Figure: overview of the MBP framework.

ChEMBL-Dock

ChEMBL-Dock is a protein-ligand affinity dataset built from ChEMBL. It contains 505,579 experimental binding affinity measurements from 51,907 bioassays, covering 2,121 proteins, 276,211 molecules, and 7,963,020 3D binding conformations. (MBP uses only a small portion of this data in the paper.)

Installation of MBP

We provide a script, conda_env.sh, that makes it easy to install the dependencies of MBP. You only need to adjust a few packages to match your CUDA version (an example adjustment is sketched after the commands below).

conda create -y -n torch_geo python=3.7
conda activate torch_geo
bash conda_env.sh
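
As an illustration only (assuming the environment uses PyTorch and PyTorch Geometric; the exact packages and versions are defined in conda_env.sh, and the ones below are hypothetical), adjusting the CUDA-dependent installs could look like:

# pick the cudatoolkit / +cu build that matches your driver
conda install -y pytorch cudatoolkit=11.3 -c pytorch
pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-1.12.1+cu113.html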

Dataset

Pre-training Dataset: ChEMBL-Dock

If you want to pre-train our models with the processed ChEMBL-Dock data:

  1. download the pre-training dataset ChEMBL-Dock from Google Drive.
  2. unzip the archive and place it in MBP/MBP/data such that you have the path MBP/MBP/data/chembl_in_pdbbind_smina (see the sketch below).
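
A minimal shell sketch of step 2, assuming the downloaded archive is named chembl_in_pdbbind_smina.zip (the actual filename on Google Drive may differ); the downstream datasets below are placed the same way:

mkdir -p MBP/MBP/data
unzip chembl_in_pdbbind_smina.zip -d MBP/MBP/data/
ls MBP/MBP/data/chembl_in_pdbbind_smina  # the path expected by the code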

Downstream datasets: PDBbind v2016 and CSAR-HiQ

If you want to fine-tune our models with PDBbind:

  1. download the fine-tuning dataset PDBbind v2016 from PDBbind.
  2. unzip the archive and place it in MBP/MBP/data such that you have the path MBP/MBP/data/pdbbind2016_finetune.

If you want to test our models with CSAR-HiQ:

  1. download the independent dataset CSAR-HiQ from Google Drive.
  2. unzip the archive and place it in MBP/MBP/data such that you have the path MBP/MBP/data/csar_test.

Using the provided model weights for evaluation

Overall performance on PDBbind and CSAR-HiQ

cp scripts/result_reproduce.py ./
python3 result_reproduce.py --work_dir=workdir/finetune/pdbbind

Performance under the Transformer-M setting

cp scripts/result_reproduce.py ./
python3 result_reproduce.py --work_dir=workdir/finetune/transformer_m

Performance under the TANKbind setting

  1. download the PDBbind v2020 dataset used in the TANKbind setting from PDBbind.
  2. place it into MBP/MBP/data and rename it to pdbbind2020_finetune such that you have the path MBP/MBP/data/pdbbind2020_finetune (see the sketch below).
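
A minimal sketch of step 2, assuming the archive extracts to a directory named PDBbind_v2020 (hypothetical; substitute the actual directory name):

mv PDBbind_v2020 MBP/MBP/data/pdbbind2020_finetune
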
cp scripts/result_reproduce.py ./
python3 result_reproduce.py --work_dir=workdir/finetune/tankbind

Retraining MBP

Pre-training on ChEMBL-Dock with MBP

# training with a single GPU
cp scripts/pretrain.py ./
python3 pretrain.py --config_path=config/affinity_default.yaml

# DDP training with 4 GPUs
# we advise running the single-GPU pretrain.py script first:
# it prepares the dataset, ensuring it is ready for subsequent DDP training.
cp scripts/pretrain_ddp.py ./
CUDA_VISIBLE_DEVICES="0,1,2,3" python3 -m torch.distributed.launch --nproc_per_node=4 pretrain_ddp.py --config_path=config/affinity_default.yaml

Fine-tuning on PDBbind and testing on CSAR-HiQ

After obtaining a pre-trained model, replace the value of the test.now parameter in line 49 of affinity_default.yaml with the logging directory for fine-tuning, then:
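
For reference, a sketch of what the relevant part of config/affinity_default.yaml might look like, assuming test.now is a nested YAML key (the value below is a placeholder; set it as described above):

test:
  now: <logging_dir>   # placeholder for the logging directory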

cp scripts/finetune.py ./
python3 finetune.py --config_path=config/affinity_default.yaml

Citation

If you use or extend our work, please cite the paper as follows:

@article{Yan2023MultitaskBP,
  title={Multi-task bioassay pre-training for protein-ligand binding affinity prediction},
  author={Jiaxian Yan and Zhaofeng Ye and Ziyi Yang and Chengqiang Lu and Shengyu Zhang and Qi Liu and Jiezhong Qiu},
  journal={Briefings in Bioinformatics},
  year={2023},
  volume={25}
}
