<a href="https://colab.research.google.com/github/suneelbvs/DiffDock/blob/main/DiffDock_SingleComplex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DiffDock
Dock a small molecules on to protein structures using DiffDock approach

1.   This notebook allows you to run diffdock on single protein/ligands and also multiple proteins/ligands.

2.   Colab basic version works fine with single simulations. "Premium GPU" (colab pro), and even then it may fail on large complexes.

## References:

[Research Article](https://arxiv.org/abs/2210.01776)

[Github](https://github.com/gcorso/DiffDock)

[Interactive Online tool by Simon Duerr](https://huggingface.co/spaces/simonduerr/diffdock)

[Colab Notebook by Brian Naughton](https://colab.research.google.com/drive/1nvCyQkbO-TwXZKJ0RCShVEym1aFWxlkX). The current notebook revised from Brain's work/code.






In [2]:
# Start with mapping Google Drive to Colab
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


**Step 1**: Setup working directory named "DiffDock_V2" in your Google Drive and update directory path.

Copy or move this colab notebook to the current directory.

In [3]:
## Enable this code inorder to create DiffDock_V2 directory
## Pls ignore this step if you have already created one 
#%cd /content/drive/MyDrive
#%mkdir DiffDock_V2
#%cd DiffDock_V2
#%ls

If you have already created or would like to work on different directory; please update the path accordingly

In [4]:
%cd /content/drive/MyDrive/DiffDock_V2
%ls

/content/drive/MyDrive/DiffDock_V2
[0m[01;34mDiffDock[0m/  DiffDock_V2.ipynb  DiffDock_V3.ipynb  [01;34mprolif[0m/  Untitled0.ipynb


## Step 2: 
Install the dependencies for DiffDock 

## Install prerequisites

In [5]:
!pip install ipython-autotime
%load_ext autotime

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ipython-autotime
  Downloading ipython_autotime-0.3.1-py2.py3-none-any.whl (6.8 kB)
Collecting jedi>=0.10
  Downloading jedi-0.18.1-py2.py3-none-any.whl (1.6 MB)
[K     |████████████████████████████████| 1.6 MB 7.0 MB/s 
Installing collected packages: jedi, ipython-autotime
Successfully installed ipython-autotime-0.3.1 jedi-0.18.1
time: 824 µs (started: 2022-10-24 01:35:20 +00:00)


In [6]:
%cd /content/drive/MyDrive/DiffDock_V2
!git clone https://github.com/gcorso/DiffDock.git
%cd /content/drive/MyDrive/DiffDock_V2/DiffDock
!git checkout 0f9c419 # remove/update for more up to date code

/content/drive/MyDrive/DiffDock_V2
fatal: destination path 'DiffDock' already exists and is not an empty directory.
/content/drive/MyDrive/DiffDock_V2/DiffDock
HEAD is now at 0f9c419 improve README
time: 24 s (started: 2022-10-24 01:35:20 +00:00)


In [7]:
!pip install pyg==0.7.1 --quiet
!pip install pyyaml==6.0 --quiet
!pip install scipy==1.7.3 --quiet
!pip install networkx==2.6.3 --quiet
!pip install biopython==1.79 --quiet
!pip install rdkit-pypi==2022.03.5 --quiet
!pip install e3nn==0.5.0 --quiet
!pip install spyrmsd==0.5.2 --quiet
!pip install pandas==1.3.5 --quiet
!pip install biopandas==0.4.1 --quiet
!pip install torch==1.12.1+cu113 --quiet
!pip install nglview --quiet
!pip install -q nglview pytraj --quiet

[?25l[K     |█████                           | 10 kB 15.5 MB/s eta 0:00:01[K     |██████████                      | 20 kB 16.9 MB/s eta 0:00:01[K     |███████████████▏                | 30 kB 21.4 MB/s eta 0:00:01[K     |████████████████████▏           | 40 kB 6.7 MB/s eta 0:00:01[K     |█████████████████████████▏      | 51 kB 6.7 MB/s eta 0:00:01[K     |██████████████████████████████▎ | 61 kB 7.7 MB/s eta 0:00:01[K     |████████████████████████████████| 65 kB 2.1 MB/s 
[?25h  Building wheel for pyg (setup.py) ... [?25l[?25hdone
  Building wheel for pkgtools (setup.py) ... [?25l[?25hdone
[K     |████████████████████████████████| 2.6 MB 7.3 MB/s 
[K     |████████████████████████████████| 36.8 MB 24 kB/s 
[K     |████████████████████████████████| 117 kB 7.1 MB/s 
[K     |████████████████████████████████| 878 kB 7.1 MB/s 
[K     |████████████████████████████████| 5.7 MB 5.6 MB/s 
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to b

In [8]:
import torch

try:
    import torch_geometric
except ModuleNotFoundError:
    !pip uninstall torch-scatter torch-sparse torch-geometric torch-cluster  --y
    !pip install torch-scatter -f https://data.pyg.org/whl/torch-{torch.__version__}.html --quiet
    !pip install torch-sparse -f https://data.pyg.org/whl/torch-{torch.__version__}.html --quiet
    !pip install torch-cluster -f https://data.pyg.org/whl/torch-{torch.__version__}.html --quiet
    !pip install git+https://github.com/pyg-team/pytorch_geometric.git  --quiet # no version for some reason??

[K     |████████████████████████████████| 7.9 MB 8.8 MB/s 
[K     |████████████████████████████████| 3.5 MB 8.4 MB/s 
[K     |████████████████████████████████| 2.4 MB 7.8 MB/s 
[?25h  Building wheel for torch-geometric (setup.py) ... [?25l[?25hdone
time: 29.3 s (started: 2022-10-24 01:37:04 +00:00)


### Download 2GB PDBBind dataset
unnecessary for inference

In [9]:
#!test -d /content/DiffDock/data/PDBBind_processed || (wget https://zenodo.org/record/6034088/files/PDBBind.zip && unzip -q PDBBind.zip && mv PDBBind_processed /content/DiffDock/data/)

time: 1.54 ms (started: 2022-10-24 01:37:33 +00:00)


# Upload Input files



**Step 3:** 

1.   Upload protein and ligand file in data directory.
2.   DiffDock supports .pdb file format for protein
3.   and it supports, .sdf or .mol2, and SMILES format for ligand
4.   For example, i have saved protein as 'protein.pdb' and ligand as 'ligand.sdf'. 
5.   Update the respective file names in esm embedding preparation and inference steps.
6.   Alternatively, you can also provide SMILES as an input. For example, **--ligand "COc(cc1)ccc1C#N"** instead of *--ligand ligand.sdf*





In [10]:
%cd /data
from google.colab import files
uploaded = files.upload()

[Errno 2] No such file or directory: '/data'
/content/drive/MyDrive/DiffDock_V2/DiffDock


Saving ligand.sdf to ligand.sdf
time: 29.5 s (started: 2022-10-24 01:37:33 +00:00)


For demo files refer my [github profile](https://github.com/suneelbvs/DiffDock)

## Install ESM and prepare PDB file for ESM

In [11]:
%cd /content/drive/MyDrive/DiffDock_V2/DiffDock
!git clone https://github.com/facebookresearch/esm
%cd /content/drive/MyDrive/DiffDock_V2/DiffDock/esm
!git checkout f07aed6 # remove/update for more up to date code
!sudo pip install -e .
%cd /content/drive/MyDrive/DiffDock_V2/DiffDock

/content/drive/MyDrive/DiffDock_V2/DiffDock
fatal: destination path 'esm' already exists and is not an empty directory.
/content/drive/MyDrive/DiffDock_V2/DiffDock/esm
HEAD is now at f07aed6 fix fairscale inference example (#298)
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Obtaining file:///content/drive/MyDrive/DiffDock_V2/DiffDock/esm
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
    Preparing wheel metadata ... [?25l[?25hdone
Installing collected packages: fair-esm
  Running setup.py develop for fair-esm
Successfully installed fair-esm
/content/drive/MyDrive/DiffDock_V2/DiffDock
time: 57.8 s (started: 2022-10-24 01:38:03 +00:00)


In [21]:
%cd /content/drive/MyDrive/DiffDock_V2/DiffDock
!python datasets/esm_embedding_preparation.py --protein_path data/protein.pdb --out_file data/prepared_for_esm.fasta 

/content/drive/MyDrive/DiffDock_V2/DiffDock
6w70.pdb              [0m[01;34mesm[0m/                                [01;34m__pycache__[0m/
[01;34mbaselines[0m/            evaluate_confidence_calibration.py  README.md
[01;34mconfidence[0m/           evaluate.py                         [01;34mresults[0m/
[01;34mdata[0m/                 inference.py                        train.py
[01;34mdatasets[0m/             LICENSE                             [01;34mutils[0m/
diffdock_results.tar  ligand.sdf                          [01;34mvisualizations[0m/
environment.yml       [01;34mmodels[0m/                             [01;34mworkdir[0m/
100% 1/1 [00:00<00:00, 22.83it/s]
time: 1.37 s (started: 2022-10-24 01:43:46 +00:00)


In [22]:
%cd /content/drive/MyDrive/DiffDock_V2/DiffDock
%env HOME=esm/model_weights
%env PYTHONPATH=$PYTHONPATH:/content/drive/MyDrive/DiffDock_V2/DiffDock/esm
!python /content/drive/MyDrive/DiffDock_V2/DiffDock/esm/scripts/extract.py esm2_t33_650M_UR50D data/prepared_for_esm.fasta data/esm2_output --repr_layers 33 --include per_tok

/content/drive/MyDrive/DiffDock_V2/DiffDock
env: HOME=esm/model_weights
env: PYTHONPATH=$PYTHONPATH:/content/drive/MyDrive/DiffDock_V2/DiffDock/esm
Read data/prepared_for_esm.fasta with 1 sequences
Processing 1 of 1 batches (1 sequences)
time: 25.1 s (started: 2022-10-24 01:43:51 +00:00)


## Run DiffDock

In [23]:
%cd /content/drive/MyDrive/DiffDock_V2/DiffDock
!python -m inference --protein_path data/protein.pdb --ligand data/ligand.sdf --out_dir results/singlecomplx --inference_steps 20 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise
#!mv 'index0_data-testing-6w70.pdb____data-testing-6w70_ligand.sdf' out #update the folder name, if you provide custom names for inputs
#%cd ./out
#%ls

/content/drive/MyDrive/DiffDock_V2/DiffDock
Reading molecules and generating local structures with RDKit
100% 1/1 [00:00<00:00, 17.53it/s]
Reading language model embeddings.
Generating graphs for ligands and proteins
loading complexes: 100% 1/1 [00:00<00:00,  3.80it/s]
loading data from memory:  data/cache_torsion/limit0_INDEX_maxLigSizeNone_H0_recRad15.0_recMax24_esmEmbeddings1577801764/heterographs.pkl
Number of complexes:  1
radius protein: mean 23.78305435180664, std 0.0, max 23.78305435180664
radius molecule: mean 4.922807216644287, std 0.0, max 4.922807216644287
distance protein-mol: mean 75.22794342041016, std 0.0, max 75.22794342041016
rmsd matching: mean 0.0, std 0.0, max 0
HAPPENING | confidence model uses different type of graphs than the score model. Loading (or creating if not existing) the data for the confidence model now.
Reading molecules and generating local structures with RDKit
100% 1/1 [00:00<00:00, 14.73it/s]
Reading language model embeddings.
Generating graphs fo

## Download results

In [25]:
%cd ./results/singlecomplx
!mv 'index0_data-protein.pdb____data-ligand.sdf' out
#%cp ./data/*.*pdb
%cd ./out
%ls

/content/drive/MyDrive/DiffDock_V2/DiffDock/results/singlecomplx
/content/drive/MyDrive/DiffDock_V2/DiffDock/results/singlecomplx/out
rank10_confidence0.01.sdf   rank29_confidence-1.25.sdf
rank11_confidence0.00.sdf   rank2_confidence0.38.sdf
rank12_confidence-0.08.sdf  rank30_confidence-1.33.sdf
rank13_confidence-0.14.sdf  rank31_confidence-1.33.sdf
rank14_confidence-0.21.sdf  rank32_confidence-1.47.sdf
rank15_confidence-0.24.sdf  rank33_confidence-1.53.sdf
rank16_confidence-0.26.sdf  rank34_confidence-1.64.sdf
rank17_confidence-0.27.sdf  rank35_confidence-1.93.sdf
rank18_confidence-0.31.sdf  rank36_confidence-2.18.sdf
rank19_confidence-0.35.sdf  rank37_confidence-2.87.sdf
rank1_confidence0.44.sdf    rank38_confidence-2.96.sdf
rank1.sdf                   rank39_confidence-3.29.sdf
rank20_confidence-0.36.sdf  rank3_confidence0.38.sdf
rank21_confidence-0.44.sdf  rank40_confidence-3.41.sdf
rank22_confidence-0.45.sdf  rank4_confidence0.37.sdf
rank23_confidence-0.50.sdf  rank5_confidence0.3

In [16]:
from google.colab import output
output.enable_custom_widget_manager()

time: 4 ms (started: 2022-10-24 01:40:34 +00:00)


In [36]:
%ls

rank10_confidence0.01.sdf   rank29_confidence-1.25.sdf
rank11_confidence0.00.sdf   rank2_confidence0.38.sdf
rank12_confidence-0.08.sdf  rank30_confidence-1.33.sdf
rank13_confidence-0.14.sdf  rank31_confidence-1.33.sdf
rank14_confidence-0.21.sdf  rank32_confidence-1.47.sdf
rank15_confidence-0.24.sdf  rank33_confidence-1.53.sdf
rank16_confidence-0.26.sdf  rank34_confidence-1.64.sdf
rank17_confidence-0.27.sdf  rank35_confidence-1.93.sdf
rank18_confidence-0.31.sdf  rank36_confidence-2.18.sdf
rank19_confidence-0.35.sdf  rank37_confidence-2.87.sdf
rank1_confidence0.44.sdf    rank38_confidence-2.96.sdf
rank1.sdf                   rank39_confidence-3.29.sdf
rank20_confidence-0.36.sdf  rank3_confidence0.38.sdf
rank21_confidence-0.44.sdf  rank40_confidence-3.41.sdf
rank22_confidence-0.45.sdf  rank4_confidence0.37.sdf
rank23_confidence-0.50.sdf  rank5_confidence0.36.sdf
rank24_confidence-0.52.sdf  rank6_confidence0.34.sdf
rank25_confidence-0.60.sdf  rank7_confidence0.30.sdf
rank26_confidence-0.63



# Work In Progress: Analysis Part
**bold text**

