<a href="https://colab.research.google.com/github/kiharalab/DistPepFold/blob/master/Distpepfold_inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DistPepFold

<a href="https://github.com/kiharalab/DistPepFold">
   <img src="https://img.shields.io/badge/DistPepFold-v1.0.0-green">
   <img src="https://img.shields.io/badge/platform-Linux-green">
   <img src="https://img.shields.io/badge/Language-python3-green">
   <img src="https://img.shields.io/badge/dependencies-tested-green">
   <img src="https://img.shields.io/badge/licence-GNU-green">
</a>  

DistPepFold is a computational tool using deep learning for peptide docking.

Copyright (C) 2023 Zicong Zhang, Jacob Verburgt, Yuki Kagaya, Charles Christoffer, Daisuke Kihara, and Purdue University.

License: GPL v3. (If you are interested in a different license, for example, for commercial use, please contact us.)

Contact: Daisuke Kihara (dkihara@purdue.edu)

For technical problems or questions, please reach out to Zicong Zhang (zhan1797@purdue.edu).

# Instructions <a name="Instructions"></a>
**Quick start**
1. This Colab notebook is still under development.
2. DistPepFold requires the user to provide two inputs:


*   A FASTA file containing the protein sequences, named "input.fasta"
*   An embedding file named "model_1_multimer.npz"


3. To generate the embedding file, use the following notebook:

  [embedding generation notebook](https://colab.research.google.com/drive/1Ft7APOB-eAlILpYjmNstLIzBXhpacT5q?usp=sharing)

4. Alternatively, you can generate the embeddings by running AlphaFold-Multimer on your sequences.

5. After the notebook finishes running, the prediction file is downloaded automatically.
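
Before running inference, it can help to confirm that both inputs listed above are in place. The following is a minimal sketch, not part of DistPepFold: `missing_inputs` and `REQUIRED_FILES` are hypothetical names, and the only facts taken from the instructions are the two file names.

```python
import os

# File names the quick-start instructions ask for.
REQUIRED_FILES = ("input.fasta", "model_1_multimer.npz")


def missing_inputs(input_dir):
    """Return the required file names that are absent or empty in input_dir."""
    missing = []
    for name in REQUIRED_FILES:
        path = os.path.join(input_dir, name)
        # Treat a zero-byte file as missing: it usually means a failed upload.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```

Calling `missing_inputs(upload_dir)` after both upload cells should return an empty list if the uploads succeeded.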

# Run DistPepFold Inference


In [None]:
#@title Install dependencies <a name="Dependency"></a>
#@markdown Make sure the notebook is connected to a **GPU** runtime; DistPepFold needs GPU support to run.<br>
#@markdown Click the **Connect** button at the top right, and the notebook will automatically connect to a GPU machine.
%cd /content
%shell rm -rf /opt/conda
%shell wget -q -P /tmp \
  https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda \
    && rm /tmp/Miniconda3-latest-Linux-x86_64.sh

PATH=%env PATH
%env PATH=/opt/conda/bin:{PATH}
%shell conda install -qy -c conda-forge \
      python=3.9.6
%shell conda install -qy numpy
!python -m pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
!python -m pip install dm-tree==0.1.8 jax==0.4.1 jaxlib==0.4.1 biopython==1.79 ml-collections==0.1.0 matplotlib
!git clone https://github.com/kiharalab/DistPepFold --quiet

%cd DistPepFold
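
Because DistPepFold needs a GPU, a quick check that the runtime actually exposes one can save a failed run. The sketch below is an assumption-laden convenience, not part of DistPepFold: `gpu_visible` is a hypothetical helper that relies on the NVIDIA `nvidia-smi` tool being on the `PATH`, and it does not verify that PyTorch itself can use the device.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if nvidia-smi is available and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # "nvidia-smi -L" prints one line per GPU, each starting with "GPU <index>:".
        result = subprocess.run([exe, "-L"], capture_output=True, text=True)
    except OSError:
        return False
    return result.returncode == 0 and "GPU" in result.stdout
```

If `gpu_visible()` returns `False`, switch the Colab runtime type to a GPU before continuing.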

In [3]:
#@title Upload fasta file <a name="Map"></a>

from google.colab import files
import os
import os.path
import re
import hashlib
import random
import string

# rand_letters = string.ascii_lowercase
# rand_letters = ''.join(random.choice(rand_letters) for i in range(20))

# Fixed name for the upload directory (the random-name generator above is disabled).
rand_letters = 'lkjmpwbubmihjaaxxyqg'
root_dir = '/content/DistPepFold'
upload_dir = os.path.join(root_dir, rand_letters)
os.makedirs(upload_dir, exist_ok=True)

# files.upload() saves into the current working directory, so switch to upload_dir first.
os.chdir(upload_dir)
fasta_input = files.upload()
for fn in fasta_input.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
    name=fn, length=len(fasta_input[fn])))
  fasta_input_path = os.path.abspath(fn)
  print("Fasta file save to %s"%fasta_input_path)
# Return to the repository root so that later cells run from there.
os.chdir(root_dir)
print(root_dir)

Saving input.fasta to input.fasta
User uploaded file "input.fasta" with length 348 bytes
Fasta file save to /content/DistPepFold/lkjmpwbubmihjaaxxyqg/input.fasta
/content/DistPepFold
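
To confirm that the uploaded file is well-formed FASTA before running inference, something like the following could be used. It is a minimal sketch using only the standard library; `read_fasta` is a hypothetical helper, not a DistPepFold function, and it makes no assumption about how many sequences the tool expects.

```python
def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif header is None:
                raise ValueError("sequence data found before the first '>' header")
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For example, looping over `read_fasta(fasta_input_path)` and printing each header with `len(sequence)` gives a quick view of the chains that were uploaded.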


In [4]:
#@title Upload Embedding file <a name="Map"></a>

# Reuse the upload directory created in the previous cell.
root_dir = '/content/DistPepFold'
upload_dir = os.path.join(root_dir, rand_letters)
os.chdir(upload_dir)
embedding_input = files.upload()
for fn in embedding_input.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
    name=fn, length=len(embedding_input[fn])))
  embedding_input_path = os.path.abspath(fn)
  print("Embedding file save to %s"%embedding_input_path)
os.chdir(root_dir)


Saving model_1_multimer.npz to model_1_multimer.npz
User uploaded file "model_1_multimer.npz" with length 50319866 bytes
Embedding file save to /content/DistPepFold/lkjmpwbubmihjaaxxyqg/model_1_multimer.npz
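
An `.npz` file is a zip archive of `.npy` arrays, so its contents can be listed without loading roughly 50 MB of data into memory. The sketch below uses only the standard library; `list_npz_arrays` is a hypothetical helper, and which array names DistPepFold expects in the embedding file is not stated in this notebook, so the function only reports what is present.

```python
import zipfile


def list_npz_arrays(path):
    """Return the array names stored in an .npz file without loading the data."""
    suffix = ".npy"
    with zipfile.ZipFile(path) as archive:
        # Each array is stored as a member named "<array name>.npy".
        return [name[:-len(suffix)] for name in archive.namelist()
                if name.endswith(suffix)]
```

A `zipfile.BadZipFile` error from `list_npz_arrays(os.path.join(upload_dir, "model_1_multimer.npz"))` would suggest that the upload was truncated or is not an `.npz` file.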


In [5]:
#@title Run DistPepFold <a name="Map"></a>

!python pred.py --output_dir ./output/test --ipa_depth 8 --device_id 0 --embedding_dir {upload_dir} --point_scale 20 --model_dir ./weights

# Zip the prediction output directory and download the archive.
from google.colab import files
import shutil
shutil.make_archive(base_name='prediction', format='zip', root_dir='output/test')
files.download('prediction.zip')

----------------- Options ---------------
                  contact: False                         
                     cuda: False                         
                device_id: 0                             
            embedding_dir: /content/DistPepFold/lkjmpwbubmihjaaxxyqg
                ipa_depth: 8                             
                model_dir: ./weights                     
                    n_gpu: 1                             
               output_dir: ./output/test                 
              point_scale: 20                            
                     seed: 999                           
                  targets: /train_chains                 
----------------- End -------------------
Checkpoints (model) loaded from ./weights
trainable parameters:  17358979
done
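
The archive step succeeds even when the output directory is empty, which can hide a failed prediction. One way to guard against that, sketched here under assumptions, is to refuse to archive a directory with no files and to list what went into the zip. `archive_predictions` is a hypothetical helper, not part of DistPepFold, and it makes no assumption about the names or formats of the prediction files.

```python
import os
import shutil
import zipfile


def archive_predictions(output_dir, base_name="prediction"):
    """Zip output_dir and return (zip_path, member_names); fail if it holds no files."""
    has_files = any(files for _, _, files in os.walk(output_dir))
    if not has_files:
        raise FileNotFoundError(f"no prediction files found in {output_dir!r}")
    # Member names in the archive are relative to output_dir.
    zip_path = shutil.make_archive(base_name=base_name, format="zip",
                                   root_dir=output_dir)
    with zipfile.ZipFile(zip_path) as archive:
        return zip_path, archive.namelist()
```

In this notebook the call would be `archive_predictions('output/test')`, followed by `files.download` on the returned path. Keep `base_name` outside `output_dir` so the archive does not try to include itself.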

