## Run DeepRank-GNN-esm computation using the deep learning architecture

Follow these steps:  
Step 1: Prepare input folder  
Step 2: Prepare input proteins  
Step 3: Extract FASTA sequences and calculate ESM-2 embeddings  
Step 4: Convert the input PDBs into interface graphs for DeepRank-GNN-esm  
Step 5: Use pre-trained model to rank the input conformations  
Step 6: Analyze the result

### Step 1: Create a new directory for the tutorial and copy the example PDB files into it


In [None]:
%mkdir tutorial
%cp -r example/data/pdb/1ATN/ tutorial/
%cd tutorial
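
Before moving on, it can help to confirm that the copy worked. The following is a minimal sketch, assuming it is run from inside `tutorial` and that the models carry a `.pdb` extension (as the glob in Step 2 suggests). `list_pdb_files` is a hypothetical helper, not part of DeepRank-GNN-esm.

```python
from pathlib import Path


def list_pdb_files(directory):
    """Return the sorted .pdb files found directly inside `directory`."""
    # Hypothetical helper for this tutorial; it only inspects file names.
    return sorted(Path(directory).glob("*.pdb"))
```

Calling `list_pdb_files("1ATN")` should return a non-empty list if the example data was copied correctly.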

### Step 2: Prepare all input PDBs in the folder

In [None]:
!for pdb_file in 1ATN/*.pdb; do python ../scripts/pdb_renumber.py "$pdb_file" 1 1ATN/; done

### Step 3: Extract FASTA sequences and calculate ESM-2 embeddings

In [None]:
# Extract FASTA sequences from the PDB files
!python ../scripts/get_fasta.py 1ATN/ A B

# Calculate ESM-2 embeddings
%mkdir embedding
!python ../esm/scripts/extract.py esm2_t33_650M_UR50D 1ATN.fasta embedding --repr_layers 33 --include mean per_tok
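
`extract.py` from the ESM repository writes one `.pt` file per FASTA record, named after the record's header. The sketch below checks that every sequence received an embedding file. It assumes that naming scheme and the `1ATN.fasta` and `embedding` paths used above; `missing_embeddings` is a hypothetical helper, not part of either package.

```python
from pathlib import Path


def missing_embeddings(fasta_path, embedding_dir):
    """Return FASTA record labels that have no <label>.pt file in embedding_dir."""
    labels = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # Assumption: the output file is named after the full header line.
                labels.append(line[1:].strip())
    embedding_dir = Path(embedding_dir)
    return [
        label for label in labels
        if not (embedding_dir / f"{label}.pt").is_file()
    ]
```

Called as `missing_embeddings("1ATN.fasta", "embedding")`, an empty list means that every record has a matching embedding file.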

### Step 4: Convert the input PDBs into interface graphs

In [None]:
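from deeprank_gnn.GraphGenMP import GraphHDF5

pdb_path = "1ATN"  # folder with the PDB models prepared in Step 2
embedding_path = "embedding"  # folder with the ESM-2 embeddings from Step 3
nproc = 20  # number of worker processes; adjust to the available CPU cores
outfile = "1ATN_residue.hdf5"  # output file for the interface graphs

GraphHDF5(
    pdb_path=pdb_path,
    embedding_path=embedding_path,
    graph_type="residue",
    outfile=outfile,
    nproc=nproc,
    tmpdir="./tmpdir",
)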
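
The value `nproc = 20` may oversubscribe a machine with fewer cores. The sketch below caps the worker count at what the machine offers. `choose_nproc` is a hypothetical helper, and the default of 20 simply mirrors the value used above.

```python
import os


def choose_nproc(requested=20):
    """Cap the requested number of processes at the available CPU count."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(requested, available))
```

Setting `nproc = choose_nproc(20)` before the `GraphHDF5` call keeps the rest of the cell unchanged.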

### Step 5: Use pre-trained model to rank the input conformations

In [None]:
from deeprank_gnn.ginet import GINet
from deeprank_gnn.NeuralNet import NeuralNet as NN

database_test = "1ATN_residue.hdf5"  # interface graphs generated in Step 4
gnn = GINet
target = "fnat"
edge_feature = ["dist"]
node_features = ["type", "polarity", "bsa", "charge", "embedding"]
threshold = 0.3
pretrained_model = "../paper_pretrained_models/scoring_of_docking_models/gnn_esm/treg_yfnat_b64_e20_lr0.001_foldall_esm.pth.tar"
device_name = "cuda:0"  # first GPU
num_workers = 10

model = NN(
    database_test,
    gnn,
    device_name=device_name,
    edge_feature=edge_feature,
    node_feature=node_features,
    target=target,
    num_workers=num_workers,
    pretrained_model=pretrained_model,
    threshold=threshold,
)

# Write the predictions to GNN_esm_prediction.hdf5
model.test(hdf5="GNN_esm_prediction.hdf5")
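
The cell above hard-codes `device_name = "cuda:0"`, which requires a GPU. The sketch below falls back to the CPU when CUDA is unavailable. It assumes that `NN` accepts `"cpu"` as a device name, which has not been verified here; `pick_device` is a hypothetical helper.

```python
def pick_device(preferred="cuda:0"):
    """Return `preferred` if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch no GPU can be used; the model itself needs PyTorch anyway.
        return "cpu"
    return preferred if torch.cuda.is_available() else "cpu"
```

Using `device_name = pick_device()` in place of the hard-coded string leaves the other arguments untouched.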

### Step 6: Analyze the output

In [None]:
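import h5py

# Open the prediction file read-only; the context manager closes it afterwards
with h5py.File("GNN_esm_prediction.hdf5", "r") as f:
    mol_names = f["epoch_0000"]["test"]["mol"][()]
    fnats = f["epoch_0000"]["test"]["outputs"][()]

# Print each model name with its predicted fnat
for mol, fnat in zip(mol_names, fnats):
    print(mol.decode(), fnat)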
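
The loop above prints the models in file order. Since a higher fnat (fraction of native contacts) indicates a better model, ranking the conformations amounts to sorting by the predicted value in descending order. The following is a minimal sketch; `rank_models` is a hypothetical helper that accepts the `mol_names` and `fnats` arrays read above.

```python
def rank_models(mol_names, fnats):
    """Return (name, fnat) pairs sorted by predicted fnat, best first."""
    pairs = []
    for mol, fnat in zip(mol_names, fnats):
        # Names read from HDF5 are bytes; decode them for display.
        name = mol.decode() if isinstance(mol, bytes) else str(mol)
        pairs.append((name, float(fnat)))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

For example, `for name, fnat in rank_models(mol_names, fnats): print(name, fnat)` lists the top-ranked conformation first.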