# Tutorial notebook on working with the DeepEF model

This tutorial contains the information needed to use the DeepEF model in a variety of cases:
1. Energy prediction.
2. PDB energy prediction.
3. $\Delta G$ and $\Delta \Delta G$ prediction for mutation outcome.

# Free energy prediction $G$
The DeepEF model lets users rapidly predict the energy of a protein. Although this energy is not a directly measurable value, it can be used to predict protein stability, mutation outcomes, and more.

In this section I'll show how to use the processed side-chain data to predict the energy of a protein.

The DeepEF input is a normalized protein graph created from the structure and sequence.

The protein graph contains:
1. A summed distance matrix
2. A one-hot vector of the sequence
3. ProT5 embedding of the sequence
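
To make item 2 of the list above concrete, here is a minimal pure-Python sketch of one-hot encoding a protein sequence. The alphabet order in `AMINO_ACIDS` and the helper name `one_hot_encode` are assumptions for illustration only; the repository's own `get_one_hot` may order the residues differently and returns a tensor rather than nested lists.

```python
# Assumed residue order (alphabetical by one-letter code); the real
# DeepEF encoding may use a different order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one row of 20 zeros/ones per residue in `sequence`."""
    rows = []
    for aa in sequence:
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[aa]] = 1  # raises KeyError for non-standard residues
        rows.append(row)
    return rows


encoded = one_hot_encode("MVDF")
print(len(encoded), len(encoded[0]))  # number of residues, alphabet size
```

Each row has exactly one non-zero entry, which matches the `[L, 20]` layout of `seq_one_hot` for a sequence of length `L`.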

In [11]:
import sys
sys.path.append('..')    # add parent directory to path   

In [12]:
# Necessary imports
import numpy as np
import matplotlib.pyplot as plt
from model.hydro_net import PEM
from model.model_cfg import CFG
from Utils.train_utils import *
from Utils.pdb_parser import get_pdb_data
import torch

### Load trained model

In [13]:
# Import the model
model = PEM(layers=CFG.num_layers,gaussian_coef=CFG.gaussian_coef).to(CFG.device)
# Get total number of parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"Number of parameters: {total_params}")

Number of parameters: 393517


In [None]:
# Load model weights
CFG.model_path = '../data/Trained_models/'
epoch = 25
weights_path = CFG.model_path + f"{epoch}_final_model.pt"
# weights_only=False unpickles the whole checkpoint dict, so only load trusted files
model_dict = torch.load(weights_path, map_location=CFG.device, weights_only=False)
print(weights_path)
model.load_state_dict(model_dict['model_state_dict'])

<All keys matched successfully>

### Load data sample

In [15]:
item_path = '../data/casp12_data_30/valid-10/10#1HF2_1_A'

data = get_item_data(item_path)

def print_shapes(data):
    # Unpack the sample; `protein_id` avoids shadowing the built-in `id` inside the function
    protein_id, crd_backbone, mask, seq_one_hot, seq, proT5_emb = data
    print(f"id: {protein_id}")
    print(f"crd_backbone: {crd_backbone.shape}")
    print(f"mask: {mask.shape}")
    print(f"seq_one_hot: {seq_one_hot.shape}")
    print(f"seq: {seq}")
    print(f"proT5_emb: {proT5_emb.shape}")

print_shapes(data)

id: 10#1HF2_1_A
crd_backbone: torch.Size([1, 210, 4, 3])
mask: torch.Size([1, 210])
seq_one_hot: torch.Size([1, 210, 20])
seq: MVDFKMTKEGLVLLIKDYQNLEEVLNAISARITQMGGFFAKGDRISLMIENHNKHSQDIPRIVSHLRNLGLEVSQILVGSTVEGKENDLKVQSRTTVESTGKVIKRNIRSGQTVVHSGDVIVFGNVNKGAEILAGGSVVVFGKAQGNIRAGLNEGGQAVVAALDLQTSLIQIAGFITHSKGEENVPSIAHVKGNRIVIEPFDKVSFERSE
proT5_emb: torch.Size([1, 210, 1024])


In [16]:
# Switch to evaluation mode (turns off training-only behavior such as dropout)
model.eval()
# Build the protein graph from the sample
id, crd_backbone, mask, seq_one_hot, seq, proT5_emb = data
protein_graph = get_graph(crd_backbone.squeeze(),seq_one_hot.squeeze(), proT5_emb.squeeze(),mask.squeeze())
print(protein_graph.shape)
# Predict the energy; unsqueeze(0) adds the batch dimension
with torch.no_grad():
    Gf = model(protein_graph.unsqueeze(0))
    Gf = Gf.cpu().numpy()
    Gf = Gf[0]
    print(Gf.shape) 
    print(f"The energy of {id} protein is: {Gf}")
    

torch.Size([210, 1092])
()
The energy of 10#1HF2_1_A protein is: -40.55458068847656
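
The graph shape printed above can be checked against the input shapes printed earlier. The short sketch below only does the arithmetic; the interpretation that the leftover columns hold the distance-derived features is an assumption based on the component list, not something confirmed by the `get_graph` source.

```python
# Widths reported by the notebook for the sample protein.
graph_width = 1092     # protein_graph.shape[1]
one_hot_width = 20     # seq_one_hot.shape[2]
prot_t5_width = 1024   # proT5_emb.shape[2]

# Assumption: whatever is left after the two sequence features
# corresponds to the structure (distance) features.
remaining_width = graph_width - one_hot_width - prot_t5_width
print(remaining_width)  # 48
```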


### Predict the $\Delta G$ of the protein
The $\Delta G$ of the protein is defined as follows:
$$\Delta G = G_{unfolded} - G_{folded}$$
It represents the change in energy between two states of a protein: folded and unfolded.

In [17]:
# Build the graph of the unfolded state from the same sample
unfolded_graph = get_unfolded_graph(crd_backbone.squeeze(),seq_one_hot.squeeze(), proT5_emb.squeeze(),mask.squeeze())
with torch.no_grad():
    Gu = model(unfolded_graph.unsqueeze(0))
    Gu = Gu.cpu().numpy()
    Gu = Gu[0]
    print(Gu.shape)
    print(f"The energy of {id} protein unfolded structure is: {Gu}")
    
# Calculate the deltaG
deltaG = Gu - Gf
print(f"The deltaG is: {deltaG}")

()
The energy of 10#1HF2_1_A protein unfolded structure is: -3.834470748901367
The deltaG is: 36.72010803222656
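
The subtraction above can be wrapped in a small helper, and the same pattern extends to $\Delta \Delta G$ for a mutation. The sketch below is illustrative only: the helper names are hypothetical, the sign convention $\Delta \Delta G = \Delta G_{mutant} - \Delta G_{wild\,type}$ is an assumption (the opposite convention is also in use), and the mutant energies are made-up placeholders rather than model outputs.

```python
def delta_g(g_unfolded, g_folded):
    """Delta G as defined in this notebook: G_unfolded - G_folded."""
    return g_unfolded - g_folded


def delta_delta_g(dg_mutant, dg_wild_type):
    """Assumed convention: Delta G of the mutant minus Delta G of the wild type."""
    return dg_mutant - dg_wild_type


# Wild-type energies copied from the outputs printed in this notebook.
dg_wt = delta_g(g_unfolded=-3.834470748901367, g_folded=-40.55458068847656)

# Placeholder mutant energies, NOT produced by DeepEF.
dg_mut = delta_g(g_unfolded=-3.5, g_folded=-38.0)

ddg = delta_delta_g(dg_mut, dg_wt)
print(f"dG wild type: {dg_wt:.4f}, dG mutant: {dg_mut:.4f}, ddG: {ddg:.4f}")
```

Plain Python floats are 64-bit, so the last digits of `dg_wt` can differ slightly from the float32 value the notebook prints. Under the assumed convention, a negative `ddg` means the mutant has a smaller $\Delta G$ than the wild type.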


## PDB energy prediction
To use the DeepEF model on an existing PDB file, you will need to use our functions for data extraction and graph creation.

The steps for predicting the energy of a given PDB file are:
1. Specify the PDB path
2. Extract the sequence and coordinates
3. Obtain the protein graph
4. Run the model on the graph to predict the energy
In [None]:
pdb_path = "../data/pdb_files/1A0F.pdb"
# Parse chain A: extracts coordinates, sequence and mask, and computes the ProT5 embedding
pdb_data = get_pdb_data(pdb_path, chain_id='A')

Loading: Rostlab/prot_t5_xl_half_uniref50-enc


In [None]:
pdb_data.keys()

dict_keys(['coords', 'sequence', 'mask_tensor', 'proT5_emb'])

In [None]:
# Unpack the parsed PDB data and build the graph
crd_backbone = pdb_data["coords"]
mask = pdb_data["mask_tensor"]
seq = pdb_data["sequence"]
seq_one_hot = get_one_hot(seq)
proT5_emb = pdb_data["proT5_emb"]
protein_graph = get_graph(crd_backbone.squeeze(),seq_one_hot.squeeze(), proT5_emb.squeeze(),mask.squeeze())

# get the prediction
with torch.no_grad():
    Gf = model(protein_graph.unsqueeze(0))
    Gf = Gf.cpu().numpy()
    Gf = Gf[0]
    print(Gf.shape) 
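    # Report the PDB path: `id` still holds the ID of the earlier sample, not this file
    print(f"The energy of the protein in {pdb_path} (chain A) is: {Gf}")

()
The energy of the protein in ../data/pdb_files/1A0F.pdb (chain A) is: -42.56595230102539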
