![Logo](https://github.com/Chokyotager/BIND/blob/main/art/abstract.png?raw=true)
## BIND API practical demonstration
This notebook accompanies the manuscript "*Protein language models are performant in structure-free virtual screening*".

**This notebook will go through the following:**
1. Encoding SMILES into BIND molecular graphs
2. Encoding protein sequences into ESM-2 embeddings
3. Running the BIND model

In [None]:
# Import libraries and set environment variables

import torch
from torch_geometric.utils.sparse import dense_to_sparse
from torch_geometric.data import Data, Batch

from transformers import AutoModel, AutoTokenizer

import math
import logging

logging.getLogger("pysmiles").setLevel(logging.CRITICAL)

# BIND API
import loading
from data import BondType

# PyTorch device
device = torch.device("cpu")

In [None]:
# We define the SMILES and protein sequence we want to run

smiles = "SC[C@H](C(=O)N1[C@@H](CCC1)C(=O)[O-])C"
protein_sequence = "MGAASGRRGPGLLLPLPLLLLLPPQPALALDPGLQPGNFSADEAGAQLFAQSYNSSAEQVLFQSVAASWAHDTNITAENARRQEEAALLSQEFAEAWGQKAKELYEPIWQNFTDPQLRRI"

### Part 1: Encoding SMILES into BIND molecular graphs
This part converts your molecule into a format that BIND accepts: a PyTorch Geometric (PyG) `Data` object.

In [None]:
def get_graph (smiles):

    graph = loading.get_data(smiles, apply_paths=False, parse_cis_trans=False, unknown_atom_is_dummy=True)
    x, a, e = loading.convert(*graph, bonds=[BondType.SINGLE, BondType.DOUBLE, BondType.TRIPLE, BondType.AROMATIC, BondType.NOT_CONNECTED])

    x = torch.Tensor(x)
    a = dense_to_sparse(torch.Tensor(a))[0]
    e = torch.Tensor(e)

    # Build a PyG Data object from node features (x), edge indices (a) and edge features (e)
    graph = Data(x=x, edge_index=a, edge_features=e)

    return graph

In [None]:
smiles_graph = get_graph(smiles)

# Conversion into a PyG Batch object
graph = Batch.from_data_list([smiles_graph]).to(device).detach()
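
For readers unfamiliar with the sparse format: `dense_to_sparse` turns a dense adjacency matrix into a 2 x E `edge_index` listing the source and target node of every non-zero entry (the `[0]` in `get_graph` keeps that index and discards the edge values). The pure-Python sketch below mimics the conversion on a toy three-atom chain. The helper name `dense_to_edge_index` is hypothetical and is not part of BIND or PyG, and the toy matrix is not what `loading.convert` returns.

```python
def dense_to_edge_index(adjacency):
    """Return [sources, targets] for every non-zero entry, in row-major order."""
    sources, targets = [], []
    for i, row in enumerate(adjacency):
        for j, value in enumerate(row):
            if value != 0:
                sources.append(i)
                targets.append(j)
    return [sources, targets]

# Toy adjacency for a three-atom chain 0-1-2 (undirected, so symmetric).
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

edge_index = dense_to_edge_index(adjacency)
print(edge_index)  # [[0, 1, 1, 2], [1, 0, 2, 1]]
```

Each undirected bond appears twice, once per direction, which is the usual convention for PyG edge indices.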

### Part 2: Encoding protein sequences into ESM-2 embeddings
This part shows how to embed the protein sequence with ESM-2.

In [None]:
# Load the ESM-2 model
esm_tokeniser = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
esm_model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")

esm_model.eval()

esm_model = esm_model.to(device)

In [None]:
# Tokenise the sequence and run it through ESM-2 (inference only, so no gradients are needed)
encoded_input = esm_tokeniser([protein_sequence], padding="longest", truncation=False, return_tensors="pt")
with torch.no_grad():
    esm_output = esm_model(**encoded_input.to(device), output_hidden_states=True)
hidden_states = esm_output.hidden_states

# Obtain the embeddings from ESM-2 here, together with an attention mask,
# which is useful in the event you want to use a batch size of > 1
hidden_states = [x.to(device).detach() for x in hidden_states]
attention_mask = encoded_input["attention_mask"].to(device)
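
To illustrate why the attention mask matters for batches, the sketch below pads two toy sequences to the length of the longest one and builds the matching mask (1 for real tokens, 0 for padding). It assumes right-side padding and uses a hypothetical character-level `pad_batch` helper rather than the real ESM-2 tokeniser, which also adds special tokens and maps residues to integer IDs.

```python
PAD = "<pad>"  # placeholder padding token for this toy example

def pad_batch(sequences):
    """Pad each sequence to the longest length and return (tokens, mask)."""
    longest = max(len(seq) for seq in sequences)
    tokens, mask = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        tokens.append(list(seq) + [PAD] * n_pad)
        mask.append([1] * len(seq) + [0] * n_pad)
    return tokens, mask

tokens, attention_mask = pad_batch(["MGAAS", "MGA"])
print(tokens[1])          # ['M', 'G', 'A', '<pad>', '<pad>']
print(attention_mask[1])  # [1, 1, 1, 0, 0]
```

With a single sequence, as in this notebook, nothing is padded and the mask is all ones.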

### Part 3: Running the BIND model

Putting everything together.

In [None]:
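# Load the BIND checkpoint used in the manuscript.
# The file stores the whole pickled model, so PyTorch >= 2.6 needs weights_only=False
# (only do this for checkpoints from a source you trust).

model = torch.load("saves/BIND_checkpoint_12042024.pth", map_location=device, weights_only=False)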

model.eval()

model = model.to(device)

In [None]:
# Now you have the molecular graphs and the embeddings from ESM-2,
# you can feed everything into the model.

output = model.forward(graph, hidden_states, attention_mask)

# You get a list [pKi, pIC50, pKd, pEC50, logits]
output = [float(x.detach().cpu().numpy()[0][0]) for x in output]

pki = output[0]
pic50 = output[1]
pkd = output[2]
pec50 = output[3]
logits = output[4]
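
The four affinity outputs are on a negative log scale. Assuming the usual convention p = -log10(concentration in mol/L), a value can be converted back to a concentration as sketched below. The helper `p_to_nanomolar` is a hypothetical name, and the inputs are illustrative numbers, not BIND predictions.

```python
def p_to_nanomolar(p_value):
    """Convert p = -log10(concentration in mol/L) to a concentration in nM."""
    # 1 mol/L = 1e9 nM, so concentration_nM = 10 ** (9 - p)
    return 10 ** (9 - p_value)

# Illustrative values only: each extra p unit is a 10-fold lower concentration.
for p in (6.0, 7.5, 9.0):
    print(f"p = {p}: {p_to_nanomolar(p):.2f} nM")
```

For example, a pKi of 9 corresponds to 1 nM and a pKi of 6 to 1000 nM (1 µM).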

In [None]:
# To convert the logits into a non-binder probability, apply a sigmoid

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

probability = sigmoid(logits)

# The higher the pKi, pIC50, pKd or pEC50, the stronger the predicted drug-target affinity.
# The non-binder probability (and its logit) predicts whether a molecule is a decoy, so lower is better.
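
One caveat with the plain formula: `math.exp(-x)` raises `OverflowError` for strongly negative inputs (below roughly -709). If your logits can be that extreme, a numerically stable variant such as the sketch below may help. `stable_sigmoid` is a hypothetical name and is not part of the BIND API.

```python
import math

def stable_sigmoid(x):
    """Logistic function that avoids overflow for large-magnitude inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) lies in [0, 1) and cannot overflow
    return z / (1.0 + z)

print(stable_sigmoid(0.0))      # 0.5
print(stable_sigmoid(-1000.0))  # 0.0
```

For inputs of ordinary magnitude it returns the same values as `sigmoid` above, up to floating-point rounding.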

## That's it!

If you have any questions, please refer to the README for the contact details.

Obligatory cat pic.

![Cat pic](https://media.istockphoto.com/id/1128431903/photo/black-cat-lying-on-its-side-on-a-white-background.jpg?b=1&s=612x612&w=0&k=20&c=WYoBSh3GISwJtpFA8PwLqSsGzf3DvOBvGBWPq4PsOYM= "Neko")

(Credit: iStockPhoto)