# `1jwpA` prediction
This notebook runs cryptic binding site prediction on `1jwpA`, a mutant structure of TEM-1 beta-lactamase. It only runs the prediction; it does not compare the results with the confirmed pocket. To run the prediction together with its evaluation, refer to the `run-7w19A-prediction-with-evaluation.ipynb` notebook.

## Define the structure


In [1]:
pdb_id = '1jwp'
chain_id = 'A'

Retrieve the sequence

In [2]:
import biotite.database.rcsb as rcsb
import biotite.structure.io.pdbx as pdbx
from biotite.sequence import ProteinSequence

# Directory where the downloaded mmCIF file is stored
CIF_FILES_PATH = '/path/to/your/cif/files'

# Download the structure from the RCSB PDB and parse it
cif_file_path = rcsb.fetch(pdb_id, "cif", CIF_FILES_PATH)
cif_file = pdbx.CIFFile.read(cif_file_path)

# Keep one atom per residue: the alpha carbon of the selected chain.
# The element check excludes calcium ions, which are also named "CA".
protein = pdbx.get_structure(cif_file, model=1)
protein = protein[(protein.atom_name == "CA")
                  & (protein.element == "C")
                  & (protein.chain_id == chain_id)]

# Convert the three-letter residue names to a one-letter sequence string
sequence = ''.join(ProteinSequence.convert_letter_3to1(residue.res_name) for residue in protein)
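The conversion above relies on every residue name being known to the lookup, and it may fail on modified residues. As a minimal sketch, assuming you would rather map unrecognized names to `X` than stop with an error, a plain-Python fallback could look like this. The helper name `residues_to_sequence` and the `X` placeholder are illustrative assumptions, not part of the original notebook or of CryptoBench.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def residues_to_sequence(res_names, unknown="X"):
    """Hypothetical helper: join residue names into a one-letter sequence.

    Names missing from THREE_TO_ONE are replaced by `unknown` (an assumption).
    """
    return "".join(THREE_TO_ONE.get(name.upper(), unknown) for name in res_names)


# The first five residues of chain A, as listed in the prediction output
print(residues_to_sequence(["HIS", "PRO", "GLU", "THR", "LEU"]))  # HPETL
```

With the structure loaded as above, the call would be `residues_to_sequence(residue.res_name for residue in protein)`. Whether a placeholder is appropriate depends on how the embedding model treats it, so check the sequence before computing embeddings.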

Create the sequence file

In [3]:
with open(f'{pdb_id}{chain_id}.txt', 'w') as f:
    f.write(sequence)

## ⚠️ CAUTION: ESM2-3B embedding computation required!
The prediction step loads a precomputed ESM2-3B embedding from `{pdb_id}{chain_id}.npy` (here `1jwpA.npy`) in the working directory. For optimal performance, compute the embeddings on a **GPU-equipped machine**, especially if you process multiple structures. Computation on a CPU-only machine should be possible, but I haven't tested it.

*Note: Computation of the ESM2 embedding is not part of this script. To generate embeddings, you may find [this script](https://github.com/skrhakv/esm2-generator/blob/master/compute-esm.py) in the [esm2-generator repository](https://github.com/skrhakv/esm2-generator) useful.*
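Before running the prediction, it can help to confirm that the embedding lines up with the sequence, since the results are later paired with residues one by one. The sketch below is a hedged example: it assumes the `.npy` file holds one row per residue and 2560 columns (the ESM2-3B hidden size), with no extra rows for special tokens. The helper name `check_alignment` is hypothetical.

```python
def check_alignment(sequence, embedding_shape, expected_dim=2560):
    """Hypothetical helper: validate the shape of a per-residue embedding.

    Assumes one embedding row per residue and `expected_dim` columns.
    """
    if len(embedding_shape) != 2:
        raise ValueError(f"expected a 2D array, got shape {embedding_shape}")
    n_rows, n_cols = embedding_shape
    if n_rows != len(sequence):
        raise ValueError(
            f"embedding has {n_rows} rows, sequence has {len(sequence)} residues"
        )
    if n_cols != expected_dim:
        raise ValueError(f"embedding has {n_cols} columns, expected {expected_dim}")
    return True


# Toy example: a five-residue sequence with a matching embedding shape
print(check_alignment("HPETL", (5, 2560)))  # True
```

In this notebook, the call would be `check_alignment(sequence, np.load(f'{pdb_id}{chain_id}.npy').shape)`.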


## Run the prediction without evaluation

In [None]:
# This script is similar to the script provided in the CryptoBench dataset repository (https://osf.io/pz4a9/).

import numpy as np
from tensorflow import keras
import tensorflow_addons as tfa

# CAUTION: You need to specify the path to the CryptoBench dataset! It is available at: https://osf.io/pz4a9/
CRYPTOBENCH_PATH = '/path/to/cryptobench'

MODEL_PATH = f'{CRYPTOBENCH_PATH}/benchmark/best_trained'
STRUCTURE_ID = f'{pdb_id}{chain_id}'

# 0.95 decision threshold was used in the CryptoBench paper
DECISION_THRESHOLD = 0.95


def load_model():
    print("Loading CryptoBench model ...")
    return keras.models.load_model(MODEL_PATH,
                                   custom_objects={
                                       'MatthewsCorrelationCoefficient': tfa.metrics.MatthewsCorrelationCoefficient(num_classes=2)},
                                   compile=False)


def predict(X, model):
    print("Making prediction ...")
    return model.predict(X)


def load_data():
    print("Loading data - embeddings and annotations ...")
    embeddings = np.load(f'{STRUCTURE_ID}.npy')

    return embeddings


model = load_model()
embeddings = load_data()
predictions = predict(embeddings, model)


Loading CryptoBench model ...
Loading data - embeddings and annotations ...
Making prediction ...


## Prediction examination
For each residue, indicate whether the method predicts it as part of a cryptic binding site (True) or not (False).
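The labeling rule can be sketched in isolation: a residue counts as predicted only when its positive-class score (index 1 of each prediction) is strictly greater than the decision threshold. The scores and the helper name `predicted_residues` below are made up for illustration; they are not output of the model.

```python
DECISION_THRESHOLD = 0.95


def predicted_residues(predictions, residue_ids, threshold=DECISION_THRESHOLD):
    """Return the IDs of residues whose positive-class score exceeds the threshold."""
    return [
        res_id
        for scores, res_id in zip(predictions, residue_ids)
        if scores[1] > threshold
    ]


# Made-up scores in the form [negative-class score, positive-class score]
toy_predictions = [[0.99, 0.01], [0.03, 0.97], [0.05, 0.95], [0.02, 0.98]]
toy_residue_ids = [26, 27, 28, 29]

# A score equal to the threshold (residue 28) is not reported
print(predicted_residues(toy_predictions, toy_residue_ids))  # [27, 29]
```

Collecting only the predicted residue IDs this way gives a shorter summary than the full per-residue listing.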

In [6]:
for prediction, residue in zip(predictions, protein):
    print(f"{residue.res_id} {residue.res_name} {'False' if prediction[1] <= DECISION_THRESHOLD else 'True'}")

26 HIS False
27 PRO False
28 GLU False
29 THR False
30 LEU False
31 VAL False
32 LYS False
33 VAL False
34 LYS False
35 ASP False
36 ALA False
37 GLU False
38 ASP False
39 GLN False
40 LEU False
41 GLY False
42 ALA False
43 ARG False
44 VAL False
45 GLY False
46 TYR False
47 ILE False
48 GLU False
49 LEU False
50 ASP False
51 LEU False
52 ASN False
53 SER False
54 GLY False
55 LYS False
56 ILE False
57 LEU False
58 GLU False
59 SER False
60 PHE False
61 ARG False
62 PRO False
63 GLU False
64 GLU False
65 ARG False
66 PHE False
67 PRO False
68 MET False
69 MET False
70 SER False
71 THR False
72 PHE False
73 LYS False
74 VAL False
75 LEU False
76 LEU False
77 CYS False
78 GLY False
79 ALA False
80 VAL False
81 LEU False
82 SER False
83 ARG False
84 VAL False
85 ASP False
86 ALA False
87 GLY False
88 GLN False
89 GLU False
90 GLN False
91 LEU False
92 GLY False
93 ARG False
94 ARG False
95 ILE False
96 HIS False
97 TYR False
98 SER False
99 GLN False
100 ASN False
101 ASP False
102 LEU Fa