# Requisites

- Environment with dependencies in `conda_env.yaml`
- External dependencies, to be downloaded and placed in `training_data/utils/external`:
    - `predict_ddG.py` script from PyRosetta (https://github.com/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/additional_scripts/predict_ddG.py)
    - DSSP software executable (https://github.com/PDB-REDO/dssp/releases/download/v4.4.0/mkdssp-4.4.0-linux-x64)
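
Before importing `predict`, it can help to confirm that both external files are in place. The following is a minimal sketch, not part of the repository: the function name `missing_external_dependencies` is hypothetical, and the expected file names are an assumption taken from the last component of each download URL (the code may expect the files under different names).

```python
import os
from pathlib import Path

# Directory named in the requisites list above.
EXTERNAL_DIR = Path("training_data/utils/external")

# ASSUMPTION: file names are the last component of each download URL.
# The repository may expect different names (for example a plain "mkdssp").
EXPECTED_FILES = {
    "predict_ddG.py": False,         # script: only needs to exist
    "mkdssp-4.4.0-linux-x64": True,  # binary: must also be executable
}


def missing_external_dependencies(external_dir=EXTERNAL_DIR, expected=EXPECTED_FILES):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for name, must_be_executable in expected.items():
        path = Path(external_dir) / name
        if not path.is_file():
            problems.append(f"{path}: not found")
        elif must_be_executable and not os.access(path, os.X_OK):
            problems.append(f"{path}: not executable (try chmod +x)")
    return problems


if __name__ == "__main__":
    for problem in missing_external_dependencies():
        print(problem)
```

Run from the repository root, the script prints one line per problem and nothing when both files are present.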

In [1]:
from predict import get_cif, view_pdb, get_colabfold_msa, predict, view_pockets, Site



┌──────────────────────────────────────────────────────────────────────────────┐
│                                 PyRosetta-4                                  │
│              Created in JHU by Sergey Lyskov and PyRosetta Team              │
│              (C) Copyright Rosetta Commons Member Institutions               │
│                                                                              │
│ NOTE: USE OF PyRosetta FOR COMMERCIAL PURPOSES REQUIRE PURCHASE OF A LICENSE │
│         See LICENSE.PyRosetta.md or email license@uw.edu for details         │
└──────────────────────────────────────────────────────────────────────────────┘
PyRosetta-4 2025 [Rosetta PyRosetta4.conda.ubuntu.cxx11thread.serialization.Ubuntu.python311.Release 2025.19+release.1354d05daa4c339d591afeecef3c94ca2d38680e 2025-05-07T12:36:04] retrieved from: http://www.pyrosetta.org
core.init: Checking for fconfig files in pwd and ./rosetta/flags
core.init: Rosetta version: PyRosetta4.conda.ubuntu.cxx11thread.ser

<br>

**Files are written to the `predict` folder by default; to change the location, pass a different `path=` argument to the functions below.**

<br>

# Predict

## Protein structure

In [3]:
pdb_id = "6t4k"

In [4]:
pdb = get_cif(
    pdb_id
)

view_pdb(pdb)

PDBeMolstar(bg_color='#F7F7F7', custom_data={'data': "data_6T4K\n#\n_entry.id 6T4K\n#\n_citation.abstract ?\n_…

## Predict

### ColabFold MSA

For single predictions, the resource-intensive HHblits calculations can be replaced by an HHM built from a multiple sequence alignment (MSA) computed through ColabFold. Provide your email address in the server call below, respect [ColabFold's MSA server usage limits](https://github.com/sokrypton/ColabFold#:~:text=Is%20it%20okay%20to%20use%20the%20MMseqs2%20MSA%20server%20(cf.run_mmseqs2)%20on%20a%20local%20computer%3F), and [acknowledge the tool](https://github.com/sokrypton/ColabFold#how-do-i-reference-this-work) appropriately. To run multiple predictions, consider switching to the [local setup of HHblits](https://github.com/frannerin/AlloPockets#hhblits-for-multiple-predictions).

In [5]:
get_colabfold_msa(
    pdb_id,
    "nerinfonzf98@univie.ac.at"
)

COMPLETE: 100%|██████████████████████████████████████████████████████████████| 150/150 [elapsed: 00:02 remaining: 00:00]


### Predict

In [6]:
predictions = predict(
    pdb_id,
    protein_chains=["A"],
)

  0%|          | 0/9 [00:00<?, ?it/s]

In [7]:
predictions

| Pocket | Allosteric score |
| --- | --- |
| pocket2 | 0.849179 |
| pocket1 | 0.600612 |
| pocket5 | 0.002796 |
| pocket7 | 0.000514 |
| pocket9 | 0.000359 |
| pocket4 | 0.000327 |
| pocket3 | 0.000321 |
| pocket6 | 0.000169 |
| pocket10 | 0.000143 |
| pocket8 | 0.000134 |


## View

In [9]:
view_pockets(
    pdb_id,
    pockets={"pocket2": {"color": "green"}, "pocket1": {"color": "blue"}}, # format: {"pocketN": {"color": "<color name>"}}
    protein_chains=["A"]
)

PDBeMolstar(bg_color='#F7F7F7', color_data={'data': [{'struct_asym_id': 'A', 'representation': 'cartoon', 'rep…
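
The `pockets` dictionary above was written by hand from the two top-scoring pockets. As a rough sketch, it could also be built from the scores. The helper `pockets_to_show` and the 0.5 threshold are hypothetical illustrations, not part of `predict` and not a recommended cutoff; the scores are copied from the `predictions` output shown earlier. With the real DataFrame, the input dictionary could presumably be obtained with `predictions["Allosteric score"].to_dict()`.

```python
from itertools import cycle

# Scores copied from the `predictions` output shown earlier (top five rows only).
scores = {
    "pocket2": 0.849179,
    "pocket1": 0.600612,
    "pocket5": 0.002796,
    "pocket7": 0.000514,
    "pocket9": 0.000359,
}


def pockets_to_show(scores, threshold=0.5, colors=("green", "blue")):
    """Map each pocket scoring >= threshold to a color, best score first.

    HYPOTHETICAL helper: the threshold is an arbitrary illustration.
    Colors repeat in order if more pockets than colors are selected.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    selected = [name for name, score in ranked if score >= threshold]
    return {name: {"color": color} for name, color in zip(selected, cycle(colors))}


pockets = pockets_to_show(scores)
print(pockets)
# {'pocket2': {'color': 'green'}, 'pocket1': {'color': 'blue'}}
```

The result has the same shape as the `pockets=` argument passed to `view_pockets`.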

### Optional: view a target site

Create a `Site` object by passing either a modulator molecule (the site is then defined around it) or a list of residues. Then visualize the site in the structure to assess its overlap with the predicted pockets; site residues are colored green in the protein cartoon representation.

<br>

#### With a modulator molecule

In [10]:
# Desired modulator is label_asym_id 'C'
pdb.residues.query("label_asym_id == 'C'")

| | label_comp_id | label_asym_id | label_entity_id | label_seq_id | pdbx_PDB_ins_code | auth_seq_id | auth_comp_id | auth_asym_id | pdbx_PDB_model_num | pdbx_label_index | pdbx_sifts_xref_db_name | pdbx_sifts_xref_db_acc | pdbx_sifts_xref_db_num | pdbx_sifts_xref_db_res |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2061 | 4F1 | C | 3 | . | ? | 602 | 4F1 | A | 1 | 602 | ? | ? | ? | ? |


In [11]:
site = Site(
    pdb, 
    modulator_residues=pdb.residues.query("label_asym_id == 'C'")
)
site

<predict.Site at 0x75c3c0735e90>

In [12]:
view_pockets(
    pdb_id,
    pockets={"pocket2": {"color": "green"}, "pocket1": {"color": "blue"}}, # format: {"pocketN": {"color": "<color name>"}}
    protein_chains=["A"],
    site_residues=site.residues,
    modulator_residues=site.modulator_residues,
)

PDBeMolstar(bg_color='#F7F7F7', color_data={'data': [{'struct_asym_id': 'A', 'representation': 'cartoon', 'rep…

#### With a list of residues

In [13]:
# List of residue numbers of site
resnums = site.residues.label_seq_id.to_list()
resnums

['73',
 '77',
 '78',
 '80',
 '81',
 '84',
 '85',
 '88',
 '109',
 '110',
 '111',
 '112',
 '113',
 '114',
 '232',
 '235',
 '236',
 '238',
 '239',
 '240',
 '241',
 '243',
 '248',
 '250',
 '251',
 '252',
 '253',
 '254',
 '255',
 '257',
 '258',
 '261',
 '262']

In [14]:
# Site can be defined with a list of residues instead of a modulator
res_site = Site(
    pdb=pdb,
    residues=[{"label_asym_id": "A", "label_seq_id": seqnum} for seqnum in resnums]
)
res_site.residues

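| | label_comp_id | label_asym_id | label_entity_id | label_seq_id | pdbx_PDB_ins_code | auth_seq_id | auth_comp_id | auth_asym_id | pdbx_PDB_model_num | pdbx_label_index | pdbx_sifts_xref_db_name | pdbx_sifts_xref_db_acc | pdbx_sifts_xref_db_num | pdbx_sifts_xref_db_res |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | TRP | A | 1 | 73 | ? | 317 | TRP | A | 1 | 73 | UNP | P51449 | 317 | W |
| 1 | ALA | A | 1 | 77 | ? | 321 | ALA | A | 1 | 77 | UNP | P51449 | 321 | A |
| 2 | HIS | A | 1 | 78 | ? | 322 | HIS | A | 1 | 78 | UNP | P51449 | 322 | H |
| 3 | LEU | A | 1 | 80 | ? | 324 | LEU | A | 1 | 80 | UNP | P51449 | 324 | L |
| 4 | THR | A | 1 | 81 | ? | 325 | THR | A | 1 | 81 | UNP | P51449 | 325 | T |
| 5 | ILE | A | 1 | 84 | ? | 328 | ILE | A | 1 | 84 | UNP | P51449 | 328 | I |
| 6 | GLN | A | 1 | 85 | ? | 329 | GLN | A | 1 | 85 | UNP | P51449 | 329 | Q |
| 7 | VAL | A | 1 | 88 | ? | 332 | VAL | A | 1 | 88 | UNP | P51449 | 332 | V |
| 8 | LEU | A | 1 | 109 | ? | 353 | LEU | A | 1 | 109 | UNP | P51449 | 353 | L |
| 9 | LYS | A | 1 | 110 | ? | 354 | LYS | A | 1 | 110 | UNP | P51449 | 354 | K |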
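
The residue list passed to `Site` uses `label_seq_id`, whereas residue numbers quoted in the literature usually follow author numbering (`auth_seq_id`, which here matches the UniProt positions). The sketch below shows one way to translate between the two. The helper `residues_from_auth` is hypothetical, and the number pairs are copied by hand from the rows shown above; in the notebook, the mapping could presumably be built from the `label_seq_id` and `auth_seq_id` columns of `pdb.residues` instead.

```python
# (label_seq_id, auth_seq_id) pairs copied from the chain A rows shown above.
rows = [
    ("73", "317"), ("77", "321"), ("78", "322"), ("80", "324"), ("81", "325"),
    ("84", "328"), ("85", "329"), ("88", "332"), ("109", "353"), ("110", "354"),
]

# Lookup from author numbering to label numbering (both kept as strings,
# matching the string residue numbers used elsewhere in this notebook).
auth_to_label = {auth: label for label, auth in rows}


def residues_from_auth(auth_numbers, chain="A", mapping=auth_to_label):
    """Build the list-of-dicts format used for `Site(residues=...)`.

    HYPOTHETICAL helper. Raises KeyError if a number has no known mapping,
    rather than silently dropping the residue.
    """
    missing = [n for n in auth_numbers if str(n) not in mapping]
    if missing:
        raise KeyError(f"no label_seq_id known for auth_seq_id {missing}")
    return [{"label_asym_id": chain, "label_seq_id": mapping[str(n)]} for n in auth_numbers]


print(residues_from_auth([317, 321]))
# [{'label_asym_id': 'A', 'label_seq_id': '73'}, {'label_asym_id': 'A', 'label_seq_id': '77'}]
```

Note that the chain identifier passed to `Site` is the `label_asym_id`; for these rows it coincides with `auth_asym_id` (`A`), which is not guaranteed for other entries.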


In [15]:
view_pockets(
    pdb_id,
    pockets={"pocket2": {"color": "green"}, "pocket1": {"color": "blue"}}, # format: {"pocketN": {"color": "<color name>"}}
    protein_chains=["A"],
    site_residues=res_site.residues,
)

PDBeMolstar(bg_color='#F7F7F7', color_data={'data': [{'struct_asym_id': 'A', 'representation': 'cartoon', 'rep…
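
The visual comparison can be complemented with a simple numeric overlap between the site and a pocket. This is only a sketch under stated assumptions: the notebook does not show how to retrieve the residues lining a predicted pocket, so `pocket_residues` below is an invented placeholder, and the `overlap` helper is hypothetical. Only the first four site residue numbers are taken from the list printed earlier.

```python
def as_keys(residues):
    """Turn residue dicts into hashable (chain, residue number) pairs."""
    return {(r["label_asym_id"], str(r["label_seq_id"])) for r in residues}


def overlap(site_residues, pocket_residues):
    """Return the shared residues, the fraction of the site covered, and the Jaccard index."""
    site, pocket = as_keys(site_residues), as_keys(pocket_residues)
    shared = site & pocket
    union = site | pocket
    return {
        "shared": shared,
        "site_coverage": len(shared) / len(site) if site else 0.0,
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# First four residues of the site listed earlier in this notebook.
site_residues = [{"label_asym_id": "A", "label_seq_id": n} for n in ["73", "77", "78", "80"]]

# HYPOTHETICAL pocket residues, invented purely to exercise the function.
pocket_residues = [
    {"label_asym_id": "A", "label_seq_id": n} for n in ["77", "78", "80", "81", "84", "90"]
]

result = overlap(site_residues, pocket_residues)
print(result["site_coverage"])
# 0.75
```

Site coverage answers "how much of the known site does the pocket capture", while the Jaccard index also penalizes pocket residues that fall outside the site.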