<a href="https://colab.research.google.com/github/xrobin/moving_beyond_memorisation/blob/main/notebooks/metrics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup

Run the first cell below, wait for the kernel to restart, and then run the next cell.

In [1]:
!pip install -q git+https://github.com/conda-incubator/condacolab.git@0.1.x
import condacolab
condacolab.install()

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for condacolab (pyproject.toml) ... done
⏬ Downloading https://github.com/conda-forge/miniforge/releases/download/23.11.0-0/Mambaforge-23.11.0-0-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:10
🔁 Restarting kernel...


In [2]:
!mamba install -q pip py3Dmol scipy networkx conda-forge::boost aivant::openstructure anaconda::py-boost

Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done


In [1]:
!ost --version

OpenStructure 2.8.0


In [2]:
import ost
print(ost.__version__)

2.8.0


In [3]:
# Check conop
from ost import conop
print(conop.GetDefaultLib().FindCompound("A1LU6").name)

5-(3-azanyl-1~{H}-indazol-6-yl)-1-[(3-chlorophenyl)methyl]pyridin-2-one


# Protein-protein complex scoring

## Data loading

In [4]:
!mkdir -p T1187/
!wget https://raw.githubusercontent.com/xrobin/moving_beyond_memorisation/refs/heads/main/2_Scoring_with_OpenStructure_Data/ppi_scoring/T1187o.pdb -O T1187/T1187o.pdb
!wget https://raw.githubusercontent.com/xrobin/moving_beyond_memorisation/refs/heads/main/2_Scoring_with_OpenStructure_Data/ppi_scoring/T1187TS447_1o_superposed.pdb -O T1187/T1187TS447_1o_superposed.pdb
!ls T1187/

--2024-09-23 15:13:06--  https://raw.githubusercontent.com/xrobin/moving_beyond_memorisation/refs/heads/main/2_Scoring_with_OpenStructure_Data/ppi_scoring/T1187o.pdb
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 219049 (214K) [text/plain]
Saving to: ‘T1187/T1187o.pdb’


2024-09-23 15:13:07 (8.55 MB/s) - ‘T1187/T1187o.pdb’ saved [219049/219049]

--2024-09-23 15:13:07--  https://raw.githubusercontent.com/xrobin/moving_beyond_memorisation/refs/heads/main/2_Scoring_with_OpenStructure_Data/ppi_scoring/T1187TS447_1o_superposed.pdb
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP reque

In [5]:
from ost import io
# Target
target_structure = io.LoadPDB("T1187/T1187o.pdb")
# Model
model_structure = io.LoadPDB("T1187/T1187TS447_1o_superposed.pdb")

## Scoring with Python interactively

In [6]:


from ost.mol.alg import scoring

# The Scorer object processes the input structures and performs basic cleanup.
scorer = scoring.Scorer(model_structure, target_structure)

# HTML documentation available as:
# https://openstructure.org/docs/2.8/mol/alg/scoring/#ost.mol.alg.scoring.Scorer
# raw doc string can be displayed with:
# help(scoring.Scorer)


# Here we only scratch the surface and investigate a couple of relevant scores.
# All scores are lazily evaluated and available as attributes.

# The following scores operate on the full assembly, which requires deriving a
# one-to-one correspondence between model and reference chains (the chain
# mapping). OpenStructure does this fully automatically.
print("lDDT", scorer.lddt)
print("lDDT (backbone only):", scorer.bb_lddt)
print("QS-score:", scorer.qs_global)

# This is the mapping that was used:
print("mapping (keys: trg chain, values: mdl chain):",
      scorer.mapping.GetFlatMapping())

# Pinder strictly operates on dimers, which is basically the CAPRI use case.
# This is what DockQ and the 3 underlying scores (fnat, irmsd, lrmsd) are
# designed for. Let's first check the interfaces in our structure:
print("Interfaces evaluated by DockQ:", scorer.dockq_interfaces)
help(scoring.Scorer.dockq_interfaces)
print("With their respective DockQ scores:", scorer.dockq_scores)
print("fnat:", scorer.fnat)
print("irmsd:", scorer.irmsd)
print("lrmsd:", scorer.lrmsd)

lDDT 0.8176616554189962
lDDT (backbone only): 0.8698029973410684
QS-score: 0.7031908212851736
mapping (keys: trg chain, values: mdl chain): {'B': 'A', 'A': 'B'}
Interfaces evaluated by DockQ: [('A', 'B', 'B', 'A')]
Help on property:

    Interfaces in :attr:`dockq_target_interfaces` that can be mapped
    to model
    
    Target chain names are lexicographically sorted
    
    :type: :class:`list` of :class:`tuple` with 4 elements each:
           (trg_ch1, trg_ch2, mdl_ch1, mdl_ch2)

With their respective DockQ scores: [0.442]
fnat: [0.4166666666666667]
irmsd: [2.601]
lrmsd: [6.095]
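
As a sanity check, the DockQ value printed above can be recomputed from its three components. The sketch below assumes the standard DockQ definition, in which iRMSD and LRMSD are rescaled with constants of 1.5 Å and 8.5 Å and averaged with fnat. The function name `dockq_from_components` is hypothetical and is not part of OpenStructure.

```python
def dockq_from_components(fnat, irmsd, lrmsd, d_irmsd=1.5, d_lrmsd=8.5):
    """Combine fnat, iRMSD and LRMSD into a single DockQ value.

    Assumption: standard DockQ scaling constants (1.5 A for iRMSD,
    8.5 A for LRMSD). This is an illustration, not OpenStructure code.
    """
    # Map each RMSD to (0, 1], where 1 corresponds to a perfect match
    irmsd_scaled = 1.0 / (1.0 + (irmsd / d_irmsd) ** 2)
    lrmsd_scaled = 1.0 / (1.0 + (lrmsd / d_lrmsd) ** 2)
    # DockQ is the plain average of the three terms
    return (fnat + irmsd_scaled + lrmsd_scaled) / 3.0


# Component values copied from the output above
value = dockq_from_components(fnat=0.4166666666666667, irmsd=2.601, lrmsd=6.095)
print(round(value, 3))  # prints 0.442
```

The recomputed value agrees with the `dockq_scores` entry, which suggests the three components reported by the scorer are the ones that enter the DockQ average.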


## Scoring from the command line


In [7]:
!ost compare-structures -m T1187/T1187TS447_1o_superposed.pdb -mf pdb -r T1187/T1187o.pdb --lddt --bb-lddt --qs-score --dockq --out my_out.json

Cleaning up input structures
Computing chain mapping
Computing all-atom lDDT
Computing backbone lDDT
Computing global QS-score
Computing per-interface QS-score
Computing DockQ
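
The scores written by the command above end up in `my_out.json`. The sketch below is one way to get a quick overview of such a file without relying on specific key names: it assumes only that the top level of the JSON is an object, and it keeps entries that are numbers or flat lists of numbers. The helper `summarize_scores` is hypothetical, not an OpenStructure function.

```python
import json
from pathlib import Path


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def summarize_scores(json_path):
    """Return top-level entries that are numbers or flat lists of numbers.

    Assumption: the JSON document is an object (dict) at the top level.
    """
    with open(json_path) as handle:
        data = json.load(handle)
    summary = {}
    for key, value in data.items():
        if _is_number(value):
            summary[key] = value
        elif isinstance(value, list) and value and all(_is_number(v) for v in value):
            summary[key] = value
    return summary


# Only runs if the output file from the command above is present
if Path("my_out.json").exists():
    for name, score in summarize_scores("my_out.json").items():
        print(name, score)
```

Nested entries (for example per-residue or per-interface details) are skipped on purpose; inspect them directly with `json.load` when needed.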


# Protein-ligand complex scoring

## Data loading

In [8]:
!mkdir -p 9CE4/
!wget https://raw.githubusercontent.com/xrobin/moving_beyond_memorisation/refs/heads/main/2_Scoring_with_OpenStructure_Data/pli_scoring/9CE4_A.pdb?token=GHSAT0AAAAAACXZFLBPGUJT75QK262ZMAHSZXNOKJQ -O 9CE4/9CE4_A.pdb
!wget https://raw.githubusercontent.com/xrobin/moving_beyond_memorisation/refs/heads/main/2_Scoring_with_OpenStructure_Data/pli_scoring/9CE4_lig.sdf?token=GHSAT0AAAAAACXZFLBOJFQDOZJKFOJTM4J6ZXNOKBA -O 9CE4/9CE4_lig.sdf
!wget https://raw.githubusercontent.com/xrobin/moving_beyond_memorisation/refs/heads/main/2_Scoring_with_OpenStructure_Data/pli_scoring/996_model1.pdb?token=GHSAT0AAAAAACXZFLBPHIDZP3GIB5RJ3AREZXNOI6Q -O 9CE4/996_model1.pdb
!wget https://raw.githubusercontent.com/xrobin/moving_beyond_memorisation/refs/heads/main/2_Scoring_with_OpenStructure_Data/pli_scoring/996_model1_ligand1.sdf?token=GHSAT0AAAAAACXZFLBP5KJ36JXD7NOBDLPSZXNOKPQ -O 9CE4/996_model1_ligand1.sdf
!ls 9CE4/

--2024-09-23 15:13:13--  https://raw.githubusercontent.com/xrobin/moving_beyond_memorisation/refs/heads/main/2_Scoring_with_OpenStructure_Data/pli_scoring/9CE4_A.pdb?token=GHSAT0AAAAAACXZFLBPGUJT75QK262ZMAHSZXNOKJQ
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 172861 (169K) [text/plain]
Saving to: ‘9CE4/9CE4_A.pdb’


2024-09-23 15:13:13 (6.47 MB/s) - ‘9CE4/9CE4_A.pdb’ saved [172861/172861]

--2024-09-23 15:13:13--  https://raw.githubusercontent.com/xrobin/moving_beyond_memorisation/refs/heads/main/2_Scoring_with_OpenStructure_Data/pli_scoring/9CE4_lig.sdf?token=GHSAT0AAAAAACXZFLBOJFQDOZJKFOJTM4J6ZXNOKBA
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubuserconte

In [9]:
!ls -l 9CE4/

total 356
-rw-r--r-- 1 root root   3542 Sep 23 15:13 996_model1_ligand1.sdf
-rw-r--r-- 1 root root 176425 Sep 23 15:13 996_model1.pdb
-rw-r--r-- 1 root root 172861 Sep 23 15:13 9CE4_A.pdb
-rw-r--r-- 1 root root   2032 Sep 23 15:13 9CE4_lig.sdf


In [10]:
from ost import io
# Target
target_structure = io.LoadPDB("9CE4/9CE4_A.pdb")
target_ligand = io.LoadSDF("9CE4/9CE4_lig.sdf")
# Model
model_structure = io.LoadPDB("9CE4/996_model1.pdb")
model_ligand = io.LoadSDF("9CE4/996_model1_ligand1.sdf")


## Data cleanup



For ligand scoring in Python, structures must be cleaned up and hydrogen atoms removed before running the scorer. Protein structures are cleaned with Molck (the molecular checker), and ligands with a simple selection.

In [11]:
# Clean up copies of the protein structures
from ost import conop
from ost.mol.alg import Molck, MolckSettings
cleaned_model_structure = model_structure.Copy()
cleaned_target_structure = target_structure.Copy()
molck_settings = MolckSettings(rm_unk_atoms=True,  # Remove unknown atoms
                               rm_non_std=False,  # Keep non standard residues
                               rm_hyd_atoms=True,  # Remove Hydrogens
                               rm_oxt_atoms=False,  # Keep terminal oxygens
                               rm_zero_occ_atoms=False,  # Keep atoms with 0 occupancy
                               colored=False,
                               map_nonstd_res=False,
                               assign_elem=True)
Molck(cleaned_model_structure, conop.GetDefaultLib(), molck_settings)
Molck(cleaned_target_structure, conop.GetDefaultLib(), molck_settings)

In [12]:
# Clean up the ligands by removing hydrogens (and deuterium)
cleaned_model_ligand = model_ligand.Select("ele != H and ele != D")
cleaned_target_ligand = target_ligand.Select("ele != H and ele != D")


In [13]:
from ost.mol.alg.ligand_scoring import LDDTPLIScorer, SCRMSDScorer

help(LDDTPLIScorer)
help(SCRMSDScorer)

Help on class LDDTPLIScorer in module ost.mol.alg.ligand_scoring_lddtpli:

class LDDTPLIScorer(ost.mol.alg.ligand_scoring_base.LigandScorer)
 |  LDDTPLIScorer(model, target, model_ligands=None, target_ligands=None, resnum_alignments=False, rename_ligand_chain=False, substructure_match=False, coverage_delta=0.2, max_symmetries=100000.0, lddt_pli_radius=6.0, add_mdl_contacts=False, lddt_pli_thresholds=[0.5, 1.0, 2.0, 4.0], lddt_pli_binding_site_radius=None)
 |  
 |  :class:`LigandScorer` implementing lDDT-PLI.
 |  
 |  lDDT-PLI is an lDDT score considering contacts between ligand and
 |  receptor. Where receptor consists of protein and nucleic acid chains that
 |  pass the criteria for :class:`chain mapping <ost.mol.alg.chain_mapping>`.
 |  This means ignoring other ligands, waters, short polymers as well as any
 |  incorrectly connected chains that may be in proximity.
 |  
 |  :class:`LDDTPLIScorer` computes a score for a specific pair of target/model
 |  ligands. Given a target/model 

## Scoring with Python interactively

In [14]:
# Score with LDDT-PLI
scorer = LDDTPLIScorer(
    target = cleaned_target_structure,
    target_ligands = [cleaned_target_ligand],
    model = cleaned_model_structure,
    model_ligands = [cleaned_model_ligand],
    # Extra arguments
    resnum_alignments=True,
    )
chain_name = cleaned_model_ligand.chains[0].name
residue_number = cleaned_model_ligand.residues[0].number
print("LDDT-PLI: ", scorer.score[chain_name][residue_number])

# Score with RMSD
rmsd_scorer = SCRMSDScorer(
    target = cleaned_target_structure,
    target_ligands = [cleaned_target_ligand],
    model = cleaned_model_structure,
    model_ligands = [cleaned_model_ligand],
    # Extra arguments
    resnum_alignments=True,
    )
print("BiSyRMSD: ", rmsd_scorer.score[chain_name][residue_number])
print("LDDT-LP: ", rmsd_scorer.aux[chain_name][residue_number]["lddt_lp"])


LDDT-PLI:  0.7669142
BiSyRMSD:  1.943465
LDDT-LP:  0.9311297811187265
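
To give an intuition for what an lDDT-style ligand score measures, here is a heavily simplified pure-Python sketch. It takes the default radius (6.0) and thresholds (0.5, 1.0, 2.0, 4.0) from the `LDDTPLIScorer` signature shown above, and it assumes that atoms are already paired by index between target and model. It ignores ligand symmetries, chain mapping and the `add_mdl_contacts` option, so it is not the OpenStructure implementation; `lddt_pli_sketch` is a hypothetical name.

```python
import math


def lddt_pli_sketch(ref_ligand, ref_receptor, mdl_ligand, mdl_receptor,
                    radius=6.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Fraction of reference ligand-receptor distances preserved in the model.

    Each argument is a list of (x, y, z) tuples. Assumption: atom i of the
    reference corresponds to atom i of the model in both lists.
    """
    preserved = 0
    total = 0
    for i, ref_lig_atom in enumerate(ref_ligand):
        for j, ref_rec_atom in enumerate(ref_receptor):
            ref_dist = math.dist(ref_lig_atom, ref_rec_atom)
            if ref_dist > radius:
                continue  # not a contact in the reference, so not scored
            mdl_dist = math.dist(mdl_ligand[i], mdl_receptor[j])
            # Each contact is checked once per threshold
            for threshold in thresholds:
                total += 1
                if abs(ref_dist - mdl_dist) <= threshold:
                    preserved += 1
    return preserved / total if total else 0.0


# Toy example: one ligand atom, three receptor atoms (the last is out of range)
ref_lig = [(0.0, 0.0, 0.0)]
ref_rec = [(3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
mdl_lig = [(0.0, 0.0, 0.0)]
mdl_rec = [(3.7, 0.0, 0.0), (0.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
print(lddt_pli_sketch(ref_lig, ref_rec, mdl_lig, mdl_rec))  # prints 0.875
```

In the toy example, the first contact is off by 0.7 Å and passes three of the four thresholds, while the second is exact and passes all four, giving 7/8.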


## Scoring from the command line

In [15]:
! ost compare-ligand-structures \
    --reference 9CE4/9CE4_A.pdb \
    --reference-ligands 9CE4/9CE4_lig.sdf \
    --model 9CE4/996_model1.pdb \
    --model-ligands 9CE4/996_model1_ligand1.sdf \
    --output T1186LG350_1.json \
    --lddt-pli \
    --rmsd

imported 1 chains, 0 residues, 0 atoms
imported 1 chains, 0 residues, 0 atoms


## Visualization

In [16]:
import py3Dmol
# Viewer documentation: https://3dmol.org/doc/GLViewer.html
view = py3Dmol.view()
view.setBackgroundColor('white')

# Show the reference structure
view.addModel(open('9CE4/9CE4_A.pdb', 'r').read(),'pdb',
             {"style": {'cartoon': {'color':'gold'}}})
view.addModel(open("9CE4/9CE4_lig.sdf", 'r').read(), 'sdf',
             {"style": {'stick': {'color':'green'}}})

# Superpose the model onto the reference
superposed_model = rmsd_scorer.model.Copy()
editor = superposed_model.EditXCS()
editor.ApplyTransform(rmsd_scorer.aux[chain_name][residue_number]["transform"])
editor.UpdateICS()

# Save superposed files for the viewer
ost.io.SavePDB(superposed_model.Select("cname=A"), "9CE4/996_model_superposed.pdb")
ost.io.SaveSDF(superposed_model.Select("cname=00001_LIG"), "9CE4/996_model1_superposed_ligand1.sdf")

# Show the superposed model structure
view.addModel(open('9CE4/996_model_superposed.pdb', 'r').read(), 'pdb',
              {"style": {'cartoon': {'color':'purple'}}})
view.addModel(open("9CE4/996_model1_superposed_ligand1.sdf", 'r').read(), 'sdf',
              {"style": {'stick': {'colorScheme':'elem'}}})

view.zoomTo()
view.show()