<a href="https://colab.research.google.com/github/plinder-org/moving_beyond_memorisation/blob/main/notebooks/metrics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup

Run this cell, wait for the kernel to restart, and then run the next cell:

In [None]:
!pip install -q git+https://github.com/conda-incubator/condacolab.git@0.1.x
import condacolab
condacolab.install()

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for condacolab (pyproject.toml) ... done
⏬ Downloading https://github.com/conda-forge/miniforge/releases/download/23.11.0-0/Mambaforge-23.11.0-0-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:14
🔁 Restarting kernel...


In [None]:
!mamba install -q pip py3Dmol scipy networkx conda-forge::boost aivant::openstructure anaconda::py-boost

In [None]:
!ost --version

OpenStructure 2.8.0


In [None]:
import ost
print(ost.__version__)

2.8.0


# Protein-protein complex scoring

## Data loading

In [1]:
!mkdir -p T1187/
!wget https://raw.githubusercontent.com/plinder-org/moving_beyond_memorisation/refs/heads/main/data/metrics/ppi_scoring/T1187o.pdb -O T1187/T1187o.pdb
!wget https://raw.githubusercontent.com/plinder-org/moving_beyond_memorisation/refs/heads/main/data/metrics/ppi_scoring/T1187TS447_1o_superposed.pdb -O T1187/T1187TS447_1o_superposed.pdb
!ls T1187/

--2024-09-24 17:06:03--  https://raw.githubusercontent.com/plinder-org/moving_beyond_memorisation/refs/heads/main/data/metrics/ppi_scoring/T1187o.pdb
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 219049 (214K) [text/plain]
Saving to: ‘T1187/T1187o.pdb’


2024-09-24 17:06:03 (5.89 MB/s) - ‘T1187/T1187o.pdb’ saved [219049/219049]

--2024-09-24 17:06:03--  https://raw.githubusercontent.com/plinder-org/moving_beyond_memorisation/refs/heads/main/data/metrics/ppi_scoring/T1187TS447_1o_superposed.pdb
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 20

In [None]:
from ost import io
# Target
target_structure = io.LoadPDB("T1187/T1187o.pdb")
# Model
model_structure = io.LoadPDB("T1187/T1187TS447_1o_superposed.pdb")

## Scoring with Python interactively

In [None]:
from ost.mol.alg import scoring

# The Scorer object processes the input structures and performs basic cleanup.
scorer = scoring.Scorer(model_structure, target_structure,
                        resnum_alignments=True)

# HTML documentation available as:
# https://openstructure.org/docs/2.8/mol/alg/scoring/#ost.mol.alg.scoring.Scorer
# raw doc string can be displayed with:
# help(scoring.Scorer)


# Here we only scratch the surface and investigate a couple of relevant scores.
# All scores are lazily evaluated and available as attributes.

# The following scores operate on the full assembly, which requires deriving a
# one-to-one correspondence between model and reference chains (the chain
# mapping). OpenStructure derives this mapping fully automatically.
print("lDDT", scorer.lddt)
print("lDDT (backbone only):", scorer.bb_lddt)
print("QS-score:", scorer.qs_global)

# The chain mapping that was used:
print("mapping (keys: trg chain, values: mdl chain):",
      scorer.mapping.GetFlatMapping())

# Pinder operates strictly on dimers, which is essentially the CAPRI use case
# that DockQ and its three underlying scores (fnat, iRMSD, LRMSD) were
# designed for. Let's first check which interfaces are evaluated:
print("Interfaces evaluated by DockQ:", scorer.dockq_interfaces)
help(scoring.Scorer.dockq_interfaces)
print("With their respective DockQ scores:", scorer.dockq_scores)
print("fnat:", scorer.fnat)
print("irmsd:", scorer.irmsd)
print("lrmsd:", scorer.lrmsd)

lDDT 0.8176616554189962
lDDT (backbone only): 0.8698029973410684
QS-score: 0.7031908212851736
mapping (keys: trg chain, values: mdl chain): {'B': 'A', 'A': 'B'}
Interfaces evaluated by DockQ: [('A', 'B', 'B', 'A')]
Help on property:

    Interfaces in :attr:`dockq_target_interfaces` that can be mapped
    to model
    
    Target chain names are lexicographically sorted
    
    :type: :class:`list` of :class:`tuple` with 4 elements each:
           (trg_ch1, trg_ch2, mdl_ch1, mdl_ch2)

With their respective DockQ scores: [0.442]
fnat: [0.4166666666666667]
irmsd: [2.601]
lrmsd: [6.095]
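The DockQ value above can be reproduced from its three printed components. The following is a minimal plain-Python sketch of the published DockQ formula: fnat is the fraction of native interface contacts recovered by the model, and the interface RMSD and LRMSD (the RMSD of the smaller "ligand" chain after superposing the larger "receptor" chain) are each mapped into (0, 1] with scaling constants of 1.5 Å and 8.5 Å. The set-of-residue-pairs representation in `fnat` is a hypothetical simplification; OpenStructure derives contacts from atomic distances itself.

```python
def fnat(native_contacts, model_contacts):
    """Fraction of native (target) interface contacts that the model reproduces.

    Contacts are sets of hashable residue-pair identifiers, for example
    (("A", 12), ("B", 40)); this representation is a simplification.
    """
    if not native_contacts:
        raise ValueError("the target interface has no contacts")
    return len(native_contacts & model_contacts) / len(native_contacts)


def dockq(fnat_value, irmsd, lrmsd, d1=1.5, d2=8.5):
    """DockQ: mean of fnat and the two RMSDs mapped into (0, 1].

    d1 and d2 are the published scaling constants (in Angstrom) for the
    interface RMSD and the ligand RMSD.
    """
    irmsd_scaled = 1.0 / (1.0 + (irmsd / d1) ** 2)
    lrmsd_scaled = 1.0 / (1.0 + (lrmsd / d2) ** 2)
    return (fnat_value + irmsd_scaled + lrmsd_scaled) / 3.0


# Components printed by the Scorer above
print(round(dockq(0.4166666666666667, 2.601, 6.095), 3))  # 0.442
```

A perfect model (fnat of 1 and both RMSDs at 0 Å) scores exactly 1, and each RMSD term decays smoothly rather than dropping at a hard cutoff.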


## Scoring from the command line


In [None]:
!ost compare-structures \
    -m T1187/T1187TS447_1o_superposed.pdb \
    -mf pdb \
    -r T1187/T1187o.pdb \
    --residue-number-alignment \
    --lddt \
    --bb-lddt \
    --qs-score \
    --dockq \
    --out T1187/T1187TS447_1_out.json

Cleaning up input structures
Computing chain mapping
Computing all-atom lDDT
Computing backbone lDDT
Computing global QS-score
Computing per-interface QS-score
Computing DockQ
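The `--out` file is plain JSON, so the scores can be read back later without OpenStructure. This sketch assumes that the top-level keys mirror the `Scorer` attribute names used in the Python section (`lddt`, `bb_lddt`, `qs_global`, `dockq`, `fnat`, `irmsd`, `lrmsd`); that is an assumption worth checking against the actual file, and any key that is missing simply comes back as `None`.

```python
import json
import os

# Assumed key names, mirroring the Scorer attributes used above
DEFAULT_KEYS = ("lddt", "bb_lddt", "qs_global", "dockq", "fnat", "irmsd", "lrmsd")


def load_scores(path, keys=DEFAULT_KEYS):
    """Return the selected entries of a compare-structures JSON file.

    Keys that are not present in the file are returned as None.
    """
    with open(path) as fh:
        data = json.load(fh)
    return {key: data.get(key) for key in keys}


out_file = "T1187/T1187TS447_1_out.json"
if os.path.exists(out_file):
    for key, value in load_scores(out_file).items():
        print(f"{key}: {value}")
```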


## Visualization

In [None]:
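import py3Dmol
view = py3Dmol.view()
view.setBackgroundColor('white')

# Target (model index 0 in the viewer)
view.addModel(open('T1187/T1187o.pdb', 'r').read(), 'pdb')
view.setStyle({'model': 0, 'chain': 'A'}, {'cartoon': {'color': 'purple'}})
view.setStyle({'model': 0, 'chain': 'B'}, {'cartoon': {'color': 'green'}})
# Model (model index 1); without the 'model' key, setStyle would restyle both
# structures. Per the chain mapping above, model chain B corresponds to target
# chain A and model chain A to target chain B, so matching chains get matching
# (lighter) colors.
view.addModel(open('T1187/T1187TS447_1o_superposed.pdb', 'r').read(), 'pdb')
view.setStyle({'model': 1, 'chain': 'B'}, {'cartoon': {'color': 'violet'}})
view.setStyle({'model': 1, 'chain': 'A'}, {'cartoon': {'color': 'lightgreen'}})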

view.zoomTo()
view.show()

# Protein-ligand complex scoring

## Data loading

In [2]:
!mkdir -p 9CE4/
!wget https://raw.githubusercontent.com/plinder-org/moving_beyond_memorisation/refs/heads/main/data/metrics/pli_scoring/9CE4_A.pdb -O 9CE4/9CE4_A.pdb
!wget https://raw.githubusercontent.com/plinder-org/moving_beyond_memorisation/refs/heads/main/data/metrics/pli_scoring/9CE4_lig.sdf -O 9CE4/9CE4_lig.sdf
!wget https://raw.githubusercontent.com/plinder-org/moving_beyond_memorisation/refs/heads/main/data/metrics/pli_scoring/996_model1.pdb -O 9CE4/996_model1.pdb
!wget https://raw.githubusercontent.com/plinder-org/moving_beyond_memorisation/refs/heads/main/data/metrics/pli_scoring/996_model1_ligand1.sdf -O 9CE4/996_model1_ligand1.sdf
!ls 9CE4/

--2024-09-24 17:06:36--  https://raw.githubusercontent.com/plinder-org/moving_beyond_memorisation/refs/heads/main/data/metrics/pli_scoring/9CE4_A.pdb
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 172861 (169K) [text/plain]
Saving to: ‘9CE4/9CE4_A.pdb’


2024-09-24 17:06:36 (5.00 MB/s) - ‘9CE4/9CE4_A.pdb’ saved [172861/172861]

--2024-09-24 17:06:37--  https://raw.githubusercontent.com/plinder-org/moving_beyond_memorisation/refs/heads/main/data/metrics//pli_scoring/9CE4_lig.sdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanent

In [None]:
from ost import io
# Target
target_structure = io.LoadPDB("9CE4/9CE4_A.pdb")
target_ligand = io.LoadSDF("9CE4/9CE4_lig.sdf")
# Model
model_structure = io.LoadPDB("9CE4/996_model1.pdb")
model_ligand = io.LoadSDF("9CE4/996_model1_ligand1.sdf")


## Data cleanup

For ligand scoring in Python, the structures must be cleaned up and their hydrogen atoms removed before running the scorer. Protein structures are cleaned with Molck (the Molecular Checker), and ligands with a simple atom selection.

In [None]:
# Clean up copies of the protein structures (the originals stay untouched)
from ost import conop
from ost.mol.alg import Molck, MolckSettings
cleaned_model_structure = model_structure.Copy()
cleaned_target_structure = target_structure.Copy()
molck_settings = MolckSettings(rm_unk_atoms=True,  # Remove unknown atoms
                               rm_non_std=False,  # Keep non-standard residues
                               rm_hyd_atoms=True,  # Remove hydrogens
                               rm_oxt_atoms=False,  # Keep terminal oxygens
                               rm_zero_occ_atoms=False,  # Keep atoms with 0 occupancy
                               colored=False,  # Plain (uncolored) log output
                               map_nonstd_res=False,  # Don't map modified residues to their parents
                               assign_elem=True)  # Clean up the element column
Molck(cleaned_model_structure, conop.GetDefaultLib(), molck_settings)
Molck(cleaned_target_structure, conop.GetDefaultLib(), molck_settings)

In [None]:
# Clean up the ligands: remove hydrogen (H) and deuterium (D) atoms
cleaned_model_ligand = model_ligand.Select("ele != H and ele != D")
cleaned_target_ligand = target_ligand.Select("ele != H and ele != D")


## Scoring with Python interactively

In [None]:
from ost.mol.alg.ligand_scoring import LDDTPLIScorer, SCRMSDScorer

# HTML documentation available as:
# https://openstructure.org/docs/2.8/mol/alg/ligand_scoring/
# raw doc string can be displayed with:
# help(LDDTPLIScorer)
# help(SCRMSDScorer)


# Score with LDDT-PLI
scorer = LDDTPLIScorer(
    target = cleaned_target_structure,
    target_ligands = [cleaned_target_ligand],
    model = cleaned_model_structure,
    model_ligands = [cleaned_model_ligand],
    # Extra arguments
    resnum_alignments=True,
    )
chain_name = cleaned_model_ligand.chains[0].name
residue_number = cleaned_model_ligand.residues[0].number
print("LDDT-PLI: ", scorer.score[chain_name][residue_number])

# Score with the binding-site superposed, symmetry-corrected RMSD (BiSyRMSD)
rmsd_scorer = SCRMSDScorer(
    target = cleaned_target_structure,
    target_ligands = [cleaned_target_ligand],
    model = cleaned_model_structure,
    model_ligands = [cleaned_model_ligand],
    # Extra arguments
    resnum_alignments=True,
    )
print("BiSyRMSD: ", rmsd_scorer.score[chain_name][residue_number])
print("LDDT-LP: ", rmsd_scorer.aux[chain_name][residue_number]["lddt_lp"])


LDDT-PLI:  0.7669142
BiSyRMSD:  1.943465
LDDT-LP:  0.9311297811187265
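BiSyRMSD is the ligand RMSD computed after superposing the model onto the reference binding site, with a symmetry correction so that chemically equivalent atoms (for example the two ortho carbons of a phenyl ring) may swap places. The sketch below illustrates only the symmetry-correction step, assuming the model ligand is already in the superposed frame and that the symmetries are supplied as explicit atom permutations; OpenStructure derives them from the ligand graph itself.

```python
import math


def rmsd(coords_a, coords_b):
    """Plain RMSD between two equally long lists of (x, y, z) tuples, no superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def symmetry_corrected_rmsd(model, reference, symmetries):
    """Minimum RMSD over a list of atom permutations of the model.

    Each permutation pairs reference atom k with model atom perm[k]; passing
    all graph symmetries of the ligand makes equivalent atoms interchangeable.
    """
    return min(rmsd([model[i] for i in perm], reference) for perm in symmetries)


# Hypothetical 3-atom ligand whose two terminal atoms are equivalent
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
model = [(2.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]  # terminal atoms swapped
symmetries = [(0, 1, 2), (2, 1, 0)]  # identity and the terminal-atom swap
print(rmsd(model, reference))                                 # ~1.633 without correction
print(symmetry_corrected_rmsd(model, reference, symmetries))  # 0.0
```

Without the correction, a perfectly placed but atom-relabeled ligand would be penalized; taking the minimum over symmetries removes that artifact.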


## Scoring from the command line

In [None]:
!ost compare-ligand-structures \
    --reference 9CE4/9CE4_A.pdb \
    --reference-ligands 9CE4/9CE4_lig.sdf \
    --model 9CE4/996_model1.pdb \
    --model-ligands 9CE4/996_model1_ligand1.sdf \
    --residue-number-alignment \
    --lddt-pli \
    --rmsd \
    --out 9CE4/996_model1_out.json

imported 1 chains, 0 residues, 0 atoms
imported 1 chains, 0 residues, 0 atoms
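lDDT, and the LDDT-PLI variant reported above, compare interatomic distances rather than superposed coordinates, so no superposition is required. A simplified sketch of the core lDDT calculation follows: the 15 Å inclusion radius and the 0.5/1/2/4 Å thresholds are the standard settings of global lDDT (LDDT-PLI uses different defaults), and the optional `pairs` argument is a hypothetical stand-in for LDDT-PLI's restriction to ligand-protein atom pairs. Residue-level bookkeeping, stereochemistry checks and symmetry handling from OpenStructure's implementation are left out.

```python
import math


def lddt(model, reference, pairs=None, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified lDDT between two equally ordered lists of (x, y, z) tuples.

    Only pairs whose reference distance is below `radius` are scored. For each
    threshold, the fraction of those distances that the model reproduces within
    the threshold is computed; the result is the mean over all thresholds.
    """
    if len(model) != len(reference):
        raise ValueError("model and reference must contain the same atoms")
    if pairs is None:
        # Default: every unordered atom pair
        n = len(reference)
        pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    deltas = []
    for i, j in pairs:
        d_ref = math.dist(reference[i], reference[j])
        if d_ref < radius:
            deltas.append(abs(math.dist(model[i], model[j]) - d_ref))
    if not deltas:
        raise ValueError("no reference distances within the inclusion radius")
    fractions = [sum(delta < t for delta in deltas) / len(deltas) for t in thresholds]
    return sum(fractions) / len(fractions)


# Hypothetical three-atom example: the second atom drifts by 1.5 Angstrom
reference = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
model = [(0.0, 0.0, 0.0), (4.5, 0.0, 0.0), (0.0, 3.0, 0.0)]
print(lddt(model, reference))
print(lddt(model, reference, pairs=[(0, 2)]))  # only the unaffected pair is scored
```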


## Visualization

In [None]:
import py3Dmol
import ost.io  # provides ost.io.SavePDB / ost.io.SaveSDF used below
# Viewer documentation: https://3dmol.org/doc/GLViewer.html
view = py3Dmol.view()
view.setBackgroundColor('white')

# Show the reference structure
view.addModel(open('9CE4/9CE4_A.pdb', 'r').read(),'pdb',
             {"style": {'cartoon': {'color':'gold'}}})
view.addModel(open("9CE4/9CE4_lig.sdf", 'r').read(), 'sdf',
             {"style": {'stick': {'color':'green'}}})

# Superpose the model onto the reference
superposed_model = rmsd_scorer.model.Copy()
editor = superposed_model.EditXCS()
editor.ApplyTransform(rmsd_scorer.aux[chain_name][residue_number]["transform"])
editor.UpdateICS()

# Save superposed files for the viewer
ost.io.SavePDB(superposed_model.Select("cname=A"), "9CE4/996_model_superposed.pdb")
ost.io.SaveSDF(superposed_model.Select("cname=00001_LIG"), "9CE4/996_model1_superposed_ligand1.sdf")

# Show the superposed model structure
view.addModel(open('9CE4/996_model_superposed.pdb', 'r').read(), 'pdb',
              {"style": {'cartoon': {'color':'purple'}}})
view.addModel(open("9CE4/996_model1_superposed_ligand1.sdf", 'r').read(), 'sdf',
              {"style": {'stick': {}}})  # no color given: 3Dmol colors atoms by element

view.zoomTo()
view.show()
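The `transform` entry used above to superpose the model is a 4×4 homogeneous matrix: its upper-left 3×3 block rotates each point and its last column translates it. As a plain-Python sketch of the arithmetic `ApplyTransform` performs on every atom position, assuming that standard convention (the matrix below is a hypothetical example, not the one computed by the scorer):

```python
def apply_transform(matrix, coords):
    """Apply a 4x4 homogeneous transformation (rotation + translation) to (x, y, z) points.

    `matrix` is a row-major nested list whose last row is assumed to be [0, 0, 0, 1].
    """
    transformed = []
    for x, y, z in coords:
        transformed.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix[:3]
        ))
    return transformed


# Hypothetical example: 90 degree rotation about z, then a shift of +1 along x
rot_z_shift_x = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(apply_transform(rot_z_shift_x, [(1.0, 0.0, 0.0)]))  # [(1.0, 1.0, 0.0)]
```

Because the transform is rigid, it preserves all interatomic distances, which is why lDDT-style scores are unaffected by it while RMSD-based scores depend on it.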