# Beyond the primary structure of sequences


We may have access to the 3D structure of our sequences (possibly predicted by a structure-prediction tool). How can we use seqme to evaluate sequences based on their 3D structure? This notebook shows how.


In [None]:
# !pip install tmtools

In [None]:
from typing import Literal

import numpy as np
from tmtools import tm_align

import seqme as sm

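Let's define a metric that uses atomic positions. Here we use the root mean square deviation (RMSD) of each sequence's structure from a reference structure, computed with TM-align (via tmtools).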


In [None]:
class RMSD(sm.Metric):
    """Root mean square deviation of atomic positions."""

    def __init__(self, reference: str, sequence_to_coordinates: dict[str, np.ndarray]):
        self.reference = reference
        self.sequence_to_coordinates = sequence_to_coordinates

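    def __call__(self, sequences: list[str]) -> sm.MetricResult:
        # Coordinates of the reference structure that every sequence is aligned against.
        ref_coords = self.sequence_to_coordinates[self.reference]
        # TM-align superimposes each structure onto the reference; `.rmsd` is the
        # RMSD over the residue pairs it aligned.
        scores = np.array(
            [tm_align(self.sequence_to_coordinates[seq], ref_coords, seq, self.reference).rmsd for seq in sequences]
        )
        # One value per group: the mean RMSD across its sequences.
        return sm.MetricResult(scores.mean().item())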

    @property
    def name(self) -> str:
        return "RMSD"

    @property
    def objective(self) -> Literal["minimize", "maximize"]:
        return "minimize"
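TM-align decides which residues to pair and how to superimpose the two structures, then reports the RMSD over the aligned pairs. When two structures have the same number of residues and a known one-to-one correspondence, the RMSD after optimal superposition, $\sqrt{\frac{1}{N}\sum_{i=1}^{N}\lVert R\,x_i + t - y_i\rVert^2}$ minimized over rotations $R$ and translations $t$, can be computed directly with the Kabsch algorithm. The sketch below shows that alternative technique in plain NumPy; it is not what tm_align does internally, and kabsch_rmsd is a hypothetical helper name.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    Assumes row i of `p` corresponds to row i of `q` (no alignment step).
    """
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError(f"expected two arrays of identical shape (N, 3), got {p.shape} and {q.shape}")
    # Remove the translation by centering both point sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # The SVD of the 3x3 cross-covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip the last axis if needed so that the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate p onto q and take the root of the mean squared per-atom distance.
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))
```

A rotated and shifted copy of a structure gives an RMSD of about 0. This helper cannot compare AYLP with ARN directly, because the two structures have different lengths and no given residue correspondence; that is exactly the step an alignment method such as TM-align provides.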

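Let's define the coordinates of each amino acid in each sequence: one (x, y, z) position per residue, in sequence order (TM-align works on one Cα position per residue).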


In [None]:
sequence_to_coordinates = {
    "AYLP": np.array([[1.2, 3.4, 1.5], [4.0, 2.8, 3.7], [1.2, 4.2, 4.3], [0.0, 1.0, 2.0]]),
    "ARN": np.array([[2.3, 7.4, 1.5], [4.0, 2.9, -1.7], [1.2, 4.2, 4.3]]),
}
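Because tm_align receives one coordinate row per residue alongside the sequence, a quick consistency check can catch mismatched entries before any metric runs. A minimal sketch, assuming coordinates are stored in a dictionary like the one above; check_coordinates is a hypothetical helper, and casting to C-contiguous float64 is a precaution rather than a documented tmtools requirement.

```python
import numpy as np


def check_coordinates(sequence_to_coordinates: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Return a copy where each entry is a C-contiguous float64 array of shape (len(seq), 3).

    Raises ValueError if the number of coordinate rows does not match the
    sequence length, or if a row does not hold exactly x, y and z.
    """
    checked: dict[str, np.ndarray] = {}
    for seq, coords in sequence_to_coordinates.items():
        arr = np.ascontiguousarray(coords, dtype=np.float64)
        if arr.shape != (len(seq), 3):
            raise ValueError(f"{seq!r}: expected shape {(len(seq), 3)}, got {arr.shape}")
        checked[seq] = arr
    return checked
```

Running sequence_to_coordinates = check_coordinates(sequence_to_coordinates) leaves valid data unchanged in content and fails early on, for example, a 3-residue sequence paired with 4 coordinate rows.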

Rather than hardcoding the coordinates as we do here, you would typically write a function that reads the atomic coordinates from a PDB file.
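A minimal sketch of such a function, using only NumPy and the fixed-column PDB text format: it keeps one Cα atom per standard residue, reads the first model only, and can filter by chain. read_ca_trace and THREE_TO_ONE are hypothetical names; a dedicated parser such as Biopython's Bio.PDB handles many more cases (modified residues, insertion codes, mmCIF files).

```python
from typing import Iterable, Optional

import numpy as np

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def read_ca_trace(lines: Iterable[str], chain: Optional[str] = None) -> tuple[str, np.ndarray]:
    """Return (sequence, Cα coordinates of shape (L, 3)) from PDB-format lines.

    Hypothetical helper: reads ATOM records of standard residues from the first
    model only, keeping alternate location ' ' or 'A' for each Cα atom.
    """
    residues: list[str] = []
    coords: list[list[float]] = []
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # stop after the first model of multi-model files (e.g. NMR)
        if not line.startswith("ATOM"):
            continue
        # Fixed PDB columns: atom name 13-16, altLoc 17, residue name 18-20, chain 22.
        if line[12:16].strip() != "CA" or line[16] not in (" ", "A"):
            continue
        if chain is not None and line[21] != chain:
            continue
        res_name = line[17:20]
        if res_name not in THREE_TO_ONE:
            continue
        residues.append(THREE_TO_ONE[res_name])
        # x, y, z occupy columns 31-38, 39-46 and 47-54.
        coords.append([float(line[30:38]), float(line[38:46]), float(line[46:54])])
    return "".join(residues), np.array(coords, dtype=np.float64).reshape(-1, 3)
```

An open file is an iterable of lines, so with open(path) as fh: seq, ca = read_ca_trace(fh) followed by sequence_to_coordinates[seq] = ca would fill the dictionary from a PDB file (path being a placeholder).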


Let's create the metric, with ARN as the reference structure, and the sequences to evaluate.


In [None]:
metrics = [RMSD(reference="ARN", sequence_to_coordinates=sequence_to_coordinates)]
# One group per sequence, keyed by (model name, sequence).
sequences = {("model 1", seq): [seq] for seq in sequence_to_coordinates}

Let's compute the metric.


In [None]:
df = sm.compute_metrics(sequences, metrics)



In [None]:
sm.show_table(df)

| Model | Sequence | RMSD↓ |
|---|---|---|
| model 1 | AYLP | 0.39 |
| model 1 | ARN | 0.0 |


Recall that seqme defines three groups of metrics: sequence-based, embedding-based and property-based. Which group does this metric belong to? Metrics operating on 3D structure closely resemble embedding-based metrics: sequence → 3D structure (the embedding) → metric. In other words, a structure-based metric is an embedding-based metric whose embedding is the 3D structure.

Also note that if we had a callable model mapping sequences to atomic coordinates (for example, a structure predictor), we could use seqme's ModelCache here to cache its predictions.
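ModelCache's interface is not shown in this notebook, so the sketch below does not use it. It is a plain-Python stand-in for the same idea: a hypothetical predictor predict (any callable from a sequence to an (L, 3) array) is called at most once per distinct sequence, and the result is reused. CachedStructurePredictor is a hypothetical name.

```python
from typing import Callable

import numpy as np


class CachedStructurePredictor:
    """Memoize a sequence -> (L, 3) coordinates predictor (illustrative, not seqme's ModelCache)."""

    def __init__(self, predict: Callable[[str], np.ndarray]):
        self.predict = predict
        self._cache: dict[str, np.ndarray] = {}

    def __getitem__(self, sequence: str) -> np.ndarray:
        # Predict only on the first request for this sequence; afterwards reuse the stored array.
        if sequence not in self._cache:
            coords = np.ascontiguousarray(self.predict(sequence), dtype=np.float64)
            if coords.shape != (len(sequence), 3):
                raise ValueError(f"{sequence!r}: predictor returned shape {coords.shape}")
            self._cache[sequence] = coords
        return self._cache[sequence]
```

Because the RMSD metric above only indexes sequence_to_coordinates with [seq], an instance of this class could be passed in place of the hardcoded dictionary, so structures are predicted on demand (sequence → 3D structure → metric) and never twice.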