
# TEMPL Core Usage Demo

This notebook mirrors the core workflow shown in Figure 1. Each section calls the same functions exported from `templ_pipeline.core`, so you can see how the modules behind Panel A (reference alignment) and Panel B (pose generation) are used in practice.

![Workflow](../figure1_A_B.png)


For module-by-module context see [`templ_pipeline/core/README.md`](templ_pipeline/core/README.md); installation and CLI usage live in the [project README](../README.md).


## Environment Setup

Locate the repository root, ensure example assets are available, and create an output directory for the demo runs.


In [1]:

from __future__ import annotations

import atexit
import os
import shutil
import sys
import warnings
from pathlib import Path

warnings.filterwarnings('ignore', category=UserWarning, module='tqdm')
try:
    from tqdm import TqdmWarning
except ImportError:
    # tqdm is not installed, so there is no TqdmWarning to silence.
    TqdmWarning = None
else:
    warnings.filterwarnings('ignore', category=TqdmWarning)

def find_repo_root(marker: str = 'pyproject.toml') -> Path:
    current = Path.cwd().resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f'Could not locate {marker} above {current}')

PROJECT_ROOT = find_repo_root()
os.chdir(PROJECT_ROOT)
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

DATA_DIR = Path('data')
EXAMPLE_DIR = DATA_DIR / 'example'
PDB_FILE = EXAMPLE_DIR / '2ETR.pdb'
EMBEDDING_FILE = DATA_DIR / 'embeddings' / 'templ_protein_embeddings_v1.0.0.npz'

DEMO_PDB_ID = '2ETR'
DEMO_PDB_ID_UPPER = DEMO_PDB_ID.upper()
DEMO_SMILES = 'c1ncccc1NC(=O)C1CCNCC1'

OUTPUT_DIR = Path('output/templ_demo_notebook')
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

for required_path in (PDB_FILE, EMBEDDING_FILE):
    if not required_path.exists():
        raise FileNotFoundError(required_path)

pdb_lookup_dir = Path('data/pdbs')
pdb_lookup_dir.mkdir(parents=True, exist_ok=True)
lookup_target = pdb_lookup_dir / f'{DEMO_PDB_ID}.pdb'
created_lookup = False
if not lookup_target.exists() or lookup_target.stat().st_mtime < PDB_FILE.stat().st_mtime:
    shutil.copy2(PDB_FILE, lookup_target)
    created_lookup = True

def _cleanup_lookup() -> None:
    if created_lookup and lookup_target.exists():
        try:
            lookup_target.unlink()
        except OSError:
            pass

atexit.register(_cleanup_lookup)

print('Repository root detected.')
print(f'Data directory: {DATA_DIR}')
print(f'Output directory: {OUTPUT_DIR}')
print(f'PDB lookup prepared: {lookup_target.exists()}')


Repository root detected.
Data directory: data
Output directory: output/templ_demo_notebook
PDB lookup prepared: True


## Panel A – Reference Alignment

Generate the target protein embedding and retrieve the closest-matching templates from the embedding database (`embedding.py`). The `select_templates` call passes `allow_self_as_template=True`, so the native complex stays eligible and ranks first when it is present in the reference set (`2ETR` in this demo).

In [2]:

from templ_pipeline.core import EmbeddingManager, select_templates

manager = EmbeddingManager(embedding_path=str(EMBEDDING_FILE))
target_embedding, target_chains = manager.get_embedding(
    DEMO_PDB_ID_UPPER, pdb_file=str(PDB_FILE)
)

template_hits = select_templates(
    target_pdb_id=DEMO_PDB_ID_UPPER,
    target_embedding=target_embedding,
    k=5,
    similarity_threshold=0.7,
    return_similarities=True,
    allow_self_as_template=True,
)

top_template_id, top_similarity = template_hits[0]
print(f'Target chains: {target_chains}')
print('Top template candidates:')
for pdb_id, similarity in template_hits[:5]:
    print(f'  {pdb_id} (similarity: {similarity})')


Target chains: A
Top template candidates:
  2ETR (similarity: 1.0)
  2ESM (similarity: 0.9984164237976074)
  3TWJ (similarity: 0.9981651306152344)
  2ETK (similarity: 0.9977912306785583)
  6ED6 (similarity: 0.9962191581726074)
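
The retrieval above can be pictured as a thresholded nearest-neighbour search over embedding vectors. The sketch below is a toy, standard-library version that assumes cosine similarity and a plain dictionary as the database. `rank_templates`, its parameters, and the three-dimensional vectors are illustrative only and are not the `select_templates` implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_templates(target_id, target_vec, database, k=5, threshold=0.7, allow_self=True):
    """Return up to k (pdb_id, similarity) pairs at or above threshold, best first."""
    hits = []
    for pdb_id, vec in database.items():
        if pdb_id == target_id and not allow_self:
            continue  # drop the native complex unless explicitly allowed
        similarity = cosine(target_vec, vec)
        if similarity >= threshold:
            hits.append((pdb_id, similarity))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:k]


# Toy embeddings (hypothetical IDs and values, three dimensions for readability).
toy_db = {
    'AAAA': [1.0, 0.0, 0.0],
    'BBBB': [0.9, 0.1, 0.0],
    'CCCC': [0.0, 1.0, 0.0],
}
with_self = rank_templates('AAAA', toy_db['AAAA'], toy_db)
without_self = rank_templates('AAAA', toy_db['AAAA'], toy_db, allow_self=False)
print([pdb_id for pdb_id, _ in with_self])
print([pdb_id for pdb_id, _ in without_self])
```

With the self-match allowed, the target sits at the top of the list with a similarity of 1.0, which matches the `2ETR` row in the output above. Disabling it leaves only the other entries that clear the threshold.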


### Align the Highest-Scoring Template

`templates.py` loads the protein structure, resolves file paths, and superimposes the top template's ligand into the target frame. Because the top hit here is the native complex, this step produces the aligned `2ETR` ligand that the downstream steps reuse.

In [3]:

from rdkit import Chem

from templ_pipeline.core import (
    load_reference_protein,
    ligand_path,
    pdb_path,
    transform_ligand,
)

reference_structure = load_reference_protein(str(PDB_FILE))
template_supplier = Chem.SDMolSupplier(ligand_path(top_template_id), removeHs=False)
template_ligand = next(m for m in template_supplier if m is not None)
aligned_template = transform_ligand(
    mob_pdb=pdb_path(top_template_id),
    lig=template_ligand,
    pid=top_template_id,
    ref_struct=reference_structure,
)

if aligned_template is None:
    raise RuntimeError('Template alignment failed')

print(f'Aligned template: {top_template_id}')
print(f'Conformers available: {aligned_template.GetNumConformers()}')


Aligned template: 2ETR
Conformers available: 1
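
Placing a template ligand in the target frame amounts to applying one rigid-body transform (a rotation plus a translation) to every ligand atom. The sketch below illustrates only that step, using a made-up rotation and translation. How `transform_ligand` derives its transform is not shown in this notebook, and `apply_rigid_transform` is a hypothetical helper, not part of `templ_pipeline`.

```python
import math


def apply_rigid_transform(coords, rotation, translation):
    """Apply x' = R x + t to each (x, y, z) coordinate."""
    moved = []
    for point in coords:
        rotated = [sum(rotation[i][j] * point[j] for j in range(3)) for i in range(3)]
        moved.append(tuple(rotated[i] + translation[i] for i in range(3)))
    return moved


# Hypothetical transform: 90 degree rotation about z, then a shift along x.
angle = math.pi / 2
rotation = [
    [math.cos(angle), -math.sin(angle), 0.0],
    [math.sin(angle), math.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
]
translation = [10.0, 0.0, 0.0]

ligand_coords = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # toy ligand atoms
aligned_coords = apply_rigid_transform(ligand_coords, rotation, translation)

# A rigid transform moves the ligand without distorting it.
before = math.dist(ligand_coords[0], ligand_coords[1])
after = math.dist(aligned_coords[0], aligned_coords[1])
print(aligned_coords, before, after)
```

Because the transform is rigid, interatomic distances are unchanged. The aligned ligand keeps its internal geometry and only its position and orientation change.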



## Panel B – TEMPL Core

Validate the input ligand (`chemistry.py`), compute the maximum common substructure (`mcs.py`), embed constrained conformers, and rank poses (`scoring.py`).


### Validate the Target Ligand

Run `chemistry.validate_target_molecule()` to ensure the input SMILES is suitable for pose generation.

In [4]:

from rdkit import Chem

from templ_pipeline.core import validate_target_molecule

target_mol = Chem.AddHs(Chem.MolFromSmiles(DEMO_SMILES))
is_valid, validation_msg = validate_target_molecule(
    target_mol,
    mol_name='demo_ligand',
    pdb_id=DEMO_PDB_ID_UPPER,
)

print(f'Ligand validation passed: {is_valid}')
if validation_msg:
    print(f'Validation message: {validation_msg}')


Ligand validation passed: True


### Detect the Shared Substructure

Use `mcs.find_mcs()` to compute the maximum common substructure between the validated ligand and the aligned template.

In [5]:

from templ_pipeline.core import find_mcs

_, mcs_smarts, mcs_info = find_mcs(target_mol, [aligned_template], return_details=True)
print(f'MCS SMARTS: {mcs_smarts}')
print(f'MCS atom count: {mcs_info.get("atom_count")}')


MCS SMARTS: c:c(:c:c)-NC(=O)-C(-CC)-CC
MCS atom count: 12


### Generate and Rank Constrained Poses

`mcs.constrained_embed()` produces conformers consistent with the MCS, and `scoring.select_best()` ranks them using the combined scoring metric, reported as `combo` alongside the `shape` and `color` scores.

In [6]:

from templ_pipeline.core import constrained_embed, select_best

poses = constrained_embed(
    target_mol,
    aligned_template,
    mcs_smarts,
    n_conformers=200,
)

ranked_poses = select_best(
    poses,
    aligned_template,
    return_all_ranked=True,
)

top_pose, scores, rank = ranked_poses[0]
print(f'Generated poses: {poses.GetNumConformers()}')
print(f'Top pose scores: {scores}')


Generated poses: 200
Top pose scores: {'shape': 0.805701187671031, 'color': 0.3545089223534589, 'combo': 0.580105055012245}
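
In this particular output, the `combo` value equals the arithmetic mean of `shape` and `color`. The check below confirms that for the numbers printed above. Whether `select_best` always defines `combo` this way is an assumption, not something this notebook states.

```python
import math

# Scores copied from the cell output above.
scores = {
    'shape': 0.805701187671031,
    'color': 0.3545089223534589,
    'combo': 0.580105055012245,
}

# Compare the reported combo score with the plain average of its components.
mean_of_components = (scores['shape'] + scores['color']) / 2
matches = math.isclose(mean_of_components, scores['combo'], rel_tol=1e-9)
print(f'mean(shape, color) = {mean_of_components:.6f}; matches combo: {matches}')
```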



## Pipeline Orchestration

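The high-level helpers assemble the same components into a full workflow: `run_from_pdb_and_smiles()` takes a PDB ID and a SMILES string, while `run_pipeline()` takes an explicit `PipelineConfig`.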


In [7]:

from templ_pipeline.core import run_from_pdb_and_smiles

print('Running high-level pipeline entry point...')
high_level_success = run_from_pdb_and_smiles(
    pdb_id=DEMO_PDB_ID,
    smiles=DEMO_SMILES,
    output_dir=str(OUTPUT_DIR),
)
print(f'High-level call completed: {high_level_success}')


Running high-level pipeline entry point...
High-level call completed: True


In [8]:

from templ_pipeline.core import PipelineConfig, run_pipeline

config = PipelineConfig(
    target_pdb=str(PDB_FILE),
    target_smiles=DEMO_SMILES,
    output_dir=str(OUTPUT_DIR),
    data_dir='data',
    n_confs=200,
    n_workers=10,
    sim_threshold=0.9,
    ca_rmsd_threshold=10.0,
)

print('Running pipeline with explicit configuration...')
config_success = run_pipeline(config)
print(f'Configuration-based run completed: {config_success}')


Running pipeline with explicit configuration...
Configuration-based run completed: True



## Outputs

Inspect the most recent pipeline run written to `output/templ_demo_notebook`.


In [9]:

run_dirs = sorted(
    [path for path in OUTPUT_DIR.iterdir() if path.is_dir()],
    key=lambda path: path.name,
)

if not run_dirs:
    print('No output directories created yet.')
else:
    latest_dir = run_dirs[-1]
    print(f'Latest run: {latest_dir}')
    for file_path in sorted(latest_dir.glob('*')):
        if file_path.suffix not in {'.sdf', '.json'}:
            continue
        size_kb = file_path.stat().st_size / 1024
        print(f"  {file_path.name} ({size_kb:.1f} KB)")


Latest run: output/templ_demo_notebook/templ_run_20250926_145408_2etr
  2etr_all_poses.sdf (621.4 KB)
  2etr_pipeline_results.json (2.0 KB)
  2etr_template.sdf (3.5 KB)
  2etr_top3_poses.sdf (9.2 KB)
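
Sorting run directories by name finds the latest run only because the names embed a sortable timestamp. The sketch below parses that timestamp explicitly, assuming the `templ_run_YYYYMMDD_HHMMSS_<pdb>` pattern inferred from the single run listed above. `run_timestamp` is a hypothetical helper, and the second directory name is invented for the comparison.

```python
from datetime import datetime


def run_timestamp(dir_name):
    """Parse the timestamp from a name like 'templ_run_YYYYMMDD_HHMMSS_<pdb>'."""
    parts = dir_name.split('_')
    return datetime.strptime(f'{parts[2]}_{parts[3]}', '%Y%m%d_%H%M%S')


names = [
    'templ_run_20250926_145408_2etr',  # from the output above
    'templ_run_20250926_093000_2etr',  # hypothetical earlier run
]
latest = max(names, key=run_timestamp)
print(latest, run_timestamp(latest).isoformat())
```

Parsing the timestamp makes the ordering explicit and raises an error if a directory name does not match the expected pattern, rather than silently sorting it into the wrong place.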
