# TED Domain Data Visualization Tutorial

This notebook demonstrates how to:
1. Fetch domain data from the TED API
2. Extract domain chopping information
3. Visualize the domains using molviewspec (mvs)

## Setup
First, import the required libraries.

In [1]:
import requests
import molviewspec as mvs

## Step 1: Fetch TED Domain Data

We'll fetch domain data for UniProt ID Q99683 from the TED API.

In [2]:
# Define the UniProt ID
protein_id = "Q99683"

# Define the TED Metadata API URL
ted_url = f"https://ted.cathdb.info/api/v1/uniprot/summary/{protein_id}"

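# Fetch the data; fail early on HTTP errors instead of parsing an error page
response = requests.get(ted_url, timeout=30)
response.raise_for_status()
ted_data = response.json()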

print("Successfully fetched TED domain data!")
print(f"Number of domains found: {len(ted_data['data'])}")
print(f"Example domain data: {ted_data['data'][0]}")

Successfully fetched TED domain data!
Number of domains found: 7
Example domain data: {'ted_id': 'AF-Q99683-F1-model_v4_TED01', 'uniprot_acc': 'Q99683', 'md5_domain': '90ad7137b9f9ff5aac717d6e107d76e9', 'consensus_level': 'high', 'chopping': '93-271', 'nres_domain': 179, 'num_segments': 1, 'plddt': 76.8004, 'num_helix_strand_turn': 28, 'num_helix': 9, 'num_strand': 6, 'num_helix_strand': 15, 'num_turn': 11, 'proteome_id': 9606, 'cath_label': '3.40.50,3.40.50', 'cath_assignment_level': 'T', 'cath_assignment_method': 'foldseek,foldclass', 'packing_density': 11.518, 'norm_rg': 0.287, 'tax_common_name': 'Human', 'tax_scientific_name': 'Homo_sapiens', 'tax_lineage': 'cellular_organisms,Eukaryota,Opisthokonta,Metazoa,Eumetazoa,Bilateria,Deuterostomia,Chordata,Craniata,Vertebrata,Gnathostomata,Teleostomi,Euteleostomi,Sarcopterygii,Dipnotetrapodomorpha,Tetrapoda,Amniota,Mammalia,Theria,Eutheria,Boreoeutheria,Euarchontoglires,Primates,Haplorrhini,Simiiformes,Catarrhini,Hominoidea,Hominidae,Homi
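The cell above indexes `ted_data['data'][0]` directly, which fails if the API returns no domains for an accession. A minimal sketch of a defensive accessor is shown below. The helper name `extract_domains` is hypothetical, and the assumed payload shape (a dict with a `data` list) is taken only from the example output above.

```python
from typing import Any, Dict, List


def extract_domains(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the list of domain records from a TED summary payload.

    Hypothetical helper: assumes the payload looks like {"data": [...]},
    as in the example output above.
    """
    domains = payload.get("data")
    if not isinstance(domains, list) or not domains:
        raise ValueError("No TED domains found in the response payload")
    return domains


# Usage with a trimmed-down record copied from the example output
sample_payload = {
    "data": [{"ted_id": "AF-Q99683-F1-model_v4_TED01", "chopping": "93-271"}]
}
print(extract_domains(sample_payload)[0]["chopping"])
```

In the notebook, `extract_domains(ted_data)` could replace direct access to `ted_data['data']`, so that an empty result produces a clear error message rather than an `IndexError`.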

## Step 2: Extract Domain Chopping Information

Let's extract the chopping information and a few quality metrics for each domain, then display the first domain as an example.

In [3]:
# Extract chopping information
domain_choppings = []
for domain in ted_data['data']:
    domain_info = {
        'ted_id': domain['ted_id'],
        'chopping': domain['chopping'],
        'consensus_level': domain['consensus_level'],
        'plddt': domain['plddt'],
        'packing_density': domain['packing_density'],
        'norm_rg': domain['norm_rg'],
        'nres_domain': domain['nres_domain'],
        'cath_label': domain['cath_label'],
        'tax_scientific_name': domain['tax_scientific_name']
    }
    domain_choppings.append(domain_info)

# Display the information for the first domain
print("Single domain Chopping Information:")
domain = domain_choppings[0]
print(f"\nDomain ID: {domain['ted_id']}")
print(f"Chopping: {domain['chopping']}")
print(f"Consensus Level: {domain['consensus_level']}")
print(f"pLDDT Score: {domain['plddt']:.2f}")

Single domain Chopping Information:

Domain ID: AF-Q99683-F1-model_v4_TED01
Chopping: 93-271
Consensus Level: high
pLDDT Score: 76.80
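The example record contains a `num_segments` field, which suggests that a domain can consist of more than one residue range. The sketch below parses such a chopping string into a list of `(start, end)` pairs. It assumes that segments are joined with `_` and that each segment has the form `start-end`; this format and the helper name `parse_chopping` are assumptions, not something confirmed by the output above. Note that a naive `split('-')` on a string like `10-50_80-120` does not raise, because Python's `int()` accepts underscores between digits, so `int("50_80")` silently becomes `5080`.

```python
from typing import List, Tuple


def parse_chopping(chopping: str) -> List[Tuple[int, int]]:
    """Split a chopping string into (start, end) residue ranges.

    Assumes segments are separated by "_" and each segment is "start-end".
    """
    segments = []
    for segment in chopping.split("_"):
        start, end = segment.split("-")
        segments.append((int(start), int(end)))
    return segments


print(parse_chopping("93-271"))        # single segment, from the output above
print(parse_chopping("10-50_80-120"))  # hypothetical two-segment domain
```

Each returned pair can then be used as the `beg_label_seq_id` and `end_label_seq_id` of its own selector.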


## Step 3: Visualize the protein

For this step we need to:
1. Set up visualization for the protein chain
2. Iterate over the domains to color and label only the residues within domain boundaries

In [4]:
# Create a builder for visualization
builder = mvs.create_builder()

# Load the predicted structure from the AlphaFold Protein Structure Database (AFDB)
protein_url = f"https://alphafold.ebi.ac.uk/files/AF-{protein_id}-F1-model_v4.cif"

# 1) Create grey representation for the protein chain
structure = (
    builder.download(url=protein_url, ref="download")
        .parse(format="mmcif")
        .model_structure()
)
repr = structure.component(selector="all").representation()
repr.color(color="grey")

# Coloring scheme adapted from https://ted.cathdb.info/
colors = ["#4e79a7", "#f28e2c", "#e15759", "#76b7b2", "#59a14f", "#edc949", "#af7aa1", "#ff9da7", "#9c755f", "#bab0ac"]

# 2) Iterate over the domains
for domain_chopping, color in zip(domain_choppings, colors):

    # Extract the "TED0x" domain label (the last "_"-separated field of the TED ID)
    label = domain_chopping["ted_id"].split("_")[-1]

    # Extract the start and end of the domain chopping
    # (assumes a single contiguous segment of the form "start-end")
    start = int(domain_chopping['chopping'].split('-')[0])
    end = int(domain_chopping['chopping'].split('-')[1])

    # Define the coloring
    repr.color(
        color=color,
        selector=mvs.ComponentExpression(
            beg_label_seq_id=start,
            end_label_seq_id=end
        )
    )

    # Define the label
    structure.component(
        selector=mvs.ComponentExpression(
            beg_label_seq_id=start,
            end_label_seq_id=end
        )
    ).label(text=label)

builder

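Because `zip` stops at the shorter of its inputs, the loop above colors at most ten domains; any further domains would stay grey and unlabeled. One possible way around this, sketched below, is to repeat the palette with `itertools.cycle`. The twelve domain labels are made up for illustration.

```python
from itertools import cycle

colors = ["#4e79a7", "#f28e2c", "#e15759", "#76b7b2", "#59a14f",
          "#edc949", "#af7aa1", "#ff9da7", "#9c755f", "#bab0ac"]

# Hypothetical labels for a protein with more domains than palette entries
labels = [f"TED{i:02d}" for i in range(1, 13)]

# cycle() restarts the palette once it is exhausted, so every label gets a color
assigned = list(zip(labels, cycle(colors)))

for label, color in assigned:
    print(label, color)
```

The trade-off is that colors repeat after the tenth domain, so the labels are needed to tell such domains apart.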

## Step 4: Multi-state visualization

Alternatively, we can visualize the domains individually using a multi-state MolViewSpec view, with one snapshot per domain.

In [10]:
from typing import Dict

# Function to create side-panel markdown description out of the metadata
def create_snapshot_description(protein_id: str, ted_domain_chopping: Dict[str, str]) -> str:
    label = ted_domain_chopping["ted_id"].split("_")[-1]
    chopping = ted_domain_chopping["chopping"]
    residues = ted_domain_chopping["nres_domain"]
    plddt = ted_domain_chopping["plddt"]
    packing = ted_domain_chopping["packing_density"]
    globularity = ted_domain_chopping["norm_rg"]
    cath = ted_domain_chopping["cath_label"].split(',')[0]
    tax_scientific_name = ted_domain_chopping["tax_scientific_name"]
    description = f"""
#### Domain: [{protein_id}_{label}↗](https://ted.cathdb.info/uniprot/{protein_id})

#### Properties:
- **Chopping:** {chopping}
- **Residues:** {residues}
- **Average pLDDT:** {plddt}
- **Packing:** {packing}
- **Globularity:** {globularity}
- **Taxonomy:** {tax_scientific_name}"""
    if cath != "-":
        description += f"""
- **CATH:** [{cath}↗](https://www.cathdb.info/version/latest/cathnode/{cath})
"""
    else:
        description += f"""
- **CATH:** -
"""
    return description


def create_protein_chain(protein_id):
    builder = mvs.create_builder()
    structure = (
        builder.download(url=f"https://alphafold.ebi.ac.uk/files/AF-{protein_id}-F1-model_v4.cif")
            .parse(format="mmcif")
            .model_structure()
    )
    repr = structure.component(selector="all").representation().color(color="grey")
    return builder, structure, repr


def create_domain_snapshot(protein_id: str, ted_domain_chopping: Dict[str, str], color: str) -> mvs.State:

    # Create representation for the protein chain
    builder, structure, repr = create_protein_chain(protein_id)

    label = ted_domain_chopping["ted_id"].split("_")[-1]
    start = int(ted_domain_chopping['chopping'].split('-')[0])
    end = int(ted_domain_chopping['chopping'].split('-')[1])

    # Color the domain
    repr.color(
        selector=mvs.ComponentExpression(
            beg_label_seq_id=start,
            end_label_seq_id=end
        ),
        color=color
    )

    # Create camera focus for the domain
    structure.component(
        selector=mvs.ComponentExpression(
            beg_label_seq_id=start,
            end_label_seq_id=end
        )
    ).focus()

    return builder.get_snapshot(title=f'{protein_id}_{label}', description=create_snapshot_description(protein_id, ted_domain_chopping))

snapshots = []
colors = ["#4e79a7", "#f28e2c", "#e15759", "#76b7b2", "#59a14f", "#edc949", "#af7aa1", "#ff9da7", "#9c755f", "#bab0ac"]

for domain, color in zip(domain_choppings, colors):
    snapshots.append(create_domain_snapshot(protein_id, domain, color))

In [11]:
mvsj = mvs.MVSJ(
    data=mvs.States(snapshots=snapshots, metadata=mvs.GlobalMetadata())
)
mvs.molstar_notebook(mvsj, ui="stories", width="100%", height=550)
