# Use Consensus Labels as a Multiple Sequence Alignment (MSA)

In this notebook, we will use `mdciao` and the consensus labels of the [GPCRdb](https://gpcrdb.org) to structurally align four GPCR structures of four receptors with low sequence identity:

* opsin, `OPS` in the rest of the notebook
* beta2 adrenergic receptor, `B2AR` in the rest of the notebook
* mu-opioid receptor, `MUOR` in the rest of the notebook
* dopamine D1 receptor, `DOP` in the rest of the notebook

Although we use PDB structures directly here, the MSA works on any arbitrary geometries and topologies that can be imported into the notebook, **in particular user-provided trajectories**.

In [None]:
import mdciao
import nglview
import matplotlib

## Load PDB Structures into the Notebook

In [None]:
pdbs = {"OPS"  : mdciao.cli.pdb("3CAP"), 
        "B2AR" : mdciao.cli.pdb("3SN6"), 
        "MUOR" : mdciao.cli.pdb("6DDF"),
        "DOP"  : mdciao.cli.pdb("7CKW")}

## Load Consensus Labels from the GPCRdb into the Notebook

In [None]:
maps = { "OPS": mdciao.nomenclature.LabelerGPCR("opsd_bovin"),
        "B2AR": mdciao.nomenclature.LabelerGPCR("adrb2_human"),
        "MUOR": mdciao.nomenclature.LabelerGPCR("oprm_mouse"), 
        "DOP" : mdciao.nomenclature.LabelerGPCR("DRD1_HUMAN")
       }

## Use the Consensus Labels to Trim Down the PDBs to Just the Receptors
This works regardless of chain definitions and co-crystallized entities.

In [None]:
pdb_just_receptor = {}
for key, pdb in pdbs.items():
    print(key)
    receptor_residue_idxs = mdciao.nomenclature.guess_by_nomenclature(
        maps[key],
        pdb.top,
        fragments="resSeq",
        return_residue_idxs=True,
        accept_guess=True,
        return_str=False)
    pdb_just_receptor[key] = mdciao.fragments.fragment_slice(pdb, [receptor_residue_idxs])
    print()

## Receptors are not 3D Aligned
This is to be expected, since the structures come from different PDBs, but it helps highlight the point.

In [None]:
colors = {"MUOR":"tab:red", 
          "OPS":"tab:blue", 
          "B2AR":"tab:green",
          "DOP": "tab:orange"}
iwd = nglview.NGLWidget()
for ii, (key, geom) in enumerate(pdb_just_receptor.items()):
    iwd.add_trajectory(geom)
    iwd.clear_representations(component=ii)
    iwd.add_cartoon(color=matplotlib.colors.to_hex(colors[key]), component=ii)
iwd

## Use the Consensus Labels to generate an [AlignerConsensus](https://proteinformatics.uni-leipzig.de/mdciao/api/generated/generated/mdciao.nomenclature.AlignerConsensus.html#mdciao.nomenclature.AlignerConsensus) for MSA

In [None]:
AC = mdciao.nomenclature.AlignerConsensus(maps,
                                          tops={key : geom.top for key, geom in pdb_just_receptor.items()})

## Sequence Identity within Residues with Consensus Labels

In [None]:
AC.sequence_match()
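
As a rough illustration of what "sequence identity within residues with consensus labels" means, the sketch below compares two toy label maps and counts identical residues only at labels that both maps share. The dictionary `toy_labels`, the keys `"A"` and `"B"`, and the helper `percent_identity` are hypothetical and not part of `mdciao`; the normalization used here (identical residues divided by shared labels) is an assumption and may differ from what `AC.sequence_match()` reports.

```python
# Hypothetical toy data: consensus label -> one-letter residue code.
# These values are NOT read from the GPCRdb; they only illustrate the bookkeeping.
toy_labels = {
    "A": {"3.49": "E", "3.50": "R", "3.51": "Y", "7.53": "Y"},
    "B": {"3.49": "D", "3.50": "R", "3.51": "Y", "7.53": "Y", "8.50": "F"},
}


def percent_identity(labels_1, labels_2):
    """Percentage of identical residues among the labels shared by both maps."""
    shared = sorted(set(labels_1) & set(labels_2))
    if not shared:
        return 0.0
    n_same = sum(labels_1[lab] == labels_2[lab] for lab in shared)
    return 100.0 * n_same / len(shared)


identity = percent_identity(toy_labels["A"], toy_labels["B"])
print(f"{identity:.0f}% identity over shared consensus labels")
```

Label `8.50` exists only in `"B"`, so it is ignored: the comparison never needs a conventional sequence alignment, because the consensus labels already say which positions correspond.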

## Pick a Reference Structure (e.g. Opsin) and 3D-align all Receptors on It
We do this using the [CAidxs_match](https://proteinformatics.uni-leipzig.de/mdciao/api/generated/generated/mdciao.nomenclature.AlignerConsensus.html#mdciao.nomenclature.AlignerConsensus.CAidxs_match) method of the [AlignerConsensus](https://proteinformatics.uni-leipzig.de/mdciao/api/generated/generated/mdciao.nomenclature.AlignerConsensus.html#mdciao.nomenclature.AlignerConsensus), which generates pairs of Cα atom indices that match one another via their consensus labels. For brevity, we first show the matches for the "3.50...3.59" region of TM3 only; for the 3D alignment itself, we take all consensus labels.

In [None]:
AC.CAidxs_match("3.5*", keys=["OPS","B2AR"])
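
The idea behind matching indices via consensus labels can be sketched in plain Python. The example below is a minimal illustration under stated assumptions, not the `mdciao` implementation: the maps in `toy_CA_idxs`, the keys `"REF"` and `"MOB"`, the atom indices, and the helper `match_CA_idxs` are all hypothetical, and the wildcard handling relies on the standard-library `fnmatch`.

```python
from fnmatch import fnmatch

# Hypothetical toy maps: consensus label -> C-alpha atom index in each topology.
toy_CA_idxs = {
    "REF": {"3.49": 1010, "3.50": 1018, "3.51": 1029, "3.52": 1041},
    "MOB": {"3.50": 870, "3.51": 881, "3.52": 893, "3.53": 900},
}


def match_CA_idxs(label2idx_a, label2idx_b, pattern="*"):
    """Return (label, idx_a, idx_b) for labels present in both maps that match `pattern`."""
    shared = [lab for lab in label2idx_a
              if lab in label2idx_b and fnmatch(lab, pattern)]
    return [(lab, label2idx_a[lab], label2idx_b[lab]) for lab in shared]


pairs = match_CA_idxs(toy_CA_idxs["REF"], toy_CA_idxs["MOB"], pattern="3.5*")
for label, idx_ref, idx_mob in pairs:
    print(label, idx_ref, idx_mob)
```

Only labels that are present in both maps survive, so every row pairs one atom of the first structure with its counterpart in the second, which is what a pairwise superposition needs as input.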

In [None]:
ref_key = "OPS"
ref_geom = pdb_just_receptor[ref_key]
for key, geom in pdb_just_receptor.items():
    if key != ref_key:
        ref_CAs, key_CAs = AC.CAidxs_match(keys=[ref_key, key])[[ref_key, key]].values.T
        geom.superpose(ref_geom, atom_indices=key_CAs, ref_atom_indices=ref_CAs)
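
To make explicit what a superposition on matched atom pairs does, here is a minimal NumPy sketch of the Kabsch algorithm. It is a stand-in for illustration only and is not claimed to be how `superpose` is implemented internally; the function `kabsch_superpose` and the toy coordinates are hypothetical. As in the cell above, the fit is computed from the matched atoms only, and the resulting rigid transformation is then applied to every atom of the mobile structure.

```python
import numpy as np


def kabsch_superpose(mobile, reference, mobile_idxs, reference_idxs):
    """Rigidly move all `mobile` atoms so that the selected atoms best fit the reference ones."""
    P = mobile[mobile_idxs]        # matched atoms of the structure being moved
    Q = reference[reference_idxs]  # their counterparts in the reference
    P_center, Q_center = P.mean(axis=0), Q.mean(axis=0)
    H = (P - P_center).T @ (Q - Q_center)   # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (mobile - P_center) @ R.T + Q_center


# Toy reference: five non-coplanar points.
reference = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], dtype=float)

# Toy mobile structure: the reference rotated about z and shifted, with one extra
# leading atom, so the matched atom indices differ between the two structures.
angle = np.deg2rad(40.0)
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
moved = reference @ Rz.T + np.array([5.0, -3.0, 2.0])
mobile = np.vstack([[9.0, 9.0, 9.0], moved])

aligned = kabsch_superpose(mobile, reference,
                           mobile_idxs=[1, 2, 3, 4], reference_idxs=[0, 1, 2, 3])
print(np.allclose(aligned[1:], reference))
```

The index lists play the role of `key_CAs` and `ref_CAs`: they may differ in value between structures, but they must be equally long and ordered so that position *i* in one list corresponds to position *i* in the other.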


## Receptors are now 3D-aligned

In [None]:
iwd = nglview.NGLWidget()
for ii, (key, geom) in enumerate(pdb_just_receptor.items()):
    iwd.add_trajectory(geom)
    iwd.clear_representations(component=ii)
    iwd.add_cartoon(color=matplotlib.colors.to_hex(colors[key]), component=ii)
iwd