Description
When handling references to a large set of sequences, we ran into a couple of non-standard amino acids that should preferably be added to ihm.LPeptideAlphabet.
Here is the class we currently use in ModelArchive:
import ihm


class _LPeptideAlphabetWithXO(ihm.LPeptideAlphabet):
    """Have the default amino acid alphabet plus 'X' for unknown residues
    and 'O' as allowed non-def. AA (U already in alphabet)."""

    # extra entry added according to LPeptideAlphabet def. in
    # https://python-ihm.readthedocs.io/en/latest/_modules/ihm.html
    # and https://files.rcsb.org/view/1NTH.cif for values for 'O'.
    def __init__(self):
        """Create the alphabet."""
        super().__init__()
        self._comps['X'] = self._comps["UNK"]
        self._comps['O'] = ihm.LPeptideChemComp(
            "PYL", "O", "O", "PYRROLYSINE", "C12 H21 N3 O3"
        )
        # B/ASX, Z/GLX defined in parent class
        # J not defined in CCD? (XLE used for something else)
The undefined 'J' (LEU/ILE ambiguous) will become an issue as soon as we remediate the ma-jd-viral model set, which was produced before we added _struct_ref to python-modelcif.
That set contains 5 examples that reference NCBI sequences containing 'J' (e.g. YP_009337833.1 for ma-jd-viral-28831) while the model uses an 'L' instead. To handle the _struct_ref_seq_dif category correctly for those models, we need a _struct_ref_seq_dif.db_mon_id value for 'J', but I could not find anything in the CCD that defines an ID for it.
Any suggestions on how to handle the 'J' case? (@brindakv this may need your input)
Within ModelCIF, I can of course just define a local chem. comp. (i.e. by passing ccd="local" to ihm.LPeptideChemComp), but maybe there is something in the CCD that I simply could not find.
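
To make the _struct_ref_seq_dif problem concrete, here is a minimal pure-Python sketch of how the differing positions could be collected. It does not use python-ihm or python-modelcif. The function name `seq_differences`, the partial `ONE_TO_THREE` table, and above all the `LOCAL_J` placeholder are assumptions made for illustration only; `LOCAL_J` is not a CCD ID and simply stands in for whatever ID ends up being used for 'J'. The sketch also assumes the reference and model sequences are already aligned and of equal length.

```python
# Placeholder ID for 'J' (LEU/ILE ambiguous). Hypothetical: this is NOT a
# CCD component ID, it only marks where a real or local ID would go.
LOCAL_J_ID = "LOCAL_J"

# Partial one-letter -> component ID table, just enough for the toy example.
ONE_TO_THREE = {
    "A": "ALA",
    "I": "ILE",
    "K": "LYS",
    "L": "LEU",
    "M": "MET",
    "O": "PYL",
    "J": LOCAL_J_ID,
}


def seq_differences(db_seq, model_seq):
    """Return (seq_num, db_mon_id, mon_id) for every mismatching position.

    Positions are 1-based. Both sequences must already be aligned and have
    the same length; no alignment is performed here.
    """
    if len(db_seq) != len(model_seq):
        raise ValueError("sequences must be aligned and of equal length")
    diffs = []
    for num, (db_code, model_code) in enumerate(zip(db_seq, model_seq), start=1):
        if db_code != model_code:
            diffs.append((num, ONE_TO_THREE[db_code], ONE_TO_THREE[model_code]))
    return diffs


# Toy sequences (not taken from any real entry): 'J' in the reference,
# 'L' in the model at position 3.
print(seq_differences("MAJK", "MALK"))
```

Each returned tuple corresponds to one row where the reference-side ID would go into _struct_ref_seq_dif.db_mon_id, which is exactly where a defined ID for 'J' is missing.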