Commit

Merge pull request #152 from levitsky/fix/term-mods-vs-groups
Improve and document modX terminal group support
levitsky committed Jul 8, 2024
2 parents afe758e + 6fa174f commit 263d763
Showing 6 changed files with 173 additions and 83 deletions.
99 changes: 67 additions & 32 deletions doc/source/mass.rst
@@ -1,9 +1,46 @@
Mass and isotopes
=================
Composition, mass and isotopes
==============================

The functions related to mass calculations and isotopic distributions are
organized into the :py:mod:`pyteomics.mass` module.

Chemical compositions
---------------------

Some problems in organic mass spectrometry deal with molecules made by
addition or subtraction of standard chemical 'building blocks'.
In :py:mod:`pyteomics.mass` there are two ways to approach these problems.

* There is a :py:class:`pyteomics.mass.Composition` class intended to store
  chemical formulas. :py:class:`pyteomics.mass.Composition` objects are dicts
  that can be added or subtracted from one another or multiplied by integers.

  .. code-block:: python

    >>> from pyteomics import mass
    >>> p = mass.Composition(formula='HO3P') # Phosphate group
    >>> p
    Composition({'H': 1, 'O': 3, 'P': 1})
    >>> mass.std_aa_comp['T']
    Composition({'C': 4, 'H': 7, 'N': 1, 'O': 2})
    >>> p + mass.std_aa_comp['T']
    Composition({'C': 4, 'H': 8, 'N': 1, 'O': 5, 'P': 1})

  The values of :py:data:`pyteomics.mass.std_aa_comp` are
  :py:class:`pyteomics.mass.Composition` objects.
  You can do basic arithmetic with :py:class:`Composition` objects: add, subtract and multiply by integers.

* All functions that accept a **formula** keyword argument sum and
  subtract numbers following the same atom in the formula:

  .. code-block:: python

    >>> from pyteomics import mass
    >>> mass.calculate_mass(formula='C2H6') # Ethane
    30.046950192426
    >>> mass.calculate_mass(formula='C2H6H-2') # Ethylene
    28.031300128284002

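The way signed counts after the same atom are summed can be illustrated without the library. The following is only a minimal sketch: ``sum_formula`` and ``_TOKEN`` are hypothetical names, and the simplified pattern (no isotope labels) is an assumption, not the parser that :py:mod:`pyteomics.mass` actually uses.

```python
import re
from collections import Counter

# Hypothetical helper, not part of pyteomics: an element symbol followed by
# an optional signed integer count (a missing count means 1).
_TOKEN = re.compile(r'([A-Z][a-z]*)(-?\d+)?')


def sum_formula(formula):
    """Sum the signed counts per element and drop elements that cancel out."""
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            # Tokens must be contiguous; anything in between is not a formula.
            raise ValueError('Cannot parse formula: %r' % formula)
        pos = match.end()
        element, number = match.groups()
        counts[element] += int(number) if number else 1
    if pos != len(formula):
        raise ValueError('Cannot parse formula: %r' % formula)
    return {element: n for element, n in counts.items() if n != 0}


print(sum_formula('C2H6'))     # ethane: {'C': 2, 'H': 6}
print(sum_formula('C2H6H-2'))  # ethylene: {'C': 2, 'H': 4}
```

Repeating an atom with a negative count is what makes ``'C2H6H-2'`` equivalent to ``'C2H4'`` in this sketch.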
Basic mass calculations
-----------------------

@@ -12,7 +49,8 @@ mass of an organic molecule or peptide or *m/z* ratio of an ion.
The tasks of this kind can be performed with the
:py:func:`pyteomics.mass.calculate_mass` function. It works with
chemical formulas, polypeptide sequences in modX notation, pre-parsed sequences
and dictionaries of chemical compositions:
and dictionaries of chemical compositions. Internally, all kinds of input are converted into
a :py:class:`Composition` for mass calculation.

.. code-block:: python
@@ -80,7 +118,7 @@ To add information about modified amino acids to a user-defined ``aa_comp`` dict
one can either add the composition info for a specific modified residue or just
for a modification:

.. code-block:: python
.. code-block:: python
>>> from pyteomics import mass
>>> aa_comp = dict(mass.std_aa_comp)
@@ -114,8 +152,7 @@ modification on a specific residue:
`Unimod database <http://www.unimod.org>`_ is an excellent resource for the
information on the chemical compositions of known protein modifications.
Version 2.0.3 introduces :py:class:`pyteomics.mass.Unimod` class that can serve
as a Python interface to Unimod:
:py:class:`pyteomics.mass.Unimod` class is a simple Python interface to Unimod:

.. code-block:: python
@@ -125,40 +162,38 @@ as a Python interface to Unimod:
>>> mass.calculate_mass('PEpTIDE', aa_comp=aa_comp)
782.2735307010443
Chemical compositions
---------------------
.. warning::

Some problems in organic mass spectrometry deal with molecules made by
addition or subtraction of standard chemical 'building blocks'.
In :py:mod:`pyteomics.mass` there are two ways to approach these problems.
Keep in mind the difference between **modifications** and **terminal groups**.
The composition of a modification can be added directly to the composition of the unmodified peptide, because
the hydrogen atom that the modification replaces is already subtracted (e.g. acetylation replaces
a hydrogen with an acetyl group and is represented by the composition ``H(2) C(2) O``).
A **terminal group**, on the other hand, must be represented by its full composition, e.g. ``H(3) C(2) O`` for
acetyl.

* There is a :py:class:`pyteomics.mass.Composition` class intended to store
chemical formulas. :py:class:`pyteomics.mass.Composition` objects are dicts
that can be added or subtracted from one another or multiplied by integers.
So, this is incorrect:

.. code-block:: python
.. code-block:: python
>>> from pyteomics import mass
>>> p = mass.Composition(formula='HO3P') # Phosphate group
Composition({'H': 1, 'O': 3, 'P': 1})
>>> mass.std_aa_comp['T']
Composition({'C': 4, 'H': 7, 'N': 1, 'O': 2})
>>> p + mass.std_aa_comp['T']
Composition({'C': 4, 'H': 8, 'N': 1, 'O': 5, 'P': 1})
>>> aa_comp['ac'] = aa_comp['Ac-'] = db.by_title('Acetyl')['composition'] # do not do this!
>>> mass.calculate_mass('Ac-PEPacTIDE-OH', aa_comp=aa_comp)
882.37326836204 # not correct mass!
The values of :py:data:`pyteomics.mass.std_aa_comp` are
:py:class:`pyteomics.mass.Composition` objects.
This will give you a correct result:

* All functions that accept a **formula** keyword argument sum and
subtract numbers following the same atom in the formula:
.. code-block:: python
.. code-block:: python
>>> aa_comp['ac'] = db.by_title('Acetyl')['composition']
>>> aa_comp['Ac-'] = aa_comp['ac'] + {'H': 1} # adding the hydrogen produces a full acetyl group composition
>>> mass.calculate_mass('Ac-PEPacTIDE-OH', aa_comp=aa_comp)
883.38109339411 # correct!
>>> from pyteomics import mass
>>> mass.calculate_mass(formula='C2H6') # Ethane
30.046950192426
>>> mass.calculate_mass(formula='C2H6H-2') # Ethylene
28.031300128284002
For completeness, note that you can actually specify **terminal groups** directly by their formula in the sequence:

.. code-block:: python
>>> mass.calculate_mass('CH3CO-PEPacTIDE-OH', aa_comp=aa_comp)
883.38109339411
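The two masses quoted in the examples above differ by exactly one hydrogen atom, which is the difference between the acetyl modification and the full acetyl terminal group. The sketch below checks this with plain Python; the hydrogen mass constant is an assumed approximate value written out by hand, not read from :py:data:`nist_mass`.

```python
from collections import Counter

# Compositions as quoted in the warning above.
acetyl_modification = Counter({'H': 2, 'C': 2, 'O': 1})  # replaces one H on the peptide
acetyl_group = acetyl_modification + Counter({'H': 1})   # full terminal group, H(3) C(2) O

# Masses printed by the two documentation examples above.
mass_with_mod_as_group = 882.37326836204
mass_with_full_group = 883.38109339411

# Assumption: approximate monoisotopic mass of hydrogen, in Da.
H_MASS = 1.00782503

difference = mass_with_full_group - mass_with_mod_as_group

print(dict(acetyl_group))
print(abs(difference - H_MASS) < 1e-6)
```

If the check holds, the "incorrect" result is missing one hydrogen at the N-terminus, which is why adding ``{'H': 1}`` to the modification composition gives the correct terminal group.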
Faster mass calculations
------------------------
8 changes: 4 additions & 4 deletions doc/source/parser.rst
@@ -7,7 +7,7 @@ modX
**Pyteomics** uses a custom IUPAC-derived peptide sequence notation named **modX**.
As in the IUPAC notation, each amino acid residue is represented by a capital
letter, but it may be preceded by an arbitrary number of small letters to show
modification. Terminal modifications are separated from the backbone sequence by
modification. Terminal groups are separated from the backbone sequence by
a hyphen (‘-’). By default, both termini are assumed to be unmodified, which can be
shown explicitly by 'H-' for N-terminal hydrogen and '-OH' for C-terminal hydroxyl.

@@ -23,7 +23,7 @@ Parsing

There are two helper functions to check if a label is in modX format or represents
a terminal modification: :py:func:`pyteomics.parser.is_modX` and
:py:func:`pyteomics.parser.is_term_mod`:
:py:func:`pyteomics.parser.is_term_group`:

.. code-block:: python
@@ -33,9 +33,9 @@ a terminal modification: :py:func:`pyteomics.parser.is_modX` and
True
>>> parser.is_modX('pTx')
False
>>> parser.is_term_mod('pT')
>>> parser.is_term_group('pT')
False
>>> parser.is_term_mod('Ac-')
>>> parser.is_term_group('Ac-')
True
44 changes: 32 additions & 12 deletions pyteomics/mass/mass.py
@@ -148,6 +148,11 @@ def _parse_isotope_string(label):
_formula = r'^({})*$'.format(_atom)


def _raise_term_label_exception(what='comp'):
    raise PyteomicsError(f"Cannot use a mod label as a terminal group. Provide correct group {what}"
                         f" in `aa_{what}`.")


class Composition(BasicComposition):
"""
A Composition object stores a chemical composition of a
@@ -164,10 +169,20 @@ class Composition(BasicComposition):
    def _from_parsed_sequence(self, parsed_sequence, aa_comp):
        self.clear()
        comp = defaultdict(int)
        failflag = False
        for label in parsed_sequence:
            if label in aa_comp:
                for elem, cnt in aa_comp[label].items():
                    comp[elem] += cnt
            elif parser.is_term_group(label):
                slabel = label.strip('-')
                if slabel in aa_comp:
                    # a modification label used as terminal group. This is prone to errors and not allowed
                    _raise_term_label_exception()
                elif re.match(_formula, slabel):
                    comp += Composition(formula=slabel)
                else:
                    failflag = True
            else:
                try:
                    mod, aa = parser._split_label(label)
@@ -176,7 +191,9 @@ def _from_parsed_sequence(self, parsed_sequence, aa_comp):
comp[elem] += cnt

except (PyteomicsError, KeyError):
raise PyteomicsError('No information for %s in `aa_comp`' % label)
failflag = True
if failflag:
raise PyteomicsError('No information for %s in `aa_comp`' % label)
self._from_composition(comp)
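The order of checks applied to a terminal label in the hunk above can be summarized in a standalone sketch. Everything here is illustrative: ``classify_terminal_label`` and ``_FORMULA`` are hypothetical names, and the simplified formula pattern is an assumption rather than the module's own ``_formula`` expression.

```python
import re

# Assumption for illustration only: a simplified formula pattern
# (element symbols with optional signed counts, no isotope labels).
_FORMULA = re.compile(r'^([A-Z][a-z]*(-?\d+)?)*$')


def classify_terminal_label(label, aa_comp):
    """Return how a terminal label such as 'Ac-' or '-OH' would be handled."""
    stripped = label.strip('-')
    if label in aa_comp:
        return 'lookup'   # the full label, hyphen included, is defined by the user
    if stripped in aa_comp:
        return 'error'    # a modification label reused as a terminal group
    if _FORMULA.match(stripped):
        return 'formula'  # e.g. 'CH3CO-' is read as a chemical formula
    return 'unknown'      # no information available for this label


aa_comp = {'ac': {'H': 2, 'C': 2, 'O': 1}, 'Ac-': {'H': 3, 'C': 2, 'O': 1}}
for label in ('Ac-', 'ac-', 'CH3CO-', '-OH', 'xyz-'):
    print(label, classify_terminal_label(label, aa_comp))
```

The second branch corresponds to the new exception: a label that only matches after its hyphen is stripped is treated as a likely mistake instead of being silently accepted.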

def _from_split_sequence(self, split_sequence, aa_comp):
@@ -186,23 +203,23 @@ def _from_split_sequence(self, split_sequence, aa_comp):
i = 0
while i < len(group):
for j in range(len(group) + 1, -1, -1):
try:
label = ''.join(group[i:j])
label = ''.join(group[i:j])
if label in aa_comp:
for elem, cnt in aa_comp[label].items():
comp[elem] += cnt
except KeyError:
continue
elif parser.is_term_group(label) and label.strip('-') in aa_comp:
_raise_term_label_exception()
else:
i = j
break
continue
i = j
break
if j == 0:
raise PyteomicsError("Invalid group starting from position %d: %s" % (i + 1, group))
self._from_composition(comp)

    def _from_sequence(self, sequence, aa_comp):
        parsed_sequence = parser.parse(
            sequence,
            labels=aa_comp,
            show_unmodified_termini=True)
        self._from_parsed_sequence(parsed_sequence, aa_comp)

@@ -255,8 +272,7 @@ def __init__(self, *args, **kwargs):
Parameters
----------
formula : str, optional
A string with a chemical formula. All elements must be present in
`mass_data`.
A string with a chemical formula.
sequence : str, optional
A polypeptide sequence string in modX notation.
parsed_sequence : list of str, optional
@@ -971,7 +987,7 @@ def fast_mass2(sequence, ion_type=None, charge=None, **kwargs):
value is :py:data:`nist_mass`).
aa_mass : dict, optional
A dict with the monoisotopic mass of amino acid residues
(default is std_aa_mass);
(default is std_aa_mass).
ion_comp : dict, optional
A dict with the relative elemental compositions of peptide ion
fragments (default is :py:data:`std_ion_comp`).
@@ -999,7 +1015,11 @@ def fast_mass2(sequence, ion_type=None, charge=None, **kwargs):
mass += aa_mass[aa] * num
elif parser.is_term_mod(aa):
assert num == 1
mass += calculate_mass(formula=aa.strip('-'), mass_data=mass_data)
group = aa.strip('-')
if group in aa_mass:
_raise_term_label_exception('mass')
else:
mass += calculate_mass(formula=group, mass_data=mass_data)
else:
mod, X = parser._split_label(aa)
mass += (aa_mass[mod] + aa_mass[X]) * num