# Seq-to-first-iso

> Compute the first two isotopologue intensities from sequences

**seq-to-first-iso** is a tool that computes the isotopologue intensities M0 and M1 of peptide sequences, both with natural carbon
and with carbon enriched to 99.99 % 12C.  
The program can treat selected amino acids as unlabelled to simulate auxotrophies for those amino acids.

---
## Using the Command-Line Interface

*Note: the exclamation marks "!" are only needed in the notebook; in a real terminal, omit them.*

In [1]:
!seq-to-first-iso -v

seq-to-first-iso 0.4.3


In [2]:
!seq-to-first-iso -h

usage: seq-to-first-iso [-h] [-o OUTPUT] [-n amino_a] [-v] input

Read a file of sequences and creates a tsv file

positional arguments:
  input                 file to parse

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        name of output file
  -n amino_a, --non-labelled-aa amino_a
                        amino acids with default abundance
  -v, --version         show program's version number and exit


The output file will have the following columns:

| sequence | mass | formula | formula_X | M0_NC | M1_NC | M0_12C | M1_12C |
|----------|------|---------|-----------|-------|-------|--------|--------|
| original sequence | sequence mass | chemical formula | chemical formula with X | M0 in NC | M1 in NC | M0 in 12C | M1 in 12C |

X: Virtual element created to represent unlabelled carbon  
NC: Normal Condition (Natural Carbon)  
12C: 12C condition (12C enriched carbon)    
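
For intuition about the M0 columns: M0 is the intensity of the isotopologue in which every atom is its lightest isotope, so it can be estimated as a product of per-element abundances raised to the atom counts. The sketch below is not the tool's implementation. The abundance table, `parse_formula` and `m0` are hypothetical names and approximate values introduced for illustration, so the result should only agree with the tool's output to roughly three decimal places.

```python
import re

# Approximate natural abundance of the lightest isotope of each element.
# These are assumed values; the tool may use slightly different ones.
LIGHT_ABUNDANCE_NC = {
    "H": 0.999885,
    "C": 0.9893,
    "N": 0.99636,
    "O": 0.99757,
    "S": 0.9499,
}


def parse_formula(formula):
    """Turn 'C37H59O13N11' into {'C': 37, 'H': 59, 'O': 13, 'N': 11}.

    Assumes each element appears only once in the formula.
    """
    return {el: int(n or 1) for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula)}


def m0(formula, abundance):
    """Probability that every atom in the formula is its lightest isotope."""
    prob = 1.0
    for element, count in parse_formula(formula).items():
        prob *= abundance[element] ** count
    return prob


# 12C condition: same table, but carbon is 99.99 % 12C.
LIGHT_ABUNDANCE_12C = dict(LIGHT_ABUNDANCE_NC, C=0.9999)

m0_nc = m0("C37H59O13N11", LIGHT_ABUNDANCE_NC)
m0_12c = m0("C37H59O13N11", LIGHT_ABUNDANCE_12C)
print(m0_nc, m0_12c)
```

Because carbon is the most numerous heavy-isotope carrier in a peptide, raising its 12C abundance from about 98.9 % to 99.99 % is what pushes M0 so much higher in the 12C condition.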

In [3]:
# File used.
!cat peptides.txt

YAQEISR
VGFPVLSVKEHK
LAMVIIKEFVDDLK


### Minimal command

In [4]:
!seq-to-first-iso peptides.txt

[2019-06-26, 16:58:48] INFO    : Parsing file
[2019-06-26, 16:58:48] INFO    : Computing formula
[2019-06-26, 16:58:48] INFO    : Computing composition of modifications
[2019-06-26, 16:58:48] INFO    : Computing mass
[2019-06-26, 16:58:48] INFO    : Computing M0 and M1


Running the command above creates a file of tab-separated values: *peptides_stfi.tsv*

In [5]:
!column -t peptides_stfi.tsv

sequence        mass              formula          formula_X        M0_NC                M1_NC                M0_12C              M1_12C
YAQEISR         865.42938099921   C37H59O13N11     C37H59O13N11     0.6206414140575179   0.280870823368276    0.9206561231798033  0.05161907174495234
VGFPVLSVKEHK    1338.7659712609   C63H102O16N16    C63H102O16N16    0.4550358985377136   0.34506032928190855  0.8905224988642593  0.07411308335404865
LAMVIIKEFVDDLK  1632.91606619252  C76H128O21N16S1  C76H128O21N16S1  0.36994021481230627  0.3373188347614264   0.8315762004558261  0.08101653544902196
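
If you want to post-process the output in Python rather than view it with `column`, a minimal sketch using only the standard library could look like the following. `read_stfi` and `NUMERIC_COLUMNS` are hypothetical names, not part of seq-to-first-iso, and the sample text is copied from the first row shown above so that the block runs without the file.

```python
import csv
import io

# Columns that hold numbers in the output file (names taken from the header).
NUMERIC_COLUMNS = ("mass", "M0_NC", "M1_NC", "M0_12C", "M1_12C")


def read_stfi(handle):
    """Parse tab-separated seq-to-first-iso output into a list of dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for name in NUMERIC_COLUMNS:
            row[name] = float(row[name])  # convert text to float
        rows.append(row)
    return rows


# Sample content: header plus the YAQEISR row from peptides_stfi.tsv.
sample = (
    "sequence\tmass\tformula\tformula_X\tM0_NC\tM1_NC\tM0_12C\tM1_12C\n"
    "YAQEISR\t865.42938099921\tC37H59O13N11\tC37H59O13N11\t"
    "0.6206414140575179\t0.280870823368276\t"
    "0.9206561231798033\t0.05161907174495234\n"
)
rows = read_stfi(io.StringIO(sample))
print(rows[0]["sequence"], rows[0]["M0_NC"], rows[0]["M0_12C"])
```

With the real file, pass an open handle instead: `with open("peptides_stfi.tsv", newline="") as handle: rows = read_stfi(handle)`.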


### Changing output name

Use `-o` to change the name of the output file; here `-o sequence` produces *sequence.tsv*.

In [6]:
!seq-to-first-iso peptides.txt -o sequence

[2019-06-26, 16:58:50] INFO    : Parsing file
[2019-06-26, 16:58:50] INFO    : Computing formula
[2019-06-26, 16:58:50] INFO    : Computing composition of modifications
[2019-06-26, 16:58:50] INFO    : Computing mass
[2019-06-26, 16:58:50] INFO    : Computing M0 and M1


In [7]:
!column -t sequence.tsv

sequence        mass              formula          formula_X        M0_NC                M1_NC                M0_12C              M1_12C
YAQEISR         865.42938099921   C37H59O13N11     C37H59O13N11     0.6206414140575179   0.280870823368276    0.9206561231798033  0.05161907174495234
VGFPVLSVKEHK    1338.7659712609   C63H102O16N16    C63H102O16N16    0.4550358985377136   0.34506032928190855  0.8905224988642593  0.07411308335404865
LAMVIIKEFVDDLK  1632.91606619252  C76H128O21N16S1  C76H128O21N16S1  0.36994021481230627  0.3373188347614264   0.8315762004558261  0.08101653544902196


### Choosing unlabelled amino acids

In [8]:
!seq-to-first-iso peptides.txt -n V,W -o sequence

[2019-06-26, 16:58:52] INFO    : Amino acid with default abundance: ['V', 'W']
[2019-06-26, 16:58:52] INFO    : Parsing file
[2019-06-26, 16:58:52] INFO    : Computing formula
[2019-06-26, 16:58:52] INFO    : Computing composition of modifications
[2019-06-26, 16:58:52] INFO    : Computing mass
[2019-06-26, 16:58:52] INFO    : Computing M0 and M1


In [9]:
!column -t sequence.tsv

sequence        mass              formula          formula_X           M0_NC                M1_NC               M0_12C              M1_12C
YAQEISR         865.42938099921   C37H59O13N11     C37H59O13N11        0.6206414140575179   0.280870823368276   0.9206561231798033  0.05161907174495234
VGFPVLSVKEHK    1338.7659712609   C63H102O16N16    C48H102O16N16X15    0.45503589853771365  0.3450603292819086  0.7589558393662944  0.18515489894512063
LAMVIIKEFVDDLK  1632.91606619252  C76H128O21N16S1  C66H128O21N16S1X10  0.36994021481230627  0.3373188347614264  0.7475090558698947  0.15292723586285323


The carbon of unlabelled amino acids is shown as X in the "formula_X" column.  
For the sequence "YAQEISR", which contains no unlabelled amino acids, M0 and M1 are the same as in the previous *sequence.tsv*, regardless of the condition.  
In contrast, for the sequence "VGFPVLSVKEHK" in the 12C condition, M0 goes down from 0.8905224988642593 to 0.7589558393662944 and M1 goes up from 0.07411308335404865 to 0.18515489894512063.
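
The X counts and the drop in M0 can be checked by hand. Valine has 5 carbons and tryptophan has 11, so "VGFPVLSVKEHK" with three valines gives X15. The sketch below assumes that X carbons carry roughly the natural 12C abundance (about 0.9893) while labelled carbons are at 0.9999. The helper names and that abundance value are assumptions for illustration, not the tool's code.

```python
# Carbon atoms per residue, only for the two amino acids used with -n above.
CARBONS_PER_RESIDUE = {"V": 5, "W": 11}


def unlabelled_carbons(sequence, unlabelled):
    """Count carbons that belong to unlabelled residues (the X atoms)."""
    return sum(CARBONS_PER_RESIDUE[aa] for aa in sequence if aa in unlabelled)


def predicted_m0_12c(m0_fully_labelled, n_x, natural_12c=0.9893, enriched_12c=0.9999):
    """Rescale M0 when n_x carbons revert from enriched to natural abundance."""
    return m0_fully_labelled * (natural_12c / enriched_12c) ** n_x


unlabelled = {"V", "W"}
n_x = unlabelled_carbons("VGFPVLSVKEHK", unlabelled)
# M0_12C of the fully labelled peptide, taken from the earlier sequence.tsv.
estimate = predicted_m0_12c(0.8905224988642593, n_x)
print(n_x, estimate)
```

Under these assumptions the estimate lands very close to the reported M0_12C of 0.7589558393662944, which suggests the drop is explained by the 15 carbons that are no longer enriched.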