# Get Protein Sequences

<b>In silico translation:</b> to generate a sample-specific database, each transcript (from RNA-seq or Ribosome Profiling elongation, Ribo-Elong) was translated in the frame set by its coupled start codon, up to the first in-frame stop codon. Protein sequences of at least 8 amino acids (AA) were retained. Any protein sequence nested in a longer sequence was not added to the database. However, we kept track of all proteins (i.e., which ones were added to the database and which were not), because this information is used to assign the most likely origin of each peptide. To avoid a combinatorial explosion, transcripts containing IUPAC ambiguity symbols were translated once as a complete protein sequence, and short sequences around each IUPAC symbol (20 nt flanking each SNP) were translated separately.
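
The translation rules above can be illustrated with a minimal, self-contained sketch. It is not the code of `getProteinsSequences.py`; the names `translate_orf`, `iupac_window_peptides`, `MIN_AA` and `FLANK_NT` are hypothetical. It assumes uppercase DNA in transcript orientation, the standard genetic code, that codons containing an IUPAC symbol are written as `X` in the full-length protein, and that an ORF without an in-frame stop is translated up to its last complete codon (an assumption; the text does not say how that case is handled). Only a short in-frame window around each ambiguous base is expanded into its concrete variants, which keeps the number of combinations small.

```python
import itertools

MIN_AA = 8     # minimum protein length kept (from the text)
FLANK_NT = 20  # nucleotides on each side of an IUPAC symbol (from the text)

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_orf(seq, start, min_len=MIN_AA):
    """Translate seq from `start` up to (not including) the first in-frame stop.

    Codons containing an IUPAC ambiguity symbol become 'X'. If no stop is
    found, translation runs to the last complete codon (an assumption).
    Returns None when the protein is shorter than `min_len`.
    """
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    protein = "".join(protein)
    return protein if len(protein) >= min_len else None


def iupac_window_peptides(seq, start, protein, flank=FLANK_NT):
    """Expand only a short, in-frame window around each IUPAC symbol.

    Expanding every ambiguous base across the whole transcript grows
    exponentially; windows of about 2 * flank nt keep the product small.
    """
    orf_end = start + 3 * len(protein)  # nucleotides covered by the protein
    peptides = set()
    for pos in range(start, orf_end):
        if seq[pos] in "ACGT":
            continue
        lo = max(start, pos - flank)
        lo -= (lo - start) % 3  # snap the window start back onto the reading frame
        hi = min(orf_end, pos + flank + 1)
        choices = [IUPAC.get(base, "ACGT") for base in seq[lo:hi]]
        for variant in itertools.product(*choices):
            peptide = translate_orf("".join(variant), 0, min_len=1)
            if peptide:
                peptides.add(peptide)
    return peptides


seq = "ATG" + "GCT" * 4 + "GRT" + "GCT" * 4 + "TAA"
protein = translate_orf(seq, 0)                # 'MAAAAXAAAA'
print(iupac_window_peptides(seq, 0, protein))  # {'MAAAADAAAA', 'MAAAAGAAAA'} (set order may vary)
```

Here the single `R` (A or G) yields one peptide per concrete codon (`GAT` → D, `GGT` → G), while the full-length protein is translated only once.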

Input Files
```python
"""
logPath : path
    Path to save log output
strand : string
    Either '+' (forward strand) or '-' (backward strand)
transcripts1 : path
    Path to the dic with the information on the RNA-seq assembled transcripts intersected by candidate start codons
genes1 : path
    Path to the dic with the information on the RNA-seq assembled genes intersected by candidate start codons

transcripts2 : path
    Path to the dic with the information on the Ribo-Elong assembled transcripts intersected by candidate start codons
genes2 : path
    Path to the dic with the information on the Ribo-Elong assembled genes intersected by candidate start codons

folderToSave : path
    Path to save output
"""
```
Output Files
```python
"""
Proteins_Candidates_Canonical_+.dic or Proteins_Candidates_Canonical_-.dic : dic
    Dic that contains all the proteins retained for inclusion in the DB

Start_Codons_Retained_+.dic or Start_Codons_Retained_-.dic : dic
    Dic that contains the information on the start codons of the proteins included in the DB

Info_Proteins_Kept+.gtf or Info_Proteins_Kept-.gtf : gtf-like file
    GTF-like file that contains all the relevant information on the proteins included in the DB
"""
```
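
The rule that nested sequences are left out of the database, while their origin is still recorded, can be sketched as follows. This is a hypothetical illustration (`drop_nested` is not taken from the pipeline) and assumes the candidates are available as a plain `{protein_id: sequence}` dict. Sorting longest-first guarantees that any containing sequence is already kept when a shorter one is tested; identical sequences collapse onto the first one seen. The pairwise substring test is quadratic, which is acceptable for a sketch but may be slow on very large candidate sets.

```python
def drop_nested(proteins):
    """Split candidate proteins into kept and nested ones.

    proteins: hypothetical {protein_id: sequence} dict.
    Returns (kept, nested_in): kept maps retained IDs to sequences and
    nested_in maps each dropped ID to the retained ID that contains it.
    """
    # Longest first, so any container is kept before the sequences it contains.
    order = sorted(proteins, key=lambda pid: len(proteins[pid]), reverse=True)
    kept, nested_in = {}, {}
    for pid in order:
        seq = proteins[pid]
        container = next((k for k, s in kept.items() if seq in s), None)
        if container is None:
            kept[pid] = seq
        else:
            nested_in[pid] = container  # tracked, but not added to the DB
    return kept, nested_in


kept, nested_in = drop_nested({"p1": "MAAAAKLLVV", "p2": "AAAKLLV", "p3": "MKRPTTGGS"})
print(sorted(kept))  # ['p1', 'p3']
print(nested_in)     # {'p2': 'p1'}
```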

```bash
# Forward (+) strand
echo 'Proteins Strand + '
logPath='.../logs/'

strand='+'
transcripts1='.../Transcripts/Noncanonical/RNA/totalTranscriptsIntersected+.dic'
genes1='.../Transcripts/Noncanonical/RNA/totalGenesIntersected+.dic'
transcripts2='.../Transcripts/Noncanonical/RiboElong/totalTranscriptsIntersected+.dic'
genes2='.../Transcripts/Noncanonical/RiboElong/totalGenesIntersected+.dic'
folderToSave='.../Proteins/Non_canonical/DB/'

python ../../../Scripts/5_Get_Proteins/Noncanonical_Proteins/getProteinsSequences.py -s "$strand" -t "$transcripts1" -g "$genes1" -x "$transcripts2" -y "$genes2" -f "$folderToSave" -l "$logPath"

# Backward (-) strand: logPath and folderToSave are reused from above
echo 'Proteins Strand - '
strand='-'
transcripts1='.../Transcripts/Noncanonical/RNA/totalTranscriptsIntersected-.dic'
genes1='.../Transcripts/Noncanonical/RNA/totalGenesIntersected-.dic'
transcripts2='.../Transcripts/Noncanonical/RiboElong/totalTranscriptsIntersected-.dic'
genes2='.../Transcripts/Noncanonical/RiboElong/totalGenesIntersected-.dic'

python ../../../Scripts/5_Get_Proteins/Noncanonical_Proteins/getProteinsSequences.py -s "$strand" -t "$transcripts1" -g "$genes1" -x "$transcripts2" -y "$genes2" -f "$folderToSave" -l "$logPath"
```




# Get Non-Canonical DB

Input Files

```python
"""
logPath : path
    Path to save log output
forward : path
    Path to the dic with the information on the retained proteins on the forward strand (see above)
backward : path
    Path to the dic with the information on the retained proteins on the backward strand (see above)
output : path
    Path to save output

getNonCanonical : boolean
    True to generate the non-canonical DB
"""
```

Output Files

```python
"""
1_proteinesUniques.info : dic
    Dic that contains the information on all the non-canonical proteins (forward and backward strands) to be included in the DB

Custom_DB_Total.fasta : fasta file
    FASTA file of the retained non-canonical proteins

StartCodonsOrigin_DB.list : list
    List of the start codon frequencies
"""
```
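
As a rough illustration of this step, the sketch below merges the two strand dictionaries, keeps each protein sequence once, builds FASTA text and counts start codons. It is not `getUniqueSequences.py`: the function name `build_unique_db`, the per-protein layout `{"sequence": ..., "start_codon": ...}`, the `id|strand` header format and counting one start codon per unique sequence are all assumptions made for the example.

```python
from collections import Counter


def build_unique_db(forward, backward, width=60):
    """Merge both strands, keep each protein sequence once, count start codons.

    forward/backward: hypothetical {protein_id: {"sequence": str, "start_codon": str}}.
    Returns (fasta_text, seq_to_ids, start_counts).
    """
    seq_to_ids = {}  # sequence -> every strand-tagged ID that produced it
    codon_of = {}    # strand-tagged ID -> start codon
    for strand, proteins in (("+", forward), ("-", backward)):
        for pid, info in proteins.items():
            tagged = f"{pid}|{strand}"
            seq_to_ids.setdefault(info["sequence"], []).append(tagged)
            codon_of[tagged] = info["start_codon"]

    lines = []
    start_counts = Counter()
    for seq, ids in seq_to_ids.items():
        lines.append(">" + ";".join(ids))
        # Wrap the sequence at `width` residues per line.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
        start_counts[codon_of[ids[0]]] += 1  # one count per unique sequence
    return "\n".join(lines) + "\n", seq_to_ids, start_counts


fwd = {"p1": {"sequence": "MAAAAKLLVV", "start_codon": "ATG"},
       "p2": {"sequence": "MKRPTTGGS", "start_codon": "CTG"}}
bwd = {"q1": {"sequence": "MAAAAKLLVV", "start_codon": "ATG"}}
fasta, seq_to_ids, counts = build_unique_db(fwd, bwd)
print(fasta, end="")  # >p1|+;q1|-  MAAAAKLLVV  >p2|+  MKRPTTGGS (one per line)
print(dict(counts))   # {'ATG': 1, 'CTG': 1}
```

Returning the FASTA as text keeps the function free of file I/O; the caller can write it to disk wherever the `output` folder points.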

```bash
echo 'Getting Non Canonical'
logPath='.../logs/'
forward='.../Proteins/Non_canonical/DB/ProteinsCandidates_+.dic'
backward='.../Proteins/Non_canonical/DB/ProteinsCandidates_-.dic'
output='.../Proteins/Non_canonical/DB/'
getNonCanonical=True

python ../../../Scripts/5_Get_Proteins/Noncanonical_Proteins/getUniqueSequences.py -a "$getNonCanonical" -f "$forward" -b "$backward" -o "$output" -l "$logPath"
```


