The following problem asks you to find the translation of an RNA string into an amino acid string.

Protein Translation Problem: Translate an RNA string into an amino acid string.

Input: An RNA string Pattern and the array GeneticCode.<br>
Output: The translation of Pattern into an amino acid string Peptide.<br>
Code Challenge: Solve the Protein Translation Problem.

Notes:

The "Stop" codon should not be translated, as shown in the sample below.<br>
For your convenience, we provide a downloadable RNA codon table indicating which codons encode which amino acids.

Sample Input:

    AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA
Sample Output:

    MAMAPRTEINSTRING

In [1]:
def translation(rna):
    result = ""
    dictionary = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
    "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
    "UAU":"Y", "UAC":"Y", "UAA":"", "UAG":"",
    "UGU":"C", "UGC":"C", "UGA":"", "UGG":"W",
    "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
    "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
    "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
    "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
    "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
    "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
    "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
    "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
    "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
    "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
    "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
    "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G"}
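    # Read the RNA one codon (3 nucleotides) at a time; a trailing partial codon is ignored.
    for i in range(0, len(rna) - 2, 3):
        amino_acid = dictionary[rna[i:i + 3]]
        if amino_acid == "":  # stop codon (UAA, UAG, UGA): translation ends here
            break
        result += amino_acid
    return result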
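
As a cross-check on the hand-written dictionary above, the same codon table can be generated rather than typed out. The sketch below is illustrative only: the names `AMINO_ACIDS`, `CODON_TABLE`, and `translate_until_stop` are not part of the original notebook, and it assumes that translation should end at the first stop codon and that the input contains only the letters A, C, G, and U.

```python
from itertools import product

# Standard genetic code, listed in the order produced by product("UCAG", repeat=3);
# "*" marks the three stop codons (UAA, UAG, UGA).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}


def translate_until_stop(rna):
    """Translate codon by codon, ending at the first stop codon (an assumption)."""
    peptide = []
    # len(rna) - 2 skips a trailing fragment shorter than one codon.
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate_until_stop("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"))
# prints MAMAPRTEINSTRING
```

Generating the table from a 64-character string keeps the mapping short enough to proofread in one glance, at the cost of being less self-explanatory than the explicit dictionary.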

Peptide Encoding Problem: Find substrings of a genome encoding a given amino acid sequence.

Input: A DNA string Text, an amino acid string Peptide, and the array GeneticCode.<br>
Output: All substrings of Text encoding Peptide (if any such substrings exist).<br>
Code Challenge: Solve the Peptide Encoding Problem. The array GeneticCode corresponds to the RNA codon table.

Note: The solution may contain repeated strings if the same string occurs more than once as a substring of Text and encodes Peptide.

Sample Input:

    ATGGCCATGGCCCCCAGAACTGAGATCAATAGTACCCGTATTAACGGGTGA
    MA
Sample Output:

    ATGGCC
    GGCCAT
    ATGGCC

In [2]:
from itertools import product

def reverse_translate(peptide):
    aminoacids = {
    "M": ["ATG"],
    "I": ["ATA", "ATC", "ATT"],
    "A": ["GCT", "GCA", "GCC", "GCG"],
    "S": ["TCA", "TCC", "TCG", "TCT"],
    "F": ["TTC", "TTT"],
    "P": ["CCA", "CCC", "CCG", "CCT"],
    "C": ["TGC", "TGT"],
    "K": ["AAG", "AAA"],
    "H": ["CAT", "CAC"],
    "D": ["GAT", "GAC"],
    "V": ["GTA", "GTC", "GTG", "GTT"],
    "L": ["TTG", "TTA", "CTA", "CTC", "CTG", "CTT"],
    "W": ["TGG"],
    "T": ["ACA", "ACC", "ACG", "ACT"],
    "R": ["AGA", "AGG", "CGA", "CGG","CGT", "CGC"],
    "Y": ["TAT", "TAC"],
    "N": ["AAC", "AAT"],
    "Q": ["CAA", "CAG"],
    "E": ["GAA", "GAG"],
    "G": ["GGA", "GGC", "GGT", "GGG"],
    "*": ["TAA", "TAG", "TGA"]
    }
    # One list of candidate codons per amino acid, in peptide order.
    codon_choices = [aminoacids[aa] for aa in peptide]
    # The Cartesian product yields every DNA string that encodes the peptide.
    return ["".join(codons) for codons in product(*codon_choices)]


def reverse_complement(dna):
    # Complement each base (A<->T, C<->G), then reverse the strand.
    # Characters other than A, C, G, T are left unchanged.
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


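def main(text, peptide):
    # Candidate DNA strings: every encoding of the peptide plus its reverse complement.
    result = reverse_translate(peptide)
    result.extend(list(map(reverse_complement, result)))
    final_result = []
    for item in result:
        # str.count skips overlapping matches, so scan with str.find instead.
        start = text.find(item)
        while start != -1:
            final_result.append(item)
            start = text.find(item, start + 1)
    # Print answer
    for item in final_result:
        print(item)
    return final_result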
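
The cell above enumerates every DNA string that could encode Peptide, and that number grows exponentially with the peptide length. A possible alternative is to slide a window over Text and translate each window on both strands. The following is only a sketch under stated assumptions: `DNA_CODON_TABLE`, `translate_dna`, and `peptide_encoding_windows` are hypothetical names, Text is assumed to contain only A, C, G, and T, and a window is reported once even if both of its strands encode Peptide.

```python
from itertools import product

# DNA codon table in the order produced by product("TCAG", repeat=3); "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
DNA_CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_dna(dna):
    """Translate a DNA string whose length is a multiple of 3, keeping '*' for stops."""
    return "".join(DNA_CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def peptide_encoding_windows(text, peptide):
    """Return, in order of position, each window of text that encodes peptide on either strand."""
    width = 3 * len(peptide)
    matches = []
    for start in range(len(text) - width + 1):
        window = text[start:start + width]
        reverse_complement = window.translate(COMPLEMENT)[::-1]
        if translate_dna(window) == peptide or translate_dna(reverse_complement) == peptide:
            matches.append(window)
    return matches


text = "ATGGCCATGGCCCCCAGAACTGAGATCAATAGTACCCGTATTAACGGGTGA"
print("\n".join(peptide_encoding_windows(text, "MA")))
```

On the sample input this reports `ATGGCC`, `GGCCAT`, `ATGGCC` in order of position. The work is proportional to the length of Text times the length of Peptide, and overlapping occurrences are found naturally because every start position is examined.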

Generating Theoretical Spectrum Problem: Generate the theoretical spectrum of a cyclic peptide.

Input: An amino acid string Peptide.<br>
Output: Cyclospectrum(Peptide).<br>
Code Challenge: Solve the Generating Theoretical Spectrum Problem.

Note: An obvious approach for solving the Generating Theoretical Spectrum Problem would be to construct a list containing all subpeptides of Peptide, and then find the mass of each subpeptide by adding the integer masses of its constituent amino acids. This approach will work, but you may like to check out Charging Station: Generating the Theoretical Spectrum of a Peptide to see a more elegant algorithm that applies to both linear and cyclic peptides.

Sample Input:

    LEQN
Sample Output:

    0 113 114 128 129 227 242 242 257 355 356 370 371 484

In [5]:
def cyclospectrum(peptide):

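    def subpeptides(peptide):
        # All subpeptides of the cyclic peptide: n starts x (n - 1) lengths, plus the whole peptide.
        n = len(peptide)
        ls = []
        looped = peptide + peptide  # doubling the string lets slices wrap around the end
        for start in range(n):
            for length in range(1, n):
                ls.append(looped[start:start + length])
        ls.append(peptide)
        return ls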

    masses = {'A' : 71,
          'R' : 156,
          'N' : 114,
          'D' : 115,
          'C' : 103,
          'E' : 129,
          'Q' : 128,
          'G' : 57,
          'H' : 137,
          'I' : 113,
          'L' : 113,
          'K' : 128,
          'M' : 131,
          'F' : 147,
          'P' : 97,
          'S' : 87,
          'T' : 101,
          'W' : 186,
          'Y' : 163,
          'V' : 99,
          'X' : 0     #  for unknown amino acids
          } # dictionary of amino acids 'aa' and their integer masses

    all_subpeptides = subpeptides(peptide)
    result = [0]  # mass of the empty subpeptide
    for item in all_subpeptides:
        result.append(sum(masses[aa] for aa in item))

    # Return the sorted masses as a space-separated string.
    return " ".join(map(str, sorted(result)))
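
The note in the problem statement mentions a more elegant algorithm that avoids building every subpeptide string. One common reading of that idea uses prefix masses: the mass of `peptide[i:j]` is a difference of two prefix sums, and each interior piece also has a wrap-around complement. The sketch below is offered under that assumption and is not taken from the Charging Station itself; `MASSES` and `cyclic_spectrum` are illustrative names.

```python
# Integer masses of the 20 standard amino acids (same values as the table above).
MASSES = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def cyclic_spectrum(peptide):
    """Sorted masses of all subpeptides of a cyclic peptide, via prefix masses."""
    # prefix_mass[i] is the total mass of the first i amino acids.
    prefix_mass = [0]
    for amino_acid in peptide:
        prefix_mass.append(prefix_mass[-1] + MASSES[amino_acid])
    total_mass = prefix_mass[-1]

    spectrum = [0]
    n = len(peptide)
    for i in range(n):
        for j in range(i + 1, n + 1):
            mass = prefix_mass[j] - prefix_mass[i]
            spectrum.append(mass)
            # A linear piece touching neither end of the string has a
            # complement that wraps around the end of the cyclic peptide.
            if i > 0 and j < n:
                spectrum.append(total_mass - mass)
    return sorted(spectrum)


print(" ".join(map(str, cyclic_spectrum("LEQN"))))
# prints 0 113 114 128 129 227 242 242 257 355 356 370 371 484
```

Dropping the `if i > 0 and j < n` branch leaves exactly the linear spectrum, which is why the same prefix-mass table serves both problems.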

Code Challenge: Implement LinearSpectrum.

Input: An amino acid string Peptide.<br>
Output: The linear spectrum of Peptide.

Sample Input:

    NQEL
Sample Output:

    0 113 114 128 129 242 242 257 370 371 484

In [4]:
def linear_spectrum(peptide):
    masses = {'A' : 71,
          'R' : 156,
          'N' : 114,
          'D' : 115,
          'C' : 103,
          'E' : 129,
          'Q' : 128,
          'G' : 57,
          'H' : 137,
          'I' : 113,
          'L' : 113,
          'K' : 128,
          'M' : 131,
          'F' : 147,
          'P' : 97,
          'S' : 87,
          'T' : 101,
          'W' : 186,
          'Y' : 163,
          'V' : 99,
          'X' : 0     #  for unknown amino acids
          } # dictionary of amino acids 'aa' and their integer masses
    # prefix_mass[i] is the total mass of the first i amino acids of the peptide.
    prefix_mass = [0]
    for aa in peptide:
        prefix_mass.append(prefix_mass[-1] + masses[aa])
    # Every linear subpeptide peptide[i:j] has mass prefix_mass[j] - prefix_mass[i].
    spectrum = [0]
    for i in range(len(peptide)):
        for j in range(i + 1, len(peptide) + 1):
            spectrum.append(prefix_mass[j] - prefix_mass[i])
    return sorted(spectrum)
