The following problem asks you to find the translation of an RNA string into an amino acid string.

Protein Translation Problem: Translate an RNA string into an amino acid string.

Input: An RNA string Pattern and the array GeneticCode.<br>
Output: The translation of Pattern into an amino acid string Peptide.<br>
Code Challenge: Solve the Protein Translation Problem.

Notes:

The "Stop" codon should not be translated, as shown in the sample below.<br>
For your convenience, we provide a downloadable RNA codon table indicating which codons encode which amino acids.

Sample Input:

    AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA
Sample Output:

    MAMAPRTEINSTRING

def translation(rna):
    result = ""
    dictionary = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
    "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
    "UAU":"Y", "UAC":"Y", "UAA":"", "UAG":"",
    "UGU":"C", "UGC":"C", "UGA":"", "UGG":"W",
    "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
    "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
    "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
    "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
    "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
    "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
    "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
    "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
    "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
    "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
    "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
    "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G"}
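    # Read the string codon by codon. Stop codons map to "", so they add
    # nothing to the peptide; a trailing fragment shorter than three bases
    # is ignored instead of raising a KeyError.
    for i in range(0, len(rna) - 2, 3):
        result += dictionary[rna[i:i + 3]]
    return result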

'MIVRIQDRALWLWASRIVQTAVPRAIKPKALLNQTLLRNKASHDPCSILGPRRSALILFLLVPSLIICVLLPPKDMISGNPPLTPLPRAADSVALPNTRSTAEDLTEHSDLMGHASSEEVACSDGAASTLLGAVRSSSRIKGGGYHKNKLALSFRIAVQTLSSTDVKRAISPSRPRGGTLIRILRPQLSFHYHRNNIAYKYFVGNSIFAAPGIVTLYPGTSVQVLLRLSRLHFMLLSMRGQRSETHYSNLGRMSSPTLFSGGMSWYSDGLMRSVPGLVTERPWGYCLGDLITRLCRVFTIPLEPSCLLQQLRTRIQLLDVQLPWKSLAARDAKGRAHGSDDAWGPTFRRSKFPELRPRPCSFDGPLTSSLVVARLVTPPNTPTVIYNSRGSHFIIAPWVYRLGVLCHRACVRTHMLPEDTIRVKWLYLEIGGGCYTPQSRAFKPCVAKPTFVHQDDTEESTRKRPTSGKAVRPTKHLSPLAAVLGQPAVLTCYRVRFCKCQVLGLSREGTTRVTCKYTRQTRWNLGIPTEGPSRHESQALPGNPQPSIHSGISSMARVTTYSPTNGTGDMAHIDLIGIGLMARVYHHKHASKFVMDKDNIKQLSSRFSDFHHNGNDVLPGVFDGEATPKSRWCLTLRIETGASFIDNVSAIAGISSIVPGTINEYRFGPISQRYREVPYWSGYRPAQCRHGDVKPSKGGTLPLLRELHIGTFAARLLLCFILIILKKLTDGRGKPYVASSTVSNLHVNLAGEFTLSVRPSIIEWSYKVRVVKDKDLATRRYSQGSGLAGKRSFWGKTTYRYLNGRGHLDRDRVPAGILLGGEPIRHWRPPTCVSTCSRSTSHPTPGHMTEGGGCPIREGVGSLGLLRWDRFLTAHNCPKRPGHKLKLVHIYSLQRIRDQFGSSGPKLYLPSLHLGDAYRIPCRTPVKASHSVTARNGVAFRFRTFTYNSTGSIISSEGLETPAGLQPGRKPPDEEREAWSGVPLSYFMSLQDPHARSAS
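
The function above maps the three stop codons to the empty string, so it skips them and keeps translating whatever follows. A minimal sketch of a variant that halts at the first stop codon is given here. The names `CODON_TABLE` and `translate_until_stop` are illustrative and not part of the original notebook; the table is the standard genetic code with codons enumerated in U, C, A, G order and `*` marking stop codons.

```python
from itertools import product

# Standard genetic code with codons enumerated in U, C, A, G order;
# "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_until_stop(rna):
    """Translate rna codon by codon, halting at the first stop codon."""
    peptide = []
    # len(rna) - 2 skips a trailing fragment shorter than one codon.
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate_until_stop("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"))
```

On the sample input both versions return MAMAPRTEINSTRING, because the only stop codon is the last one. They differ only when a stop codon appears before the end of the string.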

Peptide Encoding Problem: Find substrings of a genome encoding a given amino acid sequence.

Input: A DNA string Text, an amino acid string Peptide, and the array GeneticCode.<br>
Output: All substrings of Text encoding Peptide (if any such substrings exist).<br>
Code Challenge: Solve the Peptide Encoding Problem. The RNA codon table corresponds to the array GeneticCode.

Note: The solution may contain repeated strings if the same string occurs more than once as a substring of Text and encodes Peptide.

Sample Input:

    ATGGCCATGGCCCCCAGAACTGAGATCAATAGTACCCGTATTAACGGGTGA
    MA
Sample Output:

    ATGGCC
    GGCCAT
    ATGGCC

from itertools import product

def reverse_translate(peptide):
    aminoacids = {
    "M": ["ATG"],
    "I": ["ATA", "ATC", "ATT"],
    "A": ["GCT", "GCA", "GCC", "GCG"],
    "S": ["TCA", "TCC", "TCG", "TCT"],
    "F": ["TTC", "TTT"],
    "P": ["CCA", "CCC", "CCG", "CCT"],
    "C": ["TGC", "TGT"],
    "K": ["AAG", "AAA"],
    "H": ["CAT", "CAC"],
    "D": ["GAT", "GAC"],
    "V": ["GTA", "GTC", "GTG", "GTT"],
    "L": ["TTG", "TTA", "CTA", "CTC", "CTG", "CTT"],
    "W": ["TGG"],
    "T": ["ACA", "ACC", "ACG", "ACT"],
    "R": ["AGA", "AGG", "CGA", "CGG","CGT", "CGC"],
    "Y": ["TAT", "TAC"],
    "N": ["AAC", "AAT"],
    "Q": ["CAA", "CAG"],
    "E": ["GAA", "GAG"],
    "G": ["GGA", "GGC", "GGT", "GGG"],
    "*": ["TAA", "TAG", "TGA"]
    }
    # One list of candidate codons per residue; their Cartesian product
    # enumerates every DNA string that encodes the peptide.
    codon_choices = [aminoacids[aa] for aa in peptide]
    return ["".join(codons) for codons in product(*codon_choices)]


def reverse_complement(dna):
    # Walk the string backwards and complement each base; characters other
    # than A, C, G, T are dropped.
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(dna) if base in complement)


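def main(text, peptide):
    # Candidate substrings: every DNA string encoding the peptide, plus the
    # reverse complement of each (the peptide may be encoded on either strand).
    result = reverse_translate(peptide)
    result.extend(list(map(reverse_complement, result)))
    # Drop duplicate candidates (a string can equal its own reverse
    # complement) while keeping the original order.
    result = list(dict.fromkeys(result))
    final_result = []
    for item in result:
        # str.count skips overlapping matches, so scan with str.find and
        # advance one position at a time to report every occurrence.
        start = text.find(item)
        while start != -1:
            final_result.append(item)
            start = text.find(item, start + 1)
    # Print answer
    for item in final_result:
        print(item)
    return final_result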

ACATACAATACTCAAATGATCTGGACA
ACATACAATACTCAGATGATTTGGACC
ACCTATAACACGCAGATGATCTGGACG
ACCTACAACACCCAAATGATCTGGACA
ACGTACAACACGCAGATGATCTGGACC
ACTTATAACACACAGATGATTTGGACG
ACTTATAATACCCAGATGATTTGGACG
ACTTACAACACACAAATGATTTGGACT
AGTCCATATCATCTGAGTATTATATGT
CGTCCAAATCATTTGCGTGTTGTATGT
TGTCCAGATCATCTGGGTATTGTATGT
GGTCCAGATCATCTGCGTATTGTATGT
CGTCCAGATCATCTGAGTGTTATAGGT
TGTCCATATCATTTGAGTATTGTAGGT
TGTCCATATCATCTGGGTATTATACGT
CGTCCATATCATCTGGGTATTATACGT
GGTCCAAATCATCTGCGTATTATAAGT
GGTCCATATCATTTGTGTGTTGTAAGT
CGTCCAGATCATCTGCGTGTTGTAAGT
TGTCCATATCATTTGCGTATTGTAAGT
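
The approach above enumerates every DNA string that encodes the peptide. The number of candidates is the product of the codon counts of its residues, so it grows quickly with peptide length. A minimal sketch of an alternative slides a window of three times the peptide length over Text and translates each window and its reverse complement. The names `CODON_TABLE`, `translate_dna`, and `peptide_encoding` are illustrative, not from the notebook; the table is the standard genetic code written with DNA bases, with `*` marking stop codons.

```python
from itertools import product

# Standard genetic code over DNA bases in T, C, A, G order; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_dna(dna):
    """Translate a DNA string codon by codon; stop codons become '*'."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def peptide_encoding(text, peptide):
    """Return, in order of position, every window of text that encodes
    peptide on either strand."""
    width = 3 * len(peptide)
    hits = []
    for start in range(len(text) - width + 1):
        window = text[start:start + width]
        if peptide in (translate_dna(window),
                       translate_dna(reverse_complement(window))):
            hits.append(window)
    return hits


sample = "ATGGCCATGGCCCCCAGAACTGAGATCAATAGTACCCGTATTAACGGGTGA"
print(*peptide_encoding(sample, "MA"), sep="\n")
```

Because every starting position is examined, overlapping occurrences are reported, and the work is proportional to the length of Text times the length of the peptide rather than to the number of possible encodings.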



Generating Theoretical Spectrum Problem: Generate the theoretical spectrum of a cyclic peptide.

Input: An amino acid string Peptide.<br>
Output: Cyclospectrum(Peptide).<br>
Code Challenge: Solve the Generating Theoretical Spectrum Problem.

Note: An obvious approach for solving the Generating Theoretical Spectrum Problem would be to construct a list containing all subpeptides of Peptide, and then find the mass of each subpeptide by adding the integer masses of its constituent amino acids. This approach will work, but you may like to check out Charging Station: Generating the Theoretical Spectrum of a Peptide to see a more elegant algorithm that applies to both linear and cyclic peptides.
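
A minimal sketch of one alternative to listing subpeptides is to work from prefix masses: the mass of any linear stretch is the difference of two prefix masses, and the mass of a stretch that wraps around the end is the total mass minus the mass of the stretch it leaves out. The function name `cyclic_spectrum` and the `MASSES` table are illustrative, and this sketch may differ in detail from the Charging Station's algorithm.

```python
# Integer masses of the 20 standard amino acids.
MASSES = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
          "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
          "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def cyclic_spectrum(peptide):
    """Return the sorted theoretical spectrum of a cyclic peptide."""
    # prefix[i] is the total mass of the first i amino acids.
    prefix = [0]
    for aa in peptide:
        prefix.append(prefix[-1] + MASSES[aa])
    total = prefix[-1]
    n = len(peptide)

    spectrum = [0]  # mass of the empty subpeptide
    for i in range(n):
        for j in range(i + 1, n + 1):
            mass = prefix[j] - prefix[i]  # linear stretch peptide[i:j]
            spectrum.append(mass)
            # The complement of an interior stretch wraps around the end.
            if i > 0 and j < n:
                spectrum.append(total - mass)
    return sorted(spectrum)


print(*cyclic_spectrum("LEQN"))
```

For a peptide of length n this yields n(n - 1) + 2 masses, counting the empty subpeptide and the whole peptide once each.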

Sample Input:

    LEQN
Sample Output:

    0 113 114 128 129 227 242 242 257 355 356 370 371 484

def cyclospectrum(peptide):

    def subpeptides(peptide):
        l = len(peptide)
        ls = []
        looped = peptide + peptide
        for start in range(0, l):
            for length in range(1, l):
                ls.append((looped[start:start + length]))
        ls.append(peptide)
        return ls

    masses = {'A' : 71,
          'R' : 156,
          'N' : 114,
          'D' : 115,
          'C' : 103,
          'E' : 129,
          'Q' : 128,
          'G' : 57,
          'H' : 137,
          'I' : 113,
          'L' : 113,
          'K' : 128,
          'M' : 131,
          'F' : 147,
          'P' : 97,
          'S' : 87,
          'T' : 101,
          'W' : 186,
          'Y' : 163,
          'V' : 99,
          'X' : 0     #  for unknown amino acids
          } # dictionary of amino acids 'aa' and their integer masses

    # Mass 0 stands for the empty subpeptide.
    result = [0]
    for item in subpeptides(peptide):
        result.append(sum(masses[aa] for aa in item))

    # Return the sorted masses as one space-separated string.
    return " ".join(str(mass) for mass in sorted(result))

'0 57 57 97 99 113 115 128 129 129 131 156 156 163 163 163 170 212 213 213 214 220 241 257 260 260 262 285 292 294 298 311 319 326 342 349 370 375 376 376 377 393 413 416 423 423 427 454 470 473 474 474 480 489 505 505 508 522 526 531 583 583 583 586 586 588 605 617 630 633 634 636 637 637 640 643 685 687 701 714 734 739 742 746 746 762 765 768 793 796 799 800 800 803 829 843 850 857 875 893 897 897 898 902 924 928 928 932 954 958 959 959 963 981 999 1006 1013 1027 1053 1056 1056 1057 1060 1063 1088 1091 1094 1110 1110 1114 1117 1122 1142 1155 1169 1171 1213 1216 1219 1219 1220 1222 1223 1226 1239 1251 1268 1270 1270 1273 1273 1273 1325 1330 1334 1348 1351 1351 1367 1376 1382 1382 1383 1386 1402 1429 1433 1433 1440 1443 1463 1479 1480 1480 1481 1486 1507 1514 1530 1537 1545 1558 1562 1564 1571 1594 1596 1596 1599 1615 1636 1642 1643 1643 1644 1686 1693 1693 1693 1700 1700 1725 1727 1727 1728 1741 1743 1757 1759 1799 1799 1856'

Code Challenge: Implement LinearSpectrum.

Input: An amino acid string Peptide.<br>
Output: The linear spectrum of Peptide.

Sample Input:

    NQEL
Sample Output:

    0 113 114 128 129 242 242 257 370 371 484
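
As a worked check of the sample, the prefix masses of NQEL are 0, 114, 242, 371, 484, and the mass of each nonempty linear subpeptide is the difference of two of them. The short sketch below uses only the four masses the sample needs (N = 114, Q = 128, E = 129, L = 113); the variable names are illustrative.

```python
from itertools import accumulate, combinations

masses = {"N": 114, "Q": 128, "E": 129, "L": 113}
peptide = "NQEL"

# prefix[i] is the total mass of the first i amino acids.
prefix = [0] + list(accumulate(masses[aa] for aa in peptide))

# Each pair (earlier, later) of prefix masses bounds one linear subpeptide;
# the extra 0 stands for the empty subpeptide.
spectrum = sorted([0] + [later - earlier
                         for earlier, later in combinations(prefix, 2)])

print(prefix)
print(*spectrum)
```

A peptide of length n has n(n + 1)/2 nonempty linear subpeptides, which gives 10 masses plus the 0 here, matching the 11 values in the sample output.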

def linear_spectrum(peptide):
    masses = {'A' : 71,
          'R' : 156,
          'N' : 114,
          'D' : 115,
          'C' : 103,
          'E' : 129,
          'Q' : 128,
          'G' : 57,
          'H' : 137,
          'I' : 113,
          'L' : 113,
          'K' : 128,
          'M' : 131,
          'F' : 147,
          'P' : 97,
          'S' : 87,
          'T' : 101,
          'W' : 186,
          'Y' : 163,
          'V' : 99,
          'X' : 0     #  for unknown amino acids
          } # dictionary of amino acids 'aa' and their integer masses
    # prefix_mass[i] is the total mass of the first i amino acids.
    # An amino acid missing from the table raises a KeyError.
    prefix_mass = [0]
    for aa in peptide:
        prefix_mass.append(prefix_mass[-1] + masses[aa])
    # The linear subpeptide peptide[i:j] has mass prefix_mass[j] - prefix_mass[i];
    # the leading 0 stands for the empty subpeptide.
    spectrum = [0]
    for i in range(0, len(peptide)):
        for j in range(i + 1, len(peptide) + 1):
            spectrum.append(prefix_mass[j] - prefix_mass[i])
    return sorted(spectrum)


0 57 71 71 71 71 71 87 87 97 99 101 101 103 103 103 103 113 113 114 114 114 114 115 128 128 128 128 128 129 129 129 131 131 137 137 137 142 147 156 156 163 163 163 168 170 171 174 186 186 199 200 204 206 208 211 213 215 215 217 218 229 232 234 238 241 242 244 257 258 259 262 265 265 266 268 269 270 271 273 275 276 276 277 277 282 284 287 287 291 291 291 305 313 318 318 321 323 328 328 329 332 333 333 336 339 339 360 362 362 372 373 374 375 379 392 394 394 395 399 400 400 401 404 404 404 404 405 410 410 418 419 421 424 424 428 428 431 433 433 435 442 447 447 457 461 463 466 474 476 480 481 486 491 495 497 499 499 502 503 506 507 511 518 523 531 531 532 532 534 537 539 541 547 547 550 557 561 562 564 567 571 575 577 578 587 594 594 594 594 596 598 598 600 604 605 605 608 610 610 612 617 618 628 634 645 665 665 668 669 674 675 676 681 681 686 687 690 690 692 692 695 695 695 697 697 701 706 708 709 710 712 715 718 722 722 727 739 740 746 747 748 757 757 767 768 768 784 790 791 793 801 805 