# Problem 1: Counting DNA Nucleotides

A **string** is simply an ordered collection of symbols selected from some **alphabet** and formed into a word; the **length** of a string is the number of symbols that it contains.

An example of a length 21 DNA string (whose alphabet contains the symbols 'A', 'C', 'G', and 'T') is "ATGCTTCAGAAAGGTCTTACG".

    Given: A DNA string *s* of length at most 1000 nt.

    Return: Four integers (separated by spaces) counting the number of times that the symbols A, C, G, and T occur in s.

## Sample Dataset

    Given: AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
    Return: 20 12 17 21
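
As a quick check of the problem statement against the sample dataset, here is a minimal sketch that counts all four symbols in a single pass. The function name `count_nucleotides` is a hypothetical helper, not part of the original notebook, and the sample string is pasted in directly rather than read from a file.

```python
from collections import Counter

def count_nucleotides(dna):
    """Return the counts of A, C, G and T in dna, in that order."""
    counts = Counter(dna)  # one pass over the string; absent symbols count as 0
    return tuple(counts[base] for base in "ACGT")

# sample dataset from the problem statement
sample = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
print(*count_nucleotides(sample))  # prints: 20 12 17 21
```

Returning the counts in a fixed "ACGT" order keeps the output aligned with the order the problem asks for.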

In [16]:
# read the DNA string from file, dropping the trailing newline
with open(r'/Users/Sid/Downloads/rosalind_dna.txt') as file:  # edit file path to your computer
    sequence = file.read().strip()

# count instances of each DNA base in sequence
countA = sequence.count('A')
countC = sequence.count('C')
countG = sequence.count('G')
countT = sequence.count('T')

# print the counts in the order A, C, G, T, separated by single spaces
print(countA, countC, countG, countT)

233 232 245 213


# Problem 2: Transcribing DNA into RNA 

An RNA string is a string formed from the alphabet containing 'A', 'C', 'G', and 'U'.

Given a DNA string *t* corresponding to a coding strand, its transcribed RNA string *u* is formed by replacing all occurrences of 'T' in *t* with 'U' in *u*.

    Given: A DNA string *t* having length at most 1000 nt.
    Return: The transcribed RNA string of *t*.

## Sample Dataset
    Given: GATGGAACTTGACTACGTAAATT
    Return: GAUGGAACUUGACUACGUAAAUU
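
The transcription rule above can be sketched directly on the sample dataset. The function name `transcribe` is a hypothetical helper introduced only for illustration.

```python
def transcribe(dna):
    """Return the RNA string for a coding-strand DNA string."""
    # every 'T' becomes 'U'; all other symbols are left unchanged
    return dna.replace("T", "U")

# sample dataset from the problem statement
print(transcribe("GATGGAACTTGACTACGTAAATT"))  # prints: GAUGGAACUUGACUACGUAAAUU
```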

In [34]:
# open file containing DNA string
with open(r'/Users/Sid/Downloads/rosalind_rna.txt') as file:  # edit file path to your computer
    dnaSeq = file.read().strip()  # drop the trailing newline
# replace each instance of 'T' with 'U'
rnaSeq = dnaSeq.replace("T", "U")

print("RNA sequence for associated DNA strand is: ", rnaSeq)

RNA sequence for associated DNA strand is:  GCCGGGGUUUGUGCCGCCAUCCGCACGAGCAUAUAUUUGCUUGCAACAACGCAAAACUUUACCACUUACAUCUGACUUCAAGAUAGGUAGUACCCUAAGAGCAUUGGCGAUUGUCGCGGACAUCCAAUAAAUCCGGAGUAGCGAACCUGACGCGAACUUAGGUUUCCGCGGACGCUAUGAUUAUAGAGUAAAACGCUAUGCCGCCGGCCUUUGUGCGGUGGCAUUACUACCUGCAAGAUCGUUUUGAGCCAUGCCAGUGUACACGUCGUGCCAUUUGAGUUGACCCCCAGGGCAGGUACGUGAACCACAGAGGGUCACGUGAUUGUAUGCCCAUAAUAUAGUAGCCCUCAUGAAGUUGGGAGAGGUGCCGUGAACCAUAAUACGGGGGCUCAUGGAGUGGCAUAGGCUAAAGUUAAGUCGUGGUAUGUAGGUUGUCAGUGGCCUAAGGCUAGGUACCAAUCAACUGUCGGACGAACGGAUUCGAAUUAUGACUAUCAUUUUUUUUUCUCUCGACGCCUCCACAAAUGUCAUCCAUGUACAUUGCCCCGGCCUUCUGCCACGGAUCGAGCAGGAACCUUGAGCUCCCGAAGGAUUUGCCUUCGUUCAACUUCUACGUGCUCGCCGCUGGCAUACCCAAAACAUACUACCUCCGUACGUCCAGUCUCACUGCAAUGUCCUUGCUAGUAAGUUCAUUCAACGAUAUCGGCGUGGUGAUAAGUCUAGGUUGUUAAUGCCAUCAUUCAACUAUGCGUCUAAUUUACGCGUGGACAUGGCACUUAUAAAUACAACAUCGUAUUAAUCCUGAAUAGCACUUGUUUACUAUAAACAUGUGUUGGGAAUAUAUCUCACUGACCCGAUGCAUCCGUGCGCCCGAAAGGGUGGUGUAAUACCUUAGUAACCAGAAGGAUAGUCAAA



# Problem 3: Complementing a Strand of DNA

In DNA strings, symbols 'A' and 'T' are complements of each other, as are 'C' and 'G'. The reverse complement of a DNA string $s$ is the string $s^c$ formed by reversing the symbols of $s$, then taking the complement of each symbol (e.g., the reverse complement of "GTCA" is "TGAC").

    Given: A DNA string s of length at most 1000 bp.
    Return: The reverse complement s^c of s.

## Sample Dataset
    Given: AAAACCCGGT
    Return: ACCGGGTTTT
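
The definition above (reverse, then complement each symbol) can be sketched with a translation table. This is one possible approach, not necessarily the one used in the cell that follows; `COMPLEMENT` and `reverse_complement` are hypothetical names.

```python
# translation table mapping each base to its complement
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Reverse dna, then replace each base with its complement."""
    return dna[::-1].translate(COMPLEMENT)

# sample dataset from the problem statement
print(reverse_complement("AAAACCCGGT"))  # prints: ACCGGGTTTT
```

Applying the function twice returns the original string, which is a handy sanity check.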

In [83]:
# open file containing original DNA string
with open(r'/Users/Sid/Downloads/rosalind_revc.txt') as file:  # edit file path to your computer
    origSeq = file.read().strip()  # strip the trailing newline so it does not end up at the front

complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}  # complement of each base

# walk the sequence backwards and swap each base for its complement
reverse_complement = "".join(complement.get(base, base) for base in reversed(origSeq))

print("Reverse complement strand: ", reverse_complement)

Enter seq: ATTCTAGGCATTTTTAAAGAGTCGTACTGGAAGCTCTTATTCACGAGGGCGTATGTAGCCTAACTACCGAAACTCAGACAAGGTGCCGGAGTTGAGGCCAGTGGGATATTTGTATACAAAGATCAGGAGCAGTTAGCGGGTACCCAACAATCCAATTATTACGACAGAAAGGGAGAACCACCCTTGAAGGTCGACAAGGTGTAAGCGACGCAAAGTGGTCTTGCTTTAGAATTATGAGCCTCATAATGACCTGGGAGAGGGCTGGACTTTCTATATACTTGTCAACGGCACAACTATTACATCGCCAAAGCGCTGCTTGCAAAGCCTGGCTGCCAACTTAACGATCGTTGGACACACGCTACTTCTTTAGAGCGAGCGAGCCCAAATACTGGAGTTGTGGGTGACGGTTCATATTAGATTGCAGAAAGCGCTAGATCCCGCCCGGGCTGTCATGGCGTGGTGTTAGGTCATAACCGACTATCTGGGAGATCCTTACTGGGCGGATGGCCAAGATACGTATTTGCGAGTTGATGGCCGAGTTGCATCGACATCTATGACGGTAGGTGGCACCAGAATAGTGCACGACCGATATGTAAGTCGTTGGACGCAGGGTAGGGAAAGGGTAGGCGCCCTTAGTCGGCTACCGGTTCAGGAACGTGATGGGTGATATCGAGCTTTCCGGCTTTCAGAACACGTTCTTGTGGTCGCGCCTGTCCATACTGGTCGCTAACGTTCCGAAAGGGGCAGAATGGAACGGAGGAGAGAGACTAAATATAAGATGCCCCCCTTTTACTCACGTGAGCGTAGCCGCGTATACGACGGGCTAGAGGCCCCCGGAACAAATCAACGTTGGCAGCTCTAGTTCAAGTACACAGCGCTCTTAG
Rev compl:  CTAAGAGCGCTGTGTACTTGAACTAGAGCTGCCAACGTTGATTTGTTCCGGGGGCCTCTAGCCCGTCGTATACGCGGCTACGCTCACGTGAG

# Problem 4: Rabbits and Recurrence Relations

A **sequence** is an ordered collection of objects (usually numbers), which are allowed to repeat. Sequences can be finite or infinite. Two examples are the finite sequence $(\pi, -\sqrt{2}, 0, \pi)$ and the infinite sequence of odd numbers $(1, 3, 5, 7, 9, \ldots)$. We use the notation $a_n$ to represent the $n$-th term of a sequence.

A **recurrence relation** is a way of defining the terms of a sequence with respect to the values of previous terms. In the case of Fibonacci's rabbits from the introduction, any given month will contain the rabbits that were alive the previous month, plus any new offspring. A key observation is that the number of offspring in any month is equal to the number of rabbits that were alive two months prior. As a result, if $F_n$ represents the number of rabbit pairs alive after the $n$-th month, then we obtain the **Fibonacci sequence** having terms $F_n$ that are defined by the recurrence relation $F_n = F_{n-1} + F_{n-2}$ (with $F_1 = F_2 = 1$ to initiate the sequence). Although the sequence bears Fibonacci's name, it was known to Indian mathematicians over two millennia ago.

When finding the n-th term of a sequence defined by a recurrence relation, we can simply use the recurrence relation to generate terms for progressively larger values of n. This problem introduces us to the computational technique of **dynamic programming**, which successively builds up solutions by using the answers to smaller cases.

    Given: Positive integers n≤40 and k≤5.
    Return: The total number of rabbit pairs that will be present after n months, if we begin with 1 pair and in each generation, every pair of reproduction-age rabbits produces a litter of k rabbit pairs (instead of only 1 pair).

## Sample Dataset
    Given: 5 3
    Return: 19
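
With litters of k pairs, the recurrence becomes $F_n = F_{n-1} + k F_{n-2}$. The dynamic-programming idea of building up from smaller cases can be sketched as a simple loop that keeps only the last two terms. The function name `rabbit_pairs` is a hypothetical helper, and the sketch assumes n is a positive integer, as the problem states.

```python
def rabbit_pairs(n, k):
    """Number of rabbit pairs after n months (n >= 1), with litters of k pairs."""
    previous, current = 0, 1  # F_0 = 0 and F_1 = 1
    for _ in range(n - 1):
        # pairs alive last month, plus k new pairs per pair alive two months ago
        previous, current = current, current + k * previous
    return current

# sample dataset from the problem statement
print(rabbit_pairs(5, 3))  # prints: 19
```

Because each term is computed exactly once, the loop takes n - 1 steps, whereas a plain recursive version recomputes the same smaller cases many times over.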

In [88]:
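from functools import lru_cache

@lru_cache(maxsize=None)  # remember earlier results so each (n, k) is computed only once
def rabbits(n, k):
    if n == 0:  # base case: no rabbits before the first month
        return 0
    if n == 1:  # base case: a single pair in the first month
        return 1
    # recursive case: last month's pairs plus k new pairs per pair alive two months ago
    return rabbits(n-1, k) + k*rabbits(n-2, k)

print(rabbits(36,2))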

22906492245


# Problem 5: Computing GC Content

In [89]:
# parse the FASTA file into a dictionary of {label: DNA string}
records = {}
label = None
with open('/Users/Sid/Downloads/rosalind_gc.txt', 'r') as file:  # edit file path to your computer
    for line in file:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            label = line[1:]  # a new record starts; drop the leading '>'
            records[label] = ''
        else:
            records[label] += line  # a sequence may span several lines

def gc_content(dna):
    # percentage of symbols in dna that are 'C' or 'G'
    return (dna.count('C') + dna.count('G')) / len(dna) * 100

# compare the numeric percentages (not their formatted strings) to find the maximum
best = max(records, key=lambda name: gc_content(records[name]))

print(best)
print("%.6f" % gc_content(records[best]))

Rosalind_1370
52.597403


# Problem 6: Counting Point Mutations

In [95]:
from sys import stdin

def hamm(dna1, dna2):
  # Hamming distance: number of positions at which the two strings differ
  return sum(1 for a, b in zip(dna1, dna2) if a != b)

def main():
  data = ['CTGTGATCTTAAAGATCCCAATACAATCGGAATCAGTTTGAGTCGAGGCAAACAAGGATGTACTAAAGACATGCCTTAACTACCTATAGCTGCAGGGTCAATAAGCCGTGCGTGGAGTAACTTGTGATGTCGGGCCGAGTTTCAATGTATTCTGCAAATTCTTTGACGACCTGAGTCTTTAACGTTGTTAGAACCGAACGCATCAGACTCCAATCTGATTATCGAAAGATGGAATGCGCTAATAAATCCGGGGAGTGGATTTAGCGTTCACACTAGCGGTTGTGCGCGAGCTCTACCCCGGTAAATCACCTGCGGAAGGTTGCCCGACGTGTCAAAAAAAGCAGCGGAGGGAACGTGCCGTTCGATCCGACACGAAGTTTCGATCGCCAACTTAAGACTGACAGGACGCGTGTTAGATCCCCGATTATATAGGACCGTCCCTCCGAACATGTGCCCGTCTAAAAAACAGATGGTTCATGGCCCTTAAAGGGCTGGTTGAATTGCGATCAGAATGTGCGCTCATCACCTCATACACTTTTCATGAGGTGGTAGTATTGTGATTGCAACCGGATTAGACAGAGCGAAAGGAATACTAACTTTGTTAGCCTGGACTCAGTAGATGTAATTCGTGTCAAGAGCTCTGCTAGCCTCGAATCGGACACTATTGCCGAAAGAAACAGTTACCGCAAGGCATCATCACAATGGGCGGATAATTGTCAACGGGCCACCGACAGGTTTTGCGCCTAGAAGGTTGGACTACTCCGCACGCTGGATGAGCAATGTCGGGAACGGCTTGGGAGAAGATCAGTAAGCAGGTCGTCTGCAACAGGCCCCGTTTCCTTCGTCATGGATATCTAGTTACTACTGTGGTTTGAAGTTGGTGAGGGAGCCGCACAGAGTACCGGTTCCGGATGCCTCGCTCGCCTCGTTTTGTTAGTAACTCTTAACATAACGAAT',
          'ATGGAACCGTAGGAGACATTACTAACTGGTACTCCGTTGCTGGGGAGGGCGAGAAGATTCTACTCAAGTTCGTGCTGACGGATCTAGCTACGCGCGTACACGGGGCCGAACAAGGCGTAACACCGAATATCGGTACCACCTTCACGGAGGCTTACAGACTTGACGGGGTAATGAGTGCTTGACGTGTGTATCGGTGAGCAAAAGATGCTCCACCCTAGTCGGGTGTAGCTAGTACACAATCGAAAATCCGGAGAGTTCGTGAGGCTTTACAATTACCGTTGATTTCGGACCGCGTGCATCAGACAGCCGCTGCAGGACTACGCCCGACCTGTGTACACACTACGAGGTGAGAGAGTGTGGCTCGATTCGACGAGAGGTCTGGACCGTTTGCTTGCCAATAACAGCGCCGTTTAACCATGTACGAATATGTATGAGCGACGCGCGTAAGATGTAAGCAATGAGCCAGCTTTGTGATAATTGCCATGAAAAGGTTGTCTGGATTACGATCGGAAGGTACACTCTTGACAACGTATAAGGGGATATCGCTGCTGTTTTTCCTCTTGCGTGTGTACAAGCTTGATGCGTTGTGGAAAAATCCTGGTCACTAGGGAATCTTTAGACATAATTCCTTTAGTGAGCGCTGGTACTCTCAGGCCACCCACGCTTGCCGGACTAAACATTCCACGAAGTGTGCACTAACTGCGCGTGCATACTCGAGTATCCGTTTACGGCGGCTATACAGCCATAGGCGTCTATGAACTTCTAACCGTGGCTCGCGAACCTCCAGGAAAGCTGGTGACGGAATCTTCGATCCCCTCAACTGGCTCAGGCAGCGATTTCATCGGGATGGACACAATTGTTATACCCTTAGCTAAAGATGGTGAAACTTGACATCGATCTGACCGAGCAGGATGGCTGACAATCGCTCTATTCTTAGATTCACTGGGGATGAGCCAA'
         ]
  # any lines piped in on stdin are appended, but only the first two entries are compared
  for line in stdin:
    data.append(line.strip())
  dna1, dna2 = data[0], data[1]
  print(hamm(dna1, dna2))

if __name__ == "__main__":
  main()

495
