
In [None]:
import random

# Exercise 1 (10 pts)

Concepts: Control Flow, Loops, Functions, Strings

Task: Write a function that will compute the GC content of a given DNA sequence. GC content is the fraction of the sequence that is made up of guanines and cytosines (as opposed to adenines and thymines).  

In [None]:
def GC_content(DNA_seq):
  """Computes the GC content (as a float) given the provided DNA sequence 
  (as a string)"""

  # count Cs and Gs, respectively
  C_count = DNA_seq.count('C')
  G_count = DNA_seq.count('G')

  # compute GC content as a fraction
  GC_frac = (C_count + G_count) / len(DNA_seq)

  return GC_frac

In [None]:
GC_content("ATCG")

0.5

As an example, running `GC_content("ATCG")` above should return `0.5`. Try your function out with a few other sequences and display the results to confirm that it works as expected:

In [None]:
print(GC_content("ATTC")) # should be 0.25
print(GC_content("AGCC")) # should be 0.75
print(GC_content("GGCG")) # should be 1.0

0.25
0.75
1.0
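
The function above assumes an uppercase, non-empty sequence. As a rough sketch of one way to relax those assumptions, the hypothetical helper `gc_content_safe` below (not part of the assignment) uppercases the input first and returns `0.0` for an empty string rather than raising a `ZeroDivisionError`. Returning `0.0` is an assumption made for illustration; raising an error would be an equally reasonable choice.

```python
def gc_content_safe(dna_seq):
  """Hypothetical variant of GC_content: case-insensitive and safe on
  empty input (returns 0.0 by assumption)"""

  # normalize case so "atcg" and "ATCG" are treated the same
  seq = dna_seq.upper()

  # avoid dividing by zero when the sequence is empty
  if len(seq) == 0:
    return 0.0

  return (seq.count("G") + seq.count("C")) / len(seq)


print(gc_content_safe("atcg")) # 0.5
print(gc_content_safe(""))     # 0.0
```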


# Exercise 2 (10 pts)

Concepts: Control Flow, Functions, Modules

Task: Write a function that will mutate a given nucleotide (A, T, C, or G) to any *other* nucleotide. For this, you will want to make use of functions in the `random` module. Note that I have already imported the `random` module at the top of this file; placing imports at the top is generally accepted as good practice.



In [None]:
def mutate_nt(nt):
  """Returns a randomly selected nucleotide that differs from the input 
  nucleotide"""

  # initialize all possible options, then exclude the nt provided
  options = ["A","T","C","G"]
  options.remove(nt)

  # return one of the remaining options at random
  return random.choice(options)

In [None]:
print(mutate_nt("A"))
print(mutate_nt("T"))
print(mutate_nt("C"))
print(mutate_nt("G"))

T
G
A
C


Run the cell above a few times to verify that your results are in fact random. Also make sure that your function actually mutates the nucleotide and never returns the same nucleotide it was given!
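
Rerunning a cell by hand only gives a handful of samples. As a hedged sketch of a more systematic check, the cell below repeats the `mutate_nt` logic (so that it runs on its own) and tallies 1000 mutations of `"A"` with `collections.Counter`. The sample size of 1000 is an arbitrary choice for illustration.

```python
import random
from collections import Counter

def mutate_nt(nt):
  """Same logic as the solution above, repeated so this cell is
  self-contained"""
  options = ["A", "T", "C", "G"]
  options.remove(nt)
  return random.choice(options)

# draw many mutations of "A" and tally how often each result appears
draws = [mutate_nt("A") for _ in range(1000)]
counts = Counter(draws)

print(counts)        # exact counts vary from run to run
print("A" in counts) # False: the original nucleotide is never returned
```

If the function is working, the three remaining nucleotides should each appear roughly a third of the time.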

# Exercise 3 (20 pts)

Concepts: Control Flow, Loops, Functions, Strings, Dictionaries

Task: Write a function that will "translate" a given DNA sequence into the corresponding amino acid sequence. For this, a `dictionary` will be extremely useful. Below I save a dictionary where the *keys* are the three-basepair codons and the *values* are the corresponding one-letter codes for the amino acids. Note that the stop codons are denoted with `*`.

In [None]:
aa_dict = \
{'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L', 'TCT': 'S', 'TCC': 'S', 
 'TCA': 'S', 'TCG': 'S', 'TAT': 'Y', 'TAC': 'Y', 'TAA': '*', 'TAG': '*', 
 'TGT': 'C', 'TGC': 'C', 'TGA': '*', 'TGG': 'W', 'CTT': 'L', 'CTC': 'L', 
 'CTA': 'L', 'CTG': 'L', 'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P', 
 'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q', 'CGT': 'R', 'CGC': 'R', 
 'CGA': 'R', 'CGG': 'R', 'ATT': 'I', 'ATC': 'I', 'ATA': 'I', 'ATG': 'M', 
 'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T', 'AAT': 'N', 'AAC': 'N', 
 'AAA': 'K', 'AAG': 'K', 'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R', 
 'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V', 'GCT': 'A', 'GCC': 'A', 
 'GCA': 'A', 'GCG': 'A', 'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E', 
 'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G'}

With this dictionary, you should be able to loop through a DNA sequence in chunks of three basepairs to get the corresponding amino acid sequence. For this, we won't worry about looking for a start codon (we will just start reading off the DNA at the beginning), but you should make sure to terminate the sequence if there's a stop codon.
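
The "chunks of three basepairs" step can be sketched on its own, separately from the dictionary lookup. The helper name `split_codons` is hypothetical and is only meant to illustrate one way to step through a string three characters at a time; any incomplete codon at the end is ignored.

```python
def split_codons(dna_seq):
  """Hypothetical helper: returns a list of the complete codons in
  dna_seq, in reading order"""

  codons = []

  # step through start positions 0, 3, 6, ... that leave room for 3 bases
  for start in range(0, len(dna_seq) - 2, 3):
    codons.append(dna_seq[start:start + 3])

  return codons


print(split_codons("ATGACGTAGCC")) # ['ATG', 'ACG', 'TAG']
```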

In [None]:
def translate(DNA_seq):
  """Returns a string of the amino acid sequence that results from translating
  the inputted DNA sequence"""

  # floor division rounds down to the number of complete codons
  num_codons = len(DNA_seq) // 3

  # initialize string of amino acids
  aa_seq = ""
  
  for i in range(num_codons):

    # determine codon and corresponding amino acid
    codon = DNA_seq[i*3:i*3+3]
    aa = aa_dict[codon]

    # update amino acid string with new amino acid
    aa_seq = aa_seq + aa

    # check for stop codon
    if aa == "*":
      return aa_seq

  return aa_seq


Below, test out your function on a few single codons. Does it return what you expect?



In [None]:
print(translate("ATG"))
print(translate("TAG"))

M
*


Next, test your function on slightly more complicated sequences, ones with more than one codon. 

As you test your function, consider how it responds if the DNA sequence length is not a multiple of 3. If your function currently gives an error, address this bug before proceeding.

In [None]:
print(translate("ATGACGCAG")) # multiple of three
print(translate("ATGACGCAGTT")) # not multiple of three


MTQ
MTQ
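
Both calls give the same result because the two trailing bases in the second sequence never form a complete codon. A minimal sketch of that arithmetic, using the built-in `divmod` (the variable names here are illustrative and not part of the assignment):

```python
seq = "ATGACGCAGTT"

# divmod returns the number of complete codons and the leftover bases
num_codons, leftover = divmod(len(seq), 3)

print(num_codons)           # 3
print(leftover)             # 2
print(seq[:num_codons * 3]) # ATGACGCAG, the part that gets translated
```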


As a final test, we will now translate a longer sequence, one that would be a bit tedious to translate by hand. Make sure the code cell below evaluates to `True`.

In [None]:
translate("ATGATCAGCATCATACTACGACTACAATCAGCATCATACAGCATATAGCATCATCACTA") == 'MISIILRLQSASYSI*'

True

# How long did this take? 

With a new course and new assignments, I want to be conscientious about how much time this course takes. Please let me know how long this assignment took, so I can adjust future homework if needed.

# Submitting your homework

- Rename your version of this document as `biol300_hw1_FIRSTNAME_LASTNAME.ipynb`. 
- Go to File, then Download, and download this document as `.ipynb` 
- Upload this downloaded `.ipynb` file on Canvas for your HW1 submission.