## <u>__Exercises__</u> ##

**Part 1**

We can generate (pseudo-)random numbers using the `random` module:

https://docs.python.org/3/library/random.html

Can you combine this with a loop and an `if` statement to roll a die (pick a random integer from 1 to 6) 10 times in a row?
Next, can you keep rolling a die until you get a 6? How about two 6's in a row?


In [None]:
# @title Code snippet


import random
print(random.random())

In [None]:
# @title Solution - don't peek!

import random

x = 0
print("Roll 10x: ")
for x in range(10):
    roll = random.randint(1, 6)
    print(f"Roll: {x+1}: Rolled: {roll}")

print()
print("Roll a 6 and stop: ")
rolls = 0
while True:
    roll = random.randint(1, 6)
    rolls += 1
    print(f"Rolled: {roll}")

    if roll == 6:
        print(f"Got a 6 after {rolls} rolls!")
        break

print()
print("Roll until you get two 6's in a row: ")
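rolls = 0
previous = None  # the roll before the current one (none yet)
while True:
    roll = random.randint(1, 6)
    rolls += 1
    print(f"Rolled: {roll}")

    # stop only when this roll and the previous roll are both 6
    if roll == 6 and previous == 6:
        print(f"Got two 6's in a row after {rolls} rolls!")
        break
    previous = roll  # remember this roll for the next time round the loop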


**Part 2**

If the random number generator is truly uniform, each of the numbers 1-6 should come up about equally often after sufficiently many dice rolls.

Use an array or list to capture the output of the dice rolls and see how many rolls are needed before the counts for each number are (roughly) even.



In [None]:
# @title Code snippet
# count may be useful here:

rolls = [1,1,2,2,3,5]  # avoid calling a variable `list`: that would hide Python's built-in list type
print(rolls.count(1))


In [None]:
# @title A solution

import random
import matplotlib.pyplot as plt # we will do more plotting tomorrow

x = 0
nrolls = 10000   # Number of dice rolls
outcome = []  # Our list of roll outcomes. We start with an empty list.

for x in range(nrolls):
    roll = random.randint(1, 6)
    outcome.append(roll)

# print(outcome)   # check that we are adding dice rolls to the list

counts = []
for i in range(1,7):
    counts.append(outcome.count(i))
print(counts)


plt.bar(range(1,7),counts)
plt.title('A graph...')
plt.xlabel('number')
plt.ylabel('count')
plt.show()
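
One way to get at the "how many rolls are needed" question is to repeat the experiment for several values of `nrolls` and compare how uneven the counts are. The sketch below is only an illustration: the helper name `relative_spread`, the fixed seed and the chosen roll counts are assumptions, not part of the exercise. It reports the gap between the most and least common face, divided by the expected count per face.

```python
import random
from collections import Counter

def relative_spread(nrolls, rng):
    """Roll a die nrolls times; return (max count - min count) / expected count."""
    counts = Counter(rng.randint(1, 6) for _ in range(nrolls))
    per_face = [counts.get(face, 0) for face in range(1, 7)]  # 0 if a face never came up
    expected = nrolls / 6
    return (max(per_face) - min(per_face)) / expected

rng = random.Random(42)  # fixed seed so the run is repeatable (an assumption)
for nrolls in (60, 600, 6000, 60000):
    print(f"{nrolls:>6} rolls: relative spread = {relative_spread(nrolls, rng):.3f}")
```

The spread should generally shrink as `nrolls` grows, although any single run can buck the trend because the rolls are random.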



**Part 3**

Now let's do something a little more biological.
Generate a random sequence of DNA.

In [None]:
# @title A solution

import random

length = 100

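numbers = []  # not called `list`, which would hide Python's built-in list type
gene = []

# create a list of numbers 1-4 of length 'length'
for i in range(length):
  numbers.append(random.randint(1, 4))
# print(numbers)

# replace 1-4 with A, T, C, G
gene = numbers.copy()  # work on a copy so the original numbers are kept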
for i in range(len(gene)):
  gene[i] = "A" if gene[i] == 1 else gene[i]
  gene[i] = "T" if gene[i] == 2 else gene[i]
  gene[i] = "C" if gene[i] == 3 else gene[i]
  gene[i] = "G" if gene[i] == 4 else gene[i]

# convert the list into a single string
geneseq = "".join(gene)

# check length using `len`
print("gene of length: ",len(geneseq))
print(geneseq)



**Part 4**

Does the DNA sequence contain an ATG codon or an EcoRI restriction site ("GAATTC")?
If not, can you make a loop that keeps generating new random sequences until you find one?

For extra credit, can you calculate the melting temperature?

https://www.calculator.bio/gc-content-tm/
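
As a starting point for the extra credit, here is a minimal sketch of GC content and a rough melting temperature. It assumes the Wallace rule (2 °C per A or T, 4 °C per G or C), which is only a crude estimate intended for short oligonucleotides; the calculator linked above may use a different formula. The function names `gc_content` and `wallace_tm` are made up for this example.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Rough Tm in degrees C using the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

site = "GAATTC"  # the EcoRI site from the exercise
print(f"GC content: {gc_content(site):.2f}")
print(f"Approximate Tm: {wallace_tm(site)} C")
```

The same two functions can be called on `geneseq` from Part 3, bearing in mind that the Wallace rule becomes unreliable for longer sequences.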




In [None]:
# @title Another way using random.choices()
# from https://colab.research.google.com/github/Intertangler/bioinformatics_stockholm/blob/master/1b_biopython_and_string_practice.ipynb#scrollTo=fqs2bnmxn1_4

import random

def generate_random_sequence(sequence_length, probability_distribution):
    sequence = ""
    for i in range(0,sequence_length):
        sequence = sequence + random.choices(["A","C","G","T"], weights=probability_distribution,k=1)[0]
    return sequence

length = 10
dist = [0.25,0.25,0.25,0.25] # probability of finding A, C, G and T, respectively


print("look for 'GAATTC':")

# instead of using while, we can use a for loop with a maximum of `max_attempts` attempts
# (a 6-base site is rare in a 10-base sequence, so allow plenty of attempts)
max_attempts = 10000

for j in range(1, max_attempts + 1):
    DNA = generate_random_sequence(length, dist)

    if "GAATTC" in DNA: # `in` checks every position in the sequence
        print(DNA)
        print(f"Found 'GAATTC'. It took {j} turns ")
        break
else:
    # the `else` of a for loop only runs if the loop ended without `break`
    print(f"No 'GAATTC' found in {max_attempts} attempts")
        


**Part 5 - Introduction to Biopython**

We will use the Biopython module for the next part.
It does not come installed by default, so this will install it using pip.


In [None]:
pip install Biopython

Biopython has a `Seq` class, which creates a special string-like object for holding a biological sequence, e.g. DNA or protein:

In [None]:
# @title Biopython

from Bio.Seq import Seq

DNA = Seq("ATCGATGATAGGATA")
protein = Seq("CATINTHAMAT")

print(type(DNA))
print(type(protein))

#Usual string operations apply
print(DNA[0:3])
print(len(protein))



Biopython's `Seq` type has `methods`, which are similar to functions, but specific to a particular type of object.
Examples:
- make the reverse complementary DNA sequence
- transcribe DNA to RNA, or translate it to a protein sequence

In [None]:
reverse_complement = DNA.reverse_complement()
transcribe = DNA.transcribe()
translate = DNA.translate()


print(DNA)
print(reverse_complement)
print(transcribe)
print(translate)
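
To see roughly what a method like `reverse_complement` does, here is a plain-Python sketch that needs no Biopython. It is not how Biopython implements it internally; it assumes an upper-case sequence containing only A, C, G and T, and the names `COMPLEMENT` and `reverse_complement` are chosen just for this example.

```python
# Translation table mapping each base to its complement
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATCGATGATAGGATA"))
print(reverse_complement("GAATTC"))
```

The EcoRI site `GAATTC` comes back unchanged because it is its own reverse complement.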


Perhaps more usefully, Biopython can retrieve entries from the NCBI:

https://biopython.org/docs/1.75/api/Bio.Entrez.html

https://www.ncbi.nlm.nih.gov/Web/Search/entrezfs.html



Let's look at some entries for the enzyme COMT.

https://www.ncbi.nlm.nih.gov/datasets/gene/1312/




In [None]:
from Bio import Entrez

Entrez.email = "XXX@manchester.ac.uk"  # Always tell NCBI who you are

handle = Entrez.efetch(db="nucleotide", id="NM_000754.4", rettype="gb", retmode="text")
print(handle.read())
handle.close()

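# "pdb" is not an Entrez database name, so fetch the same nucleotide record in FASTA format instead
handle = Entrez.efetch(db="nucleotide", id="NM_000754.4", rettype="fasta", retmode="text")
print(handle.read())
handle.close()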



We can also use Python to work with pdb files (protein structures).

We will use another module called py3Dmol for this and will use !wget to directly download a pdb file from rcsb.org.

https://www.rcsb.org/structure/6I3C



In [None]:
!pip install py3Dmol # Installing py3Dmol using pip


In [None]:
!wget http://files.rcsb.org/download/6I3C.pdb # downloading our pdb file
!grep ATOM 6I3C.pdb > protein.pdb
!grep HETATM 6I3C.pdb > ligand.pdb
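
The two `grep` commands can also be written in Python. The sketch below is one possible approach, with a made-up helper name `split_pdb_records` and made-up sample lines used purely for illustration. It keeps lines whose record name at the start of the line is `ATOM` or `HETATM`, so a `REMARK` line that merely mentions the word ATOM is not picked up.

```python
def split_pdb_records(lines):
    """Separate PDB lines into protein (ATOM) and ligand/solvent (HETATM) records."""
    protein, ligand = [], []
    for line in lines:
        if line.startswith("ATOM"):
            protein.append(line)
        elif line.startswith("HETATM"):
            ligand.append(line)
    return protein, ligand

# Made-up example lines, not taken from 6I3C.pdb
sample = [
    "HEADER    EXAMPLE",
    "REMARK   1 THIS COMMENT MENTIONS ATOM",
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N",
    "HETATM  900  O   HOH A 301       5.000   6.000   7.000  1.00 30.00           O",
]
protein, ligand = split_pdb_records(sample)
print(len(protein), "ATOM line(s),", len(ligand), "HETATM line(s)")
```

To use it on the downloaded file, pass `open('6I3C.pdb').readlines()` instead of `sample`. Note that `HETATM` records include water molecules as well as ligands.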


In [None]:
# and use py3Dmol to view the file.

import py3Dmol
v = py3Dmol.view()
v.addModel(open('protein.pdb').read())
v.setStyle({'cartoon':{}})
v.addModel(open('ligand.pdb').read())
v.setStyle({'model':1},{'stick':{'colorscheme':'greenCarbon'}})
v.zoomTo({'model':1})



Now it's play time....