# Today's Learning Objectives

- Explain how we can find an open reading frame (ORF)
- *Appreciate* how code is relevant (and necessary!) to tackle some biological problems!  

# If you get stuck:
- Make sure to hit all the "play" buttons
- Read the **bottom** of the error messages 
- Check for typos vigilantly! Working in groups is great for this!
- Watch out for indentation -- Python cares about white space
- Worst case: use pen and paper!

# Part 1: Python basics.

The first piece of code anyone ever learns to write:

In [4]:
print("Hello, World!")

Hello, World!


Next, we can use Python as a glorified calculator:

In [5]:
5+7

12

In [6]:
5*7

35

But more importantly, we may want to save values as *variables* in order to use them repeatedly. 

In [7]:
a = 5
b = 7 
a + b

12

In [8]:
a * b

35

We can also save values other than numbers. Specifically, a sequence of characters can be saved as a variable known as a *string*. Strings are denoted with quotation marks.

In [12]:
DNA1 = "TCGATCGATCGATCG"
DNA2 = "GCTAGCTTGGCTAGCT"
DNA1 + DNA2 

'TCGATCGATCGATCGGCTAGCTTGGCTAGCT'
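
Strings can also be measured and sliced, which becomes useful in Part 3. The short sketch below is an addition to the original notebook; it reuses `DNA1` from the cell above and relies only on built-in Python features (`len`, indexing, and slicing). Note that Python starts counting positions at 0.

```python
DNA1 = "TCGATCGATCGATCG"

# len gives the number of characters in the string.
print(len(DNA1))    # 15

# Square brackets pick out one character; the first position is 0.
print(DNA1[0])      # T

# A slice [0:3] takes positions 0, 1, and 2 (the end position is excluded).
print(DNA1[0:3])    # TCG
```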

# Part 2: Finding the complement of a DNA sequence.

Let's now write our first piece of code to return the complement of a given DNA base.

In [23]:
bp = "A"

# Lines that start with # are comments; Python ignores them.
if bp == "A":
    ans = "T"

if bp == "T":
    ans = "A"
    
if bp == "C":
    ans = "G"

if bp == "G":
    ans = "C"
    
ans

'T'
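
One thing to watch out for: if `bp` is not one of the four letters, none of the `if` statements above runs and `ans` is never set. As a hedged alternative (not part of the original notebook), the same check can be written with `elif` and `else`, so that exactly one branch always runs. Using `"N"` as the fallback is an assumption made for this sketch.

```python
bp = "G"

if bp == "A":
    ans = "T"
elif bp == "T":
    ans = "A"
elif bp == "C":
    ans = "G"
elif bp == "G":
    ans = "C"
else:
    # Assumption for this sketch: mark anything unrecognized as "N".
    ans = "N"

print(ans)  # C
```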

While this is nice, we usually want the complement of an entire DNA sequence, not just a single base. For that we will use a structure known as a `for` loop, which lets us repeat the same action for every base in the sequence.

In [25]:
DNA = "ATCGACTACGACATGACTACTAGACATCATCGCATATAGAGCAT"

ans = ""

for bp in DNA:
    
    if bp == "A":
        ans = ans + "T"

    if bp == "T":
        ans = ans + "A"

    if bp == "C":
        ans = ans + "G"

    if bp == "G":
        ans = ans + "C"
    
ans

'TAGCTGATGCTGTACTGATGATCTGTAGTAGCGTATATCTCGTA'

Lastly, we should wrap all this code into a *function*, so that we can easily use this code over and over again.

In [26]:
def complement(DNA):
    
    ans = ""
    
    for bp in DNA:
    
        if bp == "A":
            ans = ans + "T"

        if bp == "T":
            ans = ans + "A"

        if bp == "C":
            ans = ans + "G"

        if bp == "G":
            ans = ans + "C"
    
    return ans

We can now *call* this function below, to make sure it works with different input:

In [27]:
complement("ATCATCACTATCACTACTACTACTACTATAGCGCGCGCTATCGACGCA")

'TAGTAGTGATAGTGATGATGATGATGATATCGCGCGCGATAGCTGCGT'

In [28]:
complement("GCTGCATCGATC")

'CGACGTAGCTAG'

In [29]:
complement("ATTT")

'TAAA'
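
Note that `complement` keeps the bases in the same left-to-right order as the input. If we want the *reverse* complement instead (the opposite strand read in its own 5' to 3' direction), we also need to reverse the order. Below is a minimal sketch, not part of the original notebook. The name `reverse_complement` and the dictionary `pairs` are introduced here for illustration, and the dictionary lookup replaces the chain of `if` statements.

```python
def reverse_complement(DNA):
    # A dictionary maps each base to its complement.
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}

    ans = ""

    # reversed(DNA) walks through the sequence from the last base to the first.
    for bp in reversed(DNA):
        # Unlike the if-based version, this raises a KeyError
        # for any character other than A, T, C, or G.
        ans = ans + pairs[bp]

    return ans


print(reverse_complement("ATTT"))          # AAAT
print(reverse_complement("GCTGCATCGATC"))  # GATCGATGCAGC
```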

# Part 3: Finding an open reading frame (ORF).

As another DNA-related task, we may want to find an open reading frame (ORF) in a given DNA sequence. An open reading frame is a segment of DNA that begins with a start codon (ATG) and ends with one of the stop codons (TAG, TAA, or TGA) in the same reading frame.

Below, I define the `DNA` variable we will be using.

First let's find where the start codon is in the DNA sequence:
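
As a sketch of these two steps: the sequence below is a made-up example chosen for illustration, not the one used in class. The built-in string method `find` returns the position of the first match, or -1 if there is no match.

```python
# Hypothetical example sequence, made up for illustration.
DNA = "CCGATGAAACCCGGGTTTTAGCCATGA"

# Position of the first start codon (counting from 0).
start = DNA.find("ATG")
print(start)  # 3
```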

Using the position of the start codon, we will now look for a stop codon in the same reading frame, by looping in steps of three.

In [None]:
for i in range(start, len(DNA), 3):
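
The loop header above visits positions `start`, `start + 3`, `start + 6`, and so on, which is one codon at a time. One possible way to fill in the body is sketched below, assuming a made-up example sequence that contains a start codon. The names `stop_codons` and `codon`, and the use of -1 to mean "nothing found", are choices made for this sketch.

```python
# Hypothetical example sequence, made up for illustration.
DNA = "CCGATGAAACCCGGGTTTTAGCCATGA"
start = DNA.find("ATG")  # assumes the sequence contains ATG

stop_codons = ("TAG", "TAA", "TGA")

end = -1  # -1 means no stop codon has been found in this frame
for i in range(start, len(DNA), 3):
    codon = DNA[i:i + 3]  # the three bases starting at position i
    if codon in stop_codons:
        end = i + 3  # position just past the stop codon
        break        # stop at the first in-frame stop codon

print(end)  # 21
```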

  

With the start and end of the ORF, let's look at only that ORF region of the DNA.
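
Slicing does this in one step. In the sketch below, the sequence and the values of `start` and `end` are assumptions taken from a made-up example, with `end` pointing just past the stop codon so that the stop codon is included in the slice.

```python
# Hypothetical example sequence, made up for illustration.
DNA = "CCGATGAAACCCGGGTTTTAGCCATGA"
start = 3   # position of ATG in this example
end = 21    # position just past TAG in this example

# The slice keeps positions start up to, but not including, end.
orf = DNA[start:end]
print(orf)  # ATGAAACCCGGGTTTTAG
```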

Again, let's wrap all this code into a function.

In [None]:
def first_ORF(DNA):
    # Fill in: compute `start` and `end` using the steps above.
    return DNA[start:end]

And let's try the function out!
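
Here is one possible way to complete `first_ORF`, as a sketch rather than the definitive answer. Returning an empty string when there is no start codon or no in-frame stop codon is an assumption of this sketch, and the test sequences are made up for illustration.

```python
def first_ORF(DNA):
    # Position of the first start codon, or -1 if there is none.
    start = DNA.find("ATG")
    if start == -1:
        return ""  # assumption: no start codon means no ORF

    # Step through the sequence one codon at a time, in frame with ATG.
    for i in range(start, len(DNA), 3):
        codon = DNA[i:i + 3]
        if codon in ("TAG", "TAA", "TGA"):
            # Include the stop codon in the returned ORF.
            return DNA[start:i + 3]

    return ""  # assumption: no in-frame stop codon means no ORF


print(first_ORF("CCGATGAAACCCGGGTTTTAGCCATGA"))  # ATGAAACCCGGGTTTTAG
print(first_ORF("ATGATAGCCC"))  # empty: the only TAG is out of frame
```

Note that this only looks at the first ATG and only reads the strand it is given.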