# Part 1: Python basics.

The first piece of code anyone ever learns to write:

In [1]:
print("Hello World")

Hello World


First, we can use Python as a glorified calculator:

In [2]:
5*7

35

In [3]:
5+7

12

But more importantly, we may want to save values as *variables* in order to use them repeatedly. 

In [4]:
a = 5
b = 0.5
a*b

2.5

In [5]:
a+b

5.5

We can also save values other than numbers. Specifically, a sequence of characters can be saved as a variable known as a *string*. Strings are denoted with quotation marks.

In [6]:
DNA1 = "ATAGCTAG"
DNA2 = "TGATCAGT"
DNA3 = DNA1 + DNA2
DNA3

'ATAGCTAGTGATCAGT'
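Strings support a few other operations that come in handy later on. The sketch below uses the built-in `len` function and square-bracket indexing; the names `length`, `first_base`, and `last_base` are only illustrative and are not used elsewhere in this notebook.

```python
DNA1 = "ATAGCTAG"
DNA2 = "TGATCAGT"
DNA3 = DNA1 + DNA2

# len() counts the characters in a string
length = len(DNA3)      # 16

# Indexing starts at 0, so position 0 is the first character
first_base = DNA3[0]    # 'A'

# Negative indices count backwards from the end of the string
last_base = DNA3[-1]    # 'T'
```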

# Part 2: Finding the complement of a DNA sequence.

Let's now write our first piece of code to return the complement of a single DNA base.

In [7]:
bp = "A"

if bp == "A":
  ans = "T"

if bp == "T":
  ans = "A"

if bp == "C":
  ans = "G"

if bp == "G":
  ans = "C"

ans

'T'
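The four separate `if` statements above are each checked in turn, even after one of them has matched. A possible variation, shown only as a sketch, chains them with `elif` so that at most one branch runs, and adds an `else` for unexpected characters. Using `None` as the fallback value is an assumption made for this example.

```python
bp = "A"

if bp == "A":
  ans = "T"
elif bp == "T":
  ans = "A"
elif bp == "C":
  ans = "G"
elif bp == "G":
  ans = "C"
else:
  # anything other than A, T, C or G has no complement here
  ans = None
```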

While this is nice, we may want to get the complement of an entire DNA sequence, and not just a single base. We will use a structure known as a `for` loop, which allows us to repeat the same action for every character in the sequence.

In [8]:
DNA = "ATCGACTACGACATGACTACTAGACATCATCGCATATAGAGCAT"

ans = ""

for bp in DNA:
  if bp == "A":
    ans = ans + "T"

  if bp == "T":
    ans = ans + "A"

  if bp == "C":
    ans = ans + "G"

  if bp == "G":
    ans = ans + "C"

ans

'TAGCTGATGCTGTACTGATGATCTGTAGTAGCGTATATCTCGTA'
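To see the loop mechanics on their own, here is a minimal sketch that counts how many times "A" appears. The short sequence `seq` and the variable `count` are made up for this illustration.

```python
seq = "ATCGA"   # short made-up sequence, for illustration only

count = 0
for bp in seq:
  # the indented body runs once per character, in order
  if bp == "A":
    count = count + 1
```

After the loop finishes, `count` holds 2, since `seq` contains two "A" characters.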

Lastly, we should wrap all this code into a *function*, so that we can easily use this code over and over again.

In [9]:
def complement(DNA):
  
  ans = ""

  for bp in DNA:
    if bp == "A":
      ans = ans + "T"

    if bp == "T":
      ans = ans + "A"

    if bp == "C":
      ans = ans + "G"

    if bp == "G":
      ans = ans + "C"

  return ans

We can now *call* this function below, to make sure it works with different inputs:

In [10]:
complement("ATCAGCTA")

'TAGTCGAT'

In [11]:
complement("AGCTCAGTTA")

'TCGAGTCAAT'

In [12]:
complement("TTTCCCGGGAAA")

'AAAGGGCCCTTT'
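Note that `complement` returns the complementary bases in the same left-to-right order as the input. If the *reverse* complement is wanted (the complementary strand read in the opposite direction), one possible sketch is to reverse the result with the slice `[::-1]`. The function name `reverse_complement` is an assumption for this example, not something defined elsewhere in this notebook.

```python
def complement(DNA):
  # same logic as the function defined above
  ans = ""
  for bp in DNA:
    if bp == "A":
      ans = ans + "T"
    if bp == "T":
      ans = ans + "A"
    if bp == "C":
      ans = ans + "G"
    if bp == "G":
      ans = ans + "C"
  return ans

def reverse_complement(DNA):
  # [::-1] steps through the string backwards, reversing it
  return complement(DNA)[::-1]

rc = reverse_complement("ATCAGCTA")   # 'TAGCTGAT'
```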

# Part 3: Finding an open reading frame (ORF).

As another DNA-related task, we may want to find an open reading frame (ORF) in a given DNA sequence. An open reading frame is a segment of DNA that begins with a start codon (ATG) and ends with one of the stop codons (TAG, TAA, or TGA).

We will reuse the `DNA` variable defined in Part 2.

First let's find where the start codon is in the DNA sequence:

In [13]:
for i in range(len(DNA)):
  if DNA[i:i+3] == "ATG":
    start = i
    break
  
start

12

In [14]:
DNA[start:]

'ATGACTACTAGACATCATCGCATATAGAGCAT'
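The expressions `DNA[i:i+3]` and `DNA[start:]` are *slices*. As a brief illustration of how they behave (the names `codon`, `tail`, and `short` are only for this sketch):

```python
DNA = "ATCGACTACGACATGACTACTAGACATCATCGCATATAGAGCAT"

# DNA[i:j] keeps positions i, i+1, ..., j-1 (position j is excluded)
codon = DNA[12:15]    # 'ATG'

# Leaving out the end keeps everything up to the end of the string
tail = DNA[12:]

# A slice that runs past the end is shortened instead of raising an error
short = DNA[len(DNA)-2:len(DNA)+1]    # 'AT'
```

The last point is why the search loop can safely evaluate `DNA[i:i+3]` for positions near the end of the sequence: the slice simply comes back shorter than three characters and cannot equal "ATG".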

Using the position of the start codon, we will now look for a stop codon in the same reading frame, by looping in steps of three.

In [15]:
for i in range(start, len(DNA), 3):
  
  codon = DNA[i:i+3]

  if codon == "TAG":
    end = i+3
    break

  if codon == "TAA":
    end = i+3
    break

  if codon == "TGA":
    end = i+3
    break

end

39
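To make the effect of stepping by three more concrete, the sketch below collects every codon the loop visits. The list name `codons` is an assumption for this example. It also uses the string method `find` to show that the sequence contains an earlier "TAG" that the loop skips because it is not in the same reading frame.

```python
DNA = "ATCGACTACGACATGACTACTAGACATCATCGCATATAGAGCAT"
start = 12

codons = []
for i in range(start, len(DNA), 3):
  # i takes the values 12, 15, 18, ... so each slice is one in-frame codon
  codons.append(DNA[i:i+3])

# codons[0] is 'ATG' and codons[8] is the in-frame stop codon 'TAG'

# The first "TAG" at or after the start codon begins at position 20.
# (20 - 12) is not a multiple of 3, so it is out of frame and is skipped.
first_tag = DNA.find("TAG", start)
```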

With the start and end of the ORF, let's look at only that ORF region of the DNA.

In [16]:
ORF = DNA[start:end]
ORF

'ATGACTACTAGACATCATCGCATATAG'

In [17]:
len(ORF)

27

Again, let's wrap all this code into a function.

In [18]:
def first_ORF(DNA):

  for i in range(len(DNA)):
    if DNA[i:i+3] == "ATG":
      start = i
      break
  
  for i in range(start, len(DNA), 3):
    
    codon = DNA[i:i+3]
    
    if codon == "TAG":
      end = i+3
      break

    if codon == "TAA":
      end = i+3
      break

    if codon == "TGA":
      end = i+3
      break

  return DNA[start:end]

And let's try the function out!

In [19]:
ORF = first_ORF(DNA)
ORF

'ATGACTACTAGACATCATCGCATATAG'

In [20]:
len(ORF)

27
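One caveat: as written, `first_ORF` assumes the sequence contains both a start codon and an in-frame stop codon. If either is missing, `start` or `end` is never assigned inside the function and Python raises an error. A possible variant that returns `None` in those cases is sketched below; the name `first_ORF_safe` and the choice of `None` as the "not found" value are assumptions for this example.

```python
def first_ORF_safe(DNA):
  stop_codons = ("TAG", "TAA", "TGA")

  # str.find returns -1 when the substring is absent
  start = DNA.find("ATG")
  if start == -1:
    return None

  # Walk codon by codon in the reading frame set by the start codon
  for i in range(start, len(DNA), 3):
    if DNA[i:i+3] in stop_codons:
      return DNA[start:i+3]

  # Reached the end without meeting an in-frame stop codon
  return None

DNA = "ATCGACTACGACATGACTACTAGACATCATCGCATATAGAGCAT"
ORF = first_ORF_safe(DNA)
```

On the `DNA` sequence used above this gives the same 27-character ORF as `first_ORF`, while a sequence such as "CCCCCC" (no start codon) or "ATGCCC" (no stop codon) gives `None`.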