In [None]:
# initialization
from IPython.display import Image
from IPython.display import HTML

## opening and closing files in python

### open() is the built-in for opening a file; close() is a method of the file object that open() returns.
### we have to open a file to read its contents into memory for processing.
### we have to close a file once we've finished with it, so that buffered writes are flushed to disk
### and the file handle is released; unflushed writes can be lost or leave the file corrupted.
### however, if we open a file using a with-statement, we don't have to worry about closing it.
### the interpreter will do that once it reaches the end of the with-statement's block of code, even if an error occurs inside the block.
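
A minimal sketch comparing the two ways of closing a file. The temporary directory and the file name `demo.txt` are made up for illustration and are not part of the course dataset.

```python
import os
import tempfile

# create a small throwaway file so the example is self-contained
tmpDir = tempfile.mkdtemp()
demoPath = os.path.join(tmpDir, 'demo.txt')  # hypothetical file name

# explicit open() and close(): try/finally makes sure close() runs
outFile = open(demoPath, 'w')
try:
    outFile.write('ACGT\n')
finally:
    outFile.close()

# with a with block, the file is closed for us when the block ends
with open(demoPath, 'r') as inFile:
    contents = inFile.read()

print(outFile.closed, inFile.closed)  # both files report that they are closed
print(repr(contents))
```

The `closed` attribute of a file object is a quick way to check whether a file is still open.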

In [None]:
# partial code
# to be completed in class
with open('../dataset/reformatted_Unigene1000.fa', 'r') as inFile:
    LoS = inFile.read().splitlines() # method chaining
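
The method chaining used above can be unpacked into two steps. The sketch below uses `io.StringIO` as a stand-in for an opened file, and the FASTA-like text is invented for illustration rather than taken from the dataset.

```python
import io

# io.StringIO behaves like a file opened with open(); the text is made up
fakeFile = io.StringIO('>seq1\nGATTACA\n>seq2\nACGT\n')

# method chaining: read() returns one string, and splitlines() is called on that string
LoS = fakeFile.read().splitlines()

# the same work written as two separate steps
fakeFile.seek(0)               # rewind to the start so we can read again
wholeText = fakeFile.read()    # one long string, newlines included
LoS2 = wholeText.splitlines()  # list of strings, newlines removed

print(LoS)
print(LoS == LoS2)
```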

In [None]:
%whos

## python's half-open interval

In [None]:
Image("number_line_1_10.jpg", width=500, height=100)

### one may believe that counting is child's play.
### and, in a sense, it is.
### however, in programming, as in mathematics, there is more to counting than first meets the eye.

### for example, if we count in just the natural numbers, should we include or exclude zero?

### or, if we count an interval in just the natural numbers, should we include or exclude the endpoints?

In [None]:
Image("EWD831.jpg", width=500, height=100)

<p>Four ways to define an interval, from Edsger Dijkstra's "Why Numbering Should Start at Zero". <i>a</i> and <i>b</i> are half-open intervals, <i>c</i> is a closed interval (it includes both endpoints), and <i>d</i> is an open interval (it excludes both endpoints).</p> <p>As Dijkstra points out, programmers who were free to choose among these four conventions made fewer mistakes with the half-open interval <i>a</i> than with the other three.</p> <p>Python uses this half-open interval for range() and slicing operations.</p>
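
A short sketch of two properties that make the half-open convention convenient. The bounds 2, 7, and 13 are arbitrary values chosen for illustration.

```python
a, b, c = 2, 7, 13  # arbitrary bounds

first = list(range(a, b))   # a <= i < b
second = list(range(b, c))  # b <= i < c

# the number of elements is simply the upper bound minus the lower bound
assert len(first) == b - a

# adjacent intervals join with no gap and no overlap
assert first + second == list(range(a, c))

# equal bounds give an empty interval, with no special case needed
assert list(range(a, a)) == []

print(first, second)
```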

In [None]:
# show the range() function and its options.
range(5)
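
One possible way to show the options of range(): it accepts a stop value alone, a start and a stop, or a start, a stop, and a step. Wrapping each call in list() makes the generated values visible.

```python
stopOnly = list(range(5))           # stop only: starts at 0
startStop = list(range(2, 8))       # start and stop
withStep = list(range(0, 10, 3))    # start, stop, and step
countDown = list(range(10, 0, -2))  # a negative step counts downward

print(stopOnly)   # [0, 1, 2, 3, 4]
print(startStop)  # [2, 3, 4, 5, 6, 7]
print(withStep)   # [0, 3, 6, 9]
print(countDown)  # [10, 8, 6, 4, 2]
```

In every case the stop value itself is excluded, which is the half-open interval at work.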

In [None]:
# to visualize slicing, we imagine that the indices are shifted a half-step to the left of their values
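
A small sketch of this picture, using the string 'GATTACA' as an example. Each index sits in the gap to the left of its character, so a slice takes everything between two gaps.

```python
word = 'GATTACA'
#   G   A   T   T   A   C   A
# 0   1   2   3   4   5   6   7   <- indices sit in the gaps between characters

head = word[0:3]    # between gap 0 and gap 3
tail = word[3:7]    # between gap 3 and gap 7
middle = word[2:5]  # its length is 5 - 2 = 3

print(head, tail, middle)  # GAT TACA TTA

# cutting at the same gap splits the string with nothing lost or repeated
assert word[:3] + word[3:] == word
```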

In [None]:
# show slicing using strings and the LoL
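
A possible demonstration, assuming that LoL stands for a list of lists (the notebook does not define it). The codon and amino-acid pairs below are typed in by hand for illustration.

```python
# assumption: LoL is a list of lists; this data is made up for the example
LoL = [['GAT', 'D'], ['TAC', 'Y'], ['AGA', 'R'], ['TTA', 'L']]

firstTwo = LoL[0:2]                      # slicing a list returns a new list
codons = [pair[0] for pair in LoL[1:3]]  # slice the outer list, then index each inner list
everyOther = LoL[::2]                    # a step of 2 keeps items 0 and 2
prefix = LoL[0][0][:2]                   # slicing the string inside the first inner list

print(firstTwo)
print(codons)
print(everyOther)
print(prefix)
```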

In [None]:
# algorithm: converting codons in nucleotide string into amino-acid strings
Image("codon2amino.jpg", width=500, height=100)

In [None]:
# algorithm: converting codons in nucleotide string into amino-acid strings
# our solution requires a dictionary of codon:amino-acid pairs

# first, let's convert a text file of codon/amino-acid pairs into a dictionary
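def makeC2ADict(filepath: str) -> dict:
    # each line of the file is expected to hold a codon and an amino acid separated by a tab
    with open(filepath, 'r') as inFile:
        LoS = inFile.read().splitlines()
    c2aDict = dict() # initialization
    for line in LoS:
        if not line.strip():
            continue # skip blank lines, which cannot be unpacked into a pair
        key, value = line.split('\t')
        key, value = key.strip(), value.strip()
        c2aDict[key] = value
    return c2aDict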

# now let's pass the dictionary in with the nucleotide string
def codon2amino(nuclStr: str, c2aDict: dict) -> tuple:
    ##### begin initialization #####
    # because the codons in the text file are upper-case,
    # every nucleotide string passed in is converted to upper-case
    nuclStr = nuclStr.upper()
    # an arbitrary nucleotide string can be read in 3 reading frames
    # (starting at index 0, 1, or 2), and each frame gives its own amino-acid string.
    rf0, rf1, rf2 = '','',''
    ##### end initialization #####
    for index in range(len(nuclStr)-2):
        # complete in class
        pass # dummy line
    return rf0, rf1, rf2


In [None]:
c2aDict = makeC2ADict('../dataset/codon_amino_tabs.txt')
codon2amino('GATTACAGATTACA', c2aDict)
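
One possible way to fill in the loop, offered as a sketch rather than the in-class solution. The names `codon2aminoSketch` and `demoDict` are hypothetical, the small dictionary covers only the codons in the example string, and using 'X' for a codon missing from the dictionary is an assumption.

```python
# a tiny hand-written table covering only the codons in the example string;
# the notebook's real table comes from makeC2ADict()
demoDict = {'GAT': 'D', 'TAC': 'Y', 'AGA': 'R', 'TTA': 'L',
            'ATT': 'I', 'ACA': 'T', 'CAG': 'Q'}

def codon2aminoSketch(nuclStr: str, c2aDict: dict) -> tuple:
    nuclStr = nuclStr.upper()
    frames = ['', '', '']  # one amino-acid string per reading frame
    for index in range(len(nuclStr) - 2):
        codon = nuclStr[index:index + 3]  # half-open slice of exactly 3 letters
        amino = c2aDict.get(codon, 'X')   # assumption: 'X' marks an unknown codon
        frames[index % 3] += amino        # index % 3 says which frame the codon is in
    return tuple(frames)

result = codon2aminoSketch('GATTACAGATTACA', demoDict)
print(result)  # ('DYRL', 'ITDY', 'LQIT')
```

The stop value len(nuclStr) - 2 ensures that the last slice taken is still a full 3-letter codon.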