# Week 1: Introduction to programming in Python

## 1. Introduction

Welcome to Bioinformatics 528!  Since almost all discussion sessions in this course depend on some level of familiarity with programming in Python, the first session will be an introduction to some basics of this programming language.  Python was first released in 1991 and has since become one of the most popular languages to date because of its common-sense syntax and ease of implementation.  The Python language is:

1. **Interpreted:** there's no need to compile your written code into a binary executable file; the code is read directly by the Python interpreter.
2. **Implicitly typed:** when you create variables to store values, you don't have to explicitly say what kind of data will be stored in that variable because Python will detect it automatically (this is usually called "dynamic typing").
3. **High-level:** the syntax of Python code is far removed from the actual commands that are being processed by the computer, which makes it much easier to read and allows several convenient built-in features, such as error handling.
4. **Whitespace-dependent:** instead of using braces to delineate blocks of code, Python uses indentation.

The combination of these features makes it a good language for first-time programmers, as its syntax is easy to read and write, and it enforces good coding style practices (e.g. proper indentation).  However, it's not only a beginner's language, as its ease of use and availability of well-documented packages have made it <a href="http://pypl.github.io/PYPL.html">the most popular programming language of the current age</a>.  The following notebook will introduce you to basic concepts of programming using Python syntax, as well as a few Python-specific tips and tricks.

## 2. Programming basics: data types, operations, and lists

Programming mainly consists of the automated manipulation of some sort of data.  Several of the most basic types of data are described below:

In [3]:
#Statements with a "#" in front are comments, which the interpreter doesn't read

#Integers are positive and negative whole numbers
i1=6 #"=" will assign some value (on the right) to a variable name (on the left)
i2=-3

#print will output the specified data to the console
#Note: in Python 2, the parentheses aren't necessary, but they are in Python 3
print(i1) 
print(i2)

6
-3


In [4]:
#Floating point numbers (a.k.a "floats") are positive and negative decimal numbers
f1=3.14159
f2=1e-4

print(f1)
print(f2)

3.14159
0.0001


In [5]:
#Booleans only have two possible values: True or False
b1=False
b2=(i1==i1) #more on generating a boolean like this in the next section

print(b1)
print(b2)

False
True


In [6]:
#Strings are a series of characters, indicated by single or double quotes ("" or '')
s1="Hello World"
s2="6"

print(s1)
print(s2)

Hello World
6


In [7]:
#Data types can be converted between each other
i3=int(f1) #notice that int() drops the decimal part (it truncates toward zero) rather than rounding
f3=float(s2) #strings can only be converted to numbers if they "look like" numbers
s3=str(f2) #all basic data types can be converted into a string
b3=bool(i2) #an integer value of 0 is False, nonzero is True

print(i3)
print(f3)
print(s3)
print(b3)

3
6.0
0.0001
True
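The conversion rules in the comments above have a few edge cases worth seeing directly. The following is a minimal sketch, assuming Python 3; the variable names are illustrative only. It shows that `int()` truncates toward zero (so a negative float moves up, not down), and that converting a string that doesn't "look like" a number raises a `ValueError`.

```python
# int() drops the fractional part, i.e. it truncates toward zero
pos = int(3.99)    # 3
neg = int(-3.99)   # -3, not -4

# round() goes to the nearest whole number instead
nearest = round(3.99)  # 4

# Strings must "look like" numbers to be converted
try:
    float("six")
    converted = True
except ValueError:
    converted = False

# Zero and empty values are False; everything else is True
flags = [bool(0), bool(0.0), bool(""), bool("0"), bool(-3)]

print(pos, neg, nearest, converted, flags)
```

Note that the string "0" is True because it is a non-empty string, even though the integer 0 is False.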


In [8]:
#Typical mathematical operations can be performed between numbers
#If floats are involved, the output will be a float
i4=i1+i2 #addition/subtraction
f4=f1*f2 #multiplication/division
i5=i2%i1 #modulo operations
f5=f4**2 #exponentiation

print(i4)
print(f4)
print(i5)
print(f5)

#Strings can be fused, or "concatenated", by the + sign
print("Hello"+"World")

3
0.000314159
3
9.8695877281e-08
HelloWorld
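Division is mentioned above but not shown, and it is one of the places where Python 2 and Python 3 differ. The sketch below assumes Python 3, where `/` always returns a float and `//` performs floor division; the variable names are illustrative only.

```python
# True division always gives a float in Python 3
true_div = 7 / 2       # 3.5

# Floor division rounds down to the nearest whole number
floor_div = 7 // 2     # 3
neg_floor = -7 // 2    # -4 (rounds down, not toward zero)

# Modulo gives the remainder, which takes the sign of the divisor
remainder = 7 % 2      # 1
neg_mod = -3 % 6       # 3, the same as i2%i1 above

# Mixing an integer with a float produces a float
mixed = 6 + 1.5        # 7.5

print(true_div, floor_div, neg_floor, remainder, neg_mod, mixed)
```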


In [9]:
#Multiple values can be stored in lists (also referred to as arrays)
l1=[2,5,8,20]

#Values of a list are then referenced by their position ("index") in the list, starting at zero
print(l1[2]+l1[0])

#Lists are mutable, which means the entries can be changed
l1[3]=10
print(l1[3])

10
10


In [10]:
#In Python, lists can consist of different data types
l2=[3,4.5,False,"String"]

#Sub-lists can be generated by "slicing", where a range of indexes are referenced
#Slicing syntax is list[start:stop] where start is inclusive and stop is exclusive
print(l2[1:3])

#A list can even contain lists, creating a multidimensional list
mdl=[[1,2],[3,4]]
print(mdl[0])
print(mdl[1][0])

#Strings can also be thought of as lists of characters, so slicing works on them too
string="Hello World"
print(string[:5]) #If the starting index is 0, it can be omitted
print(string[6:]) #If the end index is the length of the list, it can be omitted

#Similar to strings, lists can also be concatenated using the + sign
print(["a","b"]+["c","d"])

[4.5, False]
[1, 2]
3
Hello
World
['a', 'b', 'c', 'd']
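Slicing has two extensions that are not covered above but come up often when working with sequences: negative indexes, which count from the end, and an optional third "step" value. The following sketch uses a made-up DNA string purely for illustration.

```python
seq = "ATGCGTAA"  # hypothetical sequence, for illustration only

first_codon = seq[:3]    # "ATG": indexes 0, 1, 2
last_base = seq[-1]      # "A": negative indexes count from the end
last_three = seq[-3:]    # "TAA": the final three characters
every_third = seq[::3]   # "ACA": indexes 0, 3, 6
reversed_seq = seq[::-1] # "AATGCGTA": a step of -1 walks backwards

print(first_codon, last_base, last_three, every_third, reversed_seq)
```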


In [11]:
#Miscellaneous common Python functions

#len(list) gives the number of elements in a list
print(len(l2))

#range(n) will give you a list of integers from 0 to n-1
#Note: in Python 3, this returns a lazy "range" object rather than a list, but it can be made into a list with list(range(n)) or through interpolation (see section 4)
print(range(10))

#a.split(b) will split the string a into a list of strings using the string b as a separator
string="this,is,a,string"
print(string.split(","))

#a.join(b) will join the list of strings b into a single string using string a as a separator
print("/".join(["this","is","a","string"]))

#a.append(b) will add the element b to the end of list a
a=["a","b"]
a.append("Hello")
print(a)

#a.pop() will remove and return the last value of list a
print(a.pop())
print(a)

#a.index(b) will return the index of the first occurrence of element b in list a
print(a.index("b"))

4
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
['this', 'is', 'a', 'string']
this/is/a/string
['a', 'b', 'Hello']
Hello
['a', 'b']
1
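The functions above are often combined. The sketch below, assuming Python 3, converts a `range` to a list explicitly and takes apart a comma-separated record; the record contents are made up for illustration.

```python
# In Python 3, wrap range() in list() to get an actual list
numbers = list(range(5))

# Split a delimited record into fields, then use the pieces
record = "chr1,100,200"  # hypothetical record: name, start, end
fields = record.split(",")
span = int(fields[2]) - int(fields[1])  # fields are strings until converted

# join() reverses split() when given the same separator
rejoined = ",".join(fields)

# append() and pop() both work on the end of the list
stack = ["a", "b"]
stack.append("c")
last = stack.pop()

print(numbers, fields, span, rejoined, stack, last)
```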


## 3. Code modularization: control flow, functions, and loops

In order to start automating actions, a method for choosing which code to run is required.  If/else statements allow specific code to be run if a given condition is true or false.  If the condition is true, the code under "if" will run.  Otherwise, the code under "else" will run.  This is where booleans gain their importance, as these statements are controlled by boolean values.  Booleans in if/else statements are typically generated through comparison statements which compare two values and return whether the comparison is true or false.  For example:

In [12]:
#True statements
1==1 #"==": is equal to
3 < 6 #"<": is less than
5 in [1,2,3,4,5] #"in": the element is in the list

#False statements
1!=1 #"!=": is not equal to
3 >= 6 #">=": is greater than or equal to
not 5 in [1,2,3,4,5] #"not": makes True statements False, and False statements True

False

One can check multiple conditions at once through the boolean logic operators "and" and "or", which will return True if all statements are true, or if one statement is true, respectively.  Predict what the following statements will return before running.

In [13]:
print(1==1 and 2==1)
print(1==1 or 2==1)
print("Car" in "Carmen Sandiego")
print((True or False) and (True or True))
print((not True or False) or not (True or False))

False
True
True
True
False
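The parentheses in the last two statements matter because the logic operators have an order of precedence: "not" is applied first, then "and", then "or". A short sketch of how the same operators give different answers with and without parentheses (variable names are illustrative only):

```python
# "and" binds more tightly than "or"
no_parens = True or False and False      # read as True or (False and False)
with_parens = (True or False) and False

# "not" binds more tightly than both
not_first = not True or True             # read as (not True) or True
not_grouped = not (True or True)

print(no_parens, with_parens, not_first, not_grouped)
```

When in doubt, adding parentheses makes the intended grouping explicit.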


Following are some example if/else statements.  The pattern of indentation is essential for Python to understand what code is part of the if/else statements; all code belonging to the if or else should be indented at least once relative to the indentation of the statement.  In addition, it's required that this indentation be consistent across all lines (i.e. using both tabs and spaces to indent code will create errors).  Given this information, predict what the output of the code will be before running.

In [14]:
a=1
b=-1

if a < b:
    a+=b #This is shorthand for a=a+b
else:
    b=a
b=3

if a <= 0 and b <= 0:
    a=0
    b=0
else:
    a-=b #Again, shorthand for a=a-b
    
if a < 0 or b < 0:
    print("A")
else:
    print("B")

A
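When more than two outcomes are possible, "elif" (short for "else if") can be placed between "if" and "else" to check further conditions in order. Only the first branch whose condition is true will run. A minimal sketch, with illustrative variable names:

```python
x = -2

if x < 0:
    label = "negative"
elif x == 0:       # only checked if the condition above was False
    label = "zero"
else:              # runs only if no condition above was True
    label = "positive"

print(label)
```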


Another important step to achieving automation is being able to repeat the same actions under varying conditions.  One way of doing this is to store a set of actions as a function, which can take in a number of "arguments" (i.e. values passed in to the function) and return an output value (or return nothing at all).  Defined below is a function that will return the position (starting at 0) of a letter in the alphabet:

In [15]:
#Definition of the function, which sets its name, what arguments it needs, and what it does
def alphabetIndex(letter):
    alphabet="abcdefghijklmnopqrstuvwxyz"
    i=alphabet.index(letter)
    print("The letter "+letter+" is letter #"+str(i)+" in the alphabet")
    return i

#"Calls" of the function, which asks the interpreter to run the function with the supplied argument
alphabetIndex("q")
print(alphabetIndex("n"))

The letter q is letter #16 in the alphabet
The letter n is letter #13 in the alphabet
13


However, functions don't exist just as a point of convenience for the programmer; they also make it much easier for other people to understand your code.  By segmenting your code into separate functions (i.e. "modularizing" your code), the overall architecture of your program becomes clearer to outside viewers.  As a rule of thumb, you should never copy-paste code; if you find yourself doing this, it's better to wrap that code in a function.  However, beware of over-modularizing, as it's equally frustrating to read code that requires you to jump around a file to find specific steps of the process.
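As a sketch of the copy-paste rule of thumb, the calculation below is written once as a function and then called three times, rather than repeating the same two lines for each sequence. The function name `gc_fraction` and the sequences are hypothetical, and the built-in string method `count` is used to count characters.

```python
def gc_fraction(seq):
    """Return the fraction of characters in seq that are G or C."""
    gc = seq.count("G") + seq.count("C")
    return float(gc) / len(seq)  # float() avoids integer division in Python 2

# One definition, reused for every sequence
first = gc_fraction("ATGC")
second = gc_fraction("GGCC")
third = gc_fraction("ATAT")

print(first, second, third)
```

If the calculation ever needs to change (for example, to ignore lowercase letters), it only has to be changed in one place.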

Despite the fact that functions allow chunks of code to be run repeatedly, the function must still be called each time the code is to be run.  To automate this process, we need loops, which repeat some code until a condition is met.  The two main types of loops are "for" loops and "while" loops.  Both types of loops can be rewritten as the other, but as a rule of thumb, "for" loops are used when the number of repetitions is known (e.g. perform an action 10 times, do something to each element in an array, etc.), and "while" loops are used when the number of repetitions is unknown (e.g. continue until a quit signal is given, search until a specific element is found, etc.).  A few examples of these loops are shown below.

In [16]:
letters=["a","b","c"]

#Python for loops perform some action for each element in a list
for letter in letters:
    print(letter)
    
#If something needs to be done a specified number of times, use the range(n) function
#If you're familiar with C/C++, the following loop is equivalent to "for (i=0;i<3;i++)"
for i in range(3):
    print("Hello")

#If you need both the index and the list element, you can do this in two ways:
#The first is to loop using the index and reference the list element by index
for i in range(len(letters)):
    print(i,letters[i])

#The other is to use the "enumerate" function, which returns both the index and element
for i,letter in enumerate(letters):
    print(i,letter)

a
b
c
Hello
Hello
Hello
(0, 'a')
(1, 'b')
(2, 'c')
(0, 'a')
(1, 'b')
(2, 'c')


In [17]:
#While loops continue looping while some condition is true
i=-3
while i < 0:
    print(i)
    i+=1 #This is equivalent to "i++" in C, but the "++" operator doesn't exist in Python
    
i=0
while letters[i] != "c":
    print(letters[i])
    i+=1

#While loops are more dangerous than for loops because they open the possibility of infinite looping
#Always ensure that there's a way out of the loop, otherwise you'll loop forever!
#(Press the stop button in Jupyter to stop the loop once you're convinced it's in fact infinitely looping)
while True:
    print("Looping")

-3
-2
-1
a
b
Looping
Looping
Looping
[output truncated: "Looping" repeats until the kernel is interrupted]

KeyboardInterrupt: 
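One common way to guarantee a way out of a while loop is to add a second condition that caps the number of repetitions. The sketch below is one such pattern, not the only one; `max_steps` is a hypothetical safety limit chosen for illustration.

```python
value = 100.0
steps = 0
max_steps = 1000  # hypothetical safety limit

# The loop ends when the goal is reached OR the limit is hit,
# so it cannot run forever even if the goal is never reached
while value > 1.0 and steps < max_steps:
    value = value / 2
    steps += 1

print(steps, value)
```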

In [18]:
#As previously stated, for loops can be written as while loops, and while loops can be written as for loops
#The following two loops do the same thing, but the task is better written as a for loop
for i in range(5):
    print("Hello")

i=0
while i < 5:
    print("Hello")
    i+=1


#These two loops also do the same thing, but the task is better written as a while loop
bools=[True,True,True,False,True]

for i in range(len(bools)):
    if not bools[i]:
        break #"break" exits the current loop immediately
    else:
        print(i)
        
i=0
while bools[i]:
    print(i)
    i+=1

Hello
Hello
Hello
Hello
Hello
Hello
Hello
Hello
Hello
Hello
0
1
2
0
1
2
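One caveat about the last while loop above: if the list contained no False value, the index would run past the end of the list and raise an IndexError. A hedged sketch of a safer version checks the index against the list length first; the function name `count_leading_true` is hypothetical.

```python
def count_leading_true(bools):
    """Return how many True values appear before the first False."""
    i = 0
    # The length check comes first, so bools[i] is never read out of range
    while i < len(bools) and bools[i]:
        i += 1
    return i

print(count_leading_true([True, True, True, False, True]))
print(count_leading_true([True, True]))  # no False: stops at the end instead of crashing
```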


With the information provided thus far, you should be able to create basic programs.  One classic example is the "fizzbuzz" program, which prints out the numbers from 0 to n-1 but replaces the number with "fizz" if the number is divisible by 3, "buzz" if the number is divisible by 5, and "fizzbuzz" if the number is divisible by both 3 and 5.  Write a function *fizzbuzz* which takes in the argument n and prints the fizzbuzz pattern as described. (Hint: if a number x is divisible by y, then "x mod y", written x%y in Python, is equal to zero).

In [19]:
def fizzbuzz(n):
    for i in range(n):
        if i%3==0:
            if i%5==0:
                print("fizzbuzz")
            else:
                print("fizz")
        elif i%5==0:
            print("buzz")
        else:
            print(i)
            
fizzbuzz(100)

fizzbuzz
1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
11
fizz
13
14
fizzbuzz
16
17
fizz
19
buzz
fizz
22
23
fizz
buzz
26
fizz
28
29
fizzbuzz
31
32
fizz
34
buzz
fizz
37
38
fizz
buzz
41
fizz
43
44
fizzbuzz
46
47
fizz
49
buzz
fizz
52
53
fizz
buzz
56
fizz
58
59
fizzbuzz
61
62
fizz
64
buzz
fizz
67
68
fizz
buzz
71
fizz
73
74
fizzbuzz
76
77
fizz
79
buzz
fizz
82
83
fizz
buzz
86
fizz
88
89
fizzbuzz
91
92
fizz
94
buzz
fizz
97
98
fizz


## 4. Advanced Python techniques: dictionaries, list interpolation, and error handling

Now that we've covered a foundational understanding of programming topics, there are a few features in Python that are convenient to know.  The first of these topics is dictionaries (a.k.a. "maps" or "hashes" in other languages), which are collections of key-value pairs.  In other words, a dictionary is like a list, but each value is referenced by some key (usually a string) rather than by an index.  The dictionary is traditionally described as "unordered" because only the association between key and value matters; in the Python 2 used to generate the outputs in this notebook, the order of key-value pairs is not preserved, although Python 3.7 and later do preserve insertion order.  Dictionaries are best used for lookup operations, such as counting how many times a specific amino acid motif occurs, or finding the single-letter code for a three-letter amino acid label.

In [20]:
#Dictionaries are declared using curly braces, and filled with key:value entries
d={"A":1,"B":2,"C":3}

#The values can be referenced in a similar fashion to lists, using the key in place of the index
print(d["B"])

#A list of all keys can be generated using the keys() method; notice how they're not in order in this Python 2 output
for i in d.keys():
    print(i,d[i])

2
('A', 1)
('C', 3)
('B', 2)
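The two lookup uses mentioned above (translating three-letter amino acid labels and counting occurrences) can be sketched as follows. The dictionary here covers only three amino acids and the peptide is made up, so treat it as an illustration rather than a complete table.

```python
# Small, deliberately incomplete lookup table (illustration only)
three_to_one = {"Ala": "A", "Gly": "G", "Ser": "S"}
peptide = ["Gly", "Ala", "Ser", "Gly"]  # hypothetical peptide

one_letter = ""
counts = {}  # an empty dictionary, filled in as residues are seen
for residue in peptide:
    one_letter += three_to_one[residue]  # lookup by key
    if residue in counts:                # "in" checks the keys of a dictionary
        counts[residue] += 1
    else:
        counts[residue] = 1              # assigning to a new key adds an entry

print(one_letter)
print(counts)
```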


Next is list interpolation (more commonly called "list comprehension" in the Python documentation), which allows you to create a list from another list in one line using a for loop.  This feature can be thought of as similar to elementwise vector operations.  While this feature isn't essential to being able to program in Python, it's extremely convenient when applied properly.  A few examples of list interpolation are below:

In [21]:
#The syntax for interpolation is the entries of the list, followed by the for loop, all enclosed in brackets
intList=[i for i in range(10)] #In Python 2, range(10) already returns this list; in Python 3, a conversion like this one or list(range(10)) is needed to get a list
print(intList)
sqList=[i**2 for i in intList]
print(sqList)

#An if statement can be added after the for loop to restrict which entries are generated
sumList=[intList[i]+sqList[i] for i in range(len(intList)) if intList[i] > 4]
print(sumList)

#Even multidimensional lists can be generated in one line through nested interpolation
multiList=[[i*j for j in range(10)] for i in range(10)]
for row in multiList:
    print(row)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[30, 42, 56, 72, 90]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
[0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
[0, 4, 8, 12, 16, 20, 24, 28, 32, 36]
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
[0, 6, 12, 18, 24, 30, 36, 42, 48, 54]
[0, 7, 14, 21, 28, 35, 42, 49, 56, 63]
[0, 8, 16, 24, 32, 40, 48, 56, 64, 72]
[0, 9, 18, 27, 36, 45, 54, 63, 72, 81]
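It can help to see that the one-line form is shorthand for an ordinary loop that appends to a list. The two versions below are a sketch showing that they build the same result; the variable names are illustrative only.

```python
# Explicit loop: start with an empty list and append matching entries
squares_loop = []
for i in range(10):
    if i % 2 == 0:
        squares_loop.append(i**2)

# The same logic in one line: entry, then the for loop, then the condition
squares_short = [i**2 for i in range(10) if i % 2 == 0]

print(squares_loop)
print(squares_short)
```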


Given that the *sum(x)* function returns the sum of all elements in the list *x*, write a function *cubeAvg* that takes in some positive integer *n* and returns the mean value of all cubes from 0 to *n*.  (Hint: if you use list interpolation, this should only be 2 lines: 1. making the list 2. calculating and returning the average.  Also, keep in mind that in Python 2, division involving only integers returns an integer, so you will need to convert something to a float to get an accurate non-rounded number; in Python 3, the / operator always returns a float.)

In [22]:
def cubeAvg(n):
    cubes=[i**3 for i in range(n+1)]
    return float(sum(cubes))/len(cubes)

cubeAvg(5)

37.5

Another convenient feature of Python is its robust system for reporting and handling errors as they occur in a script.  Since the end user usually does not know how the program is built, a well-made program should anticipate unexpected inputs or situations and handle them without crashing the entire program.  One common source of errors is attempting to reference an element of a list with an index beyond the boundaries of that list:

In [23]:
smallList=[i for i in range(5)]
smallList[20]

IndexError: list index out of range

To handle this error, as well as any errors that may arise, we should encapsulate the potentially risky code in a "try/except" block, which attempts to run the code belonging to "try", but ceases execution and runs the code belonging to "except" instead if an error is detected.  For example:

In [34]:
try:
    print(smallList[2])
    print(smallList[20])
    print(smallList[2])
except:
    print("List index is out of bounds")

2
List index is out of bounds


There is more to error handling than is described here, such as printing the error message or establishing separate "except" blocks for different types of errors.  More information can be found in the <a href="https://docs.python.org/3/tutorial/errors.html">Python documentation</a>.
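As a brief sketch of those two ideas, the function below uses separate "except" blocks for two error types and prints the message attached to each error. `IndexError` and `TypeError` are built-in Python exceptions; the function name `safe_lookup` is hypothetical.

```python
def safe_lookup(values, index):
    """Return values[index], or None if the lookup fails."""
    try:
        return values[index]
    except IndexError as err:   # index beyond the boundaries of the list
        print("Index problem:", err)
    except TypeError as err:    # index is not an integer (e.g. a string)
        print("Type problem:", err)
    return None

small_list = [0, 1, 2, 3, 4]
print(safe_lookup(small_list, 2))
print(safe_lookup(small_list, 20))
print(safe_lookup(small_list, "a"))
```

Naming the error type also means that unrelated errors are not silently hidden, which can happen with a bare "except".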

## 5. Outside of Python: packages and file I/O

Programming in a Python environment alone can do a great deal of work, but Python's real power comes from the many ways its functionality can be extended and from its ability to interact with files in the operating system.  Extending what Python can do requires importing "packages", or external sets of pre-written functions and constants that can be called in your own program.  Python itself ships with several built-in packages (the standard library) that can be imported when necessary; packages beyond the default set can be installed using <a href="https://pypi.org/project/pip/">pip</a>.  The <a href="https://www.anaconda.com/">Anaconda distribution</a> of Python comes with most of the packages one would need to build effective programs and perform deep data analyses, as well as easy means of installing new packages and reconciling the versions of packages (which almost every Python programmer can tell you is much more of a chore than it initially sounds).  To import packages, use the following syntax:

In [32]:
#This syntax imports the entire package; anything referenced from it must first reference the package name
import os
print(os.getcwd()) #A dot means that the element after the dot "belongs to" the element before the dot

#You can customize how the package is referred to by using the following syntax
import subprocess as s
print(s.STDOUT)

#If you don't want to have to reference the package, you can import specific elements
from sys import argv
print(argv[0])

#Or you can import everything using a wildcard (*) character
#However, this is not recommended as the elements in the package might be the same name as something in your script
from math import *
print(pi, e, sqrt(16))

/Users/ewbell/Bioinf528D/keys
-2
/Users/ewbell/anaconda2/lib/python2.7/site-packages/ipykernel_launcher.py
(3.141592653589793, 2.718281828459045, 4.0)


The details of each built-in package and the elements within can be found in the <a href="https://docs.python.org/">Python documentation</a>.  Some external packages that are commonly used (and will likely be used in this course) include <a href="https://docs.scipy.org/doc/numpy/">numpy</a>, <a href="https://docs.scipy.org/doc/scipy/reference/">scipy</a>, <a href="https://matplotlib.org/3.1.1/contents.html">matplotlib</a>, and <a href="https://scikit-learn.org/stable/documentation.html">scikit-learn</a>.

The goal of programming is to automate the processing of large amounts of data so that they can be analyzed efficiently.  However, this efficiency is lost if the user is required to enter the values manually every time the script is run.  Therefore, it's necessary for Python to be able to parse data from files and write data to files (i.e. file input/output, or "I/O"), rather than depending on the user to supply the data at runtime.  Python can open files in three basic modes:

1. **Read ('r')**: This is the default mode when opening a file in Python; if no mode is specified, then the specified file will be opened to be read.  When a file is opened to be read, Python can pull data from it line by line.  In this mode, the file to be read must exist, otherwise an error will be thrown.  It's good practice to encapsulate the opening of a file in read mode in a "try/except" block if there's any possibility the file to be referenced is not present.
2. **Write ('w')**: This mode creates a file to be written to if the specified file does not exist, or overwrites the contents of the file if it does exist.  Using write mode, data can be stored in a file in a similar way to how data can be printed to the console.  However, the main technical difference between writing to a file and printing to the console is that the "print()" function automatically puts a newline character ('\n') at the end of any string it prints.  When writing to a file, you must explicitly write these characters, otherwise all that you've written will be on one single line.
3. **Append ('a')**: Append mode works similarly to write mode, but instead of overwriting data that's already stored in the file if it already exists, the new data will be appended to the end of the file.
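
The difference between the three modes can be sketched as follows.  The file names "log.txt" and "missing.txt" are hypothetical, and the files are placed in a temporary directory (via the standard "tempfile" package) so that nothing in your working directory is touched.  "IOError" is used here because it is raised for a missing file in Python 2 and still works in Python 3, where it is an alias of "OSError".

```python
import os
import tempfile

# Hypothetical file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "log.txt")

f = open(path, "w")   # write mode: creates (or overwrites) the file
f.write("first\n")
f.close()

f = open(path, "a")   # append mode: adds to the end of the file
f.write("second\n")
f.close()

f = open(path)        # no mode given, so read mode is assumed
contents = f.read()
f.close()
print(contents)

# Read mode requires the file to exist, so guard the call to open()
missing_path = os.path.join(tempfile.mkdtemp(), "missing.txt")
try:
    f = open(missing_path)
except IOError as err:
    print("Could not open file:", err)
    file_found = False
else:
    # Only runs if open() succeeded
    f.close()
    file_found = True
```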

When a file is opened, the operating system keeps a handle to that file until it is explicitly closed again.  Therefore, after a file is done being referenced, it should be closed so that the handle is released; closing a file also ensures that any data written to it is actually saved to disk.  Neglecting to do this for one or two files will typically not create many issues, but if file opening is part of a loop, the number of open files can build up until the operating system refuses to open any more.  A few examples of how to handle files can be found below:

In [31]:
#When opening a file, one must call the "open()" function with the name of the file and the opening mode
f=open("out.txt","w")
for i in range(10):
    f.write(str(i)+"\n")
f.close()

#If no mode is specified, "r" mode is assumed
f=open("out.txt")
for line in f: #The lines of a file can be iterated with a for loop, but it is NOT a list
    print(line.strip()) #string.strip() removes whitespace and newlines from the ends of a string

#Once the lines of a file are read, they can't be read again unless you reset f to the top of the file
#seek(n) moves f to position n in the file (an offset from the start, not a line number), therefore seek(0) returns to the top
f.seek(0)

lines=f.readlines() #This reads the lines of the file into a list
f.close()

numbers=[int(line) for line in lines]
print(numbers)

0
1
2
3
4
5
6
7
8
9
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
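
Since forgetting to call "close()" is an easy mistake, Python also offers the "with" statement, which closes the file automatically when the indented block ends, even if an error occurs inside it.  The sketch below repeats the example above in that style; the temporary directory is an assumption made only so that the snippet doesn't overwrite any existing "out.txt".

```python
import os
import tempfile

# Hypothetical output file inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# The file is closed automatically at the end of each "with" block
with open(path, "w") as f:
    for i in range(10):
        f.write(str(i) + "\n")

with open(path) as f:
    # int() ignores the trailing newline on each line
    numbers = [int(line) for line in f]

print(f.closed)  # True
print(numbers)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```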


## 6. Conclusion and Further Study

Given this introduction to programming in Python, you are well prepared for the discussion activities ahead.  However, this assignment is far from a comprehensive view of all the features present in Python.  For example, this tutorial doesn't include Object-Oriented Programming (OOP), which allows the programmer to create "objects" according to some template "class".  In addition, Python packages will only be covered as they are relevant to the projects ahead; only a small portion of the utility of these packages will be covered.  Therefore, there still remains plenty of Python knowledge to be discovered through independent study.  Finally, to do research in any computational field, it's beneficial to be able to read and write code in many different languages.  For example, Python is a high-level language, which makes it easy to read and write but comparatively slow to execute.  Therefore, it would be beneficial to also understand how to write in a lower-level language (such as C/C++) for cases where calculations need to be performed efficiently.