# PyRosetta Python Lecture #

# Setting up with Python

We're going to use Python 3 and [Jupyter](https://jupyter.org/) notebooks this week.  To start up a notebook, from a terminal, `cd` to the directory you want to be in, and start up a notebook server using `jupyter notebook`.

If you're using a terminal, you may also want to try `ipython` or `ipython3`, which will give you things like tab-completion and syntax highlighting.

You should have already installed the necessary items here based on Shourya's email - if things aren't working, just follow along with a partner for now.

# Topics #

* Control Flow
* String operations
* Functions
* Data structures: lists, dicts, when to use what
* Classes
* Numpy
* Plotting
* PEP-8
* Virtual environments

# Basics

### Booleans, comparison/arithmetic operators, strings, control flow ###

Here are some basic string operations:

In [2]:
str1 = "String1" 
str2 = "String2"
print(str1 + str2)
str1 = "float %.2f %.3f" % (1.0, 3.141592)
print(str1)
str2 = "integer %d" % 2
print(str2)

String1String2
float 1.00 3.142
integer 2
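
The same formatting can also be written with f-strings (available from Python 3.6). This is a small sketch for comparison only; the variable names are made up for illustration.

```python
# f-string equivalents of the %-style formatting above (Python 3.6+).
value_a = 1.0
value_b = 3.141592
float_text = f"float {value_a:.2f} {value_b:.3f}"
int_text = f"integer {2:d}"
print(float_text)  # float 1.00 3.142
print(int_text)    # integer 2
```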


Control flow with: **if**, **else**, **for**, **while**

In [None]:
x = 25
if (x < 15):
    print("Less")
    x += 1
elif (x < 20):
    print("More")
else:
    print("Most")

**for** loops are good for iterating through a range of values, or for going through lists (later)

In [None]:
print("loop 1")
for i in range(5):
    print(i)
    
print("\nloop 2")
for i in range(10,2,-2):
    print(i)

**while** loops are good when the iteration pattern isn't as simple

In [None]:
i = 1
while i < 100:
    if i < 10:
        i += 2
    elif i % 3 == 1:
        i += 4
    else:
        i += 20
    print(i)

**continue** skips the rest of the current loop iteration; **break** exits the loop entirely

In [None]:
for i in range(10):
    if i % 4 == 0:
        continue
    print("Current number: {0}".format(i))

In [None]:
for i in range(10):
    if i % 4 == 0:
        continue
    print("Current number: {0}".format(i))
    for j in range(i % 4):
        if j > 1:
            break
        print("Counting remainders: {0}".format(j + 1))
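
To see the difference side by side, here is a minimal sketch that records which values each loop actually reaches. The list names are just for illustration.

```python
# Collect what each loop visits so the difference is easy to see.
visited_with_continue = []
for i in range(6):
    if i == 3:
        continue  # skip only this iteration, then carry on with i = 4
    visited_with_continue.append(i)

visited_with_break = []
for i in range(6):
    if i == 3:
        break  # leave the loop entirely; 4 and 5 are never reached
    visited_with_break.append(i)

print(visited_with_continue)  # [0, 1, 2, 4, 5]
print(visited_with_break)     # [0, 1, 2]
```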

Various installed **modules** give you additional functionality when you **import** them. One very useful one is `math`.

In [None]:
import math

math.sqrt(50)

Here's a useful one for string manipulation you used on the homework: `re`. 

Useful functions include searching for patterns in strings and substituting patterns. To find particular patterns in strings, you use **regex** searches.

In [3]:
import re 

text = "What are we hiding from this message?"

print(re.sub(r"r", '_', text))
print(re.sub(r" h(\w+)", ' _', text))
print(re.sub(r"(\w+)r(\w+)", '_', text))

print(re.match(r"hiding", text))
print(re.search(r"hiding", text))

What a_e we hiding f_om this message?
What are we _ from this message?
What _ we hiding _ this message?
None
<re.Match object; span=(12, 18), match='hiding'>
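
Why did `re.match` print `None`? It only tries to match at the very start of the string, while `re.search` scans the whole string. A short sketch of the difference, with `re.findall` added as one more commonly used function from the same module:

```python
import re

text = "What are we hiding from this message?"

# re.match only tries to match at the very start of the string.
print(re.match(r"hiding", text))         # None
print(re.match(r"What", text).group())   # What

# re.search scans the whole string for the first match.
found = re.search(r"hiding", text)
print(found.group(), found.span())       # hiding (12, 18)

# re.findall returns every non-overlapping match as a list.
# Here: every word that contains a lowercase "h".
words_with_h = re.findall(r"\b\w*h\w*\b", text)
print(words_with_h)                      # ['What', 'hiding', 'this']
```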


Another very useful one is `antigravity`.

In [None]:
import antigravity

# Functions

In [None]:
x = 25
if (x < 15):
    print("Less")
elif (x < 20):
    print("More")
else:
    print("Most")

x = 10
if (x < 16):
    print("Lest")
elif (x < 20):
    print("More")
else:
    print("Most")

x = 18
if (x < 15):
    print("Less")
elif (x < 20):
    print("More")
else:
    print("Most")

**Functions** improve readability, conciseness, and code maintenance

In [None]:
def double(x):
    return 2 * x

double(2)

In [None]:
def less_more_most(x):
    if (x < 15):
        print("Less")
    elif (x < 20):
        print("More")
    else:
        print("Most")

less_more_most(25)
less_more_most(10)
less_more_most(18)

### Scope ###
Which version of a variable does a function recognize? What is the scope of a function's changes?

In [None]:
x = 10
a = "A"
def less_more_most(x, y):
    if (x < 15):
        print("Less")
    elif (x < 20):
        print("More")
    else:
        print("Most")
    print(a)
    x = 15
    y = 30

y = 20
less_more_most(25, y)
less_more_most(10, y)
less_more_most(18, y)
print(x)
print(y)
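
In the cell above, the assignments to `x` and `y` inside the function only rebind local names, so the module-level `x` and `y` still print 10 and 20, while `a` is read from the enclosing (global) scope. The sketch below, with made-up function names, contrasts rebinding a parameter with mutating the object it refers to.

```python
def rebind(number, items):
    number = 99          # rebinds the local name only
    items = ["new"]      # also a local rebinding; the caller's list is untouched


def mutate(items):
    items.append("added")  # changes the object the caller also refers to


count = 1
names = ["original"]

rebind(count, names)
print(count, names)   # 1 ['original']

mutate(names)
print(names)          # ['original', 'added']
```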

### More advanced functions: ###
  * Using functions as variables
  * Variable number of arguments: `*argv` and `**kwargs`
  * **lambda functions**

In [4]:
def mult(*argv):
    ans = 1
    for arg in argv:
        ans *= arg
    return ans

def remainder(x, y):
    return x % y

def double(x):
    return 2 * x
        
def compose(a, b, *argv):
    return a(b(*argv))

print(compose(double, mult, 2, 3, 4))
print(compose(double, double, 2))
print(compose(double, remainder, 5, 3))

48
8
4
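
The cell above only uses `*argv`. Keyword arguments can be collected the same way with `**kwargs`. A minimal sketch, where `describe` is a hypothetical helper written just for this illustration:

```python
def describe(*argv, **kwargs):
    # argv is a tuple of the positional arguments,
    # kwargs is a dict of the keyword arguments.
    return "{0} positional, keywords: {1}".format(len(argv), sorted(kwargs))


summary = describe(1, 2, 3, name="Ala", charge=0)
print(summary)  # 3 positional, keywords: ['charge', 'name']

# A dict can be unpacked into keyword arguments with **.
options = {"name": "Gly"}
unpacked = describe(**options)
print(unpacked)  # 0 positional, keywords: ['name']
```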


In [6]:
my_tups = [(4,8), (3,5,89), (1,2)]
sorted_tups = sorted(my_tups, key = lambda x: x[1])
print(sorted_tups)

[(1, 2), (3, 5, 89), (4, 8)]


In [7]:
import re

def make_subst(find, repl):
    return lambda text : re.sub(find, repl, text)

dna_conv = make_subst("U", "T")
rna_conv = make_subst("T", "U")

print(dna_conv("AAUCGUUA"))
print(rna_conv("AATCGTTA"))

AATCGTTA
AAUCGUUA


# Data Structures

### Lists ### 


In [None]:
a = [1, 2, 3] # dynamic array
print(a)

Iterating over lists and accessing list elements:

In [None]:
for value in a:
    print(value)
    
print(a[0])

You can append to lists using `.append()` or by doing list concatenations.

In [None]:
a.append(5)
print(a)

b = a + [10]
print(a)
print(b)

Other operations include `pop()`, `insert()`, `remove()`, `extend()`, etc. (Python lists have no `push()`; use `append()` instead.)

In [None]:
while len(a) > 0 :
    print(a.pop())
print(a)

A mistake that can be easy to make...

In [None]:
a = [1, 2, 4, 5]
for value in a:
    if value % 2 == 0:
        a.remove(value)
    print("Current value: {0}".format(value))
print(a) # Why did this not get rid of all the even values in the list?
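
What goes wrong: removing 2 shifts 4 into the slot the loop has just finished with, so the loop moves on to 5 and never looks at 4. One possible fix, sketched here, is to loop over a copy of the list while removing from the original.

```python
a = [1, 2, 4, 5]
# Iterate over a shallow copy (a[:]) so removals from `a`
# do not shift the items the loop still has to visit.
for value in a[:]:
    if value % 2 == 0:
        a.remove(value)
print(a)  # [1, 5]
```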

### List Comprehensions ###

Python's list comprehensions let you create lists based on other lists.

In [8]:
a = [1, 2, 4, 5]
b = [x for x in a if x % 2 != 0]
print(b)

[1, 5]


You can achieve something similar to set notation.

$$ S = \{ x \mid 0 \le x \le 20,\ x \bmod 3 = 0 \} $$

In [9]:
S = [x for x in range(21) if x % 3 == 0]
S

[0, 3, 6, 9, 12, 15, 18]

List comprehensions can elegantly solve problems that might otherwise need a lot of for loops.

In [10]:
S = [(i,j,k) for i in range(2) for j in range(2) for k in range(2)]
S

[(0, 0, 0),
 (0, 0, 1),
 (0, 1, 0),
 (0, 1, 1),
 (1, 0, 0),
 (1, 0, 1),
 (1, 1, 0),
 (1, 1, 1)]
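
For comparison, here is a sketch of the same result built with explicit nested loops. The names `S_loops` and `S_comp` are only for this illustration.

```python
# The same eight tuples built with explicit nested loops.
S_loops = []
for i in range(2):
    for j in range(2):
        for k in range(2):
            S_loops.append((i, j, k))

# The comprehension reads its `for` clauses in the same outer-to-inner order.
S_comp = [(i, j, k) for i in range(2) for j in range(2) for k in range(2)]
print(S_loops == S_comp)  # True
```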

### Other Collections: Dictionaries, Deque, Sets ###

Other collections exist to hold data in various ways. We'll discuss dictionaries and sets.

In [11]:
animal_legs = {"snake": 0, "dog": 4, "duck": 2}

In [13]:
snake_legs = animal_legs["snake"]
print("snake: {0},{1}".format(snake_legs, 23))

snake: 0,23


In [14]:
animal_legs["centipede"] = 100
for k, v in animal_legs.items():
    print("{0}: {1}".format(k, v))

snake: 0
dog: 4
duck: 2
centipede: 100


In [15]:
'hello, %(centipede)d' % animal_legs

'hello, 100'

`defaultdict` lets you elegantly avoid casework for setting initial values for keys.

In [None]:
from collections import defaultdict

count_aa = defaultdict(int)  # missing keys start at int() == 0

seq = "WKKAAGKLKW"
for aa in seq:
    count_aa[aa] += 1
print(count_aa["W"])
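
To see the casework that `defaultdict` avoids, here is a sketch of the same tally with a plain `dict`, plus `collections.Counter` as another standard-library option for counting. The variable names are illustrative.

```python
from collections import Counter

seq = "WKKAAGKLKW"

# Plain dict: every update needs a check for a missing key.
count_plain = {}
for aa in seq:
    if aa not in count_plain:
        count_plain[aa] = 0
    count_plain[aa] += 1

# Counter does the same tally in one call.
count_counter = Counter(seq)

print(count_plain["W"], count_counter["K"])  # 2 4
```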

`set` is a container that holds unique values. You can make a set out of a list. Note: order is not preserved. Small integers often print in sorted order, but that is an implementation detail and not a guarantee.

In [None]:
a_list = [1, 2, 6, 1, 3, 2, 4]
a_set = set(a_list)
print(a_list)
print(a_set)

### Data Structure Decision Making ###

Compare based on properties of how values are stored in each collection.

  * `Lists`: ordered, indexed, mutable, duplicates

  * `Dict`: insertion-ordered (Python 3.7+), indexed by key, mutable, unique keys (values may repeat)

  * `Deque`: ordered, indexed (fast at the ends, slow in the middle), mutable, duplicates

  * `Set`: unordered, unindexed, mutable (`frozenset` is the immutable variant), no duplicates

  * `Tuple`: ordered, indexed, immutable, duplicates

Compare based on how efficient and easy basic operations are with each collection:

  * Get an item by index: `Lists` and `Dict` beat `Set` and `Deque`
  * Remove the first item in an ordered collection: `Deque` is better than `Lists`
  * Check if an item is in the collection: `Set` and `Dict` beat `Lists`
  * `Set` gives access to basic set operation implementations
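
A short sketch of two of the operations compared above: removing from the front of a `deque`, and membership plus set algebra with `set`. The job names and the `residues_of_interest` set are made up for illustration.

```python
from collections import deque

# deque: appending on the right and popping from the left are both O(1),
# whereas list.pop(0) has to shift every remaining item.
queue = deque(["job1", "job2", "job3"])
queue.append("job4")
first = queue.popleft()
print(first, list(queue))  # job1 ['job2', 'job3', 'job4']

# set: membership tests and the usual set algebra.
residues_of_interest = {"A", "L", "W"}
in_sequence = set("WKKAAGKLKW")
print("W" in residues_of_interest)                 # True
print(sorted(residues_of_interest & in_sequence))  # ['A', 'L', 'W']
print(sorted(in_sequence - residues_of_interest))  # ['G', 'K']
```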

**Question 1:** You're the manager of a cluster, assembling a list of Rosetta job submissions. Rosetta is insanely popular, and your list of tasks is very very long. You're going to send off tasks in the order they were submitted. What collection are you using to store the requests?

**Question 2:** You've got a blacklist of bad words. You're going to go through the text of the Odyssey, and do some text processing on each word but only if it's not bad. How do you want to store the list of bad words?

**Question 3:** You're tallying votes in a recent election for 3 candidates by voter age group ("Young", "Middle", and "Old"). You want to later be able to look up the votes by age group and candidate: for instance, "Young" and "Candidate Martha". How do you store the vote counts?

# Classes

Everything in Python is an object: a value stored somewhere in memory that you can apply functions to. `id` returns an object's unique identity (in CPython, its memory address). `is` checks whether two names refer to the same object.

In [None]:
x = "hi"
print(isinstance(x, object))
print(id(x))
z = x
print(z is x)
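
It is easy to confuse `is` with `==`. The sketch below uses lists to show that `==` compares contents while `is` compares identity.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)  # True: same contents
print(a is b)  # False: two separate list objects
print(a is c)  # True: c is another name for the same object

c.append(4)
print(a)       # [1, 2, 3, 4]: the change is visible through both names
```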

You can make your own types of objects by making a **class**. You then make objects that are **instances** of your class, and give your class some functions (called **methods**).

In [None]:
class Teacher:
    def __init__(self, name = "Mrs. Boring"):
        self.name = name
        
    def __str__(self):
        return "Hi I'm " + self.name
    
    def grade_homework(self):
        print("Grade inflation for all! A.")

teacher = Teacher()
print(teacher)
teacher = Teacher("Ramya")
print(teacher)
teacher.grade_homework()

This is useful because you are encapsulating member functions so others don't need to know how they work, and you're not cluttering the global namespace. 

You can also have classes that **inherit** from one another. The base class is the parent class, and classes that inherit from it are child classes.

In [None]:
class Teacher:
    def __init__(self, name = "Mrs. Boring"):
        self.name = name
        
    def __str__(self):
        return "Teacher: " + self.name
    
    def give_candy(self):
        raise NotImplementedError

teacher = Teacher()
print(teacher)
teacher.give_candy()

In [None]:
class NiceTeacher(Teacher):
    def __init__(self):
        Teacher.__init__(self, "Mrs. Nice")
        
    def give_candy(self):
        print("Here's some candy!!!")

class MeanTeacher(Teacher):
    def __init__(self):
        Teacher.__init__(self, "Mrs. Mean")
        self.ask_count = 0
    
    def give_candy(self):
        if self.ask_count > 1:
            print("Here's some candy!!!")
            self.ask_count = 0
        else:
            print("Not this time.")
            self.ask_count += 1

nice_teacher = NiceTeacher()
print(nice_teacher)
nice_teacher.give_candy()

mean_teacher = MeanTeacher()
print(mean_teacher)
mean_teacher.give_candy()
mean_teacher.give_candy()
mean_teacher.give_candy()
mean_teacher.give_candy()

In [None]:
all_teachers = [NiceTeacher(), NiceTeacher(), MeanTeacher(), NiceTeacher()]

for teacher in all_teachers:
    print(teacher)
    teacher.give_candy()

What we can achieve: *Polymorphism*, *class inheritance*, *encapsulation*

In [None]:
from abc import ABC, abstractmethod

class Teacher(ABC):
    def __init__(self, name = "Mrs. Boring"):
        self.name = name
        
    def __str__(self):
        return "Teacher: " + self.name
    
    @abstractmethod
    def give_candy(self):
        pass

teacher = Teacher()
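
The last line of the cell above is expected to fail: a class with an unimplemented `@abstractmethod` cannot be instantiated, and Python raises a `TypeError`. Here is a self-contained sketch of the same behavior using hypothetical `Shape` and `Square` classes, so it does not depend on the `Teacher` definitions.

```python
from abc import ABC, abstractmethod


class Shape(ABC):  # hypothetical example class
    @abstractmethod
    def area(self):
        pass


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


try:
    Shape()  # abstract: area() has no implementation
    instantiated = True
except TypeError as err:
    instantiated = False
    print("TypeError:", err)

square_area = Square(3).area()  # the concrete subclass works
print(square_area)  # 9
```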

In [None]:
class NiceTeacher(Teacher):
    def __init__(self):
        Teacher.__init__(self, "Mrs. Nice")
        
    def give_candy(self):
        print("Here's some candy!!!")

class MeanTeacher(Teacher):
    def __init__(self):
        Teacher.__init__(self, "Mrs. Mean")
        self.ask_count = 0
    
    def give_candy(self):
        if self.ask_count > 1:
            print("Here's some candy!!!")
            self.ask_count = 0
        else:
            print("Not this time.")
            self.ask_count += 1

nice_teacher = NiceTeacher()
print(nice_teacher)
nice_teacher.give_candy()

mean_teacher = MeanTeacher()
print(mean_teacher)
mean_teacher.give_candy()
mean_teacher.give_candy()
mean_teacher.give_candy()
mean_teacher.give_candy()

# Numpy / Scipy

`numpy` allows you to easily manipulate arrays of data. Its functions are implemented very efficiently!

For this section, if you don't have numpy installed, don't worry about running the code here.

### Making Arrays ###

In [17]:
import numpy as np

l = [1, 2, 3]
arr = np.array([1, 2, 3])
print(l)
print(arr) # Not a list!

arr2 = np.zeros((3,4))
print(arr2)

arr3 = np.zeros((2,5), dtype=int)
print(arr3)

[1, 2, 3]
[1 2 3]
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[0 0 0 0 0]
 [0 0 0 0 0]]


`numpy` lets you easily make multi-dimensional arrays.

In [None]:
# 1-dimensional arrays
x = np.linspace(0,10,5) # linear spacing of points
print(x)
y = np.random.rand(5) # numbers between 0 and 1
print(y)
z = np.random.randn(5) # normal random variables
print(z)

In [None]:
# n-dimensional arrays
y = np.random.rand(10,10,5)
print(y.shape)

### Indexing Arrays ###

`x[i,j]` returns value at $i$th row and $j$th column of $x$

**slicing**

`x[i,:]` returns entire $i$th row

`x[:,j]` returns entire $j$th column

In [None]:
# slices
l = [1,2,3,4,5,6,7]
print(l[2:4])
print(l[-1])
print(l[6:])
print(l[-3:])
print()

x = np.arange(9)
x = np.resize(x, (3,3))
print(x)
print(x[0,:])
print(x[:,0])
print(x[0,1:])
print(x[0,-2:])

In [19]:
y = np.arange(10)
print(y)
print(y[0:10:2]) 
print(y[::2])
print(y[::-1])

[0 1 2 3 4 5 6 7 8 9]
[0 2 4 6 8]
[0 2 4 6 8]
[9 8 7 6 5 4 3 2 1 0]


### Functions on Arrays ### 

There are functions that are **vectorized** and can be applied to every item in an array. If you can figure out a way to process your data in this way, do it! It's faster than looping through large Python lists.

In [None]:
x = np.arange(4)
print(x)
y = np.square(x)
print(y)
print(x * 3)
print(np.dot(x, y))

You can make custom vectorized functions.

In [None]:
def f(x):
    y = x*x
    return y + 2

x = np.arange(4)
print(x)
vf = np.vectorize(f)
print(vf(x))

print(f(x)) # This function automatically vectorizes nicely
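
One caveat worth knowing: `np.vectorize` is mainly a convenience wrapper and still loops over the elements in Python, so it is not a speed-up by itself. It helps when a function cannot take an array directly, as in this sketch (the function `clip_negative` is made up for illustration and assumes numpy is installed).

```python
import numpy as np


def clip_negative(value):
    # Uses a plain Python `if`, which does not work on a whole array at once.
    if value < 0:
        return 0
    return value


x = np.array([-2, -1, 0, 1, 2])

# np.vectorize wraps the scalar function so it accepts arrays;
# internally it still calls the function once per element.
wrapped = np.vectorize(clip_negative)
print(wrapped(x))        # [0 0 0 1 2]

# A built-in array function gives the same result without a Python-level loop.
print(np.maximum(x, 0))  # [0 0 0 1 2]
```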

# Plotting basics #

In [None]:
import matplotlib.pyplot as plt

# Get a list of evenly spaced numbers from 0 to 1
x = [a/100. for a in range(100)]
y = [a**2 for a in x]
plt.plot(x, y)
plt.show()

In [None]:
import math

# Get a list of evenly spaced numbers from 0 to 2pi
theta = [a/100. for a in range(int(100*2*math.pi))]
x = [math.cos(a) for a in theta]
y = [math.sin(a) for a in theta]
plt.scatter(x, y)
plt.show()

In [None]:
theta = [a/100. for a in range(int(100*2*math.pi))]
x = [math.cos(a) for a in theta]
plt.hist(x)
plt.show()

This is really truly the bare bones. If you're interested, ggplot can make nice plots.

# PEP-8

### Code Style Exercise ###
The code below works but doesn't look great. What can we do to make it better? Think code style, not function.

In [None]:
class_bonus=50
class Teacher:
    def __init__( self , name="Mrs. Boring" ):
        self.name=name
        self.test1=95
        self.test2=97
    def __str__( self ):
        return "Teacher: "+self.name
    def GiveFeedback( self ):
        if self.test1>75: print("I'm giving lots of feedback because I'm a very good instructor and care a lot about my students.")
    def CalculateGrade( self ):
        score1                   = self.test1
        score2                   = self.test2
        bonus_for_being_in_class = class_bonus
        return (score1+score2+bonus_for_being_in_class)/3
teacher=Teacher()
print(teacher.CalculateGrade())
teacher.GiveFeedback()

### PEP-8 Basics ###
* Spacing: 4-space indents. Vertically align the continuation lines of long expressions, or use hanging indents.
* Blank lines: surround top-level functions and classes with two blank lines, and separate methods inside a class with one blank line.
* Avoid extra spaces, for example just inside parentheses.
* Put a single space around operators (including assignment).
* No spaces around `=` when setting a function's default argument.
* Indent `if` blocks, even if they are one-liners.
* Variable names: descriptive. A convention: class names in CapWords, constants in UPPERCASE, methods and functions in lowercase_with_underscores.
* Break lines longer than 79 characters.

### Revised Code ###

In [None]:
BONUS = 50


class Teacher:
    def __init__(self, name="Mrs. Boring"):
        self.name = name
        self.test1 = 95
        self.test2 = 97

    def __str__(self):
        return "Teacher: " + self.name

    def give_feedback(self):
        if self.test1 > 75:
            print("I'm giving lots of feedback because I'm a very "
                  "good instructor and care a lot about my students.")

    def calculate_grade(self):
        score1 = self.test1
        score2 = self.test2
        bonus_for_being_in_class = BONUS
        return (score1 + score2 + bonus_for_being_in_class) / 3


teacher = Teacher()
print(teacher.calculate_grade())
teacher.give_feedback()