# Python Fundamentals: A Crash Course 

### Instructor: Soubhik Barari

#### (with solutions)

Welcome! This Jupyter notebook serves as a walk-through of the most fundamental features of the Python programming language *specifically for data science*. Therefore, this is a *practical* rather than *comprehensive* guide, giving you the core essentials to get started with data science. 

<a class="anchor" id="toc"></a>
### <u>Table of Contents</u>
1. [**Basic Types**](#basic-types)
2. [**Basic Data Structures (`list` and `dict`)**](#data-struct)
3. [**Control Flow**](#control-flow)
4. [**Functional and Object-Oriented Programming**](#func-obj)
5. [**Working with Strings (`str`)**](#str)**
6. [**Handling Errors**](#err)**
7. [**Reading and Writing**](#rw) **
8. [**System Libraries**](#sys-lib) **
9. [**Miscellaneous Topics**](#misc) **

<sub>** = optional</sub>

**TIP:** when first learning Python, it's a good idea to have the documentation up next to your workspace. For Python 2.7 (the version of Python used in this notebook), that is:

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**[`https://docs.python.org/2.7/`](https://docs.python.org/2.7/)**


This notebook assumes that you have some basic exposure to or proficiency with the **`R`** programming language or an imperative programming language of similar complexity.



## 1. Basic Types  <a class="anchor" id="basic-types"></a>

First we're going to go over the most fundamental building blocks of Python: basic data types. <br>

As you would expect, Python has data types for integers (**`int`**), strings (**`str`**), booleans (**`bool`**), and floating point numbers/decimals (**`float`**). If you're familiar with R, these types translate pretty much directly, with minor syntactical differences. Python also has a special value, **`None`**, which plays a role similar to R's **`NULL`**.

In [5]:
a = 17
b = "Hello world"
c = 806.0
d = """
    This is a multiline string.
    """
e = True
f = None

You can write comments with a single **`#`** just like in R.

In [6]:
# This is a comment!

The print command is simply **`print`**.

In [7]:
print a
print b
print c
print d

17
Hello world
806.0

    This is a multiline string.
    


To discover the data type of any variable, use **`type`**, Python's equivalent of R's **`typeof`**.

In [8]:
print type(a)
print type(b); print type(c); print type(d)

<type 'int'>
<type 'str'>
<type 'float'>
<type 'str'>


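The usual syntax for arithmetic in R mostly still applies. There is one catch: in Python 2.7, dividing an **`int`** by an **`int`** performs integer (floor) division, so the fractional part is dropped.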

In [9]:
print a + c
print a / 5

823.0
3


In [10]:
print float(a) / 5

3.4
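A few arithmetic operators are spelled differently from R. The sketch below demonstrates them, assuming `a = 17` as defined earlier. It uses single-argument `print(...)` calls so that it behaves the same under Python 2.7 and Python 3:

```python
a = 17

print(a // 5)        # floor division (R: %/%) -> 3
print(a % 5)         # remainder (R: %%) -> 2
print(a ** 2)        # exponentiation (R: ^) -> 289
print(float(a) / 5)  # cast to float first to get true division -> 3.4
```

If you want `/` to always perform true division in Python 2.7, you can put `from __future__ import division` at the very top of a file or notebook.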



### 1.1. **Exercise:** 
Concatenate **`a`, `b`, `c`, `d`** into a single **`str`** object and display the output: 


### <div style="color:red">Solution:</div>

In [11]:
str(a) + b + str(c) + d
# And the award for the ugliest string in the world goes to...

'17Hello world806.0\n    This is a multiline string.\n    '

**TIP**: when you want to learn more about a variable's *class*, use the **`help`** function!

In [12]:
help(type(d))

Help on class str in module __builtin__:

class str(basestring)
 |  str(object='') -> string
 |  
 |  Return a nice string representation of the object.
 |  If the argument is a string, the return value is the same object.
 |  
 |  Method resolution order:
 |      str
 |      basestring
 |      object
 |  
 |  Methods defined here:
 |  
 |  __add__(...)
 |      x.__add__(y) <==> x+y
 |  
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x
 |  
 |  __eq__(...)
 |      x.__eq__(y) <==> x==y
 |  
 |  __format__(...)
 |      S.__format__(format_spec) -> string
 |      
 |      Return a formatted version of S as described by format_spec.
 |  
 |  __ge__(...)
 |      x.__ge__(y) <==> x>=y
 |  
 |  __getattribute__(...)
 |      x.__getattribute__('name') <==> x.name
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __getnewargs__(...)
 |  
 |  __getslice__(...)
 |      x.__getslice__(i, j) <==> x[i:j]
 |      
 |      Use of negative indices is not supported.
 |  


### <div style="color:purple">SUMMARY:</div>
- Arithmetic operations work mostly as in **`R`**.


- **`float`** and **`int`** behave differently under division.


- Explicit type-casting (e.g. **`float()`**, **`str()`**) is often necessary!


- Functions to save your life: **`print`**, **`type`** and **`help`**.
<br><br>
<div style="color:gray; text-align:right; font-weight:bold;">[BACK TO TOP  &#8593;](#toc)</div>

## 2. Basic Data Structures: **`list`** and **`dict`** <a class="anchor" id="data-struct"></a>

We are going to go over the two most important data structures that can "collect" and organize multiple variables of various data types.

### 2.1. <u>**`list`**</u>

An ordered collection that can contain variables of multiple data types and is accessed via numerical index. Can be modified (*mutable*).

In [13]:
states_list = ["MA", "CA", "TX", "PA"]          # INSTANTIATION

Ways to index/slice into a Python list **`l`**:
* **`l[i]`** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; - access the single element at index **`i`** (i.e. the **`(i+1)`<sup>th</sup>** element)
* **`l[start:end]`** &nbsp;&nbsp;&nbsp;&nbsp; - access all elements from index **`start`** up to, but not including, index **`end`**
* **`l[start:end:n]`**&nbsp; - access every **`n`<sup>th</sup>** element from index **`start`** up to, but not including, index **`end`**

**TIP:** Python is a zero-indexed programming language!

In [14]:
print states_list[0]
print states_list[0:2]
print states_list[0:3:2]

MA
['MA', 'CA']
['MA', 'TX']


**TIP:** The step **`n`** can be negative. In that case the slice runs backwards from **`start`** toward (but not including) **`end`**, moving **`|n|`** elements at a time.

In [15]:
print states_list[3:0:-1]

['PA', 'TX', 'CA']
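The sketch below shows a few more slicing shortcuts on the same four-element list. Omitted `start`/`end` values default to the ends of the list, and negative indices count back from the end:

```python
states_list = ["MA", "CA", "TX", "PA"]

last_state = states_list[-1]      # negative indices count back from the end
first_two = states_list[:2]       # omitted start defaults to 0; index 2 is excluded
every_other = states_list[::2]    # omitted start/end cover the whole list
backwards = states_list[::-1]     # a step of -1 walks the whole list in reverse

print(last_state)
print(first_two)
print(every_other)
print(backwards)
```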


In [16]:
print len(states_list)                         # LENGTH

4

In [17]:
states_list = states_list + ["AZ", "MN"]       # CONCATENATION
print states_list

['MA', 'CA', 'TX', 'PA', 'AZ', 'MN']


In [18]:
print ["Here's Johnny!"]*3                     # REPETITION

["Here's Johnny!", "Here's Johnny!", "Here's Johnny!"]


In [19]:
print sorted(states_list)                      # SORTING (way 1)
states_list.sort()                             # SORTING (way 2)
print states_list

['AZ', 'CA', 'MA', 'MN', 'PA', 'TX']
['AZ', 'CA', 'MA', 'MN', 'PA', 'TX']


In [20]:
states_list.insert(1, "AZ")                    # INSERT
print states_list

['AZ', 'AZ', 'CA', 'MA', 'MN', 'PA', 'TX']


In [21]:
states_list.append(17)                         # APPEND
print states_list

['AZ', 'AZ', 'CA', 'MA', 'MN', 'PA', 'TX', 17]


In [22]:
print states_list.index("AZ")                  # FIND (get index)
print "AZ" in states_list                      # FIND (get y/n answer)
print "PA" in states_list

0
True
True


In [23]:
states_list.count("AZ")                        # COUNT

2

In [24]:
zip(["MA","AZ","PA"],["Mass.","Ariz","Penn."]) # ZIP

[('MA', 'Mass.'), ('AZ', 'Ariz'), ('PA', 'Penn.')]

In [25]:
print min([5.15, 10.00, 7.25, 11.00])          # MINIMUM
print max([5.15, 10.00, 7.25, 11.00])          # MAXIMUM

5.15
11.0


In [26]:
print range(0, 7)                              # RANGES
print range(0, len(states_list))
print range(len(states_list))

[0, 1, 2, 3, 4, 5, 6]
[0, 1, 2, 3, 4, 5, 6, 7]
[0, 1, 2, 3, 4, 5, 6, 7]


### 2.2. <u>**`dict`**</u>

A collection of items (**values**) that can be accessed by corresponding labels (**keys**).  

In [4]:
min_wage_dict = {                              # INSTANTIATION (way 1)
    "MA"  : 11.00,
    "AZ"  : 10.00,
    "GA"  : 5.15,
    "TX"  : 7.25
}

states_list = ["MA", "AZ", "GA", "TX"]         # INSTANTIATION (way 2)
min_wages = [11.00, 10.00, 5.15, 7.25]
min_wage_dict = dict(zip(states_list, min_wages))

min_wage_dict = dict(MA=11.00,                 # INSTANTIATION (way 3)
                     AZ=10.00,
                     GA=5.15,
                     TX=7.25)
print min_wage_dict

{'AZ': 10.0, 'MA': 11.0, 'TX': 7.25, 'GA': 5.15}


In [8]:
min_wage_dict["MA"] = 15.00                    # UPDATE VALUE(S)
print min_wage_dict["MA"]

15.0


In [29]:
print min_wage_dict.items()                    # FLATTEN DICTIONARY

[('AZ', 10.0), ('MA', 15.0), ('TX', 7.25), ('GA', 5.15)]


**TIP:** a **`tuple`** is just a read-only list. Note that each key-value pair that comes out of **`min_wage_dict.items()`** is a tuple!
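Here is what "read-only" means in practice. Indexing and unpacking a tuple work just as they do for a list, but assigning to an element raises a `TypeError`. The pair below is a made-up example. The `try/except` syntax used to catch the error is covered in Section 6.

```python
pair = ("MA", 15.0)       # a (key, value) pair like those returned by .items()
state, wage = pair        # tuples unpack neatly into separate variables
print(pair[0])            # indexing works just like a list

try:
    pair[1] = 11.0        # ...but assigning to an element is not allowed
    was_modified = True
except TypeError:
    was_modified = False
    print("Tuples are read-only!")
```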

In [30]:
print "States:", min_wage_dict.keys()          # GET KEYS
print "Wages:", min_wage_dict.values()         # GET VALUES

States: ['AZ', 'MA', 'TX', 'GA']
Wages: [10.0, 15.0, 7.25, 5.15]
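Looking up a key that is not in a **`dict`** raises a `KeyError` (see Section 6). The sketch below shows two safer patterns: testing membership with `in`, and calling `.get()` with a fallback value. The dictionary is a copy of `min_wage_dict` after the update above, and `"NY"` is just an example of a missing key.

```python
min_wage_dict = {"MA": 15.00, "AZ": 10.00, "GA": 5.15, "TX": 7.25}

print("NY" in min_wage_dict)                     # membership tests check the keys
ny_wage = min_wage_dict.get("NY")                # missing key -> None, no error
ny_wage_or_zero = min_wage_dict.get("NY", 0.0)   # ...or a default of your choice
ma_wage = min_wage_dict.get("MA", 0.0)           # present key -> its stored value

print(ny_wage)
print(ny_wage_or_zero)
print(ma_wage)
```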


### 2.3. **Exercise:** 
Write some code to find the lowest minimum wage in the states in our "sample" and output the state name, using either the **`dict`** or the **`list`** variables.

### <div style="color:red">Solution:</div>

In [31]:
sol_A = states_list[min_wages.index(min(min_wages))]
sol_B = min(min_wage_dict.items(), key=lambda (k,v): (v,k))[0]

### 2.4. **Exercise:** 
Write some code to find the average minimum wage in the states in our "sample" rounded to the nearest dollar, using either the **`dict`** or the **`list`** variables.


### <div style="color:red">Solution:</div>

In [32]:
int(round(sum(min_wages)/len(min_wages)))

8

**TIP:** A **`list`** can be converted to a **`set`**, an unordered collection that stores only the unique elements of the list:

In [33]:
set([1,2,2,2,2,3,3,3,5,5,5])

{1, 2, 3, 5}
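Sets also support the usual mathematical set operations. The two groupings of state abbreviations below are arbitrary, chosen only for illustration. The results are passed through `sorted` because sets have no fixed order.

```python
group_a = set(["MA", "NY", "GA", "FL"])   # arbitrary example groupings
group_b = set(["MA", "NY", "CA", "WA"])

both = group_a & group_b        # intersection: in both groups
either = group_a | group_b      # union: in at least one group
a_only = group_a - group_b      # difference: in group_a but not group_b

print(sorted(both))
print(sorted(either))
print(sorted(a_only))
```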

### <div style="color:purple">SUMMARY</div>
- A **`list`** can be indexed, sliced using a range, or sliced using intervals.


- The first element of a list has index 0!


- A **`dict`** is a "named" list.


- There are many global functions (e.g. **`zip`**) and class functions (e.g. **`d.items()`**) that are applicable to **`list`** and **`dict`** objects respectively. 


- A **`tuple`** is a read-only list.


- A **`set`** is an unordered, de-duplicated collection of elements.
<br><br>
<div style="color:gray; text-align:right; font-weight:bold;">[BACK TO TOP  &#8593;](#toc)</div>

## 3. Control Flow <a class="anchor" id="control-flow"></a>

Now that we know how to create variables as well as organize them into some simple but powerful data structures, let's learn how to "traverse" our data structures in order to do tasks that involve *more than one element of a data structure*.

### 3.1. <u>Loops</u>

**TIP:** Indentation matters in Python!!

In [34]:
for s in states_list:                # FOR-loop
    print s

MA
AZ
GA
TX


In [35]:
for i in range(len(states_list)):    # FOR-loop (over range)
    print i

0
1
2
3


In [36]:
for (i, s) in enumerate(states_list): # FOR-loop (over enumerated list)
    print i, s

0 MA
1 AZ
2 GA
3 TX


In [37]:
for (i, s) in enumerate(states_list):# FOR-loops (nested)
    print i
    for char in s:
        print "'"+char+"'"

0
'M'
'A'
1
'A'
'Z'
2
'G'
'A'
3
'T'
'X'


In [38]:
i = 0                                # WHILE-loop
while (i < (len(states_list)-1)):
    print states_list[i]
    i = i + 1

MA
AZ
GA


### 3.2. <u>Conditional statements</u>

In [39]:
for (i, s) in enumerate(states_list): # IF statement
    if s[1] == "A":
        print s

MA
GA


In [40]:
for (i, s) in enumerate(states_list): # IF/ELSE statements
    if i == 0:
        print "BEST STATE:",s
    elif i == (len(states_list)-1):
        print "WORST STATE:",s
    else:
        continue                      # CONTINUE statement

BEST STATE: MA
WORST STATE: TX


In [41]:
for (i, s) in enumerate(states_list):
    if "G" in s:
        print s
        print "Found a state w/'G' in it! DONE!" 
        break                         # BREAK statement

GA
Found a state w/'G' in it! DONE!


**TIP:** You can create an entire conditional clause in one line ... similar to calling **`ifelse()`** in **`R`**:

In [42]:
has_best_state = True if "MA" in states_list else False
print has_best_state

True
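Because `"MA" in states_list` already evaluates to a **`bool`**, the one-liner above is mostly illustrative. The pattern is more useful when the two branches produce different values. The sketch below uses a two-state subset of the minimum-wage data introduced in Section 3.3 and a cutoff chosen arbitrarily:

```python
S2W_sample = {"MA": 11.0, "TX": 7.25}   # two entries from the S2W data in 3.3
threshold = 10.0                         # arbitrary cutoff for illustration

labels = {}
for s in S2W_sample:
    labels[s] = "high" if S2W_sample[s] >= threshold else "low"

print(labels["MA"])
print(labels["TX"])
```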


### 3.3. <u>**Application: State minimum wage data**</u>

Below, we have dictionaries denoting which *region* each of the states the U.S. belongs to as well as the corresponding *majority party* in the state legislature (in 2017).



In [43]:
R2S   = {'Mountain': ['AZ', 'CO', 'ID', 'MT', 'NM', 'NV', 'UT', 'WY'], 'South Atlantic': ['DC', 'DE', 'FL', 'GA', 'MD', 'NC', 'SC', 'VA', 'WV'], 'New England': ['CT', 'MA', 'ME', 'NH', 'RI', 'VT'], 'East North Central': ['IL', 'IN', 'MI', 'OH', 'WI'], 'West North Central': ['IA', 'KS', 'MN', 'MO', 'ND', 'NE', 'SD'], 'Pacific': ['AK', 'CA', 'HI', 'OR', 'WA'], 'Middle Atlantic': ['NJ', 'NY', 'PA'], 'West South Central': ['AR', 'LA', 'OK', 'TX'], 'East South Central': ['AL', 'KY', 'MS', 'TN']}
S2P   = {'WA': 'Dem', 'DE': 'Dem', 'WI': 'Rep', 'WV': 'Rep', 'HI': 'Dem', 'FL': 'Rep', 'WY': 'Rep', 'NH': 'Rep', 'NJ': 'Dem', 'NM': 'Dem', 'TX': 'Rep', 'LA': 'Rep', 'NC': 'Rep', 'ND': 'Rep', 'TN': 'Rep', 'NY': 'Dem', 'PA': 'Rep', 'CA': 'Dem', 'NV': 'Dem', 'VA': 'Rep', 'CO': 'Split', 'AK': 'Rep', 'AL': 'Rep', 'AR': 'Rep', 'VT': 'Dem', 'IL': 'Dem', 'GA': 'Rep', 'IN': 'Rep', 'IA': 'Rep', 'OK': 'Rep', 'AZ': 'Rep', 'ID': 'Rep', 'CT': 'Split', 'ME': 'Split', 'MD': 'Dem', 'MA': 'Dem', 'OH': 'Rep', 'UT': 'Rep', 'MO': 'Rep', 'MN': 'Rep', 'MI': 'Rep', 'RI': 'Dem', 'KS': 'Rep', 'MT': 'Rep', 'MS': 'Rep', 'SC': 'Rep', 'KY': 'Rep', 'OR': 'Dem', 'SD': 'Rep'}
S2W   = {'WA': 11.0, 'DE': 8.25, 'DC': 12.5, 'WI': 7.25, 'WV': 8.75, 'HI': 9.25, 'FL': 8.01, 'WY': 7.25, 'NH': 7.25, 'NJ': 8.44, 'NM': 7.5, 'TX': 7.25, 'LA': 7.25, 'NC': 7.25, 'ND': 7.25, 'NE': 9.0, 'TN': 7.25, 'NY': 9.70, 'PA': 7.25, 'CA': 10.5, 'NV': 8.25, 'VA': 7.25, 'CO': 9.31, 'AK': 9.81, 'AL': 7.25, 'AR': 8.5, 'VT': 10.0, 'IL': 8.25, 'GA': 7.25, 'IN': 7.25, 'IA': 7.25, 'OK': 7.25, 'AZ': 10.0, 'ID': 7.25, 'CT': 10.1, 'ME': 9.0, 'MD': 9.25, 'MA': 11.0, 'OH': 8.15, 'UT': 7.25, 'MO': 7.71, 'MN': 9.5, 'MI': 8.91, 'RI': 9.60, 'KS': 7.25, 'MT': 8.15, 'MS': 7.25, 'SC': 7.25, 'KY': 7.25, 'OR': 10.25, 'SD': 8.65}

In [44]:
print R2S["New England"]
print S2P["AZ"]
print S2W["MA"]

['CT', 'MA', 'ME', 'NH', 'RI', 'VT']
Rep
11.0


#### 3.3.1. **Exercise:** 
First, output which party has the most state legislature majorities across all states.

#### <div style="color:red">Solution:</div>

In [45]:
parties = []
counts = []

for party in set(S2P.values()):
    parties.append(party)
    counts.append(S2P.values().count(party))
    
parties[counts.index(max(counts))]

# Note: Using loops is pretty painful and gross here! Hold this 
#       pain close to your heart as we go into functional
#       programming for a sigh of relief.

'Rep'

#### 3.3.2. **Exercise:** 
Next, calculate the average minimum wage for states with *Democratic* state legislatures and *Republican* state legislatures respectively.

#### <div style="color:red">Solution:</div>

In [46]:
dem_mw = []
rep_mw = []

for state, party in S2P.items():
    if party == "Dem":
        dem_mw.append(S2W[state])
    elif party == "Rep":              # "Split" legislatures are skipped
        rep_mw.append(S2W[state])

print "DEM state average min wage: $", round(sum(dem_mw)/len(dem_mw), 2)
print "REP state average min wage: $", round(sum(rep_mw)/len(rep_mw), 2)

DEM state average min wage: $ 9.37
REP state average min wage: $ 7.76

#### 3.3.3. **Exercise:** 
Which region has the highest average minimum wage?

#### <div style="color:red">Solution:</div>

In [None]:
# First, compute the average minimum wage within each region. Then
# find the region with the highest average.

region_names = []
region_avg_mws = []
for region_name, region_states in R2S.items():
    region_mws = []
    for state in region_states:
        region_mws.append(S2W[state])
    region_avg_mw = sum(region_mws)/len(region_mws)
    region_avg_mws.append(region_avg_mw)
    region_names.append(region_name)
    
region_names[region_avg_mws.index(max(region_avg_mws))]

### <div style="color:purple">SUMMARY</div>
- Indentation matters for loops and conditional statements.


- You can write an entire clause of conditional statements in one line!


- The **`enumerate`** function automatically pairs each element of an iterable variable with its index.


- Each character in a **`str`** variable can be iterated over.


- In a loop, **`continue`** and **`break`** statements skip ahead to the next iteration and exit the loop respectively.


- Sometimes using loops is a huge pain when using multiple or nested data structures...
<br><br>
<div style="color:gray; text-align:right; font-weight:bold;">[BACK TO TOP  &#8593;](#toc)</div>

## 4. Functional and Object-Oriented Programming <a class="anchor" id="func-obj"></a>

Finally, we will learn the syntax for two important structures in Python: functions and classes.

### 4.1. <u>**Functions**</u>

A function groups a set of statements together so that they may be run anywhere in a Python program. As in other programming languages, a Python function takes a set of inputs and computes an output value to return to the caller.

There are two ways to define functions. The first way is via a **`def`** statement:

In [None]:
def f(x):
    return x**2

In [None]:
def compute_average(iterable):
    """This function returns the average of elements in an
       iterable data type such as a list.
    """
    return sum(iterable)/float(len(iterable))

In [None]:
Last_Computed_Average = None
def compute_average2(iterable):
    """This function returns the average of elements in an
       iterable data type such as a list. Additionally it 
       stores the last computed average in a global variable.
    """
    global Last_Computed_Average
    Last_Computed_Average = sum(iterable)/float(len(iterable))
    return Last_Computed_Average

**TIP:** Functions defined via **`def`** do not need to have a **`return`** statement at the end! In other words, a function can simply "do some stuff" and then return nothing back to the user (the return value will just be `None`).
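The sketch below demonstrates that behavior. `greet` is a made-up function defined only for illustration:

```python
def greet(name):
    """Prints a greeting but has no `return` statement."""
    print("Hello " + name)

result = greet("Eustace")   # prints: Hello Eustace
print(result is None)       # the implicit return value is None
```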

The second way is via a **`lambda`** expression:

In [None]:
compute_average = lambda i: sum(i)/float(len(i)) 

Both produce objects of the same type (**`function`**); however, there are two key differences between **`def`** and **`lambda`**:

* **Anonymity**. A **`def`**'d fxn must have a name, while a **`lambda`** does not need to be assigned to a variable with a name.

* **Sparsity**. A **`def`**'d fxn can contain an arbitrary number of statements while a **`lambda`** must consist of a single expression (no assignments allowed!). 


**TIP:** The biggest convenience of **`lambda`** functions is being able to neatly pass them to *higher-order functions*, that is, functions that take functions as their arguments. See below:


In [None]:
map(lambda x: x**2, range(10))

In [None]:
filter(lambda s: s in R2S["Pacific"], ["AZ","MA","CA","OR"])

In [None]:
reduce(lambda x_i, x_j: x_i + "," + x_j, ["AZ","MA","CA","OR"])

In [None]:
# How many regions have more than 7 states?
filter(lambda y: y[1] > 7, map(lambda x: (x[0], len(x[1])), R2S.items()))
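The higher-order calls above rely on Python 2.7 behavior. In Python 3, `map` and `filter` return lazy iterators rather than lists, and `reduce` has moved to the `functools` module (importing it from there also works in Python 2.7). The sketch below runs under both versions. `R2S_sample` is a two-region subset of `R2S`:

```python
from functools import reduce

R2S_sample = {"Pacific": ["AK", "CA", "HI", "OR", "WA"],
              "Mountain": ["AZ", "CO", "ID", "MT", "NM", "NV", "UT", "WY"]}

squares = list(map(lambda x: x**2, range(10)))    # list() forces a list in Python 3
pacific = list(filter(lambda s: s in R2S_sample["Pacific"], ["AZ", "MA", "CA", "OR"]))
joined = reduce(lambda x_i, x_j: x_i + "," + x_j, ["AZ", "MA", "CA", "OR"])

# Build (name, count) pairs, then compare the count y[1], not the whole tuple
big_regions = list(filter(lambda y: y[1] > 7,
                          map(lambda x: (x[0], len(x[1])), R2S_sample.items())))

print(squares)
print(pacific)
print(joined)
print(big_regions)
```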

#### 4.1.1. **Exercise:** 
Output which party has the most state legislature majorities across all states *but without any **`for`** loops, **`while`** loops, or **`if/else`** statements*.

**HINT:** Python's **`max`** function can rank the elements of an iterable by a custom criterion, passed as a function via the **`key`** argument.

#### <div style="color:red">Solution:</div>

In [None]:
party_and_count = map(lambda p: (p, S2P.values().count(p)), set(S2P.values()))
max(party_and_count, key=lambda (party, count): count)[0]

**TIP:** The **`scipy`** package (built on top of **`numpy`**) has an implementation of "mode" (**`scipy.stats.mode`**) which would come in handy here. See the next notebook for how to work with **`numpy`**!
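Python's standard library also offers a quick way to compute a mode. `collections.Counter` tallies how often each value occurs, and its `most_common` method returns the most frequent values first. The sketch below uses a six-state subset of `S2P` to keep the example short:

```python
from collections import Counter

# A six-state subset of the S2P dictionary
S2P_sample = {"MA": "Dem", "AZ": "Rep", "TX": "Rep", "CO": "Split", "GA": "Rep", "CA": "Dem"}

party_counts = Counter(S2P_sample.values())             # party -> number of states
top_party, top_count = party_counts.most_common(1)[0]   # the single most frequent party

print(top_party)
print(top_count)
```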

#### 4.1.2. **Exercise:** 
Calculate the average minimum wage for states with *Democratic* state legislatures and *Republican* state legislatures respectively *but without any **`for`** loops, **`while`** loops, or **`if/else`** statements*.

#### <div style="color:red">Solution:</div>

In [None]:
def avg(l):
    """Can also define this as a lambda func!"""
    return sum(l)/len(l)

d_avg = avg(map(lambda (s,p): S2W[s], filter(lambda (s,p): p == "Dem", S2P.items())))
r_avg = avg(map(lambda (s,p): S2W[s], filter(lambda (s,p): p == "Rep", S2P.items())))

print "DEM state average min wage: $", round(d_avg, 2)
print "REP state average min wage: $", round(r_avg, 2)

#### 4.1.3. **Exercise:**

Finally, among states with *Republican* legislatures, which region has the lowest average minimum wage?

#### <div style="color:red">Solution:</div>

In [None]:
regions  = []
r_min_ws = []
for r, slist in R2S.items():
    r_min_ws.append(avg(filter(lambda x: x != 0, map(lambda s: S2W[s] if s in S2P and S2P[s] == "Rep" else 0, slist))))
    regions.append(r)
regions[r_min_ws.index(min(r_min_ws))]

# Note: in this case, functional programming is a bit overkill / not needed

### 4.2. <u>**Classes**</u>

A class is a blueprint for objects that bundle together data and behavior: functions that belong to the class (called *methods*), variables shared by every instance, and variables specific to each instance. 

Once a class is defined, you can create ("instantiate") as many *instances* of that class as you like.

Most classes define an **`__init__`** method. It runs automatically when a class instance is created and typically stores the user's inputs on the new instance (via **`self`**).

Example:

In [None]:
class Animal:
    def __init__(self, species, name):     # INIT FUNCTION (must have `self` as first argument)
        self.my_species = species
        self.my_name    = name
        
    def say_my_name(self):                 # CLASS FUNCTION
        print "Hello human! My name is",self.my_name
        
    def say_my_species(self):
        print "I belong to the",self.my_species,"species"

In [None]:
doggo = Animal(species="German Shepherd", name="Eustace")
doggo.say_my_name()
doggo.say_my_species()

### 4.3. **Exercise:** 
The following is a class implementation of a U.S. state. Write a method for this class that returns the minimum wage rounded to the nearest dollar.

### <div style="color:red">Solution:</div>

In [None]:
class State:
    def __init__(self, abbrv, region, min_wage, maj_party_Dem):
        """Since Python is dynamically typed (types are not declared 
        for variables before running a program), it is good practice to 
        document your `init` function so that users know what to pass
        as parameters.
        
        Args:
            abbrv (str):          Two letter abbreviation for state.
            region (str):         Which U.S. region this state belongs to.
            min_wage (float):     The 2017 minimum wage in $USD.
            maj_party_Dem (bool): Whether the state legislature is controlled
                                  by Democrats in 2017.
        
        """
        self.abbrv         = abbrv
        self.region        = region
        self.min_wage      = min_wage
        self.maj_party_Dem = maj_party_Dem
        
    def print_min_wage(self):
        """Prints this state's minimum wage in 2017.
        """
        print "$",self.min_wage
        
    # SOLUTION fxn >>>>>>>
    def get_min_wage(self):
        return int(round(self.min_wage))
    # <<<<<<<<<<<<<<<<<<<<
        
    def get_maj_party_name(self):
        """Returns the name of the majority party in this state's 
        legislature.
        """
        if self.maj_party_Dem == True:
            return "Democrat"
        else:
            return "Republican"
        
MA = State("MA", "New England", min_wage=11.00, maj_party_Dem=True)
MA.get_min_wage()

### 4.4. **Exercise:** 
Create a list of **`State`** objects using all of the available data we have collected on region, minimum wage, and majority party. 

### <div style="color:red">Solution:</div>

In [None]:
States = []
for (r, slist) in R2S.iteritems():
    for s in slist:
        if s in S2W.keys() and s in S2P.keys():
            maj_party_Dem = True if S2P[s] == "Dem" else False
            States.append(State(s, r, S2W[s], maj_party_Dem))

### <div style="color:purple">SUMMARY</div>
- In this section, we demonstrated how Python supports both **functional** and **object-oriented** programming; most data science packages use a mix of the two styles.


- A **`lambda`** is basically just a slim and trim version of a function defined by **`def`**.


- The functions **`max`**, **`min`**, and **`sorted`** can take a function (often a **`lambda`**) via the **`key`** argument that computes the value to compare for each element of a list. 


- With **`map`**, **`filter`**, and **`reduce`**, you can pretty much condense *anything* into a single line full of **`lambda`** functions ... use this power wisely.


- Classes provide a powerful way of encapsulating functions into "container" objects.

<br><br>
<div style="color:gray; text-align:right; font-weight:bold;">[BACK TO TOP  &#8593;](#toc)</div>

## 5. Working with Strings (**`str`**) [optional: *useful for debugging*] <a class="anchor" id="str"></a>

Python has a variety of powerful string parsing functions. 

Some of these functions are implemented as **`str`** methods, e.g.

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**`myString.fxn()`**

while others are global functions to be called on strings, e.g.

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**`fxn(myString)`**

Examples are given below:

In [None]:
s = "Cambridge, Massachusetts"

In [None]:
print len(s)                                       # LENGTH

In [None]:
print s[0]                                         # INDEX and SLICE into a string
print s[-1]
print s[0:9]

In [None]:
print ".".join([str(a), str(int(c))])              # JOIN strings
print ", ".join(["Cook County", "Illinois"])

In [None]:
s1 = "Course-Class-Notes-Syllabus"                 # SPLIT strings into a list
print s1.split("-") 

In [None]:
print "---2018-Election-Results---".strip("-")     # STRIP chars. from ends

In [None]:
y = "My GPA: 5.000"                                # FIND starting index of a substring
print y.find("GPA")

In [None]:
print "Course {0}, Class {1}".format(a, c)         # FORMAT strings 
print "Course {cr}, Class {cl}".format(cr=a, cl=c)
print "Course %i, Class %1.f" % (a, c)

### 5.1. **Exercise:** 
Given the following strings,

In [None]:
results_2016 = [
    "Middlesex (MA) [Dem] -- 66.3%",
    "Clark (IL) [Rep] -- 71.9%",
    "Maricopa (AZ) [Rep] -- 49.1%"
]

Using the string parsing methods above, write a function that parses one of these strings into a list containing the items below, and apply it to each string:

1. The county name
2. The state name
3. The party name
4. The vote share (as a float)

### <div style="color:red">Solution:</div>

In [None]:
def parse(in_string):
    space_delim = in_string.split(" ")
    return [
        space_delim[0],
        space_delim[1].strip("()"),
        space_delim[2].strip("[]"),  
        float(space_delim[-1].strip("%"))
    ]

map(parse, results_2016)
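Splitting on spaces works for these three strings but would break on a multi-word county name. The sketch below is a more defensive variant that relies on `.rsplit()` and `.find()` instead. It assumes the input follows exactly the `County (ST) [Party] -- NN.N%` layout above. `"Some County (XX) [Rep] -- 50.0%"` is a made-up string for illustration.

```python
def parse_robust(in_string):
    """Parses 'County (ST) [Party] -- NN.N%', even if the county name contains spaces."""
    head, share = in_string.rsplit(" -- ", 1)         # split once, from the right
    county = head[:head.find(" (")]                    # everything before " ("
    state = head[head.find("(") + 1:head.find(")")]    # text inside ( )
    party = head[head.find("[") + 1:head.find("]")]    # text inside [ ]
    return [county, state, party, float(share.strip("%"))]

print(parse_robust("Some County (XX) [Rep] -- 50.0%"))
print(parse_robust("Middlesex (MA) [Dem] -- 66.3%"))
```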

### <div style="color:purple">SUMMARY</div>

- **`str`** variables have many nifty global and class methods (e.g. **`.join()`**, **`.split()`**, **`.find()`**)


- **`str`** variables can be treated like lists of individual characters; thus many **`list`** functions also apply to strings.  
<br><br>
<div style="color:gray; text-align:right; font-weight:bold;">[BACK TO TOP  &#8593;](#toc)</div>

## 6. Handling Errors [optional: *useful for debugging*] <a class="anchor" id="err"></a>

What is an **exception**? 

Let's find out:

In [None]:
# print S2P["DC"]

An **exception** is an event that occurs during the execution of a program and disrupts the normal flow of the program's instructions. 

In general, when a Python script encounters a situation that it cannot cope with, it raises an exception. An exception is a Python object that represents an **error**.

Python allows us to handle errors in customized ways and thereby gracefully continue executing our program without exiting out on the problematic line of code:

In [None]:
try:
    S2P["DC"]
except KeyError as e:
    print "Entity not found in `S2P` ... moving on"

You can create highly informative error messages using the string formatting techniques discussed above to guide the user in debugging:

In [None]:
try:
    S2P["DC"]
except KeyError as e:
    print "Entity %s not found in `S2P` ... moving on" % e

Additionally, you can create your *own* exceptions when something happens in your program that you don't like:

In [None]:
if "MA" in S2P.keys():
    raise Exception("Coastal elite present in data!")
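Raising a bare `Exception` works, but defining your own exception class lets callers catch exactly that error and nothing else. In the sketch below, `MissingDataError`, `lookup_party`, and `S2P_sample` are names made up for illustration:

```python
class MissingDataError(Exception):
    """Custom exception for states with no party data (illustrative)."""
    pass

def lookup_party(s2p, state):
    try:
        return s2p[state]
    except KeyError:
        # Re-raise as our own, more descriptive exception type
        raise MissingDataError("No party data for %s" % state)

S2P_sample = {"MA": "Dem", "TX": "Rep"}   # a two-state subset of S2P

message = None
try:
    lookup_party(S2P_sample, "DC")
except MissingDataError as e:
    message = str(e)

print(message)
```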

### 6.1. **Exercise:** 
Write a block of code that finds the U.S. states with missing party information using **`try/except`**. 

### <div style="color:red">Solution:</div>

In [None]:
# Let's iterate through every state in each region and try to get party data.
# If it fails, we obviously know that state does not have party data so we can
# add it to a to-do list.
missing_ = []
for r, slist in R2S.items():
    for s in slist:
        try:
            p = S2P[s]
        except KeyError:
            missing_.append(s)
missing_

See [**`https://docs.python.org/2.7/tutorial/errors.html`**](https://docs.python.org/2.7/tutorial/errors.html) for an overview of different Python exception types.

### <div style="color:purple">SUMMARY</div>

- **`try`** and **`except`** statements are useful in making sure bugs/errors don't end the execution of your program (i.e. your program can still smoothly continue running).


- You can write separate **`except`** clauses to handle different kinds of **`Exception`** (error) that the program has encountered.


- These statements are absolutely *crucial* when doing **web-scraping** -- a task where you will undoubtedly encounter many errors / timeouts / failures, etc.
<br><br>
<div style="color:gray; text-align:right; font-weight:bold;">[BACK TO TOP  &#8593;](#toc)</div>

## 7. Reading and Writing [optional:* useful for messy files*] <a class="anchor" id="rw"></a>

We recommend using Python's own native read/write functions when you are not dealing with columnar data (.csv, .xlsx); columnar data is more easily handled by the third-party **`pandas`** library (more on this in the next notebook!).

In [50]:
# 'w' marker is for 'write'
# 'r' marker is for 'read'
# 'b' marker is to open file in binary

# with open("data/state2party2017.txt", "wb") as f: 
#     f.write(str(S2P))
#     print "OK write."

# with open("data/state2party2017.txt", "rb") as f: 
#     print f.read()
#     print "OK read."
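The sketch below is a runnable version of the write-then-read pattern from the commented-out cell above. It writes to a temporary directory created with `tempfile.mkdtemp()`, so it does not depend on a `data/` folder existing. The file name and its contents are made up for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "state2party_sample.txt")

rows = ["MA,Dem", "TX,Rep"]        # made-up sample lines

with open(path, "w") as f:         # 'w' creates (or overwrites) the file
    f.write("\n".join(rows))

with open(path, "r") as f:         # 'r' opens it for reading
    contents = f.read()

# Leaving the `with` block has already closed the file
print(f.closed)
print(contents.splitlines())
```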

For a brief primer on reading/writing files, see:

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[**`https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files`**](https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files)



### 7.1. **Exercise**: 
The **`.json`** file type (commonly used to store dictionary-like data) is one file format best handled with Python's own lower-level read/write functions. Using the **`json`** library and consulting the appropriate documentation, perform the following tasks:

1. Serialize the dictionary **`R2S`** to a string and print it in a "pretty" (indented) way.
2. Save the dictionary **`R2S`** out to a .json file (**`region2states.json`**) and then read it back in as a **`dict`** object, rather than a string.

### <div style="color:red">Solution:</div>

In [49]:
import json

print json.dumps(R2S, indent=4)                 # 1. pretty-print as JSON

# 2. Uncomment to save and reload (the `output/` directory must already exist)
# with open("output/region2states.json", "w") as f:
#     json.dump(R2S, f, indent=4)
    
# with open("output/region2states.json", "r") as f:
#     R2S = json.load(f)                          # read back as a `dict`

{
    "West South Central": [
        "AR", 
        "LA", 
        "OK", 
        "TX"
    ], 
    "Mountain": [
        "AZ", 
        "CO", 
        "ID", 
        "MT", 
        "NM", 
        "NV", 
        "UT", 
        "WY"
    ], 
    "Middle Atlantic": [
        "NJ", 
        "NY", 
        "PA"
    ], 
    "South Atlantic": [
        "DC", 
        "DE", 
        "FL", 
        "GA", 
        "MD", 
        "NC", 
        "SC", 
        "VA", 
        "WV"
    ], 
    "East South Central": [
        "AL", 
        "KY", 
        "MS", 
        "TN"
    ], 
    "New England": [
        "CT", 
        "MA", 
        "ME", 
        "NH", 
        "RI", 
        "VT"
    ], 
    "East North Central": [
        "IL", 
        "IN", 
        "MI", 
        "OH", 
        "WI"
    ], 
    "West North Central": [
        "IA", 
        "KS", 
        "MN", 
        "MO", 
        "ND", 
        "NE", 
        "SD"
    ], 
    "Pacific": [
        "AK", 
        "CA", 
        "HI",

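The exercise stresses getting a **`dict`** back rather than a string. Here is a minimal in-memory sketch of that difference. It uses a small hypothetical dictionary `sample` in place of `R2S`: **`json.dumps`** turns it into a string, and **`json.loads`** parses that string back into a **`dict`**.

```python
import json

# Hypothetical stand-in for R2S (a small subset of regions, for illustration only)
sample = {"Middle Atlantic": ["NJ", "NY", "PA"], "New England": ["CT", "MA"]}

# dumps: dict -> JSON-formatted string (indent=4 gives the "pretty" layout)
as_text = json.dumps(sample, indent=4, sort_keys=True)

# loads: JSON-formatted string -> Python dict
round_tripped = json.loads(as_text)

print(type(as_text).__name__)        # str
print(type(round_tripped).__name__)  # dict
print(round_tripped == sample)       # True
```

The **`json.dump`** / **`json.load`** variants (no trailing `s`) do the same conversions, but they write to and read from an open file object instead of a string.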

### <div style="color:purple">SUMMARY</div>

- Reading/writing files in Python is really easy!


- All such read/write operations should happen inside a **`with open()`** block. Within the block the file stays open, and it is closed automatically as soon as the block exits (even if an error is raised inside it).
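Here is a small sketch of that behavior: the file object's **`closed`** attribute is `False` inside the **`with`** block and `True` right after it. The filename `demo.txt` and the temporary directory are assumptions made for illustration.

```python
import os
import tempfile

# Write to a throwaway file in a temporary directory (hypothetical filename)
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:
    f.write("hello\n")
    closed_inside = f.closed   # False: the file is open inside the block

closed_after = f.closed        # True: closed automatically when the block exited

# Read the contents back, again inside a with-block
with open(path, "r") as f:
    contents = f.read()

print(closed_inside)   # False
print(closed_after)    # True
```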
<br><br>
<div style="color:gray; text-align:right; font-weight:bold;">[BACK TO TOP  &#8593;](#toc)</div>

## 8. System Libraries [optional: *useful for complex tasks*] <a class="anchor" id="sys-lib"></a>

A list of other essential Python libraries for advanced Python users:

* **`sys`/`os`** &nbsp;&nbsp;&nbsp;&nbsp;- Access functionality provided by the interpreter and operating system (e.g. run shell commands, navigate the filesystem).
* **`csv`** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; - Read and write CSV files at a low level.
* **`urllib2`** &nbsp;- Perform HTTP requests (Python 2 only).
* **`math`** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; - Basic mathematical functions and constants (e.g. `sqrt`, `log`, `pi`, `e`).
* **`random`** &nbsp;&nbsp;&nbsp; - Random sampling and shuffling.
* **`datetime`** - Parsing, creating, and doing arithmetic with dates and times.
* **`sqlite3`** &nbsp;&nbsp;- Work with SQLite databases.
* **`joblib`** &nbsp;&nbsp;&nbsp; - Execute functions in parallel (third-party package; install separately).
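Here is a quick taste of a few of these standard-library modules (**`os`**, **`math`**, **`random`**, **`datetime`**). The specific paths, numbers, and dates are arbitrary and chosen only for illustration. **`urllib2`** is left out because it exists only in Python 2; Python 3 splits it into **`urllib.request`** and related modules.

```python
import datetime
import math
import os
import random

# os: build a file path portably (the separator is OS-specific) and list a directory
path = os.path.join("output", "region2states.json")
entries = os.listdir(".")

# math: common functions and constants
hyp = math.sqrt(3**2 + 4**2)        # 5.0
circle_area = math.pi * 2**2

# random: seed for reproducibility, then sample and shuffle
random.seed(42)
sample = random.sample(range(100), 5)   # 5 distinct integers from 0..99
deck = list(range(10))
random.shuffle(deck)                    # shuffles the list in place

# datetime: parse a string into a datetime and do date arithmetic
d = datetime.datetime.strptime("2016-11-08", "%Y-%m-%d")
one_week_later = d + datetime.timedelta(days=7)

print(hyp)                                   # 5.0
print(one_week_later.strftime("%Y-%m-%d"))   # 2016-11-15
```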
<br><br>
<div style="color:gray; text-align:right; font-weight:bold;">[BACK TO TOP  &#8593;](#toc)</div>

## 9. Miscellaneous Topics [optional: *useful for writing cleaner/faster code*] <a class="anchor" id="misc"></a>

There are a number of other very "Pythonic" topics that we did not fully cover in this notebook for the sake of brevity.


### 9.1. **<u>List Comprehensions</u>**

List comprehensions let you define collections in a notation close to mathematical set-builder notation. They also let you iterate over complex, nested data structures more concisely than with explicit loops.

Consider the following mathematical sets:

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**`A = {x² : x in {0 ... 9}}`**

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**`B = (1, 2, 4, 8, ..., 2¹²)`**

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**`C = {x | x in A and x even}`**

These can be written in Python as follows:

In [None]:
A = [x**2 for x in range(10)]
B = [2**i for i in range(13)]
C = [x for x in A if x % 2 == 0]

In [None]:
print A
print B
print C
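To see what a comprehension is shorthand for, here is `C` rebuilt with an explicit loop. Both forms produce the same list.

```python
A = [x**2 for x in range(10)]

# Comprehension form: [expression for item in iterable if condition]
C = [x for x in A if x % 2 == 0]

# Equivalent explicit loop
C_loop = []
for x in A:
    if x % 2 == 0:
        C_loop.append(x)

print(C == C_loop)   # True
print(C_loop)        # [0, 4, 16, 36, 64]
```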

A more complicated example extracting prime and non-prime numbers:

In [None]:
no_primes = [j for i in range(2, 8) for j in range(i*2, 50, i)]
primes = [x for x in range(2, 50) if x not in no_primes]

In [None]:
print no_primes
print primes
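When a comprehension has several **`for`** clauses, they nest from left to right, exactly like nested loops. The `no_primes` expression above unrolls to the loops below. The sketch also makes one optional tweak that is not part of the original: it tests membership against a **`set`** rather than a list, which is faster and ignores the duplicates in `no_primes`.

```python
# Nested comprehension: the leftmost `for` is the outer loop
no_primes = [j for i in range(2, 8) for j in range(i*2, 50, i)]

# Equivalent nested loops
no_primes_loop = []
for i in range(2, 8):               # outer loop (first `for` clause)
    for j in range(i*2, 50, i):     # inner loop: multiples of i, starting at 2*i
        no_primes_loop.append(j)

# A set gives fast membership tests and drops duplicates (e.g. 12 appears for i = 2, 3, 4, 6)
composites = set(no_primes_loop)
primes = [x for x in range(2, 50) if x not in composites]

print(no_primes == no_primes_loop)   # True
print(primes)   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```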

#### 9.1.1. **Exercise:** 
Create a list comprehension expression that outputs a list where each element is a **`tuple`** consisting of `(region, state, wage, party)`.

#### <div style="color:red">Solution:</div>

In [None]:
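# For each region and each of its states, look up the state's wage and party;
# skip states with no entry in S2P (`s in S2P` checks the dict's keys directly)
[(r, s, S2W[s], S2P[s]) for (r, slist) in R2S.iteritems() for s in slist if s in S2P]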

### 9.2. **<u>Generators</u>**

What happens if you iterate over a *really huge* collection of items? If the whole collection has to be built in memory first (as with a list), you may run out of RAM and become deeply unhappy.

Rather than fitting all items of your *really huge* collection into memory immediately as you iterate, you can:

1. Ask for each value one at a time.
2. Use each generated value.
3. Throw it away.

This is essentially what a generator expression does, which can conserve a lot of memory if you're working with a huge dataset!

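Functionally, however, a generator expression *serves* the same role as a list comprehension for a single pass over the data; only the method of iteration differs. Because its values are produced lazily, a generator can be iterated over only once and does not support indexing or **`len()`**.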

See below for a comparison:

In [None]:
list_of_squares = [x**2 for x in range(10)]
for y in list_of_squares:
    print y

In [None]:
generator_of_squares = (x**2 for x in range(10))
for y in generator_of_squares:
    print y
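Both loops above print the same values, but the two objects behave differently. The sketch below shows this: a list can be indexed, measured with **`len()`**, and iterated repeatedly, while a generator is used up by its first pass. **`sys.getsizeof`** counts only the object itself (not what it refers to), and exact byte counts vary by Python version, so the sketch only compares the two sizes. In Python 2.7, **`range`** itself builds a full list, so **`xrange`** is the memory-friendly choice to feed a generator.

```python
import sys

squares_list = [x**2 for x in range(10000)]
squares_gen = (x**2 for x in range(10000))

# The list holds every element at once; the generator object only holds its
# current position, so its reported size does not grow with the element count
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))   # True

# Lists support indexing and len(); generators support neither
print(squares_list[3])      # 9
print(len(squares_list))    # 10000

# A generator can be iterated over only once
first_pass = sum(squares_gen)
second_pass = sum(squares_gen)   # 0: the generator is already exhausted
print(first_pass == sum(squares_list))   # True
```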

You can also use the **`next()`** function to iterate through a generator one element at a time:

In [None]:
generator_of_squares = (x**2 for x in range(10))
print next(generator_of_squares)
print next(generator_of_squares)
print next(generator_of_squares)
print next(generator_of_squares)
print next(generator_of_squares)
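Once a generator runs out of elements, **`next()`** raises **`StopIteration`**. A **`for`** loop handles this for you, which is why the loops above end cleanly. If you pass a default as the second argument to **`next()`**, it returns that default instead of raising.

```python
gen = (x**2 for x in range(3))   # yields 0, 1, 4

values = [next(gen), next(gen), next(gen)]

# The generator is now exhausted, so a bare next() raises StopIteration
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

# With a default, next() returns it instead of raising
fallback = next(gen, None)

print(values)     # [0, 1, 4]
print(exhausted)  # True
print(fallback)   # None
```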

### 9.3. **<u>`yield` Statement</u>**

In essence, a generator expression is a "lazy" list comprehension: each element is created on the fly, only when we need it. However, like a list comprehension, a generator expression is a *single expression*. It can contain **`for`** and **`if`** clauses, but no statements (assignments, **`try`**/**`except`** blocks, multi-step logic).

This quickly becomes unwieldy if we're doing something particularly complex to each element in our collection.

What if, instead, we could enclose the complex logic of determining the next element to dynamically generate in a *function*? That is precisely what we can do if we use a **`yield`** statement in our function rather than a **`return`**.
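As a sketch of logic that a single generator expression can't express, here is a hypothetical **`fibonacci`** generator function. It carries state (the previous two numbers) from one element to the next and updates that state inside a loop.

```python
def fibonacci(limit):
    """Yield Fibonacci numbers smaller than `limit` (hypothetical example)."""
    a, b = 0, 1              # state carried between yields
    while a < limit:
        yield a
        a, b = b, a + b      # update the state for the next element

fibs = list(fibonacci(100))
print(fibs)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```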

For example, the equivalent of the following generator:


In [None]:
square_lc = (x**2 for x in range(10))

would be:

In [None]:
def square_fxn():
    for x in range(10):
        yield x**2

To master this construct and de-confuse yourself, you must understand one thing: calling a function that contains a **`yield`** statement does not run its body; it returns a generator. Each time the generator is asked for its next element, the body runs until it reaches the next **`yield`**, and the yielded value becomes that element!

For instance, when you call the above **`square_fxn`** it will return a generator object. You can then iterate through the generator.

Try it out!

In [None]:
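g = square_fxn()   # returns a generator; no squares are computed yet
next(g)            # consumes 0
next(g)            # consumes 1
next(g)            # consumes 4
next(g)            # consumes 9
for x in g:        # resumes where next() left off: prints 16, 25, ..., 81
    print x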
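To see that calling a generator function runs none of its body until a value is requested, here is a sketch with a hypothetical **`noisy_squares`** function. It records each step of its body in a list `log`.

```python
log = []

def noisy_squares(n):
    # Hypothetical example: records when its body actually runs
    log.append("started")
    for x in range(n):
        log.append("yielding %d" % (x**2))
        yield x**2
    log.append("finished")

g = noisy_squares(3)
print(log)        # []: calling the function ran none of its body

first = next(g)
print(log)        # ['started', 'yielding 0']

rest = list(g)    # runs the body to completion
print(rest)       # [1, 4]
print(log[-1])    # finished
```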

### 9.4. **Exercise:** 
The following block of code computes the *Cartesian product* of the regions' state lists. That is, assuming each region $r \in R$ has $N_r$ states, it creates $\prod_{r \in R}{N_{r}}$ tuples, each containing $|R|$ elements: a unique combination of a single state from each region.

In [None]:
import itertools
all_regional_combos = list(itertools.product(*R2S.values()))
all_regional_combos[-1]

Note that because **`list()`** forces every combination into memory at once, this is slow and memory-hungry!

Write a function using either a **`yield`** statement or a generator expression that lazily iterates over the Cartesian product of regional states.

### <div style="color:red">Solution:</div>

In [None]:
# The easiest way to do this is to reuse the original expression
# without the `list()` call: `itertools.product` already returns a
# lazy iterator that builds each tuple only when it is requested.
all_regional_combos = itertools.product(*R2S.values())
next(all_regional_combos)
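The solution above reuses **`itertools.product`**. For comparison, here is one possible hand-written version using **`yield`**: a hypothetical `lazy_product` helper that is not part of the original notebook. It is shown on a small made-up input and yields the same tuples in the same order as **`itertools.product`**. Because the recursion re-iterates the later pools, each pool must be a re-iterable collection (such as a list), not a one-shot iterator.

```python
import itertools

def lazy_product(*pools):
    """Yield the Cartesian product of `pools` one tuple at a time (recursive sketch)."""
    if not pools:
        yield ()          # the product of zero pools is a single empty tuple
        return
    first, rest = pools[0], pools[1:]
    for item in first:
        # Recursively generate combinations of the remaining pools
        for combo in lazy_product(*rest):
            yield (item,) + combo

# Small hypothetical input standing in for R2S.values()
pools = [["NJ", "NY"], ["CT", "MA", "ME"]]

g = lazy_product(*pools)
print(next(g))   # ('NJ', 'CT')
print(list(lazy_product(*pools)) == list(itertools.product(*pools)))   # True
```

With the notebook's data, the same call would be `lazy_product(*R2S.values())`.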

### <div style="color:purple">SUMMARY</div>

- List comprehensions are a *great* (and often more interpretable) alternative to **`map`** functions. Use them to replace nested loops that don't require a huge amount of "loop-body" code.


- Generators (created either with a generator expression or by embedding **`yield`** in a function) are essentially lazy, memory-efficient list comprehensions. Use them when you expect to iterate over a *huge* amount of data or particularly complex/memory-intensive objects (e.g. state district geometric shapes, a large multi-GB page on a website).
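To illustrate both points, here are **`map`**/**`filter`**, a list comprehension, and a generator expression computing the same values. In Python 2.7 **`map`** and **`filter`** return lists, while in Python 3 they return lazy iterators. The sketch wraps them in **`list()`** so it behaves the same in both.

```python
nums = range(10)

# map/filter with lambda functions
evens_squared_map = list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, nums)))

# Same result as a list comprehension
evens_squared_lc = [x**2 for x in nums if x % 2 == 0]

# Same values, produced lazily by a generator expression
evens_squared_gen = (x**2 for x in nums if x % 2 == 0)

print(evens_squared_lc)                             # [0, 4, 16, 36, 64]
print(evens_squared_map == evens_squared_lc)        # True
print(list(evens_squared_gen) == evens_squared_lc)  # True
```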
<br><br>
<div style="color:gray; text-align:right; font-weight:bold;">[BACK TO TOP  &#8593;](#toc)</div>

# <span style="color:gray">END OF NOTEBOOK</span>

