# Applied Python
### Christopher Brooks, University of Michigan

# Fundamentals of Python

## Dictionaries
* Iterating over keys in a dictionary
* Create dictionary with curly brackets
* Append element to dictionary using bracket notation

In [1]:
x = {"Fred": 617, "Nate": 538} 
 
x["Sam"] = 234 
x

{'Fred': 617, 'Nate': 538, 'Sam': 234}

### Use loop to iterate over keys

In [2]:
for name in x: 
    print(x[name])

617
538
234


### Iterate over all items in the dictionary:

In [3]:
for name, number in x.items():
    print(name)
    print(number)

Fred
617
Nate
538
Sam
234


## Unpacking a sequence into different variables:

In [4]:
x = ('David', 'Wallace', '762')
fname, lname, number = x
fname

'David'

In [5]:
lname

'Wallace'

In [6]:
number

'762'

## Reading and Writing CSV Files
* Basics of iterating through CSV file to create dictionaries and collect summary statistics 
* First, let's import the CSV module, which will assist us in reading in our CSV file
* Datafile mpg.csv contains fuel economy data for 234 cars
* Use csv.DictReader to read in mpg.csv and convert it to a list of dictionaries

### Use the IPython magic %precision 2 to display floats with 2 decimal places

### Look at the first three elements of the list:
* csv.DictReader has read in each row of our CSV file as a dictionary.
* len shows that our list comprises 234 dictionaries.

In [7]:
import csv
%precision 2

with open('mpg.csv') as csvfile:
    mpg = list(csv.DictReader(csvfile))  

mpg[:3] 

[OrderedDict([('', '1'),
              ('manufacturer', 'audi'),
              ('model', 'a4'),
              ('displ', '1.8'),
              ('year', '1999'),
              ('cyl', '4'),
              ('trans', 'auto(l5)'),
              ('drv', 'f'),
              ('cty', '18'),
              ('hwy', '29'),
              ('fl', 'p'),
              ('class', 'compact')]),
 OrderedDict([('', '2'),
              ('manufacturer', 'audi'),
              ('model', 'a4'),
              ('displ', '1.8'),
              ('year', '1999'),
              ('cyl', '4'),
              ('trans', 'manual(m5)'),
              ('drv', 'f'),
              ('cty', '21'),
              ('hwy', '29'),
              ('fl', 'p'),
              ('class', 'compact')]),
 OrderedDict([('', '3'),
              ('manufacturer', 'audi'),
              ('model', 'a4'),
              ('displ', '2'),
              ('year', '2008'),
              ('cyl', '4'),
              ('trans', 'manual(m6)'),
              ('drv',

### Dictionaries in the list have
* the column names of the CSV as keys, and
* the data for each specific car as values.

Use len to get the length of the list: one dictionary for each of the 234 cars in the CSV file.

In [9]:
len(mpg)

234

### Use .keys() method to look at column NAMES of csv

In [10]:
mpg[0].keys()

odict_keys(['', 'manufacturer', 'model', 'displ', 'year', 'cyl', 'trans', 'drv', 'cty', 'hwy', 'fl', 'class'])
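To try the same steps without mpg.csv, the sketch below feeds csv.DictReader a small in-memory sample through io.StringIO. The three rows and the names `sample` and `rows` are made up for illustration and are not taken from the real file. Since Python 3.8, DictReader returns plain dict rows, so on a recent interpreter the outputs show dict and dict_keys rather than OrderedDict and odict_keys.

```python
import csv
import io

# Hypothetical sample standing in for mpg.csv (values invented for illustration).
sample = io.StringIO(
    "manufacturer,model,cyl,cty,hwy,class\n"
    "audi,a4,4,18,29,compact\n"
    "ford,f150,8,13,17,pickup\n"
    "honda,civic,4,24,32,subcompact\n"
)

rows = list(csv.DictReader(sample))   # one dictionary per data row

print(len(rows))               # number of data rows (the header is not counted)
print(list(rows[0].keys()))    # column names taken from the header line
print(rows[0]['cty'])          # every value is read in as a string
```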

### Find the average CITY MPG across all cars in CSV file. 
* All values in dictionaries are strings, so we need to convert to float. 
* Sum city MPG entry across all dictionaries in our list and divide by the length of the list. 

In [11]:
sum(float(d['cty']) for d in mpg) / len(mpg)

16.86

### Find the average HIGHWAY MPG across all the cars in CSV file. 
* Average highway fuel economy is higher than in the city. 

In [12]:
sum(float(d['hwy']) for d in mpg) / len(mpg)

23.44
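As a small check of the sum / len pattern, the sketch below applies it to a made-up list of dictionaries (the `cars` data is invented for illustration) and computes the second average with statistics.mean from the standard library, which performs the same calculation.

```python
from statistics import mean

# Hypothetical rows shaped like the DictReader output: all values are strings.
cars = [
    {'cty': '18', 'hwy': '29'},
    {'cty': '21', 'hwy': '29'},
    {'cty': '15', 'hwy': '20'},
]

avg_cty = sum(float(d['cty']) for d in cars) / len(cars)
avg_hwy = mean(float(d['hwy']) for d in cars)   # same calculation via statistics.mean

print(avg_cty)   # 18.0
print(avg_hwy)   # 26.0
```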

### What is the average city MPG GROUPED BY the number of CYLINDERS a car has?
* Creating a set from the cyl entry of each dictionary gives us the unique cylinder counts.
* Cars in our dataset have 4, 5, 6, and 8 cylinders. 

#### Use set() to return UNIQUE values for the number of cylinders the cars in our dataset have.

In [13]:
cylinders = set(d['cyl'] for d in mpg)
cylinders

{'4', '5', '6', '8'}

### Group cars by number of cylinders and find the average cty mpg for each group.
* First, create an empty list where we'll store our calculations.
* Next, iterate over all cylinder levels.
* Then, for each level, iterate over all the dictionaries.
* If the cylinder level of a dictionary matches the level we are averaging,
  add its cty mpg to the summpg variable and increment the count.



In [14]:
CtyMpgByCyl = []

# First, iterate over all the cylinder levels
for c in cylinders:
    summpg = 0
    cyltypecount = 0
    
    # Second, iterate over all dictionaries
    for d in mpg:  
        
        # if the cylinder level type matches,
        if d['cyl'] == c: 
            
            # add the cty mpg
            summpg += float(d['cty'])  
            
            # increment the count
            cyltypecount += 1  

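### After going through all dictionaries for a cylinder level,
* Compute the average MPG (summpg / cyltypecount) and append it to our list.
* This append belongs inside the outer loop over cylinders. Run as a separate cell, as here, it executes once after the loop has finished, so only the last cylinder level visited is recorded.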

In [15]:
# append tuple ('cylinder', 'avg mpg')

CtyMpgByCyl.append((c, summpg / cyltypecount))
CtyMpgByCyl

[('8', 12.57)]

### To make things clearer: 
* Sort list from lowest number of cylinders to highest. 
* City fuel economy appears to be decreasing as number of cylinders increases. 

In [16]:
# sort list: low to high

CtyMpgByCyl.sort(key=lambda x: x[0]) 
CtyMpgByCyl

[('8', 12.57)]
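The grouping steps can be sketched end to end on a small in-memory sample. The `cars` rows below are invented for illustration and are not taken from mpg.csv. The append sits inside the outer loop so that each cylinder level contributes one tuple, and the sort key converts the cylinder count to int, an assumption that every cyl value is a whole number.

```python
# Hypothetical rows (invented for illustration), shaped like the DictReader output.
cars = [
    {'cyl': '4', 'cty': '20'},
    {'cyl': '4', 'cty': '22'},
    {'cyl': '6', 'cty': '16'},
    {'cyl': '8', 'cty': '12'},
    {'cyl': '8', 'cty': '14'},
]

cylinders = set(d['cyl'] for d in cars)
cty_mpg_by_cyl = []

for c in cylinders:                      # one pass per cylinder level
    summpg = 0
    count = 0
    for d in cars:                       # scan every row
        if d['cyl'] == c:
            summpg += float(d['cty'])
            count += 1
    # still inside the outer loop: one tuple per cylinder level
    cty_mpg_by_cyl.append((c, summpg / count))

# Sort numerically; sorting the raw strings would put '10' before '4'.
cty_mpg_by_cyl.sort(key=lambda pair: int(pair[0]))
print(cty_mpg_by_cyl)   # [('4', 21.0), ('6', 16.0), ('8', 13.0)]
```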

### Example: Find average highway MPG for different vehicle classes. 
* What are the class types
* Use set() to return the unique values for the class types in our dataset.

In [17]:
vehicleclass = set(d['class'] for d in mpg) 
vehicleclass

{'2seater', 'compact', 'midsize', 'minivan', 'pickup', 'subcompact', 'suv'}

### Similar to the last example
* First, iterate over all the vehicle classes, then iterate over all the dictionaries.
* If the vehicle class of a dictionary matches the class we are averaging HWY MPG for,
    * then add the value to our total, and increment the count.
* Then compute the average [ sum / count ] and append it to our list.


In [19]:
HwyMpgByClass = []

for t in vehicleclass:                 # iterate over all vehicle classes
    summpg = 0
    vclasscount = 0

    for d in mpg:                      # iterate over all dictionaries
        if d['class'] == t:            # if the vehicle class matches
            summpg += float(d['hwy'])  # add hwy mpg
            vclasscount += 1           # increment the count

# NOTE: at this indentation the append runs once, after the outer loop ends,
# so only the last class visited is recorded; indent it one level to append
# a ('class', 'avg mpg') tuple for every class.
HwyMpgByClass.append((t, summpg / vclasscount))

HwyMpgByClass.sort(key=lambda x: x[1])  # sort: lowest to highest
HwyMpgByClass


[('suv', 18.13)]

### How to summarize data through iteration
* Don't worry if this seems somewhat inefficient or tedious. 
* Next week, we learn about Pandas, a Python library for easier, more efficient data analysis

<br>
# Functions

In [None]:
x = 1
y = 2
x + y

In [None]:
x

<br>
`add_numbers` is a function that takes two numbers and adds them together.

In [None]:
def add_numbers(x, y):
    return x + y

add_numbers(1, 2)

<br>
`add_numbers` updated to take an optional 3rd parameter. Using `print` allows printing of multiple expressions within a single cell.

In [None]:
def add_numbers(x, y, z=None):
    if z is None:  # compare with None using `is`
        return x + y
    else:
        return x + y + z

print(add_numbers(1, 2))
print(add_numbers(1, 2, 3))

<br>
`add_numbers` updated to take an optional flag parameter.

In [None]:
def add_numbers(x, y, z=None, flag=False):
    if flag:
        print('Flag is true!')
    if z is None:  # compare with None using `is`
        return x + y
    else:
        return x + y + z
    
print(add_numbers(1, 2, flag=True))

<br>
Assign function `add_numbers` to variable `a`.

In [None]:
def add_numbers(x,y):
    return x+y

a = add_numbers
a(1,2)
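Because a function is an ordinary object, the same name binding also lets you pass it to another function. A minimal sketch; `apply_twice` is a hypothetical helper introduced here for illustration, not part of the course notebook.

```python
def add_numbers(x, y):
    return x + y

def apply_twice(func, a, b):
    # Call func on (a, b), then call it again on that result and b.
    return func(func(a, b), b)

a = add_numbers                 # no parentheses: this binds the function itself

print(a(1, 2))                  # 3
print(apply_twice(a, 1, 2))     # 5
print(a is add_numbers)         # True, both names refer to the same object
```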

<br>
# Types and Sequences

<br>
Use `type` to return the object's type.

In [None]:
type('This is a string')

In [None]:
type(None)

In [None]:
type(1)

In [None]:
type(1.0)

In [None]:
type(add_numbers)

<br>
Tuples are an immutable data structure (cannot be altered).

In [None]:
x = (1, 'a', 2, 'b')
type(x)

<br>
Lists are a mutable data structure.

In [None]:
x = [1, 'a', 2, 'b']
type(x)

<br>
Use `append` to append an object to a list.

In [None]:
x.append(3.3)
print(x)

<br>
This is an example of how to loop through each item in the list.

In [None]:
for item in x:
    print(item)

<br>
Or using the indexing operator:

In [None]:
i=0
while( i != len(x) ):
    print(x[i])
    i = i + 1

<br>
Use `+` to concatenate lists.

In [None]:
[1,2] + [3,4]

<br>
Use `*` to repeat lists.

In [None]:
[1]*3

<br>
Use the `in` operator to check if something is inside a list.

In [None]:
1 in [1, 2, 3]

<br>
Now let's look at strings. Use bracket notation to slice a string.

In [None]:
x = 'This is a string'
print(x[0]) #first character
print(x[0:1]) #first character, but we have explicitly set the end character
print(x[0:2]) #first two characters


<br>
This will return the last element of the string.

In [None]:
x[-1]

<br>
This will return the slice starting from the 4th element from the end and stopping before the 2nd element from the end.

In [None]:
x[-4:-2]

<br>
This is a slice from the beginning of the string, stopping before index 3, so it returns the first three characters.

In [None]:
x[:3]

<br>
And this is a slice starting at index 3 (the 4th character of the string) and going all the way to the end.

In [None]:
x[3:]
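The slices above can be checked side by side in one cell. This is only a recap sketch of the same expressions on the same string; the final comparison shows that `x[:3]` and `x[3:]` split the string without overlap.

```python
x = 'This is a string'

print(x[0])       # T    single character at index 0
print(x[0:2])     # Th   indices 0 and 1; the stop index is excluded
print(x[-1])      # g    last character
print(x[-4:-2])   # ri   4th from the end up to, but not including, 2nd from the end
print(x[:3])      # Thi  first three characters
print(x[3:])      # s is a string   from index 3 to the end

print(x[:3] + x[3:] == x)   # True
```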

In [None]:
firstname = 'Christopher'
lastname = 'Brooks'

print(firstname + ' ' + lastname)
print(firstname*3)
print('Chris' in firstname)


<br>
`split` returns a list of all the words in a string, or a list split on a specific character.

In [None]:
firstname = 'Christopher Arthur Hansen Brooks'.split(' ')[0] # [0] selects the first element of the list
lastname = 'Christopher Arthur Hansen Brooks'.split(' ')[-1] # [-1] selects the last element of the list
print(firstname)
print(lastname)

<br>
Make sure you convert objects to strings before concatenating. Adding an integer to a string directly, as in the next cell, raises a `TypeError`.

In [None]:
'Chris' + 2

In [None]:
'Chris' + str(2)
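A minimal sketch of both outcomes in one cell: the failed concatenation is caught so the cell keeps running, and the conversion is shown three ways. The variable names are chosen for illustration.

```python
name = 'Chris'

try:
    name + 2                          # str + int is not defined
except TypeError as err:
    print('TypeError:', err)

joined = name + str(2)                # explicit conversion
print(joined)                         # Chris2
print('{}{}'.format(name, 2))         # Chris2, format converts the integer for us
print(f'{name}{2}')                   # Chris2, f-strings convert as well
```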

<br>
Dictionaries associate keys with values.

In [None]:
x = {'Christopher Brooks': 'brooksch@umich.edu', 'Bill Gates': 'billg@microsoft.com'}
x['Christopher Brooks'] # Retrieve a value by using the indexing operator


In [None]:
x['Kevyn Collins-Thompson'] = None
x['Kevyn Collins-Thompson']

<br>
Iterate over all of the keys:

In [None]:
for name in x:
    print(x[name])

<br>
Iterate over all of the values:

In [None]:
for email in x.values():
    print(email)

<br>
Iterate over all of the items in the dictionary:

In [None]:
for name, email in x.items():
    print(name)
    print(email)

<br>
You can unpack a sequence into different variables:

In [None]:
x = ('Christopher', 'Brooks', 'brooksch@umich.edu')
fname, lname, email = x

In [None]:
fname

In [None]:
lname

<br>
Make sure the number of values you are unpacking matches the number of variables being assigned. Otherwise Python raises a `ValueError`, as the next cell does.

In [None]:
x = ('Christopher', 'Brooks', 'brooksch@umich.edu', 'Ann Arbor')
fname, lname, email = x
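The sketch below catches the error from the mismatched unpacking and then shows one way around it: a starred name, which collects any leftover values into a list. The name `rest` is chosen for illustration.

```python
x = ('Christopher', 'Brooks', 'brooksch@umich.edu', 'Ann Arbor')

try:
    fname, lname, email = x           # 4 values, but only 3 names
except ValueError as err:
    print('ValueError:', err)

# A starred name collects whatever is left over into a list.
fname, lname, *rest = x
print(fname, lname)    # Christopher Brooks
print(rest)            # ['brooksch@umich.edu', 'Ann Arbor']
```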

<br>
# The Python Programming Language: More on Strings

In [None]:
print('Chris' + 2)

In [None]:
print('Chris' + str(2))

<br>
Python has a built-in method for convenient string formatting.

In [None]:
sales_record = {
'price': 3.24,
'num_items': 4,
'person': 'Chris'}

sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'

print(sales_statement.format(sales_record['person'],
                             sales_record['num_items'],
                             sales_record['price'],
                             sales_record['num_items']*sales_record['price']))
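As a variation on the same `format` call, the sketch below uses named fields filled directly from the dictionary and a `:.2f` format spec so that prices print with two decimals. The `template` and `total` names are introduced here for illustration.

```python
sales_record = {'price': 3.24, 'num_items': 4, 'person': 'Chris'}

# Named fields are filled from the dictionary keys; :.2f rounds to 2 decimals.
template = ('{person} bought {num_items} item(s) at a price of {price:.2f} '
            'each for a total of {total:.2f}')

total = sales_record['num_items'] * sales_record['price']
statement = template.format(total=total, **sales_record)
print(statement)
# Chris bought 4 item(s) at a price of 3.24 each for a total of 12.96
```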


<br>
# Reading and Writing CSV files

<br>
Let's import our datafile mpg.csv, which contains fuel economy data for 234 cars.

* mpg : miles per gallon
* class : car classification
* cty : city mpg
* cyl : # of cylinders
* displ : engine displacement in liters
* drv : f = front-wheel drive, r = rear wheel drive, 4 = 4wd
* fl : fuel (e = ethanol E85, d = diesel, r = regular, p = premium, c = CNG)
* hwy : highway mpg
* manufacturer : automobile manufacturer
* model : model of car
* trans : type of transmission
* year : model year

In [None]:
import csv

%precision 2

with open('mpg.csv') as csvfile:
    mpg = list(csv.DictReader(csvfile))
    
mpg[:3] # The first three dictionaries in our list.

<br>
`csv.DictReader` has read in each row of our csv file as a dictionary. `len` shows that our list comprises 234 dictionaries.

In [None]:
len(mpg)

<br>
`keys` gives us the column names of our csv.

In [None]:
mpg[0].keys()

<br>
This is how to find the average cty fuel economy across all cars. All values in the dictionaries are strings, so we need to convert to float.

In [None]:
sum(float(d['cty']) for d in mpg) / len(mpg)

<br>
Similarly this is how to find the average hwy fuel economy across all cars.

In [None]:
sum(float(d['hwy']) for d in mpg) / len(mpg)

<br>
Use `set` to return the unique values for the number of cylinders the cars in our dataset have.

In [None]:
cylinders = set(d['cyl'] for d in mpg)
cylinders

<br>
Here's a more complex example where we group the cars by number of cylinders and find the average cty mpg for each group.

In [None]:
CtyMpgByCyl = []

for c in cylinders: # iterate over all the cylinder levels
    summpg = 0
    cyltypecount = 0
    for d in mpg: # iterate over all dictionaries
        if d['cyl'] == c: # if the cylinder level type matches,
            summpg += float(d['cty']) # add the cty mpg
            cyltypecount += 1 # increment the count
    CtyMpgByCyl.append((c, summpg / cyltypecount)) # append the tuple ('cylinder', 'avg mpg')

CtyMpgByCyl.sort(key=lambda x: x[0])
CtyMpgByCyl

<br>
Use `set` to return the unique values for the class types in our dataset.

In [None]:
vehicleclass = set(d['class'] for d in mpg) # what are the class types
vehicleclass

<br>
And here's an example of how to find the average hwy mpg for each class of vehicle in our dataset.

In [None]:
HwyMpgByClass = []

for t in vehicleclass: # iterate over all the vehicle classes
    summpg = 0
    vclasscount = 0
    for d in mpg: # iterate over all dictionaries
        if d['class'] == t: # if the vehicle class matches,
            summpg += float(d['hwy']) # add the hwy mpg
            vclasscount += 1 # increment the count
    HwyMpgByClass.append((t, summpg / vclasscount)) # append the tuple ('class', 'avg mpg')

HwyMpgByClass.sort(key=lambda x: x[1])
HwyMpgByClass
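The nested loops above scan the whole list once per group. One alternative, sketched here under the assumption that a dictionary of running totals is acceptable, makes a single pass over the rows. `average_by` is a hypothetical helper and the `cars` rows are invented for illustration; neither comes from the course notebook or from mpg.csv.

```python
from collections import defaultdict

def average_by(rows, group_key, value_key):
    """Return (group, average) pairs sorted by average; illustrative helper."""
    totals = defaultdict(float)   # running sum per group
    counts = defaultdict(int)     # number of rows per group
    for d in rows:
        totals[d[group_key]] += float(d[value_key])
        counts[d[group_key]] += 1
    averages = [(g, totals[g] / counts[g]) for g in totals]
    return sorted(averages, key=lambda pair: pair[1])   # lowest average first

# Invented sample rows, shaped like the DictReader output.
cars = [
    {'class': 'suv', 'hwy': '18'},
    {'class': 'compact', 'hwy': '29'},
    {'class': 'suv', 'hwy': '20'},
    {'class': 'compact', 'hwy': '27'},
    {'class': 'pickup', 'hwy': '17'},
]

result = average_by(cars, 'class', 'hwy')
print(result)
# [('pickup', 17.0), ('suv', 19.0), ('compact', 28.0)]
```

The same helper would cover the cylinder example by passing 'cyl' and 'cty' as the two keys.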

<br>
# Python: Dates and Times

In [None]:
import datetime as dt
import time as tm

<br>
`time` returns the current time in seconds since the Epoch (January 1st, 1970).

In [None]:
tm.time()

<br>
Convert the timestamp to datetime.

In [None]:
dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow
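To see what the timestamp counts from, the sketch below converts fixed timestamps instead of the current time. Without a `tz` argument, `fromtimestamp` returns local time, so the example passes `dt.timezone.utc` to make the output the same on every machine. The variable names are chosen for illustration.

```python
import datetime as dt

# Timestamp 0 is the Epoch itself. Passing tz makes the result independent
# of the machine's local time zone.
epoch = dt.datetime.fromtimestamp(0, tz=dt.timezone.utc)
print(epoch)                 # 1970-01-01 00:00:00+00:00

# 86400 seconds = 24 * 60 * 60, exactly one day after the Epoch.
one_day = dt.datetime.fromtimestamp(86400, tz=dt.timezone.utc)
print(one_day.date())        # 1970-01-02
print(one_day.timestamp())   # 86400.0
```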

<br>
Handy datetime attributes:

In [None]:
dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second # get year, month, day, etc. from a datetime

<br>
`timedelta` is a duration expressing the difference between two dates.

In [None]:
delta = dt.timedelta(days = 100) # create a timedelta of 100 days
delta

<br>
`date.today` returns the current local date.

In [None]:
today = dt.date.today()

In [None]:
today - delta # the date 100 days ago

In [None]:
today > today-delta # compare dates
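Because `today` changes every day, the sketch below repeats the same arithmetic with a fixed date (chosen arbitrarily for illustration) so the results are reproducible. Subtracting a `timedelta` from a date gives a date, and subtracting two dates gives a `timedelta`.

```python
import datetime as dt

day = dt.date(2020, 4, 10)        # fixed date chosen for illustration
delta = dt.timedelta(days=100)

earlier = day - delta             # date - timedelta -> date
print(earlier)                    # 2020-01-01

gap = day - earlier               # date - date -> timedelta
print(gap)                        # 100 days, 0:00:00
print(gap.days)                   # 100
print(day > earlier)              # True
```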