# Week 1 Notes

## Functions in Python

In [1]:
x = 1
y = 2
x + y

3

`add_numbers` is a function that takes two numbers as arguments, adds them, and returns the result.

In [2]:
def add_numbers(x, y):
    return x + y

add_numbers(1, 2)

3

Using optional parameters in function declaration

In [3]:
def add_numbers(x, y, z=None):
    if z is None:
        return x + y
    else:
        return x + y + z
    
print(add_numbers(1, 2))
print(add_numbers(1, 2, 3))

3
6


Using an optional flag parameter which defaults to `False`

In [7]:
def add_numbers(x, y, z=None, flag=False):
    if flag:
        print('Flag is true!')
    if z is None:
        return x + y
    else:
        return x + y + z
    
print(add_numbers(1, 2, flag=True))
print(add_numbers(1, 2, 3, flag=True))
print(add_numbers(1, 2, 3))

Flag is true!
3
Flag is true!
6
6


Assigning functions to a variable

In [9]:
def add_numbers(x, y):
    return x + y

a = add_numbers
a(1, 2)

3
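
Because a function is an ordinary object, it can also be passed to another function as an argument. The sketch below is illustrative only; `apply_twice` is a hypothetical helper that does not appear elsewhere in these notes.

```python
def add_numbers(x, y):
    return x + y

# Hypothetical helper: calls the function it receives,
# then feeds the result back in as the first argument.
def apply_twice(func, x, y):
    return func(func(x, y), y)

result = apply_twice(add_numbers, 1, 2)  # add_numbers(add_numbers(1, 2), 2)
print(result)  # 5
```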

## Types and Sequences in Python

This section covers tuples, lists, and dictionaries in Python.

The `type` function returns the data type of a value, as demonstrated in the following examples.

In [10]:
type('This is a string.')

str

In [11]:
type(123)

int

In [12]:
type(123.45)

float

In [13]:
type(add_numbers)

function

In [14]:
type(a)

function

### Tuples
Tuples are immutable. They are declared within parentheses and can hold values of different types.

In [15]:
tuple_example = (1, 'a', 2, 'b')
type(tuple_example)

tuple
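
To make the immutability concrete, here is a minimal sketch of what happens when code tries to change a tuple in place. The name `longer_tuple` is introduced only for this illustration.

```python
tuple_example = (1, 'a', 2, 'b')

try:
    tuple_example[0] = 99  # item assignment is not supported on tuples
except TypeError as error:
    print('Cannot modify a tuple:', error)

# Building a new tuple is allowed; the original stays unchanged.
longer_tuple = tuple_example + (3,)
print(longer_tuple)  # (1, 'a', 2, 'b', 3)
```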

### Lists
Lists are mutable and are declared within square brackets.

In [16]:
list_example = [1, 'a', 2, 'b']
type(list_example)

list

Since lists are mutable, we can add items to an existing list using the `append` method. (The output below contains three `3`s, which suggests this cell was run three times.)

In [20]:
list_example.append(3)
list_example

[1, 'a', 2, 'b', 3, 3, 3]

Iterating through items in a list:

In [21]:
for item in list_example:
    print(item)

1
a
2
b
3
3
3


We can also iterate through items in a list using the indexing operator.

In [23]:
i = 0
while i < len(list_example):
    print(list_example[i])
    i = i + 1

1
a
2
b
3
3
3


Concatenating lists together using the `+` operator

In [24]:
[1, 2] + [3, 4]

[1, 2, 3, 4]

Repeating lists using the `*` operator

In [25]:
2 * [1, 2]

[1, 2, 1, 2]

Using the `in` operator to check for the existence of an item within a list

In [27]:
print(1 in [1, 2, 3])
print(4 in [1, 2, 3])

True
False


### Strings
Strings in Python are immutable sequences of characters, so many of the operators that work on lists (indexing, slicing, `+`, `*`, `in`) also work on strings.

Slicing a string using the bracket notation:

In [28]:
x = 'This is a string'
print(x[0])  # first character of the string
print(x[0:1])  # first character, with the end character being explicitly set
print(x[0:2])  # first two characters

T
T
Th


Returning the last element of the string:

In [29]:
x[-1]

'g'

Slice starting from the 4th element from the end and stopping before the 2nd element from the end

In [30]:
x[-4:-2]

'ri'

Slice starting from the beginning and stopping before the 5th element of the string

In [31]:
x[:4]

'This'

Slice starting from the 5th element (index 4) and going all the way to the end of the string

In [32]:
x[4:]

' is a string'

Since strings are sequences like lists, we can use the `+` operator to concatenate and `*` to repeat strings, as well as the `in` operator to check for the occurrence of a substring.

In [34]:
first_name = 'Jonathan'
last_name = 'Doe'
print(first_name + ' ' + last_name)
print(first_name * 3)
print('Jon' in first_name)

Jonathan Doe
JonathanJonathanJonathan
True


The `split` method returns a list of the whitespace-separated words in a string, or a list of pieces split on a specific separator.

In [35]:
full_name = 'Jonathan Arthur Doe'
first_name = full_name.split()[0]
last_name = full_name.split()[-1]
print(first_name)
print(last_name)

Jonathan
Doe


In [36]:
# Splitting on a specific character
csv = 'A,B,C,D,E,F'
characters = csv.split(',')
for character in characters:
    print(character)

A
B
C
D
E
F
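
As a companion to `split`, the sketch below shows `join`, which goes the other way, and the optional second argument of `split` that limits the number of splits. The variable names `csv_line`, `rejoined`, and `limited` are chosen for this example only.

```python
csv_line = 'A,B,C,D,E,F'
characters = csv_line.split(',')

# join glues a list of strings back together with the given separator.
rejoined = '-'.join(characters)
print(rejoined)  # A-B-C-D-E-F

# The second argument caps the number of splits; the rest stays in one piece.
limited = csv_line.split(',', 2)
print(limited)  # ['A', 'B', 'C,D,E,F']
```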


Ensure non-string variables are converted into a string before concatenation, using the `str` function.

In [42]:
print('Chris' + str(1))
print('Chris' + str([1, 2, 3]))

Chris1
Chris[1, 2, 3]


### Dictionaries

Dictionaries are essentially a collection of key-value pairs.

In [43]:
# Declaration
x = {'Chris Brooks': 'brooks@umich.edu', 'Bill Gates': 'bill@mircosoft.com'}

# Retrieval
# Using bracket notation
x['Chris Brooks']

'brooks@umich.edu'

In [46]:
# Assigning new key-value pairs to a pre-existing dictionary
x['Kevyn Collins-Thompson'] = None
x['Kevyn Collins-Thompson']
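
Since a key can hold `None` as its value, it may help to see how membership tests and `get` behave. This is a small sketch; the key `'Unknown Person'` and the fallback text are made up for illustration.

```python
x = {'Chris Brooks': 'brooks@umich.edu'}
x['Kevyn Collins-Thompson'] = None

# A key can exist while holding None as its value.
print('Kevyn Collins-Thompson' in x)  # True
print('Unknown Person' in x)          # False

# get() returns a fallback instead of raising KeyError for a missing key.
missing = x.get('Unknown Person', 'no email on file')
present = x.get('Kevyn Collins-Thompson', 'no email on file')
print(missing)  # no email on file
print(present)  # None (the key exists, so the fallback is not used)
```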

Iterating over a dictionary:

1. Iterate over all keys.

In [47]:
for name in x:
    print(x[name])

brooks@umich.edu
bill@mircosoft.com
None


2. Iterate over all of the values.

In [48]:
for email in x.values():
    print(email)

brooks@umich.edu
bill@mircosoft.com
None


3. Iterate over all of the key-value pairs using `items`.

In [51]:
for name, email in x.items():
    print(name + ' : ' + str(email))

Chris Brooks : brooks@umich.edu
Bill Gates : bill@mircosoft.com
Kevyn Collins-Thompson : None


Unpacking a sequence:

_(Ensure that the number of values being unpacked matches the number of variables they are being assigned to)_

In [52]:
x = ('Christopher', 'Brooks', 'brooksch@umich.edu')
first_name, last_name, email = x

In [53]:
first_name

'Christopher'

In [55]:
last_name

'Brooks'

In [56]:
email

'brooksch@umich.edu'
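
A short sketch of what happens when the counts do not match, along with starred unpacking as one way to collect leftover values. The name `rest` is introduced here for illustration.

```python
x = ('Christopher', 'Brooks', 'brooksch@umich.edu')

try:
    first_name, last_name = x  # three values, but only two names
except ValueError as error:
    print('Unpacking failed:', error)

# A starred name collects any leftover values into a list.
first_name, *rest = x
print(first_name)  # Christopher
print(rest)        # ['Brooks', 'brooksch@umich.edu']
```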

Using built-in `format` method for convenient string formatting:

In [57]:
sales_record = {
    'price': 3.24,
    'num_items': 4,
    'purchaser': 'Ishan'
}

sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}.'
print(sales_statement.format(sales_record['purchaser'],
                             sales_record['num_items'],
                             sales_record['price'],
                             sales_record['num_items'] * sales_record['price']))

Ishan bought 4 item(s) at a price of 3.24 each for a total of 12.96.
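
The same statement can also be written with an f-string, assuming Python 3.6 or later. This is only an alternative sketch; the `:.2f` format specifier is added here to fix the display at two decimal places, and `total` and `statement` are names chosen for the example.

```python
sales_record = {'price': 3.24, 'num_items': 4, 'purchaser': 'Ishan'}
total = sales_record['num_items'] * sales_record['price']

# :.2f rounds to two decimal places for display only.
statement = (f"{sales_record['purchaser']} bought {sales_record['num_items']} item(s) "
             f"at a price of {sales_record['price']:.2f} each for a total of {total:.2f}.")
print(statement)
```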


## CSV Files

In [60]:
import csv

In [61]:
%precision 2

'%.2f'

In [63]:
# Opening a csv file and using the csv library to parse it
# Here, we are creating a list of dictionaries for each line in the file
with open('mpg.csv') as csv_file:
    mpg = list(csv.DictReader(csv_file))

In [64]:
# First three dictionaries in our list 
mpg[:3]

[OrderedDict([('', '1'),
              ('manufacturer', 'audi'),
              ('model', 'a4'),
              ('displ', '1.8'),
              ('year', '1999'),
              ('cyl', '4'),
              ('trans', 'auto(l5)'),
              ('drv', 'f'),
              ('cty', '18'),
              ('hwy', '29'),
              ('fl', 'p'),
              ('class', 'compact')]),
 OrderedDict([('', '2'),
              ('manufacturer', 'audi'),
              ('model', 'a4'),
              ('displ', '1.8'),
              ('year', '1999'),
              ('cyl', '4'),
              ('trans', 'manual(m5)'),
              ('drv', 'f'),
              ('cty', '21'),
              ('hwy', '29'),
              ('fl', 'p'),
              ('class', 'compact')]),
 OrderedDict([('', '3'),
              ('manufacturer', 'audi'),
              ('model', 'a4'),
              ('displ', '2'),
              ('year', '2008'),
              ('cyl', '4'),
              ('trans', 'manual(m6)'),
              ('drv',

In [66]:
# Denotes the number of dictionaries in the list, or the number of records in the csv file
len(mpg)

234
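
For readers without `mpg.csv` at hand, the same parsing pattern can be tried on in-memory text. The sketch below uses `io.StringIO` and a few made-up rows; `sample_text`, `rows`, and `average_cty` are names assumed for this example, and the numbers are not taken from the real dataset.

```python
import csv
import io

# Made-up sample data standing in for a file on disk.
sample_text = 'manufacturer,cty,hwy\naudi,18,29\naudi,21,29\nford,11,17\n'

with io.StringIO(sample_text) as csv_file:
    rows = list(csv.DictReader(csv_file))

print(len(rows))       # 3
print(rows[0]['cty'])  # 18 (a string, not a number)

# Values must be converted before doing arithmetic on them.
average_cty = sum(float(row['cty']) for row in rows) / len(rows)
print(round(average_cty, 2))  # 16.67
```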

The `keys` method gives the column names from our CSV file, which are stored as the keys of each dictionary in the list.

In [68]:
mpg[0].keys()

odict_keys(['', 'manufacturer', 'model', 'displ', 'year', 'cyl', 'trans', 'drv', 'cty', 'hwy', 'fl', 'class'])

1. Finding average city fuel economy across all cars. Since all values parsed from the CSV file are stored as strings, we need to convert them to floats.

In [69]:
sum(float(d['cty']) for d in mpg) / len(mpg)

16.86

2. Similarly, we find average highway fuel economy across all cars.

In [70]:
sum(float(d['hwy']) for d in mpg) / len(mpg)

23.44

3. Finding average city fuel economy whilst grouping cars by number of cylinders

We use the `set` function to obtain the unique cylinder counts across all cars in our dataset.

In [71]:
cylinders = set(d['cyl'] for d in mpg)
cylinders

{'4', '5', '6', '8'}

In [73]:
city_mpg_by_cyl = []

for cyl in cylinders:
    
    sum_mpg = 0
    cyl_type_count = 0
 
    for d in mpg:
        if d['cyl'] == cyl:
            sum_mpg += float(d['cty'])
            cyl_type_count += 1
            
    city_mpg_by_cyl.append((cyl, sum_mpg / cyl_type_count))
    
city_mpg_by_cyl

[('4', 21.01), ('5', 20.50), ('6', 16.22), ('8', 12.57)]

4. Finding average highway fuel economy whilst grouping cars by vehicle class

Again, we use the `set` function to obtain the set of all unique vehicle classes in our dataset.

In [74]:
vehicle_classes = set(d['class'] for d in mpg)
vehicle_classes

{'2seater', 'compact', 'midsize', 'minivan', 'pickup', 'subcompact', 'suv'}

In [79]:
hwy_mpg_by_class = []

for vehicle_class in vehicle_classes:
    
    sum_mpg = 0
    vehicle_class_count = 0
    
    for d in mpg:
        if d['class'] == vehicle_class:
            sum_mpg += float(d['hwy'])
            vehicle_class_count += 1
            
    hwy_mpg_by_class.append((vehicle_class, sum_mpg / vehicle_class_count))
    
# sorting by the average highway fuel economy
hwy_mpg_by_class.sort(key=lambda x: x[1])
hwy_mpg_by_class

[('pickup', 16.88),
 ('suv', 18.13),
 ('minivan', 22.36),
 ('2seater', 24.80),
 ('midsize', 27.29),
 ('subcompact', 28.14),
 ('compact', 28.30)]
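
The grouping loops above scan the full list once per group. As a hedged alternative, a single pass with `collections.defaultdict` can accumulate totals and counts for every group at once. The records below are made up and only mimic the shape of rows returned by `csv.DictReader`.

```python
from collections import defaultdict

# Made-up records; all values are strings, as csv.DictReader would produce.
records = [
    {'cyl': '4', 'cty': '18'},
    {'cyl': '4', 'cty': '22'},
    {'cyl': '6', 'cty': '16'},
    {'cyl': '8', 'cty': '12'},
    {'cyl': '8', 'cty': '14'},
]

totals = defaultdict(float)  # missing keys start at 0.0
counts = defaultdict(int)    # missing keys start at 0
for record in records:
    totals[record['cyl']] += float(record['cty'])
    counts[record['cyl']] += 1

averages = sorted((cyl, totals[cyl] / counts[cyl]) for cyl in totals)
print(averages)  # [('4', 20.0), ('6', 16.0), ('8', 13.0)]
```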

## Dates and Time

There are many ways of storing dates and times in Python. The `datetime` library provides classes for dates, times, and the differences between them.

The `time` library is useful for computing the time elapsed since the start of the epoch, i.e., January 1st, 1970.

In [80]:
import datetime as dt
import time as tm

The `time` function in the `time` library returns the number of seconds elapsed since the epoch (used as a timestamp).

In [81]:
tm.time()

1597450114.07

Converting a timestamp into a datetime object

In [83]:
dt_now = dt.datetime.fromtimestamp(tm.time())
dt_now

datetime.datetime(2020, 8, 14, 20, 10, 23, 214239)

Useful attributes available in a `datetime` object

In [88]:
dt_now.year, dt_now.month, dt_now.day, dt_now.hour, dt_now.minute, dt_now.second

(2020, 8, 14, 20, 10, 23)

Using the `timedelta` class to express a duration, i.e., the difference between two dates or times

In [89]:
delta = dt.timedelta(days = 100)
delta

datetime.timedelta(days=100)

Using the `date.today` method to return the current local date

In [90]:
today = dt.date.today()

In [93]:
today + delta # the date 100 days from now

datetime.date(2020, 11, 22)

In [97]:
today > today - delta # comparison of dates

True
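
A `timedelta` also results from subtracting one date from another. The sketch below uses fixed dates (matching the outputs shown above) rather than `date.today()`, so the results are reproducible; `start` and `end` are names chosen for this example.

```python
import datetime as dt

start = dt.date(2020, 8, 14)
end = dt.date(2020, 11, 22)

delta = end - start  # subtracting two dates yields a timedelta
print(delta.days)    # 100

print(start + dt.timedelta(days=100) == end)  # True
print(start - dt.timedelta(days=100))         # 2020-05-06
```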

## Advanced Python objects, the map function

### Object-oriented Programming in Python and Classes

In [101]:
# Example class in Python
class Person:
    department = 'RBC Capital Markets QTS' # class variable, available across all instances
    
    def set_name(self, new_name):
        self.name = new_name
        
    def set_location(self, new_location):
        self.location = new_location

In [102]:
# Instantiating an object of a class
person = Person()
person.set_name('Anunai Ishan')
person.set_location('New York, NY')
print('{} lives in {} and works in {}.'.format(person.name, person.location, person.department))

Anunai Ishan lives in New York, NY and works in RBC Capital Markets QTS.
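
A variation worth sketching: an `__init__` method sets instance attributes when the object is created, which avoids separate setter calls. The department strings and the names `alice` and `bob` are placeholders for illustration, not values from the example above.

```python
class Person:
    department = 'Example Department'  # placeholder class variable, shared by all instances

    def __init__(self, name, location):
        # Instance variables: each object stores its own values.
        self.name = name
        self.location = location

alice = Person('Alice', 'Toronto')
bob = Person('Bob', 'New York')
print(alice.department == bob.department)  # True

# Rebinding the class variable changes what every instance sees.
Person.department = 'Another Department'
print(alice.department, '|', bob.department)  # Another Department | Another Department
```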


### The `map` function

Example: Using `map` to apply the `min` function pairwise to the items of two lists

In [107]:
store1 = [8.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]

cheapest = map(min, store1, store2)
cheapest

<map at 0x104c5b9d0>

In [108]:
# lazy evaluation, useful for optimizing memory and resource usage while dealing with big data
# to view values, iterate through the map object
for item in cheapest:
    print(item)

8.0
11.0
12.34
2.01
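
One consequence of lazy evaluation is that a `map` object can be consumed only once. A minimal sketch, where `first_pass` and `second_pass` are names chosen for this example:

```python
store1 = [8.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]

cheapest = map(min, store1, store2)
first_pass = list(cheapest)   # consumes the iterator
second_pass = list(cheapest)  # nothing left to yield

print(first_pass)   # [8.0, 11.0, 12.34, 2.01]
print(second_pass)  # []
```

If the values are needed more than once, storing them in a list (as `first_pass` does) is one option.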


## Advanced Python Lambdas and List Comprehensions

Lambdas in Python are small anonymous functions: they take parameters and return the value of a single expression.

In [109]:
add_function = lambda a, b: a + b

In [110]:
add_function(10, 20)

30
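
Lambdas are often passed inline rather than assigned to a name, for example as a sort key. The data in this sketch is made up for illustration.

```python
# Made-up (name, price) pairs.
prices = [('apple', 1.20), ('pear', 0.80), ('kiwi', 2.50)]

# The lambda picks out the second item of each pair as the sort key.
by_price = sorted(prices, key=lambda pair: pair[1])
print(by_price)  # [('pear', 0.8), ('apple', 1.2), ('kiwi', 2.5)]
```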

A list comprehension builds a list from an iteration in a single expression. It is more concise than the equivalent loop and is often somewhat faster.

In [114]:
# Example 1: Iterating from 1 to 999 and returning even numbers traditionally
my_list = []
for number in range(1, 1000):
    if number % 2 == 0:
        my_list.append(number)
        
my_list[:10]

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

In [115]:
# Example 2: Iterating from 1 to 999 and returning even numbers using list comprehension
my_list = [number for number in range(1, 1000) if number % 2 == 0]
my_list[:10]

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

## The Numerical Python Library (NumPy)

In [12]:
import numpy as np

### Creating arrays in NumPy

In [15]:
# Converting list to a NumPy array
my_list = [1, 2, 3]
np_array = np.array(my_list)
print(type(np_array))

<class 'numpy.ndarray'>


In [16]:
# Alternatively, a list can be passed directly to the NumPy array function
np_array = np.array([4, 5, 6])
np_array

array([4, 5, 6])

In [17]:
# Creating a multi-dimensional array
# Requires passing a list of lists
m = np.array([[7, 8, 9], [10, 11, 12]])
m

array([[ 7,  8,  9],
       [10, 11, 12]])

In [20]:
# Using the shape property to find the dimensions of the NumPy array
# Shape is a tuple of (num of rows, num of columns)
m.shape

(2, 3)

Using `arange` function:
The `arange` function returns evenly spaced values within the interval specified as parameters.

In [23]:
# Starting at 0, counting up by 2 and stopping before 30
n = np.arange(0, 30, 2)
n

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

Using `reshape` vs `resize` functions:  
`reshape` returns an array with the same data with a new shape  
`resize` changes the shape and size of the array in-place

In [24]:
n = n.reshape(3, 5)
n

array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])

In [26]:
n.resize(5, 3)
n

array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16],
       [18, 20, 22],
       [24, 26, 28]])
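
The difference between the two can be checked directly. In this sketch, `refcheck=False` is passed to `resize` so the call does not depend on how many references to the array exist in the running environment; that choice, and the names `reshaped` and `m`, are assumptions made for this example.

```python
import numpy as np

n = np.arange(0, 30, 2)

reshaped = n.reshape(3, 5)      # returns a new array object; n keeps its shape
print(n.shape, reshaped.shape)  # (15,) (3, 5)

# -1 asks NumPy to infer that dimension from the array's size.
print(n.reshape(5, -1).shape)   # (5, 3)

m = np.arange(6)
outcome = m.resize((2, 3), refcheck=False)  # modifies m itself
print(outcome)   # None
print(m.shape)   # (2, 3)
```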

Using `linspace` function: The `linspace` function returns evenly spaced numbers over the interval specified as parameters.

In [27]:
# Returns 9 evenly spaced values from 0 to 4, both inclusive 
o = np.linspace(0, 4, 9)
o

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])

Using `zeros` and `ones`:  
`zeros` returns a new array of given shape and type, filled with zeros.  
`ones` does the same, except fills the array with ones.

In [29]:
np.zeros((3, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [30]:
np.ones((2, 3))

array([[1., 1., 1.],
       [1., 1., 1.]])

`eye` returns an identity matrix

In [32]:
# Returns an identity matrix with the dimension specified as argument
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

The `diag` function returns a diagonal matrix whose diagonal holds the one-dimensional array passed as an argument.

In [34]:
np.diag(my_list)

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

Create an array using a repeating list

In [35]:
np.array([1, 2, 3] * 3)

array([1, 2, 3, 1, 2, 3, 1, 2, 3])

Repeat elements of an array using the `repeat` function

In [36]:
np.repeat([1, 2, 3], 3)

array([1, 1, 1, 2, 2, 2, 3, 3, 3])

### Combining Arrays

In [41]:
# Specify dimensions of the 2-D array as well as type, int, in this case
p = np.ones([2, 3], int)
p

array([[1, 1, 1],
       [1, 1, 1]])

Using `vstack` to stack arrays in sequence vertically (row wise)

In [44]:
np.vstack([p, 2 * p])

array([[1, 1, 1],
       [1, 1, 1],
       [2, 2, 2],
       [2, 2, 2]])

Using `hstack` to stack arrays in sequence horizontally (column wise)

In [45]:
np.hstack([p, 2 * p])

array([[1, 1, 1, 2, 2, 2],
       [1, 1, 1, 2, 2, 2]])

### Operations

Use `+`, `-`, `*`, `/` and `**` to perform element-wise addition, subtraction, multiplication, division and power, respectively.

In [47]:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

In [48]:
print(x + y) # element wise addition

[5 7 9]


In [49]:
print(y - x) # element wise subtraction

[3 3 3]


In [50]:
print(x - y) # element wise subtraction

[-3 -3 -3]


In [51]:
print(x * y) # element wise multiplication

[ 4 10 18]


In [52]:
print(x / y) # element wise division

[0.25 0.4  0.5 ]


In [53]:
print(x ** 3) # element wise raising to a power

[ 1  8 27]


Dot Product (from linear algebra):

$ \begin{bmatrix}x_1 & x_2 & x_3\end{bmatrix}
\cdot
\begin{bmatrix}y_1 \\ y_2 \\ y_3\end{bmatrix}
= x_1 y_1 + x_2 y_2 + x_3 y_3$

In [54]:
x.dot(y) # dot product (1 * 4) + (2 * 5) + (3 * 6)

32
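
To connect the formula to the code, the sketch below computes the same dot product four ways: a plain Python sum of products, an element-wise product followed by `sum`, `np.dot`, and the `@` operator. The name `manual` is chosen for this example.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Sum of pairwise products, written out as in the formula.
manual = sum(a * b for a, b in zip(x, y))

print(manual, (x * y).sum(), np.dot(x, y), x @ y)  # 32 32 32 32
```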

In [56]:
z = np.array([y, y ** 2])
z

array([[ 4,  5,  6],
       [16, 25, 36]])

Transposing a matrix:

In [57]:
z.shape # dimensions before transposing

(2, 3)

In [58]:
z.T # result of transpose

array([[ 4, 16],
       [ 5, 25],
       [ 6, 36]])

In [60]:
z.T.shape # dimensions after transposing

(3, 2)

Using `dtype` for seeing the data type of elements:

In [61]:
z.dtype

dtype('int64')

Using `astype` to cast the elements to a specific type:

In [62]:
z = z.astype('f')
z.dtype

dtype('float32')

### Math Functions in NumPy

In [63]:
a = np.array([-4, -2, 1, 3, 5])

In [64]:
a.sum() # returns sum of all elements in the NumPy array

3

In [65]:
a.max() # returns the maximum element

5

In [66]:
a.min() # returns the minimum element

-4

In [67]:
a.mean() # returns the mean / average value of elements in the array

0.6

In [68]:
a.std() # find standard deviation of elements in the array

3.2619012860600183

Using `argmax` and `argmin` to return the indices of maximum and minimum elements in the array, respectively.

In [69]:
a.argmax()

4

In [70]:
a.argmin()

0

### Indexing and Slicing in NumPy

In [72]:
s = np.arange(13) ** 2
s

array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121, 144])

Use bracket notation to get the value at a specific index. Indexing starts at `0`, and the last element of the array can be fetched using `-1`.

In [73]:
s[0], s[4], s[-1]

(0, 16, 144)

Use `:` to indicate a range. Leaving the values empty will default to the beginning/end of the array.

In [74]:
s[1:5]

array([ 1,  4,  9, 16])

In [76]:
s[:-2]

array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100])

Backward counting can be achieved using negatives, as shown:

In [81]:
s[-5::-2] # starting from the fifth element from the end and counting backwards by 2 until the beginning is reached

array([64, 36, 16,  4,  0])

__Multi-dimensional Arrays:__

In [82]:
r = np.arange(36)
r.resize((6, 6))
r

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

Using bracket notation to index:
`array[row, column]`

In [83]:
r[2, 2]

14

Similar to single-dimensional arrays, using `:` in bracket notation helps select a range of rows and/or columns.

In [85]:
r[3, 3:6] # 4th row and columns 4 to 6

array([21, 22, 23])

Selecting all rows up to and not including row 2, and all columns up to and not including the last column:

In [86]:
r[:2, :-1]

array([[ 0,  1,  2,  3,  4],
       [ 6,  7,  8,  9, 10]])

Slice of the last row and only every other element:

In [90]:
r[-1, ::2]

array([30, 32, 34])

Using conditional indexing:

In [91]:
# Selecting values from the array greater than 30
r[r > 30]

array([31, 32, 33, 34, 35])

Assigning a value of 30 to all values in the array that are greater than 30:

In [92]:
r[r > 30] = 30

In [93]:
r

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])
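
Conditional indexing works because the comparison itself produces a boolean array. The sketch below shows that mask and, as one alternative to in-place assignment, uses `np.minimum` to produce a capped copy while leaving the original untouched. The names `mask` and `capped` are chosen for this example.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

mask = r > 30      # boolean array with the same shape as r
print(mask.sum())  # 5 (True counts as 1)

capped = np.minimum(r, 30)  # new array; r itself is not modified
print(capped[-1])  # [30 30 30 30 30 30]
print(r.max())     # 35
```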

### Copying Data

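In NumPy, slicing an array returns a view of the original data rather than a copy, so modifying the slice modifies the original array in place, as shown below.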

In [94]:
r2 = r[:3, :3]
r2

array([[ 0,  1,  2],
       [ 6,  7,  8],
       [12, 13, 14]])

In [95]:
r2[:] = 0
r2

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

In changing the values in `r2` to `0`, the values at the corresponding indices in `r` have also changed.

In [96]:
r

array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])

To avoid modifying the original array, we use the `copy` method provided by NumPy, as shown below.

In [97]:
r_cp = r.copy()
r_cp

array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])

Now, upon modifying `r_cp`, the values in `r` will not change.

In [98]:
r_cp[:] = 0
print(r_cp, '\n')
print(r)

[[0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]] 

[[ 0  0  0  3  4  5]
 [ 0  0  0  9 10 11]
 [ 0  0  0 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]
 [30 30 30 30 30 30]]
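
Whether two arrays share the same underlying data can be checked with `np.shares_memory`. A minimal sketch on a fresh array, where `view` and `copied` are names chosen for this example:

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

view = r[:3, :3]            # basic slicing returns a view
copied = r[:3, :3].copy()   # an independent copy

print(np.shares_memory(r, view))    # True
print(np.shares_memory(r, copied))  # False

copied[:] = 0
print(r[0, :3])  # [0 1 2] (unchanged by writing to the copy)
```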


### Iterating over NumPy Arrays

Generating a 4 x 3 array of random integers between 0 and 9 (the upper bound 10 is excluded)

In [99]:
test = np.random.randint(0, 10, (4, 3))
test

array([[4, 0, 9],
       [0, 3, 1],
       [2, 0, 7],
       [1, 2, 1]])

Iterating by row

In [100]:
for row in test:
    print(row)

[4 0 9]
[0 3 1]
[2 0 7]
[1 2 1]


Iterating by index

In [101]:
for i in range(len(test)):
    print(test[i])

[4 0 9]
[0 3 1]
[2 0 7]
[1 2 1]


Iterating by both row and index using `enumerate` function

In [102]:
for i, row in enumerate(test):
    print('Row', i, 'is', row)

Row 0 is [4 0 9]
Row 1 is [0 3 1]
Row 2 is [2 0 7]
Row 3 is [1 2 1]


Using `zip` to iterate over multiple iterables

In [103]:
test2 = test ** 2
test2

array([[16,  0, 81],
       [ 0,  9,  1],
       [ 4,  0, 49],
       [ 1,  4,  1]])

In [104]:
for row_1, row_2 in zip(test, test2):
    print(row_1, '+', row_2, '=', row_1 + row_2)

[4 0 9] + [16  0 81] = [20  0 90]
[0 3 1] + [0 9 1] = [ 0 12  2]
[2 0 7] + [ 4  0 49] = [ 6  0 56]
[1 2 1] + [1 4 1] = [2 6 2]
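
Looping over rows is useful for display, but the same sums can be computed without a loop, since `+` already works element-wise on whole arrays. The sketch below uses the fixed values printed above instead of new random numbers so that it is reproducible; `row_by_row` and `vectorized` are names chosen for this example.

```python
import numpy as np

# Fixed values copied from the random array shown above.
test = np.array([[4, 0, 9], [0, 3, 1], [2, 0, 7], [1, 2, 1]])
test2 = test ** 2

row_by_row = np.array([row_1 + row_2 for row_1, row_2 in zip(test, test2)])
vectorized = test + test2  # one expression, no Python-level loop

print(np.array_equal(row_by_row, vectorized))  # True
```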
