## Homework: Introduction to Python 

**Due: Thursday, Sept. 21**

Your assignment should be handed in as an IPython notebook checked into your GitHub repository.

## Optional But Recommended Reading

Read the official [Python Tutorial](https://docs.python.org/3/tutorial/index.html) sections 1-5. This excellent, well written tutorial introduces all of the most important concepts of the core python language and its standard library. Even if you don't read it, you should know where to find it; if you get stuck, it should be your first place to turn for help.

## Constraints

* This assignment should be done in _basic python_: don't `import` any extra modules unless specifically instructed to do so.
* You should work on this assignment _alone_. 
* Try to do the problems yourself. Refer to the lecture notes and the [Python Tutorial](https://docs.python.org/3/tutorial/index.html) if you get stuck. Avoid random googling and stack-overflowing. A perfect solution is less important than the process of figuring it out yourself.


## 1. Dictionaries and Strings

In this section we will explore a [dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) whose keys and values are [strings](https://docs.python.org/3/tutorial/introduction.html#strings). 

Cut and paste the following code to read a data file. (You are not expected to understand exactly what is happening here...it's just a quick way for us to get some data into python.)

    import pickle
    with open('class_names.pkl', 'rb') as file:
        data = pickle.load(file)

This will load some pre-generated data into the variable `data`. 

In [72]:
import pickle
with open('class_names.pkl', 'rb') as file:
    data = pickle.load(file)

### 1.1 Examine the data by allowing the notebook to print it
(Just enter `data` on the last line of the cell and execute it.)

In [73]:
data

{'akb2134': 'Bozack, Anne',
 'cjc2252': 'Carchedi, Christopher',
 'cjc2256': 'Chesley, Christine',
 'cjm2246': 'Martinez-Zayas, Carlos',
 'cp2850': 'Peltier, Carly',
 'cr2630': 'Raymond, Colin',
 'csl2164': 'Lesk, Corey',
 'djk2120': 'Kennedy, Daniel',
 'dpb2141': 'Babin, Daniel',
 'ehc2150': 'Case, Elizabeth',
 'jg3223': 'Guo, Jean',
 'jwd2136': 'Doss-Gollin, James',
 'lkg2133': 'Gruenburg, Laura',
 'map2251': 'Pascolini-Campbell, Madeleine',
 'meg2203': 'Gemma, Marina',
 'mmf2171': 'Frenkel, Megan',
 'njl2134': 'Lenssen, Nathan',
 'nr2447': 'Ramesh, Nandini',
 'rae2148': 'Esparzagamez, Ricardo',
 'sb3210': 'Baek, Seung Hun',
 'scw2148': 'Wong, Suki',
 'sel2172': 'Lytle, Sara',
 'sw2936': 'Wang, Siyan',
 'tpj2104': 'Janoski, Tyler',
 'tz2218': 'Zhang, Tianbo',
 'ukm2103': 'Miller, Una',
 'xj2176': 'Jin, Xiaomeng',
 'yh3019': 'Huang, Yu'}

### 1.2 Compare to the output of `print(data)`
Which one is prettier?

In [74]:
print(data)

{'dpb2141': 'Babin, Daniel', 'sb3210': 'Baek, Seung Hun', 'akb2134': 'Bozack, Anne', 'cjc2252': 'Carchedi, Christopher', 'ehc2150': 'Case, Elizabeth', 'cjc2256': 'Chesley, Christine', 'jwd2136': 'Doss-Gollin, James', 'rae2148': 'Esparzagamez, Ricardo', 'mmf2171': 'Frenkel, Megan', 'meg2203': 'Gemma, Marina', 'lkg2133': 'Gruenburg, Laura', 'jg3223': 'Guo, Jean', 'yh3019': 'Huang, Yu', 'tpj2104': 'Janoski, Tyler', 'xj2176': 'Jin, Xiaomeng', 'djk2120': 'Kennedy, Daniel', 'njl2134': 'Lenssen, Nathan', 'csl2164': 'Lesk, Corey', 'sel2172': 'Lytle, Sara', 'cjm2246': 'Martinez-Zayas, Carlos', 'ukm2103': 'Miller, Una', 'map2251': 'Pascolini-Campbell, Madeleine', 'cp2850': 'Peltier, Carly', 'nr2447': 'Ramesh, Nandini', 'cr2630': 'Raymond, Colin', 'sw2936': 'Wang, Siyan', 'scw2148': 'Wong, Suki', 'tz2218': 'Zhang, Tianbo'}


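**Answer:** I think just executing `data` is more organised: the notebook shows one entry per line, sorted by key, whereas `print(data)` puts everything on a single line.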

### 1.3 Determine what type of object is `data`

In [75]:
type(data)

dict

**Answer:** `data` is a dictionary

### 1.4 Use python's built-in `help` function to find out what operations are available on `data`
(What is the meaning of the methods that start with \__?)

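**Answer:** Methods whose names start and end with `__` are Python's special ("dunder") methods. Python calls them implicitly to implement built-in syntax and functions; for example, `x[y]` calls `x.__getitem__(y)` and `key in data` calls `data.__contains__(key)`.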

In [76]:
help(dict)

Help on class dict in module builtins:

class dict(object)
 |  dict() -> new empty dictionary
 |  dict(mapping) -> new dictionary initialized from a mapping object's
 |      (key, value) pairs
 |  dict(iterable) -> new dictionary initialized as if via:
 |      d = {}
 |      for k, v in iterable:
 |          d[k] = v
 |  dict(**kwargs) -> new dictionary initialized with the name=value pairs
 |      in the keyword argument list.  For example:  dict(one=1, two=2)
 |  
 |  Methods defined here:
 |  
 |  __contains__(self, key, /)
 |      True if D has a key k, else False.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |
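
The double-underscore methods listed by `help(dict)` are what Python calls behind the ordinary syntax. The sketch below illustrates this with a made-up dictionary, `sample` (a hypothetical stand-in, not the class data); in practice one would use the plain syntax rather than calling these methods directly.

```python
# `sample` is a hypothetical two-entry dictionary used only for illustration.
sample = {'abc1234': 'Doe, Jane', 'xyz9876': 'Roe, Richard'}

# Each built-in operation is implemented by a double-underscore method.
same_lookup = sample['abc1234'] == sample.__getitem__('abc1234')
same_membership = ('abc1234' in sample) == sample.__contains__('abc1234')
same_length = len(sample) == sample.__len__()

print(same_lookup, same_membership, same_length)
```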

By now it should be clear that `data` is a [dict](https://docs.python.org/3/library/stdtypes.html#mapping-types-dict)  (i.e. "dictionary") that maps UNIs to names for all students in the class.

For the remaining questions, the point is to _have python tell you the answer_. (Obviously you could just manually count / sort the data, but that defeats the purpose!)

### 1.5 How many entries are there in `data`?

In [77]:
len(data) #neat count

28

In [78]:
#does a basic count of data
entries = 0
for i in data:
    entries +=1
print(entries)

28


There are 28 entries in data, which means there are 28 students in the class.

### 1.6 Is your UNI in `data`?

The `in` operator tests whether a key is present in the dictionary; it corresponds to the `__contains__` method listed in `help(dict)`.

In [80]:
'scw2148' in data

True

### 1.7 How many total times does the character "s" appear in the names?
Do this in a way that is not case sensitive.

In [81]:
values = data.values()
count = 0
for names in values:
    count += names.count('s')
    count += names.count('S')
print(count)

20


In [82]:
#Without for loop
names = list(data.values())
names = ''.join(names)
total = names.count('s') + names.count('S')
print(total)

20
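
Another way to make the count case-insensitive is to lower-case each name before counting. The sketch below uses a short made-up list, `sample_names`, as a stand-in for `list(data.values())`, so the number it prints is not the answer for the full class.

```python
# `sample_names` is a hypothetical stand-in for list(data.values()).
sample_names = ['Wong, Suki', 'Lytle, Sara', 'Baek, Seung Hun']

# Lower-casing each name first means a single count covers both 's' and 'S'.
s_count = sum(name.lower().count('s') for name in sample_names)
print(s_count)
```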


### 1.7 Create a new dictionary that contains the same keys but only the first names as values

In [599]:
#Concise way
forename = [names.split(', ')[1] for names in data.values()]
new_data = dict(zip(data.keys(),forename))
new_data

{'akb2134': 'Anne',
 'cjc2252': 'Christopher',
 'cjc2256': 'Christine',
 'cjm2246': 'Carlos',
 'cp2850': 'Carly',
 'cr2630': 'Colin',
 'csl2164': 'Corey',
 'djk2120': 'Daniel',
 'dpb2141': 'Daniel',
 'ehc2150': 'Elizabeth',
 'jg3223': 'Jean',
 'jwd2136': 'James',
 'lkg2133': 'Laura',
 'map2251': 'Madeleine',
 'meg2203': 'Marina',
 'mmf2171': 'Megan',
 'njl2134': 'Nathan',
 'nr2447': 'Nandini',
 'rae2148': 'Ricardo',
 'sb3210': 'Seung Hun',
 'scw2148': 'Suki',
 'sel2172': 'Sara',
 'sw2936': 'Siyan',
 'tpj2104': 'Tyler',
 'tz2218': 'Tianbo',
 'ukm2103': 'Una',
 'xj2176': 'Xiaomeng',
 'yh3019': 'Yu'}

In [600]:
#longer way without using a separator
full_names = list(data.values()) #Make a list, as you can't index dict_values

forename = []

for name in full_names:
    count = 0
    for char in name:
        if char == ' ': #compare characters with ==, not `is`
            break
        count += 1
    forename.append(name[count+1:]) #everything after the first space
    
new_data = dict(zip(data.keys(),forename))
new_data

{'akb2134': 'Anne',
 'cjc2252': 'Christopher',
 'cjc2256': 'Christine',
 'cjm2246': 'Carlos',
 'cp2850': 'Carly',
 'cr2630': 'Colin',
 'csl2164': 'Corey',
 'djk2120': 'Daniel',
 'dpb2141': 'Daniel',
 'ehc2150': 'Elizabeth',
 'jg3223': 'Jean',
 'jwd2136': 'James',
 'lkg2133': 'Laura',
 'map2251': 'Madeleine',
 'meg2203': 'Marina',
 'mmf2171': 'Megan',
 'njl2134': 'Nathan',
 'nr2447': 'Nandini',
 'rae2148': 'Ricardo',
 'sb3210': 'Seung Hun',
 'scw2148': 'Suki',
 'sel2172': 'Sara',
 'sw2936': 'Siyan',
 'tpj2104': 'Tyler',
 'tz2218': 'Tianbo',
 'ukm2103': 'Una',
 'xj2176': 'Xiaomeng',
 'yh3019': 'Yu'}

### 1.8 What is the longest first name?

In [606]:
name_lengths = [len(i) for i in forename]

for i in range(len(name_lengths)):
    if name_lengths[i] == max(name_lengths):
        print(forename[i])

Christopher
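
A shorter alternative, sketched here with a made-up list `sample_forenames` standing in for `forename`, is to let `max` compare the names by length. Note that `max` returns only the first name it finds if several share the maximum length.

```python
# `sample_forenames` is a hypothetical stand-in for the `forename` list.
sample_forenames = ['Anne', 'Christopher', 'Seung Hun', 'Yu']

# max() compares the items by the value that `key` returns, here the length.
longest = max(sample_forenames, key=len)
print(longest)
```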


### 1.9 Check that the first letter of each UNI matches the first letter of each first name
(Be careful about the case.)

In [611]:
UNI = list(data.keys())

#Check every pair; only report 'All match' if no mismatch was found.
mismatches = 0
for i in range(len(UNI)):
    if UNI[i][0] != forename[i][0].lower():
        print('MISMATCH! between ' + forename[i] + ' and ' + UNI[i])
        mismatches += 1
if mismatches == 0:
    print('All match')

All match


## 2 Lists and Numbers

In this section, we will play with a [list](https://docs.python.org/3/tutorial/introduction.html#lists) of [numbers](https://docs.python.org/3/tutorial/introduction.html#numbers).

Keep in mind that doing lots of numerical calculations on lists of numbers (using just core python) is somewhat awkward and inefficient. [Numpy](http://www.numpy.org/) makes this sort of work much easier; in order to appreciate numpy, however, it is useful to first do things the "hard way".

Run this chunk of code to load some data.

    with open('numbers.pkl', 'rb') as file:
        numbers = pickle.load(file)

In [87]:
with open('numbers.pkl', 'rb') as file:
    numbers = pickle.load(file)

### 2.1 Confirm that `numbers` is indeed a list

In [88]:
type(numbers)

list

### 2.2 How many items are in `numbers`?

In [89]:
len(numbers)

2519

### 2.3 What is the difference between the first and last values?

In [90]:
numbers[0]-numbers[-1]

-8827.208984375

Magnitude of difference is:

In [91]:
abs(numbers[0]-numbers[-1])

8827.208984375

### 2.4 What are the first five and last five items?

In [92]:
print('First five: ',numbers[:5])
print('Last five: ',numbers[-5:])

First five:  [13291.650390625, 13424.8798828125, 13442.51953125, 13403.419921875, 13739.3896484375]
Last five:  [21807.640625, 21784.779296875, 21797.7890625, 22057.369140625, 22118.859375]


### 2.5 What is the type of the first item? Confirm that all items have the same type.

In [93]:
type(numbers[0])

float

In [619]:
#all() only returns True if every item passes the check
if all(type(x) is float for x in numbers):
    print('All are floats')
else:
    print('Not all items are the same type')

All are floats


### 2.6 What are the minimum and maximum values in `numbers`?

In [95]:
print('(minimum, maximum) =', (min(numbers), max(numbers)))

(minimum, maximum) = (6547.0498046875, 22118.859375)


### 2.7 What is the mean value?

In [96]:
mean = sum(numbers)/len(numbers)
print('mean =', mean)

mean = 14174.955465222125


### 2.8 What is the standard deviation?

In [97]:
a = [(x - mean)**2 for x in numbers]
(sum(a)/len(numbers))**0.5 #population standard deviation (divides by N, not N-1)

3568.729182942298
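
As a cross-check on the formula, the sketch below compares the hand-written calculation with the standard library's `statistics` module on a small made-up list, `sample`. Importing `statistics` goes beyond the "basic python" constraint of this assignment, so it is meant only as a way to verify the arithmetic.

```python
import statistics

# `sample` is a hypothetical list chosen so the answer is a round number.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(sample) / len(sample)
by_hand = (sum((x - mean) ** 2 for x in sample) / len(sample)) ** 0.5

print(by_hand)                    # population standard deviation (divide by N)
print(statistics.pstdev(sample))  # same definition, from the standard library
print(statistics.stdev(sample))   # sample standard deviation (divide by N - 1)
```

The population and sample definitions differ only in the divisor, so for a list as long as `numbers` the two values are very close.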

### 2.9 What are the largest (most positive) and smallest (most negative) _changes_ from one value to the next?

In [98]:
#Change from one value to the next is (next value) - (current value)
change = [numbers[i+1] - numbers[i] for i in range(len(numbers)-1)]
print('Largest positive change: ',max(change),'; Largest negative change: ',min(change))

Largest positive change:  936.419921875 ; Largest negative change:  -777.6796875


### Bonus: can you guess the source of these numbers? 
*Answer: the Dow Jones Industrial Average, 10-year chart*

## 3 Lists vs. Tuples

The difference between [lists](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists) and [tuples](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences) is a persistent source of confusion about python. The key difference is that **lists are mutable** while **tuples are immutable**. This short exercise is designed to help you understand what this means.
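
To make the distinction concrete, here is a minimal sketch using made-up values (`colours`, `fixed`, and `lookup` are hypothetical names, separate from the exercise below). It shows two consequences of mutability that are easy to miss: a list changed through one name changes for every name that refers to it, and a tuple can serve as a dictionary key because it cannot change.

```python
# Hypothetical example values, not part of the exercise below.
colours = ['red', 'green']
alias = colours            # both names refer to the same list object
alias.append('blue')       # mutating through one name is visible through the other
print(colours)

fixed = ('red', 'green')
try:
    fixed[0] = 'pink'      # tuples do not support item assignment
except TypeError as err:
    message = str(err)
    print('TypeError:', message)

# Because tuples cannot change, they can be used as dictionary keys.
lookup = {fixed: 'a tuple key'}
print(lookup[('red', 'green')])
```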

### 3.1 Create a list of the first three planets in the solar system

In [99]:
planets = ['Mercury','Venus','Earth']

### 3.2 Append the fourth planet to the end of the list
(Print the list so you can verify its contents after every modification)

In [100]:
planets.append('Mars')
planets

['Mercury', 'Venus', 'Earth', 'Mars']

### 3.3 Venus has exploded. Remove it from the list

In [101]:
planets.remove('Venus')
planets

['Mercury', 'Earth', 'Mars']

### 3.4 Convert the last item of your list to upper case

In [102]:
planets[-1] = planets[-1].upper()
planets

['Mercury', 'Earth', 'MARS']

### 3.5 Create a tuple of the first three planets in the solar system

In [103]:
planetuple = ('Mercury','Venus','Earth')
planetuple

('Mercury', 'Venus', 'Earth')

### 3.6 Try to append or remove items from the tuple
Go ahead, try it! I dare you!

In [105]:
planetuple.append('Alderaan')
planetuple

AttributeError: 'tuple' object has no attribute 'append'

### 3.7 Create a new tuple by concatenating a second tuple to your original tuple
(No loops needed). It should work. Check your original tuple. Did it change?

In [106]:
new_planetuple = planetuple + ('apples','bananas','syrup')
new_planetuple

('Mercury', 'Venus', 'Earth', 'apples', 'bananas', 'syrup')

**Answer:** Concatenation builds a new tuple; the original `planetuple` is unchanged.

## 4 Standard Library: `datetime`

Basic python comes with a powerful ["standard library"](https://docs.python.org/3/tutorial/stdlib.html) of modules which help you do useful things like interact with the operating system, perform complex pattern matching on strings using regular expressions, and open network connections. This standard library is too big for us to learn every module: the point here is just to be aware that it exists.

In this exercise we will also explore a simple but powerful module from python's standard library: the [`datetime` module](https://docs.python.org/3/tutorial/stdlib.html#dates-and-times).

### 4.1 Import the datetime module

In [107]:
import datetime as dt

### 4.2 Read the built-in help on `datetime.date` and `datetime.timedelta`

These are the two classes we will be using from the `datetime` module.
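
As a rough sketch of how the two fit together: a `date` marks a calendar day, a `timedelta` is a span of time, and arithmetic between them converts one into the other. The dates below are arbitrary illustrative values.

```python
import datetime as dt

# Hypothetical dates chosen for illustration.
start = dt.date(2000, 1, 1)
one_year = dt.timedelta(days=366)   # 2000 was a leap year

end = start + one_year              # date + timedelta gives a new date
elapsed = end - start               # date - date gives a timedelta

print(end)
print(elapsed.days)
print(start.weekday())              # Monday is 0, so Saturday is 5
```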

In [108]:
help(dt.date)

Help on class date in module datetime:

class date(builtins.object)
 |  date(year, month, day) --> date object
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(...)
 |      Formats self with strftime.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |      Return hash(self).
 |  
 |  __le__(self, value, /)
 |      Return self<=value.
 |  
 |  __lt__(self, value, /)
 |      Return self<value.
 |  
 |  __ne__(self, value, /)
 |      Return self!=value.
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __radd__(self, value, /)
 |      Return value+self.
 |  
 |  __reduce__(...)
 |      __reduce__() -> (cls,

In [109]:
help(dt.timedelta)

Help on class timedelta in module datetime:

class timedelta(builtins.object)
 |  Difference between two datetime values.
 |  
 |  Methods defined here:
 |  
 |  __abs__(self, /)
 |      abs(self)
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __bool__(self, /)
 |      self != 0
 |  
 |  __divmod__(self, value, /)
 |      Return divmod(self, value).
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __floordiv__(self, value, /)
 |      Return self//value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |      Return hash(self).
 |  
 |  __le__(self, value, /)
 |      Return self<=value.
 |  
 |  __lt__(self, value, /)
 |      Return self<value.
 |  
 |  __mod__(self, value, /)
 |      Return self%value.
 |  
 |  __mul__(self, value, /)
 |      Return self*value.
 |  
 |  __

### 4.3 Create a `datetime.date` object for your birthday

In [110]:
birthday = dt.date(1994,11,15)
birthday

datetime.date(1994, 11, 15)

### 4.4 What day of the week was your birthday on?

In [111]:
Days=['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
print('I was born on a',Days[birthday.isoweekday() - 1])

I was born on a Tuesday


### 4.5 Find out how many days have elapsed since your birthday

In [129]:
age = dt.date.today() - birthday
print('The number of days elapsed since my birthday: ',age.days, 'days')

The number of days elapsed since my birthday:  8345 days


## 5 Functions

Learning to write our own [functions](https://docs.python.org/3/tutorial/controlflow.html#defining-functions) is an important step towards writing more sophisticated programs. Here we will practice writing functions.

As an example, we will consider the [harmonic series](https://en.wikipedia.org/wiki/Harmonic_series_%28mathematics%29), defined as

$$
\sum^N_{n=1}\frac{1}{n} 
$$

This series diverges logarithmically.

### 5.1 Write a function to compute the value of the harmonic series up to `N` terms

The function should have one argument: `N`. It should `return` the series sum.

In [576]:
def harmonic_series(N):
    return sum((1/n) for n in range(1,N+1))

### 5.2 What is the value of the harmonic series after 1000 and 1000000 iterations

In [577]:
harmonic_series(1000)

7.485470860550343

In [578]:
harmonic_series(1000000)

14.392726722864989
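
The logarithmic divergence mentioned earlier can be checked numerically. The sketch below assumes a helper named `harmonic` (a hypothetical name with the same definition as the function above) and uses `math.log` for comparison only. If the partial sums grow like ln(N), the gap between the two should settle to a constant, which is known as the Euler-Mascheroni constant, roughly 0.5772.

```python
import math

def harmonic(N):
    """Partial sum of the harmonic series (same definition as above)."""
    return sum(1 / n for n in range(1, N + 1))

# If H_N grows like ln(N), the gap H_N - ln(N) should level off.
gaps = {N: harmonic(N) - math.log(N) for N in (10, 1000, 100000)}
for N, gap in gaps.items():
    print(N, gap)
```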

The alternating harmonic series:
    
$$
\sum^N_{n=1}\frac{(-1)^{n+1}}{n} 
$$

_does_ converge.

### 5.3 Write a function to compute the value of the alternating  harmonic series up to `N` terms

In [579]:
def harmonic_alt(N):
    return sum(((-1)**(n+1))/n for n in range(1,N+1))

### 5.2 What is the value of the alternating harmonic series after 1000 and 1000000 iterations

In [580]:
harmonic_alt(1000)

0.6926474305598223

In [581]:
harmonic_alt(1000000)

0.6931466805602525
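
The alternating harmonic series is known to converge to ln(2), so the two values above can be compared against `math.log(2)`. The sketch below assumes a helper named `alternating_harmonic` (a hypothetical name with the same definition as `harmonic_alt`) and prints the error next to 1/(2N), which it tracks closely.

```python
import math

def alternating_harmonic(N):
    """Partial sum of the alternating harmonic series (same definition as above)."""
    return sum((-1) ** (n + 1) / n for n in range(1, N + 1))

limit = math.log(2)
# The error of the N-term partial sum shrinks roughly like 1 / (2 N).
errors = {N: abs(alternating_harmonic(N) - limit) for N in (10, 1000, 100000)}
for N, error in errors.items():
    print(N, error, 1 / (2 * N))
```

This slow 1/(2N) decay is why the partial sums need so many terms to agree with ln(2) to several decimal places.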

### 5.3 Numerical convergence test

Now for something more complicated: we will write a function that tests whether another function converges. As its first argument, this function should accept _another function_. It should also have _optional keyword arguments_ that specify

* the numerical tolerance (how close is close enough to consider a series converged?)
* the number of iterations (we can't iterate an infinite number of times, we just have to pick a "large number")

This function should print a statement telling what it found. If the test function converges, print the value it converges to.

_(Hint: the alternating harmonic series oscillates strongly around the asymptotic value. So checking neighboring points may not be the best way to test for convergence. Can you think of a more robust method?)_
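
One possible reading of the hint, offered as a sketch rather than the intended solution: average two neighbouring partial sums to damp the oscillation, then compare that smoothed value at N and at 2N terms. All names here (`harmonic`, `alternating_harmonic`, `smoothed`, `looks_converged`) are hypothetical, and the default values are arbitrary. It remains a heuristic, since a finite number of terms can never prove convergence.

```python
def harmonic(N):
    return sum(1 / n for n in range(1, N + 1))

def alternating_harmonic(N):
    return sum((-1) ** (n + 1) / n for n in range(1, N + 1))

def smoothed(series, N):
    """Average of two neighbouring partial sums, which damps the oscillation."""
    return 0.5 * (series(N) + series(N + 1))

def looks_converged(series, tolerance=1e-5, N=1000):
    """Compare smoothed partial sums at N and 2N terms; return (verdict, estimate)."""
    early = smoothed(series, N)
    late = smoothed(series, 2 * N)
    return abs(late - early) <= tolerance, late

for series in (harmonic, alternating_harmonic):
    verdict, estimate = looks_converged(series)
    if verdict:
        print(series.__name__, 'appears to converge to', estimate)
    else:
        print(series.__name__, 'does not appear to converge')
```

Comparing values a factor of two apart in N, rather than adjacent terms, means a slowly growing sum such as the harmonic series still shows a clear change between the two checkpoints.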

**1st Approach:**

Answer: Series converges to 0.6931466805602525 at iteration no. `1000000`

In [583]:
#Heuristic: for the alternating series the odd partial sums decrease towards the limit,
#so the difference between successive odd partial sums should be positive.
#If the difference is negative, the partial sums are still growing and the series is reported as divergent.

def convergence_test(func,tolerance=0.00001,N=1000000):
    odd_difference = func(2*N-1) - func(2*N+1) #computed once and reused
    if odd_difference < 0:
        print('Series diverges.')
    elif odd_difference > 0:
        value = func(N)
        if abs(value - func(N+1)) <= tolerance:
            print('Series converges to',value,'at iteration no.',N)
        else:
            print('Tolerance not reached. Try increasing iterations, N, or relaxing the tolerance.')

In [584]:
convergence_test(harmonic_alt)

Series converges to 0.6931466805602525 at iteration no. 1000000


In [585]:
convergence_test(harmonic_series)

Series diverges.


**2nd Approach:** I wanted to print the iteration number at which convergence was reached. This version is very slow for large `N` because every partial sum is recomputed from scratch inside the list comprehension.

Answer: The series converges to 0.6921481805579461 at iteration no. 500

In [589]:
#The difference between the odd terms in the alternating series should be positive.
#If the difference is negative, then the series diverges.
#If positive, and the difference is also below the tolerance, then the series is considered to converge.
#The alternating harmonic series converges conditionally, so a tradeoff must be met between the tolerance and number of iterations.

def convergence_test_2(func,tolerance=0.000001,N=2000):
    #func is the series function under test
    difference = [func((2*i)-1)-func((2*i)+1) for i in range(1,N+1)]
    for val in range(len(difference)):
        if difference[val] > 0:
            if difference[val] <= tolerance:
                print('The series converges to',func(val+1),'at iteration no.', val+1)
                break
        else:
            print('The series diverges.')
            break
    if difference[-1] > tolerance:
        print('Tolerance not reached. Try changing the arguments by increasing the number of iterations, N, or by relaxing the numerical tolerance.')

In [590]:
convergence_test_2(harmonic_alt)

The series converges to 0.6921481805579461 at iteration no. 500


In [592]:
convergence_test_2(harmonic_series)

The series diverges.


### 5.3 Use your convergence test function to see that the harmonic series diverges and the alternating harmonic series converges

Explore the sensitivity to the two optional parameters. What are the tradeoffs between tolerance and number of iterations?

In [593]:
convergence_test(harmonic_series) #Diverging series

Series diverges.


In [595]:
convergence_test(harmonic_alt) #Alternating series converges to the 'natural log of 2'

Series converges to 0.6931466805602525 at iteration no. 1000000


In [596]:
convergence_test_2(harmonic_series) #2nd approach is also able to test for convergence/divergence.

The series diverges.


In [597]:
convergence_test_2(harmonic_alt) #Fewer iterations give a less accurate answer than the first approach.

The series converges to 0.6921481805579461 at iteration no. 500


### Answer:

Using fewer iterations gives a less accurate value for the limit, as does relaxing the tolerance.
**Tradeoffs:** The tighter (smaller) the tolerance, the greater the number of iterations, N, needed before the test reports convergence, and the closer the result is to the sum of the alternating harmonic series, ln(2). A more relaxed tolerance is met after fewer iterations, but the reported value is further from ln(2).