In [28]:
from __future__ import print_function

# Exercises

## Q 1

When talking about floating point, we discussed _machine epsilon_, $\epsilon$: roughly,
this is the smallest number that, when added to 1, still gives a result different from 1.

We'll compute $\epsilon$ here:

  * Pick an initial guess for $\epsilon$ of `eps = 1`.  

  * Create a loop that checks whether `1 + eps` is different from `1`.
  
  * On each loop iteration, cut the value of `eps` in half.
  
What value of $\epsilon$ do you find?

In [1]:
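eps = 1.0  # start from a float so eps/2 is never integer division (Python 2)
while 1 + eps != 1:
    eps = eps/2
# the loop exits one halving too late: eps is now the first value that
# no longer changes 1, so machine epsilon is the previous value
eps = 2*eps
print(eps)

2.220446049250313e-16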
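As a cross-check, Python reports machine epsilon for its `float` type as `sys.float_info.epsilon`. The sketch below (the function name `find_machine_epsilon` is ours, for illustration) halves `eps` only while adding *half* of it to 1 still changes the result, so the loop stops exactly at the last value that matters, and then compares that value with the library's. It assumes IEEE 754 double-precision floats, which is what CPython uses on common platforms.

```python
import sys


def find_machine_epsilon():
    """Halve eps while adding half of it to 1.0 still changes 1.0."""
    eps = 1.0
    while 1.0 + eps / 2 != 1.0:
        eps /= 2
    return eps


eps = find_machine_epsilon()
print(eps)                            # 2.220446049250313e-16 on IEEE 754 doubles
print(eps == sys.float_info.epsilon)  # True
# eps itself is still large enough to change 1.0 ...
print(1.0 + eps != 1.0)               # True
# ... but half of it is lost to rounding
print(1.0 + eps / 2 == 1.0)           # True
```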


## Q 2

To iterate over tuples in which the _i_-th tuple contains the _i_-th element of each of several sequences, we can use the `zip(*sequences)` function.

We will iterate over two lists, `names` and `age`, and print out the resulting tuples.

  * Start by initializing lists `names = ["Mary", "John", "Sarah"]` and `age = [21, 56, 98]`.
  
  * Iterate over the tuples containing a name and an age; the `zip(list1, list2)` function might be useful here.
  
  * Print out formatted strings of the type "*NAME is AGE years old*".

In [131]:
names = ["Mary", "John", "Sarah"]
age = [21, 56, 98]
# unpack each (name, age) tuple directly instead of indexing t[0], t[1]
for name, years in zip(names, age):
    print('%s is %d years old' % (name, years))


Mary is 21 years old
John is 56 years old
Sarah is 98 years old
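Two details of `zip()` are worth knowing: it stops as soon as the shortest input runs out, and because `zip(*sequences)` unpacks its argument into separate sequences, zipping a list of pairs again "unzips" it. A small sketch using the lists from this question:

```python
names = ["Mary", "John", "Sarah"]
age = [21, 56, 98]

pairs = list(zip(names, age))
print(pairs)  # [('Mary', 21), ('John', 56), ('Sarah', 98)]

# zip stops at the shortest input: only two ages, so only two pairs
print(list(zip(names, [21, 56])))  # [('Mary', 21), ('John', 56)]

# zip(*pairs) passes each pair as a separate argument, undoing the zip
unzipped_names, unzipped_ages = zip(*pairs)
print(unzipped_names)  # ('Mary', 'John', 'Sarah')
print(unzipped_ages)   # (21, 56, 98)
```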


## Q 3

The function `enumerate(sequence)` produces tuples containing the index of each object in the sequence together with the object itself.

The `random` module provides tools for generating random numbers. In particular, `random.randint(start, end)` generates a random integer that is not smaller than `start` and not bigger than `end`.

  * Generate a list of 10 random numbers from 0 to 9.
  
  * Using the `enumerate(random_list)` function, iterate over the tuples of random numbers and their indices, and print out *"Match: NUMBER and INDEX"* if the random number and its index in the list match.

In [130]:
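import random
# 10 independent random integers from 0 to 9 (repeats allowed), as the task asks;
# random.sample(range(0,10),10) would instead give a shuffle with no repeats
random_list = [random.randint(0, 9) for _ in range(10)]
for index, number in enumerate(random_list):
    if number == index:
        print('Match: %d and %d' % (number, index))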

Match: 1 and 1
Match: 5 and 5
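Because the list is random, each run prints different matches. For a reproducible run one can seed the generator with `random.seed()` (the seed value 42 below is arbitrary), and `enumerate()` also accepts an optional `start` argument when numbering should begin somewhere other than 0. A sketch:

```python
import random

random.seed(42)  # arbitrary seed: same list on every run of a given Python version

# 10 independent random integers, each between 0 and 9 inclusive
random_list = [random.randint(0, 9) for _ in range(10)]

# Collect (number, index) pairs where the value equals its position
matches = [(number, index) for index, number in enumerate(random_list)
           if number == index]
for number, index in matches:
    print("Match: %d and %d" % (number, index))

# enumerate can count from a different starting value
print(list(enumerate(["a", "b", "c"], start=1)))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```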


## Q 4

The Fibonacci sequence is a numerical sequence where each number is the sum of the 2 preceding numbers, e.g., 1, 1, 2, 3, 5, 8, 13, ...

Create a list where the elements are the terms in the Fibonacci sequence:

  * Start with the list `fib = [1, 1]`
  
  * Loop 25 times: compute the next term as the sum of the previous 2 terms and append it to the list
  
  * After the loop is complete, print out the terms 
  
You may find it useful to use `fib[-1]` and `fib[-2]` to access the last two items in the list.

In [27]:
fib = [1,1]
for i in range(0,25):
    fib.append(fib[-1]+fib[-2])
print(fib)

[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418]
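If only the terms themselves are needed, the two most recent values can be tracked with tuple assignment instead of indexing into the list. A sketch (the function name `fibonacci_terms` is our own) that reproduces the 27 terms printed above:

```python
def fibonacci_terms(n):
    """Return the first n terms of the Fibonacci sequence 1, 1, 2, 3, ..."""
    terms = []
    a, b = 1, 1
    for _ in range(n):
        terms.append(a)
        # Shift the window: the new pair of recent terms is (b, a + b)
        a, b = b, a + b
    return terms


fib = fibonacci_terms(27)  # the 2 starting terms plus 25 computed ones
print(fib[:10])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(fib[-1])   # 196418
```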


## Q 5

We can use the `input()` function to ask for input from the prompt (note: in Python 2 the function was called `raw_input()`).

Create an empty list and use a while loop to ask the user for input and append their input to the list.  Keep looping until 10 items have been added to the list.

In [None]:
mylist = []
# a while loop, as the task asks: keep asking until the list holds 10 items
while len(mylist) < 10:
    mylist.append(input())
print(mylist)
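Since `input()` waits for someone at the keyboard, a loop like this is awkward to test. One option, sketched below, is to pass in the function that reads a line; `collect_items` and its `read` and `count` parameters are hypothetical names chosen for illustration, and the default still uses `input()`.

```python
def collect_items(read=input, count=10):
    """Keep calling read() until `count` items have been collected."""
    items = []
    while len(items) < count:
        items.append(read())
    return items


# Simulate typed responses with an iterator instead of the keyboard
responses = iter(["apple", "banana", "cherry"])
print(collect_items(read=lambda: next(responses), count=3))
# ['apple', 'banana', 'cherry']
```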

## Q 6

Here is a list of book titles (from http://thegreatestbooks.org).  Loop through the list and capitalize each word in each title.  You might find the `.capitalize()` method that works on strings useful.

In [136]:
titles = ["don quixote", 
          "in search of lost time", 
          "ulysses", 
          "the odyssey", 
          "war and peace", 
          "moby dick", 
          "the divine comedy", 
          "hamlet", 
          "the adventures of huckleberry finn", 
          "the great gatsby"]

In [137]:
titles = [" ".join([word.capitalize() for word in title.split()]) for title in titles]
titles

['Don Quixote',
 'In Search Of Lost Time',
 'Ulysses',
 'The Odyssey',
 'War And Peace',
 'Moby Dick',
 'The Divine Comedy',
 'Hamlet',
 'The Adventures Of Huckleberry Finn',
 'The Great Gatsby']
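The standard library already packages this split/capitalize/join pattern as `string.capwords()`. There is also the `str.title()` method, which gives the same result for these titles but treats any non-letter (such as an apostrophe) as a word boundary, as this sketch shows:

```python
import string

titles = ["don quixote", "in search of lost time", "the great gatsby"]

by_hand = [" ".join(word.capitalize() for word in title.split())
           for title in titles]
with_capwords = [string.capwords(title) for title in titles]
with_title = [title.title() for title in titles]
print(by_hand == with_capwords == with_title)  # True

# The difference shows up with apostrophes
print(string.capwords("don't panic"))  # Don't Panic
print("don't panic".title())           # Don'T Panic
```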

## <span class="fa fa-star"></span> Q 7

Here's some text (the Gettysburg Address).  Our goal is to count how many times each word appears.  We'll use a brute-force method first, and then we'll look at ways to do it more efficiently (and compactly).

In [111]:
gettysburg_address = """
Four score and seven years ago our fathers brought forth on this continent, 
a new nation, conceived in Liberty, and dedicated to the proposition that 
all men are created equal.

Now we are engaged in a great civil war, testing whether that nation, or 
any nation so conceived and so dedicated, can long endure. We are met on
a great battle-field of that war. We have come to dedicate a portion of
that field, as a final resting place for those who here gave their lives
that that nation might live. It is altogether fitting and proper that we
should do this.

But, in a larger sense, we can not dedicate -- we can not consecrate -- we
can not hallow -- this ground. The brave men, living and dead, who struggled
here, have consecrated it, far above our poor power to add or detract.  The
world will little note, nor long remember what we say here, but it can never
forget what they did here. It is for us the living, rather, to be dedicated
here to the unfinished work which they who fought here have thus far so nobly
advanced. It is rather for us to be here dedicated to the great task remaining
before us -- that from these honored dead we take increased devotion to that
cause for which they gave the last full measure of devotion -- that we here
highly resolve that these dead shall not have died in vain -- that this
nation, under God, shall have a new birth of freedom -- and that government
of the people, by the people, for the people, shall not perish from the earth.
"""

We've already seen that the `.split()` method, by default, splits on whitespace, so it will break this text into words, producing a list:

In [112]:
ga = gettysburg_address.split()

In [113]:
ga

['Four',
 'score',
 'and',
 'seven',
 'years',
 'ago',
 'our',
 'fathers',
 'brought',
 'forth',
 'on',
 'this',
 'continent,',
 'a',
 'new',
 'nation,',
 'conceived',
 'in',
 'Liberty,',
 'and',
 'dedicated',
 'to',
 'the',
 'proposition',
 'that',
 'all',
 'men',
 'are',
 'created',
 'equal.',
 'Now',
 'we',
 'are',
 'engaged',
 'in',
 'a',
 'great',
 'civil',
 'war,',
 'testing',
 'whether',
 'that',
 'nation,',
 'or',
 'any',
 'nation',
 'so',
 'conceived',
 'and',
 'so',
 'dedicated,',
 'can',
 'long',
 'endure.',
 'We',
 'are',
 'met',
 'on',
 'a',
 'great',
 'battle-field',
 'of',
 'that',
 'war.',
 'We',
 'have',
 'come',
 'to',
 'dedicate',
 'a',
 'portion',
 'of',
 'that',
 'field,',
 'as',
 'a',
 'final',
 'resting',
 'place',
 'for',
 'those',
 'who',
 'here',
 'gave',
 'their',
 'lives',
 'that',
 'that',
 'nation',
 'might',
 'live.',
 'It',
 'is',
 'altogether',
 'fitting',
 'and',
 'proper',
 'that',
 'we',
 'should',
 'do',
 'this.',
 'But,',
 'in',
 'a',
 'larger',
 ...

Now, the next problem is that some of these still have punctuation.  In particular, we see "`.`", "`,`", and "`--`".

When considering a word, we can get rid of these by using the `replace()` method:

In [108]:
a = "end.,"
b = a.replace(".", "").replace(",","")
b

'end'
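Chaining `replace()` calls gets long when many characters must go. An alternative is `str.translate()` with a deletion table built by `str.maketrans()` (Python 3). In the sketch below the characters to delete are taken from `string.punctuation`; note that this set also contains the hyphen, so `battle-field` becomes `battlefield`.

```python
import string

# The third argument of str.maketrans lists characters to delete
strip_punct = str.maketrans("", "", string.punctuation)

print("end.,".translate(strip_punct))         # end
print("battle-field".translate(strip_punct))  # battlefield
print(repr("--".translate(strip_punct)))      # ''
```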

Another problem is case: we want to count "but" and "But" as the same word.  Strings have a `lower()` method that returns a lowercase copy of the string:

In [138]:
a = "But."
b = "but."
a == b

False

In [139]:
a.lower() == b.lower()

True
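For English text `lower()` is enough. Python 3 also offers `str.casefold()`, a more aggressive lowercasing intended for caseless comparison; it differs from `lower()` for a few characters such as the German ß:

```python
a = "But."
b = "but."
print(a.casefold() == b.casefold())  # True

# lower() leaves the sharp s alone; casefold() maps it to "ss"
print("Straße".lower())     # straße
print("Straße".casefold())  # strasse
```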

Recall that strings are immutable, so `replace()` and `lower()` return new strings rather than modifying the original.

## Your task

Create a dictionary that uses the unique words as keys and, as values, the number of times each word appears.

Write a loop over the words in the string (using our split version) and do the following:
  * remove any punctuation
  * convert to lowercase
  * test if the word is already a key in the dictionary (using the `in` operator)
     - if the key exists, increment the word count for that key
     - otherwise, add it to the dictionary with the appropriate count of `1`.

At the end, print out the words and a count of how many times they appear

In [153]:
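d = {}
for word in ga:
    # lowercase, then keep only letters and digits (drops '.', ',', '-')
    word = "".join(char for char in word.lower() if char.isalnum())
    if len(word) == 0:
        # tokens such as '--' are pure punctuation and end up empty
        continue
    if word in d:
        d[word] += 1
    else:
        d[word] = 1
print(d)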

{'four': 1, 'score': 1, 'and': 6, 'seven': 1, 'years': 1, 'ago': 1, 'our': 2, 'fathers': 1, 'brought': 1, 'forth': 1, 'on': 2, 'this': 4, 'continent': 1, 'a': 7, 'new': 2, 'nation': 5, 'conceived': 2, 'in': 4, 'liberty': 1, 'dedicated': 4, 'to': 8, 'the': 11, 'proposition': 1, 'that': 13, 'all': 1, 'men': 2, 'are': 3, 'created': 1, 'equal': 1, 'now': 1, 'we': 10, 'engaged': 1, 'great': 3, 'civil': 1, 'war': 2, 'testing': 1, 'whether': 1, 'or': 2, 'any': 1, 'so': 3, 'can': 5, 'long': 2, 'endure': 1, 'met': 1, 'battlefield': 1, 'of': 5, 'have': 5, 'come': 1, 'dedicate': 2, 'portion': 1, 'field': 1, 'as': 1, 'final': 1, 'resting': 1, 'place': 1, 'for': 5, 'those': 1, 'who': 3, 'here': 8, 'gave': 2, 'their': 1, 'lives': 1, 'might': 1, 'live': 1, 'it': 5, 'is': 3, 'altogether': 1, 'fitting': 1, 'proper': 1, 'should': 1, 'do': 1, 'but': 2, 'larger': 1, 'sense': 1, 'not': 5, 'consecrate': 1, 'hallow': 1, 'ground': 1, 'brave': 1, 'living': 2, 'dead': 3, 'struggled': 1, 'consecrated': 1, 'far': ...
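The `if`/`else` on membership can be folded into one line with `dict.get(key, default)`, which returns the default when the key is missing. A sketch on a short excerpt of the address (the excerpt is chosen only to keep the example self-contained):

```python
text = "of the people, by the people, for the people, shall not perish"

counts = {}
for word in text.split():
    # lowercase and drop anything that is not a letter or digit
    word = "".join(ch for ch in word.lower() if ch.isalnum())
    if not word:
        continue  # skip tokens that were pure punctuation, such as '--'
    # get() returns 0 for a word we have not seen yet
    counts[word] = counts.get(word, 0) + 1

print(counts)
```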

## More compact way

We can actually do this a lot more compactly by using another list comprehension and another Python datatype called a set.  A set is a collection of items in which each item is unique (i.e., no repetitions).

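Here's a list comprehension that removes periods and commas and converts to lower case.  Unlike the loop above, it leaves `--` and the hyphen in `battle-field` alone, which is why they still show up in the counts below: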

In [154]:
words = [q.lower().replace(".", "").replace(",", "") for q in ga]

and by using the `set()` function, we turn the list into a set, removing any duplicates:

In [155]:
unique_words = set(words)

Now we can loop over the unique words and use the list's `count()` method to find how many times each one appears:

In [156]:
count = {}
for uw in unique_words:
    count[uw] = words.count(uw)
    
count

{'gave': 2,
 'proper': 1,
 'dedicated': 4,
 'people': 3,
 'power': 1,
 'and': 6,
 'earth': 1,
 'for': 5,
 'ground': 1,
 'now': 1,
 'their': 1,
 'sense': 1,
 'remaining': 1,
 'task': 1,
 'nobly': 1,
 'hallow': 1,
 'civil': 1,
 'never': 1,
 'are': 3,
 'fought': 1,
 'increased': 1,
 'not': 5,
 'this': 4,
 'brave': 1,
 'forget': 1,
 'seven': 1,
 'those': 1,
 'as': 1,
 'so': 3,
 'do': 1,
 'died': 1,
 'from': 2,
 'resting': 1,
 'fathers': 1,
 'struggled': 1,
 'should': 1,
 'here': 8,
 'government': 1,
 'testing': 1,
 'might': 1,
 'consecrate': 1,
 'any': 1,
 'is': 3,
 'add': 1,
 'perish': 1,
 'little': 1,
 '--': 7,
 'which': 2,
 'god': 1,
 'portion': 1,
 'battle-field': 1,
 'remember': 1,
 'men': 2,
 'advanced': 1,
 'full': 1,
 'be': 2,
 'equal': 1,
 'vain': 1,
 'birth': 1,
 'consecrated': 1,
 'nor': 1,
 'our': 2,
 'nation': 5,
 'last': 1,
 'thus': 1,
 'on': 2,
 'new': 2,
 'conceived': 2,
 'all': 1,
 'dead': 3,
 'devotion': 2,
 'but': 2,
 'long': 2,
 'the': 11,
 'under': 1,
 'final': 1,
 ...
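Note that `words.count(uw)` scans the entire list once for every unique word. The standard library's `collections.Counter` builds the same tallies in a single pass. A sketch comparing it with the set-and-count approach on a short sample line from the address (chosen for illustration):

```python
from collections import Counter

text = "that nation, under God, shall have a new birth of freedom -- and that government"
words = [q.lower().replace(".", "").replace(",", "") for q in text.split()]

# Approach from the notebook: unique words, then list.count for each
count = {uw: words.count(uw) for uw in set(words)}

# Counter tallies every word in one pass over the list
tally = Counter(words)

print(dict(tally) == count)  # True
print(tally["that"])         # 2
print(tally["liberty"])      # 0: missing keys count as zero instead of raising KeyError
```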

Even shorter: a dictionary comprehension works like a list comprehension, but builds a dictionary:

In [157]:
c = {uw: words.count(uw) for uw in unique_words}

In [158]:
c

{'gave': 2,
 'proper': 1,
 'dedicated': 4,
 'people': 3,
 'power': 1,
 'and': 6,
 'earth': 1,
 'for': 5,
 'ground': 1,
 'now': 1,
 'their': 1,
 'sense': 1,
 'remaining': 1,
 'task': 1,
 'nobly': 1,
 'hallow': 1,
 'civil': 1,
 'never': 1,
 'are': 3,
 'fought': 1,
 'increased': 1,
 'not': 5,
 'this': 4,
 'brave': 1,
 'forget': 1,
 'seven': 1,
 'those': 1,
 'as': 1,
 'so': 3,
 'do': 1,
 'died': 1,
 'from': 2,
 'resting': 1,
 'fathers': 1,
 'struggled': 1,
 'should': 1,
 'here': 8,
 'government': 1,
 'testing': 1,
 'might': 1,
 'consecrate': 1,
 'any': 1,
 'is': 3,
 'add': 1,
 'perish': 1,
 'little': 1,
 '--': 7,
 'which': 2,
 'god': 1,
 'portion': 1,
 'battle-field': 1,
 'remember': 1,
 'men': 2,
 'advanced': 1,
 'full': 1,
 'be': 2,
 'equal': 1,
 'vain': 1,
 'birth': 1,
 'consecrated': 1,
 'nor': 1,
 'our': 2,
 'nation': 5,
 'last': 1,
 'thus': 1,
 'on': 2,
 'new': 2,
 'conceived': 2,
 'all': 1,
 'dead': 3,
 'devotion': 2,
 'but': 2,
 'long': 2,
 'the': 11,
 'under': 1,
 'final': 1,
 ...
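Dictionaries and sets don't present the words in order of frequency, so when printing the result it can help to sort the items by count. A sketch using `sorted()` with a key function and the equivalent `Counter.most_common()`, on a few entries taken from the counts above:

```python
from collections import Counter

c = {"the": 11, "nation": 5, "that": 13, "we": 10, "here": 8}

# Sort (word, count) pairs by count, largest first
by_frequency = sorted(c.items(), key=lambda item: item[1], reverse=True)
for word, n in by_frequency:
    print("%s: %d" % (word, n))

# Counter.most_common(k) returns the k largest counts as (word, count) pairs
print(Counter(c).most_common(2))  # [('that', 13), ('the', 11)]
```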