In [1]:
from __future__ import print_function

_We'll work through this notebook together in class_

## Lists, Tuples, and Dictionaries Review

In the notebooks we looked over at home, we were introduced to a few different data structures that allow us to group data together.  So far, we haven't learned how to take action depending on values, but we'll see that soon.

A compact way we saw to fill a list was via a list comprehension:

In [2]:
a = [x**2 for x in range(10)]
a

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
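For comparison, the comprehension above can be written as an explicit `for` loop (loops are covered in more detail below). A comprehension can also carry an `if` clause that filters values, a preview of the if-tests we'll meet shortly. A small sketch; `squares` and `even_squares` are just illustrative names:

```python
# The list comprehension [x**2 for x in range(10)] does the same thing as
# this explicit loop: start with an empty list and append one value per pass.
squares = []
for x in range(10):
    squares.append(x**2)

# A comprehension can also include an if-test to filter values;
# here we keep only the squares of even numbers.
even_squares = [x**2 for x in range(10) if x % 2 == 0]

print(squares)
print(even_squares)   # [0, 4, 16, 36, 64]
```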

Dictionaries hold key-value pairs; each key maps to a value:

In [3]:
b = {"one": 1, "two": 2, "three": 3}
b

{'one': 1, 'three': 3, 'two': 2}
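A few more dictionary operations we'll lean on later, sketched on a dictionary like `b`: looking up a value, adding or replacing an entry, testing for a key with `in`, and the standard `get()` method with a default (not shown in the original notebook):

```python
b = {"one": 1, "two": 2, "three": 3}

# look up a value by its key
print(b["two"])        # 2

# assigning to a new key adds an entry; assigning to an existing key replaces it
b["four"] = 4
b["one"] = 100

# `in` tests whether a key (not a value) is present
print("four" in b)     # True
print(4 in b)          # False: 4 is a value, not a key

# get() returns a default instead of raising KeyError for a missing key
print(b.get("five", 0))   # 0
```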

We had a bit of a discussion of tuples on our Slack, and how they differ from lists. The most obvious difference is that tuples are immutable: once a tuple is created, its elements cannot be changed. Often we'll see tuples used to store related data that should all be interpreted together. A good example is a Cartesian point, (x, y). Here's a list of points:

In [4]:
points = []
points.append((1,2))
points.append((2,3))
points.append((3,4))
points

[(1, 2), (2, 3), (3, 4)]
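Two consequences of immutability, sketched on the points above: a tuple can be unpacked into separate variables, but its elements can't be reassigned (Python raises a `TypeError`). The list holding the tuples is still mutable, so a whole point can be replaced:

```python
points = [(1, 2), (2, 3), (3, 4)]

# a tuple can be unpacked into separate variables in one step
x, y = points[0]
print(x, y)   # 1 2

# but its elements cannot be changed in place, since tuples are immutable
try:
    points[0][0] = 10
except TypeError as err:
    print("TypeError:", err)

# the list holding the tuples is mutable, so we can replace a whole point
points[0] = (10, 2)
print(points)   # [(10, 2), (2, 3), (3, 4)]
```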

We can even generate these points for a curve (here, y = 2x + 5) using a list comprehension:

In [5]:
points = [(x, 2*x + 5) for x in range(10)]
points

[(0, 5),
 (1, 7),
 (2, 9),
 (3, 11),
 (4, 13),
 (5, 15),
 (6, 17),
 (7, 19),
 (8, 21),
 (9, 23)]

## Control Flow

To write a program, we need the ability to iterate and take action based on the values of a variable.  This includes if-tests and loops.

Python uses whitespace (indentation) to denote a block of code.

### While Loop

A simple while loop -- notice the indentation to denote the block that is part of the loop.

Here we also use the compact `+=` operator: `n += 1` is the same as `n = n + 1`

In [6]:
n = 0
while n < 10:
    print(n)
    n += 1

0
1
2
3
4
5
6
7
8
9


This was a very simple example. For a counting loop like this, we'll more often use a `for` loop over the `range()` function. Note that `range()` can also take a stride (step):

In [7]:
for n in range(2, 10, 2):
    print(n)

2
4
6
8


In [8]:
print(list(range(10)))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
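A few more `range()` forms, as a quick sketch of the signatures listed in the help text below: one argument gives the stop value, two give start and stop, and a negative step counts down. A `range` object produces its values on demand rather than storing them all, which is why we wrap it in `list()` to see them:

```python
# range(stop): 0 up to, but not including, stop
print(list(range(5)))          # [0, 1, 2, 3, 4]

# range(start, stop): start is included, stop is excluded
print(list(range(2, 6)))       # [2, 3, 4, 5]

# range(start, stop, step): a negative step counts down
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]

# a range doesn't build a list, but it still knows its length and
# supports indexing
r = range(1000000)
print(len(r), r[-1])           # 1000000 999999
```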


In [9]:
help(range)

Help on class range in module builtins:

class range(object)
 |  range(stop) -> range object
 |  range(start, stop[, step]) -> range object
 |  
 |  Return an object that produces a sequence of integers from start (inclusive)
 |  to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
 |  start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
 |  These are exactly the valid indices for a list of 4 elements.
 |  When step is given, it specifies the increment (or decrement).
 |  
 |  Methods defined here:
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |      Return hash(self).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).

### `if` Statements

`if` allows for branching. Python does not have a select/case statement like some other languages (Python 3.10 did add a `match` statement for structural pattern matching), but `if`, `elif`, and `else` can reproduce any branching functionality you might need.

In [10]:
x = 0

if x < 0:
    print("negative")
elif x == 0:
    print("zero")
else:
    print("positive")


zero
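When an `if`/`elif` chain just compares one value against a set of fixed choices, a dictionary lookup with `get()` supplying the default is one common alternative. This is a minimal sketch of that idiom, not something from the original notebook; `day_type()`, `day_type_lookup()`, and `DAY_TYPES` are hypothetical names:

```python
def day_type(day):
    """Classify a day name using an if/elif/else chain."""
    if day in ("saturday", "sunday"):
        return "weekend"
    elif day in ("monday", "tuesday", "wednesday", "thursday", "friday"):
        return "weekday"
    else:
        return "unknown"

# The same branching expressed as a dictionary lookup; get() supplies
# the "else" case for any key that isn't present.
DAY_TYPES = {"saturday": "weekend", "sunday": "weekend",
             "monday": "weekday", "tuesday": "weekday", "wednesday": "weekday",
             "thursday": "weekday", "friday": "weekday"}

def day_type_lookup(day):
    return DAY_TYPES.get(day, "unknown")

print(day_type("sunday"), day_type_lookup("sunday"))   # weekend weekend
print(day_type("funday"), day_type_lookup("funday"))   # unknown unknown
```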


### Iterating Over Elements

It's easy to loop over the items in a list or any other _iterable_ object. The `in` keyword in a `for` statement is the key here.

In [11]:
alist = [1, 2.0, "three", 4]
for a in alist:
    print(a)

1
2.0
three
4


In [12]:
for c in "this is a string":
    print(c)

t
h
i
s
 
i
s
 
a
 
s
t
r
i
n
g
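The same `for` loop works on other iterables too. A short sketch: a tuple yields its elements, a dictionary yields its keys, and a `range` object can be handed to functions such as `sum()` that accept any iterable:

```python
# A tuple is iterable, just like a list
tuple_items = []
for value in (10, 20, 30):
    tuple_items.append(value)

# Looping over a dictionary gives its keys
keys_seen = []
for key in {"one": 1, "two": 2}:
    keys_seen.append(key)

# range() objects are iterable too; sum() works on any iterable of numbers
total = sum(range(5))   # 0 + 1 + 2 + 3 + 4

print(tuple_items, keys_seen, total)
```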


We can combine loops and if-tests to do more complex logic, like breaking out of the loop when we find what we're looking for:

In [13]:
n = 0
for a in alist:
    if a == "three":
        break
    else:
        n += 1

print(n)


2


(For that example, however, there is a simpler way: the list `index()` method.)

In [14]:
print(alist.index("three"))

2
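One caution about `index()`: it raises a `ValueError` if the item isn't in the list. A sketch of two ways to handle a missing item; the `-1` "not found" marker is an arbitrary choice for illustration. The second uses a loop's `else` clause, which runs only when the loop finishes without hitting `break`:

```python
alist = [1, 2.0, "three", 4]

# test with `in` first when the item might be missing
target = "five"
if target in alist:
    position = alist.index(target)
else:
    position = -1      # our own "not found" marker (an arbitrary choice)
print(position)        # -1

# a for loop's else clause runs only if the loop never hit `break`
for a in alist:
    if a == "five":
        break
else:
    print("'five' not found")
```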


For dictionaries, we can loop over the key-value pairs using the `items()` method:

In [15]:
my_dict = {"key1":1, "key2":2, "key3":3}

for k, v in my_dict.items():
    print("key = {}, value = {}".format(k, v))    # notice how we do the formatting here


key = key1, value = 1
key = key3, value = 3
key = key2, value = 2


In [16]:
for k in sorted(my_dict):
    print(k, my_dict[k])

key1 1
key2 2
key3 3
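`sorted()` can also order the entries by value instead of by key, which will be handy for ranking word counts later. A sketch, using `key=` with a `lambda` (a small anonymous function) that picks the value out of each `(key, value)` tuple:

```python
my_dict = {"key1": 1, "key2": 2, "key3": 3}

# sorted() on items() gives (key, value) tuples; key= tells sorted()
# what to compare, here the value (element 1 of each tuple)
by_value = sorted(my_dict.items(), key=lambda kv: kv[1], reverse=True)
for k, v in by_value:
    print(k, v)
```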


Sometimes we want to loop over the elements of a list and also know each element's index -- `enumerate()` helps here:

In [17]:
for n, a in enumerate(alist):
    print(n, a)

0 1
1 2.0
2 three
3 4
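As the help text below notes, `enumerate()` takes an optional `start` argument for the first count. A quick sketch that numbers the items from 1:

```python
alist = [1, 2.0, "three", 4]

# enumerate() counts from 0 by default; start= changes the first index,
# which is handy for numbering items 1, 2, 3, ... for display
numbered = list(enumerate(alist, start=1))
print(numbered)   # [(1, 1), (2, 2.0), (3, 'three'), (4, 4)]
```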


In [18]:
help(enumerate)

Help on class enumerate in module builtins:

class enumerate(object)
 |  enumerate(iterable[, start]) -> iterator for index, value of iterable
 |  
 |  Return an enumerate object.  iterable must be another object that supports
 |  iteration.  The enumerate object yields pairs containing a count (from
 |  start, which defaults to zero) and a value yielded by the iterable argument.
 |  enumerate is useful for obtaining an indexed list:
 |      (0, seq[0]), (1, seq[1]), (2, seq[2]), ...
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __reduce__(...)
 |      Return state information for pickling.



## In-class Exercises

## Q 1

We can use the `input()` function to ask for input from the prompt; it always returns a string (note: in Python 2 the equivalent function was called `raw_input()`).

Create an empty list and use a while loop to ask the user for input and append their input to the list. Keep looping until 10 items have been added to the list.
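One possible sketch of a solution. To make it easy to try without typing ten entries, the hypothetical `collect_items()` function takes the input function as a parameter, defaulting to the built-in `input()`; in class you'd simply call `collect_items()`:

```python
def collect_items(n_items=10, get_input=input):
    """Ask for input until n_items entries have been collected.

    get_input defaults to the built-in input(); it is a parameter only so
    the function can be tried without typing (a hypothetical convenience).
    """
    items = []
    while len(items) < n_items:
        entry = get_input("enter item {}: ".format(len(items) + 1))
        items.append(entry)   # input() always returns a string
    return items

# In the notebook you would simply call: items = collect_items()
```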

## Q 2

Here's some text (the Gettysburg Address). Our goal is to count how many times each word repeats. We'll do a brute-force method first, and then we'll look at ways to do it more efficiently (and compactly).

In [19]:
gettysburg_address = """
Four score and seven years ago our fathers brought forth on this continent, 
a new nation, conceived in Liberty, and dedicated to the proposition that 
all men are created equal.

Now we are engaged in a great civil war, testing whether that nation, or 
any nation so conceived and so dedicated, can long endure. We are met on
a great battle-field of that war. We have come to dedicate a portion of
that field, as a final resting place for those who here gave their lives
that that nation might live. It is altogether fitting and proper that we
should do this.

But, in a larger sense, we can not dedicate -- we can not consecrate -- we
can not hallow -- this ground. The brave men, living and dead, who struggled
here, have consecrated it, far above our poor power to add or detract.  The
world will little note, nor long remember what we say here, but it can never
forget what they did here. It is for us the living, rather, to be dedicated
here to the unfinished work which they who fought here have thus far so nobly
advanced. It is rather for us to be here dedicated to the great task remaining
before us -- that from these honored dead we take increased devotion to that
cause for which they gave the last full measure of devotion -- that we here
highly resolve that these dead shall not have died in vain -- that this
nation, under God, shall have a new birth of freedom -- and that government
of the people, by the people, for the people, shall not perish from the earth.
"""

A useful operation on a string is to split it. By default, the `split()` method splits on whitespace, so it will split this text into words, producing a list:

In [20]:
ga = gettysburg_address.split()

In [21]:
ga

['Four',
 'score',
 'and',
 'seven',
 'years',
 'ago',
 'our',
 'fathers',
 'brought',
 'forth',
 'on',
 'this',
 'continent,',
 'a',
 'new',
 'nation,',
 'conceived',
 'in',
 'Liberty,',
 'and',
 'dedicated',
 'to',
 'the',
 'proposition',
 'that',
 'all',
 'men',
 'are',
 'created',
 'equal.',
 'Now',
 'we',
 'are',
 'engaged',
 'in',
 'a',
 'great',
 'civil',
 'war,',
 'testing',
 'whether',
 'that',
 'nation,',
 'or',
 'any',
 'nation',
 'so',
 'conceived',
 'and',
 'so',
 'dedicated,',
 'can',
 'long',
 'endure.',
 'We',
 'are',
 'met',
 'on',
 'a',
 'great',
 'battle-field',
 'of',
 'that',
 'war.',
 'We',
 'have',
 'come',
 'to',
 'dedicate',
 'a',
 'portion',
 'of',
 'that',
 'field,',
 'as',
 'a',
 'final',
 'resting',
 'place',
 'for',
 'those',
 'who',
 'here',
 'gave',
 'their',
 'lives',
 'that',
 'that',
 'nation',
 'might',
 'live.',
 'It',
 'is',
 'altogether',
 'fitting',
 'and',
 'proper',
 'that',
 'we',
 'should',
 'do',
 'this.',
 'But,',
 'in',
 'a',
 'larger',
 'sense,',
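Note that `split()` with no argument treats any run of whitespace (spaces, tabs, newlines) as a single separator, which is why the line breaks in the address disappear. Passing an explicit separator behaves differently; a quick comparison on a made-up string:

```python
text = "one  two\nthree"

# With no argument, split() splits on any run of whitespace
# (spaces, tabs, newlines) and never yields empty strings
print(text.split())       # ['one', 'two', 'three']

# With an explicit separator, every occurrence is a split point,
# so consecutive spaces produce an empty string and newlines are kept
print(text.split(" "))    # ['one', '', 'two\nthree']
```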

Now, the next problem is that some of these words still have punctuation attached. In particular, we see "`.`", "`,`", and "`--`".

When considering a word, we can get rid of these by using the `replace()` method to replace each punctuation character with an empty string:

In [22]:
a = "end.,"
b = a.replace(".", "").replace(",", "")
b

'end'
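An alternative not used in the original notebook: `strip()` with the standard library's `string.punctuation` constant removes punctuation from the *ends* of a word only. Internal hyphens, as in "battle-field", survive, and a token like "--" becomes an empty string that can be skipped. A sketch on a few sample tokens:

```python
import string

words = ["end.,", "Liberty,", "battle-field", "--", "But"]

cleaned = []
for w in words:
    # strip() removes any of the given characters from both ends only
    w = w.strip(string.punctuation)
    if w:                     # "--" becomes "" and is skipped
        cleaned.append(w)
print(cleaned)   # ['end', 'Liberty', 'battle-field', 'But']
```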

Another problem is case -- we want to count "but" and "But" as the same. Strings have a `lower()` method that can be used to convert a string to lowercase:

In [23]:
a = "But"
b = "but"
a == b

False

In [24]:
my_dict

{'key1': 1, 'key2': 2, 'key3': 3}

In [25]:
"key4" in my_dict

False

In [26]:
a.lower() == b.lower()

True

Recall that strings are immutable, so `replace()` (like `lower()`) returns a new string rather than modifying the original.
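A quick sketch of what immutability means in practice here: `replace()` hands back a new string, and unless we save that result, the original is untouched:

```python
a = "end.,"
b = a.replace(".", "")

# replace() returned a new string; the original is unchanged
print(a)   # end.,
print(b)   # end,

# calling replace() without saving the result has no lasting effect
a.replace(",", "")
print(a)   # end.,
```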

## Your task

Create a dictionary that uses the unique words as keys and, as values, the number of times each word appears.

Write a loop over the words in the string (using our split version) and do the following:
  * remove any punctuation
  * convert to lowercase
  * test if the word is already a key in the dictionary (using the `in` operator)
     - if the key exists, increment the word count for that key
     - otherwise, add it to the dictionary with the appropriate count of `1`.

At the end, print out the words and a count of how many times they appear.

In [27]:
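# your code here
words = {}
for word in ga:
    # strip periods and commas, then lowercase so "But" and "but" match
    b = word.replace(".", "").replace(",", "").lower()
    if b in words:
        words[b] += 1    # seen before: increment its count
    else:
        words[b] = 1     # first occurrence of this word

words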

{'--': 7,
 'a': 7,
 'above': 1,
 'add': 1,
 'advanced': 1,
 'ago': 1,
 'all': 1,
 'altogether': 1,
 'and': 6,
 'any': 1,
 'are': 3,
 'as': 1,
 'battle-field': 1,
 'be': 2,
 'before': 1,
 'birth': 1,
 'brave': 1,
 'brought': 1,
 'but': 2,
 'by': 1,
 'can': 5,
 'cause': 1,
 'civil': 1,
 'come': 1,
 'conceived': 2,
 'consecrate': 1,
 'consecrated': 1,
 'continent': 1,
 'created': 1,
 'dead': 3,
 'dedicate': 2,
 'dedicated': 4,
 'detract': 1,
 'devotion': 2,
 'did': 1,
 'died': 1,
 'do': 1,
 'earth': 1,
 'endure': 1,
 'engaged': 1,
 'equal': 1,
 'far': 2,
 'fathers': 1,
 'field': 1,
 'final': 1,
 'fitting': 1,
 'for': 5,
 'forget': 1,
 'forth': 1,
 'fought': 1,
 'four': 1,
 'freedom': 1,
 'from': 2,
 'full': 1,
 'gave': 2,
 'god': 1,
 'government': 1,
 'great': 3,
 'ground': 1,
 'hallow': 1,
 'have': 5,
 'here': 8,
 'highly': 1,
 'honored': 1,
 'in': 4,
 'increased': 1,
 'is': 3,
 'it': 5,
 'larger': 1,
 'last': 1,
 'liberty': 1,
 'little': 1,
 'live': 1,
 'lives': 1,
 'living': 2,
 'long': 2,
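The solution above still counts `--` as a word, since only `.` and `,` are removed. A sketch of one refinement, assuming the `strip(string.punctuation)` approach in place of chained `replace()` calls: skip tokens that are empty after cleaning, and use `get(word, 0)` to fold the two `if`/`else` branches into one line. A short excerpt of the address stands in for the full text so the cell runs on its own:

```python
import string

# A short excerpt stands in for the full address so this cell runs on its own
text = ("But, in a larger sense, we can not dedicate -- we can not "
        "consecrate -- we can not hallow -- this ground.")

word_count = {}
for word in text.split():
    word = word.strip(string.punctuation).lower()
    if not word:          # tokens like "--" are pure punctuation; skip them
        continue
    # get() returns 0 for a word we haven't seen yet, so one line handles
    # both the "new key" and "existing key" cases
    word_count[word] = word_count.get(word, 0) + 1

for word in sorted(word_count):
    print(word, word_count[word])
```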

## More compact way

We can actually do this a lot more compactly by using another list comprehension and another Python datatype called a set. A set is an unordered collection of items in which each item is unique (i.e., there are no repetitions).
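A few set basics, sketched on a made-up example: building a set drops duplicates, `in` tests membership, and adding an item that is already present has no effect. Sets are unordered, so we sort them for display:

```python
letters = set("mississippi")     # duplicates are dropped
print(sorted(letters))           # ['i', 'm', 'p', 's']

# membership tests work the same way as for lists and dictionaries
print("s" in letters)            # True

# adding an item that is already present has no effect
letters.add("m")
print(len(letters))              # 4
```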

Here's a list comprehension that removes all the punctuation and converts to lower case:

In [28]:
words = [q.lower().replace(".", "").replace(",", "") for q in ga]

And by using the `set()` function, we turn the list into a set, removing any duplicates:

In [29]:
unique_words = set(words)

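Now we can loop over the unique words and use the list's `count()` method to find how many times each one appears. This is more compact, though not faster: each `count()` call scans the entire list again.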

In [30]:
count = {}
for uw in unique_words:
    count[uw] = words.count(uw)
    
count

{'--': 7,
 'a': 7,
 'above': 1,
 'add': 1,
 'advanced': 1,
 'ago': 1,
 'all': 1,
 'altogether': 1,
 'and': 6,
 'any': 1,
 'are': 3,
 'as': 1,
 'battle-field': 1,
 'be': 2,
 'before': 1,
 'birth': 1,
 'brave': 1,
 'brought': 1,
 'but': 2,
 'by': 1,
 'can': 5,
 'cause': 1,
 'civil': 1,
 'come': 1,
 'conceived': 2,
 'consecrate': 1,
 'consecrated': 1,
 'continent': 1,
 'created': 1,
 'dead': 3,
 'dedicate': 2,
 'dedicated': 4,
 'detract': 1,
 'devotion': 2,
 'did': 1,
 'died': 1,
 'do': 1,
 'earth': 1,
 'endure': 1,
 'engaged': 1,
 'equal': 1,
 'far': 2,
 'fathers': 1,
 'field': 1,
 'final': 1,
 'fitting': 1,
 'for': 5,
 'forget': 1,
 'forth': 1,
 'fought': 1,
 'four': 1,
 'freedom': 1,
 'from': 2,
 'full': 1,
 'gave': 2,
 'god': 1,
 'government': 1,
 'great': 3,
 'ground': 1,
 'hallow': 1,
 'have': 5,
 'here': 8,
 'highly': 1,
 'honored': 1,
 'in': 4,
 'increased': 1,
 'is': 3,
 'it': 5,
 'larger': 1,
 'last': 1,
 'liberty': 1,
 'little': 1,
 'live': 1,
 'lives': 1,
 'living': 2,
 'long': 2,
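Because the loop above calls `count()` once per unique word, it rescans the whole list each time. The standard library's `collections.Counter` (not used in the original notebook) tallies everything in a single pass and behaves like a dictionary of counts. A sketch on a short sample word list:

```python
from collections import Counter

words = ["but", "in", "a", "larger", "sense", "we", "can", "not",
         "dedicate", "we", "can", "not", "consecrate", "we", "can", "not"]

# Counter tallies every item in a single pass over the list
count = Counter(words)
print(count["we"])            # 3

# missing keys report a count of 0 instead of raising KeyError
print(count["missing"])       # 0

# most_common(n) lists the n highest counts, largest first
print(count.most_common(3))
```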