# Day 2

## Day 2 Agenda
* __`enumerate()/zip()`__
* list comprehensions
* tuples
* dictionaries
* explaining __`this.py`__
* sets
* file I/O

## "Pythonic"

In [1]:
stooges = ['Shemp', 'Moe', 'Larry', 'Curley']

In [2]:
i = 0
for stooge in stooges:
    print('index', i, 'is', stooge)
    i += 1

index 0 is Shemp
index 1 is Moe
index 2 is Larry
index 3 is Curley


## __`enumerate()`__
* a builtin function which returns an _enumerate_ object from any iterable


In [8]:
for index, stooge in enumerate(stooges):
    print('index', index, 'is', stooge)

index 0 is Shemp
index 1 is Moe
index 2 is Larry
index 3 is Curley


In [4]:
type(enumerate(stooges))

enumerate
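
`enumerate()` also takes an optional `start` argument, which is handy when the count should begin at 1. A minimal sketch (the names `numbered` and `position` are only for illustration):

```python
stooges = ['Larry', 'Moe', 'Curly']

# start=1 makes the count begin at 1 instead of the default 0
numbered = list(enumerate(stooges, start=1))
print(numbered)

for position, stooge in enumerate(stooges, start=1):
    print('position', position, 'is', stooge)
```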

## __`zip(*iterables)`__
* builtin function which creates an iterator that aggregates elements from each iterable
* why is it called __`zip`__?

In [11]:
stooges = ['Larry', 'Moe', 'Curly']
marxbros = ['Groucho', 'Harpo', 'Chico']

for stooge, marx in zip(stooges, marxbros):
    print(stooge, marx)

Larry Groucho
Moe Harpo
Curly Chico


In [12]:
stooges = ['Larry', 'Moe', 'Curly']
marxbros = ['Groucho', 'Harpo', 'Chico', 'Zeppo']
for stooge, marx in zip(stooges, marxbros):
    print(stooge, marx)

Larry Groucho
Moe Harpo
Curly Chico


In [17]:
import itertools # module that helps with iteration
stooges = ['Larry', 'Moe', 'Curly']
marxbros = ['Groucho', 'Harpo', 'Chico', 'Zeppo']
for stooge, marx in itertools.zip_longest(stooges,
                     marxbros):
    print(stooge, marx)

Larry Groucho
Moe Harpo
Curly Chico
None Zeppo
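
If `None` is not a useful placeholder, `itertools.zip_longest()` accepts a `fillvalue` argument. A small sketch, where the placeholder string `'(nobody)'` and the name `pairs` are arbitrary choices for illustration:

```python
import itertools

stooges = ['Larry', 'Moe', 'Curly']
marxbros = ['Groucho', 'Harpo', 'Chico', 'Zeppo']

# fillvalue replaces the default None for the shorter iterable
pairs = list(itertools.zip_longest(stooges, marxbros,
                                   fillvalue='(nobody)'))
for stooge, marx in pairs:
    print(stooge, marx)
```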


# List Comprehensions

## List Comprehensions ("listcomps")
* quick way to build a list
* often more readable, and usually faster, than the equivalent __`for`__ loop
* which is easier to read?

In [18]:
string = 'ABCabc*'
ascii_codes = []
for char in string:
    ascii_codes.append(ord(char))
    
print(ascii_codes)

[65, 66, 67, 97, 98, 99, 42]

In [21]:
string = 'ABCabc*'
ascii_codes = [ord(char) for char in string]

print(ascii_codes)

[65, 66, 67, 97, 98, 99, 42]


## List Comprehensions (cont'd)
* listcomps can generate a list from the Cartesian product of two or more iterables

In [41]:
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
tshirts = [[color, size] for color in colors
                         for size in sizes]

tshirts

[['black', 'S'],
 ['black', 'M'],
 ['black', 'L'],
 ['white', 'S'],
 ['white', 'M'],
 ['white', 'L']]

In [35]:
# generate a list of all the consonants in a
# string, discarding vowels and spaces
string = 'alphabet soup tastes great'
consonants = [char for char in string
                  if char not in 'aeiou ']
print(consonants)

['l', 'p', 'h', 'b', 't', 's', 'p', 't', 's', 't', 's', 'g', 'r', 't']


## Lab: List Comprehensions
*  Start with Cartesian product example (colors x sizes of t-shirts) and add a third list, __`sleeves = ['short', 'long']`__ then write a new listcomp which generates the Cartesian product colors x sizes x sleeves. __`tshirts`__ should look like this:<pre><b>
    [['black', 'S', 'short'],
     ['black', 'S', 'long'],
     ['black', 'M', 'short'],
     ['black', 'M', 'long'],
     ['black', 'L', 'short'],
     ['black', 'L', 'long'],
     ['white', 'S', 'short'],
     ['white', 'S', 'long'],
     ['white', 'M', 'short'],
     ['white', 'M', 'long'],
     ['white', 'L', 'short'],
     ['white', 'L', 'long']]
     
 </b></pre>
* Use a list comprehension to create a list of the squares of the integers from 1 to 25 (i.e., 1, 4, 9, 16, …, 625)
* Given a list of words, create a second list which contains all the words from the first list which do not end with a vowel
* Use a list comprehension to create a list of the integers from 1 to 100 which are not divisible by 5

## listcomps recap
* keep them short
* they are not list incomprehensions, so keep them simple
* use line breaks since they are ignored inside [] (and (), {}) and you therefore don't need the ugly '\' line continuation character
* note that __`for`__ loops do many things: scanning a sequence to count or select items, computing aggregates (sums, averages), or any number of other processing tasks
  * in contrast, listcomps do ONE thing–generate lists!
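
A short sketch of that division of labor, using a made-up word list: the listcomp builds a list (split across lines without a backslash), while the aggregates are left to builtins.

```python
words = ['alphabet', 'soup', 'tastes', 'great']

# the listcomp does one thing: build a new list
# (line breaks inside [] need no '\' continuation)
lengths = [len(word)
           for word in words]

# aggregates are a job for a builtin (or a plain for loop)
total = sum(lengths)
longest = max(words, key=len)

print(lengths, total, longest)
```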

# Tuples

## Tuples
* immutable data type
* typically heterogeneous (cf. lists)
* generally imply some structure

In [42]:
t = () # empty tuple
t

()

In [43]:
type(t)

tuple

In [49]:
t = (1,) # singleton tuple

In [50]:
t

(1,)

In [51]:
t = 'Jones', 'Smith', 1023, True # no parens
t

('Jones', 'Smith', 1023, True)

In [52]:
# tuple unpacking
last_name, first_name, employee_num, full_time = t

In [53]:
employee_num

1023

In [54]:
something = input()
t1 = something.split()
t2 = tuple(something.split())

hi there class


In [55]:
print(t1, t2, sep='\n')

['hi', 'there', 'class']
('hi', 'there', 'class')


In [56]:
person = ('Gutzon Borglum', 1867, 'Idaho')

In [57]:
person[1]

1867

In [58]:
person[1] = 1868

TypeError: 'tuple' object does not support item assignment

In [59]:
# a tuple may contain a mutable object...

person = ('Curie', 'Marie', 1867, [])

In [60]:
person[-1].extend('physicist chemist'.split())

In [61]:
person

('Curie', 'Marie', 1867, ['physicist', 'chemist'])

## Lab: Tuples
* Given a list of words, sort them by length of word, rather than alphabetically.
* To do this, first create a list of tuples of the form (len, word), where the first element is the length of the word.
* Next, sort the tuples.
* Finally, extract the words from the list of tuples into a new list which is now sorted by length of word. Try to use a list comprehension if you can.

## Recap: Tuples
* not just "constant lists" 
 (see http://jtauber.com/blog/2006/04/15/python_tuples_are_not_just_constant_lists)
* remember that lists are (typically) ordered sequences of homogeneous values
* and tuples typically imply some structure and refer to multiple attributes of ONE item (person, country, building, etc.)
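
One way to picture "a list of records, each record a tuple" is the sketch below. The first record comes from the earlier example; the other two are invented purely for illustration.

```python
# each tuple is (last_name, first_name, employee_num, full_time)
# NOTE: the second and third records are made-up sample data
employees = [
    ('Jones', 'Smith', 1023, True),
    ('Doe', 'Jane', 1024, False),
    ('Roe', 'Richard', 1025, True),
]

full_timers = []
# tuple unpacking works directly in the for statement
for last_name, first_name, employee_num, full_time in employees:
    if full_time:
        full_timers.append(first_name + ' ' + last_name)

print(full_timers)
```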

# Dictionaries



## Dictionaries
* grouping of key/value pairs, looked up by key rather than by position (since Python 3.7, iteration follows insertion order)
* sometimes called a "hash", "hashmap", or "associative array"

In [62]:
d = {} # empty dict

In [63]:
d = { 'X': 10, 'V': 5, 'I': 1 } # can be initialized when declared

In [64]:
print(d)

{'X': 10, 'V': 5, 'I': 1}


In [65]:
d['L'] = 50
print(d)

{'X': 10, 'V': 5, 'I': 1, 'L': 50}


In [66]:
# iterating through a dict iterates through the keys 
for thing in d:
    print(thing, end=' ')

X V I L 

In [67]:
# ...of course we can print the values while iterating
for thing in d:
    print(thing, d[thing])

X 10
V 5
I 1
L 50


In [68]:
mydict = {'trenta': 31, 'grande': 16, 
          'venti': 20}
print(mydict)

{'trenta': 31, 'grande': 16, 'venti': 20}


In [69]:
print(mydict.keys(), 
      mydict.values(), mydict.items(), sep='\n')

dict_keys(['trenta', 'grande', 'venti'])
dict_values([31, 16, 20])
dict_items([('trenta', 31), ('grande', 16), ('venti', 20)])


In [70]:
total = 0
for amount in mydict.values():
    total += amount

total

67

## Dictionaries: View Objects
* __`keys()`__, __`values()`__, and __`items()`__ are view objects
* unlike lists, they provide a dynamic window into the dictionary
* view objects are new to Python 3

In [72]:
keys = mydict.keys()
keys

dict_keys(['trenta', 'grande', 'venti'])

In [73]:
# keys will change automagically after we add to the dict
mydict['tall'] = 12
keys

dict_keys(['trenta', 'grande', 'venti', 'tall'])
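
To see the difference between a view and a plain list, compare a view with a snapshot taken via `list()`. A minimal sketch (the names `keys_view`, `keys_snapshot`, and `common` are illustrative):

```python
mydict = {'trenta': 31, 'grande': 16, 'venti': 20}

keys_view = mydict.keys()      # dynamic view onto the dict
keys_snapshot = list(mydict)   # static copy of the keys right now

mydict['tall'] = 12

print(keys_view)       # includes 'tall'
print(keys_snapshot)   # still only the original three keys

# key views also support set-like operations
common = keys_view & {'tall', 'short'}
print(common)
```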

## Dictionaries: __`enumerate()`__
* because dicts are accessed by key rather than by position, __`enumerate()`__ isn't all that useful

In [74]:
for index, val in enumerate(mydict):
    print('index', index, 'is', val)

index 0 is trenta
index 1 is grande
index 2 is venti
index 3 is tall


In [75]:
# We can iterate through the dict items; they come back in
# insertion order (Python 3.7+), not in sorted order...
for key, val in mydict.items():
    print(key, '=>', val)

trenta => 31
grande => 16
venti => 20
tall => 12


In [76]:
# To iterate in sorted order, we have to sort. Here we sort
# the keys by their values (key=mydict.get).
# By default, sorted() sorts the dict's keys--
# usually not what we want!

for k in sorted(mydict, key=mydict.get):
    print(k, '=>', mydict[k])

tall => 12
grande => 16
venti => 20
trenta => 31
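
Another way to get the same effect is to sort the `(key, value)` pairs from `items()`. This is only a sketch; the `lambda` and the names `by_key` and `by_value_desc` are illustrative choices.

```python
mydict = {'trenta': 31, 'grande': 16, 'venti': 20, 'tall': 12}

# sorted() on a dict sorts its keys alphabetically
by_key = sorted(mydict)

# sort the (key, value) pairs by value, largest first
by_value_desc = sorted(mydict.items(),
                       key=lambda item: item[1],
                       reverse=True)

print(by_key)
for key, val in by_value_desc:
    print(key, '=>', val)
```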


# __`get()`__/__`setdefault()`__: Dealing with missing dict values

In [77]:
d = {'foo': 'bar'}

In [78]:
d['foo']

'bar'

In [79]:
d['foot']

KeyError: 'foot'

In [80]:
d.get('foot')

In [81]:
if 'foot' in d:
    print(d['foot'])
# or just... d.get('foot')

In [82]:
d.setdefault('foo', 23) # get the value of 'foo' or add 'foo' 
# to dict with value = 23
#if 'foo' in d:
    #val = d['foo']
#else:
    #d['foo'] = 23
    #val = 23

'bar'

In [83]:
d

{'foo': 'bar'}

In [84]:
print(d.setdefault('foot', 23))
d

23


{'foo': 'bar', 'foot': 23}
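
Two further uses, sketched with made-up data: `get()` takes an optional default to return instead of `None`, and `setdefault()` is convenient for building a dict of lists.

```python
d = {'foo': 'bar'}

# get() with a default: the dict itself is not modified
missing = d.get('foot', 'no such key')
print(missing, d)

# setdefault() to group words by first letter (sample word list)
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
by_letter = {}
for word in words:
    # create the empty list the first time a letter is seen
    by_letter.setdefault(word[0], []).append(word)

print(by_letter)
```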

## Removing items from a dict
* __`del`__ = remove an item from the dict
* __`dict.pop(key)`__ = remove item and return value
* __`dict.clear()`__ = empty out the dict

In [90]:
mydict = {'trenta': 31, 'grande': 16, 'venti': 20,
          'tall': 12}
print(mydict)

{'trenta': 31, 'grande': 16, 'venti': 20, 'tall': 12}


In [91]:
del mydict['trenta']
print(mydict)

{'grande': 16, 'venti': 20, 'tall': 12}


In [92]:
print(mydict.pop('venti'))

20


In [93]:
print(mydict)

{'grande': 16, 'tall': 12}


In [94]:
mydict.clear()
mydict

{}
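
Both `del` and `pop(key)` raise `KeyError` when the key is missing; `pop()` also accepts a default that avoids the exception. A minimal sketch (the names `size` and `message` are illustrative):

```python
mydict = {'grande': 16, 'tall': 12}

# pop() with a default returns it instead of raising KeyError
size = mydict.pop('venti', None)
print(size)

# del on a missing key raises KeyError
try:
    del mydict['venti']
except KeyError as err:
    message = 'no such key: ' + str(err)

print(message)
```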

## Lab: dictionary
* use a dict to translate Roman numerals into their Arabic equivalents
1. load the dict with Roman numerals M (1000), D (500), C (100), L (50), X (10), V (5), I (1)
2. read in a Roman numeral
3. print Arabic equivalent
4. try it with MCLX = 1000 + 100 + 50 + 10 = 1160
5. __If you have time, deal with the case where a smaller number precedes a larger number, e.g., XC = 100 - 10 = 90, or MCM = 1000 + (1000-100) = 1900__

## Dict Comprehension
* like a listcomp, a dictcomp creates a dict quickly

In [99]:
names = ['Sally', 'Bob', 'Martha', 'Dirk']
employee_ids = [345, 286, 453, 119]
id_dict = { name: emp_id + 1000
                   for name, emp_id in zip(names,
                                employee_ids)}
print(id_dict)

{'Sally': 1345, 'Bob': 1286, 'Martha': 1453, 'Dirk': 1119}


In [97]:
d = { 'foo': 4, 'bar': -1, 'baz': -1, 'blah': 3, 'what': 2 }
print(d)

{'foo': 4, 'bar': -1, 'baz': -1, 'blah': 3, 'what': 2}


In [98]:
d = { k: v for k, v in d.items()
               if v != -1 }
print(d)

{'foo': 4, 'blah': 3, 'what': 2}


## Now we understand this code!

In [102]:
s = """Gur Mra bs Clguba, ol Gvz Crgref

Ornhgvshy vf orggre guna htyl.
Rkcyvpvg vf orggre guna vzcyvpvg.
Fvzcyr vf orggre guna pbzcyrk.
Pbzcyrk vf orggre guna pbzcyvpngrq.
Syng vf orggre guna arfgrq.
Fcnefr vf orggre guna qrafr."""

d = {}
for c in (65, 97):
    for i in range(26):
        d[chr(i+c)] = chr((i+13) % 26 + c)

print("".join([d.get(c, c) for c in s]))

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
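
The same lookup table can be built with a dict comprehension. The sketch below uses only the first line of the text and checks the result against the standard library's `rot_13` codec; the names `decoded` and `ch` are illustrative.

```python
import codecs

# same table as the nested for loops, written as a dictcomp
d = {chr(i + c): chr((i + 13) % 26 + c)
     for c in (65, 97)        # ord('A') and ord('a')
     for i in range(26)}

s = 'Gur Mra bs Clguba, ol Gvz Crgref'
decoded = ''.join(d.get(ch, ch) for ch in s)
print(decoded)

# ROT13 is its own inverse, so encoding again decodes
print(decoded == codecs.encode(s, 'rot_13'))
```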


# Sets

## Sets
* unordered collection, no duplicates
* kind of a one-trick pony–remove duplicates

In [106]:
s = { 'Annie', 'Betty', 'Cathy', 'Donna' }
print(s)

{'Annie', 'Donna', 'Betty', 'Cathy'}


In [107]:
s.add('Ellen')
print(s)

{'Ellen', 'Betty', 'Donna', 'Annie', 'Cathy'}


In [109]:
s.add('Annie')
print(s)

{'Ellen', 'Betty', 'Donna', 'Annie', 'Cathy'}


In [110]:
# we can use the 'in' operator
if 'Annie' in s:
    print('Yep!')

Yep!


## Deleting from a Set
* __`remove(item)`__: remove an item; raises __`KeyError`__ if it's not in the set
* __`discard(item)`__: remove an item if it's in the set; does nothing otherwise
* __`pop()`__: removes and returns an arbitrary element of the set

In [111]:
print(s)

{'Ellen', 'Betty', 'Donna', 'Annie', 'Cathy'}


In [113]:
s.remove('Betty') # calling this a second time would raise KeyError: 'Betty'

In [114]:
print(s)

{'Ellen', 'Donna', 'Annie', 'Cathy'}


In [115]:
s.discard('Loren')

In [116]:
print(s)

{'Ellen', 'Donna', 'Annie', 'Cathy'}


In [118]:
s.pop() # removes and returns an arbitrary element
s.pop() # ...and then another one
print(s)

{'Annie', 'Cathy'}
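
The difference between `remove()` and `discard()` only shows up when the item is absent. A minimal sketch (the flag `removed` and the name `popped` are illustrative):

```python
s = {'Ellen', 'Donna', 'Annie', 'Cathy'}

s.discard('Loren')      # not in the set: silently does nothing

try:
    s.remove('Loren')   # not in the set: raises KeyError
except KeyError:
    removed = False
else:
    removed = True
print('removed =', removed)

popped = s.pop()        # which element comes out is arbitrary
print(popped, 'was popped;', len(s), 'left')
```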


## sets (cont'd)

In [None]:
even = set(range(2, 11, 2))
odd = set(range(1, 10, 2))
print(even, odd, sep='\n')

In [None]:
prime = {2, 3, 5, 7}
prime & odd

In [None]:
prime & even

In [None]:
odd | even

In [None]:
prime - even

In [None]:
prime ^ odd
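
For reference, here is a sketch that evaluates the five expressions above in one go and prints each result sorted (sets have no guaranteed display order). The `results` dict is just a convenience for this example.

```python
even = set(range(2, 11, 2))   # 2, 4, 6, 8, 10
odd = set(range(1, 10, 2))    # 1, 3, 5, 7, 9
prime = {2, 3, 5, 7}

results = {
    'prime & odd': prime & odd,     # intersection
    'prime & even': prime & even,
    'odd | even': odd | even,       # union
    'prime - even': prime - even,   # difference
    'prime ^ odd': prime ^ odd,     # symmetric difference
}
for expression, value in results.items():
    print(expression, '=', sorted(value))

# each operator has an equivalent method
same = (prime & odd) == prime.intersection(odd)
print(same)
```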

## sets + dicts

In [None]:
movies = {
    'Die Hard': { 'Bruce Willis', 'Alan Rickman', 'Bonnie Bedelia' },
    'The Sixth Sense' : { 'Toni Collette', 'Bruce Willis', 'Donnie Wahlberg' },
    'The Hunt for Red October' : { 'Sean Connery', 'Alec Baldwin' },
    'Highlander': { 'Christopher Lambert', 'Sean Connery' },
    '16 Blocks': { 'Bruce Willis', 'Yasiin Bey', 'David Morse' }
}

In [None]:
for title, stars in movies.items():
    if 'Bruce Willis' in stars:
        print(title)

In [None]:
for title, stars in movies.items():
    if stars & { 'Alan Rickman', 'Sean Connery' }:
        print(title)

## Subsets

In [None]:
set1 = { 1, 2, 3 }
set2 = { 1, 2, 3, 5, 7, 9 }

In [None]:
set1 <= set2 # <= means "subset"

In [None]:
set1 <= set1 # a set is always a subset of itself

In [None]:
set1 < set1 # but a set is never a proper subset of itself

In [None]:
set1 < set2 # set1 is a proper subset of set2 because set2 has all of set1 *and more*
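
The comparison operators have method counterparts. A minimal sketch (the result names are illustrative):

```python
set1 = {1, 2, 3}
set2 = {1, 2, 3, 5, 7, 9}

is_subset = set1.issubset(set2)        # same as set1 <= set2
is_superset = set2.issuperset(set1)    # same as set2 >= set1
is_proper = set1 < set2                # proper subset has no method form
no_overlap = set1.isdisjoint({4, 6})   # True when nothing is shared

print(is_subset, is_superset, is_proper, no_overlap)
```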

## Lab: Sets
* Use a set to find all of the unique words in the input and print them out in sorted order
* If the user entered __There is no there there__, your program should print out 
   <pre><b>
   is
   no
   there
   </b></pre>
* Note that There and there should be counted as the same word.

## Sets Recap
* unordered
* no duplicates
* operators &, |, -, ^
* use __`in`__ to test for membership
* subset vs. proper subset



# File I/O

## File I/O
* __`fileobj = open(filename, mode)`__
* mode is one or two letters
  * r = read
  * r+ = open for reading and writing
  * w = write (create/overwrite)
  * x = write, but only if file does not already exist
  * a = append (the file is created if it does not exist); a+ = append and read
* second letter =
  * t = text file (default)
  * b = binary
* __`fileobj.close()`__
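
A sketch of several modes in action. It assumes nothing about your filesystem: it works in a throwaway directory from `tempfile`, and the file name `test.txt` is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test.txt')

    f = open(path, 'a')          # 'a' creates the file if needed
    f.write('first line\n')
    f.close()

    f = open(path, 'a')          # ...and appends if it already exists
    f.write('second line\n')
    f.close()

    try:
        f = open(path, 'x')      # 'x' refuses to overwrite
    except FileExistsError:
        x_failed = True
    else:
        x_failed = False
        f.close()

    f = open(path, 'r')          # same as 'rt': text mode gives str
    contents = f.read()
    f.close()

    f = open(path, 'rb')         # binary mode gives bytes
    raw = f.read()
    f.close()

print(x_failed, type(contents), type(raw))
```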

## File I/O: Open/Close

In [119]:
f = open('/tmp/test.txt', 'r')

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/test.txt'

In [120]:
f = open('/tmp/test.txt', 'w')
f.close()

In [121]:
f = open('/tmp/test.txt', 'x')

FileExistsError: [Errno 17] File exists: '/tmp/test.txt'

## File I/O: Read/Write

In [122]:
poem = '''TWO roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.'''

len(poem)

729

In [123]:
f = open('/tmp/poem.txt', 'w')
f.write(poem)

729

In [124]:
f.close()

In [125]:
f = open('/tmp/poem.txt', 'r')
poem2 = f.read()
f.close()

In [126]:
poem == poem2

True

## File I/O: __`write()`__ vs. __`print()`__


In [127]:
f = open('/tmp/poem.txt', 'w')
print(poem, file=f)
f.close()

In [128]:
f = open('/tmp/poem.txt', 'r')
poem2 = f.read()
f.close()

In [129]:
poem == poem2

False

In [130]:
len(poem2)

730

## __`print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)`__
* __`sep`__ = separator (default is space)
* __`end`__ = what to print at end (default is newline)
* __`file`__ = where to print, default is screen
* __`flush`__ = whether to flush output buffer, default is no
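
A small sketch of `sep`, `end`, and `file` together. To keep it self-contained it prints into an in-memory `io.StringIO` buffer rather than a real file; the name `buffer` is illustrative.

```python
import io

buffer = io.StringIO()   # file-like object that lives in memory

print('a', 'b', 'c', sep='-', end='!\n', file=buffer)
print('no newline', end='', file=buffer)

text = buffer.getvalue()
print(repr(text))
```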

## File I/O: How to Read Data
* __`read()`__: slurps up entire file at once
  * __`read(x)`__ reads at most __`x`__ characters (bytes in binary mode)
* __`readline()`__: reads a line at a time
* __`readlines()`__ reads all of the remaining lines and returns them as a list of strings
* or use an iterator…

In [131]:
poem = ''
f = open('/tmp/poem.txt', 'r')
for line in f:
    poem += line
f.close()

In [132]:
len(poem)

730
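
The reading methods can be mixed on one file object, each picking up where the last left off. A sketch using a throwaway file (the file name and sample text are made up for illustration):

```python
import os
import tempfile

sample = 'line 1\nline 2\nline 3\n'

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'lines.txt')
    f = open(path, 'w')
    f.write(sample)
    f.close()

    f = open(path, 'r')
    first_four = f.read(4)        # at most 4 characters
    rest_of_line = f.readline()   # up to and including the newline
    remaining = f.readlines()     # everything left, as a list
    at_eof = f.read()             # empty string at end of file
    f.close()

print(repr(first_four), repr(rest_of_line), remaining, repr(at_eof))
```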

## File I/O: __`with`__ statement
* the __`with`__ statement sets up a temporary "context" and closes the file automatically so we don't have to bother with closing it

In [133]:
with open('/tmp/poem.txt', 'r') as f:
    poem2 = f.read()
    # at this point file is open
    print('in with, f.closed =', f.closed)

in with, f.closed = False


In [134]:
poem == poem2

True

In [135]:
f.closed

True
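
The file is closed even if an exception is raised inside the block. A minimal sketch using a throwaway directory; the `ValueError` is raised on purpose for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'poem.txt')

    with open(path, 'w') as f:
        f.write('Two roads diverged in a wood\n')
    closed_after_write = f.closed

    try:
        with open(path, 'r') as f:
            first_line = f.readline()
            raise ValueError('something went wrong')
    except ValueError:
        pass
    # the with statement closed the file despite the exception
    closed_after_error = f.closed

print(closed_after_write, closed_after_error)
```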

## Lab: File I/O
* write a Python program which prompts the user for a filename, then opens that file and writes the contents of the file to a new file, in reverse order, i.e.,

<pre><b>
    Original file       Reversed file
    Line 1              Line 4
    Line 2              Line 3
    Line 3              Line 2
    Line 4              Line 1
</b></pre>

## Lab: File I/O + dicts
* write a Python program to read a file and count the number of occurrences of each word in the file
* use a dict, indexed by word, to count the occurrences
* remember that __`d.get(key)`__ returns __`None`__ if there is no such key in the dict (whereas __`d[key]`__ raises a __`KeyError`__); you can also test for a key with the __`in`__ operator
* treat __The__ and __the__ as the same word when counting
* print out words and counts, from most common to least common
* EXTRA: remove punctuation, so __Hamlet,__ == __Hamlet__
* Road Not Taken and Hamlet are in your materials

## File I/O: recap
* __`open()`__ returns file object
* __`close()`__ closes the file
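* __`read()`__ reads the whole file, or at most __`n`__ characters with __`read(n)`__
* __`readline()`__ reads a line at a time
* __`readlines()`__ reads all lines into a list at once; generally best avoided for large files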
* can also iterate through a file object a line at a time
* __`with`__ statement sets up a temporary context (block) for file I/O and automatically closes file when block is exited

# End of Day 2