# Think Python: Week 10

<img src="substitutions.png" style="float:right;" />
Slides: http://github.com/sboisen/training/ThinkPython/Week10

## Summer Break!

After today, our next class will be **Sept 29**. 

## Sidebar: using Git

* Basic command-line tools: https://git-scm.com/downloads
* Slightly better: https://desktop.github.com/
* Better still (but needs a paid license): http://www.syntevo.com/smartgit/

## Chapter 11 Review Goals

* Dictionaries
* Raising exceptions
* Global variables

### Dictionaries: Master Them!
* Most programming languages have an equivalent concept
* Key-value pairs are everywhere once you have eyes to see them
* Looking up a key in a dictionary is typically much faster than searching a list for a value
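
A rough way to see the speed difference for yourself. This is only a sketch: the names `ids_list` and `ids_dict` and the sizes are made up for illustration, and exact timings vary by machine, so none are shown.

```python
import timeit

# Hypothetical data: 100,000 made-up string IDs, stored two ways.
ids_list = [str(n) for n in range(100000)]
ids_dict = dict((key, True) for key in ids_list)

missing = 'not-an-id'   # worst case for the list: every element gets checked

# Time 100 membership tests against each container.
list_seconds = timeit.timeit(lambda: missing in ids_list, number=100)
dict_seconds = timeit.timeit(lambda: missing in ids_dict, number=100)

print('list: %.6f seconds' % list_seconds)
print('dict: %.6f seconds' % dict_seconds)
```

The list has to compare against every element, while the dictionary goes (almost) straight to where the key would be stored.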

## Dictionaries: the Basics

* Dictionaries contain *items* which pair a *key* with a *value*
* Each key must be unique
* Keys must be immutable
* Values can be any type
* Some sequence operations apply: the `in` operator (which tests keys, not values) and `len()`
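
A small sketch of the rules above in action. The `airports` data is invented for illustration.

```python
# Hypothetical example data: airport codes mapped to cities.
airports = {'SEA': 'Seattle', 'BLI': 'Bellingham'}

# `in` tests keys, not values; len() counts items.
has_sea = 'SEA' in airports            # True
has_seattle = 'Seattle' in airports    # False: 'Seattle' is a value, not a key
item_count = len(airports)             # 2

# Assigning to an existing key replaces its value, so keys stay unique.
airports['BLI'] = 'Bellingham, WA'
count_after_reassign = len(airports)   # still 2

# A tuple is immutable, so it can be a key; a list is mutable, so it cannot.
airports[('PDX', 'US')] = 'Portland'
list_key_rejected = False
try:
    airports[['PDX', 'US']] = 'Portland'
except TypeError:
    list_key_rejected = True
```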

In [None]:
# dict mapping resource IDs to titles
myd = {"LLS:14.0.1": "New Bible Dictionary",
       "LLS:14.0.3": "Harper's Bible Dictionary", 
       "LLS:HLMNILLBBLDICT": "Holman Illustrated Bible Dictionary",
       "LLS:14.0.4": "The Anchor Yale Bible Dictionary",
       # and many, many others ...
       }

## Dict Order Isn't Useful


In [None]:
myd.keys()

In [None]:
# access items in order by keys
for k in sorted(myd.keys()):
    print('%s %s' % (k, myd[k]))

## Key Uniqueness
* Assigning to an existing key silently replaces the old value
* To keep earlier values, either collect them in a list or test for the key before assigning
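
Both options might look something like this. The `orders` data and the variable names are made up for the example.

```python
# Hypothetical data: (customer, product) pairs with repeated customers.
orders = [('alice', 'commentary'), ('bob', 'lexicon'), ('alice', 'atlas')]

# Option 1: collect every value for a key in a list.
products_by_customer = {}
for customer, product in orders:
    products_by_customer.setdefault(customer, []).append(product)

# Option 2: test before assigning, so the first value is never overwritten.
first_product = {}
for customer, product in orders:
    if customer not in first_product:
        first_product[customer] = product
```

`setdefault()` returns the existing list for a key, or stores and returns a new empty list if the key isn't there yet.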

## Fun with Dictionaries

In [None]:
# fast lookup in a large set of values: 
# read everything into a dictionary once initially, then look up repeatedly
customer_zipcodes = {
    '98225': True,
    '98226': True,
    '98230': True,
    # and a few thousand more
    }
'98227' in customer_zipcodes

In [None]:
# Counting repeated items in a sequence
# Exercise 11.2: histogram using get()
def histogram(seq):
    counts = dict()
    for item in seq:
        counts[item] = counts.get(item, 0) + 1
    return counts
    
histogram(['red', 'blue', 'red', 'green', 'blue', 'red', 'purple'])

Note: the standard library's `collections` module provides a `Counter` class for exactly this use case; `histogram()` is written out by hand here only to show how it works.
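
For comparison, here is a minimal sketch of the same count using `Counter` on the same list of colors:

```python
from collections import Counter

colors = ['red', 'blue', 'red', 'green', 'blue', 'red', 'purple']
counts = Counter(colors)

red_count = counts['red']          # 3
missing_count = counts['orange']   # 0: a missing key counts as zero
top_two = counts.most_common(2)    # [('red', 3), ('blue', 2)]
```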

In [None]:
# inverting a dictionary
# example: products by customer, customers by product
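
One way the inversion could be written, as a sketch only. The `products_by_customer` data and the `invert()` helper are invented for illustration.

```python
# Hypothetical data: the products each customer owns.
products_by_customer = {
    'alice': ['atlas', 'lexicon'],
    'bob': ['lexicon'],
}

def invert(d):
    """Map each item in d's value lists to a sorted list of the keys that had it."""
    inverse = {}
    for key in d:
        for value in d[key]:
            inverse.setdefault(value, []).append(key)
    # Sort each list so the result doesn't depend on dictionary order.
    for value in inverse:
        inverse[value].sort()
    return inverse

customers_by_product = invert(products_by_customer)
# {'atlas': ['alice'], 'lexicon': ['alice', 'bob']}
```

Several customers can own the same product, so the inverted values have to be lists: this is the key-uniqueness issue again.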

In [None]:
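# memoization: cache results to speed things up
# (long_list_of_Bible_references and slow_reference_conversion() are
# placeholders for your own data and your own slow function)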
refcache = {}

for ref in long_list_of_Bible_references:
    if ref not in refcache:
        refcache[ref] = slow_reference_conversion(ref)
# now all the conversion results are in refcache
# but nothing got converted more than once
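
To try the idea out, here is a runnable sketch with stand-ins for the placeholder names in the cell above. The fake "slow" function and the sample references are assumptions made up for the demonstration.

```python
# Hypothetical stand-ins: a record of calls and a pretend-slow conversion.
calls = []

def slow_reference_conversion(ref):
    """Pretend to be slow: record the call and return a normalized reference."""
    calls.append(ref)
    return ref.lower().replace(' ', '')

references = ['John 3:16', 'Gen 1:1', 'John 3:16', 'Gen 1:1', 'John 3:16']

refcache = {}
for ref in references:
    if ref not in refcache:
        refcache[ref] = slow_reference_conversion(ref)

# Every reference can now be converted with a fast dictionary lookup.
converted = [refcache[ref] for ref in references]
number_of_slow_calls = len(calls)   # 2, even though there are 5 references
```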
        

In [None]:
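# URL Shortener
PREFIX = 'http://mydomain.com/'
SHORTURLS = dict()   # short ID -> original URL
URLS = dict()        # original URL -> short ID
URLCOUNTER = 0       # next unused short ID

def shortener(url):
    """Expand a short URL, or shorten any other URL."""
    if url.startswith(PREFIX):
        # strip the prefix to get the short ID; raises KeyError if unknown
        return SHORTURLS[url[len(PREFIX):]]
    else:
        return define_short_url(url)

def define_short_url(url):
    """Return the short URL for url, assigning a new ID the first time."""
    global URLCOUNTER
    if url not in URLS:
        SHORTURLS[str(URLCOUNTER)] = url
        URLS[url] = str(URLCOUNTER)
        URLCOUNTER += 1
    return PREFIX + URLS[url]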
  
# then set up your web site to redirect using the contents of SHORTURLS

In [None]:
for url in ['http://logos.com/compare',
            'http://seanboisen.com',
            'https://github.com/sboisen/training/tree/master/ThinkPython/Week10', 
            'http://seanboisen.com',
            'http://mydomain.com/0']:
    print('%s => %s' % (url, shortener(url)))

In [None]:
# the value in a dictionary can be another dictionary
users = {
    'sboisen': {
        'first name': 'Sean',
        'last name': 'Boisen', 
        'tenure': 8,
        },
    'pvenable': {
        'first name': 'Peter',
        'last name': 'Venable', 
        'tenure': 4,
        },
    'rbrannan': {
        'first name': 'Rick',
        'last name': 'Brannan', 
        'tenure': 22,
        },
    }
users['sboisen']['first name']
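
A short sketch of working with a nested dictionary, using a trimmed-down copy of the data above. The variable names are just for illustration.

```python
# A smaller copy of the nested structure above.
users = {
    'sboisen': {'first name': 'Sean', 'tenure': 8},
    'rbrannan': {'first name': 'Rick', 'tenure': 22},
}

# Chain two lookups: the outer key picks a user, the inner key picks a field.
rick_tenure = users['rbrannan']['tenure']    # 22

# Loop over the outer dict to collect one field from every inner dict.
tenure_by_user = {}
for username in users:
    tenure_by_user[username] = users[username]['tenure']

# Find the user with the longest tenure by comparing the collected values.
longest = max(tenure_by_user, key=tenure_by_user.get)    # 'rbrannan'
```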

## Raising Exceptions
* Examples so far: 
    * `NameError`: using a name that's not defined
    * `TypeError`: problems with types
    * `IndexError`: accessing a non-existent index in a sequence 
    * `KeyError`: accessing a non-existent key in a dictionary
    * `ValueError`: something wrong with the value of a parameter
* Lots of others. Later you'll learn how to define your own
* An exception like `TypeError` is a *class* of error: many different specific problems can raise it, which is why the message matters

* Best practices: 
    * include an informative message
    * don't go overboard
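
A minimal sketch of raising an exception with an informative message, loosely modeled on the reverse-lookup idea from chapter 11. The message wording and the sample data are assumptions for the example.

```python
def reverse_lookup(d, value):
    """Return the first key in d that maps to value."""
    for key in d:
        if d[key] == value:
            return key
    # An informative message says what went wrong and with which value.
    raise ValueError('value %r does not appear in the dictionary' % (value,))

titles = {'LLS:14.0.1': 'New Bible Dictionary'}
found = reverse_lookup(titles, 'New Bible Dictionary')    # 'LLS:14.0.1'

try:
    reverse_lookup(titles, 'No Such Title')
except ValueError as error:
    message = str(error)
```

Failing to find a value is a problem with the *value* of an argument, not its type, so `ValueError` is the natural choice here.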

## Global Variables
* By default, assigning to a name inside a function creates a local variable
* Use the `global` statement inside the function to assign to the global variable instead
* Strong convention: use UPPERCASE variable names for global values
* Use globals sparingly!
    * Encapsulation is generally a good thing
    * Code with globals is harder to test in a modular way
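
A small sketch of the difference between local and global assignment. The names `CALL_COUNT`, `count_locally`, and `count_globally` are made up for the example.

```python
CALL_COUNT = 0          # global: UPPERCASE by convention

def count_locally():
    # Assignment creates a new local CALL_COUNT; the global is untouched.
    CALL_COUNT = 99
    return CALL_COUNT

def count_globally():
    # The global statement makes assignment rebind the module-level name.
    global CALL_COUNT
    CALL_COUNT += 1
    return CALL_COUNT

local_result = count_locally()      # 99
after_local = CALL_COUNT            # still 0
global_result = count_globally()    # 1
after_global = CALL_COUNT           # now 1
```

Without the `global` statement, `CALL_COUNT += 1` inside a function raises `UnboundLocalError`, because Python treats the name as a local that has no value yet.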

## During Summer Break

* Read chapter 12: Tuples
* Look at the bonus assignments for Week 09, and rewrite them using dictionaries

## Additional Resources

* If you have your own hosting service, you can set up [YOURLS](http://yourls.org/) as your own URL shortener (alas, using PHP rather than Python)
* <img src="bd.png" style="display: inline;" />I've looked at about a dozen Git tutorials (which tells you something about its complexity). [This one seems best suited to first-timers](http://readwrite.com/2013/09/30/understanding-github-a-journey-for-beginners-part-1).
* <img src="bd.png" style="display: inline;" />[Memoization](https://en.wikipedia.org/wiki/Memoization)
* <img src="bd.png" style="display: inline;" />The excellent [Natural Language Toolkit](http://www.nltk.org/) (NLTK) has a well-developed class for frequency distributions (histograms) called [FreqDist](http://nltk.googlecode.com/svn/trunk/doc/api/nltk.probability.FreqDist-class.html). Among other things, it's very useful for [text corpora](http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html).
* <img src="bd.png" style="display: inline;" /><img src="bd.png" style="display: inline;" />[Wikipedia: Hash table](http://en.wikipedia.org/wiki/Hash_table)
