# Think Python, Week 10: Dictionaries

<img src='../meta/images/python-logo.png' style="float:right">

## Objectives
---

* Understand the `dict` data type and some common use cases
* Understand the `raise` statement and raising errors
* Understand global variables

## Dictionary Basics
---

* Dictionaries contain *items* which pair a *key* with a *value*.
  * They support some of the same operations as lists: `len()`, `in`, iteration (over keys)
  * Dictionaries are mutable.
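  * Dictionary lookups stay fast no matter how large the dictionary gets (unlike searching a list).
  * Dictionaries preserve insertion order as of Python 3.7.
* Each key must be unique.
  * Assigning a new value to an existing key overwrites the old value.
* Keys must be *hashable*: strings, numbers, and tuples of hashable values work, but mutable types like lists do not.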
* Values can be any type.
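
The points above can be sketched in a few lines. The `ages` and `points` dictionaries and their contents are hypothetical, made up only for illustration; the `try`/`except` is there just so the example runs to completion.

```python
# Hypothetical example data, not from the course materials.
ages = {"ada": 36, "grace": 45}

# Assigning to an existing key overwrites its value; the length is unchanged.
ages["ada"] = 37
print(len(ages))        # 2

# Assigning to a new key adds an item; iteration follows insertion order (Python 3.7+).
ages["alan"] = 41
print(list(ages))       # ['ada', 'grace', 'alan']

# Tuples of hashable values are hashable, so they can serve as keys...
points = {(0, 0): "origin"}

# ...but lists are mutable and unhashable, so using one as a key raises TypeError.
try:
    points[[1, 2]] = "bad key"
except TypeError as err:
    print("TypeError:", err)
```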


In [None]:
mydict = {"LLS:14.0.1": "New Bible Dictionary",
       "LLS:14.0.3": "Harper's Bible Dictionary", 
       "LLS:HLMNILLBBLDICT": "Holman Illustrated Bible Dictionary",
       "LLS:14.0.4": "The Anchor Yale Bible Dictionary",
       # and many, many others ...
       }

In [None]:
len(mydict)

In [None]:
'LLS:14.0.1' in mydict

In [None]:
'NBD' in mydict

In [None]:
mydict['LLS:14.0.1']

In [None]:
mydict['NBD']

In [None]:
mydict.get('NBD', 'Missing')

### Use cases

* Looking things up
* Keeping track of whether you've seen an item before
  * Finding duplicates
  * Caching
  * Grouping items with a common key (use lists as values)
* Counting things
* Efficiently collecting key-value pairs

In [None]:
# fast lookup for a large set of values
userzipcodes = {
    '98225': True,
    '98226': True,
    # ... and presumably hundreds more
}
# apparently Smokey the Bear has his own zip code: https://lite987.com/americas-5-most-interesting-zip-codes/
# but he's probably not a Logos user
'20252' in userzipcodes

In [None]:
# counting things
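def counter(items):
    # map each distinct item to the number of times it occurs
    counts = dict()
    for item in items:
        # get() returns 0 for items not seen yet
        counts[item] = counts.get(item, 0) + 1
    return counts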
        
counter(['red', 'white', 'green', 'red', 'yellow', 'red'])
# Note the `collections` module has a `Counter` class for this use case: this is just for expository purposes. 


In [None]:
# collecting items with a common key
# words by first letter
firstletters = dict()
for word in ['alpha', 'beta', 'alfalfa', 'beets', 'aspirin']:
    initial = word[0]
    # initialize an empty list if not present
    if initial not in firstletters:
        firstletters[initial] = []
    # append items with the same first letter
    firstletters[initial].append(word)
    
firstletters
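
Caching, one of the use cases listed above, follows the same "have I seen this before?" pattern: store each result under its input so it is computed only once. This is a minimal sketch, assuming a hypothetical `fib()` function as the expensive computation and a hypothetical `fib_cache` dictionary to hold the results.

```python
# Hypothetical cache: maps n to the nth Fibonacci number.
# The two base cases are stored up front.
fib_cache = {0: 0, 1: 1}

def fib(n):
    """Return the nth Fibonacci number, reusing previously computed results."""
    if n not in fib_cache:
        # compute once, then remember the answer
        fib_cache[n] = fib(n - 1) + fib(n - 2)
    return fib_cache[n]

print(fib(30))  # 832040
```

Without the cache, the recursive version recomputes the same values over and over; with it, each value is computed once and later calls are a single dictionary lookup.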

## Raising Exceptions
---

* Examples so far: 
    * `NameError`: using a name that's not defined
    * `TypeError`: problems with types
    * `IndexError`: accessing a non-existent index in a sequence 
    * `KeyError`: accessing a non-existent key in a dictionary
    * `ValueError`: something wrong with the value of a parameter
* There are lots of others. Later you'll learn how to define your own.
* An exception like `TypeError` is a *class* of error: many different specific problems can raise it
* Best practices: 
    * Include an informative message
    * Rely on Python's existing errors
    * Don't go overboard
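
A minimal sketch of the `raise` statement that follows the practices above: it reuses the built-in `ValueError` and includes an informative message. The function `percent()` is hypothetical, chosen only for illustration, and the `try`/`except` at the end is there just so the example runs to completion.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    if whole == 0:
        # reuse a built-in exception and say what went wrong
        raise ValueError("whole must be non-zero, got " + repr(whole))
    return 100 * part / whole

print(percent(1, 4))    # 25.0

try:
    percent(1, 0)
except ValueError as err:
    print("ValueError:", err)   # ValueError: whole must be non-zero, got 0
```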


## Global Variables
---

* By default, a name assigned inside a function is a local variable
* Use a `global` declaration inside the function to assign to the module-level (global) variable instead
* Strong convention: use UPPERCASE variable names for global values
* Use globals sparingly!
  * Encapsulation is generally a good thing
  * Code with globals is harder to test in a modular way
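
A minimal sketch of the `global` declaration, assuming a hypothetical module-level counter named `VISITS` and two hypothetical functions that use it.

```python
# Hypothetical module-level counter; UPPERCASE marks it as a global.
VISITS = 0

def record_visit():
    # Without this declaration, `VISITS += 1` would raise UnboundLocalError,
    # because the assignment would make VISITS local to the function.
    global VISITS
    VISITS += 1

def report():
    # Reading a global needs no declaration.
    return "visits so far: " + str(VISITS)

record_visit()
record_visit()
print(report())   # visits so far: 2
```

Only functions that assign to the global need the declaration, which keeps the places where shared state changes easy to find.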


### Exercise 10-1

Create a function `shortener()` that generates short URLs using 'https://mydomain.com'. Create a second function `resolver()` that translates short URLs back to their long equivalents, and returns other URLs as-is. `resolver()` should raise a `ValueError` if given a non-existent short URL. 

Hint: you'll want some global variables. 

```
>>> shortener('https://faithlife.com/think-python/activity')
'https://mydomain.com/0'
>>> shortener('https://seanboisen.com')
'https://mydomain.com/1'
# only shorten it once
>>> shortener('https://faithlife.com/think-python/activity')
'https://mydomain.com/0'
# return the long version for short URLs
>>> resolver('https://mydomain.com/0')
'https://faithlife.com/think-python/activity'
# if not a short URL, just return it
>>> resolver('https://logos.com')
'https://logos.com'
# raise an error if the short URL is undefined
>>> resolver('https://mydomain.com/12345')
ValueError: Invalid short URL
```

[My solution](#Exercise-10-1-Solution)

## Homework
---

* Read Chapter 11 and do the exercises. 


## Additional Resources
---

* If you have your own hosting service, you can set up [YOURLS](http://yourls.org/) as your own URL shortener (alas, using PHP rather than Python)
* <img src="../meta/images/bd.png" style="display: inline;" />[Memoization](https://en.wikipedia.org/wiki/Memoization)
* <img src="../meta/images/bd.png" style="display: inline;" />The excellent [Natural Language Toolkit](http://www.nltk.org/) (NLTK) has a well-developed class for frequency distributions (histograms) called [FreqDist](http://nltk.googlecode.com/svn/trunk/doc/api/nltk.probability.FreqDist-class.html). Among other things, it's very useful for [text corpora](http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html).
* <img src="../meta/images/bd.png" style="display: inline;" /><img src="../meta/images/bd.png" style="display: inline;" />[Wikipedia: Hash table](http://en.wikipedia.org/wiki/Hash_table)



## Exercise 10-1 Solution
---

In [None]:
# URL Shortener
SHORTURLS = dict()
URLS = dict()
URLCOUNTER = 0

def shortener(url):
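    # only URLCOUNTER is rebound below, so only it strictly needs the declaration;
    # the dictionaries are listed to make the use of globals explicit
    global SHORTURLS, URLS, URLCOUNTER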
    if url not in URLS:
        shorturl = 'https://mydomain.com/' + str(URLCOUNTER)
        # map full URL to short version
        URLS[url] = shorturl
        # map short version to full URL for resolving
        SHORTURLS[shorturl] = url
        URLCOUNTER += 1
    return URLS[url]


def resolver(url):
    if url.startswith('https://mydomain.com/'):
        if url in SHORTURLS:
            return SHORTURLS[url]
        else:
            raise ValueError("Invalid short URL")
    else:
        return url
    

In [None]:
print(shortener('https://faithlife.com/think-python/activity'))
print(shortener('https://seanboisen.com'))
print(shortener('https://faithlife.com/think-python/activity'))
print(resolver('https://mydomain.com/0'))
print(resolver('https://logos.com'))
# this last call raises ValueError: Invalid short URL
print(resolver('https://mydomain.com/12345'))

In [None]:
URLS

In [None]:
SHORTURLS