# Review, Installation, and Hashables

This notebook is for the Intro to Python for GIS workshop 1 at Arizona State, May 4-8.

## Review

The python interpreter is a *command line interface* to your computer. Python is a *programming language* that you can use to instruct your computer to do things. 

In [1]:
8 + 9

17

When the interpreter doesn't understand your directions, it will tell you:

In [2]:
8 + apple

NameError: name 'apple' is not defined

Following these errors is critical. Your workflows as GIS professionals usually will involve tons of:

1. interactive interpreter sessions
2. troubleshooting

It takes a long time using and understanding python before you can write multiple lines of code that do exactly what you want without generating any errors along the way. So don't look at tracebacks as failures; look at them as pointers to the ambiguities you need to fix.
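One way to get comfortable with tracebacks is to trigger one on purpose and read its last line, which names the error type and the message. The sketch below is only an illustration, not part of the workshop code: it is written in Python 3 syntax, and the helper name `describe_error` is made up for this example.

```python
import traceback

def describe_error(expression):
    """Evaluate a string of code and report the error it raises, if any."""
    try:
        eval(expression)
    except Exception as err:
        # The last line of a traceback holds the error type and its message.
        last_line = traceback.format_exc().strip().splitlines()[-1]
        return type(err).__name__, last_line
    return None, 'no error'

kind, message = describe_error('8 + apple')
print(kind)      # the type of error that was raised
print(message)   # the final line of the traceback
```

Reading that final line first, then working upward to find the line of your own code that triggered it, is usually the quickest route to a fix.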

### Problem: Following tracebacks

This is some real code to read in some comma separated value data and look at its header. Why is it wrong, and can you fix it?

In [11]:
import pysal as ps
fp = ps.examples.get_path('stl_hom.csv')
data = ps.open(fp)
data.head

AttributeError: 'csvWrapper' object has no attribute 'head'

This code is a classic programming exercise. It should print 'fizz' when a number is divisible by 3, 'buzz' when a number is divisible by 5, and 'fizzbuzz' when the number is divisible by both 3 and 5, for every number in some input range. The output under the cell shows what a corrected version prints.

1. Fix the code.
2. Can the code be modified so that it doesn't print 'fizzbuzz' when i = 0?

In [13]:
for i in 100:
    if i % 15 = 0:
        print 'fizzbuzz', i
    elif i % 5 = 0:
        print 'buzz', i
    elif i % 3 = 0:
        print 'fizz', i

fizzbuzz 0
fizz 3
buzz 5
fizz 6
fizz 9
buzz 10
fizz 12
fizzbuzz 15
fizz 18
buzz 20
fizz 21
fizz 24
buzz 25
fizz 27
fizzbuzz 30
fizz 33
buzz 35
fizz 36
fizz 39
buzz 40
fizz 42
fizzbuzz 45
fizz 48
buzz 50
fizz 51
fizz 54
buzz 55
fizz 57
fizzbuzz 60
fizz 63
buzz 65
fizz 66
fizz 69
buzz 70
fizz 72
fizzbuzz 75
fizz 78
buzz 80
fizz 81
fizz 84
buzz 85
fizz 87
fizzbuzz 90
fizz 93
buzz 95
fizz 96
fizz 99


## Installation

Everything we do here will use [IPython](https://ipython.org). In the future, IPython may be called "jupyter," because the project is broadening itself out from just python language interfaces, so keep that in mind. 

If you have a "modern" python installation (2.7.9+), you should have a program called `pip` installed as well. If you do (and you're on OS X or Linux), installing new python packages is a breeze:

In [None]:
!pip install <PACKAGE>

If you don't have pip, you'll need to install packages manually. Alternatively, you can use one of the major free python distributions for scientific work, like Anaconda from Continuum.IO or Canopy from Enthought, and use their built-in package managers to install new software:

In [None]:
!conda install <PACKAGE>

We'll be using numpy, shapely, and pysal, at minimum. Enterprising individuals are also encouraged to look at fiona and geopandas.
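To see which of these packages are already installed, a small check like the one below can help. It is a sketch that assumes Python 3 (it relies on `importlib.util.find_spec`), and the function name `check_packages` is invented for this example.

```python
import importlib.util

def check_packages(names):
    """Return {package name: True/False} for whether each one can be found."""
    found = {}
    for name in names:
        # find_spec returns None when no top-level module by that name exists.
        found[name] = importlib.util.find_spec(name) is not None
    return found

status = check_packages(['numpy', 'shapely', 'pysal', 'fiona', 'geopandas'])
for name in sorted(status):
    print(name, 'ok' if status[name] else 'MISSING')
```

Anything reported as missing can then be installed with `pip` or `conda` as described above.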

## Hashables

Many times when we program, we're interested in using some kind of hierarchical data structure, where we can access some particular piece of data using its "name." 

In [16]:
zach = 'brother'

But, sometimes, we want to store a bunch of information about that person. We've taught you about tuples and lists, so you could imagine that a few lists or tuples could do this work for you, if you keep the indices together:

In [27]:
ages = [25, 23, 19]
names = ('Zachary', 'Levi', 'Atticus')
cities = ['Tucson', 'Phoenix', 'Flagstaff']

In [28]:
for i in range(3):
    print names[i], ':', ages[i] ,',', cities[i]

Zachary : 25 , Tucson
Levi : 23 , Phoenix
Atticus : 19 , Flagstaff


But this is very cumbersome and, if any one of the lists gets modified, the entries no longer line up!

In [29]:
ages.sort()

In [30]:
for i in range(3):
    print names[i], ':', ages[i] ,',', cities[i]

Zachary : 19 , Tucson
Levi : 23 , Phoenix
Atticus : 25 , Flagstaff
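One way to stop the lists from drifting apart, while still using only lists and tuples, is to bundle each person's values into a single tuple before sorting. This is a minimal sketch in Python 3 syntax (`print` is a function here); the name `records` is introduced for the example.

```python
ages = [25, 23, 19]
names = ('Zachary', 'Levi', 'Atticus')
cities = ['Tucson', 'Phoenix', 'Flagstaff']

# Bundle each person's values into one tuple so they always move together.
records = list(zip(ages, names, cities))

# Tuples sort by their first element, so this orders the rows by age
# without separating a name from its age or city.
records.sort()

for age, name, city in records:
    print(name, ':', age, ',', city)
```

Each row stays intact after the sort, but you still have to remember which position holds which value.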


You're probably used to thinking of data in *tabular* format, where each of the $n$ entries has $m$ columns. In python, these are usually implemented as *numpy arrays* or structures derived from them, and we'll talk about those later. For now, let's look at the information about my brothers listed above.

Python uses the "dictionary" class to keep a "key" and a "value" together:

In [31]:
ages = {'Zachary':25, 'Levi': 23, 'Atticus':19}

In [32]:
ages

{'Atticus': 19, 'Levi': 23, 'Zachary': 25}
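A dictionary finds a value by hashing its key, so a key has to be an object that supports hashing. Strings, numbers, and tuples of those qualify; lists do not. The sketch below illustrates this in Python 3 syntax. The coordinate pairs and the names `city_at` and `list_key_failed` are made up for the example.

```python
ages = {'Zachary': 25, 'Levi': 23, 'Atticus': 19}

# Look a value up by its key.
print(ages['Levi'])

# Tuples are hashable, so a coordinate pair can serve as a key.
city_at = {(32.22, -110.97): 'Tucson'}   # illustrative lat/lon values
print(city_at[(32.22, -110.97)])

# Lists are mutable and therefore unhashable: using one as a key fails.
list_key_failed = False
try:
    city_at[[33.45, -112.07]] = 'Phoenix'
except TypeError as err:
    list_key_failed = True
    print('TypeError:', err)
```

Tuple keys are handy in GIS work, where a record is often identified by a pair of coordinates or a pair of IDs.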

You can nest dictionaries within dictionaries:

In [33]:
people = {'Zachary':{}, 'Levi':{}, 'Atticus':{}}

In [34]:
people['Zachary'].update({'Age':25})
people['Levi'].update({'Age':23})
people['Atticus'].update({'Age':19})

In [35]:
people

{'Atticus': {'Age': 19}, 'Levi': {'Age': 23}, 'Zachary': {'Age': 25}}

And this nested approach allows us to store entire tables:

In [36]:
people['Zachary'].update({'City':'Tucson'})
people['Levi'].update({'City':'Phoenix'})
people['Atticus'].update({'City':'Flagstaff'})
people['Zachary'].update({'Nickname':'Zach'})
people['Atticus'].update({'Nickname':'Atti'})

In addition, we can refer very concisely to the person using indexing:

In [38]:
people['Atticus']['Nickname']

'Atti'
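Note that this table is ragged: Levi was never given a nickname, so `people['Levi']['Nickname']` would raise a `KeyError`. A hedged sketch of one way to handle that, using the dictionary method `get` with a fallback value (Python 3 syntax; the `'(none)'` placeholder is an arbitrary choice for this example):

```python
people = {
    'Zachary': {'Age': 25, 'City': 'Tucson', 'Nickname': 'Zach'},
    'Levi': {'Age': 23, 'City': 'Phoenix'},
    'Atticus': {'Age': 19, 'City': 'Flagstaff', 'Nickname': 'Atti'},
}

# get returns the fallback instead of raising when the key is missing.
for name in sorted(people):
    nickname = people[name].get('Nickname', '(none)')
    print(name, nickname)
```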

Remember, to see the methods of a class, use the `dir` function. 

In [73]:
dir(people)

['__class__',
 '__cmp__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'has_key',
 'items',
 'iteritems',
 'iterkeys',
 'itervalues',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values',
 'viewitems',
 'viewkeys',
 'viewvalues']

Iterating over dictionaries usually uses one of the methods with `iter` in front (`iteritems`, `iterkeys`, or `itervalues`).

In [79]:
for person,attrs in people.iteritems():
    if people[person]['Age'] < 24:
        print person, attrs
    else:
        print 'Too old!'

Levi {'City': 'Phoenix', 'Age': 23}
Too old!
Atticus {'City': 'Flagstaff', 'Age': 19, 'Nickname': 'Atti'}
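The `iter`-prefixed methods exist only in Python 2. As an aside for anyone running Python 3, the plain `items` method does the same job there (it also works in Python 2, where it builds a full list first). The sketch below is written for Python 3 and collects names into a list called `young`, a name chosen for this example.

```python
people = {
    'Zachary': {'Age': 25, 'City': 'Tucson', 'Nickname': 'Zach'},
    'Levi': {'Age': 23, 'City': 'Phoenix'},
    'Atticus': {'Age': 19, 'City': 'Flagstaff', 'Nickname': 'Atti'},
}

young = []
for person, attrs in people.items():
    # attrs is already the inner dictionary, so no second lookup is needed.
    if attrs['Age'] < 24:
        young.append(person)

print(sorted(young))
```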


### Problem: Reading Documentation

Given the `people` dictionary constructed above and the python documentation [here](https://docs.python.org/2/library/stdtypes.html#mapping-types-dict), can you:

1. drop Zachary from the dictionary?
2. add yourself to the dictionary?
3. give Levi a nickname?
4. check whether or not Zachary is still in the dictionary?
5. change Levi's nickname to 'TA'?

### Problem: Building Dictionaries from Data

Given the code we fixed above, build a dictionary from the `calempdensity` table in `pysal`. Use a unique key for each entry, and add the items using iteration, not manually!

### Answer:

In [None]:
!cat 00_ex4.py

### Ex: Dictionaries are flexible.

In python, you can define functions on the fly using a `lambda` expression:

In [85]:
q = lambda x: x**2 + x

In [86]:
q(5)

30
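For comparison, the same function can be written with an ordinary `def` statement. This short sketch (Python 3 syntax; `q_named` is a name invented here) shows that the two forms behave identically.

```python
q = lambda x: x**2 + x

def q_named(x):
    """Same computation as q, written as an ordinary named function."""
    return x**2 + x

print(q(5), q_named(5))
```

A `lambda` is limited to a single expression, so anything longer is better written with `def`.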

Dictionaries can be packed with whatever you need them to be packed with:

In [121]:
d = {'show' : 'x**2 + x', 'tell' : lambda x: eval('x**2 + x', {'x':x})}

In [175]:
d

{'show': 'x**2 + x', 'tell': <function __main__.<lambda>>}

In [122]:
d['show']

'x**2 + x'

In [123]:
d['tell'](6)

42
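Because values can be functions, a dictionary can also act as a lookup table of operations. The sketch below uses plain functions instead of `eval`, which is a different technique from the cell above. The names `operations` and `apply_operation` are hypothetical, and the code is written in Python 3 syntax.

```python
import math

# Map a short name to the function that should run for it.
operations = {
    'square_plus': lambda x: x**2 + x,
    'root': math.sqrt,
    'double': lambda x: 2 * x,
}

def apply_operation(name, value):
    """Look up an operation by name and call it on value."""
    return operations[name](value)

print(apply_operation('square_plus', 6))
print(apply_operation('root', 49))
```

Adding a new operation then means adding one entry to the dictionary, with no changes to the code that calls it.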

The other important hash-based data structure in python is the set. It behaves much like a set in mathematics: it holds each element at most once, in no particular order.

In [134]:
q = set()

In [135]:
q.update([1,2,3,4,5])

In [136]:
p = set([5,6,7,8,9,10])

In [137]:
q.union(p)

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

In [140]:
q.intersection(p)

{5}

In [141]:
q.isdisjoint(p)

False

In [146]:
q.difference(p)

{1, 2, 3, 4}

In [147]:
p.difference(q)

{6, 7, 8, 9, 10}

In [149]:
q.symmetric_difference(p)

{1, 2, 3, 4, 6, 7, 8, 9, 10}

In [150]:
q.discard(5)

In [151]:
q

{1, 2, 3, 4}
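Most of these set methods also have operator forms. The sketch below (Python 3 syntax) rebuilds `q` and `p` as they were before the `discard` call and checks that each operator agrees with the corresponding method.

```python
q = set([1, 2, 3, 4, 5])
p = set([5, 6, 7, 8, 9, 10])

print((q | p) == q.union(p))                  # | is union
print((q & p) == q.intersection(p))           # & is intersection
print((q - p) == q.difference(p))             # - is difference
print((q ^ p) == q.symmetric_difference(p))   # ^ is symmetric difference

# Membership tests use hashing rather than scanning every element.
print(5 in q, 11 in q)
```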

### Problem: Making Things Unique

There are a few ways to make a list unique in python. But, python is a language where doing things the "right" way is usually also fastest, smallest, and simplest. Consider making a list of strings unique:

In [153]:
names = ['rob', 'tina', 'joe', 'sue', 'rob', 'joe']

1. Make this list of strings unique using only lists and string comparisons. 
2. Make this list of strings unique using sets.

Now, consider the list of all US counties:

In [168]:
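counties = []
# the with block closes the file once reading is done
with open('counties.txt') as f_counties:
    for x in f_counties:
        # drop the trailing newline, then split the line into its fields
        counties.append(x.strip('\n').split(','))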

1. How many unique county names are there?
2. What are the 5 most common county names? *hint: there are many ways to do this. try a dictionary.*
3. You can time functions in python:

        import time
        start = time.time()
        function(data)
        stop = time.time()
        stop - start

   How fast was your uniquifier? Can you make it faster?
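For very fast functions a single `time.time()` measurement can be noisy. The standard library's `timeit` module repeats a call many times, which tends to give steadier numbers. A minimal sketch in Python 3 syntax follows; `count_letters` is a stand-in workload invented for this example, not a solution to the problem.

```python
import timeit

def count_letters(words):
    """Stand-in workload: total number of characters across all words."""
    return sum(len(word) for word in words)

data = ['rob', 'tina', 'joe', 'sue', 'rob', 'joe'] * 1000

# Run the call 100 times and report the total elapsed time in seconds.
elapsed = timeit.timeit(lambda: count_letters(data), number=100)
print('total seconds for 100 runs:', elapsed)
print('seconds per run:', elapsed / 100)
```

Swap your own uniquifier in place of `count_letters` to compare the list-based and set-based versions on the same data.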

In [None]:
!cat 00_ex6.py