# W12 lab assignment

In [23]:
import pandas as pd
from urllib.request import urlopen
import json

## Choropleth map

Let's make a choropleth map of Pokémon sightings. The color of each county should correspond to the number of Pokémon found there. You can download the data (`pokemon.csv`) from Canvas; it is a subset of the [Pokémon data from Kaggle](https://www.kaggle.com/semioniy/predictemall).

We'll also need an SVG map. You can download it from [Wikipedia](https://upload.wikimedia.org/wikipedia/commons/5/5f/USA_Counties_with_FIPS_and_names.svg).

If you open the SVG with a text editor, you'll see many `<path>` tags. Each of these is a county. We want to change their `style` attributes, namely the `fill` color, so that the darkness of `fill` corresponds to the number of Pokémon in each county.

In the SVG, each path also has an `id` attribute, which holds something called a FIPS code. FIPS stands for Federal Information Processing Standard. Every county has a unique FIPS code, and it's how we are going to associate each path with our Pokémon data.

For this we first need to do some data cleaning.

In [24]:
pokemon = pd.read_csv("pokemon.csv")
pokemon.head()

Unnamed: 0,pokemonId,latitude,longitude
0,16,20.525745,-97.460829
1,133,20.523695,-97.461167
2,16,38.90359,-77.19978
3,13,47.665903,-122.312561
4,133,47.666454,-122.311628


The data only has latitude and longitude. To convert these to a FIPS code, we need some reverse geocoding. The Federal Communications Commission provides [an API](https://www.fcc.gov/general/census-block-conversions-api) for such tasks.

The API works through an HTTP request, so we can use Python's `urllib` library to handle it. For example:

In [25]:
res = urlopen("http://data.fcc.gov/api/block/find?format=json&latitude=28.35975&longitude=-81.421988").read().decode('utf-8')
res

'{"messages":["FCC0001: The coordinate lies on the boundary of mulitple blocks, first FIPS is displayed. For a complete list use showall=true to display \'intersection\' element in the Block"],"Block":{"FIPS":"120950170151016"},"County":{"FIPS":"12095","name":"Orange"},"State":{"FIPS":"12","code":"FL","name":"Florida"},"status":"OK","executionTime":"188"}'
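
The query string in the request above can also be assembled from a dictionary rather than by hand. A minimal sketch using the standard library's `urllib.parse.urlencode`; the helper name `build_block_url` is hypothetical and not part of the assignment:

```python
from urllib.parse import urlencode

# Endpoint taken from the request shown above.
BASE_URL = "http://data.fcc.gov/api/block/find"

def build_block_url(latitude, longitude):
    """Hypothetical helper: build the block-lookup URL for one coordinate pair."""
    query = urlencode({"format": "json", "latitude": latitude, "longitude": longitude})
    return BASE_URL + "?" + query

url = build_block_url(28.35975, -81.421988)
print(url)
```

Building the URL this way keeps the parameter names in one place and escapes any characters that are not URL-safe.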

The result comes back as a JSON string, so we need to parse it with Python's `json` module.

In [26]:
json.loads(res)

{'Block': {'FIPS': '120950170151016'},
 'County': {'FIPS': '12095', 'name': 'Orange'},
 'State': {'FIPS': '12', 'code': 'FL', 'name': 'Florida'},
 'executionTime': '188',
 'messages': ["FCC0001: The coordinate lies on the boundary of mulitple blocks, first FIPS is displayed. For a complete list use showall=true to display 'intersection' element in the Block"],
 'status': 'OK'}

Now we can access it as a dictionary and get the county's FIPS code.

In [32]:
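def get_fips_val(pokemonRow):
    # Build the request URL from this row's coordinates.
    url = ("http://data.fcc.gov/api/block/find?format=json&latitude=" + str(pokemonRow['latitude'])
           + "&longitude=" + str(pokemonRow['longitude']))
    response = urlopen(url).read().decode('utf-8')
    # Parse the JSON response and pull out the county-level FIPS code.
    return json.loads(response)['County']['FIPS']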
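
Some coordinates in the data lie outside the US, so a lookup may come back without a county. The sketch below separates the parsing step from the network call so it can be tried offline. The helper name `parse_county_fips` and the two sample strings are illustrative assumptions, and the exact shape of a "no county" response is assumed rather than taken from the API documentation:

```python
import json

def parse_county_fips(response_text):
    """Hypothetical helper: return the county FIPS string, or None if it is absent."""
    data = json.loads(response_text)
    # Tolerate a missing or null "County" entry (assumed shapes, not documented ones).
    county = data.get("County") or {}
    return county.get("FIPS")

# Trimmed-down version of the response shown earlier.
inside = '{"County": {"FIPS": "12095", "name": "Orange"}, "status": "OK"}'
# Assumed shape of a response for a point with no county.
outside = '{"County": {"FIPS": null, "name": null}, "status": "OK"}'

print(parse_county_fips(inside))
print(parse_county_fips(outside))
```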

We can do this for every row in the dataframe. Pandas's [apply](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) is handy here: it lets you write a function and apply it to each row (with `axis=1`) or each column of the dataframe.

In [33]:
# TODO: create a column in the dataframe called 'FIPS' for the FIPS codes. 
# You should have the dataframe look like the following.
# Note that looking up all the lat-lon pairs may take some time.
pokemon['FIPS'] = pokemon.apply(get_fips_val, axis=1)
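
Because every row triggers a separate request, repeated coordinates cost extra time. One possible way to cut that down is to memoize the lookup with `functools.lru_cache`. This is only a sketch: `slow_lookup` is a made-up stand-in for the real request and returns fake values, so the example runs without network access.

```python
from functools import lru_cache

calls = []  # records every simulated request

def slow_lookup(lat, lon):
    """Hypothetical stand-in for the FCC request; returns a fake FIPS string."""
    calls.append((lat, lon))
    return "53033" if lat > 47 else "51059"

@lru_cache(maxsize=None)
def cached_lookup(lat, lon):
    # Identical (lat, lon) pairs are answered from the cache after the first call.
    return slow_lookup(lat, lon)

points = [(47.665903, -122.312561), (38.90359, -77.19978), (47.665903, -122.312561)]
fips = [cached_lookup(lat, lon) for lat, lon in points]

print(fips)
print(len(calls), "simulated requests for", len(points), "points")
```

The cache only helps when coordinates repeat exactly; nearby but different points still trigger separate lookups.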

In [34]:
pokemon.head()

Unnamed: 0,pokemonId,latitude,longitude,FIPS
0,16,20.525745,-97.460829,
1,133,20.523695,-97.461167,
2,16,38.90359,-77.19978,51059.0
3,13,47.665903,-122.312561,53033.0
4,133,47.666454,-122.311628,53033.0


We want to color the counties by the number of Pokémon appearing in them, so all we need now is a table of county FIPS codes and the number of Pokémon in each.

In [35]:
# Count rows per FIPS code and name the resulting column 'Count'.
pokemon_density = pokemon.groupby('FIPS').size().reset_index(name='Count')

In [36]:
pokemon_density.head()

Unnamed: 0,FIPS,Count
0,4013,21
1,6037,33
2,6047,4
3,6073,22
4,6075,12
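
Rows that received no FIPS code drop out at this step, because `groupby` excludes missing keys by default. A toy sketch with made-up rows (not the lab data) shows the effect:

```python
import pandas as pd

# Made-up rows: two sightings without a FIPS code, three with one.
toy = pd.DataFrame({
    "pokemonId": [16, 133, 16, 13, 133],
    "FIPS": [None, None, "51059", "53033", "53033"],
})

# Same counting pattern as above: one row per FIPS code with its size.
toy_density = toy.groupby("FIPS").size().reset_index(name="Count")
print(toy_density)
```

Only the two counties with a code appear in the result; the rows with a missing FIPS are silently left out rather than grouped together.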


Now we can turn to our SVG file. We want to find the path for each county, and with over 3000 counties we need a programmatic way to do it. For this, we can use the `BeautifulSoup` package, which specializes in parsing HTML and XML. SVGs are essentially XML files, so they can be handled the same way.

In [37]:
from bs4 import BeautifulSoup

Read in the svg

In [39]:
svg = open('USA_Counties_with_FIPS_and_names.svg', 'r').read()

Load it with BeautifulSoup

In [40]:
soup = BeautifulSoup(svg, "lxml")


BeautifulSoup has a `findAll()` method that returns every occurrence of a given tag.

In [41]:
paths = soup.findAll('path')

In [42]:
paths[0]

<path d="M 62.678745,259.31235 L 63.560745,258.43135 L 64.220745,257.99135 L 64.439745,258.43135 L 64.000745,258.65135 L 64.439745,258.65135 L 66.643745,257.99135 L 68.626745,255.56635 L 70.388745,256.44835 L 70.388745,256.89035 L 69.727745,257.54935 L 69.727745,258.21235 L 70.388745,257.99135 L 70.829745,256.89035 L 71.269745,256.44835 L 71.930745,257.10835 L 72.150745,257.99135 L 72.811745,258.21235 L 73.030745,257.77135 L 74.131745,257.54935 L 75.894745,257.54935 L 76.113745,257.77135 L 75.673745,258.43135 L 75.673745,258.65135 L 76.996745,258.87235 L 76.774745,259.53235 L 77.656745,259.53235 L 78.757745,258.87235 L 81.180745,258.65135 L 82.722745,259.09235 L 83.386745,259.09235 L 84.044745,259.31235 L 84.267745,259.53235 L 85.148745,259.53235 L 86.249745,259.31235 L 87.572745,259.31235 L 89.114745,259.75435 L 89.554745,259.53235 L 90.436745,258.87235 L 90.655745,258.65135 L 91.096745,258.21235 L 92.639745,258.43135 L 96.163745,259.53235 L 97.264745,263.05835 L 97.925745,265.26135 L
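
To see what each path object offers without scrolling through thousands of coordinates, here is a small sketch on a made-up three-path SVG. The ids and shapes are invented for illustration, and the built-in `html.parser` is used so that the example does not depend on `lxml`:

```python
from bs4 import BeautifulSoup

# Made-up miniature SVG: two "county" paths and one non-county path.
toy_svg = '''<svg xmlns="http://www.w3.org/2000/svg">
  <path id="12095" style="fill:#d0d0d0" d="M 0,0 L 10,0 L 10,10 Z"/>
  <path id="53033" style="fill:#d0d0d0" d="M 20,0 L 30,0 L 30,10 Z"/>
  <path id="State_Lines" style="fill:none" d="M 0,0 L 30,10"/>
</svg>'''

toy_soup = BeautifulSoup(toy_svg, "html.parser")
toy_paths = toy_soup.find_all("path")

# Attributes are read and written like dictionary entries.
ids = [p["id"] for p in toy_paths]
print(ids)

for p in toy_paths:
    if p["id"] == "12095":
        p["style"] = "fill:#b30000"  # recolor a single path

print(toy_paths[0]["style"])
```

Note that not every path has to be a county, so any code that looks up ids in the density table should tolerate ids that are not FIPS codes.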

We should also decide on the colors. [ColorBrewer](http://colorbrewer2.org/#type=sequential&scheme=YlOrRd&n=3) provides some nice palettes. Pick one of the sequential palettes and put its hexadecimal color codes into a list.

In [43]:
colors = ['#fef0d9', '#fdd49e', '#fdbb84','#fc8d59','#e34a33','#b30000']

In [44]:
# TODO: substitute the above with a palette of your choice.
colors = ['#edf8fb', '#ccece6', '#99d8c9','#66c2a4','#2ca25f','#006d2c']

Now we’re going to change the style attribute for each path in the SVG. We’re just interested in fill color, but to make things easier we’re going to replace the entire style instead of parsing to replace only the color. Define the style as the following:

In [45]:
path_style = 'font-size:12px;fill-rule:nonzero;stroke:#000000;stroke-opacity:1;\
stroke-width:0.1;stroke-miterlimit:4;stroke-dasharray:none;stroke-linecap:butt;\
marker-start:none;stroke-linejoin:bevel;fill:'

Based on the number of Pokémon, we want to assign each county to a color class. For example, if number > 50, use color 1; if 40 < number <= 50, use color 2; and so on.

In [48]:
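# Map zero-padded FIPS strings (the format of the SVG path ids) to counts.
# int(float(...)) copes with FIPS stored as '04013', 4013, or 4013.0.
counts = {str(int(float(f))).zfill(5): c
          for f, c in zip(pokemon_density['FIPS'], pokemon_density['Count'])}

for p in paths:
    cnt = counts.get(p.get('id'))
    if cnt is None:
        continue  # not a county path, or no pokemon recorded in this county
    # TODO: decide color classes
    if cnt > 50:
        color_class = 4
    elif cnt > 40:
        color_class = 3
    elif cnt > 30:
        color_class = 2
    elif cnt > 20:
        color_class = 1
    else:
        color_class = 0
    # path_style already ends with 'fill:', so append only the color.
    p['style'] = path_style + colors[color_class]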
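
The threshold chain above can also be expressed as a sorted list of cut points and a binary search, which makes it easier to experiment with different class boundaries. A sketch using the standard library's `bisect_left`; the helper name `color_class_for` is hypothetical, and the cut points simply mirror the ones in the loop:

```python
from bisect import bisect_left

# Upper bounds (inclusive) of classes 0..3; counts above 50 fall into class 4.
thresholds = [20, 30, 40, 50]
palette = ['#edf8fb', '#ccece6', '#99d8c9', '#66c2a4', '#2ca25f', '#006d2c']

def color_class_for(cnt):
    """Hypothetical helper: index of the first threshold that is >= cnt."""
    return bisect_left(thresholds, cnt)

for cnt in (4, 21, 33, 50, 51):
    cls = color_class_for(cnt)
    print(cnt, cls, palette[cls])
```

With four cut points only five of the six palette entries are used; adding one more threshold would bring the darkest color into play.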

Remember that we saved the svg in the `soup` object. Now that we have changed the svg to fill with colors, we can just write it out as a new file.

In [49]:
with open('svg_colored.svg', 'w') as g:
    g.write(soup.prettify())

Open the new SVG in your browser. You'll notice that only a few counties are colored: this is partly because we're only using a subset of the original data. The complete data has 296021 rows, and looking up the FIPS codes for all of them would take too much time in class. If you're interested, you can download the full data and make a complete map.