# Visualization Iterations
### Casey Alvarado and Jay Woo

In this notebook, we tried a number of different visualization methods to see which, if any, would work best for our storytelling purposes.

In [8]:
import pandas 
import matplotlib.pyplot as plt
import seaborn as sns
import vincent
import numpy as np
import operator
%matplotlib inline

In [2]:
df = pandas.read_csv('cleanedData.csv')

In [3]:
df.head()

| Unnamed: 0.1 | Unnamed: 0 | Last | First | Ord | T | Status | D/O | Notes | Diocese | Source/Assignments | Ord_Mod |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Abaya | Rubin | 0 | P | accused | Diocesan | One of seven priests named as defendants in ... | Los Angeles, CA | Source:United Press International 02.08.84; U... | = |
| 1 | 1 | Abdon | Andrew | 0 | B | settled | Brothers of the Christian Schools | In separate 1995 lawsuits, 2 brothers and anot... | Santa Fe, NM | Source:Obituary and Assignments 08.17.77;Orti... | = |
| 2 | 2 | Abercrombie | Leonard A. | 1946 | P | accused | Diocesan | Letter 7/93 to Pope JP II, Stafford, and Mahon... | Denver, CO | Source:LA Archdiocesan Report 2.17.04 page 3;M... | = |
| 3 | 3 | Abeywickrema | Lionel Augustine | 1951 | P | accused | Diocesan | Abeywickrema, originally from Sri Lanka, was a... | Santa Fe, NM | Source:Roswell Daily Record 7.1.93;The New Mex... | = |
| 4 | 4 | Abrams | John L. | 1950 | P | accused | Diocesan | Some time after 2002 Abrams' "victims" report... | Brooklyn, NY | Source:Statement by Diocese of Brooklyn 11.08... | < |
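The `Unnamed: 0` and `Unnamed: 0.1` columns above look like leftover row indices. By default `DataFrame.to_csv` writes the index as an unnamed first column, and `read_csv` names such a column `Unnamed: 0`. Assuming that is how the extra columns got into `cleanedData.csv`, this minimal sketch (with a tiny made-up frame, not the real file) shows the effect and two ways to avoid it.

```python
import io

import pandas as pd

original = pd.DataFrame({"Last": ["Abaya", "Abdon"], "First": ["Rubin", "Andrew"]})

# Default to_csv(index=True) writes the index under an empty header...
buf = io.StringIO()
original.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf)
print(list(with_index.columns))  # ['Unnamed: 0', 'Last', 'First']

# Option 1: don't write the index at all
buf = io.StringIO()
original.to_csv(buf, index=False)
buf.seek(0)
clean = pd.read_csv(buf)

# Option 2: read the first column back in as the index
buf = io.StringIO()
original.to_csv(buf)
buf.seek(0)
reindexed = pd.read_csv(buf, index_col=0)
print(list(clean.columns), list(reindexed.columns))
```

Each further save/load round trip with the default settings adds another unnamed column, which is one plausible source of `Unnamed: 0.1`.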


We counted the number of sexual assault cases in each state and stored the counts in a dictionary keyed by full state name.

In [21]:
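from state_dict import state_dict  # local module: maps postal abbreviations (e.g. 'CA') to full state names

states = {}

for diocese in df.Diocese.dropna():
    # Diocese entries look like "Los Angeles, CA"; the state abbreviation follows ", "
    parts = diocese.split(", ")
    if len(parts) < 2 or parts[1] not in state_dict:
        continue  # skip entries without a recognizable US state
    state = state_dict[parts[1]]
    states[state] = states.get(state, 0) + 1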

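| Unnamed: 0 | State | Count |
|---|---|---|
| 0 | Mississippi | 8 |
| 1 | Oklahoma | 8 |
| 2 | Delaware | 38 |
| 3 | Minnesota | 187 |
| 4 | Alaska | 54 |
| 5 | Illinois | 238 |
| 6 | Arkansas | 2 |
| 7 | New Mexico | 71 |
| 8 | Indiana | 47 |
| 9 | Maryland | 69 |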
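The same per-state tally can also be written with pandas string methods instead of an explicit loop. This is a minimal sketch: the `state_dict` below is a small hypothetical stand-in for the project's `state_dict` module (assumed to map postal abbreviations to full state names), and the `Diocese` values are a few taken from the `df.head()` output plus two made-up entries without a US state.

```python
import pandas as pd

# Hypothetical stand-in for the project's state_dict module
state_dict = {"CA": "California", "NM": "New Mexico", "CO": "Colorado", "NY": "New York"}

df = pd.DataFrame({"Diocese": [
    "Los Angeles, CA", "Santa Fe, NM", "Denver, CO", "Santa Fe, NM",
    "Brooklyn, NY",
    "Rome",  # made-up entry with no ", <state>" suffix
    None,    # made-up missing value
]})

# The text after ", " is the state abbreviation; rows without it become NaN
abbrevs = df["Diocese"].str.split(", ").str[1]
# Abbreviations missing from state_dict also become NaN
full_names = abbrevs.map(state_dict)
# value_counts() drops NaN by default, mirroring the loop's `continue`
states = {name: int(n) for name, n in full_names.value_counts().items()}
print(sorted(states.items()))
# [('California', 1), ('Colorado', 1), ('New Mexico', 2), ('New York', 1)]
```

The vectorized version also handles missing `Diocese` values without a separate `dropna()` step.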


We also loaded state-level demographic data (population and percentage of Catholics) so that we could normalize the case counts by state population.

In [27]:
# State population and "religious fervor" (percent Catholic) data
relig_df = pandas.read_csv('state_data.csv')

# Add each state's case count and normalize it by state population
# (states with no entry in `states` get NaN)
relig_df['Count'] = relig_df['State'].map(states)
relig_df['NormalizedCount'] = relig_df['Count'] / relig_df['Population']
relig_df.head()

| Unnamed: 0 | State | Population | Percent_Catholic | Count | NormalizedCount |
|---|---|---|---|---|---|
| 0 | Alabama | 4779736 | 0.07 | 14 | 2.929032e-06 |
| 1 | Alaska | 710231 | 0.16 | 54 | 7.60316e-05 |
| 2 | Arizona | 6392017 | 0.21 | 61 | 9.543154e-06 |
| 3 | Arkansas | 2915918 | 0.08 | 2 | 6.858903e-07 |
| 4 | California | 37253956 | 0.28 | 503 | 1.350192e-05 |
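Per-resident rates such as `2.929032e-06` are hard to read at a glance. A common alternative, not used in this notebook, is to report cases per 100,000 residents. It is the same quantity multiplied by a constant, so a map would look identical apart from its legend. This sketch uses only the five rows shown above, and `PerHundredThousand` is a hypothetical column name.

```python
import pandas as pd

# The five rows of relig_df shown above
relig_df = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
    "Population": [4779736, 710231, 6392017, 2915918, 37253956],
    "Count": [14, 54, 61, 2, 503],
})

# Cases per resident, as in the notebook cell above
relig_df["NormalizedCount"] = relig_df["Count"] / relig_df["Population"]
# Same rate expressed per 100,000 residents (hypothetical column name)
relig_df["PerHundredThousand"] = relig_df["NormalizedCount"] * 100_000
print(relig_df[["State", "PerHundredThousand"]].round(2))
```

On this scale Alaska comes out at about 7.6 cases per 100,000 residents and California at about 1.35, and each `1e-5` step in the map's color breakpoints corresponds to one case per 100,000 residents.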


# Iteration #1: Using Vincent

Using Vincent, we whipped up a couple of map visualizations to get a feel for what the data 'looks' like.

Below, we show the number of sexual assault cases in each state, normalized by state population.

In [34]:
vincent.core.initialize_notebook()

# TopoJSON file with US state boundaries; rows are matched to shapes via properties.NAME
state_topo = r'us_states.topo.json'
geo_data = [{'name': 'states',
             'url': state_topo,
             'feature': 'us_states.geo'}]

vis = vincent.Map(data=relig_df, geo_data=geo_data, scale=1000, projection='albersUsa',
                  data_bind='NormalizedCount', data_key='State',
                  map_key={'states': 'properties.NAME'})

# Bin the colors with fixed breakpoints instead of the default scale
vis.scales['color'].type = 'threshold'
vis.scales['color'].domain = [0, 1e-5, 2e-5, 3e-5, 4e-5, 5e-5, 6e-5, 7e-5, 8e-5, 9e-5, 1e-4]

# Save the spec after changing the scale so the JSON matches what is displayed
vis.to_json('vega.json')
vis.display()
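To reason about which color each state receives, it helps to know how a threshold scale bins values. This sketch assumes d3/Vega-style threshold semantics: n breakpoints define n + 1 bins, and a value equal to a breakpoint falls into the bin above it. `threshold_bin` is a hypothetical helper for illustration, not part of Vincent.

```python
from bisect import bisect_right

# Breakpoints used for the NormalizedCount map above
domain = [0, 1e-5, 2e-5, 3e-5, 4e-5, 5e-5, 6e-5, 7e-5, 8e-5, 9e-5, 1e-4]


def threshold_bin(value, breakpoints):
    """Hypothetical helper: index of the bin that `value` falls into.

    Assumes d3/Vega-style threshold semantics: n breakpoints give n + 1 bins
    (numbered 0..n), and a value equal to a breakpoint goes into the bin above it.
    """
    return bisect_right(breakpoints, value)


# Normalized counts taken from the relig_df output above
examples = {"Alabama": 2.929032e-06, "Alaska": 7.60316e-05, "California": 1.350192e-05}
for state, value in examples.items():
    print(state, threshold_bin(value, domain))
# Alabama 1
# Alaska 8
# California 2

print("bins needed:", len(domain) + 1)  # bins needed: 12
```

Under this assumption the eleven breakpoints produce twelve bins, so the color range needs twelve entries, bin 0 only ever holds negative values, and everything at or above `1e-4` shares the last color.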

We compared this map with the percentage of Catholics living in each state and saw similar patterns in some regions, notably the Northeast and the West Coast.

In [32]:
# Same TopoJSON setup as above, now colored by the share of Catholics
state_topo = r'us_states.topo.json'
geo_data = [{'name': 'states',
             'url': state_topo,
             'feature': 'us_states.geo'}]

vis = vincent.Map(data=relig_df, geo_data=geo_data, scale=1000, projection='albersUsa',
                  data_bind='Percent_Catholic', data_key='State',
                  map_key={'states': 'properties.NAME'})

# Bin the colors in 5-percentage-point steps
vis.scales['color'].type = 'threshold'
vis.scales['color'].domain = [0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5]

# Save the spec after changing the scale so the JSON matches what is displayed
vis.to_json('vega.json')
vis.display()
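The visual similarity between the two maps could also be summarized by a single number, such as Spearman's rank correlation between `Percent_Catholic` and `NormalizedCount`. This sketch computes it as the Pearson correlation of the ranks, which keeps it to plain pandas. It uses only the five rows shown earlier, so the value is purely illustrative and says nothing about the full data set.

```python
import pandas as pd

# The five rows of relig_df shown earlier
relig_df = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
    "Percent_Catholic": [0.07, 0.16, 0.21, 0.08, 0.28],
    "NormalizedCount": [2.929032e-06, 7.60316e-05, 9.543154e-06,
                        6.858903e-07, 1.350192e-05],
})

# Spearman's rho = Pearson correlation of the ranks
# (Series.corr skips pairs where either value is NaN)
rho = relig_df["Percent_Catholic"].rank().corr(relig_df["NormalizedCount"].rank())
print(round(rho, 2))  # 0.6
```

With the real `relig_df` the same expression applies unchanged; states whose `Count` is missing keep a NaN rank and are left out of the correlation.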

# Iteration #2: Using Bokeh

We then experimented with interactive maps, at which point we realized that map visualizations did not need to be our main focus.

In [1]:
# Sample code taken from the Bokeh documentation (Texas unemployment example).
# The sample data must be downloaded once before running:
#     import bokeh.sampledata; bokeh.sampledata.download()

from bokeh.models import HoverTool
from bokeh.plotting import figure, show, output_file, ColumnDataSource
from bokeh.sampledata.us_counties import data as counties
from bokeh.sampledata.unemployment import data as unemployment

# Keep only Texas counties
counties = {
    code: county for code, county in counties.items() if county["state"] == "tx"
}

county_xs = [county["lons"] for county in counties.values()]
county_ys = [county["lats"] for county in counties.values()]

colors = ["#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77", "#980043"]

county_names = [county['name'] for county in counties.values()]
county_rates = [unemployment[county_id] for county_id in counties]
# One color per 3 percentage points; clamp so rates of 18% or more use the last color
county_colors = [colors[min(int(rate / 3), len(colors) - 1)] for rate in county_rates]

source = ColumnDataSource(data=dict(
    x=county_xs,
    y=county_ys,
    color=county_colors,
    name=county_names,
    rate=county_rates,
))

TOOLS = "pan,wheel_zoom,box_zoom,reset,hover,save"

p = figure(title="Texas Unemployment 2009", tools=TOOLS)

# Draw each county as a filled polygon
p.patches('x', 'y', source=source,
          fill_color='color', fill_alpha=0.7,
          line_color="white", line_width=0.5)

hover = p.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = [
    ("Name", "@name"),
    ("Unemployment rate", "@rate%"),
    ("(Long, Lat)", "($x, $y)"),
]

output_file("texas.html", title="texas.py example")

show(p)

RuntimeError: bokeh sample data directory does not exist, please execute bokeh.sampledata.download()

In [2]:
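# Note: in plotly 4 and later, plotly.plotly moved to the separate chart_studio package (chart_studio.plotly)
import plotly.plotly as py
import plotly.graph_objs as go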



