# Part 1 More on narrative data visualization

Exercise: Questions to the text

* What's the point of Figure 7?

It categorises the different information mediums and lists some of the methods that each article uses.

* Use Figure 7 to find the most common design choice within each category for the Visual narrative and Narrative structure (the categories within visual narrative are 'visual structuring', 'highlighting', etc).

The Annotated Graph / Map mostly uses a consistent visual platform, along with some animated transitions.
The Partitioned Poster also uses a consistent visual platform, but additionally makes use of an establishing shot.

Film and animation make use of many of the different visual narrative techniques.


* Check out Figure 8 and section 4.3. What is your favorite genre of narrative visualization? Why? What is your least favorite genre? Why?

I like annotated charts and partitioned posters; in these you can usually dig into different aspects of the presented theme and focus on what you find interesting. Films and videos are also nice since you don't have to read :)

* Which genre is the "How the virus got out" NYT piece?

It's a slideshow.

# Part 2 Toggling Histograms

The first thing we need to do is import the data and convert it to the correct format.

We choose to look only at the focus crimes.<br>
Then, to make the date data easily accessible, we extract all the relevant information and put it in separate columns.

In [1]:
import numpy as np
import pandas as pd
from datetime import date

focuscrimes = set(['WEAPON LAWS', 'PROSTITUTION', 'DRIVING UNDER THE INFLUENCE', 'ROBBERY', 'BURGLARY', 'ASSAULT', 'DRUNKENNESS', 'DRUG/NARCOTIC', 'TRESPASS', 'LARCENY/THEFT', 'VANDALISM', 'VEHICLE THEFT', 'STOLEN PROPERTY', 'DISORDERLY CONDUCT'])
police_data_all = pd.read_csv('../police_data.csv')

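# keep only the rows for the focus crimes (boolean indexing drops the other rows instead of filling them with NaN)
police_data = police_data_all[police_data_all.Category.isin(focuscrimes)].copy()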

police_data['Date'] = pd.to_datetime(police_data['Date'], format="%m/%d/%Y")
police_data['Time'] = pd.to_datetime(police_data['Time'], format="%H:%M")
police_data['Year'] = police_data['Date'].dt.year
police_data['Month'] = police_data['Date'].dt.month
police_data['Hour'] = police_data['Time'].dt.hour
police_data['Hour_of_week'] = police_data['Date'].dt.dayofweek * 24 + (police_data['Hour'] + 1)
police_data['Day'] = police_data['Date'].dt.day
police_data['Minute'] = police_data['Time'].dt.minute
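
As a quick sanity check of the Hour_of_week formula above, the sketch below applies it to two made-up timestamps (the frame `toy` is hypothetical and not part of the police data). Since `dt.dayofweek` is 0 for Monday and 6 for Sunday, the formula should run from 1 (Monday, 00:xx) to 168 (Sunday, 23:xx).

```python
import pandas as pd

# Hypothetical toy data: a Monday just after midnight and a Sunday late in the evening.
toy = pd.DataFrame({
    'Date': ['01/01/2018', '01/07/2018'],  # 2018-01-01 was a Monday
    'Time': ['00:15', '23:45'],
})

# Same conversions as in the cell above
toy['Date'] = pd.to_datetime(toy['Date'], format="%m/%d/%Y")
toy['Time'] = pd.to_datetime(toy['Time'], format="%H:%M")
toy['Hour'] = toy['Time'].dt.hour
toy['Hour_of_week'] = toy['Date'].dt.dayofweek * 24 + (toy['Hour'] + 1)

hour_of_week = toy['Hour_of_week'].tolist()
print(hour_of_week)
```

Note that with the `+ 1` the values are 1-based, so the first hour of the week is 1 rather than 0.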


Now that the data is imported, we normalize it so that the different crime types can be compared more easily.

We do this by splitting every category into 24 chunks of 1 hour, and then dividing each chunk by the total count of that crime.

This results in a value between 0 and 1 that gives the fraction of a given crime falling in that time interval.<br>
The results can be seen below.

In [2]:
# one row per hour of the day (0-23)
crime_hour_norm_df = pd.DataFrame({'Hour': np.arange(24)})
for i, category in enumerate(focuscrimes):
    df = police_data[police_data.Category == category]
    df_hour = df['Category'].groupby(df['Hour']).count()
    
    #normalize
    df_hour= df_hour/df['Category'].count()
    
    #insert normalized data into dataframe
    crime_hour_norm_df.insert(i+1, category, df_hour) 
    

crime_hour_norm_df.head()

| | Hour | TRESPASS | VEHICLE THEFT | LARCENY/THEFT | DRIVING UNDER THE INFLUENCE | STOLEN PROPERTY | DRUNKENNESS | DISORDERLY CONDUCT | VANDALISM | BURGLARY | ROBBERY | ASSAULT | PROSTITUTION | WEAPON LAWS | DRUG/NARCOTIC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.029565 | 0.037669 | 0.03979 | 0.126234 | 0.047094 | 0.08447 | 0.052291 | 0.056299 | 0.03819 | 0.053484 | 0.055754 | 0.08245 | 0.051902 | 0.031556 |
| 1 | 1 | 0.020669 | 0.02571 | 0.024538 | 0.114951 | 0.033134 | 0.07989 | 0.038347 | 0.039308 | 0.024764 | 0.054845 | 0.047541 | 0.063948 | 0.038275 | 0.019251 |
| 2 | 2 | 0.023292 | 0.020355 | 0.015983 | 0.11548 | 0.027752 | 0.067576 | 0.034363 | 0.037076 | 0.026687 | 0.057207 | 0.043088 | 0.047602 | 0.034137 | 0.015531 |
| 3 | 3 | 0.019024 | 0.012662 | 0.010043 | 0.051128 | 0.021529 | 0.026359 | 0.021514 | 0.02497 | 0.027495 | 0.034421 | 0.021999 | 0.032753 | 0.021139 | 0.010533 |
| 4 | 4 | 0.01414 | 0.009779 | 0.006533 | 0.019217 | 0.019174 | 0.013536 | 0.015936 | 0.016733 | 0.024393 | 0.022303 | 0.013981 | 0.028322 | 0.014257 | 0.007682 |
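
The same per-category normalization can also be sketched without an explicit loop. The example below is only an illustration on a made-up frame (`toy` and its values are hypothetical, not the police data); it uses `pd.crosstab` with `normalize='columns'`, so that each crime's column sums to 1.

```python
import pandas as pd

# Hypothetical toy data: four assaults and two robberies with their hour of day.
toy = pd.DataFrame({
    'Category': ['ASSAULT', 'ASSAULT', 'ASSAULT', 'ASSAULT', 'ROBBERY', 'ROBBERY'],
    'Hour':     [0, 0, 1, 23, 1, 1],
})

# Count incidents per (hour, category) and divide each column by its total.
hour_share = pd.crosstab(toy['Hour'], toy['Category'], normalize='columns')

# Make sure all 24 hours are present; hours without incidents get a share of 0.
hour_share = hour_share.reindex(range(24), fill_value=0.0)

print(hour_share.loc[[0, 1, 23]])
```

One difference from the loop version is that hours with no incidents show up as 0 here, whereas inserting a grouped Series leaves them as NaN.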


Now that the data is ready, we just need to display it in a Bokeh plot.

This is mostly a matter of adjusting a lot of settings.<br>
In colors we store all the colors for the bars.<br>
In items we store the different crimes to create a legend later; notice how each crime name is linked to its bars with items.append((i, [bar[i]])) <br>

Most of the settings make sense on their own, but some interesting ones that we played with were: <br>
* visible: we decided to hide all the graphs except one to begin with, since you usually only want to compare two at a time, and turning everything off is tedious.
* fill_alpha: this controls the transparency of the bars, and we needed to set it to something lower than 1 so that overlapping bars would show through each other.
* toolbar_location: we decided to hide the main toolbar since it didn't add much functionality and just cluttered up the plot.

In [5]:
from bokeh.io import output_file, show, output_notebook
from bokeh.models import ColumnDataSource
from bokeh.models import FactorRange
from bokeh.plotting import figure
from bokeh.models import Legend

output_notebook() # for outputting to notebook
source = ColumnDataSource(crime_hour_norm_df.astype({'Hour': str})) # data importing; Hour as strings so it matches the categorical x_range factors
colors = ["#a83232", "#a86932", "#a8a232", "#7da832", "#32a83c", "#32a87f", "#3283a8","#324aa8", "#5d32a8", "#9432a8", "#a83273", "#a83248", "#b59399", "#2e292a"]

hours = []
for i in range(24):
    hours.append(str(i))

p = figure(x_range = FactorRange(factors=hours), plot_height=400, plot_width=900, title="Normalized crime rate over a day", 
           toolbar_location=None)
items=[]
bar ={} # to store vbars

for indx,i in enumerate(focuscrimes):
    
    bar[i] = p.vbar(x='Hour',  top=i, source= source, visible = False, width= 0.8, color =colors[indx], fill_alpha =0.6) 
    items.append((i, [bar[i]]))
    
#p.legend.click_policy="hide" ### assigns the click policy (you can try to use ''hide'
#p.legend.location = 'top_left'

legend = Legend(items=items, location=(0,0))
legend.click_policy="hide"

p.add_layout(legend, 'left')
p.xaxis.axis_label = "Hour of day"
p.yaxis.axis_label = "Percentage of crime"

bar['TRESPASS'].visible = True #we start with TRESPASS just to display something

show(p) #displays your plot
