# Part 1: More on narrative data visualization

What's the point of Figure 7?
> To show how the visualisations were built: which mechanisms were used and how interactivity was achieved in order to present the data.

Use Figure 7 to find the most common design choice within each category for the Visual narrative and Narrative structure (the categories within visual narrative are 'visual structuring', 'highlighting', etc).
> Visual Narrative:
> * Consistent Visual Platform
> * Feature Distinction
>
> Narrative Structure:
> * User Directed Path
> * Hover Highlighting / Details
> * Filtering / Selection / Search
> * Very Limited Interactivity
> * Explicit Instruction
> * Captions / Headlines

Check out Figure 8 and section 4.3. What is your favorite genre of narrative visualization? Why? What is your least favorite genre? Why?
> Partitioned poster – it resembles an infographic, packing a lot of information into a very comprehensible layout.
> Magazine style – getting the message requires reading the text, which can take the reader a lot of time.

Which genre is the "How the virus got out" NYT piece?
> Combination of slide show and video. The video is interactive and split into sections.

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime
from bokeh.io import output_notebook, show
from bokeh.models import ColumnDataSource, FactorRange, Legend
from bokeh.plotting import figure
from bokeh.transform import factor_cmap
from bokeh.palettes import Spectral5
import seaborn as sns

In [2]:
output_notebook()

## DataFrame creation

In [3]:
dataframe = pd.read_csv('../data/Police_Department_Incident_Reports__Historical_2003_to_May_2018.csv')

In [4]:
start, end = datetime(2010, 1, 1), datetime(2018, 12, 31)
dataframe['Date'] = dataframe['Date'].astype('datetime64[ns]')
focuscrimes = set(['WEAPON LAWS', 'PROSTITUTION', 'DRIVING UNDER THE INFLUENCE', 'ROBBERY', 'BURGLARY', 'ASSAULT', 'DRUNKENNESS', 'DRUG/NARCOTIC', 'TRESPASS', 'LARCENY/THEFT', 'VANDALISM', 'VEHICLE THEFT', 'STOLEN PROPERTY', 'DISORDERLY CONDUCT'])

dataframe = dataframe.loc[
    (dataframe['Category'].isin(focuscrimes)) &
    (dataframe['Date'] >= start) &
    (dataframe['Date'] <= end)
].copy()  # copy the filtered rows so that adding columns later does not raise SettingWithCopyWarning

In [5]:
dataframe['Hour'] = dataframe['Time'].apply(lambda x: int(x.split(':')[0]))

## Data preprocessing

To build the required dataframe, the number of crime occurrences per hour has to be aggregated. This was done by grouping on _Hour_ and _Category_ and applying the count aggregate. Any remaining column can serve as the one being counted, in this case _IncidntNum_.
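
As a minimal sketch of this aggregation on toy data (the column names mirror the notebook, the values are made up for illustration):

```python
import pandas as pd

# Toy data: four incidents, three of them ASSAULT, spread over hours 0 and 1.
toy = pd.DataFrame({
    'Category': ['ASSAULT', 'ASSAULT', 'ROBBERY', 'ASSAULT'],
    'Hour': [0, 0, 0, 1],
    'IncidntNum': [101, 102, 103, 104],
})

# count() tallies the non-null IncidntNum values within each (Hour, Category) group.
counts = toy.groupby(['Hour', 'Category']).count().reset_index()
print(counts)
```

Note that `count()` skips missing values in the counted column, so the choice of column only matters if that column can contain nulls.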

In [6]:
crimes = dataframe[['Category', 'Hour', 'IncidntNum']]
crimes = crimes.groupby(['Hour', 'Category']).count().reset_index()

Afterwards, the _Normalisation_ column was calculated by dividing the number of occurrences of a crime in a given hour by the total number of occurrences of that crime across all hours. After that, the _IncidntNum_ column is no longer needed and is dropped.
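
A small sketch of this normalisation on made-up counts may help. It uses `groupby(...).transform('sum')`, which is one possible way to broadcast each category's total back onto its rows. The final check confirms that the shares of every category add up to 1:

```python
import pandas as pd

# Made-up hourly counts for two categories.
counts = pd.DataFrame({
    'Hour': [0, 1, 0, 1],
    'Category': ['ASSAULT', 'ASSAULT', 'ROBBERY', 'ROBBERY'],
    'IncidntNum': [30, 10, 5, 15],
})

# Total per category, repeated on every row of that category.
totals = counts.groupby('Category')['IncidntNum'].transform('sum')
counts['Normalisation'] = counts['IncidntNum'] / totals

# Sanity check: the relative frequencies of each category should sum to 1.
check = counts.groupby('Category')['Normalisation'].sum()
print(counts)
print(check)
```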

In [7]:
# Divide each hourly count by the total count of its category (vectorised, no row-wise apply).
crimes['Normalisation'] = crimes['IncidntNum'] / crimes.groupby('Category')['IncidntNum'].transform('sum')
crimes = crimes.drop(['IncidntNum'], axis=1)

Each row of the dataframe should represent an hour of the day, and each column a crime category. To rearrange the frame, the _pivot_table_ method was used: the _Hour_ column becomes the index and the categories become the columns.
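
The reshaping can be illustrated on a tiny long-format frame (values are invented). Because no `values` argument is passed, the resulting columns form a two-level MultiIndex with the value column name on top:

```python
import pandas as pd

# Long format: one row per (Hour, Category) pair.
long = pd.DataFrame({
    'Hour': [0, 0, 1, 1],
    'Category': ['ASSAULT', 'ROBBERY', 'ASSAULT', 'ROBBERY'],
    'Normalisation': [0.75, 0.25, 0.25, 0.75],
})

# Wide format: one row per hour, one column per category.
wide = pd.pivot_table(long, index='Hour', columns='Category')
print(wide)
print(wide.columns.nlevels)
```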

In [8]:
crimes_pivot = pd.pivot_table(crimes, index='Hour', columns='Category')

Pivoting the table produces MultiIndex columns. For later analysis it is easier to flatten the frame. The _droplevel_ method removes the outer _Normalisation_ level, and reassigning the column names then removes the leftover _Category_ name from the column index. Finally, the hours were cast to strings so that they can be used as categorical values.
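
A sketch of the flattening on a hand-built two-level frame (toy values). As an assumption for illustration, it clears the column index name directly instead of reassigning a hard-coded list of names, which avoids retyping the categories:

```python
import pandas as pd

# Hand-built stand-in for a pivoted frame with two column levels.
wide = pd.DataFrame(
    [[0.75, 0.25], [0.25, 0.75]],
    index=pd.Index([0, 1], name='Hour'),
    columns=pd.MultiIndex.from_product(
        [['Normalisation'], ['ASSAULT', 'ROBBERY']], names=[None, 'Category']),
)

wide.columns = wide.columns.droplevel(0)  # drop the outer 'Normalisation' level
wide.columns.name = None                  # drop the leftover 'Category' label
wide.index = wide.index.map(str)          # hours as strings, usable as categorical factors
print(wide)
```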

In [9]:
crimes_pivot.columns = crimes_pivot.columns.droplevel(0)
crimes_pivot.columns = ['ASSAULT', 'BURGLARY', 'DISORDERLY CONDUCT',
       'DRIVING UNDER THE INFLUENCE', 'DRUG/NARCOTIC', 'DRUNKENNESS',
       'LARCENY/THEFT', 'PROSTITUTION', 'ROBBERY', 'STOLEN PROPERTY',
       'TRESPASS', 'VANDALISM', 'VEHICLE THEFT', 'WEAPON LAWS']

crimes_pivot.index = crimes_pivot.index.map(str)

## Data visualisation

To create a bar chart using the _bokeh_ and _pandas_ libraries, a _ColumnDataSource_ was built from the preprocessed frame. Then the _figure_ object was created with the hours as categorical x-values. The bokeh toolbar was turned off, and the axis labels and title were set.

In [10]:
source = ColumnDataSource(crimes_pivot)
p = figure(
    x_range=FactorRange(factors=crimes_pivot.index),
    # plot_width is the pre-3.0 argument name; Bokeh 3.0 and later call it width.
    plot_width=1500, toolbar_location=None,
    title='Crimes per hour', x_axis_label='Hour of the day', y_axis_label='Relative frequency')

Lastly, the categories are plotted one by one on the same figure so that the bars are layered. For each focus crime, the value per hour is taken. Additional parameters were provided:
* _width_: the width of each bar,
* _color_: the colour of the lines, bars and muted bars at the same time,
* _fill_alpha_: opacity of the bar fill,
* _muted_: whether the bars start muted (here every category starts dimmed),
* _muted_alpha_: opacity of muted bars.

Moreover, during the iterations, two additional objects (_bar_ and _items_) were used to build a better legend than the default one. The legend was created manually by passing the _items_ list, which pairs each category name with its corresponding _vbar_ renderer. Finally, the legend was placed to the left of the plot and its click policy was set to muting.
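
The manual legend can be sketched in isolation with two hypothetical categories `A` and `B` (the data and hex colours are placeholders, not values from the dataset). Each legend entry is a `(label, [renderer])` pair, and `click_policy='mute'` makes a click on an entry toggle the muted state of its bars:

```python
from bokeh.models import ColumnDataSource, Legend
from bokeh.plotting import figure

# Placeholder data: relative frequencies for two made-up categories.
hours = ['0', '1', '2']
source = ColumnDataSource(data={
    'Hour': hours,
    'A': [0.2, 0.3, 0.5],
    'B': [0.6, 0.1, 0.3],
})

p = figure(x_range=hours, toolbar_location=None)

items = []
for name, colour in [('A', '#f77189'), ('B', '#36ada4')]:
    # Each vbar call returns a renderer that the legend entry will control.
    renderer = p.vbar(x='Hour', top=name, source=source, width=0.6,
                      color=colour, fill_alpha=0.5, muted_alpha=0.05)
    items.append((name, [renderer]))

# Build the legend by hand and place it outside the plot area, on the left.
legend = Legend(items=items, click_policy='mute')
p.add_layout(legend, 'left')
```

Calling `show(p)` after `output_notebook()` would render the figure. It is left out here so that the sketch has no display side effects.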

In [11]:
# Using seaborn colour palette in the Hex format for colouring each category of the crime in each iteration.
cmap = sns.color_palette('husl', len(crimes_pivot.columns)).as_hex()

bar = {}
items = list()

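# focuscrimes is a set, so sort it to keep colours and legend order reproducible between runs.
for index, crime in enumerate(sorted(focuscrimes)):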
    bar[crime] = p.vbar(
        x='Hour', top=crime, source=source,
        width=0.6,
        color=cmap[index], fill_alpha=0.5,
        muted=True, muted_alpha=0.05)
    items.append((crime, [bar[crime]]))
    
# "Sticking" the bars to the x-axis
p.y_range.start = 0

legend = Legend(items=items)
p.add_layout(legend, 'left')
p.legend.click_policy = 'mute'
show(p)

The resulting interactive plot, recorded as a GIF:
![caption](https://github.com/pdarulewski/social_data_analysis_and_visualisation/blob/master/week_8/Screen%20Recording%202020-03-24%20at%2018.45.57.gif?raw=true)