# Week 6

## Exercise 1

Three elements to keep in mind:
- Start with a question: what do you want to communicate?
- Allow the user to explore
- Know your audience and design for them

Example of an effective data visualization:
- https://www.reuters.com/world-coronavirus-tracker-and-maps/graphics/world-coronavirus-tracker-and-maps/vaccination-rollout-and-access/
- Shows COVID-19 vaccination rates and related information.
- Overview first:
    - Shows the whole world, with color indicating how vaccinated each country is
- Zoom and filter:
    - The user can click on a country, or scroll further down the page for more information on specific continents and regions
- Details on demand:
    - Clicking a country brings up more detailed information. Further down the page, additional graphs and plots let you look up a specific country and see much more about it

Exploratory vs explanatory data analysis:
- Exploratory analysis is what you do for yourself to find patterns in the data; explanatory analysis is for when you have already explored the data and need to show your findings to others
- The audience is often new to the data and may not understand the dataset or specific data points
- So it is up to you to explain the data and teach the audience what it shows.

## Exercise 2

In [48]:
import pandas as pd
import matplotlib.pyplot as plt
import bokeh

data = pd.read_csv("Police_Department_Incident_Reports__Historical_2003_to_May_2018_20240130.csv")
focuscrimes = ['WEAPON LAWS', 'PROSTITUTION', 'DRIVING UNDER THE INFLUENCE', 'ROBBERY', 
               'BURGLARY', 'ASSAULT', 'DRUNKENNESS', 'TRESPASS', 'LARCENY/THEFT', 
               'VANDALISM', 'VEHICLE THEFT', 'FRAUD', 'DRUG/NARCOTIC', 'DISORDERLY CONDUCT']

# Keep only incidents from 2010 through 2017 (ISO date strings avoid day/month ambiguity).
data["Date"] = pd.to_datetime(data["Date"])
data = data[(data["Date"] >= "2010-01-01") & (data["Date"] <= "2017-12-31")].copy()

# "Time" holds clock times assumed to look like "HH:MM"; parse them explicitly and keep the hour.
data["Hour"] = pd.to_datetime(data["Time"], format="%H:%M").dt.hour

# Group by 'Category' and 'Hour' and count the number of incidents.
category_hour_counts = data.groupby(['Category', 'Hour']).size().reset_index(name='Counts')

# Calculate the total counts for each category over the entire period.
total_counts_by_category = category_hour_counts.groupby('Category')['Counts'].sum().reset_index(name='TotalCounts')

# Merge the total counts back to the hourly data.
merged_data = category_hour_counts.merge(total_counts_by_category, on='Category')

# Normalize: the fraction of each category's incidents that falls in each hour (sums to 1 per category).
merged_data['Normalized'] = merged_data['Counts'] / merged_data['TotalCounts']

# Pivot the table to have hours as rows and categories as columns with the normalized values as the data.
normalized_pivot = merged_data.pivot(index='Hour', columns='Category', values='Normalized').fillna(0)

# Sort the category columns alphabetically.
sorted_categories = normalized_pivot.reindex(sorted(normalized_pivot.columns), axis=1)
sorted_names = sorted(normalized_pivot.columns)
print(sorted_names)

    



['ARSON', 'ASSAULT', 'BAD CHECKS', 'BRIBERY', 'BURGLARY', 'DISORDERLY CONDUCT', 'DRIVING UNDER THE INFLUENCE', 'DRUG/NARCOTIC', 'DRUNKENNESS', 'EMBEZZLEMENT', 'EXTORTION', 'FORGERY/COUNTERFEITING', 'FRAUD', 'GAMBLING', 'KIDNAPPING', 'LARCENY/THEFT', 'LIQUOR LAWS', 'LOITERING', 'MISSING PERSON', 'NON-CRIMINAL', 'OTHER OFFENSES', 'PORNOGRAPHY/OBSCENE MAT', 'PROSTITUTION', 'RECOVERED VEHICLE', 'ROBBERY', 'SECONDARY CODES', 'SEX OFFENSES, FORCIBLE', 'SEX OFFENSES, NON FORCIBLE', 'STOLEN PROPERTY', 'SUICIDE', 'SUSPICIOUS OCC', 'TREA', 'TRESPASS', 'VANDALISM', 'VEHICLE THEFT', 'WARRANTS', 'WEAPON LAWS']
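The count, total, merge and divide steps above amount to a per-category normalization. A minimal sketch on hypothetical toy data, assuming `Time` strings look like `HH:MM`, shows the same idea more compactly with `unstack` and column-wise division; every column of the result sums to 1.

```python
import pandas as pd

# Hypothetical toy incidents: a category and an "HH:MM" time for each report.
toy = pd.DataFrame({
    "Category": ["ROBBERY", "ROBBERY", "ROBBERY", "FRAUD", "FRAUD"],
    "Time": ["01:15", "01:40", "13:05", "13:30", "13:55"],
})

# Extract the hour of day from each time string.
toy["Hour"] = pd.to_datetime(toy["Time"], format="%H:%M").dt.hour

# Count incidents per (Hour, Category); unstack spreads categories into columns.
counts = toy.groupby(["Hour", "Category"]).size().unstack(fill_value=0)

# Divide each column by its total so every category sums to 1 across hours.
normalized = counts / counts.sum()
print(normalized)
```

The `unstack` version skips the explicit merge: dividing a DataFrame by the Series of column totals aligns on the category names automatically.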


In [59]:
sorted_categories.to_csv("sorted.csv", index=True);

In [72]:
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, FactorRange, Legend
from bokeh.palettes import Category20

# The categorical x-axis uses string factors, so convert the integer Hour index to strings.
plot_df = sorted_categories.copy()
plot_df.index = plot_df.index.astype(str)
source = ColumnDataSource(plot_df)  # the "Hour" index becomes a column of the source

hours = [str(h) for h in range(24)]
p = figure(x_range=FactorRange(factors=hours), height=350,
           title="Hourly distribution of focus crimes (normalized per category)",
           x_axis_label="Hour of day", y_axis_label="Fraction of category's incidents",
           toolbar_location=None, tools="")
bar = {}    # one vbar renderer per focus crime
items = []  # (label, [renderer]) pairs for the custom legend

# Map each crime to a color (Category20 provides palettes of 3 to 20 colors).
crime_to_color = {crime: Category20[len(focuscrimes)][i] for i, crime in enumerate(focuscrimes)}

for crime in focuscrimes:
    # top=crime names a column of `source`, so bar heights come from that crime's column.
    # muted=True starts every series muted; muted_alpha sets how faint a muted series is drawn.
    bar[crime] = p.vbar(x='Hour', top=crime, source=source, width=0.8,
                        color=crime_to_color[crime], muted_alpha=0.2, muted=True)
    items.append((crime, [bar[crime]]))

# Place the custom legend outside the plot area, on the right.
legend = Legend(items=items, location=(0, -30))
p.add_layout(legend, 'right')

# Clicking a legend entry toggles muting for that series (use "hide" to hide it instead).
p.legend.click_policy = "mute"
show(p)  # display the plot
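As an alternative to building the `Legend` by hand, Bokeh can generate legend entries from `legend_label`, and a `HoverTool` per series gives "details on demand". The sketch below is a minimal illustration on a hypothetical toy table; the column names, values and colors are made up, and it assumes the hours are already strings as in the cell above.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical normalized table: string hours as the index, one column per crime.
df = pd.DataFrame(
    {"ROBBERY": [0.5, 0.3, 0.2], "FRAUD": [0.2, 0.3, 0.5]},
    index=pd.Index(["0", "1", "2"], name="Hour"),
)
source = ColumnDataSource(df)  # the named index becomes an "Hour" column

p = figure(x_range=list(df.index), height=300, title="Toy hourly distribution",
           toolbar_location=None, tools="")
colors = {"ROBBERY": "#1f77b4", "FRAUD": "#ff7f0e"}  # made-up colors

for crime in df.columns:
    # legend_label makes Bokeh add a legend entry for this renderer automatically.
    r = p.vbar(x="Hour", top=crime, source=source, width=0.8,
               color=colors[crime], muted_alpha=0.2, legend_label=crime)
    # Details on demand: hovering a bar shows this crime's value for that hour.
    p.add_tools(HoverTool(renderers=[r], tooltips=[
        ("Crime", crime),
        ("Hour", "@Hour"),
        ("Fraction", "@{" + crime + "}{0.0%}"),
    ]))

p.legend.click_policy = "mute"  # click a legend entry to mute or unmute its bars
# In a notebook, show(p) from bokeh.plotting displays the figure.
```

The braces in `@{...}` let the tooltip reference column names containing spaces or slashes, such as `LARCENY/THEFT`.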
