## Part 2: A short data story / micro-project

> *Exercise*: Create a short data-story based on the work we've done in class so far. (This exercise is a kind of "micro version" of what we'll be doing in the final project.) Follow the directions in the bulleted list below when you create your data-story.

* **Find your own story to tell in the work on analyzing the SF Crime Data that we've done so far** (Week 1 - Week 6). The idea is to choose an insight about crime in SF (perhaps how something changed over the years) and communicate that insight to a general audience reader.
* The web-page should be hosted on your GitHub Pages site (Week 7).
* The format must be the classic *Magazine Genre* presented on a single web-page (see the Segel \& Heer paper for details).
* The story should have a brief introduction to the dataset so new readers from outside the class can understand what's going on.
* Your story should include exactly three visualizations: no more and no fewer (multi-panel figures are OK). They should be one of each of the following types:
  - One time-series / bar chart (it's OK to use the "fancy" plot types like calendar plots or polar bar-charts from Week 2, Part 4).
  - One map (use techniques from Weeks 3 and 4)
  - One interactive visualization in Bokeh (Week 6)
* **At a minimum, the Bokeh visualization should contain different data** than the exercise we did for Week 6 (it's a plus if it's a new type of viz altogether). 
* The two other visualizations may be repetitions of figures created during the previous lectures, or they may be new.
* Make the figures nice. Specifically:
  - Aim to make the figures visually consistent (color, fonts, etc.)
  - Follow the recommendations from my video on nice figures (Week 2, part 3)
* In terms of the amount of text, I envision something like 500-1500 words (including figure captions). <font color="gray">Try to write in your own words: LLMs tend to produce a lot of text that is not very precise. So if the text is elegantly written but empty, we will be critical. It is OK, however, to have an LLM help you get the grammar, etc. right.</font>
* It is a plus if you can back up some of your findings with external sources, such as news stories from the area, looking up which building is located at some set of `lat,lon` coordinates, or similar. (So when you see something happening at some time/place in the data, see if you can understand it more deeply by investigating outside the dataset.) Use real references at the end of the text to organize your links to the outside world.
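The first figure type above mentions polar bar charts. As one possible sketch (not the course's reference code), the hypothetical helper `polar_hour_chart` below draws a 24-wedge polar bar chart of incidents per hour of day with matplotlib. It assumes the times can be parsed by `pd.to_datetime`, which is true for the `Time` column parsed in the first code cell; the sample timestamps at the bottom are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def polar_hour_chart(times, title="Incidents by hour of day"):
    """Hypothetical helper: 24-wedge polar bar chart of events per hour.

    `times` is anything pd.to_datetime can parse (e.g. the parsed 'Time' column).
    Returns (fig, ax, counts) so the hourly counts can be reused or checked.
    """
    hours = pd.to_datetime(pd.Series(times)).dt.hour
    # Count events per hour; hours without events get 0 instead of disappearing
    counts = hours.value_counts().reindex(range(24), fill_value=0)

    # One wedge per hour, evenly spaced around the full circle
    theta = np.linspace(0.0, 2 * np.pi, 24, endpoint=False)
    width = 2 * np.pi / 24

    fig, ax = plt.subplots(subplot_kw={"projection": "polar"}, figsize=(5, 5))
    ax.bar(theta, counts.values, width=width, align="edge", edgecolor="white")
    ax.set_theta_zero_location("N")  # midnight at the top, like a clock
    ax.set_theta_direction(-1)       # hours increase clockwise
    ax.set_xticks(theta)
    ax.set_xticklabels([str(h) for h in range(24)])
    ax.set_title(title)
    return fig, ax, counts


# Made-up sample timestamps, only to exercise the helper
sample = ["2003-01-01 20:25", "2003-01-02 20:40", "2003-01-03 01:15",
          "2003-01-04 01:59", "2003-01-05 09:20"]
fig, ax, counts = polar_hour_chart(sample)
```

Putting midnight at the top and running clockwise makes the chart read like a clock face, which a general audience may find easier than matplotlib's default polar orientation.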

In [6]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# First read in the CSV file
data = pd.read_csv("Police_Department.csv")

# Parse dates and times; times without a date get the dummy date 1900-01-01
data['Date'] = pd.to_datetime(data['Date'])
data['Time'] = pd.to_datetime(data['Time'], format='%H:%M')

focuscrimes = {'WEAPON LAWS', 'PROSTITUTION', 'DRIVING UNDER THE INFLUENCE', 'ROBBERY', 'BURGLARY', 'ASSAULT', 'DRUNKENNESS', 'DRUG/NARCOTIC', 'TRESPASS', 'LARCENY/THEFT', 'VANDALISM', 'VEHICLE THEFT', 'STOLEN PROPERTY', 'DISORDERLY CONDUCT'}

# Keep only the rows whose Category is one of the focus crimes
focuscrimes_elems = data.loc[data["Category"].isin(focuscrimes)]
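A filtered frame like `focuscrimes_elems` makes it possible to compare how categories evolve over time. A minimal sketch on hypothetical toy rows with the same `Category` and `Date` columns: it counts incidents per year and category, then divides each category by its own mean so that large and small categories can share one axis.

```python
import pandas as pd

# Hypothetical toy rows; only the 'Category' and 'Date' columns matter here
toy = pd.DataFrame({
    "Category": ["ROBBERY", "ROBBERY", "BURGLARY", "ROBBERY",
                 "BURGLARY", "BURGLARY", "TRESPASS"],
    "Date": pd.to_datetime(["2010-01-05", "2010-03-02", "2010-06-07",
                            "2011-02-11", "2011-04-20", "2011-09-14",
                            "2011-10-01"]),
})
focus = {"ROBBERY", "BURGLARY"}  # hypothetical stand-in for the focuscrimes set

subset = toy[toy["Category"].isin(focus)]
# Rows = years, columns = categories, cells = incident counts (0 where none)
yearly = (subset.groupby([subset["Date"].dt.year, "Category"])
                .size()
                .unstack(fill_value=0))
# Divide each category by its own mean so trends share one scale
relative = yearly / yearly.mean()
print(relative.round(2))
```

With the real data, using `focuscrimes_elems` in place of `toy` and `focuscrimes` in place of `focus` should give one column per focus crime, ready to plot one line per category.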

In [7]:
# Restrict the data to the years 2003 through 2017 (inclusive)
data = data[(data['Date'].dt.year >= 2003) & (data['Date'].dt.year <= 2017)]
# Keep only the DRUG/NARCOTIC incidents
data_narco = data[data['Category'] == 'DRUG/NARCOTIC']
data_narco

| | PdId | IncidntNum | Incident Code | Category | Descript | DayOfWeek | Date | Time | PdDistrict | Resolution | ... | Fix It Zones as of 2017-11-06 2 2 | DELETE - HSOC Zones 2 2 | Fix It Zones as of 2018-02-07 2 2 | CBD, BID and GBD Boundaries as of 2017 2 2 | Areas of Vulnerability, 2016 2 2 | Central Market/Tenderloin Boundary 2 2 | Central Market/Tenderloin Boundary Polygon - Updated 2 2 | HSOC Zones as of 2018-06-05 2 2 | OWED Public Spaces 2 2 | Neighborhoods 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | 3071603916100 | 30716039 | 16100 | DRUG/NARCOTIC | POSSESSION OF HEROIN | Thursday | 2003-06-12 | 1900-01-01 20:25:00 | INGLESIDE | ARREST, BOOKED | ... | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN | NaN | 80.0 |
| 31 | 6000732516650 | 60007325 | 16650 | DRUG/NARCOTIC | POSSESSION OF METH-AMPHETAMINE | Tuesday | 2006-01-03 | 1900-01-01 01:15:00 | TENDERLOIN | ARREST, BOOKED | ... | 18.0 | NaN | 18.0 | 6.0 | 2.0 | 1.0 | 1.0 | NaN | NaN | 20.0 |
| 37 | 3146788116710 | 31467881 | 16710 | DRUG/NARCOTIC | POSSESSION OF NARCOTICS PARAPHERNALIA | Thursday | 2003-12-18 | 1900-01-01 00:01:00 | TENDERLOIN | ARREST, BOOKED | ... | NaN | NaN | NaN | 6.0 | 2.0 | 1.0 | 1.0 | NaN | NaN | 20.0 |
| 80 | 3061826916710 | 30618269 | 16710 | DRUG/NARCOTIC | POSSESSION OF NARCOTICS PARAPHERNALIA | Wednesday | 2003-05-21 | 1900-01-01 09:20:00 | SOUTHERN | ARREST, BOOKED | ... | 3.0 | 1.0 | 3.0 | 7.0 | 2.0 | 1.0 | 1.0 | 1.0 | 35.0 | 32.0 |
| 167 | 6010311116020 | 60103111 | 16020 | DRUG/NARCOTIC | PLANTING/CULTIVATING MARIJUANA | Friday | 2006-01-27 | 1900-01-01 11:59:00 | RICHMOND | NONE | ... | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN | NaN | 103.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2129328 | 17084842116704 | 170848421 | 16704 | DRUG/NARCOTIC | LOITERING WHERE NARCOTICS ARE SOLD/USED | Tuesday | 2017-10-17 | 1900-01-01 02:00:00 | NORTHERN | ARREST, BOOKED | ... | NaN | NaN | NaN | NaN | 2.0 | 1.0 | 1.0 | NaN | NaN | 20.0 |
| 2129398 | 17098433216030 | 170984332 | 16030 | DRUG/NARCOTIC | POSSESSION OF MARIJUANA FOR SALES | Tuesday | 2017-12-05 | 1900-01-01 11:23:00 | NORTHERN | ARREST, BOOKED | ... | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | 15.0 |
| 2129476 | 17087629016710 | 170876290 | 16710 | DRUG/NARCOTIC | POSSESSION OF NARCOTICS PARAPHERNALIA | Thursday | 2017-10-26 | 1900-01-01 16:24:00 | SOUTHERN | ARREST, BOOKED | ... | NaN | NaN | NaN | 5.0 | 2.0 | NaN | NaN | NaN | 35.0 | 19.0 |
| 2129517 | 16026805516710 | 160268055 | 16710 | DRUG/NARCOTIC | POSSESSION OF NARCOTICS PARAPHERNALIA | Friday | 2016-04-01 | 1900-01-01 10:36:00 | NORTHERN | ARREST, BOOKED | ... | NaN | NaN | NaN | 10.0 | 2.0 | NaN | NaN | NaN | NaN | 50.0 |
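The two filters in the cell above (keep the years 2003 through 2017, then keep one category) can also be written with `Series.between`, which includes both endpoints by default. A minimal sketch on a few hypothetical rows (not taken from the SF data):

```python
import pandas as pd

# Hypothetical toy rows standing in for the police-report data
toy = pd.DataFrame({
    "Category": ["DRUG/NARCOTIC", "ROBBERY", "DRUG/NARCOTIC", "DRUG/NARCOTIC"],
    "Date": pd.to_datetime(["2002-12-31", "2005-06-01", "2017-12-31", "2018-01-01"]),
})

# Years 2003 through 2017; between() includes both endpoints by default
in_range = toy["Date"].dt.year.between(2003, 2017)
is_narco = toy["Category"] == "DRUG/NARCOTIC"

# .copy() returns an independent frame, so adding columns to it later
# does not raise pandas' SettingWithCopyWarning
toy_narco = toy[in_range & is_narco].copy()
print(toy_narco)
```

Only the 2017-12-31 row survives: the 2002 and 2018 rows fall outside the year range, and the 2005 row is a robbery.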


In [8]:
# Count DRUG/NARCOTIC incidents per calendar year
# (a Series indexed by year; the index keeps the name 'Date')
grouped = data_narco.groupby(data_narco['Date'].dt.year)['Date'].count()

x = grouped.index  # the years
y = grouped        # incident count for each year

df = pd.DataFrame(data={
    'Year': x,
    'Crime Count': y
})

df

| Date | Year | Crime Count |
|---|---|---|
| 2003 | 2003 | 9784 |
| 2004 | 2004 | 9792 |
| 2005 | 2005 | 8444 |
| 2006 | 2006 | 8943 |
| 2007 | 2007 | 10360 |
| 2008 | 2008 | 11456 |
| 2009 | 2009 | 11771 |
| 2010 | 2010 | 9036 |
| 2011 | 2011 | 6802 |
| 2012 | 2012 | 6307 |
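To turn the yearly counts into a number a general reader can grasp, one option is percentage change. The sketch below copies the ten yearly counts shown in the table above (2003 to 2012) into a Series; it does not cover 2013 to 2017, which are not shown in that table.

```python
import pandas as pd

# Yearly DRUG/NARCOTIC counts copied from the table above (2003 to 2012 only)
counts = pd.Series(
    [9784, 9792, 8444, 8943, 10360, 11456, 11771, 9036, 6802, 6307],
    index=range(2003, 2013),
    name="Crime Count",
)

# Year-over-year change in percent; the first year has no predecessor, so it is NaN
yoy = (counts.pct_change() * 100).round(1)

# Change from the peak year to the last year shown
peak_year = counts.idxmax()
change_since_peak = 100 * (counts.loc[2012] - counts.loc[peak_year]) / counts.loc[peak_year]
print(f"Peak in {peak_year}; change to 2012: {change_since_peak:.1f}%")
```

This prints `Peak in 2009; change to 2012: -46.4%`, i.e. within the years shown, the 2012 count is a bit less than half of the 2009 peak.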


In [14]:
# Year labels as strings, used as tick text in the bar chart
textBased = x.astype(str)
x

Index([2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014,
       2015, 2016, 2017],
      dtype='int32', name='Date')

In [15]:
import plotly.express as px

# Vertical bar chart of yearly DRUG/NARCOTIC incident counts
fig = px.bar(x=x,
             y=y,
             orientation='v',
             title='Bar plot of DRUG/NARCOTIC related crimes per year',
             labels={'y': 'Crime count', 'x': 'Year'},
             )
# Label every year on the x-axis instead of letting Plotly choose the tick spacing
fig.update_layout(showlegend=False,
                  xaxis=dict(
                      tickmode='array',    # place ticks at explicit positions
                      tickvals=x,          # one tick per year
                      ticktext=textBased,  # label each tick with the year as text
                  ))

fig.show()
# Save the interactive chart as a standalone HTML file
fig.write_html("intro-bar-chart.html")
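The Plotly chart above is interactive. If a static image is preferred instead, for example to match the look of other matplotlib figures in the story, a rough matplotlib equivalent could look like the hypothetical helper below. Passing the years as strings gives one tick per year, the same effect the `tickmode='array'` settings have in Plotly. The demo counts are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def yearly_bar_chart(counts, title, color="#4c72b0"):
    """Hypothetical helper: static bar chart with one labelled tick per year.

    `counts` is a Series indexed by year, like `y` computed earlier.
    The default colour is arbitrary; reusing one palette across all figures
    keeps the story visually consistent.
    """
    fig, ax = plt.subplots(figsize=(8, 4))
    # String labels make matplotlib treat years as categories: one tick per bar
    year_labels = [str(year) for year in counts.index]
    ax.bar(year_labels, counts.values, color=color)
    ax.set_xlabel("Year")
    ax.set_ylabel("Crime count")
    ax.set_title(title)
    # Drop the top and right frame lines, which carry no information
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    fig.tight_layout()
    return fig, ax


# Made-up counts, only to exercise the helper
demo = pd.Series([120, 95, 140], index=[2015, 2016, 2017])
fig, ax = yearly_bar_chart(demo, "Toy example: incidents per year")
```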

In [49]:
# Keep the category and detailed description of each incident, indexed by date
descriptions = data_narco.loc[:, ['Category', 'Descript', 'Date']].set_index('Date')
descriptions

| Date | Category | Descript |
|---|---|---|
| 2003-06-12 | DRUG/NARCOTIC | POSSESSION OF HEROIN |
| 2006-01-03 | DRUG/NARCOTIC | POSSESSION OF METH-AMPHETAMINE |
| 2003-12-18 | DRUG/NARCOTIC | POSSESSION OF NARCOTICS PARAPHERNALIA |
| 2003-05-21 | DRUG/NARCOTIC | POSSESSION OF NARCOTICS PARAPHERNALIA |
| 2006-01-27 | DRUG/NARCOTIC | PLANTING/CULTIVATING MARIJUANA |
| ... | ... | ... |
| 2017-10-17 | DRUG/NARCOTIC | LOITERING WHERE NARCOTICS ARE SOLD/USED |
| 2017-12-05 | DRUG/NARCOTIC | POSSESSION OF MARIJUANA FOR SALES |
| 2017-10-26 | DRUG/NARCOTIC | POSSESSION OF NARCOTICS PARAPHERNALIA |
| 2016-04-01 | DRUG/NARCOTIC | POSSESSION OF NARCOTICS PARAPHERNALIA |
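With dates as the index, one possible next step toward a "how did this change over the years" story is to count each description per year with `pd.crosstab` and compare the shares. A minimal sketch on hypothetical rows shaped like `descriptions`:

```python
import pandas as pd

# Hypothetical rows shaped like `descriptions` (Date index, Descript column)
toy = pd.DataFrame(
    {"Descript": ["POSSESSION OF MARIJUANA", "POSSESSION OF HEROIN",
                  "POSSESSION OF MARIJUANA", "POSSESSION OF HEROIN",
                  "POSSESSION OF HEROIN"]},
    index=pd.DatetimeIndex(["2010-02-01", "2010-05-03", "2011-07-04",
                            "2011-08-09", "2011-11-30"], name="Date"),
)

# Year of each row, aligned with the rows of `toy`
years = pd.Series(toy.index.year, index=toy.index, name="Year")

# Rows = years, columns = descriptions, cells = number of incidents
per_year = pd.crosstab(years, toy["Descript"])

# Share of each description within its year, so years of different size compare fairly
shares = per_year.div(per_year.sum(axis=1), axis=0)
print(shares.round(2))
```

On the real frame, `descriptions.index.year` and `descriptions['Descript']` would take the place of the toy values; comparing a column of `shares` between two years is one way to spot which sub-types grow or shrink.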
