# Coding Discussion 03

# Instructions

## Task

Please read in the Chicago Summer 2018 Crimes Dataset located in the repository folder.

Using the data wrangling methods covered in class this week, create a new data frame where:

- the **_unit of observation_** is the crime type (i.e. `primary_type`),
- the **_column variables_** correspond to the **_day of the month_**, and
- **_each cell_** is populated by the **_proportion of that crime type's incidents that occurred on that day of the month_**
    + For example, if a month had just two days, with 2 thefts committed on the first day and 1 on the second, then the _proportion_ of thefts committed on the first day would be about .67 (2/3) and about .33 (1/3) on the second day.
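The arithmetic in the two-day example can be checked with a two-element pandas Series. The counts below are the hypothetical theft counts from the example, not real data. At two decimal places, 2/3 rounds to 0.67 and 1/3 to 0.33.

```python
import pandas as pd

# Hypothetical counts from the two-day example: 2 thefts on day 1, 1 on day 2
theft_counts = pd.Series({1: 2, 2: 1}, name="THEFT")

# Each day's share of the total, rounded to two decimal places
proportions = (theft_counts / theft_counts.sum()).round(2)
print(proportions)
```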

Make sure that:

- all missing values are filled with zeros. A zero in this case means no crimes of that type were committed that day;
- the data is rounded to the second decimal place; and
- the data frame is printed at the end of the notebook.
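A minimal sketch of how the first two requirements could be checked on a finished table. `check_requirements` is a hypothetical helper, not part of the assignment; it assumes every value in the table is a proportion between 0 and 1.

```python
import pandas as pd

def check_requirements(table: pd.DataFrame) -> bool:
    """Hypothetical helper: True if the table has no missing values,
    is already rounded to two decimals, and holds only values in [0, 1]."""
    no_missing = not table.isna().any().any()
    # Re-rounding an already-rounded table should leave it unchanged
    is_rounded = table.equals(table.round(2))
    # Proportions must lie between 0 and 1
    in_range = ((table >= 0) & (table <= 1)).all().all()
    return bool(no_missing and is_rounded and in_range)

# Hypothetical example table, not the Chicago data
example = pd.DataFrame({1: [0.67, 0.0], 2: [0.33, 1.0]}, index=["THEFT", "ARSON"])
print(check_requirements(example))
```

For instance, `check_requirements(crime_table)` could be run after the final cell of the notebook.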


## Things to keep in mind

To answer this question, we'll want to think carefully about assigning an index, aggregating data by group, and reshaping data. Everything you need is in the lecture notes.
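The three ideas (aggregating by group, assigning an index, reshaping) can be sketched end to end on a tiny hypothetical data frame. The column names mirror the Chicago data, but the rows are invented for illustration, and this is one possible route rather than the required one.

```python
import pandas as pd

# Hypothetical toy incidents (one row per crime), not the Chicago data
incidents = pd.DataFrame({
    "primary_type": ["THEFT", "THEFT", "THEFT", "ARSON"],
    "day": [1, 1, 2, 2],
})

# Aggregate: count incidents per (crime type, day) pair;
# the result is a Series indexed by (primary_type, day)
counts = incidents.groupby(["primary_type", "day"]).size()

# Reshape: move 'day' from the row index into the columns;
# missing (type, day) pairs become 0 instead of NaN
wide = counts.unstack("day", fill_value=0)

# Divide each row by its row total so every row sums to 1, then round
proportions = wide.div(wide.sum(axis=1), axis=0).round(2)
print(proportions)
```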


In [2]:
import pandas as pd
from dfply import *

In [3]:
# read in the data and observe the included columns
crime = pd.read_csv("chicago_summer_2018_crime_data.csv")
crime.head()

|   | month | day | year | day_of_week | description | location_description | block | primary_type | district | ward | arrest | domestic | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | 4 | 2018 | Saturday | FROM BUILDING | APARTMENT | 039XX W WASHINGTON BLVD | THEFT | 11 | 28.0 | False | False |  |  |
| 1 | 7 | 26 | 2018 | Thursday | POCKET-PICKING | RESTAURANT | 005XX W MADISON ST | THEFT | 1 | 42.0 | False | False |  |  |
| 2 | 6 | 24 | 2018 | Sunday | BOGUS CHECK | GROCERY FOOD STORE | 004XX E 34TH ST | DECEPTIVE PRACTICE | 2 | 4.0 | False | False |  |  |
| 3 | 6 | 13 | 2018 | Wednesday | SIMPLE | RESIDENCE | 098XX S EXCHANGE AVE | ASSAULT | 4 | 10.0 | False | True |  |  |
| 4 | 6 | 14 | 2018 | Thursday | TO VEHICLE | STREET | 001XX S WALLER AVE | CRIMINAL DAMAGE | 15 | 29.0 | False | False |  |  |


In [4]:
# group the incidents by crime type and day of the month
# count the incidents in each group and store the result in a new column, counts

crime_days = crime.groupby(['primary_type','day']).size().reset_index(name='counts')
crime_days

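|   | primary_type | day | counts |
|---|---|---|---|
| 0 | ARSON | 1 | 4 |
| 1 | ARSON | 2 | 3 |
| 2 | ARSON | 3 | 3 |
| 3 | ARSON | 4 | 2 |
| 4 | ARSON | 5 | 4 |
| ... | ... | ... | ... |
| 797 | WEAPONS VIOLATION | 27 | 46 |
| 798 | WEAPONS VIOLATION | 28 | 51 |
| 799 | WEAPONS VIOLATION | 29 | 66 |
| 800 | WEAPONS VIOLATION | 30 | 56 |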
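The long table above only contains (crime type, day) pairs that actually occurred; a combination with no incidents gets no row at all. A small hypothetical example shows why pivoting such a table produces missing values that later have to be filled with 0. The toy rows are invented, and `DataFrame.pivot` is used here only because each pair appears once.

```python
import pandas as pd

# Hypothetical toy incidents: ARSON never occurs on day 1
incidents = pd.DataFrame({"primary_type": ["THEFT", "THEFT", "ARSON"], "day": [1, 2, 2]})

# Same pattern as the notebook: one row per (type, day) pair that occurred
long_counts = incidents.groupby(["primary_type", "day"]).size().reset_index(name="counts")
print(long_counts)  # three rows; there is no (ARSON, 1) row

# Pivoting to wide form leaves a NaN where the (ARSON, 1) row was absent
wide = long_counts.pivot(index="primary_type", columns="day", values="counts")
print(wide)
```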


In [5]:
# pivot the long table so that crime type is the index, each day of the month is a column,
# and each cell holds the number of crimes of that type recorded on that day
crime_table = crime_days.pivot_table('counts', index='primary_type', columns='day')

# fill missing values with 0: a missing cell means no crime of that type was recorded on that day
crime_table = crime_table.fillna(0)

# for each crime type, sum the daily counts across all days of the month
crime_table['total'] = crime_table.sum(axis=1)
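`pivot_table` aggregates with the mean by default; that gives the right counts here only because each (type, day) pair appears once in `crime_days`. A sketch of a variant that makes the aggregation explicit and folds the `fillna(0)` step into the pivot through `fill_value`, run on hypothetical stand-in data (`toy_days` and `toy_table` are illustrative names, not notebook variables):

```python
import pandas as pd

# Hypothetical long-format counts with the same columns as crime_days
toy_days = pd.DataFrame({
    "primary_type": ["ARSON", "THEFT", "THEFT"],
    "day": [2, 1, 2],
    "counts": [1, 2, 1],
})

# aggfunc="sum" states the aggregation explicitly (the default is "mean"),
# and fill_value=0 replaces the separate fillna(0) step
toy_table = toy_days.pivot_table(
    values="counts", index="primary_type", columns="day",
    aggfunc="sum", fill_value=0,
)
toy_table["total"] = toy_table.sum(axis=1)
print(toy_table)
```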

In [6]:
# for each crime type, divide every daily count by that type's total
# and round to the hundredths place; the 'total' column itself becomes 1.0
crime_table = crime_table.div(crime_table['total'], axis=0).round(2)
crime_table

| primary_type | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ARSON | 0.04 | 0.03 | 0.03 | 0.02 | 0.04 | 0.05 | 0.04 | 0.04 | 0.02 | 0.02 | ... | 0.01 | 0.05 | 0.01 | 0.02 | 0.01 | 0.03 | 0.05 | 0.03 | 0.03 | 1.0 |
| ASSAULT | 0.04 | 0.03 | 0.03 | 0.04 | 0.04 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | ... | 0.03 | 0.04 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.02 | 1.0 |
| BATTERY | 0.04 | 0.04 | 0.03 | 0.04 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | ... | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.02 | 1.0 |
| BURGLARY | 0.04 | 0.03 | 0.03 | 0.03 | 0.03 | 0.04 | 0.03 | 0.03 | 0.03 | 0.03 | ... | 0.03 | 0.04 | 0.03 | 0.03 | 0.04 | 0.03 | 0.03 | 0.03 | 0.02 | 1.0 |
| CONCEALED CARRY LICENSE VIOLATION | 0.05 | 0.02 | 0.05 | 0.05 | 0.02 | 0.05 | 0.05 | 0.0 | 0.02 | 0.05 | ... | 0.0 | 0.05 | 0.07 | 0.07 | 0.02 | 0.02 | 0.0 | 0.02 | 0.05 | 1.0 |
| CRIM SEXUAL ASSAULT | 0.06 | 0.02 | 0.04 | 0.05 | 0.04 | 0.04 | 0.03 | 0.04 | 0.03 | 0.03 | ... | 0.03 | 0.02 | 0.03 | 0.05 | 0.03 | 0.03 | 0.03 | 0.03 | 0.01 | 1.0 |
| CRIMINAL DAMAGE | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | ... | 0.04 | 0.03 | 0.03 | 0.04 | 0.04 | 0.03 | 0.03 | 0.03 | 0.02 | 1.0 |
| CRIMINAL TRESPASS | 0.04 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.04 | 0.04 | 0.03 | ... | 0.04 | 0.04 | 0.03 | 0.03 | 0.04 | 0.04 | 0.03 | 0.03 | 0.02 | 1.0 |
| DECEPTIVE PRACTICE | 0.04 | 0.04 | 0.03 | 0.03 | 0.03 | 0.04 | 0.03 | 0.03 | 0.03 | 0.03 | ... | 0.03 | 0.03 | 0.03 | 0.03 | 0.04 | 0.03 | 0.03 | 0.03 | 0.03 | 1.0 |
| GAMBLING | 0.07 | 0.03 | 0.02 | 0.01 | 0.03 | 0.02 | 0.03 | 0.03 | 0.05 | 0.04 | ... | 0.02 | 0.01 | 0.04 | 0.03 | 0.01 | 0.02 | 0.03 | 0.03 | 0.03 | 1.0 |
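As an alternative to the group, pivot, and divide steps, `pd.crosstab` with `normalize="index"` counts each (type, day) pair, fills absent pairs with 0, and divides each row by its row total in one call. The sketch below runs it on hypothetical toy rows; on the real data the call would take `crime['primary_type']` and `crime['day']`. Because it never creates a `total` column, the result contains only day columns; in the notebook's version, `crime_table.drop(columns='total')` would remove that helper column.

```python
import pandas as pd

# Hypothetical toy incidents, not the Chicago data
incidents = pd.DataFrame({
    "primary_type": ["THEFT", "THEFT", "THEFT", "ARSON"],
    "day": [1, 1, 2, 2],
})

# normalize="index" turns each row's counts into proportions of that row's total;
# (type, day) pairs with no incidents appear as 0
proportions = pd.crosstab(
    incidents["primary_type"], incidents["day"], normalize="index"
).round(2)
print(proportions)
```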
