## Coding Discussion 03 - Week 6
### 4 October 2020
#### Kryslette Bunyi

# Instructions

## Task

Please read in the Chicago Summer 2018 Crimes Dataset located in the repository folder.

Using the data wrangling methods covered in class this week, create a new data frame where:

- the **_unit of observation_** is the crime type (i.e. `primary_type`),
- the **_column variables_** correspond to the **_day of the month_**, and
- **_each cell_** is populated by the **_proportion of times that crime type was committed over all days of the month_**
    + For example, assume there were just two days in a month, with 2 thefts committed on the first day and 1 on the second. The _proportion_ of thefts committed on the first day would then be 0.67, and 0.33 on the second day.

Make sure that:

- all missing values are filled with zeros. A zero in this case means no crimes of that type were committed that day;
- the data is rounded to the second decimal place; and
- the data frame is printed at the end of the notebook.
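
The two-day theft example above can be checked with a small sketch. The toy frame below is made up for illustration (it is not the Chicago data), and the variable names `toy`, `counts`, and `props` are hypothetical.

```python
import pandas as pd

# Toy data (hypothetical): 2 thefts on day 1, 1 theft on day 2
toy = pd.DataFrame({
    "primary_type": ["THEFT", "THEFT", "THEFT"],
    "day": [1, 1, 2],
})

# Count incidents per crime type and day, with days as columns
counts = toy.groupby(["primary_type", "day"]).size().unstack("day", fill_value=0)

# Divide each row by its row total to get proportions, then round to 2 decimals
props = counts.div(counts.sum(axis=1), axis="index").round(2)
print(props)
```

Each row of `props` describes one crime type, and each column one day of the month, which is the layout the task asks for.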

In [1]:
# Load needed package/s
import pandas as pd

In [2]:
# Read in the data
dat = pd.read_csv("chicago_summer_2018_crime_data.csv",
                  index_col="primary_type") #Set primary_type as index

In [3]:
# Conduct visual inspection of data
dat

primary_type,month,day,year,day_of_week,description,location_description,block,district,ward,arrest,domestic,latitude,longitude
THEFT,8,4,2018,Saturday,FROM BUILDING,APARTMENT,039XX W WASHINGTON BLVD,11,28.0,False,False,,
THEFT,7,26,2018,Thursday,POCKET-PICKING,RESTAURANT,005XX W MADISON ST,1,42.0,False,False,,
DECEPTIVE PRACTICE,6,24,2018,Sunday,BOGUS CHECK,GROCERY FOOD STORE,004XX E 34TH ST,2,4.0,False,False,,
ASSAULT,6,13,2018,Wednesday,SIMPLE,RESIDENCE,098XX S EXCHANGE AVE,4,10.0,False,True,,
CRIMINAL DAMAGE,6,14,2018,Thursday,TO VEHICLE,STREET,001XX S WALLER AVE,15,29.0,False,False,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
BATTERY,6,1,2018,Friday,AGGRAVATED DOMESTIC BATTERY: KNIFE/CUTTING INST,SIDEWALK,085XX S ABERDEEN ST,6,21.0,False,True,41.738877,-87.650992
ASSAULT,6,1,2018,Friday,AGGRAVATED: HANDGUN,ALLEY,045XX W DEMING PL,25,31.0,False,False,41.927301,-87.740067
CRIMINAL TRESPASS,6,1,2018,Friday,TO RESIDENCE,APARTMENT,036XX W GEORGE ST,25,30.0,True,False,41.933720,-87.718571
BATTERY,6,1,2018,Friday,DOMESTIC BATTERY SIMPLE,RESIDENCE,046XX W WEST END AVE,11,28.0,True,True,41.883171,-87.741651


In [4]:
# Summarize data by counting number of observations per day per crime type
sum1 = dat.groupby(["day","primary_type"]).size()
sum1

day  primary_type                     
1    ARSON                                  4
     ASSAULT                              207
     BATTERY                              511
     BURGLARY                             126
     CONCEALED CARRY LICENSE VIOLATION      2
                                         ... 
31   PUBLIC PEACE VIOLATION                 5
     ROBBERY                               53
     SEX OFFENSE                            6
     THEFT                                400
     WEAPONS VIOLATION                     39
Length: 802, dtype: int64

In [5]:
# Configure display so that all columns will be visible
pd.set_option('display.max_columns', None)

# Unstack so that "day" (index level 0) becomes the column heading
sum2 = sum1.unstack(0)
sum2

primary_type \ day,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
ARSON,4.0,3.0,3.0,2.0,4.0,6.0,5.0,5.0,2.0,2.0,8.0,8.0,2.0,3.0,3.0,2.0,5.0,1.0,5.0,5.0,3.0,5.0,1.0,6.0,1.0,2.0,1.0,3.0,6.0,3.0,3.0
ASSAULT,207.0,188.0,172.0,202.0,209.0,197.0,172.0,195.0,161.0,154.0,170.0,213.0,187.0,185.0,175.0,187.0,195.0,186.0,183.0,179.0,175.0,168.0,182.0,200.0,167.0,174.0,187.0,177.0,194.0,161.0,133.0
BATTERY,511.0,495.0,489.0,576.0,488.0,400.0,455.0,474.0,432.0,438.0,448.0,457.0,456.0,452.0,463.0,465.0,502.0,544.0,482.0,412.0,398.0,423.0,439.0,476.0,485.0,460.0,393.0,450.0,432.0,442.0,274.0
BURGLARY,126.0,109.0,118.0,117.0,101.0,126.0,99.0,96.0,107.0,110.0,111.0,90.0,105.0,102.0,95.0,108.0,102.0,115.0,97.0,130.0,115.0,135.0,102.0,119.0,104.0,104.0,140.0,113.0,107.0,108.0,79.0
CONCEALED CARRY LICENSE VIOLATION,2.0,1.0,2.0,2.0,1.0,2.0,2.0,,1.0,2.0,3.0,2.0,1.0,1.0,1.0,2.0,1.0,1.0,2.0,1.0,,1.0,,2.0,3.0,3.0,1.0,1.0,,1.0,2.0
CRIM SEXUAL ASSAULT,24.0,9.0,17.0,21.0,16.0,18.0,13.0,16.0,13.0,13.0,12.0,9.0,8.0,13.0,14.0,20.0,18.0,12.0,13.0,11.0,11.0,14.0,11.0,10.0,14.0,21.0,11.0,15.0,15.0,13.0,5.0
CRIMINAL DAMAGE,254.0,242.0,250.0,265.0,264.0,246.0,251.0,250.0,272.0,235.0,244.0,275.0,234.0,274.0,247.0,286.0,258.0,260.0,251.0,246.0,214.0,287.0,289.0,260.0,273.0,295.0,278.0,265.0,229.0,267.0,170.0
CRIMINAL TRESPASS,63.0,61.0,60.0,52.0,57.0,47.0,57.0,64.0,64.0,58.0,55.0,51.0,59.0,68.0,61.0,49.0,49.0,57.0,56.0,57.0,46.0,51.0,69.0,64.0,60.0,58.0,67.0,65.0,50.0,60.0,44.0
DECEPTIVE PRACTICE,202.0,171.0,163.0,135.0,157.0,182.0,148.0,121.0,136.0,148.0,159.0,140.0,160.0,132.0,171.0,148.0,124.0,166.0,140.0,171.0,152.0,127.0,137.0,154.0,150.0,134.0,172.0,158.0,133.0,153.0,140.0
GAMBLING,8.0,3.0,2.0,1.0,4.0,2.0,4.0,3.0,6.0,5.0,3.0,7.0,9.0,6.0,2.0,2.0,3.0,5.0,6.0,5.0,3.0,2.0,2.0,1.0,5.0,4.0,1.0,2.0,3.0,3.0,3.0
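
The empty cells in this table (for example, CONCEALED CARRY LICENSE VIOLATION on day 8) are crime/day pairs that never appear in the grouped counts, so `unstack` leaves them as `NaN`. One possible alternative, sketched here on made-up toy data rather than the real file, is to pass `fill_value=0` to `unstack` so that missing pairs become zeros up front. The names `toy`, `toy_counts`, and `toy_wide` are hypothetical.

```python
import pandas as pd

# Toy data (hypothetical): ARSON occurs on days 1 and 2, GAMBLING only on day 2
toy = pd.DataFrame({
    "primary_type": ["ARSON", "ARSON", "GAMBLING"],
    "day": [1, 2, 2],
}).set_index("primary_type")

# Same grouping as the notebook: counts per (day, primary_type)
toy_counts = toy.groupby(["day", "primary_type"]).size()

# Move "day" (level 0) to the columns; absent pairs become 0 instead of NaN
toy_wide = toy_counts.unstack(0, fill_value=0)
print(toy_wide)
```

With this variant, the later `fillna(0)` step would have nothing left to fill.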


In [6]:
# Take the total number of incidents per crime type across all days of the month
totals = sum2.sum(axis=1)
totals

primary_type
ARSON                                  112.0
ASSAULT                               5635.0
BATTERY                              14111.0
BURGLARY                              3390.0
CONCEALED CARRY LICENSE VIOLATION       44.0
CRIM SEXUAL ASSAULT                    430.0
CRIMINAL DAMAGE                       7931.0
CRIMINAL TRESPASS                     1779.0
DECEPTIVE PRACTICE                    4684.0
GAMBLING                               115.0
HOMICIDE                               172.0
HUMAN TRAFFICKING                        2.0
INTERFERENCE WITH PUBLIC OFFICER       374.0
INTIMIDATION                            54.0
KIDNAPPING                              47.0
LIQUOR LAW VIOLATION                    83.0
MOTOR VEHICLE THEFT                   2608.0
NARCOTICS                             3047.0
NON-CRIMINAL                             8.0
NON-CRIMINAL (SUBJECT SPECIFIED)         2.0
OBSCENITY                               21.0
OFFENSE INVOLVING CHILDREN             532

In [7]:
# Convert crime counts to proportions of each crime type's total and round to 2 decimal places
sum2_pct = sum2.div(totals, axis="index").round(2)

#Replace missing/NA values with 0
sum2_pct = sum2_pct.fillna(0)

#Print final data frame
sum2_pct

primary_type \ day,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
ARSON,0.04,0.03,0.03,0.02,0.04,0.05,0.04,0.04,0.02,0.02,0.07,0.07,0.02,0.03,0.03,0.02,0.04,0.01,0.04,0.04,0.03,0.04,0.01,0.05,0.01,0.02,0.01,0.03,0.05,0.03,0.03
ASSAULT,0.04,0.03,0.03,0.04,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.02
BATTERY,0.04,0.04,0.03,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.04,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.02
BURGLARY,0.04,0.03,0.03,0.03,0.03,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.04,0.03,0.04,0.03,0.04,0.03,0.03,0.04,0.03,0.03,0.03,0.02
CONCEALED CARRY LICENSE VIOLATION,0.05,0.02,0.05,0.05,0.02,0.05,0.05,0.0,0.02,0.05,0.07,0.05,0.02,0.02,0.02,0.05,0.02,0.02,0.05,0.02,0.0,0.02,0.0,0.05,0.07,0.07,0.02,0.02,0.0,0.02,0.05
CRIM SEXUAL ASSAULT,0.06,0.02,0.04,0.05,0.04,0.04,0.03,0.04,0.03,0.03,0.03,0.02,0.02,0.03,0.03,0.05,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.02,0.03,0.05,0.03,0.03,0.03,0.03,0.01
CRIMINAL DAMAGE,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.04,0.03,0.03,0.03,0.03,0.03,0.04,0.04,0.03,0.03,0.04,0.04,0.03,0.03,0.03,0.02
CRIMINAL TRESPASS,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.04,0.04,0.03,0.03,0.03,0.03,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.04,0.04,0.03,0.03,0.04,0.04,0.03,0.03,0.02
DECEPTIVE PRACTICE,0.04,0.04,0.03,0.03,0.03,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.04,0.03,0.03,0.04,0.03,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.04,0.03,0.03,0.03,0.03
GAMBLING,0.07,0.03,0.02,0.01,0.03,0.02,0.03,0.03,0.05,0.04,0.03,0.06,0.08,0.05,0.02,0.02,0.03,0.04,0.05,0.04,0.03,0.02,0.02,0.01,0.04,0.03,0.01,0.02,0.03,0.03,0.03
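
As a cross-check, `pd.crosstab` with `normalize="index"` should produce the same kind of table in one step. The sketch below uses made-up toy data and hypothetical variable names (`toy`, `toy_props`, `row_sums`); it is an alternative route, not the method used in this notebook.

```python
import pandas as pd

# Toy data (hypothetical), not the Chicago dataset
toy = pd.DataFrame({
    "primary_type": ["THEFT", "THEFT", "THEFT", "ARSON"],
    "day": [1, 1, 2, 2],
})

# Rows: crime type; columns: day; cells: share of that crime type's incidents
unrounded = pd.crosstab(toy["primary_type"], toy["day"], normalize="index")
toy_props = unrounded.round(2)
print(toy_props)

# Before rounding, the proportions in every row sum to 1
row_sums = unrounded.sum(axis=1)
print(row_sums)
```

After rounding to two decimals, row sums can drift slightly from 1, so a check like this is best run on the unrounded proportions.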
