Please read in the Chicago Summer 2018 Crimes Dataset located in the repository folder.

Using the data wrangling methods covered in class this week, create a new data frame where:

- the unit of observation is the crime type (i.e. primary_type),
- the column variables correspond to the days of the month, and
- each cell holds the proportion of that crime type's incidents that were committed on that day, out of all days of the month.
For example, if a month had only two days, with 2 thefts committed on the first day and 1 on the second, the proportion of thefts committed on the first day would be 2/3 ≈ 0.67 and on the second day 1/3 ≈ 0.33.
Make sure that:

- all missing values are filled with zeros (here a zero means no crimes of that type were committed that day);
- the data is rounded to the second decimal place; and
- the data frame is printed at the end of the notebook.
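The arithmetic in the two-day example can be checked with a few lines of pandas. This is a minimal sketch on hypothetical toy data (three theft records, not the Chicago file), using `value_counts(normalize=True)` to turn per-day counts into shares of the total.

```python
import pandas as pd

# Hypothetical toy data from the worked example: two thefts on day 1, one on day 2.
thefts = pd.DataFrame({"primary_type": ["THEFT", "THEFT", "THEFT"],
                       "day": [1, 1, 2]})

# normalize=True divides each day's count by the total number of thefts,
# so the shares sum to 1; round(2) matches the two-decimal requirement.
shares = thefts["day"].value_counts(normalize=True).sort_index().round(2)
print(shares.to_dict())  # {1: 0.67, 2: 0.33}
```

Note that 2/3 rounds to 0.67 rather than 0.66, because rounding (not truncation) is what the task asks for.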

To answer this question, we'll want to think carefully about assigning an index, aggregating data by groups, and reshaping the data. Everything you need is in the lecture notes.
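The three steps named above (index, aggregate by group, reshape) can be chained directly in pandas. The sketch below is one possible pipeline, not necessarily the method from the lecture notes; it assumes hypothetical toy records with the same `primary_type`, `month` and `day` columns as the Chicago file, and uses `groupby(...).size()` followed by `unstack(..., fill_value=0)`.

```python
import pandas as pd

# Hypothetical toy records standing in for the Chicago data.
crimes = pd.DataFrame({
    "primary_type": ["THEFT", "THEFT", "THEFT", "ARSON"],
    "month": [6, 6, 6, 6],
    "day": [1, 1, 2, 3],
})

# 1) Aggregate: count crimes in each (crime type, month, day) group.
counts = crimes.groupby(["primary_type", "month", "day"]).size()

# 2) Reshape: move 'day' from the row index into the columns;
#    days with no crimes for a given row are filled with 0.
wide = counts.unstack("day", fill_value=0)

# 3) Normalize each row by its total so each row's proportions sum to 1.
props = wide.div(wide.sum(axis=1), axis=0).round(2)
print(props)
```

In this toy table the THEFT row becomes 0.67, 0.33, 0.0 and the ARSON row becomes 0.0, 0.0, 1.0.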

In [1]:
import pandas as pd
import numpy as np

In [2]:
Chicago_crime = pd.read_csv(r'chicago_summer_2018_crime_data.csv')
Chicago_crime.columns

Index(['month', 'day', 'year', 'day_of_week', 'description',
       'location_description', 'block', 'primary_type', 'district', 'ward',
       'arrest', 'domestic', 'latitude', 'longitude'],
      dtype='object')

In [3]:
#We have crime data for June, July and August of 2018
Chicago_crime.month.unique()

array([8, 7, 6], dtype=int64)

In [4]:
#All 31 possible day numbers are present, so we'll use the day column as the columns of the pivot table
Chicago_crime.day.nunique()

31
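A dataset-wide `nunique()` of 31 does not mean every month has 31 days: June has only 30, so the June rows will always show 0 in the day-31 column. A per-month check makes this visible. The sketch below uses hypothetical toy records; on the real data the same expression would be `Chicago_crime.groupby('month')['day'].max()`.

```python
import pandas as pd

# Hypothetical toy records: June has 30 days, so day 31 can only come from July or August.
crimes = pd.DataFrame({"month": [6, 6, 7, 8], "day": [1, 30, 31, 31]})

# Distinct day numbers over the whole frame vs. the largest day seen in each month.
print(crimes["day"].nunique())                          # 3
print(crimes.groupby("month")["day"].max().to_dict())   # {6: 30, 7: 31, 8: 31}
```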

# Solution-1 (Cross tab function, normalized by index)

In [5]:
#Create a data frame at the crime type and month level, with one column per day.
#pd.crosstab builds a frequency table of the index/column combinations; combinations that never occur are counted as 0.
#normalize='index' divides each count by its row total, so each row holds proportions that sum to 1 (before rounding).

CC_by_monthnday = pd.crosstab(index=[Chicago_crime.primary_type, Chicago_crime.month], columns=Chicago_crime.day, normalize='index').round(2)
CC_by_monthnday.head()

primary_type,month,1,2,3,4,5,6,7,8,9,10,...,22,23,24,25,26,27,28,29,30,31
ARSON,6,0.06,0.03,0.06,0.03,0.03,0.03,0.06,0.03,0.0,0.03,...,0.06,0.03,0.14,0.0,0.03,0.03,0.0,0.03,0.03,0.0
ARSON,7,0.05,0.0,0.0,0.0,0.03,0.05,0.08,0.03,0.03,0.03,...,0.08,0.0,0.03,0.0,0.03,0.0,0.08,0.1,0.0,0.03
ARSON,8,0.0,0.05,0.03,0.03,0.05,0.08,0.0,0.08,0.03,0.0,...,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.03,0.05,0.05
ASSAULT,6,0.04,0.04,0.03,0.04,0.03,0.04,0.03,0.03,0.03,0.02,...,0.03,0.03,0.04,0.03,0.03,0.04,0.04,0.03,0.03,0.0
ASSAULT,7,0.04,0.04,0.03,0.03,0.03,0.04,0.03,0.04,0.03,0.03,...,0.03,0.03,0.04,0.03,0.03,0.03,0.02,0.04,0.02,0.04
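
Two properties of `normalize='index'` matter for the task: crime/day combinations that never occur already appear as 0 (so no separate `fillna` step is needed), and each row sums to 1 before rounding. The sketch below checks both on hypothetical toy records rather than the Chicago file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records: ARSON never occurs on day 2.
crimes = pd.DataFrame({
    "primary_type": ["ARSON", "THEFT", "THEFT", "THEFT"],
    "month": [6, 6, 6, 6],
    "day": [1, 1, 1, 2],
})

props = pd.crosstab(index=[crimes.primary_type, crimes.month],
                    columns=crimes.day, normalize="index")

# A day with no crimes of that type is already 0, not NaN.
print(props.loc[("ARSON", 6), 2])               # 0.0
# Before rounding, every row's proportions sum to 1.
print(np.allclose(props.sum(axis=1), 1.0))      # True
```

After `.round(2)` the row sums can drift slightly from 1 (e.g. 0.99 or 1.01), which is expected given the two-decimal requirement.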


In [6]:
#Flatten the (primary_type, month) row index into ordinary columns by going through a NumPy record array

data_set = pd.DataFrame(CC_by_monthnday.to_records())
data_set

,primary_type,month,1,2,3,4,5,6,7,8,...,22,23,24,25,26,27,28,29,30,31
0,ARSON,6,0.06,0.03,0.06,0.03,0.03,0.03,0.06,0.03,...,0.06,0.03,0.14,0.00,0.03,0.03,0.00,0.03,0.03,0.00
1,ARSON,7,0.05,0.00,0.00,0.00,0.03,0.05,0.08,0.03,...,0.08,0.00,0.03,0.00,0.03,0.00,0.08,0.10,0.00,0.03
2,ARSON,8,0.00,0.05,0.03,0.03,0.05,0.08,0.00,0.08,...,0.00,0.00,0.00,0.03,0.00,0.00,0.00,0.03,0.05,0.05
3,ASSAULT,6,0.04,0.04,0.03,0.04,0.03,0.04,0.03,0.03,...,0.03,0.03,0.04,0.03,0.03,0.04,0.04,0.03,0.03,0.00
4,ASSAULT,7,0.04,0.04,0.03,0.03,0.03,0.04,0.03,0.04,...,0.03,0.03,0.04,0.03,0.03,0.03,0.02,0.04,0.02,0.04
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
86,THEFT,7,0.03,0.04,0.03,0.02,0.03,0.03,0.03,0.02,...,0.03,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03
87,THEFT,8,0.03,0.04,0.04,0.04,0.03,0.03,0.04,0.03,...,0.03,0.03,0.03,0.03,0.02,0.03,0.03,0.03,0.03,0.03
88,WEAPONS VIOLATION,6,0.03,0.03,0.03,0.03,0.03,0.03,0.02,0.02,...,0.03,0.04,0.05,0.03,0.04,0.03,0.04,0.05,0.04,0.00
89,WEAPONS VIOLATION,7,0.03,0.02,0.04,0.06,0.04,0.03,0.04,0.04,...,0.03,0.04,0.02,0.03,0.03,0.03,0.04,0.06,0.03,0.03
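
The round trip through `to_records()` works, but pandas also offers `reset_index()`, which moves the index levels back into ordinary columns in one call. This is a swapped-in alternative, not the method used above; the sketch assumes a hypothetical toy table with the same (primary_type, month) row MultiIndex as `CC_by_monthnday`.

```python
import pandas as pd

# Hypothetical toy records with the same columns as the Chicago data.
crimes = pd.DataFrame({
    "primary_type": ["ARSON", "THEFT", "THEFT"],
    "month": [6, 6, 7],
    "day": [1, 1, 2],
})
table = pd.crosstab(index=[crimes.primary_type, crimes.month], columns=crimes.day)

# reset_index() turns the primary_type and month index levels into columns
# and gives the rows a fresh 0..n-1 integer index.
flat = table.reset_index()
print(flat.columns[:2].tolist())  # ['primary_type', 'month']
print(flat.shape)                 # (3, 4)
```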


In [7]:
print(f'Data structure of the object CC_by_monthnday is: {type(CC_by_monthnday)} \nShape of the desired dataframe is: {CC_by_monthnday.shape} ')


Data structure of the object CC_by_monthnday is: <class 'pandas.core.frame.DataFrame'> 
Shape of the desired dataframe is: (91, 31) 
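
The shape can be sanity-checked against the raw data: crosstab produces one row per (primary_type, month) pair that actually occurs and one column per day number that occurs. A pair with no crimes at all gets no row. The sketch below shows the check on hypothetical toy records; on the real data the same comparison would use `Chicago_crime` in place of `crimes`.

```python
import pandas as pd

# Hypothetical toy records standing in for the Chicago data.
crimes = pd.DataFrame({
    "primary_type": ["ARSON", "THEFT", "THEFT", "THEFT"],
    "month": [6, 6, 6, 7],
    "day": [1, 1, 2, 31],
})
props = pd.crosstab(index=[crimes.primary_type, crimes.month],
                    columns=crimes.day, normalize="index")

# Expected shape: (number of observed (type, month) pairs, number of distinct days).
n_pairs = crimes.groupby(["primary_type", "month"]).ngroups
n_days = crimes["day"].nunique()
print(props.shape, (n_pairs, n_days))  # (3, 3) (3, 3)
```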


# Solution-2 (2-D NumPy array for proportion calculations)

In [8]:
#Drop all columns except the ones we need (column-wise selection)
CC_subset = Chicago_crime.loc[:,['primary_type','month','day']]
CC_subset.head()

,primary_type,month,day
0,THEFT,8,4
1,THEFT,7,26
2,DECEPTIVE PRACTICE,6,24
3,ASSAULT,6,13
4,CRIMINAL DAMAGE,6,14


In [9]:
#Pivot table of crime counts per (primary_type, month) and day; missing combinations are filled with 0.
#The counts are integers, so rounding to two decimals is done after the proportions are computed.
data = pd.pivot_table(CC_subset, index=['primary_type', 'month'], columns='day', aggfunc=len, fill_value=0)
data=pd.DataFrame(data.to_records())

#We will convert the day columns in the above data frame into a 2-D numpy array. 
#We will then calculate proportions on the 2-D numpy array. 
#Once we have the proportions, we convert the 2-D array into a dataframe and insert primary_type and month columns.

primary_type = data.primary_type
month = data.month

data = data.set_index(['primary_type', 'month'])
data = data.to_numpy(dtype=float)

for row in range(data.shape[0]):
    data[row] = data[row] / np.sum(data[row])  #Calculating proportions

#Numpy 2-D array to pandas dataframe
column_index = np.arange(1, 32)  #Days 1-31; every day number appears in the data (checked above)
final = pd.DataFrame(data, columns=column_index).round(2)

#Insert primary_type and month as the first and second columns (positions 0 and 1)
final.insert(0, 'primary_type', primary_type)
final.insert(1, 'month', month)
final

,primary_type,month,1,2,3,4,5,6,7,8,...,22,23,24,25,26,27,28,29,30,31
0,ARSON,6,0.06,0.03,0.06,0.03,0.03,0.03,0.06,0.03,...,0.06,0.03,0.14,0.00,0.03,0.03,0.00,0.03,0.03,0.00
1,ARSON,7,0.05,0.00,0.00,0.00,0.03,0.05,0.08,0.03,...,0.08,0.00,0.03,0.00,0.03,0.00,0.08,0.10,0.00,0.03
2,ARSON,8,0.00,0.05,0.03,0.03,0.05,0.08,0.00,0.08,...,0.00,0.00,0.00,0.03,0.00,0.00,0.00,0.03,0.05,0.05
3,ASSAULT,6,0.04,0.04,0.03,0.04,0.03,0.04,0.03,0.03,...,0.03,0.03,0.04,0.03,0.03,0.04,0.04,0.03,0.03,0.00
4,ASSAULT,7,0.04,0.04,0.03,0.03,0.03,0.04,0.03,0.04,...,0.03,0.03,0.04,0.03,0.03,0.03,0.02,0.04,0.02,0.04
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
86,THEFT,7,0.03,0.04,0.03,0.02,0.03,0.03,0.03,0.02,...,0.03,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03
87,THEFT,8,0.03,0.04,0.04,0.04,0.03,0.03,0.04,0.03,...,0.03,0.03,0.03,0.03,0.02,0.03,0.03,0.03,0.03,0.03
88,WEAPONS VIOLATION,6,0.03,0.03,0.03,0.03,0.03,0.03,0.02,0.02,...,0.03,0.04,0.05,0.03,0.04,0.03,0.04,0.05,0.04,0.00
89,WEAPONS VIOLATION,7,0.03,0.02,0.04,0.06,0.04,0.03,0.04,0.04,...,0.03,0.04,0.02,0.03,0.03,0.03,0.04,0.06,0.03,0.03
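
The row-by-row `for` loop in In [9] can be replaced by a single broadcast division. The sketch below compares both on a hypothetical 2-D count array (rows stand for (primary_type, month) pairs, columns for days). It assumes every row has at least one crime, which holds here because crosstab and pivot_table only create rows for observed pairs; an all-zero row would produce NaN in both versions.

```python
import numpy as np

# Hypothetical 2-D array of counts: rows are (primary_type, month) pairs, columns are days.
counts = np.array([[2.0, 1.0, 0.0],
                   [0.0, 0.0, 4.0]])

# Loop version, as in In [9]: divide each row by its own total.
loop_props = counts.copy()
for row in range(loop_props.shape[0]):
    loop_props[row] = loop_props[row] / np.sum(loop_props[row])

# Vectorized version: keepdims=True keeps the row sums as an (n_rows, 1) column,
# which broadcasts across every day column in one division.
vec_props = counts / counts.sum(axis=1, keepdims=True)

print(np.allclose(loop_props, vec_props))  # True
print(vec_props.round(2))
```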
