# Task

## Please read in the Chicago Summer 2018 Crimes Dataset located in the repository folder.

Using the data wrangling methods covered in class this week, create a new data frame where:

- the unit of observation is the crime type (i.e. `primary_type`),
- the column variables correspond to the days of the month, and
- each cell is populated by the proportion of times that crime type was committed on that day, out of all days of the month.

For example, assume there were just two days in a month, with 2 thefts committed on the first day and 1 on the second. The proportion of thefts committed on the first day would then be .67, and .33 on the second day.

Make sure that:

- all missing values are filled with zeros (a zero here means no crimes of that type were committed that day);
- the data is rounded to the second decimal place; and
- the data frame is printed at the end of the notebook.
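
The two-day example in the task statement can be checked with a small sketch. The `toy` data frame below is made up for illustration and is not part of the Chicago dataset; it only mirrors the `primary_type` and `day` column names.

```python
import pandas as pd

# Toy data (hypothetical): 2 thefts on day 1 and 1 theft on day 2.
toy = pd.DataFrame({
    "primary_type": ["THEFT", "THEFT", "THEFT"],
    "day": [1, 1, 2],
})

# Count crimes per (type, day), then divide each count by that type's total.
counts = toy.groupby(["primary_type", "day"]).size()
totals = counts.groupby(level="primary_type").transform("sum")
proportions = (counts / totals).round(2)
print(proportions)
```

Rounded to two decimals, the shares come out as 0.67 for the first day and 0.33 for the second.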


In [42]:
import pandas as pd
crime_dta = pd.read_csv("chicago_summer_2018_crime_data.csv")
#Show the first 10 rows
crime_dta.head(10)

Unnamed: 0,month,day,year,day_of_week,description,location_description,block,primary_type,district,ward,arrest,domestic,latitude,longitude
0,8,4,2018,Saturday,FROM BUILDING,APARTMENT,039XX W WASHINGTON BLVD,THEFT,11,28.0,False,False,,
1,7,26,2018,Thursday,POCKET-PICKING,RESTAURANT,005XX W MADISON ST,THEFT,1,42.0,False,False,,
2,6,24,2018,Sunday,BOGUS CHECK,GROCERY FOOD STORE,004XX E 34TH ST,DECEPTIVE PRACTICE,2,4.0,False,False,,
3,6,13,2018,Wednesday,SIMPLE,RESIDENCE,098XX S EXCHANGE AVE,ASSAULT,4,10.0,False,True,,
4,6,14,2018,Thursday,TO VEHICLE,STREET,001XX S WALLER AVE,CRIMINAL DAMAGE,15,29.0,False,False,,
5,7,2,2018,Monday,CREDIT CARD FRAUD,RESIDENCE,083XX S JUSTINE ST,DECEPTIVE PRACTICE,6,21.0,False,False,,
6,6,1,2018,Friday,PREDATORY,RESIDENCE,087XX S COLFAX AVE,CRIM SEXUAL ASSAULT,4,7.0,False,False,,
7,7,25,2018,Wednesday,OVER $500,RESIDENCE,046XX S LAKE PARK AVE,THEFT,2,4.0,False,False,,
8,7,27,2018,Friday,CRIM SEX ABUSE BY FAM MEMBER,RESIDENCE,004XX E 40TH ST,OFFENSE INVOLVING CHILDREN,2,3.0,False,False,,
9,7,24,2018,Tuesday,FINANCIAL IDENTITY THEFT OVER $ 300,RESIDENCE,053XX S CORNELL AVE,DECEPTIVE PRACTICE,2,5.0,False,False,,


In [43]:
# Count the number of crimes of each primary_type in each month
crime_dta1 = crime_dta.groupby(['primary_type','month']).size().reset_index(name='month_amount')
crime_dta1

Unnamed: 0,primary_type,month,month_amount
0,ARSON,6,36
1,ARSON,7,39
2,ARSON,8,37
3,ASSAULT,6,1872
4,ASSAULT,7,1937
...,...,...,...
86,THEFT,7,6123
87,THEFT,8,6438
88,WEAPONS VIOLATION,6,495
89,WEAPONS VIOLATION,7,546


In [44]:
# Count the number of crimes of each primary_type on each day of each month
crime_dta2 = crime_dta.groupby(['primary_type','month','day']).size().reset_index(name='day_amount')
crime_dta2

Unnamed: 0,primary_type,month,day,day_amount
0,ARSON,6,1,2
1,ARSON,6,2,1
2,ARSON,6,3,2
3,ARSON,6,4,1
4,ARSON,6,5,1
...,...,...,...,...
2037,WEAPONS VIOLATION,8,27,15
2038,WEAPONS VIOLATION,8,28,13
2039,WEAPONS VIOLATION,8,29,12
2040,WEAPONS VIOLATION,8,30,22


In [45]:
# Attach each crime type's monthly total to its daily counts
crime_new = crime_dta2.merge(crime_dta1, how='left', on=["primary_type","month"], indicator=True)
print(crime_new)
# Compute each day's share of the monthly total (a proportion, despite the column name), rounded to 2 decimals
crime_new['percentage'] = (crime_new['day_amount']/crime_new['month_amount']).round(2)
# Drop the columns that are no longer needed
crime_new = crime_new.drop(columns=['_merge','month_amount'])
crime_new

           primary_type  month  day  day_amount  month_amount _merge
0                 ARSON      6    1           2            36   both
1                 ARSON      6    2           1            36   both
2                 ARSON      6    3           2            36   both
3                 ARSON      6    4           1            36   both
4                 ARSON      6    5           1            36   both
...                 ...    ...  ...         ...           ...    ...
2037  WEAPONS VIOLATION      8   27          15           560   both
2038  WEAPONS VIOLATION      8   28          13           560   both
2039  WEAPONS VIOLATION      8   29          12           560   both
2040  WEAPONS VIOLATION      8   30          22           560   both
2041  WEAPONS VIOLATION      8   31          23           560   both

[2042 rows x 6 columns]


Unnamed: 0,primary_type,month,day,day_amount,percentage
0,ARSON,6,1,2,0.06
1,ARSON,6,2,1,0.03
2,ARSON,6,3,2,0.06
3,ARSON,6,4,1,0.03
4,ARSON,6,5,1,0.03
...,...,...,...,...,...
2037,WEAPONS VIOLATION,8,27,15,0.03
2038,WEAPONS VIOLATION,8,28,13,0.02
2039,WEAPONS VIOLATION,8,29,12,0.02
2040,WEAPONS VIOLATION,8,30,22,0.04
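
As an aside, the same per-day share can probably be computed without building and merging a second data frame, by broadcasting the monthly total with `groupby(...).transform("sum")`. The sketch below is an alternative, not the approach used in this notebook, and it runs on a small made-up `sample` frame rather than the real data.

```python
import pandas as pd

# Small hypothetical sample using the notebook's column names.
sample = pd.DataFrame({
    "primary_type": ["ARSON", "ARSON", "ARSON", "THEFT", "THEFT"],
    "month": [6, 6, 6, 6, 6],
    "day": [1, 1, 2, 1, 3],
})

# Daily counts per crime type, as in the day_amount step above.
daily = (sample.groupby(["primary_type", "month", "day"])
               .size()
               .reset_index(name="day_amount"))

# transform("sum") returns the monthly total aligned row-by-row with `daily`,
# so no merge (and no `_merge` indicator column) is needed.
month_total = daily.groupby(["primary_type", "month"])["day_amount"].transform("sum")
daily["percentage"] = (daily["day_amount"] / month_total).round(2)
print(daily)
```

The result has the same columns as `crime_new` after the drop step: `primary_type`, `month`, `day`, `day_amount`, and `percentage`.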


In [46]:
# Pivot so that each row is a (primary_type, month) pair and each column is a day of the month;
# days with no recorded crimes of that type are filled with zero.
crime_new.pivot_table(values='percentage', columns='day', index=['primary_type','month'],
                fill_value=0)

primary_type,month,1,2,3,4,5,6,7,8,9,10,...,22,23,24,25,26,27,28,29,30,31
ARSON,6,0.06,0.03,0.06,0.03,0.03,0.03,0.06,0.03,0.00,0.03,...,0.06,0.03,0.14,0.00,0.03,0.03,0.00,0.03,0.03,0.00
ARSON,7,0.05,0.00,0.00,0.00,0.03,0.05,0.08,0.03,0.03,0.03,...,0.08,0.00,0.03,0.00,0.03,0.00,0.08,0.10,0.00,0.03
ARSON,8,0.00,0.05,0.03,0.03,0.05,0.08,0.00,0.08,0.03,0.00,...,0.00,0.00,0.00,0.03,0.00,0.00,0.00,0.03,0.05,0.05
ASSAULT,6,0.04,0.04,0.03,0.04,0.03,0.04,0.03,0.03,0.03,0.02,...,0.03,0.03,0.04,0.03,0.03,0.04,0.04,0.03,0.03,0.00
ASSAULT,7,0.04,0.04,0.03,0.03,0.03,0.04,0.03,0.04,0.03,0.03,...,0.03,0.03,0.04,0.03,0.03,0.03,0.02,0.04,0.02,0.04
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
THEFT,7,0.03,0.04,0.03,0.02,0.03,0.03,0.03,0.02,0.03,0.04,...,0.03,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03
THEFT,8,0.03,0.04,0.04,0.04,0.03,0.03,0.04,0.03,0.03,0.04,...,0.03,0.03,0.03,0.03,0.02,0.03,0.03,0.03,0.03,0.03
WEAPONS VIOLATION,6,0.03,0.03,0.03,0.03,0.03,0.03,0.02,0.02,0.04,0.03,...,0.03,0.04,0.05,0.03,0.04,0.03,0.04,0.05,0.04,0.00
WEAPONS VIOLATION,7,0.03,0.02,0.04,0.06,0.04,0.03,0.04,0.04,0.03,0.02,...,0.03,0.04,0.02,0.03,0.03,0.03,0.04,0.06,0.03,0.03
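
The table above has one row per (`primary_type`, `month`) pair. If the task is read literally, with crime type alone as the unit of observation, one possible variant pools the months and normalizes each crime type's row across days of the month. The sketch below uses `pd.crosstab` on a made-up `sample` frame; it is an assumption about an alternative reading, not the result produced in this notebook.

```python
import pandas as pd

# Hypothetical sample: only the two columns the table needs.
sample = pd.DataFrame({
    "primary_type": ["ARSON", "ARSON", "THEFT", "THEFT", "THEFT", "THEFT"],
    "day": [1, 3, 1, 1, 2, 3],
})

# normalize="index" divides each row by its row total, so each crime type's
# shares sum to 1 before rounding. Type/day combinations that never occur
# appear as 0, so no separate fill step is needed.
wide = pd.crosstab(sample["primary_type"], sample["day"], normalize="index").round(2)
print(wide)
```

Because the cells are rounded to two decimals, a row in the real data may sum to slightly more or less than 1.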
