# Coding Discussion 03
#### Madeline Kinnaird


In [1]:
## import pandas

import pandas as pd

## Load and check data:

In [3]:
## load the data 
data = pd.read_csv("chicago_summer_2018_crime_data.csv")

## take a look 
data.sample(5)

index,month,day,year,day_of_week,description,location_description,block,primary_type,district,ward,arrest,domestic,latitude,longitude
831,6,25,2018,Monday,FRAUD OR CONFIDENCE GAME,APARTMENT,041XX N RAVENSWOOD AVE,DECEPTIVE PRACTICE,19,47.0,False,False,41.95652,-87.673855
43354,7,10,2018,Tuesday,AUTOMOBILE,STREET,044XX S MICHIGAN AVE,MOTOR VEHICLE THEFT,2,3.0,False,False,41.814155,-87.622967
17637,8,11,2018,Saturday,TO PROPERTY,STREET,014XX W 73RD ST,CRIMINAL DAMAGE,7,17.0,False,False,41.76133,-87.659785
37851,7,17,2018,Tuesday,CREDIT CARD FRAUD,OTHER,008XX N MICHIGAN AVE,DECEPTIVE PRACTICE,18,42.0,False,False,41.897895,-87.624097
24942,7,22,2018,Sunday,PUBLIC INDECENCY,RESIDENTIAL YARD (FRONT/BACK),022XX N LINCOLN AVE,SEX OFFENSE,18,43.0,False,False,41.922479,-87.644611


In [4]:
## check columns and types
data.dtypes

month                     int64
day                       int64
year                      int64
day_of_week              object
description              object
location_description     object
block                    object
primary_type             object
district                  int64
ward                    float64
arrest                     bool
domestic                   bool
latitude                float64
longitude               float64
dtype: object

## Problem Statement:
Using the data wrangling methods covered in class this week, create a new data frame where:

- the **_unit of observation_** is the crime type (i.e. `primary_type`),
- the **_column variables_** correspond with the **_day of the month_**, and
- **_each cell_** is populated by the **_proportion of times that crime type was committed over all days of the month_**
    + For example, assume there were just two days in a month, with 2 thefts committed on the first day and 1 on the second. The _proportion_ of thefts committed on the first day would then be 0.67, and 0.33 on the second day.

Make sure that:

- all missing values are filled with zeros. A zero in this case means no crimes of that type were committed that day;
- the data is rounded to the second decimal place; and
- the data frame is printed at the end of the notebook.
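
The two-day theft example from the problem statement can be checked by hand. The sketch below uses only the standard library and a made-up list (`theft_days` is a hypothetical name, not a column in the Chicago file). Rounding 2/3 to the second decimal place gives 0.67.

```python
from collections import Counter

# Hypothetical toy data: the day of the month on which each theft occurred.
theft_days = [1, 1, 2]

counts = Counter(theft_days)   # 2 thefts on day 1, 1 theft on day 2
total = sum(counts.values())   # 3 thefts overall

# Share of thefts falling on each day, rounded to two decimals.
proportions = {day: round(n / total, 2) for day, n in counts.items()}
print(proportions)  # {1: 0.67, 2: 0.33}
```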

In [5]:
def agg_and_pivot(group_by, agg_by, df):
    '''
    Function:
    Groups the input data frame by the "group_by" variable and calculates, for each group,
    the proportion of rows that fall under each value of the "agg_by" variable.

    Inputs:
    group_by: column name of df, passed as a quoted string. Ex: 'primary_type'
    agg_by: column name of df, passed as a quoted string. Ex: 'day'
    df: your data frame

    Outputs:
    data frame with one row per "group_by" value and one column per "agg_by" value
    '''
    ## let's calculate the proportions that will be our table's values

    ## numerator: number of rows for each (group_by, agg_by) pair
    group_per_agg = df.groupby([group_by, agg_by]).size()

    ## denominator: number of rows for each group_by value
    count_of_group = df.groupby(group_by).size()

    ## dividing time! takes the ratio and rounds to 2 decimal places
    ratio = (group_per_agg / count_of_group).round(2)

    ## create a df to return; pivot so that "group_by" values form the rows and
    ## "agg_by" values form the columns, filling missing combinations with 0
    return_df = (pd.DataFrame({'Ratio per Unit of Time': ratio})
                 .pivot_table(
                     index=group_by,
                     columns=agg_by,
                     fill_value=0))

    return return_df
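
For comparison, the same kind of table can be sketched with `pd.crosstab` and `normalize="index"`, which divides each row by its row total. This is an alternative, not the approach used in `agg_and_pivot`; the names `crosstab_proportions` and `toy` are hypothetical, and the result has plain columns without the `'Ratio per Unit of Time'` level.

```python
import pandas as pd

def crosstab_proportions(df, group_by, agg_by):
    """Row-normalized counts of `agg_by` values within each `group_by` value."""
    # normalize="index" divides every row by its own total;
    # combinations that never occur come back as 0.
    table = pd.crosstab(df[group_by], df[agg_by], normalize="index")
    return table.round(2)

# Hypothetical toy data, not the Chicago file.
toy = pd.DataFrame({
    "primary_type": ["THEFT", "THEFT", "THEFT", "ARSON"],
    "day": [1, 1, 2, 2],
})

result = crosstab_proportions(toy, "primary_type", "day")
print(result)
```

On the toy data, THEFT gets 0.67 for day 1 and 0.33 for day 2, while ARSON gets 0 for day 1 because no arson row has that day.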

## Testing the function:

In [6]:
## bada bing bada boom
agg_and_pivot('primary_type', 'day', data)

Ratio per Unit of Time
day,1,2,3,4,5,6,7,8,9,10,...,22,23,24,25,26,27,28,29,30,31
primary_type
ARSON,0.04,0.03,0.03,0.02,0.04,0.05,0.04,0.04,0.02,0.02,...,0.04,0.01,0.05,0.01,0.02,0.01,0.03,0.05,0.03,0.03
ASSAULT,0.04,0.03,0.03,0.04,0.04,0.03,0.03,0.03,0.03,0.03,...,0.03,0.03,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.02
BATTERY,0.04,0.04,0.03,0.04,0.03,0.03,0.03,0.03,0.03,0.03,...,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.02
BURGLARY,0.04,0.03,0.03,0.03,0.03,0.04,0.03,0.03,0.03,0.03,...,0.04,0.03,0.04,0.03,0.03,0.04,0.03,0.03,0.03,0.02
CONCEALED CARRY LICENSE VIOLATION,0.05,0.02,0.05,0.05,0.02,0.05,0.05,0.0,0.02,0.05,...,0.02,0.0,0.05,0.07,0.07,0.02,0.02,0.0,0.02,0.05
CRIM SEXUAL ASSAULT,0.06,0.02,0.04,0.05,0.04,0.04,0.03,0.04,0.03,0.03,...,0.03,0.03,0.02,0.03,0.05,0.03,0.03,0.03,0.03,0.01
CRIMINAL DAMAGE,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,...,0.04,0.04,0.03,0.03,0.04,0.04,0.03,0.03,0.03,0.02
CRIMINAL TRESPASS,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.04,0.04,0.03,...,0.03,0.04,0.04,0.03,0.03,0.04,0.04,0.03,0.03,0.02
DECEPTIVE PRACTICE,0.04,0.04,0.03,0.03,0.03,0.04,0.03,0.03,0.03,0.03,...,0.03,0.03,0.03,0.03,0.03,0.04,0.03,0.03,0.03,0.03
GAMBLING,0.07,0.03,0.02,0.01,0.03,0.02,0.03,0.03,0.05,0.04,...,0.02,0.02,0.01,0.04,0.03,0.01,0.02,0.03,0.03,0.03
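
Because each cell is rounded to two decimals, the values in a row need not add up to exactly 1. A minimal sketch of that effect, assuming a hypothetical crime type spread perfectly evenly over 31 days (not a figure taken from the data):

```python
# Hypothetical case: one crime type spread evenly over 31 days.
days = 31
exact = [1 / days] * days               # each day holds about 0.0323 of the total
rounded = [round(p, 2) for p in exact]  # each cell becomes 0.03

print(round(sum(exact), 6))    # 1.0
print(round(sum(rounded), 2))  # 0.93
```

A row full of 0.03 values is therefore consistent with proportions that sum to 1 before rounding.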


## Just for fun, a couple others:

In [7]:
## check arrests by day of week
agg_and_pivot('arrest', 'day_of_week', data)

Ratio per Unit of Time
day_of_week,Friday,Monday,Saturday,Sunday,Thursday,Tuesday,Wednesday
arrest
False,0.15,0.14,0.14,0.14,0.14,0.14,0.14
True,0.16,0.13,0.15,0.15,0.14,0.13,0.14
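
The weekday columns come out in alphabetical order (Friday, Monday, ...). One way to put them in calendar order is `reindex(columns=...)`. The sketch below works on a small made-up frame with plain columns; `toy` and `calendar_order` are hypothetical names, and starting the week on Monday is an assumption.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'arrest' and 'day_of_week' columns.
toy = pd.DataFrame({
    "arrest": [True, False, False, True, False],
    "day_of_week": ["Monday", "Friday", "Monday", "Sunday", "Friday"],
})

calendar_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"]

# Row-normalized proportions; columns are sorted alphabetically at this point.
table = pd.crosstab(toy["arrest"], toy["day_of_week"], normalize="index").round(2)

# reindex reorders the columns; days absent from the toy data are filled with 0.
ordered = table.reindex(columns=calendar_order, fill_value=0)
print(ordered)
```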


In [8]:
## check type of crime by ward
agg_and_pivot('primary_type', 'ward', data)


Ratio per Unit of Time
ward,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,...,41.0,42.0,43.0,44.0,45.0,46.0,47.0,48.0,49.0,50.0
primary_type
ARSON,0.0,0.06,0.01,0.0,0.01,0.04,0.03,0.02,0.02,0.05,...,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.02,0.0
ASSAULT,0.01,0.04,0.03,0.02,0.03,0.04,0.03,0.03,0.04,0.02,...,0.01,0.04,0.01,0.01,0.01,0.01,0.0,0.01,0.01,0.01
BATTERY,0.01,0.04,0.03,0.02,0.02,0.04,0.03,0.03,0.03,0.02,...,0.01,0.04,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01
BURGLARY,0.02,0.02,0.02,0.02,0.03,0.03,0.04,0.04,0.03,0.02,...,0.01,0.01,0.02,0.01,0.01,0.01,0.02,0.01,0.02,0.02
CONCEALED CARRY LICENSE VIOLATION,0.0,0.05,0.05,0.02,0.0,0.05,0.05,0.0,0.05,0.0,...,0.16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
CRIM SEXUAL ASSAULT,0.02,0.05,0.02,0.03,0.02,0.03,0.03,0.03,0.03,0.01,...,0.0,0.06,0.01,0.02,0.01,0.03,0.01,0.03,0.02,0.01
CRIMINAL DAMAGE,0.01,0.04,0.03,0.02,0.03,0.04,0.03,0.03,0.03,0.02,...,0.01,0.02,0.01,0.01,0.01,0.01,0.01,0.01,0.02,0.01
CRIMINAL TRESPASS,0.01,0.04,0.03,0.02,0.03,0.03,0.02,0.02,0.03,0.02,...,0.02,0.06,0.02,0.04,0.01,0.02,0.01,0.02,0.03,0.01
DECEPTIVE PRACTICE,0.02,0.07,0.02,0.03,0.02,0.02,0.01,0.02,0.02,0.01,...,0.02,0.17,0.03,0.03,0.01,0.02,0.02,0.02,0.01,0.01
GAMBLING,0.0,0.03,0.01,0.0,0.03,0.02,0.03,0.05,0.03,0.01,...,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0
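
The ward labels print as `1.0`, `2.0`, ... because `ward` is stored as `float64` (see the `data.dtypes` output). A likely cause, assumed here rather than verified against the file, is missing ward values, since NumPy integer columns cannot hold NaN. A minimal sketch of converting such a column to pandas' nullable integer dtype, using a made-up series:

```python
import pandas as pd

# Hypothetical toy column: one missing ward forces pandas to store floats.
ward = pd.Series([43.0, 2.0, None, 43.0], name="ward")
print(ward.dtype)      # float64

# 'Int64' (capital I) is pandas' nullable integer dtype; the gap becomes <NA>.
ward_int = ward.astype("Int64")
print(ward_int.dtype)  # Int64
print(ward_int.tolist())
```

Applying the same conversion to the real column before calling `agg_and_pivot` should give integer column labels, though that was not run here.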


In [9]:
## checking out my old Chicago ward, s/o 43!

ward = agg_and_pivot('primary_type', 'ward', data)
ward['Ratio per Unit of Time'][43]


primary_type
ARSON                                0.00
ASSAULT                              0.01
BATTERY                              0.01
BURGLARY                             0.02
CONCEALED CARRY LICENSE VIOLATION    0.00
CRIM SEXUAL ASSAULT                  0.01
CRIMINAL DAMAGE                      0.01
CRIMINAL TRESPASS                    0.02
DECEPTIVE PRACTICE                   0.03
GAMBLING                             0.00
HOMICIDE                             0.00
HUMAN TRAFFICKING                    0.00
INTERFERENCE WITH PUBLIC OFFICER     0.01
INTIMIDATION                         0.02
KIDNAPPING                           0.00
LIQUOR LAW VIOLATION                 0.02
MOTOR VEHICLE THEFT                  0.01
NARCOTICS                            0.00
NON-CRIMINAL                         0.00
NON-CRIMINAL (SUBJECT SPECIFIED)     0.00
OBSCENITY                            0.00
OFFENSE INVOLVING CHILDREN           0.01
OTHER OFFENSE                        0.01
PROSTITUTION         