# Coding Discussion 03
# Joanne Lauer
# jml450

# Instructions

## Task

Please read in the Chicago Summer 2018 Crimes Dataset located in the repository folder.

Using the data wrangling methods covered in class this week, create a new data frame where:

- the **_unit of observation_** is the crime type (i.e. `primary_type`),
- the **_column variables_** correspond to the **_day of the month_**, and
- **_each cell_** is populated by the **_proportion of times that crime type was committed over all days of 
      the month_**
    + For example, assume there were just two days in a month, with 2 thefts committed on the first day 
        and 1 on the second day; then the _proportion_ of thefts committed on the first day would be .67 and 
        .33 on the second day.
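The arithmetic in this example can be checked with a short pandas sketch. The series name `thefts` and its two values are hypothetical, taken from the two-day example: each day's count is divided by the total over all days and rounded to two decimals, so 2/3 rounds to 0.67 and 1/3 to 0.33.

```python
import pandas as pd

# Hypothetical counts from the two-day example: 2 thefts on day 1, 1 theft on day 2
thefts = pd.Series({1: 2, 2: 1}, name="THEFT")

# Divide each day's count by the total over all days, then round to two decimals
proportions = (thefts / thefts.sum()).round(2)
print(proportions.tolist())  # [0.67, 0.33]
```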

Make sure that:

- all missing values are filled with zeros. Zeros in this case mean no crimes were committed that day;
- the data is rounded to the second decimal place; and
- the data frame is printed at the end of the notebook.


## Submit

Please submit your answer as a Jupyter Notebook in the `Submissions/` folder. Title the notebook with your 
lastname_firstname_netid (`doe_john_jd568.ipynb`). If you write any functions, include a docstring for each one 
indicating what the function does and all the arguments it takes. As usual, please submit your answer 
to the class repository by the Sunday 11:59 pm deadline.

## Things to keep in mind

To answer this question, we'll want to think carefully about assigning an index, aggregating data by 
groups, and reshaping data. Everything you need is in the lecture notes.
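Those three steps can be sketched end to end on a toy frame. This is a minimal sketch under one reading of the task, where the columns are the calendar day (1 to 31) pooled across June, July, and August; the rows in `crimes` are hypothetical, and only the `primary_type` and `day` columns (both present in the crimes file) are used. `groupby` does the aggregation, `unstack(fill_value=0)` reshapes days into columns and fills gaps with zeros, and dividing by each row's total gives the proportions.

```python
import pandas as pd

# Hypothetical stand-in for the crimes data: only primary_type and day matter here
crimes = pd.DataFrame({
    "primary_type": ["THEFT", "THEFT", "THEFT", "ARSON", "BATTERY", "BATTERY"],
    "day":          [1,       1,       2,       2,       1,         3],
})

props = (
    crimes
    .groupby(["primary_type", "day"])   # aggregate: one group per (crime type, day)
    .size()                             # number of crimes in each group
    .unstack(fill_value=0)              # reshape: days become columns, missing days -> 0
    .pipe(lambda t: t.div(t.sum(axis=1), axis=0))  # each cell / its row total
    .round(2)                           # round to the second decimal place
)
print(props)
```

Because `primary_type` ends up as the index, each row sums to 1 (up to rounding).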



In [2]:
# Import necessary libraries
import pandas as pd

In [3]:
# Import data file
df = pd.read_csv('chicago_summer_2018_crime_data.csv')
# View the first 5 and last 5 rows of data
df

       month  day  year day_of_week  \
0          8    4  2018    Saturday   
1          7   26  2018    Thursday   
2          6   24  2018      Sunday   
3          6   13  2018   Wednesday   
4          6   14  2018    Thursday   
...      ...  ...   ...         ...   
73368      6    1  2018      Friday   
73369      6    1  2018      Friday   
73370      6    1  2018      Friday   
73371      6    1  2018      Friday   
73372      6    1  2018      Friday   

                                           description location_description  \
0                                        FROM BUILDING            APARTMENT   
1                                       POCKET-PICKING           RESTAURANT   
2                                          BOGUS CHECK   GROCERY FOOD STORE   
3                                               SIMPLE            RESIDENCE   
4                                           TO VEHICLE               STREET   
...                        

In [4]:
# Verify the data covers only one year
uniqueYear = df['year'].unique()
print(uniqueYear)
# Verify how many months of data are in the df
uniqueMonth = df['month'].unique()
print(uniqueMonth)
# Build a single date column from the month, day, and year columns
df['date'] = pd.to_datetime(df[['month', 'day', 'year']])
# Count the crimes of each type on each date (groupby sorts by its keys, so no separate sort is needed)
dff = (df
    .filter(['date', 'primary_type'])
    .groupby(['date', 'primary_type'])
    .size()
    .reset_index(name='count')
)
dff

[2018]
[8 7 6]

           date            primary_type  count
0    2018-06-01                   ARSON      2
1    2018-06-01                 ASSAULT     69
2    2018-06-01                 BATTERY    165
3    2018-06-01                BURGLARY     41
4    2018-06-01     CRIM SEXUAL ASSAULT      6
...         ...                     ...    ...
2037 2018-08-31  PUBLIC PEACE VIOLATION      2
2038 2018-08-31                 ROBBERY     33
2039 2018-08-31             SEX OFFENSE      3
2040 2018-08-31                   THEFT    192
2041 2018-08-31       WEAPONS VIOLATION     23

[2042 rows x 3 columns]
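The date construction and counting in the cell above can be reproduced on two toy rows. The rows in `rows` are hypothetical; the sketch only relies on `pd.to_datetime` assembling dates from columns named `year`, `month`, and `day`, and on `groupby(...).size()` counting rows per group.

```python
import pandas as pd

# Two hypothetical rows with the same month/day/year/primary_type columns as the crimes file
rows = pd.DataFrame({
    "month": [8, 6],
    "day": [4, 13],
    "year": [2018, 2018],
    "primary_type": ["THEFT", "BATTERY"],
})

# to_datetime assembles a date from a DataFrame whose columns are named year, month and day
rows["date"] = pd.to_datetime(rows[["year", "month", "day"]])

# One row per (date, crime type), with the number of crimes in a "count" column
counts = rows.groupby(["date", "primary_type"]).size().reset_index(name="count")
print(counts)
```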

In [5]:
# Convert the df from long to wide, using primary_type as the index and the dates as the columns,
# and fill missing data with 0 (with no values= argument, 'count' becomes the top level of the columns)
sorted_df = dff.pivot_table(index='primary_type', columns=['date'], fill_value=0)
sorted_df

                                       count                                   \
date                              2018-06-01 2018-06-02 2018-06-03 2018-06-04   
primary_type                                                                    
ARSON                                      2          1          2          1   
ASSAULT                                   69         67         64         75   
BATTERY                                  165        178        198        149   
BURGLARY                                  41         31         35         35   
CONCEALED CARRY LICENSE VIOLATION          0          1          1          0   
CRIM SEXUAL ASSAULT                        6          4          8          5   
CRIMINAL DAMAGE                           86         76         91         75   
CRIMINAL TRESPASS                         18         14         23         19   
DECEPTIVE PRACTICE                        80         60         31         70  
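The extra `count` level above the dates comes from calling `pivot_table` without `values=`. A minimal sketch on a hypothetical three-row frame `counts` (same columns as `dff`) shows that naming the value column gives flat date columns instead. Since each (primary_type, date) pair appears once, the default aggregation (the mean) simply returns the count.

```python
import pandas as pd

# Hypothetical long-format counts with the same columns as dff
counts = pd.DataFrame({
    "date": pd.to_datetime(["2018-06-01", "2018-06-01", "2018-06-02"]),
    "primary_type": ["ARSON", "ASSAULT", "ASSAULT"],
    "count": [2, 69, 67],
})

# No values= argument: the remaining column ("count") becomes a top column level
wide_multi = counts.pivot_table(index="primary_type", columns="date", fill_value=0)

# values="count": the columns are just the dates; ARSON on 2018-06-02 is filled with 0
wide_flat = counts.pivot_table(index="primary_type", columns="date",
                               values="count", fill_value=0)

print(wide_multi.columns.nlevels, wide_flat.columns.nlevels)  # 2 1
```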

In [36]:
# Copy the June columns (the first 30 dates) into their own data frame for later use
june_df = sorted_df.iloc[:, 0:30].copy()
june_df

                                       count                                   \
date                              2018-06-01 2018-06-02 2018-06-03 2018-06-04   
primary_type                                                                    
ARSON                                      2          1          2          1   
ASSAULT                                   69         67         64         75   
BATTERY                                  165        178        198        149   
BURGLARY                                  41         31         35         35   
CONCEALED CARRY LICENSE VIOLATION          0          1          1          0   
CRIM SEXUAL ASSAULT                        6          4          8          5   
CRIMINAL DAMAGE                           86         76         91         75   
CRIMINAL TRESPASS                         18         14         23         19   
DECEPTIVE PRACTICE                        80         60         31         70  
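Slicing by column position (`[0:30]`, `[30:61]`, `[61:92]`) works only if every date from June 1 through August 31 appears exactly once and in order. A sketch of selecting a month by the date itself instead, on a hypothetical table `wide` whose columns are a flat `DatetimeIndex`; for `sorted_df`, whose columns have two levels, the dates would come from `sorted_df.columns.get_level_values('date')`.

```python
import pandas as pd

# Hypothetical wide table: crime types as rows, one column per date spanning June and July
dates = pd.date_range("2018-06-29", "2018-07-02", freq="D")
wide = pd.DataFrame([[1, 0, 2, 1], [3, 4, 5, 6]],
                    index=["ARSON", "THEFT"], columns=dates)

# Boolean mask over the column labels: keep only dates that fall in June
june = wide.loc[:, wide.columns.month == 6].copy()
print(list(june.columns.strftime("%Y-%m-%d")))  # ['2018-06-29', '2018-06-30']
```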

In [39]:
# Copy the July columns (the next 31 dates) into their own data frame for later use
july_df = sorted_df.iloc[:, 30:61].copy()
july_df

                                       count                                   \
date                              2018-07-01 2018-07-02 2018-07-03 2018-07-04   
primary_type                                                                    
ARSON                                      2          0          0          0   
ASSAULT                                   73         71         56         66   
BATTERY                                  191        167        134        225   
BURGLARY                                  47         37         53         34   
CONCEALED CARRY LICENSE VIOLATION          2          0          0          1   
CRIM SEXUAL ASSAULT                        9          3          1          5   
CRIMINAL DAMAGE                           87         84         85        103   
CRIMINAL TRESPASS                         25         24         23         16   
DECEPTIVE PRACTICE                        51         57         54         29  

In [40]:
# Copy the August columns (the last 31 dates) into their own data frame for later use
august_df = sorted_df.iloc[:, 61:92].copy()
august_df

                                       count                                   \
date                              2018-08-01 2018-08-02 2018-08-03 2018-08-04   
primary_type                                                                    
ARSON                                      0          2          1          1   
ASSAULT                                   65         50         52         61   
BATTERY                                  155        150        157        202   
BURGLARY                                  38         41         30         48   
CONCEALED CARRY LICENSE VIOLATION          0          0          1          1   
CRIM SEXUAL ASSAULT                        9          2          8         11   
CRIMINAL DAMAGE                           81         82         74         87   
CRIMINAL TRESPASS                         20         23         14         17   
DECEPTIVE PRACTICE                        71         54         78         36  

In [43]:
# Use apply with a lambda to express each value as a percentage of its row total, rounded to the
# second decimal place
junep_df = june_df.apply(lambda x: round(x / x.sum() * 100, 2), axis=1)
junep_df

                                       count                                   \
date                              2018-06-01 2018-06-02 2018-06-03 2018-06-04   
primary_type                                                                    
ARSON                                   5.56       2.78       5.56       2.78   
ASSAULT                                 3.69       3.58       3.42       4.01   
BATTERY                                 3.56       3.84       4.27       3.21   
BURGLARY                                3.88       2.94       3.31       3.31   
CONCEALED CARRY LICENSE VIOLATION       0.00       7.14       7.14       0.00   
CRIM SEXUAL ASSAULT                     4.23       2.82       5.63       3.52   
CRIMINAL DAMAGE                         3.33       2.94       3.52       2.90   
CRIMINAL TRESPASS                       3.05       2.37       3.90       3.22   
DECEPTIVE PRACTICE                      5.22       3.91       2.02       4.56  
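The row-wise `apply` with a lambda can also be written as one vectorized division by the row totals. A minimal sketch on a hypothetical table `month_counts`; it leaves out the `*100`, since the task asks for proportions (0 to 1) rather than percentages, and checks that both forms give the same numbers.

```python
import pandas as pd

# Hypothetical counts for two crime types over three days of one month
month_counts = pd.DataFrame(
    {"2018-06-01": [2, 0], "2018-06-02": [1, 1], "2018-06-03": [1, 3]},
    index=["ARSON", "CONCEALED CARRY LICENSE VIOLATION"],
)

# Vectorized: divide each row by its own total (axis=0 aligns the divisor on the row labels)
shares = month_counts.div(month_counts.sum(axis=1), axis=0).round(2)

# Row-wise apply/lambda, as in the cells above but without the *100
via_apply = month_counts.apply(lambda x: round(x / x.sum(), 2), axis=1)

print(shares)
```

The vectorized form avoids calling a Python function once per row, which matters more as the number of crime types grows.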

In [41]:
# Use apply with a lambda to express each value as a percentage of its row total, rounded to the
# second decimal place
julyp_df = july_df.apply(lambda x: round(x / x.sum() * 100, 2), axis=1)
julyp_df

                                       count                                   \
date                              2018-07-01 2018-07-02 2018-07-03 2018-07-04   
primary_type                                                                    
ARSON                                   5.13       0.00       0.00       0.00   
ASSAULT                                 3.77       3.67       2.89       3.41   
BATTERY                                 3.92       3.43       2.75       4.62   
BURGLARY                                4.14       3.26       4.67       3.00   
CONCEALED CARRY LICENSE VIOLATION      15.38       0.00       0.00       7.69   
CRIM SEXUAL ASSAULT                     6.21       2.07       0.69       3.45   
CRIMINAL DAMAGE                         3.20       3.09       3.12       3.79   
CRIMINAL TRESPASS                       4.21       4.04       3.87       2.69   
DECEPTIVE PRACTICE                      3.23       3.61       3.42       1.84  

In [42]:
# Use apply with a lambda to express each value as a percentage of its row total, rounded to the
# second decimal place
augustp_df = august_df.apply(lambda x: round(x / x.sum() * 100, 2), axis=1)
augustp_df

                                       count                                   \
date                              2018-08-01 2018-08-02 2018-08-03 2018-08-04   
primary_type                                                                    
ARSON                                   0.00       5.41       2.70       2.70   
ASSAULT                                 3.56       2.74       2.85       3.34   
BATTERY                                 3.37       3.26       3.41       4.39   
BURGLARY                                3.17       3.42       2.50       4.00   
CONCEALED CARRY LICENSE VIOLATION       0.00       0.00       5.88       5.88   
CRIM SEXUAL ASSAULT                     6.29       1.40       5.59       7.69   
CRIMINAL DAMAGE                         3.08       3.12       2.82       3.31   
CRIMINAL TRESPASS                       3.36       3.87       2.35       2.86   
DECEPTIVE PRACTICE                      4.52       3.44       4.97       2.29  
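Splitting `sorted_df` into three month frames and normalizing each one separately could instead be done in long format before pivoting. This is a sketch of that swapped-in approach, not the method used above: `groupby(...).transform('sum')` broadcasts each crime type's monthly total back onto its rows, and one pivot then covers all months. The frame `counts` and its values are hypothetical, shaped like `dff`, and the result is a proportion rather than a percentage.

```python
import pandas as pd

# Hypothetical long-format counts with the same columns as dff
counts = pd.DataFrame({
    "date": pd.to_datetime(["2018-06-01", "2018-06-02", "2018-07-01", "2018-07-02"]),
    "primary_type": ["ARSON"] * 4,
    "count": [2, 1, 2, 2],
})

# Total per crime type within each month, broadcast back onto every row of that month
month = counts["date"].dt.month.rename("month")
month_total = counts.groupby(["primary_type", month])["count"].transform("sum")

# Each row's share of its month's total, as a proportion rounded to two decimals
counts["proportion"] = (counts["count"] / month_total).round(2)

# A single pivot then covers every month at once, with missing days filled by 0
wide = counts.pivot_table(index="primary_type", columns="date",
                          values="proportion", fill_value=0)
print(wide)
```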

In [45]:
# Join the three monthly tables side by side on their shared primary_type index
new_df = junep_df.join(julyp_df).join(augustp_df)
new_df

                                       count                                   \
date                              2018-06-01 2018-06-02 2018-06-03 2018-06-04   
primary_type                                                                    
ARSON                                   5.56       2.78       5.56       2.78   
ASSAULT                                 3.69       3.58       3.42       4.01   
BATTERY                                 3.56       3.84       4.27       3.21   
BURGLARY                                3.88       2.94       3.31       3.31   
CONCEALED CARRY LICENSE VIOLATION       0.00       7.14       7.14       0.00   
CRIM SEXUAL ASSAULT                     4.23       2.82       5.63       3.52   
CRIMINAL DAMAGE                         3.33       2.94       3.52       2.90   
CRIMINAL TRESPASS                       3.05       2.37       3.90       3.22   
DECEPTIVE PRACTICE                      5.22       3.91       2.02       4.56  
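`pd.concat` with `axis=1` is another way to place the monthly tables side by side. A minimal sketch on hypothetical frames `june`, `july`, and `august`: concat aligns on the index with an outer join, so a crime type missing from one month still gets a row, and `fillna(0)` turns that gap into a zero. In this notebook all three tables come from `sorted_df` and share the same index, so the result should match the join above.

```python
import pandas as pd

# Hypothetical per-month tables indexed by crime type; August has no THEFT row
june = pd.DataFrame({"2018-06-01": [0.5, 1.0]}, index=["ARSON", "THEFT"])
july = pd.DataFrame({"2018-07-01": [1.0, 0.25]}, index=["ARSON", "THEFT"])
august = pd.DataFrame({"2018-08-01": [0.75]}, index=["ARSON"])

# Align on the index (outer join on crime type) and place the months side by side,
# then fill any crime type missing from a month with 0
combined = pd.concat([june, july, august], axis=1).fillna(0)
print(combined)
```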

In [46]:
print(new_df)

                                       count                                   \
date                              2018-06-01 2018-06-02 2018-06-03 2018-06-04   
primary_type                                                                    
ARSON                                   5.56       2.78       5.56       2.78   
ASSAULT                                 3.69       3.58       3.42       4.01   
BATTERY                                 3.56       3.84       4.27       3.21   
BURGLARY                                3.88       2.94       3.31       3.31   
CONCEALED CARRY LICENSE VIOLATION       0.00       7.14       7.14       0.00   
CRIM SEXUAL ASSAULT                     4.23       2.82       5.63       3.52   
CRIMINAL DAMAGE                         3.33       2.94       3.52       2.90   
CRIMINAL TRESPASS                       3.05       2.37       3.90       3.22   
DECEPTIVE PRACTICE                      5.22       3.91       2.02       4.56   
GAMBLING                    