# Coding Discussion 03
# Joanne Lauer
# jml450

# Instructions

## Task

Please read in the Chicago Summer 2018 Crimes Dataset located in the repository folder.

Using the data wrangling methods covered in class this week, create a new data frame where:

- the **_unit of observation_** is the crime type (i.e. `primary_type`),
- the **_column variables_** corresponds with the **_day of the month_**, and
- **_each cell_** is populated by the **_proportion of times that crime type was committed over all days of 
      the month_**
    + For example, assume there were just two days in a month and 2 thefts were committed on the first day, 
        and 1 on the second day, then the _proportion_ of thefts committed on the first day would be .66 and 
        .33 on the second day).

Make sure that:

- all missing values are filled with zeros. Zeros in this case means no crimes were committed that day;
- the data is rounded to the second decimal place; and
- the data frame is printed at the end of the notebook.


## Submit

Please submit your answer as a Jupyter Notebook in the `Submissions/` folder. Title the notebook with your 
lastname_firstname_netid (`doe_john_jd568.ipynb`). Be sure to submit a docstring if you write any functions 
indicating what your function does and all the arguments it takes.  As per usual, please submit your answer 
to the class repository by Sunday 11:59pm deadline.

## Things to keep in mind

To answer this question: we'll want to think carefully about assigning an index, aggregating data by 
groups, and reshaping data. Everything you need is in the lecture notes.



In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
from datetime import date
import csv

In [2]:
# Import data file
df = pd.read_csv ('chicago_summer_2018_crime_data.csv')
# View the first 5 and last 5 rows of data
df.head

<bound method NDFrame.head of        month  day  year day_of_week  \
0          8    4  2018    Saturday   
1          7   26  2018    Thursday   
2          6   24  2018      Sunday   
3          6   13  2018   Wednesday   
4          6   14  2018    Thursday   
...      ...  ...   ...         ...   
73368      6    1  2018      Friday   
73369      6    1  2018      Friday   
73370      6    1  2018      Friday   
73371      6    1  2018      Friday   
73372      6    1  2018      Friday   

                                           description location_description  \
0                                        FROM BUILDING            APARTMENT   
1                                       POCKET-PICKING           RESTAURANT   
2                                          BOGUS CHECK   GROCERY FOOD STORE   
3                                               SIMPLE            RESIDENCE   
4                                           TO VEHICLE               STREET   
...                        

In [3]:
#create an empty data frame to hold the new data
# Verify data is only for one year
uniqueYear = df['year'].unique()
print(uniqueYear)
# Verify how many months of data is in the df
uniqueMonth =df['month'].unique()
print(uniqueMonth)
df['date']=pd.to_datetime(df[['month','day','year']])
dff=(df
    .filter(['date','primary_type'])
    .sort_values(['date','primary_type'])
    .groupby(['date','primary_type'])
    .size()
    .reset_index(name='count')
)
dff.head

[2018]
[8 7 6]


<bound method NDFrame.head of            date            primary_type  count
0    2018-06-01                   ARSON      2
1    2018-06-01                 ASSAULT     69
2    2018-06-01                 BATTERY    165
3    2018-06-01                BURGLARY     41
4    2018-06-01     CRIM SEXUAL ASSAULT      6
...         ...                     ...    ...
2037 2018-08-31  PUBLIC PEACE VIOLATION      2
2038 2018-08-31                 ROBBERY     33
2039 2018-08-31             SEX OFFENSE      3
2040 2018-08-31                   THEFT    192
2041 2018-08-31       WEAPONS VIOLATION     23

[2042 rows x 3 columns]>

In [4]:
#convert the df from long to wide using primary_type as the index,the columns as month and day, and filling 
#missing data as 0.
new_df=df.pivot_table(index='primary_type', columns=['date'],fill_value=0)
        
#new_df.iloc[:, 0:29].apply(lambda x: round(x/x.sum()*100,2), axis=1)
 #       .iloc[:, 30:60].apply(lambda x: round(x/x.sum()*100,2), axis=1)
 #       .iloc[:, 31:91].apply(lambda x: round(x/x.sum()*100,2), axis=1)
       
new_df.head

<bound method NDFrame.head of                                       arrest                                   \
date                              2018-06-01 2018-06-02 2018-06-03 2018-06-04   
primary_type                                                                    
ARSON                               0.000000   0.000000   0.000000   0.000000   
ASSAULT                             0.101449   0.179104   0.125000   0.173333   
BATTERY                             0.236364   0.151685   0.237374   0.181208   
BURGLARY                            0.000000   0.064516   0.057143   0.057143   
CONCEALED CARRY LICENSE VIOLATION   0.000000   1.000000   1.000000   0.000000   
CRIM SEXUAL ASSAULT                 0.000000   0.000000   0.000000   0.000000   
CRIMINAL DAMAGE                     0.046512   0.026316   0.043956   0.040000   
CRIMINAL TRESPASS                   0.555556   0.357143   0.565217   0.368421   
DECEPTIVE PRACTICE                  0.012500   0.050000   0.096774   0.085714  

In [5]:
#split the sorted data frame off into june df for calculations
june_df = sorted_df[sorted_df.columns[0:29]]
june_df.head

NameError: name 'sorted_df' is not defined

In [None]:
# using the apply and lambda to calculate the percentage of each value across the row and round to the 
# second decimal place
junep_df =june_df.iloc[:, 0:29].apply(lambda x: round(x/x.sum()*100,2), axis=1)
junep_df.head

<bound method NDFrame.head of                                        count                                   \
date                              2018-06-01 2018-06-02 2018-06-03 2018-06-04   
primary_type                                                                    
ARSON                                   5.71       2.86       5.71       2.86   
ASSAULT                                 3.80       3.69       3.52       4.13   
BATTERY                                 3.71       4.01       4.46       3.35   
BURGLARY                                3.99       3.02       3.40       3.40   
CONCEALED CARRY LICENSE VIOLATION       0.00       7.69       7.69       0.00   
CRIM SEXUAL ASSAULT                     4.35       2.90       5.80       3.62   
CRIMINAL DAMAGE                         3.45       3.05       3.65       3.01   
CRIMINAL TRESPASS                       3.13       2.43       4.00       3.30   
DECEPTIVE PRACTICE                      5.39       4.04       2.09       4.71  

In [None]:
    
sorted_df['June'] = sorted_df[sorted_df.columns[0:29]].sum(axis=1)
#sorted_df.apply(lambda x: x.between(0, 29), axis=1).sum(axis=1)
print(June)
sorted_df['July']= sorted_df[sorted_df.columns[30:60]].sum(axis=1)
print(July)
#July = sorted_df.apply(lambda y: y.between(30, 60), axis=1).sum(axis=1)
#print(July)
sorted_df['August'] = sorted_df[sorted_df.columns[61:91]].sum(axis=1)
print(August)
#August = sorted_df.apply(lambda z: z.between(61, 91), axis=1).sum(axis=1)
#print(August)

primary_type
ARSON                                  35
ASSAULT                              1816
BATTERY                              4442
BURGLARY                             1028
CONCEALED CARRY LICENSE VIOLATION      13
CRIM SEXUAL ASSAULT                   138
CRIMINAL DAMAGE                      2494
CRIMINAL TRESPASS                     575
DECEPTIVE PRACTICE                   1485
GAMBLING                               32
HOMICIDE                               53
HUMAN TRAFFICKING                       1
INTERFERENCE WITH PUBLIC OFFICER      108
INTIMIDATION                           13
KIDNAPPING                             17
LIQUOR LAW VIOLATION                   25
MOTOR VEHICLE THEFT                   784
NARCOTICS                             908
NON-CRIMINAL                            2
NON-CRIMINAL (SUBJECT SPECIFIED)        0
OBSCENITY                               3
OFFENSE INVOLVING CHILDREN            183
OTHER OFFENSE                        1443
PROSTITUTION         