<center><h1>Introduction to the dplyr Package</h1></center>
<center><h3>Ellen Duong</h3></center>
<center><h3>August Guang</h3></center>
<center><h3>Paul Stey</h3></center>

# 1. The _dplyr_ Package

  - "dplyr" is short for "data plyer"
  - R package for aggregating, summarizing, reshaping, and generally wrangling data
  - Extremely popular in the R community
  - Authored by Hadley Wickham
  - Part of the "tidyverse" set of packages

## 1.1 The _dplyr_ Verbs

  - The _dplyr_ package is organized around a set of "verbs", which are functions that operate on data
    + `filter()` - subsets a data frame, retaining all rows that satisfy your conditions
    + `summarise()` - creates a new data frame with one row for each combination of grouping variables (`summarize()` is a synonym)
    + `select()` - selects variables in a data frame
    + `mutate()` - creates new columns that are functions of existing variables
    + `arrange()` - orders the rows of a data frame by the values of selected columns
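
As a rough illustration of `mutate()` and `arrange()`, the sketch below applies both verbs to a small made-up data frame. The `toy_df` data and its column names are assumptions for illustration only and are not part of the arrests data.

```r
library(dplyr)

# Hypothetical toy data (not the arrests data)
toy_df <- data.frame(
    name   = c("a", "b", "c"),
    height = c(150, 180, 165),   # centimetres
    weight = c(50, 90, 70)       # kilograms
)

toy_bmi <- toy_df %>%
    mutate(bmi = weight / (height / 100)^2) %>%   # new column computed from existing ones
    arrange(desc(bmi))                            # sort rows, largest bmi first

toy_bmi
```

Wrapping a column in `desc()` inside `arrange()` sorts in descending order; without it the sort is ascending.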

## 1.2 The Pipe Operator

  - The pipe operator, `%>%`, passes the object on its left into the function call on its right
    + The object becomes the first argument of that call
    + `x %>% f(y)` is the same as `f(x, y)`
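
A minimal sketch of this equivalence, using base R's `round()` as the function `f`. The vector `x` is made up for illustration.

```r
library(dplyr)   # makes the %>% pipe available

x <- c(1.234, 5.678)

# Piping x into round() supplies it as the first argument
piped  <- x %>% round(1)
direct <- round(x, 1)

identical(piped, direct)
```

Both forms compute the same thing; the piped form reads left to right, which tends to help once several steps are strung together.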
    

# 2. `filter()` Examples with _dplyr_

In [1]:
library(dplyr)           # load the package


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union




In [2]:
arrests_df <- read.csv("data/pvd_arrests_2021-10-03.csv") 

In [3]:
arrests_df %>% 
    filter(gender == "Male") 

arrest_date,year,month,gender,race,ethnicity,year_of_birth,age,from_address,from_city,from_state,statute_type,statute_code,statute_desc,counts,case_number,arresting_officers,id
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<int>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<chr>,<chr>,<chr>
2019-08-24T02:23:00.0,2019,8,Male,White,NonHispanic,1981,37,No Permanent Address,providence,Rhode Island,,,,,2019-00084142,"YGonzalez, LTaveras",pvd2218242150382148273
2019-08-23T23:43:00.0,2019,8,Male,Black,NonHispanic,1991,28,PUBLIC ST,Providence,,RI Statute Violation,31-27-2.1,Chemical Test Refusal,1,2019-00084056,"CVingi, SCooney",pvd6431558757894418021
2019-08-23T23:43:00.0,2019,8,Male,Black,NonHispanic,1991,28,PUBLIC ST,Providence,,RI Statute Violation,31-27-2,Driving Under the Influence of Liqour or Drugs (=>.08<.1),1,2019-00084056,"CVingi, SCooney",pvd6431558757894418021
2019-08-23T23:43:00.0,2019,8,Male,Black,NonHispanic,1991,28,PUBLIC ST,Providence,,RI Statute Violation,31-11-18,"Driving after Denial, Suspension or Revocation of License",1,2019-00084056,"CVingi, SCooney",pvd6431558757894418021
2019-08-23T21:38:00.0,2019,8,Male,White,Hispanic,1996,22,DOUGLAS,Providence,,RI Statute Violation,11-44-1,DOMESTIC-VANDALISM/MALICIOUS INJURY TO PROP,1,2019-00084031,"RCarlin, SKennedy",pvd15614289459563584867
2019-08-23T19:50:00.0,2019,8,Male,White,Hispanic,2000,19,MOWRY ST,Providence,,RI Statute Violation,31-27-4,"Reckless Driving, Drag Racing - Attempting to Elude",1,2019-00083996,"SCampbell, RMalloy",pvd900460037611487829
2019-08-23T19:50:00.0,2019,8,Male,White,Hispanic,2000,19,MOWRY ST,Providence,,RI Statute Violation,31-11-18,"Driving after Denial, Suspension or Revocation of License",1,2019-00083996,"SCampbell, RMalloy",pvd900460037611487829
2019-08-23T18:26:00.0,2019,8,Male,White,Hispanic,1996,23,CUMERFORD ST,Providence,,RI Statute Violation,12-7-10,RESISTING LEGAL OR ILLEGAL ARREST,1,2019-00083963,JHanley,pvd1675234703933765967
2019-08-23T18:26:00.0,2019,8,Male,White,Hispanic,1996,23,CUMERFORD ST,Providence,,RI Statute Violation,11-32-1,OBSTRUCTING OFFICER IN EXECUTION OF DUTY,1,2019-00083963,JHanley,pvd1675234703933765967
2019-08-23T14:42:00.0,2019,8,Male,White,Hispanic,1998,20,LAURA ST,Providence,,RI Statute Violation,11-44-1,DOMESTIC-VANDALISM/MALICIOUS INJURY TO PROP,1,2019-00083892,"JCotugno, ALevesque, JButen, JJohnson",pvd17953747948212880432


### 2.1.1 Comparing `filter()` with Logical Indexing

In [7]:
# dplyr approach
arrests_df %>% 
    filter(gender == "Male")


# "base" R approach
is_male <- arrests_df$gender == "Male"      # create a logical vector, one value per row

arrests_df[is_male, ]                       # keep only the rows where is_male is TRUE

arrest_date,year,month,gender,race,ethnicity,year_of_birth,age,from_address,from_city,from_state,statute_type,statute_code,statute_desc,counts,case_number,arresting_officers,id
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<int>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<chr>,<chr>,<chr>
2019-08-24T02:23:00.0,2019,8,Male,White,NonHispanic,1981,37,No Permanent Address,providence,Rhode Island,,,,,2019-00084142,"YGonzalez, LTaveras",pvd2218242150382148273
2019-08-23T23:43:00.0,2019,8,Male,Black,NonHispanic,1991,28,PUBLIC ST,Providence,,RI Statute Violation,31-27-2.1,Chemical Test Refusal,1,2019-00084056,"CVingi, SCooney",pvd6431558757894418021
2019-08-23T23:43:00.0,2019,8,Male,Black,NonHispanic,1991,28,PUBLIC ST,Providence,,RI Statute Violation,31-27-2,Driving Under the Influence of Liqour or Drugs (=>.08<.1),1,2019-00084056,"CVingi, SCooney",pvd6431558757894418021
2019-08-23T23:43:00.0,2019,8,Male,Black,NonHispanic,1991,28,PUBLIC ST,Providence,,RI Statute Violation,31-11-18,"Driving after Denial, Suspension or Revocation of License",1,2019-00084056,"CVingi, SCooney",pvd6431558757894418021
2019-08-23T21:38:00.0,2019,8,Male,White,Hispanic,1996,22,DOUGLAS,Providence,,RI Statute Violation,11-44-1,DOMESTIC-VANDALISM/MALICIOUS INJURY TO PROP,1,2019-00084031,"RCarlin, SKennedy",pvd15614289459563584867
2019-08-23T19:50:00.0,2019,8,Male,White,Hispanic,2000,19,MOWRY ST,Providence,,RI Statute Violation,31-27-4,"Reckless Driving, Drag Racing - Attempting to Elude",1,2019-00083996,"SCampbell, RMalloy",pvd900460037611487829
2019-08-23T19:50:00.0,2019,8,Male,White,Hispanic,2000,19,MOWRY ST,Providence,,RI Statute Violation,31-11-18,"Driving after Denial, Suspension or Revocation of License",1,2019-00083996,"SCampbell, RMalloy",pvd900460037611487829
2019-08-23T18:26:00.0,2019,8,Male,White,Hispanic,1996,23,CUMERFORD ST,Providence,,RI Statute Violation,12-7-10,RESISTING LEGAL OR ILLEGAL ARREST,1,2019-00083963,JHanley,pvd1675234703933765967
2019-08-23T18:26:00.0,2019,8,Male,White,Hispanic,1996,23,CUMERFORD ST,Providence,,RI Statute Violation,11-32-1,OBSTRUCTING OFFICER IN EXECUTION OF DUTY,1,2019-00083963,JHanley,pvd1675234703933765967
2019-08-23T14:42:00.0,2019,8,Male,White,Hispanic,1998,20,LAURA ST,Providence,,RI Statute Violation,11-44-1,DOMESTIC-VANDALISM/MALICIOUS INJURY TO PROP,1,2019-00083892,"JCotugno, ALevesque, JButen, JJohnson",pvd17953747948212880432


,arrest_date,year,month,gender,race,ethnicity,year_of_birth,age,from_address,from_city,from_state,statute_type,statute_code,statute_desc,counts,case_number,arresting_officers,id
,<chr>,<int>,<int>,<chr>,<chr>,<chr>,<int>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<chr>,<chr>,<chr>
1,2019-08-24T02:23:00.0,2019,8,Male,White,NonHispanic,1981,37,No Permanent Address,providence,Rhode Island,,,,,2019-00084142,"YGonzalez, LTaveras",pvd2218242150382148273
8,2019-08-23T23:43:00.0,2019,8,Male,Black,NonHispanic,1991,28,PUBLIC ST,Providence,,RI Statute Violation,31-27-2.1,Chemical Test Refusal,1,2019-00084056,"CVingi, SCooney",pvd6431558757894418021
9,2019-08-23T23:43:00.0,2019,8,Male,Black,NonHispanic,1991,28,PUBLIC ST,Providence,,RI Statute Violation,31-27-2,Driving Under the Influence of Liqour or Drugs (=>.08<.1),1,2019-00084056,"CVingi, SCooney",pvd6431558757894418021
10,2019-08-23T23:43:00.0,2019,8,Male,Black,NonHispanic,1991,28,PUBLIC ST,Providence,,RI Statute Violation,31-11-18,"Driving after Denial, Suspension or Revocation of License",1,2019-00084056,"CVingi, SCooney",pvd6431558757894418021
11,2019-08-23T21:38:00.0,2019,8,Male,White,Hispanic,1996,22,DOUGLAS,Providence,,RI Statute Violation,11-44-1,DOMESTIC-VANDALISM/MALICIOUS INJURY TO PROP,1,2019-00084031,"RCarlin, SKennedy",pvd15614289459563584867
12,2019-08-23T19:50:00.0,2019,8,Male,White,Hispanic,2000,19,MOWRY ST,Providence,,RI Statute Violation,31-27-4,"Reckless Driving, Drag Racing - Attempting to Elude",1,2019-00083996,"SCampbell, RMalloy",pvd900460037611487829
13,2019-08-23T19:50:00.0,2019,8,Male,White,Hispanic,2000,19,MOWRY ST,Providence,,RI Statute Violation,31-11-18,"Driving after Denial, Suspension or Revocation of License",1,2019-00083996,"SCampbell, RMalloy",pvd900460037611487829
14,2019-08-23T18:26:00.0,2019,8,Male,White,Hispanic,1996,23,CUMERFORD ST,Providence,,RI Statute Violation,12-7-10,RESISTING LEGAL OR ILLEGAL ARREST,1,2019-00083963,JHanley,pvd1675234703933765967
15,2019-08-23T18:26:00.0,2019,8,Male,White,Hispanic,1996,23,CUMERFORD ST,Providence,,RI Statute Violation,11-32-1,OBSTRUCTING OFFICER IN EXECUTION OF DUTY,1,2019-00083963,JHanley,pvd1675234703933765967
18,2019-08-23T14:42:00.0,2019,8,Male,White,Hispanic,1998,20,LAURA ST,Providence,,RI Statute Violation,11-44-1,DOMESTIC-VANDALISM/MALICIOUS INJURY TO PROP,1,2019-00083892,"JCotugno, ALevesque, JButen, JJohnson",pvd17953747948212880432
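
The row labels in the base R output above (1, 8, 9, ...) show one difference between the two approaches: logical indexing keeps the original row names, while `filter()` renumbers the rows. A second difference concerns missing values. The sketch below uses a made-up data frame (`toy_df` is an assumption, not the arrests data) in which one `gender` value is `NA`.

```r
library(dplyr)

# Hypothetical toy data with one missing gender
toy_df <- data.frame(
    id     = 1:4,
    gender = c("Male", NA, "Female", "Male")
)

by_filter <- toy_df %>% filter(gender == "Male")
by_index  <- toy_df[toy_df$gender == "Male", ]

nrow(by_filter)   # only rows where the condition is TRUE
nrow(by_index)    # also contains an all-NA row for the missing gender

# which() discards the NA positions, bringing base R in line with filter()
by_which <- toy_df[which(toy_df$gender == "Male"), ]
```

In other words, `filter()` silently drops rows where the condition evaluates to `NA`, whereas plain logical indexing returns a row of `NA`s for each of them.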


### 2.1.2 `filter()` Examples (cont.)

In [8]:
# Here we create a new data.frame from the result of filter()

arrests_males <- arrests_df %>% 
    filter(gender == "Male")                

In [9]:
head(arrests_males)

,arrest_date,year,month,gender,race,ethnicity,year_of_birth,age,from_address,from_city,from_state,statute_type,statute_code,statute_desc,counts,case_number,arresting_officers,id
,<chr>,<int>,<int>,<chr>,<chr>,<chr>,<int>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<chr>,<chr>,<chr>
1,2019-08-24T02:23:00.0,2019,8,Male,White,NonHispanic,1981,37,No Permanent Address,providence,Rhode Island,,,,,2019-00084142,"YGonzalez, LTaveras",pvd2218242150382148273
2,2019-08-23T23:43:00.0,2019,8,Male,Black,NonHispanic,1991,28,PUBLIC ST,Providence,,RI Statute Violation,31-27-2.1,Chemical Test Refusal,1,2019-00084056,"CVingi, SCooney",pvd6431558757894418021
3,2019-08-23T23:43:00.0,2019,8,Male,Black,NonHispanic,1991,28,PUBLIC ST,Providence,,RI Statute Violation,31-27-2,Driving Under the Influence of Liqour or Drugs (=>.08<.1),1,2019-00084056,"CVingi, SCooney",pvd6431558757894418021
4,2019-08-23T23:43:00.0,2019,8,Male,Black,NonHispanic,1991,28,PUBLIC ST,Providence,,RI Statute Violation,31-11-18,"Driving after Denial, Suspension or Revocation of License",1,2019-00084056,"CVingi, SCooney",pvd6431558757894418021
5,2019-08-23T21:38:00.0,2019,8,Male,White,Hispanic,1996,22,DOUGLAS,Providence,,RI Statute Violation,11-44-1,DOMESTIC-VANDALISM/MALICIOUS INJURY TO PROP,1,2019-00084031,"RCarlin, SKennedy",pvd15614289459563584867
6,2019-08-23T19:50:00.0,2019,8,Male,White,Hispanic,2000,19,MOWRY ST,Providence,,RI Statute Violation,31-27-4,"Reckless Driving, Drag Racing - Attempting to Elude",1,2019-00083996,"SCampbell, RMalloy",pvd900460037611487829


## 2.2 Using `filter()` with Multiple Conditions

In [10]:
arrests_teen_male <- arrests_df %>%
    filter(
        gender == "Male",
        age < 20
    )

arrests_teen_male

arrest_date,year,month,gender,race,ethnicity,year_of_birth,age,from_address,from_city,from_state,statute_type,statute_code,statute_desc,counts,case_number,arresting_officers,id
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<int>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<chr>,<chr>,<chr>
2019-08-23T19:50:00.0,2019,8,Male,White,Hispanic,2000,19,MOWRY ST,Providence,,RI Statute Violation,31-27-4,"Reckless Driving, Drag Racing - Attempting to Elude",1,2019-00083996,"SCampbell, RMalloy",pvd900460037611487829
2019-08-23T19:50:00.0,2019,8,Male,White,Hispanic,2000,19,MOWRY ST,Providence,,RI Statute Violation,31-11-18,"Driving after Denial, Suspension or Revocation of License",1,2019-00083996,"SCampbell, RMalloy",pvd900460037611487829
2019-08-21T13:09:00.0,2019,8,Male,White,Hispanic,1999,19,MELISSA AVE,Providence,,RI Statute Violation,12-7-10,RESISTING LEGAL OR ILLEGAL ARREST,1,2019-00083170,"ITavarez, IYousif, CBrown, EDelgado",pvd5047836359365815220
2019-08-21T13:09:00.0,2019,8,Male,White,Hispanic,1999,19,MELISSA AVE,Providence,,RI Statute Violation,11-45-1,DISORDERLY CONDUCT,1,2019-00083170,"ITavarez, IYousif, CBrown, EDelgado",pvd5047836359365815220
2019-08-21T13:09:00.0,2019,8,Male,White,Hispanic,1999,19,MELISSA AVE,Providence,,RI Statute Violation,31-11-18,"Driving after Denial, Suspension or Revocation of License",1,2019-00083170,"ITavarez, IYousif, CBrown, EDelgado",pvd5047836359365815220
2019-08-20T02:00:00.0,2019,8,Male,White,Hispanic,1999,19,MINK RD,Providence,Rhode Island,RI Statute Violation,31-27-4,"Reckless Driving, Drag Racing - Attempting to Elude",1,2019-00078616,"JGagnon, RMalloy",pvd1076862233562848683
2019-08-20T00:00:00.0,2019,8,Male,Black,NonHispanic,2000,19,SMITH ST,Providence,,,,,,2019-00082826,"RPapa, MCamardo",pvd12708633210022966227
2019-08-17T00:00:00.0,2019,8,Male,White,Hispanic,2001,18,BENEDICT ST,Providence,,RI Statute Violation,11-37-2,SEXUAL ASSAULT -1ST DEGREE - FRC RAPE,1,2019-00081517,"RMendez, JNajarian",pvd9938776757456909177
2019-08-15T18:12:00.0,2019,8,Male,Black,NonHispanic,2000,18,CROSS ST,Providence,,RI Statute Violation,11-17-1,FORGERY AND COUNTERFEITING IN GENERAL,1,2019-00080859,"JBenros, JStanzione, NOC Officer, ACalle, EDelgado, JManown",pvd17954097329236445270
2019-08-15T00:00:00.0,2019,8,Male,Black,Hispanic,2000,19,SILVER LAKE AVE,Providence,,,,,,2019-00076472,,pvd14598067460260984586
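
Conditions separated by commas inside `filter()` are combined with a logical AND, so a row must satisfy all of them to be kept. The sketch below checks this on a small made-up data frame (`toy_df` is an assumption for illustration).

```r
library(dplyr)

# Hypothetical toy data
toy_df <- data.frame(
    gender = c("Male", "Male", "Female", "Male"),
    age    = c(19, 42, 18, 17)
)

with_comma <- toy_df %>% filter(gender == "Male", age < 20)
with_and   <- toy_df %>% filter(gender == "Male" & age < 20)

identical(with_comma, with_and)
```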


### 2.2.1 Using `filter()` with Logical OR

  - Recall that the `||` operator is the logical OR for a single pair of `TRUE`/`FALSE` values
  - The `|` operator performs the same role, but elementwise on vectors (such as data frame columns), which is what `filter()` needs
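
A short sketch of the difference, using a made-up `age` vector. As far as we know, recent versions of R (4.3 and later) raise an error when `||` is given a vector longer than one, so the scalar form is applied here to a single element only.

```r
age <- c(19, 42, 70)   # hypothetical ages

# Elementwise OR: one TRUE/FALSE per element
elementwise <- age < 25 | age > 65
elementwise

# Scalar OR: a single TRUE/FALSE, e.g. for use inside if ()
scalar <- age[1] < 25 || age[1] > 65
scalar
```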

In [13]:
young_old_male <- arrests_df %>%
    filter(
        gender == "Male",
        age < 25 | age > 65  
    )
   
young_old_male

arrest_date,year,month,gender,race,ethnicity,year_of_birth,age,from_address,from_city,from_state,statute_type,statute_code,statute_desc,counts,case_number,arresting_officers,id
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<int>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<chr>,<chr>,<chr>
2019-08-23T21:38:00.0,2019,8,Male,White,Hispanic,1996,22,DOUGLAS,Providence,,RI Statute Violation,11-44-1,DOMESTIC-VANDALISM/MALICIOUS INJURY TO PROP,1,2019-00084031,"RCarlin, SKennedy",pvd15614289459563584867
2019-08-23T19:50:00.0,2019,8,Male,White,Hispanic,2000,19,MOWRY ST,Providence,,RI Statute Violation,31-27-4,"Reckless Driving, Drag Racing - Attempting to Elude",1,2019-00083996,"SCampbell, RMalloy",pvd900460037611487829
2019-08-23T19:50:00.0,2019,8,Male,White,Hispanic,2000,19,MOWRY ST,Providence,,RI Statute Violation,31-11-18,"Driving after Denial, Suspension or Revocation of License",1,2019-00083996,"SCampbell, RMalloy",pvd900460037611487829
2019-08-23T18:26:00.0,2019,8,Male,White,Hispanic,1996,23,CUMERFORD ST,Providence,,RI Statute Violation,12-7-10,RESISTING LEGAL OR ILLEGAL ARREST,1,2019-00083963,JHanley,pvd1675234703933765967
2019-08-23T18:26:00.0,2019,8,Male,White,Hispanic,1996,23,CUMERFORD ST,Providence,,RI Statute Violation,11-32-1,OBSTRUCTING OFFICER IN EXECUTION OF DUTY,1,2019-00083963,JHanley,pvd1675234703933765967
2019-08-23T14:42:00.0,2019,8,Male,White,Hispanic,1998,20,LAURA ST,Providence,,RI Statute Violation,11-44-1,DOMESTIC-VANDALISM/MALICIOUS INJURY TO PROP,1,2019-00083892,"JCotugno, ALevesque, JButen, JJohnson",pvd17953747948212880432
2019-08-23T00:57:00.0,2019,8,Male,White,,1998,21,AUSTIN ST,newbrdford,,RI Statute Violation,11-5-3,SIMPLE ASSAULT OR BATTERY,1,2019-00083725,PSalmons,pvd3024232238010666153
2019-08-22T12:05:00.0,2019,8,Male,White,Hispanic,1999,20,ROCKINGHAM ST,Providence,Rhode Island,RI Statute Violation,31-11-18,"Driving after Denial, Suspension or Revocation of License",1,2019-00083486,"JBenros, MClary",pvd8008374038901187780
2019-08-22T01:38:00.0,2019,8,Male,Black,Hispanic,1998,21,HOLLIS ST,Providence,,RI Statute Violation,11-47-5.2,POSSESSION OF A STOLEN FIREARM,1,2019-00083396,"RFedo, KRosado, ALugo",pvd6386847572324309475
2019-08-21T18:03:00.0,2019,8,Male,Unknown,Hispanic,1998,21,CUMERFORD ST,Providence,,RI Statute Violation,31-11-18,"Driving after Denial, Suspension or Revocation of License",1,2019-00083274,JMadeira,pvd14028206778351997883


### 2.2.2 Using `filter()` with Logical OR (cont.)

In [14]:
ptk_young_old_male <- arrests_df %>%
    filter(
        gender == "Male",
        age < 25 | age > 65 | from_city == "Pawtucket"
    )

ptk_young_old_male

arrest_date,year,month,gender,race,ethnicity,year_of_birth,age,from_address,from_city,from_state,statute_type,statute_code,statute_desc,counts,case_number,arresting_officers,id
<chr>,<int>,<int>,<chr>,<chr>,<chr>,<int>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<chr>,<chr>,<chr>
2019-08-23T21:38:00.0,2019,8,Male,White,Hispanic,1996,22,DOUGLAS,Providence,,RI Statute Violation,11-44-1,DOMESTIC-VANDALISM/MALICIOUS INJURY TO PROP,1,2019-00084031,"RCarlin, SKennedy",pvd15614289459563584867
2019-08-23T19:50:00.0,2019,8,Male,White,Hispanic,2000,19,MOWRY ST,Providence,,RI Statute Violation,31-27-4,"Reckless Driving, Drag Racing - Attempting to Elude",1,2019-00083996,"SCampbell, RMalloy",pvd900460037611487829
2019-08-23T19:50:00.0,2019,8,Male,White,Hispanic,2000,19,MOWRY ST,Providence,,RI Statute Violation,31-11-18,"Driving after Denial, Suspension or Revocation of License",1,2019-00083996,"SCampbell, RMalloy",pvd900460037611487829
2019-08-23T18:26:00.0,2019,8,Male,White,Hispanic,1996,23,CUMERFORD ST,Providence,,RI Statute Violation,12-7-10,RESISTING LEGAL OR ILLEGAL ARREST,1,2019-00083963,JHanley,pvd1675234703933765967
2019-08-23T18:26:00.0,2019,8,Male,White,Hispanic,1996,23,CUMERFORD ST,Providence,,RI Statute Violation,11-32-1,OBSTRUCTING OFFICER IN EXECUTION OF DUTY,1,2019-00083963,JHanley,pvd1675234703933765967
2019-08-23T14:42:00.0,2019,8,Male,White,Hispanic,1998,20,LAURA ST,Providence,,RI Statute Violation,11-44-1,DOMESTIC-VANDALISM/MALICIOUS INJURY TO PROP,1,2019-00083892,"JCotugno, ALevesque, JButen, JJohnson",pvd17953747948212880432
2019-08-23T00:57:00.0,2019,8,Male,White,,1998,21,AUSTIN ST,newbrdford,,RI Statute Violation,11-5-3,SIMPLE ASSAULT OR BATTERY,1,2019-00083725,PSalmons,pvd3024232238010666153
2019-08-22T12:05:00.0,2019,8,Male,White,Hispanic,1999,20,ROCKINGHAM ST,Providence,Rhode Island,RI Statute Violation,31-11-18,"Driving after Denial, Suspension or Revocation of License",1,2019-00083486,"JBenros, MClary",pvd8008374038901187780
2019-08-22T01:38:00.0,2019,8,Male,Black,Hispanic,1998,21,HOLLIS ST,Providence,,RI Statute Violation,11-47-5.2,POSSESSION OF A STOLEN FIREARM,1,2019-00083396,"RFedo, KRosado, ALugo",pvd6386847572324309475
2019-08-21T18:03:00.0,2019,8,Male,Unknown,Hispanic,1998,21,CUMERFORD ST,Providence,,RI Statute Violation,31-11-18,"Driving after Denial, Suspension or Revocation of License",1,2019-00083274,JMadeira,pvd14028206778351997883


<center><h1>Using <code>select()</code> Function in dplyr</h1></center>

# 3. Using `select()` to Extract Columns
  - Recall that `filter()` can be used to filter rows
  - Similarly, `select()` is used to select columns
  - These functions can be "chained"
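
Besides listing column names one by one, `select()` accepts a few other forms. The sketch below shows three of them on a made-up data frame; `toy_df` and its values are assumptions for illustration, with column names borrowed from the arrests data.

```r
library(dplyr)

# Hypothetical toy data
toy_df <- data.frame(
    id           = c("a1", "b2"),
    age          = c(30, 41),
    statute_code = c("11-45-1", "12-7-10"),
    statute_desc = c("DISORDERLY CONDUCT", "RESISTING LEGAL OR ILLEGAL ARREST")
)

no_age       <- toy_df %>% select(-age)                      # drop a column
statute_cols <- toy_df %>% select(starts_with("statute"))    # match by name prefix
id_to_age    <- toy_df %>% select(id:age)                    # a range of adjacent columns

names(no_age)
names(statute_cols)
names(id_to_age)
```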

## 3.1 Example of `select()`

In [15]:
arrests_subset <- arrests_df %>% 
    select(id, age, gender, statute_desc)

head(arrests_subset)

,id,age,gender,statute_desc
,<chr>,<int>,<chr>,<chr>
1,pvd2218242150382148273,37,Male,
2,pvd15166785558364246202,25,,"Driving after Denial, Suspension or Revocation of License"
3,pvd3142917706201385905,34,Female,RESISTING LEGAL OR ILLEGAL ARREST
4,pvd3142917706201385905,34,Female,DISORDERLY CONDUCT
5,pvd460449304532374599,18,Female,RESISTING LEGAL OR ILLEGAL ARREST
6,pvd460449304532374599,18,Female,DISORDERLY CONDUCT


### 3.1.1 Comparing `select()` to `[, ]` notation

In [16]:
# dplyr example
arrests_df %>% 
    select(id, age, gender, statute_desc)


# equivalent in "base" R example
cols <- c("id", "age", "gender", "statute_desc")

arrest_sub <- arrests_df[, cols]

head(arrest_sub)

,id,age,gender,statute_desc
,<chr>,<int>,<chr>,<chr>
1,pvd2218242150382148273,37,Male,
2,pvd15166785558364246202,25,,"Driving after Denial, Suspension or Revocation of License"
3,pvd3142917706201385905,34,Female,RESISTING LEGAL OR ILLEGAL ARREST
4,pvd3142917706201385905,34,Female,DISORDERLY CONDUCT
5,pvd460449304532374599,18,Female,RESISTING LEGAL OR ILLEGAL ARREST
6,pvd460449304532374599,18,Female,DISORDERLY CONDUCT


## 3.2 Example of `select()` (cont.)

In [17]:
arrests_vio <- arrests_df %>%
    select(
        id,
        age,
        gender,
        statute_desc
    )

In [18]:
head(arrests_vio)           # see first few lines of new dataframe

,id,age,gender,statute_desc
,<chr>,<int>,<chr>,<chr>
1,pvd2218242150382148273,37,Male,
2,pvd15166785558364246202,25,,"Driving after Denial, Suspension or Revocation of License"
3,pvd3142917706201385905,34,Female,RESISTING LEGAL OR ILLEGAL ARREST
4,pvd3142917706201385905,34,Female,DISORDERLY CONDUCT
5,pvd460449304532374599,18,Female,RESISTING LEGAL OR ILLEGAL ARREST
6,pvd460449304532374599,18,Female,DISORDERLY CONDUCT


# 4. Chaining _dplyr_ Operators
  - One key reason for _dplyr_'s popularity
  - _dplyr_ verbs/functions are "composable"
    + $(f \circ g)(x) = f(g(x))$
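
To connect the composition notation to code: a piped chain is another way of writing nested function calls. The sketch below compares the two on a made-up data frame (`toy_df` is an assumption for illustration).

```r
library(dplyr)

# Hypothetical toy data
toy_df <- data.frame(
    id     = 1:4,
    gender = c("Female", "Male", "Female", "Male"),
    age    = c(34, 25, 18, 51)
)

# Piped: read top to bottom
piped <- toy_df %>%
    filter(gender == "Female") %>%
    select(id, age)

# Nested: read inside out, like f(g(x))
nested <- select(filter(toy_df, gender == "Female"), id, age)

identical(piped, nested)
```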

In [21]:
female_vio <- arrests_df %>%
    filter(gender == "Female") %>%
    select(id, age, gender, statute_desc)

head(female_vio)

,id,age,gender,statute_desc
,<chr>,<int>,<chr>,<chr>
1,pvd3142917706201385905,34,Female,RESISTING LEGAL OR ILLEGAL ARREST
2,pvd3142917706201385905,34,Female,DISORDERLY CONDUCT
3,pvd460449304532374599,18,Female,RESISTING LEGAL OR ILLEGAL ARREST
4,pvd460449304532374599,18,Female,DISORDERLY CONDUCT
5,pvd460449304532374599,18,Female,OBSTRUCTING OFFICER IN EXECUTION OF DUTY
6,pvd8555094992612905738,45,Female,VANDALISM/MALICIOUS INJURY TO PROPERTY


## 4.1 More Chaining

In [26]:
female_midage <- arrests_df %>%
    filter(
        gender == "Female",
        age > 45,
        statute_desc != ""
    ) %>%
    select(
        id, 
        age, 
        gender,
        statute_desc
    ) %>%
    arrange(
        age
    )

head(female_midage)

,id,age,gender,statute_desc
,<chr>,<int>,<chr>,<chr>
1,pvd5910286289754155205,46,Female,LOITERING FOR INDECENT PURPOSES PROSTITUTION - PROSTITUTION
2,pvd14925567736676696725,46,Female,SHOPLIFTING-MISD - SHOPLIFTING
3,pvd17492545928832438170,46,Female,"Driving after Denial, Suspension or Revocation of License"
4,pvd6439318455139528590,46,Female,BENCH WARRANT ISSUED FROM 6TH DISTRICT COURT
5,pvd5910286289754155205,46,Female,LOITERING FOR INDECENT PURPOSES PROSTITUTION - PROSTITUTION
6,pvd13975960782588463013,46,Female,SIMPLE ASSAULT OR BATTERY


<center><h1>Challenge Problem</h1></center>

In addition to the arrests data, the Providence Police Department also makes data concerning criminal cases available. This is in the CSV file called `"pvd_cases_2021-10-03.csv"` in the `data/` directory of this repository. 

  1. Let's read the cases data in from a CSV and call it `cases_df`. 
  2. Then, let's create a new dataframe that is a subset of `cases_df`. In particular, let's create a dataframe called `cases_summer_df` that contains only those cases that were heard in June, July, or August. 
  

**Note**: The `month` column is coded numerically in the data set, so keep that in mind.

In [45]:
cases_df <- read.csv("data/pvd_cases_2021-10-03.csv") 

cases_summer_df <- cases_df %>%
    filter(
        month >= 6,
        month < 9
    ) %>%
    group_by(month) %>%
    summarize(
        n_rows = n()
    )

cases_summer_df

month,n_rows
<int>,<int>
6,3107
7,3105
8,4801


In [46]:
cases_summer_df_2 <- cases_df %>%
    filter(
        month == 6 | month == 7 | month == 8
    ) %>%
    group_by(month) %>%
    summarize(
        n_rows = n()
    )

cases_summer_df_2

month,n_rows
<int>,<int>
6,3107
7,3105
8,4801
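
The two solutions above go one step further than the problem statement and count the rows per month. If only the subset itself is wanted, one possible sketch uses `%in%` to test membership in the set of summer months. The `toy_cases` data frame is a hypothetical stand-in for `cases_df`, assuming only that it has a numeric `month` column.

```r
library(dplyr)

# Hypothetical stand-in for cases_df
toy_cases <- data.frame(
    case_id = 1:6,
    month   = c(1, 6, 7, 8, 9, 12)
)

# Keep only the rows whose month is June, July, or August
toy_summer <- toy_cases %>%
    filter(month %in% c(6, 7, 8))

toy_summer
```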


<center><h1>Using <code>group_by()</code> and <code>summarise()</code> in dplyr</h1></center>

# 5. Why use `group_by()` and `summarise()` from _dplyr_?
  - Aggregating and summarizing data by group is an extremely common task
  - This follows the _split-apply-combine_ pattern
  - These operations can be "chained" with other _dplyr_ functions
  - Often makes for concise, intuitive, and readable code
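
To make the split-apply-combine idea concrete, the sketch below computes a mean per group twice: once by hand with base R, and once with `group_by()` and `summarise()`. The `toy_df` data is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data
toy_df <- data.frame(
    gender = c("Female", "Male", "Male", "Female", "Male"),
    age    = c(30, 20, 40, 50, 63)
)

# By hand with base R
age_by_gender <- split(toy_df$age, toy_df$gender)   # split into one vector per group
manual_means  <- sapply(age_by_gender, mean)        # apply mean() and combine the results

# The same idea with dplyr
dplyr_means <- toy_df %>%
    group_by(gender) %>%
    summarise(mean_age = mean(age))

manual_means
dplyr_means
```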

## 5.1 Example of `group_by()` and `summarise()`

In [47]:
gender_tbl <- arrests_df %>%
    group_by(gender) %>%
    summarise(
        n_rows = n(),
        mean_age = mean(age)
    ) 

gender_tbl

gender,n_rows,mean_age
<chr>,<int>,<dbl>
,21,29.47619
Female,2777,32.10839
Male,10170,33.3589
,36,26.61111
Unknown,8,35.125
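
When only the row counts per group are needed, dplyr's `count()` offers a shorthand for the `group_by()` plus `summarise(n())` combination. A minimal sketch on made-up data (`toy_df` is an assumption for illustration):

```r
library(dplyr)

# Hypothetical toy data
toy_df <- data.frame(
    gender = c("Female", "Male", "Male", "Unknown")
)

long_form <- toy_df %>%
    group_by(gender) %>%
    summarise(n_rows = n())

# count() groups, counts, and ungroups in one step;
# the name argument sets the name of the count column
short_form <- toy_df %>%
    count(gender, name = "n_rows")

short_form
```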


# 6. Chaining `filter()` with `group_by()` and `summarise()`

In [48]:
gender_tbl <- arrests_df %>%
    filter(
        from_city == "Providence",
        year == 2019
    ) %>%
    group_by(gender) %>%
    summarise(
        n_rows = n(),
        mean_age = mean(age),
        mean_cnts = mean(counts, na.rm = TRUE)
    )

head(gender_tbl)

gender,n_rows,mean_age,mean_cnts
<chr>,<int>,<dbl>,<dbl>
,9,23.88889,1.0
Female,515,33.46602,1.064039
Male,2039,33.38941,1.098027
Unknown,1,49.0,1.0
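
The `na.rm = TRUE` argument used for `mean_cnts` above matters because `mean()` returns `NA` as soon as any value in a group is missing. The sketch below shows the difference on made-up data (`toy_df` is an assumption for illustration).

```r
library(dplyr)

# Hypothetical toy data with one missing count
toy_df <- data.frame(
    gender = c("Female", "Male", "Male"),
    counts = c(1, NA, 3)
)

toy_summary <- toy_df %>%
    group_by(gender) %>%
    summarise(
        mean_cnts      = mean(counts),                # NA for any group with a missing value
        mean_cnts_narm = mean(counts, na.rm = TRUE)   # missing values dropped first
    )

toy_summary
```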


## 6.1 More Interesting Example of Chaining

In [49]:
is_summer <- function(month_num) {
    
    chk <- month_num %in% c(6, 7, 8)
    return(chk)
}

In [50]:
is_summer(6)   # TRUE
is_summer(2)   # FALSE
is_summer(8)   # TRUE
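
Because `%in%` is vectorized, `is_summer()` also works on a whole column at once, and taking the `mean()` of the resulting logical vector gives the proportion of `TRUE` values. The sketch below repeats the function definition so that it runs on its own; the `months` vector is made up for illustration.

```r
# Same definition as above, repeated so this block is self-contained
is_summer <- function(month_num) {
    month_num %in% c(6, 7, 8)
}

months <- c(1, 6, 7, 12)   # hypothetical month numbers

flags <- is_summer(months)
flags                      # FALSE TRUE TRUE FALSE

prop_summer <- mean(flags)
prop_summer                # 0.5, since TRUE counts as 1 and FALSE as 0
```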


### 6.1.1 More Interesting Example (cont.)

In [51]:
vio_tbl <- arrests_df %>%
    filter(
        statute_desc != "",
        statute_desc != "NULL", 
        year == 2021
    ) %>%
    group_by(statute_desc) %>%
    summarise(
        n_vios = n(),
        prop_male = mean(gender == "Male"),
        mean_age = mean(age),
        prop_summer = mean(is_summer(month))
    ) %>%
    arrange(desc(n_vios))

head(vio_tbl)

statute_desc,n_vios,prop_male,mean_age,prop_summer
<chr>,<int>,<dbl>,<dbl>,<dbl>
DOMESTIC-SIMPLE ASSAULT/BATTERY,290,0.7482759,33.13448,0.3068966
"Driving after Denial, Suspension or Revocation of License",237,0.7552743,31.91561,0.3459916
DISORDERLY CONDUCT,132,0.8106061,30.73485,0.3409091
SIMPLE ASSAULT OR BATTERY,115,0.7304348,35.33913,0.3043478
BENCH WARRANT ISSUED FROM SUPERIOR COURT,106,0.8301887,37.04717,0.2735849
LICENSE OR PERMIT REQUIRED FOR CARRYING PISTOL,94,0.9787234,26.47872,0.2978723


<center><h1>Challenge Problem</h1></center>

Suppose we are interested in the distribution of states of origin (i.e., `from_state`) for the males arrested in the summer months (i.e., June, July, August). Let's use dplyr to create a table with the counts of individuals from the different states in our arrests data. 

To accomplish this, we will use the `filter()`, `group_by()`, and `summarise()` functions. The table should end up having two columns: `from_state` and `num_arrests`.



In [56]:
male_summer_tbl <- arrests_df %>%
    filter(
        gender == "Male",
        is_summer(month)
    ) %>% 
    group_by(from_state) %>%
    summarize(
        num_arrests = n()
    ) %>%
    arrange(desc(num_arrests))

male_summer_tbl

from_state,num_arrests
<chr>,<int>
,1342
,682
Rhode Island,641
Massachusetts,20
Connecticut,6
Georgia,6
New York,5
New Mexico,4
Missouri,2
North Dakota,2
