# Section 3
Practicing `filter`, `select`, `summarise`

## Set Up
Run the cell below!

In [None]:
# RUN THIS CELL
# Load packages
library(testthat)
suppressMessages(library(tidyverse))

We're starting with the same CTDC dataset. I filter to the confirmed cases of trafficking, as we discussed in the last section.

In [None]:
# loading the dataset
ctdc <- read.csv("2024_CTDC_synthetic.csv")

# filtering to confirmed cases of trafficking
ctdc_confirmed <- ctdc %>% filter(isSexualExploit > 0 | isForcedLabour > 0 | isOtherExploit > 0) 
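
The `|` in the filter above means "or": a row is kept when at least one of the conditions is `TRUE`. The sketch below illustrates this on a made-up data frame (`toy`, `flag1`, and `flag2` are hypothetical names, not CTDC columns). It also shows that `filter()` drops a row when the combined condition evaluates to `NA`.

```r
library(dplyr)

# Hypothetical indicator columns, for illustration only
toy <- data.frame(
  case  = c("A", "B", "C", "D"),
  flag1 = c(1, 0, 0, NA),
  flag2 = c(0, 1, 0, 0)
)

# Keep rows where at least one flag is positive.
# Row C fails both conditions; row D gives NA | FALSE, which is NA, so it is dropped.
kept <- toy %>% filter(flag1 > 0 | flag2 > 0)
kept$case
```

Only cases A and B should remain.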

---
## Part 1: Tutorial

### The `%>%` or pipe operator

In [None]:
head(ctdc_confirmed)

In [None]:
ctdc_confirmed %>% head()
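
The two cells above return the same result, because `x %>% f()` is another way of writing `f(x)`: the pipe passes whatever is on its left as the first argument of the function on its right. A minimal sketch on a made-up data frame (`toy` and its columns are hypothetical, not part of the CTDC data) checks the equivalence:

```r
library(dplyr)  # provides %>% along with filter(), select(), summarise()

# Hypothetical toy data, not the CTDC dataset
toy <- data.frame(id = 1:8, score = c(3, 5, 2, 9, 4, 7, 1, 6))

# Ordinary function call
nested <- head(toy, 3)

# Same call written with the pipe: toy becomes the first argument of head()
piped <- toy %>% head(3)

# Check that both forms give the same data frame
identical(nested, piped)
```

The pipe becomes more useful once several steps are chained, since the code then reads top to bottom in the order the steps happen.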

## Task: 
Compare the proportion of victims who reported psychological, physical, or sexual abuse between women and men. (There is too little data on transgender and/or non-binary victims to compare.)

**Steps:**
1. Find the proportion in women (tutorial) 
2. Find the proportion in men (you try)

#### 1) `filter()` function to create a smaller dataset with just women

From `ctdc_confirmed`, filter to where `gender == "Woman"`

In [None]:
# EXAMPLE CODE
ctdc_woman <- ctdc_confirmed %>% filter(gender == "Woman")

# display
head(ctdc_woman)

#### 2) Use `select()` to isolate the column that indicates if the victim reported experiencing abuse. 
From `ctdc_woman`, select the column `meansAbusePsyPhySex`

In [None]:
# EXAMPLE CODE
ctdc_woman_means <- ctdc_woman %>% select(meansAbusePsyPhySex)

# display
head(ctdc_woman_means)

#### 3) Use `summarise()` and `mean()` to find the proportion of 1's in the column indicating abuse.
From the `ctdc_woman_means` dataset, summarise the mean of `meansAbusePsyPhySex`

In [None]:
# EXAMPLE CODE
ctdc_woman_summary <- ctdc_woman_means %>%
  summarise(meansAbusePsyPhySex = mean(meansAbusePsyPhySex, na.rm = TRUE))

# display
ctdc_woman_summary
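
Taking the mean of a 0/1 column gives the proportion of 1's: the sum of the column counts the 1's, and the mean divides that count by the number of values. With `na.rm = TRUE`, missing values are removed first, so the denominator is the number of non-missing values only. The sketch below checks this on made-up data (`toy`, `group`, and `flag` are hypothetical names) and shows one way the three steps could be chained into a single pipeline.

```r
library(dplyr)

# Hypothetical 0/1 indicator with one missing value
toy <- data.frame(
  group = c("x", "x", "x", "x", "y", "y"),
  flag  = c(1, 0, 1, NA, 0, 0)
)

# filter -> select -> summarise in one chain
prop_x <- toy %>%
  filter(group == "x") %>%
  select(flag) %>%
  summarise(flag = mean(flag, na.rm = TRUE))

prop_x
```

In group `x`, two of the three non-missing values are 1, so the result is 2/3. Storing each intermediate step in its own object, as the tutorial does, gives the same answer and makes each step easier to inspect.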

---
## Part 2: Your Turn! Find the proportion of men who reported abuse. 

You are essentially repeating the same steps as above, but for men. Work with your group if you get stuck, and feel free to ask me for help as well. You will see these three functions over and over again in the analytical assignments, so let's take the time to get familiar with them!

#### 1) `filter()` function to create a smaller dataset with just men
From `ctdc_confirmed`, filter to where `gender == "Man"`, storing it in `ctdc_man`

In [None]:
# YOUR ANSWER HERE
ctdc_man <- NULL # YOUR CODE HERE

# display
head(ctdc_man)

In [None]:
. = ottr::check("tests/Q1.R")

#### 2) Use `select()` to isolate the column that indicates if the victim reported experiencing abuse
From `ctdc_man`, select the column `meansAbusePsyPhySex`.

In [None]:
# YOUR ANSWER HERE
ctdc_man_means <- NULL # YOUR CODE HERE

# display
head(ctdc_man_means)

In [None]:
. = ottr::check("tests/Q2.R")

#### 3) Use `summarise()` and `mean()` to find the proportion of 1's in the column indicating abuse
From the `ctdc_man_means` dataset, summarise the mean of `meansAbusePsyPhySex`

In [None]:
# YOUR ANSWER HERE
ctdc_man_summary <- NULL # YOUR CODE HERE

# display
head(ctdc_man_summary)

In [None]:
. = ottr::check("tests/Q3.R")

---
## Part 2.5 (if time permits): Let's see how many transgender/nonbinary individuals are included in this dataset. 
Filter to `gender == "Trans/Transgender/NonConforming"`

In [None]:
ctdc_nonbinary <-  NULL # YOUR ANSWER HERE

How many confirmed transgender and/or non-binary victims are in the data?

In [None]:
n_nonbinary <- NULL # YOUR ANSWER HERE
n_nonbinary

Only 354 confirmed transgender and/or non-binary individuals appear in this dataset, compared with 105,136 women and 30,143 men. That is far too little data to support firm comparisons.
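
To put these counts in perspective, the short sketch below computes each group's share of the total. It assumes the three counts quoted above are the only groups being compared; records with a missing or other gender value are not accounted for here.

```r
# Counts quoted in the text above
counts <- c(trans_nonbinary = 354, women = 105136, men = 30143)

# Share of each group among these three groups only
shares <- counts / sum(counts)

# Express as percentages, rounded to two decimals
round(100 * shares, 2)
```

The transgender and/or non-binary group makes up well under 1% of these records.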

---
## Part 3: Interpretation

Below is a chart from CTDC's website comparing the means of control for female and male victims. Does there seem to be a discrepancy in the abuse (psychological abuse, physical abuse, sexual abuse) experienced by men and women?

<img src="gendered.png" width=80% />

Now compare this chart to the estimates we computed from the same CTDC data.

#### Present the estimate for men

In [None]:
# RUN THIS CELL. NO ACTION NEEDED. 
ctdc_man_summary

#### Present the estimate for women

In [None]:
# RUN THIS CELL. NO ACTION NEEDED. 
ctdc_woman_summary

### Why don't we see a gendered experience of abuse in the estimates we calculated?

_Replace this text_