# Assignment 1: Victims of Human Trafficking

### Substantive Objectives
In this assignment, we will use an existing dataset on reported victims of trafficking to understand what human trafficking is. We will use it to identify the forms of trafficking, how victims are recruited, and how prevalent different forms of trafficking are. This will involve engaging with documents such as codebooks produced by organizations.

### Coding Objectives

Learn how to use the following functions. 
1) `read.csv()`
2) `nrow()`
3) `filter()` and `select()`
4) `summarise()`
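
As a quick orientation, here is a minimal sketch of these functions on a small made-up data frame. The data frame `toy`, its columns (`id`, `age`, `is_student`), and the temporary CSV file are hypothetical and are not part of the assignment data. `filter()`, `select()`, and `summarise()` come from the dplyr package, which is loaded as part of the tidyverse.

```r
library(dplyr)

# Hypothetical toy data, not the CTDC dataset.
toy <- data.frame(
  id = 1:5,
  age = c(19, 24, 31, 22, 28),
  is_student = c(1, 0, 0, 1, 1)
)

# Write the toy data to a temporary CSV file, then load it with read.csv().
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp, row.names = FALSE)
toy <- read.csv(tmp)

# nrow() counts the rows (observations) in a data frame.
n_toy <- nrow(toy)

# filter() keeps the rows that satisfy a condition;
# select() keeps only the named columns.
students <- toy %>%
  filter(is_student == 1) %>%
  select(id, age)

# summarise() collapses all rows into one row of summary values.
toy_summary <- toy %>%
  summarise(mean_age = mean(age), share_students = mean(is_student))

n_toy
students
toy_summary
```

Note that the mean of a 0/1 column such as `is_student` is the share of rows coded 1.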

## Setup
The cell below loads the [packages](https://www.geeksforgeeks.org/packages-in-r-programming/) needed for this assignment. 

In [None]:
# You *must* run this cell first. Do not change the contents of this cell.
library(testthat)
library(ottr)
library(tidyverse)

<!-- BEGIN QUESTION -->

## Question 1: ChatGPT
**(3 points) Ask ChatGPT the following: *"Define human trafficking. Include the different forms of trafficking, its impacts, and its prevalence."*** 
    
**Copy and paste the output below. Is it consistent with the definition of human trafficking presented in class and in your readings? What are the strengths and weaknesses of its answer? What steps should you take to verify its response?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

## Question 2: Motivating the Question
<span style="color:#3268a4">

<!-- BEGIN QUESTION -->

**(2 points) We often hear that human trafficking is a violation of human rights. In 2-3 sentences, explain what this means to you. Use examples and definitions from the readings to illustrate this point.**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**(2 points) What can we learn about victims from in-depth interviews and individual cases? What can we learn about trafficking from large datasets of information?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

## Question 3: Understanding the Data
<span style="color:#3268a4">

The global counter-trafficking community has recognized the importance of inter-organizational coordination in standardizing and consolidating human trafficking data worldwide. While we are far from reaching an ideal standard, organizations such as the IOM have started taking steps toward it. The code chunk below loads a synthetic dataset provided by the [Counter Trafficking Data Collaborative](https://www.ctdatacollaborative.org/page/about) (CTDC), which is "the first global data hub on human trafficking, publishing harmonized data from counter-trafficking organizations around the world." The CTDC aggregates individual-level data from various organizations, and it is where the IOM publishes its data.

In [None]:
# RUN CODE. DO NOT CHANGE
ctdc <- read.csv("2024_CTDC_synthetic.csv")

#### a) (1 point) How many observations are included in this dataset? You can use the `nrow()` function or an alternative that returns the same number. The solution should be an integer.


In [None]:
# TO DO
n_cases <- NULL # YOUR CODE HERE
n_cases

In [None]:
. = ottr::check("tests/q3a.R")

<!-- BEGIN QUESTION -->

#### b) (3 points) What does each observation represent? Why can't we use this dataset to estimate the prevalence of trafficking?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

The questions above concern the structure and interpretation of the data. The questions below concern the content of the data itself: what types of information are collected on victims, and why they were chosen as key information to collect. Look through the [codebook](https://www.ctdatacollaborative.org/sites/g/files/tmzbdl2011/files/2024-02/Codebook_CTDC_global_synthetic_data_v2024.pdf) provided by the CTDC to help answer the questions below. 

**c) (2 points) What do the variables that start with "means" represent, and why is this concept critical to trafficking? Pick one of the "means" and describe how it is used in the context of trafficking.**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**d) (2 points) What are the types of trafficking included in the dataset, what forms of trafficking are in the "other" category, and what are their differences? Do you think this covers all forms of trafficking?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**e) (2 points) How are victims recruited? What role do friends, family, and intimate partners play as recruiters?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

## Question 4: Replicating estimates
The CTDC synthetic data [dashboard](https://www.ctdatacollaborative.org/global-synthetic-data-dashboard) has generated these estimates for the prevalence of different types of human trafficking. Let's see if we can replicate these numbers. 


### Forms of Trafficking
**a) (3 points) The first step in replicating the estimates is making sure we are looking at the right subset of data. In words, explain what the code chunk below is doing. Discuss what the `filter` function is doing, what the `select` function is doing, what `head` does, and what the final dataset includes.**

Hint: To look up what a function does, you can search online or run a code chunk with `?function_name` to open its documentation.

In [None]:
# DO NOT CHANGE.  
forms_ht <- ctdc %>% filter((isForcedLabour >0 | isSexualExploit > 0 | isOtherExploit >0)) %>%
                select(isForcedLabour, isSexualExploit,isOtherExploit)
head(forms_ht)

**b) Now we want to find the prevalence of each form of exploitation in this dataset. Use the `summary()` function on `forms_ht` to return the mean (i.e., the average) of each column. Your answer should match the image above.**
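
When a column is coded 0/1 (an assumption in this sketch), its mean equals the share of rows coded 1, which is why a column mean can be read as a prevalence. The example below uses a made-up data frame; `flags` and its columns `has_a` and `has_b` are hypothetical and are not part of the CTDC data. `summary()` reports the mean alongside the minimum, quartiles, and maximum, while base R's `colMeans()` returns only the means.

```r
# Hypothetical 0/1 indicator data, not the CTDC dataset.
flags <- data.frame(
  has_a = c(1, 0, 1, 1),
  has_b = c(0, 0, 1, 0)
)

# summary() reports min, quartiles, mean, and max for every numeric column.
flags_summary <- summary(flags)
flags_summary

# colMeans() returns just the column means, i.e. the share of 1s in each column.
flag_shares <- colMeans(flags)
flag_shares
```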


In [None]:
# TO DO
forms_summary <- NULL # YOUR CODE HERE
forms_summary

In [None]:
. = ottr::check("tests/q4b.R")

### Means of Trafficking

**c) Now we want to do the same thing, but with the means of trafficking. Filter to the subset of data that is affirmative for human trafficking (as seen above), then select all the variables that start with "means".**
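
One way to select several columns that share a prefix is the `starts_with()` selection helper, used inside `select()`. The sketch below runs on a made-up data frame; `scores` and its column names are hypothetical, so you will need to adapt the idea to the actual variable names in the codebook.

```r
library(dplyr)

# Hypothetical toy data, not the CTDC dataset.
scores <- data.frame(
  student = c("a", "b", "c"),
  score_math = c(1, 0, 1),
  score_art = c(0, 0, 1),
  grade = c(9, 10, 9)
)

# Keep rows where at least one score column is positive, then keep
# every column whose name begins with the prefix "score".
score_cols <- scores %>%
  filter(score_math > 0 | score_art > 0) %>%
  select(starts_with("score"))

score_cols
```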

In [None]:
# YOUR ANSWER HERE
means_ht <- NULL # YOUR CODE HERE
head(means_ht)

**d) Now we want to find the prevalence of each means. Use the `summary()` function on `means_ht` to return the mean (i.e., the average) of each column.**


In [None]:
# YOUR SOLUTION HERE
means_summary <- NULL # YOUR CODE HERE
means_summary

### Means of Recruitment

**e) Now we want to do the same thing, but with the recruiter. Filter to the subset of data that is affirmative for human trafficking, then select all the variables that start with "recruiter". Then, use the `summary()` function on `recruiter_ht` to return the mean (i.e., the average) of each column.**

In [None]:
# YOUR SOLUTION HERE
recruiter_ht <- NULL # YOUR CODE HERE
head(recruiter_ht)

recruiter_summary <- NULL # YOUR CODE HERE
recruiter_summary

## Conclusion

<!-- BEGIN QUESTION -->

**f) In this dataset of victims compiled by the CTDC, what was the most commonly reported form of trafficking? What was the most common means of keeping victims in captivity? What was the most commonly reported recruiter relation besides "other"? What is your reaction?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->


# Submitting Your Notebook (please read carefully!)

To submit your notebook...

### 1. Click `File` $\rightarrow$ `Save Notebook`.

### 2. Wait 5 seconds.

### 3. Select the cell below and hit run.

In [None]:
ottr::export("pset1.ipynb")

After you hit "Run" on the cell above, click the download link. A .zip file should download to your computer.

(If you make changes to your notebook, you'll need to hit save and then run the cell above again before you submit to get a new version of it.)

### 4. Submit the .zip file you just downloaded <a href="https://www.gradescope.com/" target="_blank">on Gradescope here</a>.

Notes:

- **This does not seem to work on Chrome for iPad or iPhone.** If you're using an iPad or iPhone, you need to download the file using **Safari**.
- If your web browser automatically unzips the .zip file (so you see a folder instead of a .zip file), you can just upload the .ipynb file that is inside the folder.
- If this method is not working for you, try this: hit `File`, then `Download as`, then `Notebook (.ipynb)` and submit that.