# Analytical Assignment 1: Introduction to Human Trafficking

### Substantive Objectives
What does human trafficking look like? In class, we have gone through readings that describe and analyze the practice. In this assignment, we will supplement our existing knowledge using data. The growing emphasis on data in society and in the social sciences means we need to think critically about how others interpret data, double-check their work, and draw conclusions from data ourselves.

This assignment uses a dataset that aggregates reported cases of trafficking around the world, highlighting the forms of trafficking, means of control, and recruiters involved. The primary goal is for you to think critically about different measurements and interpretations of the same data; a secondary goal is to introduce you to coding and working with large datasets.

### Coding Objectives

Learn how to use the following functions. 
1) `read.csv()`
2) `nrow()`
3) `filter()` and `select()`
4) `summarise()`
5) `mean()`

## Setup
The cell below loads the [packages](https://www.geeksforgeeks.org/packages-in-r-programming/) needed for this assignment. You must run the cell for the rest of the assignment to work. 

In [None]:
# You *must* run this cell first. Do not change the contents of this cell.
library(testthat)
library(ottr)
library(tidyverse) 

<!-- BEGIN QUESTION -->

-----
## Question 1: ChatGPT
**Ask ChatGPT the following: *"Define human trafficking. Expand on the history of human trafficking, how the modern concept of human trafficking emerged."*** 
    
**1a) (1 point) Copy and paste the output below.**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**1b) (3 points) Answer the following questions.**

>**(i) (1 point) What would you change or add to ChatGPT's answer based on the history and definition of human trafficking presented in class and in your readings? Name at least two. \
(ii) (1 point) What are the strengths and weaknesses of its answer? \
(iii) (1 point) What steps should you take to verify its response?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

-----
## Question 2: Types of Data and Human Trafficking
<span style="color:#3268a4">

<!-- BEGIN QUESTION -->

**2a) (4 points) What can we learn about victims from news articles and in-depth interviews? What can we gain by learning about trafficking from large datasets? In 5-6 sentences, describe the benefits of both types of data and the types of questions each can answer.**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

---
## Question 3: Understanding the Data
<span style="color:#3268a4">

The global counter-trafficking community has recognized the importance of inter-organizational coordination in standardizing and consolidating human trafficking data worldwide. While we are far from an ideal standard, organizations such as the IOM have started taking steps toward one. The code chunk below loads a synthetic dataset provided by the [Counter Trafficking Data Collaborative](https://www.ctdatacollaborative.org/page/about) (CTDC), which describes itself as "the first global data hub on human trafficking, publishing harmonized data from counter-trafficking organizations around the world." The dataset aggregates individual-level data from various organizations.

Note: A synthetic dataset consists of computer-generated data used to augment or replace real data. Here, synthetic data is used to protect sensitive information about human trafficking victims.

In [None]:
# RUN CODE. DO NOT CHANGE
ctdc <- read.csv("2024_CTDC_synthetic.csv")

### Dataset Structure

**a) (1 point) How many observations are included in this dataset? Use the `nrow()` function.**
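As a quick illustration of what `nrow()` returns, the sketch below builds a small hypothetical data frame (the column names and values are made up and are not CTDC data) and counts its rows and columns.

```r
# Hypothetical toy data frame: 3 rows (observations), 2 columns (variables)
toy <- data.frame(
  id    = c(1, 2, 3),
  value = c(10, 20, 30)
)

nrow(toy)  # number of rows (observations): 3
ncol(toy)  # number of columns (variables): 2
```

Because each row of `ctdc` is one person, its row count is the number of observations.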

In [None]:
# TO DO
n_cases <- NULL # YOUR CODE HERE
n_cases

In [None]:
. = ottr::check("tests/q3a.R")

<!-- BEGIN QUESTION -->

**b) (3 points) Looking at the dataset, we see that each row (observation) represents a person, but not every person in this dataset has been identified as a victim of trafficking. Answer the following questions.**

>**(i) Which columns would you use to determine whether an observation is a victim of trafficking? Look through the column names to see what makes sense. You can also reference the [codebook](https://www.ctdatacollaborative.org/sites/g/files/tmzbdl2011/files/2024-02/Codebook_CTDC_global_synthetic_data_v2024.pdf). \
(ii) What is the difference between the estimated prevalence of trafficking and the number of detected trafficking cases?**



In [None]:
# NO ACTION NEEDED. RUN CELL.
# prints first 5 observations of the dataset
head(ctdc, 5)

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

### Dataset Variables
The questions above focus on the structure and interpretation of the data. The questions below focus on the content of the data itself: what information is collected about victims, and why it was chosen as key information to collect. Use the [codebook](https://www.ctdatacollaborative.org/sites/g/files/tmzbdl2011/files/2024-02/Codebook_CTDC_global_synthetic_data_v2024.pdf) provided by the CTDC and what you have learned in lecture to answer the questions below. 

**c) (2 points) One component in the definition of trafficking is the means of control. In your own words, what does this refer to? Pick one of the "means" included in the codebook and describe this in the context of trafficking based on the lectures and/or readings.**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**d) (2 points) What types of trafficking are included in the dataset, and what forms of trafficking fall into the "other" category? One concern with this data is systematic reporting bias. Name one group that is likely to be underreported or overreported in a dataset like CTDC's. What is one source of this bias?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**e) (2 points) What is the role of the recruiter? Who can become a recruiter? Is this what you would expect?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

---
## Question 4: Replicating Estimates
The CTDC synthetic data [dashboard](https://www.ctdatacollaborative.org/global-synthetic-data-dashboard) has generated summary statistics of the human trafficking reports in their dataset. Let's try to replicate these numbers. 

**We are interested in estimating the proportion of (1) each form of trafficking, (2) means of control, and (3) recruiters among those who are trafficking victims.**



**a) (1 point) Filtering Rows**: Not all individuals in this dataset are trafficking victims. Let's first filter our rows to only victims of trafficking. Use the `filter()` function to keep only individuals coded as 1 in at least one of `isForcedLabour`, `isSexualExploit`, or `isOtherExploit`.

Example code: `dat %>% filter(criteria1 | criteria2 | criteria3)` 
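A minimal sketch of how `|` (element-wise "or") behaves inside `filter()`, using a hypothetical toy data frame whose made-up columns `flagA`, `flagB`, and `flagC` stand in for 0/1 indicator columns. One detail worth knowing: `filter()` keeps only rows where the condition is `TRUE`, so a row whose condition evaluates to `NA` is dropped.

```r
library(dplyr)

# Hypothetical toy data: three 0/1 indicator columns, with some missing values
toy <- data.frame(
  id    = 1:5,
  flagA = c(1, 0, 0, NA, 0),
  flagB = c(0, 1, 0, 0, 0),
  flagC = c(0, 0, 0, 1, NA)
)

# Keep rows where at least one indicator equals 1.
# Row 4: NA | FALSE | TRUE is TRUE, so it is kept.
# Row 5: FALSE | FALSE | NA is NA, so filter() drops it.
any_flag <- toy %>% filter(flagA == 1 | flagB == 1 | flagC == 1)

any_flag$id  # 1 2 4
```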

In [None]:
victims.df <- NULL # YOUR CODE HERE

In [None]:
. = ottr::check("tests/q3e.R")

### Forms of Trafficking

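**Example) Selecting Columns:** Now that `victims.df` contains only trafficking victims, let's practice subsetting data frames by column. The cell below uses `select()` to keep only the three columns that record the form of trafficking.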

In [None]:
# Step 2: Now select just the three columns that record the form of trafficking
forms.df <- victims.df %>% select(isForcedLabour, isSexualExploit, isOtherExploit)
head(forms.df) # display

**Example) Calculating Summary Statistics:** Now we use the `summarise()` function to find the proportion of victims in each form of trafficking. Because each column is coded 0/1, its mean is the proportion of victims coded 1. Notice that we include `na.rm = T`. What would happen if we didn't? (This is a rhetorical question.)
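The sketch below uses a hypothetical 0/1 vector (not a CTDC column) to show two facts the example relies on: the mean of a 0/1 variable is the proportion of 1s, and `na.rm = TRUE` drops missing values before averaging. The denominator therefore becomes the number of non-missing values rather than the total number of rows, which is worth keeping in mind when you interpret the resulting proportions.

```r
# Hypothetical 0/1 indicator for four people; the last value is missing
x <- c(1, 0, 1, NA)

mean(x)                # NA: a single missing value makes the whole mean missing
mean(x, na.rm = TRUE)  # 0.6666667: 2 of the 3 non-missing values are 1
sum(!is.na(x))         # 3: the denominator used when na.rm = TRUE
```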

In [None]:
# Step 3: Use the summarise function to find the proportion of victims in each form of trafficking.

forms.summary <- forms.df %>% 
    # the mean of each 0/1 column is the proportion of victims coded 1
    summarise( 
        isForcedLabour = mean(isForcedLabour, na.rm = T),
        isSexualExploit = mean(isSexualExploit, na.rm = T), 
        isOtherExploit = mean(isOtherExploit, na.rm = T)
    ) 

# display
forms.summary

### Means of Control

**b) (2 points) Your task is to do the same thing, but with the means of control. From the victims dataset `victims.df`, select the three columns `meansThreats`, `meansDebtBondageEarnings`, and `meansAbusePsyPhySex`.**


In [None]:
# YOUR ANSWER HERE
# Step 1: Filter to trafficking victims (as in part a)
victims.df <- NULL # YOUR CODE HERE

# Step 2: Select the three means-of-control columns
means.df <- NULL # YOUR CODE HERE
head(means.df) 

In [None]:
. = ottr::check("tests/q4a.R")

**c) (2 points) Now we want to find how often each means of control is reported among victims. Use the `summarise()` function on your subset `means.df` to return the average of each column.** 
 

In [None]:
# YOUR SOLUTION HERE

means.summary <- NULL # YOUR CODE HERE

# Display
means.summary

In [None]:
. = ottr::check("tests/q4b.R")

### Recruiters

**d) (2 points) Now do the same thing for recruiters. Filter to the observations identified as trafficking victims, then select all the variables whose names start with "recruiter". Then use the `summarise()` and `mean()` functions on `recruiter.df` to return the mean (i.e., average) of each column.**
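When many columns share a name prefix, tidyverse helpers such as `starts_with()` can select them all at once, and `across()` inside `summarise()` applies `mean()` to every selected column. The sketch below uses a hypothetical toy data frame with a made-up `score` prefix and a made-up `keep` indicator (not the CTDC recruiter columns) to show the pattern. Writing out each column by hand, as in the forms-of-trafficking example, works just as well.

```r
library(dplyr)

# Hypothetical toy data; "keep" and the "score" prefix are made up for illustration
toy <- data.frame(
  keep       = c(1, 1, 0),
  scoreAlpha = c(1, 0, NA),
  scoreBeta  = c(0, 0, 1),
  other      = c(5, 6, 7)
)

toy_summary <- toy %>%
  filter(keep == 1) %>%              # keep only the rows of interest
  select(starts_with("score")) %>%   # every column whose name begins with "score"
  # mean of each selected column; across() keeps the original column order
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

toy_summary  # scoreAlpha = 0.5, scoreBeta = 0
```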


In [None]:
# YOUR SOLUTION HERE
# PART 1: SUBSET
recruiter.df <- NULL # YOUR CODE HERE

# PART 2: SUMMARIZE
# Summarize: Make sure to retain the column order!
recruiter.summary <- NULL # YOUR CODE HERE

# display
recruiter.summary

In [None]:
. = ottr::check("tests/q4c.R")

## Question 5: Interpretations

<!-- BEGIN QUESTION -->

I asked ChatGPT to name the most commonly reported form of human trafficking, the most commonly reported means of control, and the typical recruiter. It responded:
1.  Most commonly reported form of trafficking is sexual exploitation. It is the most prevalent form of human trafficking, where victims are forced into prostitution, pornography, and other forms of sexual exploitation.
2. Most commonly reported means of control is physical violence and threats are the most commonly reported methods used by traffickers to control their victims.
3. Most commonly reported recruiters are acquaintances or family members. The majority of traffickers are known to the victims, including friends, romantic partners, or even family members. This familiarity helps traffickers gain the trust of victims before exploiting them.

**5a) (3 points) Based on the data from the CTDC [dashboard](https://www.ctdatacollaborative.org/global-synthetic-data-dashboard) (which you have replicated), how would you assess ChatGPT's answer? Do its numbers match the numbers on the dashboard? Does ChatGPT accurately use "prevalence" and "commonly reported"?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**5b) (3 points) The first chart below is taken from the CTDC dashboard and should match your numbers. The second chart is taken from UNODC's 2022 report from your Week 2 readings. Please...**

> (1) Provide two possible explanations for why we see differences in their numbers.\
> (2) Provide at least one possible real-world implication of the differences in results between two well-funded and credible organizations.

<img src="dashboard1.png" width="30%"/>
<img src="UNODC_Trafficking_Prevalence.png" width="70%"/>



_Type your answer here, replacing this text._

<!-- END QUESTION -->


# Submitting Your Notebook (please read carefully!)

Congrats, you're done! Hopefully, through this pset you have begun to...
1. Understand how to work with ChatGPT while being aware of its limitations (and realize that through this course, you will be able to improve its answers).
2. Think critically about numbers and their interpretation. This will help you see through imprecise language and citations of statistics in general. 
3. Work with datasets in R! This is an incredibly useful skill, and we will keep building your proficiency. 
4. Understand the transition from slavery to human trafficking and how the practice and concept of human trafficking emerged. 
5. Understand the forms, means of control, and recruiters in human trafficking on a global scale. 

To submit your notebook...

### 1. Click `File` $\rightarrow$ `Save Notebook`.

### 2. Wait 5 seconds.

### 3. Select the cell below and hit run.

In [None]:
ottr::export("pset1.ipynb")

After you hit "Run" on the cell above, click the download link. A .zip file should download to your computer.

(If you make changes to your notebook, you'll need to hit save and then run the cell above again before you submit to get a new version of it.)

### 4. Submit the .zip file you just downloaded <a href="https://www.gradescope.com/" target="_blank">on Gradescope here</a>.

Notes:

- **This does not seem to work on Chrome for iPad or iPhone.** If you're using an iPad or iPhone, you need to download the file using **Safari**.
- If your web browser automatically unzips the .zip file (so you see a folder instead of a .zip file), you can just upload the .ipynb file that is inside the folder.
- If this method is not working for you, try this: hit `File`, then `Download as`, then `Notebook (.ipynb)` and submit that.