# Assignment 2: Recommending a Target Country

### Substantive Objectives
The United States Agency for International Development (USAID) is looking to fund a new counter-trafficking in persons (CTIP) project and has hired you as an advisor. There are only enough funds to expand activities in **one** country.

Your objective is to use quantitative data, supplemented by qualitative research, to identify one country that can benefit greatly from USAID support with anti-trafficking activities. This should be a realistic recommendation, so take practical implementation concerns (e.g., an ongoing war or political openness to NGO activity) into consideration when making it.

While there is no single right recommendation, all your decisions must be justified. 

### Coding Objectives
1) Practice retrieving summary statistics using `mean()`, `median()`, `min()`, `max()`
2) Practice isolating data of interest using `filter()` and `select()` 
3) Practice arranging datasets using `arrange()` and `desc()`

## Setup
The code chunk below loads the packages that we need. 

In [None]:
# You *must* run this cell first. Do not change the contents of this cell.
library(testthat)
library(ottr)
library(tidyverse)

The code chunk below loads the datasets that we will be using. These datasets are drawn from Walk Free Foundation's Global Slavery Index. This is one example of how an organization attempts to standardize global estimates of trafficking prevalence, vulnerability, and government responses. 

In [None]:
df_gsi <- read.csv("gsi-scores-2023.csv") %>% select(country, region, population, prev_per_1000, prev_total, vulnerability,
                                                          government_response)
df_gsi_full <- read.csv("gsi-scores-2023.csv") 

## <p style="color:#2272A8;">Question 1: Identifying a Country to Recommend

### <p style="color:#5F7BA4;">  ChatGPT's Recommendation
Let's first return to our semi-trusty advisor ChatGPT and see what it would recommend. I asked ChatGPT "What criteria should USAID take when expanding anti-human trafficking efforts? Based on these criteria, what country in the world should the US expand counter trafficking efforts in?" It responded:

> When expanding anti-human trafficking efforts, USAID should consider several criteria to ensure that resources are allocated effectively and that initiatives are tailored to address specific needs in different contexts. Key criteria may include:
> 1. **Prevalence of Human Trafficking**: Assess the extent of human trafficking in potential target countries, including the number of victims and the types of trafficking (sex trafficking, labor trafficking, etc.).
> 2. **Legal and Policy Framework**: Evaluate the country's laws, regulations, and enforcement measures related to human trafficking, including the prosecution of traffickers and protection of victims.
> 3. **Government Commitment**: Look for a demonstrated commitment from the country's government to combating human trafficking, including the allocation of resources and political will to implement necessary changes.
> 4. **Civil Society Engagement**: Consider the strength and activity of civil society organizations working against trafficking and their ability to work with the government.
> 5. **Vulnerability Factors**: Analyze social, economic, and political factors that contribute to vulnerability, such as poverty, conflict, lack of education, and gender inequality.
> 6. **Existing Support Mechanisms**: Identify any existing efforts by other countries or organizations in the region that could be built upon or complemented.
> 7. **Regional Trends**: Consider regional dynamics, including human trafficking routes, and how transnational issues may affect trafficking patterns.

>Based on these criteria, one country that could be considered for expanded anti-human trafficking efforts is **India**. India has high prevalence levels of human trafficking, driven by socio-economic factors such as poverty and gender inequality. While there are laws in place to combat trafficking, enforcement can be inconsistent. Additionally, there is a vibrant civil society that is active in addressing these issues, which can support expanded efforts. 

> Overall, the selection should be based on comprehensive research and assessment to identify where USAID's support would be most effective in reducing human trafficking.
    
**a) Use the code block below to assign the string `"India"` to the variable `chatGPT_target`, using the exact capitalization and spelling.**

In [None]:
# Assign the variable below to the country name (e.g. "India")
chatGPT_target <- NULL # YOUR CODE HERE

In [None]:
. = ottr::check("tests/q1a.R")

<!-- BEGIN QUESTION -->

### <p style="color:#5F7BA4;"> Measurement Challenges in HT Reports
We are not yet at a point where we can fully rely on what ChatGPT says, so we definitely want to cross check with other sources. More traditional and commonly cited sources of reporting on trafficking include:
    
> 1. Reports by the ILO in partnership with academics and NGOs
> 2. The US Department of State's Trafficking in Persons Report
> 3. UNODC's Global Report on Trafficking in Persons
> 4. NGO reports, such as those from Polaris, Human Rights Watch, and Walk Free Foundation
> 5. Official government reports and statistics from each country
> 6. Academic studies
> 7. Published books

As we engage with reports on human rights, it is important to understand the **biases** that stem from (1) changes in standards and definitions over time, (2) the goals and capacity of the organization, (3) limitations imposed by the data collection environment, and (4) coder effects (Haschke and Arnon, 2020).
    
It is also important to understand the **assumptions and limitations** of estimates that rely on complex statistical methods. These include the theories that justify how the source deals with missing data, which variables it includes, how it weights the data, etc.
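
As a small illustration of why these modelling choices matter, the sketch below averages a handful of made-up rates in two ways. The vectors `rate` and `weights` are hypothetical, and this is not Walk Free's estimation method; it only shows that the same raw numbers can yield different summaries depending on how missing values and weights are handled.

```r
# Hypothetical survey estimates (NOT real GSI values); one value is missing
rate    <- c(2, 4, NA, 10)
weights <- c(1, 1, 1, 3)   # assumed weights, e.g. relative sample sizes

# Option 1: drop the missing value and weight the remaining values equally
simple_mean <- mean(rate, na.rm = TRUE)

# Option 2: drop the missing value, but weight the rest by `weights`
weighted_mean <- weighted.mean(rate, weights, na.rm = TRUE)

simple_mean
weighted_mean
```

The two averages differ even though the underlying values are identical, which is why it is worth reading how a source justifies its weighting and missing-data decisions.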
    
**b) (3 points) In this assignment, we will be using Walk Free Foundation's [Global Slavery Index (GSI)](https://www.walkfree.org/global-slavery-index/) to cross check ChatGPT's recommendation. Look through the [methodology](https://www.walkfree.org/global-slavery-index/methodology/methodology-content/#vulnerability) and list one potential bias, one limitation based on their modelling estimation strategy, and one strength of this dataset.**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

### <p style="color:#5F7BA4;"> Using quantitative data to assess the choice of India

Let's double-check ChatGPT's reasoning for recommending India. We have comprehensive data on 160+ countries through the Walk Free Foundation's 2023 [Global Slavery Index (GSI)](https://www.walkfree.org/global-slavery-index/). This dataset covers three of the selection criteria that ChatGPT listed, namely **the prevalence, vulnerability, and government responses** to trafficking. You can look through their [website](https://www.walkfree.org/global-slavery-index/) or see this [Excel file](2023-GSI-Data-Full.xlsx) for a detailed explanation. In short, the variables that we are looking at are as follows:

* `prev_per_1000`: estimated prevalence of human trafficking per 1,000 individuals
* `prev_total`: estimated total number of human trafficking victims
* `vulnerability`: score for vulnerability to human trafficking. Values range from 0 (low) to 100 (high).
* `government_response`: score for government responsiveness to human trafficking. Values range from 0 (low) to 100 (high).
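
To make the difference between the two prevalence variables concrete, here is a minimal sketch on made-up numbers. It assumes that a per-1,000 rate is roughly `prev_total / population * 1000`; check the GSI documentation before relying on that for the real data. The data frame `toy` and the column `prev_per_1000_check` are hypothetical and are not part of the GSI dataset.

```r
library(dplyr)

# Hypothetical toy data (NOT real GSI values)
toy <- data.frame(
  country    = c("A", "B"),
  population = c(1000000, 50000),
  prev_total = c(5000, 1000)
)

# Assumed relationship: estimated victims per 1,000 residents
toy <- toy %>%
  mutate(prev_per_1000_check = prev_total / population * 1000)

toy
```

In this toy example, country A has more victims in total, while country B has the higher rate per 1,000 residents.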

In [None]:
# DO NOT CHANGE. Prints the first 6 rows of df_gsi
head(df_gsi)

**c) (1 point) We start by looking at the prevalence estimates. One way to identify countries where USAID could expand its anti-trafficking efforts is to look at where trafficking prevalence is high. What is the estimated number of human trafficking victims in India? Store this estimate in an object called `gpt_prev_total`.**
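
The `filter()`, `select()`, `pull()` pattern can be sketched on a toy data frame as follows. The data frame `toy`, its values, and the object `b_total` are hypothetical and only illustrate the mechanics; they are not GSI data.

```r
library(dplyr)

# Hypothetical toy data (NOT real GSI values)
toy <- data.frame(
  country    = c("A", "B", "C"),
  prev_total = c(5000, 1000, 250)
)

# filter() keeps the matching row, select() keeps one column,
# and pull() extracts that column from the data frame as a plain vector
b_total <- toy %>%
  filter(country == "B") %>%
  select(prev_total) %>%
  pull(prev_total)

b_total
```

Without the final `pull()`, the result would still be a one-row data frame rather than a number.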


In [None]:
# YOUR ANSWER GOES HERE. 
# Step 1: use filter(country == XX) to filter to the country of interest
# Step 2: use select(column_name) to select the `prev_total` column
# Step 3: use pull(column_name) to pull the value of `prev_total` column out of the dataframe and as a number
gpt_prev_total <- NULL # YOUR CODE HERE

# print the value
gpt_prev_total

In [None]:
. = ottr::check("tests/q1c.R")

**d) (5 points) How does GPT's country's trafficking prevalence compare to the rest of the world?**

> **(i) (4 points)** Store the min, mean, median, and max of `prev_total` in their respective objects\
> **(ii) (1 point)** Use the `arrange()` function on the `df_gsi` dataframe to sort it by `prev_total`, with the *highest prevalence* first.

*Hints:* \
For part (i), make sure to set `na.rm = T`. Ex: `mean(x, na.rm = T)`. \
For part (ii), use `desc()` to sort from highest to lowest. Ex: `df %>% arrange(desc(y))`
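
Both hints can be tried out on a small example first. The data frame `toy` and its `score` column below are hypothetical stand-ins, not GSI columns.

```r
library(dplyr)

# Hypothetical toy data with one missing value (NOT real GSI values)
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  score   = c(10, NA, 40, 25)
)

# Without na.rm, a single missing value makes the result NA
mean(toy$score)

# With na.rm = TRUE, missing values are dropped before summarising
mean_score   <- mean(toy$score, na.rm = TRUE)
median_score <- median(toy$score, na.rm = TRUE)

# desc() sorts from highest to lowest; arrange() places NA rows last
sorted <- toy %>% arrange(desc(score))
sorted
```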

In [None]:
# calculate summary statistics
min_prev_total <- NULL # YOUR CODE HERE
mean_prev_total <- NULL # YOUR CODE HERE
median_prev_total <- NULL # YOUR CODE HERE
max_prev_total <- NULL # YOUR CODE HERE

# sort dataset in order of prevalence
gsi_sort_by_prev_total <- NULL # YOUR CODE HERE

# print answers
paste0("min: ", min_prev_total)
paste0("mean: ", mean_prev_total)
paste0("median: ", median_prev_total)
paste0("max: ", max_prev_total)
gsi_sort_by_prev_total

#### <p style="color:#A0A0A0">  Where does your country fall? (**No action needed below, Example Distribution Plot**)
The code chunk below plots a histogram of the total prevalence estimates across all countries. The red line indicates the prevalence of human trafficking in ChatGPT's recommended country. The blue line indicates the median prevalence across the 160 countries.

In [None]:
# DO NOT EDIT
# Example code for plotting a histogram
gsi_sort_by_prev_total %>% ggplot(aes(x = prev_total)) +
    
    # plots histogram
    geom_histogram(bins = 10, alpha = .7) +

    # add a red vertical line for ChatGPT's recommended country
    geom_vline(xintercept = gpt_prev_total, col = "red") +

    # add a blue vertical line for the median total prevalence
    geom_vline(xintercept = median_prev_total, col = "blue") +
    
    # changes background color, lines, etc. 
    theme_bw() +

    # add labels
    labs(x = "Estimated Total Prevalence", y = "Count")

In [None]:
. = ottr::check("tests/q1d.R")

<!-- BEGIN QUESTION -->

**e) (2 points) Does India have a higher absolute prevalence of trafficking than other countries? What is one critique of using absolute prevalence to decide where to expand anti-trafficking efforts?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

**f) Now we look at an alternate variable of interest `prev_per_1000` which represents the estimated prevalence of human trafficking per 1000 individuals. Repeat (c) and (d) for this variable. In other words:**
\
i) Get the prevalence per thousand for India stored in an object named `gpt_prev_1000`. \
ii) Store the min, mean, median, and max of the prevalence per thousand in their respective objects\
iii) Then store a dataframe of the countries sorted by highest to lowest estimated prevalence per thousand in an object named `gsi_sort_by_prev_1000`.

In [None]:
# store estimate for chatGPT's recommended country

# step 1: filter(country == XX)
# step 2: select(prev_per_1000)
# step 3: pull(prev_per_1000)
gpt_prev_1000 <- NULL # YOUR CODE HERE

# get summary statistics of prev_per_1000
min_prev_1000 <- NULL # YOUR CODE HERE
mean_prev_1000 <- NULL # YOUR CODE HERE
median_prev_1000 <- NULL # YOUR CODE HERE
max_prev_1000 <- NULL # YOUR CODE HERE

# arrange dataframe in descending order by prev_per_1000
gsi_sort_by_prev_1000 <- NULL # YOUR CODE HERE

# print your results
paste0(chatGPT_target, ": ", gpt_prev_1000)
paste0("min: ", min_prev_1000)
paste0("mean: ", mean_prev_1000)
paste0("median: ", median_prev_1000)
paste0("max: ", max_prev_1000)
gsi_sort_by_prev_1000

In [None]:
. = ottr::check("tests/q1f.R")

#### <p style="color:#A0A0A0">  Where does your country fall? (**No action needed below, Example Distribution Plot**)
The code chunk below plots a histogram of the prevalence per 1,000 estimates across all countries. The red line indicates the prevalence of human trafficking per 1,000 individuals in ChatGPT's recommended country, and the blue line indicates the median for the 160 countries in the dataset.

In [None]:
# DO NOT EDIT
# Example code for plotting a histogram
gsi_sort_by_prev_1000 %>% ggplot(aes(x = prev_per_1000)) +
    
    # plots histogram
    geom_histogram(bins = 10, alpha = 0.5) +

    # add vertical line for chat GPT's recommended country. 
    geom_vline(xintercept = gpt_prev_1000, col = "red")+

    # add vertical line for median prevalence per 1000
    geom_vline(xintercept = median_prev_1000, col = "blue")+
    
    # changes background color, lines, etc. 
    theme_minimal() +

    # add labels
    labs(x = "Estimated HT Prevalence per 1000", y = "Count")

<!-- BEGIN QUESTION -->

**g) (2 points) How does ChatGPT's country compare to the other countries in the dataset when looking at prevalence per 1000 individuals? Looking at the countries with higher prevalence per 1000 individuals in q1f, would you be inclined to change your recommendation? Why or why not?**

*Note: If you are unsure if you would change your recommendation, discuss additional information you need to make your decision.*

_Type your answer here, replacing this text._

<!-- END QUESTION -->

### <p style="color:#5F7BA4;"> Finalizing your country recommendation

<!-- BEGIN QUESTION -->

**h) (2 points) Prevalence is a potential starting point. However, as mentioned in 1a, other criteria for selection include government commitment and the country's vulnerability to trafficking. Walk Free Foundation also identifies (1) vulnerability and (2) government action as important dimensions of analysis when evaluating the trafficking situation. See their [documentation](2023-GSI-Data-Full.xlsx) for more details on these measures.**

> (i) Pick one of the five vulnerability dimensions (Governance Issues, Lack of Basic Needs, Inequality, Disenfranchised Groups, Effects of Conflict). Would a high score in this dimension make you more or less inclined to recommend that country? Why?

> (ii) Pick one of the five governance milestones (survivor support, criminal justice, coordination, risk factors, sourcing reform). Would a low score make you more or less inclined to recommend that country? Why?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

#### <p style="color:#A0A0A0">  The tables below show the countries that have been scored to be the most vulnerable and with the least government response (No action needed, just run the cells to see the tables). 

**The 25 Countries with the Highest Vulnerability Scores (100 = highest or most vulnerable)** \
The code chunk below displays the data sorted by most vulnerable countries. To see how these scores are calculated, please see the [excel file](2023-GSI-Data-Full.xlsx). 

In [None]:
# RUN THIS CELL
# Sort DF by vulnerability (highest to lowest)
df_gsi_vulnerability <- df_gsi_full %>% select(country, region, vulnerability,
                                               governance_issues:effect_conflict, 
                                               government_response, prev_total, prev_per_1000) %>%
                        arrange(desc(vulnerability)) 

df_gsi_vulnerability %>% head(25)

#### The 25 Countries with the Lowest Government Response Scores (lower scores = less responsive)
The code chunk below displays the data sorted by level of government response. Lower scores indicate less action. To see how these scores are calculated, please see the [excel file](2023-GSI-Data-Full.xlsx). 

In [None]:
# RUN THIS CELL
# Sort DF by government response (lowest to highest)
df_gsi_govt <- df_gsi_full %>% select(country, region, government_response, survivor_support:sourcing_reform,
                        vulnerability, prev_total, prev_per_1000) %>% arrange(government_response)

df_gsi_govt %>% head(25)

<!-- BEGIN QUESTION -->

**i) (5 points) Given this information, please make your own recommendation on which country USAID should expand counter-trafficking activities in. You can feel free to dig through the dataset yourself. Please justify your answer and cite any outside sources you use.** 

Recommended length: 200-250 words

_Type your answer here, replacing this text._

In [None]:
# OPTIONAL: use this cell to explore the data. You can use the dataframes: `df_gsi_govt` or `df_gsi_vulnerability`

## example code: uncomment the lines you want to run, edit the code, or write your own code!

## ----------------------------
## looking at a certain country
## ----------------------------
# df_gsi_govt %>% filter(country == "India")
# df_gsi_vulnerability %>% filter(country == "India")

## ----------------------------
# sorting by a certain column
## ----------------------------
# df_gsi_vulnerability %>% arrange(desc(effect_conflict)) # highest to lowest
# df_gsi_vulnerability %>% arrange(effect_conflict) # lowest to highest


## ----------------------------
# sorting by a certain column and region
## ----------------------------
# df_gsi_vulnerability %>% filter(region == "Asia and the Pacific")  %>%
#                           arrange(desc(effect_conflict))

## ----------------------------
# getting countries within a range of values
## ----------------------------
# df_gsi_govt %>% filter(between(government_response, 50, 70)) # also matches non-integer scores

## ----------------------------
# get the means of each column
## ----------------------------
#df_gsi_vulnerability %>% summarise_if(is.numeric, mean, na.rm = T)

## --------------------------------------------
# get the means for each column for each region
## --------------------------------------------
#df_gsi_vulnerability %>% group_by(region) %>% summarise_if(is.numeric, mean, na.rm = T) 


<!-- END QUESTION -->

# Submitting Your Notebook (please read carefully!)

To submit your notebook...

### 1. Click `File` $\rightarrow$ `Save Notebook`.

### 2. Wait 5 seconds.

### 3. Select the cell below and hit run.

In [None]:
ottr::export("pset2.ipynb")

After you hit "Run" on the cell above, click the download link. A .zip file should download to your computer.

(If you make changes to your notebook, you'll need to hit save and then run the cell above again before you submit to get a new version of it.)

### 4. Submit the .zip file you just downloaded <a href="https://www.gradescope.com/" target="_blank">on Gradescope here</a>.

Notes:

- **This does not seem to work on Chrome for iPad or iPhone.** If you're using an iPad or iPhone, you need to download the file using **Safari**.
- If your web browser automatically unzips the .zip file (so you see a folder instead of a .zip file), you can just upload the .ipynb file that is inside the folder.
- If this method is not working for you, try this: hit `File`, then `Download as`, then `Notebook (.ipynb)` and submit that.