# Assignment 2: Recommending a Target Country

### Substantive Objectives
The United States Agency for International Development (USAID) is looking to fund a new counter-trafficking in persons (CTIP) project and has hired you as an advisor. There are only enough funds to expand activities in **one** country.

Your objective is to use quantitative data, supplemented by qualitative research, to identify one country that can benefit greatly from USAID support with anti-trafficking activities. This should be a realistic recommendation, so take practical implementation concerns (e.g., an ongoing war, political openness to NGO activity) into consideration when making your recommendation.

While there is no single right recommendation, all your decisions must be justified. 

### Coding Objectives
1) Practice retrieving summary statistics using `mean()`, `median()`, `min()`, `max()`
2) Practice isolating data of interest using `filter()` and `select()` 
3) Practice arranging datasets using `arrange()` and `desc()`

## Setup
The code chunk below loads the packages that we need. 

In [None]:
# You *must* run this cell first. Do not change the contents of this cell.
library(testthat)
library(ottr)
library(tidyverse)

The code chunk below loads the datasets that we will be using. These datasets are drawn from Walk Free Foundation's Global Slavery Index. This is one example of how an organization attempts to standardize global estimates of trafficking prevalence, vulnerability, and government responses. 

In [None]:
df.gsi_scores <- read.csv("gsi-scores-2023.csv") %>% select(country, region, population, prev_per_1000, prev_total, vulnerability,
                                                          government_response)

## <p style="color:#2272A8;">Question 1: Identifying a Country to Recommend

### <p style="color:#5F7BA4;">  ChatGPT's Recommendation
**a) Let's first return to our semi-trusty advisor ChatGPT and see what it would recommend. Ask ChatGPT "What country in the world should there be more counter trafficking efforts?" and assign the variable `chatGPT_target` to the name of the country.**
    
*Note: Please spell the name of the country as it is spelled in the list of countries below. In the rare case that the country ChatGPT suggested is not in this list of countries, please ask ChatGPT to recommend another.*

In [None]:
# Run this cell to see how you should be spelling the country. Do not change. 
gsi_countries <- df.gsi_scores$country
gsi_countries

In [None]:
# Assign the variable below to the country name (e.g. "India")
chatGPT_target <- NULL # YOUR CODE HERE

In [None]:
. = ottr::check("tests/q1a.R")

<!-- BEGIN QUESTION -->

### <p style="color:#5F7BA4;"> Alternative Sources
We are not yet at a point where we can blindly trust what ChatGPT tells us. Other common sources of reporting on trafficking include NGO reports, the US Department of State's Trafficking in Persons Report, UNODC's Global Report on Trafficking in Persons, and official government reports and statistics, among others.
    
**b) Pick three sources that you could use to evaluate a country's human trafficking situation, and explain up to three strengths and three weaknesses of relying on each source.**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

### <p style="color:#5F7BA4;"> Working with the data yourself!

So far, we have a recommendation from AI and some third-party sources to evaluate AI's recommendation. What if there is a question you have that wasn't answered in the report? What if you don't trust the numbers published in the report? You could design a study to source your own data or search for datasets that you trust.

However, before we get there, the first step is to know how to work with data once you get your hands on it, and understand how the existing counter-trafficking world is measuring the prevalence of trafficking and how governments are responding. 
    
    
    
For this question, we are working with Walk Free Foundation's Global Slavery Index (GSI) to evaluate ChatGPT's recommendation. The Walk Free Foundation's [Global Slavery Index (GSI)](https://www.walkfree.org/global-slavery-index/) dataset details the prevalence of, vulnerability to, and government responses to trafficking for 160 countries. You can look through their website and see this [Excel file](2023-GSI-Data-Full.xlsx) for a detailed explanation. In short, the variables that we are looking at are as follows:

* `prev_per_1000`: estimated prevalence of human trafficking per 1,000 individuals
* `prev_total`: estimated total number of human trafficking victims
* `vulnerability`: score for vulnerability to human trafficking. Values range from 0 (low) to 100 (high).
* `government_response`: score for government responsiveness to human trafficking. Values range from 0 (low) to 100 (high).
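To make the difference between the two prevalence variables concrete, here is a minimal sketch using made-up numbers. It assumes that a per-1,000 rate is simply the total count divided by the population and multiplied by 1,000; the data frame `toy` and its columns are hypothetical, and Walk Free's documentation remains the authority on how the published GSI figures are actually derived.

```r
# Hypothetical numbers for illustration only, not GSI estimates
toy <- data.frame(
  name       = c("X", "Y"),
  population = c(1000000, 50000),
  total      = c(2000, 500)
)

# Assumed definition: rate per 1,000 people = total / population * 1000
toy$per_1000 <- toy$total / toy$population * 1000

toy$per_1000  # 2 10
```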

In [None]:
# DO NOT CHANGE. Prints the first 6 rows of df.gsi_scores
head(df.gsi_scores)

**c) We start with looking at the prevalence estimates. One way we could go about recommending a country to expand its anti-trafficking efforts is looking at where trafficking prevalence is the highest. What is the estimated number of human trafficking victims in the country ChatGPT recommended? Store this estimate in an object called `gpt_prev_total`**

*Hint: Use the `filter()` function from the `dplyr` package to select the country you are interested in, then use `pull()` to extract the value of `prev_total`.*
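As a minimal sketch of the filter-then-pull pattern, the example below uses a small made-up data frame (`toy`, with hypothetical columns `name` and `score`) rather than the GSI data. `filter()` keeps the matching row, and `pull()` returns the column as a plain vector instead of a dataframe.

```r
library(dplyr)

# Hypothetical toy data, unrelated to the GSI dataset
toy <- data.frame(
  name  = c("A", "B", "C"),
  score = c(10, 25, 40)
)

# Step 1: keep only the row where name is "B"
# Step 2: extract the score column as a plain numeric vector
b_score <- toy %>%
  filter(name == "B") %>%
  pull(score)

b_score              # 25
is.numeric(b_score)  # TRUE
```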

In [None]:
# YOUR ANSWER GOES HERE
# Step 1: filter to the country of interest
# Step 2: pull the value from the `prev_total` column
gpt_prev_total <- NULL # YOUR CODE HERE

# print the value (NOTE: this object should be a number and NOT a dataframe)
gpt_prev_total 

In [None]:
. = ottr::check("tests/q1c.R")

**d) How does GPT's country's trafficking prevalence compare to the rest of the world?**

(i) Store the min, mean, median, and max of `prev_total` in their respective objects\
(ii) Use the `arrange()` function on the `df.gsi_scores` dataframe to sort the dataframe by `prev_total` with the *highest prevalence* first.

*Hints:* \
For part (i), make sure to set `na.rm = T`. Ex: `mean(x, na.rm = T)`. \
For part (ii), use `desc()` to sort from highest to lowest. Ex: `df %>% arrange(desc(y))`
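The sketch below illustrates both hints on a made-up data frame (`toy`, with hypothetical columns `name` and `score`), not on the GSI data. It shows why `na.rm` matters when a column contains a missing value and how `desc()` reverses the sort order.

```r
library(dplyr)

# Hypothetical toy data with one missing value
toy <- data.frame(
  name  = c("A", "B", "C", "D"),
  score = c(10, NA, 40, 25)
)

# Without na.rm = TRUE, a single NA makes the result NA
mean(toy$score)                                # NA

# With na.rm = TRUE, missing values are dropped before computing
toy_min    <- min(toy$score, na.rm = TRUE)     # 10
toy_mean   <- mean(toy$score, na.rm = TRUE)    # 25
toy_median <- median(toy$score, na.rm = TRUE)  # 25
toy_max    <- max(toy$score, na.rm = TRUE)     # 40

# Sort rows from highest to lowest score; rows with NA are placed last
toy_sorted <- toy %>% arrange(desc(score))
toy_sorted$name                                # "C" "D" "A" "B"
```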

In [None]:
# calculate summary statistics
min_prev_total <- NULL # YOUR CODE HERE
mean_prev_total <- NULL # YOUR CODE HERE
median_prev_total <- NULL # YOUR CODE HERE
max_prev_total <- NULL # YOUR CODE HERE

# sort dataset in order of prevalence
gsi_sort_by_prev_total <- NULL # YOUR CODE HERE

# print answers
paste0("min: ", min_prev_total)
paste0("mean: ", mean_prev_total)
paste0("median: ", median_prev_total)
paste0("max: ", max_prev_total)
gsi_sort_by_prev_total

#### <p style="color:#A0A0A0">  Where does your country fall? (**No action needed below, Example Distribution Plot**)
The code chunk below plots a histogram of the total prevalence estimates across all countries. The red line indicates the prevalence of human trafficking in ChatGPT's recommended country. The blue line indicates the median prevalence in the 160 countries.

In [None]:
# DO NOT EDIT
# Example code for plotting a histogram
gsi_sort_by_prev_total %>% ggplot(aes(x = prev_total)) +
    
    # plots histogram
    geom_histogram(bins = 10, alpha = .7) +

    # add vertical line for ChatGPT's recommended country
    geom_vline(xintercept = gpt_prev_total, col = "red")+

    # add vertical line for the median total prevalence
    geom_vline(xintercept = median_prev_total, col = "blue")+
    
    # changes background color, lines, etc. 
    theme_bw() +

    # add labels
    labs(x = "Estimated Total Prevalence", y = "Count")

In [None]:
. = ottr::check("tests/q1d.R")

<!-- BEGIN QUESTION -->

**e) What is a critique of using the absolute value of prevalence to determine which country to expand anti-trafficking efforts in?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

**f) Now we look at an alternate variable of interest `prev_per_1000` which represents the estimated prevalence of human trafficking per 1000 individuals. Repeat (c) and (d) for this variable. In other words:**
\
i) Get the prevalence per thousand for the country chatGPT recommended stored in an object named `gpt_prev_1000`. \
ii) Store the min, mean, median, and max of the prevalence per thousand in their respective objects\
iii) Then store a dataframe of the countries sorted by highest to lowest estimated prevalence per thousand in an object named `gsi_sort_by_prev_1000`.

In [None]:
# store estimate for chatGPT's recommended country
gpt_prev_1000 <- NULL # YOUR CODE HERE

# get summary statistics of prev_per_1000
min_prev_1000 <- NULL # YOUR CODE HERE
mean_prev_1000 <- NULL # YOUR CODE HERE
median_prev_1000 <- NULL # YOUR CODE HERE
max_prev_1000 <- NULL # YOUR CODE HERE

# arrange dataframe in descending order by prev_per_1000
gsi_sort_by_prev_1000 <- NULL # YOUR CODE HERE

# print your results
paste0(chatGPT_target, ": ", gpt_prev_1000)
paste0("min: ", min_prev_1000)
paste0("mean: ", mean_prev_1000)
paste0("median: ", median_prev_1000)
paste0("max: ", max_prev_1000)
gsi_sort_by_prev_1000

In [None]:
. = ottr::check("tests/q1f.R")

#### <p style="color:#A0A0A0">  Where does your country fall? (**No action needed below, Example Distribution Plot**)
The code chunk below plots a histogram of the estimated prevalence per 1,000 individuals across all countries. The red line indicates the prevalence per 1,000 in ChatGPT's recommended country and the blue line indicates the median for the 160 countries in the dataset.

In [None]:
# DO NOT EDIT
# Example code for plotting a histogram
gsi_sort_by_prev_1000 %>% ggplot(aes(x = prev_per_1000)) +
    
    # plots histogram
    geom_histogram(bins = 10, alpha = 0.5) +

    # add vertical line for chat GPT's recommended country. 
    geom_vline(xintercept = gpt_prev_1000, col = "red")+

    # add vertical line for median prevalence per 1000
    geom_vline(xintercept = median_prev_1000, col = "blue")+
    
    # changes background color, lines, etc. 
    theme_minimal() +

    # add labels
    labs(x = "Estimated Prevalence per 1000", y = "Count")

<!-- BEGIN QUESTION -->

**g) How does ChatGPT's country compare to the other countries in the dataset when looking at total prevalence and prevalence per 1000 individuals? After looking at the Walk Free Foundation's GSI's country prevalences, would you use ChatGPT's recommendation and why?**


_Type your answer here, replacing this text._

<!-- END QUESTION -->

### Your Turn: What is your country recommendation?

<!-- BEGIN QUESTION -->

**h) Walk Free Foundation also identifies (1) vulnerability and (2) government action as important dimensions of analyses when evaluating the trafficking situation. See their [documentation](2023-GSI-Data-Full.xlsx) for more details on these measures. What additional information do these measures provide? Is there anything else beyond what Walk Free has measured that you would want to take into consideration when selecting a country?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

#### <p style="color:#A0A0A0">  The tables below show the countries that have been scored to be the most vulnerable and with the least government response (No action needed, just run the cells to see the tables). 

**The 25 Countries with the Highest Vulnerability Scores (100 = highest or most vulnerable)** \
The code chunk below displays the data sorted by most vulnerable countries. To see how these scores are calculated, please see the [excel file](2023-GSI-Data-Full.xlsx). 

In [None]:
# RUN THIS CELL
# Sort DF by vulnerability (highest to lowest)
df.gsi_scores %>% arrange(desc(vulnerability)) %>% head(25)

#### The 25 Countries with the Lowest Government Response Scores (lower scores = less responsive)
The code chunk below displays the data sorted by level of government response. Lower scores indicate less action. To see how these scores are calculated, please see the [excel file](2023-GSI-Data-Full.xlsx). 

In [None]:
# RUN THIS CELL
# Sort DF by government response (lowest to highest)
df.gsi_scores %>% arrange(government_response) %>% head(25)

<!-- BEGIN QUESTION -->

**i) Given this information, please make your own recommendation on which country USAID should expand counter-trafficking activities in. Please justify your answer and cite any outside sources you use.** 

Recommended length: 200-250 words

_Type your answer here, replacing this text._

<!-- END QUESTION -->

***<p style="background-color: lightyellow;">Note: In the HT world, there are multiple ways to go about measuring prevalence, each with their own pros and cons. For the purposes of this assignment, we are using data from the GSI estimates of prevalence. Keep in mind that in the real world, you should be skeptical and critical of how the organization went about estimating HT prevalence. There will be more on different measurements and how to interpret them later in the semester.***
   

# Submitting Your Notebook (please read carefully!)

To submit your notebook...

### 1. Click `File` $\rightarrow$ `Save Notebook`.

### 2. Wait 5 seconds.

### 3. Select the cell below and hit run.

In [None]:
ottr::export("pset2.ipynb")

After you hit "Run" on the cell above, click the download link. A .zip file should download to your computer.

(If you make changes to your notebook, you'll need to hit save and then run the cell above again before you submit to get a new version of it.)

### 4. Submit the .zip file you just downloaded <a href="https://www.gradescope.com/" target="_blank">on Gradescope here</a>.

Notes:

- **This does not seem to work on Chrome for iPad or iPhone.** If you're using an iPad or iPhone, you need to download the file using **Safari**.
- If your web browser automatically unzips the .zip file (so you see a folder instead of a .zip file), you can just upload the .ipynb file that is inside the folder.
- If this method is not working for you, try this: hit `File`, then `Download as`, then `Notebook (.ipynb)` and submit that.