# Assignment 5: Does Legalized Prostitution Increase Human Trafficking?

### Substantive Objectives
In this assignment, we will first review regression mechanics. Then we will replicate a [study](https://www.sciencedirect.com/science/article/pii/S0305750X12001453)
conducted by Cho, Dreher, and Neumayer in 2012 that investigates the relationship between legalized prostitution and human trafficking. The goal is for you to run a simplified version of their regression and form your own critical interpretation of the output.

### Coding Objectives
1) Running regressions using `lm()`
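
As a quick illustration of what `lm()` does, here is a minimal sketch on made-up data. The data frame `toy`, its columns `treated` and `outcome`, and the object `toy_mod` are all hypothetical names chosen for this example; none of them come from the assignment dataset.

```r
# Toy data: every name and value here is hypothetical, not from cho_rep.csv.
toy <- data.frame(
  treated = c(0, 0, 0, 1, 1, 1),
  outcome = c(1, 2, 3, 4, 5, 6)
)

# Formula syntax: dependent variable ~ independent variable
toy_mod <- lm(outcome ~ treated, data = toy)

# Estimated intercept and slope
coef(toy_mod)

# With a single 0/1 regressor, the intercept equals the mean outcome in the
# treated == 0 group, and the slope equals the difference in group means.
mean(toy$outcome[toy$treated == 0])
mean(toy$outcome[toy$treated == 1]) - mean(toy$outcome[toy$treated == 0])
```

Comparing the output of `coef(toy_mod)` with the two group-mean calculations is a useful sanity check when the regressor is binary.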

## Setup
The code chunk below loads the packages that we need. 

In [None]:
# You *must* run this cell first. Do not change the contents of this cell.
library(testthat)
library(ottr)
library(kableExtra)%>% suppressMessages()
library(IRdisplay)
library(tidyverse) %>% suppressMessages()

The code chunk below loads the dataset we will be using. 

In [None]:
cho_df <- read.csv("cho_rep.csv")

## Question 1: Regression Mechanics

For the questions below, assign the letter of the correct answer to the corresponding variable.

### <p style="color:#5F7BA4;">  Independent and Dependent Variables
**i) A researcher runs a study on the relationship between temperature and ice cream melting, hypothesizing that warmer temperatures lead to a faster rate of melting. What is the independent variable?**

> A. Temperature\
B. Rate ice cream melts

**ii) A researcher runs a study on gender and human trafficking vulnerability, hypothesizing that females will have higher rates of trafficking reports. What is the dependent variable?**

> A. Sex\
B. Rate of trafficking reports

**iii) What are common names to refer to the independent variable?**
    
> A. explanatory variable\
B. x\
C. y\
D. regressors\
E. a, c, d\
F. a, b, d
    
**iv) What are common names to refer to the dependent variable?**

> A. Outcome variable\
B. x\
C. y\
D. a, c\
E. a, b
    

In [None]:
# YOUR SOLUTION GOES HERE
# Please write the answer in capital letters (E.g. i <- "A")
i <- NULL # YOUR CODE HERE
ii <- NULL # YOUR CODE HERE
iii <- NULL # YOUR CODE HERE
iv <- NULL # YOUR CODE HERE

In [None]:
. = ottr::check("tests/q1a.R")

<!-- BEGIN QUESTION -->

### <p style="color:#5F7BA4;">  Estimates: $\alpha$, $\beta_1$

**b) The following is an equation for investigating the association between gender and reports of human trafficking. Interpret the equation in words. In your answer, make sure to explicitly explain what $\alpha$ and $\beta_1$ represent.**
    
   $$ReportedTrafficking_i = \alpha + \beta_1 Female_i + \varepsilon_i$$

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**c) We run the regression on trafficking reports recorded in the CTDC. We find $\alpha = 0.26$ and $\beta_1 = 0.48$. What is the association between being female and reports of human trafficking?**   

_Type your answer here, replacing this text._

<!-- END QUESTION -->

## Question 2: Understanding the Study

<!-- BEGIN QUESTION -->

Now we have the basic ingredients for understanding the methods the authors use in this study. Let's apply these regression concepts to a prominent question in the field: does legalizing prostitution increase human trafficking?

**a) The authors develop their theory from two effects of legalizing prostitution on human trafficking: 1) the scale effect and 2) the substitution effect. What are these two effects?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

**b) For the questions below, assign the letter of the correct answer to the variable named after each Roman numeral.**

i) What is the independent variable?

> A) human trafficking\
B) legalized prostitution
    
ii) What is the dependent variable?

> A) human trafficking\
B) legalized prostitution
   

In [None]:
# YOUR ANSWER GOES HERE  (e.g. i <- "A" or i <- "B")
i <- NULL # YOUR CODE HERE
ii <- NULL # YOUR CODE HERE

In [None]:
. = ottr::check("tests/q2b.R")

<!-- BEGIN QUESTION -->

**c) How do the authors measure levels of human trafficking? What do they cite as the limitations of their measurement? How should this variable be interpreted?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

### Table of Variables
The table below lists the variables included in the authors' regression model, with a brief description of each.

In [None]:
# NO ACTION NEEDED
# read in codebook
cho_codes <- read.csv("cho_codes.csv")

# display
display_html(
    as.character(cho_codes %>%  kable(format = "html"))
)

**d) Look at all of the control variables in the table above. Which variables do you think are included for their explanatory power? Would the omission of these variables from a regression analysis concern you?**

*To be clear, these variables should not be confounders. They are included for their explanatory power.*

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**e) Look at all of the control variables in the table above. Which variables do you think are included because they are confounders? Would the omission of these variables from a regression analysis concern you?**

*Reminder: confounders are correlated with both X and Y.*

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Bonus) Discuss one variable that the researchers should have accounted for but did not.**

*Remember, this variable would need to be correlated with both the independent and dependent variables.*

_Type your answer here, replacing this text._

<!-- END QUESTION -->

**f) What do the authors conclude?**

A. Legalized prostitution is associated with higher levels of human trafficking inflows. \
B. Legalizing prostitution increases levels of human trafficking inflows. \
C. Legalized prostitution is associated with lower levels of human trafficking inflows. \
D. Legalizing prostitution decreases levels of human trafficking inflows. 

In [None]:
d <- NULL # YOUR CODE HERE

In [None]:
. = ottr::check("tests/q2f.R")

## Question 3: Replicating the Regression

The dataset used by Cho, Dreher, and Neumayer covers up to 150 countries, with human trafficking data from 1996-2003. The main variables are:
* `htflowsunodc`: measures human trafficking flows
* `prostitutionlaw`: indicates if prostitution is legal (0 = not legal, 1 = legal)

In [None]:
# NO ACTION NEEDED. 

# Subsetting to tenth imputation and filtering out the low income countries
cho_q3 <- cho_df %>% filter(X_mi_m == 10, inc_low == 0) %>% 
# selecting relevant columns
    select(country, any_of(cho_codes$var_code))

# display first few rows
cho_q3 %>% head()

a) Their equation is as follows:

$$y_i = \alpha + \beta_1 Prostitution_i + \beta_2'X_i + \beta_3 Region_i + \varepsilon_i$$

To run the regression, use the `lm()` function with the following specifications:
* data: `cho_q3`
* IV (independent variable): `prostitutionlaw`
* DV (dependent variable): `htflowsunodc`
* Controls: `prostitutionbrothel`, `ruleWB_m`, `pop_ln`, `gdp_pc_const_ppp_ln`,
          `democracy`, `stockmigrants1990_ln`, `catholic2`, `reg_east_asia`, 
          `reg_west_europe`, `reg_latam`, `reg_mideast`, `reg_sasia`, `reg_ssa`


Note: your numbers won't exactly match the numbers in the authors' table, but they should be relatively similar. Replicating their steps exactly is more complicated because the authors also use data imputations. For the purposes of this assignment, we are keeping it simple.
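
If the formula syntax for a regression with several controls is unfamiliar, the sketch below shows the general pattern on simulated data. Everything in it is an assumption made for illustration: the data frame `sim`, the columns `policy`, `control_a`, `control_b`, and `outcome`, and the coefficient values used to generate the data are hypothetical and unrelated to `cho_q3`.

```r
# Simulated data: all names and numbers here are hypothetical.
set.seed(1)
n <- 200
sim <- data.frame(
  control_a = rnorm(n),
  control_b = rnorm(n)
)
# A 0/1 "policy" variable that is related to control_a
sim$policy <- as.numeric(sim$control_a + rnorm(n) > 0)
# Outcome depends on the policy and on both controls, plus noise
sim$outcome <- 1 + 0.5 * sim$policy + 2 * sim$control_a -
  1 * sim$control_b + rnorm(n)

# Formula syntax: DV ~ IV + control1 + control2 + ...
sim_mod <- lm(outcome ~ policy + control_a + control_b, data = sim)

# Full regression table
summary(sim_mod)

# Pull out a single estimate by row and column name
summary(sim_mod)$coefficients["policy", "Estimate"]
```

Each additional control is added to the right-hand side of the formula with `+`; the order of the terms does not change the estimates.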

In [None]:
# YOUR ANSWER HERE
mod1 <- NULL # YOUR CODE HERE

mod1 %>% summary()

In [None]:
. = ottr::check("tests/q3a.R")

<!-- BEGIN QUESTION -->

**b) Interpret the coefficient (the value in the `Estimate` column) for `prostitutionlaw`.**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**c) What might be an alternative explanation for this observed association? Name at least one. How much do you trust the authors' conclusions?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**d) How would you interpret the statement "correlation is not causation" in this context?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

# Submitting Your Notebook (please read carefully!)

To submit your notebook...

### 1. Click `File` $\rightarrow$ `Save Notebook`.

### 2. Wait 5 seconds.

### 3. Select the cell below and hit run.

In [None]:
ottr::export("pset5.ipynb")

After you hit "Run" on the cell above, click the download link. A .zip file should download to your computer.

(If you make changes to your notebook, you'll need to hit save and then run the cell above again before you submit to get a new version of it.)

### 4. Submit the .zip file you just downloaded <a href="https://www.gradescope.com/" target="_blank">on Gradescope here</a>.

Notes:

- **This does not seem to work on Chrome for iPad or iPhone.** If you're using an iPad or iPhone, you need to download the file using **Safari**.
- If your web browser automatically unzips the .zip file (so you see a folder instead of a .zip file), you can just upload the .ipynb file that is inside the folder.
- If this method is not working for you, try this: hit `File`, then `Download as`, then `Notebook (.ipynb)` and submit that.