# Assignment 5: Does Legalized Prostitution Increase Human Trafficking?

### Substantive Objectives
In this assignment, we will first review regression mechanics. Then we will replicate a [study](https://www.sciencedirect.com/science/article/pii/S0305750X12001453)
conducted by Cho, Dreher, and Neumayer in 2012 that investigates the relationship between legalized prostitution and human trafficking. The goal is for you to run the same regression as the authors and develop your own critical interpretation of the output.

### Coding Objectives
1) Running regressions using `lm()`

## Setup
The code chunk below loads the packages that we need. 

In [None]:
# You *must* run this cell first. Do not change the contents of this cell.
library(testthat)
library(ottr)
library(tidyverse) %>% suppressMessages()

The code chunk below loads the datasets that we will be using. These datasets are drawn from Walk Free Foundation's Global Slavery Index. This is one example of how an organization attempts to standardize global estimates of trafficking prevalence, vulnerability, and government responses.

In [None]:
cho_df <- read.csv("cho_rep.csv")

# Question 1: Regression Mechanics

For the questions below, assign the correct letter to the corresponding variable in the code cell.

### <p style="color:#5F7BA4;">  Independent and Dependent Variables
**i) A researcher runs a study on the relationship between temperature and ice cream melting, hypothesizing that warmer temperatures lead to a faster rate of melting. What is the independent variable?**

> A. Temperature\
B. Rate ice cream melts

**ii) A researcher runs a study on gender and human trafficking vulnerability, hypothesizing that females will have higher rates of trafficking reports. What is the dependent variable?**

> A. Sex\
B. Rate of trafficking reports

**iii) What are common names to refer to the independent variable?**
    
> A. explanatory variable\
B. x\
C. y\
D. regressors\
E. A, C, D\
F. A, B, D
    
**iv) What are common names to refer to the dependent variable?**

> A. Outcome variable\
B. x\
C. y\
D. A, C\
E. A, B
    

In [None]:
# YOUR SOLUTION GOES HERE
# Please write the answer in capital letters (E.g. i <- "A")
i <- NULL # YOUR CODE HERE
ii <- NULL # YOUR CODE HERE
iii <- NULL # YOUR CODE HERE
iv <- NULL # YOUR CODE HERE

In [None]:
. = ottr::check("tests/q1a.R")

### <p style="color:#5F7BA4;">  Control Variables

Control variables are added if (1) there is reason to believe they have explanatory power for the variation in the dependent variable or (2) they are a confounding variable that may bias our estimate. 
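
To make reason (2) concrete, here is a minimal simulation sketch. Everything in it is invented for illustration: the `sunlight` confounder, the variable names, and all the numbers are assumptions, not part of the assignment data. It compares the estimated temperature coefficient with and without the confounder in the model.

```r
# Illustrative simulation only: every variable and number below is invented.
set.seed(42)
n <- 5000

# Hypothetical confounder: direct sunlight raises temperature AND speeds up melting.
sunlight    <- rnorm(n)
temperature <- 70 + 5 * sunlight + rnorm(n)
# Assumed true effect of temperature: -0.5 minutes of melting time per degree.
min_melt    <- 60 - 0.5 * temperature - 3 * sunlight + rnorm(n)

sim_df <- data.frame(min_melt, temperature, sunlight)

# Confounder omitted: the temperature coefficient absorbs part of sunlight's effect.
short_mod <- lm(min_melt ~ temperature, data = sim_df)
# Confounder included as a control: the estimate moves back toward -0.5.
long_mod  <- lm(min_melt ~ temperature + sunlight, data = sim_df)

coef(short_mod)["temperature"]
coef(long_mod)["temperature"]
```

Because the data are simulated, the true coefficient is known, which makes it possible to see how far each model's estimate lands from it.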

**b) Indicate T or F for the following statements.** 
    
> i) (T/F) For the omission of a variable to bias a result, it must be correlated with **both** the independent and dependent variables.
    
> ii) (T/F) The researcher gathers data on whether each observation in their study contains the "anti-melting potion", and includes it as a control variable in the regression. **The anti-melting potion is not correlated with the temperature.** After adding this variable in as a control variable, the original estimated relationship between temperature and speed of melting will change.
    
> iii) (T/F) The researcher adds this control variable in for reason (1) above.
    
> iv) (T/F) Even when we have not controlled for all confounding variables, we can identify a causal effect of X on Y (e.g. a 1 degree Fahrenheit increase leads to a 5 second increase in the speed one scoop of ice cream melts).
    
> v) (T/F) When we have not controlled for all confounding variables, we can still make associational claims (e.g. a 1 degree Fahrenheit increase is associated with a 5 second increase in the speed one scoop of ice cream melts).
    


In [None]:
# YOUR SOLUTION HERE (e.g. i <- "T")
# Make sure it is in uppercase, this is case sensitive
i <- NULL # YOUR CODE HERE
ii <- NULL # YOUR CODE HERE
iii <- NULL # YOUR CODE HERE
iv <- NULL # YOUR CODE HERE
v <- NULL # YOUR CODE HERE

In [None]:
. = ottr::check("tests/q1b.R")

<!-- BEGIN QUESTION -->

### <p style="color:#5F7BA4;">  Estimates: Alpha, Beta

**c) Interpret the following regression equation in words. In your answer, make sure to explicitly explain what $\alpha$ and $\beta_1$ represent.**
    
   $$minMelt_i = \alpha + \beta_1 Temperature_i + \varepsilon_i$$

_Type your answer here, replacing this text._

<!-- END QUESTION -->
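
As a mechanical illustration of how an equation of this form maps onto `lm()` output, the sketch below fits a one-variable regression to simulated data. The data frame `melt_df`, the column names, and the "true" values of 40 and -0.3 are all assumptions made up for this example.

```r
# Illustrative simulation only: the data and parameter values are invented.
set.seed(1)
n <- 200
temperature <- runif(n, min = 60, max = 100)
# Assumed "true" values for the simulation: alpha = 40, beta_1 = -0.3.
min_melt <- 40 - 0.3 * temperature + rnorm(n, sd = 2)
melt_df <- data.frame(min_melt, temperature)

# The formula reads "min_melt is modeled as a function of temperature".
melt_mod <- lm(min_melt ~ temperature, data = melt_df)

# coef() returns a named vector: "(Intercept)" is the estimate of alpha,
# and "temperature" is the estimate of beta_1.
coef(melt_mod)

# residuals() returns the estimated error term for each observation.
head(residuals(melt_mod))
```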

### <p style="color:#5F7BA4;">  Statistical Significance: the P-Value

**d) The researcher runs the regression and finds that $\beta_1$ has a magnitude of 0.1 with a p-value of 0.01. Are the following statements T or F?**

> i. (T/F) The conventional standard for statistical significance is 0.05. 

> ii. (T/F) A precise way to interpret the p-value above is that there is a 1% chance that the observed association ($\beta_1$) is correct. 
    
> iii. (T/F) A precise way to interpret the p-value above is that there is a 1% chance that the observed association ($\beta_1$) was due to random chance. 
    
> iv. (T/F) Lower p-values indicate an increased likelihood that an observed association is actually due to random chance (i.e. is not statistically significant). 
    
> v. (T/F) The observed association above ($\beta_1$) is statistically significant at the conventional level of 0.05.
   

In [None]:
# YOUR SOLUTION HERE (e.g. i <- "T" or i <- "F")

i <- NULL # YOUR CODE HERE
ii <- NULL # YOUR CODE HERE
iii <- NULL # YOUR CODE HERE
iv <- NULL # YOUR CODE HERE
v <- NULL # YOUR CODE HERE

In [None]:
. = ottr::check("tests/q1d.R")
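
One way to build intuition for p-values is to simulate many datasets in which there is no true relationship at all and look at the p-values that `lm()` reports. The sketch below does this under assumed settings (500 simulations of 100 observations each, both chosen arbitrarily); it is an illustration, not part of the graded work.

```r
# Illustrative simulation only: sample sizes and seed are arbitrary choices.
set.seed(123)
n_sims <- 500
n <- 100

# Each simulated dataset has NO true relationship between x and y,
# so any estimated slope is produced by chance alone.
p_values <- replicate(n_sims, {
  x <- rnorm(n)
  y <- rnorm(n)
  fit <- lm(y ~ x)
  # Row "x", column "Pr(>|t|)" of the coefficient table holds the slope's p-value.
  summary(fit)$coefficients["x", "Pr(>|t|)"]
})

# Share of simulations whose slope looks "significant" at the 0.05 level.
mean(p_values < 0.05)
```

Comparing that share with the 0.05 threshold shows how often chance alone produces a "significant" result when nothing is really going on.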

<!-- BEGIN QUESTION -->

**e) Explain statistical significance and the p-value in your own words.**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

## Question 2: Understanding the Study

<!-- BEGIN QUESTION -->

We now have the basic ingredients for understanding the methods the authors use in this study. Let's apply these regression concepts to a prominent question in the field: does legalizing prostitution increase human trafficking?

#### a) In the abstract, the authors mention two effects of legalizing prostitution on human trafficking: 1) the scale effect and 2) the substitution effect. What are these two effects?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

#### b) For the questions below, assign the letter of the correct answer to the variable named after the corresponding Roman numeral.
i) What is the independent variable?

> A) human trafficking\
B) legalized prostitution
    
ii) What is the dependent variable?

> A) human trafficking\
B) legalized prostitution
   

In [None]:
# YOUR ANSWER GOES HERE  (e.g. i <- "A" or i <- "B")
i <- NULL # YOUR CODE HERE
ii <- NULL # YOUR CODE HERE

In [None]:
. = ottr::check("tests/q2b.R")

<!-- BEGIN QUESTION -->

#### c) How do the authors measure levels of human trafficking? What do they cite as the limitations of their measurement, and how should this variable be interpreted?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

#### d) Which control variables are included as the known predictors of human trafficking? Which control variables are included to reduce the possibilities of bias from confounding variables?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**d) What do the authors conclude?**

A. Countries with legalized prostitution experienced higher levels of human trafficking inflows. \
B. Legalizing prostitution increases levels of human trafficking inflows. \
C. Countries with legalized prostitution experienced lower levels of human trafficking inflows. \
D. Legalizing prostitution decreases levels of human trafficking inflows. 

In [None]:
d <- NULL # YOUR CODE HERE

In [None]:
. = ottr::check("tests/q2d.R")

<!-- END QUESTION -->

## Question 3: Replicating the Regression

The dataset used by Cho, Dreher, and Neumayer covers 161 countries from 1996 to 2003. The main variables are:
* `htflowsunodc`: measures human trafficking flows
* `prostitutionlaw`: indicates if prostitution is legal (0 = not legal, 1 = legal)
* `prostitutionbrothel`: indicates if brothels are legal (0 = not legal, 1 = legal)

In [None]:
# NO ACTION NEEDED. Subsetting to tenth imputation and filtering out the low income countries
cho_q3 <- cho_df %>% filter(X_mi_m == 10, inc_low == 0)

a) Their equation is as follows:

$$y_i = \alpha + \beta_1 Prostitution_i + \beta_2'X_i + \beta_3 Region_i + \varepsilon_i$$

To run the regression, use the `lm()` function and the following specifications...
* data: `cho_q3`
* IV: `prostitutionlaw`
* DV: `htflowsunodc`
* Controls: `prostitutionbrothel`, `ruleWB_m`, `pop_ln`, `gdp_pc_const_ppp_ln`,
          `democracy`, `stockmigrants1990_ln`, `catholic2`, `reg_east_asia`,
          `reg_west_europe`, `reg_latam`, `reg_mideast`, `reg_sasia`, `reg_ssa`


Note: your numbers won't exactly match the numbers in the table, but they should be similar. Replicating the authors' steps exactly is more complicated because they also use imputed data. For the purposes of this assignment, we are keeping it simple.
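
If the formula syntax for a regression with several controls is unfamiliar, the sketch below shows the general pattern on `mtcars`, a dataset that ships with base R and is unrelated to this assignment. The choice of `mpg`, `am`, `wt`, and `hp` is arbitrary and only stands in for a DV, an IV, and two controls.

```r
# mtcars ships with base R; it stands in here for any data frame.
# Pattern: lm(DV ~ IV + control_1 + control_2, data = your_data_frame)
demo_mod <- lm(mpg ~ am + wt + hp, data = mtcars)

# The coefficient table has one row per term, with columns for the
# estimate, standard error, t value, and p-value.
coef_table <- summary(demo_mod)$coefficients

# Pull out the estimate and p-value for a single term by row name.
coef_table["am", c("Estimate", "Pr(>|t|)")]
```

Each additional control is added to the right-hand side of the formula with `+`, and the row names of the coefficient table match the variable names used in the formula.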

In [None]:
# YOUR ANSWER HERE
mod1 <- NULL # YOUR CODE HERE

mod1 %>% summary()

In [None]:
. = ottr::check("tests/q3a.R")

<!-- BEGIN QUESTION -->

**b) Interpret the coefficient and p-value for `prostitutionlaw`.**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**c) What might be an alternative explanation for this observed association? Name at least one. How much do you trust the authors' conclusions?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**d) How would you interpret the statement "correlation is not causation" in this context?**

_Type your answer here, replacing this text._

<!-- END QUESTION -->

# Submitting Your Notebook (please read carefully!)

To submit your notebook...

### 1. Click `File` $\rightarrow$ `Save Notebook`.

### 2. Wait 5 seconds.

### 3. Select the cell below and hit run.

In [None]:
ottr::export("pset5.ipynb")

After you hit "Run" on the cell above, click the download link. A .zip file should download to your computer.

(If you make changes to your notebook, you'll need to hit save and then run the cell above again before you submit to get a new version of it.)

### 4. Submit the .zip file you just downloaded <a href="https://www.gradescope.com/" target="_blank">on Gradescope here</a>.

Notes:

- **This does not seem to work on Chrome for iPad or iPhone.** If you're using an iPad or iPhone, you need to download the file using **Safari**.
- If your web browser automatically unzips the .zip file (so you see a folder instead of a .zip file), you can just upload the .ipynb file that is inside the folder.
- If this method is not working for you, try this: hit `File`, then `Download as`, then `Notebook (.ipynb)` and submit that.