# Causal Inference Homework 3
## Alex Pine (akp258@nyu.edu)
## February 26th

### 1a 
Perfect randomization, so ATE $= E[Y^1] - E[Y^0] = 1 - 0.5 = 0.5$

### 1b

The asymptotic value of $E[Y^1]$ and $E[Y^0]$ do not change are not dependent on the exact size of the population. If 10% of students from both the control and treatment groups leave the program, we can treat them as if there were never in the program to begin with.  Since our estimation of ATE is based on the expected values of $Y^0$ and $Y^1$ of the population, 10% random attrition has no effect.  


### 1c

If students leave the program when their potential scores are greater than 1.6, then that puts an upper bound on $Y^0$ and $Y^1$. Since $Y^0$ was already bounded from above by 1, only $Y^2$ is affected. Now, $Y^1 \sim ([0,1.6])$.

ATE $= E[Y^1] - E[Y^0] = 0.8 - 0.5 = 0.3 $


### 1d

If the top 20% leave the program, then $Y^1 \sim U([0, 1.6])$ and $Y^0 \sim U([0, 0.8])$.

ATE $= E[Y1] - E[Y0] = \frac{1.6}{2} - \frac{0.8}{2} = 0.8 - 0.4 = 0.4$.



# Problem 2

Reducing the data down to only those who were in the same class size for all four years:

In [56]:
data = read.csv("/Users/pinesol/causal/hw3/krueger.csv")
# Take out the nas. Leaves only 3085 rows out of 11598
data = data[complete.cases(subset(data, select=c('cltypek', 'cltype1', 'cltype2', 'cltype3'))),]
# Make sure the four class columns all have the same value in each row
data = data[data$cltypek == data$cltype1 | data$cltypek == data$cltype2 | data$cltypek == data$cltype3,]

# Remove NAs from scores
scores = data[c('tmathssk', 'tmathss1', 'tmathss2', 'tmathss3', 'treadssk', 'treadss1', 'treadss2', 'treadss3')]
data[c('tmathssk', 'tmathss1', 'tmathss2', 'tmathss3', 'treadssk', 'treadss1', 'treadss2', 'treadss3')][is.na(scores)] <- 0

Computing the percentile scores for the students who were in in 'regular' or 'regular + aide' sized classes:

In [68]:
# ... for each grade, compute the percentile(ranging from 0 to 100) using only students 
# in regular and regular with aide class types.
percentile <- function(scores) {
    return(100 * rank(scores) / length(scores))
}

reg_rows = data[data['cltypek']  == 'regular class' | data['cltypek']  == 'regular + aide class',]

reg_math_k = percentile(reg_rows[['tmathssk']])
reg_read_k = percentile(reg_rows[['treadssk']])
reg_avg_k = (reg_math_k + reg_read_k) / 2
reg_math_1 = percentile(reg_rows[['tmathss1']])
reg_read_1 = percentile(reg_rows[['treadss1']])
reg_avg_1 = (reg_math_1 + reg_read_1) / 2
reg_math_2 = percentile(reg_rows[['tmathss2']])
reg_read_2 = percentile(reg_rows[['treadss2']])
reg_avg_2 = (reg_math_2 + reg_read_2) / 2
reg_math_3 = percentile(reg_rows[['tmathss3']])
reg_read_3 = percentile(reg_rows[['treadss3']])
reg_avg_3 = (reg_math_3 + reg_read_3) / 2

Computing the percentile score for students in small classes using the percentile distribution for the students in large classes:

In [None]:
# "Compute percentile score for students in small classes using the percentiles just computed".

small_percentiles <- function(small_scores, reg_scores) {
    small_percs = vector("list", length(small_scores))
    for (i in 1:length(small_scores)) {
        perc = 100*length(reg_scores[reg_scores <= small_scores[[i]]]) / length(reg_scores)
        small_percs[[i]] = perc
    }
    return(unlist(small_percs))
}

small_rows = data[data['cltypek']  == 'small class',]

small_math_k = small_percentiles(small_rows[['tmathssk']], reg_rows[['tmathssk']])
small_read_k = small_percentiles(small_rows[['treadssk']], reg_rows[['treadssk']])
small_avg_k = (small_math_k + small_read_k) / 2
small_math_1 = small_percentiles(small_rows[['tmathss1']], reg_rows[['tmathss1']])
small_read_1 = small_percentiles(small_rows[['treadss1']], reg_rows[['treadss1']])
small_avg_1 = (small_math_1 + small_read_1) / 2
small_math_2 = small_percentiles(small_rows[['tmathss2']], reg_rows[['tmathss2']])
small_read_2 = small_percentiles(small_rows[['treadss2']], reg_rows[['treadss2']])
small_avg_2 = (small_math_2 + small_read_2) / 2
small_math_3 = small_percentiles(small_rows[['tmathss3']], reg_rows[['tmathss3']])
small_read_3 = small_percentiles(small_rows[['treadss3']], reg_rows[['treadss3']])
small_avg_3 = (small_math_3 + small_read_3) / 2

Wrote a function to merge the percentile scores regular and small class sizes together so their order matches the order from the original data frame. I'm sure R has a way to do this, but I could not figure it out. 

In [94]:
merge_by_index <- function(a, b, a_index, b_index) {
    merged = c()
    i = 1
    j = 1
    while(i <= length(a_index) && j <= length(b_index)) {
        if (a_index[i] < b_index[j]) {

            merged = c(merged, a[i])
            i = i + 1
        } else {
            merged = c(merged, b[j])
            j = j + 1
        }
    }
    if (i <= length(a_index)) {
        merged = c(merged, a[i:length(a)])
    }
    if (j <= length(b_index)) {
        merged = c(merged, b[j:length(b)])
    } 
    return(merged)
}

merge_scores <- function(reg_scores, small_scores) {
    reg_index = as.numeric(row.names(reg_rows))
    small_index = as.numeric(row.names(small_rows))
    return(merge_by_index(reg_scores, small_scores, reg_index, small_index))
}

In [97]:
merged_k_scores = merge_scores(reg_avg_k, small_avg_k)
merged_1_scores = merge_scores(reg_avg_1, small_avg_1)
merged_2_scores = merge_scores(reg_avg_2, small_avg_2)
merged_3_scores = merge_scores(reg_avg_3, small_avg_3)

Making the class size column into a dummy variable:

In [101]:
# Combining 'regular class' and 'regular + aide' class into one value for regression
combined_data = data
combined_data[combined_data == 'regular + aide class']  <- 'regular class'
size_factor = factor(combined_data[['cltypek']])

In [110]:
lm(merged_k_scores ~ size_factor)
lm(merged_1_scores ~ size_factor)
# TODO other grades


Call:
lm(formula = merged_k_scores ~ size_factor)

Coefficients:
           (Intercept)  size_factorsmall class  
                50.045                   7.413  



Call:
lm(formula = merged_1_scores ~ size_factor)

Coefficients:
           (Intercept)  size_factorsmall class  
                50.045                   9.282  


In [108]:
# TODO add fixed effects

#schidkn, schid1n

Unnamed: 0,newid,ssex,srace,sbirthq,sbirthy,stark,star1,star2,star3,cltypek,⋯,clad3,totexp3,sysidkn,sysid1n,sysid2n,sysid3n,schidkn,schid1n,schid2n,schid3n
2,1137,female,white,"1st qtr - jan,feb,march",1980,yes,yes,yes,yes,small class,⋯,apprentice,1,30,30,30,30,63,63,63,63
3,1143,female,black,"4th qtr - oct,nov,dec",1979,yes,yes,yes,yes,small class,⋯,ladder level 1,4,11,11,11,11,20,20,20,20
11,1277,male,white,"2nd qtr - april,may,june",1980,yes,yes,yes,yes,regular class,⋯,ladder level 1,7,35,35,35,35,69,69,69,69
12,1292,male,white,"2nd qtr - april,may,june",1980,yes,yes,yes,yes,small class,⋯,ladder level 1,14,41,41,41,41,79,79,79,79
13,1308,male,white,"2nd qtr - april,may,june",1980,yes,yes,yes,yes,regular class,⋯,ladder level 1,8,4,4,4,4,5,5,5,5
21,1441,female,white,"1st qtr - jan,feb,march",1980,yes,yes,yes,yes,regular class,⋯,ladder level 1,13,24,24,41,41,56,56,78,78
23,1465,female,black,"3rd qtr - july,aug,sept",1980,yes,yes,yes,yes,small class,⋯,ladder level 1,14,8,8,8,8,11,11,11,11
25,1499,female,white,"3rd qtr - july,aug,sept",1980,yes,yes,yes,yes,small class,⋯,chose no to be on career ladder,24,32,32,32,32,66,66,66,66
28,1552,male,white,"1st qtr - jan,feb,march",1980,yes,yes,yes,yes,regular class,⋯,ladder level 1,3,14,14,14,14,38,38,38,38
29,1561,male,white,"3rd qtr - july,aug,sept",1980,yes,yes,yes,yes,regular + aide class,⋯,level 3,12,35,35,35,35,69,69,69,69
