Regression based approach and point-identifying LATE #221
Ok, I get the example now. The regression approach tries to satisfy the first-order conditions for a linear regression, `B'(y - B*theta) = 0`, which gives one condition per parameter. In contrast, the moment approach tries to fit only 4 moments. Is this a deficiency of the regression approach?

```r
args <- list(data = AE,
             outcome = 'worked',
             target = 'late',
             late.from = list(samesex = 0),
             late.to = list(samesex = 1),
             propensity = morekids ~ samesex,
             m0 = ~ u,
             m1 = ~ u,
             solver = 'gurobi')
print(do.call(ivmte, args))
```

produces

```
Point estimate of the target parameter: -0.08484221
```

Notice that there's nothing special about linear here; it's really just the number of parameters:

```r
args$m0 <- ~ I(u^2)
args$m1 <- args$m0
print(do.call(ivmte, args))
```

also produces

```
Point estimate of the target parameter: -0.08484221
```

As for the discrepancy, I still get
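The point about first-order conditions can be illustrated on its own, outside of `ivmte`. A minimal sketch with made-up data (the design matrix `B` and coefficients here are arbitrary, not from the AE example): a regression with K columns imposes exactly K moment conditions `B'(y - B*beta) = 0`.

```r
# Sketch (not ivmte internals): OLS with K regressors imposes K moment
# conditions, the first-order conditions B'(y - B beta) = 0.
set.seed(1)
B <- cbind(1, runif(100), runif(100)^2)   # 3 parameters -> 3 conditions
y <- B %*% c(0.5, -1, 2) + rnorm(100)
beta <- coef(lm(y ~ 0 + B))
foc <- crossprod(B, y - B %*% beta)       # numerically zero at the OLS solution
stopifnot(max(abs(foc)) < 1e-8)
```

So adding basis terms to `m0`/`m1` adds both parameters and conditions at the same rate, which is why the functional form itself doesn't matter here.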
Hm, I cannot replicate either of your bounds.

When I matched those two things to the server on my machine, I was able to perfectly replicate the bounds from the server. So I've switched back to the generic BLAS/LAPACK libraries:

```
Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
```

And I'm limiting myself to 1 thread via the option
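For anyone else trying to match setups: a quick way to check which BLAS/LAPACK your R session is linked against is `sessionInfo()`, which (in recent R versions) reports the library paths directly. A small sketch, nothing ivmte-specific:

```r
# Check which BLAS/LAPACK libraries this R session is linked against.
# sessionInfo() reports them in recent R versions (>= 3.4).
si <- sessionInfo()
print(si$BLAS)    # path to the BLAS shared library
print(si$LAPACK)  # path to the LAPACK shared library
```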
With that, I get

```
Bounds on the target parameter: [-0.2466971, 0.07695303]
Audit terminated successfully after 1 round
```

Does Gurobi say anything about threads and reproducibility in the manual or forums?
Thanks for the detailed explanation, @a-torgovitsky. That makes complete sense. If I restrict to one thread, I get results that differ quite a bit from both of yours:
- Here, Gurobi says that it is deterministic if all inputs are the same.
- Here, Gurobi says the solutions can depend on hardware, which appears to be what we're seeing.
- Here, Gurobi says having more cores will speed up the barrier algorithm.
Ah, you matched on the Cholesky decomposition, which was related to the BLAS/LAPACK libraries. To do another check on this, I ran @johnnybonney's example on Acropolis. Below is a figure showing how the bounds depend on the number of threads.
I'm working on a 2020 MacBook Pro.
Hi @johnnybonney, I want to revise my answer to your previous question. I think the reason you don't get a point for LATE comes from the regression that gets run:

```r
library("ivmte")
library("tidyverse")

df <- AE
df %>% select(worked, morekids, samesex) -> df

args <- list(data = df,
             outcome = 'worked',
             target = 'late',
             late.from = list(samesex = 0),
             late.to = list(samesex = 1),
             propensity = morekids ~ samesex,
             m0 = ~ u + I(u^2),
             m1 = ~ u + I(u^2),
             solver = 'gurobi')

# estimate using regression-based approach
late_reg <- do.call(ivmte, args)
print(late_reg)

B <- late_reg$X
df$p <- late_reg$propensity$phat

# Manually reproduce B
Bm <- matrix(data = NA, nrow = nrow(B), ncol = ncol(B))
Bm[, 1] <- 1 - df$morekids
Bm[, 2] <- (1/2) * (1 + df$p) * (1 - df$morekids)
Bm[, 3] <- (1/3) * (1 - df$p^3) / (1 - df$p) * (1 - df$morekids)
Bm[, 4] <- df$morekids
Bm[, 5] <- df$morekids * df$p / 2
Bm[, 6] <- df$morekids * (df$p^2) / 3
diff <- abs(B - Bm)
print(paste("Maximum difference is", max(diff), "."))

dfB <- as.data.frame(B)
dfB <- cbind(df$worked, dfB)
names(dfB) <- c("worked", "B1", "B2", "B3", "B4", "B5", "B6")
r <- lm(data = dfB, worked ~ 0 + B1 + B2 + B3 + B4 + B5 + B6)
fit <- predict(r)
print(unique(fit))

df %>%
    group_by(morekids, samesex) %>%
    summarize(ey = mean(worked)) %>% print
```

produces

```
Bounds on the target parameter: [-0.2758405, 0.1061561]
Audit terminated successfully after 1 round
[1] "Maximum difference is 1.11022302462516e-16 ."
[1] 0.5807656 0.5837607 0.4418227 0.4376162
`summarise()` has grouped output by 'morekids'. You can override using the `.groups` argument.
# A tibble: 4 x 3
# Groups:   morekids [2]
  morekids samesex    ey
     <int>   <int> <dbl>
1        0       0 0.581
2        0       1 0.584
3        1       0 0.438
4        1       1 0.442
```

It's a saturated regression (6 parameters, 4 values of the regressors), so it exactly reproduces the four conditional means shown above. An unfortunate thing is that even setting

```r
args$criterion.tol <- 1e-15
print(do.call(ivmte, args))
```

yields

```
Bounds on the target parameter: [-0.1764386, 0.006084727]
Audit terminated successfully after 1 round
```

@jkcshea, can you remind me:
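For reference, the closed-form expressions used above for the columns of `Bm` are just the basis terms of `m0`/`m1` averaged over the relevant range of `u`; e.g., for the `D = 0` group the `u` term is averaged over `u` in `(p, 1)`. A standalone numerical check of those two integrals (this uses base R's `integrate`, nothing from `ivmte`):

```r
# Averaging the basis term u over u in (p, 1) gives (1 + p)/2,
# matching the Bm[, 2] formula; averaging u^2 gives
# (1 - p^3) / (3 * (1 - p)), matching Bm[, 3].
p <- 0.3
avg_u  <- integrate(function(u) u,   lower = p, upper = 1)$value / (1 - p)
avg_u2 <- integrate(function(u) u^2, lower = p, upper = 1)$value / (1 - p)
stopifnot(isTRUE(all.equal(avg_u,  (1 + p) / 2)))
stopifnot(isTRUE(all.equal(avg_u2, (1 - p^3) / (3 * (1 - p)))))
```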
@jkcshea the discrepancy might be worth opening an issue about on the Gurobi forums.
Hm, that's a powerful machine.
It enters multiplicatively.
Are you referring to this LP approach for the direct regression?
Sure, I'll do that. Also, @johnnybonney, could you upload your models when you set
Ok, perfect, thanks.
So there's an advantage of the
@jkcshea here are my models. The minimum criterion I get is 0.2442724.

@a-torgovitsky thanks for the updated explanation (and for walking through the manual example -- I appreciated that). Here's my anecdotal experience so far, in terms of LP vs. QP. (To be taken with a grain of salt, since I haven't run too much using the
I would think that instability is less of a problem when we can match all the moments perfectly (though you both probably know much better than I whether this is true). So, given the above discussion, my gut instinct (as a practitioner) would be to use LP if I can match the moments perfectly, and in all other cases use QP...
Hi @johnnybonney, I think your instinct is right. We probably don't want to mix and match the approaches in the paper, though. Maybe try running it each way, and based on that we can decide which to go with? I assume this is as simple as just looping over
After looking at @johnnybonney's models, I don't think the large discrepancy between @johnnybonney's bounds and ours (@a-torgovitsky's and mine) is because Gurobi is very sensitive to hardware. When minimizing the criterion using the model generated by my machine, I get

Recall also that those numbers above have been normalized: they have been divided by the sample size.
What do you mean by this?
Also, this?
The magnitude of the constraint here is like 1e-1, no?
The SSR can be written as `(y - B*theta)'(y - B*theta) = theta'(B'B)theta - 2(B'y)'theta + y'y`, which is quadratic in theta. This is our objective function when minimizing the criterion.

Ah sorry, the magnitude of the constraint is indeed about 1e-1. My concern was that changing the RHS of the quadratic constraint by ~1e-5 had a big effect on the bounds.
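A standalone numerical check that the SSR expands into this quadratic form (made-up `B`, `y`, and `theta`, nothing from `ivmte`):

```r
# Check: (y - B theta)'(y - B theta) = theta'(B'B)theta - 2(B'y)'theta + y'y
set.seed(1)
B <- matrix(rnorm(30), nrow = 10, ncol = 3)
y <- rnorm(10)
theta <- c(0.5, -1, 2)
ssr_direct <- sum((y - B %*% theta)^2)
ssr_quad <- drop(t(theta) %*% crossprod(B) %*% theta
                 - 2 * crossprod(y, B %*% theta)
                 + sum(y^2))
stopifnot(isTRUE(all.equal(ssr_direct, ssr_quad)))
```

This is why the criterion fits naturally into a QP: the `B'B` matrix gives the quadratic term and `B'y` the linear term, with `y'y` a constant.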
Got it on the SSR, that makes sense. One way to get a sense of how sensitive the bounds are to the quadratic constraint would be to look at the dual value for that constraint. Looks like this is how you get it in Gurobi, although I'm not sure if you can query it from the R interface. But I'm not following what you're suggesting generally here...?
Posting a new issue, lest we start talking about too many different things in the same thread
My original question:
Using the moment-based approach, we can point-identify the LATE in simple cases (i.e., no covariates, binary instrument). This holds regardless of the MTR specification. The regression-based approach does not produce such point identification. Is this expected behavior?
Here is my example:
produces
and
produces
Something else to look into:
At #220 (comment), @a-torgovitsky is unable to reproduce my initial bounds, and gets
`[-0.2758405, 0.1061561]`.