
Assessing treatment heterogeneity in instrumental_forest #1244

Open
minnnjecho opened this issue Dec 2, 2022 · 1 comment

minnnjecho commented Dec 2, 2022

Hi, everyone. Thank you for sharing this great package.
I want to test whether a heterogeneous treatment effect exists in my instrumental forest model.
For your information, the sample size is around 2,300 and the number of covariates is around 40 in my data.

There are two major challenges I am facing:
1) The test_calibration function does not support instrumental_forest.
While the function supports some forest models, I cannot use it to test for treatment heterogeneity in my instrumental forest model. Is there any technical difficulty in supporting instrumental_forest in test_calibration? I wonder if there are plans for an update, since best_linear_projection recently started to support instrumental_forest.

2) The rank average treatment effect (RATE) is unstable.
Since I cannot use the test_calibration function, I have tried rank_average_treatment_effect instead. However, I found that the p-values vary substantially with the parameters I use.
For example, if I change tune.parameters from 'all' to c('sample.fraction', 'mtry', 'min.node.size', 'alpha', 'imbalance.penalty'), the p-value increases from 0.06 to 0.97, or decreases from 0.59 to 0.20, depending on the outcome Y. The p-value also varies a lot if I change the seed of the instrumental forest model (e.g., from seed = 123 to seed = 119, and so on). The following is the code I'm using:

# Train a forest on the training fold; its CATE predictions will serve as priorities.
set.seed(123, kind = "Mersenne-Twister", normal.kind = "Inversion", sample.kind = "Rejection")
cf.priority <- instrumental_forest(X[train, ], Y[train], W[train], Z[train],
                                   num.trees = 50000,
                                   # tune.parameters = "all",
                                   tune.parameters = c("sample.fraction", "mtry", "min.node.size", "alpha", "imbalance.penalty"),
                                   tune.num.trees = 4000, tune.num.reps = 250, tune.num.draws = 4500)

# Estimate AUTOC on held-out data.
set.seed(123, kind = "Mersenne-Twister", normal.kind = "Inversion", sample.kind = "Rejection")
cf.eval <- instrumental_forest(X[-train, ], Y[-train], W[-train], Z[-train],
                               num.trees = 50000,
                               # tune.parameters = "all",
                               tune.parameters = c("sample.fraction", "mtry", "min.node.size", "alpha", "imbalance.penalty"),
                               tune.num.trees = 4000, tune.num.reps = 250, tune.num.draws = 4500)
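
For reference, the RATE step that would follow the two forests above might look roughly like this (a sketch following the grf documentation for rank_average_treatment_effect; the object names are illustrative):

# Rank held-out units by the CATE estimates from the training-fold forest,
# then estimate the AUTOC with the evaluation forest.
priority.cate <- predict(cf.priority, X[-train, ])$predictions
rate <- rank_average_treatment_effect(cf.eval, priority.cate, target = "AUTOC")
rate
# Two-sided p-value for the null of no heterogeneity along this ranking.
2 * pnorm(-abs(rate$estimate / rate$std.err))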

Can we say that if a certain hyperparameter set yields a low p-value, then that tuning is valid? I wonder if there is any rule of thumb for tuning these hyperparameters.
Thank you for your time and all the work!

Best,
Minje


erikcs commented Dec 3, 2022

Hi @minnnjecho,

  1. The TOC/RATE can "subsume" the test_calibration exercise, so there are no plans for future IV support there.
  2. A significant held-out RATE suggests the tuned forest was able to detect some HTEs. As you've seen, getting to that point may require some back-and-forth in modeling. There's no rule of thumb beyond highlighting that a) tuning to find signal is hard, and b) there are many sources of randomness that can affect the final result (the train/test split, the tuning grid draws, etc.). For a), reducing the number of parameters to search over may help, as you've done. For b), with a fixed set of hyperparameters, different forest seeds should give very similar results; passing different seeds when tuning, however, may naturally produce different results, since the resulting "optimal" forest may differ (there is randomness in tuning because the initial parameters are drawn randomly; increasing tune.num.reps enlarges the grid of draws and could perhaps make it more "stable").
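
For point b), a minimal check (a sketch reusing the held-out data and the priority.cate scores from the code above, with illustrative rather than tuned parameter values) is to hold the hyperparameters fixed and vary only the forest seed; the resulting RATE estimates should then be close to each other:

# With hyperparameters fixed, only the forest's internal randomness changes across seeds.
rate.by.seed <- sapply(c(119, 123, 2022), function(s) {
  eval.forest <- instrumental_forest(X[-train, ], Y[-train], W[-train], Z[-train],
                                     num.trees = 50000,
                                     sample.fraction = 0.5, mtry = 10, min.node.size = 5,
                                     seed = s)
  rank_average_treatment_effect(eval.forest, priority.cate, target = "AUTOC")$estimate
})
rate.by.seed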

erikcs added the question label on Dec 3, 2022