TMLE & Machine Learning #109
Getting into the semiparametric theory behind the estimator, some machine learning estimators are "smooth enough" (i.e., Donsker class) to work with TMLE. As a result, rather than dropping machine learning support entirely, I will keep it, but I would like to write a check to see whether the user is using an estimator that is not Donsker class (e.g., random forest). This would ideally trigger a warning about the confidence intervals. I also should update the docs to thoroughly explain this concept and when to use the cross-fitting procedure for TMLE instead.

As a future note: nuisance function estimators like LASSO and GAM are Donsker class and should provide appropriate coverage with TMLE. However, I would still push users toward the cross-fit estimators over TMLE with machine learning (once implemented and available).
I am thinking something like
However, I need to decide whether to check if the input is in a set of Donsker estimators or in a set of non-Donsker estimators. I am leaning towards checking if the input is in a set of Donsker estimators. This is a little more careful and will handle estimators with unknown Donsker-class properties, at least until I can run some heuristic simulations. I am okay with mistakenly directing users toward the cross-fit estimators for potentially Donsker-class estimators, since the cross-fit estimators are valid for both Donsker and non-Donsker classes. I would rather encourage more care in the use of machine learning in these causal inference estimators than what is currently implemented elsewhere. This is a lot more work for me, but I think it fits with the semiparametric theory w.r.t. what works and what doesn't. Also, I am not enforcing a strict rule (like the original plan for 0.9.0 was); users can still turn off the warning, ignore my recommendations, and use random forests in regular TMLE.
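The allow-list check described above could be sketched roughly as follows. This is a hypothetical illustration, not code from zEpid: the function name, the allow-list contents, and the stand-in estimator classes are all mine.

```python
import warnings


# Stand-ins for user-supplied estimators; in practice these would be
# sklearn-style objects such as sklearn.linear_model.Lasso.
class Lasso:
    pass


class RandomForestClassifier:
    pass


# Hypothetical allow-list: estimators reasonably believed to be Donsker class.
_DONSKER_CLASS_ESTIMATORS = {"Lasso", "LinearRegression", "LogisticRegression"}


def check_donsker_class(custom_model):
    """Warn when the nuisance-model estimator is not on the Donsker
    allow-list, since plain TMLE confidence intervals may then be
    anti-conservative (too narrow)."""
    name = type(custom_model).__name__
    if name not in _DONSKER_CLASS_ESTIMATORS:
        warnings.warn(
            name + " is not known to be Donsker class. TMLE confidence "
            "intervals may not attain nominal coverage; consider a "
            "cross-fit estimator instead.",
            UserWarning,
        )


check_donsker_class(RandomForestClassifier())  # emits UserWarning
check_donsker_class(Lasso())                   # silent
```

Checking membership in a Donsker allow-list (rather than a non-Donsker deny-list) means unknown estimators default to the warning, which matches the cautious direction discussed above.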
Thank you for this amazing package! I had a few questions regarding this issue, and was wondering if you could help:
Sure thing! There is a lot I don't fully understand yet, but below is what I currently know.
Based on the points in 3, I think a Donsker check for some estimators (that may be reasonably believed to be Donsker) is worthwhile. I would love to hear your thoughts though.

Another note: I have not seen it discussed how a cross-fit estimator would work with censored (missing outcome) observations, so that remains an open question for me.

As a final note, the cross-fit procedure technically does not allow just any ML algorithm. The ML algorithm must still meet some convergence criteria (though the criteria are weaker than in the non-cross-fit case). For extremely slowly converging estimators (like random forests), I don't think there is a way to obtain valid confidence intervals yet (but I may be wrong).
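The sample-splitting idea behind cross-fitting can be sketched as follows. This is a generic illustration (the fold scheme, the `MeanRegressor` stand-in, and the function name are mine, not zEpid's implementation), showing only the core point: each observation's nuisance prediction comes from a model fit on the other folds.

```python
import random


class MeanRegressor:
    """Trivial stand-in for an ML nuisance-function estimator."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


def crossfit_predictions(make_estimator, X, y, n_splits=2, seed=0):
    """Sketch of cross-fitting: predictions for each fold come from a
    model fit on the remaining folds, so the flexible fit is independent
    of the data point it is evaluated at."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[k::n_splits] for k in range(n_splits)]
    preds = [None] * len(X)
    for k, fold in enumerate(folds):
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        model = make_estimator().fit(
            [X[i] for i in train], [y[i] for i in train]
        )
        fitted = model.predict([X[i] for i in fold])
        for i, p in zip(fold, fitted):
            preds[i] = p
    return preds


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 2.0, 3.0]
preds = crossfit_predictions(MeanRegressor, X, y, n_splits=2)
```

Full cross-fit estimators additionally average the estimate over several random partitions and repeat the split for each nuisance function; this sketch covers only the out-of-fold prediction step.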
@emaadmanzoor saw you asked van der Laan about this point on Twitter, and he wrote a blog post about it: https://vanderlaan-lab.org/2019/12/24/cv-tmle-and-double-machine-learning/
Thank you for the detailed response! Yes, I just saw his response today but it will take a while to parse it completely. I'm hoping to write some simulations/benchmarks over the holidays comparing DML/TMLE under different scenarios so we can understand this work better. I'll post back here with my findings. These methods are crucial in my work on causal inference with text, and implementations such as yours really pave the way for practitioners. Exciting times!
TMLE is not guaranteed to attain nominal coverage when used with machine learning. A simulation paper showing major problems is: https://arxiv.org/abs/1711.07137

As a result, I don't feel like `TMLE` can continue to be supported with machine learning, especially since machine learning can make the confidence intervals far too narrow (sometimes resulting in 0% coverage). I know this is a divergence from R's tmleverse, but I would rather enforce best practices/standards than allow incorrect use of methods.

Due to this issue, I will be dropping support for `TMLE` with machine learning. In its place, I plan on adding `CrossfitTMLE`, which will support machine learning approaches. The cross-fitting will result in valid confidence intervals / inference.

Tentative plan:

- In v0.8.0, `TMLE` will throw a warning when using the `custom_model` argument.
- Once Crossfit-AIPW and Crossfit-TMLE are available (v0.9.0), `TMLE` will lose that functionality. If users want to use `TMLE` with machine learning, they will need to use a prior version.
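The planned v0.8.0 warning on the `custom_model` argument could look something like this. This is only a sketch: the class here is a highly simplified stand-in for zEpid's `TMLE`, and the `exposure_model` method name is assumed from the package's API rather than taken from this thread.

```python
import warnings


class TMLE:
    """Simplified stand-in for zEpid's TMLE class, only to illustrate
    the planned warning; the real class fits the nuisance models and
    computes the targeted estimate."""

    def exposure_model(self, model, custom_model=None):
        # Warn whenever a user-supplied (possibly non-Donsker) estimator
        # replaces the default parametric nuisance model.
        if custom_model is not None:
            warnings.warn(
                "Using custom_model with TMLE may produce confidence "
                "intervals that are too narrow unless the estimator is "
                "Donsker class; CrossfitTMLE is planned for machine "
                "learning approaches.",
                UserWarning,
            )
        # ... fit the propensity-score model here ...
```

Because this is a warning rather than an error, users who understand the trade-off can still suppress it (e.g., via `warnings.filterwarnings`) and proceed.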