Issues with binary outcomes using SuperLearner #20

Hi,
I'm having an issue with binary outcomes. From the source, and from discussions elsewhere, it appears that in order to handle continuous outcomes, the program transforms them into the range 0 < x < 1 and then uses a quasibinomial analysis.
However, this appears to cause issues with some SuperLearner wrappers, even when the variables involved are truly binomial. For example, when using the SL.ranger wrapper to classify using random forests, an error is returned. I may be misunderstanding something, but perhaps this could be fixed by using a quasibinomial family for transformed continuous variables, but binomial for binary variables?
A replicable (hopefully) example:
Thanks!
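[Editorial note: the sketch below is not the reporter's example, which was not preserved in this copy. It is a minimal illustration, with made-up data, of the transformation described above: a continuous outcome is rescaled into [0, 1] and fit with a quasibinomial family, which, unlike binomial, accepts non-integer responses without warnings.]

```r
set.seed(1)
d <- data.frame(x = rnorm(100), Yc = rnorm(100, mean = 5, sd = 2))

# Rescale the continuous outcome into [0, 1]
d$Ys <- (d$Yc - min(d$Yc)) / (max(d$Yc) - min(d$Yc))

# quasibinomial tolerates a non-integer response; binomial would warn
fit <- glm(Ys ~ x, data = d, family = quasibinomial())
summary(fit)
```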
Hi philipclare,
The issue is that some SuperLearner wrappers fail when the regressand is not binary. You can check that SL.ranger has Coef 0 and Risk NA at all of the L nodes, but non-NA Risk and usually a positive Coef at the A and Y nodes (where the regressand is binary):
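[Editorial note: a sketch of how one might run this check, assuming the fitted SuperLearner objects are exposed under result$fit$Q, as in recent versions of ltmle; the simulated data is invented for illustration.]

```r
library(ltmle)
library(SuperLearner)  # SL.ranger also requires the ranger package

set.seed(2)
n <- 500
expit <- plogis
W <- rnorm(n)
A1 <- rbinom(n, 1, expit(W))
L <- rbinom(n, 1, expit(W + A1))
A2 <- rbinom(n, 1, expit(W + L))
Y <- rbinom(n, 1, expit(W + A1 + L + A2))
data <- data.frame(W, A1, L, A2, Y)

result <- ltmle(data, Anodes = c("A1", "A2"), Lnodes = "L", Ynodes = "Y",
                abar = c(1, 1), SL.library = c("SL.glm", "SL.ranger"),
                estimate.time = FALSE)

# Print the SuperLearner coefficients and cross-validated risks for each Q fit
for (f in result$fit$Q) {
  if (inherits(f, "SuperLearner")) print(cbind(Coef = f$coef, Risk = f$cvRisk))
}
```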
We could consider adding functionality to ltmle to allow the user to specify a separate SL.library for each regression, or one for the initial Q regression and another for all subsequent Q regressions. Or we could try to detect which wrappers fail with non-binary regressands, but that might be tricky. Or we could just try to improve the messages (also tricky, because these mostly come from SuperLearner) and/or the documentation.
Josh
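[Editorial note: for reference, ltmle's documented interface already lets the library be split between the Q and g regressions by passing a named list; the suggestion above would make this finer-grained, with one library per Q regression. A quick sketch of the existing interface, reusing the simulated data from the previous sketch:]

```r
# Separate libraries for the Q (outcome) and g (treatment) regressions.
# A per-Q-regression library, as proposed above, does not exist yet.
result2 <- ltmle(data, Anodes = c("A1", "A2"), Lnodes = "L", Ynodes = "Y",
                 abar = c(1, 1),
                 SL.library = list(Q = c("SL.glm", "SL.mean"),
                                   g = c("SL.glm", "SL.ranger")),
                 estimate.time = FALSE)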
Hi Josh,
Just a quick note that if the outcome variable is non-binary, e.g. a continuous range in [0, 1], then SuperLearner should be used with family = gaussian(). This will be necessary for almost any machine learning algorithm to work correctly.
Thanks,
Chris
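[Editorial note: a minimal illustration of this point, with made-up data; SL.ranger assumes the ranger package is installed.]

```r
library(SuperLearner)

set.seed(3)
X <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
Y <- plogis(X$x1 - X$x2 + rnorm(200))  # continuous outcome in (0, 1)

# family = gaussian() treats the bounded outcome as a regression problem
sl <- SuperLearner(Y = Y, X = X, family = gaussian(),
                   SL.library = c("SL.glm", "SL.ranger"))
sl
```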
Thanks Chris.
Hi Josh,
I was hoping to use the ltmle package with the highly adaptive lasso (HAL), but I have this same issue: HAL doesn't support family="quasibinomial", so HAL is dropped from the SL library because the initial outcome variable is binary but the subsequent Q regressions are for a [0, 1] outcome. I ended up hard-coding the ltmle estimator, using HAL with family="binomial" for the initial Q regression and HAL with family="gaussian" for the subsequent Q regressions. Do you think that is a reasonable way of dealing with this issue? I didn't see documentation for adding an SL.family=NULL option as you described above (sorry if I missed it), but is there a way to do what you described with the ltmle package?
Thanks,
Lauren
Hi Lauren,
Sorry, I never followed up on implementing the SL.family option. Yes, I think your approach sounds reasonable. I don't know if HAL ever returns predictions less than 0 or greater than 1 when all Y are continuous in [0, 1]. If it could, you might need to do something to bound the predictions. You could also write your own HAL wrapper that automatically selects the family, something like:

```r
n <- 100
rexpit <- function(x) rbinom(n = length(x), size = 1, prob = plogis(x))
W <- rnorm(n)
A1 <- rexpit(W)
L <- 0.3 * W + 0.2 * A1 + rnorm(n)
A2 <- rexpit(W + A1 + L)
Y <- rexpit(W - 0.6 * A1 + L - 0.8 * A2)
data <- data.frame(W, A1, L, A2, Y)

SL.hal9001.flexible <- function(Y, X, newX = NULL, max_degree = 3,
                                fit_type = c("glmnet", "lassi"),
                                n_folds = 10, use_min = TRUE,
                                family,  # not actually used, but needed for compatibility with SuperLearner
                                obsWeights = rep(1, length(Y)), ...) {
  # Choose the family from the regressand: binomial if Y is binary, gaussian otherwise
  if (all(Y %in% c(0, 1, NA))) {
    family <- stats::binomial()
  } else {
    family <- stats::gaussian()
  }
  print(family)  # temporary, for debugging
  SL.hal9001(Y, X, newX, max_degree, fit_type, n_folds, use_min, family, obsWeights, ...)
  # may need to bound predictions to [0, 1]?
}

ltmle(data, Anodes = c("A1", "A2"), Lnodes = "L", Ynodes = "Y", abar = c(0, 0),
      SL.library = "SL.hal9001.flexible", estimate.time = FALSE)
```

If you try it, let me know how it works. This would be the same idea as the SL.family=NULL option. If it works well, I could add it to the next release.
Josh

Hi Josh,
Thank you for your advice! I tried the code you suggested and didn't get it to work for HAL, but maybe it would work with other learners. I appreciate your time and look forward to the next release!
-Lauren
I just pushed a fix to the SLfamily branch (I didn't end up making an SL.family option; it just happens internally). Will you try it and see if it works for you?
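[Editorial note: a sketch of one way to install the branch; the repository path is an assumption here, so adjust it if the package sources live elsewhere.]

```r
# install.packages("remotes")  # if not already installed
# Repo path assumed; adjust if needed
remotes::install_github("joshuaschwab/ltmle", ref = "SLfamily")
```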
Hi Josh,
I installed the SLfamily branch and confirmed that ltmle works for ranger. HAL is throwing a different error, but since ltmle is functional for ranger, I suspect that is separate from the family issue. Thank you for working on this issue, and happy holidays!
-Lauren