Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support penalty.factor in the initial estimation step for must-keep variables #4

Closed
nanxstats opened this issue Oct 19, 2018 · 1 comment
Assignees

Comments

@nanxstats
Copy link
Owner

Under the context of clinical research, it sometimes makes sense to have some "must-keep" variables for Cox regressions. This is beyond the standard penalized linear models supported in msaenet and obviously not officially supported by glmnet or ncvreg.

As a workaround, we can add a new argument penalty.factor.init which will be assigned as the penalty.factor in the first estimation step (and first estimation step only). Users can assign a lower penalty factor for must-keep variables and a higher penalty factor for the other variables. The factor sizes will be subjective, though.

@nanxstats
Copy link
Owner Author

This has been supported in all relevant functions in the v3.0 release by introducing the new argument penalty.factor.init: 5c4e037. POC:

library("msaenet")

dat <- msaenet.sim.cox(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 10), snr = 2, p.train = 0.7,
  seed = 1001
)
# without penalty.factor.init
fit1 <- aenet(
  dat$x.tr, dat$y.tr,
  family = "cox", init = "ridge",
  alphas = seq(0.2, 0.8, 0.2),
  seed = 1003, verbose = TRUE
)

msaenet.nzv(fit1)
msaenet.tp(fit1, 1:10)
msaenet.fp(fit1, 1:10)
plot(fit1)


# [1]   1   2   3   4   5   6   7   8   9  10  18  44  47  48  51  69 113 114 147 187 191 196 207 208 209
# [26] 217 226 230 286 388 398 399 400 463 464
# with penalty.factor.init
fit2 <- aenet(
  dat$x.tr, dat$y.tr,
  family = "cox", init = "ridge",
  alphas = seq(0.2, 0.8, 0.2),
  seed = 1003, verbose = TRUE,
  penalty.factor.init = c(rep(1, 10), rep(10, 490))
)

msaenet.nzv(fit2)
msaenet.tp(fit2, 1:10)
msaenet.fp(fit2, 1:10)
plot(fit2)

# [1]  1  2  3  4  5  6  7  8  9 10
# without penalty.factor.init
fit3 <- msaenet(
  dat$x.tr, dat$y.tr,
  family = "cox", init = "ridge",
  alphas = seq(0.2, 0.8, 0.2), nsteps = 10L,
  seed = 1003, verbose = TRUE
)

msaenet.nzv(fit3)
msaenet.tp(fit3, 1:10)
msaenet.fp(fit3, 1:10)
plot(fit3)

# [1]   2   3   4   5   7   9  10  44  47  69 113 207 208 399 464
# with penalty.factor.init
fit4 <- msaenet(
  dat$x.tr, dat$y.tr,
  family = "cox", init = "ridge",
  alphas = seq(0.2, 0.8, 0.2), nsteps = 10L,
  seed = 1003, verbose = TRUE,
  penalty.factor.init = c(rep(1, 10), rep(10, 490))
)

msaenet.nzv(fit4)
msaenet.tp(fit4, 1:10)
msaenet.fp(fit4, 1:10)
plot(fit4)

# [1]  2  3  4  5  7  9 10

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant