Support penalty.factor in the initial estimation step for must-keep variables #4

nanxstats · 2018-10-19T16:50:29Z

Under the context of clinical research, it sometimes makes sense to have some "must-keep" variables for Cox regressions. This is beyond the standard penalized linear models supported in msaenet and obviously not officially supported by glmnet or ncvreg.

As a workaround, we can add a new argument penalty.factor.init which will be assigned as the penalty.factor in the first estimation step (and first estimation step only). Users can assign a lower penalty factor for must-keep variables and a higher penalty factor for the other variables. The factor sizes will be subjective, though.

The text was updated successfully, but these errors were encountered:

nanxstats · 2018-12-14T16:25:56Z

This has been supported in all relevant functions in the v3.0 release by introducing the new argument penalty.factor.init: 5c4e037. POC:

library("msaenet")

dat <- msaenet.sim.cox(
  n = 150, p = 500, rho = 0.6,
  coef = rep(1, 10), snr = 2, p.train = 0.7,
  seed = 1001
)

# without penalty.factor.init
fit1 <- aenet(
  dat$x.tr, dat$y.tr,
  family = "cox", init = "ridge",
  alphas = seq(0.2, 0.8, 0.2),
  seed = 1003, verbose = TRUE
)

msaenet.nzv(fit1)
msaenet.tp(fit1, 1:10)
msaenet.fp(fit1, 1:10)
plot(fit1)


# [1]   1   2   3   4   5   6   7   8   9  10  18  44  47  48  51  69 113 114 147 187 191 196 207 208 209
# [26] 217 226 230 286 388 398 399 400 463 464

# with penalty.factor.init
fit2 <- aenet(
  dat$x.tr, dat$y.tr,
  family = "cox", init = "ridge",
  alphas = seq(0.2, 0.8, 0.2),
  seed = 1003, verbose = TRUE,
  penalty.factor.init = c(rep(1, 10), rep(10, 490))
)

msaenet.nzv(fit2)
msaenet.tp(fit2, 1:10)
msaenet.fp(fit2, 1:10)
plot(fit2)

# [1]  1  2  3  4  5  6  7  8  9 10

# without penalty.factor.init
fit3 <- msaenet(
  dat$x.tr, dat$y.tr,
  family = "cox", init = "ridge",
  alphas = seq(0.2, 0.8, 0.2), nsteps = 10L,
  seed = 1003, verbose = TRUE
)

msaenet.nzv(fit3)
msaenet.tp(fit3, 1:10)
msaenet.fp(fit3, 1:10)
plot(fit3)

# [1]   2   3   4   5   7   9  10  44  47  69 113 207 208 399 464

# with penalty.factor.init
fit4 <- msaenet(
  dat$x.tr, dat$y.tr,
  family = "cox", init = "ridge",
  alphas = seq(0.2, 0.8, 0.2), nsteps = 10L,
  seed = 1003, verbose = TRUE,
  penalty.factor.init = c(rep(1, 10), rep(10, 490))
)

msaenet.nzv(fit4)
msaenet.tp(fit4, 1:10)
msaenet.fp(fit4, 1:10)
plot(fit4)

# [1]  2  3  4  5  7  9 10

nanxstats self-assigned this Oct 19, 2018

nanxstats added the enhancement label Oct 19, 2018

nanxstats added a commit that referenced this issue Dec 14, 2018

add new argument penalty.factor.init for #4

5c4e037

nanxstats closed this as completed Dec 14, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support penalty.factor in the initial estimation step for must-keep variables #4

Support penalty.factor in the initial estimation step for must-keep variables #4

nanxstats commented Oct 19, 2018

nanxstats commented Dec 14, 2018

Support penalty.factor in the initial estimation step for must-keep variables #4

Support penalty.factor in the initial estimation step for must-keep variables #4

Comments

nanxstats commented Oct 19, 2018

nanxstats commented Dec 14, 2018