Skip to content

sarahlotspeich/glmCensRd

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

glmCensRd: Fitting generalized linear models with censored predictors in R

Sarah C. Lotspeich, Peter Guan, and Tanya P. Garcia

# Run once
# install.packages("devtools")
devtools::install_github("sarahlotspeich/glmCensRd", ref = "main")
# Reproducibility
set.seed(333)

Linear Regression Outcome Model

Set distY = "normal"

Predictor follows a normal distribution

Set distX = "normal"

# Generate data
n <- 1000
## Generate fully observed covariate
z <- rlnorm(n = n)
## Generate the (censored) predictor
x <- rnorm(n = n, mean = 0.75 + 0.25 * z, sd = 1)  
## Generate censoring value
c <- rweibull(n = n, shape = 1, scale = 3)  
## Generate the outcome 
y <- rnorm(n = n, mean = 1 + 0.5 * x + 0.25 * z, sd = 1)
## Calculate observed predictor + event indicator
w <- pmin(x, c)
d <- as.numeric(x <= c)
x[d == 0] <- NA
# Construct dataframe 
temp <- data.frame(y, w, x, z, d)

This data generation scheme leads to true values:

where

  • μY = 1 + 0.5X + 0.25Z,
  • σY2 = 1,
  • μX = 0.75 + 0.25Z, and
  • σX2 = 1.
# Fit model 
fit <- glmCensRd::glmCensRd(Y = "y", W = "w", D = "d", Z = "z", distY = "normal", distX = "normal", data = temp, steptol = 1e-2, iterlim = 100, estSE = TRUE)
# Check convergence status
with(fit, code)
## [1] 2

The code is returned from glmCensRd()’s internal call to nlm() and per the documentation is interpreted as follows:

  • 1: relative gradient is close to zero, current iterate is probably solution.
  • 2: successive iterates within tolerance, current iterate is probably solution.
  • 3: last global step failed to locate a point lower than estimate. Either estimate is an approximate local minimum of the function or steptol is too small.
  • 4: iteration limit exceeded.
  • 5: maximum step size stepmax exceeded five consecutive times. Either the function is unbounded below, becomes asymptotic to a finite value from above in some direction or stepmax is too small.

In general, we accept code <= 2 as successful convergence to the MLE. Results of code = 3-4 might require tuning the steptol or iterlim parameters in the call to glmCensRd().

# Inspect output 
## Outcome model P(Y|X,Z)
with(fit, outcome_model)
## $distY
## [1] "normal"
## 
## $mean
##                 coeff         se
## (Intercept) 0.9730323 0.09091957
## X           0.4566375 0.08317825
## z           0.2815724 0.02841242
## 
## $sigma2
## [1] 0.9893482

From the outcome_model slot of the model fit, we get the following:

  • μ̂Y(X,Z) = 0.973 + 0.457X + 0.282Z and
  • σ̂Y2 = 0.989.
## Predictor model P(X|Z)
with(fit, predictor_model)
## $distX
## [1] "normal"
## 
## $mean
##                 coeff         se
## (Intercept) 0.7033991 0.05449020
## z           0.2993972 0.04264013
## 
## $sigma2
## [1] 1.000342

Then we get the remaining parameter estimates from predictor_model:

  • μ̂X(Z) = 0.703 + 0.299Z and
  • σ̂X2 = 1.000.

Predictor follows a gamma distribution

Set distX = "gamma"

# Generate data
n <- 1000
## Generate fully observed covariate
z <- rlnorm(n = n)
## Generate the (censored) predictor
x <- rgamma(n = n, shape = 2, scale = (0.75 + 0.25 * z) / 2)
## Generate censoring value
c <- rweibull(n = n, shape = 1, scale = 3)  
## Generate the outcome 
y <- rnorm(n = n, mean = 1 + 0.5 * x + 0.25 * z, sd = 1)
## Calculate observed predictor + event indicator
w <- pmin(x, c)
d <- as.numeric(x <= c)
x[d == 0] <- NA
# Construct dataframe 
temp <- data.frame(y, w, x, z, d)

This data generation scheme leads to true values:

where

  • μY(X,Z) = 1 + 0.5X + 0.25Z,
  • σY2 = 1,
  • μX(Z) = 0.75 + 0.25Z,
  • νX = 2, and
  • λX(Z) = (0.75+0.25Z)/2,
# Fit model 
fit <- glmCensRd::glmCensRd(Y = "y", W = "w", D = "d", Z = "z", distY = "normal", distX = "gamma", data = temp, estSE = TRUE)
# Check convergence status
with(fit, code)
## [1] 2
## Outcome model P(Y|X,Z)
with(fit, outcome_model)
## $distY
## [1] "normal"
## 
## $mean
##                 coeff         se
## (Intercept) 1.0323830 0.04597472
## X           0.5267726 0.02059062
## z           0.2339656 0.01716312
## 
## $sigma2
## [1] 1.001325

The code and outcome_model slots are interpreted in the same way as the first section, since once again distY = "normal", with

  • μ̂Y(X,Z) = 1.032 + 0.527X + 0.234Z and
  • σ̂Y2 = 1.001.
## Predictor model P(X|Z)
with(fit, predictor_model)
## $distX
## [1] "gamma"
## 
## $mean
##                 coeff         se
## (Intercept) 0.7075069 0.02304275
## z           0.2458071 0.02226052
## 
## $shape
##                coeff         se
## (Intercept) 2.027891 0.06915529

Then we get the remaining parameter estimates from predictor_model:

  • μ̂X = 0.708 + 0.246Z,
  • ν̂X = 2.028, and
  • λ̂X = μ̂X/ν̂X = (0.708+0.246Z)/2.028.

Predictor follows an inverse-Gaussian distribution

Set distX = "inverse-gaussian"

# Generate data
n <- 1000
## Generate fully observed covariate
z <- rlnorm(n = n)
## Generate the (censored) predictor
x <- SuppDists::rinvGauss(n = n, nu = 0.75 + 0.25 * z, lambda = 2)
## Generate censoring value
c <- rweibull(n = n, shape = 1, scale = 3)  
## Generate the outcome 
y <- rnorm(n = n, mean = 1 + 0.5 * x + 0.25 * z, sd = 1)
## Calculate observed predictor + event indicator
w <- pmin(x, c)
d <- as.numeric(x <= c)
x[d == 0] <- NA
# Construct dataframe 
temp <- data.frame(y, w, x, z, d)

This data generation scheme leads to true values:

where

  • μY(X,Z) = 1 + 0.5X + 0.25Z,
  • σY2 = 1,
  • μX(Z) = 0.75 + 0.25Z, and
  • λX = 2.
# Fit model 
fit <- glmCensRd::glmCensRd(Y = "y", W = "w", D = "d", Z = "z", distY = "normal", distX = "inverse-gaussian", data = temp, estSE = TRUE)
# Check convergence status
with(fit, code)
## [1] 1
## Outcome model P(Y|X,Z)
with(fit, outcome_model)
## $distY
## [1] "normal"
## 
## $mean
##                 coeff          se
## (Intercept) 0.9769124 0.040886988
## X           0.5201065 0.003835241
## z           0.2435986 0.016672972
## 
## $sigma2
## [1] 0.9942551

The code and outcome_model slots are interpreted in the same way as the first section, since once again distY = "normal", with

  • μ̂Y(X,Z) = 0.977 + 0.520X + 0.243Z and
  • σ̂Y2 = 0.994.
## Predictor model P(X|Z)
with(fit, predictor_model)
## $distX
## [1] "inverse-gaussian"
## 
## $mean
##                 coeff         se
## (Intercept) 0.7854897 0.03956616
## z           0.2288820 0.03414016
## 
## $shape
##                coeff         se
## (Intercept) 2.031719 0.09624681

Then we get the remaining parameter estimates from predictor_model:

  • μ̂X = 0.785 + 0.229Z and
  • λ̂X = 2.032.

Predictor follows a Weibull distribution

Set distX = "weibull"

# Generate data
n <- 1000
## Generate fully observed covariate
z <- rlnorm(n = n)
## Generate the (censored) predictor
x <- rweibull(n = n, shape = 2, scale = 0.75 + 0.25 * z)
## Generate censoring value
c <- rweibull(n = n, shape = 1, scale = 3)  
## Generate the outcome 
y <- rnorm(n = n, mean = 1 + 0.5 * x + 0.25 * z, sd = 1)
## Calculate observed predictor + event indicator
w <- pmin(x, c)
d <- as.numeric(x <= c)
x[d == 0] <- NA
# Construct dataframe 
temp <- data.frame(y, w, x, z, d)

This data generation scheme leads to true values:

where

  • μY(X,Z) = 1 + 0.5X + 0.25Z,
  • σY2 = 1,
  • λX(Z) = 0.75 + 0.25Z, and
  • νX = 2.
# Fit model 
fit <- glmCensRd::glmCensRd(Y = "y", W = "w", D = "d", Z = "z", distY = "normal", distX = "weibull", data = temp, estSE = TRUE)
# Check convergence status
with(fit, code)
## [1] 2
## Outcome model P(Y|X,Z)
with(fit, outcome_model)
## $distY
## [1] "normal"
## 
## $mean
##                 coeff         se
## (Intercept) 0.8642203 0.06081294
## X           0.5924190 0.05924627
## z           0.2713237 0.01815378
## 
## $sigma2
## [1] 0.9295751

The code and outcome_model slots are interpreted in the same way as the first section, since once again distY = "normal", with

  • μ̂Y(X,Z) = 0.864 + 0.592X + 0.271Z and
  • σ̂Y2 = 0.930.
## Predictor model P(X|Z)
with(fit, predictor_model)
## $distX
## [1] "weibull"
## 
## $scale
##                 coeff         se
## (Intercept) 0.8015621 0.01755491
## z           0.2238740 0.01529947
## 
## $shape
##                coeff         se
## (Intercept) 1.925981 0.04179089

Then we get the remaining parameter estimates from predictor_model:

  • λ̂XX = 0.802 + 0.224Z and
  • ν̂X = 1.923.

Predictor follows a exponential distribution

Set distX = "exponential"

# Generate data
n <- 1000
## Generate fully observed covariate
z <- rlnorm(n = n)
## Generate the (censored) predictor
x <- rexp(n = n, rate = 0.75 + 0.25 * z)
## Generate censoring value
c <- rweibull(n = n, shape = 1, scale = 3)  
## Generate the outcome 
y <- rnorm(n = n, mean = 1 + 0.5 * x + 0.25 * z, sd = 1)
## Calculate observed predictor + event indicator
w <- pmin(x, c)
d <- as.numeric(x <= c)
x[d == 0] <- NA
# Construct dataframe 
temp <- data.frame(y, w, x, z, d)

This data generation scheme leads to true values:

where

  • μY(X,Z) = 1 + 0.5X + 0.25Z,
  • σY2 = 1,
  • γX(Z) = 0.75 + 0.25Z.
# Fit model 
fit <- glmCensRd::glmCensRd(Y = "y", W = "w", D = "d", Z = "z", distY = "normal", distX = "exponential", data = temp, estSE = TRUE)
# Check convergence status
with(fit, code)
## [1] 2
## Outcome model P(Y|X,Z)
with(fit, outcome_model)
## $distY
## [1] "normal"
## 
## $mean
##                 coeff         se
## (Intercept) 1.1186313 0.11496119
## X           0.4680319 0.01892211
## z           0.2151738 0.06292374
## 
## $sigma2
## [1] 0.9376589

The code and outcome_model slots are interpreted in the same way as the first section, since once again distY = "normal", with

  • μ̂Y(X,Z) = 1.119 + 0.468X + 0.215Z and
  • σ̂Y2 = 0.934.
## Predictor model P(X|Z)
with(fit, predictor_model)
## $distX
## [1] "exponential"
## 
## $rate
##                 coeff         se
## (Intercept) 0.7817447 0.07201083
## z           0.2815443 0.06723741

Then we get the remaining parameter estimates from predictor_model:

  • γ̂X(Z) = 0.782 + 0.282Z.

Predictor follows a Poisson distribution

Set distX = "poisson"

# Generate data
n <- 1000
## Generate fully observed covariate
z <- rlnorm(n = n)
## Generate the (censored) predictor
x <- rpois(n = n, lambda = 0.75 + 0.25 * z)
## Generate censoring value
c <- rweibull(n = n, shape = 1, scale = 3)  
## Generate the outcome 
y <- rnorm(n = n, mean = 1 + 0.5 * x + 0.25 * z, sd = 1)
## Calculate observed predictor + event indicator
w <- pmin(x, c)
d <- as.numeric(x <= c)
x[d == 0] <- NA
# Construct dataframe 
temp <- data.frame(y, w, x, z, d)

This data generation scheme leads to true values:

where

  • μY(X,Z) = 1 + 0.5X + 0.25Z,
  • σY2 = 1,
  • γX(Z) = 0.75 + 0.25Z.
# Fit model 
fit <- glmCensRd::glmCensRd(Y = "y", W = "w", D = "d", Z = "z", distY = "normal", distX = "poisson", data = temp, estSE = TRUE)
# Check convergence status
with(fit, code)
## [1] 2
## Outcome model P(Y|X,Z)
with(fit, outcome_model)
## $distY
## [1] "normal"
## 
## $mean
##                 coeff         se
## (Intercept) 0.9477904 0.05371840
## X           0.5084887 0.03455069
## z           0.2205852 0.02521094
## 
## $sigma2
## [1] 1.00549

The code and outcome_model slots are interpreted in the same way as the first section, since once again distY = "normal", with

  • μ̂Y(X,Z) = 0.948 + 0.508X + 0.221Z and
  • σ̂Y2 = 1.005.
## Predictor model P(X|Z)
with(fit, predictor_model)
## $distX
## [1] "poisson"
## 
## $rate
##                 coeff         se
## (Intercept) 0.7446170 0.05068968
## z           0.1911478 0.03251821

Then we get the remaining parameter estimates from predictor_model:

  • γ̂X(Z) = 0.745 + 0.191Z.

About

Fitting generalized linear models with censored covariates in R

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages