glmCensRd: Fitting generalized linear models with censored predictors in R

Sarah C. Lotspeich, Peter Guan, and Tanya P. Garcia

# Run once
# install.packages("devtools")
devtools::install_github("sarahlotspeich/glmCensRd", ref = "main")

# Reproducibility
set.seed(333)

Linear Regression Outcome Model

Set `distY = "normal"`

Predictor follows a normal distribution

Set `distX = "normal"`

# Generate data
n <- 1000
## Generate fully observed covariate
z <- rlnorm(n = n)
## Generate the (censored) predictor
x <- rnorm(n = n, mean = 0.75 + 0.25 * z, sd = 1)  
## Generate censoring value
c <- rweibull(n = n, shape = 1, scale = 3)  
## Generate the outcome 
y <- rnorm(n = n, mean = 1 + 0.5 * x + 0.25 * z, sd = 1)
## Calculate observed predictor + event indicator
w <- pmin(x, c)
d <- as.numeric(x <= c)
x[d == 0] <- NA
# Construct dataframe 
temp <- data.frame(y, w, x, z, d)

This data generation scheme leads to true values:

where

μ_Y = 1 + 0.5X + 0.25Z,
σ_Y² = 1,
μ_X = 0.75 + 0.25Z, and
σ_X² = 1.

# Fit model 
fit <- glmCensRd::glmCensRd(Y = "y", W = "w", D = "d", Z = "z", distY = "normal", distX = "normal", data = temp, steptol = 1e-2, iterlim = 100, estSE = TRUE)

# Check convergence status
with(fit, code)

## [1] 2

The code is returned from glmCensRd()’s internal call to nlm() and per the documentation is interpreted as follows:

1: relative gradient is close to zero, current iterate is probably solution.
2: successive iterates within tolerance, current iterate is probably solution.
3: last global step failed to locate a point lower than estimate. Either estimate is an approximate local minimum of the function or steptol is too small.
4: iteration limit exceeded.
5: maximum step size stepmax exceeded five consecutive times. Either the function is unbounded below, becomes asymptotic to a finite value from above in some direction or stepmax is too small.

In general, we accept code <= 2 as successful convergence to the MLE. Results of code = 3-4 might require tuning the steptol or iterlim parameters in the call to glmCensRd().

# Inspect output 
## Outcome model P(Y|X,Z)
with(fit, outcome_model)

## $distY
## [1] "normal"
## 
## $mean
##                 coeff         se
## (Intercept) 0.9730323 0.09091957
## X           0.4566375 0.08317825
## z           0.2815724 0.02841242
## 
## $sigma2
## [1] 0.9893482

From the outcome_model slot of the model fit, we get the following:

μ̂_Y(X,Z) = 0.973 + 0.457X + 0.282Z and
σ̂_Y² = 0.989.

## Predictor model P(X|Z)
with(fit, predictor_model)

## $distX
## [1] "normal"
## 
## $mean
##                 coeff         se
## (Intercept) 0.7033991 0.05449020
## z           0.2993972 0.04264013
## 
## $sigma2
## [1] 1.000342

Then we get the remaining parameter estimates from predictor_model:

μ̂_X(Z) = 0.703 + 0.299Z and
σ̂_X² = 1.000.

Predictor follows a gamma distribution

Set `distX = "gamma"`

# Generate data
n <- 1000
## Generate fully observed covariate
z <- rlnorm(n = n)
## Generate the (censored) predictor
x <- rgamma(n = n, shape = 2, scale = (0.75 + 0.25 * z) / 2)
## Generate censoring value
c <- rweibull(n = n, shape = 1, scale = 3)  
## Generate the outcome 
y <- rnorm(n = n, mean = 1 + 0.5 * x + 0.25 * z, sd = 1)
## Calculate observed predictor + event indicator
w <- pmin(x, c)
d <- as.numeric(x <= c)
x[d == 0] <- NA
# Construct dataframe 
temp <- data.frame(y, w, x, z, d)

This data generation scheme leads to true values:

where

μ_Y(X,Z) = 1 + 0.5X + 0.25Z,
σ_Y² = 1,
μ_X(Z) = 0.75 + 0.25Z,
ν_X = 2, and
λ_X(Z) = (0.75+0.25Z)/2,

# Fit model 
fit <- glmCensRd::glmCensRd(Y = "y", W = "w", D = "d", Z = "z", distY = "normal", distX = "gamma", data = temp, estSE = TRUE)

# Check convergence status
with(fit, code)

## [1] 2

## Outcome model P(Y|X,Z)
with(fit, outcome_model)

## $distY
## [1] "normal"
## 
## $mean
##                 coeff         se
## (Intercept) 1.0323830 0.04597472
## X           0.5267726 0.02059062
## z           0.2339656 0.01716312
## 
## $sigma2
## [1] 1.001325

The code and outcome_model slots are interpreted in the same way as the first section, since once again distY = "normal", with

μ̂_Y(X,Z) = 1.032 + 0.527X + 0.234Z and
σ̂_Y² = 1.001.

## Predictor model P(X|Z)
with(fit, predictor_model)

## $distX
## [1] "gamma"
## 
## $mean
##                 coeff         se
## (Intercept) 0.7075069 0.02304275
## z           0.2458071 0.02226052
## 
## $shape
##                coeff         se
## (Intercept) 2.027891 0.06915529

Then we get the remaining parameter estimates from predictor_model:

μ̂_X = 0.708 + 0.246Z,
ν̂_X = 2.028, and
λ̂_X = μ̂_X/ν̂_X = (0.708+0.246Z)/2.028.

Predictor follows an inverse-Gaussian distribution

Set `distX = "inverse-gaussian"`

# Generate data
n <- 1000
## Generate fully observed covariate
z <- rlnorm(n = n)
## Generate the (censored) predictor
x <- SuppDists::rinvGauss(n = n, nu = 0.75 + 0.25 * z, lambda = 2)
## Generate censoring value
c <- rweibull(n = n, shape = 1, scale = 3)  
## Generate the outcome 
y <- rnorm(n = n, mean = 1 + 0.5 * x + 0.25 * z, sd = 1)
## Calculate observed predictor + event indicator
w <- pmin(x, c)
d <- as.numeric(x <= c)
x[d == 0] <- NA
# Construct dataframe 
temp <- data.frame(y, w, x, z, d)

This data generation scheme leads to true values:

where

μ_Y(X,Z) = 1 + 0.5X + 0.25Z,
σ_Y² = 1,
μ_X(Z) = 0.75 + 0.25Z, and
λ_X = 2.

# Fit model 
fit <- glmCensRd::glmCensRd(Y = "y", W = "w", D = "d", Z = "z", distY = "normal", distX = "inverse-gaussian", data = temp, estSE = TRUE)

# Check convergence status
with(fit, code)

## [1] 1

## Outcome model P(Y|X,Z)
with(fit, outcome_model)

## $distY
## [1] "normal"
## 
## $mean
##                 coeff          se
## (Intercept) 0.9769124 0.040886988
## X           0.5201065 0.003835241
## z           0.2435986 0.016672972
## 
## $sigma2
## [1] 0.9942551

The code and outcome_model slots are interpreted in the same way as the first section, since once again distY = "normal", with

μ̂_Y(X,Z) = 0.977 + 0.520X + 0.243Z and
σ̂_Y² = 0.994.

## Predictor model P(X|Z)
with(fit, predictor_model)

## $distX
## [1] "inverse-gaussian"
## 
## $mean
##                 coeff         se
## (Intercept) 0.7854897 0.03956616
## z           0.2288820 0.03414016
## 
## $shape
##                coeff         se
## (Intercept) 2.031719 0.09624681

Then we get the remaining parameter estimates from predictor_model:

μ̂_X = 0.785 + 0.229Z and
λ̂_X = 2.032.

Predictor follows a Weibull distribution

Set `distX = "weibull"`

# Generate data
n <- 1000
## Generate fully observed covariate
z <- rlnorm(n = n)
## Generate the (censored) predictor
x <- rweibull(n = n, shape = 2, scale = 0.75 + 0.25 * z)
## Generate censoring value
c <- rweibull(n = n, shape = 1, scale = 3)  
## Generate the outcome 
y <- rnorm(n = n, mean = 1 + 0.5 * x + 0.25 * z, sd = 1)
## Calculate observed predictor + event indicator
w <- pmin(x, c)
d <- as.numeric(x <= c)
x[d == 0] <- NA
# Construct dataframe 
temp <- data.frame(y, w, x, z, d)

This data generation scheme leads to true values:

where

μ_Y(X,Z) = 1 + 0.5X + 0.25Z,
σ_Y² = 1,
λ_X(Z) = 0.75 + 0.25Z, and
ν_X = 2.

# Fit model 
fit <- glmCensRd::glmCensRd(Y = "y", W = "w", D = "d", Z = "z", distY = "normal", distX = "weibull", data = temp, estSE = TRUE)

# Check convergence status
with(fit, code)

## [1] 2

## Outcome model P(Y|X,Z)
with(fit, outcome_model)

## $distY
## [1] "normal"
## 
## $mean
##                 coeff         se
## (Intercept) 0.8642203 0.06081294
## X           0.5924190 0.05924627
## z           0.2713237 0.01815378
## 
## $sigma2
## [1] 0.9295751

The code and outcome_model slots are interpreted in the same way as the first section, since once again distY = "normal", with

μ̂_Y(X,Z) = 0.864 + 0.592X + 0.271Z and
σ̂_Y² = 0.930.

## Predictor model P(X|Z)
with(fit, predictor_model)

## $distX
## [1] "weibull"
## 
## $scale
##                 coeff         se
## (Intercept) 0.8015621 0.01755491
## z           0.2238740 0.01529947
## 
## $shape
##                coeff         se
## (Intercept) 1.925981 0.04179089

Then we get the remaining parameter estimates from predictor_model:

λ̂_{X_X} = 0.802 + 0.224Z and
ν̂_X = 1.923.

Predictor follows a exponential distribution

Set `distX = "exponential"`

# Generate data
n <- 1000
## Generate fully observed covariate
z <- rlnorm(n = n)
## Generate the (censored) predictor
x <- rexp(n = n, rate = 0.75 + 0.25 * z)
## Generate censoring value
c <- rweibull(n = n, shape = 1, scale = 3)  
## Generate the outcome 
y <- rnorm(n = n, mean = 1 + 0.5 * x + 0.25 * z, sd = 1)
## Calculate observed predictor + event indicator
w <- pmin(x, c)
d <- as.numeric(x <= c)
x[d == 0] <- NA
# Construct dataframe 
temp <- data.frame(y, w, x, z, d)

This data generation scheme leads to true values:

where

μ_Y(X,Z) = 1 + 0.5X + 0.25Z,
σ_Y² = 1,
γ_X(Z) = 0.75 + 0.25Z.

# Fit model 
fit <- glmCensRd::glmCensRd(Y = "y", W = "w", D = "d", Z = "z", distY = "normal", distX = "exponential", data = temp, estSE = TRUE)

# Check convergence status
with(fit, code)

## [1] 2

## Outcome model P(Y|X,Z)
with(fit, outcome_model)

## $distY
## [1] "normal"
## 
## $mean
##                 coeff         se
## (Intercept) 1.1186313 0.11496119
## X           0.4680319 0.01892211
## z           0.2151738 0.06292374
## 
## $sigma2
## [1] 0.9376589

The code and outcome_model slots are interpreted in the same way as the first section, since once again distY = "normal", with

μ̂_Y(X,Z) = 1.119 + 0.468X + 0.215Z and
σ̂_Y² = 0.934.

## Predictor model P(X|Z)
with(fit, predictor_model)

## $distX
## [1] "exponential"
## 
## $rate
##                 coeff         se
## (Intercept) 0.7817447 0.07201083
## z           0.2815443 0.06723741

Then we get the remaining parameter estimates from predictor_model:

γ̂_X(Z) = 0.782 + 0.282Z.

Predictor follows a Poisson distribution

Set `distX = "poisson"`

# Generate data
n <- 1000
## Generate fully observed covariate
z <- rlnorm(n = n)
## Generate the (censored) predictor
x <- rpois(n = n, lambda = 0.75 + 0.25 * z)
## Generate censoring value
c <- rweibull(n = n, shape = 1, scale = 3)  
## Generate the outcome 
y <- rnorm(n = n, mean = 1 + 0.5 * x + 0.25 * z, sd = 1)
## Calculate observed predictor + event indicator
w <- pmin(x, c)
d <- as.numeric(x <= c)
x[d == 0] <- NA
# Construct dataframe 
temp <- data.frame(y, w, x, z, d)

This data generation scheme leads to true values:

where

μ_Y(X,Z) = 1 + 0.5X + 0.25Z,
σ_Y² = 1,
γ_X(Z) = 0.75 + 0.25Z.

# Fit model 
fit <- glmCensRd::glmCensRd(Y = "y", W = "w", D = "d", Z = "z", distY = "normal", distX = "poisson", data = temp, estSE = TRUE)

# Check convergence status
with(fit, code)

## [1] 2

## Outcome model P(Y|X,Z)
with(fit, outcome_model)

## $distY
## [1] "normal"
## 
## $mean
##                 coeff         se
## (Intercept) 0.9477904 0.05371840
## X           0.5084887 0.03455069
## z           0.2205852 0.02521094
## 
## $sigma2
## [1] 1.00549

The code and outcome_model slots are interpreted in the same way as the first section, since once again distY = "normal", with

μ̂_Y(X,Z) = 0.948 + 0.508X + 0.221Z and
σ̂_Y² = 1.005.

## Predictor model P(X|Z)
with(fit, predictor_model)

## $distX
## [1] "poisson"
## 
## $rate
##                 coeff         se
## (Intercept) 0.7446170 0.05068968
## z           0.1911478 0.03251821

Then we get the remaining parameter estimates from predictor_model:

γ̂_X(Z) = 0.745 + 0.191Z.

Name		Name	Last commit message	Last commit date
Latest commit History 327 Commits
R		R
images		images
man		man
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
README.md		README.md
glmCensRd-hex.png		glmCensRd-hex.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

R

R

images

images

man

man

.gitignore

.gitignore

DESCRIPTION

DESCRIPTION

NAMESPACE

NAMESPACE

README.md

README.md

glmCensRd-hex.png

glmCensRd-hex.png

Repository files navigation

glmCensRd: Fitting generalized linear models with censored predictors in R

Linear Regression Outcome Model

Set `distY = "normal"`

Predictor follows a normal distribution

Set `distX = "normal"`

Predictor follows a gamma distribution

Set `distX = "gamma"`

Predictor follows an inverse-Gaussian distribution

Set `distX = "inverse-gaussian"`

Predictor follows a Weibull distribution

Set `distX = "weibull"`

Predictor follows a exponential distribution

Set `distX = "exponential"`

Predictor follows a Poisson distribution

Set `distX = "poisson"`

About

Releases

Packages

Contributors 2

Languages

sarahlotspeich/glmCensRd

Folders and files

Latest commit

History

Repository files navigation

glmCensRd: Fitting generalized linear models with censored predictors in R

Linear Regression Outcome Model

Set distY = "normal"

Predictor follows a normal distribution

Set distX = "normal"

Predictor follows a gamma distribution

Set distX = "gamma"

Predictor follows an inverse-Gaussian distribution

Set distX = "inverse-gaussian"

Predictor follows a Weibull distribution

Set distX = "weibull"

Predictor follows a exponential distribution

Set distX = "exponential"

Predictor follows a Poisson distribution

Set distX = "poisson"

About

Resources

Stars

Watchers

Forks

Languages

Set `distY = "normal"`

Set `distX = "normal"`

Set `distX = "gamma"`

Set `distX = "inverse-gaussian"`

Set `distX = "weibull"`

Set `distX = "exponential"`

Set `distX = "poisson"`