fit.mult.impute no longer accepts "fitter = lrm" #17

Closed
tormodb opened this issue Jan 6, 2016 · 8 comments

@tormodb

tormodb commented Jan 6, 2016

I recently updated the rms package to version 4.4-1. When running through my analyses, I got an "Error in X[, mmcolnames, drop = FALSE] : subscript out of bounds".

The old models that used to work fine read:

fit <- fit.mult.impute(x ~ y, fitter = lrm, xtrans = imp, data = data)

but I now have to fit using fitter = glm, family = "binomial" to run the models. I really miss the plot(summary(fit)) and plot(anova(fit)) output that I used to get, not to mention the convenience of having the factor levels printed instead of simply "variable2".

Is this an error, or have you decided to drop fitter=lrm from fit.mult.impute?

@harrelfe
Owner

harrelfe commented Jan 6, 2016

This is supposed to work. Either it is a bug or you haven't updated to the latest versions of Hmisc and rms on CRAN. Please provide the version numbers you are using and a minimal self-contained reproducible example.
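For readers following along, one way to collect the requested version numbers is sketched below. The helper name pkg_version is hypothetical (it is not part of rms or Hmisc); it relies only on base R's requireNamespace() and packageVersion().

```r
# Hypothetical helper: returns the installed version of a package as text,
# or "not installed" when the package cannot be found.
pkg_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    "not installed"
  }
}

# Packages discussed in this issue
pkgs <- c("rms", "Hmisc", "mice")
versions <- vapply(pkgs, pkg_version, character(1))
print(versions)
print(R.version.string)   # the R version is often useful in a bug report too
```

Pasting this output (or the output of sessionInfo()) into a report saves a round of questions.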

@tormodb
Author

tormodb commented Jan 6, 2016

Dear Prof. Harrell, thanks for your quick reply.

I am currently using version 4.4-1 of rms and version 3.17-1 of Hmisc, which are the latest versions according to CRAN. I am having some trouble providing the reproducible example, as I am not sure how to take a sample of the (unpublished) data I am using from the mids object that mice created for the imputed data.

@harrelfe
Owner

harrelfe commented Jan 6, 2016

Don't take a sample. Generate simulated data using expand.grid, data.frame, rnorm, runif, etc. after set.seed(1) if using random numbers.
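A minimal sketch of that advice, using only base R. The variable names (group, sex, x, y) and the pattern of missing values are made up for illustration and do not come from anyone's real data.

```r
set.seed(1)   # make the random draws reproducible

# All combinations of two hypothetical categorical predictors, 5 replicates each
d <- expand.grid(group = c("a", "b", "c"), sex = c("f", "m"), rep = 1:5)
d$rep <- NULL          # the replicate index is not needed afterwards
n <- nrow(d)           # 3 * 2 * 5 = 30 rows

d$x <- rnorm(n)                             # continuous predictor
d$y <- rbinom(n, 1, plogis(0.5 * d$x))      # binary outcome loosely related to x

# Introduce missing values so the data can be passed to an imputation function
d$x[1:4]     <- NA
d$group[5:8] <- NA

str(d)
```

Because nothing here depends on the original data, the whole script can be pasted into a bug report.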

@tormodb
Author

tormodb commented Jan 7, 2016

Dear Prof. Harrell.

I am not a very skilled programmer, simply a psychologist-researcher, so I have had some trouble generating simulated data. What I have done, however, is to upgrade all packages (and RStudio) and then manually downgrade rms to version 4.3-1 (installed from source from the CRAN archive). Then the fit.mult.impute function works again as expected.

I also inspected the changelog of the update to rms version 4.4-1 and noticed this:

"bj, cph, Glm, lrm, ols, orm: changed to subset model.matrix result on mmcolnames to rigorously require expected design matrix column names to be what model.matrix actually constructed"

which (from my absolutely-non-programming background) appears to relate to the mmcolnames error message I got from version 4.4-1.

Again, I apologize for not being able to provide you with simulated data, but at least downgrading the rms package seems to provide a workaround for me at this time.

Thanks for your patience.

@thaoz

thaoz commented Jan 8, 2016

I have just sent you an email about a similar issue. If I take out the factor variables, the function works just fine.
For example, please take a look at this code:

str(nhanes2)
'data.frame': 25 obs. of 4 variables:
$ age: Factor w/ 3 levels "20-39","40-59",..: 1 2 1 3 1 3 1 1 2 2 ...
$ bmi: num NA 22.7 NA NA 20.4 NA 22.5 30.1 22 NA ...
$ hyp: Factor w/ 2 levels "no","yes": NA 1 1 NA 1 NA 1 1 1 NA ...
$ chl: num NA 187 187 NA 113 184 118 187 238 NA ...
set.seed(1)
imp <- mice(nhanes2)
lm(bmi~hyp + age +chl, data = nhanes2) # no error
ols(bmi~hyp + age +chl, data = nhanes2) # no error
ols(bmi~hyp + age +chl, data = complete(imp, 1))
Error in X[, c("(Intercept)", mmcolnames), drop = FALSE] :
subscript out of bounds

However, if I exclude the factor variables from the formula, it works:

ols(bmi~chl, data = complete(imp, 1))

@harrelfe
Owner

harrelfe commented Jan 8, 2016

tormodb: Fixing by downgrading doesn't help me fix the problem but it does show that there is a bug.

thaoz: The complete function in mice adds unnecessary contrast attributes to the factor variables. If you remove those attributes it works. But note that aregImpute may work better in many cases. aregImpute will not work for such a tiny dataset as nhanes2 though.
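The effect of such an attribute on design matrix column names can be illustrated in base R alone. This is only a sketch: the factor below is made up, and the contrast matrix attached here (with columns labelled "2" and "3") is an assumption standing in for whatever complete() actually attaches.

```r
# Base R only; the factor and its levels are made up for illustration.
x <- factor(c("a", "b", "c", "a", "b", "c"))

# No attribute: dummy columns are named after the factor levels.
default_names <- colnames(model.matrix(~ x))

# Assumption: attach a contrast matrix whose columns are labelled by
# position ("2", "3") rather than by level, as a stand-in for a stray
# 'contrasts' attribute.
contrasts(x) <- contr.treatment(3)
tagged_names <- colnames(model.matrix(~ x))

# Dropping the attribute restores the level-based names.
attr(x, "contrasts") <- NULL
clean_names <- colnames(model.matrix(~ x))

print(default_names)
print(tagged_names)
print(clean_names)
```

Under R's default contrast options, the first and third results should use level-based names while the second uses positional ones. A fitting function that looks up columns by the level-based names would then fail with a subscript error, which is consistent with the message reported above.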

Here is a reproducible example that may serve as a model for how to simulate data to help debug problems:

require(rms)
require(mice)
set.seed(1)
n <- 50
d <- data.frame(x1=runif(n), x2=sample(c('a','b','c'), n, TRUE),
                x3=sample(c('A','B','C','D'), n, TRUE),
                x4=sample(0:1, n, TRUE),
                y=runif(n))
d$x1[1:5]  <- NA
d$x2[3:9]  <- NA
d$x3[7:14] <- NA

a <- aregImpute(~ x1 + x2 + x3 + x4 + y, data=d)
ols(y ~ x1 + x2 + x3 + x4, data=d)

fit.mult.impute(y ~ x1 + x2 + x3 + x4, ols, a, data=d)  # works

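m <- mice(d)
d1 <- complete(m, 1)                   # first completed dataset from mice
ols(y ~ x1 + x2 + x3 + x4, data=d1)    # fails in rms 4.4-1: subscript out of bounds

w <- d1
attr(w$x2, 'contrasts') <- NULL        # remove the attribute added by complete()
attr(w$x3, 'contrasts') <- NULL
ols(y ~ x1 + x2 + x3 + x4, data=w)     # works once the attributes are removed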
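With many factor variables, removing the attribute one column at a time gets tedious. A small helper along the following lines could do it for every factor in a data frame. The name strip_contrasts is hypothetical and not part of rms, Hmisc, or mice; the toy data frame only exists to show the call.

```r
# Hypothetical helper: remove the 'contrasts' attribute from every factor
# column of a data frame, leaving levels and values untouched.
strip_contrasts <- function(df) {
  for (nm in names(df)) {
    if (is.factor(df[[nm]])) {
      attr(df[[nm]], "contrasts") <- NULL
    }
  }
  df
}

# Toy data frame with one factor that carries a contrasts attribute
toy <- data.frame(f = factor(c("a", "b", "c")), z = 1:3)
contrasts(toy$f) <- contr.treatment(3)

cleaned <- strip_contrasts(toy)
print(is.null(attr(toy$f, "contrasts")))      # attribute still present on the input
print(is.null(attr(cleaned$f, "contrasts")))  # attribute removed from the result
```

In the example above, the equivalent call would be w <- strip_contrasts(d1) before fitting with ols.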

@tormodb
Author

tormodb commented Jan 11, 2016

Dear Prof. Harrell, thanks to your example I have been able to make simulated data that reproduces the error.

require(rms)
require(mice)

n <- 50 # This is more than 10 000 in the actual dataset
d <- data.frame(age=sample(16:19, n, TRUE), 
                ethn=sample(c('no','eu','eaa'), n, TRUE),
                eco=sample(c('poor','equal','better'), n, TRUE),
                struc=sample(c('single','two'), n, TRUE),
                edu=sample(c('basic','intermediate','higher', 'uknown'), n, TRUE),
                work=sample(c('work','benefits','work/benefits'), n, TRUE),
                school=sample(c('vocational','general'), n, TRUE),
                physact=sample(c('no','yes'), n, TRUE))

d$ethn <- as.factor(d$ethn)
d$eco <- as.factor(d$eco)
d$struc <- as.factor(d$struc)
d$edu <- as.factor(d$edu)
d$work <- as.factor(d$work)
d$school <- as.factor(d$school)
d$physact <- as.factor(d$physact)

d$ethn[1:7]  <- NA
d$edu[3:9]  <- NA
d$school[7:14] <- NA
d$physact[15:20] <- NA

dd <- datadist(d)
options(datadist = "dd")

a <- mice(d)

fit.mult.impute(physact ~ age + ethn + eco + struc + edu + work + school, lrm, a, d)

Produces the error:

Error in X[, mmcolnames, drop = FALSE] : subscript out of bounds

in rms version 4.4-1, but runs fine in version 4.3-1.

(I used packrat to run the old version of rms in a separate project, and updated rms to the most recent version outside of that project. Just letting you know in case that could make a difference).

@harrelfe
Owner

The little simulation I produced above demonstrates the point if you pass the result of mice() into fit.mult.impute. The problem is an error in how mice::complete() adds a contrasts attribute. In the next release of the Hmisc package I'll have fit.mult.impute remove this attribute. In the meantime you can use aregImpute, or use the updated source file in the Hmisc project on GitHub, which I'll have fixed today. The source file is transcan.s. Once you see from https://github.com/harrelfe/Hmisc/blob/master/R/transcan.s that it is updated, you can run source('https://raw.githubusercontent.com/harrelfe/Hmisc/master/R/transcan.s') after typing library(Hmisc) or require(Hmisc) to override fit.mult.impute with the new version.
