Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error with S4 slot names between lavaan and semTools (with reproducible example) #120

Closed
Chhr1s opened this issue Mar 29, 2023 · 5 comments
Labels
bug reprex showing an error or incorrect output

Comments

@Chhr1s
Copy link

Chhr1s commented Mar 29, 2023

I made a reproducible example (below) to demonstrate the issue.

In some cases, the lavaan.mi class object made by cfa.mi() has slot names which do not work with lavaan's expected slots. This occurs with several functions, including summary() and parameterestimates(), but others too.

The problem does not occur in all situations, but does when endogenous variables are categorical and the latent variable is regressed on some manifest variable.

I do some data conversion to the HolzingerSwineford data; obviously this model doesn't make any sense, but it demonstrates the example.

library(lavaan)
#> This is lavaan 0.6-15
#> lavaan is FREE software! Please report any bugs.
library(semTools)
#> 
#> ###############################################################################
#> This is semTools 0.5-6.917
#> All users of R (or SEM) are invited to submit functions or ideas for functions.
#> ###############################################################################
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- lavaan::HolzingerSwineford1939

# Set the seed for reproducibility
set.seed(123)


df <- 
  df %>% 
  mutate(
# replace 10% of values to missing for example
    across(
      .fns = 
        ~ifelse(
          sample(
            c(TRUE, FALSE), 
            length(.x), 
            replace = TRUE, 
            prob = c(0.1, 0.9)
          ), NA, .x)
    ), 
# round to whole numbers for example
across(.cols = where(is.numeric), .fns = round)
  )

# Specify the CFA model 
# error occurs consistently 
# when latent with ordered manifest is regressed on a variable

mod1 <- '
F1 =~ x1 + x2 + x3

F1 ~ x9

'

fit1 <- 
  cfa.mi(
    model = mod1, 
    data = df, 
    miPackage = 'mice', 
    m = 5, 
    ordered = c('x1', 'x2', 'x3'),
    seed = 123
  )
#> Loading required namespace: mice

summary(fit1, fit = T, standardized = 'std.nox')
#> Error in lav_standardize_all_nox(object, est = est, GLIST = GLIST, partable = partable, : no slot of name "implied" for this object of class "lavaan.mi"
# same error occurs with: summary(fit1, fit = T, standardized = T)
@Chhr1s
Copy link
Author

Chhr1s commented Mar 29, 2023

Not a good long term solution out, but I forked this and modified the definition of lavaan.mi class to have an empty slot called "implied". At first look it works

@TDJorgensen
Copy link
Member

There should be no @implied slot in a class?lavaan.mi object because I do not define one, nor does class?lavaanList have one. If this is only occurring with "std.nox", then I assume you understand that you want to know the estimated parameters under the condition that all latent variables and observed outcomes have SD=1, but the exogenous predictors have their original (unstandardized) metric. Is this indeed the output you want to interpret?

I can look further into it, but first try the latest development version of lavaan:

remotes::install_github("yrosseel/lavaan")

The only time I see @implied being accessed in this file:
https://github.com/yrosseel/lavaan/blob/master/R/lav_standardize.R
is when conditional.x=TRUE, which is set by default whenever there is a categorical outcome. So the best workaround for you right now might be to set it FALSE:

fit2 <- 
  cfa.mi(
    model = mod1, 
    data = fit1, # use same imputations
    conditional.x = FALSE, 
    ordered = TRUE
  )
summary(fit2, standardized = "std.nox")
summary(fit2, standardized = "std.all") # also returned when TRUE, but "std.lv" no problem

I have wanted to redo the standardized solution options anyway, and will try to resolve this when I have time. Until then, maybe I'll just add a flag in the summary() method that prints a more informative error message when conditional.x=TRUE (which is the issue, regardless of whether there is a categorical outcome).

@TDJorgensen TDJorgensen added the bug reprex showing an error or incorrect output label Mar 30, 2023
@Chhr1s
Copy link
Author

Chhr1s commented Mar 30, 2023

Thanks for looking into this more deeply. Your long term solutions sound great. I use semTools quite often. Working with survey data, I often have ordinal indicators. In this case, I wanted std.nox because the latent is regressed on a dummy coded variable. I'm trying to report the standardized mean difference between dummy codes, so we're on the same page.

I realize your solution is better from a coding perspective (i.e., I can get results without additional changes); however, this path is the path of greatest interest in a project that I'm working on. If I want the most robust estimates in the meantime, is there any reason I should not just fork the repo and add a slot called implied?

If I understand correctly, conditional.x = FALSE will use a slightly less robust covariance matrix than conditional.x = TRUE. So if the estimates differ across these conditions, conditional.x = TRUE is more appropriate; correct?

This seemed like less work than going into lavaan and making this compatible from their end, but I'm open to your advice.

@TDJorgensen
Copy link
Member

I'm not sure what you mean by "robust". From my understanding, the primary reason to condition on X (rather than merely treating X as "fixed") is that it is easier on the optimizer to find a solution. When you condition on X with ordinal outcomes, Step 1 of DWLS is to estimate a matrix of polychoric/polyserial correlations among endogenous residuals, whereas merely fixed.x=TRUE, conditional.x=FALSE will begin with estimating a marginal matrix of polychoric/polyserial correlations. This may actually be more robust if the latent-normality assumption is violated without first conditioning on X (e.g., it could be a mixture of 2 normals, given your binary X), but I am not familiar with that literature.

If the robustness you are referring to is not wanting to treat your binary exogenous predictor as normal, then setting conditional.x=FALSE is not a problem because fixed.x=TRUE still just treats exogenous sample statistics as given rather than estimating them. Conditioning is only about how endogenous variables are modeled.

@Chhr1s
Copy link
Author

Chhr1s commented Mar 31, 2023

Thank you very much for this detailed answer; I'll go your route for the solution. I greatly appreciate your hard work and extensive contributions to structural equation modeling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug reprex showing an error or incorrect output
Projects
None yet
Development

No branches or pull requests

2 participants