
Example 24: Nested Model Comparison

psunthud edited this page Dec 30, 2012 · 2 revisions

Model Description

This example shows how to find fit index cutoffs for nested model comparison and how to find the statistical power of a nested model comparison. We investigate weak invariance testing in a longitudinal confirmatory factor analysis model. Three indicators of a common factor are measured at three time points, and we would like to test whether the factor loadings are stable across time. The nested model is the model in which the factor loadings are constrained to be equal across time; the parent model is the model in which the factor loadings are free to differ across time. We will use a simulation study to derive cutoffs for the change in fit indices (e.g., the change in CFI or in the chi-square value) when the factor loadings are only trivially different across time (differing by no more than .1).

In the nested model, the loadings of Indicators 1, 4, and 7 are fixed to 1 for scale identification. The values of the remaining loadings are drawn from a uniform distribution from 0.5 to 1.5. Equality constraints are imposed on Indicators 2, 5, and 8 and on Indicators 3, 6, and 9, such that the loadings within each set are forced to be equal. All indicator intercepts are 0. The factor variances at Times 1, 2, and 3 are 1, 1.2, and 1.4, respectively. All measurement error variances are 0.4. The first-order autocorrelations among the factors are .7 and the second-order autocorrelation is .49. The first- and second-order autocorrelations among the measurement errors are .2 and .04, respectively. The only difference between the nested and parent models is in the factor loadings: in the parent model, the loadings of Indicators 2, 3, 5, 6, 8, and 9 are drawn separately from a uniform distribution from 0.5 to 1.5, so the loadings of the same indicator can differ across time. Trivial misspecification is added to the nested model by perturbing the loadings of Indicators 2, 3, 5, 6, 8, and 9 with values drawn from a uniform distribution from -0.1 to 0.1. That is, in the nested model the factor loadings are trivially different across time.
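The population structure just described can be sketched in base R. This is an illustrative sketch, not part of the simsem syntax below: the free loading values (0.9 and 1.1) are arbitrary stand-ins for draws from the uniform distribution, and the error autocorrelations are omitted for brevity.

```r
# Illustrative sketch of the population covariance structure:
# Sigma = Lambda %*% Psi %*% t(Lambda) + Theta
Lambda <- matrix(0, 9, 3)
Lambda[1:3, 1] <- c(1, 0.9, 1.1)   # Time 1 loadings (0.9, 1.1 are stand-in draws)
Lambda[4:6, 2] <- c(1, 0.9, 1.1)   # Time 2 loadings (equal across time: nested model)
Lambda[7:9, 3] <- c(1, 0.9, 1.1)   # Time 3 loadings
facVar <- c(1, 1.2, 1.4)                      # factor variances at Times 1-3
facCor <- matrix(c(1, .7, .49,
                   .7, 1, .7,
                   .49, .7, 1), 3, 3)         # lag-1 = .7, lag-2 = .49
Psi <- diag(sqrt(facVar)) %*% facCor %*% diag(sqrt(facVar))
Theta <- diag(0.4, 9)                         # error variances (correlations omitted here)
Sigma <- Lambda %*% Psi %*% t(Lambda) + Theta # model-implied covariance matrix
```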

In this example, datasets are generated from both the nested model and the parent model, and each group of datasets is fitted by both models. When datasets from the nested model are fitted by both models, the fit indices from the two models should not differ much, which means the nested model is preferred. When datasets from the parent model are fitted by both models, the fit indices from the nested model should be worse than those from the parent model, which means the parent model is preferred. We will find cutoffs for deciding whether to select the nested (weak invariance) or the parent (noninvariance) model.

Example 24 Model

Syntax

The nested model with equality constraints can be specified as follows:

loadingNested <- matrix(0, 9, 3)
loadingNested[1, 1] <- 1                     # marker indicator fixed for identification
loadingNested[2:3, 1] <- c("con1", "con2")   # same labels across time impose equality
loadingNested[4, 2] <- 1
loadingNested[5:6, 2] <- c("con1", "con2")
loadingNested[7, 3] <- 1
loadingNested[8:9, 3] <- c("con1", "con2")
loadingMis <- matrix(0, 9, 3)                # trivial misspecification in free loadings
loadingMis[2:3, 1] <- "runif(1, -0.1, 0.1)"
loadingMis[5:6, 2] <- "runif(1, -0.1, 0.1)"
loadingMis[8:9, 3] <- "runif(1, -0.1, 0.1)"
LYnested <- bind(loadingNested, "runif(1, 0.5, 1.5)", misspec = loadingMis)

facCor <- matrix(NA, 3, 3)                   # factor correlations freely estimated
diag(facCor) <- 1
facCorVal <- diag(3)
facCorVal[1, 2] <- facCorVal[2, 1] <- 0.7    # lag-1 factor autocorrelations
facCorVal[2, 3] <- facCorVal[3, 2] <- 0.7
facCorVal[1, 3] <- facCorVal[3, 1] <- 0.49   # lag-2 factor autocorrelation
RPS <- binds(facCor, facCorVal)

VE <- bind(rep(NA, 3), c(1, 1.2, 1.4))

error <- diag(9)
# Lag-1 error correlations (same indicator at adjacent occasions) are free
error[1, 4] <- error[4, 7] <- error[4, 1] <- error[7, 4] <- NA
error[2, 5] <- error[5, 8] <- error[5, 2] <- error[8, 5] <- NA
error[3, 6] <- error[6, 9] <- error[6, 3] <- error[9, 6] <- NA
# Lag-2 error correlations (same indicator two occasions apart) are free
error[1, 7] <- error[7, 1] <- NA
error[2, 8] <- error[8, 2] <- NA
error[3, 9] <- error[9, 3] <- NA
errorVal <- diag(9)
# Population values: .2 at lag 1 and .04 at lag 2
errorVal[1, 4] <- errorVal[4, 7] <- errorVal[4, 1] <- errorVal[7, 4] <- 0.2
errorVal[2, 5] <- errorVal[5, 8] <- errorVal[5, 2] <- errorVal[8, 5] <- 0.2
errorVal[3, 6] <- errorVal[6, 9] <- errorVal[6, 3] <- errorVal[9, 6] <- 0.2
errorVal[1, 7] <- errorVal[7, 1] <- 0.04
errorVal[2, 8] <- errorVal[8, 2] <- 0.04
errorVal[3, 9] <- errorVal[9, 3] <- 0.04
RTE <- binds(error, errorVal)

VTE <- bind(rep(NA, 9), 0.4)

longNested <- model(LY=LYnested, RPS=RPS, VE=VE, RTE=RTE, VTE=VTE, modelType = "CFA")

The parent model without equality constraints can be specified as follows:

loadingParent <- matrix(0, 9, 3)
loadingParent[1, 1] <- 1            # marker indicator fixed for identification
loadingParent[2:3, 1] <- NA         # NA = freely estimated, no equality constraint
loadingParent[4, 2] <- 1
loadingParent[5:6, 2] <- NA
loadingParent[7, 3] <- 1
loadingParent[8:9, 3] <- NA
loadingMis <- matrix(0, 9, 3)       # trivial misspecification in free loadings
loadingMis[2:3, 1] <- "runif(1, -0.1, 0.1)"
loadingMis[5:6, 2] <- "runif(1, -0.1, 0.1)"
loadingMis[8:9, 3] <- "runif(1, -0.1, 0.1)"
LYparent <- bind(loadingParent, "runif(1, 0.5, 1.5)", misspec = loadingMis)
    
longParent <- model(LY=LYparent, RPS=RPS, VE=VE, RTE=RTE, VTE=VTE, modelType = "CFA")

The following result objects come from fitting the nested and parent models to datasets generated from the nested model. Note that when the seed number is the same (the seed is fixed by default), the same datasets are generated for both analyses.

outDatNestedModNested <- sim(1000, n = 200, longNested, generate = longNested)
outDatNestedModParent <- sim(1000, n = 200, longParent, generate = longNested)

For nested model comparison, we compare simulation results from the same sets of datasets analyzed by the two nested models. If the datasets are generated from the nested model, the difference in fit indices between the nested and parent models should be small. On the other hand, if the datasets are generated from the parent model, the fit indices from the nested model should be worse than those from the parent model. Currently, we have the simulation results from fitting both models to the datasets from the nested model. We can find the average difference in fit indices, as well as the proportion of replications that yield a significant chi-square difference test (the power), with the anova function:

anova(outDatNestedModNested, outDatNestedModParent)

The first argument is the output from the nested model and the second argument is the output from the parent model. Because the datasets are generated from the nested model, the power should be low.

The figure below shows the graph provided by the anova function when datasets are from the nested model:

Example 24 anova null

To find the cutoff of the difference in fit indices using the Monte Carlo approach, the getCutoffNested function can be used:

cutoff <- getCutoffNested(outDatNestedModNested, outDatNestedModParent)

The first argument is the simulation result from the nested model and the second argument is the simulation result from the parent model. The data-generation model used in these two result objects must be the same so that identical datasets are created and then analyzed by the two model objects. In the function, the fit indices from the parent model are subtracted from the fit indices from the nested model. Then, the cutoffs based on an a priori alpha level are obtained from the sampling distribution of these differences in fit indices.
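The cutoff logic can be illustrated with a small base-R sketch. This is not simsem's internal code: the chi-square and CFI differences below are hypothetical stand-in distributions, used only to show how a quantile of the difference distribution becomes the cutoff.

```r
set.seed(1)
alpha <- 0.05
# Stand-in sampling distributions of fit index differences (nested - parent)
diffChisq <- rchisq(1000, df = 4)            # chi-square difference: positive, skewed
diffCFI   <- -abs(rnorm(1000, 0, 0.005))     # CFI difference: typically small and negative
# For badness-of-fit measures, reject the nested model above the upper quantile
cutoffChisq <- unname(quantile(diffChisq, 1 - alpha))
# For goodness-of-fit measures such as CFI, reject below the lower quantile
cutoffCFI <- unname(quantile(diffCFI, alpha))
```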

The figure below shows the graph provided by the getCutoffNested function:

Example 24 get cutoff

The sampling distribution of the difference between fit indices from the nested and parent models can be plotted by the plotCutoffNested function:

plotCutoffNested(outDatNestedModNested, outDatNestedModParent, alpha=0.05)

The first argument is the simulation result from the nested model and the second argument is the simulation result from the parent model. The alpha argument is the a priori alpha level. Again, the data-generation model used in these two result objects must be the same.

The figure below shows the graph provided by the plotCutoffNested function:

Example 24 fit

The previous two result objects used datasets generated from the nested model, which provide the sampling distribution of the difference in fit indices when the nested model is true. Alternatively, we can find the sampling distribution of the difference in fit indices when the nested model is false (i.e., the parent model is true). The parent model template is used for data generation instead:

outDatParentModNested <- sim(1000, n = 200, longNested, generate = longParent)
outDatParentModParent <- sim(1000, n = 200, longParent, generate = longParent)

We can also use the anova function to compare two simulation results of fitting nested and parent models on the datasets from the parent model:

anova(outDatParentModNested, outDatParentModParent)

Because the datasets are generated from the parent model, the power should be high.

The figure below shows the graph provided by the anova function when datasets are from the parent model:

Example 24 anova null

We can also find the statistical power of the Monte Carlo approach for nested model comparison. That is, we investigate the proportion of replications, using datasets generated from the parent model, whose fit differences are worse than the cutoffs derived from the Monte Carlo approach. We can use the getPowerFitNested function to find the power:

getPowerFitNested(outDatParentModNested, outDatParentModParent, nullNested=outDatNestedModNested, 
    nullParent=outDatNestedModParent)

The first argument is the simulation fitting the nested model on the datasets from the parent model. The second argument is the simulation fitting the parent model on the datasets from the parent model. The nullNested argument is the simulation fitting the nested model on the datasets from the nested model. The nullParent argument is the simulation fitting the parent model on the datasets from the nested model.
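In essence, the power here is the proportion of parent-model replications whose fit difference falls beyond the cutoff derived from the nested-model replications. A base-R sketch, using hypothetical stand-in distributions rather than simsem's internals:

```r
set.seed(2)
# Stand-in difference distributions (illustrative, not simsem output)
diffNull <- rchisq(1000, df = 4)             # nested model true: central distribution
diffAlt  <- rchisq(1000, df = 4, ncp = 10)   # parent model true: shifted distribution
cutoff   <- unname(quantile(diffNull, 0.95)) # Monte Carlo cutoff at alpha = .05
power    <- mean(diffAlt > cutoff)           # proportion of replications beyond the cutoff
```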

The figure below shows the graph provided by the getPowerFitNested function when datasets from the nested model are specified:

Example 24 get power 1

Instead of providing the simulation results based on datasets from the nested model, we can supply a vector of fit index cutoffs to the getPowerFitNested function to find the statistical power:

getPowerFitNested(outDatParentModNested, outDatParentModParent, cutoff=cutoff)

The figure below shows the graph provided by the getPowerFitNested function when the derived cutoffs are specified:

Example 24 get power 2

We can plot the sampling distributions of the differences in fit indices when the datasets are from both the nested and the parent models by the plotPowerFitNested function:

plotPowerFitNested(outDatParentModNested, outDatParentModParent, nullNested=outDatNestedModNested, 
    nullParent=outDatNestedModParent)

The first argument is the simulation fitting the nested model on the datasets from the parent model. The second argument is the simulation fitting the parent model on the datasets from the parent model. The nullNested argument is the simulation fitting the nested model on the datasets from the nested model. The nullParent argument is the simulation fitting the parent model on the datasets from the nested model.

The figure below shows the graph provided by the plotPowerFitNested function when the simulation results of the datasets from both parent and nested models are specified:

Example 24 plot power 1

We can select only the plot of the difference in RMSEA in the plotPowerFitNested function:

plotPowerFitNested(outDatParentModNested, outDatParentModParent, nullNested=outDatNestedModNested, 
    nullParent=outDatNestedModParent, usedFit="RMSEA")

The figure below shows the graph provided by the plotPowerFitNested function when the simulation results of the datasets from both parent and nested models are specified and the only RMSEA plot is selected:

Example 24 plot power 2

We can also use theoretical cutoffs to retain or reject the nested model. For example, the nested model will be rejected when the chi-square difference is greater than 3.84 (the chi-square critical value when df = 1 and alpha = .05) or when the difference in CFI (nested minus parent) is less than -0.01. These cutoffs can be used to find the power in the getPowerFitNested function:

cutoff2 <- c(Chi = 3.84, CFI = -0.01)
getPowerFitNested(outDatParentModNested, outDatParentModParent, cutoff = cutoff2)
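The 3.84 value is the upper .05 critical value of a chi-square distribution with 1 degree of freedom, which qchisq reproduces (a quick check, not part of the example script):

```r
crit <- qchisq(0.95, df = 1)   # upper .05 chi-square critical value, df = 1
round(crit, 2)                 # 3.84
```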

The figure below shows the graph provided by the getPowerFitNested function when the a priori cutoffs are specified:

Example 24 get power 3

The cutoff can be imposed on the sampling distribution of the difference in fit indices to visualize the power with the plotPowerFitNested function:

plotPowerFitNested(outDatParentModNested, outDatParentModParent, cutoff = cutoff2)

The figure below shows the graph provided by the plotPowerFitNested function when the a priori cutoff is specified:

Example 24 plot power 3

The cutoff can be imposed on the sampling distributions of the differences in fit indices from both the nested-model and parent-model datasets to visualize the power and the Type I error with the plotPowerFitNested function:

plotPowerFitNested(outDatParentModNested, outDatParentModParent, nullNested=outDatNestedModNested, 
    nullParent=outDatNestedModParent, cutoff=cutoff2)

The figure below shows the graph provided by the plotPowerFitNested function when the a priori cutoff is specified with the simulation results of the datasets from the nested model:

Example 24 plot power 4

Here is a summary of the whole script used in this example.
