
Example 22: Specifying Trivial Misspecification (Version 0.2)

psunthud edited this page Dec 30, 2012 · 2 revisions

Model Description

This example shows how to impose different types of trivial misspecification. In almost all previous examples, the trivial misspecification was specified as random (e.g., all cross-loadings uniformly distributed from -0.2 to 0.2). This example introduces three other methods of misspecification: the fixed, maximal, and ranged methods.

As in Example 13, the Monte Carlo approach to model fit evaluation is used. The target dataset is the Holzinger and Swineford (1939) data, which are analyzed by a confirmatory factor analysis with three factors and three indicators per factor. The resulting standardized parameter estimates are used for data generation.

Example 22 Model

Before running the Monte Carlo simulation for model fit evaluation, we need to choose a method for imposing trivial misspecification. This example implements seven variants. The first example imposes no trivial misspecification. The second and third examples use the fixed method, which specifies an exemplar of maximally acceptable misspecification (e.g., a misspecified cross-loading of size .3). In the second example, cross-loadings from Factor 2 to Indicator 1 and from Factor 3 to Indicator 4, each with a magnitude of 0.3, are imposed on top of the estimated standardized parameters.

Example 22 Misspecification 1

In the third example, cross-loadings from Factor 1 to Indicator 6 and from Factor 2 to Indicator 9, each with a magnitude of 0.3, are imposed on top of the estimated standardized parameters.

Example 22 Misspecification 2

The fourth and fifth examples show how to impose trivial misspecification by the random method. The random method treats model misspecification as a random quantity with a distribution, so it can account for a wide range of possible misspecified models. In the fourth example, all cross-loadings are drawn from a uniform distribution from -0.3 to 0.3. In the fifth example, all cross-loadings are drawn from a normal distribution with a mean of 0 and a standard deviation of 0.15. See the figure below for all possible cross-loadings (red lines).

Example 22 Misspecification 3
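To make the two random specifications concrete, the following base R sketch draws one set of cross-loadings under each distribution. It does not use simsem; the names mainCells, crossCells, drawUnif, and drawNorm are illustrative only, and the seed is arbitrary.

```r
set.seed(22)  # arbitrary seed so the draws are reproducible

# Target pattern: three indicators per factor (TRUE marks a main loading)
mainCells <- matrix(FALSE, 9, 3)
mainCells[1:3, 1] <- TRUE
mainCells[4:6, 2] <- TRUE
mainCells[7:9, 3] <- TRUE
crossCells <- !mainCells  # the 18 possible cross-loadings

# One draw from the uniform specification (fourth example)
drawUnif <- matrix(0, 9, 3)
drawUnif[crossCells] <- runif(sum(crossCells), min = -0.3, max = 0.3)

# One draw from the normal specification (fifth example)
drawNorm <- matrix(0, 9, 3)
drawNorm[crossCells] <- rnorm(sum(crossCells), mean = 0, sd = 0.15)

round(drawUnif, 2)
round(drawNorm, 2)
```

The uniform draws never exceed 0.3 in absolute value, whereas the normal draws are unbounded and can occasionally fall outside that range.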

The sixth example shows how to impose trivial misspecification by the maximal method. The maximal method also accommodates the fact that there could be a range of trivial misspecifications. However, instead of using a random value within the range, the maximal method selects the misspecification that provides the maximum misfit and uses it to define the population. In the sixth example, several sets of cross-loadings are drawn from a uniform distribution from -0.3 to 0.3. Then, the set providing the largest population misfit (quantified by RMSEA) is picked and imposed on the real parameters.
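The selection step of the maximal method can be sketched in base R as follows. This is only an illustration under stated assumptions: the population values (main loadings of 0.7, factor correlations of 0.5) are hypothetical, and standInMisfit is a crude stand-in that measures how much the implied covariances change when the cross-loadings are added. It is not the population RMSEA that simsem uses, which requires refitting the model.

```r
set.seed(27)  # arbitrary seed

# Hypothetical target population: 3 factors, 3 indicators each
lambda <- matrix(0, 9, 3)
lambda[1:3, 1] <- 0.7
lambda[4:6, 2] <- 0.7
lambda[7:9, 3] <- 0.7
phi <- matrix(0.5, 3, 3)
diag(phi) <- 1
crossCells <- lambda == 0  # cells where a cross-loading can be imposed

# Stand-in misfit (NOT RMSEA): root mean square change in implied covariances
standInMisfit <- function(L) {
  d <- L %*% phi %*% t(L) - lambda %*% phi %*% t(lambda)
  sqrt(mean(d[lower.tri(d, diag = TRUE)]^2))
}

# Draw one set of cross-loadings on top of the main loadings
drawSet <- function() {
  L <- lambda
  L[crossCells] <- runif(sum(crossCells), -0.3, 0.3)
  L
}

# Draw 100 candidate sets and keep the one with the largest stand-in misfit
draws <- replicate(100, drawSet(), simplify = FALSE)
misfit <- sapply(draws, standInMisfit)
maximalSet <- draws[[which.max(misfit)]]
round(maximalSet, 2)
```

Replacing which.max with which.min gives the corresponding selection step for a minimum-misfit rule.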

The seventh example shows how to impose trivial misspecification by the ranged method. The ranged method keeps drawing sets of trivial misspecifications and picks one that has the desired population misfit (e.g., an RMSEA between .02 and .05). The seventh example draws sets of cross-loadings from a uniform distribution from -0.1 to 0.1 and picks a set whose population RMSEA is between .02 and .05.

Using these different methods of trivial misspecification, we will find the p value indicating whether the fit indices obtained from the real data fall within the range of the simulated sampling distributions. We will also show the power of the Monte Carlo method in rejecting minimally unacceptable misfit. For this severely misspecified model, we add three cross-loadings (Factor 1 to Indicator 4, Factor 2 to Indicator 7, and Factor 3 to Indicator 1; see the figure below), each drawn from a uniform distribution from 0.6 to 0.9. Again, we draw multiple sets of severe cross-loadings and pick the set providing the minimum misfit. We refer to this method as the minimal method.
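The ranged method is essentially rejection sampling, which can be sketched in base R as follows. The sketch rests on several assumptions: the population values are hypothetical, the cross-loadings are drawn from -0.1 to 0.1 as in the u1 object used later in this example, standInMisfit is a crude stand-in rather than the population RMSEA, and the bounds are chosen on the scale of that stand-in, so they are not comparable to the .02 and .05 RMSEA bounds.

```r
set.seed(29)  # arbitrary seed

# Hypothetical target population: 3 factors, 3 indicators each
lambda <- matrix(0, 9, 3)
lambda[1:3, 1] <- 0.7
lambda[4:6, 2] <- 0.7
lambda[7:9, 3] <- 0.7
phi <- matrix(0.5, 3, 3)
diag(phi) <- 1
crossCells <- lambda == 0

# Stand-in misfit (NOT RMSEA): root mean square change in implied covariances
standInMisfit <- function(L) {
  d <- L %*% phi %*% t(L) - lambda %*% phi %*% t(lambda)
  sqrt(mean(d[lower.tri(d, diag = TRUE)]^2))
}

bounds <- c(0.05, 0.07)  # hypothetical bounds on the stand-in scale
maxDraws <- 200          # give up after this many draws
accepted <- NULL

for (i in seq_len(maxDraws)) {
  candidate <- lambda
  candidate[crossCells] <- runif(sum(crossCells), -0.1, 0.1)
  m <- standInMisfit(candidate)
  if (m >= bounds[1] && m <= bounds[2]) {
    accepted <- candidate  # keep the first set that falls inside the bounds
    break
  }
}

if (is.null(accepted)) {
  message("No draw fell inside the bounds; widen them or allow more draws.")
}
```

Narrow bounds combined with a wide sampling distribution can make acceptable draws rare, which is why the sketch caps the number of attempts.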

Example 22 Misspecification 4

Syntax

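As in Example 13, the target dataset can be analyzed:

library(simsem)  # the simsem 0.2 syntax is assumed throughout this example

# Target pattern: NA marks a freely estimated loading (three indicators per factor)
loading <- matrix(0, 9, 3)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
loading[7:9, 3] <- NA
model <- simParamCFA(LY=loading)
# Attach the indicator names x1 to x9 used in the Holzinger and Swineford data
analyzeModel <- simModel(model, indLab=paste("x", 1:9, sep=""))
# Fit the model to the real data
out <- run(analyzeModel, HolzingerSwineford1939)
summary(out)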

The figure below shows the graph provided by the summary function:

Example 22 summaryOut

The data and model objects can also be passed to the runFit function to build a simulation study based on the parameter estimates obtained from the real data. If no trivial misspecification is specified, the simulation can be run as follows:

simOut1 <- runFit(model=analyzeModel, data=HolzingerSwineford1939, nRep=1000)
getCutoff(simOut1, alpha=0.05)
pValue(out, simOut1)
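Conceptually, a Monte Carlo cutoff is a percentile of the simulated fit index distribution, and the p value is the proportion of simulated values at least as extreme as the observed one. The base R sketch below illustrates this idea for a single index where larger values mean worse fit (such as RMSEA). The simulated values and the observed value are hypothetical, and the code is not simsem's internal implementation.

```r
set.seed(13)  # arbitrary seed

# Hypothetical RMSEA values from 1000 replications under the null model
simRMSEA <- abs(rnorm(1000, mean = 0.03, sd = 0.015))

# Hypothetical RMSEA obtained from the real data
observedRMSEA <- 0.09

# Cutoff at alpha = 0.05: the 95th percentile of the simulated values
cutoff <- quantile(simRMSEA, probs = 0.95)

# Monte Carlo p value: share of simulated values at least as bad as the observed one
pVal <- mean(simRMSEA >= observedRMSEA)

cutoff
pVal
```

For indices where larger values mean better fit (such as CFI or TLI), the direction reverses: the cutoff is the 5th percentile and the p value counts simulated values at or below the observed one.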

The figure below shows the graph provided by the getCutoff function when there is no misspecification:

Example 22 cutoff 1

The figure below shows the graph provided by the pValue function when there is no misspecification:

Example 22 pvalue 1

In the second example, the trivial misspecification is specified as fixed. The cross loadings from Factor 2 to Indicator 1 and Factor 3 to Indicator 4 with the magnitude of 0.3 are specified as trivial misspecification and are added to the runFit function:

loadingMis2 <- matrix(0, 9, 3)
loadingMis2[1,2] <- NA
loadingMis2[4,3] <- NA
LYMis2 <- simMatrix(loadingMis2, 0.3)
misspec2 <- simMisspecCFA(LY=LYMis2, misBeforeFill=FALSE)
simOut2 <- runFit(model=analyzeModel, data=HolzingerSwineford1939, nRep=1000, misspec=misspec2) 
getCutoff(simOut2, alpha=0.05)
pValue(out, simOut2)

Note that in the simMisspecCFA function, the misBeforeFill argument specifies whether the model misspecification is added before the error variances (and other parameters that have not been specified) are calculated and filled in. In this case (misBeforeFill=FALSE), the misspecification is added after the error variances are calculated.
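The effect of this ordering can be seen in a small worked example for a single standardized indicator. The numbers are hypothetical (a main loading of 0.7, a cross-loading of 0.3, and a factor correlation of 0.5), and the calculation is plain arithmetic based on the description above rather than a call into simsem.

```r
mainLoading  <- 0.7  # hypothetical standardized main loading
crossLoading <- 0.3  # hypothetical trivial cross-loading
factorCor    <- 0.5  # hypothetical correlation between the two factors

# Variance explained by the main loading alone and by both loadings together
explainedMain <- mainLoading^2
explainedBoth <- mainLoading^2 + crossLoading^2 +
  2 * mainLoading * crossLoading * factorCor

# Fill the error variance first, then add the misspecification
errorFillFirst <- 1 - explainedMain
totalFillFirst <- explainedBoth + errorFillFirst

# Add the misspecification first, then fill the error variance
errorMisFirst <- 1 - explainedBoth
totalMisFirst <- explainedBoth + errorMisFirst

c(errorFillFirst = errorFillFirst, totalFillFirst = totalFillFirst,
  errorMisFirst = errorMisFirst, totalMisFirst = totalMisFirst)
```

Under these assumed values, filling first leaves the error variance at 0.51 and raises the total variance of the indicator to 1.30, whereas adding the misspecification first lowers the error variance to 0.21 so that the total variance stays at 1.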

The figure below shows the graph provided by the getCutoff function with the fixed misspecification:

Example 22 cutoff 2

The figure below shows the graph provided by the pValue function with the fixed misspecification:

Example 22 pvalue 2

In the third example, the trivial misspecification is specified as fixed. The cross loadings from Factor 1 to Indicator 6 and Factor 2 to Indicator 9 with the magnitude of 0.3 are specified as trivial misspecification and are added to the runFit function:

loadingMis3 <- matrix(0, 9, 3)
loadingMis3[6,1] <- NA
loadingMis3[9,2] <- NA
LYMis3 <- simMatrix(loadingMis3, 0.3)
misspec3 <- simMisspecCFA(LY=LYMis3, misBeforeFill=FALSE)
simOut3 <- runFit(model=analyzeModel, data=HolzingerSwineford1939, nRep=1000, misspec=misspec3) 
getCutoff(simOut3, alpha=0.05)
pValue(out, simOut3)

The figure below shows the graph provided by the getCutoff function with the second fixed misspecification:

Example 22 cutoff 3

The figure below shows the graph provided by the pValue function with the second fixed misspecification:

Example 22 pvalue 3

In the fourth example, the trivial misspecification is specified as random. All possible cross-loadings are drawn from a uniform distribution from -0.3 to 0.3 and are added to the runFit function:

u3 <- simUnif(-0.3, 0.3)
loadingMis4 <- matrix(0, 9, 3)
loadingMis4[4:9, 1] <- NA
loadingMis4[c(1:3, 7:9),2] <- NA
loadingMis4[1:6,3] <- NA
LYMis4 <- simMatrix(loadingMis4, "u3")
misspec4 <- simMisspecCFA(LY=LYMis4, misBeforeFill=FALSE)
simOut4 <- runFit(model=analyzeModel, data=HolzingerSwineford1939, nRep=1000, misspec=misspec4) 
getCutoff(simOut4, alpha=0.05)
pValue(out, simOut4)
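The three index assignments above mark exactly the cells that are fixed to zero in the target pattern. As a side note, the same pattern can be derived from the target loading matrix in base R, which avoids retyping the indices. This is a sketch that is not part of the original script, and crossPattern is a hypothetical name.

```r
# Target pattern from the analysis model: NA marks a free main loading
loading <- matrix(0, 9, 3)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
loading[7:9, 3] <- NA

# Manual cross-loading pattern, as written in this example
loadingMis4 <- matrix(0, 9, 3)
loadingMis4[4:9, 1] <- NA
loadingMis4[c(1:3, 7:9), 2] <- NA
loadingMis4[1:6, 3] <- NA

# Complement of the target pattern: free (NA) wherever the target is fixed to zero
crossPattern <- ifelse(is.na(loading), 0, NA)

# Check that both patterns free the same cells
identical(is.na(crossPattern), is.na(loadingMis4))
```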

The figure below shows the graph provided by the getCutoff function with the random misspecification:

Example 22 cutoff 4

The figure below shows the graph provided by the pValue function with the random misspecification:

Example 22 pvalue 4

In the fifth example, the trivial misspecification is specified as random. All possible cross-loadings are drawn from a normal distribution with a mean of 0 and a standard deviation of 0.15 and are added to the runFit function:

n3 <- simNorm(0, 0.15)
loadingMis5 <- matrix(0, 9, 3)
loadingMis5[4:9, 1] <- NA
loadingMis5[c(1:3, 7:9),2] <- NA
loadingMis5[1:6,3] <- NA
LYMis5 <- simMatrix(loadingMis5, "n3")
misspec5 <- simMisspecCFA(LY=LYMis5, misBeforeFill=FALSE)
simOut5 <- runFit(model=analyzeModel, data=HolzingerSwineford1939, nRep=1000, misspec=misspec5) 
getCutoff(simOut5, alpha=0.05)
pValue(out, simOut5)

The figure below shows the graph provided by the getCutoff function with the second random misspecification:

Example 22 cutoff 5

The figure below shows the graph provided by the pValue function with the second random misspecification:

Example 22 pvalue 5

In the sixth example, the trivial misspecification is specified as maximal. Several sets of cross-loadings are drawn from a uniform distribution from -0.3 to 0.3. Then, the set providing the largest population misfit (quantified by RMSEA) is picked and imposed on the real parameters. The maximal method of trivial misspecification can be specified:

u3 <- simUnif(-0.3, 0.3)
loadingMis6 <- matrix(0, 9, 3)
loadingMis6[4:9, 1] <- NA
loadingMis6[c(1:3, 7:9),2] <- NA
loadingMis6[1:6,3] <- NA
LYMis6 <- simMatrix(loadingMis6, "u3")
misspec6 <- simMisspecCFA(LY=LYMis6, optMisfit="max", numIter=100, misBeforeFill=FALSE)
simOut6 <- runFit(model=analyzeModel, data=HolzingerSwineford1939, nRep=1000, misspec=misspec6) 
getCutoff(simOut6, alpha=0.05)
pValue(out, simOut6)

In the simMisspecCFA function, the optMisfit is specified as "max" to find the set of parameters with the maximum trivial misspecification. The numIter is the number of draws used to find the maximum misfit.

The figure below shows the graph provided by the getCutoff function with the maximal misspecification:

Example 22 cutoff 6

The figure below shows the graph provided by the pValue function with the maximal misspecification:

Example 22 pvalue 6

In the seventh example, the trivial misspecification is specified as ranged. Several sets of cross-loadings are drawn from a uniform distribution from -0.1 to 0.1, matching the u1 object in the code below. Then, a set whose population misfit (quantified by RMSEA) falls between 0.02 and 0.05 is picked and imposed on the real parameters. The ranged method of trivial misspecification can be specified:

u1 <- simUnif(-0.1, 0.1)
loadingMis7 <- matrix(0, 9, 3)
loadingMis7[4:9, 1] <- NA
loadingMis7[c(1:3, 7:9),2] <- NA
loadingMis7[1:6,3] <- NA
LYMis7 <- simMatrix(loadingMis7, "u1")
misspec7 <- simMisspecCFA(LY=LYMis7, misfitBound=c(0.02, 0.05), numIter=200, misBeforeFill=FALSE)
simOut7 <- runFit(model=analyzeModel, data=HolzingerSwineford1939, nRep=1000, misspec=misspec7) 
getCutoff(simOut7, alpha=0.05)
pValue(out, simOut7)

In the simMisspecCFA function, the misfitBound is specified as a vector of the minimum and maximum values of the population misfit to be retained.

The figure below shows the graph provided by the getCutoff function with the specified range of misspecification:

Example 22 cutoff 7

The figure below shows the graph provided by the pValue function with the specified range of misspecification:

Example 22 pvalue 7

The results above give the p value under each specification of trivial misspecification. Next, we find the power under each method of trivial misspecification. First, the alternative model with severe misspecification is created by the minimal method. That is, three cross-loadings (Factor 1 to Indicator 4, Factor 2 to Indicator 7, and Factor 3 to Indicator 1) are uniformly distributed from 0.6 to 0.9. Next, those cross-loadings are drawn multiple times and the set that provides the minimal population misfit is picked. The minimally severe misspecification can be specified:

u69 <- simUnif(0.6, 0.9)
loadingMisAlt <- matrix(0, 9, 3)
loadingMisAlt[4, 1] <- NA
loadingMisAlt[7, 2] <- NA
loadingMisAlt[1, 3] <- NA
LYMisAlt <- simMatrix(loadingMisAlt, "u69")
misspecAlt <- simMisspecCFA(LY=LYMisAlt, optMisfit="min", numIter=100, misBeforeFill=FALSE)
simOutAlt <- runFit(model=analyzeModel, data=HolzingerSwineford1939, nRep=1000, misspec=misspecAlt) 

In the simMisspecCFA function, the optMisfit argument is specified as "min" in order to pick the minimum value of population misfit. The numIter argument is the number of draws used to find the minimum misfit. The power of rejecting the minimally severe misspecified model can be calculated by the getPowerFit function:

getPowerFit(simOutAlt, nullObject=simOut1) 
getPowerFit(simOutAlt, nullObject=simOut2) 
getPowerFit(simOutAlt, nullObject=simOut3) 
getPowerFit(simOutAlt, nullObject=simOut4) 
getPowerFit(simOutAlt, nullObject=simOut5) 
getPowerFit(simOutAlt, nullObject=simOut6) 
getPowerFit(simOutAlt, nullObject=simOut7) 

The first argument is the simulation from the severely misspecified model (alternative model) and the second argument is the simulation from the no or trivially misspecified model (null model).
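The logic behind this power calculation can be sketched in base R: derive a cutoff from the null distribution, then count how often the alternative distribution exceeds it. The fit index values below are hypothetical and cover a single index where larger values mean worse fit. The code illustrates the idea only and is not the getPowerFit implementation.

```r
set.seed(215)  # arbitrary seed

# Hypothetical RMSEA values from the null (trivially misspecified) model
nullRMSEA <- abs(rnorm(1000, mean = 0.03, sd = 0.015))

# Hypothetical RMSEA values from the alternative (severely misspecified) model
altRMSEA <- abs(rnorm(1000, mean = 0.10, sd = 0.02))

# Cutoff from the null distribution at alpha = 0.05
cutoff <- quantile(nullRMSEA, probs = 0.95)

# Power: proportion of alternative replications rejected by that cutoff
power <- mean(altRMSEA > cutoff)
power
```

A null model with more trivial misspecification shifts the cutoff toward worse fit, so the same alternative model is rejected less often.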

The figure below shows the graph provided by the getPowerFit function when the null model has no misspecification:

Example 22 getPower 1

The figure below shows the graph provided by the getPowerFit function when the null model has the fixed misspecification:

Example 22 getPower 2

The figure below shows the graph provided by the getPowerFit function when the null model has the second fixed misspecification:

Example 22 getPower 3

The figure below shows the graph provided by the getPowerFit function when the null model has the random misspecification:

Example 22 getPower 4

The figure below shows the graph provided by the getPowerFit function when the null model has the second random misspecification:

Example 22 getPower 5

The figure below shows the graph provided by the getPowerFit function when the null model has the maximal-method misspecification:

Example 22 getPower 6

The figure below shows the graph provided by the getPowerFit function when the null model has the specified range method of misspecification:

Example 22 getPower 7

We can investigate the actual trivial misspecification values (e.g., the actual values of the cross-loadings imposed) by drawing the population values only, without generating or analyzing data. The runFitParam function creates the population values. The resulting object is called a parameter result object:

param1 <- runFitParam(analyzeModel, data=HolzingerSwineford1939)
param2 <- runFitParam(analyzeModel, data=HolzingerSwineford1939, misspec=misspec2)
param3 <- runFitParam(analyzeModel, data=HolzingerSwineford1939, misspec=misspec3)
param4 <- runFitParam(analyzeModel, data=HolzingerSwineford1939, misspec=misspec4)
param5 <- runFitParam(analyzeModel, data=HolzingerSwineford1939, misspec=misspec5)
param6 <- runFitParam(analyzeModel, data=HolzingerSwineford1939, misspec=misspec6)
param7 <- runFitParam(analyzeModel, data=HolzingerSwineford1939, misspec=misspec7)
paramAlt <- runFitParam(analyzeModel, data=HolzingerSwineford1939, misspec=misspecAlt)

summary(param4)

The first argument is the model object used to analyze the data. The data argument is the real data to be analyzed. The misspec argument is the (trivial) misspecification added on top of the real parameters.

The figure below shows the graph provided by the summary function on the parameter result object:

Example 22 summaryParam 1

Example 22 summaryParam 2

Example 22 summaryParam 3

We can use summaryParam, summaryMisspec, and summaryFit to extract information about the real parameters, the misspecified parameters, and the fit indices, respectively:

summaryParam(param4)
summaryMisspec(param4)
summaryFit(param4)

We can also use the plotMisfit function to investigate the relationship between the value of a misspecified parameter (e.g., a cross-loading) and the population fit index:

plotMisfit(param4, misParam="LY9_1")

The first argument is the parameter result object. The second argument is the target misspecified parameter.

The figure below shows the graph provided by the plotMisfit function:

Example 22 plot misfit

Here is the summary of the whole script in this example.

Function Review

  • runFitParam Find the real and misspecified parameter values as well as population fit indices using multiple replications
  • summaryMisspec Summarize misspecified parameters
  • plotMisfit Plot the misspecified parameters against the population fit indices