
Example 16: Modeling an Endogenous Covariate

psunthud edited this page Jan 20, 2013 · 1 revision

Model Description

This example shows how to create a model with a covariate and how to analyze the data with or without the covariate. The target model has two factors with three indicators each, and the second factor is regressed on the first factor. The parameters are shown in the figure below. The variable Y7 is an indicator-level covariate whose effect ranges from 0.3 to 0.5 in the standardized metric. Trivial misspecification is also added during the data generation process, as shown in the box below.

Example 16 Model 1

We will analyze the simulated data by 1) excluding the covariate from the analysis, 2) accounting for the covariate as in the model described above, 3) accounting for the covariate by orthogonalization, and 4) accounting for the covariate at the factor level, as shown in the figure below.

Example 16 Model 2

Syntax

To begin with, the data generation model is needed. Factor loadings with trivial misspecification are specified:

loading <- matrix(0, 7, 3)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
loading[1:7, 3] <- NA
loadingVal <- matrix(0, 7, 3)
loadingVal[1:3, 1] <- "runif(1, 0.5, 0.7)"
loadingVal[4:6, 2] <- "runif(1, 0.5, 0.7)"
loadingVal[1:6, 3] <- "runif(1, 0.3, 0.5)"
loadingVal[7, 3] <- 1
loading.mis <- matrix("runif(1, -0.2, 0.2)", 7, 3)
loading.mis[is.na(loading)] <- 0
loading.mis[,3] <- 0
loading.mis[7,] <- 0
LY <- bind(loading, loadingVal, misspec=loading.mis)
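To check where the trivial misspecification actually applies, the pattern matrices above can be rebuilt in base R and inspected without loading simsem. This is only a sketch of the bookkeeping; the name mis_cells is introduced here for illustration.

```r
# Rebuild the free/fixed loading pattern used above.
loading <- matrix(0, 7, 3)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
loading[1:7, 3] <- NA

# Start with misspecification everywhere, then switch it off for free
# loadings, for the covariate column, and for the covariate row.
loading.mis <- matrix("runif(1, -0.2, 0.2)", 7, 3)
loading.mis[is.na(loading)] <- 0   # the character matrix stores this as "0"
loading.mis[, 3] <- 0
loading.mis[7, ] <- 0

# Cells that still carry misspecification: the cross-loadings of
# Indicators 4-6 on Factor 1 and of Indicators 1-3 on Factor 2.
mis_cells <- which(loading.mis != "0", arr.ind = TRUE)
mis_cells
```

Only the six cross-loadings between the two substantive factors remain misspecified; the covariate itself is left untouched.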

Notice that the factor loading of Indicator 7 on Factor 3 is free with a parameter value of 1. The factor correlations are specified:

RPS <- binds(diag(3))

The regression paths among factors are specified:

path <- matrix(0, 3, 3)
path[2, 1] <- NA
BE <- bind(path, "runif(1, 0.3, 0.5)")

The error correlations are specified:

RTE <- binds(diag(7))

Importantly, the indicator variance (not measurement error variance) is specified:

VY <- bind(c(rep(NA, 6), 0), c(rep(1, 6), ""))

Similar to Example 14, the indicator variances of the first six indicators are set as free and have parameter values of 1. This means that the error variances are free, and their parameter values are whatever values make the indicator variances equal to 1. The variance of the last indicator, the covariate, is fixed at 0. In this package, setting the total indicator variance to 0 means that the error variance is set to 0. This feature allows users to set a measurement error of 0 for one variable while setting the total variance of the other variables at the same time.
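The relation between total indicator variance and error variance can be illustrated with a small base-R calculation. The loadings 0.6 and 0.4, and the assumption that the two factors are uncorrelated with unit variance, are illustrative choices rather than values taken from a simulation run.

```r
# One indicator loading 0.6 on its own factor and 0.4 on the covariate
# factor (illustrative values only).
lambda <- matrix(c(0.6, 0.4), nrow = 1)

# Assumed factor covariance matrix: unit variances, no correlation.
phi <- diag(2)

# Total indicator variance requested through VY.
total_var <- 1

# Variance explained by the factors: lambda %*% phi %*% t(lambda).
explained <- as.numeric(lambda %*% phi %*% t(lambda))

# The error variance is whatever is left over.
error_var <- total_var - explained
error_var
```

Under these assumptions the explained variance is 0.36 + 0.16 = 0.52, so the implied error variance is 0.48.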

The set of SEM matrices is combined into the data generation model:

datamodel <- model(LY=LY, RPS=RPS, BE=BE, RTE=RTE, VY=VY, modelType="SEM")

The first analysis model is the model that excludes the covariate. This analysis model is specified:

loading2 <- matrix(0, 6, 2)
loading2[1:3, 1] <- NA
loading2[4:6, 2] <- NA
path2 <- matrix(0, 2, 2)
path2[2,1] <- NA
analysis1 <- estmodel(LY=loading2, BE=path2, modelType="SEM", indLab=paste("y", 1:6, sep=""))

Output1 <- sim(100, n=200, analysis1, generate=datamodel)

The second analysis model is the model used for data generation. This analysis model is specified:

analysis2 <- datamodel
Output2 <- sim(100, n=200, analysis2, generate=datamodel)

Before building the third analysis model, we need to know how to orthogonalize data and how to transform the data within the result object. First, the function used for orthogonalization is residualCovariate. This function is in the semTools library:

library(semTools)

The help page of this function can be accessed:

?residualCovariate

For example, the target data set is the attitude data set, which is provided in R by default:

head(attitude)

Example 16 head attitude

If we wish to orthogonalize Variables 2-7 with respect to Variable 1 in the attitude dataset, the function can be specified:

dat <- residualCovariate(attitude, targetVar=2:7, covVar=1)
head(dat)

The first argument of the residualCovariate function is the target dataset. The second argument, targetVar, specifies the variables to be orthogonalized. The third argument, covVar, specifies the covariate. Note that there can be more than one covariate. The head function shows only the first few rows of a dataset. Note that the second to seventh variables have been orthogonalized.

The figure below shows the output of head(dat):

Example 16 head dat
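The idea behind orthogonalization can be sketched in base R: each target variable is regressed on the covariate and replaced by its residuals. The helper orthogonalize below is a hypothetical stand-in written for illustration; it is assumed to match residualCovariate in spirit, not to reproduce its implementation.

```r
# Hypothetical helper: replace each target column with the residuals from
# regressing it on the covariate column(s).
orthogonalize <- function(data, targetVar, covVar) {
  out <- data
  covariates <- as.matrix(data[, covVar, drop = FALSE])
  for (j in targetVar) {
    out[, j] <- residuals(lm(data[, j] ~ covariates))
  }
  out
}

dat_manual <- orthogonalize(attitude, targetVar = 2:7, covVar = 1)

# After orthogonalization the targets are uncorrelated with the covariate
# (up to numerical error).
round(cor(dat_manual[, 2:7], attitude[, 1]), 10)
```

The covariate column itself is left unchanged; only the target columns are replaced.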

Next, we will introduce a wrapper function that holds the full specification of a function call so that it can be reused in later analyses. For example, suppose we want to hold the specification that residualCovariate is applied with Variables 2-7 as the target variables and Variable 1 as the covariate. The wrapper can be written as an ordinary R function that takes the data set as its argument:

datafun <- function(data) {
residualCovariate(data, targetVar=2:7, covVar=1)
}

This function can be run on the target data as follows:

dat2 <- datafun(attitude)
head(dat2)

Example 16 head dat2

When running the wrapper function, the first argument must be the data set.
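If several such wrappers are needed, a small function factory can hold the specification instead of writing each wrapper by hand. This is a base-R sketch; make_transform and first_rows are hypothetical names introduced here and are not part of simsem or semTools.

```r
# Hypothetical factory: store a function and its extra arguments, and
# return a wrapper whose only argument is the data set.
make_transform <- function(fun, ...) {
  args <- list(...)
  function(data) do.call(fun, c(list(data), args))
}

# Illustration with a base-R function: keep only the first three rows.
first_rows <- make_transform(head, n = 3)
first_rows(attitude)

# With semTools loaded, the same pattern would give a wrapper such as
# make_transform(residualCovariate, targetVar = 2:7, covVar = 1)
```

The returned wrapper still takes the data set as its first and only argument, which is the form a data transformation function needs.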

As described above, the wrapper function saves the specification of a function call and will be used for data transformation when the package builds a result object. Thus, the third analysis model, the model using orthogonalization, is specified:

datafun <- function(data) {
residualCovariate(data, targetVar=1:6, covVar=7)
}
analysis3 <- analysis1
Output3 <- sim(100, n=200, analysis3, generate=datamodel, datafun=datafun)
summary(Output3)

Note that the analysis model is exactly the same as the first analysis model.

The fourth analysis model, which accounts for the covariate at the factor level, is specified:

loading <- matrix(0, 7, 3)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
loading[7, 3] <- NA

path <- matrix(0, 3, 3)
path[2, 1] <- NA
path[1, 3] <- NA
path[2, 3] <- NA

errorCov <- diag(NA, 7)
errorCov[7, 7] <- 0

facCov <- diag(3)

analysis4 <- estmodel(LY=loading, BE=path, TE=errorCov, PS=facCov, modelType="SEM", 
    indLab=paste("y", 1:7, sep=""))
Output4 <- sim(100, n=200, analysis4, generate=datamodel)

Note that the results from the second and the third analysis models are similar. If we summarize the result objects from Analyses 1, 3, and 4 (which are Output1, Output3, and Output4), we will find a note that the population underlying the data generation model is not shown, and no bias in parameter estimates or standard errors is reported. The reason is that the parameters of the data generation model and of the analysis model are not the same (e.g., seven indicators in data generation but six indicators in the analysis model). We can still view the population parameters underlying the data generation process with the summaryPopulation function:

summaryPopulation(Output1)

The figure below shows the screen provided by the summaryPopulation function from Output1:

Example 16 sumPop

We can set the correct population model into the result object by the setPopulation function:

loadingVal <- matrix(0, 7, 3)
loadingVal[1:3, 1] <- 0.6
loadingVal[4:6, 2] <- 0.6
loadingVal[7, 3] <- 1
LY <- bind(loading, loadingVal)

pathVal <- matrix(0, 3, 3)
pathVal[2, 1] <- 0.4
pathVal[1, 3] <- 0.4
pathVal[2, 3] <- 0.4
BE <- bind(path, pathVal)

PS <- binds(facCov)

errorCovVal <- diag(0.64, 7)
errorCovVal[7, 7] <- 0
TE <- binds(errorCov, errorCovVal)

population <- model(LY=LY, PS=PS, BE=BE, TE=TE, modelType="SEM")
Output4 <- setPopulation(Output4, population) 
summary(Output4)

The first argument of the setPopulation function is the target result object. The second argument is a model template holding the population parameter values. Now, the summary function will provide bias in parameter estimates and standard errors.

The figure below shows the output of the summary function for Output4 after the correct population model has been set:

Example 16 summary 41

Example 16 summary 42
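To make the bias columns concrete, the following base-R sketch computes bias measures for a single parameter across replications. The estimates are random stand-in numbers, not simulation results, and the formulas are the usual definitions, assumed here to correspond to what the summary reports.

```r
set.seed(1)

# Stand-in inputs: a population value and 100 replication results.
pop_value  <- 0.6
estimates  <- rnorm(100, mean = pop_value, sd = 0.05)
std_errors <- rep(0.05, 100)

# Average bias: mean estimate minus the population value.
avg_bias <- mean(estimates) - pop_value

# Relative bias: average bias as a proportion of the population value.
rel_bias <- avg_bias / pop_value

# Relative bias in standard errors: the average estimated SE compared with
# the empirical SD of the estimates across replications.
rel_se_bias <- (mean(std_errors) - sd(estimates)) / sd(estimates)

c(avg_bias = avg_bias, rel_bias = rel_bias, rel_se_bias = rel_se_bias)
```

These quantities can only be computed when a population value is available for each estimated parameter, which is why the population model has to match the analysis model.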

We may use getCutoff and plotCutoff to find the fit index cutoffs.
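Conceptually, a simulation-based cutoff is a quantile of the fit index distribution across replications. The base-R sketch below uses random stand-in values rather than real fit indices, and it assumes the usual convention that the upper tail is used for indices where larger means worse fit (such as RMSEA) and the lower tail for indices where larger means better fit (such as CFI).

```r
set.seed(16)

# Stand-in fit indices for 100 replications (not real simulation output).
rmsea_sim <- abs(rnorm(100, mean = 0.02, sd = 0.015))
cfi_sim   <- pmin(1, 1 - abs(rnorm(100, mean = 0.01, sd = 0.01)))

alpha <- 0.05

# Larger RMSEA is worse, so the cutoff is the upper-tail quantile.
rmsea_cutoff <- quantile(rmsea_sim, 1 - alpha)

# Larger CFI is better, so the cutoff is the lower-tail quantile.
cfi_cutoff <- quantile(cfi_sim, alpha)

rmsea_cutoff
cfi_cutoff
```

A model fitted to real data would then be flagged when its RMSEA exceeds the first cutoff or its CFI falls below the second.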

The figure below shows the graph provided by the plotCutoff function from Output1:

Example 16 SSD 1

The figure below shows the graph provided by the plotCutoff function from Output2:

Example 16 SSD 2

The figure below shows the graph provided by the plotCutoff function from Output3:

Example 16 SSD 3

The figure below shows the graph provided by the plotCutoff function from Output4:

Example 16 SSD 4

Here is the summary of the whole script in this example.

Function Review

  • residualCovariate: Orthogonalize the target variables based on covariates.
  • summaryPopulation: Summarize the population model underlying the data generation process.
  • setPopulation: Set a new population model for a simulation study, which can be used to compute biases in parameter estimates and standard errors.