Skip to content

Example 1: Getting Started (Version 0.2)

psunthud edited this page Dec 30, 2012 · 2 revisions

Model Description

The first example is confirmatory factor analysis (CFA) model with two factors and three indicators each. Factor loadings are .7. Error variances are .51 to make the indicator variances equal to 1. The factor correlation is .5.

Example 1 Model

When using simsem, users specify models in a LISREL matrix format. To specify a CFA model, three matrices are required: the factor loading, error covariance, and factor covariance matrices. This package has two options to specify error covariance and factor covariance matrices. First, similar to the LISREL format, matrices can represent covariance matrices. Second, unlike LISREL, covariance matrices can be separated into a correlation matrix and a variance vector. The second option is a useful feature in this package such that standardized parameter values can be directly specified in the code. The second option will be introduced first in this example. Then, the option for specifying the covariance matrix will be introduced in the second example.

Syntax

When creating matrices for data generation and analysis, simsem will create a matrix object. Matrix objects have two components: free/fixed parameters and popultation parameter/starting values. In the free/fixed parameters part, the elements of the matrix can be divided into two types: NA and numbers. NA means that the element is freely estimated in the model. Numbers means that the element is fixed as a specified number, usually as 0. For example, with fixed factor method for scaling identification, factor loading matrix parameters will be NA in elements (1,1), (2,1), (3,1), (4,2), (5,2), and (6,2) (these are the estimated factor loadings). Other elements in the matrix are 0. This can be scripted in R as:

loading <- matrix(0, 6, 2) #create a matrix of all 0s
loading[1:3, 1] <- NA #specify free paramters with NA
loading[4:6, 2] <- NA

The second part of the matix object is a marix of population parameter values (for data generation) or starting values (for data analysis) of the free parameters. In data simulation, these population parameters will be used as the data generation model. The elements of this matix can be a number that provides the same population value across replications or a distribution object for a random parameter (which will be clarified in the third example). In this example, all population parameters/starting values of the factor loading matrix are 0.7. Thus, a new matrix with 6 rows and 2 columns is created and the (1,1), (2,1), (3,1), (4,2), (5,2), and (6,2) elements are specified as 0.7 (the factor loadings). The R script is

loadingValues <- matrix(0, 6, 2)  
loadingValues[1:3, 1] <- 0.7  
loadingValues[4:6, 2] <- 0.7

Next, combine two parts to create the factor loading matrix object by the simMatrix command:

LX <- simMatrix(loading, loadingValues)

If the parameter/starting values of a matrix are the same for all parameters, instead of specifying as a matrix, one parameter/starting value can be put in the simMatrix command:

LX <- simMatrix(loading, 0.7)

Alternatively, users can simply specify a matrix of parameter values. In this case, all population parameters with values of 0 or 1 are assumed to be fixed parameters in the model and all population parameters with values other than 0 and 1 are assumed to be fixed parameters:

LX <- simMatrix(value=loadingValues)

Users can view all specifications in a matrix object by using the summary function by:

summary(LX)

In the population parameters/starting values portion, if an element is not free, the population parameters/starting value will be automatically set as blanks.

As explained above, the covariance matrix can be separated into two parts: a vector of error variances (or indicator variances) and error correlations. By default, indicator variances (as well as factor variances, which will be described later) are set to 1. Thus, the factor loading can be interpreted as standardized factor loadings. The error variances by default are free parameters. From this example, the error variances are .51, which implies that indicator variances are 1 (i.e., .7×1×.7 + .51). Therefore, we will not set any error variances (or any indicator variances) and use the program default by skipping the error-variance specification and set only error correlations. There are no error correlations in this example; therefore, the error correlation matrix is set to be an identity matrix without any free parameters:

error.cor <- matrix(0, 6, 6)
diag(error.cor) <- 1

Because there are no free parameters in the error correlation matrix, parameter/starting values are not applicable. Next, make the error correlation matrix a symmetric matrix object by using the symMatrix function:

RTD <- symMatrix(error.cor)

The symMatrix command is similar to the simMatrix command. The main difference is that symMatrix limits users control over free parameters and constants such that the elements above and below the diagonal line are the same (i.e., symmetric). The population parameter/starting values are not required with the symMatrix command (as well as with simMatrix) so there is only one attribute of the free parameters in this function.

The last matrix is the factor covariance matrix. Again, the factor covariance matrix can be separated into two parts: a vector of factor variances (or factor residual variances) and a matrix of factor correlation (or factor residual correlation). The default in this program is that the factor variances are constrained to be 1. All exogenous and endogenous factor variances are fixed parameters (i.e., fixed factor method of scale identification). Therefore, the only thing we need to specify is the factor correlation. For all correlation matrices, the diagonal elements are 1. In this model, we allow the only one element of factor correlation to be freely estimated and have the parameter/starting value of 0.5. Thus, latent correlation matrix is specified as:

latent.cor <- matrix(NA, 2, 2) #specify a 2x2 matrix of NAs
diag(latent.cor) <- 1 #set the diagonal of the matrix to 1

The symmetric matrix object is created for this factor correlation by

RPH <- symMatrix(latent.cor, 0.5) #Defaults to making all NA values in the matrix .5

At this point, all required matrices for CFA are specified. The next step is to create an object containing the set of matrices (the factor loading matrix, the factor correlation matrix, and the error correlation matrix). This example uses CFA; therefore, the simSetCFA function will be used. The R script will be:

CFA.Model <- simSetCFA(LX = LX, RPH = RPH, RTD = RTD)

Similar to the LISREL notation, LX means the factor loading matrix, RPH means the factor correlation matrix, and RTD means the error correlation matrix. You may notice that RPH and RTD begin with ‘R’ to be indicated as correlation matrices. This step will apply all defaults of this package, in this case: free error variances and to fix factor variances to 1. This default can be seen by the summary function:

summary(CFA.Model)

The summary function will show all parameters/starting values in the models, including all defaults. You may notice that all X-side in LISREL notation are changed to the Y-side notation automatically. The simSetCFA can be specified as the Y-side also as

CFA.Model <- simSetCFA(LY = LX, RPS = RPH, RTE = RTD)

This set of all CFA matrices will be used to create a data-generation object (a data object) and an analysis-model object (a model object) in order to create a set of simulated data and analyze the set of simulated data later. The data and model objects do not need to have the same set of matrices (e.g., data can be generated with a CFA and analyzed with an SEM). However, in this example, we will use the same set of matrices, the CFA model with two factors with three indicators each without any additional constraints, in both data generation (the data object) and analysis of simulated data (the model object). Before simulating and analyzing data the data and model object must be created.

First, the data object is specified with the simData function:

SimData <- simData(CFA.Model, 200)

The first argument is the matrix set (the simsetCFA object). The second argument is a desired sample size, which is 200 in this example. You can see the specification of the data object by the summary function as well. From this, you are ready to simulate data by using the run command:

run(SimData) #This will print a single simulated data set to the screen

To save a single simulated dataset:

Sample <- run(SimData)

Next, the model object is specified with the simModel function:

SimModel <- simModel(CFA.Model)

In the future, models will be fit with various SEM packages. Currently, the model can be only fit with the lavaan package, which is the default of this program. You may see the specification of this model object with the summary function also. You may run the saved data with this model object by the run function:

out <- run(SimModel, Sample) #where sample is a saved dataset to be fit

Results can be summarized with:

summary(out)

The figure below shows the output of the summary function:

Example 1 Model Output of Single Dataset

The simulated data was analyzed by the specified CFA model. All fit indices, parameter estimates, standard errors, and Wald statistics will be printed to the screen. Finally, we need to use the data object and the model object to create many simulated data sets and analyze each one. The result object is used to run a simulation. We can create the result object by the simResult function:

Output <- simResult(1000, SimData, SimModel)

The first attribute is the number of replications. The second attribute is the data object (population parameters) to be used. The third attribute is the model object (analysis model specification) to be used. After submitting this command, the program will simulate 1000 datasets and analyze each of the datasets by the specified model. The result object contains all information from each replication in the simulation and can be used to find Simulated Sampling Distributions for fit indices (SSDs), or summarize results from the simulation.

The result object contains all fit indices values that are ready for creating the SSD. You can find a fit indices cutoff based on the percentile point of the SSD. For example, we wish to find the 95th percentile (alpha level = .05). The getCutoff function can be used:

getCutoff(Output, 0.05)

The first argument is the result object. The second argument is the alpha level. You can see the SSD with the cutoffs in a set of figures:

plotCutoff(Output, 0.05)

The figure below shows the graph provided by the plotCutoff function:

Example 1 SSD

The result object is set in a specific seed number. Therefore, the SSD is expected to be the same. The seed number could be changed by adding the seed argument in the simResult function:

Output <- simResult(1000, SimData, SimModel, seed=751785)

If users who have a computer with multiple processors, this package can ask R to run with multiple processors by setting the multicore argument as TRUE:

Output <- simResult(1000, SimData, SimModel, multicore=TRUE)

The default is to use the maximum numbers of the processors in the machine. The users can specify their desired number of processors by adding the numProc argument:

Output <- simResult(1000, SimData, SimModel, multicore=TRUE, numProc=2)

The summary of the result object contains common results of interest from the simulation:

summary(Output)

The figure below shows the output of the summary function:

Example 1 Result Object Summary

The summary on the screen has mainly two sections: the fit indices cutoffs based on each alpha level and the summary of parameter estimates and standard errors. For the cutoffs, note that the larger the alpha level, the more lenient the cutoffs are. For the parameter estimates and the standard errors, there are seven columns provided:

  1. Estimate.Average: Average of parameter estimates
  2. Estimate.SD: Standard deviation of parameter estimates (the empirical standard errors)
  3. Average.SE: Average of standard errors of each parameter estimate
  4. Power: The proportion of significant parameter estimates
  5. Average.Param: Population parameter values underlying simulated data
  6. Average.Bias: Average relative bias of parameter estimates (relative bias = (population parameter - average parameter estimate)/ population parameter
  7. Coverage: Proportion replication with confidence intervals containing the population parameter values.

Note that columns 5-7 are not provided if users provide a list of data frames instead of data object in the simResult function (providing simulated data instead of a SimData object). Also, those values in columns 5-7 have different meanings when parameters are treated as random, which are shown in the Example 4.

If users want the parameter estimates and the standard errors of all replications only, the summaryParam function can be used:

summaryParam(Output)

The figure below shows the output of the summaryParam function:

Example 1 Result Object Summary Parameter

Users can round the number in the summaryParam function:

round(summaryParam(Output), 3)

Here is the summary of the whole script in this example.

Remarks

Error and factor variances

Users may want to explicitly specify the error variances and the factor variances. This can be done by changing Lines 15-16 to:

error.var <- rep(NA, 6)
VTD <- simVector(error.var, 0.51)  
  
factor.var <- rep(1, 2)
VPH <- simVector(factor.var)  
  
CFA.Model <- simSetCFA(LX = LX, RPH = RPH, RTD = RTD, VTD = VTD, VPH = VPH)

where VTD (or VTE) is the vector of the error variance and VPH (or VPS) is the vector of the factor variance

Indicator and factor intercepts

Users may want to include the indicators intercepts or the factor intercepts (or means) by changing Lines 15-16 to

intercept <- rep(NA, 6)
TX <- simVector(intercept, 0)  
  
factor.mean <- rep(0, 2)
KA <- simVector(factor.mean)  
  
CFA.Model <- simSetCFA(LX = LX, RPH = RPH, RTD = RTD, TX = TX, KA = KA)

where TX (or TY) is the vector of the indicator intercepts and KA (or AL) is the vector of the factor intercepts

Indicator variances

simsem can directly specify the indicator variances (instead of the error variances) by

indicator.var <- rep(NA, 6)
VX <- simVector(indicator.var, 1)  
  
CFA.Model <- simSetCFA(LX = LX, RPH = RPH, RTD = RTD, VX = VX)

where VX (or VY) is the vector of the indicator variances. You cannot specify the error variances of indicators and the overall indicators variances at the same time.

Indicator means

simsem can directly specify indicator means (instead of measurement intercepts) by

indicator.mean <- rep(NA, 6)
MX <- simVector(indicator.mean, 0)  
  
CFA.Model <- simSetCFA(LX = LX, RPH = RPH, RTD = RTD, MX = MX)

where MX (or MY) is the vector of indicator means. Users cannot specify indicator intercepts and the overall indicators means at the same time.

More details in parameter summary

In the summaryParam function, relative bias, standardized bias, and relative bias in standard errors can be calculated by setting the detail argument as TRUE.

summaryParam(Output, detail=TRUE)

The figure below shows the output of the summaryParam function with more details:

Example 1 Result Object Summary Parameter Detail

Details of how they are calculated are available in help file of the summaryParam function:

?summaryParam 

Function Review

  • simMatrix Create matrix object
  • symMatrix Create symmetric matrix object
  • simSetCFA Create set of matrices for CFA
  • simData Create data template in order to simulate data
  • simModel Create analysis model
  • run Run all objects in the simsem package
  • summary Summarize all objects in the simsem package
  • simResult Create result of simulation
  • getCutoff Get the fit indices cutoff with a priori alpha level
  • plotCutoff Plot the sampling distribution of fit indices
  • summaryParam Summary parameter estimates and standard errors
Clone this wiki locally