Example 18: Simulation with Varying Sample Size and Percent Missing (Version 0.2)

psunthud edited this page Dec 30, 2012 · 2 revisions

Model Description

This example shows how to set up a simulation study in which both the sample size and the percent missing completely at random vary across replications. The sample size increases from 50 to 500 in steps of 5, and the percent missing completely at random is 0, 0.1, 0.2, 0.3, or 0.4. We then find the combinations of sample size and percent missing completely at random at which the power for a given parameter equals 0.80, as well as the fit index cutoffs at given values of sample size and percent missing. The model in this example is a conditional growth curve model: the growth curve model from Example 3, with the intercept and slope factors predicted by a grouping variable. The grouping variable has two conditions with equal probability, so the mean and the variance of this binary variable are 0.5 and 0.25. The effects of the grouping variable on the intercept and slope are 0.5 and 0.1. We will find the power to detect these two effects.

Example 18 Model

Syntax

The factor loading object can be specified:

loading <- matrix(0, 5, 3)
loading[1,1] <- 1
loading[2:5,2] <- 1
loading[2:5,3] <- 0:3
LY <- simMatrix(loading)

The factor mean object can be specified:

facMean <- rep(NA, 3)
facMeanVal <- c(0.5, 5, 2)
AL <- simVector(facMean, facMeanVal)

The factor variance object can be specified:

facVar <- rep(NA, 3)
facVarVal <- c(0.25, 1, 0.25)
VPS <- simVector(facVar, facVarVal)

The factor correlation object can be specified:

facCor <- diag(3)
facCor[2,3] <- NA
facCor[3,2] <- NA
RPS <- symMatrix(facCor, 0.5)

The measurement error variance object can be specified:

VTE <- simVector(c(0, rep(NA, 4)), 1.2)

The measurement error correlation object can be specified:

RTE <- symMatrix(diag(5))

The measurement intercept object can be specified:

TY <- simVector(rep(0, 5))

The regression coefficient matrix object can be specified:

path <- matrix(0, 3, 3)
path[2,1] <- NA
path[3,1] <- NA
pathVal <- matrix(0, 3, 3)
pathVal[2,1] <- 0.5
pathVal[3,1] <- 0.1
BE <- simMatrix(path, pathVal)

The SEM object that represents the conditional growth curve model is specified:

LCA.Model <- simSetSEM(LY=LY, RPS=RPS, VPS=VPS, AL=AL, VTE=VTE, RTE=RTE, TY=TY, BE=BE)
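
As a rough check on the population values above, the model-implied means can be worked out in base R. This is only a sketch: it assumes the usual LISREL-style parameterization, in which AL holds the factor intercepts and the factor means are (I - BE)^-1 AL. The names Lambda, alpha, B, and tau are illustrative and are not simsem objects.

```r
# Population values copied from the loading, factor mean, regression,
# and measurement intercept objects specified above
Lambda <- matrix(0, 5, 3)
Lambda[1, 1] <- 1        # grouping variable is its own single-indicator factor
Lambda[2:5, 2] <- 1      # intercept factor
Lambda[2:5, 3] <- 0:3    # slope factor with linear time codes
alpha <- c(0.5, 5, 2)    # factor intercepts (assumed interpretation of AL)
B <- matrix(0, 3, 3)
B[2, 1] <- 0.5           # group -> intercept
B[3, 1] <- 0.1           # group -> slope
tau <- rep(0, 5)         # measurement intercepts

# Factor means: E(eta) = (I - B)^{-1} alpha
factor_means <- solve(diag(3) - B, alpha)

# Indicator means: E(y) = tau + Lambda E(eta)
indicator_means <- as.vector(tau + Lambda %*% factor_means)

print(factor_means)
print(indicator_means)
```

Under these assumptions the average trajectory starts at 5.25 and rises by 2.05 per time point, which reflects the grouping variable's mean of 0.5 feeding into both growth factors.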

The trivial misspecified SEM object that represents the deviation from linearity is specified:

u1 <- simUnif(-0.1, 0.1)
loading.trivial <- matrix(0, 5, 3)
loading.trivial[3:4, 3] <- NA
loading.mis <- simMatrix(loading.trivial, "u1")
LCA.Mis <- simMisspecSEM(LY = loading.mis)

The data distribution object representing the factor distribution can be specified:

group <- simBinom(1, 0.5)
n01 <- simNorm(0, 1)
facDist <- simDataDist(group, n01, n01, keepScale=c(FALSE, TRUE, TRUE))

The simBinom function creates a binomial distribution object. The first argument is the number of trials. The second argument is the probability of success (here, the probability of being in the treatment group). If the number of trials is 1, the binomial distribution reduces to a Bernoulli trial, which yields only 0 or 1 (that is, a dummy variable). The keepScale argument in the simDataDist function controls whether the mean and standard deviation come from the model or from the distribution. If keepScale is TRUE, the model-implied mean and standard deviation are used. If FALSE, the mean and standard deviation from the distribution are used.
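
The Bernoulli case can be illustrated in base R. The sketch below does not use simsem; it simply draws a dummy variable with rbinom and compares the sample moments with the theoretical mean p and variance p(1 - p), which for p = 0.5 are the 0.5 and 0.25 given in the model description. The variable names are illustrative.

```r
set.seed(123)

p <- 0.5
# One trial per draw, so each value is either 0 or 1
group_draws <- rbinom(10000, size = 1, prob = p)

theoretical_mean <- p            # 0.5
theoretical_var <- p * (1 - p)   # 0.25

sample_mean <- mean(group_draws)
sample_var <- var(group_draws)

print(table(group_draws))
print(c(mean = sample_mean, variance = sample_var))
```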

The data and the model objects can be specified:

datTemplate <- simData(LCA.Model, 300, LCA.Mis, sequential=TRUE, facDist=facDist)
model <- simModel(LCA.Model)

Note that the sequential method is used and the facDist argument is specified for the factor distribution.

The result object can be specified:

Output <- simResult(NULL, datTemplate, model, n=seq(50, 500, 5), pmMCAR=seq(0, 0.4, 0.1))
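
To see how many conditions the two vectors in this call produce when fully crossed, the design grid can be built in base R. This is a sketch of the factorial crossing only, not of what simResult does internally; the name design is illustrative.

```r
n_values <- seq(50, 500, 5)      # 91 sample sizes
pm_values <- seq(0, 0.4, 0.1)    # 5 values of percent missing

# Every combination of sample size and percent missing
design <- expand.grid(n = n_values, pmMCAR = pm_values)

print(length(n_values))
print(length(pm_values))
print(nrow(design))   # 91 * 5 = 455 combinations
print(head(design))
```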

The pmMCAR argument gives the values of percent missing completely at random. Notice that both the sample size and the percent missing completely at random are vectors. The total number of replications is the product of the lengths of the two vectors; that is, the function runs a replication for every factorial combination of sample size and percent missing completely at random. The result object can be summarized:

summary(Output)

The figure below shows the screen provided by the summary function:

Example 18 summary

The cutoffs given the value of sample size and percent completely missing at random can be plotted by the plotCutoff function:

plotCutoff(Output, 0.05)

The figure below shows the graph provided by the plotCutoff function:

Example 18 SSD

The cutoff can be plotted in a three-dimensional graph by specifying the useContour argument as FALSE:

plotCutoff(Output, 0.05, useContour=FALSE)

The figure below shows the graph provided by the plotCutoff function by specifying useContour as FALSE:

Example 18 SSD2

We can also use the getCutoff function to find the cutoffs at specific values of sample size and percent missing completely at random:

getCutoff(Output, 0.05, nVal = 200, pmMCARval=0)
getCutoff(Output, 0.05, nVal = 300, pmMCARval=0.33)

The power of each parameter given each combination of sample size and percent missing completely at random can be obtained by the getPower function:

Cpow <- getPower(Output)

The figure below shows the first six rows of the Cpow object:

Example 18 Cpow

Cpow2 <- getPower(Output, nVal = 200, pmMCARval=0.35)

The figure below shows the Cpow2 object:

Example 18 Cpow2

The nVal and pmMCARval arguments request the power of each parameter at specific values of sample size and percent missing completely at random.
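
Note that a value such as pmMCARval=0.35 is not one of the simulated conditions, so the power at that point has to be estimated from the surrounding replications. One common way to do this is to regress the significance indicator on sample size and percent missing with logistic regression and read off the predicted probability. The base-R sketch below illustrates that idea on toy data. It is an assumption about the general approach, not a description of what getPower does internally, and the coefficients used to generate the toy outcomes are hypothetical.

```r
set.seed(1)

# One replication per combination, mirroring the design in this example
design <- expand.grid(n = seq(50, 500, 5), pmMCAR = seq(0, 0.4, 0.1))

# Hypothetical power curve, used only to generate toy significance outcomes
toy_power <- plogis(-2 + 0.012 * design$n - 3 * design$pmMCAR)
design$sig <- rbinom(nrow(design), size = 1, prob = toy_power)

# Logistic regression of significance (0/1) on the two design factors
fit <- glm(sig ~ n + pmMCAR, data = design, family = binomial)

# Predicted power at a point that was not simulated directly
new_point <- data.frame(n = 200, pmMCAR = 0.35)
predicted_power <- as.numeric(predict(fit, newdata = new_point, type = "response"))

print(predicted_power)
```

Because each condition contributes only one replication, the estimate at any single point borrows strength from the whole grid through the fitted curve.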

The power table obtained from the getPower function can be passed to the findPower function to find the sample size that provides a power of 0.80 at each value of percent missing completely at random:

findPower(Cpow, "N", 0.80)

The figure below shows the screen provided by the findPower function for sample size:

Example 18 findpower

The percent missing completely at random that provides a power of 0.80 at each value of sample size can be found in the same way:

findPower(Cpow, "MCAR", 0.80)

The figure below shows the screen provided by the findPower function for percent missing completely at random:

Example 18 findpower2

The power of the regression coefficients from the grouping variable can be plotted against the sample size and percent missing completely at random with the plotPower function:

plotPower(Output, powerParam=c("BE2_1", "BE3_1"))

The figure below shows the graph provided by the plotPower function:

Example 18 plotpower

The power can be plotted in a three-dimensional graph by specifying the useContour argument as FALSE:

plotPower(Output, powerParam=c("BE2_1", "BE3_1"), useContour=FALSE)

The figure below shows the graph provided by the plotPower function by specifying useContour as FALSE:

Example 18 plotpower2

Here is the summary of the whole script in this example.

Remarks

The sample size and percent missing completely at random can also be specified as distribution objects. In that case, the number of replications argument (the first argument) gives the number of values drawn from each distribution object. For example, Line 53 of the script can be changed to

Output <- simResult(1000, datTemplate, model, n=simUnif(50, 500), pmMCAR=simUnif(0, 0.4))