## Latent Change Score Models (LCSMs)

### 1. Introduction
LCSMs represent a powerful extension of the manifest-level change score models covered in the previous tutorial. The main difference between these two classes of models is the inclusion of **latent variables** for which we aim to estimate the change. The distinction between manifest (observed) and latent variables is crucial as it represents one of the core advantages of using structural equation modeling (SEM) in general. When estimating latent variables measured by a set of indicators (observed variables at the measurement level), we are correcting for **measurement error**. In other words, by including latent variables in our analyses, we achieve results at the "true" score level. Consequently, with LCSMs we can assess the change of the true score on a given construct over time. 
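To make the role of measurement error concrete, the following minimal simulation compares the variance of a true change score with the variance of an observed difference score. All numbers (means, standard deviations, sample size) are assumptions chosen purely for illustration and are unrelated to the data used later in this tutorial.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

N = 5000               # illustrative sample size
TRUE_CHANGE_SD = 0.3   # assumed spread of the true change
ERROR_SD = 0.5         # assumed measurement-error SD at each occasion

true_change = []
observed_change = []
for _ in range(N):
    true_t1 = random.gauss(3.5, 0.6)            # true score at time 1
    delta = random.gauss(0.1, TRUE_CHANGE_SD)   # true change
    true_t2 = true_t1 + delta                   # true score at time 2
    # Observed scores are true scores plus independent measurement error
    obs_t1 = true_t1 + random.gauss(0.0, ERROR_SD)
    obs_t2 = true_t2 + random.gauss(0.0, ERROR_SD)
    true_change.append(delta)
    observed_change.append(obs_t2 - obs_t1)

var_true = statistics.pvariance(true_change)
var_observed = statistics.pvariance(observed_change)
print(f"variance of true change:     {var_true:.3f}")
print(f"variance of observed change: {var_observed:.3f}")
```

Under these assumptions the observed difference score carries the error variance of both occasions, so its variance is much larger than that of the true change. A latent change score model aims to separate the two.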


In [2]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from semopy import Model, ModelMeans, semplot, calc_stats
from semopy.means import estimate_means

### 2. Working Example 1

`fairplayer` is a data frame that contains information from a research project targeting **bullying** in a school: The variable names consist of one letter indicating the measurement source (s: self-report, t: teacher-report), two letters indicating each construct being measured (EM: empathy, RA: relational aggression, SI: social intelligence), a number indicating the item number on the scale, and a "t" followed by a number indicating the measurement occasion (time-point) (for more details see https://rdrr.io/cran/stuart/man/fairplayer.html). In addition, the variables IGS (short intervention) and IGL (long intervention) represent the kind of intervention groups to which the students belonged. These variables are dummy coded with 1 representing membership and 0 representing no membership. Thus, students with a 0 in both variables belong to the control group and did not experience any intervention. 

In other words, the data frame includes 3 outcome variables (EM: empathy, RA: relational aggression, SI: social intelligence), each measured by 2 different methods (self-report and teacher-report) at 3 time-points (measurement occasions) in 3 different groups (1 control and 2 intervention groups).

Please inspect the data frame to get familiar with the information that is included. Understanding the information in this data frame will help you to follow the modeling procedure. 
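As an aid for reading the column names, the naming scheme described above can be decoded with a small helper. This is only a sketch: the function name `parse_item_name` and the regular expression are assumptions introduced here, and the lowercase letter that follows the item number in the actual columns (the "a" in `sEM01at1`) is not explained in the description, so it is simply skipped.

```python
import re

# Pattern assumed from the naming scheme described in the text.
# The optional lowercase letter after the item number (e.g. the "a" in
# "sEM01at1") is undocumented here and is matched but not interpreted.
NAME_PATTERN = re.compile(r"^([st])(EM|RA|SI)(\d+)([a-z]?)t(\d+)$")

SOURCES = {"s": "self-report", "t": "teacher-report"}
CONSTRUCTS = {
    "EM": "empathy",
    "RA": "relational aggression",
    "SI": "social intelligence",
}


def parse_item_name(name):
    """Split an item column name into its parts; return None for other columns."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None  # e.g. "ratee", "IGL", "IGS"
    source, construct, item, _suffix, occasion = match.groups()
    return {
        "source": SOURCES[source],
        "construct": CONSTRUCTS[construct],
        "item": int(item),
        "occasion": int(occasion),
    }


print(parse_item_name("sEM01at1"))
print(parse_item_name("IGL"))
```

Applied to every entry of `fairplayer.columns`, the helper gives a quick overview of which construct, source, and occasion each column belongs to.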

In [3]:
# Read the CSV file using a relative path
fairplayer = pd.read_csv("../LSCM_and_GCM/Datasets/fairplayer.csv")

# Display the first few rows of the dataframe
print(fairplayer.head())

   ratee  IGL  IGS  sEM01at1  sEM02at1  sEM03at1  tEM01at1  tEM02at1  \
1    1.0  1.0  0.0  3.666667  3.333333       4.5  4.666667  3.666667   
2    2.0  1.0  0.0  3.666667  4.333333       3.5  4.333333  4.000000   
3    3.0  1.0  0.0  3.000000  2.666667       2.5  4.333333  3.666667   
4    4.0  1.0  0.0  4.666667  4.666667       5.0  3.333333  3.333333   
5    5.0  1.0  0.0  3.000000  3.000000       3.5  1.333333  2.333333   

   tEM03at1  sEM01at2  ...  sRA03at2  tRA01at2  tRA02at2  tRA03at2  sRA01at3  \
1       5.0  3.666667  ...       1.0       1.0       2.0       1.0       1.0   
2       4.0  4.666667  ...       1.0       1.0       2.0       1.0       1.0   
3       4.0  2.666667  ...       1.0       1.0       1.5       1.0       1.5   
4       3.0  4.666667  ...       1.0       2.5       3.0       1.0       1.0   
5       1.0  3.000000  ...       1.0       2.0       3.0       2.0       1.5   

   sRA02at3  sRA03at3  tRA01at3  tRA02at3  tRA03at3  
1       2.5       1.0       1.0 

### 3. Model specification

Recall what you learned in the last tutorial. The only difference in the specification of LCSMs compared to CSMs is that we need to define latent variables measured by multiple indicators (that is, the measurement models). Accordingly, we must specify factor loadings for the construct of interest. In addition, the latent **- now second order -** change score factor covaries with the time 1 latent factor and, together with it, explains the time 2 latent factor.
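Written as equations, the constraints used in the specification that follows amount to the sketch below. The symbols are notation introduced here for illustration: $y_{jt}$ is indicator $j$ at occasion $t$, $\eta_1$ and $\eta_2$ are the latent empathy factors, $\Delta\eta$ is the latent change factor, and $\zeta_2$ is the residual of $\eta_2$.

$$
\begin{aligned}
y_{jt} &= \nu_{jt} + \lambda_{jt}\,\eta_{t} + \varepsilon_{jt} \\
\eta_{2} &= 1 \cdot \eta_{1} + 1 \cdot \Delta\eta + \zeta_{2}, \qquad \operatorname{Var}(\zeta_{2}) = 0 \\
\Rightarrow \quad \Delta\eta &= \eta_{2} - \eta_{1}
\end{aligned}
$$

Because both paths into $\eta_2$ are fixed to 1 and its residual variance is fixed to 0, the change factor is exactly the difference between the two latent scores, and its mean, its variance, and its covariance with $\eta_1$ become the parameters of interest.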

In the following example we are interested in assessing the change in *empathy* from time-point 1 to time-point 2. We fit the model using `semopy`:

In [6]:
# Read the CSV file using a relative path
fairplayer = pd.read_csv("../LSCM_and_GCM/Datasets/fairplayer.csv")

# Define the latent change score model with comments
lcsm = '''
# Defining a latent variable representing the construct Empathy at time 1 
emp1 =~ sEM01at1 + sEM02at1 + sEM03at1  

# Defining a latent variable representing the construct Empathy at time 2
emp2 =~ sEM01at2 + sEM02at2 + sEM03at2 

# Fixing the change score loading to 1
change =~ 1*emp2

# Fixing the regression of time 2 on time 1 to 1
emp2 ~ 1*emp1

# Fixing post-intervention score residual variance to 0
emp2 ~~ 0*emp2  

# Fixing the intercept of one indicator per time-point to 0 to identify the mean structure
sEM01at2 ~ 0*1
sEM01at1 ~ 0*1

# Means of the change score phantom variable and the baseline (time 1). lavaan fixes these to zero by default;
# the two lines below would free them but are commented out in this specification
#change ~ 1 
#emp1 ~ 1

# Specify a covariance between the change score latent variable and the baseline (time 1)
change ~~ emp1
'''

# Create and fit the model
model = ModelMeans(lcsm)

# Fit the model to the data
results = model.fit(fairplayer)

# Print the parameter estimates (unstandardized by default; pass std_est=True to add standardized ones)
estimates = model.inspect()
print(estimates)

# Print the fit measures
fit_measures = calc_stats(model).T
print(fit_measures)

        lval  op      rval   Estimate  Std. Err    z-value   p-value
0       emp2   ~    change   1.000000         -          -         -
1       emp2   ~      emp1   1.000000         -          -         -
2   sEM01at1   ~      emp1   1.000000         -          -         -
3   sEM02at1   ~      emp1   0.741699  0.013576  54.633406       0.0
4   sEM03at1   ~      emp1   0.910383  0.015046  60.507091       0.0
5   sEM01at2   ~      emp2   1.000000         -          -         -
6   sEM02at2   ~      emp2   0.841642  0.012132   69.37182       0.0
7   sEM03at2   ~      emp2   0.871745  0.014296  60.980294       0.0
8   sEM01at2   ~         1   0.000000         -          -         -
9   sEM01at1   ~         1   0.000000         -          -         -
10  sEM02at1   ~         1   0.936956  0.049753   18.83228       0.0
11  sEM02at2   ~         1   0.440179   0.04456   9.878345       0.0
12  sEM03at1   ~         1   0.258395  0.054129   4.773681  0.000002
13  sEM03at2   ~         1   0.405

### 4. Output Interpretation

The interpretation is very similar to the one corresponding to CSMs:

* The "change" intercept estimate (0.083) is the average change in empathy between the two measurement time-points. Empathy increased on average, but the increase was not significant (p = 0.238).

* The variance estimate of "change" corresponds to 0.095 and is significant (p = 0.002), indicating that individuals substantially differ in their empathy change over time. 

* A negative covariance between "change" and emp1 (-0.036) means that students with lower empathy at time 1 experienced stronger change than those with higher empathy scores. However, this covariance is not significant (p = 0.370).

* Finally, the intercept estimate of emp1 tells us that at time 1, the average empathy score in the sample was 3.883.
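The significance statements above rest on z-tests of each estimate against zero. As a hedged sketch of how the `z-value` and `p-value` columns can be reproduced, the snippet below recomputes them for the `sEM03at1` intercept from the printed estimates table. It assumes a two-sided Wald test based on the standard normal distribution; the helper name `wald_test` is introduced here for illustration.

```python
import math


def wald_test(estimate, std_err):
    """Return the z-value and two-sided p-value for H0: parameter = 0."""
    z = estimate / std_err
    # 2 * (1 - Phi(|z|)) expressed through the complementary error function
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Estimate and standard error of the sEM03at1 intercept, as printed in the table above
z, p = wald_test(0.258395, 0.054129)
print(f"z = {z:.3f}, p = {p:.6f}")
```

Small deviations from the printed z-value are expected, because the table shows rounded estimates and standard errors.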

### 5. Autoregressive effects and measurement invariance

Whenever a construct is measured on more than one occasion and in more than one group, it is important to assess **measurement invariance** across time and groups. That is, we have to make sure that we are measuring the same construct in all groups and at all measurement time-points. A failure to establish measurement invariance may compromise the results, because direct comparisons are doubtful if the construct measurement turns out to be non-equivalent (see the lecture Test Theory and Test Construction for details, explaining that apples and oranges are not directly comparable). LCSMs allow us to test measurement invariance by inferentially comparing a model in which measurement invariance is assumed with one in which it is not.
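One common way to carry out such an inferential comparison for nested models is a chi-square difference test. The sketch below uses only the standard library; the function names are introduced here for illustration, and the numbers in the usage example are made up to show the mechanics, not results from the fairplayer data. In practice the chi-square values and degrees of freedom would be taken from the fit statistics of the two fitted models.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (series expansion).

    Adequate for moderate values of x; very small p-values lose precision.
    """
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= y / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-y + a * math.log(y) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def chi_square_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """Compare a restricted (invariant) model with a less restricted one."""
    delta_chi2 = chi2_restricted - chi2_free
    delta_df = df_restricted - df_free
    if delta_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Made-up fit statistics, for illustration only
delta_chi2, delta_df, p_value = chi_square_difference(13.841, 9, 10.0, 8)
print(f"delta chi2 = {delta_chi2:.3f}, delta df = {delta_df}, p = {p_value:.3f}")
```

A non-significant difference suggests that the invariance constraints do not worsen the fit noticeably. The test presupposes that the two models are nested.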

In the following model specification, we assume measurement invariance. We thus fix factor loadings of the same indicator over time to equivalence. Furthermore, by correlating each indicator's error terms with itself in the next measurement time-point, we can account for autoregressive effects that are due to the measurement repetition and not to the stability of the construct over time. 

In [None]:
# Define the latent change score model with comments
lcsm_b = '''
# Defining a latent variable representing empathy at time 1 
emp1 =~ a*sEM01at1 + b*sEM02at1 + c*sEM03at1 

# Defining a latent variable representing empathy at time 2
# a*, b* and c* indicate that the loading of the same indicator over time is fixed to equivalence with time 1
emp2 =~ a*sEM01at2 + b*sEM02at2 + c*sEM03at2 

# Fixing the loading of the change score to 1
change =~ 1*emp2

# Fixing the regression of time 2 on time 1 empathy to 1
emp2 ~ 1*emp1

# Fixing the residual variance of the post-intervention score to 0
emp2 ~~ 0*emp2  

# Fixing the intercept of one indicator per timepoint to 0 in order to identify the mean structure
sEM01at2 ~ 0*1
sEM01at1 ~ 0*1

# Freely estimate phantom variable and baseline means. Per default, lavaan would fix them to 0
change ~ 1 
emp1 ~ 1

# Include covariance between change score variable and baseline
change ~~ emp1

# Autoregressive effects
sEM01at1 ~~ sEM01at2
sEM02at1 ~~ sEM02at2
sEM03at1 ~~ sEM03at2
'''

# Create and fit the model
model = ModelMeans(lcsm_b)

# Fit the model to the data
results = model.fit(fairplayer)

# Print the parameter estimates (unstandardized by default; pass std_est=True to add standardized ones)
estimates = model.inspect()
print(estimates)

# Print the fit measures
fit_measures = calc_stats(model).T
print(fit_measures)
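The equality constraints in this model come entirely from reusing the labels `a`, `b`, and `c` at both time-points. As a minimal sketch, the hypothetical helper `measurement_part` below generates the measurement part of the syntax with or without shared labels, which makes it easy to build a comparison model that differs only in the invariance constraints. The remaining lines of the specification would stay the same.

```python
ITEM_STEMS = ["sEM01a", "sEM02a", "sEM03a"]  # item stems used in the models above
LABELS = ["a", "b", "c"]                     # one label per indicator


def measurement_part(invariant):
    """Return the two measurement-model lines, with or without shared labels."""
    lines = []
    for occasion in (1, 2):
        terms = []
        for label, stem in zip(LABELS, ITEM_STEMS):
            indicator = f"{stem}t{occasion}"
            # Reusing the same label at both occasions constrains the loadings to be equal
            terms.append(f"{label}*{indicator}" if invariant else indicator)
        lines.append(f"emp{occasion} =~ " + " + ".join(terms))
    return "\n".join(lines)


print(measurement_part(invariant=True))
print(measurement_part(invariant=False))
```

With `invariant=True` the output matches the measurement lines of `lcsm_b`, and with `invariant=False` it matches those of the first model `lcsm`.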

### 6. Output Interpretation

The fit indices reveal that the model that accounts for **autoregressive effects** over time fits the data better than the one without such effects (see the model *lcsm*). The chi-square statistic, the comparative fit index (CFI), and the Tucker-Lewis index (TLI) all improved when accounting for autoregressive effects. Additionally, the model specified above assumes loading invariance. Because this constraint did not significantly deteriorate the fit, we can assume that the loadings are invariant over time.