The difference between CFA (Confirmatory Factor Analysis) and EFA (Exploratory Factor Analysis) lies in their purpose, assumptions, and approach to modeling latent variables.

| Aspect | EFA (Exploratory Factor Analysis) | CFA (Confirmatory Factor Analysis) |
|--------|----------------------------------|-----------------------------------|
| Purpose | To explore possible underlying factor structures without prior assumptions. | To test a hypothesized factor structure based on theory or prior evidence. |
| When Used | Early in research, when you don’t know how variables group together. | Later in research, when you already have a model or theory to verify. |
| Model Specification | No predefined factor structure — the algorithm determines which variables load on which factors. | The researcher specifies how many factors exist and which variables load on each. |
| Cross-loadings | Variables can load on multiple factors (the model finds the best structure). | Usually, each variable loads only on its predefined factor (cross-loadings are constrained to zero). |
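| Assumptions | Few structural assumptions: no loading pattern is imposed in advance. | Strong structural assumptions: the number of factors and the loading pattern are fixed in advance. |
| Rotation | Rotations (orthogonal or oblique) are often used to improve interpretability. | No rotation: the pattern of free and zero loadings is specified in advance, so the solution is not rotated. |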
| Fit Indices | Not typically used (focus is on loadings). | Model fit indices (e.g., CFI, RMSEA, χ²) are key for evaluating how well the model fits the data. |
| Software Output Focus | Factor loadings, eigenvalues, explained variance. | Model fit indices, standardized loadings, modification indices. |

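To make the Model Specification and Cross-loadings rows concrete, the following minimal sketch contrasts which loadings are free to be estimated under each approach. The item names (`x1` to `x6`), the factor names (`F1` to `F3`), and the helper functions are hypothetical and exist only for this illustration; no factor-analysis library is involved.

```python
# Hypothetical hypothesis: three factors, two items each (illustrative names only).
hypothesis = {
    "F1": ["x1", "x2"],
    "F2": ["x3", "x4"],
    "F3": ["x5", "x6"],
}

def efa_pattern(items, factors):
    """EFA: every item may load on every factor (1 = estimated)."""
    return {item: {f: 1 for f in factors} for item in items}

def cfa_pattern(spec):
    """CFA: only hypothesized loadings are estimated; the rest are fixed to 0."""
    items = [item for members in spec.values() for item in members]
    return {item: {f: int(item in spec[f]) for f in spec} for item in items}

def count_free(pattern):
    """Number of loadings marked as estimated in a pattern."""
    return sum(sum(row.values()) for row in pattern.values())

items = [item for members in hypothesis.values() for item in members]
efa = efa_pattern(items, list(hypothesis))
cfa = cfa_pattern(hypothesis)

for item in items:
    print(item, "EFA:", list(efa[item].values()), "CFA:", list(cfa[item].values()))
print("free loadings  EFA:", count_free(efa), " CFA:", count_free(cfa))
```

The zeros in the CFA pattern are what the table calls constrained cross-loadings; they are part of the hypothesis being tested rather than something the algorithm discovers.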
In [1]:
import pandas as pd
import os
from factor_analyzer import FactorAnalyzer
from semopy import Model, calc_stats


# EFA (Exploratory Factor Analysis)

In [2]:
file_path = "../data/AnxietyQuestionnaireData.csv"
if not os.path.exists(file_path):
    raise FileNotFoundError(f"{file_path} not found. Make sure you saved the CSV correctly.")

# load the data
data = pd.read_csv(file_path)

In [3]:
data

Unnamed: 0,Q01,Q02,Q03,Q04,Q05,Q06,Q07,Q08,Q09,Q10,...,Q14,Q15,Q16,Q17,Q18,Q19,Q20,Q21,Q22,Q23
0,4,5,2,4,4,4,3,5,5,4,...,4,4,3,5,4,3,4,4,4,1
1,5,5,2,3,4,4,4,4,1,4,...,3,2,3,4,4,3,2,2,2,4
2,4,3,4,4,2,5,4,4,4,4,...,2,4,3,4,3,5,2,3,4,4
3,3,5,5,2,3,3,2,4,4,2,...,3,3,3,4,2,4,2,2,2,3
4,4,5,3,4,4,3,3,4,2,4,...,4,4,4,4,3,3,2,4,2,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2566,3,5,4,4,4,5,4,3,4,4,...,3,3,3,3,4,4,3,3,4,2
2567,3,5,4,3,2,2,2,3,4,5,...,2,2,1,1,3,5,1,2,3,5
2568,3,5,4,3,3,4,4,4,5,2,...,4,4,3,4,4,4,2,3,4,2
2569,3,5,3,4,2,4,3,2,5,3,...,3,3,3,3,4,4,2,2,4,3
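
Before fixing the number of factors, a common heuristic is to inspect the eigenvalues of the item correlation matrix and keep as many factors as there are eigenvalues above 1 (the Kaiser criterion). The sketch below applies that idea to synthetic data using NumPy only. The simulated two-factor data, the function name `kaiser_count`, and the cutoff of 1 are assumptions for illustration, not the procedure used in this notebook.

```python
import numpy as np

def kaiser_count(X):
    """Return (eigenvalues in descending order, number of eigenvalues > 1)."""
    corr = np.corrcoef(X, rowvar=False)        # item-by-item correlation matrix
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # eigvalsh returns ascending order
    return eigvals, int((eigvals > 1.0).sum())

# Synthetic example: six items driven by two independent latent factors.
rng = np.random.default_rng(0)
f = rng.normal(size=(500, 2))
noise = 0.5 * rng.normal(size=(500, 6))
X = np.column_stack([f[:, 0]] * 3 + [f[:, 1]] * 3) + noise

eigvals, n_keep = kaiser_count(X)
print(np.round(eigvals, 2), "-> retain", n_keep, "factors")
```

For the questionnaire, the same function could be called on `data.to_numpy()`. The criterion is only a rough guide and is usually read alongside a scree plot and the interpretability of the resulting factors.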


In [4]:
# run EFA
fa = FactorAnalyzer(n_factors=4, rotation='varimax')
fa.fit(data)

# show factor loadings
loadings = pd.DataFrame(fa.loadings_, index=data.columns, columns=['Factor1','Factor2','Factor3','Factor4'])
print(loadings.round(2))

     Factor1  Factor2  Factor3  Factor4
Q01     0.50     0.22     0.27    -0.00
Q02    -0.21    -0.03     0.01     0.46
Q03    -0.50    -0.18    -0.16     0.40
Q04     0.53     0.28     0.25    -0.03
Q05     0.44     0.27     0.19    -0.05
Q06     0.05     0.75     0.12    -0.10
Q07     0.37     0.56     0.16    -0.13
Q08     0.22     0.15     0.76    -0.00
Q09    -0.13    -0.07     0.06     0.56
Q10     0.14     0.38     0.14    -0.12
Q11     0.24     0.27     0.69    -0.17
Q12     0.51     0.40     0.11    -0.15
Q13     0.29     0.56     0.23    -0.14
Q14     0.39     0.48     0.15    -0.13
Q15     0.28     0.38     0.25    -0.20
Q16     0.54     0.28     0.24    -0.16
Q17     0.30     0.27     0.64    -0.05
Q18     0.37     0.61     0.14    -0.13
Q19    -0.28    -0.15    -0.06     0.38
Q20     0.46     0.03     0.09    -0.20
Q21     0.59     0.26     0.15    -0.15
Q22    -0.03    -0.16    -0.07     0.46
Q23     0.03    -0.04    -0.07     0.33
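
One way to read a loading table like this is to assign each item to the factor with its largest absolute loading and then flag any other loading that is still sizeable. The sketch below does that for a handful of rows copied from the output above. The 0.30 cutoff and the helper name `summarize` are assumptions; conventions for what counts as a salient loading vary.

```python
# A few rows copied from the printed loadings (Factor1..Factor4).
loadings_sample = {
    "Q06": [0.05, 0.75, 0.12, -0.10],
    "Q07": [0.37, 0.56, 0.16, -0.13],
    "Q08": [0.22, 0.15, 0.76, -0.00],
    "Q09": [-0.13, -0.07, 0.06, 0.56],
    "Q21": [0.59, 0.26, 0.15, -0.15],
}
FACTORS = ["Factor1", "Factor2", "Factor3", "Factor4"]
CUTOFF = 0.30  # assumed salience threshold, not taken from the notebook

def summarize(row, cutoff=CUTOFF):
    """Return (primary factor, other factors with |loading| >= cutoff)."""
    strongest = max(range(len(row)), key=lambda i: abs(row[i]))
    cross = [FACTORS[i] for i, value in enumerate(row)
             if i != strongest and abs(value) >= cutoff]
    return FACTORS[strongest], cross

for item, row in loadings_sample.items():
    primary, cross = summarize(row)
    print(f"{item}: primary={primary}, cross-loadings={cross or 'none'}")
```

On the full table, `loadings.abs().idxmax(axis=1)` gives the primary factor for every item in one call.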




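# CFA (Confirmatory Factor Analysis)

In [6]:
# define CFA model in lavaan-style syntax
# NOTE: `SA` is the left-hand side of both the first and the last line, so the
# two lines are combined into a single factor and the model has three latent
# variables (SA, CSA, MA), as the estimates below show. A distinct fourth
# factor would need its own name on the last line.
model_desc = """
SA =~ Q01 + Q03 + Q04 + Q05 + Q09 + Q16 + Q17 + Q20 + Q21 + Q23
CSA =~ Q02 + Q06 + Q07 + Q10 + Q12 + Q13 + Q14 + Q15 + Q18 + Q19 + Q22
MA =~ Q08 + Q11 + Q16 + Q17 + Q20 + Q21
SA =~ Q02 + Q04 + Q09 + Q12 + Q19 + Q22 + Q23
"""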
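
Because the model description is a plain string, a quick structural check before fitting can catch specification slips. The sketch below is a hypothetical helper, not part of semopy, and it only understands simple one-line `factor =~ item + item` statements. It reports factor names that are defined more than once and items assigned to more than one factor.

```python
from collections import Counter, defaultdict

# Model description copied from the cell above.
model_desc = """
SA =~ Q01 + Q03 + Q04 + Q05 + Q09 + Q16 + Q17 + Q20 + Q21 + Q23
CSA =~ Q02 + Q06 + Q07 + Q10 + Q12 + Q13 + Q14 + Q15 + Q18 + Q19 + Q22
MA =~ Q08 + Q11 + Q16 + Q17 + Q20 + Q21
SA =~ Q02 + Q04 + Q09 + Q12 + Q19 + Q22 + Q23
"""

def measurement_lines(desc):
    """Yield (factor, [indicators]) for each `factor =~ a + b + ...` line."""
    for line in desc.strip().splitlines():
        if "=~" in line:
            factor, items = line.split("=~", 1)
            yield factor.strip(), [item.strip() for item in items.split("+")]

def repeated_factors(desc):
    """Factor names that appear on the left-hand side more than once."""
    counts = Counter(factor for factor, _ in measurement_lines(desc))
    return sorted(name for name, n in counts.items() if n > 1)

def multi_factor_items(desc):
    """Items that are indicators of more than one distinct factor."""
    owners = defaultdict(set)
    for factor, items in measurement_lines(desc):
        for item in items:
            owners[item].add(factor)
    return sorted(item for item, factors in owners.items() if len(factors) > 1)

print("factors defined more than once:", repeated_factors(model_desc))
print("items on more than one factor:", multi_factor_items(model_desc))
```

Repeated names and cross-loading items are not necessarily errors, but listing them makes it easier to confirm that the string matches the intended hypothesis.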



In [7]:
# fit CFA
model = Model(model_desc)
model.fit(data)

SolverResult(fun=np.float64(1.0261844793559964), success=True, n_it=208, x=array([-1.40685697e+00,  1.22933519e+00,  1.08478470e+00, -7.29038422e-01,
        1.22135672e+00,  5.93672996e-02,  2.35842535e-01,  8.75566185e-01,
        1.03997956e+00, -7.63341678e-02,  1.34976307e+00, -1.28164223e-02,
       -2.75973007e-01, -4.98281003e-01,  6.76539199e-01, -1.90302010e+01,
       -9.01125717e-01,  1.15131004e+00, -3.16341845e-01,  8.82725647e+00,
       -4.40344172e+01, -4.78668264e+01, -2.34092872e+01, -4.09400886e+01,
       -4.11807832e+01, -3.51443177e+01, -4.84741931e+01,  1.07533965e+00,
        2.57755149e-04, -6.62679924e-03, -6.34691970e-03,  4.46428256e-01,
        4.46745549e-01,  6.58904889e-01,  6.83524755e-01,  5.39264894e-01,
        6.49859942e-01,  7.57539921e-01,  6.24142280e-01,  3.14588557e-01,
        1.46816146e+00,  6.27471409e-01,  2.58740518e-01,  4.74190519e-01,
        4.65489960e-01,  5.58879955e-01,  7.02658208e-01,  4.52156764e-01,
        3.41055037e-01,  

In [8]:
estimates = model.inspect(std_est=True)
print(estimates.columns)

Index(['lval', 'op', 'rval', 'Estimate', 'Est. Std', 'Std. Err', 'z-value',
       'p-value'],
      dtype='object')


In [9]:
print(estimates[['lval', 'op', 'rval', 'Estimate', 'Est. Std']])

   lval  op rval   Estimate  Est. Std
0   Q01   ~   SA   1.000000  0.589747
1   Q03   ~   SA  -1.406857 -0.638931
2   Q04   ~   SA   1.229335  0.632735
3   Q05   ~   SA   1.084785  0.548983
4   Q09   ~   SA  -0.729038 -0.281777
5   Q16   ~   SA   1.221357  0.650931
6   Q16   ~   MA   0.059367  0.043312
7   Q17   ~   SA   0.235843  0.130265
8   Q17   ~   MA   0.875566  0.662009
9   Q20   ~   SA   1.039980  0.490194
10  Q20   ~   MA  -0.076334 -0.049253
11  Q21   ~   SA   1.349763  0.669205
12  Q21   ~   MA  -0.012816 -0.008698
13  Q23   ~   SA  -0.275973 -0.129082
14  Q02   ~   SA  -0.498281 -0.285706
15  Q02   ~  CSA   1.000000  0.018860
16  Q12   ~   SA   0.676539  0.360294
17  Q12   ~  CSA -19.030201 -0.333354
18  Q19   ~   SA  -0.901126 -0.399451
19  Q19   ~  CSA   1.151310  0.016787
20  Q22   ~   SA  -0.316342 -0.148354
21  Q22   ~  CSA   8.827256  0.136165
22  Q06   ~  CSA -44.034417 -0.630479
23  Q07   ~  CSA -47.866826 -0.697269
24  Q10   ~  CSA -23.409287 -0.428655
25  Q13   ~ 
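
A simple way to screen a table like this is to list the standardized loadings that fall below some threshold. The sketch below uses eight rows copied from the output above (the items that load on both SA and MA). The 0.30 cutoff and the helper name `weak_loadings` are assumptions for illustration.

```python
# (item, factor, standardized estimate) rows copied from the printed table.
std_loadings = [
    ("Q16", "SA", 0.650931), ("Q16", "MA", 0.043312),
    ("Q17", "SA", 0.130265), ("Q17", "MA", 0.662009),
    ("Q20", "SA", 0.490194), ("Q20", "MA", -0.049253),
    ("Q21", "SA", 0.669205), ("Q21", "MA", -0.008698),
]
CUTOFF = 0.30  # assumed threshold, not taken from the notebook

def weak_loadings(rows, cutoff=CUTOFF):
    """Return the rows whose absolute standardized loading is below cutoff."""
    return [(item, factor, est) for item, factor, est in rows if abs(est) < cutoff]

for item, factor, est in weak_loadings(std_loadings):
    print(f"{item} ~ {factor}: {est:+.3f}")
```

On the full frame the same filter would be `estimates[(estimates['op'] == '~') & (estimates['Est. Std'].abs() < 0.30)]`.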

In [10]:
fit_stats = calc_stats(model)
print("\nModel fit indices:")
for k, v in fit_stats.items():
    print(f"{k}: {v}")


Model fit indices:
DoF: Value    219
Name: DoF, dtype: int64
DoF Baseline: Value    253
Name: DoF Baseline, dtype: int64
chi2: Value    2638.320296
Name: chi2, dtype: float64
chi2 p-value: Value    0.0
Name: chi2 p-value, dtype: float64
chi2 Baseline: Value    19406.19916
Name: chi2 Baseline, dtype: float64
CFI: Value    0.873686
Name: CFI, dtype: float64
GFI: Value    0.864048
Name: GFI, dtype: float64
AGFI: Value    0.842941
Name: AGFI, dtype: float64
NFI: Value    0.864048
Name: NFI, dtype: float64
TLI: Value    0.854075
Name: TLI, dtype: float64
RMSEA: Value    0.065563
Name: RMSEA, dtype: float64
AIC: Value    111.947631
Name: AIC, dtype: float64
BIC: Value    445.514493
Name: BIC, dtype: float64
LogLik: Value    1.026184
Name: LogLik, dtype: float64
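
Several of these indices can be recomputed by hand from the two chi-square statistics, which helps when checking how they relate to each other. The sketch below uses the textbook definitions of CFI, TLI, and RMSEA with the values printed above; whether semopy computes them identically in every edge case is not verified here. The sample size `n_obs = 2571` is an assumption: the notebook never prints the row count, and this value is used because it reproduces the reported RMSEA.

```python
import math

# Values copied from the printed fit statistics.
chi2, dof = 2638.320296, 219
chi2_base, dof_base = 19406.19916, 253
n_obs = 2571  # ASSUMPTION: row count is not printed in the notebook

def cfi(chi2, dof, chi2_base, dof_base):
    """Comparative Fit Index: 1 - max(chi2 - df, 0) / max(chi2_b - df_b, chi2 - df, 0)."""
    d_model = max(chi2 - dof, 0.0)
    d_base = max(chi2_base - dof_base, d_model)
    return 1.0 - d_model / d_base if d_base > 0 else 1.0

def tli(chi2, dof, chi2_base, dof_base):
    """Tucker-Lewis Index, based on chi-square / df ratios."""
    return 1.0 - (chi2 / dof - 1.0) / (chi2_base / dof_base - 1.0)

def rmsea(chi2, dof, n_obs):
    """Root Mean Square Error of Approximation."""
    return math.sqrt(max(chi2 - dof, 0.0) / (dof * (n_obs - 1)))

print(f"CFI   = {cfi(chi2, dof, chi2_base, dof_base):.6f}")
print(f"TLI   = {tli(chi2, dof, chi2_base, dof_base):.6f}")
print(f"RMSEA = {rmsea(chi2, dof, n_obs):.6f}")
```

Under these assumptions the three values agree with the printed CFI, TLI, and RMSEA to about six decimals. By commonly cited rules of thumb (CFI and TLI of roughly 0.90 to 0.95 or higher, RMSEA of roughly 0.06 to 0.08 or lower), the CFI and TLI here fall short while the RMSEA is borderline; such cutoffs are conventions rather than strict tests.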
