# Confirmatory Factor Analysis

[Confirmatory Factor Analysis (CFA)](https://en.wikipedia.org/wiki/Confirmatory_factor_analysis) is a type of [factor analysis](https://en.wikipedia.org/wiki/Factor_analysis). Factor analysis establishes whether the variability in observed variables (e.g. responses to personality questionnaire items) can be explained by a smaller number of underlying latent factors (e.g. personality traits). CFA in particular tests whether a factor structure specified a priori is consistent with the observed data. For this dataset, CFA can be used to check whether each of the five personality traits is associated with the ten questionnaire items that were designed to measure it.
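
To make the idea concrete, here is a minimal simulation sketch that is not part of the original analysis. It generates items from two latent factors and checks that items sharing a factor correlate with each other while items on different factors do not. All names and values (`n_obs`, loadings of 0.8, a noise scale of 0.6) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_factors, items_per_factor = 5000, 2, 3
n_items = n_factors * items_per_factor

# Latent factor scores, uncorrelated here for simplicity.
factors = rng.standard_normal((n_obs, n_factors))

# Loading pattern: each item loads on exactly one factor.
loadings = np.zeros((n_items, n_factors))
for f in range(n_factors):
    loadings[f * items_per_factor:(f + 1) * items_per_factor, f] = 0.8

# Observed items = loadings applied to the factors, plus item-specific noise.
noise = 0.6 * rng.standard_normal((n_obs, n_items))
items = factors @ loadings.T + noise

corr = np.corrcoef(items, rowvar=False)
within = corr[0, 1]  # two items on the same factor
across = corr[0, 3]  # two items on different factors
print(f"within-factor r = {within:.2f}, across-factor r = {across:.2f}")
```

CFA works in the opposite direction: given only the observed items and a hypothesised loading pattern, it estimates the loadings and factor covariances.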

In [1]:
import pandas as pd
from factor_analyzer import ConfirmatoryFactorAnalyzer, ModelSpecificationParser

In [2]:
extraversion_items = ["EXT" + str(i) for i in range(1,11)]
neuroticism_items = ["EST" + str(i) for i in range(1,11)]
agreeableness_items = ["AGR" + str(i) for i in range(1,11)]
conscientiousness_items = ["CSN" + str(i) for i in range(1,11)]
openness_items = ["OPN" + str(i) for i in range(1,11)]

all_items = extraversion_items + neuroticism_items + agreeableness_items + conscientiousness_items + openness_items

In [3]:
df = pd.read_csv('./data/cleaned_data.csv')

In [4]:
df.isnull().sum()

Unnamed: 0                  0
EXT1                     1141
EXT2                     4926
EXT3                     1141
EXT4                     5445
                         ... 
endelapse                   0
IPC                         0
country                    67
lat_appx_lots_of_err        0
long_appx_lots_of_err       0
Length: 111, dtype: int64

In [5]:
# Keep only the 50 questionnaire items and drop respondents with any missing answer.
subset_df = df[all_items].dropna()
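
Dropping rows this way is listwise deletion: a respondent is removed if even one item is missing. The sketch below shows, on a hypothetical toy frame, how one might check what share of rows survives. The values and the `participant_note` column are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy responses; NaN marks an unanswered item.
toy = pd.DataFrame({
    "EXT1": [4, np.nan, 2, 5],
    "EXT2": [1, 3, np.nan, 2],
    "participant_note": ["a", "b", "c", "d"],  # hypothetical non-item column
})
item_cols = ["EXT1", "EXT2"]

# Select the item columns first, then drop any row with a missing item.
complete = toy[item_cols].dropna()
retained = len(complete) / len(toy)
print(f"kept {len(complete)} of {len(toy)} rows ({retained:.0%})")
```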

In [6]:
model_dict = {
    "extraversion": extraversion_items,
    "neuroticism": neuroticism_items,
    "agreeableness": agreeableness_items,
    "conscientiousness": conscientiousness_items,
    "openness": openness_items
}
model_dict

{'extraversion': ['EXT1',
  'EXT2',
  'EXT3',
  'EXT4',
  'EXT5',
  'EXT6',
  'EXT7',
  'EXT8',
  'EXT9',
  'EXT10'],
 'neuroticism': ['EST1',
  'EST2',
  'EST3',
  'EST4',
  'EST5',
  'EST6',
  'EST7',
  'EST8',
  'EST9',
  'EST10'],
 'agreeableness': ['AGR1',
  'AGR2',
  'AGR3',
  'AGR4',
  'AGR5',
  'AGR6',
  'AGR7',
  'AGR8',
  'AGR9',
  'AGR10'],
 'conscientiousness': ['CSN1',
  'CSN2',
  'CSN3',
  'CSN4',
  'CSN5',
  'CSN6',
  'CSN7',
  'CSN8',
  'CSN9',
  'CSN10'],
 'openness': ['OPN1',
  'OPN2',
  'OPN3',
  'OPN4',
  'OPN5',
  'OPN6',
  'OPN7',
  'OPN8',
  'OPN9',
  'OPN10']}

In [7]:
model_spec = ModelSpecificationParser.parse_model_specification_from_dict(subset_df, model_dict)
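
Conceptually, the specification records which loadings are free to be estimated and which are fixed at zero. The sketch below builds that 0/1 pattern by hand with NumPy. It illustrates the idea only and is not `factor_analyzer`'s internal representation; the `prefixes` and `pattern` names are introduced here for illustration.

```python
import numpy as np

# Item prefixes as used in this notebook.
prefixes = {
    "extraversion": "EXT",
    "neuroticism": "EST",
    "agreeableness": "AGR",
    "conscientiousness": "CSN",
    "openness": "OPN",
}
model_dict = {
    factor: [f"{prefix}{i}" for i in range(1, 11)]
    for factor, prefix in prefixes.items()
}
all_items = [item for items in model_dict.values() for item in items]
factors = list(model_dict)

# 1 where an item is allowed to load on a factor, 0 where the loading is fixed at zero.
pattern = np.zeros((len(all_items), len(factors)), dtype=int)
for col, factor in enumerate(factors):
    for item in model_dict[factor]:
        pattern[all_items.index(item), col] = 1

print(pattern.shape)        # rows are items, columns are factors
print(pattern.sum(axis=0))  # number of items per factor
```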

In [8]:
cfa = ConfirmatoryFactorAnalyzer(model_spec)
cfa.fit(subset_df.values)

ConfirmatoryFactorAnalyzer(bounds=None, disp=True, impute='median',
                           is_cov_matrix=False, max_iter=200, n_obs=653737,
                           specification=<factor_analyzer.confirmatory_factor_analyzer.ModelSpecification object at 0x1172fc760>,
                           tol=None)

In [9]:
cfa.loadings_

array([[0.85472061, 0.        , 0.        , 0.        , 0.        ],
       [0.92600351, 0.        , 0.        , 0.        , 0.        ],
       [0.824874  , 0.        , 0.        , 0.        , 0.        ],
       [0.90207912, 0.        , 0.        , 0.        , 0.        ],
       [0.94320951, 0.        , 0.        , 0.        , 0.        ],
       [0.7165997 , 0.        , 0.        , 0.        , 0.        ],
       [1.01336003, 0.        , 0.        , 0.        , 0.        ],
       [0.71091376, 0.        , 0.        , 0.        , 0.        ],
       [0.81598097, 0.        , 0.        , 0.        , 0.        ],
       [0.89664952, 0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.90646138, 0.        , 0.        , 0.        ],
       [0.        , 0.65891404, 0.        , 0.        , 0.        ],
       [0.        , 0.66207573, 0.        , 0.        , 0.        ],
       [0.        , 0.51383007, 0.        , 0.        , 0.        ],
       [0.        , 0.64670994, 0.
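
The raw array is hard to read without labels. As a sketch, the ten extraversion loadings visible in the output above can be labelled and ranked with pandas. This assumes the rows follow the column order of `subset_df` (that is, `all_items`); the `ext_loadings` name is introduced here for illustration.

```python
import pandas as pd

# First ten rows, first column of the cfa.loadings_ output shown above.
ext_loadings = pd.Series(
    [0.85472061, 0.92600351, 0.824874, 0.90207912, 0.94320951,
     0.7165997, 1.01336003, 0.71091376, 0.81598097, 0.89664952],
    index=[f"EXT{i}" for i in range(1, 11)],  # assumed row order
    name="extraversion",
)

# Rank items from the largest to the smallest loading.
ranked = ext_loadings.sort_values(ascending=False)
print(ranked)
```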

In [10]:
cfa.factor_varcovs_

array([[1.        , 0.23653951, 0.36029776, 0.09568473, 0.22620274],
       [0.23653951, 1.        , 0.01264952, 0.30831719, 0.11609876],
       [0.36029776, 0.01264952, 1.        , 0.12359573, 0.12756619],
       [0.09568473, 0.30831719, 0.12359573, 1.        , 0.05360468],
       [0.22620274, 0.11609876, 0.12756619, 0.05360468, 1.        ]])
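
Because the diagonal is 1, the off-diagonal entries can be read as correlations between factors. The sketch below labels the matrix printed above and picks out the most strongly related pair. It assumes the factor order follows the key order of `model_dict`.

```python
import numpy as np
import pandas as pd

# Assumed factor order: the key order of model_dict.
factors = ["extraversion", "neuroticism", "agreeableness",
           "conscientiousness", "openness"]

# Values copied from the cfa.factor_varcovs_ output above.
varcovs = np.array([
    [1.0, 0.23653951, 0.36029776, 0.09568473, 0.22620274],
    [0.23653951, 1.0, 0.01264952, 0.30831719, 0.11609876],
    [0.36029776, 0.01264952, 1.0, 0.12359573, 0.12756619],
    [0.09568473, 0.30831719, 0.12359573, 1.0, 0.05360468],
    [0.22620274, 0.11609876, 0.12756619, 0.05360468, 1.0],
])
factor_corr = pd.DataFrame(varcovs, index=factors, columns=factors)
print(factor_corr.round(2))

# Largest absolute entry above the diagonal (k=1 excludes the diagonal itself).
i, j = np.unravel_index(np.argmax(np.abs(np.triu(varcovs, k=1))), varcovs.shape)
strongest = (factors[i], factors[j])
print(strongest, round(varcovs[i, j], 3))
```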

In [13]:
cfa.transform(subset_df.values)

array([[ 1.96045383,  0.92423287,  0.14677389, -0.18466227,  1.10637443],
       [-0.97278053,  1.04620543,  0.89743043,  0.54172898, -0.88109213],
       [-0.54573387,  0.6830371 ,  0.54153986,  0.31858652,  0.26465504],
       ...,
       [ 0.39087472, -1.51669854,  0.08152021, -0.7115184 ,  0.97441546],
       [-0.88164331, -0.32756937, -0.08407023,  0.68810262, -0.11418652],
       [ 1.08051384,  0.25372248,  0.40749745, -0.70479084,  1.23393173]])
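
Each row of this array holds one respondent's estimated score on the five factors. As a sketch, the first three rows shown above can be wrapped in a labelled DataFrame. The column order is again assumed to follow `model_dict`, and `highest_factor` is a column name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Assumed factor order: the key order of model_dict.
factors = ["extraversion", "neuroticism", "agreeableness",
           "conscientiousness", "openness"]

# First three rows of the cfa.transform output shown above.
scores = np.array([
    [1.96045383, 0.92423287, 0.14677389, -0.18466227, 1.10637443],
    [-0.97278053, 1.04620543, 0.89743043, 0.54172898, -0.88109213],
    [-0.54573387, 0.6830371, 0.54153986, 0.31858652, 0.26465504],
])
score_df = pd.DataFrame(scores, columns=factors)

# For each respondent, the factor with the highest estimated score.
score_df["highest_factor"] = score_df[factors].idxmax(axis=1)
print(score_df)
```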