# Factor Analysis

### KMO (Kaiser-Meyer-Olkin)
- Measures how well the correlations between variables are explained by the other variables (sampling adequacy).

|KMO|Interpretation|
|:---|:---|
|0.90 or above|Excellent|
|0.80 ~ 0.89|Good|
|0.70 ~ 0.79|Adequate|
|0.60 ~ 0.69|Mediocre|
|0.50 ~ 0.59|Poor|
|Below 0.50|Unacceptable|
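
To make the KMO value concrete, here is a minimal sketch of the usual formula: the sum of squared off-diagonal correlations divided by that sum plus the sum of squared off-diagonal partial correlations. The function name `kmo_overall` and the synthetic data are assumptions made for illustration, and this is not necessarily how `calculate_kmo` is implemented internally.

```python
import numpy as np

def kmo_overall(data):
    """Overall KMO for a 2-D array (rows = observations, columns = variables)."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations follow from the inverse correlation matrix
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    # Only off-diagonal entries enter the KMO ratio
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + p2)

# Synthetic data (made up): four variables driven by one shared factor plus noise
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
data = common + 0.5 * rng.normal(size=(200, 4))

kmo = kmo_overall(data)
print(round(kmo, 3))
```

Because the four columns share a strong common factor, the partial correlations are small relative to the raw correlations, so the value should land in the upper rows of the table.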

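### Bartlett's test of sphericity
- Tests whether the data are suitable for a factor analysis model.
- Null hypothesis: the correlation matrix is an identity matrix. (Factor analysis is appropriate only if the null hypothesis is rejected.)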
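
As a rough illustration of the test statistic, the sketch below computes the chi-square value from the determinant of the correlation matrix, using the standard form $-(n - 1 - (2p + 5)/6)\,\ln|R|$ with $p(p-1)/2$ degrees of freedom. The helper name `bartlett_statistic` and both synthetic datasets are assumptions. The p-value, which would come from the upper tail of the chi-square distribution, is left out to keep the example NumPy-only.

```python
import numpy as np

def bartlett_statistic(data):
    """Chi-square statistic and degrees of freedom for Bartlett's test."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # slogdet is numerically safer than log(det(...)) for near-singular matrices
    _, logdet = np.linalg.slogdet(corr)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof

rng = np.random.default_rng(0)
# Correlated columns (made up): one shared factor plus noise
common = rng.normal(size=(200, 1))
correlated = common + 0.5 * rng.normal(size=(200, 4))
# Independent columns (made up): the null hypothesis roughly holds
independent = rng.normal(size=(200, 4))

chi2, dof = bartlett_statistic(correlated)
chi2_ind, _ = bartlett_statistic(independent)
print(chi2, chi2_ind, dof)
```

A large statistic for the correlated data, compared with a small one for the independent data, is what rejecting the identity-matrix hypothesis looks like in practice.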

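### Communalities
- Reported for each input column (variable).
- The proportion of each variable's variance explained by the extracted factors.
- Variables with low communality (0.4 or below) are better excluded from the factor analysis.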
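
The communality of a variable can be sketched as the row sum of its squared loadings over the retained factors. The example below derives principal-component loadings from a correlation matrix with NumPy alone. The data, the `weights` matrix, and the choice of two factors are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data (made up): two latent factors, five observed variables
common = rng.normal(size=(300, 2))
weights = np.array([[0.9, 0.0], [0.8, 0.1], [0.1, 0.9], [0.0, 0.8], [0.1, 0.1]])
data = common @ weights.T + 0.5 * rng.normal(size=(300, 5))

corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
# eigh returns ascending eigenvalues; reorder to descending
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

n_factors = 2  # assumed here; the notebook picks this with the eigenvalue > 1 rule
loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])

# Communality = share of each variable's variance explained by the kept factors
communalities = (loadings ** 2).sum(axis=1)
low = communalities <= 0.4  # candidates for removal under the 0.4 guideline
print(communalities.round(3), low)
```

With all factors retained, every communality equals 1, which is why the "Initial" column of a principal-component solution shows 1.0 for each variable.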

### Total variance explained
- Reported for each factor produced by principal component extraction.
- The amount of variance explained by each factor.
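
A small sketch of how such a table can be built: the eigenvalues of the correlation matrix give the variance per factor, and dividing by their sum gives the share. The data here are synthetic and the variable names are assumptions. The eigenvalue > 1 rule is the same criterion the notebook uses to choose the number of factors.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data (made up): four columns, two of them correlated
data = rng.normal(size=(100, 4))
data[:, 1] += data[:, 0]

corr = np.corrcoef(data, rowvar=False)
# Eigenvalues in descending order: variance explained by each component
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]

pct = eigvals / eigvals.sum() * 100   # % of variance per factor
cum = np.cumsum(pct)                  # cumulative %
n_factors = int((eigvals > 1).sum())  # Kaiser criterion

for total, p, c in zip(eigvals, pct, cum):
    print(f"{total:.3f}  {p:6.2f}  {c:6.2f}")
```

The eigenvalues of a correlation matrix sum to the number of variables, so the cumulative column always ends at 100%.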

---

## Import Packages
- Visual Python: Data Analysis > Import

In [None]:
# Visual Python: Data Analysis > Import
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

## 1 Factor Analysis

In [None]:
# Visual Python: Data Analysis > File
df = pd.read_csv('./data/10_1_요인분석.csv')
df

In [None]:
# Visual Python: Factor Analysis
vp_df = df.dropna().copy()

# KMO(Kaiser-Meyer-Olkin) measure of sampling adequacy
from IPython.display import display, Markdown
from factor_analyzer.factor_analyzer import calculate_kmo
_kmo = calculate_kmo(vp_df)
display(Markdown('### KMO measure of sampling adequacy'))
display(pd.DataFrame(data={'Statistic':_kmo[1]}, index=['KMO measure of sampling adequacy']))

# Bartlett's test of sphericity
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity
_bartlett = calculate_bartlett_sphericity(vp_df)
display(Markdown('### Bartlett\'s test of sphericity'))
display(pd.DataFrame(data={'Chi-square statistic':_bartlett[0],'p-value':_bartlett[1]}, index=['Bartlett test of sphericity']))

# Initial
from factor_analyzer import FactorAnalyzer
_fa1 = FactorAnalyzer(n_factors=vp_df.shape[1], rotation=None, method='principal', impute='drop')
_fa1.fit(vp_df)

# Number of factors (Kaiser criterion: keep factors with eigenvalue > 1)
_nof = (_fa1.get_eigenvalues()[0] > 1).sum()

# Un-rotated
_fa2 = FactorAnalyzer(n_factors=_nof, rotation=None, method='principal', impute='drop')
_fa2.fit(vp_df)

# Rotated
_fa3 = FactorAnalyzer(n_factors=_nof, rotation='varimax', method='principal', impute='drop')
_fa3.fit(vp_df)

# Correlation matrix
display(Markdown('### Correlation matrix'))
display(pd.DataFrame(data= _fa1.corr_ , index=vp_df.columns, columns=vp_df.columns).round(2))

# Scree plot
import warnings
with warnings.catch_warnings():
    warnings.simplefilter(action='ignore', category=Warning)
    # Plot the eigenvalues themselves so the y-axis label is accurate
    plt.plot(range(1, vp_df.shape[1] + 1), _fa1.get_eigenvalues()[0], 'o-')
    plt.title('Scree Plot')
    plt.xlabel('Factors')
    plt.ylabel('Eigenvalue')
    plt.show()

# Communalities
display(Markdown('### Communalities'))
display(pd.DataFrame(data={'Initial':_fa1.get_communalities(),'Extraction':_fa2.get_communalities()},index=vp_df.columns).round(3))

# Total variance explained
# Initial Eigenvalues
_ss1 = pd.DataFrame(data=_fa1.get_factor_variance(),
                    index=[['Initial Eigenvalues' for i in range(3)],['Total','% of variance','Cumulative %']]).T
# Extraction sums of squared loadings (from the un-rotated model with _nof factors)
_ss2 = pd.DataFrame(data=_fa2.get_factor_variance(),
                    index=[['Extraction sums of squared loadings' for i in range(3)],['Total','% of variance','Cumulative %']]).T
# Rotation sums of squared loadings
_ss3 = pd.DataFrame(data=_fa3.get_factor_variance(),
                    index=[['Rotation sums of squared loadings' for i in range(3)],['Total','% of variance','Cumulative %']]).T
                    
display(Markdown('### Total variance explained'))
display(pd.concat([_ss1,_ss2,_ss3], axis=1).round(3))

# Factor matrix
display(Markdown('### Factor matrix'))
display(pd.DataFrame(data=_fa2.loadings_,index=vp_df.columns,
                     columns=list(range(_nof))).round(3))

# Rotated factor matrix
display(Markdown('### Rotated factor matrix'))
display(pd.DataFrame(data=_fa3.loadings_,index=vp_df.columns,
                     columns=list(range(_nof))).round(3))

---

In [None]:
# End of file