# Factor Analysis


### Factor Analysis (FA) is an exploratory data analysis method used to search for influential underlying factors, or latent variables, in a set of observed variables.

### It helps in data interpretation by reducing the number of variables. It extracts the maximum common variance from all variables and puts it into a common score.

### Factor analysis is widely used in market research, advertising, psychology, finance, and operations research.

### Market researchers use factor analysis to identify price-sensitive customers, to identify brand features that influence consumer choice, and to understand the selection criteria for distribution channels.

### We'll cover the following:

* Factor Analysis
* Types of Factor Analysis
* Determine Number of Factors
* Adequacy Test
* Interpreting the results

# Factor Analysis

#### Factor analysis is a linear statistical model. It is used to explain the variance among the observed variables and to condense a set of observed variables into a smaller set of unobserved variables called ___factors___.

Observed variables are modeled as a linear combination of factors and error terms. A factor, or latent variable, is associated with multiple observed variables that share common patterns of responses. Each factor explains a particular amount of variance in the observed variables. This helps in data interpretation by reducing the number of variables.

<img src='fa.JPG'>

Factor analysis is a method for investigating whether a number of variables of interest X1, X2, ..., Xl are linearly related to a smaller number of unobservable factors F1, F2, ..., Fk.

<img src='fa2.JPG'>
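
To make the linear model concrete, the following is a minimal sketch that simulates observed variables from two latent factors. The loading matrix, sample size, and variable names are hypothetical values chosen only for illustration; they do not come from any dataset used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_vars, n_factors = 2000, 6, 2

# Hypothetical loading matrix (rows: observed variables, columns: factors).
loadings = np.array([
    [0.9, 0.0],
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.8],
    [0.0, 0.9],
    [0.2, 0.7],
])

# Unique (error) variances, chosen so that every observed variable has unit variance.
uniquenesses = 1.0 - (loadings ** 2).sum(axis=1)

# Uncorrelated latent factors and independent error terms.
factors = rng.standard_normal((n_samples, n_factors))
errors = rng.standard_normal((n_samples, n_vars)) * np.sqrt(uniquenesses)

# Each observed variable is a linear combination of the factors plus its own error.
X = factors @ loadings.T + errors

# Under these assumptions the model implies Cov(X) = L L^T + diag(uniquenesses).
implied_cov = loadings @ loadings.T + np.diag(uniquenesses)
sample_cov = np.cov(X, rowvar=False)
max_gap = np.abs(sample_cov - implied_cov).max()
print("largest gap between sample and implied covariance:", round(max_gap, 3))
```

With enough samples, the sample covariance should be close to the covariance implied by the loadings, which is the structure that factor analysis tries to recover in the opposite direction.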

## Assumptions
1. There are no outliers in data.
2. The sample size should be greater than the number of factors.
3. There should not be perfect multicollinearity.
4. Homoscedasticity between the variables is not required.
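
As a rough illustration of the multicollinearity assumption, the sketch below flags a dataset whose correlation matrix is (numerically) singular. The function name `has_perfect_multicollinearity` and the tolerance are assumptions made for this example, not part of any library.

```python
import numpy as np

def has_perfect_multicollinearity(data, tol=1e-8):
    """Return True if the correlation matrix of `data` is numerically singular."""
    corr = np.corrcoef(data, rowvar=False)
    # A smallest eigenvalue near zero means some variable is an exact
    # linear combination of the others.
    return bool(np.linalg.eigvalsh(corr).min() < tol)

rng = np.random.default_rng(1)
X_ok = rng.standard_normal((200, 4))

# Append a column that is an exact linear combination of two existing columns.
X_bad = np.column_stack([X_ok, X_ok[:, 0] + X_ok[:, 1]])

print(has_perfect_multicollinearity(X_ok))
print(has_perfect_multicollinearity(X_bad))
```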


## Types of Factor Analysis
* Exploratory Factor Analysis: 
        It is the most popular factor analysis approach among social and management researchers. 
        Its basic assumption is that any observed variable is directly associated with any factor.
* Confirmatory Factor Analysis (CFA): 
        Its basic assumption is that each factor is associated with a particular set of observed variables.
        CFA confirms what is expected on the basis of prior theory.
        
# How does factor analysis work?

#### The primary objective of factor analysis is to reduce the number of observed variables and find unobservable variables. 

#### These unobserved variables help the market researcher draw conclusions from the survey.

#### This conversion of the observed variables to unobserved variables can be achieved in two steps:

* ___Factor Extraction___: 

___In this step, the number of factors and the extraction approach are selected using variance partitioning methods such as principal components analysis and common factor analysis.___

* ___Factor Rotation___: 

___In this step, rotation transforms the extracted factors to improve their overall interpretability, ideally so that each observed variable loads strongly on a single factor. Many rotation methods are available: orthogonal methods such as Varimax and Quartimax keep the factors uncorrelated, while oblique methods such as Promax allow them to correlate.___
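
The two steps above can be sketched with NumPy alone. This is a minimal illustration, not the implementation used by any particular package: extraction here uses the principal-component method (top eigenpairs of the correlation matrix), rotation uses a common SVD-based Varimax iteration, and the helper names `extract_loadings` and `varimax` as well as the simulated data are assumptions made for the example.

```python
import numpy as np

def extract_loadings(corr, n_factors):
    """Principal-component extraction: loadings from the top eigenpairs of `corr`."""
    eigvals, eigvecs = np.linalg.eigh(corr)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]      # indices of the largest ones
    return eigvecs[:, order] * np.sqrt(eigvals[order])

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal Varimax rotation via repeated SVD updates of the rotation matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = (rotated ** 2).sum(axis=0)            # column sums of squares
        target = rotated ** 3 - rotated @ np.diag(col_ss) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                              # stays orthogonal
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):      # no meaningful improvement
            break
        criterion = new_criterion
    return loadings @ rotation

# Simulated data with two hypothetical factors, three variables per factor.
rng = np.random.default_rng(2)
true_loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
                          [0.1, 0.8], [0.0, 0.9], [0.2, 0.7]])
noise_sd = np.sqrt(1.0 - (true_loadings ** 2).sum(axis=1))
data = rng.standard_normal((1000, 2)) @ true_loadings.T \
    + rng.standard_normal((1000, 6)) * noise_sd

corr = np.corrcoef(data, rowvar=False)
unrotated = extract_loadings(corr, n_factors=2)
rotated = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, the communality of each variable (the row sum of squared loadings) is the same before and after rotation; only the way the variance is split across factors changes.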

# Factor Analysis in Python using the factor_analyzer package

# Adequacy Test

#### Before you perform factor analysis, you need to evaluate the “factorability” of your dataset.

#### Factorability means "can we find the factors in the dataset?". 

##### There are two methods to check the factorability or sampling adequacy:

* Bartlett’s Test
* Kaiser-Meyer-Olkin Test

# Bartlett’s test of sphericity 
* This test checks whether the observed variables intercorrelate at all by comparing the observed correlation matrix against the identity matrix.
* If the test is statistically insignificant, you should not employ factor analysis.

#### In this Bartlett’s test, the p-value is 0. The test is statistically significant, indicating that the observed correlation matrix is not an identity matrix.
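
For reference, the `factor_analyzer` package exposes this test as `calculate_bartlett_sphericity`. The sketch below instead computes the test statistic by hand with NumPy, assuming the usual chi-square approximation; the function name `bartlett_sphericity_statistic` and the simulated data are hypothetical and are not the data behind the result quoted above.

```python
import numpy as np

def bartlett_sphericity_statistic(data):
    """Return Bartlett's chi-square statistic and its degrees of freedom."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # log-determinant of the correlation matrix (0 for an identity matrix)
    _, logdet = np.linalg.slogdet(corr)
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof

rng = np.random.default_rng(3)
n, p = 500, 5

# Variables that share one common factor (strongly intercorrelated).
common = rng.standard_normal((n, 1))
correlated = 0.8 * common + 0.6 * rng.standard_normal((n, p))

# Independent variables (correlation matrix close to identity).
independent = rng.standard_normal((n, p))

stat_corr, dof = bartlett_sphericity_statistic(correlated)
stat_indep, _ = bartlett_sphericity_statistic(independent)
print(round(stat_corr, 1), round(stat_indep, 1), dof)
# The p-value is the upper tail of a chi-square distribution with `dof`
# degrees of freedom, e.g. scipy.stats.chi2.sf(statistic, dof) if SciPy is available.
```

A large statistic relative to the degrees of freedom gives a small p-value, which is the "statistically significant" case in which factor analysis is worth attempting.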

# Kaiser-Meyer-Olkin (KMO) Test 

* This test measures the suitability of data for factor analysis. 
* It determines the adequacy for each observed variable and for the complete model. 
* KMO estimates the proportion of variance among the observed variables that might be common variance.
* KMO values range between 0 and 1. Higher values indicate data that is more suitable for factor analysis.
* A KMO value of less than 0.6 is considered inadequate.

#### The overall KMO for our data is 0.84, which is well above the 0.6 threshold. This value indicates that you can proceed with your planned factor analysis.
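
The `factor_analyzer` package provides `calculate_kmo` for this measure. As a sketch of what the statistic compares, the example below computes KMO by hand from the correlations and the partial correlations (taken from the inverse correlation matrix). The function name `kmo` and the simulated one-factor data are assumptions for illustration and are unrelated to the 0.84 reported above.

```python
import numpy as np

def kmo(data):
    """Return (KMO per variable, overall KMO) for a samples-by-variables array."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlation between each pair of variables, controlling for the rest.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    # Only off-diagonal entries enter the statistic.
    np.fill_diagonal(corr, 0.0)
    np.fill_diagonal(partial, 0.0)
    r2 = corr ** 2
    a2 = partial ** 2
    kmo_per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + a2.sum(axis=0))
    kmo_model = r2.sum() / (r2.sum() + a2.sum())
    return kmo_per_variable, kmo_model

rng = np.random.default_rng(4)
common = rng.standard_normal((500, 1))
data = 0.8 * common + 0.6 * rng.standard_normal((500, 6))

kmo_per_variable, kmo_model = kmo(data)
print(np.round(kmo_per_variable, 2), round(kmo_model, 2))
```

When the variables share common variance, the partial correlations are small compared with the raw correlations, which pushes KMO toward 1.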

## Choosing the Number of Factors

To choose the number of factors, you can use the Kaiser criterion and the scree plot. Both are based on eigenvalues: the Kaiser criterion retains the factors whose eigenvalue is greater than 1.

### Scree Plot:
#### In multivariate statistics, a scree plot is a line plot of the eigenvalues of factors or principal components in an analysis. The scree plot is used to determine the number of factors to retain in an exploratory factor analysis (FA) or principal components to keep in a principal component analysis (PCA).

<img src='scree.JPG'>
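
The values shown in a scree plot are the eigenvalues of the correlation matrix sorted in descending order. The following minimal sketch computes them with NumPy and applies the Kaiser criterion (eigenvalue greater than 1); the simulated two-factor data and all variable names are hypothetical and do not correspond to the plot above.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data with two hypothetical factors, three variables per factor.
true_loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
                          [0.1, 0.8], [0.0, 0.9], [0.2, 0.7]])
noise_sd = np.sqrt(1.0 - (true_loadings ** 2).sum(axis=1))
data = rng.standard_normal((1000, 2)) @ true_loadings.T \
    + rng.standard_normal((1000, 6)) * noise_sd

corr = np.corrcoef(data, rowvar=False)

# Eigenvalues in descending order: these are the y-values of a scree plot.
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Kaiser criterion: keep factors whose eigenvalue exceeds 1.
n_factors_kaiser = int((eigenvalues > 1.0).sum())

for i, value in enumerate(eigenvalues, start=1):
    print(f"factor {i}: eigenvalue = {value:.2f}")
print("factors retained by the Kaiser criterion:", n_factors_kaiser)
```

Plotting `eigenvalues` against the factor number (for example with matplotlib) gives the scree plot; the "elbow" where the curve flattens is the usual visual cue for how many factors to retain.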