## Principal Component Factor Analysis

Principal component factor analysis (PCFA) reduces the number of variables or attributes in a dataset while retaining most of the information (IBM, 2023). The transformation yields a smaller set of new variables called principal components. PCFA aims to identify correlations among the original variables, verify the validity of established constructs, build performance indexes from the factors, and extract uncorrelated factors for later analysis (Fávero & Belfiore, 2018). It is often used as a preprocessing step to improve model performance.

### Application Scenarios

PCFA has many real-world applications, including facial recognition and speech analysis. In facial recognition systems, PCFA extracts the dominant features, termed “eigenfaces.” Pairing PCFA with neural networks improves performance further by allowing the system to learn better feature representations (Navaz, Sri, & Mazumder, 2013). Reducing the number of factors evaluated lets these systems focus on the dominant variables in their analysis.

Scientists can also use PCFA to evaluate extremes in weather patterns across the continental United States. For example, scientists used PCFA to study relationships with climate patterns such as El Niño phases, which bring warmer-than-average sea surface temperatures, and La Niña phases, which bring cooler-than-average sea surface temperatures (Jiang, Cooley, & Wehner, 2020). This can help scientists understand increased hurricane activity and shifts in atmospheric patterns. PCFA is valuable in this kind of study because it can help meteorologists understand how flood risks and storm intensities relate to various climate patterns.

### Tools for Utilization

Tools used for PCFA include RapidMiner, Python, and Excel, each with its own advantages and limitations. RapidMiner is user-friendly because it does not require coding, supports various file types, and produces visualizations quickly. However, it is less flexible and requires licensing for advanced features. Excel is familiar to most users, works well for smaller datasets, and is already available in most organizations. Its disadvantages include limited capabilities due to computational constraints and a lack of automation.

Python offers a variety of powerful libraries that allow extensive customization. It is also well suited to large datasets and integrates easily with other tools. However, Python requires some programming expertise to perform the analysis. The rest of this section walks through an example of PCFA in Python. We will use the pandas, numpy, and matplotlib.pyplot libraries, as well as sklearn.decomposition, sklearn.preprocessing, and sklearn.datasets. We will focus on the wine dataset available in sklearn.datasets, which contains 178 samples and 13 features (Scikit-learn developers, n.d.). The following Python code imports the required libraries:


```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_wine
```
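
Before transforming the data, it can help to confirm the dataset's size and see how differently the features are scaled. The following is a minimal sketch rather than part of the original analysis; the variable name `spread` is an illustrative choice.

```python
import pandas as pd
from sklearn.datasets import load_wine

# Load the wine data into a DataFrame for inspection
wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)

# Rows are samples, columns are features
print("Shape:", df.shape)

# Standard deviation per feature, largest first; big gaps between
# features suggest the data should be standardized before PCA
spread = df.std().sort_values(ascending=False)
print(spread.head(3))
print(spread.tail(3))
```

Features measured on much larger numeric scales, such as `proline`, would otherwise dominate the principal components.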

Next, it is important to scale the data. PCA is sensitive to scale, so standardizing the features ensures that they are equally represented in the PCA calculation. Once the data is scaled, PCA can be applied. The following code loads the wine dataset, standardizes the features, and applies PCA:

```python
# Load the wine dataset into a DataFrame
wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)

# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)

# Keep the first two principal components
pca = PCA(n_components=2)
pca_result = pca.fit_transform(df_scaled)
print("Explained Variance Ratio:", pca.explained_variance_ratio_)
```

```
Explained Variance Ratio: [0.36198848 0.1920749 ]
```
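
As a quick check on the claim that PCA is sensitive to scale, the sketch below fits the same two-component PCA on the raw and the standardized wine features and compares the explained variance ratios. It is an illustrative comparison under the same settings (`n_components=2`), not part of the original analysis; the names `raw_ratio` and `scaled_ratio` are assumptions.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_wine().data

# PCA on the raw features: the components follow whichever
# feature has the largest numeric variance
raw_ratio = PCA(n_components=2).fit(X).explained_variance_ratio_

# PCA on standardized features: every feature enters with unit variance
X_scaled = StandardScaler().fit_transform(X)
scaled_ratio = PCA(n_components=2).fit(X_scaled).explained_variance_ratio_

print("Unscaled:", raw_ratio.round(3))
print("Scaled:  ", scaled_ratio.round(3))
```

On the unscaled data, the first component should absorb nearly all of the variance, which reflects measurement units rather than structure in the data. The standardized fit spreads the variance across components more evenly.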


The `PCA` class comes from the scikit-learn library. Setting `n_components=2` specifies the number of principal components to keep, and `pca.fit_transform(df_scaled)` projects the dataset onto those two dimensions (Scikit-learn developers, n.d.). The explained variance ratio for the standardized data shows that the first component accounts for about 36% of the variance and the second for about 19%, so together the two components retain roughly 55% of the total variance. We can visualize the PCFA transformation in a scatterplot with the following code:

```python
plt.scatter(pca_result[:, 0], pca_result[:, 1], c=wine.target, cmap='viridis', alpha=0.7)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA Result on Wine Dataset')
plt.colorbar(label='Target Class')
plt.show()
```


This code produces the scatterplot shown in Figure 1, with each point colored by its wine class. The example shows Python's capabilities for PCFA: its robust libraries let users run the analysis quickly and choose the number of dimensions to keep. Python is an excellent tool for exploring PCFA.

Figure 1

![image.png](attachment:image.png)
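
To interpret the two axes in Figure 1 and to judge whether two components are enough, one option is to fit PCA with all components and look at the cumulative explained variance and the component loadings. The sketch below is illustrative only. The 80% threshold and the names `pca_full`, `k_80`, and `loadings_df` are assumptions, and the loadings follow a common convention (component weights scaled by the square root of each component's variance) rather than a method stated in the sources above.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X_scaled = StandardScaler().fit_transform(wine.data)

# Fit PCA with all components so the full variance profile is available
pca_full = PCA().fit(X_scaled)

# Cumulative share of variance retained by the first k components
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest k whose cumulative share reaches the assumed 80% threshold
k_80 = int(np.argmax(cumulative >= 0.80)) + 1
print("Components needed for 80% of variance:", k_80)

# Loadings: component weights scaled by the square root of each
# component's variance (approximately feature-component correlations
# when the features are standardized)
loadings = pca_full.components_.T * np.sqrt(pca_full.explained_variance_)
loadings_df = pd.DataFrame(
    loadings[:, :2], index=wine.feature_names, columns=["PC1", "PC2"]
)

# Features with the largest absolute loading on the first component
print(loadings_df.sort_values("PC1", key=lambda s: s.abs(), ascending=False).head())
```

Features with large absolute loadings on a component contribute most to that axis of the scatterplot, which gives the components a practical interpretation.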

### Conclusion

PCFA is a tool that allows for a reduction in dimensionality while retaining key variance. PCFA has value in fields like climate science, healthcare, and facial recognition. Tools like RapidMiner, Excel, and Python allow data scientists to utilize PCFA quickly and effectively. The Python example provided illustrates how PCFA can simplify data for more effective visualization. PCFA, when properly implemented, can lead to more effective decision-making. 


### References

Fávero, L. P., & Belfiore, P. (2018). Data Science for Business and Decision Making (1st ed.). Academic Press. https://doi.org/10.1016/C2016-0-01101-4

IBM. (2023, December 8). Principal component analysis. Retrieved February 11, 2025, from https://www.ibm.com/think/topics/principal-component-analysis

Jiang, Y., Cooley, D., & Wehner, M. F. (2020). Principal component analysis for extremes and application to U.S. precipitation. Journal of Climate, 33(15), 6441–6451. https://doi.org/10.1175/JCLI-D-19-0413.1

Navaz, A. S. S., Sri, T. D., & Mazumder, P. (2013). Face recognition using principal component analysis and neural networks. International Journal of Computer Networking, Wireless and Mobile Communications (IJCNWMC), 3(1), 245–256.

Scikit-learn developers. (n.d.). sklearn.datasets.load_wine — Wine recognition dataset. Scikit-learn. Retrieved February 10, 2025, from https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html

Scikit-learn developers. (n.d.). sklearn.decomposition.PCA — Principal component analysis. Scikit-learn. Retrieved February 10, 2025, from https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

