# Data mining

# Lesson 3

# Data Analysis Using Factor Analysis, PCA, and SVD

### **Objective:**
Learn to apply dimensionality reduction methods, such as Factor Analysis, PCA, and SVD, to analyze high-dimensional data. Understand how these methods uncover hidden structures and simplify data for better analysis and modeling.

### **Description**

Dimensionality reduction methods such as Factor Analysis, Principal Component Analysis (PCA), and Singular Value Decomposition (SVD) are essential in data analysis, especially when working with high-dimensional data. These methods help reduce the number of variables while preserving most of the information, making data easier to visualize and process. This lab focuses on implementing and understanding these techniques in practice.

### What we will learn:
- Applying PCA to reduce dimensionality and interpret components.
- Using Factor Analysis to identify hidden factors in the data.
- Applying SVD for data decomposition and analysis.
- Evaluating the impact of dimensionality reduction on data and model performance.

### Libraries that we use:

- [Pandas](https://pandas.pydata.org/) - a library for working with tabular data, used here to load and prepare the dataset.
- [Matplotlib](https://matplotlib.org/) - for data visualization and identifying interesting patterns.
- [Seaborn](https://seaborn.pydata.org/) - statistical plotting built on top of Matplotlib, imported in the code cells.
- [Scikit-learn](https://scikit-learn.org/stable/) - a machine learning library; here it provides `StandardScaler`, `PCA`, and `FactorAnalysis`.
- [NumPy](https://numpy.org/) - a library for large, multi-dimensional arrays and matrices, with a large collection of mathematical functions that operate on them; here it provides the SVD routine.


### Structure of the laboratory work:

- We have sales data and want to predict which customers are most likely to make a purchase in the next month.

Our dataset, **data.csv**, has the following columns:

    "Age (integer between 18 and 65)",
    "Annual Income (integer between 15,000 and 120,000)",
    "Customer Satisfaction (integer between 1 and 10)",
    "Purchase Frequency (integer between 1 and 30)",
    "Customer Loyalty (integer between 1 and 10)"

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the data (read_csv already returns a DataFrame)
df = pd.read_csv('data.csv')
# Preview the first rows
print(df.head())
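
If `data.csv` is not at hand, a synthetic stand-in with the same value ranges can be built as in the minimal sketch below. The short column names, the sample size, the seed, and the link between satisfaction and loyalty are all assumptions made for illustration, not properties of the real file.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (assumption) for reproducibility
n = 200                         # hypothetical number of customers

satisfaction = rng.integers(1, 11, n)  # upper bound is exclusive
# Assumed structure: loyalty tracks satisfaction plus noise, clipped to 1..10
loyalty = np.clip(satisfaction + rng.integers(-2, 3, n), 1, 10)

demo_df = pd.DataFrame({
    "Age": rng.integers(18, 66, n),
    "Annual Income": rng.integers(15_000, 120_001, n),
    "Customer Satisfaction": satisfaction,
    "Purchase Frequency": rng.integers(1, 31, n),
    "Customer Loyalty": loyalty,
})

# Check that every column stays inside its documented range
print(demo_df.describe().loc[["min", "max"]])
```

In that case `demo_df` can be used wherever the exercises expect `df`.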


## **Exercise 1:** Principal Component Analysis (PCA)
- Standardize the dataset before applying PCA using StandardScaler.
- Apply PCA to reduce dimensionality to 2 components.
- Output the explained variance ratio for each component.
- Visualize the data after dimensionality reduction.
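
One possible way to carry out these steps is sketched below. It runs on synthetic data (an assumption, since `data.csv` is not reproduced here) with hypothetical short column names; with the real file, `df` would come from `pd.read_csv('data.csv')` instead.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for data.csv (assumption: names and distributions are illustrative)
rng = np.random.default_rng(0)
n = 200
satisfaction = rng.integers(1, 11, n)
df = pd.DataFrame({
    "Age": rng.integers(18, 66, n),
    "Annual Income": rng.integers(15_000, 120_001, n),
    "Customer Satisfaction": satisfaction,
    "Purchase Frequency": rng.integers(1, 31, n),
    "Customer Loyalty": np.clip(satisfaction + rng.integers(-2, 3, n), 1, 10),
})

# Standardize so that income (tens of thousands) does not dominate the 1..10 scales
X_scaled = StandardScaler().fit_transform(df)

# Project onto the two directions of largest variance
pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)

explained_variance = pca.explained_variance_ratio_
print(f"Explained Variance Ratio: {explained_variance}")

# Component weights: how strongly each original feature contributes to PC1 and PC2
components = pd.DataFrame(pca.components_.T, index=df.columns, columns=["PC1", "PC2"])
print(components)

pca_df = pd.DataFrame(scores, columns=["PC1", "PC2"])
fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(pca_df["PC1"], pca_df["PC2"], alpha=0.6)
ax.set_xlabel(f"PC1 ({explained_variance[0]:.1%} of variance)")
ax.set_ylabel(f"PC2 ({explained_variance[1]:.1%} of variance)")
ax.set_title("PCA projection")
plt.show()
```

Standardizing first matters here because PCA follows variance, and the raw income column has a far larger scale than the other features.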

In [None]:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns

# Standardize the data

# Apply PCA with 2 components

# Explained variance ratio

# print(f"Explained Variance Ratio: {explained_variance}")

# Create a DataFrame for visualization

# Visualize PCA results


## **Exercise 2:** Factor Analysis
- Apply Factor Analysis to identify 2 hidden factors.
- Analyze how the original features load onto these factors.
- Visualize the factor analysis results.
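
The following is a minimal sketch of these steps, again on synthetic data with hypothetical column names (both are assumptions). The `random_state` value is arbitrary, and the loadings it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for data.csv (assumption: names and distributions are illustrative)
rng = np.random.default_rng(0)
n = 200
satisfaction = rng.integers(1, 11, n)
df = pd.DataFrame({
    "Age": rng.integers(18, 66, n),
    "Annual Income": rng.integers(15_000, 120_001, n),
    "Customer Satisfaction": satisfaction,
    "Purchase Frequency": rng.integers(1, 31, n),
    "Customer Loyalty": np.clip(satisfaction + rng.integers(-2, 3, n), 1, 10),
})

X_scaled = StandardScaler().fit_transform(df)

# Fit a model with 2 latent factors and compute each customer's factor scores
fa = FactorAnalysis(n_components=2, random_state=0)
factors = fa.fit_transform(X_scaled)

# Loadings: one row per original feature, one column per factor
loadings = pd.DataFrame(fa.components_.T, index=df.columns,
                        columns=["Factor 1", "Factor 2"])
print(loadings)

fa_df = pd.DataFrame(factors, columns=["Factor 1", "Factor 2"])
fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(fa_df["Factor 1"], fa_df["Factor 2"], alpha=0.6)
ax.set_xlabel("Factor 1")
ax.set_ylabel("Factor 2")
ax.set_title("Factor Analysis scores")
plt.show()
```

Features with large absolute loadings on the same factor tend to vary together; loadings near zero suggest a feature is mostly explained by its own noise term.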

In [None]:
from sklearn.decomposition import FactorAnalysis

# Apply Factor Analysis with 2 factors

# Create a DataFrame for visualization

# Visualize factors

# Factor loadings


## **Exercise 3:** Singular Value Decomposition (SVD)
- Apply Singular Value Decomposition (SVD) to the dataset using numpy.
- Identify the three most significant singular values and analyze their contribution.
- Visualize the projections of data onto the first two singular vectors.
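
A sketch of these steps using `np.linalg.svd` is shown below. The data is synthetic and the column names are hypothetical (both assumptions). "Contribution" is taken here to mean each squared singular value as a share of the total, which is one common reading rather than the only one.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for data.csv (assumption: names and distributions are illustrative)
rng = np.random.default_rng(0)
n = 200
satisfaction = rng.integers(1, 11, n)
df = pd.DataFrame({
    "Age": rng.integers(18, 66, n),
    "Annual Income": rng.integers(15_000, 120_001, n),
    "Customer Satisfaction": satisfaction,
    "Purchase Frequency": rng.integers(1, 31, n),
    "Customer Loyalty": np.clip(satisfaction + rng.integers(-2, 3, n), 1, 10),
})

# Standardize with plain NumPy/pandas (population std, as StandardScaler does)
X = ((df - df.mean()) / df.std(ddof=0)).to_numpy()

# Thin SVD: X = U @ diag(S) @ Vt, with S sorted in descending order
U, S, Vt = np.linalg.svd(X, full_matrices=False)

print("Three largest singular values:", S[:3])
contribution = S**2 / np.sum(S**2)
print("Share of total squared singular values:", contribution[:3])

# Coordinates of each row along the first two right singular vectors
projections = U[:, :2] * S[:2]
svd_df = pd.DataFrame(projections, columns=["SV1", "SV2"])

fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(svd_df["SV1"], svd_df["SV2"], alpha=0.6)
ax.set_xlabel("First singular vector")
ax.set_ylabel("Second singular vector")
ax.set_title("SVD projection")
plt.show()
```

The projection `U[:, :2] * S[:2]` is the same as `X @ Vt[:2].T`; the first form simply reuses what the decomposition already returned.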

In [None]:
import numpy as np

# Apply SVD

# Print the three most significant singular values

# Create a DataFrame for visualizing projections onto the first two singular vectors

# Visualize SVD results


## **Exercise 4:** Comparing Methods
- Compare the results obtained using PCA, Factor Analysis, and SVD.
- Discuss which method works best for your dataset and why.
- Visualize the differences between PCA, Factor Analysis, and SVD.
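
To make the comparison concrete, the three projections can be drawn side by side, as in the sketch below. It uses synthetic data with hypothetical column names (assumptions), so the plots illustrate the mechanics only and not the answer for the real dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for data.csv (assumption: names and distributions are illustrative)
rng = np.random.default_rng(0)
n = 200
satisfaction = rng.integers(1, 11, n)
df = pd.DataFrame({
    "Age": rng.integers(18, 66, n),
    "Annual Income": rng.integers(15_000, 120_001, n),
    "Customer Satisfaction": satisfaction,
    "Purchase Frequency": rng.integers(1, 31, n),
    "Customer Loyalty": np.clip(satisfaction + rng.integers(-2, 3, n), 1, 10),
})

X = StandardScaler().fit_transform(df)

# Two-dimensional representation from each method
pca_scores = PCA(n_components=2).fit_transform(X)
fa_scores = FactorAnalysis(n_components=2, random_state=0).fit_transform(X)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
svd_scores = U[:, :2] * S[:2]

results = {
    "PCA": pca_scores,
    "Factor Analysis": fa_scores,
    "SVD": svd_scores,
}

# One scatter plot per method, side by side
fig, axes = plt.subplots(1, 3, figsize=(15, 4.5))
for ax, (name, scores) in zip(axes, results.items()):
    ax.scatter(scores[:, 0], scores[:, 1], alpha=0.6)
    ax.set_title(name)
    ax.set_xlabel("Dimension 1")
    ax.set_ylabel("Dimension 2")
plt.tight_layout()
plt.show()
```

On standardized data, PCA and SVD produce the same coordinates up to the sign of each axis, so the first and third panels should match up to a reflection. Factor Analysis fits a separate noise variance for each feature, so its scores generally differ from the other two.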

In [None]:
# Visualize PCA, Factor Analysis, and SVD results side by side

# PCA scatter plot

# Factor Analysis scatter plot

# SVD scatter plot



## Conclusion:

We practiced:

- Applying PCA to reduce dimensionality and interpret components.
- Using Factor Analysis to identify hidden factors in the data.
- Applying SVD for data decomposition and analysis.
- Evaluating the impact of dimensionality reduction on data and model performance.

This lab focused on understanding and applying dimensionality reduction techniques to uncover hidden structures in high-dimensional data while preserving critical information.


