
**Principal Component Analysis (PCA)** is a dimensionality reduction technique. It transforms the data into a new coordinate system, reducing the number of variables while preserving as much of the original data’s variation as possible.

**PCA** finds the principal components, that is, the axes along which the variance of the data is greatest. The first principal component captures the most variance, the second principal component (orthogonal to the first) captures the next most, and so on.
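
To make the idea of variance-maximizing, orthogonal axes concrete, here is a minimal NumPy sketch that computes principal components from the covariance matrix. It is an illustration only: the synthetic data and the names `X_demo`, `components`, and `X_projected` are assumptions made for this example, and scikit-learn's own implementation may use a different numerical route.

```python
import numpy as np

# Synthetic, correlated 3-feature data (hypothetical; not the Breast Cancer dataset)
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X_demo = np.hstack([
    3.0 * latent + 0.1 * rng.normal(size=(200, 1)),
    1.0 * latent + 0.5 * rng.normal(size=(200, 1)),
    0.2 * rng.normal(size=(200, 1)),
])

# 1. Center the data: PCA measures variance around the mean
X_centered = X_demo - X_demo.mean(axis=0)

# 2. Covariance matrix of the features (features are columns)
cov = np.cov(X_centered, rowvar=False)

# 3. Eigen-decomposition; eigh handles symmetric matrices and
#    returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort so the axis with the largest variance comes first
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
components = eigenvectors[:, order].T  # one principal axis per row

# 5. Project the centered data onto the first two axes
X_projected = X_centered @ components[:2].T

print("Variance along each axis:", eigenvalues)
print("Axes are orthonormal:", np.allclose(components @ components.T, np.eye(3)))
```

Each eigenvalue is the variance of the data along the corresponding axis, which is why sorting them in descending order gives the first, second, and later principal components.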

**Evaluation Metrics:**

1. **Explained Variance:** Indicates how much variance in the data is captured by each principal component.

2. **Total Explained Variance:** The cumulative variance explained by the selected principal components.
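
As a rough sketch of how these two metrics relate, the snippet below turns per-component variances into explained-variance ratios and a running total, then picks the smallest number of components that reaches a target. The helper `explained_variance_summary`, its `threshold` parameter, and the input values are hypothetical and chosen only for illustration.

```python
import numpy as np

def explained_variance_summary(component_variances, threshold=0.95):
    """Return per-component ratios, cumulative ratios, and the smallest
    number of components whose cumulative ratio reaches `threshold`."""
    variances = np.asarray(component_variances, dtype=float)
    ratio = variances / variances.sum()   # explained variance per component
    cumulative = np.cumsum(ratio)         # total explained variance so far
    # Index of the first cumulative value >= threshold, converted to a count;
    # capped at the number of components to guard against rounding error
    n_components = min(int(np.searchsorted(cumulative, threshold)) + 1, len(ratio))
    return ratio, cumulative, n_components

# Made-up variances for three components, sorted in descending order
ratio, cumulative, k = explained_variance_summary([4.0, 3.0, 1.0], threshold=0.8)
print("Explained variance ratio:", ratio)
print("Cumulative explained variance:", cumulative)
print("Components needed:", k)
```

With scikit-learn, the same per-component ratios are available from a fitted PCA object as `explained_variance_ratio_`, so `np.cumsum` can be applied to that attribute directly.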

**Applying PCA with scikit-learn**

We will apply PCA to the Breast Cancer dataset, whose features are derived from a digitized image of a fine needle aspirate (FNA) of a breast mass. Our objective is to reduce the dataset’s dimensionality while retaining as much information as possible.

Here are the steps we’ll follow:

1. **Load the Breast Cancer Dataset:**

The Breast Cancer dataset consists of features computed from digitized images of fine needle aspirates of breast masses. The features describe characteristics of the cell nuclei visible in each image.

2. **Apply PCA:**

We initialize PCA with n_components=2, indicating our intention to reduce the dataset to two dimensions. This choice is often made for visualization purposes or as a pre-processing step for other algorithms.
We fit PCA to the data X. During this process, PCA identifies the axes (principal components) that account for the most variance in the data.

3. **Transform the Data:**

The transform method of PCA is used to apply the dimensionality reduction to X. This results in a new dataset X_pca, where each data point is now represented in terms of the two principal components.

4. **Evaluate the PCA Transformation:**

We evaluate our PCA transformation by looking at the Explained Variance of each principal component. This tells us how much of the data’s total variance is captured by each principal component.
The Total Explained Variance is calculated by summing the explained variances of the two principal components. This gives us an overall measure of how much information was preserved in the dimensionality reduction process.

In [1]:
# Import the Libraries
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
import numpy as np

In [2]:
# Load the Breast Cancer dataset
breast_cancer = load_breast_cancer()
X = breast_cancer.data

In [7]:
# Applying PCA
pca = PCA(n_components=2)  # Reducing to 2 dimensions for simplicity
pca.fit(X)

In [8]:
# Transforming the data
X_pca = pca.transform(X)
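
To show what the transform step does under the hood, the sketch below repeats the fit and transform on the same dataset and reproduces the projection by hand: subtract the feature means learned during fitting, then project onto the principal axes. The variable `X_manual` is introduced here only for the comparison.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
import numpy as np

X = load_breast_cancer().data

pca = PCA(n_components=2)
pca.fit(X)
X_pca = pca.transform(X)

# 30 original features are reduced to 2 principal components
print("Original shape:", X.shape)
print("Reduced shape:", X_pca.shape)

# Manual equivalent of transform (with the default whiten=False):
# center with the fitted means, then project onto the component axes
X_manual = (X - pca.mean_) @ pca.components_.T
print("Manual projection matches:", np.allclose(X_pca, X_manual))
```

The two steps can also be combined into a single `pca.fit_transform(X)` call when the same data is used for fitting and transforming.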

In [9]:
# Explained Variance
explained_variance = pca.explained_variance_ratio_

In [10]:
# Total Explained Variance
total_explained_variance = np.sum(explained_variance)

In [11]:
print("Explained variance:", explained_variance)
print("Total Explained Variance:", total_explained_variance)

Explained variance: [0.98204467 0.01617649]
Total Explained Variance: 0.9982211613741722


**Let’s evaluate the results.**

**Explained Variance:**

1. **First Principal Component:** 98.20%
2. **Second Principal Component:** 1.62%
3. **Total Explained Variance:** 99.82%

These results indicate that by reducing the dataset to just two principal components, we have captured approximately 99.82% of the total variance in the dataset.

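The first component alone accounts for the large majority of this variance. Keep in mind that PCA was applied here to unscaled features, so this component is dominated by the features with the largest numeric ranges (such as the area measurements). A high explained-variance ratio in this setting reflects feature scale as much as information content.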
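
Because PCA is driven by variance, features measured on larger scales weigh more heavily in the result. The sketch below is one way to check how much that matters for this dataset: it fits PCA on the raw features and again after standardizing them with `StandardScaler`. The standardization step is an addition for illustration and is not part of the notebook above; names such as `raw_pca`, `scaled_pca`, and `top_feature` are hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
import numpy as np

data = load_breast_cancer()
X = data.data

# PCA on the raw features, as in the notebook above
raw_pca = PCA(n_components=2).fit(X)

# PCA after scaling every feature to zero mean and unit variance
scaled_pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
scaled_pipeline.fit(X)
scaled_pca = scaled_pipeline.named_steps["pca"]

# Feature with the largest absolute weight in the first unscaled component
top_feature = data.feature_names[np.abs(raw_pca.components_[0]).argmax()]

print("Raw explained variance ratio:   ", raw_pca.explained_variance_ratio_)
print("Scaled explained variance ratio:", scaled_pca.explained_variance_ratio_)
print("Dominant feature in raw PC1:", top_feature)
```

If the scaled ratios are much lower than the raw ones, two components retain less of the standardized data's structure than the unscaled figure suggests, and more components may be needed.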