# **Principal Component Analysis (PCA) - Choosing the Right Number of Dimensions**

[Principal Component Analysis - Sklearn](https://scikit-learn.org/stable/modules/decomposition.html#principal-component-analysis-pca)

[sklearn.decomposition.PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA)

In [None]:
# Import libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Import Dataset.
dataset = pd.read_csv('Wine.csv')

X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

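In [None]:
# Split dataset into Training set and Test set.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

## Feature Scaling

In [None]:
# Fit the scaler on the training set only, then apply the same transformation
# to the test set, so that no test-set statistics leak into training.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)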

# **Applying PCA**

In [None]:
# Choosing the Right Number of Dimensions.
#
# The following code performs PCA without reducing dimensionality, then computes
# the minimum number of dimensions required to preserve 95% of the training set's
# variance.

from sklearn.decomposition import PCA
pca = PCA().fit(X_train)
# Running total of the variance explained by the first k components.
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)
# np.argmax returns the index of the first True; add 1 to turn it into a count.
d = np.argmax(cumulative_variance >= 0.95) + 1
print("The minimum number of dimensions required to preserve 95% of the training set's variance is", d)

The minimum number of dimensions required to preserve 95% of the training set's variance is 10
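
The manual cumulative-sum computation can also be delegated to scikit-learn: passing a float between 0 and 1 as `n_components` asks `PCA` to keep the smallest number of components whose cumulative explained variance reaches that fraction. The sketch below compares the two approaches. It is a minimal illustration and rests on an assumption: it uses scikit-learn's bundled Wine dataset (`load_wine`) as a stand-in for `Wine.csv`, and the names `X_demo`, `X_tr`, `d_manual` and `d_auto` are hypothetical, chosen so the notebook's own variables are not overwritten.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): scikit-learn's bundled Wine dataset, not Wine.csv.
X_demo, y_demo = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
X_tr = StandardScaler().fit_transform(X_tr)

# Manual approach: fit a full PCA, then find the first component count
# whose cumulative explained variance reaches 95%.
cumulative = np.cumsum(PCA().fit(X_tr).explained_variance_ratio_)
d_manual = int(np.argmax(cumulative >= 0.95) + 1)

# Shortcut: a float n_components lets PCA choose the component count itself.
pca_auto = PCA(n_components=0.95).fit(X_tr)
d_auto = int(pca_auto.n_components_)

print("Manual:", d_manual, "| PCA(n_components=0.95):", d_auto)
```

After fitting, the chosen count is available as `n_components_`, so the fitted object can be used directly for `transform` without a second fit.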


In [None]:
pca = PCA(n_components=d)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
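
One way to confirm that the reduced representation really preserves the target share of variance is to sum `explained_variance_ratio_` on the fitted, reduced PCA and compare it with the reconstruction error from `inverse_transform`. The sketch below is a hedged illustration, not part of the original workflow: it assumes scikit-learn's bundled Wine dataset as a stand-in for `Wine.csv`, and `X_demo`, `d_demo`, `pca_demo`, `retained` and `lost` are hypothetical names.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): scikit-learn's bundled Wine dataset, not Wine.csv.
X_demo, _ = load_wine(return_X_y=True)
X_demo = StandardScaler().fit_transform(X_demo)

# Pick the component count the same way the notebook does.
full = PCA().fit(X_demo)
d_demo = int(np.argmax(np.cumsum(full.explained_variance_ratio_) >= 0.95) + 1)

# Reduce to d_demo dimensions.
pca_demo = PCA(n_components=d_demo)
X_reduced = pca_demo.fit_transform(X_demo)

# Fraction of variance kept by the retained components.
retained = pca_demo.explained_variance_ratio_.sum()

# Fraction of variance lost, measured from the reconstruction error.
X_restored = pca_demo.inverse_transform(X_reduced)
residual = np.sum((X_demo - X_restored) ** 2)
total = np.sum((X_demo - X_demo.mean(axis=0)) ** 2)
lost = residual / total

print("Reduced shape:", X_reduced.shape)
print("Retained variance fraction:", retained)
print("Lost variance fraction:", lost)
```

The retained and lost fractions should add up to 1 (up to floating-point error), which is what "preserving 95% of the variance" means in terms of reconstruction quality.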

# Training the Logistic Regression Model.

In [None]:
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(random_state = 0).fit(X_train, y_train)
y_pred = clf.predict(X_test)  # Predicting the Test Set results.

## Evaluating the Model: Accuracy, Classification Report, and Confusion Matrix

In [None]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
print("-----------------------------------------------------------")
print("Accuracy Score is", accuracy_score(y_test, y_pred))
print("-----------------------------------------------------------")
print("-------------------Classification Report-------------------")
print(classification_report(y_test, y_pred))
print("-----------------------------------------------------------")
print("---------------------Confusion Matrix----------------------")
print(confusion_matrix(y_test, y_pred))
print("-----------------------------------------------------------")

-----------------------------------------------------------
Accuracy Score is 1.0
-----------------------------------------------------------
-------------------Classification Report-------------------
              precision    recall  f1-score   support

           1       1.00      1.00      1.00        14
           2       1.00      1.00      1.00        16
           3       1.00      1.00      1.00         6

    accuracy                           1.00        36
   macro avg       1.00      1.00      1.00        36
weighted avg       1.00      1.00      1.00        36

-----------------------------------------------------------
---------------------Confusion Matrix----------------------
[[14  0  0]
 [ 0 16  0]
 [ 0  0  6]]
-----------------------------------------------------------
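
The test set here holds only 36 samples, so a single accuracy figure is a coarse estimate. A possible cross-check is to wrap scaling, PCA and the classifier in one pipeline and score it with cross-validation, so that the scaler and PCA are refitted on the training portion of each fold. The sketch below is a minimal example under stated assumptions: it uses scikit-learn's bundled Wine dataset as a stand-in for `Wine.csv`, 5 folds is an arbitrary choice, and `model` and `scores` are hypothetical names. No particular scores are claimed.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): scikit-learn's bundled Wine dataset, not Wine.csv.
X_demo, y_demo = load_wine(return_X_y=True)

# Each step is fitted only on the training portion of every fold.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),  # keep enough components for 95% of the variance
    LogisticRegression(random_state=0),
)

# 5-fold cross-validated accuracy (the fold count is an illustrative choice).
scores = cross_val_score(model, X_demo, y_demo, cv=5)

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```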
