

* Principal component regression (PCR) reduces the dimensionality of a dataset by projecting it onto a lower-dimensional subspace spanned by principal components, which are orthogonal linear combinations of the original variables.


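* PCR is often used as an alternative to multiple linear regression, especially when the number of variables is large or when the variables are strongly correlated (multicollinearity).
By keeping only the leading components, we reduce the number of predictors in the model and improve the stability of the regression results. Each component mixes the original variables, so the coefficients are read in terms of components rather than individual variables.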


* To perform PCR, we first standardize the original variables and then compute the principal components, either from the singular value decomposition (SVD) of the standardized data matrix or, equivalently, from the eigendecomposition of its covariance matrix.


* The retained principal components are then used as predictors in a linear regression model, whose coefficients can be estimated by ordinary least squares (which coincides with maximum likelihood estimation when the errors are assumed Gaussian).
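
The steps listed above can be sketched directly in NumPy. This is a minimal illustration rather than the notebook's own method: the synthetic data, the variable names, and the choice of `k = 2` components are all assumptions made for the example.

```python
import numpy as np

# Synthetic data, purely illustrative (not the dataset used in this notebook).
rng = np.random.default_rng(0)
n_samples, n_features, k = 200, 5, 2
X = rng.normal(size=(n_samples, n_features))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n_samples)  # two correlated columns
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=n_samples)

# Step 1: standardize each column to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2a: principal directions are the rows of Vt from the SVD of Z.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

# Step 2b: the same information from the eigendecomposition of the covariance matrix.
cov = Z.T @ Z / (n_samples - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 3: component scores for the first k principal components.
T = Z @ Vt[:k].T

# Step 4: ordinary least squares of y on the scores (plus an intercept column).
A = np.column_stack([np.ones(n_samples), T])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

# In-sample R^2 of the k-component fit.
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R^2 with {k} components: {r2:.3f}")
```

The eigenvalues of the covariance matrix equal the squared singular values divided by `n_samples - 1`, which is why either decomposition can be used. The score columns in `T` are mutually orthogonal, so the regression in step 4 does not suffer from the collinearity present in `X`.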


In [15]:
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

In [20]:
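pca = PCA(n_components=4)  # keep the first 4 principal components
lr = LinearRegression()

# A Pipeline runs its steps in order; the output of each step is the input to the next.
# No StandardScaler step is used here: load_diabetes returns features that are
# already mean-centered and scaled to a common norm.
pipe = Pipeline(steps=[('PCA', pca), ('linear regression', lr)])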

X, y = load_diabetes(return_X_y=True)

pipe.fit(X, y)

y_pred = pipe.predict(X)

In [21]:
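# These metrics are computed on the same data the pipeline was fit on (in-sample).
mae = mean_absolute_error(y, y_pred)
mse = mean_squared_error(y, y_pred)
rmse = np.sqrt(mse)
r2 = pipe.score(X, y)  # for a regression pipeline, score() returns R^2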

print(f'Number of features before PCR: {X.shape[1]}')
print(f'Number of features after PCR: {pca.n_components_}')

print(f'MAE: {mae:.2f}')
print(f'MSE: {mse:.2f}')
print(f'RMSE: {rmse:.2f}')
print(f'R^2: {r2:.2f}')


Number of features before PCR: 10
Number of features after PCR: 4
MAE: 44.31
MSE: 2963.12
RMSE: 54.43
R^2: 0.50
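
The metrics above are in-sample, and `n_components=4` was fixed by hand. One possible way to check both is to compare cross-validated R^2 across different numbers of components. The sketch below is an assumption-laden illustration, not part of the original analysis: the 5-fold split, the `cv_r2` dictionary, and the explicit `StandardScaler` step are choices made for the example.

```python
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# Mean 5-fold cross-validated R^2 for each possible number of components.
cv_r2 = {}
for k in range(1, X.shape[1] + 1):
    # Scaling and PCA are refit inside each fold, so no test-fold
    # information leaks into the preprocessing.
    model = make_pipeline(StandardScaler(), PCA(n_components=k), LinearRegression())
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    cv_r2[k] = scores.mean()

for k, score in cv_r2.items():
    print(f"{k:2d} components: mean CV R^2 = {score:.3f}")

# Number of components with the highest mean cross-validated R^2.
best_k = max(cv_r2, key=cv_r2.get)
print(f"Best number of components by CV: {best_k}")
```

With all 10 components retained, the pipeline is equivalent to ordinary linear regression on the standardized features, so the last row gives a baseline against which smaller values of `k` can be judged.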
