Exercise on Principal Component Analysis
a. Load the "Wine Dataset", available in the scikit-learn library (use "load_wine()").

b. Implement two functions that apply the PCA transformation to the data: one based on the eigendecomposition of the covariance matrix, the other on the singular value decomposition of the centered data.

c. Apply the two implementations of the PCA transformation to the wine data and verify that they yield the same result (compare the variances of the principal components and the principal component scores). As a further check, compare your own PCA implementations with the built-in one from sklearn. Are the results still the same?

e. Verify that applying PCA decorrelates the features. To do so, compare the correlation matrix of the original data with that of the transformed data.

d. Perform dimensionality reduction, motivating your choice of the new dimensionality. Repeat this step twice: first without standardizing the original data, then with standardization. To further understand the effect of standardization, set the number of PCA components to two and plot the transformed data.


In [2]:
from sklearn.datasets import load_wine
import numpy as np


# Load the Wine Dataset
wine_data = load_wine()

# Extract the data and the labels
X = wine_data.data  # Feature matrix (178 samples x 13 features)
y = wine_data.target  # Class label vector (0, 1, or 2)
feature_names = wine_data.feature_names  # Names of the 13 features
target_names = wine_data.target_names  # Names of the 3 wine classes

print(X.shape)


(178, 13)


In [None]:
X_centered = X - np.mean(X, axis=0)
C = (1 / (X.shape[0] - 1)) * X_centered.T @ X_centered
eigenvalues, eigenvectors = np.linalg.eig(C)

# IMPORTANT: sort by decreasing eigenvalue
idx = np.argsort(eigenvalues)[::-1]  # Indices in descending order
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]  # Reorder the columns

# eigenvectors is now W (principal components in sorted order)
W = eigenvectors  # Each column is a principal component

print("W shape:", W.shape)
print("Sorted eigenvalues:", eigenvalues[:5])  # First 5 variances
X_transformed = X_centered @ W

# Reconstruct C from W and the eigenvalues
Lambda = np.diag(eigenvalues)          # diagonal matrix of the eigenvalues
C_reconstructed = W @ Lambda @ W.T     # W Λ W^T

print("Reconstruction error of C:", np.linalg.norm(C - C_reconstructed))


# Reconstruct the original X: rotate back with W.T, then add the mean again
X_reconstructed = X_transformed @ W.T + np.mean(X, axis=0)

# Check
print("Original X equal to reconstructed X?",
      np.allclose(X, X_reconstructed))
print("Reconstruction error:",
      np.linalg.norm(X - X_reconstructed))

# VARIANCES: the sorted eigenvalues are the variances of the principal components




W shape: (13, 13)
Sorted eigenvalues: [9.92017895e+04 1.72535266e+02 9.43811370e+00 4.99117861e+00
 1.22884523e+00]
Reconstruction error of C: 1.786931010719055e-11
Original X equal to reconstructed X? True
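
The eigenvalues of the covariance matrix are the variances of the principal components. The snippet below is a minimal, self-contained sketch of that check. It reloads the data, and it uses `np.linalg.eigh` and `np.cov` instead of the calls in the cell above; both are choices of this sketch rather than part of the original solution.

```python
import numpy as np
from sklearn.datasets import load_wine

X = load_wine().data
X_centered = X - X.mean(axis=0)

# Sample covariance matrix (np.cov divides by n - 1 by default).
C = np.cov(X_centered, rowvar=False)

# eigh is intended for symmetric matrices and returns ascending eigenvalues,
# so both arrays are reversed to get a descending order.
eigenvalues, W = np.linalg.eigh(C)
eigenvalues = eigenvalues[::-1]
W = W[:, ::-1]

# Principal component scores: centered data projected onto the eigenvectors.
scores = X_centered @ W

# Variance of each score column; ddof=1 matches the 1 / (n - 1) factor in C.
score_variances = scores.var(axis=0, ddof=1)

# Fraction of the total variance carried by each component.
explained_ratio = eigenvalues / eigenvalues.sum()

variances_match = np.allclose(score_variances, eigenvalues)
print("Score variances equal eigenvalues?", variances_match)
print("Explained variance ratio:", explained_ratio)
```

The explained variance ratio is the quantity usually inspected when choosing how many components to keep.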


In [12]:
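X_centered = X - np.mean(X, axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# U  = left singular vectors (matrix U)
# s  = singular values (the diagonal of Σ), not eigenvalues; s**2 / (n - 1) are the variances
# Vt = right singular vectors, transposed (matrix V^T); its rows are the principal directions
# Sanity check: the three factors reproduce the centered data, X_centered = U Σ V^T
assert np.allclose(X_centered, U @ np.diag(s) @ Vt)

print("U:", U.shape)
print("s:", s.shape)
print("Vt:", Vt.shape)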



U: (178, 13)
s: (13,)
Vt: (13, 13)
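
Task b asks for two functions, while the cells above run the steps inline. One possible way to wrap them is sketched below. The names `pca_eig` and `pca_svd`, the return signature, and the synthetic matrix `X_demo` are assumptions made for illustration only; the same functions can be called on the wine matrix `X`.

```python
import numpy as np


def pca_eig(X):
    """PCA via eigendecomposition of the sample covariance matrix."""
    X_centered = X - X.mean(axis=0)
    C = X_centered.T @ X_centered / (X.shape[0] - 1)
    # eigh is intended for symmetric matrices such as C.
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    order = np.argsort(eigenvalues)[::-1]  # descending variance
    variances = eigenvalues[order]
    W = eigenvectors[:, order]  # columns are the principal directions
    return X_centered @ W, variances, W


def pca_svd(X):
    """PCA via singular value decomposition of the centered data."""
    X_centered = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    variances = s**2 / (X.shape[0] - 1)  # singular values -> variances
    scores = U * s  # same as X_centered @ Vt.T
    return scores, variances, Vt.T


# Hypothetical stand-in data with very different column scales.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4)) * np.array([10.0, 3.0, 1.0, 0.3])

scores_eig, var_eig, W_eig = pca_eig(X_demo)
scores_svd, var_svd, W_svd = pca_svd(X_demo)

# Each component is defined only up to sign, so compare absolute values.
same_variances = np.allclose(var_eig, var_svd)
same_scores = np.allclose(np.abs(scores_eig), np.abs(scores_svd))
print("Same variances:", same_variances)
print("Same scores (up to sign):", same_scores)
```

Both functions return the scores, the component variances, and the matrix of principal directions, so their outputs can be compared entry by entry.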


c. Apply the two implementations of the PCA transformation to the wine data and verify that they yield the same result (compare the variances of the principal components and the principal component scores). As a further check, compare your own PCA implementations with the built-in one from sklearn. Are the results still the same?

In [13]:
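# Compare the PCA from the eigendecomposition with the PCA obtained via SVD: first the variances of the components, then the scores.
variances_svd = s**2 / (X.shape[0] - 1)  # singular values -> variances
scores_svd = U * s                       # SVD scores, same as X_centered @ Vt.T
print("Variances eig vs svd equal?", np.allclose(eigenvalues, variances_svd, rtol=1e-6, atol=1e-8))
# Each component is defined only up to sign, so compare absolute values
print("Scores eig vs svd equal?", np.allclose(np.abs(X_transformed), np.abs(scores_svd), rtol=1e-6, atol=1e-6))


Variances eig vs svd equal? True
Scores eig vs svd equal? True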
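
The last part of task c compares the hand-written PCA with the built-in one from sklearn. A minimal, self-contained sketch of that comparison follows. It recomputes the SVD-based PCA so that it runs on its own, and the choice of `svd_solver="full"` is an assumption of this sketch, made so that sklearn also uses an exact SVD.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA

X = load_wine().data
X_centered = X - X.mean(axis=0)

# Own implementation via SVD of the centered data.
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
scores_svd = U * s
variances_svd = s**2 / (X.shape[0] - 1)

# Built-in PCA; with n_components left unset it keeps all 13 components.
pca = PCA(svd_solver="full")
scores_sklearn = pca.fit_transform(X)

same_variances = np.allclose(variances_svd, pca.explained_variance_)

# sklearn applies its own sign convention to the components,
# so the scores are compared in absolute value.
same_scores = np.allclose(np.abs(scores_svd), np.abs(scores_sklearn))

print("Variances equal to sklearn?", same_variances)
print("Scores equal to sklearn (up to sign)?", same_scores)
```

A direct `np.allclose(scores_svd, scores_sklearn)` can fail even when both implementations are correct, because each principal direction is defined only up to a sign flip.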
