# Dimensionality Reduction
Source: [Saturn Cloud blog post on PCA, Truncated SVD, and SVDS in NumPy and sklearn](https://saturncloud.io/blog/what-is-the-difference-between-pca-truncated-svd-and-svds-in-numpy-and-sklearn/)

In [19]:
# Imports
import numpy as np

from numpy.linalg import eig
from sklearn.decomposition import PCA

from numpy.linalg import svd
from sklearn.decomposition import TruncatedSVD
from sklearn.utils.extmath import randomized_svd

from scipy.sparse.linalg import svds
from scipy.sparse import csc_matrix

In [20]:
# Preamble
# Mock-up sample array
# Example 1: 2 features, 3 samples
#X = np.array([
#	[1, 2],
#	[3, 4],
#	[5, 6]
#])

# Example 2: 3 features, 4 samples
X = np.array([
	[1, 2, 3],
	[3, 4, 5],
	[5, 6, 7],
	[7, 8, 9]
])

# Number of components (dimensions) to keep after reduction
n_dim = 2

print(X)

[[1 2 3]
 [3 4 5]
 [5 6 7]
 [7 8 9]]


## Concept

In [21]:
# Calculate covariance matrix
cov_matrix = np.cov(X.T)
print('Covariance:\n', cov_matrix)

# Calculate eigenvalues and eigenvectors
eigenvalues, eigenvectors = eig(cov_matrix)
print()
print('Eigen values:\n', eigenvalues)
print('Eigen vectors:\n', eigenvectors)

# Sort eigenvectors by eigenvalues
sorted_indices = eigenvalues.argsort()[::-1]
print()
print('Sorted indices of eigen values:\n', sorted_indices)
eigenvalues = eigenvalues[sorted_indices]
print('Eigen values:\n', eigenvalues)
eigenvectors = eigenvectors[:, sorted_indices]
print('Eigen vectors:\n', eigenvectors)

# Project data onto the first n_dim principal components
# Note: X is not mean-centered here, so the result differs from sklearn's PCA
# by a constant offset per component (and possibly by sign)
transformed = X.dot(eigenvectors[:, :n_dim])
print()
print('Transformed data:\n', transformed)

Covariance:
 [[6.66666667 6.66666667 6.66666667]
 [6.66666667 6.66666667 6.66666667]
 [6.66666667 6.66666667 6.66666667]]

Eigen values:
 [ 2.00000000e+01 -1.95279746e-15  9.00776117e-32]
Eigen vectors:
 [[ 5.77350269e-01  8.12978835e-01 -6.90899878e-17]
 [ 5.77350269e-01 -4.72056641e-01  7.07106781e-01]
 [ 5.77350269e-01 -3.40922194e-01 -7.07106781e-01]]

Sorted indices of eigen values:
 [0 2 1]
Eigen values:
 [ 2.00000000e+01  9.00776117e-32 -1.95279746e-15]
Eigen vectors:
 [[ 5.77350269e-01 -6.90899878e-17  8.12978835e-01]
 [ 5.77350269e-01  7.07106781e-01 -4.72056641e-01]
 [ 5.77350269e-01 -7.07106781e-01 -3.40922194e-01]]

Transformed data:
 [[ 3.46410162 -0.70710678]
 [ 6.92820323 -0.70710678]
 [10.39230485 -0.70710678]
 [13.85640646 -0.70710678]]


## PCA: Principal Component Analysis

In [22]:
pca = PCA(n_components=n_dim)
transformed = pca.fit_transform(X)
print('Transformed data:\n', transformed)

Transformed data:
 [[ 5.19615242e+00  2.38225952e-17]
 [ 1.73205081e+00 -6.49707143e-18]
 [-1.73205081e+00  6.49707143e-18]
 [-5.19615242e+00  1.94912143e-17]]
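The eigen-decomposition steps from the Concept section can be tied back to `PCA` with a small check. The following is a minimal sketch, assuming the same mock-up array `X`; the names `X_centered`, `components`, `manual`, and `reference` are illustrative and not part of the original notebook. It centers the data first (which `PCA` does internally) and compares the first component up to sign, since the sign of an eigenvector is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]], dtype=float)
n_dim = 2

# Center each feature (column) at zero, as PCA does before decomposing
X_centered = X - X.mean(axis=0)

# eigh is intended for symmetric matrices such as a covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_centered.T))

# Sort eigenvectors by descending eigenvalue and keep the first n_dim
order = np.argsort(eigenvalues)[::-1]
components = eigenvectors[:, order[:n_dim]]

# Project the centered data onto the selected components
manual = X_centered @ components
reference = PCA(n_components=n_dim).fit_transform(X)

# Compare the first component up to sign
print(np.allclose(np.abs(manual[:, 0]), np.abs(reference[:, 0])))
```

Because every row of `X` differs from the others by a multiple of `[1, 1, 1]`, all of the variance lies along one direction, so the second component is numerically zero in both results.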


## Truncated SVD: Truncated Singular Value Decomposition

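NumPy has no dedicated truncated SVD routine. Compute the thin SVD with numpy.linalg.svd, then keep only the n_dim largest singular values and their vectors:

In [23]:
U, S, Vt = svd(X, full_matrices=False)
# Truncate: keep the n_dim largest singular values (returned in descending order) and their vectors
U_k, S_k, Vt_k = U[:, :n_dim], S[:n_dim], Vt[:n_dim, :]
# The reduced (n_samples, n_dim) representation would be U_k * S_k.
# Here we build the rank-n_dim reconstruction in the original feature space;
# X has rank 2, so with n_dim = 2 the reconstruction reproduces X.
reconstructed = U_k.dot(np.diag(S_k)).dot(Vt_k)
print('Reconstructed data:\n', reconstructed)

Reconstructed data:
 [[1. 2. 3.]
 [3. 4. 5.]
 [5. 6. 7.]
 [7. 8. 9.]]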
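To make the difference between the reduced representation and the reconstruction explicit, here is a minimal sketch, assuming the same mock-up array `X`. The names `reduced`, `projected`, and `approx` are illustrative. It shows that scaling the leading left singular vectors by their singular values gives the same result as projecting `X` onto the leading right singular vectors.

```python
import numpy as np

X = np.array([[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]], dtype=float)
n_dim = 2

# Thin SVD: X = U @ diag(S) @ Vt, with S in descending order
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Keep the n_dim largest singular values and their vectors
U_k, S_k, Vt_k = U[:, :n_dim], S[:n_dim], Vt[:n_dim, :]

# Reduced representation, shape (n_samples, n_dim)
reduced = U_k * S_k

# Equivalent: project X onto the leading right singular vectors
projected = X @ Vt_k.T

# Rank-n_dim approximation of X in the original feature space
approx = reduced @ Vt_k

print(reduced.shape)
print(np.allclose(reduced, projected))
print(np.allclose(approx, X))
```

The last check succeeds only because this particular `X` has rank 2; for a general matrix, `approx` is the best rank-`n_dim` approximation rather than an exact copy.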


In scikit-learn, truncated SVD is provided by the TruncatedSVD class in sklearn.decomposition. Unlike PCA, it does not center the data before decomposing it:

In [24]:
# Use a distinct name so the svd function imported from numpy.linalg is not shadowed
tsvd = TruncatedSVD(n_components=n_dim)
transformed = tsvd.fit_transform(X)
print('Transformed data:\n', transformed)

Transformed data:
 [[ 3.62361288e+00  9.32432124e-01]
 [ 7.05585006e+00  4.63659289e-01]
 [ 1.04880872e+01 -5.11354618e-03]
 [ 1.39203244e+01 -4.73886381e-01]]
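The output above differs from the PCA output because `TruncatedSVD` works on the data as given, while `PCA` centers it first. The sketch below is one way to check that relationship, assuming the same mock-up array `X`; the variable names and the fixed `random_state=0` are choices made for this illustration. Only the first component is compared, and only up to sign.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

X = np.array([[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]], dtype=float)

# Center each feature manually; TruncatedSVD does not do this itself
X_centered = X - X.mean(axis=0)

pca_scores = PCA(n_components=1).fit_transform(X)
tsvd_centered = TruncatedSVD(n_components=1, random_state=0).fit_transform(X_centered)
tsvd_raw = TruncatedSVD(n_components=1, random_state=0).fit_transform(X)

# On centered data the two methods agree up to sign
print(np.allclose(np.abs(pca_scores), np.abs(tsvd_centered)))

# On the raw data they do not
print(np.allclose(np.abs(pca_scores), np.abs(tsvd_raw)))
```

Skipping the centering step is what lets `TruncatedSVD` accept sparse matrices without turning them dense.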


## SVDS: Partial Singular Value Decomposition

In [25]:
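# randomized_svd (scikit-learn) computes an approximate partial SVD with n_dim components;
# the svds function imported above comes from scipy.sparse.linalg
U, S, Vt = randomized_svd(X, n_components=n_dim)
# Rank-n_dim reconstruction; X has rank 2, so with n_dim = 2 it reproduces X
reconstructed = U.dot(np.diag(S)).dot(Vt)
print('Reconstructed data:\n', reconstructed)

Reconstructed data:
 [[1. 2. 3.]
 [3. 4. 5.]
 [5. 6. 7.]
 [7. 8. 9.]]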
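The cell above uses `randomized_svd`, while the imports also bring in `svds` and `csc_matrix` from SciPy. A minimal sketch of the SciPy route might look like the following, assuming the same mock-up array `X` converted to floating point (`svds` works on floating-point data and requires `k` to be smaller than the smallest dimension of the matrix). The singular values are sorted explicitly here rather than relying on the order in which `svds` returns them.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import svds

X = np.array([[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]], dtype=float)
n_dim = 2

# Store X in a sparse format for illustration; this toy matrix has no zeros,
# so nothing is gained here, but real use cases are large and mostly zero
X_sparse = csc_matrix(X)

# Compute only the n_dim largest singular values and their vectors
U, S, Vt = svds(X_sparse, k=n_dim)

# Sort the singular triplets by descending singular value
order = np.argsort(S)[::-1]
U, S, Vt = U[:, order], S[order], Vt[order, :]

# Reduced representation and rank-n_dim approximation
reduced = U * S
approx = reduced @ Vt

print(reduced.shape)
print(np.allclose(approx, X))
```

As with the NumPy version, the approximation matches `X` here only because this `X` has rank 2.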




PCA, Truncated SVD, and SVDS are all linear algebra techniques used for dimensionality reduction, but they differ in their approach.
- PCA centers the data, finds the orthogonal directions of greatest variance (the principal components), and projects the data onto the leading ones.
- Truncated SVD keeps only the largest singular values and their vectors from a decomposition of the data matrix itself, without centering, which makes it suitable for sparse input.
- SVDS (scipy.sparse.linalg.svds) computes only the k largest singular values and vectors of a large or sparse matrix, for cases where a full decomposition is not feasible.