# Most Used Functions in Dimensionality Reduction

Dimensionality reduction techniques reduce the number of features in a dataset while retaining as much information as possible. This notebook covers six commonly used scikit-learn estimators for the task: PCA, LDA, t-SNE, NMF, FastICA, and TruncatedSVD. Each example applies one estimator to a small pandas DataFrame or NumPy array and prints the reduced data.

## 1. Principal Component Analysis (PCA)

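PCA is a widely used linear technique that projects the data onto a new orthogonal coordinate system whose axes (the principal components) are ordered by the amount of variance they capture. Keeping only the first few components reduces the number of features. In the sample data below, the three features are exact linear functions of one another, so a single component captures all of the variance and the second column of the output is floating-point noise on the order of 1e-16.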

In [1]:
# Example: Principal Component Analysis using scikit-learn
from sklearn.decomposition import PCA
import pandas as pd

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'feature3': [5, 4, 3, 2, 1]
}
df = pd.DataFrame(data)

# Apply PCA
pca = PCA(n_components=2)
reduced_data = pca.fit_transform(df)

print(f'Reduced data:\n{reduced_data}')

Reduced data:
[[ 4.89897949e+00  3.84592537e-16]
 [ 2.44948974e+00 -1.28197512e-16]
 [-0.00000000e+00 -0.00000000e+00]
 [-2.44948974e+00  1.28197512e-16]
 [-4.89897949e+00  2.56395025e-16]]
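
To check how much information each component retains, one option is to inspect `explained_variance_ratio_` on the fitted estimator. The sketch below reuses the sample data from this section and also maps the reduced data back with `inverse_transform`. Because the three features are perfectly collinear, the first component should account for essentially all of the variance.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Same sample data as the PCA example in this section
df = pd.DataFrame({
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'feature3': [5, 4, 3, 2, 1],
})

pca = PCA(n_components=2)
reduced_data = pca.fit_transform(df)

# Fraction of the total variance captured by each component
ratios = pca.explained_variance_ratio_
print(f'Explained variance ratio: {np.round(ratios, 6)}')

# Map the reduced data back to the original feature space
reconstructed = pca.inverse_transform(reduced_data)
max_error = np.abs(reconstructed - df.to_numpy()).max()
print(f'Max reconstruction error: {max_error:.2e}')
```

On real data the ratios usually decay more gradually, and their cumulative sum is a common guide for choosing `n_components`.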


## 2. Linear Discriminant Analysis (LDA)

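LDA is a supervised dimensionality reduction technique: it uses the class labels to find linear projections that maximize the separation between classes relative to the spread within each class. It can produce at most `min(n_classes - 1, n_features)` components, so the two-class example below is limited to one.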

In [2]:
# Example: Linear Discriminant Analysis using scikit-learn
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5, 6],
    'feature2': [2, 4, 6, 8, 10, 12],
    'target': [0, 1, 0, 1, 0, 1]
}
df = pd.DataFrame(data)

# Apply LDA
lda = LinearDiscriminantAnalysis(n_components=1)
X = df[['feature1', 'feature2']]
y = df['target']
reduced_data = lda.fit_transform(X, y)

print(f'Reduced data:\n{reduced_data}')

Reduced data:
[[-1.25]
 [-0.75]
 [-0.25]
 [ 0.25]
 [ 0.75]
 [ 1.25]]
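
The sample above has perfectly collinear features and alternating labels, so it shows the API more than the effect. As a rough illustration of class separation, the sketch below fits LDA on synthetic data. The two Gaussian blobs generated with NumPy are an assumption of this example and not part of the original notebook. It then compares the projected class means.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical synthetic data: two classes drawn around different centers
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2)),
    rng.normal(loc=[4.0, 4.0], scale=1.0, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

# With two classes, LDA can produce at most one component
lda = LinearDiscriminantAnalysis(n_components=1)
projected = lda.fit_transform(X, y)

# Compare the class means along the single discriminant axis
mean_0 = projected[y == 0].mean()
mean_1 = projected[y == 1].mean()
print(f'Projected shape: {projected.shape}')
print(f'Class means on the discriminant axis: {mean_0:.2f}, {mean_1:.2f}')
print(f'Training accuracy: {lda.score(X, y):.2f}')
```

A large gap between the two projected means, relative to the spread of each class, indicates that the single discriminant axis separates the classes well.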


## 3. t-SNE (t-Distributed Stochastic Neighbor Embedding)

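t-SNE is a nonlinear technique for visualizing high-dimensional data: it embeds the data in two or three dimensions so that points that are close in the original space tend to stay close in the embedding. The `perplexity` parameter roughly sets how many neighbors are considered for each point. Recent scikit-learn versions require it to be smaller than the number of samples, and the default of 30 is too large for the six-row sample below, so the example sets it to 2.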

In [7]:
import pandas as pd
from sklearn.manifold import TSNE

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5, 6],
    'feature2': [2, 4, 6, 8, 10, 12],
    'feature3': [5, 4, 3, 2, 1, 0]
}
df = pd.DataFrame(data)

# Apply t-SNE with adjusted perplexity
tsne = TSNE(n_components=2, perplexity=2, random_state=42)
reduced_data = tsne.fit_transform(df)

print(f'Reduced data:\n{reduced_data}')


Reduced data:
[[-165.49147    39.94785 ]
 [-121.394966   39.8173  ]
 [ -68.188255   39.659157]
 [ -11.135759   39.489376]
 [  42.070732   39.330883]
 [  86.16744    39.199203]]
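
t-SNE learns an embedding only for the points it was fitted on, so `TSNE` offers `fit_transform` but no separate `transform` for new data. The sketch below illustrates this on hypothetical synthetic data (two clusters generated with NumPy, not part of the original notebook) and reads the final KL divergence from the fitted estimator.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical synthetic data: two clusters of 15 points in 10 dimensions
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(15, 10)),
    rng.normal(loc=8.0, scale=1.0, size=(15, 10)),
])

# Perplexity is kept well below the number of samples (30 here)
tsne = TSNE(n_components=2, perplexity=5, random_state=42)
embedding = tsne.fit_transform(X)

print(f'Embedding shape: {embedding.shape}')
print(f'KL divergence after optimization: {tsne.kl_divergence_:.4f}')
print(f'Has a transform method: {hasattr(tsne, "transform")}')
```

For this reason t-SNE is generally used for exploration and plotting, not as a preprocessing step in a pipeline that must handle unseen samples.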


## 4. Non-Negative Matrix Factorization (NMF)

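NMF factorizes a non-negative data matrix X into two non-negative matrices, W and H, such that X ≈ WH. The rows of W, returned by `fit_transform`, are the reduced representation of each sample; H is stored in the fitted estimator's `components_` attribute.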

In [4]:
# Example: Non-Negative Matrix Factorization using scikit-learn
from sklearn.decomposition import NMF

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'feature3': [1, 3, 5, 7, 9]
}
df = pd.DataFrame(data)

# Apply NMF
nmf = NMF(n_components=2, random_state=42)
reduced_data = nmf.fit_transform(df)

print(f'Reduced data:\n{reduced_data}')

Reduced data:
[[0.58426299 0.        ]
 [1.28487377 0.01107156]
 [1.21225685 0.51167486]
 [0.26198713 1.56145624]
 [0.         2.18055511]]
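
To see the factorization behind the reduced data, the sketch below reuses the sample data from this section, reads the second factor from `components_`, and multiplies the two factors back together. The names `W` and `H` follow common NMF notation and are not from the original notebook; `max_iter=1000` is an arbitrary choice to give the solver room to converge.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import NMF

# Same sample data as the NMF example in this section
df = pd.DataFrame({
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'feature3': [1, 3, 5, 7, 9],
})

nmf = NMF(n_components=2, random_state=42, max_iter=1000)
W = nmf.fit_transform(df)  # reduced data, one row per sample
H = nmf.components_        # one row per component, one column per feature

# The product of the two factors approximates the original matrix
approx = W @ H
print(f'W shape: {W.shape}, H shape: {H.shape}')
print(f'Approximation of the original data:\n{np.round(approx, 2)}')
print(f'Reconstruction error (Frobenius norm): {nmf.reconstruction_err_:.4f}')
```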




## 5. Independent Component Analysis (ICA)

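ICA separates a multivariate signal into additive components that are as statistically independent as possible; scikit-learn implements it as `FastICA`. The sample data below is only meant to show the API. ICA is most informative when the observed features are mixtures of several independent, non-Gaussian sources.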

In [5]:
# Example: Independent Component Analysis using scikit-learn
from sklearn.decomposition import FastICA

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'feature3': [1, 3, 5, 7, 9]
}
df = pd.DataFrame(data)

# Apply ICA
ica = FastICA(n_components=2, random_state=42)
reduced_data = ica.fit_transform(df)

print(f'Reduced data:\n{reduced_data}')

Reduced data:
[[-1.41421356  1.41421356]
 [-0.70710678  0.70710678]
 [ 0.          0.        ]
 [ 0.70710678 -0.70710678]
 [ 1.41421356 -1.41421356]]
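
As a rough illustration of source separation, the sketch below mixes two independent signals and asks `FastICA` to recover them. The uniform sources and the mixing matrix are hypothetical choices for this example, not part of the original notebook. ICA recovers sources only up to order, sign, and scale, so the comparison uses absolute correlations.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(42)
n_samples = 2000

# Two independent, non-Gaussian sources (hypothetical example data)
sources = rng.uniform(-1.0, 1.0, size=(n_samples, 2))

# Each observed feature is a linear mixture of both sources
mixing = np.array([[1.0, 0.5],
                   [0.5, 1.0]])
observed = sources @ mixing.T

ica = FastICA(n_components=2, random_state=42)
recovered = ica.fit_transform(observed)

# Rows: true sources; columns: recovered components
corr = np.abs(np.corrcoef(sources.T, recovered.T)[:2, 2:])
print(f'Absolute correlations:\n{np.round(corr, 2)}')
```

Each row of the printed matrix should contain one value close to 1, meaning that every true source is matched by one recovered component.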


## 6. Truncated SVD (Singular Value Decomposition)

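Truncated SVD reduces dimensionality by keeping only the largest singular values and their corresponding vectors. Unlike PCA, it does not center the data before the decomposition, so it can work directly on sparse matrices without converting them to dense arrays.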

In [6]:
# Example: Truncated SVD using scikit-learn
from sklearn.decomposition import TruncatedSVD
import numpy as np

# Sample data
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Apply Truncated SVD (random_state seeds the default randomized solver)
svd = TruncatedSVD(n_components=2, random_state=42)
reduced_data = svd.fit_transform(data)

print(f'Reduced data:\n{reduced_data}')

Reduced data:
[[ 3.58705934  1.06442721]
 [ 8.75770068  0.55016253]
 [13.92834202  0.03589786]
 [19.09898335 -0.47836682]]
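
Since the main appeal of Truncated SVD is sparse input, the sketch below applies it to a SciPy sparse matrix. The random matrix, its size, and its density are hypothetical choices for this example, not part of the original notebook.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

# Hypothetical sparse data: 100 samples, 50 features, about 5% non-zero entries
X_sparse = sparse.random(100, 50, density=0.05, format='csr', random_state=42)

svd = TruncatedSVD(n_components=5, random_state=42)
reduced = svd.fit_transform(X_sparse)  # the input stays sparse; the output is dense

print(f'Input type: {type(X_sparse).__name__}, shape: {X_sparse.shape}')
print(f'Reduced type: {type(reduced).__name__}, shape: {reduced.shape}')
print(f'Explained variance ratio sum: {svd.explained_variance_ratio_.sum():.3f}')
```

This is the usual setup for term-document matrices, where centering the data as PCA does would destroy the sparsity.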
