# Unsupervised Feature Extraction Techniques

## 2. Independent Component Analysis (ICA)

### 🔹 What is ICA?

- PCA finds new axes (directions) that maximize variance (spread of data).
- ICA finds new axes that make the components statistically independent.

👉 Independence means: knowing one component gives you no information about the other.
PCA only makes the components uncorrelated; ICA goes further and makes them statistically independent, which is a stronger condition.
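Formally, two variables a and b are independent when p(a, b) = p(a) p(b), while "uncorrelated" only says their linear correlation is zero. A quick way to see the gap is a toy example (an illustrative sketch with made-up data, not part of the original notebook): take x uniform on [-1, 1] and y = x². Their correlation is close to 0, yet y is completely determined by x.

```python
import numpy as np

# Made-up toy data: x is symmetric around 0, and y is a deterministic function of x
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100_000)
y = x ** 2

# The linear (Pearson) correlation between x and y is close to 0, so they are "uncorrelated"...
print(f"corr(x, y)    = {np.corrcoef(x, y)[0, 1]:+.3f}")

# ...but they are NOT independent: once you know x, you know y exactly.
print(f"corr(x**2, y) = {np.corrcoef(x ** 2, y)[0, 1]:+.3f}")
```

PCA would be satisfied with such a pair; ICA looks for components where even this kind of nonlinear dependence is absent.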

### 🔹 Simple Real-World Example: The "Cocktail Party Problem"

#### Imagine two people (Speaker A and Speaker B) talking at the same time in a room where:

- 🎤 Microphone 1 records a mix of both voices.
- 🎤 Microphone 2 also records a mix of both voices, but with different weights (for example, because it is closer to one of the speakers).

- Each recording is a mixed signal, so you cannot tell whose voice is whose.
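Written as equations (a sketch in our own notation: s₁(t) and s₂(t) are the two voices, x₁(t) and x₂(t) are the two microphone recordings, and the a_ij are unknown mixing weights), each microphone records a weighted sum of the voices:

$$
\begin{aligned}
x_1(t) &= a_{11}\,s_1(t) + a_{12}\,s_2(t)\\
x_2(t) &= a_{21}\,s_1(t) + a_{22}\,s_2(t)
\end{aligned}
\qquad\Longleftrightarrow\qquad
\mathbf{x}(t) = A\,\mathbf{s}(t)
$$

ICA tries to estimate an unmixing matrix $W \approx A^{-1}$ (up to the order and scale of its rows) so that $\hat{\mathbf{s}}(t) = W\,\mathbf{x}(t)$ gives back the individual voices.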

👉 ICA can separate the voices into independent signals:

- One component = Speaker A's voice
- Another component = Speaker B's voice

This is why ICA is widely used in signal processing (EEG brain signals, audio separation, etc.).

### 🔹 Where is ICA Used?

- EEG/MEG brain-signal analysis → separating brain activity from noise and artifacts.
- Audio processing → separating musical instruments or voices.
- Image processing → extracting independent features from images.
- Finance → separating independent sources of risk in stock data.

#### ✅ In short:

- PCA → “compress data while keeping maximum variance.”
- ICA → “unmix data into independent sources.”
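To make the PCA-vs-ICA contrast concrete, here is a small sketch using scikit-learn's `PCA` and `FastICA` on the same kind of mixed sine/square signals used in the worked example that follows. The helper `best_match` is a hypothetical name introduced only for this illustration: for each true source, it reports the highest absolute correlation with any recovered component. Exact numbers depend on the library version, so none are shown here.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# Toy setup: a sine and a square wave (plus noise), mixed by a fixed 2x2 matrix
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]
S += 0.2 * rng.normal(size=S.shape)
A = np.array([[1, 1], [0.5, 2]])
X = S @ A.T  # each row is A @ s(t)

S_pca = PCA(n_components=2).fit_transform(X)
S_ica = FastICA(n_components=2, random_state=0).fit_transform(X)

def best_match(components, sources):
    """Hypothetical helper: for each source, the largest |correlation| with any component."""
    k = sources.shape[1]
    corr = np.corrcoef(sources.T, components.T)[:k, k:]  # sources (rows) vs components (cols)
    return np.abs(corr).max(axis=1)

print("PCA:", best_match(S_pca, S).round(3))
print("ICA:", best_match(S_ica, S).round(3))
```

PCA's components are uncorrelated but generally remain blends of both sources, while each ICA component should line up closely with a single source.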

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import FastICA
```

```python
# Step 1: Generate two independent signals
np.random.seed(0)
n_samples = 2000
time = np.linspace(0, 8, n_samples)
```

```python
# Independent sources (like voices/music)
s1 = np.sin(2 * time)           # Signal 1: sine wave
s2 = np.sign(np.sin(3 * time))  # Signal 2: square wave
S = np.c_[s1, s2]               # Stack as columns: shape (n_samples, 2)
```

```python
# Add noise
S = S + 0.2 * np.random.normal(size=S.shape)
```

```python
# Step 2: Mix the signals (like microphones capturing mixed sounds)
A = np.array([[1, 1], [0.5, 2]])  # Mixing matrix
X = S.dot(A.T)                    # Mixed signals: each row is A @ s(t)
```

```python
X
```

```
array([[ 0.43284191,  0.33646812],
       [ 1.65193015,  2.99823304],
       [ 1.19406334,  1.80384831],
       ...,
       [-1.29432521, -2.29832957],
       [-0.93043174, -1.9743876 ],
       [-1.13504609, -2.31136368]])
```

```python
# Step 3: Apply ICA to recover original signals
ica = FastICA(n_components=2, random_state=0)
S_ica = ica.fit_transform(X)  # Independent components
A_ica = ica.mixing_           # Estimated mixing matrix
```

```python
S_ica
```

```
array([[ 0.02559942,  0.33585939],
       [ 1.37381763,  0.0447254 ],
       [ 0.73540806,  0.34487582],
       ...,
       [-1.1137819 , -0.36539436],
       [-1.03053095,  0.00893473],
       [-1.18235734, -0.04979883]])
```

```python
A_ica
```

```
array([[1.06308671, 0.73569087],
       [2.06212272, 0.40677657]])
```
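A useful sanity check (a sketch, assuming a scikit-learn version where `FastICA` whitens by default and therefore sets `mean_`): `FastICA` centers `X` before unmixing, so multiplying the recovered components by the estimated mixing matrix and adding the mean back should rebuild the observed signals. With `n_components` equal to the number of mixed signals, nothing is discarded, so the match should be exact up to floating-point error. The cell regenerates the data so it can run on its own.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Rebuild the notebook's data so this cell is self-contained
np.random.seed(0)
time = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * time), np.sign(np.sin(3 * time))]
S = S + 0.2 * np.random.normal(size=S.shape)
A = np.array([[1, 1], [0.5, 2]])
X = S.dot(A.T)

ica = FastICA(n_components=2, random_state=0)
S_ica = ica.fit_transform(X)
A_ica = ica.mixing_

# Undo the unmixing: components times mixing matrix, plus the mean removed during centering
X_rebuilt = S_ica @ A_ica.T + ica.mean_
print(np.allclose(X, X_rebuilt))  # → True
```

`ica.inverse_transform(S_ica)` performs the same reconstruction in one call.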

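Intuition:

- Suppose your dataset is like fruit juice (orange + apple + grape mixed).
- ICA extracts the pure juices (the independent sources, S_ica).
- The mixing matrix (A_ica) tells you how much of each pure juice went into making the original mixture.
- ICA recovers the sources only up to order, sign, and scale. Here A_ica is roughly the true A = [[1, 1], [0.5, 2]] with its columns swapped and rescaled.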
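Because ICA cannot tell which source is "first" or how loud it originally was, the columns of S_ica can come out in a different order, with flipped sign and a different scale. One way to line them up with the true sources for comparison is sketched below; matching by absolute correlation is our own choice for this illustration, not part of ICA itself, and the variables `order`, `signs`, and `S_aligned` are hypothetical names.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Rebuild the notebook's data so this cell is self-contained
np.random.seed(0)
time = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * time), np.sign(np.sin(3 * time))]
S = S + 0.2 * np.random.normal(size=S.shape)
A = np.array([[1, 1], [0.5, 2]])
X = S.dot(A.T)
S_ica = FastICA(n_components=2, random_state=0).fit_transform(X)

# Correlation between true sources (rows) and recovered components (columns)
corr = np.corrcoef(S.T, S_ica.T)[:2, 2:]
print(np.abs(corr).round(3))

order = np.abs(corr).argmax(axis=1)         # which component best matches each source
signs = np.sign(corr[np.arange(2), order])  # -1 where a component came out inverted

# Reorder and flip the components, then rescale each to its source's standard deviation
S_aligned = S_ica[:, order] * signs
S_aligned = S_aligned / S_aligned.std(axis=0) * S.std(axis=0)
```

After this alignment, plotting `S_aligned` next to `S` makes it easy to compare each recovered signal with the source it came from.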

```python
import matplotlib.pyplot as plt

# Step 4: Plot (run after the cells above, which define S, X, and S_ica)
plt.figure(figsize=(10, 7))

plt.subplot(3, 1, 1)
plt.title("Original Independent Signals (Sources)")
plt.plot(S)

plt.subplot(3, 1, 2)
plt.title("Mixed Signals (Observed by Microphones)")
plt.plot(X)

plt.subplot(3, 1, 3)
plt.title("Recovered Signals using ICA")
plt.plot(S_ica)  # Order, sign, and scale may differ from the original sources

plt.tight_layout()
plt.show()
```