## Feature Union

```python
import pandas as pd
import numpy as np

# Generating a random dataset with 10 rows and 4 columns
np.random.seed(42)  # For reproducibility
data = np.random.randn(10, 4)

# Creating a DataFrame and naming the columns
df = pd.DataFrame(data, columns=['f1', 'f2', 'f3', 'y'])

df.head()
```

|   | f1 | f2 | f3 | y |
|---|---|---|---|---|
| 0 | 0.496714 | -0.138264 | 0.647689 | 1.52303 |
| 1 | -0.234153 | -0.234137 | 1.579213 | 0.767435 |
| 2 | -0.469474 | 0.54256 | -0.463418 | -0.46573 |
| 3 | 0.241962 | -1.91328 | -1.724918 | -0.562288 |
| 4 | -1.012831 | 0.314247 | -0.908024 | -1.412304 |


```python
from sklearn.pipeline import FeatureUnion
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
```

```python
# Define the FeatureUnion as a list of (name, transformer) pairs
feature_union = FeatureUnion([
    ('scaler', StandardScaler()),  # Apply StandardScaler
    ('pca', PCA(n_components=2))   # Apply PCA, reduce to 2 components
])

# scaler and pca are applied in parallel: both receive the same input
# features, and their outputs are concatenated column-wise
```
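The claim that the two transformers run independently on the same input can be checked directly. The sketch below rebuilds the same dataset. It then compares the union's output with fitting `StandardScaler` and `PCA` separately and stacking the results with `np.hstack`. The names `X`, `X_union` and `X_manual` are illustrative choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Rebuild the same random dataset used above
np.random.seed(42)
df = pd.DataFrame(np.random.randn(10, 4), columns=['f1', 'f2', 'f3', 'y'])
X = df.drop(columns=['y'])

# FeatureUnion: each transformer is fit on the same X, outputs stacked side by side
union = FeatureUnion([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=2)),
])
X_union = union.fit_transform(X)

# Manual equivalent: fit each transformer on X independently, then concatenate columns
X_manual = np.hstack([
    StandardScaler().fit_transform(X),
    PCA(n_components=2).fit_transform(X),
])

print(X_union.shape)                   # (10, 5): 3 scaled columns + 2 PCA components
print(np.allclose(X_union, X_manual))  # True
```

`FeatureUnion` adds bookkeeping on top of this manual stacking: it prefixes feature names, can run the branches in parallel via `n_jobs`, and plugs into a `Pipeline` as a single step.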

```python
# fit_transform returns a new feature space: 3 standardized columns from
# `scaler` plus 2 principal components from `pca`
X_transformed = feature_union.fit_transform(df.drop(columns=['y']))

pd.DataFrame(X_transformed, columns=feature_union.get_feature_names_out())
```

|   | scaler__f1 | scaler__f2 | scaler__f3 | pca__pca0 | pca__pca1 |
|---|---|---|---|---|---|
| 0 | 0.815293 | 0.41836 | 0.947878 | 1.025659 | -0.425413 |
| 1 | -0.282292 | 0.302777 | 1.873701 | 1.772532 | -0.358223 |
| 2 | -0.635686 | 1.239158 | -0.156427 | 0.327888 | 1.038742 |
| 3 | 0.432718 | -1.721587 | -1.410206 | -1.911072 | -0.68996 |
| 4 | -1.451676 | 0.963905 | -0.598312 | -0.193153 | 1.371662 |
| 5 | 2.270396 | 0.312856 | 0.371269 | 0.51176 | -0.891133 |
| 6 | -0.74818 | 0.718778 | -0.839795 | -0.48428 | 1.020731 |
| 7 | -0.832663 | 0.233387 | -0.29387 | -0.191723 | 0.583958 |
| 8 | 0.04908 | -0.690119 | 1.121664 | 0.726878 | -0.811461 |
| 9 | 0.383011 | -1.777515 | -1.015903 | -1.584488 | -0.838903 |


- **Creating the Feature Union:**
  - `FeatureUnion` is used to apply multiple transformations in parallel and combine their outputs into a single feature set.

- **FeatureUnion Definition:**
  - **`FeatureUnion([('scaler', StandardScaler()), ('pca', PCA(n_components=2))])`:**
    - **Transformations:** a list of `(name, transformer)` tuples. Each name becomes a prefix in the output feature names.
      - **`('scaler', StandardScaler())`:**
        - **Name:** `scaler`
        - **Transformation:** `StandardScaler()`
        - **Purpose:** Standardizes each feature by removing the mean and scaling to unit variance.

      - **`('pca', PCA(n_components=2))`:**
        - **Name:** `pca`
        - **Transformation:** `PCA(n_components=2)`
        - **Purpose:** Reduces the dimensionality of the data to 2 principal components. Every transformer in a `FeatureUnion` receives the same input, so PCA is fit on the original (unscaled) features, not on the output of `scaler`.

- **Transforming the Data:**
  - `fit_transform(df.drop(columns=['y']))`: Fits each transformer on the feature columns `f1`, `f2`, `f3` (the target column `y` is excluded). It returns their outputs concatenated column-wise as a NumPy array.
  - `pd.DataFrame(...)`: Wraps the transformed array in a new DataFrame with readable column names.
  - `get_feature_names_out()`: Returns the output feature names. Each name is prefixed with its transformer's name and a double underscore (for example `scaler__f1` and `pca__pca0`).

- **Output:**
  - The transformed DataFrame has the following features:
    - **`scaler__f1`**
    - **`scaler__f2`**
    - **`scaler__f3`**
    - **`pca__pca0`**
    - **`pca__pca1`**

  - The table above shows the full 10-row result. Columns prefixed `scaler__` come from scaling, and columns prefixed `pca__` come from PCA.
