## Transformers

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Generate some data
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=42)

# Use the transformer directly: fit learns each column's mean and std, transform applies them
X_transformed = StandardScaler().fit_transform(X)

# Fit a linear regression on the standardized features
LinearRegression().fit(X_transformed, y)
```


**This code demonstrates the use of a transformer, `StandardScaler`, to preprocess data by scaling each feature to zero mean and unit variance. The transformed data is then used to fit a linear regression model. Here the scaler is applied by hand with `fit_transform`; in a full workflow the same two steps are usually chained in a `Pipeline`, so that the scaling learned on the training data is reapplied automatically at prediction time.**
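A quick way to see what "zero mean and unit variance" means is to redo the scaling by hand. The sketch below assumes the same `make_regression` data as above and compares `StandardScaler` with the explicit formula z = (x - mean) / std. One detail worth knowing: `StandardScaler` divides by the population standard deviation (`ddof=0`), which is also NumPy's default for `np.std`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler

# Same synthetic data as in the cell above
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=42)

# fit() learns the per-column mean (mean_) and standard deviation (scale_)
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# The same computation by hand; np.std uses the population formula (ddof=0) by default
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))   # True
print(X_scaled.mean(axis=0).round(6))    # column means, approximately 0
print(X_scaled.std(axis=0).round(6))     # column standard deviations, approximately 1
```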

## Custom Transformer using FunctionTransformer

```python
import numpy as np  # needed by cube(); in the original notebook it was only imported later

def cube(x):
    """
    Custom function to cube the input data.
    
    Parameters:
    x : numpy array
        Input data to be transformed.
    
    Returns:
    numpy array
        Transformed data with each element cubed.
    """
    return np.power(x, 3)
```

```python
from sklearn.preprocessing import FunctionTransformer

# Create the custom transformer
cube_transformer = FunctionTransformer(cube)
```
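`FunctionTransformer` can also carry an inverse mapping through its `inverse_func` parameter, which is useful when predictions or features need to be mapped back to the original scale. The sketch below pairs `cube` with NumPy's cube root, `np.cbrt`; the 2×2 array is purely illustrative. With the default `check_inverse=True`, `fit` checks on a sample of the data that the two functions really undo each other and warns if they do not.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def cube(x):
    """Cube every element of x."""
    return np.power(x, 3)

# inverse_func undoes the forward transform; np.cbrt is the real-valued cube root,
# so it inverts cubing for negative inputs too
cube_transformer = FunctionTransformer(func=cube, inverse_func=np.cbrt)

X = np.array([[-2.0, 0.5],
              [1.0, 3.0]])

X_cubed = cube_transformer.fit_transform(X)         # elementwise x**3
X_back = cube_transformer.inverse_transform(X_cubed)  # elementwise cube root

print(np.allclose(X_back, X))  # True
```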


```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Generate the same data as before
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=42)

# Apply the custom transformer. FunctionTransformer is stateless, so transform()
# works without a prior fit(); fit_transform(X) is the more conventional call.
X_transformed = cube_transformer.transform(X)

# Fit a linear regression on the cubed features
LinearRegression().fit(X_transformed, y)
```
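The cells above apply the transformer and fit the model as separate steps. A sketch of the same workflow chained in a `Pipeline` follows; the extra `StandardScaler` step after cubing is an illustrative addition, not part of the original example, included because cubed features can differ widely in magnitude. The step names `"cube"`, `"scale"`, and `"regress"` are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

def cube(x):
    """Cube every element of x."""
    return np.power(x, 3)

X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=42)

# Pipeline.fit calls fit_transform on each intermediate step in order,
# then fit on the final estimator
model = Pipeline([
    ("cube", FunctionTransformer(cube)),
    ("scale", StandardScaler()),
    ("regress", LinearRegression()),
])
model.fit(X, y)

# predict() reapplies the fitted cube -> scale steps before calling the regressor
predictions = model.predict(X[:5])
print(predictions.shape)  # (5,)
```

Keeping every step inside the pipeline means the scaler's statistics are learned once during `fit` and reused unchanged for any new data passed to `predict`.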


### Explanation

This code demonstrates the creation and usage of a custom transformer using `FunctionTransformer` from scikit-learn. The transformer, defined by the `cube` function, cubes each element of the input data. Here’s a breakdown of the process:

1. **Custom Function (`cube`)**:
   - The `cube` function takes an input array `x` and returns each element cubed using NumPy's `np.power` function.

2. **Creating the Transformer**:
   - `FunctionTransformer(cube)` initializes a transformer instance that applies the `cube` function to its input data.

3. **Generating Sample Data**:
   - `make_regression` generates synthetic data (`X`, `y`) with 100 samples and 2 features for regression modeling; `noise=0.1` adds Gaussian noise with standard deviation 0.1 to the target.

4. **Transforming Data**:
   - `X_transformed = cube_transformer.transform(X)` applies the custom transformation to the input data `X`, resulting in `X_transformed`.

5. **Fitting a Model**:
   - `LinearRegression().fit(X_transformed, y)` fits a linear regression model to the transformed data `X_transformed` and target `y`.

6. **Purpose**:
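   - This code illustrates how custom transformers can preprocess data in specific ways before a model is fitted. Cubing the features lets a linear model capture a cubic relationship between each feature and the target. Note that `make_regression` produces a target that is linear in `X`, so on this particular data the cube transform makes the fit worse rather than better; the example only demonstrates the mechanics.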
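To check how much the transform matters on this data, the sketch below compares the in-sample R² (the value returned by `LinearRegression.score`) with and without cubing. It assumes the same `make_regression` settings as above and uses a `lambda` in place of `cube` only to keep the block self-contained. No exact scores are given because they depend on the generated data; the raw features should score close to 1.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import FunctionTransformer

# make_regression builds y as a linear function of X plus Gaussian noise
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=42)

# Same cubing as the cube() helper above, written inline
X_cubed = FunctionTransformer(lambda x: np.power(x, 3)).fit_transform(X)

# score() returns the coefficient of determination (R^2) on the data it is given
r2_raw = LinearRegression().fit(X, y).score(X, y)
r2_cubed = LinearRegression().fit(X_cubed, y).score(X_cubed, y)

print(f"R^2 on raw features:   {r2_raw:.4f}")
print(f"R^2 on cubed features: {r2_cubed:.4f}")
```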


## Custom Transformer using Class Approach

```python
from sklearn.base import BaseEstimator, TransformerMixin
import numpy as np
```

```python
class MedianIQRScaler(BaseEstimator, TransformerMixin):
    """
    Custom transformer to scale features using median and interquartile range (IQR).
    """
    
    def __init__(self):
        self.medians_ = None
        self.iqr_ = None

    def fit(self, X, y=None):
        """
        Fit the transformer to the data by calculating medians and IQR.
        
        Parameters:
        X : numpy array
            Input data to calculate medians and IQR.
        y : None
            Ignored parameter. Compatibility with scikit-learn pipeline.
        
        Returns:
        self : object
            Returns the instance itself.
        """
        # Calculate medians and interquartile range for each feature
        self.medians_ = np.median(X, axis=0)
        Q1 = np.percentile(X, 25, axis=0)
        Q3 = np.percentile(X, 75, axis=0)
        self.iqr_ = Q3 - Q1
        
        # Handle case where IQR is 0 to avoid division by zero during transform
        self.iqr_[self.iqr_ == 0] = 1
        
        return self

    def transform(self, X):
        """
        Transform the input data using learned medians and IQR.
        
        Parameters:
        X : numpy array
            Input data to be transformed.
        
        Returns:
        X_scaled : numpy array
            Transformed data scaled using medians and IQR.
        
        Raises:
        RuntimeError : If the transformer has not been fitted yet.
        """
        # Check if fit has been called
        if self.medians_ is None or self.iqr_ is None:
            raise RuntimeError("The transformer has not been fitted yet.")
        
        # Scale features using median and IQR learned during fit
        X_scaled = (X - self.medians_) / self.iqr_
        
        return X_scaled
```

```python
from sklearn.datasets import make_blobs

# Generate synthetic data
X, _ = make_blobs(n_samples=100, n_features=2, centers=3, random_state=42)

# Initialize the transformer
scaler = MedianIQRScaler()

# Fit the scaler to the data
scaler.fit(X)

# Transform the data
X_scaled = scaler.transform(X)

# Print the first few rows of the transformed data
print("Transformed data (first 5 rows):")
print(X_scaled[:5])
```

```text
Transformed data (first 5 rows):
[[-0.49872679 -0.71613207]
 [ 0.78423675 -0.08192868]
 [-0.03656645  0.52987512]
 [ 0.84159877 -0.09379661]
 [-0.3814692  -0.57206564]]
```
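As a cross-check, scikit-learn's built-in `RobustScaler` centers by the median and scales by the range between two percentiles, controlled by `quantile_range` (25th to 75th by default). On the same `make_blobs` data it should therefore reproduce the median/IQR scaling. The sketch recomputes that scaling by hand rather than reusing `MedianIQRScaler`, so it runs on its own.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import RobustScaler

X, _ = make_blobs(n_samples=100, n_features=2, centers=3, random_state=42)

# Median/IQR scaling computed by hand, mirroring MedianIQRScaler.fit/transform
medians = np.median(X, axis=0)
iqr = np.percentile(X, 75, axis=0) - np.percentile(X, 25, axis=0)
iqr[iqr == 0] = 1  # same zero-IQR guard as the custom class
X_manual = (X - medians) / iqr

# RobustScaler with its default 25th-75th percentile range, spelled out explicitly
X_robust = RobustScaler(quantile_range=(25.0, 75.0)).fit_transform(X)

print(np.allclose(X_manual, X_robust))  # True
print(X_robust[:5])
```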


### Explanation

This code illustrates the creation and usage of a custom transformer using a class approach in scikit-learn. The `MedianIQRScaler` transformer calculates the median and interquartile range (IQR) from the input data during the `fit` method and scales the data during the `transform` method. Here’s a breakdown of the process:

1. **Custom Transformer (`MedianIQRScaler`)**:
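   - Inherits from `BaseEstimator` (which supplies `get_params`/`set_params`) and `TransformerMixin` (which supplies `fit_transform`), making it compatible with scikit-learn pipelines.
   - `__init__` initializes the instance variables `medians_` and `iqr_` to `None`, and `transform` later checks for `None` to detect an unfitted scaler. This works, but it departs from scikit-learn's convention that `__init__` stores only hyperparameters and that trailing-underscore attributes are created in `fit`; utilities such as `sklearn.utils.validation.check_is_fitted` rely on that convention and would treat a freshly constructed `MedianIQRScaler` as already fitted.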
   
2. **Fitting the Transformer**:
   - `fit(X)` calculates the medians and IQR for each feature in the input data `X`.
   - Replaces a zero IQR (for example, a constant feature) with 1 so that `transform` never divides by zero.
   - Returns the transformer instance itself (`self`).
   
3. **Transforming Data**:
   - `transform(X)` scales the input data `X` using the medians and IQR learned during the fit.
   - Raises a `RuntimeError` if the transformer has not been fitted before transformation.
   
4. **Generating Sample Data**:
   - `make_blobs` generates synthetic data (`X`) with 100 samples, 2 features, and 3 centers for clustering.
   
5. **Using the Transformer**:
   - `MedianIQRScaler()` initializes an instance of the custom transformer.
   - `scaler.fit(X)` fits the transformer to the data `X`, calculating medians and IQR.
   - `scaler.transform(X)` transforms the data `X` using the learned medians and IQR.
   
6. **Purpose**:
   - This code demonstrates how to create a custom transformer in scikit-learn for specialized preprocessing, here scaling features with robust statistics (median and IQR) that are far less sensitive to outliers than the mean and standard deviation used by `StandardScaler`. Because the class implements `fit` and `transform`, it can be placed in a `Pipeline` like any built-in transformer.
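For the class to work smoothly with scikit-learn utilities such as `clone`, `check_is_fitted`, and hyperparameter search, a variant along the following lines may sit closer to the library's conventions. This is a sketch, not the original author's code: the class name `MedianIQRScalerV2` and the `q_low`/`q_high` hyperparameters are hypothetical additions, and the four-row array is illustrative. The math matches `MedianIQRScaler`, but `__init__` only stores hyperparameters, fitted attributes appear in `fit`, and the unfitted case raises scikit-learn's `NotFittedError` instead of `RuntimeError`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


class MedianIQRScalerV2(BaseEstimator, TransformerMixin):
    """Hypothetical median/IQR scaler following scikit-learn's estimator conventions."""

    def __init__(self, q_low=25.0, q_high=75.0):
        # Hyperparameters only; BaseEstimator.get_params() reads these by name
        self.q_low = q_low
        self.q_high = q_high

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.medians_ = np.median(X, axis=0)
        iqr = np.percentile(X, self.q_high, axis=0) - np.percentile(X, self.q_low, axis=0)
        iqr[iqr == 0] = 1.0  # constant features: avoid division by zero
        self.iqr_ = iqr
        return self

    def transform(self, X):
        # Raises sklearn.exceptions.NotFittedError if fit() has not been called yet
        check_is_fitted(self, ["medians_", "iqr_"])
        X = np.asarray(X, dtype=float)
        return (X - self.medians_) / self.iqr_


# Column 0 contains an outlier (100.0); column 1 is constant
X = np.array([[1.0, 10.0],
              [2.0, 10.0],
              [3.0, 10.0],
              [100.0, 10.0]])

scaler = MedianIQRScalerV2()

raised_before_fit = False
try:
    scaler.transform(X)
except NotFittedError:
    raised_before_fit = True

X_scaled = scaler.fit_transform(X)

# clone() builds an unfitted copy from get_params(), which depends on the __init__ convention
scaler_copy = clone(scaler)

print(raised_before_fit)  # True
print(scaler.medians_, scaler.iqr_)
```

The outlier in column 0 barely moves the median (2.5) compared with how strongly it would shift the mean, which is the practical reason to prefer median/IQR scaling on data with extreme values.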
