### Robust Scaler

Unlike the Standard Scaler, which uses the mean and standard deviation, the Robust Scaler uses the median and the interquartile range (IQR). 

This makes it more robust to outliers, as the median and IQR are less affected by extreme values.
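To make the difference in sensitivity concrete, the sketch below computes both pairs of statistics on a small one-dimensional sample, with and without a single extreme value. The sample values and the helper name `summary` are made up for illustration.

```python
import numpy as np

# Hypothetical sample: four ordinary values, then the same values plus one outlier
clean = np.array([1.0, 2.0, 3.0, 4.0])
with_outlier = np.append(clean, 100.0)

def summary(x):
    """Return (mean, std, median, IQR) for a 1-D array."""
    q25, q75 = np.percentile(x, [25, 75])
    return x.mean(), x.std(), np.median(x), q75 - q25

mean_c, std_c, med_c, iqr_c = summary(clean)
mean_o, std_o, med_o, iqr_o = summary(with_outlier)

# Compare how far each statistic moves once the outlier is added
print(f"mean:   {mean_c:.2f} -> {mean_o:.2f}")
print(f"std:    {std_c:.2f} -> {std_o:.2f}")
print(f"median: {med_c:.2f} -> {med_o:.2f}")
print(f"IQR:    {iqr_c:.2f} -> {iqr_o:.2f}")
```

On this sample the mean and standard deviation shift by an order of magnitude, while the median and IQR each move by less than one unit.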

In [1]:
import numpy as np
from sklearn.preprocessing import RobustScaler

In [2]:
# Sample dataset
data = np.array([[1, 2],
                 [2, 3],
                 [3, 4],
                 [4, 5],
                 [100, 200]])  # Outliers

In [3]:
# Initialize the RobustScaler
scaler = RobustScaler()

# Fit and transform the data
scaled_data = scaler.fit_transform(data)

In [4]:
print("\nScaled Data:")
print(scaled_data)


Scaled Data:
[[-1.  -1. ]
 [-0.5 -0.5]
 [ 0.   0. ]
 [ 0.5  0.5]
 [48.5 98. ]]


The median is subtracted from each feature, and the result is divided by the IQR (75th percentile - 25th percentile).

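Because the median and IQR depend only on the central half of the data, a few extreme values barely change them. Note that the outlier is not removed: after scaling it still sits far from the other points (48.5 and 98 in the output above), while the remaining rows keep a sensible spread.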
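The formula can be checked by hand. The following is a minimal sketch that recomputes the scaling with NumPy on the same sample data and compares it with what `RobustScaler` returns; the variable name `manual` is just for illustration.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

data = np.array([[1, 2],
                 [2, 3],
                 [3, 4],
                 [4, 5],
                 [100, 200]], dtype=float)

# Per-feature median and interquartile range
median = np.median(data, axis=0)
q25, q75 = np.percentile(data, [25, 75], axis=0)
iqr = q75 - q25

# (x - median) / IQR, applied column by column
manual = (data - median) / iqr

scaler = RobustScaler().fit(data)
print("center_:", scaler.center_)  # medians learned by the scaler
print("scale_: ", scaler.scale_)   # IQRs learned by the scaler
print("matches manual result:", np.allclose(manual, scaler.transform(data)))
```

The fitted `center_` and `scale_` attributes hold the per-feature median and IQR, so they can be inspected after fitting to see exactly what was subtracted and divided.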

When to Use Robust Scaler?

    When your dataset contains outliers.

    When you want to scale features without being influenced by extreme values.

    When the data is skewed or heavy-tailed, so that the mean and standard deviation are poor summaries of its center and spread.

Comparison with Other Scalers:
    
    StandardScaler: Uses mean and standard deviation. Sensitive to outliers.

    MinMaxScaler: Scales data to a fixed range (e.g., [0, 1]) using the minimum and maximum. Sensitive to outliers, since those two values are the extremes themselves.

    RobustScaler: Uses median and IQR. Robust to outliers.
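One way to see this comparison in practice is to apply all three scalers to the same sample data and measure how much room the non-outlier rows are left with. This is only a sketch: the `inlier_spread` dictionary is a name chosen here, and "spread" simply means max minus min of the first four rows in the first feature.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

data = np.array([[1, 2],
                 [2, 3],
                 [3, 4],
                 [4, 5],
                 [100, 200]], dtype=float)

scalers = {
    "StandardScaler": StandardScaler(),
    "MinMaxScaler": MinMaxScaler(),
    "RobustScaler": RobustScaler(),
}

inlier_spread = {}
for name, scaler in scalers.items():
    scaled = scaler.fit_transform(data)
    inliers = scaled[:4, 0]  # first feature, rows without the outlier
    inlier_spread[name] = inliers.max() - inliers.min()
    print(f"{name:>14}: inlier spread = {inlier_spread[name]:.3f}")
```

With StandardScaler and MinMaxScaler the four ordinary rows are squeezed into a narrow band because the outlier dominates the statistics; with RobustScaler they keep a spread of 1.5.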

### Power Transform

The Power Transform is a technique used to make data more Gaussian-like (normally distributed). 

It is particularly useful for stabilizing variance and making the data more suitable for machine learning algorithms that assume normality.

In Python, we can use the PowerTransformer class from the sklearn.preprocessing module.

The two most common power transforms are:

    Yeo-Johnson Transform: Works for both positive and negative values.

    Box-Cox Transform: Works only for strictly positive values (zeros are not allowed).

In [5]:
import numpy as np
from sklearn.preprocessing import PowerTransformer

In [6]:
# Sample dataset with skewed distributions
data = np.array([[1, 2],
                 [2, 3],
                 [3, 4],
                 [4, 5],
                 [100, 200]])  # Outliers

In [9]:
transformer = PowerTransformer(method='yeo-johnson')  # or method='box-cox' for positive data

transformed_data = transformer.fit_transform(data)

In [10]:
print(transformed_data)

[[-1.42350633 -1.28926032]
 [-0.45217158 -0.52663912]
 [-0.01608132 -0.09982465]
 [ 0.24604552  0.17828052]
 [ 1.6457137   1.73744357]]


The Yeo-Johnson transform accepts positive, zero, and negative values.

The Box-Cox transform requires all values to be strictly positive. If your data contains zeros or negative values, use the Yeo-Johnson method.
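The restriction on Box-Cox can be seen directly. The sketch below uses a made-up column containing a negative value and a zero; fitting with `method='box-cox'` raises a `ValueError`, while Yeo-Johnson handles the same input. The variable names `mixed` and `box_cox_ok` are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Hypothetical data containing a negative value and a zero
mixed = np.array([[-2.0], [0.0], [1.0], [3.0], [50.0]])

try:
    PowerTransformer(method='box-cox').fit(mixed)
    box_cox_ok = True
except ValueError as err:
    # Box-Cox rejects input that is not strictly positive
    box_cox_ok = False
    print("Box-Cox failed:", err)

# Yeo-Johnson accepts the same data
yj = PowerTransformer(method='yeo-johnson').fit_transform(mixed)
print(yj.ravel())
```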

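The transform applies a monotonic power function to each feature, with the exponent chosen so that the result is as close to normally distributed as possible. By default (standardize=True), PowerTransformer also rescales the output to zero mean and unit variance, which is why the values above are centered around 0.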
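For reference, the two families are usually written as follows, where $\lambda$ is the power parameter that PowerTransformer estimates separately for each feature by maximum likelihood. The Box-Cox transform, defined for $x > 0$:

$$
x^{(\lambda)} =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0, \\[6pt]
\ln x & \text{if } \lambda = 0.
\end{cases}
$$

The Yeo-Johnson transform, defined for all real $x$:

$$
x^{(\lambda)} =
\begin{cases}
\dfrac{(x + 1)^{\lambda} - 1}{\lambda} & \text{if } x \geq 0,\ \lambda \neq 0, \\[6pt]
\ln(x + 1) & \text{if } x \geq 0,\ \lambda = 0, \\[6pt]
-\dfrac{(1 - x)^{2 - \lambda} - 1}{2 - \lambda} & \text{if } x < 0,\ \lambda \neq 2, \\[6pt]
-\ln(1 - x) & \text{if } x < 0,\ \lambda = 2.
\end{cases}
$$

For non-negative inputs, Yeo-Johnson is Box-Cox applied to $x + 1$, which is how it avoids the problem at zero.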

When to Use Power Transform?

    When your data is skewed and you want to make it more Gaussian-like.

    When you need to stabilize variance in the data.

    When using algorithms that assume or work better with approximately normal features or residuals (e.g., Gaussian Naive Bayes, linear regression).

Comparison with Other Transformers:

    StandardScaler: Scales data to have zero mean and unit variance but does not change the shape of the distribution.

    MinMaxScaler: Scales data to a fixed range but does not change the shape of the distribution.

    PowerTransformer: Transforms data to make it more Gaussian-like.

PowerTransformer raises a ValueError if method='box-cox' is fitted on data that contains zeros or negative values, so check the data range before choosing it.

The PowerTransformer is particularly useful for reducing skewness, which can improve the performance of models that are sensitive to the shape of the feature distribution.
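The reduction in skewness can be measured. The sketch below draws a right-skewed sample (a log-normal distribution, chosen here purely as an example) and compares the sample skewness before and after the transform using `scipy.stats.skew`.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# Synthetic right-skewed feature (log-normal is an assumption for this demo)
rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

pt = PowerTransformer(method='yeo-johnson')
transformed = pt.fit_transform(skewed)

skew_before = skew(skewed.ravel())
skew_after = skew(transformed.ravel())

print(f"skewness before: {skew_before:.3f}")
print(f"skewness after:  {skew_after:.3f}")
print("estimated lambda per feature:", pt.lambdas_)
```

The fitted exponent is available in the `lambdas_` attribute, one value per feature, which is useful for checking how strong a transformation was applied.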

### Quantile Transform

The Quantile Transform is a non-linear transformation that maps the data to a specified distribution (e.g., normal or uniform).

It is particularly useful for handling skewed data or outliers, as it spreads out the most frequent values and compresses the less frequent ones.

In [None]:
import numpy as np
from sklearn.preprocessing import QuantileTransformer

In [None]:
# Sample dataset with a skewed distribution
data = np.array([[1],
                 [2],
                 [3],
                 [4],
                 [100]])  # Outlier

In [None]:
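# n_quantiles must not exceed the number of samples (5 here);
# leaving it at the default of 1000 triggers a warning on this small dataset
transformer = QuantileTransformer(n_quantiles=5, output_distribution='uniform', random_state=42)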
transformed_data = transformer.fit_transform(data)

In [None]:
print(transformed_data)

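The Quantile Transformer estimates each feature's empirical cumulative distribution function from its quantiles, then maps every value through it onto the target distribution (uniform or normal).

It is robust to outliers because the output depends on each value's rank among the training data, not on its magnitude, so statistics such as the mean or variance play no role.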
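A quick way to see this robustness is to make the outlier far more extreme and check that the transformed values do not move. The sketch below is illustrative: the second array and the helper `to_uniform` are made up for the comparison.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Same data twice, but the second copy has a far more extreme outlier
a = np.array([[1], [2], [3], [4], [100]], dtype=float)
b = np.array([[1], [2], [3], [4], [1_000_000]], dtype=float)

def to_uniform(x):
    """Fit a QuantileTransformer with one quantile per sample and transform x."""
    qt = QuantileTransformer(n_quantiles=len(x), output_distribution='uniform')
    return qt.fit_transform(x)

out_a = to_uniform(a)
out_b = to_uniform(b)

print(out_a.ravel())
print("identical outputs:", np.allclose(out_a, out_b))
```

Both inputs map to the same evenly spaced values between 0 and 1, because only the ordering of the samples matters.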

By default, it maps the data to a uniform distribution (output_distribution='uniform'). 

You can also map it to a normal distribution by setting output_distribution='normal'.

We can customize the QuantileTransformer by adjusting parameters such as:

    output_distribution: Choose between uniform (default) or normal.

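    n_quantiles: Number of quantiles to compute (default is 1000). It cannot exceed the number of samples; if it does, scikit-learn warns and uses the number of samples instead.

    random_state: Seed for the random subsampling used when fitting on large datasets, for reproducibility.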
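As a sketch of how these parameters fit together, the example below maps a skewed synthetic feature to a normal distribution. The exponential sample and the specific parameter values are assumptions chosen for the demonstration, not recommendations.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import QuantileTransformer

# Synthetic right-skewed feature (exponential is an assumption for this demo)
rng = np.random.default_rng(0)
skewed = rng.exponential(scale=2.0, size=(500, 1))

qt = QuantileTransformer(
    output_distribution='normal',  # target a standard normal instead of uniform
    n_quantiles=100,               # kept below the 500 available samples
    random_state=0,                # fixes any internal subsampling
)
gaussianized = qt.fit_transform(skewed)

print(f"skewness before: {skew(skewed.ravel()):.3f}")
print(f"skewness after:  {skew(gaussianized.ravel()):.3f}")
print("quantiles actually used:", qt.n_quantiles_)
```

The `n_quantiles_` attribute reports the number of quantiles actually used after fitting, which is handy for confirming that the requested value was not reduced.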

When to Use Quantile Transform?

    When your data is skewed and you want to make it more normally distributed (with output_distribution='normal').

    When you need to handle outliers in the data.

    When you want to map the data to a specific distribution (e.g., uniform or normal).

Comparison with Other Transformers:

    StandardScaler: Scales data to have zero mean and unit variance but does not change the shape of the distribution.

    PowerTransformer: Makes data more Gaussian-like using power transformations.

    QuantileTransformer: Maps data to a specified distribution (uniform or normal) based on quantiles.

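The QuantileTransformer is more expensive to fit than the linear scalers because it has to compute many quantiles per feature, which involves sorting. On large datasets, scikit-learn limits this cost by fitting on a random subsample (controlled by the subsample parameter).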

It is particularly useful when you do not want to assume any parametric form for the input distribution, or when a downstream step needs features that follow a specific distribution.