# Module 1: Data Analysis and Data Preprocessing

## Section 2: Feature scaling and normalization

### Part 7: Normalizer

The Normalizer is a scikit-learn preprocessing transformer that normalizes samples (i.e., rows) of a dataset. In this context, normalization means rescaling each sample so that it has unit norm (i.e., a length or magnitude of 1). Each sample is processed independently: all of its features are divided by the same factor, the norm of that row, and no statistics are shared across samples.
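
To make the row-wise behavior concrete, here is a minimal sketch that reproduces the default Normalizer (which uses the L2 norm) with plain NumPy by dividing each row by its own Euclidean length. The array `X` and the scaling factor of 100 are made-up illustration values. The second check shows that rescaling one sample has no effect on the normalized output, because each row is divided by its own norm.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Divide every row by its own Euclidean length; keepdims=True keeps shape (3, 1) so it broadcasts
row_norms = np.linalg.norm(X, axis=1, keepdims=True)
manual = X / row_norms

# Matches scikit-learn's Normalizer with its default norm='l2'
print(np.allclose(manual, Normalizer().fit_transform(X)))  # True

# Multiplying one sample by 100 leaves every normalized row unchanged
X_scaled = X.copy()
X_scaled[0] *= 100
print(np.allclose(Normalizer().fit_transform(X_scaled), manual))  # True
```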

### 7.1 Understanding the Normalizer

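The Normalizer class in scikit-learn is useful when the overall magnitude of a sample carries little information and only the relative proportions of its features matter. A common example is text classification or clustering, where documents of different lengths produce term-count vectors of very different magnitudes. Note that, unlike column-wise scalers such as StandardScaler or MinMaxScaler, the Normalizer does not put different features on a common scale; it rescales each row as a whole.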
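
As an illustration of the text use case, the sketch below treats rows as hypothetical term-count vectors (the counts are invented). After L2 (Euclidean) normalization, a document and a copy with every count tripled become identical, and plain dot products between the normalized rows equal the cosine similarities of the raw rows, computed here with `sklearn.metrics.pairwise.cosine_similarity`.

```python
import numpy as np
from sklearn.preprocessing import Normalizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical term counts: row 1 is row 0 with every count tripled
docs = np.array([[1.0, 2.0, 0.0],
                 [3.0, 6.0, 0.0],
                 [0.0, 1.0, 4.0]])

unit = Normalizer(norm="l2").fit_transform(docs)

# Only proportions survive, so rows 0 and 1 become the same unit vector
print(np.allclose(unit[0], unit[1]))  # True

# Dot products of unit-length rows equal cosine similarities of the raw rows
print(np.allclose(unit @ unit.T, cosine_similarity(docs)))  # True
```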


The Normalizer supports three norms through its `norm` parameter: `'l1'` (Manhattan norm), `'l2'` (Euclidean norm, the default), and `'max'`. L1 normalization scales each row so that the sum of the absolute values of its entries equals 1; L2 normalization scales each row so that the sum of the squares of its entries equals 1 (i.e., its Euclidean length is 1); and max normalization scales each row so that its largest absolute value equals 1.
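
Written out for a single sample $\mathbf{x} = (x_1, \dots, x_n)$ with $n$ features, each option divides the row by a different length. The notation is ours, not scikit-learn's; $\lVert \mathbf{x} \rVert_\infty$ corresponds to `norm='max'`. The second line works through the row $(3, 4)$ used in the example that follows.

$$
\mathbf{x}' = \frac{\mathbf{x}}{\lVert \mathbf{x} \rVert},
\qquad
\lVert \mathbf{x} \rVert_1 = \sum_{j=1}^{n} \lvert x_j \rvert,
\qquad
\lVert \mathbf{x} \rVert_2 = \sqrt{\sum_{j=1}^{n} x_j^2},
\qquad
\lVert \mathbf{x} \rVert_\infty = \max_{j} \lvert x_j \rvert
$$

$$
\mathbf{x} = (3, 4): \quad
\frac{\mathbf{x}}{\lVert \mathbf{x} \rVert_1} = \left(\tfrac{3}{7}, \tfrac{4}{7}\right), \quad
\frac{\mathbf{x}}{\lVert \mathbf{x} \rVert_2} = \left(\tfrac{3}{5}, \tfrac{4}{5}\right) = (0.6, 0.8), \quad
\frac{\mathbf{x}}{\lVert \mathbf{x} \rVert_\infty} = \left(\tfrac{3}{4}, 1\right)
$$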

### 7.2 Using the Normalizer

To use the Normalizer class in scikit-learn, you can follow this example:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Sample data: 3 samples with 2 features
data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Create one Normalizer per norm
l1_normalizer = Normalizer(norm='l1')
l2_normalizer = Normalizer(norm='l2')

# Normalizer is stateless: fit() only validates the input, so fit_transform() just rescales each row
normalized_data = l1_normalizer.fit_transform(data)
normalized_data2 = l2_normalizer.fit_transform(data)

print("Original Data:\n", data)
print("\nL1 Normalized Data:\n", normalized_data)
print("\nL2 Normalized Data:\n", normalized_data2)
```

```
Original Data:
 [[1. 2.]
 [3. 4.]
 [5. 6.]]

L1 Normalized Data:
 [[0.33333333 0.66666667]
 [0.42857143 0.57142857]
 [0.45454545 0.54545455]]

L2 Normalized Data:
 [[0.4472136  0.89442719]
 [0.6        0.8       ]
 [0.6401844  0.76822128]]
```


In this example, we applied L1 normalization (Manhattan norm) and L2 normalization (Euclidean norm) to the same data. Each row (sample) is normalized independently. After L1 normalization, the absolute values in each row sum to 1 (for example, 3/7 + 4/7 = 1); after L2 normalization, the squares in each row sum to 1 (for example, 0.6² + 0.8² = 0.36 + 0.64 = 1).
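
These properties can be checked directly. The sketch below reuses the sample data from the example and also includes the `'max'` norm; each check compares per-row sums or maxima against 1 with `np.allclose`, since floating-point results may differ from 1 in the last digit.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

l1 = Normalizer(norm="l1").fit_transform(data)
l2 = Normalizer(norm="l2").fit_transform(data)
mx = Normalizer(norm="max").fit_transform(data)

# L1: absolute values in each row sum to 1
print(np.allclose(np.abs(l1).sum(axis=1), 1.0))  # True
# L2: squares in each row sum to 1 (Euclidean length is 1)
print(np.allclose((l2 ** 2).sum(axis=1), 1.0))  # True
# max: the largest absolute value in each row is 1
print(np.allclose(np.abs(mx).max(axis=1), 1.0))  # True
```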

### 7.3 Summary

The Normalizer in scikit-learn rescales each sample independently to unit norm, so that downstream models see the relative proportions of a sample's features rather than its overall magnitude. It supports L1, L2, and max norms through the `norm` parameter, allowing you to choose the one that best suits your use case. Because it operates on rows, it does not correct for features measured on different scales; column-wise scalers such as StandardScaler serve that purpose.