# <span style = "color:coral">Feature scaling </span>

***

Feature scaling is a method used to normalize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the data preprocessing step.

I’m sure most of you have faced this issue in your projects or your learning journey. For example, one feature is entirely in kilograms while another is in grams, another is in liters, and so on. How can we use these features together when they vary so vastly in what they represent?

This is where I turned to the concept of feature scaling. It’s a crucial part of the data preprocessing stage.

### Why Should We Use Feature Scaling?

The first question we need to address – why do we need to scale the variables in our dataset? Some machine learning algorithms are sensitive to feature scaling while others are virtually invariant to it. Let me explain that in more detail.

#### 1. Gradient Descent Based Algorithms

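Machine learning algorithms such as linear regression, logistic regression, and neural networks that use gradient descent as an optimization technique work best when the data is scaled. The size of each gradient step depends on the magnitude of the feature values, so features on very different scales make the parameters update at very different rates and slow down convergence.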
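
One way to see the effect on gradient descent is to look at the condition number of `X^T X`, which for least squares describes how elongated the loss surface is. The sketch below is only an illustration: the data is made up, and the names `X_raw`, `X_std`, `cond_raw`, and `cond_std` are assumptions, not part of any dataset used in this notebook.

```python
import numpy as np

# Hypothetical data: one feature in roughly [0, 1], another in roughly [0, 1000]
rng = np.random.default_rng(0)
X_raw = np.column_stack([
    rng.uniform(0, 1, size=200),
    rng.uniform(0, 1000, size=200),
])

# Standardize each column by hand: subtract the mean, divide by the standard deviation
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# For least squares, the Hessian of the loss is proportional to X^T X.
# A large condition number means a long, narrow valley that gradient descent crosses slowly.
cond_raw = np.linalg.cond(X_raw.T @ X_raw)
cond_std = np.linalg.cond(X_std.T @ X_std)

print(f"condition number, raw features:    {cond_raw:.1f}")
print(f"condition number, scaled features: {cond_std:.1f}")
```

The scaled version should report a much smaller condition number, which is the usual reason a single learning rate works well for every parameter after scaling.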

#### 2. Distance-Based Algorithms

Distance-based algorithms like KNN, K-means, and SVM are the most affected by the range of features. Behind the scenes, they use distances between data points to determine their similarity. When features have different scales, the features with larger magnitudes dominate the distance and are effectively given more weight. This hurts the performance of the algorithm, and we do not want it to be biased towards one feature.

Therefore, we scale our data before employing a distance-based algorithm so that all the features contribute equally to the result.
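
A small sketch of how magnitude affects distance, using three made-up samples (the feature names, units, and values are assumptions for illustration only):

```python
import numpy as np

# Hypothetical samples: columns are [weight in grams, height in metres]
a = np.array([60000.0, 1.60])
b = np.array([60500.0, 1.95])   # similar weight, very different height
c = np.array([62000.0, 1.61])   # different weight, almost the same height

# Raw Euclidean distances are driven almost entirely by the gram-valued column
d_ab_raw = np.linalg.norm(a - b)
d_ac_raw = np.linalg.norm(a - c)

# Min-max scale each column using the three samples above
data = np.vstack([a, b, c])
scaled = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
d_ab_scaled = np.linalg.norm(scaled[0] - scaled[1])
d_ac_scaled = np.linalg.norm(scaled[0] - scaled[2])

print("raw:   ", d_ab_raw, d_ac_raw)
print("scaled:", d_ab_scaled, d_ac_scaled)
```

In the raw data the height difference between `a` and `b` barely registers, so `b` looks about four times closer than `c`. After scaling, both features contribute and the two distances become comparable.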

#### 3. Tree-Based Algorithms

Tree-based algorithms, on the other hand, are fairly insensitive to the scale of the features. Think about it: a decision tree splits a node based on a single feature, choosing the feature and threshold that most increase the homogeneity of the node. This split is not influenced by the other features, and rescaling a feature only moves the threshold without changing which samples fall on each side.

So, there is virtually no effect of the remaining features on the split. This is what makes tree-based algorithms invariant to the scale of the features!
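
As a rough check of this claim, the sketch below trains the same decision tree on raw and on standardized copies of a made-up dataset and compares the predictions. The data, the split sizes, and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: three features on very different scales
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3)) * np.array([1.0, 100.0, 10000.0])
y = (X[:, 0] + X[:, 1] / 100.0 + X[:, 2] / 10000.0 > 0).astype(int)

X_train, X_test = X[:200], X[200:]
y_train = y[:200]

# Tree trained on the raw features
tree_raw = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pred_raw = tree_raw.predict(X_test)

# Same tree settings, trained on standardized features
scaler = StandardScaler().fit(X_train)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(scaler.transform(X_train), y_train)
pred_scaled = tree_scaled.predict(scaler.transform(X_test))

# Fraction of test rows where the two trees agree
agreement = (pred_raw == pred_scaled).mean()
print("agreement between raw and scaled trees:", agreement)
```

On data like this the two trees should agree on essentially every test row, because standardization preserves the ordering of values within each feature.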

### Normalization vs Standardization

#### What is Normalization?

Normalization is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1. It is also known as Min-Max scaling.
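
Written out, Min-Max scaling maps each value `x` to `(x - min) / (max - min)`. A minimal sketch with NumPy, where the helper name `min_max_scale` and the sample values are made up for illustration:

```python
import numpy as np

def min_max_scale(x):
    """Rescale a 1-D array to [0, 1] using x' = (x - min) / (max - min).

    Assumes max > min; a constant array would cause a division by zero.
    """
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

values = np.array([10.0, 20.0, 30.0, 50.0])
scaled = min_max_scale(values)
print(scaled.tolist())  # [0.0, 0.25, 0.5, 1.0]
```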

#### What is Standardization?

Standardization is another scaling technique where the values are centered around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resultant distribution has a unit standard deviation.
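
In formula form, standardization maps each value `x` to `(x - mean) / std`. A minimal sketch with NumPy, where the helper name `standardize` and the sample values are made up for illustration:

```python
import numpy as np

def standardize(x):
    """Center to mean 0 and scale to unit standard deviation: z = (x - mean) / std."""
    x = np.asarray(x, dtype=float)
    # np.std defaults to the population standard deviation (ddof=0),
    # which is also what sklearn's StandardScaler uses
    return (x - x.mean()) / x.std()

values = np.array([2.0, 4.0, 6.0, 8.0])
z = standardize(values)
print(z.mean(), z.std())  # approximately 0 and 1, up to floating-point error
```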

### The Big Question – Normalize or Standardize?
Normalization vs. standardization is an eternal question among machine learning newcomers. Let me elaborate on the answer in this section.

* Normalization is good to use when you know that the distribution of your data does not follow a Gaussian distribution. This can be useful in algorithms that do not assume any distribution of the data like K-Nearest Neighbors and Neural Networks.
* Standardization, on the other hand, can be helpful in cases where the data follows a Gaussian distribution, although this is not a strict requirement. Also, unlike normalization, standardization does not have a bounding range, so outliers are not squeezed into a fixed interval. Keep in mind that outliers still influence the mean and standard deviation used for scaling.
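
To see how the two techniques treat an extreme value, here is a small sketch on a made-up feature (the values are assumptions chosen to make the effect obvious):

```python
import numpy as np

# Hypothetical feature with one extreme value
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Min-max scaling: the outlier becomes 1 and the other values crowd near 0
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standardization: no fixed bounds, but the outlier still shifts the mean
# and inflates the standard deviation used for scaling
x_std = (x - x.mean()) / x.std()

print(np.round(x_minmax, 3))
print(np.round(x_std, 3))
```

With either technique the four ordinary values end up close together, so it is worth inspecting outliers before choosing a scaler.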

## Normalization using sklearn

#### <span style = "color:red">Before proceeding with feature scaling, split the dataset into X and y, where X holds the input variables and y is the target.</span>

In [None]:
# data normalization with sklearn
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Create normalization model
norm = MinMaxScaler()

# Store column names of X in a variable
xcolumns = X.columns

# fit_transform returns a NumPy array, so store it in a new variable
X_scaled = norm.fit_transform(X)

# Wrap the scaled array in a DataFrame, restoring X's column names and index
X_scaled = pd.DataFrame(X_scaled, columns=xcolumns, index=X.index)

## Standardization using sklearn

In [1]:
# data standardization with sklearn
import pandas as pd
from sklearn.preprocessing import StandardScaler

# create scaler model
scaler = StandardScaler()

# Store column names of X in a variable
xcolumns = X.columns

# fit_transform returns a NumPy array, so store it in a new variable
X_scaled = scaler.fit_transform(X)

# Wrap the scaled array in a DataFrame, restoring X's column names and index
X_scaled = pd.DataFrame(X_scaled, columns=xcolumns, index=X.index)
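
The cells above assume `X` already exists. The sketch below shows one way the pieces could fit together from start to finish, under some assumptions: the DataFrame `df`, its column names, and the split settings are all hypothetical, and fitting the scaler on the training rows only is a common practice rather than something this notebook prescribes.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset; replace with your own DataFrame
df = pd.DataFrame({
    "weight_g": [500, 1500, 2500, 3500, 4500, 5500, 6500, 7500],
    "volume_l": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0],
    "target":   [0, 0, 0, 0, 1, 1, 1, 1],
})

# Split into input variables X and target y
X = df.drop(columns="target")
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the scaler on the training rows only, then reuse it for the test rows
scaler = StandardScaler()
X_train_scaled = pd.DataFrame(
    scaler.fit_transform(X_train), columns=X.columns, index=X_train.index
)
X_test_scaled = pd.DataFrame(
    scaler.transform(X_test), columns=X.columns, index=X_test.index
)

print(X_train_scaled.describe().loc[["mean", "std"]])
```

Reusing the fitted scaler for the test rows keeps information about the test set out of the scaling parameters. `MinMaxScaler` can be swapped in for `StandardScaler` without changing the rest of the code.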


***