# Feature Scaling

- Some machine learning models that rely on distance metrics (e.g. KNN) require scaling to perform well.
- Feature scaling improves the convergence of steepest-descent algorithms (e.g. gradient descent, which iteratively minimizes the loss function), because these algorithms do not possess the property of scale invariance.
- For some machine learning algorithms, scaling has no effect (e.g. CART-based models, i.e. Classification and Regression Trees).
- If you train the model on scaled features, you must also scale any new, unseen data in the same way (with the same fitted scaler) before feeding it to the model.
- Scaling all the features to the same range also makes it easier to compare one coefficient to another.
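
To illustrate the first bullet, here is a minimal NumPy sketch of how an unscaled, large-range feature dominates the Euclidean distance that KNN relies on. The ages, incomes, and the assumed feature ranges (age 0 to 100, income 0 to 100,000) are made up for illustration.

```python
import numpy as np

# Hypothetical people described by (age in years, income in dollars).
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 51_000.0])
c = np.array([26.0, 58_000.0])

# Assumed (illustrative) feature ranges used for min-max scaling.
mins = np.array([0.0, 0.0])
maxs = np.array([100.0, 100_000.0])


def euclidean(p, q):
    """Plain Euclidean distance, the default metric in many KNN implementations."""
    return float(np.sqrt(np.sum((p - q) ** 2)))


def min_max_scale(x):
    """Map each feature into [0, 1] using the assumed ranges above."""
    return (x - mins) / (maxs - mins)


# Unscaled: the income difference swamps the 35-year age gap between a and b.
print(euclidean(a, b))  # about 1000.6
print(euclidean(a, c))  # about 8000.0

# Scaled: both features now contribute on a comparable scale.
print(euclidean(min_max_scale(a), min_max_scale(b)))  # about 0.350
print(euclidean(min_max_scale(a), min_max_scale(c)))  # about 0.081
```

Without scaling, `a`'s nearest neighbor is `b` purely because their incomes are close; after scaling, it is `c`, which is close in both age and income.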

---

- Feature Scaling Benefits:
    - Can lead to great increases in performance.
    - Absolutely necessary for some models.

---

<img src='fs1.png' width=650>

---

# Standardization

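Standardization rescales each feature to have a mean of 0 and a standard deviation of 1 using the z-score $z = \frac{x - \mu}{\sigma}$, where $\mu$ is the feature's mean and $\sigma$ its standard deviation. It does not turn the data into a normal distribution (the shape of the distribution is unchanged), but because the values are centered around 0, standardized features can take negative values.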
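
A minimal NumPy sketch of the z-score on a single hypothetical feature column. It uses the population standard deviation (NumPy's default `ddof=0`), which is also what scikit-learn's `StandardScaler` uses.

```python
import numpy as np

# Hypothetical feature column, e.g. ages in years.
x = np.array([20.0, 30.0, 40.0, 50.0, 60.0])

# z-score: subtract the mean, then divide by the (population) standard deviation.
z = (x - x.mean()) / x.std()

print(z)                  # values below the mean (40) become negative
print(z.mean(), z.std())  # approximately 0 and 1
```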

<img src='fs2.png' width=650>

---

# Normalization

<img src='fs3.png' width=650>
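
Assuming "normalization" here refers to min-max scaling (its usual meaning in this context, and what scikit-learn's `MinMaxScaler` does with its default `feature_range=(0, 1)`), each value is mapped into $[0, 1]$ as follows:

$$
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
$$

For example, an age of 25 with $x_{\min} = 0$ and $x_{\max} = 100$ becomes $(25 - 0) / (100 - 0) = 0.25$.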

---

# .fit() and .transform() methods

<img src='fs4.png' width=650>

---

<img src='fs5.png' width=650>

---

<img src='fs6.png' width=650>

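Leakage (data leakage): information from the test set leaking into the training process, for example by fitting the scaler on the full dataset before splitting it.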
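
A minimal sketch of the `.fit()` / `.transform()` split, assuming scikit-learn's `StandardScaler`. The scaler learns its statistics from the training set only, and the same fitted scaler is then reused for the test set and for new data. The synthetic data is invented for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data (illustrative only): 100 rows, 2 features on very different scales.
X = np.column_stack([rng.uniform(0, 100, 100), rng.uniform(0, 100_000, 100)])
y = rng.integers(0, 2, 100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()
scaler.fit(X_train)                        # learn mean_ and scale_ from the TRAINING set only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # reuse the training statistics; never refit on test

# Leaky anti-pattern (do NOT do this): StandardScaler().fit(X) computes the mean and
# standard deviation from all rows, so test-set information leaks into training.

# New, unseen data must go through the same fitted scaler.
X_new = np.array([[35.0, 42_000.0]])
X_new_scaled = scaler.transform(X_new)
```

`scaler.fit_transform(X_train)` is a shortcut for calling `.fit()` and then `.transform()` on the training set; the test set should still only ever be passed to `.transform()`.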

---

# Feature Scaling Process

<img src='fs7.png' width=650>
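
Assuming the process follows the usual order (split the data, fit the scaler on the training set only, transform both sets, then train and evaluate the model), one way to sketch it is with a scikit-learn pipeline, which guarantees the scaler is only fit on whatever data is passed to `.fit()`. The synthetic data, where the label depends only on age and income is large-scale noise, is invented for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Synthetic data: the label depends on age only; income is noise on a much larger scale.
age = rng.uniform(0, 100, 300)
income = rng.uniform(0, 100_000, 300)
X = np.column_stack([age, income])
y = (age > 50).astype(int)

# Step 1: split before any scaling happens.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Steps 2-4: the pipeline fits the scaler on X_train, transforms it, then fits KNN.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

# Step 5: score() transforms X_test with the training statistics, then predicts.
scaled_acc = model.score(X_test, y_test)

# The same model without scaling, for comparison.
unscaled = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
unscaled_acc = unscaled.score(X_test, y_test)

print(f"scaled accuracy:   {scaled_acc:.2f}")
print(f"unscaled accuracy: {unscaled_acc:.2f}")
```

Without scaling, KNN's distances are driven almost entirely by the noisy income column, so its accuracy should be noticeably lower than the scaled pipeline's.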

---

# Do we need to scale the label?


- In general, it is neither necessary nor advised.
- Normalizing the output distribution alters the definition of the target.
- You end up predicting a distribution that doesn't mirror your real-world target.
- It can actually hurt stochastic gradient descent.

Source: https://stats.stackexchange.com/questions/111467/is-it-necessary-to-scale-the-target-value-in-addition-to-scaling-features-for-re

---

__Standardization:__

Standardizing the features so that they are centered around 0 with a standard deviation of 1 is important when we compare measurements that have different units. Variables that are measured at different scales do not contribute equally to the analysis and might end up creating a bias.

For example, a variable that ranges between 0 and 1000 will outweigh a variable that ranges between 0 and 1. Using these variables without standardization effectively gives the variable with the larger range about 1000 times the weight of the other in the analysis. Transforming the data to comparable scales can prevent this problem. Typical data standardization procedures equalize the range and/or data variability.

__Normalization:__

Similarly, the goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values. In machine learning, not every dataset requires normalization; it is required only when features have different ranges.

For example, consider a dataset containing two features, age (x1) and income (x2), where age ranges from 0–100 while income ranges from 0–100,000 and higher. Income is about 1,000 times larger than age, so these two features are in very different ranges. When we do further analysis, such as multivariate linear regression, the income attribute will intrinsically influence the result more due to its larger values. But this doesn't necessarily mean it is more important as a predictor. So we normalize the data to bring all the variables to the same range.
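
A minimal NumPy sketch of column-wise min-max normalization on a few hypothetical (age, income) rows. scikit-learn's `MinMaxScaler` performs the same per-column computation with its default `(0, 1)` range.

```python
import numpy as np

# Hypothetical rows: [age, income]; the two columns are on very different scales.
X = np.array([
    [18.0, 12_000.0],
    [35.0, 48_000.0],
    [52.0, 95_000.0],
    [70.0, 150_000.0],
])

# Column-wise min-max normalization: (x - min) / (max - min) maps each feature to [0, 1].
# Note: a constant column would make (max - min) zero and needs special handling.
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_norm = (X - col_min) / (col_max - col_min)

print(X_norm.round(3))
```

In practice `col_min` and `col_max` should come from the training set only, so normalized test or new values can fall slightly outside $[0, 1]$.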

__When Should You Use Normalization And Standardization:__

Normalization is a good technique to use when you do not know the distribution of your data or when you know the distribution is not Gaussian (a bell curve). Normalization is useful when your data has varying scales and the algorithm you are using does not make assumptions about the distribution of your data, such as k-nearest neighbors and artificial neural networks.

Standardization assumes that your data has a Gaussian (bell curve) distribution. This does not strictly have to be true, but the technique is more effective if your attribute distribution is Gaussian. Standardization is useful when your data has varying scales and the algorithm you are using does make assumptions about your data having a Gaussian distribution, such as linear regression, logistic regression, and linear discriminant analysis.

---