#### Standardize vs. Normalize
- <strong>Rescaling</strong>: adding or subtracting a constant and then multiplying or dividing by a constant. Example: converting from Fahrenheit to Celsius.
<br/><br/>
- <strong>Normalizing</strong>: normalizing a vector means dividing it by its norm. The term is also used for rescaling by the minimum and range of the vector so that all elements lie between 0 and 1, which brings all columns onto a common scale.
<br/><br/>
- <strong>Standardizing</strong>: standardizing a vector means subtracting a measure of location and dividing by a measure of scale. For instance, given a vector of values drawn from a normal distribution, you subtract the mean and divide by the standard deviation, obtaining a "standard normal" random variable with $\mu=0$ and $\sigma=1$.
<br/><br/>
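
The three definitions above can be sketched with plain NumPy. This is only an illustration: the sample vector `x` and the temperature values are made up for the example, and the standard deviation is the population one (NumPy's default).

```python
import numpy as np

# Hypothetical sample vector, used only for illustration.
x = np.array([2.0, 4.0, 6.0, 8.0])

# Rescaling: shift by a constant, then scale by a constant (Fahrenheit -> Celsius).
fahrenheit = np.array([32.0, 212.0])
celsius = (fahrenheit - 32.0) * 5.0 / 9.0

# Normalizing (norm variant): divide by the Euclidean norm to get unit length.
x_unit = x / np.linalg.norm(x)

# Normalizing (min-max variant): map the elements into [0, 1].
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standardizing: subtract the mean and divide by the standard deviation.
x_standardized = (x - x.mean()) / x.std()

print(celsius)
print(np.linalg.norm(x_unit))
print(x_minmax)
print(x_standardized.mean(), x_standardized.std())
```
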
- Questions to be answered:
    1. Why should you standardize/normalize/scale your data?
    2. How to standardize numeric attributes to have a mean of 0 and a standard deviation of 1.
    3. How to normalize numeric attributes to the range 0 to 1 using a min-max scaler.
    4. How to normalize using a robust scaler.
    5. When to choose standardization or normalization.
<br/><br/>    
- Why Standardize or Normalize?
    - <strong>Standardization</strong>: standardizing features is important when we compare measurements that have different units. Variables that are measured at different scales do not contribute equally to the analysis and might end up creating a bias.
        - For example, a variable that ranges between 0 and 1000 will outweigh a variable that ranges between 0 and 1. Using these variables without standardization effectively gives the variable with the larger range about 1000 times more weight in the analysis. Transforming the data to comparable scales can prevent this problem. Typical data standardization procedures equalize the range and/or the variability of the data.
<br/><br/>    
    - <strong>Normalization</strong>: The goal of normalization is to change the values of the numeric columns in the dataset to a common scale without distorting differences in the ranges of values. In ML, not every dataset requires normalization; it is needed only when features have different ranges.
        - For example, consider a dataset containing two features, age (x1) and income (x2), where age ranges from 0 to 100 while income ranges from 0 to 100,000 and higher. Income is about 1,000 times larger than age, so the two features are in very different ranges. In further analysis, such as multivariate linear regression, the income attribute will intrinsically influence the result more because of its larger values, but that does not necessarily mean it is a more important predictor. So we normalize the data to bring all the variables into the same range.
<br/><br/>    
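
A minimal sketch of the age/income effect, using Euclidean distance (the quantity a distance-based method such as KNN relies on) rather than regression. The three people and the assumed feature ranges (0 to 100 for age, 0 to 100,000 for income) are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical people described by (age in years, income in dollars).
a = np.array([25.0, 50_000.0])
b = np.array([65.0, 50_500.0])  # very different age, similar income
c = np.array([26.0, 58_000.0])  # similar age, different income

# Raw distances: the income column dominates, so the 40-year age gap barely counts.
d_ab_raw = np.linalg.norm(a - b)
d_ac_raw = np.linalg.norm(a - c)

# Assumed feature ranges for min-max scaling (illustrative, not taken from real data).
lower = np.array([0.0, 0.0])
upper = np.array([100.0, 100_000.0])

def min_max(v):
    """Map each feature into [0, 1] using the assumed ranges."""
    return (v - lower) / (upper - lower)

# Scaled distances: both features now contribute on a comparable scale.
d_ab_scaled = np.linalg.norm(min_max(a) - min_max(b))
d_ac_scaled = np.linalg.norm(min_max(a) - min_max(c))

print("raw:   ", d_ab_raw, d_ac_raw)
print("scaled:", d_ab_scaled, d_ac_scaled)
```

Before scaling, `a` looks closer to `b`; after scaling, `a` is closer to `c`, because the age difference is no longer swamped by income.
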
- When do you Standardize or Normalize?
    - <strong>Normalization</strong>: is used when you do not know the distribution of the data or when you know that the data is not normally distributed. Normalization is useful when your data has varying scales and the algorithm you are using does not make assumptions about the distribution of your data, such as KNN and artificial neural networks. The following code normalizes data:
```python
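from sklearn.preprocessing import MinMaxScaler

# `data` is assumed to be a 2-D array-like of shape (n_samples, n_features)
scaler = MinMaxScaler()  # default feature_range is (0, 1)
data_scaled = scaler.fit_transform(data)
# per-column mean and std; each column now lies in [0, 1],
# but the mean and std are not fixed to any particular value
print(data_scaled.mean(axis=0))
print(data_scaled.std(axis=0))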
```
<br/><br/>
    - <strong>Standardization</strong>: assumes that your data comes from a normal distribution. This does not strictly have to be true, but the technique is more effective when the data is approximately normal. Standardization is useful when your data has varying scales and the algorithm you are using makes assumptions about your data having a normal distribution; examples include linear regression, logistic regression, and linear discriminant analysis. The following code standardizes data:
```python
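from sklearn.preprocessing import StandardScaler

# `data` is assumed to be a 2-D array-like of shape (n_samples, n_features)
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
# per-column mean and standard deviation; expect values close to 0 and 1
print(data_scaled.mean(axis=0))
print(data_scaled.std(axis=0))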
```
<br/><br/>
- <strong>Robust Scaler</strong> (scaling to median and quantiles): this method subtracts the median from all observations and then divides by the interquartile range (IQR). It scales features using statistics that are robust to outliers. The IQR is the difference between the 75th and 25th percentiles (the third and first quartiles).
$$X_{scaled} = \frac{X-\mathrm{median}(X)}{\mathrm{IQR}}$$
<br/><br/>
$$\mathrm{IQR} = \text{75th percentile} - \text{25th percentile}$$
```python
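from sklearn.preprocessing import RobustScaler

# `data` is assumed to be a 2-D array-like of shape (n_samples, n_features)
scaler = RobustScaler()  # centers on the median, scales by the IQR
data_scaled = scaler.fit_transform(data)
# per-column mean and std; unlike StandardScaler, these are
# generally not 0 and 1 (the median is 0 and the IQR is 1 instead)
print(data_scaled.mean(axis=0))
print(data_scaled.std(axis=0))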
```
<br/><br/>
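
To see why median/IQR scaling is called robust, the formula above can be applied by hand with NumPy and compared with mean/std standardization. This is a sketch on a made-up feature containing one extreme outlier; it uses NumPy's default percentile interpolation and is meant to illustrate the formula, not to replace `RobustScaler`.

```python
import numpy as np

# Hypothetical single feature with one extreme outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])

# Robust scaling: subtract the median, divide by the interquartile range.
median = np.median(x)
q25, q75 = np.percentile(x, [25, 75])
iqr = q75 - q25
x_robust = (x - median) / iqr

# Standardization for comparison: subtract the mean, divide by the std.
x_standard = (x - x.mean()) / x.std()

print(x_robust)
print(x_standard)
```

With robust scaling the four ordinary values stay evenly spread around 0, while with standardization the outlier inflates the mean and standard deviation so much that the ordinary values collapse into almost the same number.
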
Sources:
- https://medium.com/@swethalakshmanan14/how-when-and-why-should-you-normalize-standardize-rescale-your-data-3f083def38ff

#### Different Methods for Normalizing Data
- Normalization Function (min-max)
$$y = \frac{X - \min(X)}{\max(X)-\min(X)}$$
<br/><br/>
- Sigmoid Function: As $x$ goes to negative infinity, $f(x)$ approaches 0, and as $x$ goes to positive infinity, $f(x)$ approaches 1.
$$f(x) = \frac{1}{1+e^{-x}}$$
<br/><br/>
- Log Function: Applicable only when every value satisfies $x > 0$.
$$ f(x) = \log(x) $$
<br/><br/>
- Log Function + 1
$$ f(x) = \log(x) + 1$$
<br/><br/>
- Log Function + 1 Normalized
$$g = \frac{f(X) - \min(f(X))}{\max(f(X)) - \min(f(X))}, \quad f(X) = \log(X) + 1$$
<br/><br/>
- Cube Root: Useful when the values are very large.
$$ f(X) = X^{1/3}$$
<br/><br/>
- Cube Root Normalized
$$ g = \frac{f(X) - \min(f(X))}{\max(f(X)) - \min(f(X))}, \quad f(X) = X^{1/3}$$
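
The transformations in this section can be tried out with a short NumPy sketch. The helper names `min_max` and `sigmoid` and the sample arrays are hypothetical, chosen only for illustration; the feature `x` is strictly positive so that the log is defined.

```python
import numpy as np

def min_max(values):
    """Rescale values linearly into [0, 1]."""
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())

def sigmoid(values):
    """Squash real values into the open interval (0, 1)."""
    values = np.asarray(values, dtype=float)
    return 1.0 / (1.0 + np.exp(-values))

# Hypothetical strictly positive, heavily skewed feature.
x = np.array([1.0, 8.0, 27.0, 1000.0])

x_minmax = min_max(x)                    # plain min-max normalization
x_log_norm = min_max(np.log(x))          # log, then min-max
x_log1_norm = min_max(np.log(x) + 1.0)   # log + 1, then min-max
x_cbrt_norm = min_max(np.cbrt(x))        # cube root, then min-max

# Sigmoid applied to a small hypothetical vector centered on 0.
t = np.array([-5.0, 0.0, 5.0])
t_sigmoid = sigmoid(t)

print(x_minmax)
print(x_log_norm)
print(x_cbrt_norm)
print(t_sigmoid)
```

Plain min-max pushes the three smaller values close to 0 because of the single large value, while the log and cube-root variants spread them out more evenly. Adding the constant 1 before min-max normalization has no effect on the result, since a constant shift cancels in both the numerator and the denominator.
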