# About Feature Scaling
Feature scaling rescales the data, up or down, so that all features (variables) share a standardised range.


## Sections

- [Min Max normalization](#Min-Max-scaling-normalization)
- [Z Score normalization](#Z-Score-normalization)
- [Decimal Scaling normalization](#Decimal-Scaling-Normalisation)


Let us define a list of random integers to serve as our independent variable:

In [1]:
import random
random.seed(332)

data = []

for x in range(100):
  data.append(random.randint(1,101))

## Min-Max-scaling-normalization
Min-Max normalization is a simple technique that linearly rescales the data to fit within a pre-defined boundary, which is [0, 1] in the formula below.

\begin{equation} X_{norm} = \frac{X - X_{min}}{X_{max}-X_{min}} \end{equation}


In the cell below we apply min-max scaling normalisation to the *data* list. Multiplying the formula above by (D - C) and adding C generalises it from the range [0, 1] to the range [C, D].
- *min_max_normal* is the list of scaled output
- *X_min* is the minimum value of the data
- *X_max* is the maximum value of the data
- *D* is the upper boundary of the predefined range
- *C* is the lower boundary of the predefined range

In [3]:
min_max_normal = []
X_min = min(data)
X_max = max(data)
D = 1
C = 0

for element in data:
    X_norm = (float(element - X_min)/(X_max - X_min))*(D - C) + C
    min_max_normal.append(X_norm)

print (min_max_normal)

[0.31, 0.66, 0.67, 0.97, 0.74, 0.53, 0.89, 0.75, 0.5, 0.22, 0.0, 0.28, 0.91, 0.68, 0.3, 0.0, 0.77, 0.08, 0.23, 0.76, 0.25, 0.64, 0.99, 0.71, 0.16, 0.67, 0.77, 0.48, 0.05, 0.92, 0.6, 0.97, 0.69, 0.26, 0.47, 0.88, 1.0, 0.82, 0.11, 0.63, 0.72, 0.95, 0.18, 0.86, 0.86, 0.47, 0.28, 0.79, 0.69, 0.69, 0.44, 0.35, 0.8, 0.73, 0.11, 0.17, 0.68, 0.98, 0.78, 0.8, 0.88, 0.42, 0.46, 0.77, 0.84, 0.02, 0.59, 0.97, 0.57, 0.89, 0.98, 0.04, 0.86, 0.8, 0.9, 0.15, 0.51, 0.48, 0.41, 0.54, 0.98, 0.6, 0.4, 0.42, 0.73, 0.99, 0.01, 0.19, 0.29, 0.47, 0.52, 0.49, 0.77, 0.26, 0.02, 0.9, 0.49, 0.44, 0.12, 0.14]
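The same rescaling can be wrapped in a reusable function. The sketch below is only an illustration: the name `min_max_scale` and its `lower`/`upper` parameters (playing the roles of *C* and *D*) are hypothetical, and returning `lower` for constant input is an arbitrary choice made here to avoid dividing by zero.

```python
def min_max_scale(values, lower=0.0, upper=1.0):
    """Linearly rescale a list of numbers into the range [lower, upper]."""
    v_min, v_max = min(values), max(values)
    span = float(v_max - v_min)
    if span == 0:
        # All values are identical, so the formula would divide by zero.
        # Mapping everything to `lower` is an assumption of this sketch.
        return [float(lower) for _ in values]
    return [(v - v_min) / span * (upper - lower) + lower for v in values]


print(min_max_scale([0, 5, 10]))         # default range [0, 1]
print(min_max_scale([0, 5, 10], -1, 1))  # custom range [-1, 1]
```

Calling `min_max_scale(data)` with the defaults should reproduce the behaviour of the loop above for `C = 0` and `D = 1`.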


## Z-Score-normalization
We scale the feature so that the transformed values have a mean of zero and a standard deviation of one.

\begin{equation} z = \frac{x - \mu}{\sigma}\end{equation} 

In the cell below we apply z-score normalisation to the *data* list
- *z_normal* is the list of scaled output
- *mean* is a function that returns the mean of the data
- *std* is a function that returns the sample standard deviation of the data (n - 1 in the denominator)

In [4]:
import math 

def mean(column):
    """
    takes as input a list of values from the data
    returns the mean of the values in the list
    """
    sum_ = 0
    for element in column:
        sum_ = sum_ + element
        
    return float(sum_)/len(column)

def std(column):
    """
    takes as input a list of values from the data
    returns the sample standard deviation of the values in the list
    """
    if len(column) <= 1:
        return 0.0

    mean_data, sd = mean(column), 0.0

    # sum the squared deviations, then take the sample standard deviation (n - 1 denominator)
    for el in column:
        sd += (float(el) - mean_data)**2
    sd = math.sqrt(sd / float(len(column)-1))

    return sd

In [10]:
z_normal = []

mean_data = mean(data)
std_data = std(data)

for element in data:
    z_norm = float(element - mean_data)/std_data
    z_normal.append(z_norm)

print (z_normal)

[-0.8390076236118431, 0.33748543834386635, 0.3710995258283152, 1.3795221503617805, 0.6063981382194571, -0.0994976989539686, 1.1106094504861896, 0.6400122257039059, -0.20033996140731514, -1.1415344109718826, -1.8810443356297573, -0.9398498860651896, 1.1778376254550873, 0.404713613312764, -0.8726217110962919, -1.8810443356297573, 0.7072404006728036, -1.6121316357541664, -1.1079203234874337, 0.6736263131883548, -1.040692148518536, 0.27025726337496864, 1.4467503253306782, 0.5055558757661105, -1.3432189358785758, 0.3710995258283152, 0.7072404006728036, -0.2675681363762128, -1.712973898207513, 1.2114517129395361, 0.1358009134371733, 1.3795221503617805, 0.43832770079721284, -1.0070780610340873, -0.3011822238606616, 1.0769953630017408, 1.480364412815127, 0.8753108380950478, -1.51128937330082, 0.2366431758905198, 0.5391699632505594, 1.3122939753928828, -1.2759907609096781, 1.0097671880328432, 1.0097671880328432, -0.3011822238606616, -0.9398498860651896, 0.7744685756417012, 0.43832770079721284, 
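As a cross-check, the z-score transform can also be sketched with Python's standard `statistics` module (Python 3 assumed). The function name `z_score_scale` is hypothetical; `statistics.stdev` uses the n - 1 denominator, as the hand-written `std` above does. The small input list is made up purely for illustration.

```python
import statistics


def z_score_scale(values):
    """Return the z-scores of a list of numbers (sample standard deviation)."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # n - 1 denominator
    if sigma == 0:
        # Constant data has no spread, so z-scores are undefined.
        raise ValueError("standard deviation is zero; cannot scale")
    return [(v - mu) / sigma for v in values]


scaled = z_score_scale([2, 4, 4, 4, 5, 5, 7, 9])

# After scaling, the mean should be (numerically) zero and the
# sample standard deviation should be (numerically) one.
print(abs(statistics.mean(scaled)) < 1e-9)
print(abs(statistics.stdev(scaled) - 1.0) < 1e-9)
```

Both checks hold up to floating-point rounding, which is the property the transform is meant to guarantee.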

## Decimal Scaling Normalisation

We normalize by moving the decimal point of the feature values. The number of places moved depends on the maximum absolute value in the feature, so that every scaled value falls between -1 and 1.

- Take the number of digits in the maximum absolute value. E.g. for 3031, the number of digits is 4.
- Calculate the corresponding power of 10: 10^4 = 10000.
- Divide each number by 10000.

In [11]:
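ds_normal = []
# power of 10 with as many digits as the largest absolute value (integer data assumed)
max_digits = 10**len(str(max(abs(x) for x in data)))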

for element in data:
    ds_norm = float(element)/max_digits
    ds_normal.append(ds_norm)

print (ds_normal)

[0.032, 0.067, 0.068, 0.098, 0.075, 0.054, 0.09, 0.076, 0.051, 0.023, 0.001, 0.029, 0.092, 0.069, 0.031, 0.001, 0.078, 0.009, 0.024, 0.077, 0.026, 0.065, 0.1, 0.072, 0.017, 0.068, 0.078, 0.049, 0.006, 0.093, 0.061, 0.098, 0.07, 0.027, 0.048, 0.089, 0.101, 0.083, 0.012, 0.064, 0.073, 0.096, 0.019, 0.087, 0.087, 0.048, 0.029, 0.08, 0.07, 0.07, 0.045, 0.036, 0.081, 0.074, 0.012, 0.018, 0.069, 0.099, 0.079, 0.081, 0.089, 0.043, 0.047, 0.078, 0.085, 0.003, 0.06, 0.098, 0.058, 0.09, 0.099, 0.005, 0.087, 0.081, 0.091, 0.016, 0.052, 0.049, 0.042, 0.055, 0.099, 0.061, 0.041, 0.043, 0.074, 0.1, 0.002, 0.02, 0.03, 0.048, 0.053, 0.05, 0.078, 0.027, 0.003, 0.091, 0.05, 0.045, 0.013, 0.015]
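The three steps can be sketched as a small function and tried on the 3031 example. The name `decimal_scale` is hypothetical, the input values other than 3031 are made up, and the sketch assumes integer inputs, because it counts digits through the string form of the largest absolute value.

```python
def decimal_scale(values):
    """Scale integers into (-1, 1) by dividing by a power of 10."""
    # Step 1: number of digits in the largest absolute value.
    max_abs = max(abs(v) for v in values)
    digits = len(str(max_abs))
    # Step 2: the matching power of 10.
    factor = 10 ** digits
    # Step 3: divide every value by that power of 10.
    return [float(v) / factor for v in values]


print(decimal_scale([3031, -250, 12]))  # [0.3031, -0.025, 0.0012]
```

Using the absolute value keeps the digit count correct when the largest magnitude belongs to a negative number.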
