## <center>Preprocessing Methods</center>

- binarization
- scaling
- normalization
- mean removal etc.

### 1. Binarization

In [1]:
from sklearn import preprocessing
import numpy as np
data = np.array([[2.2, 5.9, -1.8], [5.4, -3.2, -5.1], [-1.9, 4.2, 3.2]])

In [2]:
bindata = preprocessing.Binarizer(threshold=1.5).transform(data)
print('Threshold array:\n\n', bindata)

Threshold array:

 [[1. 1. 0.]
 [1. 0. 0.]
 [0. 1. 1.]]
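As a rough cross-check, the same result can be reproduced with a plain NumPy comparison. This is only a sketch: the names `threshold`, `manual_bin`, `sk_bin`, and `binarizer_matches` are introduced here for illustration. It relies on `Binarizer` mapping values strictly greater than the threshold to 1 and all other values to 0.

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[2.2, 5.9, -1.8], [5.4, -3.2, -5.1], [-1.9, 4.2, 3.2]])
threshold = 1.5  # same threshold as in the cell above

# Values strictly greater than the threshold become 1.0, all others 0.0
manual_bin = (data > threshold).astype(float)

# Binarizer is stateless, so fit_transform and transform give the same result
sk_bin = preprocessing.Binarizer(threshold=threshold).fit_transform(data)

binarizer_matches = np.array_equal(manual_bin, sk_bin)
print('Manual binarization matches Binarizer:', binarizer_matches)
```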


### 2. Mean Removal

In [7]:
print('Mean (before)', data.mean(axis=0))
print('Standard Deviation (before)', data.std(axis=0))

Mean (before) [ 1.9         2.3        -1.23333333]
Standard Deviation (before) [2.98775278 3.95052739 3.41207008]


In [9]:
scaled_data = preprocessing.scale(data)

print('Mean (after)', scaled_data.mean(axis=0))
print('Standard Deviation (after)', scaled_data.std(axis=0))

Mean (after) [0.00000000e+00 0.00000000e+00 7.40148683e-17]
Standard Deviation (after) [1. 1. 1.]
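To make explicit what mean removal does, here is a minimal sketch that standardizes each column by hand and compares it with `preprocessing.scale`. The names `col_mean`, `col_std`, and `manual_scaled` are illustrative. It assumes the default behaviour of `scale` (per-column centering and division by the population standard deviation, which is also NumPy's default `ddof=0`).

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[2.2, 5.9, -1.8], [5.4, -3.2, -5.1], [-1.9, 4.2, 3.2]])

# Per-feature (column) statistics
col_mean = data.mean(axis=0)
col_std = data.std(axis=0)  # population standard deviation (ddof=0)

# Subtract the column mean, then divide by the column standard deviation
manual_scaled = (data - col_mean) / col_std

scale_matches = np.allclose(manual_scaled, preprocessing.scale(data))
print('Manual standardization matches preprocessing.scale:', scale_matches)
```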


### 3. Scaling
- StandardScaler -> rescales each feature (column) to mean=0 and variance=1
- MinMaxScaler -> rescales each feature (column) to a given range, 0 to 1 by default
- Normalizer -> rescales each sample (row) to unit Euclidean length (L2 norm = 1)

In [10]:
data

array([[ 2.2,  5.9, -1.8],
       [ 5.4, -3.2, -5.1],
       [-1.9,  4.2,  3.2]])

In [18]:
minmax_scaler = preprocessing.MinMaxScaler(feature_range=(0,1))
data_minmax = minmax_scaler.fit_transform(data)
print('MinMaxScaler applied on the data:\n\n',data_minmax)

MinMaxScaler applied on the data:

 [[0.56164384 1.         0.39759036]
 [1.         0.         0.        ]
 [0.         0.81318681 1.        ]]
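The min-max transformation can also be written out by hand as `(x - min) / (max - min)` per column. The sketch below assumes the default `feature_range=(0, 1)`. The names `col_min`, `col_max`, and `manual_minmax` are illustrative.

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[2.2, 5.9, -1.8], [5.4, -3.2, -5.1], [-1.9, 4.2, 3.2]])

# Per-feature (column) minimum and maximum
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Map each column linearly so that its minimum becomes 0 and its maximum 1
manual_minmax = (data - col_min) / (col_max - col_min)

sk_minmax = preprocessing.MinMaxScaler(feature_range=(0, 1)).fit_transform(data)
minmax_matches = np.allclose(manual_minmax, sk_minmax)
print('Manual min-max scaling matches MinMaxScaler:', minmax_matches)
```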


In [19]:
standard_scaler = preprocessing.StandardScaler().fit(data)
data_standard = standard_scaler.transform(data)
print('StandardScaler applied on the data:\n\n',data_standard)

StandardScaler applied on the data:

 [[ 0.10040991  0.91127074 -0.16607709]
 [ 1.171449   -1.39221918 -1.1332319 ]
 [-1.27185891  0.48094844  1.29930899]]
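One reason to call `fit` and `transform` separately is that the statistics learned from one dataset can be reused on other data. The sketch below illustrates this with a made-up row, `new_sample`, which is not part of the original data. `mean_` and `scale_` are the fitted attributes of `StandardScaler`.

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[2.2, 5.9, -1.8], [5.4, -3.2, -5.1], [-1.9, 4.2, 3.2]])

# Learn per-column mean and standard deviation from the original data
standard_scaler = preprocessing.StandardScaler().fit(data)

# Hypothetical unseen sample, used only for illustration
new_sample = np.array([[1.0, 0.0, -2.0]])

# transform() reuses the statistics learned in fit()
scaled_new = standard_scaler.transform(new_sample)
manual_new = (new_sample - standard_scaler.mean_) / standard_scaler.scale_

# inverse_transform() maps scaled values back to the original units
restored = standard_scaler.inverse_transform(scaled_new)

print('Scaled new sample:', scaled_new)
print('Matches manual formula:', np.allclose(scaled_new, manual_new))
```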


In [20]:
normalizer = preprocessing.Normalizer()
data_normalizer = normalizer.fit_transform(data)
print('Normalizer applied on the data:\n\n', data_normalizer)

Normalizer applied on the data:

 [[ 0.3359268   0.90089461 -0.2748492 ]
 [ 0.6676851  -0.39566524 -0.63059148]
 [-0.33858465  0.74845029  0.57024784]]
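Unlike the two scalers above, `Normalizer` works row by row. A minimal sketch of the equivalent manual computation follows, assuming the default `norm='l2'`. The names `row_norms` and `manual_normalized` are illustrative.

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[2.2, 5.9, -1.8], [5.4, -3.2, -5.1], [-1.9, 4.2, 3.2]])

# Euclidean (L2) length of each row, kept as a column for broadcasting
row_norms = np.linalg.norm(data, axis=1, keepdims=True)

# Divide every row by its own length
manual_normalized = data / row_norms

sk_normalized = preprocessing.Normalizer().fit_transform(data)
normalizer_matches = np.allclose(manual_normalized, sk_normalized)
print('Manual row normalization matches Normalizer:', normalizer_matches)
```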


### 4. Normalization
-- rescale each feature vector (row) to unit norm, so that all samples are on a common scale

- L1 (Least Absolute Deviations) - the absolute values in each row sum to 1; it is generally less sensitive to outliers
- L2 (Least Squares) - the squares of the values in each row sum to 1; squaring gives large values more weight, so outliers have more influence

In [21]:
data

array([[ 2.2,  5.9, -1.8],
       [ 5.4, -3.2, -5.1],
       [-1.9,  4.2,  3.2]])

In [24]:
data_l1 = preprocessing.normalize(data, norm='l1')
data_l2 = preprocessing.normalize(data, norm='l2')

print('L1-normalized data:\n', data_l1)
print('\nL2-normalized data:\n', data_l2)

L1-normalized data:
 [[ 0.22222222  0.5959596  -0.18181818]
 [ 0.39416058 -0.23357664 -0.37226277]
 [-0.20430108  0.4516129   0.34408602]]

L2-normalized data:
 [[ 0.3359268   0.90089461 -0.2748492 ]
 [ 0.6676851  -0.39566524 -0.63059148]
 [-0.33858465  0.74845029  0.57024784]]
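A quick sanity check of the two definitions: after L1 normalization the absolute values in each row should sum to 1, and after L2 normalization the squared values in each row should sum to 1. The names `l1_row_sums` and `l2_row_sums` are illustrative.

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[2.2, 5.9, -1.8], [5.4, -3.2, -5.1], [-1.9, 4.2, 3.2]])

data_l1 = preprocessing.normalize(data, norm='l1')
data_l2 = preprocessing.normalize(data, norm='l2')

# L1: sum of absolute values per row
l1_row_sums = np.abs(data_l1).sum(axis=1)

# L2: sum of squares per row
l2_row_sums = (data_l2 ** 2).sum(axis=1)

print('L1 row sums close to 1:', np.allclose(l1_row_sums, 1.0))
print('L2 row sums of squares close to 1:', np.allclose(l2_row_sums, 1.0))
```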
