Linear models have the form y = w x + b, where the regression coefficient w represents the expected change in y for a one-unit change in the predictor x. The magnitude of w therefore depends partly on the units in which x is measured.
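As a small illustration of the point above, the following sketch fits the same straight line twice, once with the predictor in metres and once in centimetres. The data values and the variable names (`x_m`, `x_cm`) are made up for this example and are not part of the Titanic analysis.

```python
import numpy as np

# Hypothetical data: y depends linearly on a length measured in metres.
x_m = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x_m + 2.0

# The same predictor expressed in centimetres (values 100 times larger).
x_cm = x_m * 100.0

# np.polyfit with degree 1 returns [slope, intercept].
w_m, b_m = np.polyfit(x_m, y, 1)
w_cm, b_cm = np.polyfit(x_cm, y, 1)

print('w with x in metres:      ', w_m)
print('w with x in centimetres: ', w_cm)
```

The fitted relationship is the same in both cases; only the coefficient changes, shrinking by the same factor of 100 by which the predictor grew.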

- With multiple predictors x1, x2, ..., xn, predictors with greater numeric ranges dominate those with smaller numeric ranges
- Gradient descent converges faster when all the predictors (x1 to xn) are on a similar scale
- For SVMs, feature scaling can decrease the time needed to find the support vectors
- Feature scaling is required for methods that rely on distance calculations (such as Euclidean distance), like k-nearest neighbours (KNN) and 
  k-means clustering
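To make the distance argument concrete, here is a minimal sketch using two passengers and the column minima and maxima from the Titanic sample used later in this notebook. The helper `squared_diff_share` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

# Two passengers from the Titanic sample: [Pclass, Age, Fare]
a = np.array([3.0, 22.0, 7.25])
b = np.array([1.0, 38.0, 71.2833])

# Column minima and maxima, taken from the data.describe() output
col_min = np.array([1.0, 0.42, 0.0])
col_max = np.array([3.0, 80.0, 512.3292])

def squared_diff_share(u, v):
    """Fraction of the squared Euclidean distance contributed by each feature."""
    sq = (u - v) ** 2
    return sq / sq.sum()

# Contribution of each feature to the distance on the raw scale
raw_share = squared_diff_share(a, b)

# Same two passengers after min-max scaling each feature to [0, 1]
a_scaled = (a - col_min) / (col_max - col_min)
b_scaled = (b - col_min) / (col_max - col_min)
scaled_share = squared_diff_share(a_scaled, b_scaled)

for name, r, s in zip(['Pclass', 'Age', 'Fare'], raw_share, scaled_share):
    print(name, 'raw share:', round(r, 3), 'scaled share:', round(s, 3))
```

On the raw scale, Fare accounts for most of the distance simply because its range is the largest. After scaling, the difference in Pclass (first class versus third class) carries the most weight.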

The machine learning models affected by feature magnitude are:

    Linear and Logistic Regression
    Neural Networks
    Support Vector Machines
    KNN
    K-means clustering
    Linear Discriminant Analysis (LDA)
    Principal Component Analysis (PCA)
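One reason the gradient-trained models in this list are sensitive to scale: for a least-squares objective, the speed of gradient descent is governed by the condition number of X^T X, and features on very different scales make that number large. The sketch below uses synthetic data (an assumption, not the Titanic set) and a hypothetical helper `hessian_condition_number` to compare the two cases.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic predictors: one in [0, 1], one in [0, 1000]
X_raw = np.column_stack([rng.uniform(0, 1, n), rng.uniform(0, 1000, n)])

# Min-max scale each column to [0, 1]
X_scaled = (X_raw - X_raw.min(axis=0)) / (X_raw.max(axis=0) - X_raw.min(axis=0))

def hessian_condition_number(X):
    """Condition number of X^T X / n, the Hessian of the mean squared error
    (up to a constant factor)."""
    return np.linalg.cond(X.T @ X / len(X))

cond_raw = hessian_condition_number(X_raw)
cond_scaled = hessian_condition_number(X_scaled)

print('Condition number without scaling:', cond_raw)
print('Condition number with scaling:   ', cond_scaled)
```

A smaller condition number means the loss surface is less elongated, so a single learning rate works reasonably well in every direction.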

Machine learning models insensitive to feature magnitude are the ones based on Trees:

    Classification and Regression Trees
    Random Forests
    Gradient Boosted Trees
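The insensitivity of trees can be checked on a toy problem. This sketch assumes synthetic integer-valued features (not the Titanic data) and fits the same decision tree on raw and min-max scaled inputs. Because a tree only compares each feature against a threshold, a monotonic rescaling should leave the learned partition of the training data unchanged.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic features on very different scales (assumption for illustration)
X = np.column_stack([rng.integers(1, 4, 300),
                     rng.integers(0, 500, 300)]).astype(float)
y = (X[:, 1] + 100 * X[:, 0] > 400).astype(int)

X_scaled = MinMaxScaler().fit_transform(X)

# Identical hyperparameters and seed for both fits
tree_raw = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_scaled, y)

same = np.array_equal(tree_raw.predict(X), tree_scaled.predict(X_scaled))
print('Identical training predictions:', same)
```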


# Titanic Survival Prediction

In [2]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

from sklearn.preprocessing import MinMaxScaler

from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

In [3]:
# Raw string so the backslashes in the Windows path are not treated as escape sequences
data = pd.read_csv(r'C:\Users\admin\PP_programs\DataRepo\titanic\train.csv',
                   usecols=['Pclass', 'Age', 'Fare', 'Survived'])
data.head()

   Survived    Pclass       Age      Fare
0         0         3      22.0      7.25
1         1         1      38.0   71.2833
2         1         3      26.0     7.925
3         1         1      35.0      53.1
4         0         3      35.0      8.05


In [4]:
data.describe()

        Survived     Pclass        Age       Fare
count      891.0      891.0      714.0      891.0
mean    0.383838   2.308642  29.699118  32.204208
std     0.486592   0.836071  14.526497  49.693429
min          0.0        1.0       0.42        0.0
25%          0.0        2.0     20.125     7.9104
50%          0.0        3.0       28.0    14.4542
75%          1.0        3.0       38.0       31.0
max          1.0        3.0       80.0   512.3292


In [5]:
#Range calculation = Max - Min
for col in ['Pclass', 'Age', 'Fare']:
    print(col, '_range: ', data[col].max()-data[col].min())

Pclass _range:  2
Age _range:  79.58
Fare _range:  512.3292


In [6]:
X_train, X_test, y_train, y_test = train_test_split(data[['Pclass', 'Age', 'Fare']].fillna(0),
                                                    data.Survived,test_size=0.3,random_state=12)

X_train.shape, X_test.shape

((623, 3), (268, 3))

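Min-Max Scaling:
X_std = (X - X.min()) / (X.max() - X.min())

Converting a scaled feature back to its original values:
X = X_std * (X.max() - X.min()) + X.min()

Here X.min() and X.max() are the per-feature minimum and maximum learned from the training data.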
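The two formulas above can be sketched directly in NumPy. The rows below are the first five passengers from the Titanic sample shown earlier; the variable names (`x_min`, `x_max`, `X_back`) are chosen for this illustration only.

```python
import numpy as np

# First five rows of the sample: [Pclass, Age, Fare]
X = np.array([[3.0, 22.0, 7.25],
              [1.0, 38.0, 71.2833],
              [3.0, 26.0, 7.925],
              [1.0, 35.0, 53.1],
              [3.0, 35.0, 8.05]])

# Per-column minimum and maximum
x_min = X.min(axis=0)
x_max = X.max(axis=0)

# Forward transform: every column ends up in [0, 1]
X_std = (X - x_min) / (x_max - x_min)

# Inverse transform: recover the original values
X_back = X_std * (x_max - x_min) + x_min

print('Scaled minimum per column:', X_std.min(axis=0))
print('Scaled maximum per column:', X_std.max(axis=0))
print('Round trip matches original:', np.allclose(X_back, X))
```

In practice the minimum and maximum should come from the training split only and then be reused on the test split, which is what `fit_transform` followed by `transform` does in the next cell.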

In [7]:
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [8]:
print('Mean: ', X_train_scaled.mean(axis=0))
print('Standard Deviation: ', X_train_scaled.std(axis=0))
print('Minimum value: ', X_train_scaled.min(axis=0))
print('Maximum value: ', X_train_scaled.max(axis=0))

Mean:  [0.6565008  0.28573616 0.06287472]
Standard Deviation:  [0.4114945  0.21630472 0.09531884]
Minimum value:  [0. 0. 0.]
Maximum value:  [1. 1. 1.]


# Logistic Regression

In [9]:
#Without Scaling
print('Without Scaling')
logit = LogisticRegression(random_state=12, C=1000) # large C to effectively disable regularization
logit.fit(X_train, y_train)

pred = logit.predict_proba(X_train)
print('Train: Logistic Regression roc-auc: {}'.format(roc_auc_score(y_train, pred[:,1])))

pred = logit.predict_proba(X_test)
print('Test: Logistic Regression roc-auc: {}'.format(roc_auc_score(y_test, pred[:,1])))


#With Scaling
print('With Scaling')
logit = LogisticRegression(random_state=12, C=1000) # large C to effectively disable regularization
logit.fit(X_train_scaled, y_train)

pred = logit.predict_proba(X_train_scaled)
print('Train: Logistic Regression roc-auc: {}'.format(roc_auc_score(y_train, pred[:,1])))

pred = logit.predict_proba(X_test_scaled)
print('Test: Logistic Regression roc-auc: {}'.format(roc_auc_score(y_test, pred[:,1])))

Without Scaling
Train: Logistic Regression roc-auc: 0.7175422241719676
Test: Logistic Regression roc-auc: 0.70552040401695
With Scaling
Train: Logistic Regression roc-auc: 0.7175422241719677
Test: Logistic Regression roc-auc: 0.70552040401695




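Observation: Scaling has no visible impact on ROC-AUC here. This is expected: with regularization effectively switched off (C=1000), rescaling the features only rescales the fitted coefficients, so the predictions stay the same.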

# SVM

In [10]:
#Without Scaling
SVM_model = SVC(random_state=12, probability=True)
SVM_model.fit(X_train, y_train)
print('Without Scaling')
pred = SVM_model.predict_proba(X_train)
print('Train: SVM roc-auc: {}'.format(roc_auc_score(y_train, pred[:,1])))
pred = SVM_model.predict_proba(X_test)
print('Test: SVM roc-auc: {}'.format(roc_auc_score(y_test, pred[:,1])))

#With Scaling
SVM_model = SVC(random_state=12, probability=True)
SVM_model.fit(X_train_scaled, y_train)
print('With Scaling')
pred = SVM_model.predict_proba(X_train_scaled)
print('Train: SVM roc-auc: {}'.format(roc_auc_score(y_train, pred[:,1])))
pred = SVM_model.predict_proba(X_test_scaled)
print('Test: SVM roc-auc: {}'.format(roc_auc_score(y_test, pred[:,1])))

Without Scaling
Train: SVM roc-auc: 0.8866363237552094
Test: SVM roc-auc: 0.636849132176235
With Scaling
Train: SVM roc-auc: 0.7150636104408862
Test: SVM roc-auc: 0.6875834445927904




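Observation: Scaling improved the test ROC-AUC (0.637 to 0.688) and narrowed the gap between train and test scores (train dropped from 0.887 to 0.715), which suggests less overfitting.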

# Neural Networks

In [13]:
#Without Scaling
NN_model = MLPClassifier(random_state=12, solver='sgd')
NN_model.fit(X_train, y_train)
print('Without Scaling')
pred = NN_model.predict_proba(X_train)
print('Train: Neural Network roc-auc: {}'.format(roc_auc_score(y_train, pred[:,1])))
pred = NN_model.predict_proba(X_test)
print('Test: Neural Network roc-auc: {}'.format(roc_auc_score(y_test, pred[:,1])))

#With Scaling
print('With Scaling')
NN_model = MLPClassifier(random_state=12, solver='sgd')
NN_model.fit(X_train_scaled, y_train)
pred = NN_model.predict_proba(X_train_scaled)
print('Train: Neural Network roc-auc: {}'.format(roc_auc_score(y_train, pred[:,1])))
pred = NN_model.predict_proba(X_test_scaled)
print('Test: Neural Network roc-auc: {}'.format(roc_auc_score(y_test, pred[:,1])))

Without Scaling
Train: Neural Network roc-auc: 0.5553794691818381
Test: Neural Network roc-auc: 0.6535670749405003
With Scaling
Train: Neural Network roc-auc: 0.7216001316078088
Test: Neural Network roc-auc: 0.707784292099611




Observation: Scaling the features improved the performance of the neural network

# K-Nearest Neighbours

In [15]:
#Without Scaling
print('Without Scaling')
KNN = KNeighborsClassifier(n_neighbors=3)
KNN.fit(X_train, y_train)
pred = KNN.predict_proba(X_train)
print('Train: KNN roc-auc: {}'.format(roc_auc_score(y_train, pred[:,1])))
pred = KNN.predict_proba(X_test)
print('Test: KNN roc-auc: {}'.format(roc_auc_score(y_test, pred[:,1])))

#With Scaling
print('With Scaling')
KNN = KNeighborsClassifier(n_neighbors=3)
KNN.fit(X_train_scaled, y_train)
pred = KNN.predict_proba(X_train_scaled)
print('Train: KNN roc-auc: {}'.format(roc_auc_score(y_train, pred[:,1])))
pred = KNN.predict_proba(X_test_scaled)
print('Test: KNN roc-auc: {}'.format(roc_auc_score(y_test, pred[:,1])))

Without Scaling
Train: KNN roc-auc: 0.8724720333406448
Test: KNN roc-auc: 0.6432054333313983
With Scaling
Train: KNN roc-auc: 0.8812294362798858
Test: KNN roc-auc: 0.7090903813780693

Observation: Scaling improved the test ROC-AUC of KNN (0.643 to 0.709)


# Random Forest

In [16]:
print('Without Scaling')
rf = RandomForestClassifier(n_estimators=700, random_state=12)
rf.fit(X_train, y_train)
pred = rf.predict_proba(X_train)
print('Train: Random Forests roc-auc: {}'.format(roc_auc_score(y_train, pred[:,1])))
pred = rf.predict_proba(X_test)
print('Test: Random Forests roc-auc: {}'.format(roc_auc_score(y_test, pred[:,1])))

print('With Scaling')
rf = RandomForestClassifier(n_estimators=700, random_state=12)
rf.fit(X_train_scaled, y_train)
pred = rf.predict_proba(X_train_scaled)
print('Train: Random Forests roc-auc: {}'.format(roc_auc_score(y_train, pred[:,1])))
pred = rf.predict_proba(X_test_scaled)
print('Test: Random Forests roc-auc: {}'.format(roc_auc_score(y_test, pred[:,1])))

Without Scaling
Train: Random Forests roc-auc: 0.9924270673393288
Test: Random Forests roc-auc: 0.7350380217101062
With Scaling
Train: Random Forests roc-auc: 0.9923831980697522
Test: Random Forests roc-auc: 0.7342833923492192


Observation: Essentially no change in performance (test ROC-AUC 0.735 without scaling vs 0.734 with scaling)

# AdaBoost Classifier

In [17]:
print('Without Scaling')
ada = AdaBoostClassifier(n_estimators=250, random_state=12)
ada.fit(X_train, y_train)
pred = ada.predict_proba(X_train)
print('Train: AdaBoost roc-auc: {}'.format(roc_auc_score(y_train, pred[:,1])))
pred = ada.predict_proba(X_test)
print('Test: AdaBoost roc-auc: {}'.format(roc_auc_score(y_test, pred[:,1])))

print('With Scaling')
ada = AdaBoostClassifier(n_estimators=250, random_state=12)
ada.fit(X_train_scaled, y_train)
pred = ada.predict_proba(X_train_scaled)
print('Train: AdaBoost roc-auc: {}'.format(roc_auc_score(y_train, pred[:,1])))
pred = ada.predict_proba(X_test_scaled)
print('Test: AdaBoost roc-auc: {}'.format(roc_auc_score(y_test, pred[:,1])))

Without Scaling
Train: AdaBoost roc-auc: 0.8555878482123273
Test: AdaBoost roc-auc: 0.7204098217913738
With Scaling
Train: AdaBoost roc-auc: 0.8555878482123273
Test: AdaBoost roc-auc: 0.7023567655424624


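Observation: Train ROC-AUC is unchanged, but test ROC-AUC differs slightly (0.720 without scaling vs 0.702 with scaling), so in this run AdaBoost is not perfectly insensitive to scaling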