This example shows how to standardize features (feature scaling) before fitting a linear regression.

## Importing relevant Libraries

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

In [2]:
data = pd.read_csv("Downloads/original.csv")

In [5]:
data.head()

    SAT  Rand 1,2,3   GPA
0  1714           1  2.40
1  1664           3  2.52
2  1760           3  2.54
3  1685           3  2.74
4  1693           2  2.83


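As you can see, 'SAT' takes values in the thousands, while 'Rand 1,2,3' only takes the values 1, 2 and 3, and the target 'GPA' is a single-digit number. Because the two input features are on such different scales, we standardize them before fitting the model, so that each feature has mean 0 and standard deviation 1 and the resulting weights can be compared directly.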
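
To make the idea concrete, here is a minimal sketch of standardization done by hand with NumPy. It uses only the five SAT scores visible in `data.head()`, so the numbers will not match a scaler fitted on the full dataset; the variable names `sat` and `sat_standardized` are made up for this illustration.

```python
import numpy as np

# First five SAT scores from data.head() (a small subset, for illustration only).
sat = np.array([1714, 1664, 1760, 1685, 1693], dtype=float)

# z-score: subtract the mean, then divide by the (population) standard deviation.
sat_standardized = (sat - sat.mean()) / sat.std()

print(sat_standardized.mean())  # very close to 0
print(sat_standardized.std())   # very close to 1
```

After this transformation the values are expressed in "standard deviations from the mean" rather than in SAT points.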

## Declare dependent and independent variables

In [3]:
x = data[['SAT', 'Rand 1,2,3']]
y = data["GPA"]

In [4]:
data.axes

[RangeIndex(start=0, stop=84, step=1),
 Index(['SAT', 'Rand 1,2,3', 'GPA'], dtype='object')]

## Standardization

In [62]:
from sklearn.preprocessing import StandardScaler

In [63]:
scaler = StandardScaler()

In [64]:
scaler.fit(x)

StandardScaler(copy=True, with_mean=True, with_std=True)

In [65]:
x_scaled = scaler.transform(x)

In [66]:
x_scaled

array([[-1.26338288, -1.24637147],
       [-1.74458431,  1.10632974],
       [-0.82067757,  1.10632974],
       [-1.54247971,  1.10632974],
       [-1.46548748, -0.07002087],
       [-1.68684014, -1.24637147],
       [-0.78218146, -0.07002087],
       [-0.78218146, -1.24637147],
       [-0.51270866, -0.07002087],
       [ 0.04548499,  1.10632974],
       [-1.06127829,  1.10632974],
       [-0.67631715, -0.07002087],
       [-1.06127829, -1.24637147],
       [-1.28263094,  1.10632974],
       [-0.6955652 , -0.07002087],
       [ 0.25721362, -0.07002087],
       [-0.86879772,  1.10632974],
       [-1.64834403, -0.07002087],
       [-0.03150724,  1.10632974],
       [-0.57045283,  1.10632974],
       [-0.81105355,  1.10632974],
       [-1.18639066,  1.10632974],
       [-1.75420834,  1.10632974],
       [-1.52323165, -1.24637147],
       [ 1.23886453, -1.24637147],
       [-0.18549169, -1.24637147],
       [-0.5608288 , -1.24637147],
       [-0.23361183,  1.10632974],
       [ 1.68156984, ...
(output truncated; the full array has 84 rows)
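
As a rough sketch of what `fit` and `transform` do here: `fit` stores the per-column mean and standard deviation (in the `mean_` and `scale_` attributes), and `transform` subtracts the former and divides by the latter. The example below assumes only the five rows shown in `data.head()`; `x_demo`, `scaler_demo` and `manual` are illustrative names, not part of the notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative stand-in for x: the SAT and 'Rand 1,2,3' columns of the five head() rows.
x_demo = np.array([[1714, 1], [1664, 3], [1760, 3], [1685, 3], [1693, 2]], dtype=float)

# fit() learns the per-column mean and standard deviation.
scaler_demo = StandardScaler().fit(x_demo)

# Applying the z-score formula by hand with the stored statistics ...
manual = (x_demo - scaler_demo.mean_) / scaler_demo.scale_

# ... gives the same result as transform().
print(np.allclose(manual, scaler_demo.transform(x_demo)))
```

Because the statistics are stored on the scaler, the same object can later be reused to transform new data consistently.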

## Regression with the Scaled Features

In [67]:
reg = LinearRegression()
reg.fit(x_scaled,y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

## Summary

In [68]:
# Collect the intercept and the per-feature weights in one table
summery = pd.DataFrame(data=[["Intercept"], ["SAT"], ["RAND 1,2,3"]], columns=["Features"])
summery["Weights"] = reg.intercept_, reg.coef_[0], reg.coef_[1]
summery

     Features   Weights
0   Intercept  3.330238
1         SAT  0.171814
2  RAND 1,2,3 -0.007030
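
These weights apply to the standardized inputs, not to raw SAT points. As a hedged sketch of how such weights relate to weights in the original units, the example below fits a regression on standardized data and converts the result back; it assumes only the five rows from `data.head()`, and all names ending in `_demo`, `_scaled` or `_raw` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Illustrative data only: the five rows shown in data.head().
x_demo = np.array([[1714, 1], [1664, 3], [1760, 3], [1685, 3], [1693, 2]], dtype=float)
y_demo = np.array([2.40, 2.52, 2.54, 2.74, 2.83])

scaler_demo = StandardScaler().fit(x_demo)
reg_scaled = LinearRegression().fit(scaler_demo.transform(x_demo), y_demo)

# Undo the scaling: each weight is divided by its column's standard deviation,
# and the bias absorbs the shift by the column means.
raw_weights = reg_scaled.coef_ / scaler_demo.scale_
raw_bias = reg_scaled.intercept_ - np.sum(raw_weights * scaler_demo.mean_)

# Cross-check against a regression fitted directly on the unscaled data.
reg_raw = LinearRegression().fit(x_demo, y_demo)
print(np.allclose(raw_weights, reg_raw.coef_), np.isclose(raw_bias, reg_raw.intercept_))
```

The standardized weights are the ones to compare across features, since each describes the effect of a one-standard-deviation change in its input.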


In [69]:
# DataFrame.set_value is deprecated (removed in pandas 1.0); use .at to set a single cell
summery.at[0, "Features"] = "Bias"

In [70]:
summery

     Features   Weights
0        Bias  3.330238
1         SAT  0.171814
2  RAND 1,2,3 -0.007030


## Making Prediction

In [71]:
new_data = pd.DataFrame(data = [[1700,2],[1800,1]],columns = ["SAT","Rand 1,2,3"])
new_data

    SAT  Rand 1,2,3
0  1700           2
1  1800           1


In [72]:
reg.predict(new_data)

array([295.39979563, 312.58821497])

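The output is far from what we expect because we did not standardize the new inputs. The model was trained on standardized features, so new data must go through the same fitted scaler before prediction.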
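
The arithmetic behind these inflated predictions can be reproduced by hand. The sketch below plugs the raw inputs into the linear formula using the weights printed in the summary table; those weights are rounded, so the results agree with the output only approximately. The names `bias`, `weights`, `raw_inputs` and `raw_predictions` are illustrative.

```python
import numpy as np

# Weights as printed in the summary table (rounded to six decimals).
bias = 3.330238
weights = np.array([0.171814, -0.00703])

# Raw, unscaled inputs: SAT and 'Rand 1,2,3'.
raw_inputs = np.array([[1700, 2], [1800, 1]], dtype=float)

# A linear model predicts bias + inputs @ weights. The SAT weight was learned
# for inputs of order 1, so multiplying it by values of order 1000 inflates the result.
raw_predictions = bias + raw_inputs @ weights
print(raw_predictions)  # roughly [295.4, 312.6]
```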

In [73]:
scaled_newdata = scaler.transform(new_data)

In [82]:
# Keep only the standardized SAT column of the new data (used by the simple regression below)
dt = scaled_newdata[:, 0].reshape(-1, 1)
dt

array([[-1.39811928],
       [-0.43571643]])

In [75]:
reg.predict(scaled_newdata)

array([3.09051403, 3.26413803])
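
One way to avoid forgetting the scaling step is to bundle the scaler and the regression into a single estimator. The following is a minimal sketch using scikit-learn's `make_pipeline`, not the approach taken in this notebook; it assumes only the five rows from `data.head()` as training data, and the names `model`, `x_demo`, `y_demo`, `new_demo`, `scaler_demo` and `reg_demo` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative training data only: the five rows shown in data.head().
x_demo = np.array([[1714, 1], [1664, 3], [1760, 3], [1685, 3], [1693, 2]], dtype=float)
y_demo = np.array([2.40, 2.52, 2.54, 2.74, 2.83])
new_demo = np.array([[1700, 2], [1800, 1]], dtype=float)

# The pipeline standardizes its inputs and then fits the regression in one step.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(x_demo, y_demo)
pipeline_pred = model.predict(new_demo)  # raw inputs are scaled automatically

# Manual route for comparison: scale explicitly, then fit and predict.
scaler_demo = StandardScaler().fit(x_demo)
reg_demo = LinearRegression().fit(scaler_demo.transform(x_demo), y_demo)
manual_pred = reg_demo.predict(scaler_demo.transform(new_demo))

print(np.allclose(pipeline_pred, manual_pred))
```

With a pipeline, `predict` accepts data in the original units, so the mistake of passing unscaled inputs to a model trained on scaled ones cannot occur.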

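Because the weight of 'Rand 1,2,3' is close to zero, we can also fit a simple regression on the standardized SAT alone and compare its predictions.

In [76]:
reg_simple = LinearRegression()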

In [85]:
x_simple = x_scaled[:,0].reshape(-1,1)

In [86]:
reg_simple.fit(x_simple,y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [87]:
reg_simple.predict(dt)

array([3.08970998, 3.25527879])