# Multiple Linear Regression with Standardized Features, and Making Predictions

In [231]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

In [232]:
bit = pd.read_csv('Fit_Data.csv')

In [233]:
bit.head(5)

| | Calories Burned | Steps | Distance | Floors | Minutes Sedentary | Minutes Lightly Active(Fat Burn) | Minutes Fairly Active(Cardio) | Minutes Very Active(Peak) | Activity Calories | Minutes Asleep | Minutes Awake | Number of Awakenings | Time in Bed | Minutes REM Sleep | Minutes Light Sleep | Minutes Deep Sleep |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2736 | 10201 | 4.46 | 6 | 633 | 341 | 10 | 24 | 1482 | 334 | 98 | 20 | 432 | 47.0 | 242.0 | 45.0 |
| 1 | 2637 | 9539 | 4.25 | 4 | 608 | 292 | 31 | 2 | 1302 | 414 | 93 | 33 | 507 | 50.0 | 346.0 | 18.0 |
| 2 | 2656 | 11394 | 4.75 | 5 | 750 | 242 | 32 | 27 | 1328 | 331 | 58 | 27 | 389 | 31.0 | 278.0 | 22.0 |
| 3 | 2934 | 17150 | 7.2 | 6 | 541 | 294 | 16 | 36 | 1657 | 464 | 89 | 36 | 553 | 84.0 | 341.0 | 39.0 |
| 4 | 2961 | 18607 | 7.82 | 11 | 452 | 270 | 18 | 48 | 1651 | 526 | 126 | 46 | 652 | 79.0 | 401.0 | 46.0 |


In [251]:
x = bit[['Time in Bed', 'Number of Awakenings']]
y = bit['Distance']

In [252]:
scaler = StandardScaler()

In [253]:
scaler.fit(x)

StandardScaler()

In [254]:
x_scaled = scaler.transform(x)
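
Standardization replaces each value with its z-score: the value minus the column mean, divided by the column standard deviation. The sketch below reproduces that calculation by hand with NumPy. It uses only the five rows shown by `bit.head(5)`, so its means and standard deviations are not those of the full dataset, and the names `toy`, `toy_mean`, `toy_std` and `toy_scaled` are illustrative. It assumes the population standard deviation (`ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np

# First five rows of 'Time in Bed' and 'Number of Awakenings' from bit.head(5).
# Illustrative subset only: the full dataset has different statistics.
toy = np.array([[432.0, 20.0],
                [507.0, 33.0],
                [389.0, 27.0],
                [553.0, 36.0],
                [652.0, 46.0]])

toy_mean = toy.mean(axis=0)   # per-column mean
toy_std = toy.std(axis=0)     # per-column population std (ddof=0)

# z-score: each column ends up with mean 0 and standard deviation 1
toy_scaled = (toy - toy_mean) / toy_std

print(toy_scaled.mean(axis=0))
print(toy_scaled.std(axis=0))
```

After this transformation both columns are on the same scale, which is what makes the regression weights comparable with each other.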

### Regression with scaled features

In [255]:
reg = LinearRegression()
reg.fit(x_scaled,y)

LinearRegression()

In [256]:
reg.coef_

array([ 0.31549928, -0.65360951])

In [257]:
reg.intercept_

4.717136150234742

### Creating a summary table

In [261]:
reg_summary = pd.DataFrame([['Bias'],['Time in Bed'],['Number of Awakenings']], columns=['Features'])
reg_summary['Weights'] = reg.intercept_, reg.coef_[0], reg.coef_[1]

In [262]:
reg_summary

| | Features | Weights |
|---|---|---|
| 0 | Bias | 4.717136 |
| 1 | Time in Bed | 0.315499 |
| 2 | Number of Awakenings | -0.65361 |
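
These weights apply to standardized inputs, so each one is the change in predicted distance for a one-standard-deviation change in that feature, and the bias is the prediction for a night that is exactly average on both features. The sketch below illustrates this reading with the values printed by `reg.intercept_` and `reg.coef_`. The helper `predict_from_z` and the three example nights are hypothetical names introduced here, not part of the notebook.

```python
import numpy as np

# Values copied from reg.intercept_ and reg.coef_ above
bias = 4.717136150234742
weights = np.array([0.31549928, -0.65360951])  # [Time in Bed, Number of Awakenings]

def predict_from_z(z):
    """Prediction for already-standardized features z (hypothetical helper)."""
    return bias + np.dot(z, weights)

average_night = predict_from_z(np.array([0.0, 0.0]))   # average on both features
long_night = predict_from_z(np.array([1.0, 0.0]))      # +1 std of Time in Bed
restless_night = predict_from_z(np.array([0.0, 1.0]))  # +1 std of Number of Awakenings

print(average_night, long_night, restless_night)
```

Read this way, one extra standard deviation of awakenings lowers the predicted distance by roughly twice as much as one extra standard deviation of time in bed raises it.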


## Making predictions with standardized coefficients (weights)

### Let's predict the distance I would travel after spending 6 hours in bed and waking up 8 times, or 4 hours in bed and waking up 15 times, the night before.

In [264]:
new_bit = pd.DataFrame(data=[[360, 8], [240, 15]], columns = ['Time in Bed', 'Number of Awakenings'])
new_bit

| | Time in Bed | Number of Awakenings |
|---|---|---|
| 0 | 360 | 8 |
| 1 | 240 | 15 |


In [265]:
reg.predict(new_bit)

array([113.06799953,  70.63281984])

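The model was trained on standardized features, so passing it raw minutes and raw counts produces unrealistic distances. New data must first be transformed with the same fitted `scaler`.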
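
The unrealistic values have a simple arithmetic explanation: the model multiplies each input by its weight as if the input were a z-score, so 360 minutes is treated as 360 standard deviations above the mean. The sketch below recomputes the unscaled predictions by hand from the printed intercept and coefficients. The variable names are illustrative.

```python
import numpy as np

# Values copied from reg.intercept_ and reg.coef_ above
bias = 4.717136150234742
weights = np.array([0.31549928, -0.65360951])  # [Time in Bed, Number of Awakenings]

# Raw (unscaled) inputs: minutes in bed and number of awakenings
raw_inputs = np.array([[360, 8],
                       [240, 15]])

# What reg.predict does internally: bias + inputs @ weights
unscaled_pred = bias + raw_inputs @ weights
print(unscaled_pred)  # close to the 113.07 and 70.63 returned by reg.predict(new_bit)
```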

In [266]:
new_data_scaled = scaler.transform(new_bit)
new_data_scaled

array([[-0.36163251, -1.79342806],
       [-1.35511132, -1.00695711]])

In [267]:
reg.predict(new_data_scaled)

array([5.77524298, 4.94775625])

With scaled inputs the predictions are realistic: about 5.78 for the first night and 4.95 for the second, in the same units as **Distance**.
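
One way to avoid forgetting the scaling step is to bundle the scaler and the regression into a single scikit-learn pipeline, so that raw inputs are standardized automatically before prediction. This is an alternative to the notebook's approach, not what it does. The sketch below assumes synthetic data, since `Fit_Data.csv` is not reproduced here: `X_demo`, `y_demo` and the coefficients used to generate them are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the two features and the target (invented numbers)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    'Time in Bed': rng.normal(450, 90, size=50),
    'Number of Awakenings': rng.normal(25, 9, size=50),
})
y_demo = (4.7 + 0.003 * X_demo['Time in Bed']
          - 0.07 * X_demo['Number of Awakenings']
          + rng.normal(0, 0.5, size=50))

# The pipeline standardizes inputs, then applies the regression
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_demo, y_demo)

# Raw, unscaled inputs can now be passed directly
new_demo = pd.DataFrame([[360, 8], [240, 15]], columns=X_demo.columns)
pipeline_pred = model.predict(new_demo)

# Same result as scaling by hand, as done in the notebook
scaler_demo = StandardScaler().fit(X_demo)
reg_demo = LinearRegression().fit(scaler_demo.transform(X_demo), y_demo)
manual_pred = reg_demo.predict(scaler_demo.transform(new_demo))

print(pipeline_pred)
print(manual_pred)
```

Both approaches give the same predictions. The pipeline only removes the chance of calling `predict` on unscaled data.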

### What if we remove the **Number of Awakenings** variable?

In [270]:
reg_simple = LinearRegression()
x_simple_matrix = x_scaled[:,0].reshape(-1,1)
reg_simple.fit(x_simple_matrix,y)

LinearRegression()

In [273]:
reg_simple.predict(new_data_scaled[:,0].reshape(-1,1))

array([4.80870251, 5.06025413])

Column 0 of `x_scaled` is **Time in Bed**, so this simple model keeps **Time in Bed** and drops **Number of Awakenings**. Without **Number of Awakenings**, the predictions change by about 0.97 and 0.11 respectively: the first falls from 5.78 to 4.81 and the second rises from 4.95 to 5.06.
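
The size of the change can be read off by subtracting the two prediction arrays printed above. A small sketch using those printed values (the variable names are illustrative):

```python
import numpy as np

# Predictions copied from the notebook outputs above
full_model_pred = np.array([5.77524298, 4.94775625])    # two-feature model
simple_model_pred = np.array([4.80870251, 5.06025413])  # one-feature model

# Positive: the two-feature model predicts a longer distance
difference = full_model_pred - simple_model_pred
print(np.round(difference, 2))  # [ 0.97 -0.11]
```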