# Multiple Linear Regression:

Multiple Linear Regression (MLR) extends Simple Linear Regression by using several independent variables (features) to predict one dependent variable (target). The fitted coefficients show how each feature is associated with the outcome while the other features are held fixed.
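
In equation form, the model described above is usually written as follows. The symbols are notation introduced here, not taken from the notebook: $p$ is the number of features, $\beta_0$ the intercept, $\beta_j$ the coefficient of feature $x_j$, and $\varepsilon$ the error term.

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
$$

Fitting the model means choosing the $\beta_j$ that minimize the sum of squared residuals over the training data.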

### Assumptions of Multiple Linear Regression:

- Linearity – The relationship between independent variables and the dependent variable is linear.
- Independence – Observations are independent of each other.
- Homoscedasticity – The variance of residuals (errors) is constant across all values of X.
- Normality of Residuals – The residuals (differences between actual and predicted values) follow a normal distribution.
- No Multicollinearity – Independent variables should not be highly correlated with each other.
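
One way to probe the multicollinearity assumption is the variance inflation factor (VIF): regress each feature on the others and compute 1 / (1 - R²). The sketch below is only an illustration on synthetic data; the helper `vif` and the arrays `x1`, `x2`, `x3` are hypothetical names, not part of this notebook, and it does not check the other four assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def vif(X):
    """Return the variance inflation factor of every column of X.

    Each column is regressed on the remaining columns; VIF = 1 / (1 - R^2).
    """
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


# Synthetic features (assumption: made up for this demo only)
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)                 # unrelated to x1
x3 = x1 + 0.05 * rng.normal(size=500)     # nearly a copy of x1

vif_values = vif(np.column_stack([x1, x2, x3]))
print(vif_values)  # x1 and x3 get large values, x2 stays close to 1
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign that a feature is largely explained by the others, though the threshold is a convention rather than a hard rule.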

<img src="images/mlr.png" width='600px'>

- Import necessary libraries:

In [19]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

- Load dataset:

In [2]:
df = pd.read_csv("Datasets/Modified_Mobile_Price_Data.csv")

- View the first rows of the data:

In [3]:
df.head()

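| | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | n_cores | ... | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi | price_range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842 | 0 | 2.2 | 0 | 1 | 0 | 7 | 0.6 | 188 | 2 | ... | 20 | 756 | 2549 | 9 | 7 | 19 | 0 | 0 | 1 | 11930 |
| 1 | 1021 | 1 | 0.5 | 1 | 0 | 1 | 53 | 0.7 | 136 | 3 | ... | 905 | 1988 | 2631 | 17 | 3 | 7 | 1 | 1 | 0 | 20804 |
| 2 | 563 | 1 | 0.5 | 1 | 2 | 1 | 41 | 0.9 | 145 | 5 | ... | 1263 | 1716 | 2603 | 11 | 2 | 9 | 1 | 1 | 0 | 21774 |
| 3 | 615 | 1 | 2.5 | 0 | 0 | 0 | 10 | 0.8 | 131 | 6 | ... | 1216 | 1786 | 2769 | 16 | 8 | 11 | 1 | 0 | 0 | 21011 |
| 4 | 1821 | 1 | 1.2 | 0 | 13 | 1 | 44 | 0.6 | 141 | 2 | ... | 1208 | 1212 | 1411 | 8 | 2 | 15 | 1 | 1 | 0 | 11291 |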


- Split the columns into X (independent variables: every column except the last) and y (dependent variable: the last column, `price_range`):

In [7]:
X = df.iloc[:,:-1]
y = df.iloc[:,-1]

- Split X and y into training and testing sets:

In [9]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3 , random_state=0)

- View X_train sample

In [11]:
X_train.sample(3)

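| | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | n_cores | pc | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1882 | 591 | 0 | 2.1 | 1 | 18 | 1 | 16 | 0.5 | 196 | 7 | 20 | 952 | 1726 | 704 | 14 | 5 | 4 | 1 | 1 | 1 |
| 1812 | 1565 | 1 | 0.5 | 0 | 0 | 0 | 38 | 0.1 | 121 | 5 | 1 | 781 | 1364 | 308 | 19 | 17 | 7 | 1 | 1 | 1 |
| 1010 | 683 | 0 | 0.7 | 0 | 5 | 0 | 19 | 0.9 | 173 | 4 | 17 | 954 | 1985 | 2622 | 16 | 3 | 5 | 0 | 0 | 1 |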


- Create a `LinearRegression` object and fit the model:

In [None]:
lr = LinearRegression()

lr.fit(X_train,y_train)
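
After fitting, a `LinearRegression` object exposes the learned intercept and one coefficient per feature through `intercept_` and `coef_`. The sketch below shows how these can be paired with column names. It uses a small synthetic frame with made-up weights (1000, 5 and 2 are assumptions for the demo, not values from the mobile price data), and `X_demo`, `y_demo`, and `demo_lr` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data with known weights (illustration only)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "ram": rng.uniform(256, 4000, size=200),
    "battery_power": rng.uniform(500, 2000, size=200),
})
y_demo = 1000.0 + 5.0 * X_demo["ram"] + 2.0 * X_demo["battery_power"]

demo_lr = LinearRegression().fit(X_demo, y_demo)

# Label each coefficient with the feature it belongs to
coefs = pd.Series(demo_lr.coef_, index=X_demo.columns)
print(coefs)               # recovers the weights used to build y_demo
print(demo_lr.intercept_)  # recovers the constant term
```

The same pattern, `pd.Series(lr.coef_, index=X_train.columns)`, should work on the model fitted above, since `X_train` is a DataFrame.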

- Prediction

In [17]:
y_pred = lr.predict(X_test)

- Model Evaluation

In [18]:
r2_score(y_test,y_pred)

0.8629357807785693

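This is an R² score of about 0.863: the model explains roughly 86.29% of the variance in the test target, without scaling the data. R² measures explained variance, not classification accuracy.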
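
Since `r2_score` returns the coefficient of determination, it can help to see how that number is computed. The sketch below works on four made-up values (`y_true` and `y_hat` are hypothetical, not taken from this dataset) and also reports MAE and RMSE, which express the error in the units of the target.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Tiny made-up example (illustration only)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

mae = mean_absolute_error(y_true, y_hat)
rmse = np.sqrt(mean_squared_error(y_true, y_hat))

print(r2_manual, r2_score(y_true, y_hat))  # both give the same value
print(mae, rmse)
```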

- Scale the features with `StandardScaler`:

In [50]:
from sklearn.preprocessing import StandardScaler

In [51]:
s = StandardScaler()

new_X_train = s.fit_transform(X_train)
new_X_test = s.transform(X_test)
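
The scaler is fitted on the training set only and then reused for the test set, so no information from the test data leaks into the preprocessing. A minimal sketch of what `fit_transform` and `transform` do, on a tiny made-up array (`train`, `test`, and `scaler` are hypothetical names):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data with two columns on very different scales
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
test = np.array([[4.0, 400.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean_ and scale_ from train
test_scaled = scaler.transform(test)        # reuses the training statistics

# transform() is (x - training mean) / training standard deviation
manual = (test - scaler.mean_) / scaler.scale_

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(np.allclose(test_scaled, manual))
```

The scaled training columns have mean 0 and standard deviation 1; the scaled test columns generally do not, because they are standardized with the training statistics.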

- Fit a new model on the scaled training data:

In [52]:
new_lr = LinearRegression()

new_lr.fit(new_X_train,y_train)

- Inspect the scaled training and test arrays:

In [53]:
new_X_train

array([[-0.90893609,  1.01149463, -1.16524008, ...,  0.55534783,
        -0.98722446,  0.98159786],
       [ 0.42822183,  1.01149463,  1.65523128, ...,  0.55534783,
        -0.98722446, -1.01874713],
       [-0.84784765,  1.01149463, -0.30683575, ...,  0.55534783,
         1.01294087,  0.98159786],
       ...,
       [-0.10121125, -0.98863599,  0.55156858, ..., -1.80067327,
        -0.98722446,  0.98159786],
       [-0.09894872, -0.98863599,  1.04208533, ...,  0.55534783,
         1.01294087,  0.98159786],
       [-1.19627797, -0.98863599, -1.28786926, ...,  0.55534783,
        -0.98722446,  0.98159786]])

In [54]:
new_X_test

array([[ 0.49609787,  1.01149463, -1.28786926, ...,  0.55534783,
         1.01294087, -1.01874713],
       [-0.32293964,  1.01149463, -1.28786926, ..., -1.80067327,
         1.01294087, -1.01874713],
       [ 0.65447529,  1.01149463,  0.3063102 , ...,  0.55534783,
        -0.98722446,  0.98159786],
       ...,
       [ 1.1001946 ,  1.01149463, -1.28786926, ...,  0.55534783,
         1.01294087,  0.98159786],
       [-1.47683226, -0.98863599, -1.28786926, ..., -1.80067327,
         1.01294087,  0.98159786],
       [-1.51982042,  1.01149463, -1.28786926, ...,  0.55534783,
         1.01294087, -1.01874713]])

- Predict and evaluate with the scaled features:

In [55]:
new_y_pred = new_lr.predict(new_X_test)

In [56]:
r2_score(y_test,new_y_pred)

0.8629357807785694
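
The score with scaling (0.8629357807785694) matches the score without scaling (0.8629357807785693) up to floating-point rounding. This is expected: ordinary least squares with an intercept produces the same predictions after standardizing the features, because the coefficients and intercept adjust to compensate. Scaling tends to matter for regularized, gradient-based, or distance-based models rather than for plain `LinearRegression`. The sketch below reproduces the effect on synthetic data; all names and numbers in it are assumptions made for the demo.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3)) * np.array([1.0, 50.0, 1000.0])
y_demo = X_demo @ np.array([3.0, 0.2, 0.01]) + rng.normal(size=300)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# Model on raw features
raw_pred = LinearRegression().fit(X_tr, y_tr).predict(X_te)

# Model on standardized features (scaler fitted on the training set only)
scaler = StandardScaler().fit(X_tr)
scaled_model = LinearRegression().fit(scaler.transform(X_tr), y_tr)
scaled_pred = scaled_model.predict(scaler.transform(X_te))

r2_raw = r2_score(y_te, raw_pred)
r2_scaled = r2_score(y_te, scaled_pred)
print(r2_raw, r2_scaled)                # equal up to rounding
print(np.allclose(raw_pred, scaled_pred))
```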