### Multiple Linear Regression

- Takes more than one input variable to predict the output variable.
- Models the linear relationship between several input variables and one output variable.
- The target variable (Y) must be continuous (real-valued), but the independent variables may be continuous or categorical.
- Each input variable is assumed to have a linear relationship with the dependent variable.
- MLR fits a regression hyperplane (the multidimensional analogue of a regression line) through the data points.
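
The points above amount to the model y ≈ b0 + b1*x1 + ... + bp*xp. A minimal sketch, assuming NumPy and synthetic noise-free data, shows how ordinary least squares recovers those coefficients. The names `fit_mlr`, `X`, and `true_coef` are illustrative and do not come from the notebook below.

```python
import numpy as np

def fit_mlr(X, y):
    """Fit y ~ b0 + b1*x1 + ... + bp*xp by ordinary least squares.

    Hypothetical helper for illustration; returns [b0, b1, ..., bp].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Synthetic, noise-free data generated from y = 1 + 2*x1 - 3*x2
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
true_coef = np.array([1.0, 2.0, -3.0])
y = true_coef[0] + X @ true_coef[1:]

coef = fit_mlr(X, y)
print(np.round(coef, 6))  # intercept first, then one slope per input variable
```

With real data the targets contain noise, so the fitted coefficients would only approximate the underlying relationship.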

### Assumptions for Multiple Linear Regression

* A linear relationship should exist between the target and the predictor variables.
* The regression residuals (errors) should be normally distributed.
* MLR assumes little or no multicollinearity (correlation among the independent variables) in the data.
* Since it is a regression problem, both input and output data must be in numerical form; categorical variables have to be encoded first.
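
The multicollinearity assumption can be screened with a pairwise correlation check. The sketch below assumes NumPy and synthetic data; `max_pairwise_correlation` is a hypothetical helper, not part of any library. Pairwise correlation is only a first screen: it does not catch a column that is a combination of several others, for which a variance inflation factor is the usual tool.

```python
import numpy as np

def max_pairwise_correlation(X):
    """Largest absolute Pearson correlation between two distinct columns of X."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    # Drop the diagonal: every column correlates perfectly with itself.
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.max(np.abs(off_diag)))

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)

# Two unrelated predictors: correlation should be small.
independent = np.column_stack([a, b])
# The third column is almost a copy of the first, so it is highly collinear.
collinear = np.column_stack([a, b, a + 0.01 * rng.normal(size=200)])

print(max_pairwise_correlation(independent))
print(max_pairwise_correlation(collinear))
```

A value near 1 suggests that one of the two predictors is redundant and could be dropped or combined with the other.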

### Implementation

In [1]:
import numpy as np  
import matplotlib.pyplot as plt 
import pandas as pd  

In [19]:
import seaborn as sns
df=sns.load_dataset('iris')

In [20]:
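sns.get_dataset_names  # without parentheses this only displays the function object; sns.get_dataset_names() would list the datasets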

<function seaborn.utils.get_dataset_names()>

In [21]:
df.head(2)

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |


In [22]:
# Preprocessing

In [23]:
from sklearn.preprocessing import StandardScaler, LabelEncoder
ss=StandardScaler()
le=LabelEncoder()

In [24]:
df['species']=le.fit_transform(df['species'])

In [25]:
x=df.drop('species',axis=1)
y=df.species

In [26]:
x[0:2]

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |


In [27]:
y[0:5]

0    0
1    0
2    0
3    0
4    0
Name: species, dtype: int32

In [28]:
# train test split

In [29]:
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest=train_test_split(x,y,test_size=0.20,random_state=42)

In [30]:
# x=ss.fit_transform(x)

In [40]:
from sklearn.linear_model import LinearRegression
model=LinearRegression(fit_intercept=True,
                        copy_X=True,
                        n_jobs=None,
                        positive=False)  # normalize= omitted: it was removed in scikit-learn 1.2

In [32]:
model.fit(xtrain,ytrain)

LinearRegression()

In [33]:
ypred=model.predict(xtest)

In [34]:
xtest[0:1]

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 73 | 6.1 | 2.8 | 4.7 | 1.2 |


In [35]:
model.predict(xtest[0:1])

array([1.23071715])

In [36]:
from sklearn.metrics import mean_absolute_error,mean_squared_error, r2_score

In [37]:
mean_absolute_error(ytest,ypred)
# MAE ranges from 0 to inf; lower values indicate a better fit

0.1463769496530853

In [38]:
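# root mean squared error
mse=mean_squared_error(ytest,ypred)
mse**0.5  # mse**1/2 would parse as (mse**1)/2, i.e. half the MSE, not its square root
# RMSE ranges from 0 to inf; lower values indicate a better fit

0.1926 (approximate: the original cell printed mse/2 = 0.018556897203988437, so mse ≈ 0.0371 and its square root ≈ 0.1926)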

In [39]:
r2_score(ytest,ypred)

0.9468960016420045

 
The R-squared value is close to 1, which indicates a good fit.
R-squared is the proportion of the variance in the target that the model explains; here it is about 94.7 percent, which is very high.
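
To make the three metrics used above concrete, here is a minimal pure-Python sketch of their definitions. The helper names `mae`, `rmse`, and `r2` and the toy values are illustrative assumptions; the notebook itself relies on the `sklearn.metrics` functions.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

def r2(y_true, y_pred):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy data (made up for illustration): only the last prediction is off, by 1.
y_true = [0, 1, 2, 2]
y_pred = [0.0, 1.0, 2.0, 1.0]

print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

RMSE is never smaller than MAE on the same predictions, because squaring weights large errors more heavily.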