# Multivariable Linear Regression:

In simple linear regression, we could use only one input feature to predict the output. Multivariable linear regression lets us predict the output from more than one input feature. The formula is:
    
        Y = b0 + b1*X1 + b2*X2 + ... + bn*Xn

   where,

       b0              = constant (y-intercept)
       b1, b2, ..., bn = coefficients of the input features
       X1, X2, ..., Xn = input features
       Y               = output
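
To make the formula concrete, here is a minimal sketch that evaluates it with NumPy. The intercept, coefficients, and feature values are made-up numbers chosen for illustration only; they are not taken from the dataset used below.

```python
import numpy as np

# Hypothetical values, chosen only to illustrate the formula
b0 = 10.0                        # intercept
b = np.array([2.0, 0.5, -1.0])   # b1, b2, b3
x = np.array([3.0, 4.0, 1.5])    # X1, X2, X3 for one observation

# Y = b0 + b1*X1 + b2*X2 + b3*X3
y = b0 + np.dot(b, x)
print(y)  # 16.5
```

The dot product is just a compact way of writing the sum of coefficient-times-feature terms.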

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

In [3]:
data = pd.read_csv("D:/CSV_Data/CO2_Emissions.csv")
data.head()


Unnamed: 0,Make,Model,Vehicle Class,Engine Size,Cylinders,Transmission,Fuel Type,Fuel Consumption City,Fuel Consumption Hwy,Fuel Consumption Comb,Fuel Consumption Comb.1,CO2 Emissions
0,ACURA,ILX,COMPACT,2.0,4,AS5,Z,9.9,6.7,8.5,33,196
1,ACURA,ILX,COMPACT,2.4,4,M6,Z,11.2,7.7,9.6,29,221
2,ACURA,ILX HYBRID,COMPACT,1.5,4,AV7,Z,6.0,5.8,5.9,48,136
3,ACURA,MDX 4WD,SUV - SMALL,3.5,6,AS6,Z,12.7,9.1,11.1,25,255
4,ACURA,RDX AWD,SUV - SMALL,3.5,6,AS6,Z,12.1,8.7,10.6,27,244


# Define X and Y:

X holds the input features we want to use, and Y holds the output values we want to predict.

In [4]:
X = data[['Engine Size', 'Cylinders', 'Fuel Consumption City', 'Fuel Consumption Hwy', 'Fuel Consumption Comb',
          'Fuel Consumption Comb.1']]
Y = data['CO2 Emissions']


In [5]:
# Split the data into training (first 80% of rows) and testing (last 20%) sets

split = int(len(data) * 0.8)
train = data[:split]
test = data[split:]
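
The slice above takes the first 80% of rows for training and the last 20% for testing. If the file is sorted (the preview shows only ACURA rows at the top), a shuffled split may give a more representative test set. The following is a sketch of that alternative using scikit-learn's `train_test_split`. The small stand-in DataFrame, `test_size=0.2`, and `random_state=42` are illustrative assumptions and not part of the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Small stand-in DataFrame (hypothetical values) so the sketch runs on its own
demo = pd.DataFrame({
    "Engine Size": [2.0, 2.4, 1.5, 3.5, 3.5, 2.0, 3.0, 1.6, 2.5, 5.0],
    "Cylinders": [4, 4, 4, 6, 6, 4, 6, 4, 4, 8],
    "CO2 Emissions": [196, 221, 136, 255, 244, 200, 250, 150, 210, 330],
})

# Shuffle the rows, then keep 80% for training and 20% for testing
demo_train, demo_test = train_test_split(demo, test_size=0.2, random_state=42)

print(len(demo_train), len(demo_test))
```

Fixing `random_state` makes the shuffle reproducible, so the same rows land in each set on every run.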

In [6]:
reg = linear_model.LinearRegression()

# Reuse the feature columns already selected for X
train_x = np.array(train[X.columns])
train_y = np.array(train['CO2 Emissions'])

test_x = np.array(test[X.columns])
test_y = np.array(test['CO2 Emissions'])

# Fit the model on the training data
reg.fit(train_x, train_y)

LinearRegression()

In [9]:
# Coefficient values

coeff_data = pd.DataFrame(reg.coef_, X.columns, columns=['Coefficients'])
coeff_data

Unnamed: 0,Coefficients
Engine Size,5.80969
Cylinders,7.536303
Fuel Consumption City,-1.21528
Fuel Consumption Hwy,3.597912
Fuel Consumption Comb,2.807875
Fuel Consumption Comb.1,-3.618275
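
Each coefficient is the change in the predicted output per unit change in that feature, with the other features held fixed. A fitted model's prediction is the intercept plus the weighted sum of the features. The sketch below checks this on a small synthetic dataset. The data and the names `demo_x`, `demo_y`, and `demo_reg` are assumptions for illustration; `intercept_` and `coef_` are the attributes that scikit-learn's `LinearRegression` exposes after fitting.

```python
import numpy as np
from sklearn import linear_model

# Synthetic data generated from y = 1 + 2*x1 + 3*x2 (illustrative only)
demo_x = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
demo_y = 1.0 + 2.0 * demo_x[:, 0] + 3.0 * demo_x[:, 1]

demo_reg = linear_model.LinearRegression()
demo_reg.fit(demo_x, demo_y)

# Rebuild the predictions by hand: b0 + b1*X1 + b2*X2
manual_pred = demo_reg.intercept_ + demo_x @ demo_reg.coef_

# True when the hand-built predictions match reg.predict
print(np.allclose(manual_pred, demo_reg.predict(demo_x)))
```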


In [11]:
# predict values

y_pred = reg.predict(test_x)
print(y_pred)

[187.82712262 197.49966659 202.31530253 ... 230.31945899 221.48802535
 234.81303532]


In [12]:
# Evaluate the model with the R^2 score on the test set

from sklearn.metrics import r2_score

R = r2_score(test_y, y_pred)

print("R^2 : ",R)

R^2 :  0.933739442242242
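
R^2 measures the share of the variance in the output that the model explains: R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations from the mean. As a sketch of what the score computes, the snippet below applies that definition with NumPy to a few made-up numbers (not taken from the dataset).

```python
import numpy as np

# Hypothetical true and predicted values, for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 9.5])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual)
```

A value close to 1 means the predictions track the true values closely; a value near 0 means the model does little better than always predicting the mean.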


Note that we used the same dataset for both simple and multivariable linear regression.
The multivariable model reaches an R^2 of about 0.93 on the test set, which is better than
the score obtained with simple linear regression, because it draws on more than one input feature.