**Multiple Linear Regression (MLR)** is an extension of Simple Linear Regression (SLR) that models the relationship between a dependent variable and two or more independent variables, rather than just one.

The equation for Multiple Linear Regression can be written as:


$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon$$

Where:


$Y$ is the dependent variable.

$X_1, X_2, \dots, X_n$ are the independent variables.

$\beta_0$ is the y-intercept, representing the value of $Y$ when all independent variables are zero.

$\beta_1, \beta_2, \dots, \beta_n$ are the coefficients, each representing the change in $Y$ for a one-unit change in the respective independent variable, holding all other variables constant.

$\varepsilon$ is the error term, representing the difference between the observed and predicted values of $Y$.
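
To make the equation concrete, the following minimal sketch evaluates its deterministic part (everything except the error term) for a few observations. The coefficient values and inputs are made up for illustration and do not come from any dataset.

```python
import numpy as np

# Hypothetical coefficients: beta_0 (intercept) and beta_1, beta_2 (slopes)
beta_0 = 2.0
betas = np.array([0.5, -1.0])

# Three observations with two independent variables each (made-up values)
X = np.array([
    [1.0, 2.0],
    [3.0, 0.0],
    [0.0, 0.0],
])

# Y_hat = beta_0 + beta_1 * X_1 + beta_2 * X_2, computed for every row at once
y_hat = beta_0 + X @ betas
print(y_hat)
```

The last row has every independent variable set to zero, so its prediction equals the intercept.
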

The goal of MLR is to estimate the coefficients ($\beta_0, \beta_1, \beta_2, \dots, \beta_n$) that best fit the data. This is typically done using a method such as Ordinary Least Squares (OLS), which minimizes the sum of the squared differences between the observed and predicted values of $Y$.
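
As a rough illustration of what OLS computes, the sketch below solves the least-squares problem directly with NumPy and compares the result with scikit-learn's `LinearRegression`. The data are synthetic and the "true" coefficients (1, 2, -3) are an assumption chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): y = 1 + 2*x1 - 3*x2 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Prepend a column of ones so the intercept is estimated with the slopes
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution: minimizes the sum of squared residuals
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# scikit-learn's LinearRegression fits the same least-squares problem
model = LinearRegression().fit(X, y)

print("lstsq:  ", beta_hat)
print("sklearn:", model.intercept_, model.coef_)
```

Both approaches should agree up to floating-point error, and both should land close to the coefficients used to generate the data.
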

Similar to SLR, MLR allows us to:

- Understand the relationship between the dependent variable and multiple independent variables.
- Make predictions about the dependent variable based on the values of the independent variables.
- Assess the significance and contribution of each independent variable to the model.

In practice, MLR is commonly used in various fields such as economics, finance, social sciences, and engineering, where there are often multiple factors influencing an outcome of interest. It provides a powerful tool for analyzing complex relationships between variables and making informed decisions based on data.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

In [3]:
df = pd.read_csv("/content/traffic.csv")

In [4]:
df.head()

Unnamed: 0,DateTime,Junction,Vehicles,ID
0,2015-11-01 00:00:00,1,15,20151101001
1,2015-11-01 01:00:00,1,13,20151101011
2,2015-11-01 02:00:00,1,10,20151101021
3,2015-11-01 03:00:00,1,7,20151101031
4,2015-11-01 04:00:00,1,9,20151101041


In [5]:
# Convert 'DateTime' column to datetime
df['DateTime'] = pd.to_datetime(df['DateTime'])
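
Once DateTime is parsed, pandas' `.dt` accessor can derive calendar features such as hour of day or day of week, which could serve as additional independent variables. The sketch below is an illustration only: the `sample` frame and the `Hour` and `DayOfWeek` column names are assumptions, not part of the original notebook, and every Vehicles value except the first is made up.

```python
import pandas as pd

# Small stand-in frame with the same column names as the traffic data
sample = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2015-11-01 00:00:00",
        "2015-11-01 13:00:00",
        "2015-11-02 08:00:00",
    ]),
    "Junction": [1, 1, 2],
    "Vehicles": [15, 30, 22],
})

# Derive numeric features from the parsed timestamps
sample["Hour"] = sample["DateTime"].dt.hour
sample["DayOfWeek"] = sample["DateTime"].dt.dayofweek  # Monday=0, Sunday=6

print(sample[["DateTime", "Hour", "DayOfWeek"]])
```
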

In [6]:
# Split the data into independent (X) and dependent (y) variables
X = df[['Junction', 'ID']]  # independent variables
y = df['Vehicles']           # dependent variable

In [7]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [8]:
# Initialize and fit the multiple linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

In [9]:
# Make predictions
y_pred = model.predict(X_test)

In [10]:
# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

Mean Squared Error: 198.95372630220297
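
MSE is expressed in squared units of the target, which makes it hard to read directly. Its square root (RMSE) is in the same units as Vehicles; for the value above that is roughly 14.1 vehicles. The sketch below shows one way to compute RMSE and the coefficient of determination (R²) alongside MSE. The `y_true` values are taken from the Vehicles column of the rows shown earlier, while the `y_hat` predictions are made up, so the numbers say nothing about the model fitted above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Observed values from the first five rows; predictions are made up for illustration
y_true = np.array([15, 13, 10, 7, 9])
y_hat = np.array([14.0, 12.0, 11.0, 8.0, 10.0])

mse = mean_squared_error(y_true, y_hat)
rmse = np.sqrt(mse)            # same units as the target (vehicles)
r2 = r2_score(y_true, y_hat)   # 1.0 is a perfect fit; 0.0 matches always predicting the mean

print("MSE: ", mse)
print("RMSE:", rmse)
print("R^2: ", r2)
```
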


In [11]:
# Print the coefficients
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)

Intercept: -27119.429549989753
Coefficients: [-1.51040785e+01  1.34775801e-06]
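
The values in `model.coef_` follow the column order of X, so in the output above the first coefficient (about -15.1) belongs to Junction and the second (about 1.35e-06) to ID. Because ID values are on the order of 2×10^10, that tiny coefficient still contributes roughly 27,000 to each prediction, which the large negative intercept offsets. The sketch below shows one way to label coefficients with their feature names; the data are made up and generated from known coefficients (an assumption for the example) so the result is easy to check.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data with the same feature names as above, for illustration only
X = pd.DataFrame({
    "Junction": [1, 1, 2, 2, 3, 3],
    "ID": [10, 20, 30, 40, 50, 60],
})
# Target generated from known coefficients: intercept 5, Junction -2, ID 0.1
y = 5.0 - 2.0 * X["Junction"] + 0.1 * X["ID"]

model = LinearRegression().fit(X, y)

# Pair each coefficient with the column it belongs to
coefs = pd.Series(model.coef_, index=X.columns, name="coefficient")
print("Intercept:", model.intercept_)
print(coefs)
```
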


In this code, X includes both the Junction and ID columns as independent variables. The multiple linear regression model is initialized and trained on the training data, predictions are made on the test data, and the Mean Squared Error is calculated to evaluate the model's performance. Finally, the intercept and coefficients of the model are printed.

This code treats the ID column as a numerical feature. In the rows shown above, ID appears to concatenate the date, hour, and junction number, so it behaves more like an identifier than a measured quantity. Adjust the independent variables according to your specific dataset and analysis requirements.