Simple linear regression is a statistical method used to model the relationship between two quantitative variables: one independent variable (often denoted as X) and one dependent variable (often denoted as Y). The goal of simple linear regression is to find the best-fitting straight line that describes the relationship between these two variables.

The simple linear regression model, a straight line plus an error term, can be expressed as:


$$Y = \beta_0 + \beta_1 X + \varepsilon$$

Where:


Y is the dependent variable (the variable we want to predict).

X is the independent variable (the variable we use to make predictions).

$\beta_0$ is the y-intercept, which represents the value of Y when X is zero.

$\beta_1$ is the slope of the line, which represents the change in Y for a one-unit change in X.

$\varepsilon$ is the error term, representing the difference between the observed and predicted values of Y.

In simple linear regression, we aim to estimate the values of $\beta_0$ and $\beta_1$ such that the line fits the data points as closely as possible. This is typically done using a method called least squares, which minimizes the sum of the squared differences between the observed and predicted values of Y.
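As a rough illustration of least squares, the sketch below computes the slope and intercept from the standard closed-form formulas. The arrays `x` and `y` are made-up values chosen only for the example; they are not taken from the traffic dataset.

```python
import numpy as np

# Small made-up sample, used only to illustrate the formulas.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 6.2, 7.9, 10.1])

# Least-squares estimates for simple linear regression:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Sum of squared residuals, the quantity these estimates minimize.
residuals = y - (intercept + slope * x)
sse = np.sum(residuals ** 2)

print("slope:", slope)
print("intercept:", intercept)
print("sum of squared residuals:", sse)
```

Nudging the slope or intercept away from these values and recomputing `sse` should give a larger sum, which is one way to check the estimates by hand.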

Once we have estimated the values of $\beta_0$ and $\beta_1$, we can use the regression equation to make predictions for the dependent variable Y based on new values of the independent variable X.

Overall, simple linear regression provides a way to quantify the relationship between two variables and make predictions based on that relationship. It's a foundational tool in statistics and data analysis.

In [4]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


Loading the Dataset: With the libraries imported above, this step reads the CSV file into a pandas DataFrame using pd.read_csv(). A DataFrame is the tabular data structure provided by the pandas library, which allows for easy manipulation and analysis of data. Calling data.head() then displays the first five rows.

In [5]:

# Load the traffic dataset into a DataFrame
data = pd.read_csv("/content/traffic.csv")

In [6]:
data.head()

Unnamed: 0,DateTime,Junction,Vehicles,ID
0,2015-11-01 00:00:00,1,15,20151101001
1,2015-11-01 01:00:00,1,13,20151101011
2,2015-11-01 02:00:00,1,10,20151101021
3,2015-11-01 03:00:00,1,7,20151101031
4,2015-11-01 04:00:00,1,9,20151101041


Data Preprocessing:

DateTime Conversion: Since the DateTime column represents dates and times, it's converted to datetime format using pd.to_datetime(). This allows for easier handling and manipulation of date and time data.
Splitting the Data: The dataset is split into the independent variable (X) and the dependent variable (y). In this case, Junction is chosen as the independent variable and Vehicles as the dependent variable.

In [9]:
# Convert 'DateTime' column to datetime
data['DateTime'] = pd.to_datetime(data['DateTime'])
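To show what the conversion makes possible, here is a minimal sketch that rebuilds the five rows displayed by data.head() in memory and converts the column. The name `sample` and the use of `io.StringIO` are assumptions made so the example runs without the CSV file. The notebook itself does not use the extracted hour in the model.

```python
import io
import pandas as pd

# The five rows displayed by data.head(), held in memory so the sketch
# runs without the CSV file.
csv_text = """DateTime,Junction,Vehicles,ID
2015-11-01 00:00:00,1,15,20151101001
2015-11-01 01:00:00,1,13,20151101011
2015-11-01 02:00:00,1,10,20151101021
2015-11-01 03:00:00,1,7,20151101031
2015-11-01 04:00:00,1,9,20151101041
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Before conversion the column holds plain strings.
print(sample["DateTime"].dtype)

sample["DateTime"] = pd.to_datetime(sample["DateTime"])

# After conversion the .dt accessor exposes date and time components.
print(sample["DateTime"].dtype)
print(sample["DateTime"].dt.hour.tolist())
```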

Splitting into Training and Testing Sets: This step divides the data into two subsets: the training set and the testing set. The training set is used to train the model, while the testing set is used to evaluate its performance on data the model has not seen. This is achieved using train_test_split() from sklearn.model_selection; test_size=0.2 holds out 20% of the rows for testing, and random_state=42 makes the split reproducible.

In [11]:
# Split the data into independent (X) and dependent (y) variables
X = data[['Junction']]  # independent variable
y = data['Vehicles']     # dependent variable

In [12]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
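The effect of test_size and random_state can be seen on a small stand-in frame. The sketch below uses a hypothetical 10-row DataFrame named `demo` with made-up columns `x` and `y`, not the traffic data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up frame with 10 rows, standing in for the real dataset.
demo = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) * 2})

X_demo = demo[["x"]]  # double brackets keep X two-dimensional
y_demo = demo["y"]

# 20% of 10 rows go to the test set; the rest are used for training.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)  # (8, 1) (2, 1)
```

Repeating the call with the same random_state returns the same rows in each subset, which keeps the evaluation comparable between runs.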

Initializing and Fitting the Linear Regression Model:

An instance of the LinearRegression model from sklearn.linear_model is created.
The model is trained (or "fit") using the training data. This means finding the coefficients (slope and intercept) that minimize the sum of squared errors on the training set.

In [13]:
# Initialize and fit the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
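One way to sanity-check what fit() does is to run it on data where the answer is known in advance. The sketch below assumes synthetic, noise-free data generated from y = 3 + 2x; the name `demo_model` is hypothetical and separate from the notebook's `model`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data generated from y = 3 + 2x, so the
# correct intercept and slope are known in advance.
x = np.arange(10, dtype=float).reshape(-1, 1)  # shape (n_samples, n_features)
y = 3.0 + 2.0 * x.ravel()

demo_model = LinearRegression()
demo_model.fit(x, y)

# The fitted values should be very close to 3 and 2.
print("intercept:", round(demo_model.intercept_, 6))
print("slope:", round(demo_model.coef_[0], 6))
```

The reshape to two dimensions mirrors the double brackets in data[['Junction']]: scikit-learn expects X as a table of shape (n_samples, n_features), even when there is only one feature.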

Making Predictions: After the model is trained, its predict() method is applied to the independent variable of the testing set (X_test) to produce predicted values of the dependent variable (y_pred).

In [14]:
# Make predictions
y_pred = model.predict(X_test)

Evaluating Model Performance:

Mean Squared Error (MSE): It measures the average of the squared errors (the differences between actual and predicted values). A lower MSE indicates a better fit. Because the errors are squared, MSE is expressed in squared units of the dependent variable (here, vehicles squared).
The coefficients (intercept_ and coef_): These represent the y-intercept and slope of the regression line, respectively. In this case, intercept_ is the predicted number of vehicles when Junction is zero, and coef_ is how much the predicted number of vehicles changes with each one-unit increase in Junction.

In [15]:
# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

Mean Squared Error: 254.93967005535328
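To make the definition concrete, the sketch below computes MSE by hand and compares it with mean_squared_error(). The `y_true` values are the Vehicles counts from the five rows shown earlier; the constant predictions in `y_hat` are hypothetical and are not output from the fitted model.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Vehicles counts from the five rows displayed by data.head().
y_true = np.array([15, 13, 10, 7, 9])
# Hypothetical predictions, chosen only to illustrate the arithmetic.
y_hat = np.array([12.0, 12.0, 12.0, 12.0, 12.0])

# MSE is the mean of the squared differences.
manual_mse = np.mean((y_true - y_hat) ** 2)
sklearn_mse = mean_squared_error(y_true, y_hat)

# The square root (RMSE) is in the same units as the target.
rmse = np.sqrt(manual_mse)

print("manual MSE:", manual_mse)
print("sklearn MSE:", sklearn_mse)
print("RMSE:", rmse)
```

By the same calculation, the MSE of about 254.94 reported above corresponds to an RMSE of roughly 16 vehicles, which is often easier to interpret than the squared value.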


In [16]:
# Print the coefficients
print("Intercept:", model.intercept_)
print("Coefficient:", model.coef_)

Intercept: 51.8271137186218
Coefficient: [-13.26882258]
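The printed values can be plugged back into the regression equation to see what the fitted line predicts. The sketch below copies the intercept and coefficient from the output above; the Junction values 1 and 2 are hypothetical inputs chosen for illustration.

```python
import numpy as np

intercept = 51.8271137186218  # printed value of model.intercept_
slope = -13.26882258          # printed value of model.coef_[0]

# Hypothetical Junction values used as inputs to the fitted line.
junctions = np.array([1, 2])

# Regression equation: predicted Vehicles = intercept + slope * Junction
predicted = intercept + slope * junctions

print(np.round(predicted, 2))  # [38.56 25.29]
```

Each one-unit increase in Junction lowers the prediction by about 13.27 vehicles, which is the slope read directly from coef_.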


This code loads the dataset into a pandas DataFrame, converts the 'DateTime' column to datetime format, splits the data into independent (X) and dependent (y) variables, divides them into training and testing sets, and fits a linear regression model to the training data. Finally, it makes predictions on the test data, calculates the Mean Squared Error to evaluate the model's performance, and prints the fitted intercept and coefficient.