# **Mileage Prediction using Regression Analysis**





-------------

## **Objective**

The objective of this project is to develop a predictive model for estimating a vehicle's fuel efficiency (mpg) using regression analysis. By analyzing features such as engine specifications (cylinders, displacement, horsepower), vehicle weight, acceleration, model year, and origin, the project aims to identify the most significant factors influencing mileage and provide accurate predictions. This model can serve as a valuable tool for understanding and optimizing vehicle performance.

## **Data Source**

The dataset used in this project was obtained from the YBI Foundation GitHub repository, which provides diverse and insightful datasets for educational and analytical purposes.

## **Import Library**

In [49]:
import pandas as pd

## **Import Data**

In [50]:
mileage = pd.read_csv('https://raw.githubusercontent.com/YBI-Foundation/Dataset/main/MPG.csv')


## **Describe Data**

In [None]:
mileage.describe()

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year |
|---|---|---|---|---|---|---|---|
| count | 398.0 | 398.0 | 398.0 | 392.0 | 398.0 | 398.0 | 398.0 |
| mean | 23.514573 | 5.454774 | 193.425879 | 104.469388 | 2970.424623 | 15.56809 | 76.01005 |
| std | 7.815984 | 1.701004 | 104.269838 | 38.49116 | 846.841774 | 2.757689 | 3.697627 |
| min | 9.0 | 3.0 | 68.0 | 46.0 | 1613.0 | 8.0 | 70.0 |
| 25% | 17.5 | 4.0 | 104.25 | 75.0 | 2223.75 | 13.825 | 73.0 |
| 50% | 23.0 | 4.0 | 148.5 | 93.5 | 2803.5 | 15.5 | 76.0 |
| 75% | 29.0 | 8.0 | 262.0 | 126.0 | 3608.0 | 17.175 | 79.0 |
| max | 46.6 | 8.0 | 455.0 | 230.0 | 5140.0 | 24.8 | 82.0 |


## **Data Visualization**

In [None]:
mileage.head()

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | usa | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | usa | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | usa | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | usa | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | usa | ford torino |


## **Data Preprocessing**

In [None]:
mileage.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   mpg           398 non-null    float64
 1   cylinders     398 non-null    int64  
 2   displacement  398 non-null    float64
 3   horsepower    392 non-null    float64
 4   weight        398 non-null    int64  
 5   acceleration  398 non-null    float64
 6   model_year    398 non-null    int64  
 7   origin        398 non-null    object 
 8   name          398 non-null    object 
dtypes: float64(4), int64(3), object(2)
memory usage: 28.1+ KB


In [None]:
mileage.shape

(398, 9)

In [None]:
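# Preview the rows left after dropping missing values.
# dropna() returns a new DataFrame, so mileage itself still has 398 rows here.
mileage.dropna()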

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | usa | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | usa | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | usa | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | usa | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | usa | ford torino |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 393 | 27.0 | 4 | 140.0 | 86.0 | 2790 | 15.6 | 82 | usa | ford mustang gl |
| 394 | 44.0 | 4 | 97.0 | 52.0 | 2130 | 24.6 | 82 | europe | vw pickup |
| 395 | 32.0 | 4 | 135.0 | 84.0 | 2295 | 11.6 | 82 | usa | dodge rampage |
| 396 | 28.0 | 4 | 120.0 | 79.0 | 2625 | 18.6 | 82 | usa | ford ranger |
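
As a minimal sketch of how `dropna()` behaves, assuming a small made-up DataFrame (the `toy` frame and its values are illustrative, not taken from the MPG dataset): the call returns a new frame and leaves the original untouched unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with one missing horsepower entry
toy = pd.DataFrame({
    "mpg": [18.0, 25.0, 32.0],
    "horsepower": [130.0, np.nan, 70.0],
})

# dropna() returns a new DataFrame; toy itself is not modified
cleaned = toy.dropna()

print(len(toy))                        # original row count
print(len(cleaned))                    # row count after dropping incomplete rows
print(toy["horsepower"].isna().sum())  # missing values still present in toy
```

This may explain why the missing horsepower values are dropped again from `X_train` and `X_test` before fitting: the preview above does not change `mileage`.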


## **Define Target Variable (y) and Feature Variables (X)**

In [None]:
mileage.columns

Index(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
       'acceleration', 'model_year', 'origin', 'name'],
      dtype='object')

In [None]:
Y = mileage['mpg']
X = mileage[['cylinders', 'displacement', 'horsepower', 'weight',
             'acceleration', 'model_year']]

## **Train Test Split**

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=2539)

In [None]:
X_test.head()

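| | cylinders | displacement | horsepower | weight | acceleration | model_year |
|---|---|---|---|---|---|---|
| 59 | 4 | 97.0 | 54.0 | 2254 | 23.5 | 72 |
| 378 | 4 | 105.0 | 63.0 | 2125 | 14.7 | 82 |
| 208 | 8 | 318.0 | 150.0 | 3940 | 13.2 | 76 |
| 181 | 4 | 91.0 | 53.0 | 1795 | 17.5 | 75 |
| 83 | 4 | 98.0 | 80.0 | 2164 | 15.0 | 72 |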


## **Modeling**

In [None]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()

## **Model Evaluation**

In [None]:
# Drop missing values in X_train and align Y_train
X_train = X_train.dropna()
Y_train = Y_train.loc[X_train.index]  # Align Y_train with X_train indices

# Drop missing values in X_test and align Y_test
X_test = X_test.dropna()
Y_test = Y_test.loc[X_test.index]  # Align Y_test with X_test indices

# Fit the linear regression on the cleaned training rows
model.fit(X_train, Y_train)

## **Prediction**

In [None]:
Y_pred = model.predict(X_test)

## **Accuracy**

In [None]:
from sklearn.metrics import mean_absolute_percentage_error
mean_absolute_percentage_error(Y_test, Y_pred)

np.float64(0.13800006712859111)
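
To make the metric concrete, here is a small sketch of what MAPE computes: the mean of |actual - predicted| / |actual|. The `mape` helper and the `y_true` and `y_pred` values below are made up for illustration and are unrelated to the notebook's results. scikit-learn's version additionally guards against division by zero.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction (0.1 == 10%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)))

# Hypothetical values: each prediction is off by exactly 10% of the actual value
y_true = [20.0, 30.0, 40.0]
y_pred = [22.0, 27.0, 44.0]
result = mape(y_true, y_pred)
print(round(result, 3))  # 0.1
```

Read this way, the value of roughly 0.138 reported above corresponds to an average error of about 13.8% of the actual mpg.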

## **Explanation**

This linear regression model for mileage prediction supports the following observations:
1. Model Performance:
   - The Mean Absolute Percentage Error (MAPE) on the test set is about 0.138, meaning predictions deviate from the actual mpg by roughly 13.8% on average

2. Feature Importance:
   - The fitted coefficients (model.coef_) give the change in predicted mpg per unit change in each feature with the others held fixed; they are not printed in this notebook
   - Weight and displacement typically have strong negative correlations with MPG
   - Model year generally has a positive correlation, suggesting newer cars tend to be more fuel-efficient

3. Data Quality:
   - The 6 rows with missing horsepower values were removed from the training and test sets after the split
   - The dataset covers 398 cars from model years 70 to 82, with engines ranging from 3 to 8 cylinders

4. Limitations:
   - The model assumes linear relationships between features and MPG
   - It doesn't account for interaction effects between features
   - The origin and name columns were not used as features
   - The prediction accuracy might vary for cars very different from those in the training data
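
The coefficients mentioned under Feature Importance can be inspected by pairing them with the feature names. A minimal sketch, assuming toy data generated from a known linear rule (the `X_toy` values, the rule, and the names `toy_model` and `coefs` are made up for illustration and are not fitted values from the MPG dataset):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy features (hypothetical values, not the real dataset)
X_toy = pd.DataFrame({
    "weight": [2000, 2500, 3000, 3500, 4000, 2200],
    "model_year": [70, 72, 75, 78, 80, 82],
})

# Target built from a known rule:
# mpg = 40 - 0.01 * weight + 0.5 * (model_year - 70)
y_toy = 40 - 0.01 * X_toy["weight"] + 0.5 * (X_toy["model_year"] - 70)

toy_model = LinearRegression().fit(X_toy, y_toy)

# Pair each coefficient with its column name for readability
coefs = pd.Series(toy_model.coef_, index=X_toy.columns)
print(coefs.round(4))
```

In the notebook, the same pattern would be `pd.Series(model.coef_, index=X_train.columns)`. Coefficient magnitudes depend on each feature's units, so they are not directly comparable across features without scaling.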