# **Mileage Prediction using Regression Analysis**





-------------

## **Objective**

The objective of this project is to develop a predictive model for estimating a vehicle's fuel efficiency (mpg) using regression analysis. By analyzing features such as engine specifications (cylinders, displacement, horsepower), vehicle weight, acceleration, model year, and origin, the project aims to identify the most significant factors influencing mileage and provide accurate predictions. This model can serve as a valuable tool for understanding and optimizing vehicle performance.

## **Data Source**

The dataset used in this project was obtained from the YBI Foundation GitHub repository, which provides diverse and insightful datasets for educational and analytical purposes.

## **Import Library**

In [1]:
import pandas as pd

## **Import Data**

In [None]:
mileage = pd.read_csv('https://raw.githubusercontent.com/YBI-Foundation/Dataset/main/MPG.csv')

## **Describe Data**

In [None]:
mileage.describe()

## **Data Visualization**

In [None]:
mileage.head()

## **Data Preprocessing**

In [None]:
mileage.info()

In [None]:
mileage.shape

In [None]:
# dropna() returns a new DataFrame, so assign the result back
mileage = mileage.dropna()
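
Before dropping rows, it can help to see how many values are missing in each column. The sketch below uses a small made-up frame (`toy` is a hypothetical stand-in, not the MPG data) and also shows that `dropna()` returns a new DataFrame and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the MPG data (values are made up)
toy = pd.DataFrame({
    "mpg": [18.0, 15.0, 26.0],
    "horsepower": [130.0, np.nan, 95.0],
})

missing_per_column = toy.isna().sum()  # number of NaN values in each column
cleaned = toy.dropna()                 # new frame without NaN rows; toy is unchanged

print(missing_per_column)
print(len(toy), len(cleaned))
```

On the notebook's data, the same check would be `mileage.isna().sum()`.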

## **Define Target Variable (y) and Feature Variables (X)**

In [None]:
mileage.columns

In [None]:
Y = mileage['mpg']
X = mileage[['cylinders', 'displacement', 'horsepower', 'weight',
             'acceleration', 'model_year']]

## **Train Test Split**

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=2539)

In [None]:
X_test.head()

## **Modeling**

In [None]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()

## **Model Evaluation**

In [None]:
# Drop missing values in X_train and align Y_train
X_train = X_train.dropna()
Y_train = Y_train.loc[X_train.index]  # Align Y_train with X_train indices

# Drop missing values in X_test and align Y_test
X_test = X_test.dropna()
Y_test = Y_test.loc[X_test.index]  # Align Y_test with X_test indices

# Fit the linear regression on the cleaned training set
model.fit(X_train, Y_train)

## **Prediction**

In [None]:
Y_pred = model.predict(X_test)

## **Accuracy**

In [None]:
from sklearn.metrics import mean_absolute_percentage_error
# MAPE is an error metric returned as a fraction (0.15 means 15%); lower is better
mean_absolute_percentage_error(Y_test, Y_pred)
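
To make the metric concrete, here is a minimal sketch that computes MAPE by hand and compares it with scikit-learn's result. The arrays `y_true` and `y_hat` hold made-up values for illustration only; they are not results from this notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Made-up values for illustration only
y_true = np.array([20.0, 25.0, 40.0])
y_hat = np.array([22.0, 25.0, 36.0])

# MAPE: mean of |actual - predicted| / |actual|
manual_mape = np.mean(np.abs(y_true - y_hat) / np.abs(y_true))
sklearn_mape = mean_absolute_percentage_error(y_true, y_hat)

print(manual_mape, sklearn_mape)
```

Both values are fractions, so multiply by 100 to report a percentage.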

## **Explanation**

This linear regression model for mileage prediction supports the following observations:
1. Model Performance:
   - The Mean Absolute Percentage Error (MAPE) measures the average absolute deviation of the predictions from the actual mpg, expressed as a fraction of the actual value; lower is better

2. Feature Importance:
   - The model coefficients show the estimated change in mpg for a one-unit change in each feature, with the other features held fixed
   - Weight and displacement typically have a strong negative relationship with mpg
   - Model year generally has a positive relationship with mpg, suggesting newer cars tend to be more fuel-efficient

3. Data Quality:
   - Rows with missing values were removed before fitting, since `LinearRegression` cannot handle missing values
   - The features used cover engine size, power, weight, acceleration, and model year; origin is not included in this model

4. Limitations:
   - The model assumes linear relationships between features and MPG
   - It doesn't account for interaction effects between features
   - The prediction accuracy might vary for cars very different from those in the training data
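
The feature-importance point above can be checked by pairing each fitted coefficient with its column name. The sketch below does this on synthetic data with a known, exactly linear relationship; `demo_X`, `demo_y`, and the coefficient values 10.0, -0.006, and 0.5 are assumptions chosen for illustration, not results from the MPG dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the MPG data (made-up values, exact linear relation)
rng = np.random.default_rng(0)
demo_X = pd.DataFrame({
    "weight": rng.uniform(1800, 5000, size=50),
    "model_year": rng.integers(70, 83, size=50).astype(float),
})
demo_y = 10.0 - 0.006 * demo_X["weight"] + 0.5 * demo_X["model_year"]

demo_model = LinearRegression().fit(demo_X, demo_y)

# Pair each coefficient with the name of the column it belongs to
coefficients = pd.Series(demo_model.coef_, index=demo_X.columns)
print(coefficients)
print("intercept:", demo_model.intercept_)
```

For the model trained in this notebook, the same pattern would be `pd.Series(model.coef_, index=X_train.columns)`. Because the features are on different scales, the raw coefficient magnitudes are not directly comparable with one another.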