## 📌 Used Car Price Prediction: Project Overview

## 🧩 Problem Statement
The objective of this project is to develop a robust **AdaBoost Regression Model** to predict the **selling price** of used cars based on historical data from CarDekho. This model helps potential sellers and buyers estimate a fair market price by analyzing vehicle specifications and usage history.

---

## 📊 Dataset Information

## 🏷️ Dataset Name
CarDekho Used Car Dataset

## 🌐 Source
Scraped data from cardekho.com (India)

---

## 📐 Dataset Shape
- **Total Rows:** 15,411  
- **Total Columns:** 13  

---

## 🧬 Dataset Columns
- **Car_Name** – Name of the car model
- **Brand** – Manufacturer (Maruti, Hyundai, etc.)
- **Model** – Specific variant
- **Vehicle_Age** – Age of the car in years
- **Km_Driven** – Total distance covered ($km$)
- **Seller_Type** – Individual or Dealer
- **Fuel_Type** – Petrol, Diesel, CNG, or LPG
- **Transmission_type** – Manual or Automatic
- **Mileage** – Fuel efficiency ($kmpl$)
- **Engine** – Displacement capacity ($CC$)
- **Max_Power** – Maximum power output ($bhp$)
- **Seats** – Passenger capacity
- **Selling_Price** – **Target Label** (Price in INR)

---

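## 🎯 Features and Target Used in This Project

## 🔹 Features (X)
We use a combination of numerical and categorical features, namely every column that remains after `car_name`, `brand`, and the saved index column are dropped:
- **Categorical:** `model`, `seller_type`, `fuel_type`, `transmission_type`
- **Numerical:** `vehicle_age`, `km_driven`, `mileage`, `engine`, `max_power`, `seats`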

## 🎯 Target (y)
- **Selling_Price** (Continuous numerical value)

---

## 📚 Steps Performed in the Notebook

1. **Data Collection:** Importing the dataset consisting of 15,411 rows.
2. **Exploratory Data Analysis (EDA):** Visualizing price trends and feature distributions.
3. **Data Preprocessing:**
   - Handling missing values.
   - Categorical encoding (One-Hot Encoding/Label Encoding).
   - Scaling features using `StandardScaler`.
4. **Model Building:** Implementing the **AdaBoost Regressor** (Ensemble Learning).
5. **Pipelines:** Using `ColumnTransformer` for organized preprocessing.
6. **Evaluation:** Measuring performance using metrics like **R-Squared ($R^2$)**, **Mean Absolute Error (MAE)**, and **Root Mean Squared Error (RMSE)**.
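
The steps above can be sketched end to end on a small synthetic table. This is a minimal illustration, not the notebook's actual run: the data is randomly generated (only the column names mirror the real dataset), the explicit `cat_cols`/`num_cols` lists and the `handle_unknown="ignore"` setting are assumptions made for the sketch, and the printed scores say nothing about performance on the CarDekho data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (NOT the CarDekho dataset).
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "vehicle_age": rng.integers(1, 15, n),
    "km_driven": rng.integers(5_000, 150_000, n),
    "fuel_type": rng.choice(["Petrol", "Diesel"], n),
    "transmission_type": rng.choice(["Manual", "Automatic"], n),
})
# Made-up price formula, used only so the model has a signal to learn.
toy["selling_price"] = (
    800_000
    - 40_000 * toy["vehicle_age"]
    - 2 * toy["km_driven"]
    + 100_000 * (toy["fuel_type"] == "Diesel")
    + rng.normal(0, 20_000, n)
)

X = toy.drop(columns="selling_price")
y = toy["selling_price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=41
)

cat_cols = ["fuel_type", "transmission_type"]
num_cols = ["vehicle_age", "km_driven"]

# Preprocessing and the regressor live in one Pipeline, so the scaler and
# encoder are fitted on the training split only.
model = Pipeline([
    ("preprocess", ColumnTransformer([
        ("onehot", OneHotEncoder(handle_unknown="ignore"), cat_cols),
        ("scale", StandardScaler(), num_cols),
    ])),
    ("regressor", AdaBoostRegressor(random_state=41)),
])
model.fit(X_train, y_train)
pred = model.predict(X_test)

r2 = r2_score(y_test, pred)
mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"R2={r2:.3f}  MAE={mae:.0f}  RMSE={rmse:.0f}")
```

Wrapping the preprocessor and the regressor in a single `Pipeline` is one way to keep test data out of the fitting step; the notebook itself applies the `ColumnTransformer` separately.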

---

## Used Car Price Prediction
### 1) Problem Statement
- This dataset comprises used cars sold on cardekho.com in India, as well as important features of these cars.

- The goal is to predict the price of a car from these input features.

- The predictions can be used to suggest a price to new sellers based on market conditions.


### 2) Data Collection
- The dataset was collected by scraping the CarDekho website.

- The data consists of 13 columns and 15,411 rows.

In [2]:
## Import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import warnings

warnings.filterwarnings("ignore")

%matplotlib inline

In [13]:
data=pd.read_csv('cardekho_imputated.csv')
data.head()

| | Unnamed: 0 | car_name | brand | model | vehicle_age | km_driven | seller_type | fuel_type | transmission_type | mileage | engine | max_power | seats | selling_price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Maruti Alto | Maruti | Alto | 9 | 120000 | Individual | Petrol | Manual | 19.7 | 796 | 46.3 | 5 | 120000 |
| 1 | 1 | Hyundai Grand | Hyundai | Grand | 5 | 20000 | Individual | Petrol | Manual | 18.9 | 1197 | 82.0 | 5 | 550000 |
| 2 | 2 | Hyundai i20 | Hyundai | i20 | 11 | 60000 | Individual | Petrol | Manual | 17.0 | 1197 | 80.0 | 5 | 215000 |
| 3 | 3 | Maruti Alto | Maruti | Alto | 9 | 37000 | Individual | Petrol | Manual | 20.92 | 998 | 67.1 | 5 | 226000 |
| 4 | 4 | Ford Ecosport | Ford | Ecosport | 6 | 30000 | Dealer | Diesel | Manual | 22.77 | 1498 | 98.59 | 5 | 570000 |


In [14]:
## Check each column for null values
data.isnull().sum()

Unnamed: 0           0
car_name             0
brand                0
model                0
vehicle_age          0
km_driven            0
seller_type          0
fuel_type            0
transmission_type    0
mileage              0
engine               0
max_power            0
seats                0
selling_price        0
dtype: int64

In [17]:
## Remove the unnecessary columns (car_name is just brand + model, so `model` is kept)

data.drop(['car_name', 'brand'], axis=1, inplace=True)

In [20]:
data.drop('Unnamed: 0',axis=1,inplace=True)
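
An `Unnamed: 0` column like the one dropped above typically appears when a CSV was written with its DataFrame index and then read back without declaring it. Assuming that is what happened here, a minimal sketch of avoiding the column at load time is shown below. The three rows are copied from the preview earlier in the notebook; the `csv_text`, `sample`, and `cleaned` names are illustrative only.

```python
import io

import pandas as pd

# Three rows from the preview, laid out the way to_csv() writes a frame with
# its default index (hence the empty first header field).
csv_text = """,car_name,brand,model,vehicle_age,km_driven,seller_type,fuel_type,transmission_type,mileage,engine,max_power,seats,selling_price
0,Maruti Alto,Maruti,Alto,9,120000,Individual,Petrol,Manual,19.7,796,46.3,5,120000
1,Hyundai Grand,Hyundai,Grand,5,20000,Individual,Petrol,Manual,18.9,1197,82.0,5,550000
2,Hyundai i20,Hyundai,i20,11,60000,Individual,Petrol,Manual,17.0,1197,80.0,5,215000
"""

# index_col=0 uses the first column as the index instead of creating "Unnamed: 0".
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Without inplace=True, drop() returns a new frame and leaves `sample` untouched,
# so re-running this line does not raise a KeyError.
cleaned = sample.drop(columns=["car_name", "brand"])
print(cleaned.columns.tolist())
```

Returning a new frame instead of mutating `data` in place also makes the cell safe to re-run, which the `inplace=True` version is not.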

In [21]:
data.head()

| | model | vehicle_age | km_driven | seller_type | fuel_type | transmission_type | mileage | engine | max_power | seats | selling_price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alto | 9 | 120000 | Individual | Petrol | Manual | 19.7 | 796 | 46.3 | 5 | 120000 |
| 1 | Grand | 5 | 20000 | Individual | Petrol | Manual | 18.9 | 1197 | 82.0 | 5 | 550000 |
| 2 | i20 | 11 | 60000 | Individual | Petrol | Manual | 17.0 | 1197 | 80.0 | 5 | 215000 |
| 3 | Alto | 9 | 37000 | Individual | Petrol | Manual | 20.92 | 998 | 67.1 | 5 | 226000 |
| 4 | Ecosport | 6 | 30000 | Dealer | Diesel | Manual | 22.77 | 1498 | 98.59 | 5 | 570000 |


In [23]:
data['model'].nunique()

120

In [27]:
data['model'].unique()

array(['Alto', 'Grand', 'i20', 'Ecosport', 'Wagon R', 'i10', 'Venue',
       'Swift', 'Verna', 'Duster', 'Cooper', 'Ciaz', 'C-Class', 'Innova',
       'Baleno', 'Swift Dzire', 'Vento', 'Creta', 'City', 'Bolero',
       'Fortuner', 'KWID', 'Amaze', 'Santro', 'XUV500', 'KUV100', 'Ignis',
       'RediGO', 'Scorpio', 'Marazzo', 'Aspire', 'Figo', 'Vitara',
       'Tiago', 'Polo', 'Seltos', 'Celerio', 'GO', '5', 'CR-V',
       'Endeavour', 'KUV', 'Jazz', '3', 'A4', 'Tigor', 'Ertiga', 'Safari',
       'Thar', 'Hexa', 'Rover', 'Eeco', 'A6', 'E-Class', 'Q7', 'Z4', '6',
       'XF', 'X5', 'Hector', 'Civic', 'D-Max', 'Cayenne', 'X1', 'Rapid',
       'Freestyle', 'Superb', 'Nexon', 'XUV300', 'Dzire VXI', 'S90',
       'WR-V', 'XL6', 'Triber', 'ES', 'Wrangler', 'Camry', 'Elantra',
       'Yaris', 'GL-Class', '7', 'S-Presso', 'Dzire LXI', 'Aura', 'XC',
       'Ghibli', 'Continental', 'CR', 'Kicks', 'S-Class', 'Tucson',
       'Harrier', 'X3', 'Octavia', 'Compass', 'CLS', 'redi-GO', 'Glanza',
       

In [28]:
x=data.drop('selling_price',axis=1)
y=data['selling_price']

In [29]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.25,random_state=41)
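
As a quick check on the split above, the row counts can be reproduced without the dataset. The sketch below uses a plain integer range as a stand-in for the 15,411 rows (only the count matters); scikit-learn rounds the test share up, which gives 3,853 test rows and leaves 11,558 for training.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the 15,411 rows of the dataset.
rows = np.arange(15411)
train_rows, test_rows = train_test_split(rows, test_size=0.25, random_state=41)

# Repeating the call with the same random_state yields the same partition.
train_again, _ = train_test_split(rows, test_size=0.25, random_state=41)

print(len(train_rows), len(test_rows))
print(np.array_equal(train_rows, train_again))
```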

In [30]:
## Create a ColumnTransformer with two transformers: one-hot encoding for categorical columns, scaling for numerical ones
cat_features=x.select_dtypes(include='object').columns
num_features=x.select_dtypes(exclude='object').columns

from sklearn.preprocessing import OneHotEncoder, StandardScaler 
from sklearn.compose import ColumnTransformer 

numeric_transformer = StandardScaler()
oh_transformer=OneHotEncoder(drop='first')
preprocessor=ColumnTransformer(
    [
        ('OneHotEncoder',oh_transformer,cat_features),
        ('StandardScaler',numeric_transformer,num_features)
        
    ]
)
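
One thing worth keeping in mind with this setup: `OneHotEncoder` defaults to `handle_unknown='error'`, so a car model that appears only in the test split would make `transform` fail. With 120 distinct models that is a plausible situation, although this section does not show whether it occurs. The sketch below uses toy frames (not the real data) to contrast the default with `handle_unknown='ignore'`; whether silently encoding unseen models as all zeros is acceptable for this project is an assumption.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy frames: "Ghibli" appears only in the test data.
train = pd.DataFrame({"model": ["Alto", "i20", "Swift", "Alto"]})
test = pd.DataFrame({"model": ["Swift", "Ghibli"]})

# Same settings as the notebook: unknown categories raise at transform time.
strict = OneHotEncoder(drop="first").fit(train)
try:
    strict.transform(test)
    strict_failed = False
except ValueError:
    strict_failed = True

# Alternative: unseen categories become an all-zero row instead of an error.
tolerant = OneHotEncoder(handle_unknown="ignore").fit(train)
encoded = tolerant.transform(test).toarray()

print(strict_failed)
print(tolerant.categories_[0])
print(encoded)
```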

In [31]:
preprocessor

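**ColumnTransformer**

| Parameter | Value |
|---|---|
| transformers | [('OneHotEncoder', ...), ('StandardScaler', ...)] |
| remainder | 'drop' |
| sparse_threshold | 0.3 |
| n_jobs | None |
| transformer_weights | None |
| verbose | False |
| verbose_feature_names_out | True |
| force_int_remainder_cols | 'deprecated' |

**OneHotEncoder**

| Parameter | Value |
|---|---|
| categories | 'auto' |
| drop | 'first' |
| sparse_output | True |
| dtype | <class 'numpy.float64'> |
| handle_unknown | 'error' |
| min_frequency | None |
| max_categories | None |
| feature_name_combiner | 'concat' |

**StandardScaler**

| Parameter | Value |
|---|---|
| copy | True |
| with_mean | True |
| with_std | True |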

In [32]:
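# Fit the preprocessor on the training data and transform it (fit_transform)
x_train=preprocessor.fit_transform(x_train)
# Note: the result is a SciPy sparse matrix, so pd.DataFrame(x_train) puts one
# sparse row object in each cell; x_train.toarray() would give a dense view.
pd.DataFrame(x_train)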

Unnamed: 0,0
0,<Compressed Sparse Row sparse matrix of dtype ...
1,<Compressed Sparse Row sparse matrix of dtype ...
2,<Compressed Sparse Row sparse matrix of dtype ...
3,<Compressed Sparse Row sparse matrix of dtype ...
4,<Compressed Sparse Row sparse matrix of dtype ...
...,...
11553,<Compressed Sparse Row sparse matrix of dtype ...
11554,<Compressed Sparse Row sparse matrix of dtype ...
11555,<Compressed Sparse Row sparse matrix of dtype ...
11556,<Compressed Sparse Row sparse matrix of dtype ...
