# Predicting Used Car Prices on Dubizzle

We will be using [this Kaggle Dataset](https://www.kaggle.com/datasets/alihassankp/dubizzle-used-car-sale-data) to analyze used car prices on Dubizzle.

## About the Dataset

*Dubizzle is the UAE's (a Middle Eastern country) favorite marketplace to buy, sell, and find anything. In this dataset I scraped almost all data from Dubizzle related to automobile sales. This data can be used to find interesting facts and correlations between different brands, the resale value of a specific car in relation to its year, and more. Enjoy and explore.*



**NOTES TO SELF**

- Should explore Dubizzle. What is this site? When someone posts a car, where are they typically located?
- The *mechanical condition* of the car: is it assessed by a third party or reported by the seller?
- The make and model of the car appear in the *title*. Check whether the `company` and `model` columns already cover them before writing a program to parse the title.
- What are *regional specs*?


- The approximate exchange rate is 1 AED to 0.27 USD.
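
The conversion in the last note is a single multiplication. A minimal sketch, assuming the approximate rate of 0.27 quoted above; `AED_TO_USD` and `aed_to_usd` are illustrative names, not part of the dataset or the notebook:

```python
# Approximate exchange rate from the note above (1 AED is about 0.27 USD).
AED_TO_USD = 0.27


def aed_to_usd(price_in_aed):
    """Convert a price in AED to US dollars at the approximate rate.

    Works on plain numbers and, because it is a simple multiplication,
    also element-wise on a pandas Series or NumPy array.
    """
    return price_in_aed * AED_TO_USD


# Example: a listing priced at 26,000 AED.
usd = aed_to_usd(26000)
print(f'26,000 AED is roughly {usd:,.0f} USD')
```

Once the data is loaded, the same function could be applied to the whole column, for example `data['price_in_usd'] = aed_to_usd(data['price_in_aed'])`, where `price_in_usd` is a hypothetical new column name.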

In [61]:
from matplotlib import pyplot as plt

import numpy as np
import pandas as pd

In [62]:
data = pd.read_csv('./data.csv', thousands=',')

## Initial Exploration of the Data



In [63]:
data.shape

(9970, 20)

In [64]:
data.columns

Index(['title', 'price_in_aed', 'kilometers', 'body_condition',
       'mechanical_condition', 'seller_type', 'body_type', 'no_of_cylinders',
       'transmission_type', 'regional_specs', 'horsepower', 'fuel_type',
       'steering_side', 'year', 'color', 'emirate', 'motors_trim', 'company',
       'model', 'date_posted'],
      dtype='object')
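
Several of the questions in the notes above concern categorical columns such as `regional_specs` and `mechanical_condition`. One way to get a first look at them is `value_counts`. The sketch below runs on a tiny hand-made frame (the rows are illustrative, not taken from the dataset) so that it stays self-contained; `summarize_categories` is a hypothetical helper name.

```python
import pandas as pd


def summarize_categories(df, columns):
    """Return {column: value counts (missing values included)} for each column."""
    return {col: df[col].value_counts(dropna=False) for col in columns}


# Illustrative rows only; the notebook itself would pass `data` instead.
toy = pd.DataFrame({
    'regional_specs': ['GCC Specs', 'GCC Specs', 'North American Specs', None],
    'seller_type': ['Dealer', 'Dealer', 'Owner', 'Owner'],
})

summary = summarize_categories(toy, ['regional_specs', 'seller_type'])
for col, counts in summary.items():
    print(f'--- {col} ---')
    print(counts)
```

Passing `dropna=False` keeps missing values in the counts, which helps spot columns that need cleaning before modeling.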

In [65]:
# Seed NumPy's random number generator so the train/test split (and any
# results derived from it) is the same on every run of the notebook.

np.random.seed(19)

N = len(data)
train_mask = np.random.rand(N) < 0.9
train_data = data[train_mask]
test_data = data[~train_mask]

N_TRAIN = len(train_data)
N_TEST = len(test_data)

f'Training Data: {N_TRAIN}, Test Data: {N_TEST}'


'Training Data: 9029, Test Data: 941'
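
The random mask above places each row in the training set independently with probability 0.9, so the split is only approximately 90/10 (here 9029 of 9970 rows, about 90.6%). If an exact split size is wanted, `DataFrame.sample` with a fixed `random_state` is one alternative. The following is a sketch on a stand-in frame, not the method this notebook uses; `split_exact` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def split_exact(df, train_frac=0.9, seed=19):
    """Split df into train/test with an exact train fraction (up to rounding)."""
    train = df.sample(frac=train_frac, random_state=seed)
    test = df.drop(train.index)  # assumes the index has no duplicate labels
    return train, test


# Stand-in data: 100 rows with a unique integer index.
toy = pd.DataFrame({'price_in_aed': np.arange(100) * 1000})
train, test = split_exact(toy)

print(f'Training Data: {len(train)}, Test Data: {len(test)}')
```

Either approach works for this analysis; the mask is simpler, while `sample` makes the split sizes predictable.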

In [66]:
train_data.head()

| | title | price_in_aed | kilometers | body_condition | mechanical_condition | seller_type | body_type | no_of_cylinders | transmission_type | regional_specs | horsepower | fuel_type | steering_side | year | color | emirate | motors_trim | company | model | date_posted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MITSUBISHI PAJERO 3.5L / 2013 | 26000 | 167390 | Perfect inside and out | Perfect inside and out | Dealer | SUV | 6 | Automatic Transmission | GCC Specs | Unknown | Gasoline | Left Hand Side | 2013.0 | Silver | Dubai | GLS | mitsubishi | pajero | 13/05/2022 |
| 1 | chevrolet silverado | 110000 | 39000 | Perfect inside and out | Perfect inside and out | Dealer | SUV | 8 | Automatic Transmission | North American Specs | 400 - 500 HP | Gasoline | Left Hand Side | 2018.0 | White | Sharjah | 1500 High Country | chevrolet | silverado | 14/01/2022 |
| 2 | MERCEDES-BENZ E300 - 2014 - GCC SPEC - FULL OP... | 78000 | 200000 | Perfect inside and out | Perfect inside and out | Dealer | Sedan | 6 | Automatic Transmission | GCC Specs | 400 - 500 HP | Gasoline | Left Hand Side | 2014.0 | Blue | Sharjah | E 300 | mercedes-benz | e-class | 05/05/2022 |
| 3 | WARRANTY UNTIL APR 2023 \|\| Ferrari 488 Spider ... | 899000 | 27000 | Perfect inside and out | Perfect inside and out | Dealer | Hard Top Convertible | 8 | Automatic Transmission | GCC Specs | 600 - 700 HP | Gasoline | Left Hand Side | 2018.0 | Red | Dubai | Standard | ferrari | 488-spider | 30/04/2022 |
| 4 | USED RENAULT DOKKER 2020 | 33000 | 69000 | Perfect inside and out | Perfect inside and out | Owner | Wagon | 4 | Manual Transmission | GCC Specs | Less than 150 HP | Gasoline | Left Hand Side | 2020.0 | White | Dubai | Standard | renault | dokker | 13/05/2022 |


In [72]:
train_data[['company', 'price_in_aed']].groupby('company').describe()




Summary statistics of `price_in_aed` by `company`:

| company | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| acura | 2.0 | 102500.000000 | 10606.601718 | 95000.0 | 98750.0 | 102500.0 | 106250.0 | 110000.0 |
| alfa-romeo | 20.0 | 109410.000000 | 40394.709138 | 48000.0 | 93750.0 | 99900.0 | 119900.0 | 235000.0 |
| aston-martin | 30.0 | 561430.000000 | 392363.073546 | 121000.0 | 199250.0 | 489950.0 | 868500.0 | 1699000.0 |
| audi | 354.0 | 181137.132768 | 193045.524335 | 14900.0 | 55000.0 | 98500.0 | 226500.0 | 1149000.0 |
| baic | 1.0 | 93000.000000 | NaN | 93000.0 | 93000.0 | 93000.0 | 93000.0 | 93000.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| tesla | 34.0 | 250441.176471 | 47912.288027 | 185000.0 | 225000.0 | 246000.0 | 275250.0 | 399000.0 |
| toyota | 769.0 | 112953.732120 | 95094.188375 | 11500.0 | 50500.0 | 80500.0 | 138000.0 | 470000.0 |
| volkswagen | 194.0 | 79375.500000 | 51429.956381 | 15000.0 | 35000.0 | 69000.0 | 112000.0 | 299000.0 |
| volvo | 20.0 | 86514.950000 | 61504.463052 | 17500.0 | 40625.0 | 61950.0 | 139225.0 | 235000.0 |
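
Brands with very few listings make these summaries unreliable: `baic` has a single listing (hence the missing standard deviation) and `acura` has two. A possible follow-up is to rank brands by median price only when they have enough listings. The sketch below uses made-up rows and an arbitrary threshold; `median_price_by_company` and `min_listings` are hypothetical names.

```python
import pandas as pd


def median_price_by_company(df, min_listings=2):
    """Median price per company, keeping only companies with enough listings."""
    stats = df.groupby('company')['price_in_aed'].agg(['count', 'median'])
    stats = stats[stats['count'] >= min_listings]
    return stats.sort_values('median', ascending=False)


# Made-up listings for illustration; the notebook would pass `train_data`.
toy = pd.DataFrame({
    'company': ['audi', 'audi', 'audi', 'toyota', 'toyota', 'baic'],
    'price_in_aed': [50000, 100000, 200000, 40000, 80000, 93000],
})

ranked = median_price_by_company(toy, min_listings=2)
print(ranked)
```

The median is used here instead of the mean because the table above shows heavily skewed prices for some brands (for `audi`, the mean is almost twice the median).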
