## Introduction to the Data and Objectives


In this project, we work with a dataset of used cars from eBay Kleinanzeigen, the classifieds section of the German eBay website. The original dataset can be found at https://data.world/data-society/used-cars-data

### Data Dictionary
|Feature | Description|
|--------|--------|
|dateCrawled | When this ad was first crawled. All field-values are taken from this date.|
|name | Name of the car.|
|seller | Whether the seller is private or a dealer.|
|offerType | The type of listing.|
|price | The price on the ad to sell the car.|
|abtest | Whether the listing is included in an A/B test.|
|vehicleType | The vehicle type.|
|yearOfRegistration | The year in which the car was first registered.|
|gearbox | The transmission type.|
|powerPS | The power of the car in PS.|
|model | The car model name.|
|kilometer | How many kilometers the car has driven.|
|monthOfRegistration | The month in which the car was first registered.|
|fuelType | What type of fuel the car uses.|
|brand | The brand of the car.|
|notRepairedDamage | If the car has a damage which is not yet repaired.|
|dateCreated | The date on which the eBay listing was created.|
|nrOfPictures | The number of pictures in the ad.|
|postalCode | The postal code for the location of the vehicle.|
|lastSeenOnline | When the crawler saw this ad last online.|

### Objective
The aim of this project is to clean the data and analyze the included used car listings.

### Importing Libraries and Loading the dataset

In [1]:
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
autos = pd.read_csv('autos.csv', encoding='Latin-1')

In [3]:
autos.head()

Unnamed: 0,dateCrawled,name,seller,offerType,price,abtest,vehicleType,yearOfRegistration,gearbox,powerPS,model,odometer,monthOfRegistration,fuelType,brand,notRepairedDamage,dateCreated,nrOfPictures,postalCode,lastSeen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,privat,Angebot,"$5,000",control,bus,2004,manuell,158,andere,"150,000km",3,lpg,peugeot,nein,2016-03-26 00:00:00,0,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,privat,Angebot,"$8,500",control,limousine,1997,automatik,286,7er,"150,000km",6,benzin,bmw,nein,2016-04-04 00:00:00,0,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,privat,Angebot,"$8,990",test,limousine,2009,manuell,102,golf,"70,000km",7,benzin,volkswagen,nein,2016-03-26 00:00:00,0,35394,2016-04-06 20:15:37
3,2016-03-12 16:58:10,Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...,privat,Angebot,"$4,350",control,kleinwagen,2007,automatik,71,fortwo,"70,000km",6,benzin,smart,nein,2016-03-12 00:00:00,0,33729,2016-03-15 03:16:28
4,2016-04-01 14:38:50,Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...,privat,Angebot,"$1,350",test,kombi,2003,manuell,0,focus,"150,000km",7,benzin,ford,nein,2016-04-01 00:00:00,0,39218,2016-04-01 14:38:50


### Overview of Statistical Summaries

In [4]:
autos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   dateCrawled          50000 non-null  object
 1   name                 50000 non-null  object
 2   seller               50000 non-null  object
 3   offerType            50000 non-null  object
 4   price                50000 non-null  object
 5   abtest               50000 non-null  object
 6   vehicleType          44905 non-null  object
 7   yearOfRegistration   50000 non-null  int64 
 8   gearbox              47320 non-null  object
 9   powerPS              50000 non-null  int64 
 10  model                47242 non-null  object
 11  odometer             50000 non-null  object
 12  monthOfRegistration  50000 non-null  int64 
 13  fuelType             45518 non-null  object
 14  brand                50000 non-null  object
 15  notRepairedDamage    40171 non-null  object
 16  date

Observations:
- There are 20 columns in our dataset.
- The columns have either object (string) or int64 dtype.
- Columns *notRepairedDamage, fuelType, model, gearbox*, and *vehicleType* contain null values.
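
To quantify how much is missing, the null share per column can be computed directly. The sketch below uses a small made-up DataFrame (the name `sample` and its values are assumptions for illustration only); the same expression should work on `autos`.

```python
import pandas as pd

# Toy frame standing in for `autos`; the values are illustrative only.
sample = pd.DataFrame({
    "vehicleType": ["bus", None, "kombi", "limousine"],
    "gearbox": ["manuell", "automatik", None, None],
    "brand": ["peugeot", "bmw", "ford", "opel"],
})

# isnull() marks missing cells; the column-wise mean is the missing share.
null_share = sample.isnull().mean().mul(100).sort_values(ascending=False)
print(null_share)
```

Sorting in descending order puts the columns that need the most attention at the top.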

In [5]:
# Rename the columns from camelCase to snake_case
# autos.columns
cols = {'yearOfRegistration': 'registration_year', 
        'monthOfRegistration': 'registration_month',
       'notRepairedDamage': 'unrepaired_damage',
       'dateCreated': 'ad_created', 'dateCrawled':'date_crawled', 
       'vehicleType':'vehicle_type', 'gearbox':'gear_box', 
        'powerPS':'power_ps', 'fuelType':'fuel_type','nrOfPictures':
       'num_pictures', 'postalCode':'postal_code', 'lastSeen':
       'last_seen', 'offerType':'offer_type', 'abtest':'ab_test'}
autos = autos.rename(columns=cols)
autos.columns
    

Index(['date_crawled', 'name', 'seller', 'offer_type', 'price', 'ab_test',
       'vehicle_type', 'registration_year', 'gear_box', 'power_ps', 'model',
       'odometer', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'num_pictures', 'postal_code',
       'last_seen'],
      dtype='object')

- The columns are renamed from camelCase to snake_case, the style preferred in Python.

In [6]:
autos.describe(include='all')

Unnamed: 0,date_crawled,name,seller,offer_type,price,ab_test,vehicle_type,registration_year,gear_box,power_ps,model,odometer,registration_month,fuel_type,brand,unrepaired_damage,ad_created,num_pictures,postal_code,last_seen
count,50000,50000,50000,50000,50000,50000,44905,50000.0,47320,50000.0,47242,50000,50000.0,45518,50000,40171,50000,50000.0,50000.0,50000
unique,48213,38754,2,2,2357,2,8,,2,,245,13,,7,40,2,76,,,39481
top,2016-03-08 10:40:35,Ford_Fiesta,privat,Angebot,$0,test,limousine,,manuell,,golf,"150,000km",,benzin,volkswagen,nein,2016-04-03 00:00:00,,,2016-04-07 06:17:27
freq,3,78,49999,49999,1421,25756,12859,,36993,,4024,32424,,30107,10687,35232,1946,,,8
mean,,,,,,,,2005.07328,,116.35592,,,5.72336,,,,,0.0,50813.6273,
std,,,,,,,,105.712813,,209.216627,,,3.711984,,,,,0.0,25779.747957,
min,,,,,,,,1000.0,,0.0,,,0.0,,,,,0.0,1067.0,
25%,,,,,,,,1999.0,,70.0,,,3.0,,,,,0.0,30451.0,
50%,,,,,,,,2003.0,,105.0,,,6.0,,,,,0.0,49577.0,
75%,,,,,,,,2008.0,,150.0,,,9.0,,,,,0.0,71540.0,


#### Observations:
- Two text columns, *seller* and *offer_type*, have only two unique values each, and a single value accounts for 49,999 of the 50,000 rows.
- The *num_pictures* column is 0 in every row (its mean, min, and all quartiles are 0), so it needs further investigation.
- The *price* and *odometer* columns are stored as strings. We need to convert them to numeric types before doing arithmetic on them.
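
One way to spot such near-constant columns systematically is to measure how dominant the most frequent value is in each column. This is only a sketch on a toy DataFrame: `sample`, the placeholder value `"other"`, and the 0.75 threshold are assumptions chosen for illustration.

```python
import pandas as pd

# Toy frame; "other" is a placeholder for the rare second value.
sample = pd.DataFrame({
    "seller": ["privat", "privat", "privat", "other"],
    "offer_type": ["Angebot", "Angebot", "Angebot", "other"],
    "num_pictures": [0, 0, 0, 0],
    "brand": ["bmw", "audi", "ford", "opel"],
})

# Share of rows taken by the most frequent value in each column.
top_share = sample.apply(lambda col: col.value_counts(normalize=True).iloc[0])

# Columns where one value covers at least 75% of rows (arbitrary threshold).
near_constant = top_share[top_share >= 0.75].index.tolist()
print(near_constant)
```

Columns flagged this way carry little information and are candidates for dropping.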


### Data Cleaning

In [7]:
# Clean the 'odometer' column: strip the unit and the thousands separator
autos['odometer'] = autos['odometer'].str.replace('km', '', regex=False)
autos['odometer'] = autos['odometer'].str.replace(',', '', regex=False)

# Rename the column so that the unit is kept in its name.
# rename(..., inplace=True) returns None, so its result must not be assigned
# back to a column; here we reassign the returned DataFrame instead.
autos = autos.rename(columns={'odometer': 'odometer_km'})
autos['odometer_km'] = autos['odometer_km'].astype(int)

In [8]:
autos.columns

Index(['date_crawled', 'name', 'seller', 'offer_type', 'price', 'ab_test',
       'vehicle_type', 'registration_year', 'gear_box', 'power_ps', 'model',
       'odometer_km', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'num_pictures', 'postal_code',
       'last_seen'],
      dtype='object')

In [9]:
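# Clean the 'price' column: strip the currency sign and the thousands separator.
# regex=False treats '$' as a literal character rather than a regex anchor.
autos['price'] = autos['price'].str.replace('$', '', regex=False)
autos['price'] = autos['price'].str.replace(',', '', regex=False)
autos['price'] = autos['price'].astype(int)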

### Exploring Odometer and Price columns

In [10]:
autos['odometer_km'].unique()

array([150000,  70000,  50000,  80000,  10000,  30000, 125000,  90000,
        20000,  60000,   5000, 100000,  40000])

In [11]:
autos['odometer_km'].describe()

count     50000.000000
mean     125732.700000
std       40042.211706
min        5000.000000
25%      125000.000000
50%      150000.000000
75%      150000.000000
max      150000.000000
Name: odometer_km, dtype: float64

In [12]:
autos['odometer_km'].value_counts().sort_index(ascending=False)

150000    32424
125000     5170
100000     2169
90000      1757
80000      1436
70000      1230
60000      1164
50000      1027
40000       819
30000       789
20000       784
10000       264
5000        967
Name: odometer_km, dtype: int64

In [13]:
autos['odometer_km'].value_counts(normalize=True).sort_index(ascending=False)

150000    0.64848
125000    0.10340
100000    0.04338
90000     0.03514
80000     0.02872
70000     0.02460
60000     0.02328
50000     0.02054
40000     0.01638
30000     0.01578
20000     0.01568
10000     0.00528
5000      0.01934
Name: odometer_km, dtype: float64


- More than 64% of the cars show an odometer reading of 150,000 km. The readings take only 13 distinct values, so they appear to be rounded into bands.

In [14]:
autos['price'].unique().shape

(2357,)

In [15]:
autos['price'].describe()

count    5.000000e+04
mean     9.840044e+03
std      4.811044e+05
min      0.000000e+00
25%      1.100000e+03
50%      2.950000e+03
75%      7.200000e+03
max      1.000000e+08
Name: price, dtype: float64

The min and max values show that there are some unrealistic entries. Let's dig a bit deeper.

In [16]:
autos['price'].value_counts().sort_index(ascending=False)

99999999       1
27322222       1
12345678       3
11111111       2
10000000       1
            ... 
5              2
3              1
2              3
1            156
0           1421
Name: price, Length: 2357, dtype: int64

In [17]:
autos['price'].value_counts().sort_index(ascending=True)

0           1421
1            156
2              3
3              1
5              2
            ... 
10000000       1
11111111       2
12345678       3
27322222       1
99999999       1
Name: price, Length: 2357, dtype: int64

### Observations:
Odometer:
- Odometer readings range from 5,000 km to 150,000 km.
- Around 64% of the cars have an odometer reading of 150,000 km.

Price:
- There are indeed some unrealistic prices: 1,421 listings are priced at 0, and several are priced at 10 million or more.
- We need to resolve the issue by excluding such implausible entries.

In [18]:
bool1 = (autos['price'] < 350000) & (autos['price'] >100)
price_with_condition = autos.loc[bool1, 'price']
price_with_condition.describe()

count     48089.000000
mean       5939.462913
std        8949.391060
min         110.000000
25%        1250.000000
50%        3099.000000
75%        7500.000000
max      345000.000000
Name: price, dtype: float64

In the cell above, we addressed the unrealistic entries by keeping only prices strictly between $100 and $350,000, which leaves 48,089 of the 50,000 listings. Note that the filtered values are stored only in price_with_condition; the autos DataFrame itself is not modified, so the later cells still include the out-of-range prices.
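
If the goal is to drop the out-of-range listings entirely, the same boolean mask can be applied to whole rows instead of a single column. A minimal sketch on a toy DataFrame follows; `sample`, its values, and `cleaned` are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Toy frame standing in for `autos` after the price column was made numeric.
sample = pd.DataFrame({
    "brand": ["bmw", "audi", "ford", "opel"],
    "price": [0, 8500, 99999999, 1350],
})

# Same bounds as above: strictly greater than 100, strictly less than 350000.
price_ok = (sample["price"] > 100) & (sample["price"] < 350000)

# Indexing the frame with the mask keeps entire rows; .copy() avoids
# chained-assignment warnings if `cleaned` is modified later.
cleaned = sample[price_ok].copy()
print(cleaned)
```

Assigning the result back to the original name would make every later cell use the filtered rows.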

### Moving onto the date columns

For the sake of the reader, the columns with dates are shown in the table below.


|Columns with Date| Description|
| ------ | ------ |
|date_crawled| added by the crawler|
|last_seen|  added by the crawler|
|ad_created|  from the website|
|registration_month|  from the website|
|registration_year|  from the website|

In [19]:
# Check the dtype of each date-related column
cols_with_date = ['date_crawled', 'last_seen', 'ad_created',
                  'registration_month', 'registration_year']
for col in cols_with_date:
    print(autos[col].dtypes)

object
object
object
int64
int64


The *date_crawled*, *last_seen*, and *ad_created* columns are stored as strings, so autos.describe() cannot summarize them meaningfully. Each value starts with the date in yyyy-mm-dd form, so extracting those first 10 characters lets us build a distribution by day.
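
One way to make such columns easier to summarize is to parse them into real datetimes with `pd.to_datetime`. The sketch below runs on three made-up rows; `sample`, `crawled`, and `daily_share` are hypothetical names used only for illustration.

```python
import pandas as pd

# Toy timestamps in the same "yyyy-mm-dd HH:MM:SS" layout as the dataset.
sample = pd.DataFrame({
    "date_crawled": [
        "2016-03-26 17:47:46",
        "2016-04-04 13:38:56",
        "2016-03-26 18:57:24",
    ],
})

# Parse the strings into datetime64 values using an explicit format.
crawled = pd.to_datetime(sample["date_crawled"], format="%Y-%m-%d %H:%M:%S")

# Percentage of rows per calendar day, in chronological order.
daily_share = (
    crawled.dt.strftime("%Y-%m-%d")
    .value_counts(normalize=True)
    .sort_index()
    * 100
)
print(daily_share)
```

Once parsed, the `.dt` accessor also gives access to components such as the weekday or hour without further string handling.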

In [20]:
autos[['date_crawled', 'last_seen', 'ad_created']].head()

| |date_crawled|last_seen|ad_created|
|--------|--------|--------|--------|
|0|2016-03-26 17:47:46|2016-04-06 06:45:54|2016-03-26 00:00:00|
|1|2016-04-04 13:38:56|2016-04-06 14:45:08|2016-04-04 00:00:00|
|2|2016-03-26 18:57:24|2016-04-06 20:15:37|2016-03-26 00:00:00|
|3|2016-03-12 16:58:10|2016-03-15 03:16:28|2016-03-12 00:00:00|
|4|2016-04-01 14:38:50|2016-04-01 14:38:50|2016-04-01 00:00:00|


In [21]:
autos['date_crawled'].str[:10]

0        2016-03-26
1        2016-04-04
2        2016-03-26
3        2016-03-12
4        2016-04-01
            ...    
49995    2016-03-27
49996    2016-03-28
49997    2016-04-02
49998    2016-03-08
49999    2016-03-14
Name: date_crawled, Length: 50000, dtype: object

In [22]:
autos['date_crawled'][0]

'2016-03-26 17:47:46'

In [23]:
# Let's keep only the yyyy-mm-dd part and build a distribution table for it

def date_dist(column, head=True):
    '''Return the percentage distribution of dates in a column.

    column: a Series of timestamp strings that start with yyyy-mm-dd.
    head: if True, return only the first five rows; otherwise return all rows.

    The timestamps are truncated to yyyy-mm-dd and the result is
    sorted by date.'''
    dates = column.str[:10]
    shares = dates.value_counts(normalize=True, dropna=False)
    shares = shares.sort_index() * 100
    return shares.head() if head else shares


In [24]:
date_dist(autos['date_crawled'])

2016-03-05    2.538
2016-03-06    1.394
2016-03-07    3.596
2016-03-08    3.330
2016-03-09    3.322
Name: date_crawled, dtype: float64

In [25]:
date_dist(autos['ad_created'])

2015-06-11    0.002
2015-08-10    0.002
2015-09-09    0.002
2015-11-10    0.002
2015-12-05    0.002
Name: ad_created, dtype: float64

In [26]:
date_dist(autos['last_seen'])

2016-03-05    0.108
2016-03-06    0.442
2016-03-07    0.536
2016-03-08    0.760
2016-03-09    0.986
Name: last_seen, dtype: float64

Next, we display the full distributions, sorted with Series.sort_index() for easier exploration.

In [27]:
date_dist(autos['date_crawled'], False).sort_index()

2016-03-05    2.538
2016-03-06    1.394
2016-03-07    3.596
2016-03-08    3.330
2016-03-09    3.322
2016-03-10    3.212
2016-03-11    3.248
2016-03-12    3.678
2016-03-13    1.556
2016-03-14    3.662
2016-03-15    3.398
2016-03-16    2.950
2016-03-17    3.152
2016-03-18    1.306
2016-03-19    3.490
2016-03-20    3.782
2016-03-21    3.752
2016-03-22    3.294
2016-03-23    3.238
2016-03-24    2.910
2016-03-25    3.174
2016-03-26    3.248
2016-03-27    3.104
2016-03-28    3.484
2016-03-29    3.418
2016-03-30    3.362
2016-03-31    3.192
2016-04-01    3.380
2016-04-02    3.540
2016-04-03    3.868
2016-04-04    3.652
2016-04-05    1.310
2016-04-06    0.318
2016-04-07    0.142
Name: date_crawled, dtype: float64

In [28]:
date_dist(autos['ad_created'], False).sort_index()

2015-06-11    0.002
2015-08-10    0.002
2015-09-09    0.002
2015-11-10    0.002
2015-12-05    0.002
              ...  
2016-04-03    3.892
2016-04-04    3.688
2016-04-05    1.184
2016-04-06    0.326
2016-04-07    0.128
Name: ad_created, Length: 76, dtype: float64

In [29]:
date_dist(autos['last_seen'], False).sort_index()

2016-03-05     0.108
2016-03-06     0.442
2016-03-07     0.536
2016-03-08     0.760
2016-03-09     0.986
2016-03-10     1.076
2016-03-11     1.252
2016-03-12     2.382
2016-03-13     0.898
2016-03-14     1.280
2016-03-15     1.588
2016-03-16     1.644
2016-03-17     2.792
2016-03-18     0.742
2016-03-19     1.574
2016-03-20     2.070
2016-03-21     2.074
2016-03-22     2.158
2016-03-23     1.858
2016-03-24     1.956
2016-03-25     1.920
2016-03-26     1.696
2016-03-27     1.602
2016-03-28     2.086
2016-03-29     2.234
2016-03-30     2.484
2016-03-31     2.384
2016-04-01     2.310
2016-04-02     2.490
2016-04-03     2.536
2016-04-04     2.462
2016-04-05    12.428
2016-04-06    22.100
2016-04-07    13.092
Name: last_seen, dtype: float64

### Exploring registration_year column

In [30]:
autos['registration_year'].describe()

count    50000.000000
mean      2005.073280
std        105.712813
min       1000.000000
25%       1999.000000
50%       2003.000000
75%       2008.000000
max       9999.000000
Name: registration_year, dtype: float64

#### Observations:
- The min and max values are the years 1000 and 9999, so this column needs a closer look.
- Any registration year later than the listing date is inaccurate. The ads were crawled in 2016, so a car cannot have been registered after 2016.
- For the lower bound, we will use 1900, which is far more realistic than the year 1000.

In [31]:
autos['registration_year'].astype(int)

0        2004
1        1997
2        2009
3        2007
4        2003
         ... 
49995    2011
49996    1996
49997    2014
49998    2013
49999    1996
Name: registration_year, Length: 50000, dtype: int32

In [32]:
autos['registration_year'].dtypes

dtype('int64')

In [33]:
# Let's see the share of the data that would be kept

autos['registration_year'].between(1900,2016).sum()/autos.shape[0]

0.96056

After this cleaning, about 96% of the rows would remain.

In [34]:
autos = autos[autos['registration_year'].between(1900,2016)]
autos['registration_year'].describe()

count    48028.00000
mean      2002.80351
std          7.31085
min       1910.00000
25%       1999.00000
50%       2003.00000
75%       2008.00000
max       2016.00000
Name: registration_year, dtype: float64

In [52]:
autos['registration_year'].value_counts(normalize=True, bins=10)

(1994.8, 2005.4]      0.568564
(2005.4, 2016.0]      0.355668
(1984.2, 1994.8]      0.058841
(1973.6, 1984.2]      0.009390
(1963.0, 1973.6]      0.005330
(1952.4, 1963.0]      0.001582
(1931.2, 1941.8]      0.000208
(1909.893, 1920.6]    0.000187
(1941.8, 1952.4]      0.000167
(1920.6, 1931.2]      0.000062
Name: registration_year, dtype: float64

In [53]:
print(56+35)

91


**Observation:**
- More than 56% of the cars were registered between 1994 and 2005
- More than 35% of the cars were registered between 2005 and 2016
- So together, more than **91%** of the cars were registered between 1994 and 2016

### Exploring price by brand column 

In [36]:
autos['brand'].unique()

array(['peugeot', 'bmw', 'volkswagen', 'smart', 'ford', 'chrysler',
       'seat', 'renault', 'mercedes_benz', 'audi', 'sonstige_autos',
       'opel', 'mazda', 'porsche', 'mini', 'toyota', 'dacia', 'nissan',
       'jeep', 'saab', 'volvo', 'mitsubishi', 'jaguar', 'fiat', 'skoda',
       'subaru', 'kia', 'citroen', 'chevrolet', 'hyundai', 'honda',
       'daewoo', 'suzuki', 'trabant', 'land_rover', 'alfa_romeo', 'lada',
       'rover', 'daihatsu', 'lancia'], dtype=object)

In [37]:
brand_dict = {}
for key in autos['brand']:
    if key in brand_dict:
        brand_dict[key] +=1
    else:
        brand_dict[key] = 1

In [38]:
brand_dict

{'peugeot': 1418,
 'bmw': 5284,
 'volkswagen': 10188,
 'smart': 668,
 'ford': 3352,
 'chrysler': 176,
 'seat': 873,
 'renault': 2274,
 'mercedes_benz': 4580,
 'audi': 4149,
 'sonstige_autos': 526,
 'opel': 5195,
 'mazda': 727,
 'porsche': 293,
 'mini': 415,
 'toyota': 599,
 'dacia': 123,
 'nissan': 725,
 'jeep': 108,
 'saab': 77,
 'volvo': 444,
 'mitsubishi': 391,
 'jaguar': 76,
 'fiat': 1242,
 'skoda': 770,
 'subaru': 105,
 'kia': 341,
 'citroen': 669,
 'chevrolet': 274,
 'hyundai': 473,
 'honda': 377,
 'daewoo': 72,
 'suzuki': 284,
 'trabant': 75,
 'land_rover': 98,
 'alfa_romeo': 318,
 'lada': 29,
 'rover': 65,
 'daihatsu': 123,
 'lancia': 52}

In [55]:
autos['brand'].value_counts(normalize=True).head(6)

volkswagen       0.212126
bmw              0.110019
opel             0.108166
mercedes_benz    0.095361
audi             0.086387
ford             0.069793
Name: brand, dtype: float64

In [54]:
autos['brand'].value_counts(normalize=True).head(6).sum()

0.6818522528525027

- Together, the top 6 brands account for about 68% of the listings
- So we will analyze the top 6 brands, as each of the remaining brands makes up a much smaller share

In [40]:
common_brands = autos['brand'].value_counts(normalize=True).head(6).index

common_brands

Index(['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford'], dtype='object')

### Let's do the analysis on the top 6 brands

#### Comparing Brands with mean Price

In [41]:
top_brands_price_dict = {}
for brand in common_brands:
    brands_only = autos[autos['brand'] == brand]
    mean_price = brands_only['price'].mean()
    top_brands_price_dict[brand] = int(mean_price)


In [60]:
top_brands_price_dict


{'volkswagen': 6516,
 'bmw': 8334,
 'opel': 5252,
 'mercedes_benz': 30317,
 'audi': 9093,
 'ford': 7263}

In [43]:
top_brand_series = pd.Series(top_brands_price_dict)
top_brand_series.sort_values(ascending=False)

mercedes_benz    30317
audi              9093
bmw               8334
ford              7263
volkswagen        6516
opel              5252
dtype: int64

We can see that **Mercedes-Benz, Audi**, and **BMW** are the most expensive, with the Mercedes-Benz mean more than three times that of Audi.

#### Comparing Brands with mean Mileage

In [44]:
top_odometer_dict = {}
for brand in common_brands:
    brands_only = autos[autos['brand'] == brand]
    mean_odo = brands_only['odometer_km'].mean()
    top_odometer_dict[brand] = int(mean_odo)
    

In [45]:
top_odometer_dict

{'volkswagen': 128730,
 'bmw': 132434,
 'opel': 129227,
 'mercedes_benz': 130860,
 'audi': 129287,
 'ford': 124046}

In [46]:
top_odometer_series = pd.Series(top_odometer_dict)
top_odometer_series.sort_values(ascending=False)

bmw              132434
mercedes_benz    130860
audi             129287
opel             129227
volkswagen       128730
ford             124046
dtype: int64

The mean mileage does not vary much across brands: it ranges from about 124,000 km (Ford) to about 132,000 km (BMW).

In [66]:
df = pd.DataFrame(top_brand_series, columns=['mean_price'])
df['mean_odokm'] = top_odometer_series
df = df.sort_values(by='mean_price', ascending=False)
df

| |mean_price|mean_odokm|
|--------|--------|--------|
|mercedes_benz|30317|130860|
|audi|9093|129287|
|bmw|8334|132434|
|ford|7263|124046|
|volkswagen|6516|128730|
|opel|5252|129227|


## Finding the most popular model in each brand 

In [48]:
brand_model_dict = {}
brands = top_brand_series.index
for brand in brands:
    this_brand = autos[autos['brand'] == brand]
    its_model = this_brand['model'].value_counts()
    brand_model_dict[brand] = its_model.index[0]
brand_model_dict
common_model_series = pd.Series(brand_model_dict)


In [67]:
df['most_common_model'] = common_model_series
df

| |mean_price|mean_odokm|most_common_model|
|--------|--------|--------|--------|
|mercedes_benz|30317|130860|c_klasse|
|audi|9093|129287|a4|
|bmw|8334|132434|3er|
|ford|7263|124046|focus|
|volkswagen|6516|128730|golf|
|opel|5252|129227|corsa|
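
The per-brand loops above can also be expressed as a single `groupby` with named aggregations. This is a sketch of an alternative, not the approach used in the notebook; `sample`, its values, and `summary` are assumptions for illustration.

```python
import pandas as pd

# Toy frame with the same column names as the cleaned `autos`.
sample = pd.DataFrame({
    "brand": ["bmw", "bmw", "bmw", "ford", "ford"],
    "model": ["3er", "3er", "5er", "focus", "focus"],
    "price": [8000, 9000, 10000, 3000, 5000],
    "odometer_km": [150000, 125000, 100000, 150000, 90000],
})

# One pass over the groups computes all three summary columns.
summary = sample.groupby("brand").agg(
    mean_price=("price", "mean"),
    mean_odokm=("odometer_km", "mean"),
    # value_counts() sorts by frequency, so index[0] is the most common model.
    most_common_model=("model", lambda s: s.value_counts().index[0]),
).sort_values("mean_price", ascending=False)
print(summary)
```

To restrict the summary to the most common brands, the frame could first be filtered with `isin` on the list of brand names.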


## Analysing impact of mileage of cars on their prices

In [51]:
mean_price_per_odo_slab = {}
odo_slabs = autos['odometer_km'].unique()
odo_slabs

for slab in odo_slabs:
    rows = autos[autos['odometer_km'] == slab]
    price = rows['price'].mean()
    mean_price_per_odo_slab[slab] = int(price)
mean_price_per_odo_slab

price_odo_slab_series = pd.Series(mean_price_per_odo_slab)
price_odo_slab_series


150000     7736
70000     10817
50000     25919
80000      9575
10000     19890
30000     16414
125000     6286
90000      8350
20000     17940
60000     12286
5000      11916
40000     49532
100000    12671
dtype: int64

- The relationship between the cars' mileage and their prices is somewhat unclear. Although prices decrease overall as mileage increases, the trend is not decisive. For example, the 40,000 km band has the highest mean price (49,532), well above the 10,000 km band (19,890). It is apparent that other factors apart from mileage contribute to the price, or even reverse the effect of mileage.
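
The trend is easier to read when the mileage bands are sorted, and the median is less sensitive than the mean to a handful of extreme prices. The following sketch shows the idea on made-up data; `sample`, its values, and `by_mileage` are illustrative assumptions.

```python
import pandas as pd

# Toy data: one extreme price in the 50,000 km band distorts its mean.
sample = pd.DataFrame({
    "odometer_km": [5000, 5000, 50000, 50000, 50000, 150000, 150000],
    "price": [12000, 14000, 9000, 10000, 99999999, 2000, 3000],
})

# Count, mean, and median price per mileage band, ordered by mileage.
by_mileage = (
    sample.groupby("odometer_km")["price"]
    .agg(["count", "mean", "median"])
    .sort_index()
)
print(by_mileage)
```

In this toy example the mean of the 50,000 km band is in the tens of millions while its median stays at 10,000, which is the kind of gap that signals outliers.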

## Conclusions:
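- Mean mileage varies little across the top brands (about 124,000 to 132,000 km), so mileage does not explain the price differences between them
- More than 90% of the cars were registered after 1994
- Mercedes-Benz has by far the highest mean price (30,317). Because the price filter from In [18] was never applied to autos, this mean may be inflated by the unrealistic price entries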