# Exploring eBay Car Sales Data
#### About:
Exploring a dataset of used-car listings from the German eBay website
#### Goal:
Determine the most popular cars on sale in Germany

Dataset fields description:
 - dateCrawled - When this ad was first crawled. All field values are taken from this date.
 - name - Name of the car.
 - seller - Whether the seller is private or a dealer.
 - offerType - The type of listing.
 - price - The price on the ad to sell the car.
 - abtest - Whether the listing is included in an A/B test.
 - vehicleType - The vehicle type.
 - yearOfRegistration - The year in which the car was first registered.
 - gearbox - The transmission type.
 - powerPS - The power of the car in PS.
 - model - The car model name.
 - odometer - How many kilometers the car has driven.
 - monthOfRegistration - The month in which the car was first registered.
 - fuelType - What type of fuel the car uses.
 - brand - The brand of the car.
 - notRepairedDamage - Whether the car has damage that is not yet repaired.
 - dateCreated - The date on which the eBay listing was created.
 - nrOfPictures - The number of pictures in the ad.
 - postalCode - The postal code for the location of the vehicle.
 - lastSeen - When the crawler last saw this ad online.

Dataset link: https://www.kaggle.com/orgesleka/used-cars-database/data

#### Import required libraries

In [1]:
import pandas as pd
import numpy as np
import operator

#### Read the CSV file

In [2]:
autos = pd.read_csv("autos.csv", encoding="Windows-1252")

In [3]:
autos.head(5)

Unnamed: 0,dateCrawled,name,seller,offerType,price,abtest,vehicleType,yearOfRegistration,gearbox,powerPS,model,odometer,monthOfRegistration,fuelType,brand,notRepairedDamage,dateCreated,nrOfPictures,postalCode,lastSeen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,privat,Angebot,"$5,000",control,bus,2004,manuell,158,andere,"150,000km",3,lpg,peugeot,nein,2016-03-26 00:00:00,0,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,privat,Angebot,"$8,500",control,limousine,1997,automatik,286,7er,"150,000km",6,benzin,bmw,nein,2016-04-04 00:00:00,0,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,privat,Angebot,"$8,990",test,limousine,2009,manuell,102,golf,"70,000km",7,benzin,volkswagen,nein,2016-03-26 00:00:00,0,35394,2016-04-06 20:15:37
3,2016-03-12 16:58:10,Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...,privat,Angebot,"$4,350",control,kleinwagen,2007,automatik,71,fortwo,"70,000km",6,benzin,smart,nein,2016-03-12 00:00:00,0,33729,2016-03-15 03:16:28
4,2016-04-01 14:38:50,Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...,privat,Angebot,"$1,350",test,kombi,2003,manuell,0,focus,"150,000km",7,benzin,ford,nein,2016-04-01 00:00:00,0,39218,2016-04-01 14:38:50


In [4]:
autos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
dateCrawled            50000 non-null object
name                   50000 non-null object
seller                 50000 non-null object
offerType              50000 non-null object
price                  50000 non-null object
abtest                 50000 non-null object
vehicleType            44905 non-null object
yearOfRegistration     50000 non-null int64
gearbox                47320 non-null object
powerPS                50000 non-null int64
model                  47242 non-null object
odometer               50000 non-null object
monthOfRegistration    50000 non-null int64
fuelType               45518 non-null object
brand                  50000 non-null object
notRepairedDamage      40171 non-null object
dateCreated            50000 non-null object
nrOfPictures           50000 non-null int64
postalCode             50000 non-null int64
lastSeen               50000 non-null object

In [5]:
autos.dtypes

dateCrawled            object
name                   object
seller                 object
offerType              object
price                  object
abtest                 object
vehicleType            object
yearOfRegistration      int64
gearbox                object
powerPS                 int64
model                  object
odometer               object
monthOfRegistration     int64
fuelType               object
brand                  object
notRepairedDamage      object
dateCreated            object
nrOfPictures            int64
postalCode              int64
lastSeen               object
dtype: object

#### Initial observations:
 1. Column names should be renamed to snake_case
 2. Columns price and odometer are stored as strings (e.g. "$5,000" and "150,000km") and can be converted to a numeric type
 3. Columns seller and offerType are nearly constant: 49,999 of 50,000 rows share the same value
 4. Columns dateCrawled, dateCreated and lastSeen are stored as strings and can be converted to datetime type
 - Column name contains information about the brand and model, but abbreviations and short names would need to be checked
 - Columns vehicleType, gearbox, model, fuelType and notRepairedDamage contain null values that need to be handled



### 1. Convert column names to snake_case

In [6]:
autos.columns

Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',
       'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',
       'odometer', 'monthOfRegistration', 'fuelType', 'brand',
       'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',
       'lastSeen'],
      dtype='object')

In [7]:
new_columns = ['date_crawled', 'name', 'seller', 'offer_type', 'price', 'a_b_test',
       'vehicle_type', 'registration_year', 'gearbox', 'power_ps', 'model',
       'odometer', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'nr_of_pictures', 'postal_code',
       'last_seen']

In [8]:
autos.columns = new_columns

In [9]:
autos.columns

Index(['date_crawled', 'name', 'seller', 'offer_type', 'price', 'a_b_test',
       'vehicle_type', 'registration_year', 'gearbox', 'power_ps', 'model',
       'odometer', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'nr_of_pictures', 'postal_code',
       'last_seen'],
      dtype='object')
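The list above was typed by hand, which also let the author choose more descriptive names (for example `registration_year` rather than a literal `year_of_registration`). A minimal sketch of a mechanical alternative, assuming a regex-based camelCase converter is acceptable: `to_snake_case` and the `overrides` dict are hypothetical helpers, not part of the original notebook, and the DataFrame is a toy stand-in holding a few of the original column names.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Insert an underscore between a lowercase letter or digit and the
    uppercase letter that follows it, then lowercase the whole name."""
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


# Hypothetical overrides for names where a more descriptive label is preferred
overrides = {
    "year_of_registration": "registration_year",
    "month_of_registration": "registration_month",
    "not_repaired_damage": "unrepaired_damage",
    "date_created": "ad_created",
    "abtest": "a_b_test",
}

cols = ["dateCrawled", "powerPS", "yearOfRegistration", "abtest", "lastSeen"]
df = pd.DataFrame(columns=cols)
df.columns = [overrides.get(to_snake_case(c), to_snake_case(c)) for c in df.columns]
print(list(df.columns))
# ['date_crawled', 'power_ps', 'registration_year', 'a_b_test', 'last_seen']
```

The regex handles runs of capitals such as `powerPS` because only the first capital after a lowercase letter gets an underscore; the override dict keeps the hand-picked names where they differ from the mechanical result.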

In [10]:
autos.head(3)

Unnamed: 0,date_crawled,name,seller,offer_type,price,a_b_test,vehicle_type,registration_year,gearbox,power_ps,model,odometer,registration_month,fuel_type,brand,unrepaired_damage,ad_created,nr_of_pictures,postal_code,last_seen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,privat,Angebot,"$5,000",control,bus,2004,manuell,158,andere,"150,000km",3,lpg,peugeot,nein,2016-03-26 00:00:00,0,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,privat,Angebot,"$8,500",control,limousine,1997,automatik,286,7er,"150,000km",6,benzin,bmw,nein,2016-04-04 00:00:00,0,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,privat,Angebot,"$8,990",test,limousine,2009,manuell,102,golf,"70,000km",7,benzin,volkswagen,nein,2016-03-26 00:00:00,0,35394,2016-04-06 20:15:37


### 2. Convert price and odometer columns to numeric type

In [11]:
autos.describe(include='all')

Unnamed: 0,date_crawled,name,seller,offer_type,price,a_b_test,vehicle_type,registration_year,gearbox,power_ps,model,odometer,registration_month,fuel_type,brand,unrepaired_damage,ad_created,nr_of_pictures,postal_code,last_seen
count,50000,50000,50000,50000,50000,50000,44905,50000.0,47320,50000.0,47242,50000,50000.0,45518,50000,40171,50000,50000.0,50000.0,50000
unique,48213,38754,2,2,2357,2,8,,2,,245,13,,7,40,2,76,,,39481
top,2016-03-14 20:50:02,Ford_Fiesta,privat,Angebot,$0,test,limousine,,manuell,,golf,"150,000km",,benzin,volkswagen,nein,2016-04-03 00:00:00,,,2016-04-07 06:17:27
freq,3,78,49999,49999,1421,25756,12859,,36993,,4024,32424,,30107,10687,35232,1946,,,8
mean,,,,,,,,2005.07328,,116.35592,,,5.72336,,,,,0.0,50813.6273,
std,,,,,,,,105.712813,,209.216627,,,3.711984,,,,,0.0,25779.747957,
min,,,,,,,,1000.0,,0.0,,,0.0,,,,,0.0,1067.0,
25%,,,,,,,,1999.0,,70.0,,,3.0,,,,,0.0,30451.0,
50%,,,,,,,,2003.0,,105.0,,,6.0,,,,,0.0,49577.0,
75%,,,,,,,,2008.0,,150.0,,,9.0,,,,,0.0,71540.0,


#### Remove non-numeric characters from the price and odometer columns

In [12]:
# regex=False treats '$' as a literal character, not the regex end-of-string anchor
autos['price'] = (autos['price']
                  .str.replace('$', '', regex=False)
                  .str.replace(',', '', regex=False)
                  .str.replace(' ', '', regex=False))
autos['odometer'] = (autos['odometer']
                     .str.replace('km', '', regex=False)
                     .str.replace(',', '', regex=False)
                     .str.replace(' ', '', regex=False))

#### Convert these columns to numeric type

In [13]:
autos['price'] = autos['price'].astype(float)
autos['odometer'] = autos['odometer'].astype(float)
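The same cleanup can also be expressed with one regular expression that drops every non-digit character before converting. A minimal sketch, assuming all values are non-negative whole numbers like "$5,000" or "150,000km" (a decimal point or minus sign would be stripped as well); `to_number` is a hypothetical helper and the Series are toy data.

```python
import pandas as pd


def to_number(series: pd.Series) -> pd.Series:
    """Strip every character that is not a digit, then convert to a number.

    Assumes non-negative whole numbers; '.' and '-' are removed too.
    """
    digits = series.str.replace(r"[^0-9]", "", regex=True)
    # errors="coerce" turns strings left empty (no digits at all) into NaN
    return pd.to_numeric(digits, errors="coerce")


prices = pd.Series(["$5,000", "$8,500", "$0"])
odometer = pd.Series(["150,000km", "70,000km"])
print(to_number(prices).tolist())    # [5000, 8500, 0]
print(to_number(odometer).tolist())  # [150000, 70000]
```

Using `errors="coerce"` means a malformed value becomes NaN instead of stopping the whole conversion, which can then be inspected with `isna()`.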

#### Rename the odometer column to odometer_km

In [14]:
autos.rename({'odometer':'odometer_km'}, axis=1, inplace=True)

In [15]:
autos.head(3)

Unnamed: 0,date_crawled,name,seller,offer_type,price,a_b_test,vehicle_type,registration_year,gearbox,power_ps,model,odometer_km,registration_month,fuel_type,brand,unrepaired_damage,ad_created,nr_of_pictures,postal_code,last_seen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,privat,Angebot,5000.0,control,bus,2004,manuell,158,andere,150000.0,3,lpg,peugeot,nein,2016-03-26 00:00:00,0,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,privat,Angebot,8500.0,control,limousine,1997,automatik,286,7er,150000.0,6,benzin,bmw,nein,2016-04-04 00:00:00,0,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,privat,Angebot,8990.0,test,limousine,2009,manuell,102,golf,70000.0,7,benzin,volkswagen,nein,2016-03-26 00:00:00,0,35394,2016-04-06 20:15:37


In [16]:
autos.describe()

| | price | registration_year | power_ps | odometer_km | registration_month | nr_of_pictures | postal_code |
|---|---|---|---|---|---|---|---|
| count | 50000.0 | 50000.0 | 50000.0 | 50000.0 | 50000.0 | 50000.0 | 50000.0 |
| mean | 9840.044 | 2005.07328 | 116.35592 | 125732.7 | 5.72336 | 0.0 | 50813.6273 |
| std | 481104.4 | 105.712813 | 209.216627 | 40042.211706 | 3.711984 | 0.0 | 25779.747957 |
| min | 0.0 | 1000.0 | 0.0 | 5000.0 | 0.0 | 0.0 | 1067.0 |
| 25% | 1100.0 | 1999.0 | 70.0 | 125000.0 | 3.0 | 0.0 | 30451.0 |
| 50% | 2950.0 | 2003.0 | 105.0 | 150000.0 | 6.0 | 0.0 | 49577.0 |
| 75% | 7200.0 | 2008.0 | 150.0 | 150000.0 | 9.0 | 0.0 | 71540.0 |
| max | 100000000.0 | 9999.0 | 17700.0 | 150000.0 | 12.0 | 0.0 | 99998.0 |


In [17]:
autos.loc[autos["price"] > 10000000, :]

Unnamed: 0,date_crawled,name,seller,offer_type,price,a_b_test,vehicle_type,registration_year,gearbox,power_ps,model,odometer_km,registration_month,fuel_type,brand,unrepaired_damage,ad_created,nr_of_pictures,postal_code,last_seen
2897,2016-03-12 21:50:57,Escort_MK_1_Hundeknochen_zum_umbauen_auf_RS_2000,privat,Angebot,11111111.0,test,limousine,1973,manuell,48,escort,50000.0,3,benzin,ford,nein,2016-03-12 00:00:00,0,94469,2016-03-12 22:45:27
24384,2016-03-21 13:57:51,Schlachte_Golf_3_gt_tdi,privat,Angebot,11111111.0,test,,1995,,0,,150000.0,0,,volkswagen,,2016-03-21 00:00:00,0,18519,2016-03-21 14:40:18
27371,2016-03-09 15:45:47,Fiat_Punto,privat,Angebot,12345678.0,control,,2017,,95,punto,150000.0,0,,fiat,,2016-03-09 00:00:00,0,96110,2016-03-09 15:45:47
39377,2016-03-08 23:53:51,Tausche_volvo_v40_gegen_van,privat,Angebot,12345678.0,control,,2018,manuell,95,v40,150000.0,6,,volvo,nein,2016-03-08 00:00:00,0,14542,2016-04-06 23:17:31
39705,2016-03-22 14:58:27,Tausch_gegen_gleichwertiges,privat,Angebot,99999999.0,control,limousine,1999,automatik,224,s_klasse,150000.0,9,benzin,mercedes_benz,,2016-03-22 00:00:00,0,73525,2016-04-06 05:15:30
42221,2016-03-08 20:39:05,Leasinguebernahme,privat,Angebot,27322222.0,control,limousine,2014,manuell,163,c4,40000.0,2,diesel,citroen,,2016-03-08 00:00:00,0,76532,2016-03-08 20:39:05
47598,2016-03-31 18:56:54,Opel_Vectra_B_1_6i_16V_Facelift_Tuning_Showcar...,privat,Angebot,12345678.0,control,limousine,2001,manuell,101,vectra,150000.0,3,benzin,opel,nein,2016-03-31 00:00:00,0,4356,2016-03-31 18:56:54


#### Remove all ads priced above $10 million or below $100, since these prices look unrealistic

In [18]:
autos = autos.loc[(autos["price"] <= 10000000) & (autos["price"] > 99), :]

In [19]:
autos.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 48231 entries, 0 to 49999
Data columns (total 20 columns):
date_crawled          48231 non-null object
name                  48231 non-null object
seller                48231 non-null object
offer_type            48231 non-null object
price                 48231 non-null float64
a_b_test              48231 non-null object
vehicle_type          43808 non-null object
registration_year     48231 non-null int64
gearbox               46024 non-null object
power_ps              48231 non-null int64
model                 45832 non-null object
odometer_km           48231 non-null float64
registration_month    48231 non-null int64
fuel_type             44350 non-null object
brand                 48231 non-null object
unrepaired_damage     39345 non-null object
ad_created            48231 non-null object
nr_of_pictures        48231 non-null int64
postal_code           48231 non-null int64
last_seen             48231 non-null object
dtypes: float64(2), int64(5), object(13)

In [20]:
autos.loc[autos["price"] < 100, :]

Unnamed: 0,date_crawled,name,seller,offer_type,price,a_b_test,vehicle_type,registration_year,gearbox,power_ps,model,odometer_km,registration_month,fuel_type,brand,unrepaired_damage,ad_created,nr_of_pictures,postal_code,last_seen


In [21]:
autos.describe()

| | price | registration_year | power_ps | odometer_km | registration_month | nr_of_pictures | postal_code |
|---|---|---|---|---|---|---|---|
| count | 48231.0 | 48231.0 | 48231.0 | 48231.0 | 48231.0 | 48231.0 | 48231.0 |
| mean | 6332.251 | 2004.728867 | 117.694512 | 125916.008376 | 5.801601 | 0.0 | 50988.283905 |
| std | 50926.05 | 87.891383 | 201.219964 | 39546.446538 | 3.67711 | 0.0 | 25737.541947 |
| min | 100.0 | 1000.0 | 0.0 | 5000.0 | 0.0 | 0.0 | 1067.0 |
| 25% | 1250.0 | 1999.0 | 73.0 | 125000.0 | 3.0 | 0.0 | 30823.0 |
| 50% | 3000.0 | 2004.0 | 107.0 | 150000.0 | 6.0 | 0.0 | 49716.0 |
| 75% | 7499.0 | 2008.0 | 150.0 | 150000.0 | 9.0 | 0.0 | 71668.5 |
| max | 10000000.0 | 9999.0 | 17700.0 | 150000.0 | 12.0 | 0.0 | 99998.0 |


#### Remove all ads with an invalid registration year (earlier than 1910 or later than 2019)

In [22]:
reg_year = autos["registration_year"].unique()
sorted(reg_year)

[1000, 1001, 1111, 1800, 1910, 1927, 1929, 1931, 1934, 1937,
 1938, 1939, 1941, 1943, 1948, 1950, 1951, 1952, 1953, 1954,
 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964,
 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974,
 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984,
 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994,
 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014,
 2015, 2016, 2017, 2018, 2019, 2800, 4100, 4500, 4800, 5000,
 5911, 6200, 8888, 9000, 9999]

In [23]:
autos = autos.loc[(autos['registration_year'] >= 1910) & (autos['registration_year'] <= 2019), :]
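The upper bound of 2019 is a fixed choice. The crawl dates in this dataset fall between March and April 2016, so registration years from 2017 to 2019 are also unlikely to be real. A minimal sketch, assuming the latest crawl year is the right cutoff: it derives the bound from `date_crawled` and filters with `Series.between`; the toy DataFrame is hypothetical.

```python
import pandas as pd

autos = pd.DataFrame({
    "date_crawled": ["2016-03-26 17:47:46", "2016-04-04 13:38:56",
                     "2016-03-09 15:45:47", "2016-03-12 21:50:57"],
    "registration_year": [2004, 1997, 2017, 1800],
})

# Assumption: a car cannot be first registered after its ad was crawled,
# so the latest crawl year replaces a fixed upper bound
latest_year = pd.to_datetime(autos["date_crawled"]).dt.year.max()

# between() includes both endpoints by default
valid = autos["registration_year"].between(1910, latest_year)
autos = autos.loc[valid].copy()
print(latest_year, autos["registration_year"].tolist())  # 2016 [2004, 1997]
```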

#### Examine odometer_km field

In [24]:
autos['odometer_km'].unique()

array([150000.,  70000.,  50000.,  80000.,  10000.,  30000., 125000.,
        90000.,  20000.,  60000.,   5000., 100000.,  40000.])

In [25]:
autos['odometer_km'].value_counts()

150000.0    31215
125000.0     5038
100000.0     2102
90000.0      1733
80000.0      1412
70000.0      1214
60000.0      1153
50000.0      1010
40000.0       814
30000.0       777
20000.0       757
5000.0        749
10000.0       238
Name: odometer_km, dtype: int64

### 3. Examine seller and offer_type

In [26]:
autos['seller'].unique()

array(['privat', 'gewerblich'], dtype=object)

In [27]:
autos['offer_type'].unique()

array(['Angebot'], dtype=object)

#### The offer_type field contains only one value ("Angebot", German for "offer"), so it can be dropped

In [28]:
autos = autos.drop(columns='offer_type')
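The same check can be applied to every column at once. A minimal sketch, assuming any column with a single distinct value carries no information for this analysis; `drop_constant_columns` is a hypothetical helper and `sample` is toy data. On the real data it would presumably also drop `nr_of_pictures`, which `describe()` above reports as 0 in every row.

```python
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns that hold a single distinct value.

    dropna=False counts NaN as a value, so a column mixing one value and NaN
    is kept.
    """
    n_unique = df.nunique(dropna=False)
    return df.loc[:, n_unique > 1].copy()


sample = pd.DataFrame({
    "seller": ["privat", "gewerblich", "privat"],
    "offer_type": ["Angebot", "Angebot", "Angebot"],
    "nr_of_pictures": [0, 0, 0],
})
print(list(drop_constant_columns(sample).columns))  # ['seller']
```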

### 4. Examine the date_crawled, ad_created and last_seen columns

In [29]:
dates = autos[['date_crawled','ad_created','last_seen']]

In [30]:
dates['date_crawled'].str[:10].value_counts(normalize=True, dropna=False)

2016-04-03    0.038600
2016-03-20    0.037791
2016-03-21    0.037231
2016-03-12    0.036900
2016-03-14    0.036671
2016-04-04    0.036568
2016-03-07    0.036070
2016-04-02    0.035593
2016-03-28    0.034950
2016-03-19    0.034742
2016-03-15    0.034307
2016-03-29    0.034120
2016-03-30    0.033747
2016-04-01    0.033664
2016-03-08    0.033187
2016-03-09    0.033021
2016-03-22    0.032917
2016-03-11    0.032585
2016-03-26    0.032316
2016-03-10    0.032295
2016-03-23    0.032253
2016-03-31    0.031839
2016-03-25    0.031507
2016-03-17    0.031486
2016-03-27    0.031133
2016-03-16    0.029474
2016-03-24    0.029453
2016-03-05    0.025346
2016-03-13    0.015681
2016-03-06    0.014021
2016-04-05    0.013067
2016-03-18    0.012901
2016-04-06    0.003173
2016-04-07    0.001390
Name: date_crawled, dtype: float64

In [31]:
dates['ad_created'].str[:10].value_counts(normalize=True, dropna=False)

2016-04-03    0.038849
2016-03-20    0.037854
2016-03-21    0.037460
2016-04-04    0.036920
2016-03-12    0.036734
2016-03-14    0.035302
2016-04-02    0.035282
2016-03-28    0.035054
2016-03-07    0.034805
2016-03-29    0.034079
2016-03-15    0.034037
2016-04-01    0.033643
2016-03-19    0.033622
2016-03-30    0.033560
2016-03-08    0.033187
2016-03-09    0.033104
2016-03-11    0.032896
2016-03-22    0.032730
2016-03-26    0.032378
2016-03-23    0.032087
2016-03-10    0.032004
2016-03-31    0.031880
2016-03-25    0.031631
2016-03-17    0.031154
2016-03-27    0.031050
2016-03-16    0.029972
2016-03-24    0.029391
2016-03-05    0.022899
2016-03-13    0.017050
2016-03-06    0.015287
                ...   
2016-02-18    0.000041
2016-02-05    0.000041
2016-01-10    0.000041
2016-02-24    0.000041
2016-02-20    0.000041
2016-02-26    0.000041
2016-02-02    0.000041
2016-02-12    0.000041
2016-02-14    0.000041
2016-02-16    0.000021
2016-02-01    0.000021
2016-02-09    0.000021
2016-02-07 

In [32]:
dates['last_seen'].str[:10].value_counts(normalize=True, dropna=False)

2016-04-06    0.221978
2016-04-07    0.132125
2016-04-05    0.125052
2016-03-17    0.028084
2016-04-03    0.025139
2016-04-02    0.024911
2016-03-30    0.024683
2016-04-04    0.024517
2016-03-31    0.023791
2016-03-12    0.023770
2016-04-01    0.022837
2016-03-29    0.022318
2016-03-22    0.021364
2016-03-28    0.020866
2016-03-20    0.020659
2016-03-21    0.020555
2016-03-24    0.019767
2016-03-25    0.019103
2016-03-23    0.018585
2016-03-26    0.016676
2016-03-16    0.016448
2016-03-15    0.015867
2016-03-19    0.015743
2016-03-27    0.015556
2016-03-14    0.012632
2016-03-11    0.012404
2016-03-10    0.010641
2016-03-09    0.009583
2016-03-13    0.008877
2016-03-18    0.007322
2016-03-08    0.007322
2016-03-07    0.005434
2016-03-06    0.004314
2016-03-05    0.001079
Name: last_seen, dtype: float64
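The cells above only slice the date strings with `str[:10]`. A minimal sketch of the actual conversion to datetime, assuming every value uses the "YYYY-MM-DD HH:MM:SS" format seen in the outputs; the toy `dates` frame reuses two rows from the head of the data, and `days_listed` is a hypothetical derived column that only approximates how long an ad stayed online.

```python
import pandas as pd

# Toy rows in the same "YYYY-MM-DD HH:MM:SS" format as the crawled data
dates = pd.DataFrame({
    "date_crawled": ["2016-03-26 17:47:46", "2016-04-04 13:38:56"],
    "ad_created": ["2016-03-26 00:00:00", "2016-04-04 00:00:00"],
    "last_seen": ["2016-04-06 06:45:54", "2016-04-06 14:45:08"],
})

# An explicit format makes malformed strings raise instead of being guessed
for col in ["date_crawled", "ad_created", "last_seen"]:
    dates[col] = pd.to_datetime(dates[col], format="%Y-%m-%d %H:%M:%S")

# Whole days between ad creation and the last time the crawler saw the ad
days_listed = (dates["last_seen"] - dates["ad_created"]).dt.days
print(dates.dtypes)
print(days_listed.tolist())  # [11, 2]
```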

### Examine the top 20 most popular car brands and their mean price

In [33]:
brands_list = autos["brand"].value_counts(normalize=True, dropna=False)

In [34]:
# Top 20 most popular brands
brands_list.head(20)

volkswagen        0.212997
bmw               0.108770
opel              0.108272
mercedes_benz     0.095951
audi              0.086016
ford              0.069630
renault           0.047830
peugeot           0.029453
fiat              0.025969
seat              0.018916
skoda             0.016075
nissan            0.015328
mazda             0.015266
smart             0.014312
citroen           0.014125
toyota            0.012673
hyundai           0.009935
sonstige_autos    0.009437
volvo             0.009023
mini              0.008649
Name: brand, dtype: float64

#### Top 3 car brands by share of listings:
1. Volkswagen - 21%
2. BMW - 11%
3. Opel - 11%

In [35]:
brands_unique = autos["brand"].unique()
brands_mean_prices = {}
for br in brands_unique:
    brands_mean_prices[br] = autos.loc[autos['brand'] == br, 'price'].mean()

In [36]:
def sort_dict(d, reverse=True):
    # Return the (key, value) pairs of d sorted by value, largest first by default
    return sorted(d.items(), key=operator.itemgetter(1), reverse=reverse)

In [37]:
sorted_x = sorted(brands_mean_prices.items(), reverse=True, key=operator.itemgetter(1))

In [38]:
sorted_x[:20]

[('porsche', 46764.2),
 ('sonstige_autos', 45941.48791208791),
 ('land_rover', 18934.272727272728),
 ('jaguar', 11844.041666666666),
 ('jeep', 11590.214953271028),
 ('mini', 10566.824940047962),
 ('audi', 9259.510248372317),
 ('mercedes_benz', 8570.76869865975),
 ('bmw', 8543.978260869566),
 ('chevrolet', 6692.60294117647),
 ('skoda', 6394.309677419355),
 ('kia', 5923.288629737609),
 ('dacia', 5897.736434108527),
 ('volkswagen', 5559.968156587788),
 ('hyundai', 5416.23382045929),
 ('toyota', 5148.0032733224225),
 ('volvo', 4911.680459770115),
 ('nissan', 4681.94046008119),
 ('seat', 4353.146929824561),
 ('suzuki', 4166.767605633803)]

#### Top 3 most expensive car brands by mean price:

1. Porsche - $46.8k
2. Sonstige Autos ("other cars", a catch-all category rather than a single brand) - $46k
3. Land Rover - $19k
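Mean prices are pulled up by extreme listings, and the filtered data still contains a price of 10,000,000 (the `max` row of the second `describe()` output). A minimal sketch, assuming `groupby` is acceptable in place of the per-brand loop: it reports count, mean and median price per brand and ranks brands by median; the toy prices are made up to show how one outlier distorts the mean.

```python
import pandas as pd

# Toy listings; one extreme price mimics the outliers left after filtering
autos = pd.DataFrame({
    "brand": ["porsche", "porsche", "porsche", "fiat", "fiat", "fiat"],
    "price": [30000, 35000, 40000, 1500, 2500, 10000000],
})

price_stats = (
    autos.groupby("brand")["price"]
    .agg(["count", "mean", "median"])
    .sort_values("median", ascending=False)
)
print(price_stats)
```

Here the single 10,000,000 listing makes fiat's mean far higher than porsche's, while the median still ranks porsche first.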

#### Examine mean mileage per brand

In [39]:
mileage_mean = {}
for br in brands_unique:
    mileage_mean[br] = autos.loc[autos['brand'] == br, 'odometer_km'].mean()

In [40]:
sorted_x = sorted(mileage_mean.items(), reverse=True, key=operator.itemgetter(1))

In [41]:
sorted_x

[('saab', 143670.88607594935),
 ('volvo', 138839.0804597701),
 ('rover', 138230.76923076922),
 ('chrysler', 133125.0),
 ('bmw', 132827.99389778794),
 ('alfa_romeo', 131437.5),
 ('mercedes_benz', 131079.76653696498),
 ('audi', 129604.53339763684),
 ('opel', 129512.45210727969),
 ('volkswagen', 129060.27850813128),
 ('renault', 128261.05810928014),
 ('peugeot', 127316.9014084507),
 ('jeep', 127102.80373831776),
 ('mitsubishi', 127053.57142857143),
 ('subaru', 126100.0),
 ('jaguar', 125763.88888888889),
 ('mazda', 124959.23913043478),
 ('ford', 124361.03663985702),
 ('lancia', 123518.51851851853),
 ('honda', 123397.93281653746),
 ('seat', 122149.12280701754),
 ('daewoo', 121266.66666666667),
 ('citroen', 119985.31571218796),
 ('nissan', 118707.71312584574),
 ('land_rover', 118333.33333333333),
 ('fiat', 117408.14696485623),
 ('toyota', 116219.31260229132),
 ('daihatsu', 115619.8347107438),
 ('kia', 112521.86588921283),
 ('skoda', 111051.6129032258),
 ('suzuki', 108485.91549295775),
 ('hyu

#### Top 3 brands by mean mileage:

1. Saab - 144k km
2. Volvo - 139k km
3. Rover - 138k km

#### Combine mean price and mean mileage into a single DataFrame

In [57]:
brand_price_info = pd.Series(brands_mean_prices)
brand_info = pd.DataFrame(brand_price_info, columns=["mean_price"])
brand_info

| | mean_price |
|---|---|
| alfa_romeo | 4054.471875 |
| audi | 9259.510248 |
| bmw | 8543.978261 |
| chevrolet | 6692.602941 |
| chrysler | 3539.916667 |
| citroen | 3777.854626 |
| dacia | 5897.736434 |
| daewoo | 1093.6 |
| daihatsu | 1641.264463 |
| fiat | 2815.635783 |


In [60]:
mileage_info = pd.Series(mileage_mean)
mileage_info_df = pd.DataFrame(mileage_info, columns=["mean_mileage"])

In [61]:
mileage_info_df

| | mean_mileage |
|---|---|
| alfa_romeo | 131437.5 |
| audi | 129604.533398 |
| bmw | 132827.993898 |
| chevrolet | 100514.705882 |
| chrysler | 133125.0 |
| citroen | 119985.315712 |
| dacia | 84728.682171 |
| daewoo | 121266.666667 |
| daihatsu | 115619.834711 |
| fiat | 117408.146965 |


In [62]:
brand_info['mean_mileage'] = mileage_info_df['mean_mileage']

In [64]:
brand_info

| | mean_price | mean_mileage |
|---|---|---|
| alfa_romeo | 4054.471875 | 131437.5 |
| audi | 9259.510248 | 129604.533398 |
| bmw | 8543.978261 | 132827.993898 |
| chevrolet | 6692.602941 | 100514.705882 |
| chrysler | 3539.916667 | 133125.0 |
| citroen | 3777.854626 | 119985.315712 |
| dacia | 5897.736434 | 84728.682171 |
| daewoo | 1093.6 | 121266.666667 |
| daihatsu | 1641.264463 | 115619.834711 |
| fiat | 2815.635783 | 117408.146965 |
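The two dictionaries, two Series and the column assignment can be collapsed into one step. A minimal sketch, assuming pandas 0.25 or later for named aggregation; the `listings` column is an illustrative addition and the toy data is hypothetical.

```python
import pandas as pd

autos = pd.DataFrame({
    "brand": ["audi", "audi", "bmw", "bmw", "bmw"],
    "price": [9000.0, 11000.0, 7000.0, 8000.0, 9000.0],
    "odometer_km": [150000.0, 125000.0, 150000.0, 90000.0, 150000.0],
})

# Named aggregation: output_column=(input_column, aggregation function)
brand_info = autos.groupby("brand").agg(
    listings=("price", "size"),
    mean_price=("price", "mean"),
    mean_mileage=("odometer_km", "mean"),
)
print(brand_info)
```

Because `groupby` aligns everything on the same brand index, there is no need to join separately built Series afterwards.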


### Examine the most popular model for each brand

In [66]:
autos['model'].unique()

array(['andere', '7er', 'golf', 'fortwo', 'focus', 'voyager', 'arosa',
       'megane', nan, 'a3', 'clio', 'vectra', 'scirocco', '3er', 'a4',
       '911', 'cooper', '5er', 'polo', 'e_klasse', 'c_klasse', 'corsa',
       'mondeo', 'altea', 'a1', 'twingo', 'a_klasse', 'cl', '3_reihe',
       's_klasse', 'sandero', 'passat', 'primera', 'fiesta', 'wrangler',
       'clubman', 'a6', 'transporter', 'astra', 'v40', 'ibiza', 'micra',
       '1er', 'yaris', 'colt', '6_reihe', '5_reihe', 'corolla', 'ka',
       'tigra', 'punto', 'vito', 'cordoba', 'galaxy', '100', '2_reihe',
       'octavia', 'm_klasse', 'lupo', 'superb', 'meriva', 'c_max',
       'laguna', 'touran', '1_reihe', 'm_reihe', 'touareg', 'seicento',
       'avensis', 'vivaro', 'x_reihe', 'ducato', 'carnival', 'boxster',
       'signum', 'sharan', 'zafira', 'rav', 'a5', 'beetle', 'c_reihe',
       'phaeton', 'i_reihe', 'sl', 'insignia', 'up', 'civic', '80',
       'mx_reihe', 'omega', 'sorento', 'z_reihe', 'berlingo', 'clk',
       '

In [112]:
brands_unique = autos["brand"].unique()
top_models = {}
for br in brands_unique:
    if not autos.loc[autos['brand'] == br, 'model'].value_counts().empty:
        top_models[br] = autos.loc[autos['brand'] == br, 'model'].value_counts().index.values[0]

In [113]:
top_models

{'alfa_romeo': '156',
 'audi': 'a4',
 'bmw': '3er',
 'chevrolet': 'andere',
 'chrysler': 'andere',
 'citroen': 'andere',
 'dacia': 'sandero',
 'daewoo': 'matiz',
 'daihatsu': 'cuore',
 'fiat': 'punto',
 'ford': 'focus',
 'honda': 'civic',
 'hyundai': 'i_reihe',
 'jaguar': 'andere',
 'jeep': 'grand',
 'kia': 'andere',
 'lada': 'niva',
 'lancia': 'ypsilon',
 'land_rover': 'freelander',
 'mazda': '3_reihe',
 'mercedes_benz': 'c_klasse',
 'mini': 'cooper',
 'mitsubishi': 'colt',
 'nissan': 'micra',
 'opel': 'corsa',
 'peugeot': '2_reihe',
 'porsche': '911',
 'renault': 'twingo',
 'rover': 'andere',
 'saab': 'andere',
 'seat': 'ibiza',
 'skoda': 'octavia',
 'smart': 'fortwo',
 'subaru': 'legacy',
 'suzuki': 'andere',
 'toyota': 'yaris',
 'trabant': '601',
 'volkswagen': 'golf',
 'volvo': 'v40'}
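The loop above keeps only the single most frequent model per brand; for several brands that model is `andere` (German for "other"), a catch-all value rather than a real model. A sketch that extends the idea to the three most frequent models per brand, assuming `groupby` plus `value_counts` is acceptable; the toy data is hypothetical.

```python
import pandas as pd

# Toy listings; None stands in for a missing model
autos = pd.DataFrame({
    "brand": ["bmw"] * 11 + ["audi"] * 3,
    "model": ["3er"] * 4 + ["5er"] * 3 + ["1er"] * 2 + ["x_reihe", None]
             + ["a4", "a4", "a3"],
})

# value_counts() drops missing models by default, so a brand whose models are
# all missing gets an empty list instead of raising an IndexError
top3 = {
    brand: models.value_counts().head(3).index.tolist()
    for brand, models in autos.groupby("brand")["model"]
}
print(top3)  # {'audi': ['a4', 'a3'], 'bmw': ['3er', '5er', '1er']}
```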
