# Guided Project: Exploring eBay Car Sales Data

In this guided project, we'll work with a dataset of used cars from eBay Kleinanzeigen, a classifieds section of the German eBay website.

The dataset was originally scraped and uploaded to Kaggle by user orgesleka.
The original dataset isn't available on Kaggle anymore, but you can find it [here](https://data.world/data-society/used-cars-data).

We've made a few modifications from the original dataset:

* We sampled 50,000 data points from the full dataset, to ensure your code runs quickly in our hosted environment
* We dirtied the dataset a bit to more closely resemble what you would expect from a scraped dataset (the version uploaded to Kaggle was cleaned to be easier to work with)

**The aim of this project is to clean the data and analyze the included used car listings.**

## Loading and inspecting data

In [33]:
import numpy as np
import pandas as pd

In [34]:
autos = pd.read_csv('data/autos.csv', encoding="Latin-1")

In [35]:
autos.head()

Unnamed: 0,dateCrawled,name,seller,offerType,price,abtest,vehicleType,yearOfRegistration,gearbox,powerPS,model,odometer,monthOfRegistration,fuelType,brand,notRepairedDamage,dateCreated,nrOfPictures,postalCode,lastSeen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,privat,Angebot,"$5,000",control,bus,2004,manuell,158,andere,"150,000km",3,lpg,peugeot,nein,2016-03-26 00:00:00,0,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,privat,Angebot,"$8,500",control,limousine,1997,automatik,286,7er,"150,000km",6,benzin,bmw,nein,2016-04-04 00:00:00,0,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,privat,Angebot,"$8,990",test,limousine,2009,manuell,102,golf,"70,000km",7,benzin,volkswagen,nein,2016-03-26 00:00:00,0,35394,2016-04-06 20:15:37
3,2016-03-12 16:58:10,Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...,privat,Angebot,"$4,350",control,kleinwagen,2007,automatik,71,fortwo,"70,000km",6,benzin,smart,nein,2016-03-12 00:00:00,0,33729,2016-03-15 03:16:28
4,2016-04-01 14:38:50,Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...,privat,Angebot,"$1,350",test,kombi,2003,manuell,0,focus,"150,000km",7,benzin,ford,nein,2016-04-01 00:00:00,0,39218,2016-04-01 14:38:50


In [36]:
autos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   dateCrawled          50000 non-null  object
 1   name                 50000 non-null  object
 2   seller               50000 non-null  object
 3   offerType            50000 non-null  object
 4   price                50000 non-null  object
 5   abtest               50000 non-null  object
 6   vehicleType          44905 non-null  object
 7   yearOfRegistration   50000 non-null  int64 
 8   gearbox              47320 non-null  object
 9   powerPS              50000 non-null  int64 
 10  model                47242 non-null  object
 11  odometer             50000 non-null  object
 12  monthOfRegistration  50000 non-null  int64 
 13  fuelType             45518 non-null  object
 14  brand                50000 non-null  object
 15  notRepairedDamage    40171 non-null  object
 16  date

In [37]:
autos.isnull().sum()

dateCrawled               0
name                      0
seller                    0
offerType                 0
price                     0
abtest                    0
vehicleType            5095
yearOfRegistration        0
gearbox                2680
powerPS                   0
model                  2758
odometer                  0
monthOfRegistration       0
fuelType               4482
brand                     0
notRepairedDamage      9829
dateCreated               0
nrOfPictures              0
postalCode                0
lastSeen                  0
dtype: int64

The data loaded successfully into a pandas DataFrame with 20 columns and 50,000 rows (observations). Five columns contain missing values: vehicleType, gearbox, model, fuelType, and notRepairedDamage.
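Raw null counts are easier to compare as percentages. The sketch below uses a small made-up frame (`sample` is a stand-in for `autos`, and its values are invented for illustration) to show one way of listing only the columns that have missing values, with the share missing in each.

```python
import pandas as pd

# Toy frame standing in for `autos` (values are made up for illustration).
sample = pd.DataFrame({
    "vehicleType": ["bus", None, "kombi", "limousine"],
    "gearbox": ["manuell", "automatik", None, None],
    "brand": ["peugeot", "bmw", "ford", "opel"],
})

# isnull().mean() gives the fraction of missing values per column.
missing_share = sample.isnull().mean().mul(100).round(1)

# Keep only columns with at least one missing value, worst first.
missing_share = missing_share[missing_share > 0].sort_values(ascending=False)
print(missing_share)
```

Applied to the real data, the same two lines would be run on `autos` instead of `sample`.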

## Cleaning Column Names

In [38]:
autos.columns 

Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',
       'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',
       'odometer', 'monthOfRegistration', 'fuelType', 'brand',
       'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',
       'lastSeen'],
      dtype='object')

In [39]:
cleaned_names = ['date_crawled', 'name', 'seller', 'offer_type', 'price', 'abtest',
       'vehicle_type', 'registration_year', 'gearbox', 'power_ps', 'model',
       'odometer', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'nr_of_pictures', 'posta_code',
       'last_seen']
autos.columns = cleaned_names 
autos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   date_crawled        50000 non-null  object
 1   name                50000 non-null  object
 2   seller              50000 non-null  object
 3   offer_type          50000 non-null  object
 4   price               50000 non-null  object
 5   abtest              50000 non-null  object
 6   vehicle_type        44905 non-null  object
 7   registration_year   50000 non-null  int64 
 8   gearbox             47320 non-null  object
 9   power_ps            50000 non-null  int64 
 10  model               47242 non-null  object
 11  odometer            50000 non-null  object
 12  registration_month  50000 non-null  int64 
 13  fuel_type           45518 non-null  object
 14  brand               50000 non-null  object
 15  unrepaired_damage   40171 non-null  object
 16  ad_created          50

**The column names were converted from camelCase to snake_case, and a few were reworded to be more descriptive (for example, yearOfRegistration became registration_year).**
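The list above was typed by hand. For the purely mechanical part of the conversion, a small helper can do the work. The function name `camel_to_snake` is hypothetical, and this sketch only handles the camelCase-to-snake_case step; reworded names such as registration_year would still need a manual mapping.

```python
import re

def camel_to_snake(name):
    """Insert an underscore at each lowercase-to-uppercase boundary, then lowercase."""
    # "powerPS" -> "power_PS" -> "power_ps"
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

original = ["dateCrawled", "offerType", "powerPS", "nrOfPictures"]
print([camel_to_snake(c) for c in original])
```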

## Initial exploration and cleaning

In [40]:
autos.describe(include='O')

Unnamed: 0,date_crawled,name,seller,offer_type,price,abtest,vehicle_type,gearbox,model,odometer,fuel_type,brand,unrepaired_damage,ad_created,last_seen
count,50000,50000,50000,50000,50000,50000,44905,47320,47242,50000,45518,50000,40171,50000,50000
unique,48213,38754,2,2,2357,2,8,2,245,13,7,40,2,76,39481
top,2016-04-02 11:37:04,Ford_Fiesta,privat,Angebot,$0,test,limousine,manuell,golf,"150,000km",benzin,volkswagen,nein,2016-04-03 00:00:00,2016-04-07 06:17:27
freq,3,78,49999,49999,1421,25756,12859,36993,4024,32424,30107,10687,35232,1946,8


**Observations:**
* Columns to be dropped: 
    * seller --> almost all values are privat
    * offer_type --> almost all values are Angebot
    * nr_of_pictures --> all values are 0 
* More investigation needed for: 
    * abtest --> check the split, as slightly more than 50% of the values are test
    * unrepaired_damage --> the most frequent value is nein; check which values it takes
    * price --> 1,421 listings have a price of $0, which is implausible
* Conversion of columns: 
    * To date: date_crawled, ad_created, last_seen, registration_year
    * To numeric: price, odometer (after cleaning)
    * To categorical / text: registration_month, postal code
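Columns dominated by a single value, like seller and offer_type, can also be flagged programmatically. This is a minimal sketch on invented data; the 0.75 threshold is an arbitrary assumption chosen to suit the toy frame, and a real run would likely use a much higher cut-off.

```python
import pandas as pd

# Toy data; "gewerblich" is an invented second seller value.
sample = pd.DataFrame({
    "seller": ["privat", "privat", "privat", "gewerblich"],
    "offer_type": ["Angebot"] * 4,
    "brand": ["bmw", "opel", "ford", "bmw"],
})

# Share of rows taken by each column's most frequent value.
top_share = sample.apply(lambda col: col.value_counts(normalize=True).iloc[0])

threshold = 0.75  # hypothetical cut-off, tune for the real data
near_constant = top_share[top_share >= threshold].index.tolist()
print(near_constant)
```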

### More exploration needed

In [41]:
# How many null values
autos.isnull().sum()

date_crawled             0
name                     0
seller                   0
offer_type               0
price                    0
abtest                   0
vehicle_type          5095
registration_year        0
gearbox               2680
power_ps                 0
model                 2758
odometer                 0
registration_month       0
fuel_type             4482
brand                    0
unrepaired_damage     9829
ad_created               0
nr_of_pictures           0
posta_code               0
last_seen                0
dtype: int64

In [42]:
# Why so many different names
autos['name'].value_counts().tail(20) # Can't help - need to keep them as is

Fiat_Punto_1.2_zu_verkaufen                                      1
Volkswagen_Passat_Variant_1.9_TDI_Ahk_tuev_12/2017_klima         1
Verkaufe_Mazda_CX5                                               1
E320_guter_Zustand                                               1
Volkswagen_Polo_Highline_TÜV_03.2017                             1
Benz_Mecerdes_Viano_Bus__116_PS__8_Sitz.                         1
Opel_Corsa_B_TÜV_08/2016                                         1
Mercedes_C180_Sport                                              1
Mitsubischi_Space_Runner_2_Jahre_TÜV/Au_neu                      1
Lada_Niva_1.7i_Only                                              1
Fiat_Punto_1.2_SX_Mit_LPG_Gasanlage                              1
Mercedes_Benz_A_210                                              1
BMW_330d_Touring_Sport_Aut._Modern_Line_Garantie                 1
Fiat_Punto_55_Team                                               1
BMW_320_Ci__G_Power__M_Paket__Bi_Xenon__Automatik__e46_Cabrio 

In [43]:
# Let's explore brands
autos['brand'].value_counts()  # Brand names are consistent - no duplicates or misspellings found

volkswagen        10687
opel               5461
bmw                5429
mercedes_benz      4734
audi               4283
ford               3479
renault            2404
peugeot            1456
fiat               1308
seat                941
skoda               786
mazda               757
nissan              754
smart               701
citroen             701
toyota              617
sonstige_autos      546
hyundai             488
volvo               457
mini                424
mitsubishi          406
honda               399
kia                 356
alfa_romeo          329
porsche             294
suzuki              293
chevrolet           283
chrysler            181
dacia               129
daihatsu            128
jeep                110
subaru              109
land_rover           99
saab                 80
daewoo               79
trabant              78
jaguar               77
rover                69
lancia               57
lada                 31
Name: brand, dtype: int64

In [47]:
# Unrepaired damage
autos['unrepaired_damage'].value_counts()

nein    35232
ja       4939
Name: unrepaired_damage, dtype: int64

In [57]:
autos.loc[autos['price'].str.strip()=='$0',:].head()

Unnamed: 0,date_crawled,name,seller,offer_type,price,abtest,vehicle_type,registration_year,gearbox,power_ps,model,odometer,registration_month,fuel_type,brand,unrepaired_damage,ad_created,nr_of_pictures,posta_code,last_seen
27,2016-03-27 18:45:01,Hat_einer_Ahnung_mit_Ford_Galaxy_HILFE,privat,Angebot,$0,control,,2005,,0,,"150,000km",0,,ford,,2016-03-27 00:00:00,0,66701,2016-03-27 18:45:01
71,2016-03-28 19:39:35,Suche_Opel_Astra_F__Corsa_oder_Kadett_E_mit_Re...,privat,Angebot,$0,control,,1990,manuell,0,,"5,000km",0,benzin,opel,,2016-03-28 00:00:00,0,4552,2016-04-07 01:45:48
80,2016-03-09 15:57:57,Nissan_Primera_Hatchback_1_6_16v_73_Kw___99Ps_...,privat,Angebot,$0,control,coupe,1999,manuell,99,primera,"150,000km",3,benzin,nissan,ja,2016-03-09 00:00:00,0,66903,2016-03-09 16:43:50
87,2016-03-29 23:37:22,Bmw_520_e39_zum_ausschlachten,privat,Angebot,$0,control,,2000,,0,5er,"150,000km",0,,bmw,,2016-03-29 00:00:00,0,82256,2016-04-06 21:18:15
99,2016-04-05 09:48:54,Peugeot_207_CC___Cabrio_Bj_2011,privat,Angebot,$0,control,cabrio,2011,manuell,0,2_reihe,"60,000km",7,diesel,peugeot,nein,2016-04-05 00:00:00,0,99735,2016-04-07 12:17:34


## Basic Cleaning 
The price and odometer columns are stored as text. We'll remove the special characters and convert both columns to numeric types.

In [58]:
autos[['price','odometer']].head()

Unnamed: 0,price,odometer
0,"$5,000","150,000km"
1,"$8,500","150,000km"
2,"$8,990","70,000km"
3,"$4,350","70,000km"
4,"$1,350","150,000km"


In [75]:
# Let's remove special characters and change the data type at the same time

autos['price'] = autos['price'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
autos['odometer'] = autos['odometer'].str.replace('km', '', regex=False).str.replace(',', '', regex=False).astype(float)


In [78]:
autos[['price', 'odometer']].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   price     50000 non-null  float64
 1   odometer  50000 non-null  float64
dtypes: float64(2)
memory usage: 781.4 KB
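The conversion above relies on `astype(float)`, which raises an error if any value is still non-numeric after the characters are stripped. A more forgiving variant is sketched below with `pd.to_numeric(..., errors="coerce")`. The sample values, including the "on request" entry, are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

raw_price = pd.Series(["$5,000", "$8,500", "on request", "$0"])

# Strip the currency symbol and thousands separator as literal text.
stripped = raw_price.str.replace("$", "", regex=False).str.replace(",", "", regex=False)

# errors="coerce" turns anything that still is not a number into NaN
# instead of raising, so the bad rows can be inspected afterwards.
price = pd.to_numeric(stripped, errors="coerce")
print(raw_price[price.isna()])
```

Passing `regex=False` matters for the `$` replacement in particular, because `$` is a special character in regular expressions.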


In [79]:
# Rename the columns
autos.rename(columns={'odometer':'odometer_km'}, inplace=True)

In [80]:
autos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   date_crawled        50000 non-null  object 
 1   name                50000 non-null  object 
 2   seller              50000 non-null  object 
 3   offer_type          50000 non-null  object 
 4   price               50000 non-null  float64
 5   abtest              50000 non-null  object 
 6   vehicle_type        44905 non-null  object 
 7   registration_year   50000 non-null  int64  
 8   gearbox             47320 non-null  object 
 9   power_ps            50000 non-null  int64  
 10  model               47242 non-null  object 
 11  odometer_km         50000 non-null  float64
 12  registration_month  50000 non-null  int64  
 13  fuel_type           45518 non-null  object 
 14  brand               50000 non-null  object 
 15  unrepaired_damage   40171 non-null  object 
 16  ad_c

## Exploring the Odometer and Price Columns

From the last screen, we learned that there are a number of text columns where almost all of the values are the same (seller and offer_type). We also converted the price and odometer columns to numeric types and renamed odometer to odometer_km.

Let's continue exploring the data, specifically looking for data that doesn't look right. We'll start by analyzing the odometer_km and price columns. Here are the steps we'll take:

- Analyze the columns using minimum and maximum values and look for any values that look unrealistically high or low (outliers) that we might want to remove.
- We will use
    - Series.unique().shape to see how many unique values there are
    - Series.describe() to view min/max/median/mean etc
    - Series.value_counts(), with some variations:
        - chained to .head() if there are lots of values.
        - Because Series.value_counts() returns a series, we can use Series.sort_index() with ascending= True or False to view the highest and lowest values with their counts (can also chain to head() here).
    - When removing outliers, we can do df[(df["col"] >= x ) & (df["col"] <= y )], but it's more readable to use df[df["col"].between(x,y)]
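The techniques listed above can be sketched on a tiny example. The frame `df` and its values are invented for illustration, and the bounds passed to `between()` are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"col": [0, 1, 1, 50, 50, 50, 120, 99999999]})

print(df["col"].unique().shape)      # number of distinct values
print(df["col"].describe())          # count, mean, std, min, quartiles, max

# value_counts() returns a Series indexed by value, so sorting the index
# lists the smallest (or largest) values together with their counts.
counts = df["col"].value_counts()
print(counts.sort_index(ascending=True).head(3))
print(counts.sort_index(ascending=False).head(3))

# between() is inclusive on both ends by default.
trimmed = df[df["col"].between(1, 120)]
print(trimmed["col"].tolist())
```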

**For each of the odometer_km and price columns:**
- Use the techniques above to explore the data
- If you find there are outliers, remove them and write a markdown paragraph explaining your decision.
- After you have removed the outliers, make some observations about the remaining values.

In [97]:

pd.set_option('display.float_format', '{:.2f}'.format)
autos[['price', 'odometer_km']].describe()

Unnamed: 0,price,odometer_km
count,50000.0,50000.0
mean,9840.04,125732.7
std,481104.38,40042.21
min,0.0,5000.0
25%,1100.0,125000.0
50%,2950.0,150000.0
75%,7200.0,150000.0
max,99999999.0,150000.0


In [84]:
autos['odometer_km'].shape

(50000,)

In [108]:
# Price has zeros as well as very high numbers. Let's check 
autos['price'].value_counts(dropna=False).sort_index(ascending=True).head(50)

0.00      1421
1.00       156
2.00         3
3.00         1
5.00         2
8.00         1
9.00         1
10.00        7
11.00        2
12.00        3
13.00        2
14.00        1
15.00        2
17.00        3
18.00        1
20.00        4
25.00        5
29.00        1
30.00        7
35.00        1
40.00        6
45.00        4
47.00        1
49.00        4
50.00       49
55.00        2
59.00        1
60.00        9
65.00        5
66.00        1
70.00       10
75.00        5
79.00        1
80.00       15
89.00        1
90.00        5
99.00       19
100.00     134
110.00       3
111.00       2
115.00       2
117.00       1
120.00      39
122.00       1
125.00       8
129.00       1
130.00      15
135.00       1
139.00       1
140.00       9
Name: price, dtype: int64

In [112]:
# The price column contains zeros and implausibly large values. Replace both with missing values (NaN) rather than dropping the rows.
autos.loc[((autos['price']>=999990) | (autos['price'] == 0)), 'price'] = None 
autos[['price', 'odometer_km']].describe()

Unnamed: 0,price,odometer_km
count,48565.0,50000.0
mean,5888.94,125732.7
std,9059.85,40042.21
min,1.0,5000.0
25%,1200.0,125000.0
50%,3000.0,150000.0
75%,7490.0,150000.0
max,350000.0,150000.0
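The price count drops from 50,000 to 48,565 because `describe()` ignores missing values. The same replacement can be written with `Series.mask`, which leaves the original Series untouched. This is a sketch on invented prices that reuses the thresholds from the cell above.

```python
import pandas as pd

price = pd.Series([0.0, 1.0, 2950.0, 350000.0, 99999999.0])

# Series.mask replaces values where the condition is True (NaN by default).
implausible = (price == 0) | (price >= 999990)
cleaned = price.mask(implausible)

print(cleaned.count())   # non-null values remaining
print(cleaned.max())
```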


## Exploring the date columns

In [114]:
autos[['date_crawled','ad_created','last_seen']].head()

Unnamed: 0,date_crawled,ad_created,last_seen
0,2016-03-26 17:47:46,2016-03-26 00:00:00,2016-04-06 06:45:54
1,2016-04-04 13:38:56,2016-04-04 00:00:00,2016-04-06 14:45:08
2,2016-03-26 18:57:24,2016-03-26 00:00:00,2016-04-06 20:15:37
3,2016-03-12 16:58:10,2016-03-12 00:00:00,2016-03-15 03:16:28
4,2016-04-01 14:38:50,2016-04-01 00:00:00,2016-04-01 14:38:50


In [116]:
autos[['date_crawled','ad_created','last_seen']].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   date_crawled  50000 non-null  object
 1   ad_created    50000 non-null  object
 2   last_seen     50000 non-null  object
dtypes: object(3)
memory usage: 1.1+ MB


In [119]:
autos['date_crawled'].str[:10].value_counts(dropna=False).sort_index()

2016-03-05    1269
2016-03-06     697
2016-03-07    1798
2016-03-08    1665
2016-03-09    1661
2016-03-10    1606
2016-03-11    1624
2016-03-12    1839
2016-03-13     778
2016-03-14    1831
2016-03-15    1699
2016-03-16    1475
2016-03-17    1576
2016-03-18     653
2016-03-19    1745
2016-03-20    1891
2016-03-21    1876
2016-03-22    1647
2016-03-23    1619
2016-03-24    1455
2016-03-25    1587
2016-03-26    1624
2016-03-27    1552
2016-03-28    1742
2016-03-29    1709
2016-03-30    1681
2016-03-31    1596
2016-04-01    1690
2016-04-02    1770
2016-04-03    1934
2016-04-04    1826
2016-04-05     655
2016-04-06     159
2016-04-07      71
Name: date_crawled, dtype: int64

In [122]:
autos['last_seen'].str[:10].value_counts(normalize=True, dropna=False).sort_index() # Percentages

2016-03-05   0.00
2016-03-06   0.00
2016-03-07   0.01
2016-03-08   0.01
2016-03-09   0.01
2016-03-10   0.01
2016-03-11   0.01
2016-03-12   0.02
2016-03-13   0.01
2016-03-14   0.01
2016-03-15   0.02
2016-03-16   0.02
2016-03-17   0.03
2016-03-18   0.01
2016-03-19   0.02
2016-03-20   0.02
2016-03-21   0.02
2016-03-22   0.02
2016-03-23   0.02
2016-03-24   0.02
2016-03-25   0.02
2016-03-26   0.02
2016-03-27   0.02
2016-03-28   0.02
2016-03-29   0.02
2016-03-30   0.02
2016-03-31   0.02
2016-04-01   0.02
2016-04-02   0.02
2016-04-03   0.03
2016-04-04   0.02
2016-04-05   0.12
2016-04-06   0.22
2016-04-07   0.13
Name: last_seen, dtype: float64

In [126]:
autos['ad_created'].str[:10].value_counts(normalize=True, dropna=False).sort_index() # Percentages

2015-06-11   0.00
2015-08-10   0.00
2015-09-09   0.00
2015-11-10   0.00
2015-12-05   0.00
             ... 
2016-04-03   0.04
2016-04-04   0.04
2016-04-05   0.01
2016-04-06   0.00
2016-04-07   0.00
Name: ad_created, Length: 76, dtype: float64
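Slicing the first ten characters works because the timestamps are stored as text. An alternative is to parse them into real datetimes, which also enables date arithmetic later on. A minimal sketch, using three timestamps copied from the head of the data and assuming every row follows the same `YYYY-MM-DD HH:MM:SS` layout:

```python
import pandas as pd

sample = pd.DataFrame({
    "date_crawled": ["2016-03-26 17:47:46", "2016-04-04 13:38:56", "2016-03-26 18:57:24"],
})

# Parse once with an explicit format, then work with real datetimes.
crawled = pd.to_datetime(sample["date_crawled"], format="%Y-%m-%d %H:%M:%S")

# dt.date drops the time of day, like str[:10] but with date objects.
per_day = crawled.dt.date.value_counts().sort_index()
print(per_day)
```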

In [127]:
autos['registration_year'].describe()

count   50000.00
mean     2005.07
std       105.71
min      1000.00
25%      1999.00
50%      2003.00
75%      2008.00
max      9999.00
Name: registration_year, dtype: float64

registration_year needs cleaning: the values range from 1000 to 9999, and neither extreme is a plausible registration year.

## Dealing with Incorrect Registration Year Data

In [128]:
display(autos['registration_year'].value_counts(dropna=False).sort_index().head(10))
display(autos['registration_year'].value_counts(dropna=False).sort_index().tail(20))

1000    1
1001    1
1111    1
1500    1
1800    2
1910    9
1927    1
1929    1
1931    1
1934    2
Name: registration_year, dtype: int64

2011    1634
2012    1323
2013     806
2014     666
2015     399
2016    1316
2017    1453
2018     492
2019       3
2800       1
4100       1
4500       1
4800       1
5000       4
5911       1
6200       1
8888       1
9000       2
9996       1
9999       4
Name: registration_year, dtype: int64

In [132]:
reg_year_filter = (autos['registration_year'] <= 1800) | (autos['registration_year'] >= 2800)
autos.loc[reg_year_filter, 'registration_year'] = None
display(autos[['registration_year']].value_counts(normalize=True))

registration_year
2000.00             0.07
2005.00             0.06
1999.00             0.06
2004.00             0.05
2003.00             0.05
                    ... 
1939.00             0.00
1938.00             0.00
1931.00             0.00
1929.00             0.00
1927.00             0.00
Length: 81, dtype: float64
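The filter above still keeps the years 2017 to 2019, which fall after the last crawl date in the data (2016-04-07). A stricter window could be expressed with `between()`. The bounds below are assumptions rather than part of the original analysis: 1900 as a generous lower limit and 2016 as the crawl year. The sample years are taken from the value counts shown earlier.

```python
import pandas as pd

years = pd.Series([1000, 1910, 1999, 2005, 2016, 2017, 2019, 9999])

# Assumed plausible window: 1900 through 2016, the year of the crawl.
plausible = years.between(1900, 2016)

print(years[plausible].tolist())
print(plausible.mean())  # share of rows inside the window
```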

## Exploring Price by Brand

In [134]:
# Find unique values for brand 
autos['brand'].unique()

array(['peugeot', 'bmw', 'volkswagen', 'smart', 'ford', 'chrysler',
       'seat', 'renault', 'mercedes_benz', 'audi', 'sonstige_autos',
       'opel', 'mazda', 'porsche', 'mini', 'toyota', 'dacia', 'nissan',
       'jeep', 'saab', 'volvo', 'mitsubishi', 'jaguar', 'fiat', 'skoda',
       'subaru', 'kia', 'citroen', 'chevrolet', 'hyundai', 'honda',
       'daewoo', 'suzuki', 'trabant', 'land_rover', 'alfa_romeo', 'lada',
       'rover', 'daihatsu', 'lancia'], dtype=object)

In [137]:
# Use value counts to find top 20 brands
top_20_brands = autos['brand'].value_counts().head(20)

In [139]:
top_20_brands.index

Index(['volkswagen', 'opel', 'bmw', 'mercedes_benz', 'audi', 'ford', 'renault',
       'peugeot', 'fiat', 'seat', 'skoda', 'mazda', 'nissan', 'smart',
       'citroen', 'toyota', 'sonstige_autos', 'hyundai', 'volvo', 'mini'],
      dtype='object')

In [144]:
average_price = {}
for brand in top_20_brands.index:
    average_price[brand] = round(autos.loc[autos['brand'] == brand, 'price'].mean())
average_price

{'volkswagen': 5332,
 'opel': 2945,
 'bmw': 8261,
 'mercedes_benz': 8536,
 'audi': 9213,
 'ford': 3728,
 'renault': 2431,
 'peugeot': 3066,
 'fiat': 2794,
 'seat': 4316,
 'skoda': 6354,
 'mazda': 4059,
 'nissan': 4669,
 'smart': 3518,
 'citroen': 3762,
 'toyota': 5148,
 'sonstige_autos': 12149,
 'hyundai': 5372,
 'volvo': 4867,
 'mini': 10542}
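The loop above filters the DataFrame once per brand. The same per-brand averages can be computed in a single pass with `groupby`. This sketch uses a small invented frame and only the top two brands to keep the output short.

```python
import pandas as pd

sample = pd.DataFrame({
    "brand": ["bmw", "bmw", "opel", "opel", "lada"],
    "price": [8000.0, 9000.0, 3000.0, None, 1500.0],
})

top_brands = sample["brand"].value_counts().head(2).index

# Filter to the chosen brands, then average price per brand in one pass.
# mean() skips NaN, matching the behaviour of the explicit loop.
avg_price = (
    sample[sample["brand"].isin(top_brands)]
    .groupby("brand")["price"]
    .mean()
    .round()
    .sort_values(ascending=False)
)
print(avg_price)
```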

## Storing Aggregate Data in a DataFrame

In [155]:
# We will use the top 20 brands above and calculate average mileage. 
# Let's check the mileage column first. 
autos['odometer_km'].describe()

count    50000.00
mean    125732.70
std      40042.21
min       5000.00
25%     125000.00
50%     150000.00
75%     150000.00
max     150000.00
Name: odometer_km, dtype: float64

In [156]:
average_mileage = {}
for brand in top_20_brands.index:
    average_mileage[brand] = round(autos.loc[autos['brand'] == brand, 'odometer_km'].mean())
average_mileage

{'volkswagen': 128955,
 'opel': 129299,
 'bmw': 132522,
 'mercedes_benz': 130886,
 'audi': 129644,
 'ford': 124132,
 'renault': 128224,
 'peugeot': 127352,
 'fiat': 117037,
 'seat': 122062,
 'skoda': 110948,
 'mazda': 125132,
 'nissan': 118979,
 'smart': 100756,
 'citroen': 119765,
 'toyota': 115989,
 'sonstige_autos': 87189,
 'hyundai': 106783,
 'volvo': 138632,
 'mini': 89375}

In [157]:
# Create two Series from the two dictionaries. 
price_series = pd.Series(average_price)
mileage_series = pd.Series(average_mileage)

brand_df = pd.DataFrame(price_series, columns=['average_price'])
brand_df['average_mileage'] = mileage_series

brand_df

Unnamed: 0,average_price,average_mileage
volkswagen,5332,128955
opel,2945,129299
bmw,8261,132522
mercedes_benz,8536,130886
audi,9213,129644
ford,3728,124132
renault,2431,128224
peugeot,3066,127352
fiat,2794,117037
seat,4316,122062
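As a variation on the cell above, both columns can be created in one step by passing a dict of Series to the DataFrame constructor. The sketch below reuses three of the brand averages computed earlier, deliberately listed in a different order in each Series to show that rows are matched by label rather than by position.

```python
import pandas as pd

price_series = pd.Series({"bmw": 8261, "opel": 2945, "audi": 9213})
mileage_series = pd.Series({"opel": 129299, "audi": 129644, "bmw": 132522})

# A dict of Series builds all columns at once; rows are aligned on the
# index labels, so the differing order of the two Series does not matter.
brand_df = pd.DataFrame({
    "average_price": price_series,
    "average_mileage": mileage_series,
})
print(brand_df.sort_values("average_price", ascending=False))
```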
