# Dataquest.io Guided Project
## Exploring Ebay Car Sales Data
### Pandas and NumPy Fundamentals
#### The data come from here: https://www.kaggle.com/orgesleka/used-cars-database/data

## Setup Workspace

In [1]:
# Import libraries
import numpy as np
import pandas as pd

In [2]:
# Read in data
autos = pd.read_csv('autos.csv', encoding='Latin-1')

## Initial Data Exploration

#### Below are the definitions from the data dictionary:
- dateCrawled - When this ad was first crawled. All field-values are - - taken from this date.
- name - Name of the car.
- seller - Whether the seller is private or a dealer.
- offerType - The type of listing
- price - The price on the ad to sell the car.
- abtest - Whether the listing is included in an A/B test.
- vehicleType - The vehicle Type.
- yearOfRegistration - The year in which the car was first registered.
- gearbox - The transmission type.
- powerPS - The power of the car in PS.
- model - The car model name.
- kilometer - How many kilometers the car has driven.
- monthOfRegistration - The month in which the car was first registered.
- fuelType - What type of fuel the car uses.
- brand - The brand of the car.
- notRepairedDamage - If the car has a damage which is not yet repaired.
- dateCreated - The date on which the eBay listing was created.
- nrOfPictures - The number of pictures in the ad.
- postalCode - The postal code for the location of the vehicle.
- lastSeenOnline - When the crawler saw this ad last online.


In [3]:
# Take a look at data
autos

Unnamed: 0,dateCrawled,name,seller,offerType,price,abtest,vehicleType,yearOfRegistration,gearbox,powerPS,model,odometer,monthOfRegistration,fuelType,brand,notRepairedDamage,dateCreated,nrOfPictures,postalCode,lastSeen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,privat,Angebot,"$5,000",control,bus,2004,manuell,158,andere,"150,000km",3,lpg,peugeot,nein,2016-03-26 00:00:00,0,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,privat,Angebot,"$8,500",control,limousine,1997,automatik,286,7er,"150,000km",6,benzin,bmw,nein,2016-04-04 00:00:00,0,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,privat,Angebot,"$8,990",test,limousine,2009,manuell,102,golf,"70,000km",7,benzin,volkswagen,nein,2016-03-26 00:00:00,0,35394,2016-04-06 20:15:37
3,2016-03-12 16:58:10,Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...,privat,Angebot,"$4,350",control,kleinwagen,2007,automatik,71,fortwo,"70,000km",6,benzin,smart,nein,2016-03-12 00:00:00,0,33729,2016-03-15 03:16:28
4,2016-04-01 14:38:50,Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...,privat,Angebot,"$1,350",test,kombi,2003,manuell,0,focus,"150,000km",7,benzin,ford,nein,2016-04-01 00:00:00,0,39218,2016-04-01 14:38:50
5,2016-03-21 13:47:45,Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto...,privat,Angebot,"$7,900",test,bus,2006,automatik,150,voyager,"150,000km",4,diesel,chrysler,,2016-03-21 00:00:00,0,22962,2016-04-06 09:45:21
6,2016-03-20 17:55:21,VW_Golf_III_GT_Special_Electronic_Green_Metall...,privat,Angebot,$300,test,limousine,1995,manuell,90,golf,"150,000km",8,benzin,volkswagen,,2016-03-20 00:00:00,0,31535,2016-03-23 02:48:59
7,2016-03-16 18:55:19,Golf_IV_1.9_TDI_90PS,privat,Angebot,"$1,990",control,limousine,1998,manuell,90,golf,"150,000km",12,diesel,volkswagen,nein,2016-03-16 00:00:00,0,53474,2016-04-07 03:17:32
8,2016-03-22 16:51:34,Seat_Arosa,privat,Angebot,$250,test,,2000,manuell,0,arosa,"150,000km",10,,seat,nein,2016-03-22 00:00:00,0,7426,2016-03-26 18:18:10
9,2016-03-16 13:47:02,Renault_Megane_Scenic_1.6e_RT_Klimaanlage,privat,Angebot,$590,control,bus,1997,manuell,90,megane,"150,000km",7,benzin,renault,nein,2016-03-16 00:00:00,0,15749,2016-04-06 10:46:35


In [4]:
autos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
dateCrawled            50000 non-null object
name                   50000 non-null object
seller                 50000 non-null object
offerType              50000 non-null object
price                  50000 non-null object
abtest                 50000 non-null object
vehicleType            44905 non-null object
yearOfRegistration     50000 non-null int64
gearbox                47320 non-null object
powerPS                50000 non-null int64
model                  47242 non-null object
odometer               50000 non-null object
monthOfRegistration    50000 non-null int64
fuelType               45518 non-null object
brand                  50000 non-null object
notRepairedDamage      40171 non-null object
dateCreated            50000 non-null object
nrOfPictures           50000 non-null int64
postalCode             50000 non-null int64
lastSeen               50000 non-null obj

#### The autos data has 5,000 rows and 20 columns. 5 columns are ints, and 15 are strings. There are some null values in columns 'vehicleType', 'gearbox', 'model', 'fuelType', and 'notRepariedDamage'.

## Clean Data

In [5]:
# Print feature labels
autos_labels = autos.columns
autos_labels

Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',
       'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',
       'odometer', 'monthOfRegistration', 'fuelType', 'brand',
       'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',
       'lastSeen'],
      dtype='object')

In [6]:
### Edit feature labels

def clean_labels(col):
    if col == 'yearOfRegistration': return 'registration_year'
    if col == 'monthOfRegistration': return 'registration_month'
    if col == 'notRepairedDamage': return 'unrepaired_damage'
    if col == 'dateCreated': return 'ad_created'
    return col.lower()

new_labels = []
for col in autos.columns:
    col = clean_labels(col)
    new_labels.append(col)
    
autos.columns = new_labels
autos.head()

Unnamed: 0,datecrawled,name,seller,offertype,price,abtest,vehicletype,registration_year,gearbox,powerps,model,odometer,registration_month,fueltype,brand,unrepaired_damage,ad_created,nrofpictures,postalcode,lastseen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,privat,Angebot,"$5,000",control,bus,2004,manuell,158,andere,"150,000km",3,lpg,peugeot,nein,2016-03-26 00:00:00,0,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,privat,Angebot,"$8,500",control,limousine,1997,automatik,286,7er,"150,000km",6,benzin,bmw,nein,2016-04-04 00:00:00,0,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,privat,Angebot,"$8,990",test,limousine,2009,manuell,102,golf,"70,000km",7,benzin,volkswagen,nein,2016-03-26 00:00:00,0,35394,2016-04-06 20:15:37
3,2016-03-12 16:58:10,Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...,privat,Angebot,"$4,350",control,kleinwagen,2007,automatik,71,fortwo,"70,000km",6,benzin,smart,nein,2016-03-12 00:00:00,0,33729,2016-03-15 03:16:28
4,2016-04-01 14:38:50,Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...,privat,Angebot,"$1,350",test,kombi,2003,manuell,0,focus,"150,000km",7,benzin,ford,nein,2016-04-01 00:00:00,0,39218,2016-04-01 14:38:50


### Above, the feature labels were converted to lowercase, and the labels below were changed as follows:
- yearOfRegistration to registration_year
- monthOfRegistration to registration_month
- notRepairedDamage to unrepaired_damage
- dateCreated to ad_created

In [7]:
# Descriptive stats of autos data
autos.describe(include='all')

Unnamed: 0,datecrawled,name,seller,offertype,price,abtest,vehicletype,registration_year,gearbox,powerps,model,odometer,registration_month,fueltype,brand,unrepaired_damage,ad_created,nrofpictures,postalcode,lastseen
count,50000,50000,50000,50000,50000,50000,44905,50000.0,47320,50000.0,47242,50000,50000.0,45518,50000,40171,50000,50000.0,50000.0,50000
unique,48213,38754,2,2,2357,2,8,,2,,245,13,,7,40,2,76,,,39481
top,2016-04-02 11:37:04,Ford_Fiesta,privat,Angebot,$0,test,limousine,,manuell,,golf,"150,000km",,benzin,volkswagen,nein,2016-04-03 00:00:00,,,2016-04-07 06:17:27
freq,3,78,49999,49999,1421,25756,12859,,36993,,4024,32424,,30107,10687,35232,1946,,,8
mean,,,,,,,,2005.07328,,116.35592,,,5.72336,,,,,0.0,50813.6273,
std,,,,,,,,105.712813,,209.216627,,,3.711984,,,,,0.0,25779.747957,
min,,,,,,,,1000.0,,0.0,,,0.0,,,,,0.0,1067.0,
25%,,,,,,,,1999.0,,70.0,,,3.0,,,,,0.0,30451.0,
50%,,,,,,,,2003.0,,105.0,,,6.0,,,,,0.0,49577.0,
75%,,,,,,,,2008.0,,150.0,,,9.0,,,,,0.0,71540.0,


In [8]:
# Take a closer look at the 'fueltype' column
autos.fueltype.value_counts()

benzin     30107
diesel     14567
lpg          691
cng           75
hybrid        37
andere        22
elektro       19
Name: fueltype, dtype: int64

### Notes:
- Above, we can see that the **'nrofpictures'** data is a column that contains all zeros. Because it doesn't provide any information, this column can be dropped.
- The **'name'** column lists the Ford_Fiesta as the most frequent name. This happens to be the names of the brand and the model of the car. If this is the case, the information contained in this column might duplicate the 'brand' and 'model' columns. In this case, the 'name' column could be dropped.
- The **'seller'** column only has two unique values. And, one of the values only occurs once. For this reason, this column can be dropped.
- The **'offertype'** column can be dropped for the same reason we are dropping the 'seller' column.
- The **'price'** column deserves further investigation. There are 1,421 rows that contain 0 USD prices. This might only make sense in the context of auctions. From my experience with eBay, many auctions start the bidding at a price of zero. It is important to note that these items do not sell for a 0 price. Also, this is numeric data that is stored as text. The dollar sign and comma will need to be removed. And, it will have to be converted to numeric values.
- The **'powerps'** column contains many zeros after the decimal point. We can convert these data to integers.
- The **'odometer'** column is numerical data that is stored as text. It will have to be converted to numeric values. Worth noting is that the data seem to be in kilometers. We can change the column label to 'odometer_km' after we drop the 'km' for all observations. Also, the commas will need to be removed.
- The **'registration_month'** column can be converted to integers. Or, maybe we can drop the zeros after the decimals. In this later case, they can remain categorical.
- The **'fueltype'** columnn contains non-english types of fuel that I will need to do a bit of internet sleuthing to better understand.
- We can drop the zeros after the decimal for the data in the **'registration_year'** column.
- The specificity of the data contained in the **'postalcode'** column can be changed to exclude the zeros after the decimal.

In [9]:
# Clean 'price' data

# Remove non-numeric characters
autos['price'] = (autos['price'].str.replace('$','')
                                .str.replace(',','')
                                .astype(int)
                 )

In [10]:
# Clean 'odometer' data

# Change name of 'odometer' column to 'odometer_km'
autos.rename(columns={'odometer': 'odometer_km'}, inplace=True)

# Remove non-numeric characters
autos['odometer_km'] = (autos['odometer_km'].str.replace('km','')
                                            .str.replace(',','')
                                            .astype(int)
                    )

In [11]:
# Analyze 'price' data
autos['price'].describe().round(2)

count       50000.00
mean         9840.04
std        481104.38
min             0.00
25%          1100.00
50%          2950.00
75%          7200.00
max      99999999.00
Name: price, dtype: float64

In [12]:
# Sort price data
autos['price'].value_counts().sort_index(ascending=False).head(20)

99999999    1
27322222    1
12345678    3
11111111    2
10000000    1
3890000     1
1300000     1
1234566     1
999999      2
999990      1
350000      1
345000      1
299000      1
295000      1
265000      1
259000      1
250000      1
220000      1
198000      1
197000      1
Name: price, dtype: int64

In [13]:
# Drop observations with price data greater than 350,000
autos['price'] = autos[autos['price'].between(0,350000)]

### Above, the price outliers were removed. Observations with prices greater than 350,000, were dropped.

In [14]:
# Analyze 'odometer_km' data
autos['odometer_km'].describe()

count     50000.000000
mean     125732.700000
std       40042.211706
min        5000.000000
25%      125000.000000
50%      150000.000000
75%      150000.000000
max      150000.000000
Name: odometer_km, dtype: float64

In [15]:
# Sort odometer_km data
autos['odometer_km'].value_counts().sort_index(ascending=False).head(20)

150000    32424
125000     5170
100000     2169
90000      1757
80000      1436
70000      1230
60000      1164
50000      1027
40000       819
30000       789
20000       784
10000       264
5000        967
Name: odometer_km, dtype: int64

#### It looks as if the odometer_km data that is 150,000 is for vehicles that have 150,000 or more. I'm not too sure what to do with these data, at the moment. Also, the 50 and 75 percent quartiles are the same as the max. This doesn't look right.

In [16]:
# Rename datecrawled and lastseen columns
autos.rename(columns={'datecrawled': 'date_crawled'}, inplace=True)
autos.rename(columns={'lastseen': 'last_seen'}, inplace=True)

# Take a look at these data
autos[['date_crawled','ad_created','last_seen']][0:5]

Unnamed: 0,date_crawled,ad_created,last_seen
0,2016-03-26 17:47:46,2016-03-26 00:00:00,2016-04-06 06:45:54
1,2016-04-04 13:38:56,2016-04-04 00:00:00,2016-04-06 14:45:08
2,2016-03-26 18:57:24,2016-03-26 00:00:00,2016-04-06 20:15:37
3,2016-03-12 16:58:10,2016-03-12 00:00:00,2016-03-15 03:16:28
4,2016-04-01 14:38:50,2016-04-01 00:00:00,2016-04-01 14:38:50


### Above, we can see that the time data includes the day first and the time of day second. The day is in YYYY-MM-DD format and the time of day information is in HH:MM:SS format.

In [17]:
# Analyze the date_crawled data
autos['date_crawled'].str[:10].describe()

count          50000
unique            34
top       2016-04-03
freq            1934
Name: date_crawled, dtype: object

In [18]:
# Print distribution of date_crawled day data
autos['date_crawled'].str[:10].value_counts(normalize=True)

2016-04-03    0.03868
2016-03-20    0.03782
2016-03-21    0.03752
2016-03-12    0.03678
2016-03-14    0.03662
2016-04-04    0.03652
2016-03-07    0.03596
2016-04-02    0.03540
2016-03-19    0.03490
2016-03-28    0.03484
2016-03-29    0.03418
2016-03-15    0.03398
2016-04-01    0.03380
2016-03-30    0.03362
2016-03-08    0.03330
2016-03-09    0.03322
2016-03-22    0.03294
2016-03-11    0.03248
2016-03-26    0.03248
2016-03-23    0.03238
2016-03-10    0.03212
2016-03-31    0.03192
2016-03-25    0.03174
2016-03-17    0.03152
2016-03-27    0.03104
2016-03-16    0.02950
2016-03-24    0.02910
2016-03-05    0.02538
2016-03-13    0.01556
2016-03-06    0.01394
2016-04-05    0.01310
2016-03-18    0.01306
2016-04-06    0.00318
2016-04-07    0.00142
Name: date_crawled, dtype: float64

In [19]:
# Analyze the ad_created data
autos['ad_created'].str[:10].describe()

count          50000
unique            76
top       2016-04-03
freq            1946
Name: ad_created, dtype: object

In [20]:
# Print distribution of ad_created day data
autos['ad_created'].str[:10].value_counts(normalize=True)

2016-04-03    0.03892
2016-03-20    0.03786
2016-03-21    0.03772
2016-04-04    0.03688
2016-03-12    0.03662
2016-03-14    0.03522
2016-04-02    0.03508
2016-03-28    0.03496
2016-03-07    0.03474
2016-03-29    0.03414
2016-03-19    0.03384
2016-04-01    0.03380
2016-03-15    0.03374
2016-03-30    0.03344
2016-03-08    0.03334
2016-03-09    0.03324
2016-03-22    0.03280
2016-03-11    0.03278
2016-03-26    0.03256
2016-03-23    0.03218
2016-03-31    0.03192
2016-03-25    0.03188
2016-03-10    0.03186
2016-03-17    0.03120
2016-03-27    0.03090
2016-03-16    0.03000
2016-03-24    0.02908
2016-03-05    0.02304
2016-03-13    0.01692
2016-03-06    0.01512
               ...   
2016-02-21    0.00006
2016-02-05    0.00004
2016-02-20    0.00004
2016-02-09    0.00004
2016-02-24    0.00004
2016-02-18    0.00004
2016-01-10    0.00004
2016-02-14    0.00004
2016-02-02    0.00004
2016-02-26    0.00004
2016-01-22    0.00002
2016-01-29    0.00002
2015-06-11    0.00002
2015-08-10    0.00002
2016-02-17

In [21]:
# Analyze the last_seen data
autos['last_seen'].str[:10].describe()

count          50000
unique            34
top       2016-04-06
freq           11050
Name: last_seen, dtype: object

In [22]:
# Print distribution of last_seen day data
autos['last_seen'].str[:10].value_counts(normalize=True)

2016-04-06    0.22100
2016-04-07    0.13092
2016-04-05    0.12428
2016-03-17    0.02792
2016-04-03    0.02536
2016-04-02    0.02490
2016-03-30    0.02484
2016-04-04    0.02462
2016-03-31    0.02384
2016-03-12    0.02382
2016-04-01    0.02310
2016-03-29    0.02234
2016-03-22    0.02158
2016-03-28    0.02086
2016-03-21    0.02074
2016-03-20    0.02070
2016-03-24    0.01956
2016-03-25    0.01920
2016-03-23    0.01858
2016-03-26    0.01696
2016-03-16    0.01644
2016-03-27    0.01602
2016-03-15    0.01588
2016-03-19    0.01574
2016-03-14    0.01280
2016-03-11    0.01252
2016-03-10    0.01076
2016-03-09    0.00986
2016-03-13    0.00898
2016-03-08    0.00760
2016-03-18    0.00742
2016-03-07    0.00536
2016-03-06    0.00442
2016-03-05    0.00108
Name: last_seen, dtype: float64

In [23]:
# Print the name and registration year data sorted by reg in ascending order
autos.loc[:,['name','registration_year']].sort_values('registration_year').head(20)

Unnamed: 0,name,registration_year
22316,VW_Kaefer.__Zwei_zum_Preis_von_einem.,1000
49283,Citroen_HY,1001
24511,Trabant__wartburg__Ostalgie,1111
35238,Suche_Skoda_Fabia____Skoda_Fabia_Combi_mit_Klima,1500
10556,UNFAL_Auto,1800
32585,UNFAL_Auto,1800
28693,Renault_Twingo,1910
42181,SAMSUNG_55_3D_Tv_und_Soundbar_gegen_Auto,1910
15898,Tausch_alles_aus_meinen_Anzeigen_gegen_Auto,1910
3679,Suche_Auto,1910


- Bellier was founded in 1980. (https://de.wikipedia.org/wiki/Bellier)
- First Renault Twingo was 1992 (https://en.wikipedia.org/wiki/Renault_Twingo)
- Samsung sound bars are not cars.
- Not sure what UNFAL is.
- Citroen HY's were first created in 1947. (https://www.letubestation.com/short-history-hy-vans/)

In [24]:
# Print the name and registration year data sorted by reg in descending order
autos.loc[:,['name','registration_year']].sort_values('registration_year', ascending=False).head(20)

Unnamed: 0,name,registration_year
33950,58er_karmann_ghia_lowlight_Kaefer__zum_restaur...,9999
8012,Opel_GT_Karosserie_mit_Brief!,9999
38076,Mercedes_Benz_A180,9999
14341,Hole_kostenlos_ab,9999
6308,Kaufe_Autos_jeglicher,9996
49910,Schoener_fast_neuer_Opel_Mokka_in_Zell_Mosel_m...,9000
13559,Saab_9000_CSE_Automatik_2_3_ltr._mit_EGSD,9000
25003,Reo_Vorkriegs_Oldtimer_Rennwagen_1928,8888
8360,Vito_touret_119_Blue_Tec,6200
27618,Golf_1_75ps_5911km_Original_Automatik_einer_de...,5911


In [25]:
# Convert the registration_year data to integers
autos['registration_year'] = autos['registration_year'].astype(int)

# Drop observations with registration years before 1910 and after 2016
autos['registration_year'] = autos[autos['registration_year'].between(1910,2016)]

# Take a look at the distribution of the data
autos['registration_year'].value_counts(normalize=True)

2016-03-25 19:57:10    0.000062
2016-03-29 23:42:13    0.000062
2016-03-16 21:50:53    0.000062
2016-04-02 11:37:04    0.000062
2016-03-30 19:48:02    0.000062
2016-03-21 16:37:21    0.000062
2016-04-02 15:49:30    0.000062
2016-03-09 11:54:38    0.000062
2016-03-22 09:51:06    0.000062
2016-03-27 22:55:05    0.000062
2016-03-30 17:37:35    0.000062
2016-03-14 20:50:02    0.000062
2016-03-05 16:57:05    0.000062
2016-03-08 10:40:35    0.000062
2016-03-23 18:39:34    0.000062
2016-03-19 17:36:18    0.000062
2016-04-04 16:40:33    0.000062
2016-03-10 15:36:24    0.000062
2016-03-23 19:38:20    0.000062
2016-03-26 11:55:54    0.000042
2016-04-03 14:36:23    0.000042
2016-04-03 14:46:47    0.000042
2016-03-19 09:54:53    0.000042
2016-03-12 22:52:37    0.000042
2016-03-10 19:40:24    0.000042
2016-03-31 21:53:59    0.000042
2016-04-02 09:55:10    0.000042
2016-03-22 20:50:09    0.000042
2016-03-28 12:57:27    0.000042
2016-03-23 22:50:56    0.000042
                         ...   
2016-03-

### Explore car brand data

In [47]:
# Find occurrences of each brand in data set
brand_counts = autos['brand'].value_counts()

# Save the number of cars as num_cars
num_cars = len(autos['brand'])

# Create list to store the most frequently listed brands
top_brands = []

# If a brand accounts for more than 5% 0f the listed cars, save it to list
for count in brand_counts:
    if count / num_cars > 0.05:
        index = brand_counts.index
        top_brands.append(index)

# Print top_brands
top_brands

[Index(['volkswagen', 'opel', 'bmw', 'mercedes_benz', 'audi', 'ford', 'renault',
        'peugeot', 'fiat', 'seat', 'skoda', 'mazda', 'nissan', 'smart',
        'citroen', 'toyota', 'sonstige_autos', 'hyundai', 'volvo', 'mini',
        'mitsubishi', 'honda', 'kia', 'alfa_romeo', 'porsche', 'suzuki',
        'chevrolet', 'chrysler', 'dacia', 'daihatsu', 'jeep', 'subaru',
        'land_rover', 'saab', 'daewoo', 'trabant', 'jaguar', 'rover', 'lancia',
        'lada'],
       dtype='object'),
 Index(['volkswagen', 'opel', 'bmw', 'mercedes_benz', 'audi', 'ford', 'renault',
        'peugeot', 'fiat', 'seat', 'skoda', 'mazda', 'nissan', 'smart',
        'citroen', 'toyota', 'sonstige_autos', 'hyundai', 'volvo', 'mini',
        'mitsubishi', 'honda', 'kia', 'alfa_romeo', 'porsche', 'suzuki',
        'chevrolet', 'chrysler', 'dacia', 'daihatsu', 'jeep', 'subaru',
        'land_rover', 'saab', 'daewoo', 'trabant', 'jaguar', 'rover', 'lancia',
        'lada'],
       dtype='object'),
 Index(['vol

In [46]:
type(brand_counts)

pandas.core.series.Series