# Guided Project: Exploring Ebay Car Sales Data

In this guided project, we'll work with a dataset of used cars from eBay Kleinanzeigen, a classifieds section of the German eBay website.

The dataset was originally scraped and uploaded to Kaggle, but the version used here has been modified from the original:

- We sampled 50,000 data points from the full dataset, to ensure your code runs quickly in our hosted environment
- We dirtied the dataset a bit to more closely resemble what you would expect from a scraped dataset (the version uploaded to Kaggle was cleaned to be easier to work with)



## Import Libraries and Read in the file

In [5]:
import pandas as pd
import numpy as np


In [6]:
pwd

'/Users/robsalter/Desktop/Dataquest/Step 2 Intermediate Python and Pandas/Guided Project Updated_ Exploring Ebay Car Sales Data'

In [7]:
# It's easiest if the CSV is saved in the same folder as the notebook (see pwd above). Use "/" as the path separator, not "\", which Python treats as an escape character in ordinary strings.
autos = pd.read_csv("/Users/robsalter/Desktop/Dataquest/Step 2 Intermediate Python and Pandas/Guided Project Updated_ Exploring Ebay Car Sales Data/autos.csv", encoding="Latin-1")
autos.head()

Unnamed: 0,dateCrawled,name,seller,offerType,price,abtest,vehicleType,yearOfRegistration,gearbox,powerPS,model,odometer,monthOfRegistration,fuelType,brand,notRepairedDamage,dateCreated,nrOfPictures,postalCode,lastSeen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,privat,Angebot,"$5,000",control,bus,2004,manuell,158,andere,"150,000km",3,lpg,peugeot,nein,2016-03-26 00:00:00,0,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,privat,Angebot,"$8,500",control,limousine,1997,automatik,286,7er,"150,000km",6,benzin,bmw,nein,2016-04-04 00:00:00,0,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,privat,Angebot,"$8,990",test,limousine,2009,manuell,102,golf,"70,000km",7,benzin,volkswagen,nein,2016-03-26 00:00:00,0,35394,2016-04-06 20:15:37
3,2016-03-12 16:58:10,Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...,privat,Angebot,"$4,350",control,kleinwagen,2007,automatik,71,fortwo,"70,000km",6,benzin,smart,nein,2016-03-12 00:00:00,0,33729,2016-03-15 03:16:28
4,2016-04-01 14:38:50,Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...,privat,Angebot,"$1,350",test,kombi,2003,manuell,0,focus,"150,000km",7,benzin,ford,nein,2016-04-01 00:00:00,0,39218,2016-04-01 14:38:50


In [8]:
autos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
dateCrawled            50000 non-null object
name                   50000 non-null object
seller                 50000 non-null object
offerType              50000 non-null object
price                  50000 non-null object
abtest                 50000 non-null object
vehicleType            44905 non-null object
yearOfRegistration     50000 non-null int64
gearbox                47320 non-null object
powerPS                50000 non-null int64
model                  47242 non-null object
odometer               50000 non-null object
monthOfRegistration    50000 non-null int64
fuelType               45518 non-null object
brand                  50000 non-null object
notRepairedDamage      40171 non-null object
dateCreated            50000 non-null object
nrOfPictures           50000 non-null int64
postalCode             50000 non-null int64
lastSeen               50000 non-null object

## Initial Observations
- The dataset has 20 columns with information about the seller, the car, its location, when the ad was posted and the price.
- Five columns contain null values: vehicleType, gearbox, model, fuelType and notRepairedDamage.
- Interesting columns include price, yearOfRegistration, brand and odometer.
- The column names use camelCase instead of Python's preferred snake_case. Because the names contain no spaces, they can't be converted by simply replacing spaces with underscores.

## Renaming and Converting the Column Names

In [9]:
# The column names, returned as a pandas Index
autos.columns

Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',
       'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',
       'odometer', 'monthOfRegistration', 'fuelType', 'brand',
       'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',
       'lastSeen'],
      dtype='object')

In [10]:
# Convert the names from camelCase to snake_case, renaming a few to be more descriptive
autos.columns = ['date_crawled', 'name', 'seller', 'offer_type', 'price', 'abtest',
       'vehicle_type', 'registration_year', 'gearbox', 'power_ps', 'model',
       'odometer', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'nr_of_pictures', 'postal_code',
       'last_seen']

In [11]:
autos.head()

Unnamed: 0,date_crawled,name,seller,offer_type,price,abtest,vehicle_type,registration_year,gearbox,power_ps,model,odometer,registration_month,fuel_type,brand,unrepaired_damage,ad_created,nr_of_pictures,postal_code,last_seen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,privat,Angebot,"$5,000",control,bus,2004,manuell,158,andere,"150,000km",3,lpg,peugeot,nein,2016-03-26 00:00:00,0,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,privat,Angebot,"$8,500",control,limousine,1997,automatik,286,7er,"150,000km",6,benzin,bmw,nein,2016-04-04 00:00:00,0,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,privat,Angebot,"$8,990",test,limousine,2009,manuell,102,golf,"70,000km",7,benzin,volkswagen,nein,2016-03-26 00:00:00,0,35394,2016-04-06 20:15:37
3,2016-03-12 16:58:10,Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...,privat,Angebot,"$4,350",control,kleinwagen,2007,automatik,71,fortwo,"70,000km",6,benzin,smart,nein,2016-03-12 00:00:00,0,33729,2016-03-15 03:16:28
4,2016-04-01 14:38:50,Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...,privat,Angebot,"$1,350",test,kombi,2003,manuell,0,focus,"150,000km",7,benzin,ford,nein,2016-04-01 00:00:00,0,39218,2016-04-01 14:38:50


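The column names have been converted to snake_case, and a few were renamed to be more descriptive (for example, yearOfRegistration became registration_year and dateCreated became ad_created).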
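The new names above were typed by hand. As a hedged alternative, a regular expression can handle the mechanical camelCase to snake_case step. `camel_to_snake` is a hypothetical helper, and it would not reproduce the more descriptive renames chosen here (such as `yearOfRegistration` to `registration_year`), which would still need to be done by hand.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase column name to snake_case (hypothetical helper)."""
    # Put an underscore before any uppercase letter that follows a lowercase letter or digit,
    # then lowercase the whole string.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


original = ["dateCrawled", "powerPS", "nrOfPictures", "notRepairedDamage", "abtest"]
print([camel_to_snake(c) for c in original])
# ['date_crawled', 'power_ps', 'nr_of_pictures', 'not_repaired_damage', 'abtest']
```

Because the lookbehind only fires after a lowercase letter or digit, runs of capitals such as `PS` become `_ps` rather than `_p_s`.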

## Identifying Cleaning Tasks

We will initially look for text columns where all or almost all values are the same. These can often be dropped as they don't have useful information for analysis.

We will also look for examples of numeric data stored as text which can be cleaned and converted.

In [12]:
# include='all' returns statistics for both categorical and numerical columns
autos.describe(include='all')

Unnamed: 0,date_crawled,name,seller,offer_type,price,abtest,vehicle_type,registration_year,gearbox,power_ps,model,odometer,registration_month,fuel_type,brand,unrepaired_damage,ad_created,nr_of_pictures,postal_code,last_seen
count,50000,50000,50000,50000,50000,50000,44905,50000.0,47320,50000.0,47242,50000,50000.0,45518,50000,40171,50000,50000.0,50000.0,50000
unique,48213,38754,2,2,2357,2,8,,2,,245,13,,7,40,2,76,,,39481
top,2016-03-25 19:57:10,Ford_Fiesta,privat,Angebot,$0,test,limousine,,manuell,,golf,"150,000km",,benzin,volkswagen,nein,2016-04-03 00:00:00,,,2016-04-07 06:17:27
freq,3,78,49999,49999,1421,25756,12859,,36993,,4024,32424,,30107,10687,35232,1946,,,8
mean,,,,,,,,2005.07328,,116.35592,,,5.72336,,,,,0.0,50813.6273,
std,,,,,,,,105.712813,,209.216627,,,3.711984,,,,,0.0,25779.747957,
min,,,,,,,,1000.0,,0.0,,,0.0,,,,,0.0,1067.0,
25%,,,,,,,,1999.0,,70.0,,,3.0,,,,,0.0,30451.0,
50%,,,,,,,,2003.0,,105.0,,,6.0,,,,,0.0,49577.0,
75%,,,,,,,,2008.0,,150.0,,,9.0,,,,,0.0,71540.0,


In [13]:
autos['nr_of_pictures'].value_counts()

0    50000
Name: nr_of_pictures, dtype: int64

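Columns with few unique values:
- seller and offer_type: all but one of the 50,000 rows share the same value (privat and Angebot). Candidates to be dropped
- abtest: two values
- gearbox: two values
- unrepaired_damage: two values
- nr_of_pictures (numeric): every value is 0. Candidate to be dropped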

Numeric data stored as text that needs to be cleaned:
- price
- odometer
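The columns above were picked out by reading the `describe` output. A minimal sketch of the same check in code, using a hypothetical helper `near_constant_columns` and an assumed 95% cut-off (an arbitrary choice, not part of the original analysis), run on made-up toy data:

```python
import pandas as pd


def near_constant_columns(df: pd.DataFrame, threshold: float = 0.95) -> list:
    """Return the columns whose single most common value covers at least `threshold` of the rows."""
    flagged = []
    for col in df.columns:
        # Share of rows held by the most frequent value; NaN counts as a value here
        top_share = df[col].value_counts(normalize=True, dropna=False).iloc[0]
        if top_share >= threshold:
            flagged.append(col)
    return flagged


# Made-up toy data shaped like the columns discussed above
toy = pd.DataFrame({
    "seller": ["privat"] * 99 + ["gewerblich"],
    "nr_of_pictures": [0] * 100,
    "gearbox": ["manuell"] * 70 + ["automatik"] * 30,
})
print(near_constant_columns(toy))  # ['seller', 'nr_of_pictures']
```

Given the frequencies in the `describe` output, running it on `autos` should flag seller, offer_type and nr_of_pictures.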


## Cleaning The Price Column

In [14]:
autos['price'].value_counts()

$0         1421
$500        781
$1,500      734
$2,500      643
$1,200      639
           ... 
$8,798        1
$23,850       1
$3,410        1
$12,889       1
$4,333        1
Name: price, Length: 2357, dtype: int64

The dollar sign and commas need to be removed before we convert the values to floats.

In [15]:
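# regex=False treats "$" as a literal character rather than a regex end-of-string anchor
autos['price'] = autos['price'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)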
autos['price'].head()

0    5000.0
1    8500.0
2    8990.0
3    4350.0
4    1350.0
Name: price, dtype: float64
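The two chained `replace` calls can be collapsed into one by using a regular-expression character class. A minimal sketch on a few sample strings copied from the value counts above:

```python
import pandas as pd

# Sample strings in the same format as the raw price column
raw_prices = pd.Series(["$5,000", "$8,500", "$0", "$12,889"])

# regex=True with a character class strips every "$" and "," in one pass;
# inside [...] the "$" is a literal character, not the end-of-string anchor.
clean = raw_prices.str.replace(r"[$,]", "", regex=True).astype(float)
print(clean.tolist())  # [5000.0, 8500.0, 0.0, 12889.0]
```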

## Cleaning And Renaming The Odometer Column

In [16]:
autos['odometer'].value_counts()

150,000km    32424
125,000km     5170
100,000km     2169
90,000km      1757
80,000km      1436
70,000km      1230
60,000km      1164
50,000km      1027
5,000km        967
40,000km       819
30,000km       789
20,000km       784
10,000km       264
Name: odometer, dtype: int64

Remove both the comma and "km" before converting to a float.

In [17]:
autos['odometer'] = autos['odometer'].str.replace('km', '', regex=False).str.replace(',', '', regex=False).astype(float)
autos['odometer'].head()

0    150000.0
1    150000.0
2     70000.0
3     70000.0
4    150000.0
Name: odometer, dtype: float64

In [18]:
# Rename the odometer column to odometer_km
autos.rename(columns = {'odometer': 'odometer_km'}, inplace=True)
autos.head()

Unnamed: 0,date_crawled,name,seller,offer_type,price,abtest,vehicle_type,registration_year,gearbox,power_ps,model,odometer_km,registration_month,fuel_type,brand,unrepaired_damage,ad_created,nr_of_pictures,postal_code,last_seen
0,2016-03-26 17:47:46,Peugeot_807_160_NAVTECH_ON_BOARD,privat,Angebot,5000.0,control,bus,2004,manuell,158,andere,150000.0,3,lpg,peugeot,nein,2016-03-26 00:00:00,0,79588,2016-04-06 06:45:54
1,2016-04-04 13:38:56,BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik,privat,Angebot,8500.0,control,limousine,1997,automatik,286,7er,150000.0,6,benzin,bmw,nein,2016-04-04 00:00:00,0,71034,2016-04-06 14:45:08
2,2016-03-26 18:57:24,Volkswagen_Golf_1.6_United,privat,Angebot,8990.0,test,limousine,2009,manuell,102,golf,70000.0,7,benzin,volkswagen,nein,2016-03-26 00:00:00,0,35394,2016-04-06 20:15:37
3,2016-03-12 16:58:10,Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...,privat,Angebot,4350.0,control,kleinwagen,2007,automatik,71,fortwo,70000.0,6,benzin,smart,nein,2016-03-12 00:00:00,0,33729,2016-03-15 03:16:28
4,2016-04-01 14:38:50,Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...,privat,Angebot,1350.0,test,kombi,2003,manuell,0,focus,150000.0,7,benzin,ford,nein,2016-04-01 00:00:00,0,39218,2016-04-01 14:38:50


## Analysing The Price And Odometer Columns

We will check the minimum and maximum values of each column and look for any that are unrealistically high or low (outliers) that we might want to remove.

### Price

In [19]:
autos['price'].unique().shape

(2357,)

The column has 2,357 unique values.

In [20]:
autos['price'].describe()

count    5.000000e+04
mean     9.840044e+03
std      4.811044e+05
min      0.000000e+00
25%      1.100000e+03
50%      2.950000e+03
75%      7.200000e+03
max      1.000000e+08
Name: price, dtype: float64

In [21]:
autos['price'].value_counts().sort_index(ascending=True).head(20)

0.0     1421
1.0      156
2.0        3
3.0        1
5.0        2
8.0        1
9.0        1
10.0       7
11.0       2
12.0       3
13.0       2
14.0       1
15.0       2
17.0       3
18.0       1
20.0       4
25.0       5
29.0       1
30.0       7
35.0       1
Name: price, dtype: int64

There are 1,421 listings with a $0 price. We will drop these rows, as replacing such a large number of values with the mean would skew our sample.

In [22]:
autos['price'].value_counts().sort_index(ascending=False).head(20)

99999999.0    1
27322222.0    1
12345678.0    3
11111111.0    2
10000000.0    1
3890000.0     1
1300000.0     1
1234566.0     1
999999.0      2
999990.0      1
350000.0      1
345000.0      1
299000.0      1
295000.0      1
265000.0      1
259000.0      1
250000.0      1
220000.0      1
198000.0      1
197000.0      1
Name: price, dtype: int64

There are a number of unrealistically high prices, ranging into the millions. Since there are only a few of them, we can assume they are errors and remove them. Prices jump sharply from 350,000 to 999,990, so we will use 350,000 as our maximum.
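The 350,000 cut-off was chosen by eye. One way to make that choice repeatable, offered as a hedged sketch rather than the method used here, is to look for the first point where the next distinct price more than doubles. The `first_large_jump` helper and its `ratio=2.0` threshold are assumptions:

```python
import pandas as pd


def first_large_jump(values, ratio: float = 2.0) -> float:
    """Return the value just before the first point where the next distinct value is more
    than `ratio` times larger (a hypothetical heuristic, not the method used above)."""
    ordered = pd.Series(sorted(set(values)), dtype=float)
    # Ratio of each distinct value to the previous one; the first entry is NaN
    jumps = ordered / ordered.shift(1)
    big = jumps[jumps > ratio]
    if big.empty:
        return float(ordered.iloc[-1])
    # ordered has a default RangeIndex, so labels double as positions
    return float(ordered.iloc[big.index[0] - 1])


# The 20 highest prices from the value counts above
top_prices = [99999999, 27322222, 12345678, 11111111, 10000000, 3890000, 1300000,
              1234566, 999999, 999990, 350000, 345000, 299000, 295000, 265000,
              259000, 250000, 220000, 198000, 197000]
print(first_large_jump(top_prices))  # 350000.0
```

A different `ratio` would move the cut-off, so the threshold is still a judgement call.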

In [23]:
# Select prices between 1 and 350,000 inclusive. Equivalent to autos[(autos['price'] >= 1) & (autos['price'] <= 350000)]
autos = autos[autos['price'].between(1,350000)]
#check
autos['price'].describe()

count     48565.000000
mean       5888.935591
std        9059.854754
min           1.000000
25%        1200.000000
50%        3000.000000
75%        7490.000000
max      350000.000000
Name: price, dtype: float64

After filtering, 75% of listings are priced at 7,490 or less, with a median of 3,000. We will investigate these prices further to find which brands have the highest average prices.
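As a sanity check, the rows removed by the price filter can be reconciled with the value counts shown earlier: the 1,421 listings at $0 plus the 14 listings above $350,000 should account for every dropped row. The arithmetic, using only numbers from the outputs above:

```python
# Counts copied from the outputs above
rows_before = 50000
rows_after = 48565
zero_prices = 1421
# Counts for prices above 350,000 in the descending value counts (99999999 down to 999990)
above_cap = [1, 1, 3, 2, 1, 1, 1, 1, 2, 1]

removed = rows_before - rows_after
print(removed)                       # 1435
print(zero_prices + sum(above_cap))  # 1435
```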

### odometer_km column

In [24]:
autos['odometer_km'].unique().shape
# 13 unique values

(13,)

In [25]:
autos['odometer_km'].describe()

count     48565.000000
mean     125770.101925
std       39788.636804
min        5000.000000
25%      125000.000000
50%      150000.000000
75%      150000.000000
max      150000.000000
Name: odometer_km, dtype: float64

In [26]:
# Investigating values
autos['odometer_km'].value_counts().sort_index(ascending=True).head(15)

5000.0        836
10000.0       253
20000.0       762
30000.0       780
40000.0       815
50000.0      1012
60000.0      1155
70000.0      1217
80000.0      1415
90000.0      1734
100000.0     2115
125000.0     5057
150000.0    31414
Name: odometer_km, dtype: int64

The values appear to be grouped into fixed buckets: steps of 10,000 km from 10,000 to 100,000 km, plus 5,000, 125,000 and 150,000 km. There is a relatively high number of listings at 5,000 km (836); this seems low for a used car but is not an outlier. The 150,000 km bucket is by far the largest (31,414 listings), which suggests it represents 150,000 km or more. No rows have been removed.
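To put a number on how dominant the top bucket is, the counts above can be turned into shares: the 150,000 km bucket holds roughly 65% of the remaining listings. Worked out from the printed counts:

```python
import pandas as pd

# odometer_km value counts copied from the output above
odometer_counts = pd.Series({
    5000: 836, 10000: 253, 20000: 762, 30000: 780, 40000: 815,
    50000: 1012, 60000: 1155, 70000: 1217, 80000: 1415, 90000: 1734,
    100000: 2115, 125000: 5057, 150000: 31414,
})

# Fraction of all listings in each bucket
share = odometer_counts / odometer_counts.sum()
print(odometer_counts.sum())          # 48565
print(round(share[150000] * 100, 1))  # 64.7
```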

## Date Columns

Right now, the date_crawled, last_seen, and ad_created columns are all identified as string values by pandas. Because these three columns are represented as strings, we need to convert the data into a numerical representation so we can understand it quantitatively.
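The cells below inspect the raw strings, whose full timestamps are almost all unique. A minimal sketch of the conversion described here uses `pd.to_datetime` on the five sample rows shown below (the explicit `format` string is an assumption based on how the values look), then `dt.normalize()` to count listings per calendar day:

```python
import pandas as pd

# The first five rows of the three date columns, copied from the notebook output
dates = pd.DataFrame({
    "date_crawled": ["2016-03-26 17:47:46", "2016-04-04 13:38:56", "2016-03-26 18:57:24",
                     "2016-03-12 16:58:10", "2016-04-01 14:38:50"],
    "last_seen": ["2016-04-06 06:45:54", "2016-04-06 14:45:08", "2016-04-06 20:15:37",
                  "2016-03-15 03:16:28", "2016-04-01 14:38:50"],
    "ad_created": ["2016-03-26 00:00:00", "2016-04-04 00:00:00", "2016-03-26 00:00:00",
                   "2016-03-12 00:00:00", "2016-04-01 00:00:00"],
})

for col in ["date_crawled", "last_seen", "ad_created"]:
    # Parse the strings into datetime64 values
    dates[col] = pd.to_datetime(dates[col], format="%Y-%m-%d %H:%M:%S")

# Drop the time of day so value_counts groups listings by calendar day
crawl_days = dates["date_crawled"].dt.normalize()
print(crawl_days.value_counts().sort_index())
print(dates["date_crawled"].min(), dates["date_crawled"].max())
```

On the full dataset the same loop would run on `autos`; daily counts are much easier to read than per-second counts.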



In [27]:
autos[['date_crawled','last_seen','ad_created']][0:5]

Unnamed: 0,date_crawled,last_seen,ad_created
0,2016-03-26 17:47:46,2016-04-06 06:45:54,2016-03-26 00:00:00
1,2016-04-04 13:38:56,2016-04-06 14:45:08,2016-04-04 00:00:00
2,2016-03-26 18:57:24,2016-04-06 20:15:37,2016-03-26 00:00:00
3,2016-03-12 16:58:10,2016-03-15 03:16:28,2016-03-12 00:00:00
4,2016-04-01 14:38:50,2016-04-01 14:38:50,2016-04-01 00:00:00


### Calculating the distribution of values for date_crawled

In [28]:
autos['date_crawled'].value_counts(normalize=True, dropna=False).sort_index(ascending=False)

2016-04-07 14:36:56    0.000021
2016-04-07 14:36:55    0.000021
2016-04-07 14:36:44    0.000021
2016-04-07 14:30:26    0.000021
2016-04-07 14:30:09    0.000021
                         ...   
2016-03-05 14:07:21    0.000021
2016-03-05 14:07:08    0.000021
2016-03-05 14:07:04    0.000021
2016-03-05 14:06:40    0.000021
2016-03-05 14:06:30    0.000021
Name: date_crawled, Length: 46882, dtype: float64

Almost every timestamp appears only once, so the per-second distribution is close to uniform. The values indicate that the data was crawled between 2016-03-05 and 2016-04-07.

### Calculating the distribution of values for last_seen

In [29]:
autos['last_seen'].value_counts(normalize=True, dropna=False).sort_index(ascending=False)

2016-04-07 14:58:50    0.000062
2016-04-07 14:58:48    0.000062
2016-04-07 14:58:46    0.000021
2016-04-07 14:58:45    0.000021
2016-04-07 14:58:44    0.000062
                         ...   
2016-03-05 15:16:47    0.000021
2016-03-05 15:16:11    0.000021
2016-03-05 14:49:34    0.000021
2016-03-05 14:46:02    0.000021
2016-03-05 14:45:46    0.000021
Name: last_seen, Length: 38474, dtype: float64

These values fall in the same range as date_crawled (2016-03-05 to 2016-04-07), so for the purposes of this analysis the column can be dropped.

### Calculating the distribution of values for ad_created

In [30]:
autos['ad_created'].value_counts(normalize=True, dropna=False).sort_index(ascending=False)

2016-04-07 00:00:00    0.001256
2016-04-06 00:00:00    0.003253
2016-04-05 00:00:00    0.011819
2016-04-04 00:00:00    0.036858
2016-04-03 00:00:00    0.038855
                         ...   
2015-12-05 00:00:00    0.000021
2015-11-10 00:00:00    0.000021
2015-09-09 00:00:00    0.000021
2015-08-10 00:00:00    0.000021
2015-06-11 00:00:00    0.000021
Name: ad_created, Length: 76, dtype: float64

The ad_created column includes values between 2015-06-11 and 2016-04-07 (which makes sense as this was the final date crawled).

## Understanding Registration Year

In [31]:
autos['registration_year'].describe()

count    48565.000000
mean      2004.755421
std         88.643887
min       1000.000000
25%       1999.000000
50%       2004.000000
75%       2008.000000
max       9999.000000
Name: registration_year, dtype: float64

There are clearly a number of error values in the registration_year column, as shown by the min (1000) and max (9999) values.
- No vehicle can have been registered after the data was crawled (the last crawl was on 2016-04-07), so later years can be removed.
- The earliest plausible registration year is harder to pin down, but it will be somewhere in the first half of the 20th century.

In [32]:
autos['registration_year'].value_counts().sort_index(ascending=True).head(20)

1000    1
1001    1
1111    1
1800    2
1910    5
1927    1
1929    1
1931    1
1934    2
1937    4
1938    1
1939    1
1941    2
1943    1
1948    1
1950    3
1951    2
1952    1
1953    1
1954    2
Name: registration_year, dtype: int64

In [33]:
autos['registration_year'].value_counts().sort_index(ascending=False).head(20)

9999       3
9000       1
8888       1
6200       1
5911       1
5000       4
4800       1
4500       1
4100       1
2800       1
2019       2
2018     470
2017    1392
2016    1220
2015     392
2014     663
2013     803
2012    1310
2011    1623
2010    1589
Name: registration_year, dtype: int64

Values before 1910 (1000, 1001, 1111 and 1800) and after 2016 are clearly errors, so we will remove them.
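Applied to the real data, this filter reduces the row count from 48,565 to 46,681, so 1,884 listings (about 3.9%) are removed. A small hypothetical helper, `filter_between`, applies the same inclusive `between` filter and reports what it dropped; it is shown here on made-up toy data:

```python
import pandas as pd


def filter_between(df: pd.DataFrame, col: str, low, high) -> pd.DataFrame:
    """Keep rows where low <= df[col] <= high and report how many were dropped (hypothetical helper)."""
    # Series.between is inclusive on both ends by default
    kept = df[df[col].between(low, high)]
    dropped = len(df) - len(kept)
    print(f"{col}: dropped {dropped} of {len(df)} rows ({dropped / len(df):.1%})")
    return kept


toy = pd.DataFrame({"registration_year": [1000, 1910, 1999, 2016, 2017, 9999]})
kept = filter_between(toy, "registration_year", 1910, 2016)
print(kept["registration_year"].tolist())  # [1910, 1999, 2016]
```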

In [34]:
autos = autos[autos['registration_year'].between(1910,2016)]
#check
autos['registration_year'].describe()

count    46681.000000
mean      2002.910756
std          7.185103
min       1910.000000
25%       1999.000000
50%       2003.000000
75%       2008.000000
max       2016.000000
Name: registration_year, dtype: float64

### The Distribution of Registration Year

In [35]:
autos['registration_year'].value_counts(normalize=True) * 100

2000    6.760781
2005    6.289497
1999    6.205951
2004    5.790364
2003    5.781796
          ...   
1938    0.002142
1948    0.002142
1927    0.002142
1931    0.002142
1952    0.002142
Name: registration_year, Length: 78, dtype: float64

The most common registration years fall between 1999 and 2005, which would make these cars roughly 11 to 17 years old when they were listed in 2016.
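The age range follows from subtracting each registration year from 2016, the year the listings were crawled. A sketch using the five most common years from the output above:

```python
import pandas as pd

# Reference year: the listings were crawled in 2016
CRAWL_YEAR = 2016

# The five most common registration years from the normalized value counts
top_years = pd.Series([2000, 2005, 1999, 2004, 2003], name="registration_year")
ages = CRAWL_YEAR - top_years
print(ages.min(), ages.max())  # 11 17
```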

## Exploring The Unique Values In The Brand Column

We will use the following aggregation technique to explore the brand data:

- Identify the unique values we want to aggregate by
- Create an empty dictionary to store our aggregate data
- Loop over the unique values, and for each:
    - Subset the dataframe to the rows with that value
    - Calculate the mean of whichever column we're interested in
    - Store the value and its mean in the dictionary as a key/value pair

We will first identify which brands to aggregate, based on their popularity.

In [36]:
autos['brand'].value_counts(normalize=True) * 100

volkswagen        21.126368
bmw               11.004477
opel              10.758124
mercedes_benz      9.646323
audi               8.656627
ford               6.989996
renault            4.714980
peugeot            2.984083
fiat               2.564212
seat               1.827296
skoda              1.640925
nissan             1.527388
mazda              1.518819
smart              1.415994
citroen            1.400998
toyota             1.270324
hyundai            1.002549
sonstige_autos     0.981127
volvo              0.914719
mini               0.876159
mitsubishi         0.822604
honda              0.784045
kia                0.706926
alfa_romeo         0.664082
porsche            0.612669
suzuki             0.593389
chevrolet          0.569825
chrysler           0.351321
dacia              0.263490
daihatsu           0.250637
jeep               0.227073
subaru             0.214220
land_rover         0.209936
saab               0.164949
jaguar             0.156381
daewoo             0

### Aggregate for brands over 5% frequency

We will aggregate the brands that account for more than 5% of listings. We want to identify trends in the average price for each brand, and will store the results in a dictionary.

In [37]:
brands = autos['brand'].value_counts(normalize=True)
most_common_brands = brands[brands>0.05].index
most_common_brands

Index(['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford'], dtype='object')

In [38]:
brand_mean_prices = {}

# Use `brand` (singular) so the loop doesn't overwrite the `brands` Series defined above
for brand in most_common_brands:
    brand_only = autos[autos["brand"] == brand]
    mean_price = brand_only['price'].mean()
    brand_mean_prices[brand] = int(mean_price)
    
brand_mean_prices    


{'volkswagen': 5402,
 'bmw': 8332,
 'opel': 2975,
 'mercedes_benz': 8628,
 'audi': 9336,
 'ford': 3749}

Of the 6 most popular brands, Audi, Mercedes-Benz and BMW have the 3 highest average prices, which reflects their status as premium brands. Volkswagen, the most popular brand, has an average price of about 5,400, between the premium brands and the economy brands (Ford and Opel), which reflects its place in the market.
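The loop above follows the aggregation recipe listed earlier. pandas can do the same split-apply-combine in a single `groupby` call; a minimal sketch on made-up toy rows that checks both approaches agree:

```python
import pandas as pd

# Made-up toy listings; the notebook itself uses the cleaned `autos` DataFrame
toy = pd.DataFrame({
    "brand": ["audi", "audi", "opel", "opel", "bmw"],
    "price": [9000.0, 9600.0, 3000.0, 2950.0, 8300.0],
})

# Loop-based aggregation, following the recipe above
loop_means = {}
for brand in toy["brand"].unique():
    loop_means[brand] = toy.loc[toy["brand"] == brand, "price"].mean()

# The same aggregation in one groupby call
grouped_means = toy.groupby("brand")["price"].mean()

print(grouped_means.to_dict())  # {'audi': 9300.0, 'bmw': 8300.0, 'opel': 2975.0}
```

On the real data, `autos[autos['brand'].isin(most_common_brands)].groupby('brand')['price'].mean()` should give the same means as the loop, before the `int()` truncation.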

## Calculating The Mean Mileage For The Top 6 Brands

In [41]:
brand_mean_mileage = {}

for brand in most_common_brands:
    brand_only = autos[autos['brand'] == brand]
    mean_mileage = brand_only['odometer_km'].mean()
    brand_mean_mileage[brand] = int(mean_mileage)
    
brand_mean_mileage    

{'volkswagen': 128707,
 'bmw': 132572,
 'opel': 129310,
 'mercedes_benz': 130788,
 'audi': 129157,
 'ford': 124266}

## Converting Both Mean Mileage And Mean Prices Into Series Objects

Convert both dictionaries to series objects, using the series constructor.


__Pandas Series Constructor__: `bmp_series = pd.Series(brand_mean_prices)`

__Pandas DataFrame Constructor__: `df = pd.DataFrame(bmp_series, columns=['mean_price'])`
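The two constructor steps can also be combined into a single `pd.DataFrame` call built from a dict of Series. A hedged sketch using the dictionaries printed above; it relies on pandas aligning Series by index label, which is also why assigning the mileage Series as a new column lines up correctly even though the two Series are sorted in different orders:

```python
import pandas as pd

# Dictionaries copied from the outputs above
brand_mean_prices = {"volkswagen": 5402, "bmw": 8332, "opel": 2975,
                     "mercedes_benz": 8628, "audi": 9336, "ford": 3749}
brand_mean_mileage = {"volkswagen": 128707, "bmw": 132572, "opel": 129310,
                      "mercedes_benz": 130788, "audi": 129157, "ford": 124266}

# The two Series end up in different orders; pandas aligns them on the brand labels
bmp_series = pd.Series(brand_mean_prices).sort_values(ascending=False)
bmm_series = pd.Series(brand_mean_mileage).sort_values(ascending=False)

df = pd.DataFrame({"mean_price": bmp_series, "mean_mileage": bmm_series})
print(df.sort_values("mean_price", ascending=False))
```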

In [42]:
# Sort each Series in descending order as it is created
bmp_series = pd.Series(brand_mean_prices).sort_values(ascending=False)
bmm_series = pd.Series(brand_mean_mileage).sort_values(ascending=False)

#check
bmm_series


bmw              132572
mercedes_benz    130788
opel             129310
audi             129157
volkswagen       128707
ford             124266
dtype: int64

## Create a dataframe from the first series object using the dataframe constructor.

In [43]:
df = pd.DataFrame(bmp_series, columns=['mean_price'])
df

Unnamed: 0,mean_price
audi,9336
mercedes_benz,8628
bmw,8332
volkswagen,5402
ford,3749
opel,2975


## Assign the other series as a new column in this dataframe.

In [44]:
df['mean_mileage'] = bmm_series
df

Unnamed: 0,mean_price,mean_mileage
audi,9336,129157
mercedes_benz,8628,130788
bmw,8332,132572
volkswagen,5402,128707
ford,3749,124266
opel,2975,129310


Among the 3 premium brands, Audi has both the highest mean price and the lowest mean mileage, which may help explain its higher price. Of the 2 economy brands, Ford has a lower mean mileage than Opel, which again may reflect its higher mean price.

## Conclusion

In this guided project, we practiced applying a variety of pandas methods to explore and understand a data set on car listings.