# Car Ad Data Analysis
In this notebook, exploratory data analysis is performed on a data set containing information about used car listings.

#### Project Sections:
1. Set Up: loading the data and packages, cleaning the data
2. Exploratory Data Analysis:
    - Explore the relationship between the number of vehicles listed and make
    - Explore the relationship between model year, make, and price


### Section 1: Set Up 

In [20]:
# Import the necessary Packages 
import numpy as np
import pandas as pd
import plotly.express as px

# Read in and sample data 
df = pd.read_csv("C:\\Users\\Leigh\\Desktop\\Sprint 4 Project\\vehicles_us.csv")
df.sample(5)

index,price,model_year,model,condition,cylinders,fuel,odometer,transmission,type,paint_color,is_4wd,date_posted,days_listed
15665,5995,2011.0,toyota camry le,like new,4.0,gas,105385.0,automatic,sedan,silver,,2019-01-06,54
27805,7991,2007.0,honda cr-v,excellent,4.0,gas,94174.0,automatic,SUV,,,2018-10-28,13
673,5450,2013.0,chrysler 200,good,4.0,gas,84000.0,automatic,sedan,,,2018-10-20,39
35873,6995,2014.0,dodge charger,good,8.0,gas,171000.0,automatic,sedan,blue,,2018-12-31,86
18420,21900,2016.0,nissan frontier crew cab sv,good,6.0,gas,4998.0,other,pickup,white,,2018-06-09,38


In [21]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51525 entries, 0 to 51524
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   price         51525 non-null  int64  
 1   model_year    47906 non-null  float64
 2   model         51525 non-null  object 
 3   condition     51525 non-null  object 
 4   cylinders     46265 non-null  float64
 5   fuel          51525 non-null  object 
 6   odometer      43633 non-null  float64
 7   transmission  51525 non-null  object 
 8   type          51525 non-null  object 
 9   paint_color   42258 non-null  object 
 10  is_4wd        25572 non-null  float64
 11  date_posted   51525 non-null  object 
 12  days_listed   51525 non-null  int64  
dtypes: float64(4), int64(2), object(7)
memory usage: 5.1+ MB


#### Clean up the data

From the initial sample of the data, I can see that the model column holds both the make and the model of the vehicle. To perform my analysis, the make and model need to be separated into two distinct columns. I can also see some data type issues: model_year is stored as a float and date_posted as an object, rather than as the datetime data type.
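The split described above can be illustrated on a few toy rows before applying it to the full data set. This is a minimal sketch: the `toy` DataFrame is made up for illustration and only mimics the combined strings seen in the sample, and it assumes the make is always the first space-delimited word.

```python
import pandas as pd

# Toy rows mimicking the combined "make model" strings in the data set
toy = pd.DataFrame(
    {"model": ["toyota camry le", "honda cr-v", "nissan frontier crew cab sv"]}
)

# n=1 splits only on the first space, so multi-word model names stay intact
toy[["make", "model"]] = toy["model"].str.split(" ", n=1, expand=True)
print(toy)
```

Without `n=1`, a listing such as "nissan frontier crew cab sv" would be split into five pieces instead of two.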

In [22]:
# Separate make and model into 2 columns (split on the first space only)
df[['make', 'model']] = df['model'].str.split(' ', n=1, expand=True)

# Convert model_year and date_posted to the datetime data type
df['model_year'] = pd.to_datetime(df['model_year'], format='%Y')
df['date_posted'] = pd.to_datetime(df['date_posted'], format='%Y-%m-%d')

# Verify Make and Model Column Corrections 
df.head(10)

index,price,model_year,model,condition,cylinders,fuel,odometer,transmission,type,paint_color,is_4wd,date_posted,days_listed,make
0,9400,2011-01-01,x5,good,6.0,gas,145000.0,automatic,SUV,,1.0,2018-06-23,19,bmw
1,25500,NaT,f-150,good,6.0,gas,88705.0,automatic,pickup,white,1.0,2018-10-19,50,ford
2,5500,2013-01-01,sonata,like new,4.0,gas,110000.0,automatic,sedan,red,,2019-02-07,79,hyundai
3,1500,2003-01-01,f-150,fair,8.0,gas,,automatic,pickup,,,2019-03-22,9,ford
4,14900,2017-01-01,200,excellent,4.0,gas,80903.0,automatic,sedan,black,,2019-04-02,28,chrysler
5,14990,2014-01-01,300,excellent,6.0,gas,57954.0,automatic,sedan,black,1.0,2018-06-20,15,chrysler
6,12990,2015-01-01,camry,excellent,4.0,gas,79212.0,automatic,sedan,white,,2018-12-27,73,toyota
7,15990,2013-01-01,pilot,excellent,6.0,gas,109473.0,automatic,SUV,black,1.0,2019-01-07,68,honda
8,11500,2012-01-01,sorento,excellent,4.0,gas,104174.0,automatic,SUV,,1.0,2018-07-16,19,kia
9,9200,2008-01-01,pilot,excellent,,gas,147191.0,automatic,SUV,blue,1.0,2019-02-15,17,honda


In [23]:
# Verify Datatype Corrections 
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51525 entries, 0 to 51524
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   price         51525 non-null  int64         
 1   model_year    47906 non-null  datetime64[ns]
 2   model         51525 non-null  object        
 3   condition     51525 non-null  object        
 4   cylinders     46265 non-null  float64       
 5   fuel          51525 non-null  object        
 6   odometer      43633 non-null  float64       
 7   transmission  51525 non-null  object        
 8   type          51525 non-null  object        
 9   paint_color   42258 non-null  object        
 10  is_4wd        25572 non-null  float64       
 11  date_posted   51525 non-null  datetime64[ns]
 12  days_listed   51525 non-null  int64         
 13  make          51525 non-null  object        
dtypes: datetime64[ns](2), float64(3), int64(2), object(7)
memory usage: 5.5+ MB


### Section 2: Exploratory Data Analysis

#### Number of Vehicles Listed By Make 

In [24]:
px.histogram(df, x="make", title='Number of Vehicles Listed by Make')

#### Price vs. Model Year by Make and Condition

In [25]:
fig = px.scatter(df, x="model_year", y="price", color="make",
                 title="Price vs. Model Year by Make")
fig.show()


The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result



In [26]:
fig = px.scatter(df, x="model_year", y="price", color="condition",
                 title="Price vs. Model Year by Condition")
fig.show()





### Conclusion

Ford has the most vehicles listed (12.6k), followed by Chevrolet (10.61k) and Toyota (5.4k). Mercedes has the fewest vehicles listed (41).
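One way to check listing counts like these without reading them off the histogram is to tabulate them directly. The sketch below is an illustration, not part of the original notebook: `listings_per_make` is a hypothetical helper, and the `toy` DataFrame stands in for `df` after the make column has been created.

```python
import pandas as pd

def listings_per_make(frame: pd.DataFrame) -> pd.Series:
    """Count listings per make, most common make first."""
    # value_counts sorts in descending order by default
    return frame["make"].value_counts()

# Toy stand-in for df (made-up rows, not the real data)
toy = pd.DataFrame(
    {"make": ["ford", "ford", "chevrolet", "toyota", "ford", "chevrolet"]}
)
counts = listings_per_make(toy)
print(counts.head(3))  # most common makes
print(counts.tail(1))  # least common make
```

On the real data, `listings_per_make(df)` would give the exact counts behind the rounded figures quoted here.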

There is a trend suggesting that newer cars typically have a higher price ceiling. I can also see that Chevys and Fords from the 1960s are an exception to that trend.
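The price-ceiling observation comes from the scatter plots; a rough numeric check could summarize an upper price quantile per model year. This is a sketch under assumptions: `price_ceiling_by_year` is a hypothetical helper, the 0.95 default quantile is an arbitrary choice for "ceiling", and the `toy` DataFrame is made-up data standing in for `df` with `model_year` already converted to datetime.

```python
import pandas as pd

def price_ceiling_by_year(frame: pd.DataFrame, q: float = 0.95) -> pd.Series:
    """Return the q-th price quantile for each model year."""
    # Rows with a missing model year (NaT) cannot be grouped, so drop them
    valid = frame.dropna(subset=["model_year"])
    years = valid["model_year"].dt.year
    return valid.groupby(years)["price"].quantile(q)

# Toy stand-in for df (made-up rows, not the real data)
toy = pd.DataFrame({
    "model_year": pd.to_datetime(["2005", "2005", "2018", "2018", None], format="%Y"),
    "price": [3000, 5000, 20000, 30000, 1000],
})
ceiling = price_ceiling_by_year(toy, q=1.0)  # q=1.0 is simply the maximum
print(ceiling)
```

Filtering the real data by make before calling the helper would be one way to look at the 1960s Chevys and Fords separately.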