# Analysis of Used Cars from 2018-2019

## Introduction

The used car market is dynamic and has seen several new selling methods emerge. The traditional car dealership model has been losing ground to online retailers, which hold a growing share of the market. The data provided covers a period of two years of used car ads.

## Goal

The purpose of this analysis is to examine how the used car marketplace changed over this period. It is meant to inform future lines of inquiry concerning used car sales. The following factors are examined:

1. Fuel type
2. Condition of vehicles
3. Manufacturer popularity
4. Vehicle age and selling price

## Initialization

In [1]:
import streamlit as st
import pandas as pd
import plotly.express as px

In [2]:
df_cars = pd.read_csv('~/car_analysis/vehicles_us.csv')

## Data Preparation

In [3]:
display(df_cars.sample(10))

|  | price | model_year | model | condition | cylinders | fuel | odometer | transmission | type | paint_color | is_4wd | date_posted | days_listed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32954 | 9800 | 2015.0 | ford escape | like new | 4.0 | gas | 46000.0 | automatic | SUV | silver | NaN | 2018-10-23 | 2 |
| 78 | 23800 | 2019.0 | nissan frontier crew cab sv | good | 6.0 | gas | 10899.0 | other | pickup | silver | 1.0 | 2019-02-28 | 30 |
| 32991 | 14995 | 2014.0 | volkswagen passat | excellent | 4.0 | diesel | 30231.0 | automatic | sedan | NaN | NaN | 2018-07-09 | 24 |
| 34092 | 19995 | 2015.0 | chevrolet camaro | excellent | 8.0 | gas | 75763.0 | automatic | convertible | black | NaN | 2018-11-26 | 6 |
| 4191 | 1500 | 1992.0 | ram 1500 | good | 6.0 | gas | 108106.0 | manual | truck | red | NaN | 2019-02-10 | 16 |
| 47845 | 14995 | 2008.0 | toyota 4runner | good | 6.0 | gas | 46691.0 | automatic | SUV | silver | 1.0 | 2018-12-12 | 34 |
| 47008 | 5599 | 2011.0 | toyota prius | like new | NaN | hybrid | 150000.0 | automatic | hatchback | silver | NaN | 2019-02-23 | 66 |
| 15715 | 15397 | 2016.0 | chevrolet malibu | excellent | NaN | gas | 13216.0 | automatic | sedan | blue | NaN | 2018-10-17 | 53 |
| 30308 | 16995 | 2009.0 | ford econoline | excellent | 8.0 | gas | 11657.0 | automatic | van | blue | NaN | 2019-02-26 | 50 |
| 49156 | 4495 | 2012.0 | chevrolet impala | good | NaN | gas | 122988.0 | automatic | sedan | white | NaN | 2018-07-27 | 4 |


In [4]:
df_cars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51525 entries, 0 to 51524
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   price         51525 non-null  int64  
 1   model_year    47906 non-null  float64
 2   model         51525 non-null  object 
 3   condition     51525 non-null  object 
 4   cylinders     46265 non-null  float64
 5   fuel          51525 non-null  object 
 6   odometer      43633 non-null  float64
 7   transmission  51525 non-null  object 
 8   type          51525 non-null  object 
 9   paint_color   42258 non-null  object 
 10  is_4wd        25572 non-null  float64
 11  date_posted   51525 non-null  object 
 12  days_listed   51525 non-null  int64  
dtypes: float64(4), int64(2), object(7)
memory usage: 5.1+ MB


In [5]:
print(df_cars.duplicated().sum())

0


In [6]:
df_cars.isna().sum()

price               0
model_year       3619
model               0
condition           0
cylinders        5260
fuel                0
odometer         7892
transmission        0
type                0
paint_color      9267
is_4wd          25953
date_posted         0
days_listed         0
dtype: int64
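
The raw counts above are easier to weigh as fractions of the 51525 rows; `is_4wd`, for instance, is missing in roughly half of them. A minimal sketch of that view, using a made-up toy frame (`toy` and its values are illustrative stand-ins for `df_cars`, not the real data):

```python
import pandas as pd

# Toy frame standing in for df_cars; the values are invented for illustration.
toy = pd.DataFrame({
    "price": [9800, 23800, 14995, 1500],
    "model_year": [2015.0, None, 2014.0, 1992.0],
    "is_4wd": [None, 1.0, None, None],
})

# isna() marks missing cells; mean() turns the boolean marks into
# the fraction of missing values per column.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Applied to `df_cars`, the same expression would rank the columns by how much of each is missing, which can help decide which columns to drop on and which to fill.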

In [56]:
#counting the number of ads posted in 2018
print(df_clean[df_clean['year']==2018].count())

price           23049
model_year      23049
model           23049
condition       23049
cylinders       23049
fuel            23049
odometer        23049
transmission    23049
type            23049
paint_color     23049
is_4wd          23049
date_posted     23049
days_listed     23049
manufacturer    23049
year            23049
age             23049
dtype: int64


In [57]:
#counting the number of ads posted in 2019
print(df_clean[df_clean['year']==2019].count())

price           10257
model_year      10257
model           10257
condition       10257
cylinders       10257
fuel            10257
odometer        10257
transmission    10257
type            10257
paint_color     10257
is_4wd          10257
date_posted     10257
days_listed     10257
manufacturer    10257
year            10257
age             10257
dtype: int64
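
Because `df_clean` has no nulls at this point, every column reports the same count, so a single number per year carries the same information. A minimal sketch of a more compact count, assuming a `year` column derived from `date_posted` (the toy frame below is illustrative only):

```python
import pandas as pd

# Toy frame standing in for df_clean; the dates are illustrative only.
toy = pd.DataFrame({
    "date_posted": pd.to_datetime(
        ["2018-10-23", "2019-02-28", "2018-07-09", "2018-11-26"]
    ),
})
toy["year"] = toy["date_posted"].dt.year

# value_counts() returns one row count per distinct year;
# sort_index() orders the result chronologically.
ads_per_year = toy["year"].value_counts().sort_index()
print(ads_per_year)
```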


## Fix Data

In [7]:
#dropping rows that are missing model_year, paint_color, or odometer
df_clean = df_cars.dropna(subset=['model_year', 'paint_color', 'odometer'])

In [20]:
df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 33306 entries, 2 to 51523
Data columns (total 16 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   price         33306 non-null  int64         
 1   model_year    33306 non-null  int64         
 2   model         33306 non-null  object        
 3   condition     33306 non-null  object        
 4   cylinders     33306 non-null  int64         
 5   fuel          33306 non-null  object        
 6   odometer      33306 non-null  float64       
 7   transmission  33306 non-null  object        
 8   type          33306 non-null  object        
 9   paint_color   33306 non-null  object        
 10  is_4wd        33306 non-null  float64       
 11  date_posted   33306 non-null  datetime64[ns]
 12  days_listed   33306 non-null  int64         
 13  manufacturer  33306 non-null  object        
 14  year          33306 non-null  int64         
 15  age           33306 non-null  int64 

In [9]:
#filling the remaining null values (cylinders and is_4wd) with 0
df_clean = df_clean.fillna(0)
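
After this fill, a 0 in `cylinders` or `is_4wd` stands for a value that was originally missing. A minimal sketch, on a made-up toy frame, of how one might count the rows that carry that placeholder (the frame and its values are assumptions for illustration):

```python
import pandas as pd

# Toy frame with the two columns that can still hold nulls at this step.
toy = pd.DataFrame({
    "cylinders": [4.0, None, 8.0, None],
    "is_4wd": [None, 1.0, None, 1.0],
})

filled = toy.fillna(0)

# Count how many rows now hold the placeholder 0 in each column.
placeholder_counts = (filled == 0).sum()
print(placeholder_counts)
```

For `is_4wd` a 0 plausibly reads as "not four-wheel drive", while a 0 in `cylinders` is only a marker for "unknown", which may matter if that column is ever averaged.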

In [10]:
#converting float64 to int64 for all data expected to be treated as whole numbers
df_clean['model_year'] = df_clean['model_year'].astype(int)
df_clean['cylinders'] = df_clean['cylinders'].astype(int)
df_clean['days_listed'] = df_clean['days_listed'].astype(int)

In [11]:
#converting date_posted to date time data type
df_clean['date_posted'] = pd.to_datetime(df_clean['date_posted'], format='%Y-%m-%d')

In [12]:
#creating column for car manufacturer
df_clean['manufacturer'] = df_clean['model'].apply(lambda x: x.split()[0])
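
The line above keeps the first whitespace-separated word of `model`, which assumes every manufacturer name is a single word. A minimal sketch of the same extraction with the vectorized string accessor; the `land rover` entry is a hypothetical example of a two-word make and may not occur in the dataset:

```python
import pandas as pd

# Three model strings from the sample above, plus one hypothetical
# two-word make to show the limitation.
models = pd.Series([
    "ford escape",
    "nissan frontier crew cab sv",
    "toyota 4runner",
    "land rover range rover",  # hypothetical entry
])

# Split on whitespace and keep the first token of each string.
manufacturer = models.str.split().str[0]
print(manufacturer.tolist())
# ['ford', 'nissan', 'toyota', 'land']
```

A two-word make is truncated to its first word, so it may be worth checking `df_clean['manufacturer'].unique()` for unexpected values.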

In [13]:
#extracting year that the car was posted
df_clean['year'] = df_clean['date_posted'].dt.year

In [14]:
#calculating the age of the car
df_clean['age'] = df_clean['year'] - df_clean['model_year']
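
As a quick check of the two derived columns, the calculation can be traced on three rows taken from the sample shown earlier. This is a minimal sketch; the `toy` frame is an illustrative stand-in for `df_clean`:

```python
import pandas as pd

# model_year and date_posted values taken from three rows of the sample output.
toy = pd.DataFrame({
    "model_year": [2015, 2019, 1992],
    "date_posted": pd.to_datetime(["2018-10-23", "2019-02-28", "2019-02-10"]),
})

# Year the ad was posted, then age of the car at posting time.
toy["year"] = toy["date_posted"].dt.year
toy["age"] = toy["year"] - toy["model_year"]
print(toy[["model_year", "year", "age"]])
```

The ages come out as 3, 0, and 27 years. A current-model-year car gets an age of 0.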

## Car Purchasing Behavior

In [50]:
fuel_fig = px.histogram(df_clean, x='fuel',
                        color='year',
                        color_discrete_sequence=['navy', 'darkorange'],
                        title='Fuel Types of Car Ads Posted')
fuel_fig.show()
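
The 2018 and 2019 subsets of `df_clean` differ in size (23049 and 10257 ads), so raw histogram counts are not directly comparable across years. One possible complement, sketched here on made-up toy data, is to compare each category's share within its year using `pd.crosstab`:

```python
import pandas as pd

# Toy data standing in for df_clean; the values are invented for illustration.
toy = pd.DataFrame({
    "fuel": ["gas", "gas", "diesel", "gas", "hybrid", "gas"],
    "year": [2018, 2018, 2018, 2018, 2019, 2019],
})

# normalize="columns" divides each count by its year's total,
# so every year column sums to 1.
fuel_share = pd.crosstab(toy["fuel"], toy["year"], normalize="columns")
print(fuel_share.round(2))
```

The same call with `condition` or `manufacturer` in place of `fuel` would give within-year shares for the other factors.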

In [51]:
condition_fig = px.histogram(df_clean, x='condition',
                             color='year',
                             color_discrete_sequence=['navy', 'darkorange'],
                             title='Conditions of Cars by Year')
condition_fig.show()

In [52]:
manufact_fig = px.histogram(df_clean, x='manufacturer',
                            color='year',
                            color_discrete_sequence=['navy', 'darkorange'],
                            title='Car Manufacturer Postings by Year')
manufact_fig.show()

In [53]:
cond_fig = px.histogram(df_clean, x='condition',
                        color='year',
                        color_discrete_sequence=['navy', 'darkorange'],
                        title='Car Conditions by Ad Year')

cond_fig.show()

In [54]:
age_fig = px.scatter(df_clean, x='price', y='age', symbol='year',
                     color_discrete_sequence=['navy', 'darkorange'],
                     title='Age of Car and Selling Price by Year')
age_fig.show()