In [17]:
#importing libraries.
import pandas as pd
import plotly.express as px
from ipywidgets import interact, IntSlider, Dropdown
import numpy as np

# CarDash: Interactive Car Sales Insights
This notebook performs an exploratory data analysis (EDA) on a dataset of car sales. The analysis includes data cleaning, visualizing distributions of various features, and handling missing data.

Explore trends and insights into car sales data, focusing on different vehicle types and their features.

In [9]:
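# Load the data and summarize the numeric columns.
df = pd.read_csv('../vehicles_us.csv')
df.head()  # not rendered: a notebook cell only displays its last expression
df.describe()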

              price  model_year  cylinders       odometer   is_4wd  days_listed
count       51525.0     47906.0    46265.0        43633.0  25572.0      51525.0
mean    12132.46492  2009.75047   6.125235  115553.461738      1.0     39.55476
std    10040.803015    6.282065    1.66036   65094.611341      0.0     28.20427
min             1.0      1908.0        3.0            0.0      1.0          0.0
25%          5000.0      2006.0        4.0        70000.0      1.0         19.0
50%          9000.0      2011.0        6.0       113000.0      1.0         33.0
75%         16839.0      2014.0        8.0       155000.0      1.0         53.0
max        375000.0      2019.0       12.0       990000.0      1.0        271.0
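
In the summary above, `is_4wd` has a minimum and maximum of 1.0 and a standard deviation of 0, so every recorded value is 1 and "not 4WD" appears to be encoded as a missing value. The following is a minimal sketch of how that assumption could be checked before filling the gaps with 0. The `toy` frame is made-up data, not rows from `vehicles_us.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the real listings.
toy = pd.DataFrame({"is_4wd": [1.0, np.nan, 1.0, np.nan, np.nan]})

# Distinct recorded values, ignoring NaN.
recorded = toy["is_4wd"].dropna().unique()

# Only treat NaN as "not 4WD" when 1.0 is the sole recorded value.
if set(recorded) == {1.0}:
    toy["is_4wd"] = toy["is_4wd"].fillna(0).astype(bool)

print(toy["is_4wd"].tolist())
```

If the column held any other recorded value, the check would leave it untouched so the missing entries could be inspected first.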


In [18]:
# Drop duplicate rows, then count the missing values in each column.
df.drop_duplicates(inplace=True)
print("Initial missing values:\n", df.isnull().sum())


Initial missing values:
 price              0
model_year         0
model              0
condition          0
cylinders       5260
fuel               0
odometer        7892
transmission       0
type               0
paint_color        0
is_4wd             0
date_posted        0
days_listed        0
dtype: int64


In [22]:
# Fill missing 'paint_color' with the placeholder 'Unknown' and missing 'is_4wd' with 0.
df['paint_color'] = df['paint_color'].fillna('Unknown')
df['is_4wd'] = df['is_4wd'].fillna(0)

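# Fill missing 'model_year' with the per-model median, then fill 'odometer' and 'cylinders'
# with the median of their (model, model_year) group. Groups with no recorded values stay NaN.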
df['model_year'] = df['model_year'].fillna(df.groupby('model')['model_year'].transform('median'))
df['odometer'] = df['odometer'].fillna(df.groupby(['model', 'model_year'])['odometer'].transform('median'))
df['cylinders'] = df['cylinders'].fillna(df.groupby(['model', 'model_year'])['cylinders'].transform('median'))

print("Updated missing values:\n", df.isnull().sum())

Updated missing values:
 price            0
model_year       0
model            0
condition        0
cylinders       26
fuel             0
odometer        83
transmission     0
type             0
paint_color      0
is_4wd           0
date_posted      0
days_listed      0
dtype: int64
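
The remaining gaps in `cylinders` and `odometer` are rows whose (model, model_year) group has no recorded value, so the group median is itself NaN. One possible way to close them, sketched here on made-up data, is to fall back to progressively wider groups. The helper `fill_with_fallback` and the `toy` frame are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the only (a, 2011) row has no odometer reading.
toy = pd.DataFrame({
    "model": ["a", "a", "a", "b", "b"],
    "model_year": [2010, 2010, 2011, 2012, 2012],
    "odometer": [100.0, np.nan, np.nan, 50.0, 70.0],
})

def fill_with_fallback(frame, column):
    """Fill NaN from the narrowest group that has data, then the overall median."""
    filled = frame[column]
    for keys in (["model", "model_year"], ["model"]):
        # transform('median') returns NaN for groups with no recorded values,
        # so those rows are passed on to the next, wider grouping.
        filled = filled.fillna(frame.groupby(keys)[column].transform("median"))
    return filled.fillna(frame[column].median())

toy["odometer"] = fill_with_fallback(toy, "odometer")
print(toy)
```

Whether a wider median is appropriate depends on the analysis; dropping the few remaining rows is an equally reasonable choice.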


In [26]:
def update_plots(model_year, vehicle_type, fuel_type, transmission_type):
    """Plot the price histogram and price vs. mileage for listings matching all four filters."""
    filtered_data = df[
        (df['model_year'] == model_year) & 
        (df['type'] == vehicle_type) & 
        (df['fuel'] == fuel_type) & 
        (df['transmission'] == transmission_type)
    ]

    if filtered_data.empty:
        print("No data available for the selected criteria.")
    else:
        fig_price = px.histogram(filtered_data, x='price', title=f'Price Distribution for {model_year} {vehicle_type}')
        fig_mileage = px.scatter(filtered_data, x='odometer', y='price', color='condition', title=f'Price vs. Mileage for {model_year} {vehicle_type}')

        fig_price.show()
        fig_mileage.show()
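
The filtering step inside `update_plots` can also be expressed as a standalone function, which makes it possible to check the selection logic without rendering any figures. This is a sketch under assumptions: `filter_listings` is a hypothetical helper name and the `toy` frame contains made-up rows.

```python
import pandas as pd

def filter_listings(frame, model_year, vehicle_type, fuel_type, transmission_type):
    """Return the rows that match all four criteria (hypothetical helper)."""
    mask = (
        (frame["model_year"] == model_year)
        & (frame["type"] == vehicle_type)
        & (frame["fuel"] == fuel_type)
        & (frame["transmission"] == transmission_type)
    )
    return frame[mask]

# Made-up listings used only to exercise the helper.
toy = pd.DataFrame({
    "model_year": [2011, 2011, 2014],
    "type": ["SUV", "sedan", "SUV"],
    "fuel": ["gas", "gas", "gas"],
    "transmission": ["automatic", "automatic", "manual"],
    "price": [9000, 7500, 16000],
})

matches = filter_listings(toy, 2011, "SUV", "gas", "automatic")
no_matches = filter_listings(toy, 1990, "SUV", "gas", "automatic")
print(len(matches), no_matches.empty)
```

Because the four conditions are combined with `&`, a combination that does not occur in the data returns an empty frame, which is the case `update_plots` reports as "No data available".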


In [29]:
interact(
    update_plots, 
    # model_year is stored as float, so cast the slider bounds and default to int
    model_year=IntSlider(min=int(df['model_year'].min()), max=int(df['model_year'].max()), step=1, value=int(df['model_year'].median())),
    vehicle_type=Dropdown(options=df['type'].unique(), value=df['type'].unique()[0]),
    fuel_type=Dropdown(options=df['fuel'].unique(), value=df['fuel'].unique()[0]),
    transmission_type=Dropdown(options=df['transmission'].unique(), value=df['transmission'].unique()[0])
)


In [31]:
scatter_price_mileage = px.scatter(df, x='odometer', y='price', color='condition',
                                   title='Price vs. Mileage by Vehicle Condition')
scatter_price_mileage.show()

scatter_year_price = px.scatter(df, x='model_year', y='price', color='condition',
                                title='Year vs. Price by Vehicle Condition')
scatter_year_price.show()


## Conclusion

This exploratory analysis used histograms and scatter plots to examine how vehicle price and mileage vary with condition, type, and model year.

I observed that newer models tend to have higher prices and lower mileage, which is consistent with cars losing value and accumulating mileage as they age. Filtering the data dynamically by model year, vehicle type, fuel, and transmission made it possible to inspect specific segments of the market.
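
The relationship between model year, price, and mileage was read off the scatter plots. One way it could be quantified is with per-year medians and a correlation against `model_year`, sketched below. The numbers in `toy` are made up for illustration and are not results from the dataset.

```python
import pandas as pd

# Made-up listings: two per model year.
toy = pd.DataFrame({
    "model_year": [2005, 2005, 2012, 2012, 2018, 2018],
    "price": [3000, 4000, 9000, 11000, 24000, 26000],
    "odometer": [180000, 160000, 110000, 90000, 30000, 20000],
})

# Median price and mileage for each model year.
by_year = toy.groupby("model_year")[["price", "odometer"]].median()

# Pearson correlation of each column with model_year.
year_corr = toy[["model_year", "price", "odometer"]].corr()["model_year"]

print(by_year)
print(year_corr)
```

A positive correlation for `price` and a negative one for `odometer` would support the pattern described above.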

Filling missing values with group medians made the dataset more complete, although 26 `cylinders` values and 83 `odometer` values remain missing where no group median was available. A more complete dataset is a better starting point for any predictive models that might be built on this data in the future.

Overall, this analysis serves as a foundational step towards building more complex analytical tools and models that could help prospective buyers, sellers, and researchers gain a more nuanced understanding of the automotive market.
