# Car Sales Advertisements Data Analysis

## Introduction

This notebook performs exploratory data analysis (EDA) on a car sales advertisements dataset. The goal is to preprocess the data, handle missing values, remove outliers, and create visualizations that show how car prices are distributed and how price relates to odometer reading and model year.

## Data Preprocessing

We start by loading the necessary libraries and the dataset.

```python
import pandas as pd
import numpy as np
import plotly.express as px
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv('vehicles_us.csv')

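# Display the first few rows of the dataframe
# (print is needed because a bare expression only renders when it is the last line of a cell)
print(df.head())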

# Fill missing values for model_year
df['model_year'] = df.groupby('model')['model_year'].transform(lambda x: x.fillna(x.median()))

# Fill missing values for cylinders
df['cylinders'] = df.groupby('model')['cylinders'].transform(lambda x: x.fillna(x.median()))

# Fill missing values for odometer
df['odometer'] = df.groupby(['model', 'model_year'])['odometer'].transform(lambda x: x.fillna(x.mean()))

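# Check how many missing values remain in each column
# (a group with no observed values at all stays NaN after a group-wise fill)
print(df.isnull().sum())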

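# Remove outliers for model_year (keep the 1st to 99th percentile range)
# between() is inclusive, so rows equal to a cutoff are kept; with strict
# comparisons, every car whose year equals the cutoff would be dropped
q_low = df['model_year'].quantile(0.01)
q_high = df['model_year'].quantile(0.99)
df = df[df['model_year'].between(q_low, q_high)]

# Remove outliers for price (same inclusive 1st to 99th percentile range)
q_low = df['price'].quantile(0.01)
q_high = df['price'].quantile(0.99)
df = df[df['price'].between(q_low, q_high)]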

# Basic statistics
print(df.describe())

# Distribution of car prices
fig = px.histogram(df, x='price', title='Distribution of Car Prices')
fig.show()

# Scatter plot of price vs. odometer
fig = px.scatter(df, x='odometer', y='price', title='Price vs. Odometer')
fig.show()

# Scatter plot of price vs. model year
fig = px.scatter(df, x='model_year', y='price', title='Price vs. Model Year')
fig.show()
```

## Conclusion

In this notebook, we have preprocessed the car sales dataset by handling missing values and removing outliers. We have also created several visualizations to understand the distribution of car prices and the relationships between price, odometer, and model year. These insights will help in further analysis and model building.
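
The two preprocessing ideas used in this notebook, group-wise median imputation and percentile-based trimming, can be tried on a small toy frame. The sketch below is illustrative only: the helper names `fill_with_group_median` and `trim_to_quantiles`, the overall-median fallback, and all data values are assumptions made for the example, not part of the original analysis or of `vehicles_us.csv`.

```python
import numpy as np
import pandas as pd


def fill_with_group_median(frame, column, by):
    """Fill NaNs in `column` with the median of each `by` group.

    Groups with no observed values stay NaN after the group-wise step,
    so the overall median is used as a fallback (an assumed choice).
    """
    filled = frame.groupby(by)[column].transform(lambda s: s.fillna(s.median()))
    return filled.fillna(frame[column].median())


def trim_to_quantiles(frame, column, low=0.01, high=0.99, inclusive="both"):
    """Keep rows whose `column` lies between the `low` and `high` quantiles."""
    q_low, q_high = frame[column].quantile([low, high])
    return frame[frame[column].between(q_low, q_high, inclusive=inclusive)]


# Toy data with made-up values; model "c" has no known model_year at all.
toy = pd.DataFrame({
    "model": ["a", "a", "a", "b", "b", "c"],
    "model_year": [2010.0, np.nan, 2014.0, 2005.0, 2007.0, np.nan],
})
toy["model_year"] = fill_with_group_median(toy, "model_year", "model")
print(toy)

# Discrete column where the 99th percentile equals the most common value.
years = pd.DataFrame(
    {"model_year": [1990, 2005, 2010, 2012, 2015, 2018, 2018, 2018, 2018, 2018]}
)
kept_inclusive = trim_to_quantiles(years, "model_year")
kept_strict = trim_to_quantiles(years, "model_year", inclusive="neither")
print(len(kept_inclusive), len(kept_strict))
```

In the second part, the strict comparison drops every row from the most recent year because that year is itself the 99th percentile, which is far more than the intended 1% tail. This is why inclusive bounds can be the safer choice for discrete columns such as model year.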