
### Week 4 Project

Using the Craigslist Vehicles Dataset available on Kaggle (https://www.kaggle.com/datasets/mbaabuharun/craigslist-vehicles), we'd like you to create a time-series model following the approach outlined below.

Here are the key steps:

1. Start by addressing missing values in the dataset: fill missing values with the median for numerical columns and the mode for categorical columns.
2. Ensure that the data types of the columns are appropriate. In particular, convert the 'posting_date' column to a datetime data type.
3. Use the 'posting_date' column to create a datetime index for the dataset. This makes it easier to analyze temporal patterns.
4. With the data cleaned, explore it using visualizations and statistical analysis. This step is crucial for understanding temporal patterns, identifying seasonal trends, and analyzing demand-supply dynamics by region and vehicle type.
5. Build the time-series chart.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
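import os
# Point the Kaggle API client at the folder that holds kaggle.json.
# Note: files in a mounted Drive usually live under /content/drive/MyDrive/.
os.environ['KAGGLE_CONFIG_DIR'] = "/content/drive/LUX"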

In [None]:
# Data Preprocessing
# Address missing values: fill numerical columns with their median
# and categorical (object) columns with their mode.
import pandas as pd
import numpy as np

# Load the dataset
data = pd.read_csv("craigslist_vehicles.csv")

# Split columns by dtype. np.object was removed in NumPy 1.24,
# so select object columns by name instead.
numerical_cols = data.select_dtypes(include=[np.number]).columns
categorical_cols = data.select_dtypes(include=['object']).columns

data[numerical_cols] = data[numerical_cols].fillna(data[numerical_cols].median())
# mode() returns several rows when values tie; iloc[0] keeps the first one
data[categorical_cols] = data[categorical_cols].fillna(data[categorical_cols].mode().iloc[0])
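To see what the median/mode fill does, here is a minimal sketch on a tiny invented frame (the column names echo the Craigslist ones; the values are made up). It selects the non-numeric columns with `exclude="number"`, which is equivalent to `include=['object']` for this frame and also picks up pandas' dedicated string dtype if a column happens to use it.

```python
import numpy as np
import pandas as pd

# Toy listings with gaps (all values invented for illustration)
toy = pd.DataFrame({
    "price": [5000, np.nan, 12000, 8000],
    "odometer": [90000, 120000, np.nan, 60000],
    "type": ["sedan", "SUV", None, "sedan"],
    "region": ["austin", None, "austin", "boston"],
})

numerical_cols = toy.select_dtypes(include="number").columns
categorical_cols = toy.select_dtypes(exclude="number").columns

filled = toy.copy()
# Median per numeric column (robust to extreme prices)
filled[numerical_cols] = filled[numerical_cols].fillna(filled[numerical_cols].median())
# Most frequent value per categorical column; iloc[0] picks the first mode
filled[categorical_cols] = filled[categorical_cols].fillna(filled[categorical_cols].mode().iloc[0])

print(filled)
```

The missing price becomes the median of 5000, 12000 and 8000, and the missing `type` becomes the most common value, "sedan".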


In [None]:
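# Data Type Conversion
# Convert 'posting_date' to datetime. utc=True normalizes timestamps that
# carry different UTC offsets into a single timezone-aware dtype.
data['posting_date'] = pd.to_datetime(data['posting_date'], utc=True)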
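Why `utc=True`: assuming the raw `posting_date` values are ISO 8601 strings whose UTC offsets vary by time zone (the two sample strings below are hypothetical), pandas cannot give them one timezone-aware dtype on its own. Depending on the version, it either falls back to an object column, which `resample` rejects, or raises an error. Converting everything to UTC avoids both.

```python
import pandas as pd

# Two hypothetical posting timestamps from different US time zones
raw = pd.Series(["2021-05-04T12:31:18-0500", "2021-05-03T09:00:00-0400"])

# utc=True shifts each timestamp by its own offset and stores the result in UTC
posted = pd.to_datetime(raw, utc=True)
print(posted)
```

Both values end up in one `datetime64[ns, UTC]` column: 12:31:18 at -05:00 becomes 17:31:18 UTC.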


In [None]:
# Creating a DateTime Index
# Set 'posting_date' as the index and sort it chronologically
data.set_index('posting_date', inplace=True)
data.sort_index(inplace=True)
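A datetime index enables partial-string selection and resampling. The minimal sketch below, on four invented listings, also shows why the decomposition step needs resampling first: listing-level timestamps are irregular, so the index has no frequency until `resample` imposes one, and days with no listings come back as NaN.

```python
import pandas as pd

# Hypothetical listing-level data: irregular posting times, one row per listing
listings = pd.DataFrame({
    "posting_date": pd.to_datetime(
        ["2021-05-01 08:15", "2021-05-01 17:40", "2021-05-03 11:05", "2021-05-04 09:30"],
        utc=True,
    ),
    "price": [6000, 9000, 7500, 12000],
})
listings = listings.set_index("posting_date").sort_index()

# Partial-string indexing: every listing posted on 1 May 2021
may_first = listings.loc["2021-05-01"]

# The raw index has no fixed frequency; resampling to days imposes one
daily = listings["price"].resample("D").mean()
print(daily)
```

2 May has no listings, so its daily mean is NaN; that gap has to be filled before a decomposition can run.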

In [None]:
# Data Exploration
# Exploring the data using various visualizations
import matplotlib.pyplot as plt

# Exploring temporal patterns
data['price'].resample('M').mean().plot(title='Average Price Over Time')
plt.xlabel('Time')
plt.ylabel('Average Price')
plt.show()

# Exploring seasonal trends
data['price'].groupby(data.index.month).mean().plot(kind='bar', title='Monthly Average Price')
plt.xlabel('Month')
plt.ylabel('Average Price')
plt.show()

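# Analyzing supply by region and vehicle type (listing counts per pair).
# With many regions, plot only the 20 with the most listings so bars stay legible.
demand_supply = data.groupby(['region', 'type']).size().unstack(fill_value=0)
top_regions = demand_supply.sum(axis=1).nlargest(20).index
demand_supply.loc[top_regions].plot(kind='bar', stacked=True, figsize=(14, 6),
                                    title='Listings by Region and Vehicle Type')
plt.xlabel('Region')
plt.ylabel('Number of Listings')
plt.show()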
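A few refinements to the exploration cell, sketched on small invented datasets (none of these numbers come from the Craigslist data). First, if the price column holds placeholder values such as 0 or an implausibly large number, one listing can dominate a monthly mean, so the sketch compares mean and median. Second, `data.index.month` pools the same calendar month across years, and reporting the count next to the mean shows how many listings back each bar. Third, listing counts measure supply (what sellers post) rather than demand; `pd.crosstab(..., normalize="index")` turns counts into each vehicle type's share within its region, independent of region size. The sketch resamples with `"MS"` (month start) because it is accepted across pandas versions, whereas the month-end alias `"M"` is deprecated in favor of `"ME"` from pandas 2.2.

```python
import pandas as pd

# 1) Mean vs median per month: one invented placeholder price distorts the mean
prices = pd.Series(
    [8000, 9500, 11000, 1_234_567_890],
    index=pd.to_datetime(["2021-05-02", "2021-05-09", "2021-05-16", "2021-05-23"]),
)
monthly = prices.resample("MS").agg(["mean", "median"])

# 2) Calendar-month averages with the number of listings behind each bar
posts = pd.Series(
    [7000, 9000, 8000, 15000],
    index=pd.to_datetime(["2021-04-10", "2021-04-28", "2021-05-05", "2021-11-20"]),
)
by_month = posts.groupby(posts.index.month).agg(["mean", "count"])

# 3) Listing counts per region/type, then each type's share within its region
listings = pd.DataFrame({
    "region": ["austin", "austin", "austin", "boston", "boston"],
    "type": ["sedan", "SUV", "sedan", "SUV", "truck"],
})
counts = listings.groupby(["region", "type"]).size().unstack(fill_value=0)
shares = pd.crosstab(listings["region"], listings["type"], normalize="index")

print(monthly, by_month, counts, shares, sep="\n\n")
```

Replacing `.mean()` with `.median()` in the time plots, or filtering implausible prices before plotting, are two ways to act on the first point.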


In [None]:
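from statsmodels.tsa.seasonal import seasonal_decompose

# seasonal_decompose needs an evenly spaced series without gaps, so first
# aggregate listing-level prices to a daily mean and fill empty days.
daily_price = data['price'].resample('D').mean().interpolate().dropna()

# Seasonal decomposition of the daily average price, assuming a weekly cycle
result = seasonal_decompose(daily_price, model='additive', period=7)
result.plot()
plt.show()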
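For intuition about what `seasonal_decompose` computes, here is a hand-rolled classical additive decomposition in plain pandas, run on synthetic data with a known weekly cycle. It follows the approach statsmodels describes for this function (estimate the trend with a moving-average filter, then average the detrended values at each position in the cycle), but the helper `classical_additive_decompose` is a hypothetical, simplified sketch: odd periods only, no trend extrapolation at the ends, and not a drop-in replacement.

```python
import numpy as np
import pandas as pd

def classical_additive_decompose(series: pd.Series, period: int) -> pd.DataFrame:
    """Split a regular series into trend + seasonal + residual (additive model)."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    # Trend: centred moving average spanning exactly one cycle (NaN at both ends)
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend
    # Seasonal: mean detrended value at each position within the cycle
    position = np.arange(len(series)) % period
    cycle_means = detrended.groupby(position).mean()
    cycle_means = cycle_means - cycle_means.mean()  # effects sum to zero over a cycle
    seasonal = pd.Series(cycle_means.to_numpy()[position], index=series.index)
    resid = series - trend - seasonal
    return pd.DataFrame({"trend": trend, "seasonal": seasonal, "resid": resid})

# Synthetic daily average price: linear drift plus a fixed weekly pattern
days = pd.date_range("2021-04-01", periods=56, freq="D")
weekly_pattern = np.array([0, 100, 200, 300, 200, 100, -900], dtype=float)
price = pd.Series(np.linspace(10_000, 12_000, 56) + np.tile(weekly_pattern, 8), index=days)

parts = classical_additive_decompose(price, period=7)
print(parts.head(10))
```

Because the synthetic series is exactly trend plus a zero-mean weekly pattern, the recovered seasonal component matches `weekly_pattern` and the residual is essentially zero wherever the trend is defined. On real daily prices the residual absorbs everything the weekly cycle and trend do not explain.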