# Model Development

In this notebook, we will develop predictive models using the [Vehicles.csv](https://www.kaggle.com/datasets/austinreese/craigslist-carstrucks-data) dataset. We will explore several machine learning techniques to build models that predict used car prices accurately. The process starts with base models, which we then optimize through techniques such as regularization and hyperparameter tuning.

By the end of this notebook, we will have established a set of foundational models ready for evaluation, aiming to identify the most effective model for predicting used car prices.
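
As a rough illustration of the workflow described above (a base model, then a regularized model tuned by hyperparameter search), here is a minimal sketch. It runs on a small synthetic table, not on the real dataset: the toy columns, the price formula, the `build_pipeline` helper, the choice of `Ridge`, and the `alpha` grid are all assumptions made for the example and are not necessarily the models this notebook ends up using.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the cleaned data (assumption: NOT the real dataset).
rng = np.random.default_rng(0)
n = 400
toy = pd.DataFrame({
    "year": rng.integers(2000, 2021, size=n),
    "odometer": rng.uniform(5_000, 200_000, size=n),
    "fuel": rng.choice(["gas", "diesel", "electric"], size=n),
})
toy["price"] = (
    12_000
    + 800 * (toy["year"] - 2000)
    - 0.05 * toy["odometer"]
    + toy["fuel"].map({"gas": 0, "diesel": 1_500, "electric": 4_000})
    + rng.normal(0, 1_000, size=n)
)

X = toy.drop(columns="price")
y = toy["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def build_pipeline(regressor):
    """Hypothetical helper: scale numeric columns, one-hot encode categoricals."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), ["year", "odometer"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["fuel"]),
    ])
    return Pipeline([("prep", preprocess), ("reg", regressor)])


# Base model: ordinary least squares, no regularization.
base_model = build_pipeline(LinearRegression())
base_model.fit(X_train, y_train)
base_mae = mean_absolute_error(y_test, base_model.predict(X_test))

# Regularized model: Ridge, with alpha chosen by 5-fold cross-validation.
search = GridSearchCV(
    build_pipeline(Ridge()),
    param_grid={"reg__alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)
tuned_mae = mean_absolute_error(y_test, search.predict(X_test))

print(f"Base MAE:  {base_mae:,.2f}")
print(f"Tuned MAE: {tuned_mae:,.2f} (alpha={search.best_params_['reg__alpha']})")
```

Keeping preprocessing inside the `Pipeline` means the scaler and encoder are refit on each cross-validation fold, so the held-out fold does not leak into the fitted transformers.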

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler, OrdinalEncoder
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import os, zipfile
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:,.2f}'.format)

df = pd.read_csv('../data/clean_vehicles.csv')

In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55806 entries, 0 to 55805
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   price         55806 non-null  int64  
 1   year          55806 non-null  int64  
 2   manufacturer  55806 non-null  object 
 3   model         55805 non-null  object 
 4   condition     55806 non-null  object 
 5   odometer      55806 non-null  float64
 6   cylinders     55806 non-null  object 
 7   fuel          55806 non-null  object 
 8   title_status  55806 non-null  object 
 9   transmission  55806 non-null  object 
 10  drive         55806 non-null  object 
 11  type          55806 non-null  object 
 12  paint_color   55806 non-null  object 
 13  state         55806 non-null  object 
 14  description   55806 non-null  object 
dtypes: float64(1), int64(2), object(12)
memory usage: 6.4+ MB


In [None]:
zip_path = '../data/clean_vehicles.zip'
extract_path = '../data/'

with zipfile.ZipFile(zip_path,'r') as zip_ref:
    zip_ref.extractall(extract_path)

file_path = os.path.join(extract_path, 'clean_vehicles.csv')
df = pd.read_csv(file_path)
df.info()
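
The extraction step can also be made safe to re-run. The sketch below unpacks the CSV only when it is not already on disk. The helper name `ensure_extracted` is hypothetical, and it assumes the archive stores `clean_vehicles.csv` at its top level, which the notebook does not confirm.

```python
import os
import zipfile


def ensure_extracted(zip_path, extract_path, member="clean_vehicles.csv"):
    """Extract `member` from the archive only if it is not already on disk.

    Assumption: `member` sits at the top level of the zip file.
    Returns the path to the extracted file.
    """
    target = os.path.join(extract_path, member)
    if not os.path.exists(target):
        with zipfile.ZipFile(zip_path, "r") as zip_ref:
            zip_ref.extract(member, extract_path)
    return target


# Possible usage with the paths from the cell above:
# file_path = ensure_extracted('../data/clean_vehicles.zip', '../data/')
# df = pd.read_csv(file_path)
```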