Now that we have clean data, we can train a model. The user supplies a car (manufacturer, model and year), its odometer reading and the price the seller is asking. The model takes the data available for that car, fits a linear regression and predicts the price of the vehicle at that odometer reading. We can then tell the user whether the asking price is above, below or close to what is typical. An asking price below the prediction suggests they may be getting a good deal.
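To make the idea concrete, here is a minimal sketch of the fit-and-predict step. It is not the code in `car_purchase_help.model1`, whose internals are not shown in this notebook: the function name `fit_line`, the use of `np.polyfit` and the five listings are all assumptions made up for illustration.

```python
import numpy as np

def fit_line(odometer, price):
    """Hypothetical helper: least-squares fit of price = slope * odometer + intercept."""
    slope, intercept = np.polyfit(odometer, price, deg=1)
    residuals = price - (slope * odometer + intercept)
    # Mean absolute residual: the typical distance of a listing from the fitted line.
    mean_abs_res = float(np.mean(np.abs(residuals)))
    return float(slope), float(intercept), mean_abs_res

# Made-up listings for a single manufacturer/model/year (illustrative only).
odometer = np.array([20_000, 50_000, 80_000, 110_000, 140_000], dtype=float)
price = np.array([19_000, 16_500, 14_800, 12_100, 10_000], dtype=float)

slope, intercept, mean_abs_res = fit_line(odometer, price)

# Predicted price for a car of this kind with 100,000 miles on the clock.
predicted = slope * 100_000 + intercept
```

The mean absolute residual gives a scale for judging an asking price: a gap of a few hundred dollars means little if listings typically sit thousands of dollars from the line.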

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import sys
from tqdm.notebook import tqdm
sys.path.append("..")
from car_purchase_help.data_processing import format_raw_df, split_by_description
from car_purchase_help.model1 import predict_price, fit_lin_regression, get_advice

%matplotlib inline
pd.options.display.float_format = '{:20,.2f}'.format
plt.style.use('fivethirtyeight')

In [2]:
# https://www.kaggle.com/austinreese/craigslist-carstrucks-data/
df = pd.read_csv(Path('../data/vehicles.csv'))
df = format_raw_df(df)

We loop over every manufacturer, then every model and then every year, training a regression for each combination and saving it to a pkl file for quick access by the web app.
Various errors can occur along the way; these are caught and ignored.

In [3]:
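for manu in tqdm(df['manufacturer'].unique()):
    # Narrow the data step by step so each inner loop only sees relevant values
    data_manu = df[df['manufacturer'] == manu]
    for mod in tqdm(data_manu['model'].unique()):
        data_mod = data_manu[data_manu['model'] == mod]
        for year in data_mod['year'].unique():
            try:
                # Fits the regression for this car and saves it to a pkl file
                fit_lin_regression(df, manu, mod, year)
            except Exception:
                # Skip combinations that cannot be fitted
                pass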
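
The loop above relies on `fit_lin_regression` to save each model. As a rough sketch of how a save-and-reload step of this kind could work, assuming one pickle file per manufacturer/model/year: the helper names, the file naming scheme, the `MIN_LISTINGS` cut-off and the exception types below are all hypothetical and are not taken from `car_purchase_help`.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

MIN_LISTINGS = 3  # assumed cut-off; the real threshold is not stated in this notebook

def model_path(model_dir, manufacturer, model, year):
    """Hypothetical naming scheme: one pkl file per car and year."""
    return Path(model_dir) / f"{manufacturer}_{model}_{year}.pkl"

def fit_and_save(model_dir, manufacturer, model, year, odometer, price):
    """Fit price against odometer and pickle the coefficients."""
    if len(odometer) < MIN_LISTINGS:
        raise ValueError("too few listings to fit a regression")
    slope, intercept = np.polyfit(odometer, price, deg=1)
    path = model_path(model_dir, manufacturer, model, year)
    with open(path, "wb") as f:
        pickle.dump({"slope": float(slope), "intercept": float(intercept)}, f)
    return path

def load_model(model_dir, manufacturer, model, year):
    """Load saved coefficients, failing clearly when nothing was saved."""
    path = model_path(model_dir, manufacturer, model, year)
    if not path.exists():
        raise LookupError("No regression model for this car from that year")
    with open(path, "rb") as f:
        return pickle.load(f)

# Made-up data: price falls by exactly 0.10 per mile from 21,000.
model_dir = tempfile.mkdtemp()
fit_and_save(model_dir, "honda", "accord", 2015,
             np.array([10_000.0, 50_000.0, 90_000.0]),
             np.array([20_000.0, 16_000.0, 12_000.0]))
loaded = load_model(model_dir, "honda", "accord", 2015)
```

Saving only the coefficients keeps each file tiny, so the web app can answer a query by reading one small file instead of refitting on the full dataset. A real naming scheme would also need to handle model names containing characters that are not valid in file names.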

Now that the models are saved we can make predictions.

Let's see what advice we get for a Honda Accord from 2015 with 100,000 miles on the clock.

In [15]:
pred_price, mean_abs_res = predict_price('honda', 'accord', 2015, odometer=100000)

In [16]:
get_advice(pred_price, listed_price=6000, mean_absolute_residual=mean_abs_res)

'This appears to be a very good deal'

In [17]:
get_advice(pred_price, listed_price=10000, mean_absolute_residual=mean_abs_res)

'This appears to be a good deal'

In [18]:
get_advice(pred_price, listed_price=14000, mean_absolute_residual=mean_abs_res)

'This appears to be a bad deal'

In [19]:
get_advice(pred_price, listed_price=18000, mean_absolute_residual=mean_abs_res)

'This appears to be a very bad deal'
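
The four results above are consistent with a rule that measures the gap between the listed and predicted price in units of the mean absolute residual. The sketch below shows one such rule. It is not the actual `get_advice` implementation: the name `advice_sketch`, the cut-offs `fair_band` and `strong_band`, and the example numbers (a predicted price of 12,000 and a mean absolute residual of 3,000) are assumptions chosen for illustration, not values produced by the saved Honda model.

```python
def advice_sketch(pred_price, listed_price, mean_absolute_residual,
                  fair_band=0.5, strong_band=1.5):
    """Hypothetical advice rule; the cut-offs are assumed, not taken from get_advice."""
    if mean_absolute_residual <= 0:
        raise ValueError("mean_absolute_residual must be positive")
    # Signed gap between asking and predicted price, in units of the typical fit error.
    score = (listed_price - pred_price) / mean_absolute_residual
    if abs(score) <= fair_band:
        return "This appears to be a fair price for the car"
    strength = "very " if abs(score) > strong_band else ""
    quality = "good" if score < 0 else "bad"
    return f"This appears to be a {strength}{quality} deal"

# Made-up prediction and residual, reusing the listed prices from the cells above.
for listed in (6000, 10000, 14000, 18000):
    print(listed, advice_sketch(12000, listed, 3000))
```

Scaling by the mean absolute residual means that cars with noisy prices need a larger gap before a listing is called a good or bad deal.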

In [20]:
pred_price, mean_abs_res = predict_price('tesla', 'model-s', 2015, odometer=10000)

AssertionError: No regression model for this car from that year

This query fails because there is too little data for Teslas, so no model was saved.

In [21]:
pred_price, mean_abs_res = predict_price(' MerCedeS-BenZ ', 'C-class', 2015, 45000.4)

In [23]:
get_advice(pred_price, listed_price=20000, mean_absolute_residual=mean_abs_res)

'This appears to be a fair price for the car'

In [24]:
get_advice(pred_price, listed_price=30000, mean_absolute_residual=mean_abs_res)

'This appears to be a very bad deal'

Based on these spot checks, the model appears to give appropriate predictions and is ready to be used in the web app.