### Car Price Modeling Project

* Part 1 - Predict price with car metadata

* Part 2 - Predict price with metadata + image

* Details: Get listing metadata from Craigslist (via the python-craigslist package). Get additional relevant features, such as gas price.
  Clean the data and store it in a SQLite database.
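
The storage step in the details above could look roughly like the sketch below. It assumes the cleaned listings already sit in a pandas DataFrame; the table name `listings` and the in-memory database are illustrative choices, and the two rows are copied from the sample output later in this notebook.

```python
import sqlite3

import pandas as pd

# Two cleaned rows, copied from the sample listings shown later in the notebook.
clean = pd.DataFrame({
    "id": ["7006267953", "7006251818"],
    "name": ["2016 Honda Pilot Touring", "2002 Honda CR-V EX"],
    "price": [25200, 3900],
})

# ":memory:" keeps the sketch self-contained; a file path such as "cars.db"
# (hypothetical name) would persist the data between sessions.
conn = sqlite3.connect(":memory:")

# "listings" is an assumed table name; if_exists="replace" overwrites old data.
clean.to_sql("listings", conn, if_exists="replace", index=False)

# Read the table back to confirm the round trip.
stored = pd.read_sql("SELECT id, name, price FROM listings ORDER BY price", conn)
conn.close()
print(stored)
```

Writing with `if_exists="append"` instead would let repeated scrapes accumulate, at the cost of needing a de-duplication step on `id`.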

In [1]:
#KEY FEATURES
#price (label), brand, model, age, horsepower/engine type, mileage, time elapsed since ad was posted?

#SECONDARY FEATURES
#fuel type, mpg, exterior color, number of doors, transmission type,
#dimensions, safety, air conditioning, interior, navigation y/n,
#cubic capacity, number of ad views, power steering, rim type,
#registered city, 4WD, damaged, leather, alarm,
#parking sensors, xenon lights, remote unlock, electric mirrors,
#seat heat, moon roof, cruise control, ABS, traction control

#OTHER FEATURES
#estimated car life, price rank as category (cheap, moderate, expensive)

#Non-craigslist features
#gas price

#Sparse features to ignore?: damaged, city

In [2]:
#FEATURE ENHANCEMENT:
#Train with stratified sample of make
#Encode Thumbs Up/Down fields: [ON, OFF, NotAvailable] -> [1, -1, 0]
#Polynomial features for regression, e.g. mileage and year
#Differencing features and explicit features for random forest.
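
Two of the enhancement ideas above, the three-state flag encoding and polynomial terms, can be sketched with plain pandas. The column names (`cruise_control`, `mileage`, `age`) and the values are hypothetical and only illustrate the transformations; `age` stands in for the year-based feature.

```python
import pandas as pd

# Hypothetical columns and values, for illustration only.
df = pd.DataFrame({
    "cruise_control": ["ON", "OFF", "NotAvailable"],
    "mileage": [120000.0, 45000.0, 80000.0],
    "age": [14.0, 3.0, 8.0],
})

# Encode the three-state flag as [ON, OFF, NotAvailable] -> [1, -1, 0].
flag_codes = {"ON": 1, "OFF": -1, "NotAvailable": 0}
df["cruise_control"] = df["cruise_control"].map(flag_codes)

# Degree-2 polynomial terms for mileage and age: squares plus the interaction.
df["mileage_sq"] = df["mileage"] ** 2
df["age_sq"] = df["age"] ** 2
df["mileage_x_age"] = df["mileage"] * df["age"]

print(df)
```

Any flag value missing from `flag_codes` becomes NaN under `map`, which makes unexpected categories easy to spot before training.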

In [3]:
# Candidate models
# Regression, KNN, SVM, random forest, naive bayes, neural networks
# Expectation (to be verified): neural network performs best, followed by RF and SVM.
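
One way to check the expected ranking rather than assume it is to score every candidate with the same cross-validation split. The sketch below assumes scikit-learn is installed and uses synthetic stand-in data, not real listings; only three of the candidate families are included to keep it short, and the hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (NOT real listings): price falls with age and mileage.
rng = np.random.default_rng(0)
age = rng.uniform(0, 20, size=200)
mileage = rng.uniform(5_000, 200_000, size=200)
price = 30_000 - 900 * age - 0.04 * mileage + rng.normal(0, 1_000, size=200)
X = np.column_stack([age, mileage])

# A subset of the candidate models, with arbitrary illustrative settings.
models = {
    "linear regression": LinearRegression(),
    "k-nearest neighbors": KNeighborsRegressor(n_neighbors=5),
    "random forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Mean absolute error from 5-fold cross-validation (lower is better).
mae = {}
for name, model in models.items():
    scores = cross_val_score(model, X, price, cv=5,
                             scoring="neg_mean_absolute_error")
    mae[name] = -scores.mean()

for name, err in sorted(mae.items(), key=lambda kv: kv[1]):
    print(f"{name}: MAE = {err:.0f}")
```

The ranking on synthetic data says nothing about the real listings; the point is only that all models are scored the same way.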

In [4]:
# Frame price prediction as a regression vs. a classification problem
# Classification: bin continuous features using cluster analysis?
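
For the classification framing, the price itself would also need to become a category (cheap, moderate, expensive). The note above suggests cluster analysis for binning; the sketch below swaps in plain quantile binning with `pd.qcut` as a simpler baseline. The prices are hypothetical.

```python
import pandas as pd

# Hypothetical prices in dollars, for illustration only.
prices = pd.Series([3900, 3999, 25200, 8000, 12000, 18000], name="price")

# Split into three equal-frequency bins (terciles) and label them.
price_rank = pd.qcut(prices, q=3, labels=["cheap", "moderate", "expensive"])

print(pd.DataFrame({"price": prices, "price_rank": price_rank}))
```

Quantile bins keep the classes balanced by construction. Cluster-based bins would follow the shape of the price distribution instead, which may suit a market with distinct price tiers better.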

In [5]:
import numpy as np
import pandas as pd
pd.options.display.max_columns = 40

In [6]:
#Use python-craigslist API from here:  https://github.com/juliomalegria/python-craigslist
from craigslist import CraigslistForSale
CraigslistForSale.show_filters()

Base filters:
* query = ...
* search_titles = True/False
* has_image = True/False
* posted_today = True/False
* bundle_duplicates = True/False
* search_distance = ...
* zip_code = ...
Section specific filters:
* min_price = ...
* max_price = ...
* make = ...
* model = ...
* min_year = ...
* max_year = ...
* min_miles = ...
* max_miles = ...
* language = 'af', 'ca', 'da', 'de', 'en', 'es', 'fi', 'fr', 'it', 'nl', 'no', 'pt', 'sv', 'tl', 'tr', 'zh', 'ar', 'ja', 'ko', 'ru', 'vi'
* condition = 'new', 'like new', 'excellent', 'good', 'fair', 'salvage'


In [7]:
#Collect results in a list of dicts (one dict per listing).  bundle_duplicates needed?
filt = {'make': 'Honda', 'min_year': 1999, 'max_year': 2019,
        'has_image': True, 'min_price': 0}

cl_s = CraigslistForSale(site='houston', category='cta',
                         filters=filt)
carLOE = list(cl_s.get_results(sort_by='newest', limit=5))

In [8]:
#Show first two dictionary entries in list
carLOE[0:2]

[{'id': '7006277672',
  'repost_of': None,
  'name': '2005 Honda Accord LX 2dr Coupe - SE HABLA ESPANOL',
  'url': 'https://houston.craigslist.org/ctd/d/south-houston-2005-honda-accord-lx-2dr/7006277672.html',
  'datetime': '2019-10-24 14:04',
  'last_updated': '2019-10-24 14:04',
  'price': '$0',
  'where': '+ WE HAVE FINANCING APPROVALS FOR ALL CREDIT TYPES!',
  'has_image': True,
  'geotag': None},
 {'id': '7006267953',
  'repost_of': None,
  'name': '2016 Honda Pilot Touring',
  'url': 'https://houston.craigslist.org/cto/d/friendswood-2016-honda-pilot-touring/7006267953.html',
  'datetime': '2019-10-24 13:53',
  'last_updated': '2019-10-24 13:53',
  'price': '$25200',
  'where': 'Friendswood',
  'has_image': True,
  'geotag': None}]

In [9]:
#Put carLOE into DataFrame
cars = pd.DataFrame(carLOE)
pd.set_option('display.max_colwidth', 300)

In [10]:
#View DataFrame
cars.head(4)

Unnamed: 0,id,repost_of,name,url,datetime,last_updated,price,where,has_image,geotag
0,7006277672,,2005 Honda Accord LX 2dr Coupe - SE HABLA ESPANOL,https://houston.craigslist.org/ctd/d/south-houston-2005-honda-accord-lx-2dr/7006277672.html,2019-10-24 14:04,2019-10-24 14:04,$0,+ WE HAVE FINANCING APPROVALS FOR ALL CREDIT TYPES!,True,
1,7006267953,,2016 Honda Pilot Touring,https://houston.craigslist.org/cto/d/friendswood-2016-honda-pilot-touring/7006267953.html,2019-10-24 13:53,2019-10-24 13:53,$25200,Friendswood,True,
2,7006251818,,2002 Honda CR-V EX,https://houston.craigslist.org/cto/d/2002-honda-cr-ex/7006251818.html,2019-10-24 13:34,2019-10-24 13:34,$3900,Houston,True,
3,7006224685,6887552051.0,2003 honda accord exelent conditio,https://houston.craigslist.org/cto/d/houston-2003-honda-accord-exelent/7006224685.html,2019-10-24 13:04,2019-10-24 13:04,$3999,45 south and bway 8,True,


In [11]:
#Show first 4 URLs
cars.loc[0:3, 'url']

0    https://houston.craigslist.org/ctd/d/south-houston-2005-honda-accord-lx-2dr/7006277672.html
1      https://houston.craigslist.org/cto/d/friendswood-2016-honda-pilot-touring/7006267953.html
2                          https://houston.craigslist.org/cto/d/2002-honda-cr-ex/7006251818.html
3         https://houston.craigslist.org/cto/d/houston-2003-honda-accord-exelent/7006224685.html
Name: url, dtype: object
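
The raw records above are not yet model-ready: `price` is a string such as `'$25200'`, `datetime` is text, and the first listing carries a `$0` placeholder price. A minimal cleaning sketch follows, using two records copied from the sample output. It assumes every title starts with a four-digit model year, which will not hold for all listings, and treats `$0` as a missing price.

```python
import pandas as pd

# Two records copied from the sample output above (subset of columns).
sample = pd.DataFrame([
    {"name": "2005 Honda Accord LX 2dr Coupe - SE HABLA ESPANOL",
     "datetime": "2019-10-24 14:04", "price": "$0"},
    {"name": "2016 Honda Pilot Touring",
     "datetime": "2019-10-24 13:53", "price": "$25200"},
])

# Strip the currency symbol (and any thousands separators), then cast to int.
sample["price"] = sample["price"].str.replace(r"[$,]", "", regex=True).astype(int)

# Parse the posting timestamp.
sample["datetime"] = pd.to_datetime(sample["datetime"])

# Assumption: the title begins with the model year; age is measured at posting time.
sample["year"] = sample["name"].str.extract(r"^(\d{4})", expand=False).astype(int)
sample["age"] = sample["datetime"].dt.year - sample["year"]

# Assumption: a $0 price is a dealer placeholder, not a real price, so drop it.
sample = sample[sample["price"] > 0].reset_index(drop=True)

print(sample[["name", "price", "year", "age"]])
```

Titles without a leading year would make the `astype(int)` cast fail, so a fuller version would need to handle missing matches before converting.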