
# Price Prediction Model for Automobiles

## Predicting the price of an automobile from the features and hardware properties of previously sold vehicles

I used the public "Automobile" dataset from the UCI Machine Learning Repository (source: https://archive.ics.uci.edu/ml/datasets/automobile).

The data was cleansed and formatted with the following attributes:


   1. Index: The rows were deliberately shuffled to avoid runs of uniformly clustered data, which is why the index is unordered.
   2. The remaining attributes are the price, the make, and other selected automobile features, all of which are explained in detail in the names document that can be downloaded alongside the UCI dataset.
   3. String values were converted to numerical values, and categorical attributes were one-hot encoded to prepare them for machine learning.
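
The one-hot encoding step in item 3 is not shown in this notebook, so the following is only a minimal sketch of how it could be done with `pandas.get_dummies`. The raw column names `make` and `fuel-type` follow the UCI attribute names, while the rows themselves are made-up illustration values, not records from the dataset.

```python
import pandas as pd

# Hypothetical raw rows (illustration only, not taken from the UCI data).
raw = pd.DataFrame({
    "make": ["audi", "bmw", "audi"],
    "fuel-type": ["gas", "diesel", "gas"],
    "horsepower": [102, 182, 115],
})

# Create one 0/1 indicator column per category value.
# An empty prefix keeps the bare category name (e.g. "audi"), which
# matches the column naming seen later in this notebook.
encoded = pd.get_dummies(
    raw,
    columns=["make", "fuel-type"],
    prefix="",
    prefix_sep="",
    dtype=int,
)

print(encoded)
```

Numeric columns such as `horsepower` pass through unchanged; only the listed string columns are expanded.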
   


## Task

To build a machine learning model for price prediction that outputs an estimated vehicle price from user-supplied features.

Ordinary least-squares linear regression over all features will be used, via the scikit-learn ML library.

In [33]:
## Import statements

### To serialize the model to a pickle file for later deployment and hosting in an ML engine
import datetime
import pickle

### To import the pandas library for data manipulation and analysis
import pandas as pd

### To build a linear regression model with the scikit-learn framework
from sklearn.linear_model import LinearRegression
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from sklearn import preprocessing
from sklearn.pipeline import Pipeline
import warnings

In [29]:
# The code was removed by Watson Studio for sharing.

In [28]:
### To show the first five rows of the dataframe, with the header, as a glimpse of the dataset
automobile.head()

Unnamed: 0,index,alfa-romero,audi,bmw,chevrolet,dodge,honda,isuzu,jaguar,mazda,...,l,ohc,ohcf,ohcv,num-of-cylinders,bore,horsepower,4wd,fwd,rwd
0,15,0,0,1,0,0,0,0,0,0,...,0,1,0,0,6,3.62,182,0,0,1
1,139,0,0,0,0,0,0,0,0,0,...,0,0,1,0,4,3.62,73,0,1,0
2,190,0,0,0,0,0,0,0,0,0,...,0,1,0,0,4,3.19,90,0,1,0
3,1,1,0,0,0,0,0,0,0,0,...,0,0,0,0,4,3.47,111,0,0,1
4,161,0,0,0,0,0,0,0,0,0,...,0,1,0,0,4,3.19,70,0,1,0


In [26]:
### To view the header columns
automobile.columns

Index(['index', 'alfa-romero', 'audi', 'bmw', 'chevrolet', 'dodge', 'honda',
       'isuzu', 'jaguar', 'mazda', 'mercedes-benz', 'mercury', 'mitsubishi',
       'nissan', 'peugot', 'plymouth', 'porsche', 'saab', 'subaru', 'toyota',
       'volkswagen', 'volvo', 'diesel', 'gas', 'convertible', 'hardtop',
       'hatchback', 'sedan', 'wagon', 'price', 'num-of-doors', 'avg-mpg',
       'dohc', 'l', 'ohc', 'ohcf', 'ohcv', 'num-of-cylinders', 'bore',
       'horsepower', '4wd', 'fwd', 'rwd'],
      dtype='object')

In [27]:
### To convert the price column to numeric form; price is the label, i.e. the target of the prediction
auto_price = automobile[['price']].apply(pd.to_numeric)

In [28]:
### Drop the price column and keep the remaining data in a new dataframe
auto_features = automobile.drop('price', axis=1)

In [29]:
### The index column is dropped as well, since it has no predictive value
auto_features = auto_features.drop(['index'], axis=1)

### We are now left with the needed features for the ML model

In [30]:
### To view descriptive statistics for the final dataset
auto_features.describe()

Unnamed: 0,alfa-romero,audi,bmw,chevrolet,dodge,honda,isuzu,jaguar,mazda,mercedes-benz,...,l,ohc,ohcf,ohcv,num-of-cylinders,bore,horsepower,4wd,fwd,rwd
count,190.0,190.0,190.0,190.0,190.0,190.0,190.0,190.0,190.0,190.0,...,190.0,190.0,190.0,190.0,190.0,190.0,190.0,190.0,190.0,190.0
mean,0.015789,0.031579,0.042105,0.015789,0.047368,0.078947,0.010526,0.005263,0.057895,0.036842,...,0.063158,0.736842,0.078947,0.063158,4.368421,3.320842,101.536842,0.042105,0.6,0.357895
std,0.12499,0.175338,0.20136,0.12499,0.212987,0.270369,0.102326,0.072548,0.234161,0.188872,...,0.243889,0.441511,0.270369,0.243889,0.86131,0.276848,36.474478,0.20136,0.491192,0.480647
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,3.0,2.54,34.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,4.0,3.135,70.0,0.0,0.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,4.0,3.31,94.0,0.0,1.0,0.0
75%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,4.0,3.59,116.0,0.0,1.0,1.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,8.0,3.94,207.0,1.0,1.0,1.0


In [31]:
### To build and train a linear regression model using the scikit-learn LinearRegression class
model = LinearRegression(fit_intercept=False)

In [32]:
model.fit(auto_features,auto_price)

LinearRegression(copy_X=True, fit_intercept=False, n_jobs=None, normalize=False)
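
The model is fitted with `fit_intercept=False`. One plausible reading of why this is safe here: the `4wd`, `fwd` and `rwd` means in the `describe()` output sum to 1, which suggests each row has exactly one of them set, so these columns together already act as a constant term. The sketch below illustrates that property on synthetic data (not the UCI dataset); the variable names are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data: a complete one-hot group of three categories
# plus one numeric feature.
category = rng.integers(0, 3, size=50)
one_hot = np.eye(3)[category]          # each row has exactly one 1
numeric = rng.normal(size=(50, 1))
X = np.hstack([one_hot, numeric])
y = (
    5.0
    + one_hot @ np.array([1.0, -2.0, 0.5])
    + 3.0 * numeric[:, 0]
    + rng.normal(scale=0.1, size=50)
)

with_intercept = LinearRegression(fit_intercept=True).fit(X, y)
without_intercept = LinearRegression(fit_intercept=False).fit(X, y)

# Because the one-hot columns sum to 1 in every row, they span the
# constant term, so both fits give the same predictions.
same_predictions = np.allclose(
    with_intercept.predict(X), without_intercept.predict(X)
)
print(same_predictions)
```

If a category column were dropped from every one-hot group, this equivalence would no longer hold and an explicit intercept would be needed.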

In [None]:
### Serialize the model to a pickle file for later deployment in any ML engine
with open('model.pkl', 'wb') as model_file:
    pickle.dump(model,model_file)
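
To check that the serialized file is usable, it can be loaded back and compared against the in-memory model. The following is a minimal round-trip sketch on a tiny stand-in model; the toy data and the temporary directory are assumptions made so the example is self-contained, and the notebook itself writes to `model.pkl` in the working directory.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny stand-in model on toy data (not the automobile data).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([2.0, 3.0, 5.0])
model = LinearRegression(fit_intercept=False).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    # Write the fitted model to disk ...
    with open(path, "wb") as model_file:
        pickle.dump(model, model_file)

    # ... and read it back as a new object.
    with open(path, "rb") as model_file:
        restored = pickle.load(model_file)

# The restored model should reproduce the original predictions.
round_trip_ok = np.allclose(model.predict(X), restored.predict(X))
print(round_trip_ok)
```

A pickled scikit-learn model should be loaded with the same scikit-learn version that produced it, and pickle files should only be loaded from trusted sources.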

In [33]:
### Use the model to predict the price for one sample vehicle's feature values
model.predict([[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,2,30.5,0,0,1,0,0,2,3.19,70,0,1,0]])

array([[2305.625]])
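
A positional list of 41 numbers is easy to get wrong. As a hedged alternative, the same sample can be built by name from the feature columns shown earlier (all columns except `index` and `price`). The helper names `feature_columns`, `sample`, `sample_vector` and `sample_frame` are introduced here for illustration only.

```python
import pandas as pd

# Feature columns in training order (from automobile.columns,
# minus 'index' and 'price').
feature_columns = [
    'alfa-romero', 'audi', 'bmw', 'chevrolet', 'dodge', 'honda',
    'isuzu', 'jaguar', 'mazda', 'mercedes-benz', 'mercury', 'mitsubishi',
    'nissan', 'peugot', 'plymouth', 'porsche', 'saab', 'subaru', 'toyota',
    'volkswagen', 'volvo', 'diesel', 'gas', 'convertible', 'hardtop',
    'hatchback', 'sedan', 'wagon', 'num-of-doors', 'avg-mpg',
    'dohc', 'l', 'ohc', 'ohcf', 'ohcv', 'num-of-cylinders', 'bore',
    'horsepower', '4wd', 'fwd', 'rwd',
]

# Start from all zeros, then set only the attributes that apply.
sample = dict.fromkeys(feature_columns, 0)
sample.update({
    'volkswagen': 1, 'gas': 1, 'sedan': 1, 'fwd': 1, 'ohc': 1,
    'num-of-doors': 2, 'avg-mpg': 30.5,
    'num-of-cylinders': 2, 'bore': 3.19, 'horsepower': 70,
})

# Same values as the positional list used in the cell above.
sample_vector = [sample[name] for name in feature_columns]

# A one-row dataframe with named columns in training order.
sample_frame = pd.DataFrame([sample], columns=feature_columns)
print(sample_frame.shape)
```

Passing `sample_frame` (or `[sample_vector]`) to `model.predict` should then give the model the features in the order it was trained on.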

In [None]:
### Locate the serialized pickle file
ls