# Project 1

# Used Vehicle Price Prediction

## Introduction

- 1.2 Million listings scraped from TrueCar.com - Price, Mileage, Make, Model dataset from Kaggle: [data](https://www.kaggle.com/jpayne/852k-used-car-listings)
- Each observation represents the price of a used car

## Part 1: Price Prediction Model

This part trains a model that predicts the listing price of a used car from its year, mileage, state, make, and model, then saves the fitted model and encoder for deployment.

In [None]:
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings('ignore')


In [None]:
data = pd.read_csv('https://github.com/albahnsen/PracticalMachineLearningClass/raw/master/datasets/dataTrain_carListings.zip')

In [None]:
data.head()

In [None]:
data.shape

#### Data preparation

In [None]:
y_ = data['Price']
y_.head()

In [None]:
y_.shape

In [None]:
data_train = data.drop(['Price'], axis = 1)
data_train.index.name = "ID"
data_train.head()

In [None]:
import category_encoders as ce

encoder = ce.BinaryEncoder()
encoder.fit(data_train)
train = encoder.transform(data_train)
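
Roughly, binary encoding assigns each category an integer code and stores that code's binary digits in separate columns, so a column with many distinct values needs only about log2(n) new columns instead of one per value. The pure-Python sketch below illustrates the idea only; it is not the `category_encoders` implementation, and the function name `binary_encode` and the choice to start codes at 1 are assumptions made for illustration.

```python
def binary_encode(values):
    """Map each distinct value to an integer code, then to a fixed-width list of bits."""
    codes = {}
    for value in values:
        if value not in codes:
            # Assumption of this sketch: codes start at 1 in order of first appearance
            codes[value] = len(codes) + 1
    # Number of bit columns needed to represent the largest code
    width = max(codes.values()).bit_length()
    return [[int(bit) for bit in format(codes[value], f"0{width}b")]
            for value in values]


states = ["MD", "TX", "CA", "MD", "FL"]
encoded = binary_encode(states)
for state, bits in zip(states, encoded):
    print(state, bits)
```

Four distinct states fit in three bit columns here, whereas one-hot encoding would need four.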

In [None]:
train.columns

In [None]:
train.head()

In [None]:
train.shape

In [None]:
x_text = [{"Year": 2014, "Mileage": 31909, "State": "MD", "Make": "Nissan", "Model": "MuranoAWD"}]
x_text

In [None]:
input_ = pd.DataFrame.from_dict(x_text)
input_

In [None]:
input_encode = encoder.transform(input_)
input_encode

#### Dataset split

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(train, y_, random_state=6, train_size = 0.01)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

#### Model

In [None]:
from sklearn.ensemble import RandomForestRegressor

# Price is a continuous target, so use a regressor rather than a classifier
tree = RandomForestRegressor()
tree.fit(X_train, y_train)

#### Save model

In [None]:
# sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone package
import joblib

In [None]:
joblib.dump(tree, 'model_deployment/price_pred.pkl', compress=3)

In [None]:
joblib.dump(encoder, 'model_deployment/encode.pkl', compress=3)

## Part 2: Model in batch

See model_deployment/m09_model_deployment.py
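
The deployment module itself is not reproduced in this notebook. As a minimal sketch, a `predict_price` function could look like the following, assuming it loads the two pickles saved in Part 1 and builds a one-row DataFrame with the same column names and order used for training. The file paths, the `FEATURES` list, and the optional `encoder` and `estimator` arguments are assumptions made for illustration, not the contents of the actual module.

```python
import pandas as pd

# Assumed column order: the training frame after dropping 'Price'
FEATURES = ["Year", "Mileage", "State", "Make", "Model"]


def predict_price(year, mileage, state, make, model, encoder=None, estimator=None):
    """Return the predicted price for one listing as an integer."""
    if encoder is None or estimator is None:
        import joblib  # imported lazily so the function can be exercised with stand-ins
        if encoder is None:
            encoder = joblib.load("model_deployment/encode.pkl")      # assumed path
        if estimator is None:
            estimator = joblib.load("model_deployment/price_pred.pkl")  # assumed path

    # One-row frame that mirrors the training columns
    row = pd.DataFrame([[year, mileage, state, make, model]], columns=FEATURES)
    encoded = encoder.transform(row)
    return int(estimator.predict(encoded)[0])
```

Passing the encoder and estimator explicitly makes the function easy to check without the pickled files; in a deployed module they would more likely be loaded once at import time.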

In [1]:
from model_deployment.m09_model_deployment import predict_price

In [2]:
predict_price(2014,31909,"MD","Nissan","MuranoAWD")

31980

## Part 3: API

Flask is a lightweight Python web framework. Compared with Django, it needs little boilerplate to get a simple app running and keeps application code explicit, which makes it a convenient choice for exposing a model as an API.

First, install the required library:

```
pip install flask-restplus
```

Load Flask

In [1]:
from flask import Flask
from flask_restplus import Api, Resource, fields
import joblib

Create api

In [2]:
app = Flask(__name__)

api = Api(
    app, 
    version='1.0', 
    title='Prediction Price API',
    description='Prediction Price API')

ns = api.namespace('predict', 
     description='Price')
   
parser = api.parser()

parser.add_argument(
    'Data Year',
    type=int, 
    required=True, 
    help='Year', 
    location='args')
parser.add_argument(
    'Data Mileage',
    type=int, 
    required=True, 
    help='Mileage', 
    location='args')

parser.add_argument(
    'Data State',
    type=str, 
    required=True, 
    help='State', 
    location='args')


parser.add_argument(
    'Data Make',
    type=str, 
    required=True, 
    help='Make', 
    location='args')

parser.add_argument(
    'Data Model',
    type=str, 
    required=True, 
    help='Model', 
    location='args')

resource_fields = api.model('Resource', {
    'result': fields.Integer,
})

Load the model and create the function that predicts the price

In [3]:
from model_deployment.m09_model_deployment import predict_price

In [4]:
@ns.route('/')
class PriceApi(Resource):

    @api.doc(parser=parser)
    @api.marshal_with(resource_fields)
    def get(self):
        args = parser.parse_args()
        
        return {
            "result": predict_price(
                args['Data Year'], args['Data Mileage'], args['Data State'],
                args['Data Make'], args['Data Model'])
        }, 200

Run API

In [None]:
app.run(debug=True, use_reloader=False, host='0.0.0.0', port=5001)

 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: on


 * Running on http://0.0.0.0:5001/ (Press CTRL+C to quit)
127.0.0.1 - - [07/Mar/2019 19:50:09] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [07/Mar/2019 19:50:10] "GET /swagger.json HTTP/1.1" 200 -
127.0.0.1 - - [07/Mar/2019 19:51:03] "GET /predict/?Data%20Year=2018&Data%20Mileage=70000&Data%20State=MD&Data%20Make=de&Data%20Model=ds HTTP/1.1" 200 -


Check using 

* http://localhost:5001/predict/?Data%20Year=2014&Data%20Mileage=31909&Data%20State=MD&Data%20Make=Nissan&Data%20Model=MuranoAWD
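
The same check can be scripted. The sketch below uses only the standard library and assumes the server is running locally on port 5001, as in the `app.run` call above. The helper names `build_predict_url` and `fetch_prediction` are hypothetical; the argument names contain spaces, so they are percent-encoded as `%20` in the query string.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5001/predict/"  # assumed host and port


def build_predict_url(year, mileage, state, make, model, base_url=BASE_URL):
    """Build the GET URL with the argument names the parser expects."""
    params = {
        "Data Year": year,
        "Data Mileage": mileage,
        "Data State": state,
        "Data Make": make,
        "Data Model": model,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'
    return base_url + "?" + urlencode(params, quote_via=quote)


def fetch_prediction(url):
    """Call the API and return the 'result' field of the JSON response."""
    with urlopen(url) as response:
        return json.load(response)["result"]


url = build_predict_url(2014, 31909, "MD", "Nissan", "MuranoAWD")
print(url)
# With the server running: print(fetch_prediction(url))
```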
