# Task


##### A data scientist on our team built a basic model to predict car prices.
##### The model was saved to disk ('lgbr_cars.model') using joblib's dump functionality.
##### According to the documentation, the model is a LightGBM Regressor trained using the scikit-learn API.
##### As an engineer, your task is to expose this model as a REST API.
##### First, retrieve the model using the function below.
##### Change the path according to your setup.

In [1]:
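!pip install lightgbm
try:
    import joblib  # standalone package; sklearn.externals.joblib was removed in scikit-learn 0.23
except ImportError:
    from sklearn.externals import joblib  # fallback for older scikit-learn versions
import lightgbm as lgb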



DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Collecting lightgbm
  Using cached https://files.pythonhosted.org/packages/21/d1/7773d81964183f6892f71cf43b92f90d0bb8c954c05651d5071a2b480420/lightgbm-2.3.1-py2.py3-none-macosx_10_9_x86_64.macosx_10_10_x86_64.macosx_10_11_x86_64.macosx_10_12_x86_64.macosx_10_13_x86_64.macosx_10_14_x86_64.macosx_10_15_x86_64.whl
Collecting scikit-learn
  Using cached https://files.pythonhosted.org/packages/19/af/1e116d24d6d74da12d90c42f408f16dae8f1a59ab4d95a48acbd2c277183/scikit_learn-0.20.4-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
ERROR: scikit-learn 0.20.4 has requirement numpy>

Note: when LightGBM is installed from PyPI via ``pip install lightgbm``, the gcc compiler is no longer needed.
Instead, you need the OpenMP library, which is required to run LightGBM on a system with the Apple Clang compiler.
You can install it with: ``brew install libomp``.


In [2]:
def retrieve_model(name):
    """Load a model that was saved with joblib's dump."""
    trained_model = joblib.load(name)
    return trained_model


In [3]:
# Assign the model to lgbr_cars
lgbr_cars = retrieve_model('lgbr_cars.model')

In [4]:
lgbr_cars

LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
              importance_type='split', learning_rate=0.1, max_depth=-1,
              min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
              n_estimators=100, n_jobs=5, num_leaves=31, objective=None,
              random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
              subsample=1.0, subsample_for_bin=200000, subsample_freq=0)

## Sanity check

In [5]:
import unittest

class correct_model(unittest.TestCase):

    def test_records(self):
        # The loaded object should be LightGBM's scikit-learn regressor
        self.assertIsInstance(lgbr_cars, lgb.LGBMRegressor)

suite = unittest.TestLoader().loadTestsFromTestCase(correct_model)
unittest.TextTestRunner(verbosity=2).run(suite)

test_records (__main__.correct_model) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.002s

OK


<unittest.runner.TextTestResult run=1 errors=0 failures=0>

## A feature importance visualization 

In [6]:
import matplotlib.pyplot as plt

# Plot the eight most important features (plot_importance counts splits by default)
fig, ax = plt.subplots(figsize=(14, 6))
lgb.plot_importance(lgbr_cars, max_num_features=8, ax=ax)
plt.title("LightGBM - Feature Importance")
plt.show()
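The plot shows the importances only graphically. To read them as numbers, a small sketch can combine the scikit-learn wrapper's `feature_importances_` attribute with `booster_.feature_name()`. The helper name `top_features` is hypothetical, and the feature names returned are whatever LightGBM stored at training time, which this notebook does not document.

```python
import numpy as np


def top_features(model, n=8):
    """Return the n most important (feature_name, importance) pairs, largest first."""
    names = model.booster_.feature_name()          # names stored in the trained booster
    importances = np.asarray(model.feature_importances_)
    order = np.argsort(importances)[::-1][:n]      # indices sorted by descending importance
    return [(names[i], float(importances[i])) for i in order]
```

Usage: `top_features(lgbr_cars)` should list the same features as the bar chart. With `importance_type='split'` (as in the printed model), each value counts how often a feature is used in a split.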


Now that we have a trained model, let's run a functional test with the parameters below.  
The parameters must be presented in this order:  

* vehicleType: coupe
* gearbox: manuell
* powerPS: 190
* model: NaN
* kilometer: 125000
* monthOfRegistration: 5 
* fuelType: diesel
* brand: audi

Based on these parameters, we should get a predicted value of 14026.35068804.  
However, the model doesn't accept string inputs, so the categorical values are integer-encoded as shown below:

In [7]:
model_test_input = [[3,1,190,-1,125000,5,3,1]]

In [8]:
model_test_input

[[3, 1, 190, -1, 125000, 5, 3, 1]]
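The encoding above can be made explicit with lookup tables. This is a sketch under loud assumptions: only the categories in the worked example are known (coupe → 3, manuell → 1, diesel → 3, audi → 1, NaN → -1); the full training-time encoding is not documented here, and mapping unseen categories to -1 is a hypothetical choice. The names `encode_car`, `_code`, and the table constants are illustrative.

```python
import math

# Hypothetical lookup tables: only the categories from the worked example are known.
VEHICLE_TYPE = {"coupe": 3}
GEARBOX = {"manuell": 1}
FUEL_TYPE = {"diesel": 3}
BRAND = {"audi": 1}
MODEL = {}     # no model names are documented, so every model maps to MISSING
MISSING = -1   # the example encodes NaN as -1; using -1 for unseen categories is an assumption


def _code(table, value):
    """Look up a category; NaN, None and unknown values fall back to MISSING."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return MISSING
    return table.get(value, MISSING)


def encode_car(vehicle_type, gearbox, power_ps, model, kilometer,
               month_of_registration, fuel_type, brand):
    """Return one feature row in the order the regressor expects."""
    return [
        _code(VEHICLE_TYPE, vehicle_type),
        _code(GEARBOX, gearbox),
        power_ps,
        _code(MODEL, model),
        kilometer,
        month_of_registration,
        _code(FUEL_TYPE, fuel_type),
        _code(BRAND, brand),
    ]


row = encode_car("coupe", "manuell", 190, float("nan"), 125000, 5, "diesel", "audi")
print([row])  # [[3, 1, 190, -1, 125000, 5, 3, 1]]
```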

## Define a function which predicts the output

In [9]:
def make_prediction(trained_model, single_input):
    predicted_value = trained_model.predict(single_input)
    return predicted_value
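The scikit-learn-style `predict` method expects a 2-D array-like of shape (n_samples, n_features), which is why `model_test_input` is wrapped in double brackets. A hypothetical variant of `make_prediction` could accept a single flat row as well and check the feature count before calling the model. The name `make_prediction_checked` and the constant `N_FEATURES = 8` (taken from the eight parameters listed above) are assumptions.

```python
import numpy as np

N_FEATURES = 8  # vehicleType, gearbox, powerPS, model, kilometer, monthOfRegistration, fuelType, brand


def make_prediction_checked(trained_model, single_input):
    """Like make_prediction, but accepts a flat row or a list of rows and checks the width."""
    X = np.atleast_2d(np.asarray(single_input, dtype=float))  # a flat row becomes shape (1, 8)
    if X.ndim != 2 or X.shape[1] != N_FEATURES:
        raise ValueError("expected rows of %d features, got shape %s" % (N_FEATURES, X.shape))
    return trained_model.predict(X)
```

Usage: `make_prediction_checked(lgbr_cars, [3, 1, 190, -1, 125000, 5, 3, 1])` should return the same prediction as `make_prediction(lgbr_cars, model_test_input)`.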

In [10]:

predicted_value = make_prediction(lgbr_cars, model_test_input)


In [11]:
predicted_value

array([14026.35068804])

## Sanity check

In [12]:
import unittest

class correct_prediction(unittest.TestCase):

    def test_prediction(self):
        self.assertAlmostEqual(float(predicted_value[0]), 14026.35, places=2)

suite = unittest.TestLoader().loadTestsFromTestCase(correct_prediction)
unittest.TextTestRunner(verbosity=2).run(suite)

test_prediction (__main__.correct_prediction) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.002s

OK


<unittest.runner.TextTestResult run=1 errors=0 failures=0>

Now that the model is loaded and runs correctly, we want you to **expose it as a REST API.**  


Once it's up and running, use it to predict the following input:
* [-1,1,0,118,150000,0,1,38] ==> prediction should be 13920.70

In [13]:
import pickle

filename = 'finalized_model.pkl'
# Serialize the model with pickle; the with-block closes the file handle afterwards
with open(filename, 'wb') as f:
    pickle.dump(lgbr_cars, f)
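The notebook's server.py is not included, so the following is only a sketch of what such a server could look like. It is written for Python 3 with the standard-library `http.server` module so it needs no web framework (Flask or FastAPI would be the more usual choice in production). It assumes the endpoint is `/api` (taken from the Serveo URL used later), that the request body is a JSON list of feature rows in the order given above, and that the model was pickled to `finalized_model.pkl` as in the previous cell. `make_handler` and `serve` are hypothetical names.

```python
import json
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(model):
    """Build a request handler class bound to an already-loaded model."""

    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/api":  # endpoint path is an assumption
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                rows = json.loads(self.rfile.read(length))  # e.g. [[-1, 1, 0, 118, ...]]
                prediction = model.predict(rows)
                body = json.dumps([float(p) for p in prediction]).encode("utf-8")
                status = 200
            except (ValueError, TypeError) as exc:  # bad JSON or badly shaped input
                body = json.dumps({"error": str(exc)}).encode("utf-8")
                status = 400
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):  # silence per-request logging
            pass

    return PredictHandler


def serve(model_path="finalized_model.pkl", host="127.0.0.1", port=5000):
    """Load the pickled model and answer POST /api requests until interrupted."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)  # lightgbm must be importable to unpickle the regressor
    HTTPServer((host, port), make_handler(model)).serve_forever()
```

Calling `serve()` blocks and answers `POST http://127.0.0.1:5000/api` with a JSON list of predictions; for the input `[[-1,1,0,118,150000,0,1,38]]`, the notebook states the prediction should be 13920.70.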


In [16]:
# I ran the Python API server locally (see the server.py file) and made it publicly accessible using a service called Serveo.
# I then called the API from this Jupyter notebook to get a prediction, and the answer was correct.

import requests
import json

url = 'https://farid.serveo.net/api'
#url = 'http://127.0.0.1:5000/'
data = [[-1,1,0,118,150000,0,1,38]]
j_data = json.dumps(data)
headers = {'content-type': 'application/json', 'Accept-Charset': 'UTF-8'}
r = requests.post(url, data=j_data, headers=headers)
print(r, r.text)

ConnectionError: HTTPSConnectionPool(host='farid.serveo.net', port=443): Max retries exceeded with url: /api (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x11a6dccf8>: Failed to establish a new connection: [Errno 60] Operation timed out',))
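The request above failed with a `ConnectionError` because the Serveo tunnel could not be reached (the connection timed out). A client with an explicit timeout and a single error path makes such failures quicker and clearer. This is a sketch; `predict_via_api` is a hypothetical name, and it assumes the server replies with JSON.

```python
import requests


def predict_via_api(url, rows, timeout=10.0):
    """POST feature rows as JSON and return the decoded JSON response."""
    try:
        # json= serializes the rows and sets Content-Type: application/json
        response = requests.post(url, json=rows, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx answers into exceptions
    except requests.exceptions.RequestException as exc:
        # ConnectionError, Timeout and HTTPError all derive from RequestException
        raise RuntimeError("prediction request to %s failed: %s" % (url, exc)) from exc
    return response.json()
```

Usage: `predict_via_api('http://127.0.0.1:5000/api', [[-1,1,0,118,150000,0,1,38]])` against a locally running server; per the task, the prediction should be 13920.70.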