## Lighthouse Labs
### W07D2 Deployment of ML Models

Instructor: Jeremy Eng

Credit: [Socorro Dominguez](https://github.com/sedv8808/LighthouseLabs/tree/main/W07D2)

Let's quickly build a model that predicts house prices in Boston.

Disclaimer: the goal is only to get a trained model quickly, so we skip pre-processing, hyperparameter tuning, and so on.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics

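# Import the dataset from sklearn.
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs scikit-learn < 1.2 (see the warning printed below).
from sklearn.datasets import load_boston
boston_data = load_boston()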


    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np

        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows::

        from sklearn.datasets import fetch_california_ho

In [2]:
# initializing dataset
data_ = pd.DataFrame(boston_data.data)

### Top five rows of dataset
data_.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12
0,0.00632,18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98
1,0.02731,0.0,7.07,0.0,0.469,6.421,78.9,4.9671,2.0,242.0,17.8,396.9,9.14
2,0.02729,0.0,7.07,0.0,0.469,7.185,61.1,4.9671,2.0,242.0,17.8,392.83,4.03
3,0.03237,0.0,2.18,0.0,0.458,6.998,45.8,6.0622,3.0,222.0,18.7,394.63,2.94
4,0.06905,0.0,2.18,0.0,0.458,7.147,54.2,6.0622,3.0,222.0,18.7,396.9,5.33


In [3]:
# Adding names to our columns
data_.columns = boston_data.feature_names
data_.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT
0,0.00632,18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98
1,0.02731,0.0,7.07,0.0,0.469,6.421,78.9,4.9671,2.0,242.0,17.8,396.9,9.14
2,0.02729,0.0,7.07,0.0,0.469,7.185,61.1,4.9671,2.0,242.0,17.8,392.83,4.03
3,0.03237,0.0,2.18,0.0,0.458,6.998,45.8,6.0622,3.0,222.0,18.7,394.63,2.94
4,0.06905,0.0,2.18,0.0,0.458,7.147,54.2,6.0622,3.0,222.0,18.7,396.9,5.33


In [4]:
# Target feature of Boston Housing data
data_['PRICE'] = boston_data.target

In [5]:
# creating feature and target variable 
X = data_.drop(['PRICE'], axis=1)
y = data_['PRICE']

In [6]:
X.head(1)

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT
0,0.00632,18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98


In [6]:
# splitting into training and testing set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=1)
print("X training shape : ", X_train.shape)
print("X test shape : ", X_test.shape )
print("y training shape :", y_train.shape )
print("y test shape :", y_test.shape )

X training shape :  (404, 13)
X test shape :  (102, 13)
y training shape : (404,)
y test shape : (102,)


In [7]:
# creating model
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor()
regressor.fit(X_train, y_train)

In [8]:
# Model evaluation for training data
prediction = regressor.predict(X_train)
print("r^2 : ", metrics.r2_score(y_train, prediction))
print("Mean Absolute Error: ", metrics.mean_absolute_error(y_train, prediction))
print("Mean Squared Error: ", metrics.mean_squared_error(y_train, prediction))
print("Root Mean Squared Error : ", np.sqrt(metrics.mean_squared_error(y_train, prediction)))

r^2 :  0.9816683115205477
Mean Absolute Error:  0.8153217821782169
Mean Squared Error:  1.4808591831683158
Root Mean Squared Error :  1.216905576932046


In [9]:
# Model evaluation for testing data
prediction_test = regressor.predict(X_test)
print("r^2 : ", metrics.r2_score(y_test, prediction_test))
print("Mean Absolute Error : ", metrics.mean_absolute_error(y_test, prediction_test))
print("Mean Squared Error : ", metrics.mean_squared_error(y_test, prediction_test))
print("Root Mean Squared Error : ", np.sqrt(metrics.mean_squared_error(y_test, prediction_test)))

r^2 :  0.9077466426089644
Mean Absolute Error :  2.3209509803921566
Mean Squared Error :  9.117158480392156
Root Mean Squared Error :  3.0194632768742453


In [10]:
y_test

307    28.2
343    23.9
47     16.6
67     22.0
362    20.8
       ... 
92     22.9
224    44.8
110    21.7
426    10.2
443    15.4
Name: PRICE, Length: 102, dtype: float64

In [11]:
prediction_test

array([30.012, 27.763, 19.655, 20.498, 19.436, 20.017, 27.77 , 19.564,
       20.523, 23.568, 28.199, 30.568, 20.677, 19.688, 19.928, 25.664,
       11.77 , 40.817, 24.027, 14.632, 19.782, 16.124, 24.689, 23.802,
       25.457,  9.307, 14.826, 19.573, 42.682, 12.272, 26.563, 19.899,
       47.489, 16.22 , 23.467, 20.844, 15.721, 33.312, 13.051, 19.689,
       24.558, 22.962, 26.026, 16.223, 15.476, 10.331, 47.573, 11.172,
       20.925, 18.751, 24.477, 21.462, 24.94 , 21.507, 11.034, 23.7  ,
       11.737, 23.1  , 18.748, 42.184, 14.33 , 26.752, 13.058, 14.971,
       17.598, 33.187, 42.286, 25.269, 21.629, 20.257, 23.962,  6.793,
       18.74 , 22.523, 19.519, 20.533, 40.951, 24.616, 26.452, 32.888,
       17.407, 20.36 , 33.632, 11.635, 25.077, 25.057, 14.622, 24.379,
       19.63 , 17.194, 26.12 , 43.332, 16.196, 20.983, 14.822, 20.828,
       23.952, 23.67 , 42.296, 20.933, 16.416, 15.347])

In [12]:
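# Save the trained model and its column order
import pickle

# The column order is needed at prediction time to arrange incoming features
model_columns = list(X.columns)
with open('model_columns.pkl', 'wb') as file:
    pickle.dump(model_columns, file)

# Use a context manager so the file is closed once the model is written
with open('regressor.pkl', 'wb') as file:
    pickle.dump(regressor, file)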
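
Before moving on, it can help to confirm that both files load back correctly. The helper below is a minimal sketch: `load_artifacts` is a hypothetical name, and it assumes the two pickle files sit in the same directory under the filenames used above.

```python
import pickle
from pathlib import Path


def load_artifacts(directory="."):
    """Load the pickled model and its column list from `directory`.

    Hypothetical helper; assumes the filenames used when saving above.
    """
    directory = Path(directory)
    with open(directory / "model_columns.pkl", "rb") as file:
        columns = pickle.load(file)
    with open(directory / "regressor.pkl", "rb") as file:
        model = pickle.load(file)
    return model, columns
```

In the notebook, `model, columns = load_artifacts()` followed by `model.predict(X_test[columns])` should reproduce `prediction_test`. Only unpickle files you trust, since loading a pickle can execute arbitrary code.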

# Create app.py

Once the trained model is pickled, we need a script named [app.py](app.py) that loads it and serves predictions.

Once app.py is created, run it:
1. Open a terminal.
1. Go to the working directory: `cd .\Dropbox\LighthouseLabs\DataScienceBootcamp\W07D2-ModelDeployment\`
1. Run `python app.py`.

The app is then reachable at:
- 127.0.0.1:5000
- 127.0.0.1:5000/predict
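
The contents of app.py are not shown in these notes, so the following is only a sketch of what such a script might look like, assuming Flask (whose development server defaults to port 5000). The names `records_to_rows` and `create_app`, the greeting text, the error format, and the `"prediction"` response key are all assumptions made for illustration.

```python
def records_to_rows(records, columns):
    """Arrange each JSON record's values in the training column order."""
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of records")
    rows = []
    for i, record in enumerate(records):
        missing = [name for name in columns if name not in record]
        if missing:
            raise ValueError(f"record {i} is missing fields: {missing}")
        rows.append([float(record[name]) for name in columns])
    return rows


def create_app(model, columns):
    """Build a Flask app with a greeting at / and predictions at /predict."""
    # Imported here so the helper above can be used without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/")
    def greeting():
        # Placeholder greeting; the real app.py may say something else.
        return "Welcome to the house price prediction API!"

    @app.route("/predict", methods=["POST"])
    def predict():
        try:
            rows = records_to_rows(request.get_json(), columns)
        except (TypeError, ValueError) as error:
            return jsonify({"error": str(error)}), 400
        predictions = [float(p) for p in model.predict(rows)]
        return jsonify({"prediction": predictions})

    return app
```

To serve the model with `python app.py`, the bottom of the file would load `regressor.pkl` and `model_columns.pkl` with `pickle.load` and then call `create_app(model, columns).run(port=5000)`. Because the regressor was fitted on a DataFrame, passing plain lists may trigger a scikit-learn warning about missing feature names; wrapping the rows in a DataFrame with the saved columns would avoid it.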

## Running our Work on an API
### Checking on Postman

1. Review how to write the script, then write it in your favourite IDE.
2. From the terminal, navigate to the root folder of your app and run:
`python3 app.py`
3. A URL will be printed in the terminal; copy it.
4. Open Postman and paste the URL into the URL field. Send the request and you will see the greeting.
5. Append `/predict` to the URL, change the method to POST, and under Body choose raw and JSON.
6. Paste the following example into the body and send the request:



Example 1

```
[
    {
    "CRIM" : 0.0063,
    "ZN" : 10.0,
    "INDUS" : 2.31,
    "CHAS" : 0.0,
    "NOX" : 0.0538,
    "RM" : 6.575,
    "AGE" : 65.2,
    "DIS" : 4.0900,
    "RAD" : 1.0,
    "TAX" : 296.0,
    "PTRATIO" : 15.3,
    "B" : 369.90,
    "LSTAT": 0
    }
]
```

### Running API through Python
We can also make a POST request from Python.
1. View `test.py`.
1. Open another terminal and run `python test.py`.
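
The contents of test.py are not reproduced here. As a rough sketch of the idea, the snippet below posts Example 1 to the `/predict` endpoint using only the standard library; the function names `build_request` and `post_records` are hypothetical, and the shape of the response depends on how app.py is written.

```python
import json
from urllib import request

# Example 1 from the Postman section, as a list with one record.
EXAMPLE = [
    {
        "CRIM": 0.0063, "ZN": 10.0, "INDUS": 2.31, "CHAS": 0.0,
        "NOX": 0.0538, "RM": 6.575, "AGE": 65.2, "DIS": 4.0900,
        "RAD": 1.0, "TAX": 296.0, "PTRATIO": 15.3, "B": 369.90,
        "LSTAT": 0,
    }
]


def build_request(url, records):
    """Create a POST request carrying `records` as a JSON body."""
    body = json.dumps(records).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return request.Request(url, data=body, headers=headers, method="POST")


def post_records(url, records):
    """Send the records to the API and return the decoded JSON response."""
    with request.urlopen(build_request(url, records)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from `python app.py` still running in the first terminal, `print(post_records("http://127.0.0.1:5000/predict", EXAMPLE))` would print whatever JSON the API returns.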