# fastai with tabular data

This notebook is based on lesson 4 of fastai's course v3. We are going to train a model that predicts a person's salary range (`<50k` or `>=50k`) from the provided data.

In [1]:
!pip install fastai
!pip install bentoml





In [2]:
from fastai.tabular import *

In [8]:
!ls data

adult.csv


In [7]:
PATH=Path('data/')

In [9]:
df = pd.read_csv(PATH/'adult.csv')

In [10]:
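dep_var = 'salary'  # target column to predict
# Categorical and continuous feature columns used by the model
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [FillMissing, Categorify, Normalize]  # preprocessing steps, applied in order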

In [12]:
test = TabularList.from_df(df.iloc[800:1000].copy(), path=PATH, cat_names=cat_names, cont_names=cont_names)

In [13]:
data = (TabularList.from_df(df, path=PATH, cat_names=cat_names, cont_names=cont_names, procs=procs)
                           .split_by_idx(list(range(800,1000)))
                           .label_from_df(cols=dep_var)
                           .add_test(test)
                           .databunch())

In [14]:
data.show_batch(rows=10)

workclass,education,marital-status,occupation,relationship,race,education-num_na,age,fnlwgt,education-num,target
Private,Bachelors,Never-married,Prof-specialty,Own-child,White,False,-1.1425,0.1197,1.1422,<50k
Private,Some-college,Never-married,Sales,Not-in-family,White,False,-1.1425,0.0429,-0.0312,<50k
Private,5th-6th,Never-married,Sales,Own-child,Asian-Pac-Islander,False,-0.8493,-0.7499,-2.7692,<50k
Local-gov,Assoc-acdm,Married-civ-spouse,Tech-support,Husband,White,False,0.3235,-1.439,0.7511,<50k
Private,Masters,Married-civ-spouse,Sales,Husband,White,False,1.203,-0.0404,1.5334,>=50k
Self-emp-inc,Masters,Married-civ-spouse,Sales,Husband,White,False,0.8365,0.4802,1.5334,>=50k
Private,10th,Never-married,#na#,Own-child,White,True,-1.5823,0.9933,-0.0312,<50k
State-gov,Bachelors,Married-civ-spouse,Protective-serv,Husband,White,False,-0.9959,0.6916,1.1422,>=50k
Private,Some-college,Never-married,Adm-clerical,Own-child,White,False,-1.3624,1.6801,-0.0312,<50k
Private,HS-grad,Divorced,Other-service,Unmarried,White,False,-0.4095,-0.0657,-0.4224,<50k
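
The continuous columns in the batch above are no longer raw values, and an `education-num_na` flag has appeared. That is the effect of the `FillMissing` and `Normalize` processors. The following is a minimal sketch of the idea in plain Python, not fastai's implementation: it assumes missing values are filled with the median (fastai's default strategy, as far as I know) and that normalization uses the sample standard deviation. The helper names and the toy column are hypothetical.

```python
from statistics import mean, median, stdev


def fill_missing(values):
    """Replace None with the median of the observed values and flag the gaps."""
    fill = median(v for v in values if v is not None)
    filled = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return filled, was_missing


def normalize(values):
    """Shift to zero mean and scale to unit (sample) standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical toy column with one missing entry.
education_num = [13.0, 10.0, None, 12.0, 9.0]

filled, was_missing = fill_missing(education_num)
scaled = normalize(filled)

print(filled)       # missing entry replaced by the median
print(was_missing)  # plays the role of the `education-num_na` column
print(scaled)
```

In fastai the statistics are computed on the training set and reused for the validation and test sets, which this sketch leaves out.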


In [15]:
learn = tabular_learner(data, layers=[200,100], metrics=accuracy)

In [16]:
learn.fit(1, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.355843,0.371649,0.84,00:08


In [17]:
row = df.iloc[0]

In [18]:
learn.predict(row)

(Category >=50k, tensor(1), tensor([0.2998, 0.7002]))
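
`learn.predict` returns a tuple of the predicted category, its class index, and the per-class probabilities. As a minimal sketch in plain Python (no fastai required), the index is simply the position of the largest probability. The class order `['<50k', '>=50k']` is an assumption inferred from the prediction outputs in this notebook, and the probabilities are copied from the cell above.

```python
# Probabilities copied from the prediction output above.
probs = [0.2998, 0.7002]
# Assumed class order, inferred from the notebook outputs
# (index 0 -> '<50k', index 1 -> '>=50k').
classes = ['<50k', '>=50k']

# The predicted class index is the position of the highest probability.
pred_idx = max(range(len(probs)), key=lambda i: probs[i])
pred_label = classes[pred_idx]

print(pred_label, pred_idx, probs[pred_idx])
```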

# Save the model as a machine learning service with BentoML

In [19]:
%%writefile tabular_csv.py

from bentoml import env, api, artifacts, BentoService
from bentoml.artifact import FastaiModelArtifact
from bentoml.handlers import DataframeHandler


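# Bundle the fastai conda dependency and a fastai model artifact named 'model'.
@env(conda_environment=['fastai'])
@artifacts([FastaiModelArtifact('model')])
class TabularModel(BentoService):

    @api(DataframeHandler)
    def predict(self, df):
        # The handler passes the JSON or CSV input as a DataFrame;
        # the fastai learner predicts one row at a time.
        result = []
        for index, row in df.iterrows():
            result.append(self.artifacts.model.predict(row))
        return str(result)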

Overwriting tabular_csv.py


In [20]:
from tabular_csv import TabularModel

svc = TabularModel.pack(model=learn)
saved_path = svc.save('/tmp/bento_archive')
print(saved_path)

[2019-07-20 17:58:07,258] INFO - Searching for dependant modules of tabular_csv:/Users/mac/Desktop/tabular-csv/tabular_csv.py
[2019-07-20 17:58:29,658] INFO - Copying local python module '/Users/mac/Desktop/tabular-csv/tabular_csv.py'
[2019-07-20 17:58:29,661] INFO - Done copying local python dependant modules
[2019-07-20 17:58:29,841] INFO - BentoService TabularModel:2019_07_20_4a68a58f saved to /tmp/bento_archive/TabularModel/2019_07_20_4a68a58f
/tmp/bento_archive/TabularModel/2019_07_20_4a68a58f


## Use the BentoML archive as a CLI tool

In [21]:
!pip install {saved_path}

Processing /tmp/bento_archive/TabularModel/2019_07_20_4a68a58f
Building wheels for collected packages: TabularModel
  Building wheel for TabularModel (setup.py) ... done
  Stored in directory: /private/var/folders/lb/vtg9bbk1379_rzkczkzlxc0w0000gn/T/pip-ephem-wheel-cache-x6o8666t/wheels/3f/60/98/c6c581599d32df948b246a21db5037058fe81d21a8d434e153
Successfully built TabularModel
Installing collected packages: TabularModel
  Found existing installation: TabularModel 2019-07-20-bf8e13bf
    Uninstalling TabularModel-2019-07-20-bf8e13bf:
      Successfully uninstalled TabularModel-2019-07-20-bf8e13bf
Successfully installed TabularModel-2019-07-20-4a68a58f


In [22]:
# Use JSON data
!TabularModel predict --input=test.json

[(Category <50k, tensor(0), tensor([0.7044, 0.2956]))]


In [23]:
# Use CSV data
!TabularModel predict --input=test.csv

[(Category >=50k, tensor(1), tensor([0.2998, 0.7002]))]


## Use it as a REST API server

*Note: Running a local REST API server does not work in Google Colab. Please copy this notebook and run it locally.*

In [24]:
!bentoml serve {saved_path}

 * Serving Flask app "TabularModel" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [20/Jul/2019 17:59:11] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [20/Jul/2019 17:59:13] "GET /docs.json HTTP/1.1" 200 -
127.0.0.1 - - [20/Jul/2019 17:59:21] "POST /predict HTTP/1.1" 200 -
[2019-07-20 18:00:15,596] ERROR in app: Exception on /predict [POST]
Traceback (most recent call last):
  File "/anaconda3/lib/python3.7/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/anaconda3/lib/python3.7/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/anaconda3/lib/python3.7/site-packages/flask/_compat.py", line 35, in reraise
    ra

## Make requests to the REST API server

#### Post as JSON

```bash
curl -X POST \
  http://localhost:5000/predict \
  -H 'Content-Type: application/json' \
  -d '[{
  "age": 49,
  "workclass": "Private",
  "fnlwgt": 101320,
  "education": "Assoc-acdm",
  "education-num": 12.0,
  "marital-status": "Married-civ-spouse",
  "occupation": "",
  "relationship": "Wife",
  "race": "White",
  "sex": "Female",
  "capital-gain": 0,
  "capital-loss": 1902,
  "hours-per-week": 40,
  "native-country": "United-States",
  "salary": ">=50k"
}]'
```

#### Post as CSV

```bash
curl -X POST \
  http://localhost:5000/predict \
  -H 'Content-Type: text/csv' \
  -d 'age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
49, Private,101320, Assoc-acdm,12.0, Married-civ-spouse,, Wife, White, Female,0,1902,40, United-States,>=50k'
```
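
The same two requests can also be prepared from Python. The sketch below uses only the standard library and builds the requests without sending them, so it runs even when the server is down. The helper names `build_request` and `send` are hypothetical, and the URL assumes the server is listening on the address printed by `bentoml serve` above.

```python
import json
import urllib.request

# Assumed endpoint: the address printed by `bentoml serve` in this notebook.
PREDICT_URL = "http://localhost:5000/predict"


def build_request(body, content_type, url=PREDICT_URL):
    """Build (but do not send) a POST request with the given body and content type."""
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )


def send(request):
    """Send a prepared request; requires the BentoML server to be running."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Same record as in the curl examples.
record = {
    "age": 49,
    "workclass": "Private",
    "fnlwgt": 101320,
    "education": "Assoc-acdm",
    "education-num": 12.0,
    "marital-status": "Married-civ-spouse",
    "occupation": "",
    "relationship": "Wife",
    "race": "White",
    "sex": "Female",
    "capital-gain": 0,
    "capital-loss": 1902,
    "hours-per-week": 40,
    "native-country": "United-States",
    "salary": ">=50k",
}

# JSON body: a list of records, one per row to predict.
json_request = build_request(json.dumps([record]), "application/json")

# CSV body: a header line followed by one data row, as in the curl example.
csv_text = (
    "age,workclass,fnlwgt,education,education-num,marital-status,occupation,"
    "relationship,race,sex,capital-gain,capital-loss,hours-per-week,"
    "native-country,salary\n"
    "49, Private,101320, Assoc-acdm,12.0, Married-civ-spouse,, Wife, White,"
    " Female,0,1902,40, United-States,>=50k"
)
csv_request = build_request(csv_text, "text/csv")

# With the server running, uncomment to send:
# print(send(json_request))
# print(send(csv_request))
```

Building the request separately from sending it makes the payload easy to inspect before it reaches the server.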