In this notebook, we wrap a TensorFlow text-classification model in a REST API using `BentoML`. The datasets come from [this](https://github.com/Nilabhra/kolkata_nlp_workshop_2019) repository, and the notebook is adapted from [this](https://github.com/bentoml/BentoML/blob/master/examples/tf-keras-text-classification/tf-keras-text-classification.ipynb) example notebook from BentoML itself.

In [1]:
import pandas as pd

train = pd.read_csv('https://raw.githubusercontent.com/Nilabhra/kolkata_nlp_workshop_2019/master/data/train.csv')
validation = pd.read_csv('https://raw.githubusercontent.com/Nilabhra/kolkata_nlp_workshop_2019/master/data/valid.csv')
test = pd.read_csv('https://raw.githubusercontent.com/Nilabhra/kolkata_nlp_workshop_2019/master/data/test.csv')

In [2]:
train.shape, validation.shape, test.shape

((9131, 3), (1142, 3), (1141, 3))

In [6]:
train.head()

| Unnamed: 0 | text | class |
|---|---|---|
| 0 | I ordered a biryani, and the taste of the Biry... | positive |
| 1 | A nice place to hangout since it has both the ... | positive |
| 2 | This place is awesome for having lunch or dinn... | positive |
| 3 | I got shell of egg in the egg roll. as a resul... | negative |
| 4 | Their biryani is oily, with a bit disconcertin... | negative |


In [7]:
validation.head()

| Unnamed: 0 | text | class |
|---|---|---|
| 0 | The food was excellent with surplus quantity. ... | positive |
| 1 | This place nearer to the Gitanjali metro stati... | positive |
| 2 | Ordered for Aloo tikki with choley just now @0... | negative |
| 3 | Hatari is one of those restaurants that our fa... | positive |
| 4 | Disappointing.......\nThey have altered the ta... | negative |


In [8]:
test.head()

| Unnamed: 0 | text | class |
|---|---|---|
| 0 | This place is amazing. I think the best place ... | positive |
| 1 | This place has been on my list for quite some ... | positive |
| 2 | What a wonderful cold winter evening it was. M... | positive |
| 3 | BabBQ had always been a personal favorite when... | positive |
| 4 | Know for its Deep Dish Pizza this place is sur... | negative |


In [9]:
train['text'].loc[0]

'I ordered a biryani, and the taste of the Biryani was beyond my expectations and the quantity was also enough comparatively to the price!\nReally nice much appreciable'

### Removing digits from the text

In [2]:
from string import digits

# Translation table that deletes every ASCII digit (0-9); built once and reused
DIGIT_TABLE = str.maketrans('', '', digits)

def remove_digits(s):
    return s.translate(DIGIT_TABLE)

In [3]:
train['text'] = train['text'].apply(remove_digits)
validation['text'] = validation['text'].apply(remove_digits)
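Note that `string.digits` contains only the ASCII characters `0123456789`, so `remove_digits` leaves other Unicode digits (for example the full-width `４`) in place. The sketch below contrasts it with a hypothetical `remove_all_digits` variant based on `str.isdigit()`, which would matter only if the reviews contain such characters (an assumption; the notebook does not check this):

```python
from string import digits

# Deletes only the ten ASCII digits listed in string.digits
ASCII_DIGIT_TABLE = str.maketrans('', '', digits)

def remove_digits(s):
    return s.translate(ASCII_DIGIT_TABLE)

# Hypothetical variant: drops any character Python classifies as a digit,
# including full-width and other non-ASCII digits
def remove_all_digits(s):
    return ''.join(ch for ch in s if not ch.isdigit())

sample = 'Ordered at 9:30, rated ４/5'
print(remove_digits(sample))      # -> Ordered at :, rated ４/
print(remove_all_digits(sample))  # -> Ordered at :, rated /
```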

### Bag-of-words representation

In [4]:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(stop_words=None, lowercase=True,
                             ngram_range=(1, 1), min_df=2, binary=True)

train_features = vectorizer.fit_transform(train['text'])
train_labels = train['class']

valid_features = vectorizer.transform(validation['text'])
valid_labels = validation['class']
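To make the `CountVectorizer` settings concrete, here is a rough pure-Python approximation (not scikit-learn's implementation) of what `lowercase=True`, `min_df=2` and `binary=True` do, assuming the default `token_pattern` of two or more word characters. Vocabulary columns are sorted alphabetically, as scikit-learn does, and words never seen during fitting are simply ignored at transform time. The toy documents are made up:

```python
import re
from collections import Counter

# Rough stand-in for CountVectorizer's default token_pattern: runs of 2+ word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    # lowercase=True: lowercase before tokenizing
    return TOKEN_RE.findall(text.lower())

def fit_vocabulary(docs, min_df=2):
    # Document frequency: each token counted at most once per document
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    # min_df=2: keep tokens seen in at least two documents; columns sorted alphabetically
    kept = sorted(tok for tok, n in df.items() if n >= min_df)
    return {tok: i for i, tok in enumerate(kept)}

def transform_binary(docs, vocab):
    # binary=True: 1 if the token occurs at all, regardless of how often
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in tokenize(doc):
            if tok in vocab:
                row[vocab[tok]] = 1
        rows.append(row)
    return rows

docs = ["Great biryani, great price", "Biryani was cold", "Nice price"]
vocab = fit_vocabulary(docs)
print(vocab)                                                   # {'biryani': 0, 'price': 1}
print(transform_binary(["biryani biryani and price"], vocab))  # [[1, 1]]
```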

### Label-encode the classes

In [5]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
train_labels = le.fit_transform(train_labels)
valid_labels = le.transform(valid_labels)
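`LabelEncoder` assigns integer codes in sorted order of the class names, so `negative` becomes 0 and `positive` becomes 1; the sigmoid output and the serving code later rely on this mapping. A stdlib sketch of the same mapping (a re-implementation for illustration, not scikit-learn's code):

```python
def fit_label_encoder(labels):
    # Classes are sorted, mirroring the order of LabelEncoder.classes_
    classes = sorted(set(labels))
    to_index = {c: i for i, c in enumerate(classes)}
    return classes, to_index

labels = ['positive', 'negative', 'positive']
classes, to_index = fit_label_encoder(labels)
encoded = [to_index[c] for c in labels]   # like le.transform
decoded = [classes[i] for i in encoded]   # like le.inverse_transform
print(classes)  # ['negative', 'positive']
print(encoded)  # [1, 0, 1]
```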

### Model building and compilation

In [6]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dropout, Dense

In [7]:
model = keras.Sequential()

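# Dropout directly on the bag-of-words input, then two hidden layers
model.add(Dropout(rate=0.2, input_shape=train_features.shape[1:]))
for _ in range(2):
    model.add(Dense(units=64, activation='relu'))
    model.add(Dropout(rate=0.2))
# A single sigmoid unit: the predicted probability of class 1 ('positive')
model.add(Dense(units=1, activation='sigmoid'))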



In [8]:
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['acc'])
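Dropout layers carry no weights, so all trainable parameters sit in the three `Dense` layers. The vocabulary size (`train_features.shape[1]`) is not printed in this notebook, so the sketch below takes it as an input; 1,000 is an arbitrary example value. For the real vocabulary size, `model.summary()` should report the same total:

```python
def dense_params(n_in, n_out):
    # Weight matrix plus one bias per output unit
    return n_in * n_out + n_out

def count_params(vocab_size, hidden=64, n_hidden_layers=2):
    total = 0
    n_in = vocab_size
    for _ in range(n_hidden_layers):
        total += dense_params(n_in, hidden)
        n_in = hidden
    total += dense_params(n_in, 1)  # sigmoid output unit
    return total                    # Dropout layers add no parameters

print(count_params(1000))  # 68289
```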

In [9]:
# Define an EarlyStopping callback
es_cb = keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
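With `patience=5`, training stops once `val_loss` has failed to improve for five consecutive epochs. The small simulation below is modelled on tf.keras's `EarlyStopping` rule with its default `min_delta=0`; the loss values are made up. It shows why a run capped at 15 epochs can end at epoch 8: the best epoch was 3. Unless `restore_best_weights=True` is passed (available in recent tf.keras versions), the model keeps the weights from the last epoch rather than the best one.

```python
def stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, wait = loss, 0   # improvement: reset the counter
        else:
            wait += 1              # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)         # ran out of epochs first

# Hypothetical losses: best value at epoch 3, then five epochs without improvement
losses = [0.62, 0.48, 0.41, 0.42, 0.44, 0.45, 0.47, 0.50, 0.52, 0.55]
print(stopping_epoch(losses))  # 8
```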

### Training and validating the model

In [10]:
model.fit(train_features,
          train_labels,
          epochs=15,
          batch_size=512,
          validation_data=(valid_features, valid_labels),
          callbacks=[es_cb],
          verbose=1)

Train on 9131 samples, validate on 1142 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15


<tensorflow.python.keras.callbacks.History at 0x129f3d1d0>
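With `batch_size=512`, each epoch is split into mini-batches. The quick arithmetic below (a check added here, not part of the original run) gives the number of gradient updates per epoch and the size of the smaller final batch, both for the 9,131 training reviews and for the 10,273 combined reviews used when retraining:

```python
import math

def batches(n_samples, batch_size):
    # Number of mini-batches per epoch and the size of the last one
    steps = math.ceil(n_samples / batch_size)
    last = n_samples - (steps - 1) * batch_size
    return steps, last

print(batches(9131, 512))   # (18, 427)
print(batches(10273, 512))  # (21, 33)
```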

### How good is the model? 

In [11]:
test['text'] = test['text'].apply(remove_digits)
test_features = vectorizer.transform(test['text'])
test_labels = le.transform(test['class'])

In [12]:
results = model.evaluate(test_features, test_labels)
print("Accuracy: {0:.2f}%".format(results[1]*100.))

Accuracy: 79.58%
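The `acc` metric for a single sigmoid output treats a prediction as class 1 when the predicted probability is above 0.5 and reports the fraction that match the labels. A stdlib sketch of that computation (the probabilities are invented), plus the arithmetic tying the reported 79.58% to the 1,141 test reviews: it corresponds to 908 correct predictions.

```python
def binary_accuracy(probs, labels, threshold=0.5):
    # Predict class 1 when the sigmoid output is above the threshold
    preds = [1 if p > threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

print(binary_accuracy([0.91, 0.20, 0.65, 0.40], [1, 0, 0, 0]))  # 0.75

# 908 correct out of 1,141 test reviews rounds to the reported accuracy
print(round(908 / 1141 * 100, 2))  # 79.58
```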


### Combining the training and validation sets and retraining the model

In [13]:
data = pd.concat((train, validation), axis=0)

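# Note: unlike the first vectorizer, binary=True is not set here,
# so the features are raw token counts rather than 0/1 indicators
vectorizer = CountVectorizer(stop_words=None, lowercase=True,
                             ngram_range=(1, 1), min_df=2)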

features = vectorizer.fit_transform(data['text'])
labels = le.fit_transform(data['class'])

test_features = vectorizer.transform(test['text'])
test_labels = le.transform(test['class'])

In [14]:
model = keras.Sequential()

model.add(Dropout(rate=0.2, input_shape=features.shape[1:]))
model.add(Dense(units=64, activation='relu'))
model.add(Dropout(rate=0.2))
model.add(Dense(units=64, activation='relu'))
model.add(Dropout(rate=0.2))
model.add(Dense(units=1, activation='sigmoid'))

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['acc'])

In [16]:
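# The test set doubles as validation data here, so early stopping sees the
# test labels and test accuracy is no longer a fully held-out estimate
model.fit(features,
          labels,
          epochs=15,
          batch_size=512,
          validation_data=(test_features, test_labels),
          callbacks=[es_cb],
          verbose=1)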

Train on 10273 samples, validate on 1141 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15


<tensorflow.python.keras.callbacks.History at 0x12f2285c0>

> We will use this model for serving. 
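Because the retraining run passes the test set as `validation_data`, early stopping uses test labels to choose when to stop. One way to avoid that is to carve a small early-stopping holdout out of the combined training data. A stdlib sketch of a reproducible index split; the 10% fraction and the seed are arbitrary choices, not from the notebook. The resulting index lists could then select rows, e.g. `features[train_idx]` and `labels[train_idx]`.

```python
import random

def holdout_split(n_samples, holdout_frac=0.1, seed=42):
    """Return (train_idx, holdout_idx) as disjoint sorted lists of row indices."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    n_holdout = int(round(n_samples * holdout_frac))
    return sorted(idx[n_holdout:]), sorted(idx[:n_holdout])

train_idx, holdout_idx = holdout_split(10273)
print(len(train_idx), len(holdout_idx))  # 9246 1027
```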

### Inference on a single test sample

In [17]:
test_sample = vectorizer.transform([remove_digits('I had a very bad experience you know.')])
# predict_classes returns a column vector of shape (1, 1); flatten it for the LabelEncoder
le.inverse_transform(model.predict_classes(test_sample).ravel())


array(['negative'], dtype=object)
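For a model with a single sigmoid output, `predict_classes` thresholds the predicted probability at 0.5 and returns an integer array of shape `(n_samples, 1)`, which is why the result is flattened before `inverse_transform`. A pure-Python sketch of those steps, using nested lists in place of NumPy arrays and a made-up probability:

```python
CLASSES = ['negative', 'positive']  # LabelEncoder's sorted classes

def predict_classes_sketch(probs):
    # One single-element row per sample, mirroring the (n_samples, 1) shape
    return [[1 if p[0] > 0.5 else 0] for p in probs]

def decode(class_column):
    # Flatten the column vector, then map each code back to its label
    return [CLASSES[row[0]] for row in class_column]

codes = predict_classes_sketch([[0.08]])  # hypothetical sigmoid output
print(codes)          # [[0]]
print(decode(codes))  # ['negative']
```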

### Saving and serving the model with BentoML

In [27]:
%%writefile text_classification_service.py
import pandas as pd
from tensorflow import keras
import tensorflow as tf
from string import digits
from bentoml import api, env, BentoService, artifacts
from bentoml.artifact import TfKerasModelArtifact, PickleArtifact
from bentoml.handlers import JsonHandler

# Same digit-stripping table as used in training
DIGIT_TABLE = str.maketrans('', '', digits)

@artifacts([
    TfKerasModelArtifact('model'),
    PickleArtifact('vectorizer')
])
@env(conda_dependencies=['tensorflow', 'scikit-learn'])
class TextClassificationService(BentoService):

    @api(JsonHandler)
    def predict(self, parsed_json):
        # Apply the training preprocessing: strip digits, then vectorize
        text = parsed_json['text'].translate(DIGIT_TABLE)
        features = self.artifacts.vectorizer.transform([text])
        # predict_classes thresholds the sigmoid output at 0.5; take the single value
        prediction = self.artifacts.model.predict_classes(features)[0][0]
        # LabelEncoder sorted the classes, so 0 = negative and 1 = positive
        sentiment = 'Positive' if prediction == 1 else 'Negative'
        return {'Sentiment': sentiment}

Overwriting text_classification_service.py
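Keeping the request logic in a plain function makes it testable without BentoML or a trained model. The sketch below mirrors the `predict` handler, with hypothetical stand-ins (`FakeVectorizer`, `FakeModel`) for the packed artifacts; the helper name `handle_request` is also hypothetical, and its check for a missing `text` field is an addition the original handler does not make:

```python
from string import digits

DIGIT_TABLE = str.maketrans('', '', digits)

def handle_request(parsed_json, vectorizer, model):
    """Framework-free version of the predict handler (hypothetical helper)."""
    text = parsed_json.get('text')
    if not isinstance(text, str):
        return {'error': "expected a JSON object with a string 'text' field"}
    features = vectorizer.transform([text.translate(DIGIT_TABLE)])
    prediction = model.predict_classes(features)[0][0]
    # 0 = negative, 1 = positive, following the sorted label encoding
    return {'Sentiment': 'Positive' if prediction == 1 else 'Negative'}

class FakeVectorizer:
    # Stand-in: passes the cleaned texts through instead of building features
    def transform(self, texts):
        return texts

class FakeModel:
    # Stand-in: calls a review positive if it contains the word "wonderful"
    def predict_classes(self, features):
        return [[1 if 'wonderful' in features[0] else 0]]

print(handle_request({'text': 'A wonderful meal for 2'}, FakeVectorizer(), FakeModel()))
# {'Sentiment': 'Positive'}
```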


In [28]:
from string import digits

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

from text_classification_service import TextClassificationService

# Rebuild the vectorizer exactly as in the retraining step (same settings, same
# train + validation text) so the packed artifact matches the model's inputs
vectorizer = CountVectorizer(stop_words=None, lowercase=True,
                             ngram_range=(1, 1), min_df=2)

train = pd.read_csv('https://raw.githubusercontent.com/Nilabhra/kolkata_nlp_workshop_2019/master/data/train.csv')
validation = pd.read_csv('https://raw.githubusercontent.com/Nilabhra/kolkata_nlp_workshop_2019/master/data/valid.csv')

def remove_digits(s):
    # Delete every ASCII digit (0-9)
    return s.translate(str.maketrans('', '', digits))

train['text'] = train['text'].apply(remove_digits)
validation['text'] = validation['text'].apply(remove_digits)

data = pd.concat((train, validation), axis=0)

# Fit once; the transformed matrix itself is not needed here
vectorizer.fit(data['text'])

# Save and serve
svc = TextClassificationService.pack(model=model, vectorizer=vectorizer)
saved_path = svc.save('./text_classification')
print(saved_path)
!ls {saved_path}

./text_classification/TextClassificationService/0.0.2019_04_22_e5d9f7d0
Dockerfile                [34mTextClassificationService[m[m requirements.txt
MANIFEST.in               bentoml.yml               setup.py
README.md                 environment.yml
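The saved directory is a self-contained bundle (note the generated `Dockerfile` and `requirements.txt`). Once it is served, for example with BentoML's `bentoml serve` command pointed at the bundle path, the `predict` API should be reachable over HTTP. The exact command, host, port and route depend on the BentoML version, so `http://localhost:5000/predict` below is an assumption. The sketch only builds the JSON POST request with the standard library and does not send it:

```python
import json
import urllib.request

# Assumed endpoint: local host/port and a route named after the API method
API_URL = 'http://localhost:5000/predict'

def build_predict_request(text, url=API_URL):
    # JSON body in the shape the JsonHandler-based predict API expects
    body = json.dumps({'text': text}).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

req = build_predict_request('Loved the ambience.')
print(req.get_method(), req.full_url)
# To call a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```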


In [15]:
svc.predict({"text": "I had a wonderful experience eating their chicken noodles! Also loved the ambience."})

{'Sentiment': 'Positive'}

> Correct! 

In [26]:
!rm -R {saved_path}