[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openlayer-ai/examples-gallery/blob/main/text-classification/tensorflow/tensorflow.ipynb)


# Text classification using Tensorflow

This notebook illustrates how Tensorflow models can be upladed to the Openlayer platform.

In [1]:
%%bash

if [ ! -e "requirements.txt" ]; then
    curl "https://raw.githubusercontent.com/openlayer-ai/examples-gallery/main/text-classification/tensorflow/requirements.txt" --output "requirements.txt"
fi

In [None]:
!pip install -r requirements.txt

## Importing the models and loading the dataset

In [1]:
import pandas as pd
import tensorflow as tf

from tensorflow import keras

In [2]:
# Constants we'll use for the dataset
MAX_WORDS = 10000
REVIEW_CLASSES = ['negative', 'positive']

# download dataset from keras.
(_X_train, _y_train), (_X_test, _y_test) = keras.datasets.imdb.load_data(num_words=MAX_WORDS)

## Helper functions and word embeddings

The original dataset contains the reviews as word indices. To make it human-readable, we need the word index dict, that maps the indices to words. 

In [3]:
# Word index dict for the IMDB dataset
tf.keras.datasets.imdb.get_word_index()

{'fawn': 34701,
 'tsukino': 52006,
 'nunnery': 52007,
 'sonja': 16816,
 'vani': 63951,
 'woods': 1408,
 'spiders': 16115,
 'hanging': 2345,
 'woody': 2289,
 'trawling': 52008,
 "hold's": 52009,
 'comically': 11307,
 'localized': 40830,
 'disobeying': 30568,
 "'royale": 52010,
 "harpo's": 40831,
 'canet': 52011,
 'aileen': 19313,
 'acurately': 52012,
 "diplomat's": 52013,
 'rickman': 25242,
 'arranged': 6746,
 'rumbustious': 52014,
 'familiarness': 52015,
 "spider'": 52016,
 'hahahah': 68804,
 "wood'": 52017,
 'transvestism': 40833,
 "hangin'": 34702,
 'bringing': 2338,
 'seamier': 40834,
 'wooded': 34703,
 'bravora': 52018,
 'grueling': 16817,
 'wooden': 1636,
 'wednesday': 16818,
 "'prix": 52019,
 'altagracia': 34704,
 'circuitry': 52020,
 'crotch': 11585,
 'busybody': 57766,
 "tart'n'tangy": 52021,
 'burgade': 14129,
 'thrace': 52023,
 "tom's": 11038,
 'snuggles': 52025,
 'francesco': 29114,
 'complainers': 52027,
 'templarios': 52125,
 '272': 40835,
 '273': 52028,
 'zaniacs': 52130,

In [4]:
# Invert the word index so that it maps words to ints, and not the other way around, like the default
word_index = tf.keras.datasets.imdb.get_word_index()

word_index = {k:(v+3) for k,v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2  
word_index["<UNUSED>"] = 3

# word_index.items  <str> to <int>
# reverse_word_index <int> to <str>
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

In [5]:
def decode_review(text):
    """Function that makes the samples human-readable"""
    return ' '.join([reverse_word_index.get(i, '#') for i in text])

In [6]:
def encode_review(text):
    """Function that converts a human-readable sentence to the list of indices format"""
    words = text.split(' ')
    ids = [word_index["<START>"]]
    for w in words:
        v = word_index.get(w, word_index["<UNK>"])
        # >1000, signed as <UNUSED>
        if v > MAX_WORDS:
            v = word_index["<UNUSED>"]
        ids.append(v)
    return ids    

In [7]:
decode_review(_X_train[0])

"<START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert <UNK> is an amazing actor and now the same being director <UNK> father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for <UNK> and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also <UNK> to the two little boy's that played the <UNK> of norman and paul they were just brilliant children are often left out of the <UNK> list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for wh

In [8]:
decode_review(_X_train[1])

"<START> big hair big boobs bad music and a giant safety pin these are the words to best describe this terrible movie i love cheesy horror movies and i've seen hundreds but this had got to be on of the worst ever made the plot is paper thin and ridiculous the acting is an abomination the script is completely laughable the best is the end showdown with the cop and how he worked out who the killer is it's just so damn terribly written the clothes are sickening and funny in equal <UNK> the hair is big lots of boobs <UNK> men wear those cut <UNK> shirts that show off their <UNK> sickening that men actually wore them and the music is just <UNK> trash that plays over and over again in almost every scene there is trashy music boobs and <UNK> taking away bodies and the gym still doesn't close for <UNK> all joking aside this is a truly bad film whose only charm is to look back on the disaster that was the 80's and have a good old laugh at how bad everything was back then"

In [9]:
X_train = keras.preprocessing.sequence.pad_sequences(_X_train,
                                                     dtype='int32',
                                                        value=word_index["<PAD>"],
                                                        padding='post',
                                                        maxlen=256)

X_test = keras.preprocessing.sequence.pad_sequences(_X_test,
                                                    dtype='int32',
                                                       value=word_index["<PAD>"],
                                                       padding='post',
                                                       maxlen=256)


# classification. convert y to 2 dims 
y_train = tf.one_hot(_y_train, depth=2)
y_test = tf.one_hot(_y_test, depth=2)

## Define and train the model

In [10]:
# model setting
tf_model = tf.keras.Sequential([
            tf.keras.layers.Embedding(10000, 8),
            tf.keras.layers.GlobalAvgPool1D(),
            tf.keras.layers.Dense(6, activation="relu"),
            tf.keras.layers.Dense(2, activation="sigmoid"),
        ])


tf_model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

In [None]:
tf_model.fit(X_train, y_train, epochs=30, batch_size=512)

## Openlayer part!

### pip installing openlayer

In [None]:
!pip install openlayer

### Instantiating the client

In [12]:
import openlayer

client = openlayer.OpenlayerClient("YOUR_API_KEY_HERE")

### Create the project on the platform

In [None]:
from openlayer.tasks import TaskType


project = client.create_or_load_project(name="Text classification with Tensorflow",
                                        task_type=TaskType.TextClassification,
                                        description="Evaluating NN for text classification")

### Upload the validation set

To upload to Openlayer, we are going to put the human-readable validation set on a pandas dataframe. That where our helper functions will come in handy.

In [14]:
text_data = []

for indices in _X_test:
    special_chars = ["<PAD>", "<START>", "<UNK>", "<UNUSED>"]
    text = decode_review(indices)
    for char in special_chars:
        text = text.replace(char, "")
    text_data.append(text.strip())
    
labels = y_test.numpy().argmax(axis=1).tolist()
data_dict = {"text": text_data, "labels": labels}
df = pd.DataFrame.from_dict(data_dict).sample(frac=1, random_state=1)[:500]
df["text"] = df["text"].str[:700]

In [None]:
dataset = project.add_dataframe(
    df=df,
    class_names=['negative', 'positive'],
    label_column_name='labels',
    text_column_name='text',
    commit_message='this is my sentiment validation dataset'
)

### Uploading the model

First, it is important to create a `predict_proba` function, which is how Openlayer interacts with your model

In [20]:
def predict_proba(model, texts, word_index):
    X = [encode_review(t) for t in texts]
    X = keras.preprocessing.sequence.pad_sequences(X,
                                                   dtype="int32",
                                                   value=word_index["<PAD>"],
                                                   padding='post',
                                                   maxlen=256)
    y = model(X)
    
    return y.numpy().tolist()

Let's test the `predict_proba` function to make sure the input-output format is consistent with what Openlayer expects:

In [21]:
predict_proba(tf_model, ['this movie is funny!', 'haha loved it', 'oh, bad'], word_index)

[[0.5305119752883911, 0.49071040749549866],
 [0.40147554874420166, 0.5984136462211609],
 [0.6347380876541138, 0.4044014513492584]]

Now, we can upload the model:

In [None]:
from openlayer.models import ModelType

openlayer_model = project.add_model(
    function=predict_proba, 
    model=tf_model,
    model_type=ModelType.tensorflow,
    class_names=['negative', 'positive'],
    name='TF Sentiment Model',
    commit_message='this is my tensorflow sentiment model',
    requirements_txt_file='requirements.txt',
    word_index=word_index
)