# Text Classification using TensorFlow Hub

This example covers the following:

1. Import the libraries
2. Load and split the datasets from TensorFlow Datasets
3. Build the model and apply pre-trained embeddings from TensorFlow Hub
4. Loss function and optimization
5. Evaluate the model
6. Save the model - (SavedModel & HDF5)
7. Load the saved model - (SavedModel & HDF5)

### Datasets

* The IMDB reviews dataset is available through TensorFlow Datasets; see the [imdb_reviews](https://www.tensorflow.org/datasets/catalog/imdb_reviews) catalog entry.

## Import relevant libraries, frameworks etc.

In [74]:
!pip install -q tensorflow
!pip install -q tfds-nightly
!pip install -q tensorflow-hub

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense

print("Tensorflow version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("TensorHub version: ", hub.__version__)
print("GPU is", "available" if tf.config.experimental.list_physical_devices("GPU") else "NOT AVAILABLE")

Tensorflow version:  2.1.0
Eager mode:  True
TensorHub version:  0.7.0
GPU is NOT AVAILABLE


## Load and Split the datasets

* Download the IMDB dataset to your machine (a cached copy is used if you have already downloaded it).
* Split the training set into 60% and 40% portions, so we end up with:
    * ~15,000 examples for **Training**
    * ~10,000 examples for **Validation**
    * ~25,000 examples for **Testing**
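
The split sizes listed above follow from simple arithmetic. A minimal sketch, assuming the IMDB `train` and `test` splits each hold 25,000 reviews (the constant names are illustrative and not part of TensorFlow Datasets):

```python
# Sizes of the IMDB splits (assumed: 25,000 reviews each).
TRAIN_TOTAL = 25_000
TEST_TOTAL = 25_000

# 'train[:60%]' keeps the first 60% of the training split,
# 'train[60%:]' keeps the remaining 40%.
train_size = TRAIN_TOTAL * 60 // 100
validation_size = TRAIN_TOTAL - train_size

print(f"training:   {train_size}")
print(f"validation: {validation_size}")
print(f"testing:    {TEST_TOTAL}")
```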
    
References:

* https://www.tensorflow.org/datasets/api_docs/python/tfds/load

In [75]:
train_data, validation_data, test_data = tfds.load(
    name="imdb_reviews", 
    split=('train[:60%]', 'train[60%:]', 'test'),
    as_supervised=True
)

Inspect the format of the data by printing the first 2 examples:

* Each example is a sentence representing the movie review and a corresponding label.
* The sentence is not preprocessed in any way.
* The label is an integer value of either 0 or 1, where 0 is a negative review, and 1 is a positive review.
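
Each review arrives as a raw byte string paired with an integer label. The sketch below shows one way to make such a pair human-readable; `describe_example` and `LABEL_NAMES` are hypothetical helpers written for this illustration, not part of TensorFlow Datasets.

```python
# Mapping taken from the description above: 0 = negative, 1 = positive.
LABEL_NAMES = {0: "negative", 1: "positive"}


def describe_example(text: bytes, label: int, width: int = 40) -> str:
    """Decode a raw review and pair a short preview with its label name."""
    decoded = text.decode("utf-8")
    preview = decoded if len(decoded) <= width else decoded[:width] + "..."
    return f"[{LABEL_NAMES[int(label)]}] {preview}"


print(describe_example(b"This was an absolutely terrible movie.", 0))
```

With the batched tensors used in this notebook, calling `.numpy()` on a single element should give the bytes and integer values that this helper expects.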

In [76]:
train_examples_batch, train_labels_batch = next(iter(train_data.batch(2)))
print(train_examples_batch)
print('\n')
print(train_labels_batch)

tf.Tensor(
[b"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it."
 b'I have been known to fall asleep during films, but this is usually due to a combination of things including, really tired, being warm and comfortable on the sette and having just eaten a lot. However on this occasion I fell asleep because the film was rubbish. The plot de

## Build the model

The neural network is created by stacking layers. It requires 3 main architectural decisions:

* How to represent the text?
* How many layers to use in the model?
* How many hidden units to use for each layer?

Here the input data consists of sentences. The labels to predict are either 0 or 1.

* **One way to represent the text is to convert sentences into embedding vectors.**

### Use a pre-trained text embedding

We can use a pre-trained text embedding as the first layer, which will have 3 advantages:

* We don't have to worry about text preprocessing.
* We can benefit from transfer learning.
* The embedding has a fixed size, so it's simpler to process.

Here we will use a pre-trained text embedding model from **TensorFlow Hub** called [google/tf2-preview/gnews-swivel-20dim/1](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1)

There are 3 other pre-trained models to test:

1. [google/tf2-preview/gnews-swivel-20dim-with-oov/1](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim-with-oov/1) - same as [google/tf2-preview/gnews-swivel-20dim/1](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1), but with 2.5% vocabulary converted to OOV buckets. This can help if vocabulary of the task and vocabulary of the model don't fully overlap.

2. [google/tf2-preview/nnlm-en-dim50/1](https://tfhub.dev/google/tf2-preview/nnlm-en-dim50/1) - A much larger model with ~1M vocabulary size and 50 dimensions.

3. [google/tf2-preview/nnlm-en-dim128/1](https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1) - Even larger model with ~1M vocabulary size and 128 dimensions.

Let's first create a Keras layer that uses a TensorFlow Hub model to embed the sentences, and try it out on a couple of input examples.

Notes:

* No matter the length of the input text, the output shape of the embeddings is: **(num_examples, embedding_dimension)**.
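
To see why the output shape does not depend on sentence length, here is a toy sketch in plain Python. It assumes whitespace tokenization, a made-up 3-dimensional vocabulary, and a simple mean as the combiner; the real TensorFlow Hub module may tokenize and combine differently.

```python
# Toy 3-dimensional embedding table; the values are made up for illustration.
TOY_EMBEDDINGS = {
    "great": [1.0, 0.0, 2.0],
    "movie": [0.0, 1.0, 0.0],
    "terrible": [-1.0, 0.0, -2.0],
}
UNKNOWN = [0.0, 0.0, 0.0]  # vector used for out-of-vocabulary tokens


def embed_sentence(sentence: str) -> list:
    """Split on whitespace, look up each token, and average the vectors."""
    tokens = sentence.lower().split()
    if not tokens:
        return list(UNKNOWN)
    vectors = [TOY_EMBEDDINGS.get(token, UNKNOWN) for token in tokens]
    dim = len(UNKNOWN)
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


batch = ["great movie", "terrible terrible terrible movie"]
embedded = [embed_sentence(s) for s in batch]

# Both sentences map to vectors of the same length, whatever their word count.
print(len(embedded), len(embedded[0]))  # (num_examples, embedding_dimension)
```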

In [77]:
preTrainedEmbedding1 = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
preTrainedEmbedding2 = "https://tfhub.dev/google/tf2-preview/nnlm-en-dim50/1"
preTrainedEmbedding3 = "https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1"

embedding = preTrainedEmbedding1
hub_layer = hub.KerasLayer(embedding, input_shape=[], 
                           dtype=tf.string, trainable=True)
hub_layer(train_examples_batch[:2])

<tf.Tensor: shape=(2, 20), dtype=float32, numpy=
array([[ 1.765786  , -3.882232  ,  3.9134233 , -1.5557289 , -3.3362343 ,
        -1.7357955 , -1.9954445 ,  1.2989551 ,  5.081598  , -1.1041286 ,
        -2.0503852 , -0.72675157, -0.65675956,  0.24436149, -3.7208383 ,
         2.0954835 ,  2.2969332 , -2.0689783 , -2.9489717 , -1.1315987 ],
       [ 1.8804485 , -2.5852382 ,  3.4066997 ,  1.0982676 , -4.056685  ,
        -4.891284  , -2.785554  ,  1.3874227 ,  3.8476458 , -0.9256538 ,
        -1.896706  ,  1.2113281 ,  0.11474707,  0.76209456, -4.8791065 ,
         2.906149  ,  4.7087674 , -2.3652055 , -3.5015898 , -1.6390051 ]],
      dtype=float32)>

In [78]:
model = Sequential()
model.add(hub_layer)
model.add(Dense(16, activation='relu'))
model.add(Dense(1))

model.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
keras_layer_4 (KerasLayer)   (None, 20)                400020    
_________________________________________________________________
dense_8 (Dense)              (None, 16)                336       
_________________________________________________________________
dense_9 (Dense)              (None, 1)                 17        
Total params: 400,373
Trainable params: 400,373
Non-trainable params: 0
_________________________________________________________________
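
The parameter counts in the summary above can be checked by hand. This is a small sketch using only the layer sizes from the model definition; `dense_params` is a hypothetical helper, and the embedding count is copied from the summary rather than derived.

```python
EMBEDDING_DIM = 20   # output width of the gnews-swivel-20dim layer
HIDDEN_UNITS = 16    # Dense(16, activation='relu')
OUTPUT_UNITS = 1     # Dense(1)


def dense_params(inputs: int, units: int) -> int:
    """A Dense layer has one weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


hidden = dense_params(EMBEDDING_DIM, HIDDEN_UNITS)
output = dense_params(HIDDEN_UNITS, OUTPUT_UNITS)
embedding = 400_020  # taken from the summary above

print(hidden, output, embedding + hidden + output)
print(embedding // EMBEDDING_DIM)  # number of 20-dimensional rows
```

The embedding count divides evenly into 20,001 rows of 20 values, which is consistent with a lookup table of 20,001 token vectors.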


The layers are stacked sequentially to build the classifier:

1. The first layer is a TensorFlow Hub layer. This layer uses a pre-trained SavedModel to map a sentence into its embedding vector. The pre-trained text embedding model that we are using (google/tf2-preview/gnews-swivel-20dim/1) splits the sentence into tokens, embeds each token and then combines the embeddings. The resulting dimensions are: (num_examples, embedding_dimension).

2. This fixed-length output vector is piped through a fully-connected (Dense) layer with 16 hidden units.

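3. The last layer is densely connected with a single output node. Because it has no activation function, the value is an unbounded float (a logit) rather than a probability: positive values lean toward a positive review, negative values toward a negative one. Applying a sigmoid maps it to a value between 0 and 1.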
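
Since `Dense(1)` is added without an activation, the value it produces is an unbounded score (a logit). Here is a minimal sketch of turning such a score into a probability with the sigmoid function; the sample logits are made up for illustration.

```python
import math


def sigmoid(logit: float) -> float:
    """Map an unbounded logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


for logit in (-2.0, 0.0, 3.0):
    probability = sigmoid(logit)
    label = 1 if probability >= 0.5 else 0
    print(f"logit={logit:+.1f}  probability={probability:.3f}  label={label}")
```

A logit of 0 corresponds to a probability of exactly 0.5, so thresholding the probability at 0.5 is the same as checking the sign of the logit.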

## Loss function and optimizer

Let's compile the model.

* A model needs a loss function and an optimizer for training. Since this is a binary classification problem and the model outputs a logit (a single-unit layer with no activation), we'll use the `BinaryCrossentropy` loss function with `from_logits=True`, which applies the sigmoid internally.

* This isn't the only choice for a loss function; you could, for instance, choose mean_squared_error. Generally, though, binary_crossentropy is better for dealing with probabilities: it measures the "distance" between probability distributions, or in our case, between the ground-truth distribution and the predictions.
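
For a single example, binary cross-entropy can be computed by hand. The sketch below compares the textbook form on a probability with the numerically stable form on a logit, as documented for `tf.nn.sigmoid_cross_entropy_with_logits`. The function names and sample values are illustrative, and the Keras loss additionally averages over the batch.

```python
import math


def bce_from_logit(logit: float, label: int) -> float:
    """Binary cross-entropy for one example, computed directly from a logit.

    Uses the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def bce_from_probability(probability: float, label: int) -> float:
    """Textbook form: -[z*log(p) + (1 - z)*log(1 - p)]."""
    return -(label * math.log(probability)
             + (1 - label) * math.log(1.0 - probability))


logit, label = 2.0, 1
probability = 1.0 / (1.0 + math.exp(-logit))

# Both forms agree up to floating-point rounding.
print(bce_from_logit(logit, label))
print(bce_from_probability(probability, label))
```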

Now, configure the model to use an optimizer and a loss function:

In [79]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

Train the model for 20 epochs in mini-batches of 512 samples. This is 20 iterations over all samples in `train_data`. While training, monitor the model's loss and accuracy on the 10,000 samples from the validation set:

In [80]:
history = model.fit(train_data.shuffle(10000).batch(512),
                    epochs=20,
                    validation_data=validation_data.batch(512),
                    verbose=1)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
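
The number of batches processed in each epoch follows from the dataset and batch sizes. This is a small sketch under the split sizes stated earlier (15,000 training and 10,000 validation examples); the constant names are illustrative.

```python
import math

TRAIN_EXAMPLES = 15_000       # 60% of the 25,000-review training split
VALIDATION_EXAMPLES = 10_000  # remaining 40%
BATCH_SIZE = 512
EPOCHS = 20

# The last batch of each pass is smaller than BATCH_SIZE, so round up.
train_steps = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
validation_steps = math.ceil(VALIDATION_EXAMPLES / BATCH_SIZE)
last_batch_size = TRAIN_EXAMPLES - (train_steps - 1) * BATCH_SIZE

print(train_steps, validation_steps, train_steps * EPOCHS)
print(last_batch_size)
```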


## Evaluate the model

Let's see how the model performs.

`model.evaluate` returns 2 values:

1. Loss (a number that represents our error; lower values are better)
2. Accuracy

In [81]:
results = model.evaluate(test_data.batch(512), verbose=2)

for name, value in zip(model.metrics_names, results):
  print("%s: %.3f" % (name, value))

loss: 0.324
accuracy: 0.858


## Save and load entire models

Model progress can be saved during—and after—training.

* This means a model can resume where it left off and avoid long training times.
* Saving also means you can share your model and others can recreate your work.

When publishing research models and techniques, most machine learning practitioners share:

* Code to create the model, and
* The trained weights, or parameters, for the model

Sharing this data helps others understand how the model works and try it themselves with new data.

Notes:

* Call `model.save` to save a model's `architecture`, `weights`, and `training configuration` in a single file/folder.

* This allows you to export a model so it can be used without access to the original Python code.

* An entire model can be saved in 2 different file formats (`SavedModel` and `HDF5`).

* The TensorFlow SavedModel format is the default file format in TF2.x.

Usages:

* A fully functional saved model can be loaded in TensorFlow.js (SavedModel, HDF5) and then trained and run in web browsers, or
* converted to run on mobile devices using TensorFlow Lite (SavedModel, HDF5).

### 1. Save the entire model as a SavedModel.

In [82]:
!mkdir -p saved_model

model.save('saved_model/tct_model')

INFO:tensorflow:Assets written to: saved_model/tct_model/assets


The SavedModel format is a directory containing:

1. A Protobuf binary
2. A TensorFlow checkpoint
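
As a quick sanity check, the directory layout can be inspected from Python. The sketch below assumes the Protobuf binary is named `saved_model.pb` and the checkpoint lives in a `variables` directory. `looks_like_saved_model` is a hypothetical helper, and it checks only that the entries exist, not that their contents are valid.

```python
import tempfile
from pathlib import Path


def looks_like_saved_model(path: str) -> bool:
    """Return True if `path` contains `saved_model.pb` and a `variables` directory."""
    root = Path(path)
    return (root / "saved_model.pb").is_file() and (root / "variables").is_dir()


# Demonstrate on a throwaway directory that mimics the layout.
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "tct_model"
    (fake / "variables").mkdir(parents=True)
    (fake / "saved_model.pb").write_bytes(b"")
    has_layout = looks_like_saved_model(str(fake))
    parent_has_layout = looks_like_saved_model(tmp)

print(has_layout, parent_has_layout)
```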

In [83]:
!ls saved_model/tct_model

assets  saved_model.pb  variables


### Reload a fresh model from the saved model

In [84]:
new_model = load_model('saved_model/tct_model')

# Check its architecture
new_model.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
keras_layer_4 (KerasLayer)   (None, 20)                400020    
_________________________________________________________________
dense_8 (Dense)              (None, 16)                336       
_________________________________________________________________
dense_9 (Dense)              (None, 1)                 17        
Total params: 400,373
Trainable params: 400,373
Non-trainable params: 0
_________________________________________________________________


The restored model is compiled with the same arguments as the original model. Try running evaluate and predict with the loaded model:

In [85]:
results = new_model.evaluate(test_data.batch(512), verbose=2)

for name, value in zip(new_model.metrics_names, results):
  print("%s: %.3f" % (name, value))

loss: 0.324
accuracy: 0.858


### 2. Save the entire model in HDF5 format

* The `.h5` extension indicates that the model should be saved to HDF5.
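
The extension rule can be pictured with a small helper. This is a simplified sketch of how `model.save` is assumed to choose a format from the path in the TF 2.x release used here. `infer_save_format` is hypothetical, and the return values mirror the `'tf'` and `'h5'` options of the `save_format` argument.

```python
def infer_save_format(filepath: str) -> str:
    """Guess which format `model.save(filepath)` would pick in TF 2.x.

    Simplified assumption: an `.h5` or `.hdf5` suffix means HDF5 ('h5');
    anything else means SavedModel ('tf').
    """
    if filepath.endswith((".h5", ".hdf5")):
        return "h5"
    return "tf"


for path in ("saved_model/tct_model", "hdf5/tct_model.h5"):
    print(path, "->", infer_save_format(path))
```

Passing `save_format` explicitly to `model.save` removes the dependence on the file name.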

In [86]:
!mkdir -p hdf5
model.save('hdf5/tct_model.h5')

Now, recreate the model from that file, including its weights and the optimizer:

In [87]:
# The HDF5 file refers to hub.KerasLayer only by name, so pass the class via custom_objects.
# GitHub Issue: https://github.com/tensorflow/tensorflow/issues/26835
new_h5_model = load_model('hdf5/tct_model.h5', custom_objects={'KerasLayer':hub.KerasLayer})

# Check its architecture
new_h5_model.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
keras_layer_4 (KerasLayer)   (None, 20)                400020    
_________________________________________________________________
dense_8 (Dense)              (None, 16)                336       
_________________________________________________________________
dense_9 (Dense)              (None, 1)                 17        
Total params: 400,373
Trainable params: 400,373
Non-trainable params: 0
_________________________________________________________________


In [88]:
# Evaluate the model restored from HDF5 (not the SavedModel copy)
results = new_h5_model.evaluate(test_data.batch(512), verbose=2)

for name, value in zip(new_h5_model.metrics_names, results):
  print("%s: %.3f" % (name, value))

loss: 0.324
accuracy: 0.858
