<a href="https://colab.research.google.com/github/sahug/ds-tensorflow-colab/blob/master/Tensorflow%20-%20Text%20Classification%20using%20Tensorflow%20Hub.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Tensorflow - Text Classification using Tensorflow Hub**

The tutorial demonstrates the basic application of transfer learning with TensorFlow Hub and Keras.

In transfer learning we use the knowledge from the previous model to build next model. In here we will use embedding from hub to build a Text Classification Model.

In [1]:
%pip install -q tensorflow-hub
%pip install -q tensorflow-datasets

Note: you may need to restart the kernel to use updated packages.


You should consider upgrading via the 'c:\Users\sahug\AppData\Local\Programs\Python\Python310\python.exe -m pip install --upgrade pip' command.


Note: you may need to restart the kernel to use updated packages.


You should consider upgrading via the 'c:\Users\sahug\AppData\Local\Programs\Python\Python310\python.exe -m pip install --upgrade pip' command.


In [2]:
import os
import numpy as np

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds

  from .autonotebook import tqdm as notebook_tqdm


In [3]:
print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.config.list_physical_devices("GPU") else "NOT AVAILABLE")

Version:  2.8.0
Eager mode:  True
Hub version:  0.12.0
GPU is NOT AVAILABLE


**Download and Exlpore Data**

In [4]:
# Split the training set into 60% and 40% to end up with 15,000 examples
# for training, 10,000 examples for validation and 25,000 examples for testing.
train_data, validation_data, test_data = tfds.load(
    name="imdb_reviews",
    split=("train[:60%]", "train[60%:]", "test"),
    as_supervised=True
)

[1mDownloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to C:\Users\sahug\tensorflow_datasets\imdb_reviews\plain_text\1.0.0...[0m


Dl Size...: 100%|██████████| 80/80 [00:37<00:00,  2.11 MiB/s]rl]
Dl Completed...: 100%|██████████| 1/1 [00:37<00:00, 37.99s/ url]
                                                                        

[1mDataset imdb_reviews downloaded and prepared to C:\Users\sahug\tensorflow_datasets\imdb_reviews\plain_text\1.0.0. Subsequent calls will reuse this data.[0m


In [5]:
train_example_batch, train_labels_batch =  next(iter(train_data.batch(10)))

**Build the Model**

The neural network is created by stacking layers—this requires three main architectural decisions:

- How to represent the text?
- How many layers to use in the model?
- How many hidden units to use for each layer?

Let's first create a **Keras** layer that uses a **TensorFlow Hub model** to embed the sentences, and try it out on a couple of input examples. Note that no matter the length of the input text, the output shape of the embeddings is: **(num_examples, embedding_dimension)**.

**Embedding**

In [6]:
embedding = "https://tfhub.dev/google/nnlm-en-dim50/2"
hub_layer = hub.KerasLayer(embedding, input_shape=[], dtype=tf.string, trainable=True)
hub_layer(train_example_batch[:2])

<tf.Tensor: shape=(2, 50), dtype=float32, numpy=
array([[ 0.5423195 , -0.0119017 ,  0.06337538,  0.06862972, -0.16776837,
        -0.10581174,  0.16865303, -0.04998824, -0.31148055,  0.07910346,
         0.15442263,  0.01488662,  0.03930153,  0.19772711, -0.12215476,
        -0.04120981, -0.2704109 , -0.21922152,  0.26517662, -0.80739075,
         0.25833532, -0.3100421 ,  0.28683215,  0.1943387 , -0.29036492,
         0.03862849, -0.7844411 , -0.0479324 ,  0.4110299 , -0.36388892,
        -0.58034706,  0.30269456,  0.3630897 , -0.15227164, -0.44391504,
         0.19462997,  0.19528408,  0.05666234,  0.2890704 , -0.28468323,
        -0.00531206,  0.0571938 , -0.3201318 , -0.04418665, -0.08550783,
        -0.55847436, -0.23336391, -0.20782952, -0.03543064, -0.17533456],
       [ 0.56338924, -0.12339553, -0.10862679,  0.7753425 , -0.07667089,
        -0.15752277,  0.01872335, -0.08169781, -0.3521876 ,  0.4637341 ,
        -0.08492756,  0.07166859, -0.00670817,  0.12686075, -0.19326553,
 

**Model**

In [7]:
from keras import activations
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation=activations.relu))
model.add(tf.keras.layers.Dense(1))
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 keras_layer (KerasLayer)    (None, 50)                48190600  
                                                                 
 dense (Dense)               (None, 16)                816       
                                                                 
 dense_1 (Dense)             (None, 1)                 17        
                                                                 
Total params: 48,191,433
Trainable params: 48,191,433
Non-trainable params: 0
_________________________________________________________________


**Loss Function and Optimizer**

In [10]:
from tensorflow import keras
model.compile(optimizer=tf.keras.optimizers.Adam(), 
                loss=keras.losses.BinaryCrossentropy(),
                metrics=["accuracy"])

**Train**

In [11]:
history = model.fit(train_data.shuffle(10000).batch(512), 
                    epochs=10, 
                    validation_data=validation_data.batch(512), 
                    verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


**Evaluate**

In [12]:
result = model.evaluate(test_data.batch(512), verbose=2)
for name, value in zip(model.metrics_names, result):
  print("%s: %.3f" % (name, value))

49/49 - 3s - loss: 0.5808 - accuracy: 0.8055 - 3s/epoch - 67ms/step
loss: 0.581
accuracy: 0.805
