### Set up your TPU environment

In this section, you perform the following tasks:

*   Set up a Colab TPU running environment
*   Verify that you are connected to a TPU device
*   Upload your credentials to the TPU so that it can access your GCS bucket

In [0]:
import os
import tensorflow as tf
import pprint
import json

In [0]:
tf.test.is_built_with_cuda()

False

In [0]:
tf.test.is_gpu_available()

False

In [0]:
from google.colab import auth

if 'COLAB_TPU_ADDR' in os.environ:
  TPU_ADDRESS = 'grpc://' + os.environ['COLAB_TPU_ADDR']
  print('TPU address is', TPU_ADDRESS)

  auth.authenticate_user()
  with tf.Session(TPU_ADDRESS) as session:
    print('TPU devices:')
    pprint.pprint(session.list_devices())

    # Upload credentials to TPU.
    with open('/content/adc.json', 'r') as f:
      auth_info = json.load(f)
    tf.contrib.cloud.configure_gcs(session, credentials=auth_info)
    # Now credentials are set for all future sessions on this TPU.
else:
  # No TPU runtime is attached: report it, but still authenticate for GCS access.
  print('ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!')
  auth.authenticate_user()

TPU address is grpc://10.114.53.234:8470
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

TPU devices:
[_DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:CPU:0, CPU, -1, 15369536310085832916),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 14923586399794998255),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 15375354203232184094),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 6678090521930828593),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/devi
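
The connection check in the cell above can be isolated into a small helper that is easy to test outside Colab. This is a minimal sketch rather than part of the original notebook: `resolve_tpu_address` is a hypothetical name, and the only thing it assumes is what the cell already relies on, namely that Colab exposes the TPU endpoint through the `COLAB_TPU_ADDR` environment variable.

```python
import os


def resolve_tpu_address(environ=None):
    """Return the gRPC address of the Colab TPU, or None if no TPU runtime is attached.

    Hypothetical helper for illustration; `environ` defaults to os.environ
    and can be replaced by a plain dict in tests.
    """
    environ = os.environ if environ is None else environ
    addr = environ.get('COLAB_TPU_ADDR')
    if not addr:
        return None
    return 'grpc://' + addr


# With a TPU runtime the variable holds "host:port"; without one it is absent.
print(resolve_tpu_address({'COLAB_TPU_ADDR': '10.114.53.234:8470'}))
print(resolve_tpu_address({}))
```

Returning `None` instead of printing lets the caller decide whether a missing TPU is an error or just a reason to skip the TPU-specific setup.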

In [0]:
import sys
!test -d bert_repo || git clone https://github.com/lapolonio/text_classification_tutorial bert_repo
if 'bert_repo/step_3/bert' not in sys.path:
  sys.path += ['bert_repo/step_3/bert']

Cloning into 'bert_repo'...
remote: Enumerating objects: 207, done.
remote: Counting objects: 100% (207/207), done.
remote: Compressing objects: 100% (115/115), done.
remote: Total 207 (delta 102), reused 182 (delta 78), pack-reused 0
Receiving objects: 100% (207/207), 406.72 KiB | 4.37 MiB/s, done.
Resolving deltas: 100% (102/102), done.


## Specify Output Location

In [0]:
EXP_LOC="gs://tfw-text-classification/imdb_v1"

## Evaluate the task on BERT Base

In [0]:
%%bash -s "$TPU_ADDRESS" "$EXP_LOC"


export BERT_BASE_DIR=gs://cloud-tpu-checkpoints/bert/uncased_L-12_H-768_A-12
export IMDB_DIR=NOT_USED
export TPU_NAME=$1
export OUTPUT_DIR=$2/base_output
export EXPORT_DIR=$2/export

time python bert_repo/step_3/bert/run_classifier.py \
  --task_name=IMDB \
  --do_eval=true \
  --data_dir=$IMDB_DIR \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=$OUTPUT_DIR \
  --use_tpu=True \
  --tpu_name=$TPU_NAME

Downloading data from http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz





W1029 17:10:14.529710 140078959302528 module_wrapper.py:139] From bert_repo/step_3/bert/run_classifier.py:895: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W1029 17:10:14.530269 140078959302528 module_wrapper.py:139] From bert_repo/step_3/bert/run_classifier.py:895: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W1029 17:10:14.530759 140078959302528 module_wrapper.py:139] From /content/bert_repo/step_3/bert/modeling.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

W1029 17:10:43.537277 1400

## Train, Evaluate, Save Predictions, Export

In [0]:
%%bash -s "$TPU_ADDRESS" "$EXP_LOC"

export BERT_BASE_DIR=gs://cloud-tpu-checkpoints/bert/uncased_L-12_H-768_A-12
export IMDB_DIR=NOT_USED
export TPU_NAME=$1
export OUTPUT_DIR=$2/output/
export EXPORT_DIR=$2/export/

time python bert_repo/step_3/bert/run_classifier.py \
  --task_name=IMDB \
  --do_train=true \
  --do_eval=true \
  --do_predict=true \
  --data_dir=$IMDB_DIR \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=$OUTPUT_DIR \
  --use_tpu=True \
  --tpu_name=$TPU_NAME \
  --do_serve=true \
  --export_dir=$EXPORT_DIR




W1029 17:17:31.503656 140251690563456 module_wrapper.py:139] From bert_repo/step_3/bert/run_classifier.py:895: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W1029 17:17:31.504139 140251690563456 module_wrapper.py:139] From bert_repo/step_3/bert/run_classifier.py:895: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W1029 17:17:31.504595 140251690563456 module_wrapper.py:139] From /content/bert_repo/step_3/bert/modeling.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

W1029 17:18:13.317670 1402
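
As a sanity check on the length of the training run, the number of optimizer steps can be worked out from the flags above. The sketch below rests on two assumptions that the notebook does not state: that the IMDB training split holds 25,000 examples (the size of the standard aclImdb training set), and that the script computes its step count as examples divided by batch size, times epochs, truncated to an integer, which is how the upstream BERT `run_classifier.py` does it. `estimate_train_steps` is a hypothetical helper name. Under those assumptions the result agrees with the `global_step` value written to `eval_results.txt`.

```python
def estimate_train_steps(num_examples, batch_size, num_epochs):
    """Number of training steps, truncated toward zero (assumed formula)."""
    return int(num_examples / batch_size * num_epochs)


# 25000 examples is an assumption; batch size and epochs come from the flags above.
steps = estimate_train_steps(num_examples=25000, batch_size=32, num_epochs=3.0)
print(steps)  # 2343
```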

## Print Evaluation

In [0]:
!gsutil cat {EXP_LOC}/output/eval_results.txt

auc = 0.89
eval_accuracy = 0.89
eval_loss = 0.58286726
f1_score = 0.8911666
false_negatives = 1241.0
false_positives = 1509.0
global_step = 2343
loss = 0.5831143
precision = 0.8818139
recall = 0.90072
true_negatives = 10991.0
true_positives = 11259.0
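
The precision, recall, F1, and accuracy figures follow directly from the four confusion counts in the same file. A short worked example, treating `positive` as the positive class (`binary_metrics` is a hypothetical helper written for this illustration, not a function from the repository):

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'eval_accuracy': accuracy,
    }


# Counts copied from eval_results.txt above.
metrics = binary_metrics(tp=11259, fp=1509, fn=1241, tn=10991)
for name, value in metrics.items():
    print('{} = {:.7f}'.format(name, value))
```

The computed values match the reported `precision`, `recall`, `f1_score`, and `eval_accuracy` to the printed precision.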


## Get Dev Examples into dataframe

In [0]:
from run_classifier import ImdbProcessor
processor = ImdbProcessor()
test_set = processor.get_dev_examples("")

import pandas as pd
dev_df = pd.DataFrame.from_records([s.__dict__ for s in test_set])
dev_df.head()





|   | guid | text_a | text_b | label |
|---|------|--------|--------|-------|
| 0 | 3992 | That's right. Ohwon (the painter and the main ... | | positive |
| 1 | 5251 | What. Uh...<br /><br />This movie is so dissoc... | | negative |
| 2 | 1448 | Frank Sinatra took this role, chewed it up wit... | | positive |
| 3 | 6445 | It is fitting that the title character in Sydn... | | negative |
| 4 | 578 | Divorced single mom in picturesque seaside tow... | | negative |


## Get saved predictions and read into dataframe

In [0]:
!gsutil cp {EXP_LOC}/output/test_results.tsv .

labels = processor.get_labels()
test = pd.read_csv("test_results.tsv",
                   sep="\t",
                   header=None,
                   index_col=None,
                   names=labels)
test.head()

Copying gs://tfw-text-classification/imdb_v1/output/test_results.tsv...
/ [1 files][574.3 KiB/574.3 KiB]                                                
Operation completed over 1 objects/574.3 KiB.                                    


|   | negative | positive |
|---|----------|----------|
| 0 | 0.00037 | 0.99963 |
| 1 | 0.999588 | 0.000412 |
| 2 | 0.003172 | 0.996828 |
| 3 | 0.999599 | 0.000401 |
| 4 | 0.999596 | 0.000404 |
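
For readers without pandas at hand, the same parsing step can be sketched with the standard library. The snippet assumes, as the cell above does, that `test_results.tsv` has no header row and holds one tab-separated probability per label, in the order returned by `processor.get_labels()`. The label order and the two sample rows are copied from the output above; `read_predictions` is a hypothetical helper name.

```python
import csv
import io

# Assumed label order, matching the column names in the output above.
labels = ['negative', 'positive']

# Two rows in the same layout as test_results.tsv (values as displayed above).
sample_tsv = '0.00037\t0.99963\n0.999588\t0.000412\n'


def read_predictions(handle, labels):
    """Parse header-less TSV rows into one {label: probability} dict per row."""
    rows = []
    for fields in csv.reader(handle, delimiter='\t'):
        if not fields:
            continue  # skip blank lines
        rows.append({label: float(value) for label, value in zip(labels, fields)})
    return rows


predictions = read_predictions(io.StringIO(sample_tsv), labels)
print(predictions[0])
```

To read the downloaded file instead of the inline sample, pass `open('test_results.tsv', newline='')` as the handle.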


## Combine Examples and Predictions

In [0]:
dev_df['pred'] = test.idxmax(axis=1)
dev_df['correct'] = dev_df.label == dev_df.pred
dev_df['pred_confidence'] = test.max(axis=1)
dev_df.head(20)

|   | guid | text_a | text_b | label | pred | correct | pred_confidence |
|---|------|--------|--------|-------|------|---------|-----------------|
| 0 | 3992 | That's right. Ohwon (the painter and the main ... | | positive | positive | True | 0.99963 |
| 1 | 5251 | What. Uh...<br /><br />This movie is so dissoc... | | negative | negative | True | 0.999588 |
| 2 | 1448 | Frank Sinatra took this role, chewed it up wit... | | positive | positive | True | 0.996828 |
| 3 | 6445 | It is fitting that the title character in Sydn... | | negative | negative | True | 0.999599 |
| 4 | 578 | Divorced single mom in picturesque seaside tow... | | negative | negative | True | 0.999596 |
| 5 | 11578 | I have seen 'The Sea Within' today and I loved... | | positive | positive | True | 0.999496 |
| 6 | 1763 | Could this be one of the earliest colour films... | | positive | positive | True | 0.99953 |
| 7 | 7895 | I watched this movie with my boyfriend, an avi... | | negative | negative | True | 0.999503 |
| 8 | 9327 | No matter how you feel about Michael Jackson h... | | positive | positive | True | 0.999568 |
| 9 | 3209 | Hands down, the best drama/comedy show on tele... | | positive | positive | True | 0.999608 |
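
The three derived columns are simple row-wise operations: `idxmax(axis=1)` picks the label with the highest probability, `max(axis=1)` keeps that probability, and `correct` compares the pick with the gold label. A dependency-free sketch of the same logic, using the first three rows shown above (`combine` is a hypothetical helper name):

```python
# Gold labels and predicted probabilities for the first three dev examples above.
gold_labels = ['positive', 'negative', 'positive']
probabilities = [
    {'negative': 0.00037, 'positive': 0.99963},
    {'negative': 0.999588, 'positive': 0.000412},
    {'negative': 0.003172, 'positive': 0.996828},
]


def combine(gold_labels, probabilities):
    """Attach the predicted label, its correctness, and its confidence to each example."""
    combined = []
    for label, probs in zip(gold_labels, probabilities):
        pred = max(probs, key=probs.get)  # label with the highest probability
        combined.append({
            'label': label,
            'pred': pred,
            'correct': label == pred,
            'pred_confidence': probs[pred],
        })
    return combined


rows = combine(gold_labels, probabilities)
for row in rows:
    print(row)
```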


## Calculate F1

In [0]:
from sklearn.metrics import f1_score
f1_score(dev_df.label, dev_df.pred, pos_label="positive")

0.8911666930504986

In [0]:
from sklearn.metrics import classification_report
report = classification_report(
        dev_df.label, dev_df.pred,
        labels=labels)

print(report)

              precision    recall  f1-score   support

    negative       0.90      0.88      0.89     12500
    positive       0.88      0.90      0.89     12500

    accuracy                           0.89     25000
   macro avg       0.89      0.89      0.89     25000
weighted avg       0.89      0.89      0.89     25000



In [0]:
!saved_model_cli show --dir gs://tfw-text-classification/imdb_v1/export/1572370167 --tag_set serve --signature_def serving_default

The given SavedModel SignatureDef contains the following input(s):
  inputs['examples'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: serving_input_fn/input_example_tensor:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['probabilities'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 2)
      name: loss/Softmax:0
Method name is: tensorflow/serving/predict


In [0]:
!saved_model_cli run --dir gs://tfw-text-classification/imdb_v1/export/1572370167 --tag_set serve --signature_def serving_default \
--input_examples 'examples=[{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()}]'


2019-10-29 17:36:13.025632: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-10-29 17:36:13.026081: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55e5d8f11480 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2019-10-29 17:36:13.026128: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
W1029 17:36:13.107374 140620614846336 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow_core/python/tools/saved_model_cli.py:420: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
Result for output key probabilities:
[[0.01347715 

In [0]:
!saved_model_cli run --dir gs://tfw-text-classification/imdb_v1/export/1572370167 --tag_set serve --signature_def serving_default \
--input_examples 'examples=[{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()},{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()},{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()}]'


2019-10-29 17:37:23.202210: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-10-29 17:37:23.202505: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b55b857100 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2019-10-29 17:37:23.202575: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
W1029 17:37:23.203001 139647937017728 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow_core/python/tools/saved_model_cli.py:420: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
Result for output key probabilities:
[[0.01347719 
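
The batched call above repeats one feature dictionary three times by hand. A small helper can generate the `--input_examples` value for any batch size. This is a sketch only: `build_input_examples_arg` is a hypothetical name, the feature names and the sequence length of 128 are copied from the cells above, and the all-zero features are placeholders rather than real tokenized text.

```python
def build_input_examples_arg(batch_size, seq_length=128, input_key='examples'):
    """Build the string passed to `saved_model_cli run --input_examples`.

    The numpy calls stay as text because saved_model_cli evaluates the
    expression itself, as in the commands above.
    """
    zeros = 'np.zeros(({}), dtype=int).tolist()'.format(seq_length)
    example = (
        '{{"input_ids":{z},"input_mask":{z},'
        '"label_ids":[0],"segment_ids":{z}}}'
    ).format(z=zeros)
    return '{}=[{}]'.format(input_key, ','.join([example] * batch_size))


arg = build_input_examples_arg(3)
print(arg)
```

The printed string can be pasted, wrapped in single quotes, after `--input_examples`; with `batch_size=1` it reproduces the argument of the single-example call.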