### Set up your TPU environment

In this section, you perform the following tasks:

*   Set up a Colab TPU running environment
*   Verify that you are connected to a TPU device
*   Upload your credentials to the TPU so that it can access your GCS bucket

In [0]:
import os
import tensorflow as tf
import pprint
import json

In [3]:
tf.test.is_built_with_cuda()

False

In [4]:
tf.test.is_gpu_available()

False

In [5]:
if 'COLAB_TPU_ADDR' in os.environ:
  TPU_ADDRESS = 'grpc://' + os.environ['COLAB_TPU_ADDR']
  print('TPU address is', TPU_ADDRESS)

  from google.colab import auth
  auth.authenticate_user()
  with tf.Session(TPU_ADDRESS) as session:
    print('TPU devices:')
    pprint.pprint(session.list_devices())

    # Upload credentials to TPU.
    with open('/content/adc.json', 'r') as f:
      auth_info = json.load(f)
    tf.contrib.cloud.configure_gcs(session, credentials=auth_info)
    # Now credentials are set for all future sessions on this TPU.
else:
  # No TPU runtime is attached: report it, but still authenticate for GCS access.
  print('ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!')
  from google.colab import auth
  auth.authenticate_user()

TPU address is grpc://10.29.196.82:8470
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

TPU devices:
[_DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:CPU:0, CPU, -1, 4254711938935201463),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 13732597338811708297),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 5113040703751758512),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 2538362841378245928),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:

In [6]:
import sys
!test -d bert_repo || git clone https://github.com/lapolonio/text_classification_tutorial bert_repo
if 'bert_repo/step_3/bert' not in sys.path:
  sys.path += ['bert_repo/step_3/bert']

Cloning into 'bert_repo'...
remote: Enumerating objects: 167, done.
remote: Counting objects: 100% (167/167), done.
remote: Compressing objects: 100% (97/97), done.
remote: Total 167 (delta 73), reused 150 (delta 56), pack-reused 0
Receiving objects: 100% (167/167), 322.72 KiB | 4.54 MiB/s, done.
Resolving deltas: 100% (73/73), done.


## Specify Output Location

In [0]:
EXP_LOC="gs://another-test-123rfae/imdb_v1"
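
The bash cells in the following sections build their output locations by appending fixed suffixes to `EXP_LOC`. The sketch below lists those locations in one place; `derived_paths` is a hypothetical helper introduced only for illustration, and the suffixes are copied from the `export` lines and `gsutil` commands in this notebook.

```python
import posixpath

EXP_LOC = "gs://another-test-123rfae/imdb_v1"


def derived_paths(exp_loc):
    """Hypothetical helper: mirror the $2/... paths used by the bash cells."""
    return {
        "base_output": posixpath.join(exp_loc, "base_output"),
        "output": posixpath.join(exp_loc, "output"),
        "export": posixpath.join(exp_loc, "export"),
        "eval_results": posixpath.join(exp_loc, "output", "eval_results.txt"),
        "test_results": posixpath.join(exp_loc, "output", "test_results.tsv"),
    }


for name, path in sorted(derived_paths(EXP_LOC).items()):
    print("{} -> {}".format(name, path))
```

Using `posixpath` rather than `os.path` keeps the `/` separator that GCS URLs expect, whatever platform runs the helper.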

## Evaluate the task on BERT Base

In [7]:
%%bash -s "$TPU_ADDRESS" "$EXP_LOC"


export BERT_BASE_DIR=gs://cloud-tpu-checkpoints/bert/uncased_L-12_H-768_A-12
export IMDB_DIR=NOT_USED
export TPU_NAME=$1
export OUTPUT_DIR=$2/base_output
export EXPORT_DIR=$2/export

time python bert_repo/step_3/bert/run_classifier.py \
  --task_name=IMDB \
  --do_eval=true \
  --data_dir=$IMDB_DIR \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=$OUTPUT_DIR \
  --use_tpu=True \
  --tpu_name=$TPU_NAME

Downloading data from http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz





W1029 12:39:30.634656 140257163392896 module_wrapper.py:139] From bert_repo/step_3/bert/run_classifier.py:895: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W1029 12:39:30.634944 140257163392896 module_wrapper.py:139] From bert_repo/step_3/bert/run_classifier.py:895: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W1029 12:39:30.635475 140257163392896 module_wrapper.py:139] From /content/bert_repo/step_3/bert/modeling.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

W1029 12:40:03.736118 1402

## Train, Evaluate, Save Predictions, Export

In [11]:
%%bash -s "$TPU_ADDRESS" "$EXP_LOC"

export BERT_BASE_DIR=gs://cloud-tpu-checkpoints/bert/uncased_L-12_H-768_A-12
export IMDB_DIR=NOT_USED
export TPU_NAME=$1
export OUTPUT_DIR=$2/output/
export EXPORT_DIR=$2/export/

time python bert_repo/step_3/bert/run_classifier.py \
  --task_name=IMDB \
  --do_train=true \
  --do_eval=true \
  --do_predict=true \
  --data_dir=$IMDB_DIR \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=$OUTPUT_DIR \
  --use_tpu=True \
  --tpu_name=$TPU_NAME \
  --do_serve=true \
  --export_dir=$EXPORT_DIR

Downloading data from http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz





W1029 12:42:42.767949 139872277211008 module_wrapper.py:139] From bert_repo/step_3/bert/run_classifier.py:895: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W1029 12:42:42.768530 139872277211008 module_wrapper.py:139] From bert_repo/step_3/bert/run_classifier.py:895: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W1029 12:42:42.769013 139872277211008 module_wrapper.py:139] From /content/bert_repo/step_3/bert/modeling.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

W1029 12:43:13.678692 1398
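
The number of optimizer steps follows from the flags passed above. The sketch below rests on two assumptions that this notebook does not confirm: that the script keeps the upstream BERT `run_classifier.py` convention of truncating `examples / batch_size * epochs` to an integer, and that the IMDB training split contains 25,000 reviews. `num_train_steps` is a hypothetical helper name.

```python
def num_train_steps(num_examples, batch_size, num_epochs):
    """Assumed step count: truncate examples / batch_size * epochs."""
    return int(float(num_examples) / batch_size * num_epochs)


# Flags from the training cell: --train_batch_size=32, --num_train_epochs=3.0.
# The 25,000-example training set size is an assumption.
steps = num_train_steps(num_examples=25000, batch_size=32, num_epochs=3.0)
print(steps)
```

Under these assumptions the result agrees with the `global_step` value reported in `eval_results.txt`.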

## Print Evaluation

In [12]:
!gsutil cat {EXP_LOC}/output/eval_results.txt

auc = 0.88943994
eval_accuracy = 0.88944
eval_loss = 0.5878545
f1_score = 0.8902826
false_negatives = 1286.0
false_positives = 1478.0
global_step = 2343
loss = 0.5582262
precision = 0.8835487
recall = 0.89712
true_negatives = 11022.0
true_positives = 11214.0
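
The headline metrics can be recomputed from the four confusion counts, which is a useful sanity check on the file. The sketch below uses only the counts printed above and treats `positive` as the positive class; the variable names are illustrative.

```python
# Confusion counts copied from eval_results.txt.
tp, fp, fn, tn = 11214.0, 1478.0, 1286.0, 11022.0

precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print("precision = {:.7f}".format(precision))
print("recall    = {:.5f}".format(recall))
print("f1_score  = {:.7f}".format(f1))
print("accuracy  = {:.5f}".format(accuracy))
```

The recomputed values agree with the reported `precision`, `recall`, `f1_score`, and `eval_accuracy` up to float32 rounding.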


## Get Dev Examples into dataframe

In [13]:
from run_classifier import ImdbProcessor
processor = ImdbProcessor()
test_set = processor.get_dev_examples("")

import pandas as pd
dev_df = pd.DataFrame.from_records([s.__dict__ for s in test_set])
dev_df.head()





|   | guid | text_a | text_b | label |
|---|------|--------|--------|-------|
| 0 | 6305 | Maybe my rating should have been a 9, but the ... |  | positive |
| 1 | 219 | This movie is one of the worst movies I have e... |  | negative |
| 2 | 1382 | Got the chance to see this at a friend's house... |  | positive |
| 3 | 7548 | The premise of Bottom crossed with Fawlty Towe... |  | negative |
| 4 | 8186 | If Jacqueline McKenzie and John Lynch weren't ... |  | negative |


## Get saved predictions and read into dataframe

In [14]:
!gsutil cp {EXP_LOC}/output/test_results.tsv .

labels = processor.get_labels()
test = pd.read_csv("test_results.tsv",
                   sep="\t",
                   header=None,
                   index_col=None,
                   names=labels)
test.head()

Copying gs://another-test-123rfae/imdb_v1/output/test_results.tsv...
/ [1 files][572.9 KiB/572.9 KiB]                                                
Operation completed over 1 objects/572.9 KiB.                                    


|   | negative | positive |
|---|----------|----------|
| 0 | 0.000449 | 0.999551 |
| 1 | 0.99959 | 0.00041 |
| 2 | 0.990363 | 0.009637 |
| 3 | 0.999421 | 0.000579 |
| 4 | 0.999529 | 0.000471 |


## Combine Examples and Predictions

In [15]:
dev_df['pred'] = test.idxmax(axis=1)
dev_df['correct'] = dev_df.label == dev_df.pred
dev_df['pred_confidence'] = test.max(axis=1)
dev_df.head(20)

|   | guid | text_a | text_b | label | pred | correct | pred_confidence |
|---|------|--------|--------|-------|------|---------|-----------------|
| 0 | 6305 | Maybe my rating should have been a 9, but the ... |  | positive | positive | True | 0.999551 |
| 1 | 219 | This movie is one of the worst movies I have e... |  | negative | negative | True | 0.99959 |
| 2 | 1382 | Got the chance to see this at a friend's house... |  | positive | negative | False | 0.990363 |
| 3 | 7548 | The premise of Bottom crossed with Fawlty Towe... |  | negative | negative | True | 0.999421 |
| 4 | 8186 | If Jacqueline McKenzie and John Lynch weren't ... |  | negative | negative | True | 0.999529 |
| 5 | 5238 | Sweeping and still impressive early Talkie Wes... |  | positive | positive | True | 0.999359 |
| 6 | 11727 | This was one of those films I would always com... |  | positive | positive | True | 0.998996 |
| 7 | 509 | I watched the 219 minute version and have to s... |  | negative | negative | True | 0.999612 |
| 8 | 6632 | Shame represents a high point in the career of... |  | positive | positive | True | 0.998933 |
| 9 | 11734 | The above profile was written by me when I use... |  | positive | positive | True | 0.997996 |
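
To make the three derived columns concrete, here is a dependency-free sketch of what `idxmax(axis=1)`, the equality check, and `max(axis=1)` compute for each row. It uses the first five prediction rows and gold labels shown earlier in this notebook; the variable names are illustrative.

```python
labels = ["negative", "positive"]

# First five rows of test_results.tsv: (P(negative), P(positive)).
probs = [
    (0.000449, 0.999551),
    (0.99959, 0.00041),
    (0.990363, 0.009637),
    (0.999421, 0.000579),
    (0.999529, 0.000471),
]
# Gold labels of the first five dev examples.
gold = ["positive", "negative", "positive", "negative", "negative"]

rows = []
for row, label in zip(probs, gold):
    # Index of the largest probability, i.e. idxmax(axis=1).
    best = max(range(len(labels)), key=lambda i: row[i])
    pred = labels[best]
    # (pred, correct, pred_confidence), as in the combined DataFrame.
    rows.append((pred, pred == label, row[best]))

for r in rows:
    print(r)
```

Row 2 is the instructive case: the model assigns 0.990363 to `negative` although the gold label is `positive`, so a high `pred_confidence` does not by itself imply a correct prediction.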


## Calculate F1

In [16]:
from sklearn.metrics import f1_score
f1_score(dev_df.label, dev_df.pred, pos_label="positive")

0.8902826294061607

In [20]:
from sklearn.metrics import classification_report
report = classification_report(
        dev_df.label, dev_df.pred,
        labels=labels)

print(report)

              precision    recall  f1-score   support

    negative       0.90      0.88      0.89     12500
    positive       0.88      0.90      0.89     12500

    accuracy                           0.89     25000
   macro avg       0.89      0.89      0.89     25000
weighted avg       0.89      0.89      0.89     25000
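
The per-class rows of the report can be reproduced from the confusion counts in `eval_results.txt` by swapping which class counts as the "hit". This is a minimal sketch; `prf` is a hypothetical helper, and the counts are copied from the evaluation output earlier in this notebook.

```python
# Confusion counts from eval_results.txt, with "positive" as the positive class.
tp, fp, fn, tn = 11214.0, 1478.0, 1286.0, 11022.0


def prf(hits, false_alarms, misses):
    """Return (precision, recall, f1) for one class."""
    precision = hits / (hits + false_alarms)
    recall = hits / (hits + misses)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


per_class = {
    # For "negative", a false negative is a wrong "negative" prediction
    # (false alarm) and a false positive is a missed negative.
    "negative": prf(tn, fn, fp),
    "positive": prf(tp, fp, fn),
}

for name in ("negative", "positive"):
    print(name, ["{:.2f}".format(v) for v in per_class[name]])
```

Rounded to two decimals, both rows agree with the precision, recall, and f1-score columns of the report.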



In [21]:
!saved_model_cli show --dir gs://another-test-123rfae/imdb_v1/export/1572353664 --tag_set serve --signature_def serving_default

The given SavedModel SignatureDef contains the following input(s):
  inputs['examples'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: serving_input_fn/input_example_tensor:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['probabilities'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 2)
      name: loss/Softmax:0
Method name is: tensorflow/serving/predict
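
The `saved_model_cli run` cells that follow pass a long, hand-written `--input_examples` string. As a sketch, such a string could instead be generated from plain Python lists. The feature names and the sequence length of 128 come from this notebook; `zero_example` and `input_examples_arg` are hypothetical helpers, and the assumption that a literal list of dictionaries is accepted in place of the `np.zeros(...)` expressions has not been verified against the CLI here.

```python
import ast

MAX_SEQ_LENGTH = 128  # matches --max_seq_length in the training cell


def zero_example(seq_len=MAX_SEQ_LENGTH):
    """Hypothetical helper: one all-zero feature dictionary."""
    return {
        "input_ids": [0] * seq_len,
        "input_mask": [0] * seq_len,
        "segment_ids": [0] * seq_len,
        "label_ids": [0],
    }


def input_examples_arg(num_examples, key="examples"):
    """Hypothetical helper: build the 'examples=[...]' argument string."""
    examples = [zero_example() for _ in range(num_examples)]
    return "{}={!r}".format(key, examples)


arg = input_examples_arg(3)

# Round-trip check: the value part parses back as a plain Python literal.
key, _, value = arg.partition("=")
parsed = ast.literal_eval(value)
print(key, len(parsed), len(parsed[0]["input_ids"]))
```

Generating the string this way keeps the batch size in one place and avoids copy-paste errors when testing with more than one example.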


In [23]:
!saved_model_cli run --dir gs://another-test-123rfae/imdb_v1/export/1572353664 --tag_set serve --signature_def serving_default \
--input_examples 'examples=[{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()}]'


2019-10-29 13:01:52.419762: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-10-29 13:01:52.420239: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560f19931480 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2019-10-29 13:01:52.420294: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
W1029 13:01:52.489428 140193349678976 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow_core/python/tools/saved_model_cli.py:420: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
Result for output key probabilities:
[[0.06587327 

In [24]:
!saved_model_cli run --dir gs://another-test-123rfae/imdb_v1/export/1572353664 --tag_set serve --signature_def serving_default \
--input_examples 'examples=[{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()},{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()},{"input_ids":np.zeros((128), dtype=int).tolist(),"input_mask":np.zeros((128), dtype=int).tolist(),"label_ids":[0],"segment_ids":np.zeros((128), dtype=int).tolist()}]'


2019-10-29 13:02:52.966622: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-10-29 13:02:52.966954: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x559a2e847480 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2019-10-29 13:02:52.966999: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
W1029 13:02:52.967557 140059193669504 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow_core/python/tools/saved_model_cli.py:420: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
Result for output key probabilities:
[[0.0658721 0