### Set up your TPU environment

In this section, you perform the following tasks:

*   Set up a Colab TPU running environment
*   Verify that you are connected to a TPU device
*   Upload your credentials to the TPU so that it can access your GCS bucket
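
As a minimal sketch of the connection check in the second task: Colab exposes the TPU endpoint through the `COLAB_TPU_ADDR` environment variable, and the gRPC address is that value with a `grpc://` prefix. The helper name `tpu_address_from_env` and the sample address are hypothetical and used only for illustration.

```python
import os


def tpu_address_from_env(env=os.environ):
    """Return the gRPC address of the Colab TPU, or None if no TPU runtime is attached.

    Hypothetical helper, not part of the notebook or of any library.
    """
    addr = env.get('COLAB_TPU_ADDR')
    if not addr:
        return None
    return 'grpc://' + addr


# Explicit mappings stand in for the real environment so the example runs anywhere.
print(tpu_address_from_env({'COLAB_TPU_ADDR': '10.0.0.2:8470'}))
print(tpu_address_from_env({}))
```

Returning `None` rather than raising lets the caller decide whether a missing TPU is fatal.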

In [0]:
import os
import tensorflow as tf
import pprint
import json

In [0]:
tf.test.is_built_with_cuda()

False

In [0]:
tf.test.is_gpu_available()

False

In [0]:
if 'COLAB_TPU_ADDR' in os.environ:
  TPU_ADDRESS = 'grpc://' + os.environ['COLAB_TPU_ADDR']
  print('TPU address is', TPU_ADDRESS)

  from google.colab import auth
  auth.authenticate_user()
  with tf.Session(TPU_ADDRESS) as session:
    print('TPU devices:')
    pprint.pprint(session.list_devices())

    # Upload credentials to TPU.
    with open('/content/adc.json', 'r') as f:
      auth_info = json.load(f)
    tf.contrib.cloud.configure_gcs(session, credentials=auth_info)
    # Now credentials are set for all future sessions on this TPU.
else:
  # No TPU runtime is attached: report it and authenticate the notebook user only.
  print('ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!')
  from google.colab import auth
  auth.authenticate_user()

TPU address is grpc://10.127.63.178:8470
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

TPU devices:
[_DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:CPU:0, CPU, -1, 622230015375567764),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 10707800242603174931),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 2111716893822499807),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 7037966218705789183),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:

In [0]:
import sys
!test -d bert_repo || git clone https://github.com/lapolonio/text_classification_tutorial bert_repo
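# Check for the same path that gets added, so re-running the cell does not duplicate it.
if 'bert_repo/step_3/bert' not in sys.path:
  sys.path += ['bert_repo/step_3/bert']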

Cloning into 'bert_repo'...
remote: Enumerating objects: 121, done.
remote: Counting objects: 100% (121/121), done.
remote: Compressing objects: 100% (69/69), done.
remote: Total 121 (delta 48), reused 112 (delta 39), pack-reused 0
Receiving objects: 100% (121/121), 230.66 KiB | 3.49 MiB/s, done.
Resolving deltas: 100% (48/48), done.


## Evaluate the task on BERT Base

In [0]:
%%bash -s "$TPU_ADDRESS"


export BERT_BASE_DIR=gs://cloud-tpu-checkpoints/bert/uncased_L-12_H-768_A-12
export IMDB_DIR=NOT_USED
export TPU_NAME=$1
export OUTPUT_DIR=gs://tfw_bert_demo/imdb_v1/base_output
export EXPORT_DIR=gs://tfw_bert_demo/imdb_v1/export

time python bert_repo/step_3/bert/run_classifier.py \
  --task_name=IMDB \
  --do_eval=true \
  --data_dir=$IMDB_DIR \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=$OUTPUT_DIR \
  --use_tpu=True \
  --tpu_name=$TPU_NAME

Downloading data from http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz

W1021 03:51:53.964831 140130963367808 module_wrapper.py:139] From bert_repo/step_3/bert/run_classifier.py:895: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W1021 03:51:53.965038 140130963367808 module_wrapper.py:139] From bert_repo/step_3/bert/run_classifier.py:895: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W1021 03:51:53.965470 140130963367808 module_wrapper.py:139] From /content/bert_repo/step_3/bert/modeling.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W1021 03:52:21.718031 1401

## Specify Output Location

In [0]:
EXP_LOC="gs://tfw_bert_demo/imdb_v4"
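
The shell cells in this notebook receive Python variables as positional arguments: a `%%bash -s "$A" "$B"` cell sends its body to bash on stdin, and the values after `-s` show up as `$1` and `$2`. The sketch below reproduces that hand-off with `subprocess`, assuming `bash` is on the path. The address and bucket are made-up stand-ins for `TPU_ADDRESS` and `EXP_LOC`.

```python
import subprocess

# Hypothetical stand-ins for the notebook variables TPU_ADDRESS and EXP_LOC.
tpu_address = 'grpc://10.0.0.2:8470'
exp_loc = 'gs://example-bucket/run_1'

# Same shape as a `%%bash -s "$A" "$B"` cell: the script is read from stdin
# and the arguments after `-s` become the positional parameters $1 and $2.
script = 'echo "TPU_NAME=$1"\necho "OUTPUT_DIR=$2/output/"\n'

result = subprocess.run(
    ['bash', '-s', tpu_address, exp_loc],
    input=script, capture_output=True, text=True, check=True)
print(result.stdout)
```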

## Train, Evaluate, Save Predictions, Export

In [0]:
%%bash -s "$TPU_ADDRESS" "$EXP_LOC"

export BERT_BASE_DIR=gs://cloud-tpu-checkpoints/bert/uncased_L-12_H-768_A-12
export IMDB_DIR=NOT_USED
export TPU_NAME=$1
export OUTPUT_DIR=$2/output/
export EXPORT_DIR=$2/export/

time python bert_repo/step_3/bert/run_classifier.py \
  --task_name=IMDB \
  --do_train=true \
  --do_eval=true \
  --do_predict=true \
  --data_dir=$IMDB_DIR \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=$OUTPUT_DIR \
  --use_tpu=True \
  --tpu_name=$TPU_NAME \
  --do_serve=true \
  --export_dir=$EXPORT_DIR

Downloading data from http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz

W1021 05:28:52.396408 140008509339520 module_wrapper.py:139] From bert_repo/step_3/bert/run_classifier.py:895: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W1021 05:28:52.396672 140008509339520 module_wrapper.py:139] From bert_repo/step_3/bert/run_classifier.py:895: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W1021 05:28:52.397113 140008509339520 module_wrapper.py:139] From /content/bert_repo/step_3/bert/modeling.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W1021 05:29:21.652381 1400

## Print Evaluation

In [0]:
!gsutil cat {EXP_LOC}/output/eval_results.txt

auc = 0.89076
eval_accuracy = 0.89076
eval_loss = 0.5659401
f1_score = 0.8919998
false_negatives = 1222.0
false_positives = 1509.0
global_step = 2343
loss = 0.5943107
precision = 0.88198954
recall = 0.90224
true_negatives = 10991.0
true_positives = 11278.0
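
As a sanity check, the derived metrics above can be recomputed from the four confusion-matrix counts using the standard definitions. This is plain arithmetic on the reported numbers, not the code path `run_classifier.py` uses.

```python
# Confusion-matrix counts copied from eval_results.txt above.
tp, fp, fn, tn = 11278.0, 1509.0, 1222.0, 10991.0

precision = tp / (tp + fp)                           # share of predicted positives that are right
recall = tp / (tp + fn)                              # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two
accuracy = (tp + tn) / (tp + fp + fn + tn)           # share of all examples classified correctly

print(f'precision={precision:.5f} recall={recall:.5f} '
      f'f1={f1:.5f} accuracy={accuracy:.5f}')
```

The recomputed values agree with the reported `precision`, `recall`, `f1_score`, and `eval_accuracy` to within float32 rounding.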


## Load Dev Examples into a DataFrame

In [0]:
from run_classifier import ImdbProcessor
processor = ImdbProcessor()
test_set = processor.get_dev_examples("")

import pandas as pd
dev_df = pd.DataFrame.from_records([s.__dict__ for s in test_set])
dev_df.head()

## Download Saved Predictions and Read into a DataFrame

In [0]:
!gsutil cp {EXP_LOC}/output/test_results.tsv .

labels = processor.get_labels()
test = pd.read_csv("test_results.tsv",
                   sep="\t",
                   header=None,
                   index_col=None,
                   names=labels)
test.head()

## Combine Examples and Predictions

In [0]:
dev_df['pred'] = test.idxmax(axis=1)
dev_df['correct'] = dev_df.label == dev_df.pred
dev_df['pred_confidence'] = test.max(axis=1)
dev_df.head(20)
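
To make the two pandas calls concrete, here is a plain-Python sketch of the same row-wise logic: `idxmax(axis=1)` picks the column name with the largest value in each row, and `max(axis=1)` returns that value. The label names and the two rows are made up for illustration, and the example assumes a headerless, tab-separated file with one probability per label.

```python
import csv
import io

# Hypothetical label names and two made-up rows in a headerless,
# tab-separated layout (one probability per label, in label order).
labels = ['neg', 'pos']
tsv_text = '0.9\t0.1\n0.2\t0.8\n'

# One dict per row, mapping each label to its probability.
rows = [
    dict(zip(labels, map(float, fields)))
    for fields in csv.reader(io.StringIO(tsv_text), delimiter='\t')
]

# Equivalent of test.idxmax(axis=1): the label with the highest probability.
preds = [max(row, key=row.get) for row in rows]
# Equivalent of test.max(axis=1): that highest probability itself.
confidences = [max(row.values()) for row in rows]

print(preds)
print(confidences)
```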

## Calculate F1

In [0]:
from sklearn.metrics import f1_score
f1_score(dev_df.label, dev_df.pred, pos_label="positive")

In [0]:
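from sklearn.metrics import classification_report

# Use the combined dataframe built above: true labels vs. predicted labels.
report = classification_report(
        dev_df.label, dev_df.pred,
        labels=labels)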

print(report)