#Predicting Movie Review Sentiment with BERT on TF Hub

If you’ve been following Natural Language Processing over the past year, you’ve probably heard of BERT: Bidirectional Encoder Representations from Transformers. It’s a neural network architecture designed by Google researchers that’s totally transformed what’s state-of-the-art for NLP tasks, like text classification, translation, summarization, and question answering.

Now that BERT's been added to [TF Hub](https://www.tensorflow.org/hub) as a loadable module, it's easy(ish) to add into existing TensorFlow text pipelines. In an existing pipeline, BERT can replace text embedding layers like ELMo and GloVe. Alternatively, [fine-tuning](http://wiki.fast.ai/index.php/Fine_tuning) BERT can provide both an accuracy boost and faster training time in many cases.

Here, we'll train a model to predict whether an IMDB movie review is positive or negative using BERT in TensorFlow with TF Hub. Some code was adapted from [this Colab notebook](https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb). Let's get started!

In [0]:
from sklearn.model_selection import train_test_split
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
from datetime import datetime

In addition to the standard libraries we imported above, we'll need to install BERT's Python package.

In [0]:
!pip install bert-tensorflow



In [0]:
import bert
from bert import run_classifier
from bert import optimization
from bert import tokenization

Below, we'll set an output directory location to store our model output and checkpoints. This can be a local directory, in which case you'd set OUTPUT_DIR to the name of the directory you'd like to create. If you're running this code in Google's hosted Colab, the directory won't persist after the Colab session ends.

Alternatively, if you're a GCP user, you can store output in a GCP bucket. To do that, set a directory name in OUTPUT_DIR and the name of the GCP bucket in the BUCKET field.

Set DO_DELETE to delete OUTPUT_DIR and recreate it if it already exists. Otherwise, TensorFlow will load any existing model checkpoints from that directory.

In [0]:
# Set the output directory for saving model file
# Optionally, set a GCP bucket location

OUTPUT_DIR = 'bert_model'#@param {type:"string"}
#@markdown Whether or not to clear/delete the directory and create a new one
DO_DELETE = False #@param {type:"boolean"}
#@markdown Set USE_BUCKET and BUCKET if you want to (optionally) store model output on GCP bucket.
USE_BUCKET = False #@param {type:"boolean"}
BUCKET = 'BUCKET_NAME' #@param {type:"string"}

if USE_BUCKET:
  OUTPUT_DIR = 'gs://{}/{}'.format(BUCKET, OUTPUT_DIR)
  from google.colab import auth
  auth.authenticate_user()

if DO_DELETE:
  try:
    tf.gfile.DeleteRecursively(OUTPUT_DIR)
  except:
    # Doesn't matter if the directory didn't exist
    pass
tf.gfile.MakeDirs(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))


***** Model output directory: bert_model *****


#Data

First, let's download the dataset, hosted by Stanford. The code below, which downloads, extracts, and imports the IMDB Large Movie Review Dataset, is borrowed from [this TensorFlow tutorial](https://www.tensorflow.org/hub/tutorials/text_classification_with_tf_hub).

In [0]:
from tensorflow import keras
import os
import re

# Load all files from a directory in a DataFrame.
def load_directory_data(directory):
  data = {}
  data["sentence"] = []
  data["sentiment"] = []
  for file_path in os.listdir(directory):
    with tf.gfile.GFile(os.path.join(directory, file_path), "r") as f:
      data["sentence"].append(f.read())
      # Raw string so the regex escapes (\d, \.) are not treated as string escapes.
      data["sentiment"].append(re.match(r"\d+_(\d+)\.txt", file_path).group(1))
  return pd.DataFrame.from_dict(data)

# Merge positive and negative examples, add a polarity column and shuffle.
def load_dataset(directory):
  pos_df = load_directory_data(os.path.join(directory, "pos"))
  neg_df = load_directory_data(os.path.join(directory, "neg"))
  pos_df["polarity"] = 1
  neg_df["polarity"] = 0
  return pd.concat([pos_df, neg_df]).sample(frac=1).reset_index(drop=True)

# Download and process the dataset files.
def download_and_load_datasets(force_download=False):
  dataset = tf.keras.utils.get_file(
      fname="aclImdb.tar.gz", 
      origin="http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz", 
      extract=True)
  
  train_df = load_dataset(os.path.join(os.path.dirname(dataset), 
                                       "aclImdb", "train"))
  test_df = load_dataset(os.path.join(os.path.dirname(dataset), 
                                      "aclImdb", "test"))
  
  return train_df, test_df


In [0]:
train, test = download_and_load_datasets()
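
The `sentiment` column is parsed from each review's file name. As a minimal sketch, assuming file names of the form `<id>_<rating>.txt` (which is what the regular expression in `load_directory_data` expects), the extraction looks roughly like this. The helper name `rating_from_filename` and the sample file names are made up for illustration.

```python
import re

# Same pattern as in load_directory_data, compiled once as a raw string.
FILENAME_PATTERN = re.compile(r"\d+_(\d+)\.txt")

def rating_from_filename(file_name):
    """Hypothetical helper: return the rating encoded in a file name, as a string."""
    match = FILENAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError("unexpected file name: {}".format(file_name))
    return match.group(1)

# Made-up file names for illustration only.
print(rating_from_filename("127_9.txt"))   # 9
print(rating_from_filename("4031_2.txt"))  # 2
```

Note that the rating stays a string here, just as it does in the `sentiment` column; the binary `polarity` label comes from the `pos`/`neg` directory instead.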

To keep training fast, we'll take a random sample of 5,000 examples from each of the train and test sets.

In [0]:
train = train.sample(5000)
test = test.sample(5000)

In [0]:
train.columns

Index(['sentence', 'sentiment', 'polarity'], dtype='object')

Our input data is the 'sentence' column and our label is the 'polarity' column (0 for negative and 1 for positive).

In [0]:
DATA_COLUMN = 'sentence'
LABEL_COLUMN = 'polarity'
# label_list is the list of labels, e.g. True, False or 0, 1 or 'dog', 'cat'
label_list = [0, 1]

In [0]:
label_list

[0, 1]

#Data Preprocessing
We'll need to transform our data into a format BERT understands. This involves two steps. First, we create `InputExample`s using the constructor provided in the BERT library.

- `text_a` is the text we want to classify, which in this case is the `sentence` column in our DataFrame.
- `text_b` is used if we're training a model to understand the relationship between sentences (e.g. is `text_b` a translation of `text_a`? Is `text_b` an answer to the question asked by `text_a`?). This doesn't apply to our task, so we can leave `text_b` blank.
- `label` is the label for our example, here 0 or 1.
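
To make the difference between single-sentence and sentence-pair inputs concrete, here is a small sketch. It uses a `namedtuple` stand-in with the same four field names so it runs without the BERT package installed; it is not the library class itself, and the example texts are invented.

```python
from collections import namedtuple

# Stand-in with the same field names as the InputExample used in this notebook.
# This is an illustration only, not bert.run_classifier.InputExample.
InputExample = namedtuple("InputExample", ["guid", "text_a", "text_b", "label"])

# Single-sentence classification (our task): text_b is left empty.
single = InputExample(guid=None,
                      text_a="A moving, beautifully shot film.",
                      text_b=None,
                      label=1)

# Sentence-pair task (not used here): text_b holds the second sentence.
pair = InputExample(guid=None,
                    text_a="Who directed the film?",
                    text_b="Bill Paxton directed it.",
                    label=1)

print(single.text_b is None)  # True
print(pair.text_b)            # Bill Paxton directed it.
```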

In [0]:
# Use the InputExample class from BERT's run_classifier code to create examples from the data
train_InputExamples = train.apply(lambda x: bert.run_classifier.InputExample(guid=None, # Globally unique ID for bookkeeping, unused in this example
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

test_InputExamples = test.apply(lambda x: bert.run_classifier.InputExample(guid=None, 
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

Next, we need to preprocess our data so that it matches the data BERT was trained on. For this, we'll need to do a couple of things (but don't worry--this is also included in the Python library):


1. Lowercase our text (if we're using a BERT lowercase model)
2. Tokenize it (e.g. "sally says hi" -> ["sally", "says", "hi"])
3. Break words into WordPieces (e.g. "calling" -> ["call", "##ing"])
4. Map our words to indexes using a vocab file that BERT provides
5. Add special "CLS" and "SEP" tokens (see the [readme](https://github.com/google-research/bert))
6. Append "index" and "segment" tokens to each input (see the [BERT paper](https://arxiv.org/pdf/1810.04805.pdf))

Happily, we don't have to worry about most of these details.
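
For intuition only, steps 1 through 5 can be sketched in a few lines of plain Python. This is a simplified illustration, not the BERT library's tokenizer: it splits on whitespace only (the real tokenizer also separates punctuation), and `TOY_VOCAB`, `wordpiece`, and `encode` are hypothetical names with made-up ids.

```python
# Tiny made-up vocabulary; real BERT vocab files have tens of thousands of entries.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "sally": 4, "says": 5, "hi": 6, "call": 7, "##ing": 8}

def wordpiece(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into WordPieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces

def encode(text, vocab):
    """Lowercase, split on whitespace, apply WordPiece, add [CLS]/[SEP], map to ids."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens, [vocab[token] for token in tokens]

tokens, ids = encode("Sally says calling", TOY_VOCAB)
print(tokens)  # ['[CLS]', 'sally', 'says', 'call', '##ing', '[SEP]']
print(ids)     # [2, 4, 5, 7, 8, 3]
```

A word with no matching pieces, such as "xyz" with this toy vocabulary, comes back as `['[UNK]']`.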




To start, we'll need to load a vocabulary file and lowercasing information directly from the BERT TF Hub module:

In [0]:
# This is a path to an uncased (all lowercase) version of BERT
BERT_MODEL_HUB = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"

def create_tokenizer_from_hub_module():
  """Get the vocab file and casing info from the Hub module."""
  with tf.Graph().as_default():
    bert_module = hub.Module(BERT_MODEL_HUB)
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    with tf.Session() as sess:
      vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                            tokenization_info["do_lower_case"]])
      
  return bert.tokenization.FullTokenizer(
      vocab_file=vocab_file, do_lower_case=do_lower_case)

tokenizer = create_tokenizer_from_hub_module()

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Great--we just learned that the BERT model we're using expects lowercase data (that's what is stored in tokenization_info["do_lower_case"]), and we also loaded BERT's vocab file. We also created a tokenizer, which breaks words into word pieces:

In [0]:
tokenizer.tokenize("This here's an example of using the BERT tokenizer")

['this',
 'here',
 "'",
 's',
 'an',
 'example',
 'of',
 'using',
 'the',
 'bert',
 'token',
 '##izer']

Using our tokenizer, we'll call `run_classifier.convert_examples_to_features` on our InputExamples to convert them into features BERT understands.
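
As a rough sketch of what each resulting feature contains for a single-sentence input, the hypothetical helper below truncates the tokens, adds `[CLS]` and `[SEP]`, and zero-pads to a fixed length. It is an illustration under those assumptions, not the library's implementation; `to_features` and `TOY_IDS` are made-up names, and the ids are copied from this notebook's log output.

```python
# Toy vocabulary; the ids match those printed in this notebook's log output.
TOY_IDS = {"[CLS]": 101, "[SEP]": 102, "this": 2023, "movie": 3185,
           "was": 2001, "great": 2307}

def to_features(tokens, token_to_id, max_seq_length):
    """Hypothetical helper: build padded BERT-style inputs for one sentence."""
    # Leave room for [CLS] and [SEP], truncating long inputs.
    tokens = ["[CLS]"] + tokens[: max_seq_length - 2] + ["[SEP]"]
    input_ids = [token_to_id[token] for token in tokens]
    input_mask = [1] * len(input_ids)   # 1 marks a real token, 0 marks padding
    segment_ids = [0] * len(input_ids)  # single sentence: everything is segment 0
    padding = [0] * (max_seq_length - len(input_ids))
    return input_ids + padding, input_mask + padding, segment_ids + padding

input_ids, input_mask, segment_ids = to_features(
    ["this", "movie", "was", "great"], TOY_IDS, max_seq_length=8)
print(input_ids)    # [101, 2023, 3185, 2001, 2307, 102, 0, 0]
print(input_mask)   # [1, 1, 1, 1, 1, 1, 0, 0]
print(segment_ids)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

Reviews longer than the limit are cut off, which would explain why the logged examples end mid-sentence and have an `input_mask` made up entirely of 1s.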

In [0]:
# We'll set sequences to be at most 128 tokens long.
MAX_SEQ_LENGTH = 128
# Convert our train and test features to InputFeatures that BERT understands.
train_features = bert.run_classifier.convert_examples_to_features(train_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)
test_features = bert.run_classifier.convert_examples_to_features(test_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)

INFO:tensorflow:Writing example 0 of 5000


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] i couldn ' t help but feel that this could have been a bigger movie than it was . the screenplay is highly intelligent and it just seemed that it could have been opened up in a way more reminiscent of seven . not by changing the story - i think mainly through the cinematography . the cinematography was the only thing that i found to be holding back the film . on the other hand , the pacing was absolutely on point . whoever worked on the editing really did their job well . and i thought bill paxton did a great job of directing . now away from the technical stuff . . . < br / > < br / > this movie [SEP]


INFO:tensorflow:input_ids: 101 1045 2481 1005 1056 2393 2021 2514 2008 2023 2071 2031 2042 1037 7046 3185 2084 2009 2001 1012 1996 9000 2003 3811 9414 1998 2009 2074 2790 2008 2009 2071 2031 2042 2441 2039 1999 1037 2126 2062 14563 1997 2698 1012 2025 2011 5278 1996 2466 1011 1045 2228 3701 2083 1996 16434 1012 1996 16434 2001 1996 2069 2518 2008 1045 2179 2000 2022 3173 2067 1996 2143 1012 2006 1996 2060 2192 1010 1996 15732 2001 7078 2006 2391 1012 9444 2499 2006 1996 9260 2428 2106 2037 3105 2092 1012 1998 1045 2245 3021 27765 2106 1037 2307 3105 1997 9855 1012 2085 2185 2013 1996 4087 4933 1012 1012 1012 1026 7987 1013 1028 1026 7987 1013 1028 2023 3185 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] this worldwide was the cheap man ' s version of what the nwa under jim cr ##ock ##ett junior and jim cr ##ock ##ett promotions made back in the 1980s on the localized " big 3 " stations during the saturday morning / afternoon wrestling cr ##az ##e . when ted turner got his hands on cr ##ock ##ett ' s failed version of nwa he turned it into world championship wrestling and proceeded to drop all nwa references all together . nwa world wide and nwa pro wrestling were re ##lab ##ele ##d with the wcw logo and moved off the road to disney / mgm studios in orlando , florida and eventually became nothing more than rec ##ap shows for wcw ' s ni [SEP]


INFO:tensorflow:input_ids: 101 2023 4969 2001 1996 10036 2158 1005 1055 2544 1997 2054 1996 15737 2104 3958 13675 7432 6582 3502 1998 3958 13675 7432 6582 15365 2081 2067 1999 1996 3865 2006 1996 22574 1000 2502 1017 1000 3703 2076 1996 5095 2851 1013 5027 4843 13675 10936 2063 1012 2043 6945 6769 2288 2010 2398 2006 13675 7432 6582 1005 1055 3478 2544 1997 15737 2002 2357 2009 2046 2088 2528 4843 1998 8979 2000 4530 2035 15737 7604 2035 2362 1012 15737 2088 2898 1998 15737 4013 4843 2020 2128 20470 12260 2094 2007 1996 24215 8154 1998 2333 2125 1996 2346 2000 6373 1013 15418 4835 1999 10108 1010 3516 1998 2776 2150 2498 2062 2084 28667 9331 3065 2005 24215 1005 1055 9152 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] if the lion king was a disney version of hamlet , then the lion king 3 : ha ##ku ##na mata ##ta is a disney version of guild ##ens ##tern and rosen ##cr ##ant ##z are dead . just like tom stop ##par ##d ' s beg ##uil ##ing film , we get to view the action from the point of view of two of the minor characters from the original : tim ##on , the me ##er ##kat with a pen ##chan ##t for breaking into song at the drop of a hat , and pu ##mba ##a , the war ##th ##og with flat ##ule ##nce issues . by following their story - rather than sim ##ba ' s - we get to see [SEP]


INFO:tensorflow:input_ids: 101 2065 1996 7006 2332 2001 1037 6373 2544 1997 8429 1010 2059 1996 7006 2332 1017 1024 5292 5283 2532 22640 2696 2003 1037 6373 2544 1997 9054 6132 16451 1998 21701 26775 4630 2480 2024 2757 1012 2074 2066 3419 2644 19362 2094 1005 1055 11693 19231 2075 2143 1010 2057 2131 2000 3193 1996 2895 2013 1996 2391 1997 3193 1997 2048 1997 1996 3576 3494 2013 1996 2434 1024 5199 2239 1010 1996 2033 2121 24498 2007 1037 7279 14856 2102 2005 4911 2046 2299 2012 1996 4530 1997 1037 6045 1010 1998 16405 11201 2050 1010 1996 2162 2705 8649 2007 4257 9307 5897 3314 1012 2011 2206 2037 2466 1011 2738 2084 21934 3676 1005 1055 1011 2057 2131 2000 2156 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] although i was born in the year that this movie came out and had never heard of it until my junior year of high school ( 1996 ) when i saw it i became totally eng ##ross ##ed laughing and crying and feeling along with the characters because me and my friends were them . < br / > < br / > their hair , clothes and speech were outdated but the emotions and the desperation of each situation were so familiar ! i remember thinking how real it was and how i wished that they would make movies like that still . < br / > < br / > in fact i saw this movie the night after i had been at a [SEP]


INFO:tensorflow:input_ids: 101 2348 1045 2001 2141 1999 1996 2095 2008 2023 3185 2234 2041 1998 2018 2196 2657 1997 2009 2127 2026 3502 2095 1997 2152 2082 1006 2727 1007 2043 1045 2387 2009 1045 2150 6135 25540 25725 2098 5870 1998 6933 1998 3110 2247 2007 1996 3494 2138 2033 1998 2026 2814 2020 2068 1012 1026 7987 1013 1028 1026 7987 1013 1028 2037 2606 1010 4253 1998 4613 2020 25963 2021 1996 6699 1998 1996 15561 1997 2169 3663 2020 2061 5220 999 1045 3342 3241 2129 2613 2009 2001 1998 2129 1045 6257 2008 2027 2052 2191 5691 2066 2008 2145 1012 1026 7987 1013 1028 1026 7987 1013 1028 1999 2755 1045 2387 2023 3185 1996 2305 2044 1045 2018 2042 2012 1037 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] i am dumb ##founded that i actually sat and watched this . i love independent films , horror films , and the whole zombie thing in general . but when you add ni ##nga ' s , you ' ve crossed a line that should never be crossed . i hope the people in this movie had a great time making it , then at least it wasn ' t a total waste . you ' d never know by watching it though . script ? are you kidding . acting ? i think even the trees were fa ##king . cinematography ? well , there must ' ve been a camera there . period . i don ' t think there was any actual planning [SEP]


INFO:tensorflow:input_ids: 101 1045 2572 12873 21001 2008 1045 2941 2938 1998 3427 2023 1012 1045 2293 2981 3152 1010 5469 3152 1010 1998 1996 2878 11798 2518 1999 2236 1012 2021 2043 2017 5587 9152 13807 1005 1055 1010 2017 1005 2310 4625 1037 2240 2008 2323 2196 2022 4625 1012 1045 3246 1996 2111 1999 2023 3185 2018 1037 2307 2051 2437 2009 1010 2059 2012 2560 2009 2347 1005 1056 1037 2561 5949 1012 2017 1005 1040 2196 2113 2011 3666 2009 2295 1012 5896 1029 2024 2017 12489 1012 3772 1029 1045 2228 2130 1996 3628 2020 6904 6834 1012 16434 1029 2092 1010 2045 2442 1005 2310 2042 1037 4950 2045 1012 2558 1012 1045 2123 1005 1056 2228 2045 2001 2151 5025 4041 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:Writing example 0 of 5000


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] fa ##q ##rs ##cape is truly one of those shows that just has it all , great acting , great cast , great writing , sets , chemistry , mu ##ppet ##s . . . it ' s got it all and then some , except a home . this fantastic series it ' s seem has it all except and ending . t ##pt ##b seem to think this is a series that is consecutive single set shows , when anyone who watches know this is an ongoing , one epic , love story , that has an end that must been seen . if you have never watched far ##sca ##pe do you ##sel ##f a favor and check it out on dvd when [SEP]


INFO:tensorflow:input_ids: 101 6904 4160 2869 19464 2003 5621 2028 1997 2216 3065 2008 2074 2038 2009 2035 1010 2307 3772 1010 2307 3459 1010 2307 3015 1010 4520 1010 6370 1010 14163 29519 2015 1012 1012 1012 2009 1005 1055 2288 2009 2035 1998 2059 2070 1010 3272 1037 2188 1012 2023 10392 2186 2009 1005 1055 4025 2038 2009 2035 3272 1998 4566 1012 1056 13876 2497 4025 2000 2228 2023 2003 1037 2186 2008 2003 5486 2309 2275 3065 1010 2043 3087 2040 12197 2113 2023 2003 2019 7552 1010 2028 8680 1010 2293 2466 1010 2008 2038 2019 2203 2008 2442 2042 2464 1012 2065 2017 2031 2196 3427 2521 15782 5051 2079 2017 11246 2546 1037 5684 1998 4638 2009 2041 2006 4966 2043 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] this bela ##bor ##ed and sloppy spy mel ##od ##rama featuring two buff ##oon ##ish ( one ideal ##istic , one drug add ##led ) california kids dealing secrets to the kgb never seems to get enough steam up to sustain any tension and suspense before it dies a very slow death over two hours later . john sc ##hl ##es ##inger ' s finished product gives the impression that he was asleep in his director ' s chair most of the time as the film la ##gs and the actors sleep walk , save for the highly annoying over the top performance of sean penn . < br / > < br / > childhood altar boys and friends chris ( tim hutton ) and [SEP]


INFO:tensorflow:input_ids: 101 2023 20252 12821 2098 1998 28810 8645 11463 7716 14672 3794 2048 23176 7828 4509 1006 2028 7812 6553 1010 2028 4319 5587 3709 1007 2662 4268 7149 7800 2000 1996 25467 2196 3849 2000 2131 2438 5492 2039 2000 15770 2151 6980 1998 23873 2077 2009 8289 1037 2200 4030 2331 2058 2048 2847 2101 1012 2198 8040 7317 2229 9912 1005 1055 2736 4031 3957 1996 8605 2008 2002 2001 6680 1999 2010 2472 1005 1055 3242 2087 1997 1996 2051 2004 1996 2143 2474 5620 1998 1996 5889 3637 3328 1010 3828 2005 1996 3811 15703 2058 1996 2327 2836 1997 5977 9502 1012 1026 7987 1013 1028 1026 7987 1013 1028 5593 9216 3337 1998 2814 3782 1006 5199 20408 1007 1998 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] walter matt ##ha ##u is wonderful as the " phil ##ander ##ing " dentist dr . julian winston whose frequent fi ##bs to girlfriend gold ##ie provide textbook proof of the dangers of lying . gold ##ie ha ##wn ' s touching ko ##ok toni simmons certainly deserved to win her oscar . ingrid bergman ' s work as the stiff - as - star ##ch nurse stephanie is also touching to watch as she comes out of her shell , slowly and nervously . this is a great movie to watch in the spring ##time , or any time for that matter . it ' s very under ##rated ; i never heard about it until i found it in the video store , and [SEP]


INFO:tensorflow:input_ids: 101 4787 4717 3270 2226 2003 6919 2004 1996 1000 6316 12243 2075 1000 24385 2852 1012 6426 10180 3005 6976 10882 5910 2000 6513 2751 2666 3073 16432 6947 1997 1996 16796 1997 4688 1012 2751 2666 5292 7962 1005 1055 7244 12849 6559 16525 13672 5121 10849 2000 2663 2014 7436 1012 22093 24544 1005 1055 2147 2004 1996 10551 1011 2004 1011 2732 2818 6821 11496 2003 2036 7244 2000 3422 2004 2016 3310 2041 1997 2014 5806 1010 3254 1998 12531 1012 2023 2003 1037 2307 3185 2000 3422 1999 1996 3500 7292 1010 2030 2151 2051 2005 2008 3043 1012 2009 1005 1055 2200 2104 9250 1025 1045 2196 2657 2055 2009 2127 1045 2179 2009 1999 1996 2678 3573 1010 1998 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] this movie is terrible . it ' s about some no brain surf ##in dude that inherit ##s some company . does carrot top have no shame ? < br / > < br / > [SEP]


INFO:tensorflow:input_ids: 101 2023 3185 2003 6659 1012 2009 1005 1055 2055 2070 2053 4167 14175 2378 12043 2008 22490 2015 2070 2194 1012 2515 25659 2327 2031 2053 9467 1029 1026 7987 1013 1028 1026 7987 1013 1028 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] i saw this film last night and came online specifically to see if others thought it was as awful as i did . < br / > < br / > granted , obviously some people see a lot in this film that i didn ' t , so if you ' re one of those people , fine - good luck to you . but i ' m a patient person . i ' ve enjoyed extremely long films before . but this was an exercise in torture for me . < br / > < br / > i honestly felt that this was one of those films with little to say , and that it was more about style than substance - however [SEP]


INFO:tensorflow:input_ids: 101 1045 2387 2023 2143 2197 2305 1998 2234 3784 4919 2000 2156 2065 2500 2245 2009 2001 2004 9643 2004 1045 2106 1012 1026 7987 1013 1028 1026 7987 1013 1028 4379 1010 5525 2070 2111 2156 1037 2843 1999 2023 2143 2008 1045 2134 1005 1056 1010 2061 2065 2017 1005 2128 2028 1997 2216 2111 1010 2986 1011 2204 6735 2000 2017 1012 2021 1045 1005 1049 1037 5776 2711 1012 1045 1005 2310 5632 5186 2146 3152 2077 1012 2021 2023 2001 2019 6912 1999 8639 2005 2033 1012 1026 7987 1013 1028 1026 7987 1013 1028 1045 9826 2371 2008 2023 2001 2028 1997 2216 3152 2007 2210 2000 2360 1010 1998 2008 2009 2001 2062 2055 2806 2084 9415 1011 2174 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


#Creating a model

Now that we've prepared our data, let's build a model. `create_model` below does this in two steps. First, it loads the BERT TF Hub module again (this time to extract the computation graph). Next, it adds a single new classification layer on top of BERT's pooled output; this layer is trained, together with BERT's own weights (the module is loaded with `trainable=True`), to adapt BERT to our sentiment task (i.e. classifying whether a movie review is positive or negative). This strategy of starting from a pretrained model and training it further on a new task is called [fine-tuning](http://wiki.fast.ai/index.php/Fine_tuning).

In [0]:
def create_model(is_predicting, input_ids, input_mask, segment_ids, labels,
                 num_labels):
  """Creates a classification model."""

  bert_module = hub.Module(
      BERT_MODEL_HUB,
      trainable=True)
  bert_inputs = dict(
      input_ids=input_ids,
      input_mask=input_mask,
      segment_ids=segment_ids)
  bert_outputs = bert_module(
      inputs=bert_inputs,
      signature="tokens",
      as_dict=True)

  # Use "pooled_output" for classification tasks on an entire sentence.
  # Use "sequence_output" for token-level output.
  output_layer = bert_outputs["pooled_output"]

  hidden_size = output_layer.shape[-1].value

  # Create our own layer to tune for the sentiment data.
  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):

    # Dropout helps prevent overfitting
    output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    # Convert labels into one-hot encoding
    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    predicted_labels = tf.squeeze(tf.argmax(log_probs, axis=-1, output_type=tf.int32))
    # If we're predicting, we want predicted labels and the log-probabilities.
    if is_predicting:
      return (predicted_labels, log_probs)

    # If we're train/eval, compute loss between predicted and actual label
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)
    return (loss, predicted_labels, log_probs)
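
To make the arithmetic of the new layer concrete, here is a minimal sketch of the same head in plain Python for a single example. It is an illustration, not the notebook's implementation: the helper names (`log_softmax`, `classification_head`) and the toy numbers are hypothetical, `pooled` stands in for BERT's `pooled_output`, and dropout is left out.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def classification_head(pooled, weights, bias, label=None):
    """Single-example version of the head built in create_model.

    pooled:  hidden_size floats (stand-in for BERT's pooled_output)
    weights: num_labels rows of hidden_size floats
    bias:    num_labels floats
    """
    # logits = pooled . weights^T + bias
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    log_probs = log_softmax(logits)
    predicted = max(range(len(log_probs)), key=log_probs.__getitem__)
    if label is None:
        return predicted, log_probs
    # With a one-hot target, cross-entropy reduces to -log p(label).
    loss = -log_probs[label]
    return loss, predicted, log_probs

# Toy values (hypothetical): hidden_size = 3, num_labels = 2
pooled = [0.5, -1.0, 2.0]
weights = [[0.1, 0.2, -0.3], [-0.2, 0.1, 0.4]]
bias = [0.0, 0.0]

loss, predicted, log_probs = classification_head(pooled, weights, bias, label=1)
print(predicted, round(loss, 4))
```

The per-example loss is just the negative log-probability of the true label, which is what `-tf.reduce_sum(one_hot_labels * log_probs, axis=-1)` computes for a whole batch.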


Next we'll wrap our model function in a `model_fn_builder` function that adapts our model to work for training, evaluation, and prediction.

In [0]:
# model_fn_builder actually creates our model function
# using the passed parameters for num_labels, learning_rate, etc.
def model_fn_builder(num_labels, learning_rate, num_train_steps,
                     num_warmup_steps):
  """Returns `model_fn` closure for Estimator."""
  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument
    """The `model_fn` for Estimator."""

    input_ids = features["input_ids"]
    input_mask = features["input_mask"]
    segment_ids = features["segment_ids"]
    label_ids = features["label_ids"]

    is_predicting = (mode == tf.estimator.ModeKeys.PREDICT)
    
    # TRAIN and EVAL
    if not is_predicting:

      (loss, predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      train_op = bert.optimization.create_optimizer(
          loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu=False)

      # Calculate evaluation metrics. 
      def metric_fn(label_ids, predicted_labels):
        accuracy = tf.metrics.accuracy(label_ids, predicted_labels)
        f1_score = tf.contrib.metrics.f1_score(
            label_ids,
            predicted_labels)
        auc = tf.metrics.auc(
            label_ids,
            predicted_labels)
        recall = tf.metrics.recall(
            label_ids,
            predicted_labels)
        precision = tf.metrics.precision(
            label_ids,
            predicted_labels) 
        true_pos = tf.metrics.true_positives(
            label_ids,
            predicted_labels)
        true_neg = tf.metrics.true_negatives(
            label_ids,
            predicted_labels)   
        false_pos = tf.metrics.false_positives(
            label_ids,
            predicted_labels)  
        false_neg = tf.metrics.false_negatives(
            label_ids,
            predicted_labels)
        return {
            "eval_accuracy": accuracy,
            "f1_score": f1_score,
            "auc": auc,
            "precision": precision,
            "recall": recall,
            "true_positives": true_pos,
            "true_negatives": true_neg,
            "false_positives": false_pos,
            "false_negatives": false_neg
        }

      eval_metrics = metric_fn(label_ids, predicted_labels)

      if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          train_op=train_op)
      else:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          eval_metric_ops=eval_metrics)
    else:
      (predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      predictions = {
          'probabilities': log_probs,  # note: these are log-probabilities
          'labels': predicted_labels
      }
      return tf.estimator.EstimatorSpec(mode, predictions=predictions)

  # Return the actual model function in the closure
  return model_fn


In [0]:
# Training hyperparameters
# These are copied from this colab notebook (https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb)
BATCH_SIZE = 32
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 3.0
# Warmup is a period at the start of training during which the learning rate
# is small and gradually increases; this usually helps training.
WARMUP_PROPORTION = 0.1
# Model configs
SAVE_CHECKPOINTS_STEPS = 500
SAVE_SUMMARY_STEPS = 100

In [0]:
# Compute the number of training and warmup steps from the batch size
num_train_steps = int(len(train_features) / BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)
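
As a worked example of this arithmetic, the sketch below plugs in the hyperparameters above. Two things here are assumptions rather than facts from this notebook: the training set size of 5000 (chosen because it yields 468 steps, matching the `model.ckpt-468` checkpoint in the logs), and the shape of the schedule in `learning_rate_at`, which is my reading of a linear warmup followed by a linear decay to zero. `learning_rate_at` is a hypothetical helper, not part of the `bert` package.

```python
BATCH_SIZE = 32
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 3.0
WARMUP_PROPORTION = 0.1

# Assumption: 5000 training examples (consistent with checkpoint step 468).
num_train_examples = 5000

num_train_steps = int(num_train_examples / BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)

def learning_rate_at(step):
    """Assumed schedule: linear warmup, then linear decay to zero."""
    if step < num_warmup_steps:
        return LEARNING_RATE * step / num_warmup_steps
    return LEARNING_RATE * (1.0 - step / num_train_steps)

print("train steps:", num_train_steps, "warmup steps:", num_warmup_steps)
for step in (0, 23, 46, 234, 468):
    print(step, learning_rate_at(step))
```

With these numbers, roughly the first tenth of training is spent ramping the learning rate up, and the rest ramping it back down.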

In [0]:
# Specify output directory and number of checkpoint steps to save
run_config = tf.estimator.RunConfig(
    model_dir=OUTPUT_DIR,
    save_summary_steps=SAVE_SUMMARY_STEPS,
    save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS)

In [0]:
model_fn = model_fn_builder(
  num_labels=len(label_list),
  learning_rate=LEARNING_RATE,
  num_train_steps=num_train_steps,
  num_warmup_steps=num_warmup_steps)

estimator = tf.estimator.Estimator(
  model_fn=model_fn,
  config=run_config,
  params={"batch_size": BATCH_SIZE})


INFO:tensorflow:Using config: {'_model_dir': 'bert_model', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 500, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f45ecde8400>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


Next we create an input builder function that takes our training feature set (`train_features`) and returns an input function (`input_fn`) that feeds batches of features to the Estimator. This is a pretty standard design pattern for working with Tensorflow [Estimators](https://www.tensorflow.org/guide/estimators).

In [0]:
# Create an input function for training. Set drop_remainder=True when running on TPUs.
train_input_fn = bert.run_classifier.input_fn_builder(
    features=train_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=True,
    drop_remainder=False)
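
To illustrate what `drop_remainder` controls, here is a small plain-Python sketch of how one pass over a dataset splits into batches. `batch_sizes` is a hypothetical helper written for this illustration, and the 5000-example dataset size is an assumption; the real batching happens inside the input function.

```python
def batch_sizes(num_examples, batch_size, drop_remainder):
    """Return the size of each batch in one pass over num_examples items."""
    full, leftover = divmod(num_examples, batch_size)
    sizes = [batch_size] * full
    if leftover and not drop_remainder:
        # Keep the final, smaller batch.
        sizes.append(leftover)
    return sizes

kept = batch_sizes(5000, 32, drop_remainder=False)
dropped = batch_sizes(5000, 32, drop_remainder=True)
print(len(kept), kept[-1])
print(len(dropped), dropped[-1])
```

Dropping the remainder gives every batch the same shape, which is what TPUs need; on CPU or GPU, keeping it means no example is skipped.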

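Now we train our model! In a Colab notebook with a GPU runtime, training took me about 14 minutes. (The output below comes from a rerun: a checkpoint for the final step already existed in `OUTPUT_DIR`, so the Estimator skipped training and returned almost immediately.)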

In [0]:
print('Beginning Training!')
current_time = datetime.now()
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
print("Training took time ", datetime.now() - current_time)

Beginning Training!
INFO:tensorflow:Skipping training since max_steps has already saved.


Training took time  0:00:00.018116


Now let's use our test data to see how well our model did:

In [0]:
test_input_fn = run_classifier.input_fn_builder(
    features=test_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=False,
    drop_remainder=False)

In [0]:
estimator.evaluate(input_fn=test_input_fn, steps=None)

INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Starting evaluation at 2019-04-23T01:44:11Z


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Restoring parameters from bert_model/model.ckpt-468


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Finished evaluation at 2019-04-23-02:24:10


INFO:tensorflow:Saving dict for global step 468: auc = 0.8607431, eval_accuracy = 0.8608, f1_score = 0.8588235, false_negatives = 362.0, false_positives = 334.0, global_step = 468, loss = 0.529705, precision = 0.8637291, recall = 0.8539734, true_negatives = 2187.0, true_positives = 2117.0


INFO:tensorflow:Saving 'checkpoint_path' summary for global step 468: bert_model/model.ckpt-468


{'auc': 0.8607431,
 'eval_accuracy': 0.8608,
 'f1_score': 0.8588235,
 'false_negatives': 362.0,
 'false_positives': 334.0,
 'global_step': 468,
 'loss': 0.529705,
 'precision': 0.8637291,
 'recall': 0.8539734,
 'true_negatives': 2187.0,
 'true_positives': 2117.0}
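
As a sanity check on these numbers, the headline metrics can be recomputed from the four confusion counts. This is a minimal sketch in plain Python: the counts are copied from the evaluation output above, and the variable names are mine. The last value rests on an assumption: because `metric_fn` passes hard 0/1 predictions (not scores) to `tf.metrics.auc`, the reported AUC should be close to the mean of recall and specificity (balanced accuracy).

```python
# Confusion counts copied from the evaluation output
tp, tn, fp, fn = 2117, 2187, 334, 362
total = tp + tn + fp + fn

accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Assumption: AUC on hard labels behaves like balanced accuracy.
specificity = tn / (tn + fp)
balanced_accuracy = (recall + specificity) / 2

print(f"accuracy          = {accuracy:.4f}")
print(f"precision         = {precision:.4f}")
print(f"recall            = {recall:.4f}")
print(f"f1                = {f1:.4f}")
print(f"balanced accuracy = {balanced_accuracy:.4f}")
```

The recomputed values agree with the reported `eval_accuracy`, `precision`, `recall` and `f1_score`, and the balanced accuracy lines up with the reported `auc`.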

Now let's write code to make predictions on new sentences:

In [0]:
def getPrediction(in_sentences):
  labels = ["Negative", "Positive"]
  input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, label = 0) for x in in_sentences] # here, guid="" and label=0 are just dummy values
  input_features = run_classifier.convert_examples_to_features(input_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
  predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)
  predictions = estimator.predict(predict_input_fn)
  return [(sentence, prediction['probabilities'], labels[prediction['labels']]) for sentence, prediction in zip(in_sentences, predictions)]

In [0]:
pred_sentences = [
  "That movie was absolutely awful",
  "The acting was a bit lacking",
  "The film was creative and surprising",
  "Absolutely fantastic!"
]

In [0]:
predictions = getPrediction(pred_sentences)

INFO:tensorflow:Writing example 0 of 4
INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: 
INFO:tensorflow:tokens: [CLS] that movie was absolutely awful [SEP]
INFO:tensorflow:input_ids: 101 2008 3185 2001 7078 9643 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Voila! We have a sentiment classifier!

In [0]:
predictions

[('That movie was absolutely awful',
  array([-4.9142293e-03, -5.3180690e+00], dtype=float32),
  'Negative'),
 ('The acting was a bit lacking',
  array([-0.03325794, -3.4200459 ], dtype=float32),
  'Negative'),
 ('The film was creative and surprising',
  array([-5.3589125e+00, -4.7171740e-03], dtype=float32),
  'Positive'),
 ('Absolutely fantastic!',
  array([-5.0434084 , -0.00647258], dtype=float32),
  'Positive')]
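
Note that the arrays above are log-probabilities (the output of `tf.nn.log_softmax`), even though the prediction key is named `'probabilities'`. A short sketch of converting them to ordinary probabilities follows; the values are copied from the output above, and `to_probabilities` is a hypothetical helper, not part of the notebook.

```python
import math

# (sentence, log-probabilities, label) copied from the output above
results = [
    ("That movie was absolutely awful", [-4.9142293e-03, -5.3180690e+00], "Negative"),
    ("The acting was a bit lacking", [-0.03325794, -3.4200459], "Negative"),
    ("The film was creative and surprising", [-5.3589125e+00, -4.7171740e-03], "Positive"),
    ("Absolutely fantastic!", [-5.0434084, -0.00647258], "Positive"),
]

def to_probabilities(log_probs):
    """Exponentiate log-probabilities to get probabilities."""
    return [math.exp(v) for v in log_probs]

for sentence, log_probs, label in results:
    p_negative, p_positive = to_probabilities(log_probs)
    print(f"{sentence!r}: P(negative)={p_negative:.4f}, "
          f"P(positive)={p_positive:.4f} -> {label}")
```

Each pair sums to roughly 1, and the larger of the two matches the predicted label.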