In [1]:
# Copyright 2019 Google Inc.

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

#     http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#Predicting Economic News Sentiment with BERT on TF Hub

If you’ve been following Natural Language Processing over the past year, you’ve probably heard of BERT: Bidirectional Encoder Representations from Transformers. It’s a neural network architecture designed by Google researchers that’s totally transformed what’s state-of-the-art for NLP tasks, like text classification, translation, summarization, and question answering.

Now that BERT's been added to [TF Hub](https://www.tensorflow.org/hub) as a loadable module, it's easy(ish) to add into existing TensorFlow text pipelines. In an existing pipeline, BERT can replace text embedding layers like ELMo and GloVe. Alternatively, [fine-tuning](http://wiki.fast.ai/index.php/Fine_tuning) BERT can provide both an accuracy boost and faster training time in many cases.

Here, we'll train a model to predict whether a piece of economic news is positive or negative using BERT in TensorFlow with TF Hub. Some code was adapted from [this Colab notebook](https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb). Let's get started!

In [2]:
!pip install --upgrade tensorflow
!pip uninstall protobuf -y
!pip install protobuf
!pip install tensorflow_hub
!pip install bert-tensorflow

Collecting tensorflow
  Downloading https://files.pythonhosted.org/packages/77/63/a9fa76de8dffe7455304c4ed635be4aa9c0bacef6e0633d87d5f54530c5c/tensorflow-1.13.1-cp36-cp36m-manylinux1_x86_64.whl (92.5MB)
Collecting keras-preprocessing>=1.0.5 (from tensorflow)
  Downloading https://files.pythonhosted.org/packages/c0/bf/0315ef6a9fd3fc2346e85b0ff1f5f83ca17073f2c31ac719ab2e4da0d4a3/Keras_Preprocessing-1.0.9-py2.py3-none-any.whl (59kB)
Collecting protobuf>=3.6.1 (from tensorflow)
  Downloading https://files.pythonhosted.org/packages/c5/60/ca38e967360212ddbb004141a70f5f6d47296e1fba37964d8ac6cb631921/protobuf-3.7.0-cp36-cp36m-manylinux1_x86_64.whl (1.2MB)
Collecting tensorboard<1.14.0,>=1.13.0 (from tensorflow)
  Downloading https://files.py

In [3]:
from sklearn.model_selection import train_test_split
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
from datetime import datetime

  from ._conv import register_converters as _register_converters
W0305 15:30:17.590888 140018860873536 __init__.py:56] Some hub symbols are not available because TensorFlow version is less than 1.14


In addition to the standard libraries we imported above, we'll need to import BERT's Python package, which we installed earlier as `bert-tensorflow`.

In [4]:
import bert
from bert import run_classifier
from bert import optimization
from bert import tokenization

Below, we'll set an output directory location to store our model output and checkpoints. This can be a local directory, in which case you'd set OUTPUT_DIR to the name of the directory you'd like to create. If you're running this code in Google's hosted Colab, the directory won't persist after the Colab session ends.

Alternatively, if you're a GCP user, you can store output in a GCP bucket. To do that, set a directory name in OUTPUT_DIR and the name of the GCP bucket in the BUCKET field.

Set DO_DELETE to delete and recreate OUTPUT_DIR if it already exists. Otherwise, TensorFlow will load existing model checkpoints from that directory (if there are any).

In [5]:
# Set the output directory for saving model file
# Optionally, set a GCP bucket location

OUTPUT_DIR = 'temp_out'#@param {type:"string"}
#@markdown Whether or not to clear/delete the directory and create a new one
DO_DELETE = True #@param {type:"boolean"}
#@markdown Set USE_BUCKET and BUCKET if you want to (optionally) store model output on GCP bucket.
USE_BUCKET = False #@param {type:"boolean"}
BUCKET = 'economic_news_sentiment' #@param {type:"string"}

if USE_BUCKET:
  OUTPUT_DIR = 'gs://{}/{}'.format(BUCKET, OUTPUT_DIR)
  from google.colab import auth
  auth.authenticate_user()

if DO_DELETE and tf.gfile.Exists(OUTPUT_DIR):
  # Start from a clean directory so stale checkpoints aren't reloaded
  tf.gfile.DeleteRecursively(OUTPUT_DIR)
tf.gfile.MakeDirs(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))


***** Model output directory: temp_out *****


#Data

First, let's load the dataset. The code below reads the economic news sentiment data from a local CSV file (`economic_sentiment_data.csv`) and splits it into train and test sets. Its structure is adapted from [this TensorFlow tutorial](https://www.tensorflow.org/hub/tutorials/text_classification_with_tf_hub).

In [6]:
# from google.colab import drive
# drive.mount('/content/gdrive')
import os
os.getcwd()

'/mnt/notebook/modeling'

In [7]:
from tensorflow import keras
import os
import re

# Read economic_sentiment_data.csv from `directory` into a DataFrame.
def load_directory_data(directory):

  data = pd.read_csv(os.path.join(directory, "economic_sentiment_data.csv"))
  
  data = data[['sentence','sentiment','polarity']]
  
  print(data.shape)

  return data

# # Merge positive and negative examples, add a polarity column and shuffle.
# def load_dataset(directory):
#   data_df = load_directory_data(os.path.join(directory, "economic_sentiment_data.csv"))

#   return pd.concat([pos_df, neg_df]).sample(frac=1).reset_index(drop=True)

# Load the dataset from disk and split it into train and test sets.
def download_and_load_datasets(force_download=False):
#   dataset = tf.keras.utils.get_file(
#       fname="Full-Economic-News-DFE-839861.csv", 
#       origin="https://d1p17r2m4rzlbo.cloudfront.net/wp-content/uploads/2016/03/Full-Economic-News-DFE-839861.csv", 
#       extract=False)
  
#   print(os.path.dirname(dataset))
  
  full_data_df = load_directory_data(os.path.join('../../data/','raw'))
  
  train_df = full_data_df.iloc[0:3000]
  test_df = full_data_df.iloc[3000:]
  
  print(train_df.shape)
  print(test_df.shape)

  
  return train_df, test_df


The dataset has 3,750 rows. We use the first 3,000 for training and the remaining 750 for testing.

In [8]:
train, test = download_and_load_datasets()

(3750, 3)
(3000, 3)
(750, 3)
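
The split above is positional: the first rows go to training and the rest to testing. If the CSV happens to be ordered (by date or by label, say), a shuffled, stratified split may be a safer choice. Here is a minimal sketch using the `train_test_split` function imported earlier. The toy DataFrame, the 80/20 ratio, and the `random_state` value are illustrative assumptions, not part of the original pipeline.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the real data; column names match the notebook.
toy_df = pd.DataFrame({
    "sentence": ["text {}".format(i) for i in range(10)],
    "polarity": [0, 1] * 5,
})

# Shuffle the rows and keep the class ratio the same in both splits.
toy_train, toy_test = train_test_split(
    toy_df,
    test_size=0.2,
    stratify=toy_df["polarity"],
    random_state=42,
)

print(toy_train.shape, toy_test.shape)
```

On the real data, the same call would take `full_data_df` in place of `toy_df`.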


In [9]:
# train = train.sample(5000)
# test = test.sample(5000)

In [10]:
train.columns

Index(['sentence', 'sentiment', 'polarity'], dtype='object')

Our input data is the 'sentence' column and our label is the 'polarity' column (0 for negative and 1 for positive).

In [11]:
DATA_COLUMN = 'sentence'
LABEL_COLUMN = 'polarity'
# label_list is the list of labels, i.e. True, False or 0, 1 or 'dog', 'cat'
label_list = [0, 1]

#Data Preprocessing
We'll need to transform our data into a format BERT understands. This involves two steps. First, we create `InputExample`s using the constructor provided in the BERT library.

- `text_a` is the text we want to classify, which in this case is the `sentence` column in our DataFrame.
- `text_b` is used if we're training a model to understand the relationship between sentences (e.g. is `text_b` a translation of `text_a`? Is `text_b` an answer to the question asked by `text_a`?). This doesn't apply to our task, so we can leave `text_b` blank.
- `label` is the label for our example, here 0 or 1.
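
To make the roles of these fields concrete, here is a small sketch that uses a hypothetical stand-in, `ToyInputExample`, which only mirrors the fields described above. The real class is `bert.run_classifier.InputExample`, and the example sentences are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in that mirrors the fields described above.
# The real class is bert.run_classifier.InputExample.
ToyInputExample = namedtuple("ToyInputExample", ["guid", "text_a", "text_b", "label"])

# Single-sentence classification (our case): text_b stays empty.
single = ToyInputExample(
    guid=None,
    text_a="Stocks declined as investors weighed slower growth.",
    text_b=None,
    label=0,
)

# Sentence-pair task (not used in this notebook): both texts are filled in.
pair = ToyInputExample(
    guid=None,
    text_a="Did interest rates rise yesterday?",
    text_b="Interest rates rose slightly yesterday in quiet trading.",
    label=1,
)

print(single.text_b is None, pair.text_b is not None)
```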

In [12]:
# Use the InputExample class from BERT's run_classifier code to create examples from the data
train_InputExamples = train.apply(lambda x: bert.run_classifier.InputExample(guid=None, # Globally unique ID for bookkeeping, unused in this example
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

test_InputExamples = test.apply(lambda x: bert.run_classifier.InputExample(guid=None, 
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

Next, we need to preprocess our data so that it matches the data BERT was trained on. For this, we'll need to do a couple of things (but don't worry--this is also included in the Python library):


1. Lowercase our text (if we're using a BERT lowercase model)
2. Tokenize it (e.g. "sally says hi" -> ["sally", "says", "hi"])
3. Break words into WordPieces (e.g. "calling" -> ["call", "##ing"])
4. Map our words to indexes using a vocab file that BERT provides
5. Add special "CLS" and "SEP" tokens (see the [readme](https://github.com/google-research/bert))
6. Add an input mask and segment ids alongside the token ids for each input (see the [BERT paper](https://arxiv.org/pdf/1810.04805.pdf))

Happily, we don't have to worry about most of these details.
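
Still, it can help to see roughly what steps 1 to 3 look like. The sketch below is a toy greedy longest-match-first WordPiece splitter, not the library's tokenizer. The function names and the tiny vocabulary are made up for illustration, and the real tokenizer also handles punctuation, Unicode cleanup, and other cases that are skipped here.

```python
def toy_wordpiece(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into WordPieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no piece matched: treat the whole word as unknown
        pieces.append(piece)
        start = end
    return pieces


def toy_tokenize(text, vocab, do_lower_case=True):
    """Steps 1-3: lowercase, split on whitespace, split into WordPieces."""
    if do_lower_case:
        text = text.lower()
    tokens = []
    for word in text.split():
        tokens.extend(toy_wordpiece(word, vocab))
    return tokens


# Tiny made-up vocabulary; the real one ships with the BERT module.
toy_vocab = {"[UNK]", "sally", "says", "hi", "call", "##ing", "token", "##izer"}

print(toy_tokenize("Sally says hi", toy_vocab))      # ['sally', 'says', 'hi']
print(toy_tokenize("calling tokenizer", toy_vocab))  # ['call', '##ing', 'token', '##izer']
```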




To start, we'll need to load a vocabulary file and lowercasing information directly from the BERT TF Hub module:

In [14]:
# This is a path to an uncased (all lowercase) version of BERT
BERT_MODEL_HUB = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"

def create_tokenizer_from_hub_module():
  """Get the vocab file and casing info from the Hub module."""
  with tf.Graph().as_default():
    bert_module = hub.Module(BERT_MODEL_HUB)
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    with tf.Session() as sess:
      vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                            tokenization_info["do_lower_case"]])
      
  return bert.tokenization.FullTokenizer(
      vocab_file=vocab_file, do_lower_case=do_lower_case)

tokenizer = create_tokenizer_from_hub_module()

W0305 15:31:00.080643 140018860873536 deprecation.py:323] From /usr/local/anaconda/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py:3632: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Great--we just learned that the BERT model we're using expects lowercase data (that's what is stored in tokenization_info["do_lower_case"]), and we also loaded BERT's vocab file. We also created a tokenizer, which breaks words into word pieces:

In [15]:
tokenizer.tokenize("This here's an example of using the BERT tokenizer")
#tokenizer.tokenize(pred_sentences[0])


['this',
 'here',
 "'",
 's',
 'an',
 'example',
 'of',
 'using',
 'the',
 'bert',
 'token',
 '##izer']

Using our tokenizer, we'll call `run_classifier.convert_examples_to_features` on our InputExamples to convert them into features BERT understands.
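
For a single-sentence example, the conversion roughly amounts to truncating the tokens, wrapping them in `[CLS]` and `[SEP]`, mapping them to ids, and zero-padding to a fixed length. The sketch below illustrates that idea under simplifying assumptions. `toy_convert_to_features` and the id mapping are hypothetical, and the library function additionally handles `text_b`, label mapping, and logging.

```python
def toy_convert_to_features(tokens, token_to_id, max_seq_length):
    """Build input_ids, input_mask and segment_ids for one single-sentence example."""
    # Leave room for the [CLS] and [SEP] markers.
    tokens = tokens[:max_seq_length - 2]
    tokens = ["[CLS]"] + tokens + ["[SEP]"]

    input_ids = [token_to_id[t] for t in tokens]
    input_mask = [1] * len(input_ids)   # 1 = real token, 0 = padding
    segment_ids = [0] * len(input_ids)  # all 0 because there is no text_b

    # Zero-pad every list up to max_seq_length.
    padding = [0] * (max_seq_length - len(input_ids))
    input_ids += padding
    input_mask += padding
    segment_ids += padding
    return input_ids, input_mask, segment_ids


# Made-up ids for illustration; the real mapping comes from BERT's vocab file.
toy_ids = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "rates": 3, "rose": 4, "slightly": 5}

ids, mask, segments = toy_convert_to_features(["rates", "rose", "slightly"], toy_ids, 8)
print(ids)       # [1, 3, 4, 5, 2, 0, 0, 0]
print(mask)      # [1, 1, 1, 1, 1, 0, 0, 0]
print(segments)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

This is why every `input_ids`, `input_mask`, and `segment_ids` row in the log output has exactly `MAX_SEQ_LENGTH` entries.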

In [16]:
# We'll set sequences to be at most 128 tokens long.
MAX_SEQ_LENGTH = 128
# Convert our train and test features to InputFeatures that BERT understands.
train_features = bert.run_classifier.convert_examples_to_features(train_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)
test_features = bert.run_classifier.convert_examples_to_features(test_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)

INFO:tensorflow:Writing example 0 of 3000


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] new york - - yields on most certificates of deposit offered by major banks dropped more than a tenth of a percentage point in the latest week , reflecting the overall decline in short - term interest rates . < / br > < / br > on small - denomination , or " consumer , " cds sold directly by banks , the average yield on six - month deposits fell to 5 . 49 % from 5 . 62 % in the week ended yesterday , according to an 18 - bank survey by ban ##x ##qu ##ote money markets , a wilmington , del . , information service . < / br > < / br > on three - month " consumer [SEP]


INFO:tensorflow:input_ids: 101 2047 2259 1011 1011 16189 2006 2087 17987 1997 12816 3253 2011 2350 5085 3333 2062 2084 1037 7891 1997 1037 7017 2391 1999 1996 6745 2733 1010 10842 1996 3452 6689 1999 2460 1011 2744 3037 6165 1012 1026 1013 7987 1028 1026 1013 7987 1028 2006 2235 1011 18683 1010 2030 1000 7325 1010 1000 14340 2853 3495 2011 5085 1010 1996 2779 10750 2006 2416 1011 3204 10042 3062 2000 1019 1012 4749 1003 2013 1019 1012 5786 1003 1999 1996 2733 3092 7483 1010 2429 2000 2019 2324 1011 2924 5002 2011 7221 2595 28940 12184 2769 6089 1010 1037 17025 1010 3972 1012 1010 2592 2326 1012 1026 1013 7987 1028 1026 1013 7987 1028 2006 2093 1011 3204 1000 7325 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] new york - - ind ##ec ##ision marked the dollar ' s tone , as traders paused for breath , awaiting a critical monthly u . s . employment report for release today . < / br > < / br > the dollar ended the new york day slightly weaker against both the euro and the yen . < / br > < / br > market participants were also reluctant to make major bets on the yen , following a stream of more rigorous - sounding statements from top japanese officials about the issue of bank reforms . < / br > < / br > late yesterday afternoon in new york , the euro was at 98 . 77 cents , slightly stronger [SEP]


INFO:tensorflow:input_ids: 101 2047 2259 1011 1011 27427 8586 19969 4417 1996 7922 1005 1055 4309 1010 2004 13066 5864 2005 3052 1010 15497 1037 4187 7058 1057 1012 1055 1012 6107 3189 2005 2713 2651 1012 1026 1013 7987 1028 1026 1013 7987 1028 1996 7922 3092 1996 2047 2259 2154 3621 15863 2114 2119 1996 9944 1998 1996 18371 1012 1026 1013 7987 1028 1026 1013 7987 1028 3006 6818 2020 2036 11542 2000 2191 2350 29475 2006 1996 18371 1010 2206 1037 5460 1997 2062 20001 1011 9391 8635 2013 2327 2887 4584 2055 1996 3277 1997 2924 8818 1012 1026 1013 7987 1028 1026 1013 7987 1028 2397 7483 5027 1999 2047 2259 1010 1996 9944 2001 2012 5818 1012 6255 16653 1010 3621 6428 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] stocks declined , as investors weighed slower - than - expected domestic economic growth and continued euro - zone concerns against signs the federal reserve may take new steps to bo ##lster the economy . < / br > < / br > the dow jones industrial average fell 53 . 59 points , or 0 . 5 % , to 114 ##9 ##3 . 72 , its lowest close since oct . 17 . < / br > < / br > weighing on the downs ##ide were al ##co ##a , which dropped 21 cents , or 2 . 2 % , to $ 9 . 26 , and bank of america , which fell 12 cents , or 2 . 2 % , [SEP]


INFO:tensorflow:input_ids: 101 15768 6430 1010 2004 9387 12781 12430 1011 2084 1011 3517 4968 3171 3930 1998 2506 9944 1011 4224 5936 2114 5751 1996 2976 3914 2089 2202 2047 4084 2000 8945 29576 1996 4610 1012 1026 1013 7987 1028 1026 1013 7987 1028 1996 23268 3557 3919 2779 3062 5187 1012 5354 2685 1010 2030 1014 1012 1019 1003 1010 2000 12457 2683 2509 1012 5824 1010 2049 7290 2485 2144 13323 1012 2459 1012 1026 1013 7987 1028 1026 1013 7987 1028 15243 2006 1996 12482 5178 2020 2632 3597 2050 1010 2029 3333 2538 16653 1010 2030 1016 1012 1016 1003 1010 2000 1002 1023 1012 2656 1010 1998 2924 1997 2637 1010 2029 3062 2260 16653 1010 2030 1016 1012 1016 1003 1010 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] the u . s . dollar declined against most major foreign cu ##rre ##ncies yesterday , although the drop was softened when bond prices failed to advance tuesday ' s rally . < / br > < / br > the dollar began weakening in europe as interest rates fell there for dollar deposits . the decline continued in new york trading , which was thin , although the dollar recovered slightly when bond prices began falling . lower bond prices translate into higher long - term interest yields , which make dollar den ##omi ##nated investments more attractive . the bond market later closed little - changed from tuesday . < / br > < / br > " this is the first time in [SEP]


INFO:tensorflow:input_ids: 101 1996 1057 1012 1055 1012 7922 6430 2114 2087 2350 3097 12731 14343 14767 7483 1010 2348 1996 4530 2001 16573 2043 5416 7597 3478 2000 5083 9857 1005 1055 8320 1012 1026 1013 7987 1028 1026 1013 7987 1028 1996 7922 2211 22031 1999 2885 2004 3037 6165 3062 2045 2005 7922 10042 1012 1996 6689 2506 1999 2047 2259 6202 1010 2029 2001 4857 1010 2348 1996 7922 6757 3621 2043 5416 7597 2211 4634 1012 2896 5416 7597 17637 2046 3020 2146 1011 2744 3037 16189 1010 2029 2191 7922 7939 20936 23854 10518 2062 8702 1012 1996 5416 3006 2101 2701 2210 1011 2904 2013 9857 1012 1026 1013 7987 1028 1026 1013 7987 1028 1000 2023 2003 1996 2034 2051 1999 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] author : james b . stewart < / br > < / br > the dread ##ed " d " word is back in circulation , and i don ' t mean " depression . " having skirt ##ed that potential cal ##ami ##ty , the worry for policy makers and investors now is def ##lation . < / br > < / br > on the face of it , def ##lation - - falling prices - - doesn ' t seem like it would be so bad . who wouldn ' t welcome discount ##s that just keep getting better , like those sales at file ##ne ' s basement where prices got lower the longer merchandise stayed on the racks ? < / [SEP]


INFO:tensorflow:input_ids: 101 3166 1024 2508 1038 1012 5954 1026 1013 7987 1028 1026 1013 7987 1028 1996 14436 2098 1000 1040 1000 2773 2003 2067 1999 9141 1010 1998 1045 2123 1005 1056 2812 1000 6245 1012 1000 2383 9764 2098 2008 4022 10250 10631 3723 1010 1996 4737 2005 3343 11153 1998 9387 2085 2003 13366 13490 1012 1026 1013 7987 1028 1026 1013 7987 1028 2006 1996 2227 1997 2009 1010 13366 13490 1011 1011 4634 7597 1011 1011 2987 1005 1056 4025 2066 2009 2052 2022 2061 2919 1012 2040 2876 1005 1056 6160 19575 2015 2008 2074 2562 2893 2488 1010 2066 2216 4341 2012 5371 2638 1005 1055 8102 2073 7597 2288 2896 1996 2936 16359 4370 2006 1996 27259 1029 1026 1013 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


I0305 15:31:10.525931 140018860873536 run_classifier.py:466] input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:10.526833 140018860873536 run_classifier.py:467] segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


I0305 15:31:10.527679 140018860873536 run_classifier.py:468] label: 0 (id = 0)


INFO:tensorflow:Writing example 0 of 750


I0305 15:31:15.724125 140018860873536 run_classifier.py:774] Writing example 0 of 750


INFO:tensorflow:*** Example ***


I0305 15:31:15.725888 140018860873536 run_classifier.py:461] *** Example ***


INFO:tensorflow:guid: None


I0305 15:31:15.726775 140018860873536 run_classifier.py:462] guid: None


INFO:tensorflow:tokens: [CLS] interest rates rose slightly yesterday in quiet trading as investors and securities dealers digest ##ed new issues and waited for today ' s report on november employment to provide he ##w clues to future rate changes . [SEP]


I0305 15:31:15.727662 140018860873536 run_classifier.py:464] tokens: [CLS] interest rates rose slightly yesterday in quiet trading as investors and securities dealers digest ##ed new issues and waited for today ' s report on november employment to provide he ##w clues to future rate changes . [SEP]


INFO:tensorflow:input_ids: 101 3037 6165 3123 3621 7483 1999 4251 6202 2004 9387 1998 12012 16743 17886 2098 2047 3314 1998 4741 2005 2651 1005 1055 3189 2006 2281 6107 2000 3073 2002 2860 15774 2000 2925 3446 3431 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:15.728537 140018860873536 run_classifier.py:465] input_ids: 101 3037 6165 3123 3621 7483 1999 4251 6202 2004 9387 1998 12012 16743 17886 2098 2047 3314 1998 4741 2005 2651 1005 1055 3189 2006 2281 6107 2000 3073 2002 2860 15774 2000 2925 3446 3431 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:15.729370 140018860873536 run_classifier.py:466] input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:15.730219 140018860873536 run_classifier.py:467] segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


I0305 15:31:15.731069 140018860873536 run_classifier.py:468] label: 0 (id = 0)


INFO:tensorflow:*** Example ***


I0305 15:31:15.732275 140018860873536 run_classifier.py:461] *** Example ***


INFO:tensorflow:guid: None


I0305 15:31:15.733145 140018860873536 run_classifier.py:462] guid: None


INFO:tensorflow:tokens: [CLS] many anal ##yt ##s expect a weak employment report to be the catalyst for another move by the federal reserve to push short - term rates lower . [SEP]


I0305 15:31:15.733984 140018860873536 run_classifier.py:464] tokens: [CLS] many anal ##yt ##s expect a weak employment report to be the catalyst for another move by the federal reserve to push short - term rates lower . [SEP]


INFO:tensorflow:input_ids: 101 2116 20302 22123 2015 5987 1037 5410 6107 3189 2000 2022 1996 16771 2005 2178 2693 2011 1996 2976 3914 2000 5245 2460 1011 2744 6165 2896 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:15.734855 140018860873536 run_classifier.py:465] input_ids: 101 2116 20302 22123 2015 5987 1037 5410 6107 3189 2000 2022 1996 16771 2005 2178 2693 2011 1996 2976 3914 2000 5245 2460 1011 2744 6165 2896 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:15.735767 140018860873536 run_classifier.py:466] input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:15.736675 140018860873536 run_classifier.py:467] segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


I0305 15:31:15.737484 140018860873536 run_classifier.py:468] label: 0 (id = 0)


INFO:tensorflow:*** Example ***


I0305 15:31:15.738921 140018860873536 run_classifier.py:461] *** Example ***


INFO:tensorflow:guid: None


I0305 15:31:15.739778 140018860873536 run_classifier.py:462] guid: None


INFO:tensorflow:tokens: [CLS] although long - term bond yields are unusually high compared to short ##ter ##m rates , two of the nation ##u ##ª ##s largest corporations u ##o a . t . & t . and general motors u ##o have decided that conditions are favorable for large new long - term issues . [SEP]


I0305 15:31:15.740642 140018860873536 run_classifier.py:464] tokens: [CLS] although long - term bond yields are unusually high compared to short ##ter ##m rates , two of the nation ##u ##ª ##s largest corporations u ##o a . t . & t . and general motors u ##o have decided that conditions are favorable for large new long - term issues . [SEP]


INFO:tensorflow:input_ids: 101 2348 2146 1011 2744 5416 16189 2024 12890 2152 4102 2000 2460 3334 2213 6165 1010 2048 1997 1996 3842 2226 29653 2015 2922 11578 1057 2080 1037 1012 1056 1012 1004 1056 1012 1998 2236 9693 1057 2080 2031 2787 2008 3785 2024 11119 2005 2312 2047 2146 1011 2744 3314 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:15.741528 140018860873536 run_classifier.py:465] input_ids: 101 2348 2146 1011 2744 5416 16189 2024 12890 2152 4102 2000 2460 3334 2213 6165 1010 2048 1997 1996 3842 2226 29653 2015 2922 11578 1057 2080 1037 1012 1056 1012 1004 1056 1012 1998 2236 9693 1057 2080 2031 2787 2008 3785 2024 11119 2005 2312 2047 2146 1011 2744 3314 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:15.742405 140018860873536 run_classifier.py:466] input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:15.743274 140018860873536 run_classifier.py:467] segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


I0305 15:31:15.744117 140018860873536 run_classifier.py:468] label: 0 (id = 0)


INFO:tensorflow:*** Example ***


I0305 15:31:15.745550 140018860873536 run_classifier.py:461] *** Example ***


INFO:tensorflow:guid: None


I0305 15:31:15.746393 140018860873536 run_classifier.py:462] guid: None


INFO:tensorflow:tokens: [CLS] the american telephone and telegraph company offered $ 67 ##6 million of 40 - year de ##ben ##tures with a yield of 8 . 65 percent yesterday , while g . m . plans to offer $ 1 billion of preferred stock with a yield of 9 . 125 percent today . [SEP]


I0305 15:31:15.747229 140018860873536 run_classifier.py:464] tokens: [CLS] the american telephone and telegraph company offered $ 67 ##6 million of 40 - year de ##ben ##tures with a yield of 8 . 65 percent yesterday , while g . m . plans to offer $ 1 billion of preferred stock with a yield of 9 . 125 percent today . [SEP]


INFO:tensorflow:input_ids: 101 1996 2137 7026 1998 10013 2194 3253 1002 6163 2575 2454 1997 2871 1011 2095 2139 10609 22662 2007 1037 10750 1997 1022 1012 3515 3867 7483 1010 2096 1043 1012 1049 1012 3488 2000 3749 1002 1015 4551 1997 6871 4518 2007 1037 10750 1997 1023 1012 8732 3867 2651 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:15.748109 140018860873536 run_classifier.py:465] input_ids: 101 1996 2137 7026 1998 10013 2194 3253 1002 6163 2575 2454 1997 2871 1011 2095 2139 10609 22662 2007 1037 10750 1997 1022 1012 3515 3867 7483 1010 2096 1043 1012 1049 1012 3488 2000 3749 1002 1015 4551 1997 6871 4518 2007 1037 10750 1997 1023 1012 8732 3867 2651 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:15.748969 140018860873536 run_classifier.py:466] input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:15.749826 140018860873536 run_classifier.py:467] segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


I0305 15:31:15.750668 140018860873536 run_classifier.py:468] label: 1 (id = 1)


INFO:tensorflow:*** Example ***


I0305 15:31:15.751820 140018860873536 run_classifier.py:461] *** Example ***


INFO:tensorflow:guid: None


I0305 15:31:15.752696 140018860873536 run_classifier.py:462] guid: None


INFO:tensorflow:tokens: [CLS] a fee ##ble stock - market rally yesterday morning gave way to afternoon selling pressure that produced a mild set ##back . [SEP]


I0305 15:31:15.753545 140018860873536 run_classifier.py:464] tokens: [CLS] a fee ##ble stock - market rally yesterday morning gave way to afternoon selling pressure that produced a mild set ##back . [SEP]


INFO:tensorflow:input_ids: 101 1037 7408 3468 4518 1011 3006 8320 7483 2851 2435 2126 2000 5027 4855 3778 2008 2550 1037 10256 2275 5963 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:15.754398 140018860873536 run_classifier.py:465] input_ids: 101 1037 7408 3468 4518 1011 3006 8320 7483 2851 2435 2126 2000 5027 4855 3778 2008 2550 1037 10256 2275 5963 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:15.755255 140018860873536 run_classifier.py:466] input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0305 15:31:15.756147 140018860873536 run_classifier.py:467] segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


I0305 15:31:15.756992 140018860873536 run_classifier.py:468] label: 0 (id = 0)


#Creating a model

Now that we've prepared our data, let's focus on building a model. `create_model` does just this below. First, it loads the BERT tf hub module again (this time to extract the computation graph). Next, it creates a single new layer that will be trained to adapt BERT to our sentiment task (i.e. classifying whether a piece of economic news is positive or negative). This strategy of starting from a pre-trained model and training it further on a new task is called [fine-tuning](http://wiki.fast.ai/index.php/Fine_tuning).

In [17]:
# model_fn_builder actually creates our model function
# using the passed parameters for num_labels, learning_rate, etc.
def model_fn_builder(num_labels, learning_rate, num_train_steps,
                     num_warmup_steps):
  """Returns `model_fn` closure for `tf.estimator.Estimator`."""
  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument
    """The `model_fn` for `tf.estimator.Estimator`."""

    input_ids = features["input_ids"]
    input_mask = features["input_mask"]
    segment_ids = features["segment_ids"]
    label_ids = features["label_ids"]

    is_predicting = (mode == tf.estimator.ModeKeys.PREDICT)
    
    # TRAIN and EVAL
    if not is_predicting:

      (loss, predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      train_op = bert.optimization.create_optimizer(
          loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu=False)

      # Calculate evaluation metrics. 
      def metric_fn(label_ids, predicted_labels):
        accuracy = tf.metrics.accuracy(label_ids, predicted_labels)
        f1_score = tf.contrib.metrics.f1_score(
            label_ids,
            predicted_labels)
        auc = tf.metrics.auc(
            label_ids,
            predicted_labels)
        recall = tf.metrics.recall(
            label_ids,
            predicted_labels)
        precision = tf.metrics.precision(
            label_ids,
            predicted_labels) 
        true_pos = tf.metrics.true_positives(
            label_ids,
            predicted_labels)
        true_neg = tf.metrics.true_negatives(
            label_ids,
            predicted_labels)   
        false_pos = tf.metrics.false_positives(
            label_ids,
            predicted_labels)  
        false_neg = tf.metrics.false_negatives(
            label_ids,
            predicted_labels)
        return {
            "eval_accuracy": accuracy,
            "f1_score": f1_score,
            "auc": auc,
            "precision": precision,
            "recall": recall,
            "true_positives": true_pos,
            "true_negatives": true_neg,
            "false_positives": false_pos,
            "false_negatives": false_neg
        }

      eval_metrics = metric_fn(label_ids, predicted_labels)

      if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          train_op=train_op)
      else:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          eval_metric_ops=eval_metrics)
    else:
      (predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      predictions = {
          'probabilities': log_probs,  # log-probabilities (output of log_softmax)
          'labels': predicted_labels
      }
      return tf.estimator.EstimatorSpec(mode, predictions=predictions)

  # Return the actual model function in the closure
  return model_fn


In [18]:
def create_model(is_predicting, input_ids, input_mask, segment_ids, labels,
                 num_labels):
  """Creates a classification model."""

  bert_module = hub.Module(
      BERT_MODEL_HUB,
      trainable=True)
  bert_inputs = dict(
      input_ids=input_ids,
      input_mask=input_mask,
      segment_ids=segment_ids)
  bert_outputs = bert_module(
      inputs=bert_inputs,
      signature="tokens",
      as_dict=True)

  # Use "pooled_output" for classification tasks on an entire sentence.
  # Use "sequence_output" for token-level output.
  output_layer = bert_outputs["pooled_output"]

  hidden_size = output_layer.shape[-1].value

  # Create our own layer to tune for the sentiment data.
  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):

    # Dropout helps prevent overfitting (note: applied here in every mode, including eval and predict)
    output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    # Convert labels into one-hot encoding
    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    predicted_labels = tf.squeeze(tf.argmax(log_probs, axis=-1, output_type=tf.int32))
    # If we're predicting, we want predicted labels and the (log) probabilities.
    if is_predicting:
      return (predicted_labels, log_probs)

    # If we're train/eval, compute loss between predicted and actual label
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)
    return (loss, predicted_labels, log_probs)
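
The head that `create_model` adds on top of BERT is small enough to work through by hand. Here is a minimal NumPy sketch of the same arithmetic (linear layer, log-softmax, one-hot cross-entropy). The random `pooled` array stands in for BERT's `pooled_output`, the hidden size of 768 assumes a BERT-Base module, a plain normal initializer replaces the truncated normal, and dropout is left out, so treat it as an illustration rather than the notebook's actual graph.

```python
import numpy as np

def classification_head(pooled_output, output_weights, output_bias, labels):
    """NumPy version of the head: linear layer, log-softmax, cross-entropy."""
    num_labels = output_weights.shape[0]
    # logits = pooled_output @ W^T + b, shape [batch, num_labels]
    logits = pooled_output @ output_weights.T + output_bias
    # Numerically stable log-softmax over the label axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # One-hot encode the labels and take the negative log-likelihood
    one_hot = np.eye(num_labels)[labels]
    per_example_loss = -(one_hot * log_probs).sum(axis=-1)
    predicted_labels = log_probs.argmax(axis=-1)
    return per_example_loss.mean(), predicted_labels, log_probs

rng = np.random.default_rng(0)
hidden_size, num_labels, batch = 768, 2, 4   # 768 is an assumption (BERT-Base)
pooled = rng.normal(size=(batch, hidden_size))            # stand-in for pooled_output
weights = rng.normal(scale=0.02, size=(num_labels, hidden_size))
bias = np.zeros(num_labels)
labels = np.array([0, 1, 1, 0])

loss, predicted, log_probs = classification_head(pooled, weights, bias, labels)
print(loss, predicted)
```

With all-zero weights the two classes are equally likely and the loss is exactly ln 2 ≈ 0.693; the first training loss in the log further down (about 0.71) is close to that chance-level value.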


The `model_fn_builder` function defined above wraps `create_model` so that the same model works for training, evaluation, and prediction. Next we set the hyperparameters it needs.
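
If the builder pattern is unfamiliar, here is a stripped-down sketch of the idea in plain Python. An outer function captures the hyperparameters and returns an inner function with the fixed signature the framework expects. The names `make_model_fn` and `toy_model_fn` and the string modes are made up for illustration and are not part of TensorFlow or the `bert` package.

```python
def make_model_fn(num_labels, learning_rate):
    """Returns a function with a fixed (features, labels, mode) signature."""
    def model_fn(features, labels, mode):
        # num_labels and learning_rate are captured from the enclosing scope,
        # so the caller never has to pass them in.
        if mode == "predict":
            # Prediction needs no optimizer settings.
            return {"mode": mode, "num_labels": num_labels}
        return {"mode": mode, "num_labels": num_labels,
                "learning_rate": learning_rate}
    return model_fn

toy_model_fn = make_model_fn(num_labels=2, learning_rate=2e-5)
train_spec = toy_model_fn(features={}, labels=[], mode="train")
predict_spec = toy_model_fn(features={}, labels=None, mode="predict")
print(train_spec)
print(predict_spec)
```

The real `model_fn` follows the same shape: it branches on `mode` and returns a different `EstimatorSpec` for each case.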

In [19]:
# Training hyperparameters, copied from this colab notebook
# (https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb)
BATCH_SIZE = 32
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 6
# Warmup is a period of time where the learning rate 
# is small and gradually increases--usually helps training.
WARMUP_PROPORTION = 0.1
# Model configs
SAVE_CHECKPOINTS_STEPS = 500
SAVE_SUMMARY_STEPS = 100

In [20]:
# Compute the number of training and warmup steps from the batch size
num_train_steps = int(len(train_features) / BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)
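
To make the step arithmetic concrete, here is a small sketch. The training-set size of 3,000 is an assumption (it is not stated in this section, but it is consistent with the 562 steps that show up in the training log), and `compute_steps` and `learning_rate_at` are hypothetical helpers. The schedule approximates the linear warmup followed by linear decay to zero that BERT's optimizer setup is understood to use, so read it as an approximation rather than the library's exact code.

```python
def compute_steps(num_examples, batch_size, num_epochs, warmup_proportion):
    """Same arithmetic as the notebook cell, wrapped in a function."""
    num_train_steps = int(num_examples / batch_size * num_epochs)
    num_warmup_steps = int(num_train_steps * warmup_proportion)
    return num_train_steps, num_warmup_steps

def learning_rate_at(step, base_lr, num_train_steps, num_warmup_steps):
    """Linear warmup to base_lr, then linear decay to zero (assumed schedule)."""
    if step < num_warmup_steps:
        return base_lr * step / num_warmup_steps
    return base_lr * (1.0 - step / num_train_steps)

# 3000 training examples is an assumption, not a value taken from the notebook.
train_steps, warmup_steps = compute_steps(3000, 32, 6, 0.1)
print(train_steps, warmup_steps)

for step in (0, 28, 56, 300, 562):
    print(step, learning_rate_at(step, 2e-5, train_steps, warmup_steps))
```

Under these assumptions the learning rate climbs from 0 to roughly 2e-5 over the first 56 steps and then falls back to 0 by step 562.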

In [21]:
# Specify output directory and number of checkpoint steps to save
run_config = tf.estimator.RunConfig(
    model_dir=OUTPUT_DIR,
    save_summary_steps=SAVE_SUMMARY_STEPS,
    save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS)

In [22]:
model_fn = model_fn_builder(
  num_labels=len(label_list),
  learning_rate=LEARNING_RATE,
  num_train_steps=num_train_steps,
  num_warmup_steps=num_warmup_steps)

estimator = tf.estimator.Estimator(
  model_fn=model_fn,
  config=run_config,
  params={"batch_size": BATCH_SIZE})


INFO:tensorflow:Using config: {'_model_dir': 'temp_out', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 500, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f585444aac8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


I0305 15:31:16.449522 140018860873536 estimator.py:201] Using config: {'_model_dir': 'temp_out', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 500, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f585444aac8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


Next we create an input builder function that takes our training feature set (`train_features`) and returns an input function (`input_fn`) that feeds batches to the model. This is a pretty standard design pattern for working with TensorFlow [Estimators](https://www.tensorflow.org/guide/estimators).

In [23]:
# Create an input function for training. drop_remainder = True for using TPUs.
train_input_fn = bert.run_classifier.input_fn_builder(
    features=train_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=True,
    drop_remainder=False)

Now we train our model! For me, using a Colab notebook running on Google's GPUs, my training time was about 14 minutes. (The run logged below took about 44 minutes.)

In [24]:
!pip install dask --upgrade

Collecting dask
  Downloading https://files.pythonhosted.org/packages/a3/79/41d27ad703e782a422636dc8e0ce2f7624ef541b7219bd93a4af0b0d799c/dask-1.1.3-py2.py3-none-any.whl (703kB)
distributed 1.18.1 requires msgpack-python, which is not installed.
Installing collected packages: dask
  Found existing installation: dask 0.15.2
    Uninstalling dask-0.15.2:
      Successfully uninstalled dask-0.15.2
Successfully installed dask-1.1.3
You are using pip version 19.0.2, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [25]:
print('Beginning Training!')
current_time = datetime.now()
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
print("Training took time ", datetime.now() - current_time)

Beginning Training!
INFO:tensorflow:Calling model_fn.


I0305 15:31:21.618894 140018860873536 estimator.py:1111] Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


I0305 15:31:24.026103 140018860873536 saver.py:1483] Saver not created because there are no variables in the graph to restore


Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


W0305 15:31:24.125799 140018860873536 deprecation.py:506] From <ipython-input-18-e2dca8fc1283>:34: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


Instructions for updating:
Deprecated in favor of operator or tf.math.divide.


W0305 15:31:24.162363 140018860873536 deprecation.py:323] From /usr/local/anaconda/lib/python3.6/site-packages/tensorflow/python/training/learning_rate_decay_v2.py:321: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.


Instructions for updating:
Use tf.cast instead.


W0305 15:31:24.219636 140018860873536 deprecation.py:323] From /usr/local/anaconda/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Instructions for updating:
Use tf.cast instead.


W0305 15:31:30.491624 140018860873536 deprecation.py:323] From /usr/local/anaconda/lib/python3.6/site-packages/tensorflow/python/ops/metrics_impl.py:455: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.



For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

INFO:tensorflow:Done calling model_fn.


I0305 15:31:32.069180 140018860873536 estimator.py:1113] Done calling model_fn.


INFO:tensorflow:Create CheckpointSaverHook.


I0305 15:31:32.071269 140018860873536 basic_session_run_hooks.py:527] Create CheckpointSaverHook.


INFO:tensorflow:Graph was finalized.


I0305 15:31:35.322885 140018860873536 monitored_session.py:222] Graph was finalized.


INFO:tensorflow:Running local_init_op.


I0305 15:31:39.162439 140018860873536 session_manager.py:491] Running local_init_op.


INFO:tensorflow:Done running local_init_op.


I0305 15:31:39.393613 140018860873536 session_manager.py:493] Done running local_init_op.


INFO:tensorflow:Saving checkpoints for 0 into temp_out/model.ckpt.


I0305 15:31:46.657508 140018860873536 basic_session_run_hooks.py:594] Saving checkpoints for 0 into temp_out/model.ckpt.


INFO:tensorflow:loss = 0.7084825, step = 1


I0305 15:31:58.837381 140018860873536 basic_session_run_hooks.py:249] loss = 0.7084825, step = 1


INFO:tensorflow:global_step/sec: 0.216289


I0305 15:39:41.182151 140018860873536 basic_session_run_hooks.py:680] global_step/sec: 0.216289


INFO:tensorflow:loss = 0.5955755, step = 101 (462.347 sec)


I0305 15:39:41.184025 140018860873536 basic_session_run_hooks.py:247] loss = 0.5955755, step = 101 (462.347 sec)


INFO:tensorflow:global_step/sec: 0.218532


I0305 15:47:18.780807 140018860873536 basic_session_run_hooks.py:680] global_step/sec: 0.218532


INFO:tensorflow:loss = 0.45998886, step = 201 (457.599 sec)


I0305 15:47:18.782703 140018860873536 basic_session_run_hooks.py:247] loss = 0.45998886, step = 201 (457.599 sec)


INFO:tensorflow:global_step/sec: 0.218256


I0305 15:54:56.958614 140018860873536 basic_session_run_hooks.py:680] global_step/sec: 0.218256


INFO:tensorflow:loss = 0.05211179, step = 301 (458.178 sec)


I0305 15:54:56.960471 140018860873536 basic_session_run_hooks.py:247] loss = 0.05211179, step = 301 (458.178 sec)


INFO:tensorflow:global_step/sec: 0.218391


I0305 16:02:34.853186 140018860873536 basic_session_run_hooks.py:680] global_step/sec: 0.218391


INFO:tensorflow:loss = 0.040447406, step = 401 (457.895 sec)


I0305 16:02:34.855016 140018860873536 basic_session_run_hooks.py:247] loss = 0.040447406, step = 401 (457.895 sec)


INFO:tensorflow:Saving checkpoints for 500 into temp_out/model.ckpt.


I0305 16:10:08.832126 140018860873536 basic_session_run_hooks.py:594] Saving checkpoints for 500 into temp_out/model.ckpt.


INFO:tensorflow:global_step/sec: 0.217043


I0305 16:10:15.591091 140018860873536 basic_session_run_hooks.py:680] global_step/sec: 0.217043


INFO:tensorflow:loss = 0.15861872, step = 501 (460.738 sec)


I0305 16:10:15.592879 140018860873536 basic_session_run_hooks.py:247] loss = 0.15861872, step = 501 (460.738 sec)


INFO:tensorflow:Saving checkpoints for 562 into temp_out/model.ckpt.


I0305 16:14:55.337699 140018860873536 basic_session_run_hooks.py:594] Saving checkpoints for 562 into temp_out/model.ckpt.


INFO:tensorflow:Loss for final step: 0.0028626407.


I0305 16:14:57.731242 140018860873536 estimator.py:359] Loss for final step: 0.0028626407.


Training took time  0:43:37.242970


Now let's use our test data to see how well our model did:

In [26]:
test_input_fn = bert.run_classifier.input_fn_builder(
    features=test_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=False,
    drop_remainder=True)
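
One side effect of `drop_remainder=True` is worth spelling out: a final batch smaller than the batch size is discarded, so some test examples never get evaluated. Here is a quick sketch of the arithmetic, assuming the 750 in the earlier `Writing example 0 of 750` log line is the test-set size and that evaluation uses the same batch size of 32. `examples_evaluated` is a hypothetical helper, not part of the `bert` package.

```python
def examples_evaluated(num_examples, batch_size, drop_remainder):
    """Number of examples that end up in a batch for one pass over the data."""
    full_batches, leftover = divmod(num_examples, batch_size)
    if drop_remainder:
        # The final partial batch (if any) is discarded.
        return full_batches * batch_size
    return num_examples

# 750 and 32 are assumptions read off the log and the hyperparameter cell.
kept = examples_evaluated(750, 32, drop_remainder=True)
dropped = 750 - kept
print(kept, dropped)
```

Under those assumptions 736 examples are scored and 14 are skipped, which agrees with the four confusion-matrix counts in the evaluation output (361 + 180 + 108 + 87 = 736). Setting `drop_remainder=False` would likely score all of them when running on CPU or GPU.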

In [27]:
estimator.evaluate(input_fn=test_input_fn, steps=None)

INFO:tensorflow:Calling model_fn.


I0305 16:14:58.025079 140018860873536 estimator.py:1111] Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Starting evaluation at 2019-03-05T16:15:07Z


INFO:tensorflow:Graph was finalized.


W0305 16:15:08.369490 140018860873536 deprecation.py:323] From /usr/local/anaconda/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.


INFO:tensorflow:Restoring parameters from temp_out/model.ckpt-562


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Finished evaluation at 2019-03-05-16:15:54


INFO:tensorflow:Saving dict for global step 562: auc = 0.7219401, eval_accuracy = 0.7350544, f1_score = 0.64864856, false_negatives = 87.0, false_positives = 108.0, global_step = 562, loss = 1.205908, precision = 0.625, recall = 0.6741573, true_negatives = 361.0, true_positives = 180.0


INFO:tensorflow:Saving 'checkpoint_path' summary for global step 562: temp_out/model.ckpt-562


{'auc': 0.7219401,
 'eval_accuracy': 0.7350544,
 'f1_score': 0.64864856,
 'false_negatives': 87.0,
 'false_positives': 108.0,
 'global_step': 562,
 'loss': 1.205908,
 'precision': 0.625,
 'recall': 0.6741573,
 'true_negatives': 361.0,
 'true_positives': 180.0}
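
To see how these numbers fit together, the summary metrics can be recomputed from the four confusion-matrix counts in the dictionary above. This is a small sketch in plain Python, independent of the estimator; the variable names are illustrative.

```python
# Confusion-matrix counts copied from the evaluation output above.
tp, fp, tn, fn = 180.0, 108.0, 361.0, 87.0

accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all predictions that were right
precision = tp / (tp + fp)                   # of predicted positives, how many were positive
recall = tp / (tp + fn)                      # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
# accuracy=0.7351 precision=0.6250 recall=0.6742 f1=0.6486
```

The recomputed values match the reported `eval_accuracy`, `precision`, `recall`, and `f1_score`.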

Now let's write code to make predictions on new sentences:

In [28]:
def getPrediction(in_sentences):
  # Returns one (sentence, prediction['probabilities'], label name) tuple per input sentence.
  labels = ["Negative", "Positive"]
  # guid="" and label=0 are placeholders: InputExample requires them, but they carry no information here.
  input_examples = [run_classifier.InputExample(guid="", text_a=x, text_b=None, label=0) for x in in_sentences]
  input_features = run_classifier.convert_examples_to_features(input_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
  predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)
  predictions = estimator.predict(predict_input_fn)
  return [(sentence, prediction['probabilities'], labels[prediction['labels']]) for sentence, prediction in zip(in_sentences, predictions)]
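
One caveat when reading the results: the `probabilities` values printed further down are all negative, which suggests they are log-probabilities (log-softmax outputs) rather than probabilities. That is an inference from the numbers, since the model function is defined outside this section. Assuming it holds, a minimal sketch of converting them back looks like this:

```python
import math

# Scores copied from the first prediction in the notebook output.
log_probs = [-1.3700870e-03, -6.5935264e+00]
labels = ["Negative", "Positive"]

# Exponentiating a log-probability recovers the probability.
probs = [math.exp(lp) for lp in log_probs]

# The predicted class is the one with the highest probability.
best = max(range(len(probs)), key=probs.__getitem__)

print(labels[best], [round(p, 4) for p in probs], "sum =", round(sum(probs), 4))
```

For this example the two values exponentiate to roughly 0.9986 and 0.0014, which sum to 1 and agree with the `'Negative'` label in the output.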

In [29]:
pred_sentences = [
  '''While the RMB in 2017 was broadly in line with economic fundamentals and desirable policies, the current account surplus was moderately
stronger. This reflects structural distortions and policies that cause excessive savings, such as low social spending. Addressing these distortions and the resulting external imbalance would benefit
both China and the global economy.''',
  '''Favorable domestic and external conditions reduced capital outflows and exchange rate pressure. The RMB was broadly stable against the basket published by the China Foreign
Exchange Trade System (CFETS) in 2017, but with more fluctuation versus the dollar, and it has appreciated by about 2 percent in real effective terms in the first half of 2018. The current account
surplus continued to decline but, reflecting distortions and policy gaps that encourage excessive savings, the external position for 2017 is assessed as moderately stronger than the level consistent
with medium-term fundamentals and desirable policies, with the exchange rate broadly in line(Appendix I).''',
    '''Large outflows and pressure on the exchange rate could resume due to tighter and more volatile global financial conditions, especially a surging dollar. Investor sentiment
towards emerging markets has recently weakened, and this could intensify, potentially spreading to China.''',
  '''. Uncoordinated financial and local government regulatory action could have unintended consequences that trigger disorderly repricing of corporate/LGFV credit risks, losses
for investors, and rollover risks for financial institutions''',
  '''But a lack of decisive reforms in deleveraging and rebalancing would add to the Faster reform progress could pave the way for higher and
more sustainable GDP growth, already-high stock of vulnerabilities and worsen resource allocation, leading to more rapidly
diminishing returns over the medium term. This scenario also raises the probability of a disruptive adjustment to Chinese demand which would result in a contractionary impulse to the global
economy, as well as spillovers through commodity prices and financial markets. '''
]

In [30]:
predictions = getPrediction(pred_sentences)

INFO:tensorflow:Writing example 0 of 5


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 


INFO:tensorflow:tokens: [CLS] while the rm ##b in 2017 was broadly in line with economic fundamental ##s and desirable policies , the current account surplus was moderately stronger . this reflects structural distortion ##s and policies that cause excessive savings , such as low social spending . addressing these distortion ##s and the resulting external im ##balance would benefit both china and the global economy . [SEP]


INFO:tensorflow:input_ids: 101 2096 1996 28549 2497 1999 2418 2001 13644 1999 2240 2007 3171 8050 2015 1998 16166 6043 1010 1996 2783 4070 15726 2001 17844 6428 1012 2023 11138 8332 20870 2015 1998 6043 2008 3426 11664 10995 1010 2107 2004 2659 2591 5938 1012 12786 2122 20870 2015 1998 1996 4525 6327 10047 26657 2052 5770 2119 2859 1998 1996 3795 4610 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 


INFO:tensorflow:tokens: [CLS] favorable domestic and external conditions reduced capital out ##flow ##s and exchange rate pressure . the rm ##b was broadly stable against the basket published by the china foreign exchange trade system ( cf ##ets ) in 2017 , but with more flu ##ct ##uation versus the dollar , and it has appreciated by about 2 percent in real effective terms in the first half of 2018 . the current account surplus continued to decline but , reflecting distortion ##s and policy gaps that encourage excessive savings , the external position for 2017 is assessed as moderately stronger than the level consistent with medium - term fundamental ##s and desirable policies , with the exchange rate broadly in line ( appendix i ) . [SEP]


INFO:tensorflow:input_ids: 101 11119 4968 1998 6327 3785 4359 3007 2041 12314 2015 1998 3863 3446 3778 1012 1996 28549 2497 2001 13644 6540 2114 1996 10810 2405 2011 1996 2859 3097 3863 3119 2291 1006 12935 8454 1007 1999 2418 1010 2021 2007 2062 19857 6593 14505 6431 1996 7922 1010 1998 2009 2038 12315 2011 2055 1016 3867 1999 2613 4621 3408 1999 1996 2034 2431 1997 2760 1012 1996 2783 4070 15726 2506 2000 6689 2021 1010 10842 20870 2015 1998 3343 16680 2008 8627 11664 10995 1010 1996 6327 2597 2005 2418 2003 14155 2004 17844 6428 2084 1996 2504 8335 2007 5396 1011 2744 8050 2015 1998 16166 6043 1010 2007 1996 3863 3446 13644 1999 2240 1006 22524 1045 1007 1012 102 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 


INFO:tensorflow:tokens: [CLS] large out ##flow ##s and pressure on the exchange rate could resume due to tighter and more volatile global financial conditions , especially a sur ##ging dollar . investor sentiment towards emerging markets has recently weakened , and this could int ##ens ##ify , potentially spreading to china . [SEP]


INFO:tensorflow:input_ids: 101 2312 2041 12314 2015 1998 3778 2006 1996 3863 3446 2071 13746 2349 2000 12347 1998 2062 20606 3795 3361 3785 1010 2926 1037 7505 4726 7922 1012 14316 15792 2875 8361 6089 2038 3728 11855 1010 1998 2023 2071 20014 6132 8757 1010 9280 9359 2000 2859 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 


INFO:tensorflow:tokens: [CLS] . un ##co ##ord ##inated financial and local government regulatory action could have un ##int ##ended consequences that trigger disorder ##ly rep ##ric ##ing of corporate / l ##gf ##v credit risks , losses for investors , and roll ##over risks for financial institutions [SEP]


INFO:tensorflow:input_ids: 101 1012 4895 3597 8551 15833 3361 1998 2334 2231 10738 2895 2071 2031 4895 18447 21945 8465 2008 9495 8761 2135 16360 7277 2075 1997 5971 1013 1048 25708 2615 4923 10831 1010 6409 2005 9387 1010 1998 4897 7840 10831 2005 3361 4896 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 


INFO:tensorflow:tokens: [CLS] but a lack of decisive reforms in del ##ever ##aging and re ##bala ##nc ##ing would add to the faster reform progress could pa ##ve the way for higher and more sustainable gdp growth , already - high stock of vu ##ln ##era ##bilities and worse ##n resource allocation , leading to more rapidly dim ##ini ##shing returns over the medium term . this scenario also raises the probability of a disrupt ##ive adjustment to chinese demand which would result in a contraction ##ary impulse to the global economy , as well as spill ##overs through commodity prices and financial markets . [SEP]


INFO:tensorflow:input_ids: 101 2021 1037 3768 1997 13079 8818 1999 3972 22507 16594 1998 2128 25060 12273 2075 2052 5587 2000 1996 5514 5290 5082 2071 6643 3726 1996 2126 2005 3020 1998 2062 9084 14230 3930 1010 2525 1011 2152 4518 1997 24728 19666 6906 14680 1998 4788 2078 7692 16169 1010 2877 2000 2062 5901 11737 5498 12227 5651 2058 1996 5396 2744 1012 2023 11967 2036 13275 1996 9723 1997 1037 23217 3512 19037 2000 2822 5157 2029 2052 2765 1999 1037 21963 5649 14982 2000 1996 3795 4610 1010 2004 2092 2004 14437 24302 2083 19502 7597 1998 3361 6089 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Restoring parameters from temp_out/model.ckpt-562


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


Voila! We have a sentiment classifier!

In [31]:
predictions

[('While the RMB in 2017 was broadly in line with economic fundamentals and desirable policies, the current account surplus was moderately\nstronger. This reflects structural distortions and policies that cause excessive savings, such as low social spending. Addressing these distortions and the resulting external imbalance would benefit\nboth China and the global economy.',
  array([-1.3700870e-03, -6.5935264e+00], dtype=float32),
  'Negative'),
 ('Favorable domestic and external conditions reduced capital outflows and exchange rate pressure. The RMB was broadly stable against the basket published by the China Foreign\nExchange Trade System (CFETS) in 2017, but with more fluctuation versus the dollar, and it has appreciated by about 2 percent in real effective terms in the first half of 2018. The current account\nsurplus continued to decline but, reflecting distortions and policy gaps that encourage excessive savings, the external position for 2017 is assessed as moderately stronger th