<a href="https://colab.research.google.com/github/kod11/bert/blob/master/hillary_stance_all_3_labels.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
# Copyright 2019 Google Inc.

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

#     http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#Predicting Movie Review Sentiment with BERT on TF Hub

If you’ve been following Natural Language Processing over the past year, you’ve probably heard of BERT: Bidirectional Encoder Representations from Transformers. It’s a neural network architecture designed by Google researchers that’s totally transformed what’s state-of-the-art for NLP tasks, like text classification, translation, summarization, and question answering.

Now that BERT's been added to [TF Hub](https://www.tensorflow.org/hub) as a loadable module, it's easy(ish) to add into existing TensorFlow text pipelines. In an existing pipeline, BERT can replace text embedding layers like ELMo and GloVe. Alternatively, [finetuning](http://wiki.fast.ai/index.php/Fine_tuning) BERT can provide both an accuracy boost and faster training time in many cases.

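Here, we'll train a model to predict the stance of a tweet toward Hillary Clinton (FAVOR, AGAINST, or NONE) using BERT in TensorFlow with TF Hub. The notebook started out as a tutorial on classifying IMDB movie reviews as positive or negative, which is where the title and some leftover helper code come from. Some code was adapted from [this colab notebook](https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb). Let's get started!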

EDIT: all 3 labels

In [0]:
from sklearn.model_selection import train_test_split
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
from datetime import datetime

In addition to the standard libraries we imported above, we'll need to install BERT's python package.

In [39]:
!pip install bert-tensorflow



In [0]:
import bert
from bert import run_classifier
from bert import optimization
from bert import tokenization

Below, we'll set an output directory location to store our model output and checkpoints. This can be a local directory, in which case you'd set OUTPUT_DIR to the name of the directory you'd like to create. If you're running this code in Google's hosted Colab, the directory won't persist after the Colab session ends.

Alternatively, if you're a GCP user, you can store output in a GCP bucket. To do that, set a directory name in OUTPUT_DIR and the name of the GCP bucket in the BUCKET field.

Set DO_DELETE to rewrite the OUTPUT_DIR if it exists. Otherwise, Tensorflow will load existing model checkpoints from that directory (if they exist).

In [41]:
# Set the output directory for saving model file
# Optionally, set a GCP bucket location

OUTPUT_DIR = 'output_directory'#@param {type:"string"}
#@markdown Whether or not to clear/delete the directory and create a new one
DO_DELETE = True #@param {type:"boolean"}
#@markdown Set USE_BUCKET and BUCKET if you want to (optionally) store model output on GCP bucket.
USE_BUCKET = False #@param {type:"boolean"}
BUCKET = 'BUCKET_NAME' #@param {type:"string"}

if USE_BUCKET:
  OUTPUT_DIR = 'gs://{}/{}'.format(BUCKET, OUTPUT_DIR)
  from google.colab import auth
  auth.authenticate_user()

if DO_DELETE:
  try:
    tf.gfile.DeleteRecursively(OUTPUT_DIR)
  except:
    # Doesn't matter if the directory didn't exist
    pass
tf.gfile.MakeDirs(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))


***** Model output directory: output_directory *****


#Data

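First, let's load the dataset. The original tutorial downloaded the IMDB Large Movie Review Dataset hosted by Stanford. This notebook instead reads tab-separated train and test files (`hillaryTrain.tsv` and `hillaryTest.tsv`) from the working directory. The directory-loading helpers below are borrowed from [this Tensorflow tutorial](https://www.tensorflow.org/hub/tutorials/text_classification_with_tf_hub) and are not called here.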

In [0]:
from tensorflow import keras
import os
import re
BERT_CASE = "uncased"

# Load all files from a directory in a DataFrame.
# (Left over from the IMDB tutorial; not called in this notebook.)
def load_directory_data(directory):
  data = {}
  data["tweet"] = []
  data["stance"] = []
  for file_path in os.listdir(directory):
    with tf.gfile.GFile(os.path.join(directory, file_path), "r") as f:
      data["tweet"].append(f.read())
      # Raw string so that \d and \. are regex escapes, not string escapes.
      data["stance"].append(re.match(r"\d+_(\d+)\.txt", file_path).group(1))
  return pd.DataFrame.from_dict(data)

# Merge positive and negative examples, add a polarity column and shuffle.
# (Also left over from the IMDB tutorial; not called in this notebook.)
def load_dataset(directory):
  pos_df = load_directory_data(os.path.join(directory, "pos"))
  neg_df = load_directory_data(os.path.join(directory, "neg"))
  pos_df["polarity"] = 1
  neg_df["polarity"] = 0
  return pd.concat([pos_df, neg_df]).sample(frac=1).reset_index(drop=True)

# Split each "prefix:id" string and keep the part after the colon.
def split_id_row(ids):
  return [re.split(':', e)[1] for e in ids]
  
def download_and_load_datasets():
  target = "hillary"
  #train_df = load_dataset("content/hillaryTrain.tsv")
  #test_df = load_dataset("content/hillaryTest.tsv")
  #train_df = pd.read_csv("hillaryTrain.tsv", sep="\t")
  #test_df = pd.read_csv("hillaryTest.tsv", sep="\t")
  train_df = pd.read_csv(target+"Train.tsv", sep="\t",encoding='latin1')
  test_df = pd.read_csv(target+"Test.tsv", sep="\t",encoding='latin1')
  train_df.columns = ["id","target","tweets","stance","againstwho","sentiment"]
  test_df.columns = ["id","target","tweets","stance","againstwho","sentiment"]
  test_df.id = [re.split(':',e)[1] for e in test_df.id]
  print(test_df)
  #remove rows with no stance
  #train_df = train_df[train_df.stance != "NONE"]
  #test_df = test_df[test_df.stance != "NONE"]
  #shuffle
  train_df = train_df.sample(frac=1)
  #test_df = test_df.sample(frac=1)
  return train_df, test_df


In [43]:
train, test = download_and_load_datasets()

        id           target  \
0    10676  Hillary Clinton   
1    10677  Hillary Clinton   
2    10678  Hillary Clinton   
3    10679  Hillary Clinton   
4    10680  Hillary Clinton   
5    10681  Hillary Clinton   
6    10682  Hillary Clinton   
7    10683  Hillary Clinton   
8    10684  Hillary Clinton   
9    10685  Hillary Clinton   
10   10686  Hillary Clinton   
11   10687  Hillary Clinton   
12   10688  Hillary Clinton   
13   10689  Hillary Clinton   
14   10690  Hillary Clinton   
15   10691  Hillary Clinton   
16   10692  Hillary Clinton   
17   10693  Hillary Clinton   
18   10694  Hillary Clinton   
19   10695  Hillary Clinton   
20   10696  Hillary Clinton   
21   10697  Hillary Clinton   
22   10698  Hillary Clinton   
23   10699  Hillary Clinton   
24   10700  Hillary Clinton   
25   10701  Hillary Clinton   
26   10702  Hillary Clinton   
27   10703  Hillary Clinton   
28   10704  Hillary Clinton   
29   10705  Hillary Clinton   
..     ...              ...   
264  109

The original tutorial sampled 5000 train and test examples to keep training fast. This dataset is much smaller, so the sampling step is commented out and we use all of the data.

In [0]:
#train = train.sample(5000)
#test = test.sample(5000)

In [45]:
train.columns
test.columns
print(train.to_string())

            id           target                                             tweets   stance againstwho sentiment
68   1643:1742  Hillary Clinton  marijuana? marijuaage equality?? coincidence i...     NONE      OTHER  NEGATIVE
489  2065:2164  Hillary Clinton  @HillaryClinton will be the nominee for the De...  AGAINST      OTHER  POSITIVE
620  2196:2295  Hillary Clinton  Hilly & Barry's Bloody Benghazi Bamboozle #Hil...  AGAINST     TARGET  NEGATIVE
456  2032:2131  Hillary Clinton  How anyone can believe a single word that come...  AGAINST      OTHER  NEGATIVE
357  1933:2032  Hillary Clinton  Dick Morris: @HillaryClinton is "fundamentally...  AGAINST     TARGET  NEGATIVE
117  1692:1791  Hillary Clinton  @NatureGuy101 @MaryMorientes   Support encoura...    FAVOR     TARGET  POSITIVE
40   1615:1714  Hillary Clinton  @thehill @evanperez are you kidding me...leave...    FAVOR     TARGET  NEGATIVE
88   1663:1762  Hillary Clinton  Gov. Chafee, why are you trying to make fetch ...     NONE     

For us, the input data is the 'tweets' column and the label is the 'stance' column (FAVOR, AGAINST, or NONE). We also use the 'id' column as the `guid` of each test example.
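With three stance labels, it can help to look at how they are distributed before training. The sketch below counts only the eight stance values visible in the printed rows above, so it is an illustration of the check and not a statistic about the full training set; `visible_stances` is a hypothetical name.

```python
from collections import Counter

# Stance values copied from the eight training rows printed above
# (an illustration only, not the full training set).
visible_stances = ["NONE", "AGAINST", "AGAINST", "AGAINST",
                   "AGAINST", "FAVOR", "FAVOR", "NONE"]

counts = Counter(visible_stances)
total = sum(counts.values())

# Print each label with its count and share of the sample.
for stance, n in counts.most_common():
    print("{:8s} {:2d}  ({:.0%})".format(stance, n, n / total))
```

On the real DataFrame, `train['stance'].value_counts()` gives the same kind of summary in one call.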

In [0]:
DATA_COLUMN = 'tweets'
LABEL_COLUMN = 'stance'
ID_COLUMN = 'id'
# label_list is the list of labels, i.e. True, False or 0, 1 or 'dog', 'cat'
label_list = ["FAVOR","AGAINST", "NONE"]
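The order of `label_list` matters, because each label is turned into an integer id by its position in the list. A minimal sketch of that mapping follows; the names `label_map` and `id_to_label` are illustrative and not defined elsewhere in this notebook.

```python
# Label order used in this notebook; the index of each label becomes its class id.
label_list = ["FAVOR", "AGAINST", "NONE"]

# Map each label string to an integer id by position.
label_map = {label: i for i, label in enumerate(label_list)}

# Inverse map, useful for turning predicted ids back into stance names.
id_to_label = {i: label for label, i in label_map.items()}

print(label_map)  # {'FAVOR': 0, 'AGAINST': 1, 'NONE': 2}
```

This agrees with the `label: NONE (id = 2)` and `label: AGAINST (id = 1)` lines in the conversion log further down.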

#Data Preprocessing
We'll need to transform our data into a format BERT understands. This involves two steps. First, we create `InputExample`s using the constructor provided in the BERT library.

- `text_a` is the text we want to classify, which in this case is the `tweets` column in our DataFrame.
- `text_b` is used if we're training a model to understand the relationship between sentences (e.g. is `text_b` a translation of `text_a`? Is `text_b` an answer to the question asked by `text_a`?). This doesn't apply to our task, so we can leave `text_b` blank.
- `label` is the label for our example, here one of FAVOR, AGAINST, or NONE.

In [0]:
# Use the InputExample class from BERT's run_classifier code to create examples from the data
train_InputExamples = train.apply(lambda x: bert.run_classifier.InputExample(guid=None, # Globally unique ID for bookkeeping, unused in this example
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

test_InputExamples = test.apply(lambda x: bert.run_classifier.InputExample(guid=x[ID_COLUMN], 
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis = 1)

Next, we need to preprocess our data so that it matches the data BERT was trained on. For this, we'll need to do a couple of things (but don't worry--this is also included in the Python library):


1. Lowercase our text (if we're using a BERT lowercase model)
2. Tokenize it (e.g. "sally says hi" -> ["sally", "says", "hi"])
3. Break words into WordPieces (e.g. "calling" -> ["call", "##ing"])
4. Map our words to indexes using a vocab file that BERT provides
5. Add special "CLS" and "SEP" tokens (see the [readme](https://github.com/google-research/bert))
6. Append "index" and "segment" tokens to each input (see the [BERT paper](https://arxiv.org/pdf/1810.04805.pdf))

Happily, we don't have to worry about most of these details.
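To make steps 1 to 3 above more concrete, here is a small sketch of greedy longest-match-first WordPiece splitting in plain Python. It is an illustration under stated assumptions and not the library's code: `toy_vocab` is a made-up vocabulary, the function names are hypothetical, and words are split on whitespace only, whereas the real tokenizer also separates punctuation.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into WordPieces by greedy longest-match-first lookup."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry a ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matched, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Lowercase, split on whitespace, then split each word into WordPieces."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    return tokens


# Hypothetical toy vocabulary, chosen only to reproduce the examples in the text.
toy_vocab = {"sally", "says", "hi", "call", "##ing", "ha", "##g"}

print(tokenize("Sally says hi", toy_vocab))  # ['sally', 'says', 'hi']
print(tokenize("calling hag", toy_vocab))    # ['call', '##ing', 'ha', '##g']
```

The greedy strategy explains why a word missing from the vocabulary is broken into the longest known prefix followed by `##` continuation pieces.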




To start, we'll need to load a vocabulary file and lowercasing information directly from the BERT tf hub module:

In [48]:
# This is a path to an uncased (all lowercase) version of BERT
BERT_MODEL_HUB = "https://tfhub.dev/google/bert_"+BERT_CASE+"_L-12_H-768_A-12/1"

def create_tokenizer_from_hub_module():
  """Get the vocab file and casing info from the Hub module."""
  with tf.Graph().as_default():
    bert_module = hub.Module(BERT_MODEL_HUB)
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    with tf.Session() as sess:
      vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                            tokenization_info["do_lower_case"]])
      
  return bert.tokenization.FullTokenizer(
      vocab_file=vocab_file, do_lower_case=do_lower_case)

tokenizer = create_tokenizer_from_hub_module()

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Great--we just learned that the BERT model we're using expects lowercase data (that's what is stored in tokenization_info["do_lower_case"]) and we also loaded BERT's vocab file. We also created a tokenizer, which breaks words into word pieces:

In [49]:
tokenizer.tokenize("Hillary Clinton is a big fat hag")

['hillary', 'clinton', 'is', 'a', 'big', 'fat', 'ha', '##g']

Using our tokenizer, we'll call `run_classifier.convert_examples_to_features` on our InputExamples to convert them into features BERT understands.
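As a rough sketch of what that conversion produces for a single sentence, the hypothetical helper below (not the library code) adds the special tokens, truncates to leave room for them, and pads to a fixed length. The ids 101, 102 and 0 for `[CLS]`, `[SEP]` and padding are the values that appear in this notebook's conversion log, and the four token ids are the ones logged for "how anyone can believe".

```python
def to_features(token_ids, max_seq_length, cls_id=101, sep_id=102, pad_id=0):
    """Build padded input_ids, input_mask and segment_ids for one sentence."""
    # Leave room for the [CLS] and [SEP] tokens.
    token_ids = token_ids[: max_seq_length - 2]
    input_ids = [cls_id] + token_ids + [sep_id]

    # 1 marks real tokens, 0 marks padding.
    input_mask = [1] * len(input_ids)
    padding = max_seq_length - len(input_ids)
    input_ids = input_ids + [pad_id] * padding
    input_mask = input_mask + [0] * padding

    # With a single sentence (no text_b), every position is segment 0.
    segment_ids = [0] * max_seq_length
    return input_ids, input_mask, segment_ids


ids, mask, segments = to_features([2129, 3087, 2064, 2903], max_seq_length=8)
print(ids)       # [101, 2129, 3087, 2064, 2903, 102, 0, 0]
print(mask)      # [1, 1, 1, 1, 1, 1, 0, 0]
print(segments)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

The same three fields appear in the log output, where each list has 128 entries and the number of ones in `input_mask` equals the number of tokens including `[CLS]` and `[SEP]`.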

In [50]:
# We'll set sequences to be at most 128 tokens long.
MAX_SEQ_LENGTH = 128
# Convert our train and test features to InputFeatures that BERT understands.
train_features = bert.run_classifier.convert_examples_to_features(train_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)
test_features = bert.run_classifier.convert_examples_to_features(test_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)

INFO:tensorflow:Writing example 0 of 637


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] marijuana ? mari ##ju ##aa ##ge equality ? ? coincidence i think not ! ! 11 ! [SEP]


INFO:tensorflow:input_ids: 101 16204 1029 16266 9103 11057 3351 9945 1029 1029 16507 1045 2228 2025 999 999 2340 999 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: NONE (id = 2)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] @ hillary ##cl ##inton will be the nominee for the democratic party . period . and she ' s gonna win the presidency too . # sorry ##ber ##nie [SEP]


INFO:tensorflow:input_ids: 101 1030 18520 20464 27028 2097 2022 1996 9773 2005 1996 3537 2283 1012 2558 1012 1998 2016 1005 1055 6069 2663 1996 8798 2205 1012 1001 3374 5677 8034 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: AGAINST (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] hilly & barry ' s bloody ben ##gh ##azi bamboo ##zle # hillary ##ice ##cre ##am ##fl ##av ##ors # tc ##ot # wake ##up ##ame ##rica [SEP]


INFO:tensorflow:input_ids: 101 22800 1004 6287 1005 1055 6703 3841 5603 16103 15216 29247 1001 18520 6610 16748 3286 10258 11431 5668 1001 22975 4140 1001 5256 6279 14074 14735 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: AGAINST (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] how anyone can believe a single word that comes out of that woman ' s mouth is beyond me . . . [SEP]


INFO:tensorflow:input_ids: 101 2129 3087 2064 2903 1037 2309 2773 2008 3310 2041 1997 2008 2450 1005 1055 2677 2003 3458 2033 1012 1012 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: AGAINST (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] dick morris : @ hillary ##cl ##inton is " fundamentally corrupt " . he should know ! # cop ##oli ##tics # wc ##s ##15 [SEP]


INFO:tensorflow:input_ids: 101 5980 6384 1024 1030 18520 20464 27028 2003 1000 24670 13593 1000 1012 2002 2323 2113 999 1001 8872 10893 14606 1001 15868 2015 16068 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: AGAINST (id = 1)


INFO:tensorflow:Writing example 0 of 294


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 10676


INFO:tensorflow:tokens: [CLS] @ jd ##son ##7 ##8 @ andrew ##bro ##ering andrew ##w ##hy ##do ##you ##care ##ab ##out ##w ##hat ##ith ##ink ? i didn ##ot ##real ##ize ##tha ##ti ##was ##thi ##si ##mp ##ort ##ant . sir ##t ##wee ##t andrew ##isa ##pa ##id ##tro ##ll [SEP]


INFO:tensorflow:input_ids: 101 1030 26219 3385 2581 2620 1030 4080 12618 7999 4080 2860 10536 3527 29337 16302 7875 5833 2860 12707 8939 19839 1029 1045 2134 4140 22852 4697 8322 3775 17311 15222 5332 8737 11589 4630 1012 2909 2102 28394 2102 4080 14268 4502 3593 13181 3363 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: AGAINST (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: 10677


INFO:tensorflow:tokens: [CLS] the white male vote is solid ##ly go ##p . the black vote is solid ##ly dem . that leaves white females and brown pp ##l . # feel ##the ##ber ##n [SEP]


INFO:tensorflow:input_ids: 101 1996 2317 3287 3789 2003 5024 2135 2175 2361 1012 1996 2304 3789 2003 5024 2135 17183 1012 2008 3727 2317 3801 1998 2829 4903 2140 1012 1001 2514 10760 5677 2078 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0319 11:27:07.722743 139823961655168 run_classifier.py:467] segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: AGAINST (id = 1)


I0319 11:27:07.727013 139823961655168 run_classifier.py:468] label: AGAINST (id = 1)


INFO:tensorflow:*** Example ***


I0319 11:27:07.732005 139823961655168 run_classifier.py:461] *** Example ***


INFO:tensorflow:guid: 10678


I0319 11:27:07.736786 139823961655168 run_classifier.py:462] guid: 10678


INFO:tensorflow:tokens: [CLS] @ ny ##in ##ves ##ting big banker buds need to rat ##chet up their " hillary cares about the little people " propaganda [SEP]


I0319 11:27:07.739924 139823961655168 run_classifier.py:464] tokens: [CLS] @ ny ##in ##ves ##ting big banker buds need to rat ##chet up their " hillary cares about the little people " propaganda [SEP]


INFO:tensorflow:input_ids: 101 1030 6396 2378 6961 3436 2502 13448 26734 2342 2000 9350 20318 2039 2037 1000 18520 14977 2055 1996 2210 2111 1000 10398 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0319 11:27:07.745399 139823961655168 run_classifier.py:465] input_ids: 101 1030 6396 2378 6961 3436 2502 13448 26734 2342 2000 9350 20318 2039 2037 1000 18520 14977 2055 1996 2210 2111 1000 10398 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0319 11:27:07.749266 139823961655168 run_classifier.py:466] input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0319 11:27:07.752757 139823961655168 run_classifier.py:467] segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: AGAINST (id = 1)


I0319 11:27:07.756477 139823961655168 run_classifier.py:468] label: AGAINST (id = 1)


INFO:tensorflow:*** Example ***


I0319 11:27:07.761116 139823961655168 run_classifier.py:461] *** Example ***


INFO:tensorflow:guid: 10679


I0319 11:27:07.765353 139823961655168 run_classifier.py:462] guid: 10679


INFO:tensorflow:tokens: [CLS] @ go ##p why should i believe you on this ? the go ##p leaders in congress won ' t fight obama now ! ! ! # tc ##ot [SEP]


I0319 11:27:07.768479 139823961655168 run_classifier.py:464] tokens: [CLS] @ go ##p why should i believe you on this ? the go ##p leaders in congress won ' t fight obama now ! ! ! # tc ##ot [SEP]


INFO:tensorflow:input_ids: 101 1030 2175 2361 2339 2323 1045 2903 2017 2006 2023 1029 1996 2175 2361 4177 1999 3519 2180 1005 1056 2954 8112 2085 999 999 999 1001 22975 4140 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0319 11:27:07.771730 139823961655168 run_classifier.py:465] input_ids: 101 1030 2175 2361 2339 2323 1045 2903 2017 2006 2023 1029 1996 2175 2361 4177 1999 3519 2180 1005 1056 2954 8112 2085 999 999 999 1001 22975 4140 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0319 11:27:07.774557 139823961655168 run_classifier.py:466] input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0319 11:27:07.778321 139823961655168 run_classifier.py:467] segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: AGAINST (id = 1)


I0319 11:27:07.782002 139823961655168 run_classifier.py:468] label: AGAINST (id = 1)


INFO:tensorflow:*** Example ***


I0319 11:27:07.786876 139823961655168 run_classifier.py:461] *** Example ***


INFO:tensorflow:guid: 10680


I0319 11:27:07.791017 139823961655168 run_classifier.py:462] guid: 10680


INFO:tensorflow:tokens: [CLS] @ rush ##ette ##ny @ twitch ##yte ##am hillary to press : " curt ##sy while you ' re thinking what to say , it saves time " # through ##the ##lo ##oki ##ng ##glass [SEP]


I0319 11:27:07.794645 139823961655168 run_classifier.py:464] tokens: [CLS] @ rush ##ette ##ny @ twitch ##yte ##am hillary to press : " curt ##sy while you ' re thinking what to say , it saves time " # through ##the ##lo ##oki ##ng ##glass [SEP]


INFO:tensorflow:input_ids: 101 1030 5481 7585 4890 1030 19435 17250 3286 18520 2000 2811 1024 1000 20099 6508 2096 2017 1005 2128 3241 2054 2000 2360 1010 2009 13169 2051 1000 1001 2083 10760 4135 23212 3070 15621 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0319 11:27:07.798377 139823961655168 run_classifier.py:465] input_ids: 101 1030 5481 7585 4890 1030 19435 17250 3286 18520 2000 2811 1024 1000 20099 6508 2096 2017 1005 2128 3241 2054 2000 2360 1010 2009 13169 2051 1000 1001 2083 10760 4135 23212 3070 15621 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0319 11:27:07.802562 139823961655168 run_classifier.py:466] input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I0319 11:27:07.806481 139823961655168 run_classifier.py:467] segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: AGAINST (id = 1)


I0319 11:27:07.810227 139823961655168 run_classifier.py:468] label: AGAINST (id = 1)
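
The log entries above show how each tweet ends up as three equal-length lists: `input_ids`, `input_mask`, and `segment_ids`. As a rough sketch of that padding step (not the actual `run_classifier` code), the helper below pads one example by hand. `pad_features` is a hypothetical name, and the sequence length of 128 is an assumption; the notebook's real value is `MAX_SEQ_LENGTH`.

```python
def pad_features(token_ids, max_seq_length):
    """Pad one tokenized example to a fixed length (illustrative helper)."""
    if len(token_ids) > max_seq_length:
        raise ValueError("example is longer than max_seq_length")
    n_pad = max_seq_length - len(token_ids)
    input_ids = token_ids + [0] * n_pad                 # 0 is the padding id
    input_mask = [1] * len(token_ids) + [0] * n_pad     # 1 = real token, 0 = padding
    segment_ids = [0] * max_seq_length                  # single-sentence input: all zeros
    return input_ids, input_mask, segment_ids

# Token ids copied from the log entry for guid 10678 above.
example_ids = [101, 1030, 6396, 2378, 6961, 3436, 2502, 13448, 26734, 2342,
               2000, 9350, 20318, 2039, 2037, 1000, 18520, 14977, 2055, 1996,
               2210, 2111, 1000, 10398, 102]

# 128 is an assumed sequence length, used only for this illustration.
input_ids, input_mask, segment_ids = pad_features(example_ids, max_seq_length=128)
print(len(input_ids), sum(input_mask))
```

The mask is what lets the model ignore the padded positions, which is why it has exactly as many ones as there are real tokens.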


#Creating a model

Now that we've prepared our data, let's focus on building a model. `create_model` does just this below. First, it loads the BERT TF Hub module again (this time to extract the computation graph). Next, it creates a single new layer that will be trained to adapt BERT to our classification task (here, predicting whether a tweet's stance is FAVOR, AGAINST, or NONE). This strategy of starting from a mostly trained model is called [fine-tuning](http://wiki.fast.ai/index.php/Fine_tuning).

In [0]:
def create_model(is_predicting, input_ids, input_mask, segment_ids, labels,
                 num_labels):
  """Creates a classification model."""

  bert_module = hub.Module(
      BERT_MODEL_HUB,
      trainable=True)
  bert_inputs = dict(
      input_ids=input_ids,
      input_mask=input_mask,
      segment_ids=segment_ids)
  bert_outputs = bert_module(
      inputs=bert_inputs,
      signature="tokens",
      as_dict=True)

  # Use "pooled_output" for classification tasks on an entire sentence.
  # Use "sequence_output" for token-level output.
  output_layer = bert_outputs["pooled_output"]

  hidden_size = output_layer.shape[-1].value

  # Create our own layer to tune for the stance data.
  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):

    # Dropout helps prevent overfitting
    output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    # Convert labels into one-hot encoding
    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    predicted_labels = tf.squeeze(tf.argmax(log_probs, axis=-1, output_type=tf.int32))
    # If we're predicting, we want predicted labels and the log probabilities.
    if is_predicting:
      return (predicted_labels, log_probs)

    # If we're train/eval, compute loss between predicted and actual label
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)
    return (loss, predicted_labels, log_probs)
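
To make the arithmetic of the new layer concrete, here is a minimal plain-Python sketch of what `create_model` computes after BERT's pooled output: a linear layer, a log-softmax, and a cross-entropy loss against a one-hot label. It leaves out dropout and TensorFlow entirely, and the function names and toy numbers are made up for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def classify(pooled, weights, bias, label=None):
    """pooled: hidden_size floats; weights: num_labels x hidden_size; bias: num_labels."""
    # logits = pooled @ weights^T + bias
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    log_probs = log_softmax(logits)
    predicted = max(range(len(log_probs)), key=log_probs.__getitem__)
    if label is None:  # prediction mode: no loss
        return predicted, log_probs
    # Cross-entropy against a one-hot label, as in per_example_loss.
    one_hot = [1.0 if i == label else 0.0 for i in range(len(log_probs))]
    loss = -sum(o * lp for o, lp in zip(one_hot, log_probs))
    return loss, predicted, log_probs

# Toy values: hidden_size = 2, num_labels = 3.
pooled = [0.5, -1.0]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.0]
loss, predicted, log_probs = classify(pooled, weights, bias, label=1)
print(predicted, loss)
```

Because the label is one-hot, the loss reduces to the negative log probability of the correct class.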


Next we'll wrap our model function in a `model_fn_builder` function that adapts our model to work for training, evaluation, and prediction.

In [0]:
# model_fn_builder actually creates our model function
# using the passed parameters for num_labels, learning_rate, etc.
def model_fn_builder(num_labels, learning_rate, num_train_steps,
                     num_warmup_steps):
  """Returns `model_fn` closure for Estimator."""
  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument
    """The `model_fn` for Estimator."""

    input_ids = features["input_ids"]
    input_mask = features["input_mask"]
    segment_ids = features["segment_ids"]
    label_ids = features["label_ids"]

    is_predicting = (mode == tf.estimator.ModeKeys.PREDICT)
    
    # TRAIN and EVAL
    if not is_predicting:

      (loss, predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      train_op = bert.optimization.create_optimizer(
          loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu=False)

      # Calculate evaluation metrics. 
      def metric_fn(label_ids, predicted_labels):
        accuracy = tf.metrics.accuracy(label_ids, predicted_labels)
        '''f1_score = tf.contrib.metrics.f1_score(
            label_ids,
            predicted_labels)
        auc = tf.metrics.auc(
            label_ids,
            predicted_labels)
        recall = tf.metrics.recall(
            label_ids,
            predicted_labels)
        precision = tf.metrics.precision(
            label_ids,
            predicted_labels) 
        true_pos = tf.metrics.true_positives(
            label_ids,
            predicted_labels)
        true_neg = tf.metrics.true_negatives(
            label_ids,
            predicted_labels)   
        false_pos = tf.metrics.false_positives(
            label_ids,
            predicted_labels)  
        false_neg = tf.metrics.false_negatives(
            label_ids,
            predicted_labels)'''
        return {"eval_accuracy": accuracy}
      
            
      
      #try to only consider labels for and against
      eval_metrics = metric_fn(label_ids, predicted_labels)

      if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          train_op=train_op)
      else:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          eval_metric_ops=eval_metrics)
    else:
      (predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      predictions = {
          'probabilities': log_probs,
          'labels': predicted_labels
      }
      return tf.estimator.EstimatorSpec(mode, predictions=predictions)

  # Return the actual model function in the closure
  return model_fn


In [0]:
# Training hyperparameters
# These hyperparameters are copied from this colab notebook (https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb)
BATCH_SIZE = 32
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 3.0
# Warmup is a period at the start of training where the learning rate
# is small and gradually increases; this usually helps training.
WARMUP_PROPORTION = 0.1
# Model configs
SAVE_CHECKPOINTS_STEPS = 500
SAVE_SUMMARY_STEPS = 100
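
For intuition about what warmup does to the learning rate, here is a small sketch of a schedule with linear warmup followed by linear decay to zero. This is my understanding of the shape `bert.optimization.create_optimizer` uses, but treat that as an assumption and check the library source. `learning_rate_at` and the step counts below are hypothetical.

```python
def learning_rate_at(step, init_lr, num_train_steps, num_warmup_steps):
    """Learning rate at a given step: linear warmup, then linear decay to 0."""
    if num_warmup_steps and step < num_warmup_steps:
        # Warmup: ramp linearly from 0 up toward init_lr.
        return init_lr * step / num_warmup_steps
    # After warmup: linear decay from init_lr (at step 0) to 0 (at the last step).
    return init_lr * (1.0 - min(step, num_train_steps) / num_train_steps)

# Hypothetical run of 100 steps with 10 warmup steps.
for step in (0, 5, 10, 50, 100):
    print(step, learning_rate_at(step, 2e-5, 100, 10))
```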

In [0]:
# Compute the number of train and warmup steps from the batch size
num_train_steps = int(len(train_features) / BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)
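
As a worked example of the two formulas above, the sketch below plugs in a made-up dataset size. The figure of 1000 examples is purely hypothetical; the real value is `len(train_features)`. `compute_steps` is an illustrative helper, not part of the notebook.

```python
def compute_steps(num_examples, batch_size, num_epochs, warmup_proportion):
    """Same arithmetic as the cell above, wrapped in a function."""
    num_train_steps = int(num_examples / batch_size * num_epochs)
    num_warmup_steps = int(num_train_steps * warmup_proportion)
    return num_train_steps, num_warmup_steps

# 1000 examples is a hypothetical dataset size.
steps, warmup = compute_steps(1000, batch_size=32, num_epochs=3.0,
                              warmup_proportion=0.1)
print(steps, warmup)  # 1000 / 32 * 3 = 93.75 -> 93 steps; 9.3 -> 9 warmup steps
```

Both values are truncated toward zero by `int`, so the last partial batch of the final epoch is not counted as a step.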

In [0]:
# Specify output directory and number of checkpoint steps to save
run_config = tf.estimator.RunConfig(
    model_dir=OUTPUT_DIR,
    save_summary_steps=SAVE_SUMMARY_STEPS,
    save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS)

In [56]:
model_fn = model_fn_builder(
  num_labels=len(label_list),
  learning_rate=LEARNING_RATE,
  num_train_steps=num_train_steps,
  num_warmup_steps=num_warmup_steps)

estimator = tf.estimator.Estimator(
  model_fn=model_fn,
  config=run_config,
  params={"batch_size": BATCH_SIZE})


INFO:tensorflow:Using config: {'_model_dir': 'output_directory', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 500, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f2ade1fc898>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


I0319 11:27:08.278253 139823961655168 estimator.py:201] Using config: {'_model_dir': 'output_directory', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 500, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f2ade1fc898>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


Next we create an input builder function that takes our training feature set (`train_features`) and returns an input function that feeds batches to the estimator. This is a pretty standard design pattern for working with TensorFlow [Estimators](https://www.tensorflow.org/guide/estimators).

In [0]:
# Create an input function for training. drop_remainder = True for using TPUs.
train_input_fn = bert.run_classifier.input_fn_builder(
    features=train_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=True,
    drop_remainder=False)
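
To illustrate what `drop_remainder` controls, here is a plain-Python sketch of batching with and without it. The real `input_fn_builder` builds a TensorFlow dataset rather than a Python generator, so this only mirrors the semantics; `batches` is a hypothetical helper and the 70-item list is toy data.

```python
def batches(features, batch_size, drop_remainder):
    """Yield consecutive batches; optionally drop a final short batch."""
    for start in range(0, len(features), batch_size):
        batch = features[start:start + batch_size]
        if drop_remainder and len(batch) < batch_size:
            return  # skip the incomplete last batch
        yield batch

toy_features = list(range(70))  # stand-in for a list of feature objects
print([len(b) for b in batches(toy_features, 32, drop_remainder=False)])
print([len(b) for b in batches(toy_features, 32, drop_remainder=True)])
```

With `drop_remainder=False`, as in the cell above, every example is used and the last batch may be smaller than the batch size.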

Now we train our model! In the run shown below, using a Colab notebook running on Google's GPUs, training took about two and a half minutes.

In [58]:
print('Beginning Training!')
current_time = datetime.now()
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
print("Training took time ", datetime.now() - current_time)

Beginning Training!
INFO:tensorflow:Calling model_fn.


I0319 11:27:08.830127 139823961655168 estimator.py:1111] Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


I0319 11:27:12.645106 139823961655168 saver.py:1483] Saver not created because there are no variables in the graph to restore
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


INFO:tensorflow:Done calling model_fn.


I0319 11:27:24.066751 139823961655168 estimator.py:1113] Done calling model_fn.


INFO:tensorflow:Create CheckpointSaverHook.


I0319 11:27:24.079625 139823961655168 basic_session_run_hooks.py:527] Create CheckpointSaverHook.


INFO:tensorflow:Graph was finalized.


I0319 11:27:26.291028 139823961655168 monitored_session.py:222] Graph was finalized.


INFO:tensorflow:Running local_init_op.


I0319 11:27:31.498679 139823961655168 session_manager.py:491] Running local_init_op.


INFO:tensorflow:Done running local_init_op.


I0319 11:27:31.780450 139823961655168 session_manager.py:493] Done running local_init_op.


INFO:tensorflow:Saving checkpoints for 0 into output_directory/model.ckpt.


I0319 11:27:42.073662 139823961655168 basic_session_run_hooks.py:594] Saving checkpoints for 0 into output_directory/model.ckpt.


INFO:tensorflow:loss = 1.1186547, step = 0


I0319 11:27:59.143070 139823961655168 basic_session_run_hooks.py:249] loss = 1.1186547, step = 0


INFO:tensorflow:Saving checkpoints for 59 into output_directory/model.ckpt.


I0319 11:29:33.149887 139823961655168 basic_session_run_hooks.py:594] Saving checkpoints for 59 into output_directory/model.ckpt.


INFO:tensorflow:Loss for final step: 0.24594301.


I0319 11:29:42.704812 139823961655168 estimator.py:359] Loss for final step: 0.24594301.


Training took time  0:02:34.398665


Now let's use our test data to see how well our model did:

In [0]:
test_input_fn = run_classifier.input_fn_builder(
    features=test_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=False,
    drop_remainder=False)

In [60]:
estimator.evaluate(input_fn=test_input_fn, steps=None)

INFO:tensorflow:Calling model_fn.


I0319 11:29:43.094405 139823961655168 estimator.py:1111] Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


I0319 11:29:47.389819 139823961655168 saver.py:1483] Saver not created because there are no variables in the graph to restore
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


INFO:tensorflow:Done calling model_fn.


I0319 11:29:58.950342 139823961655168 estimator.py:1113] Done calling model_fn.


INFO:tensorflow:Starting evaluation at 2019-03-19T11:29:58Z


I0319 11:29:58.984103 139823961655168 evaluation.py:257] Starting evaluation at 2019-03-19T11:29:58Z


INFO:tensorflow:Graph was finalized.


I0319 11:30:00.942346 139823961655168 monitored_session.py:222] Graph was finalized.


INFO:tensorflow:Restoring parameters from output_directory/model.ckpt-59


I0319 11:30:00.954650 139823961655168 saver.py:1270] Restoring parameters from output_directory/model.ckpt-59


INFO:tensorflow:Running local_init_op.


I0319 11:30:03.372358 139823961655168 session_manager.py:491] Running local_init_op.


INFO:tensorflow:Done running local_init_op.


I0319 11:30:03.631359 139823961655168 session_manager.py:493] Done running local_init_op.


INFO:tensorflow:Finished evaluation at 2019-03-19-11:30:10


I0319 11:30:10.032144 139823961655168 evaluation.py:277] Finished evaluation at 2019-03-19-11:30:10


INFO:tensorflow:Saving dict for global step 59: eval_accuracy = 0.70408165, global_step = 59, loss = 0.70109594


I0319 11:30:10.035103 139823961655168 estimator.py:1979] Saving dict for global step 59: eval_accuracy = 0.70408165, global_step = 59, loss = 0.70109594


INFO:tensorflow:Saving 'checkpoint_path' summary for global step 59: output_directory/model.ckpt-59


I0319 11:30:10.040964 139823961655168 estimator.py:2039] Saving 'checkpoint_path' summary for global step 59: output_directory/model.ckpt-59


{'eval_accuracy': 0.70408165, 'global_step': 59, 'loss': 0.70109594}
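
Accuracy treats all three labels alike, while the comment in `model_fn` hints at scoring only the for and against labels. As far as I know, the SemEval stance task scores systems by averaging the F1 of FAVOR and AGAINST, so here is a small plain-Python sketch of that kind of metric. The function names and the toy labels are made up, and this is not a replacement for the official script.

```python
def f1_for(label, gold, pred):
    """F1 score for a single label, given parallel lists of gold and predicted labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def favor_against_f1(gold, pred):
    """Average of the F1 scores for FAVOR and AGAINST (NONE is not averaged in)."""
    return (f1_for("FAVOR", gold, pred) + f1_for("AGAINST", gold, pred)) / 2

# Toy labels, made up for illustration.
gold = ["FAVOR", "AGAINST", "NONE", "AGAINST"]
pred = ["FAVOR", "AGAINST", "AGAINST", "NONE"]
print(favor_against_f1(gold, pred))
```

NONE still matters indirectly: a NONE tweet predicted as AGAINST counts as a false positive for AGAINST.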

Now let's write code to make predictions on new sentences:

In [0]:
def getPrediction(in_sentences):
  labels = ["FAVOR","AGAINST", "NONE"]
  input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, label = "NONE") for x in in_sentences] # here, "NONE" is just a dummy label
  input_features = run_classifier.convert_examples_to_features(input_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
  predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)
  predictions = estimator.predict(predict_input_fn)
  return [(sentence, prediction['probabilities'], labels[prediction['labels']]) for sentence, prediction in zip(in_sentences, predictions)]
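
Note that the `'probabilities'` entry actually holds the log probabilities returned by `create_model`. A minimal sketch of turning them back into ordinary probabilities, using a made-up triple of values in place of a real prediction:

```python
import math

def to_probabilities(log_probs):
    """Exponentiate log probabilities to recover probabilities."""
    return [math.exp(lp) for lp in log_probs]

# Made-up log probabilities for the labels FAVOR, AGAINST, NONE.
example_log_probs = [math.log(0.7), math.log(0.2), math.log(0.1)]
probs = to_probabilities(example_log_probs)
print(probs)
```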

Next, we want to print the test results in a format that can be evaluated by the Perl script supplied by the SemEval competition.

In [0]:
def printTestDataForCompScript():
  labels = ["FAVOR","AGAINST", "NONE"]
  predictions = estimator.predict(test_input_fn)
  #print(test.tweets)
  pred = [(testId,testTarget,sentence, prediction['probabilities'], labels[prediction['labels']]) for sentence,testId,testTarget, prediction in zip(test.tweets,test.id,test.target, predictions)]
  return(pred)
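
Each tuple returned above has the shape (id, target, tweet, log probabilities, predicted label). A small sketch of turning such tuples into tab-separated lines follows. `format_rows` is a hypothetical helper, the optional header is an assumption about what the evaluation script expects, and the log-probability values in the example are made up.

```python
def format_rows(preds, header=None):
    """Format (id, target, tweet, log_probs, stance) tuples as tab-separated lines."""
    lines = [header] if header else []
    for test_id, target, tweet, _log_probs, stance in preds:
        # str() guards against the id being stored as an integer.
        lines.append("\t".join([str(test_id), target, tweet, stance]))
    return lines

# One row in the shape returned above; the log-probability values are made up.
example_preds = [
    (10679, "Hillary Clinton",
     "@GOP Why should I believe you on this? The GOP leaders in congress "
     "won't fight Obama now!!! #tcot",
     [-2.3, -0.2, -2.5], "AGAINST"),
]
for line in format_rows(example_preds):
    print(line)
```

The lines can then be written to a file and passed to the evaluation script.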

In [0]:
pred_sentences = [
  "our country is ready for a female prez, not ever hillary",
  "my vote is for hillary",
  "where are the emails hillary?",
  "she is a fraud",
  "retribution for benghazi",
  "#hillaryclinton have you ever told the truth?",
  "million bogus followers on twitter #hillaryclinton",
  "I like my hamburgers rare"
]

In [64]:
#predictions = getPrediction(pred_sentences)
predictions = printTestDataForCompScript()

INFO:tensorflow:Calling model_fn.


I0319 11:30:10.323535 139823961655168 estimator.py:1111] Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


I0319 11:30:14.513635 139823961655168 saver.py:1483] Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Done calling model_fn.


I0319 11:30:14.768410 139823961655168 estimator.py:1113] Done calling model_fn.


INFO:tensorflow:Graph was finalized.


I0319 11:30:15.646814 139823961655168 monitored_session.py:222] Graph was finalized.


INFO:tensorflow:Restoring parameters from output_directory/model.ckpt-59


I0319 11:30:15.660292 139823961655168 saver.py:1270] Restoring parameters from output_directory/model.ckpt-59


INFO:tensorflow:Running local_init_op.


I0319 11:30:16.514425 139823961655168 session_manager.py:491] Running local_init_op.


INFO:tensorflow:Done running local_init_op.


I0319 11:30:16.605342 139823961655168 session_manager.py:493] Done running local_init_op.


Voila! We have a stance classifier!

In [65]:
for x in predictions:
  print(x[0] + "\t" + x[1] + "\t" + x[2] + "\t" + x[4])

10676	Hillary Clinton	@JDSon78 @AndrewBroering AndrewWhyDoYouCareAboutWhatIThink? I DidNotRealizeThatIWasThisImportant. SirTweet AndrewIsAPaidTroll 	AGAINST
10677	Hillary Clinton	The white male vote is solidly GOP. The black vote is solidly DEM.  That leaves white females and brown ppl. #FeelTheBern 	AGAINST
10678	Hillary Clinton	@nyinvesting big banker buds need to ratchet up their "Hillary cares about the little people" propaganda  	AGAINST
10679	Hillary Clinton	@GOP Why should I believe you on this? The GOP leaders in congress won't fight Obama now!!! #tcot 	AGAINST
10680	Hillary Clinton	@RushetteNY @TwitchyTeam Hillary to press: "Curtsy while you're thinking what to say, it saves time" #throughthelookingglass  	AGAINST
10681	Hillary Clinton	@GovtsTheProblem This is what I see: Make way 4 ur queen peasants! Don'ttouch or talk 2 her U filth! #NoHillary2016 #Benghazi  	AGAINST
10682	Hillary Clinton	@CNNSotu @jaketapper - Can we get back to the issues? That's what #Bernie2016 wants to 