Google Colab link: https://colab.research.google.com/drive/1PLyNxB430viZId2-pEFFNWUkYfYsv2s9

In [0]:
# Copyright 2019 Google Inc.

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

#     http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

In [0]:
# Modifications copyright (C) 2013 Hieu Phan

# Document Classification by Reading Difficulty with BERT

Import the necessary libraries.  
Note: the BERT library used below only works with TensorFlow versions below 2.0.0.
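
A quick guard can catch an incompatible runtime before the BERT imports fail. This is a minimal sketch: the helper name `is_tf1` is hypothetical, and it only inspects the major version number of a version string such as `tf.__version__`.

```python
def is_tf1(version: str) -> bool:
    """Return True when a version string such as '1.15.0' is below 2.0.0."""
    major = int(version.split(".")[0])
    return major < 2


# In the notebook this would be called as is_tf1(tf.__version__);
# literal strings are used here so the sketch runs without TensorFlow.
for candidate in ("1.15.0", "2.3.1"):
    print(candidate, "is below 2.0.0:", is_tf1(candidate))
```

If the check returns False, one option is to switch the runtime to a 1.x release before running the remaining cells.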

In [0]:
import numpy as np 
from sklearn.model_selection import train_test_split
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
from datetime import datetime

In addition to the standard libraries imported above, we need to install BERT's Python package for TensorFlow.

In [0]:
!pip install bert-tensorflow

Collecting bert-tensorflow
  Downloading https://files.pythonhosted.org/packages/a6/66/7eb4e8b6ea35b7cc54c322c816f976167a43019750279a8473d355800a93/bert_tensorflow-1.0.1-py2.py3-none-any.whl (67kB)
Installing collected packages: bert-tensorflow
Successfully installed bert-tensorflow-1.0.1


In [0]:
import bert
from bert import run_classifier
from bert import optimization
from bert import tokenization




Mount the drive.

In [0]:
from google.colab import drive
drive.mount('/content/gdrive/')

Mounted at /content/gdrive/


In [0]:
OUTPUT_DIR = "/content/gdrive/My Drive/585/"

# Data

Load the Newsela dataset from the mounted drive. The training and test sets were generated from the Newsela corpus by stratified sampling with a training/test ratio of 9:1.
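
For readers unfamiliar with stratified sampling, the idea is to split each difficulty level separately so that both sets keep the same label proportions. The following is a minimal standard-library sketch, not the procedure actually used to build the two CSV files; `stratified_split` and the toy labels are hypothetical. In practice, scikit-learn's `train_test_split` accepts a `stratify` argument that serves the same purpose.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Split example indices into train/test lists, label by label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Take the same fraction of every label for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels standing in for the five difficulty levels (0 to 4), 20 examples each.
toy_labels = [level for level in range(5) for _ in range(20)]
train_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(test_idx))
```

With 20 toy examples per level and a test fraction of 0.1, every level contributes two examples to the test set, which mirrors the 9:1 ratio described above.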

In [0]:
import pandas as pd
from tensorflow import keras
import os
import re


# Load both training and test set from Google Drive into DataFrames.
def load_dataset(path='/content/gdrive/My Drive/585/'):
  training = pd.read_csv(path + 'training_newsela.csv')
  test = pd.read_csv(path + 'test_newsela.csv')
  return training, test



In [0]:
train, test = load_dataset()

Print the sizes of the training and test sets, followed by a random example from the training set.

In [0]:
print('Training set contains %d examples' % len(train))
print('Test set contains %d examples' % len(test))

Training set contains 8570 examples
Test set contains 953 examples


In [0]:
idx = np.random.randint(0, len(train))
print('Example from the training set:')
print('Text:' + train.iloc[idx].text)
print('Difficulty (between 0 and 4): %d' % train.iloc[idx].label) 

Example from the training set:
Text:Maricopa County Sheriff Joe Arpaio calls himself "the toughest sheriff in America." The Arizona lawman has made national headlines multiple times. He faces lawsuits from the U.S. Justice Department and private citizens for allegedly singling out Latinos, a practice known as racial profiling. He launched an investigation into President Obama's birth certificate, and he published a biography called "Joe's Law" that included some controversial statements about Mexican-Americans.

Following the Sandy Hook Elementary School tragedy, officials all over the country are trying to figure out the best way to keep schools safe. Last week, Sheriff Arpaio put his own new school safety plan into action. He sent out his volunteer posse to patrol 59 schools in Maricopa County.

The posse is a group the sheriff assembled in 1993 to help with shopping mall thefts. Now, it is best known for its role in assisting the sheriff's workplace raids. The raids targeted immigra

In [0]:
train.columns

Index(['Unnamed: 0', 'text', 'label'], dtype='object')

In [0]:
DATA_COLUMN = 'text'
LABEL_COLUMN = 'label'
label_list = [0, 1, 2, 3, 4]

# Data Preprocessing


In [0]:
# Use the InputExample class from BERT's run_classifier code to create examples from the data
train_InputExamples = train.apply(lambda x: bert.run_classifier.InputExample(guid=None, # Globally unique ID for bookkeeping, unused in this example
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis=1)

test_InputExamples = test.apply(lambda x: bert.run_classifier.InputExample(guid=None, 
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis=1)

Next, we load the vocabulary file and the lowercasing setting directly from the BERT TF Hub module:

In [0]:
# This is a path to an uncased (all lowercase) version of BERT
BERT_MODEL_HUB = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"

def create_tokenizer_from_hub_module():
  """Get the vocab file and casing info from the Hub module."""
  with tf.Graph().as_default():
    bert_module = hub.Module(BERT_MODEL_HUB)
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    with tf.Session() as sess:
      vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                            tokenization_info["do_lower_case"]])
      
  return bert.tokenization.FullTokenizer(
      vocab_file=vocab_file, do_lower_case=do_lower_case)

tokenizer = create_tokenizer_from_hub_module()

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Tokenized example:

In [0]:
tokenizer.tokenize("This here's an example of using the BERT tokenizer")

['this',
 'here',
 "'",
 's',
 'an',
 'example',
 'of',
 'using',
 'the',
 'bert',
 'token',
 '##izer']
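
The split of "tokenizer" into `token` and `##izer` comes from WordPiece, which breaks a word into the longest vocabulary entries it can find, scanning left to right. The sketch below illustrates that greedy longest-match-first idea on a toy vocabulary; the function `wordpiece` and the vocabulary are assumptions for illustration, not the `bert.tokenization` implementation or BERT's real vocab file.

```python
def wordpiece(word, vocab, unk_token="[UNK]"):
    """Split one lowercase word into sub-word pieces by greedy longest-match-first lookup."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (an assumption for illustration only).
toy_vocab = {"token", "##izer", "bert", "example"}
print(wordpiece("tokenizer", toy_vocab))
print(wordpiece("bert", toy_vocab))
```

The `##` prefix marks a piece that continues the previous one, which is why the tokens can later be joined back into whole words.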

Convert to features for BERT.
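
Conceptually, the conversion truncates each tokenized text so that it fits the maximum sequence length, wraps it in `[CLS]` and `[SEP]`, maps the tokens to vocabulary ids, and pads the result with zeros. The sketch below illustrates that idea in plain Python for a single-sentence example. It is not the library's implementation; `to_features` and the toy vocabulary are hypothetical.

```python
def to_features(tokens, label, label_list, max_seq_length, vocab):
    """Build input_ids, input_mask, segment_ids and label_id for one example."""
    # Reserve two positions for the [CLS] and [SEP] markers.
    tokens = ["[CLS]"] + tokens[: max_seq_length - 2] + ["[SEP]"]
    input_ids = [vocab[token] for token in tokens]
    input_mask = [1] * len(input_ids)   # 1 marks a real token
    segment_ids = [0] * len(input_ids)  # single-segment input, so all zeros

    # Pad all three lists with zeros up to the fixed length.
    padding = [0] * (max_seq_length - len(input_ids))
    input_ids += padding
    input_mask += padding
    segment_ids += padding

    label_id = label_list.index(label)
    return input_ids, input_mask, segment_ids, label_id


# Toy vocabulary: the ids for [CLS] and [SEP] match the log output of this notebook,
# the remaining ids are made up for illustration.
toy_vocab = {"[CLS]": 101, "[SEP]": 102, "this": 1, "is": 2, "short": 3}
features = to_features(["this", "is", "short"], 3, [0, 1, 2, 3, 4], 8, toy_vocab)
for name, value in zip(("input_ids", "input_mask", "segment_ids", "label_id"), features):
    print(name, value)
```

Articles longer than the limit are cut off, which is why each logged example in the cell output ends abruptly before `[SEP]`.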

In [0]:
# We'll set sequences to be at most 128 tokens long.
MAX_SEQ_LENGTH = 128
# Convert our train and test features to InputFeatures that BERT understands.
train_features = bert.run_classifier.convert_examples_to_features(train_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)
test_features = bert.run_classifier.convert_examples_to_features(test_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)







INFO:tensorflow:Writing example 0 of 8570


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] lo ##me , togo — soccer officials in the african country of togo are afraid to let their team travel to guinea for a match . guinea is where the outbreak of the deadly e ##bola disease started . to protect their players , togo is asking that the match be moved to another country . the soccer match is part of the african cup . it is the biggest soccer tournament in africa and it happens every year . soccer officials fear the spread of the painful and con ##tag ##ious disease could ruin the african cup ' s final qualifying round . the winners of the qualifying round will get to play in the finals in morocco . e ##bola has killed nearly 1 [SEP]


INFO:tensorflow:input_ids: 101 8840 4168 1010 23588 1517 4715 4584 1999 1996 3060 2406 1997 23588 2024 4452 2000 2292 2037 2136 3604 2000 7102 2005 1037 2674 1012 7102 2003 2073 1996 8293 1997 1996 9252 1041 24290 4295 2318 1012 2000 4047 2037 2867 1010 23588 2003 4851 2008 1996 2674 2022 2333 2000 2178 2406 1012 1996 4715 2674 2003 2112 1997 1996 3060 2452 1012 2009 2003 1996 5221 4715 2977 1999 3088 1998 2009 6433 2296 2095 1012 4715 4584 3571 1996 3659 1997 1996 9145 1998 9530 15900 6313 4295 2071 10083 1996 3060 2452 1005 1055 2345 6042 2461 1012 1996 4791 1997 1996 6042 2461 2097 2131 2000 2377 1999 1996 4399 1999 9835 1012 1041 24290 2038 2730 3053 1015 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 3 (id = 3)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] los angeles — the sar ##dine fishing boat eileen glided through moon ##lit waters as it traveled from san pedro to santa catalina island off the california coast . its tired - eyed captain had grown more desperate as the evening wore on . but after 12 hours and $ 1 , 000 worth of fuel , corbin hanson and his crew returned to port . the hadn ' t caught a single small , silvery sar ##dine . " tonight ' s pretty reflective of how things have been going , " hanson said . " not very well . " to blame is the biggest sar ##dine crash in generations , which has made schools of the fish rare on the west coast . [SEP]


INFO:tensorflow:input_ids: 101 3050 3349 1517 1996 18906 10672 5645 4049 20495 26936 2083 4231 15909 5380 2004 2009 6158 2013 2624 7707 2000 4203 22326 2479 2125 1996 2662 3023 1012 2049 5458 1011 7168 2952 2018 4961 2062 7143 2004 1996 3944 5078 2006 1012 2021 2044 2260 2847 1998 1002 1015 1010 2199 4276 1997 4762 1010 24003 17179 1998 2010 3626 2513 2000 3417 1012 1996 2910 1005 1056 3236 1037 2309 2235 1010 21666 18906 10672 1012 1000 3892 1005 1055 3492 21346 1997 2129 2477 2031 2042 2183 1010 1000 17179 2056 1012 1000 2025 2200 2092 1012 1000 2000 7499 2003 1996 5221 18906 10672 5823 1999 8213 1010 2029 2038 2081 2816 1997 1996 3869 4678 2006 1996 2225 3023 1012 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 2 (id = 2)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] nasa ' s newest robotic explorer rocket ##ed into space late friday from virginia , dazzling sky watch ##ers along the east coast . but the lad ##ee spacecraft quickly ran into equipment trouble , and while nasa assured everyone early saturday that the lunar probe was safe and on a perfect track for the moon , officials acknowledged the problem needs to be resolved in the next two to three weeks . s . peter word ##en , director of nasa ' s ames research center in california , told reporters he ' s confident everything will be working properly in the next few days . the spacecraft was developed at ames . lad ##ee ' s reaction wheels were turned on to orient and [SEP]


INFO:tensorflow:input_ids: 101 9274 1005 1055 14751 20478 10566 7596 2098 2046 2686 2397 5958 2013 3448 1010 28190 3712 3422 2545 2247 1996 2264 3023 1012 2021 1996 14804 4402 12076 2855 2743 2046 3941 4390 1010 1998 2096 9274 8916 3071 2220 5095 2008 1996 11926 15113 2001 3647 1998 2006 1037 3819 2650 2005 1996 4231 1010 4584 8969 1996 3291 3791 2000 2022 10395 1999 1996 2279 2048 2000 2093 3134 1012 1055 1012 2848 2773 2368 1010 2472 1997 9274 1005 1055 19900 2470 2415 1999 2662 1010 2409 12060 2002 1005 1055 9657 2673 2097 2022 2551 7919 1999 1996 2279 2261 2420 1012 1996 12076 2001 2764 2012 19900 1012 14804 4402 1005 1055 4668 7787 2020 2357 2006 2000 16865 1998 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] washington — police in ferguson , missouri , used u . s . military equipment during protests this summer . they sent officers with armored vehicles , assault rifles and body armor to at times peaceful marches . now , president barack obama is ordering up new rules for giving local police agencies such weapons . obama also proposed to spend $ 263 million over three years to expand training . it will increase the use of body cameras for recording police interaction with the public . the proposal includes $ 75 million that would help buy as many as 50 , 000 cameras . cameras like these might have provided more information in michael brown ' s death . in august , the unarmed black [SEP]


INFO:tensorflow:input_ids: 101 2899 1517 2610 1999 11262 1010 5284 1010 2109 1057 1012 1055 1012 2510 3941 2076 8090 2023 2621 1012 2027 2741 3738 2007 10612 4683 1010 6101 9494 1998 2303 8177 2000 2012 2335 9379 20691 1012 2085 1010 2343 13857 8112 2003 13063 2039 2047 3513 2005 3228 2334 2610 6736 2107 4255 1012 8112 2036 3818 2000 5247 1002 25246 2454 2058 2093 2086 2000 7818 2731 1012 2009 2097 3623 1996 2224 1997 2303 8629 2005 3405 2610 8290 2007 1996 2270 1012 1996 6378 2950 1002 4293 2454 2008 2052 2393 4965 2004 2116 2004 2753 1010 2199 8629 1012 8629 2066 2122 2453 2031 3024 2062 2592 1999 2745 2829 1005 1055 2331 1012 1999 2257 1010 1996 23206 2304 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 2 (id = 2)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] a series of recent discoveries has scientists rev ##ising their image of mars : they are now beginning to suspect that the red planet ' s geology may be more complex and earth - like than previously imagined . new data suggests that the planet may have small bits of water spread around its surface . and it now seems that the planet ' s interior could once have been more geological ##ly mature than previously thought . the new data is courtesy of nasa ' s curiosity , a car - sized robotic rover that has been exploring parts of the surface of mars . launched in 2011 , the highly sophisticated vehicle is armed with an impressive arsenal of scientific instruments . in 2012 [SEP]


INFO:tensorflow:input_ids: 101 1037 2186 1997 3522 15636 2038 6529 7065 9355 2037 3746 1997 7733 1024 2027 2024 2085 2927 2000 8343 2008 1996 2417 4774 1005 1055 13404 2089 2022 2062 3375 1998 3011 1011 2066 2084 3130 8078 1012 2047 2951 6083 2008 1996 4774 2089 2031 2235 9017 1997 2300 3659 2105 2049 3302 1012 1998 2009 2085 3849 2008 1996 4774 1005 1055 4592 2071 2320 2031 2042 2062 9843 2135 9677 2084 3130 2245 1012 1996 2047 2951 2003 14571 1997 9274 1005 1055 10628 1010 1037 2482 1011 7451 20478 13631 2008 2038 2042 11131 3033 1997 1996 3302 1997 7733 1012 3390 1999 2249 1010 1996 3811 12138 4316 2003 4273 2007 2019 8052 9433 1997 4045 5693 1012 1999 2262 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:Writing example 0 of 953


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] washington — before job seekers fill out an application for work making foam products for the aerospace industry at general plastics manufacturing co . in tacoma , washington , they have to take a math test . eighteen questions , 30 minutes , and using a cal ##cula ##tor is ok . they are asked how to convert inches to feet , read a tape measure and find the density of a block of foam ( mass divided by volume ) . basic middle school math , right ? it ' s supposed to be . but what troubles general plastics executive eric hahn is that although the company considers only prospective workers who have a high school education , only 1 in 10 who take [SEP]


INFO:tensorflow:input_ids: 101 2899 1517 2077 3105 24071 6039 2041 2019 4646 2005 2147 2437 17952 3688 2005 1996 13395 3068 2012 2236 26166 5814 2522 1012 1999 22954 1010 2899 1010 2027 2031 2000 2202 1037 8785 3231 1012 7763 3980 1010 2382 2781 1010 1998 2478 1037 10250 19879 4263 2003 7929 1012 2027 2024 2356 2129 2000 10463 5282 2000 2519 1010 3191 1037 6823 5468 1998 2424 1996 4304 1997 1037 3796 1997 17952 1006 3742 4055 2011 3872 1007 1012 3937 2690 2082 8785 1010 2157 1029 2009 1005 1055 4011 2000 2022 1012 2021 2054 13460 2236 26166 3237 4388 24266 2003 2008 2348 1996 2194 10592 2069 17464 3667 2040 2031 1037 2152 2082 2495 1010 2069 1015 1999 2184 2040 2202 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] a new issue is growing out of the drive for a higher minimum wage for fast - food employees : wage theft , a term for failing to pay workers what they ' re legally owed . in recent months , lawsuits charging wage theft abuses have been filed on behalf of fast - food workers in three states . public attorneys in some states have obtained he ##ft ##y settlements from employers charged with violations . the issue came to the forefront thursday in front of three mcdonald ' s and burger king restaurants in kansas city , mo . signs there pro ##claiming " wage theft " and " stolen wages " dotted a midday rally by the stand ##up ##k ##c coalition . [SEP]


INFO:tensorflow:input_ids: 101 1037 2047 3277 2003 3652 2041 1997 1996 3298 2005 1037 3020 6263 11897 2005 3435 1011 2833 5126 1024 11897 11933 1010 1037 2744 2005 7989 2000 3477 3667 2054 2027 1005 2128 10142 12232 1012 1999 3522 2706 1010 20543 13003 11897 11933 21078 2031 2042 6406 2006 6852 1997 3435 1011 2833 3667 1999 2093 2163 1012 2270 16214 1999 2070 2163 2031 4663 2002 6199 2100 7617 2013 12433 5338 2007 13302 1012 1996 3277 2234 2000 1996 22870 9432 1999 2392 1997 2093 9383 1005 1055 1998 15890 2332 7884 1999 5111 2103 1010 9587 1012 5751 2045 4013 27640 1000 11897 11933 1000 1998 1000 7376 12678 1000 20384 1037 22878 8320 2011 1996 3233 6279 2243 2278 6056 1012 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] seoul , south korea — many people in south korea ' s capital are wearing face ##mas ##ks this week . they are trying to protect themselves from a deadly disease : middle east respiratory syndrome ( mer ##s ) . an outbreak of the disease began here in may . a man who caught mer ##s in the middle east brought it with him when he returned to seoul . # # 108 cases , lots of questions so far , south korea is the only country outside the middle east to have a mer ##s outbreak . the country now has 108 cases of mer ##s . many questions — and fears — surround the disease . how easy is it to catch ? [SEP]


INFO:tensorflow:input_ids: 101 10884 1010 2148 4420 1517 2116 2111 1999 2148 4420 1005 1055 3007 2024 4147 2227 9335 5705 2023 2733 1012 2027 2024 2667 2000 4047 3209 2013 1037 9252 4295 1024 2690 2264 16464 8715 1006 21442 2015 1007 1012 2019 8293 1997 1996 4295 2211 2182 1999 2089 1012 1037 2158 2040 3236 21442 2015 1999 1996 2690 2264 2716 2009 2007 2032 2043 2002 2513 2000 10884 1012 1001 1001 10715 3572 1010 7167 1997 3980 2061 2521 1010 2148 4420 2003 1996 2069 2406 2648 1996 2690 2264 2000 2031 1037 21442 2015 8293 1012 1996 2406 2085 2038 10715 3572 1997 21442 2015 1012 2116 3980 1517 1998 10069 1517 15161 1996 4295 1012 2129 3733 2003 2009 2000 4608 1029 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 3 (id = 3)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] michael mc ##ker ##nan ' s heart pounded as he approached the community college in gall ##atin , tennessee , and the exam that would decide his future . he had recently turned 18 , which aged him out of foster care . mc ##ker ##nan , a lan ##ky jokes ##ter who speaks with a southern t ##wang , had dropped out of high school and needed to test for the ge ##d that day or risk losing a shot at state - funded scholarships for college . he knew he could not afford tuition as an overnight stock ##er at wal - mart . it was dec . 11 , the last opportunity that year to take the test in his county . he [SEP]


INFO:tensorflow:input_ids: 101 2745 11338 5484 7229 1005 1055 2540 13750 2004 2002 5411 1996 2451 2267 1999 26033 20363 1010 5298 1010 1998 1996 11360 2008 2052 5630 2010 2925 1012 2002 2018 3728 2357 2324 1010 2029 4793 2032 2041 1997 6469 2729 1012 11338 5484 7229 1010 1037 17595 4801 13198 3334 2040 8847 2007 1037 2670 1056 16600 1010 2018 3333 2041 1997 2152 2082 1998 2734 2000 3231 2005 1996 16216 2094 2008 2154 2030 3891 3974 1037 2915 2012 2110 1011 6787 15691 2005 2267 1012 2002 2354 2002 2071 2025 8984 15413 2004 2019 11585 4518 2121 2012 24547 1011 20481 1012 2009 2001 11703 1012 2340 1010 1996 2197 4495 2008 2095 2000 2202 1996 3231 1999 2010 2221 1012 2002 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] # # # pro : fight efforts to water them down washington — as a new school year begins , american parents should enthusiastically join first lady michelle obama ' s campaign for health ##ier school lunch ##es . her drive is based on sound nutritional science with the goal of health ##ier , happier kids . the first lady has made improving childhood health through better eating and more exercise her signature issue . that ' s a wise choice , since childhood obesity reached epidemic proportions : in 2012 , 1 in 3 american children were over ##weight or obe ##se . over ##weight children are at higher risk of developing a variety of ai ##lm ##ents , including cardiovascular disease and diabetes that [SEP]


INFO:tensorflow:input_ids: 101 1001 1001 1001 4013 1024 2954 4073 2000 2300 2068 2091 2899 1517 2004 1037 2047 2082 2095 4269 1010 2137 3008 2323 24935 3693 2034 3203 9393 8112 1005 1055 3049 2005 2740 3771 2082 6265 2229 1012 2014 3298 2003 2241 2006 2614 28268 2671 2007 1996 3125 1997 2740 3771 1010 19366 4268 1012 1996 2034 3203 2038 2081 9229 5593 2740 2083 2488 5983 1998 2062 6912 2014 8085 3277 1012 2008 1005 1055 1037 7968 3601 1010 2144 5593 24552 2584 16311 19173 1024 1999 2262 1010 1015 1999 1017 2137 2336 2020 2058 11179 2030 15578 3366 1012 2058 11179 2336 2024 2012 3020 3891 1997 4975 1037 3528 1997 9932 13728 11187 1010 2164 22935 4295 1998 14671 2008 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


#Creating a model


In [0]:
def create_model(is_predicting, input_ids, input_mask, segment_ids, labels,
                 num_labels):
  """Creates a classification model."""

  bert_module = hub.Module(
      BERT_MODEL_HUB,
      trainable=True)
  bert_inputs = dict(
      input_ids=input_ids,
      input_mask=input_mask,
      segment_ids=segment_ids)
  bert_outputs = bert_module(
      inputs=bert_inputs,
      signature="tokens",
      as_dict=True)

  # Use "pooled_output" for classification tasks on an entire sentence.
  # Use "sequence_output" for token-level output.
  output_layer = bert_outputs["pooled_output"]

  hidden_size = output_layer.shape[-1].value

  # Create our own layer to tune for our data.
  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):

    # Dropout helps prevent overfitting. Note that, as written, it is applied
    # in every mode, including evaluation and prediction.
    output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    # Convert labels into one-hot encoding
    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    predicted_labels = tf.squeeze(tf.argmax(log_probs, axis=-1, output_type=tf.int32))
    # If we're predicting, we want the predicted labels and the log-probabilities.
    if is_predicting:
      return (predicted_labels, log_probs)

    # For training/evaluation, compute the loss between predicted and actual labels.
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)
    return (loss, predicted_labels, log_probs)
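
To make the loss computation above concrete, here is a minimal NumPy sketch of the same arithmetic (log-softmax, one-hot labels, mean cross-entropy). It is only an illustration: the helper names `log_softmax` and `cross_entropy` and the example logits are our own assumptions and are not part of the notebook's model.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def cross_entropy(logits, labels, num_labels):
    """Returns (mean loss, predicted labels), mirroring the steps in create_model."""
    log_probs = log_softmax(logits)
    # One-hot encode the integer labels, like tf.one_hot.
    one_hot_labels = np.eye(num_labels)[labels]
    per_example_loss = -(one_hot_labels * log_probs).sum(axis=-1)
    return per_example_loss.mean(), log_probs.argmax(axis=-1)

# Hypothetical logits for 2 examples and 3 classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])
loss, predicted = cross_entropy(logits, labels, num_labels=3)
print(loss, predicted)
```

For the second example all logits are equal, so its loss is log(3): the loss of a classifier that guesses uniformly among three classes.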


In [0]:
# model_fn_builder actually creates our model function
# using the passed parameters for num_labels, learning_rate, etc.
def model_fn_builder(num_labels, learning_rate, num_train_steps,
                     num_warmup_steps):
  """Returns `model_fn` closure for the Estimator."""
  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument
    """The `model_fn` for the Estimator."""

    input_ids = features["input_ids"]
    input_mask = features["input_mask"]
    segment_ids = features["segment_ids"]
    label_ids = features["label_ids"]

    is_predicting = (mode == tf.estimator.ModeKeys.PREDICT)
    
    # TRAIN and EVAL
    if not is_predicting:

      (loss, predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      train_op = bert.optimization.create_optimizer(
          loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu=False)

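      # Calculate evaluation metrics.
      # Note: apart from accuracy, the tf.metrics functions below cast labels
      # and predictions to booleans, so with more than two classes they treat
      # label 0 as negative and every other label as positive.
      def metric_fn(label_ids, predicted_labels):
        accuracy = tf.metrics.accuracy(label_ids, predicted_labels)
        # f1_score and auc are removed, presumably since they are only
        # defined for binary labels:
        # f1_score = tf.contrib.metrics.f1_score(
        #     label_ids,
        #     predicted_labels)
        # auc = tf.metrics.auc(
        #     label_ids,
        #     predicted_labels)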
        recall = tf.metrics.recall(
            label_ids,
            predicted_labels)
        precision = tf.metrics.precision(
            label_ids,
            predicted_labels) 
        true_pos = tf.metrics.true_positives(
            label_ids,
            predicted_labels)
        true_neg = tf.metrics.true_negatives(
            label_ids,
            predicted_labels)   
        false_pos = tf.metrics.false_positives(
            label_ids,
            predicted_labels)  
        false_neg = tf.metrics.false_negatives(
            label_ids,
            predicted_labels)
        return {
            "eval_accuracy": accuracy,
            # "f1_score": f1_score,
            # "auc": auc,
            "precision": precision,
            "recall": recall,
            "true_positives": true_pos,
            "true_negatives": true_neg,
            "false_positives": false_pos,
            "false_negatives": false_neg
        }

      eval_metrics = metric_fn(label_ids, predicted_labels)

      if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          train_op=train_op)
      else:
        return tf.estimator.EstimatorSpec(mode=mode,
          loss=loss,
          eval_metric_ops=eval_metrics)
    else:
      (predicted_labels, log_probs) = create_model(
        is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

      predictions = {
          # Note: these values are log-probabilities (log-softmax outputs).
          'probabilities': log_probs,
          'labels': predicted_labels
      }
      return tf.estimator.EstimatorSpec(mode, predictions=predictions)

  # Return the actual model function in the closure
  return model_fn
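
Because `tf.metrics.precision`, `tf.metrics.recall` and the true/false positive/negative counters cast their inputs to booleans, they describe a "label 0 vs. everything else" split when there are more than two classes. As a hedged alternative, the sketch below computes per-class precision and recall from a confusion matrix with NumPy. The function names and the toy labels are our own assumptions; this is not part of the Estimator's `eval_metric_ops`.

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, num_labels):
    """cm[t, p] counts examples with true label t and predicted label p."""
    cm = np.zeros((num_labels, num_labels), dtype=np.int64)
    for t, p in zip(true_labels, pred_labels):
        cm[t, p] += 1
    return cm

def per_class_precision_recall(cm):
    """Per-class precision and recall; classes with no examples get 0."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # how often each class was predicted
    actual = cm.sum(axis=1)     # how often each class really occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall

# Hypothetical labels for a 3-class problem.
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(true_labels, pred_labels, num_labels=3)
precision, recall = per_class_precision_recall(cm)
accuracy = np.trace(cm) / cm.sum()
print(cm, precision, recall, accuracy)
```

Averaging the per-class values (macro-averaging) gives a single number that weights every reading level equally.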


In [0]:
### These hyperparameters are recommended by the BERT authors for fine-tuning ###

# Batch size, learning rate, and number of training epochs
BATCH_SIZE = 32
LEARNING_RATE = 3e-5
NUM_TRAIN_EPOCHS = 3.0

# Warmup is a period at the start of training during which the learning rate
# is small and gradually increases; it usually helps training.
WARMUP_PROPORTION = 0.1

# Model configs
SAVE_CHECKPOINTS_STEPS = 500
SAVE_SUMMARY_STEPS = 100

In [0]:
# Compute the number of training and warmup steps from the batch size
num_train_steps = int(len(train_features) / BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)
print('Number of training steps: %d' % num_train_steps)
print('Number of warmup steps: %d' % num_warmup_steps)

Number of training steps: 803
Number of warmup steps: 80
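
To illustrate what these two numbers control, here is a small sketch of a learning-rate schedule with linear warmup followed by linear decay to zero. To our understanding this is the shape of the schedule used by `bert.optimization.create_optimizer`, but treat that as an assumption; the function `learning_rate_at` is a hypothetical helper written for this illustration only.

```python
def learning_rate_at(step, init_lr, num_train_steps, num_warmup_steps):
    """Learning rate at a given step: linear warmup, then linear decay to 0."""
    if step < num_warmup_steps:
        # Warmup: ramp up linearly from 0 to init_lr.
        return init_lr * step / num_warmup_steps
    # After warmup: decay linearly so the rate reaches 0 at the last step.
    return init_lr * (1.0 - step / num_train_steps)

num_train_steps = 803
num_warmup_steps = int(num_train_steps * 0.1)  # 80, as printed above

for step in (0, 40, 80, 400, 803):
    print(step, learning_rate_at(step, 3e-5, num_train_steps, num_warmup_steps))
```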


In [0]:
# Specify the output directory and how often (in steps) to save checkpoints and summaries
run_config = tf.estimator.RunConfig(
    model_dir=OUTPUT_DIR,
    save_summary_steps=SAVE_SUMMARY_STEPS,
    save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS)

In [0]:
model_fn = model_fn_builder(
  num_labels=len(label_list),
  learning_rate=LEARNING_RATE,
  num_train_steps=num_train_steps,
  num_warmup_steps=num_warmup_steps)


estimator = tf.estimator.Estimator(
  model_fn=model_fn,
  config=run_config,
  params={"batch_size": BATCH_SIZE})


INFO:tensorflow:Using config: {'_model_dir': '/content/gdrive/My Drive/585/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 500, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb5afea5908>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


In [0]:
# Create an input function for training. Set drop_remainder=True only when using TPUs.
train_input_fn = bert.run_classifier.input_fn_builder(
    features=train_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=True,
    drop_remainder=False)

Setup is finished. Let's start training!

In [0]:
print('Beginning Training!')
current_time = datetime.now()
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
print("Training took time ", datetime.now() - current_time)

Beginning Training!
INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Create CheckpointSaverHook.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Saving checkpoints for 0 into /content/gdrive/My Drive/585/model.ckpt.


INFO:tensorflow:loss = 1.6769224, step = 0


INFO:tensorflow:global_step/sec: 1.56523


INFO:tensorflow:loss = 0.6915046, step = 100 (63.895 sec)


INFO:tensorflow:global_step/sec: 2.10008


INFO:tensorflow:loss = 0.948285, step = 200 (47.612 sec)


INFO:tensorflow:global_step/sec: 2.09813


INFO:tensorflow:loss = 0.6294203, step = 300 (47.662 sec)


INFO:tensorflow:global_step/sec: 2.09994


INFO:tensorflow:loss = 0.69733214, step = 400 (47.620 sec)


INFO:tensorflow:Saving checkpoints for 500 into /content/gdrive/My Drive/585/model.ckpt.


INFO:tensorflow:global_step/sec: 1.83309


INFO:tensorflow:loss = 0.24030101, step = 500 (54.557 sec)


INFO:tensorflow:global_step/sec: 2.10164


INFO:tensorflow:loss = 0.38211882, step = 600 (47.578 sec)


INFO:tensorflow:global_step/sec: 2.09908


INFO:tensorflow:loss = 0.18672721, step = 700 (47.644 sec)


INFO:tensorflow:global_step/sec: 2.10182


INFO:tensorflow:loss = 0.08411118, step = 800 (47.574 sec)


INFO:tensorflow:Saving checkpoints for 803 into /content/gdrive/My Drive/585/model.ckpt.


INFO:tensorflow:Loss for final step: 0.19443211.


Training took time  0:07:59.604802


Now let's evaluate the model on our test data.

In [0]:
test_input_fn = run_classifier.input_fn_builder(
    features=test_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=False,
    drop_remainder=False)

In [0]:
estimator.evaluate(input_fn=test_input_fn, steps=None)

INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Starting evaluation at 2019-12-09T05:55:29Z


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Restoring parameters from /content/gdrive/My Drive/585/model.ckpt-803


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Finished evaluation at 2019-12-09-05:55:43


INFO:tensorflow:Saving dict for global step 803: eval_accuracy = 0.6904512, false_negatives = 32.0, false_positives = 60.0, global_step = 803, loss = 0.8381214, precision = 0.9240506, recall = 0.95800525, true_negatives = 131.0, true_positives = 730.0


INFO:tensorflow:Saving 'checkpoint_path' summary for global step 803: /content/gdrive/My Drive/585/model.ckpt-803


{'eval_accuracy': 0.6904512,
 'false_negatives': 32.0,
 'false_positives': 60.0,
 'global_step': 803,
 'loss': 0.8381214,
 'precision': 0.9240506,
 'recall': 0.95800525,
 'true_negatives': 131.0,
 'true_positives': 730.0}
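
The gap between `eval_accuracy` (about 0.69) and precision/recall (above 0.92) is easier to read once the numbers are recomputed by hand. The sketch below uses only the values reported above. Our reading, stated as an assumption, is that the four counts come from TensorFlow's binary metrics, which cast labels to booleans, so they measure "label 0 vs. any other label", while `eval_accuracy` requires an exact match of the class.

```python
# Counts reported by estimator.evaluate above.
tp, tn, fp, fn = 730.0, 131.0, 60.0, 32.0

total = tp + tn + fp + fn             # number of evaluated examples
precision = tp / (tp + fp)            # matches the reported 'precision'
recall = tp / (tp + fn)               # matches the reported 'recall'
binary_accuracy = (tp + tn) / total   # accuracy of the "0 vs. non-0" split

# Number of exact-class matches implied by the reported eval_accuracy.
exact_matches = round(0.6904512 * total)

print(total, precision, recall, binary_accuracy, exact_matches)
```

Under this reading, the model separates level 0 from the other levels about 90% of the time, but picks the exact level for only 658 of the 953 test examples.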

Now let's examine some of the examples that we got wrong:

In [0]:
def get_wrong_examples(test_set):
  input_examples = test_set.apply(lambda x: bert.run_classifier.InputExample(guid=None, 
                                                                   text_a = x[DATA_COLUMN], 
                                                                   text_b = None, 
                                                                   label = x[LABEL_COLUMN]), axis=1)
  input_features = run_classifier.convert_examples_to_features(input_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
  predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)
  predictions = estimator.predict(predict_input_fn)
  
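  # estimator.predict returns a generator; collect the predicted label ids.
  pred_labels = [pred['labels'] for pred in predictions]

  # Compare each prediction with the ground truth. Predicted ids can be
  # compared to the raw labels directly because, as the logs show, each
  # label maps to the same id (e.g. "label: 3 (id = 3)").
  wrong_out = []
  for i in range(len(test_set)):
    row = test_set.iloc[i]
    pred_label = pred_labels[i]
    gt_label = row[LABEL_COLUMN]
    if pred_label != gt_label:
      wrong_out.append([gt_label, pred_label, row[DATA_COLUMN]])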
  
  return wrong_out
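
Reading misclassified texts one by one can be slow, so it may help to first count which (true label, predicted label) pairs occur most often. The sketch below assumes the `[gt_label, pred_label, text]` layout returned by `get_wrong_examples`; the helper `summarize_errors` and the sample rows are hypothetical and only serve as an illustration.

```python
from collections import Counter

def summarize_errors(wrong_examples):
    """Counts (true label, predicted label) pairs among misclassified examples."""
    return Counter((gt, pred) for gt, pred, _text in wrong_examples)

# Hypothetical misclassified examples in the same layout as wrong_out.
sample_wrong = [[0, 1, "text a"], [0, 1, "text b"], [3, 2, "text c"]]
confusions = summarize_errors(sample_wrong)

# Most frequent confusions first.
for (gt, pred), count in confusions.most_common():
    print("true=%d predicted=%d count=%d" % (gt, pred, count))
```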



In [0]:
wrong_examples = get_wrong_examples(test)

INFO:tensorflow:Writing example 0 of 953


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] washington — before job seekers fill out an application for work making foam products for the aerospace industry at general plastics manufacturing co . in tacoma , washington , they have to take a math test . eighteen questions , 30 minutes , and using a cal ##cula ##tor is ok . they are asked how to convert inches to feet , read a tape measure and find the density of a block of foam ( mass divided by volume ) . basic middle school math , right ? it ' s supposed to be . but what troubles general plastics executive eric hahn is that although the company considers only prospective workers who have a high school education , only 1 in 10 who take [SEP]


INFO:tensorflow:input_ids: 101 2899 1517 2077 3105 24071 6039 2041 2019 4646 2005 2147 2437 17952 3688 2005 1996 13395 3068 2012 2236 26166 5814 2522 1012 1999 22954 1010 2899 1010 2027 2031 2000 2202 1037 8785 3231 1012 7763 3980 1010 2382 2781 1010 1998 2478 1037 10250 19879 4263 2003 7929 1012 2027 2024 2356 2129 2000 10463 5282 2000 2519 1010 3191 1037 6823 5468 1998 2424 1996 4304 1997 1037 3796 1997 17952 1006 3742 4055 2011 3872 1007 1012 3937 2690 2082 8785 1010 2157 1029 2009 1005 1055 4011 2000 2022 1012 2021 2054 13460 2236 26166 3237 4388 24266 2003 2008 2348 1996 2194 10592 2069 17464 3667 2040 2031 1037 2152 2082 2495 1010 2069 1015 1999 2184 2040 2202 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] a new issue is growing out of the drive for a higher minimum wage for fast - food employees : wage theft , a term for failing to pay workers what they ' re legally owed . in recent months , lawsuits charging wage theft abuses have been filed on behalf of fast - food workers in three states . public attorneys in some states have obtained he ##ft ##y settlements from employers charged with violations . the issue came to the forefront thursday in front of three mcdonald ' s and burger king restaurants in kansas city , mo . signs there pro ##claiming " wage theft " and " stolen wages " dotted a midday rally by the stand ##up ##k ##c coalition . [SEP]


INFO:tensorflow:input_ids: 101 1037 2047 3277 2003 3652 2041 1997 1996 3298 2005 1037 3020 6263 11897 2005 3435 1011 2833 5126 1024 11897 11933 1010 1037 2744 2005 7989 2000 3477 3667 2054 2027 1005 2128 10142 12232 1012 1999 3522 2706 1010 20543 13003 11897 11933 21078 2031 2042 6406 2006 6852 1997 3435 1011 2833 3667 1999 2093 2163 1012 2270 16214 1999 2070 2163 2031 4663 2002 6199 2100 7617 2013 12433 5338 2007 13302 1012 1996 3277 2234 2000 1996 22870 9432 1999 2392 1997 2093 9383 1005 1055 1998 15890 2332 7884 1999 5111 2103 1010 9587 1012 5751 2045 4013 27640 1000 11897 11933 1000 1998 1000 7376 12678 1000 20384 1037 22878 8320 2011 1996 3233 6279 2243 2278 6056 1012 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] seoul , south korea — many people in south korea ' s capital are wearing face ##mas ##ks this week . they are trying to protect themselves from a deadly disease : middle east respiratory syndrome ( mer ##s ) . an outbreak of the disease began here in may . a man who caught mer ##s in the middle east brought it with him when he returned to seoul . # # 108 cases , lots of questions so far , south korea is the only country outside the middle east to have a mer ##s outbreak . the country now has 108 cases of mer ##s . many questions — and fears — surround the disease . how easy is it to catch ? [SEP]


INFO:tensorflow:input_ids: 101 10884 1010 2148 4420 1517 2116 2111 1999 2148 4420 1005 1055 3007 2024 4147 2227 9335 5705 2023 2733 1012 2027 2024 2667 2000 4047 3209 2013 1037 9252 4295 1024 2690 2264 16464 8715 1006 21442 2015 1007 1012 2019 8293 1997 1996 4295 2211 2182 1999 2089 1012 1037 2158 2040 3236 21442 2015 1999 1996 2690 2264 2716 2009 2007 2032 2043 2002 2513 2000 10884 1012 1001 1001 10715 3572 1010 7167 1997 3980 2061 2521 1010 2148 4420 2003 1996 2069 2406 2648 1996 2690 2264 2000 2031 1037 21442 2015 8293 1012 1996 2406 2085 2038 10715 3572 1997 21442 2015 1012 2116 3980 1517 1998 10069 1517 15161 1996 4295 1012 2129 3733 2003 2009 2000 4608 1029 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 3 (id = 3)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] michael mc ##ker ##nan ' s heart pounded as he approached the community college in gall ##atin , tennessee , and the exam that would decide his future . he had recently turned 18 , which aged him out of foster care . mc ##ker ##nan , a lan ##ky jokes ##ter who speaks with a southern t ##wang , had dropped out of high school and needed to test for the ge ##d that day or risk losing a shot at state - funded scholarships for college . he knew he could not afford tuition as an overnight stock ##er at wal - mart . it was dec . 11 , the last opportunity that year to take the test in his county . he [SEP]


INFO:tensorflow:input_ids: 101 2745 11338 5484 7229 1005 1055 2540 13750 2004 2002 5411 1996 2451 2267 1999 26033 20363 1010 5298 1010 1998 1996 11360 2008 2052 5630 2010 2925 1012 2002 2018 3728 2357 2324 1010 2029 4793 2032 2041 1997 6469 2729 1012 11338 5484 7229 1010 1037 17595 4801 13198 3334 2040 8847 2007 1037 2670 1056 16600 1010 2018 3333 2041 1997 2152 2082 1998 2734 2000 3231 2005 1996 16216 2094 2008 2154 2030 3891 3974 1037 2915 2012 2110 1011 6787 15691 2005 2267 1012 2002 2354 2002 2071 2025 8984 15413 2004 2019 11585 4518 2121 2012 24547 1011 20481 1012 2009 2001 11703 1012 2340 1010 1996 2197 4495 2008 2095 2000 2202 1996 3231 1999 2010 2221 1012 2002 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 0 (id = 0)


INFO:tensorflow:*** Example ***


INFO:tensorflow:guid: None


INFO:tensorflow:tokens: [CLS] # # # pro : fight efforts to water them down washington — as a new school year begins , american parents should enthusiastically join first lady michelle obama ' s campaign for health ##ier school lunch ##es . her drive is based on sound nutritional science with the goal of health ##ier , happier kids . the first lady has made improving childhood health through better eating and more exercise her signature issue . that ' s a wise choice , since childhood obesity reached epidemic proportions : in 2012 , 1 in 3 american children were over ##weight or obe ##se . over ##weight children are at higher risk of developing a variety of ai ##lm ##ents , including cardiovascular disease and diabetes that [SEP]


INFO:tensorflow:input_ids: 101 1001 1001 1001 4013 1024 2954 4073 2000 2300 2068 2091 2899 1517 2004 1037 2047 2082 2095 4269 1010 2137 3008 2323 24935 3693 2034 3203 9393 8112 1005 1055 3049 2005 2740 3771 2082 6265 2229 1012 2014 3298 2003 2241 2006 2614 28268 2671 2007 1996 3125 1997 2740 3771 1010 19366 4268 1012 1996 2034 3203 2038 2081 9229 5593 2740 2083 2488 5983 1998 2062 6912 2014 8085 3277 1012 2008 1005 1055 1037 7968 3601 1010 2144 5593 24552 2584 16311 19173 1024 1999 2262 1010 1015 1999 1017 2137 2336 2020 2058 11179 2030 15578 3366 1012 2058 11179 2336 2024 2012 3020 3891 1997 4975 1037 3528 1997 9932 13728 11187 1010 2164 22935 4295 1998 14671 2008 102


INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


INFO:tensorflow:label: 1 (id = 1)


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Restoring parameters from /content/gdrive/My Drive/585/model.ckpt-803


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.
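
Because the labels are ordered difficulty levels, the size of each miss is informative, not only the number of misses. The following is a minimal sketch of tallying errors by distance, assuming `wrong_examples` holds `[gt_label, pred_label, text]` triples as returned by `get_wrong_examples`; `error_distance_counts` and the toy triples are hypothetical and only illustrate the idea.

```python
from collections import Counter

def error_distance_counts(wrong_examples):
    """Map |ground truth - prediction| to the number of errors at that distance."""
    # int() keeps the arithmetic on plain Python ints even if labels are NumPy scalars.
    return Counter(abs(int(gt) - int(pred)) for gt, pred, _ in wrong_examples)

# Made-up triples for illustration; these are not results from the model above.
toy_wrong_examples = [
    [1, 0, "text a"],
    [0, 2, "text b"],
    [4, 3, "text c"],
    [3, 0, "text d"],
]

distance_counts = error_distance_counts(toy_wrong_examples)
for distance in sorted(distance_counts):
    print("off by %d level(s): %d" % (distance, distance_counts[distance]))
```

A tally like this separates near misses (adjacent levels) from larger errors, which a single accuracy figure does not show.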


In [0]:
more_than_two_lvl = 0
for gt_label, pred_label, text in wrong_examples:
  # Labels are ordered levels, so |gt - pred| is the number of levels the prediction is off by.
  if abs(gt_label - pred_label) > 1:
    more_than_two_lvl += 1
  print('Ground-truth label: %d. Predicted label: %d. Text:' % (gt_label, pred_label))
  # print(text)  # uncomment to also print the article text
  print()

print('BERT got %d examples wrong in total, of which %d were off by at least 2 levels.' % (len(wrong_examples), more_than_two_lvl))

Ground-truth label: 1. Predicted label: 0. Text:

Ground-truth label: 2. Predicted label: 1. Text:

Ground-truth label: 0. Predicted label: 2. Text:

Ground-truth label: 4. Predicted label: 3. Text:

Ground-truth label: 0. Predicted label: 1. Text:

Ground-truth label: 4. Predicted label: 3. Text:

Ground-truth label: 0. Predicted label: 1. Text:

Ground-truth label: 2. Predicted label: 3. Text:

Ground-truth label: 3. Predicted label: 2. Text:

Ground-truth label: 3. Predicted label: 2. Text:

Ground-truth label: 1. Predicted label: 0. Text:

Ground-truth label: 2. Predicted label: 1. Text:

Ground-truth label: 2. Predicted label: 1. Text:

Ground-truth label: 0. Predicted label: 1. Text:

Ground-truth label: 0. Predicted label: 2. Text:

Ground-truth label: 4. Predicted label: 3. Text:

Ground-truth label: 3. Predicted label: 4. Text:

Ground-truth label: 0. Predicted label: 1. Text:

Ground-truth label: 2. Predicted label: 1. Text:

Ground-truth label: 2. Predicted label: 3. Text:

