# Movie dataset NER
Source: https://aihub.cloud.google.com/p/products%2F2290fc65-0041-4c87-a898-0289f59aa8ba

The notebook is broken down into the following three sections:
   * NER packages: An overview of the language, license, and methodology of commercially available NER packages
   * NER with SpaCy : Code examples for training and serving a custom NER model in SpaCy
   * NER with Tensorflow : Code examples for creating, training, and serving a custom deep-learning NER model with Tensorflow

NER may be implemented with a variety of statistical and rule-based methods with varying amounts of feature engineering. All production-ready NER methods are at least semi-supervised, though unsupervised approaches are an emerging research topic.

#### Supervised statistical

Supervised statistical approaches to NER typically use either Hidden Markov Models (HMM), Maximum Entropy (ME), or Conditional Random Fields (CRF). OpenNLP's statistical NER relies on ME. GATE relies on HMM.
Typical feature engineering approaches for NER include such approaches as orthography, n-grams, lexicons, suffixes and prefixes, unsupervised cluster features, and trigger words for named entities (such as river or lake). These features are generated algorithmically in a rule-based manner.
    
#### Supervised rule-based

OpenNLP contains rule based (as well as statistical) NER. The rule-based approach relies on a series of regular expression matches. The feature generation seems to be done with a beam search to determine the word context.
DBPedia spotlight performs NER with substring matching using the Aho-Corasick algorithm. The approach only uses tokenization with no other feature engineering. The two-step approach first involves generating all possible candidate annotations that form known labels. This is rule-based in that it involves identifying nouns, prepositions, capitalized words, and known entities. This is based on OpenNLP under the hood. The second step selects the best candidates from the proposed candidates. Each candidate is scored based on annotation probability using a version of tf-idf with article links and anchor texts instead of documents and terms.

#### Supervised deep learning

**[SpaCy](https://spacy.io),** which is one of the most popular productionized NER environments, **uses residual convolutional neural networks (CNN) and incremental parsing with Bloom embeddings for NER.** See [this](https://www.youtube.com/watch?v=sqDHBH9IjRU) Youtube explanation from the developers for more detail. To summarize the algorithm, 1D convolutional filters are applied over the input text to predict how the upcoming words may change the current entity tags. Upcoming words may either shift (change the entity), reduce (make the entity more granular), or output the entity. The input sequence is embedded with bloom embeddings, which model the characters, prefix, suffix, and part of speech of each word. Residual blocks are used for the CNNs, andn the filter sizes are chosen with beam search.

Recurrent neural network (RNN) approaches to NER also exist, typically comprising long short term memory networks (LSTM) at either the word- or character-level, relying on word or character embeddings, respectively (e.g. word2vec, gloVe, FASTtext).
    
## NER Evaluation metrics

NER is most commonly evaluated with precision, recall, and F1-score. F1-score can either be relaxed or strict, with the latter requiring the character offsets to match exactly. 

## Movie NER
While these pretrained models are often sufficient for general applications, we will consider a domain-specific application of NER on the [MIT Movies corpus](https://groups.csail.mit.edu/sls/downloads/movie/), which contains 10,000 queries about various aspects of movies, with the following entity labels:

| Type | Example |
------- | ------- |
| ACTOR | Matt Damon |
| YEAR | 1980s |
| TITLE | Pulp Fiction
| GENRE | science fiction
| DIRECTOR | George Lucas |
| SONG | Aerosmith |
| PLOT | Flying cars |
| REVIEW | must see |
| CHARACTER | Queen Elizabeth |
|RATING | PG-13 |
|RATINGS_AVERAGE | best rated |
| TRAILER | preview

As these tables show, the pretrained SpaCy models would not be sufficient to identify entities to help answer a question such as "did george clooney make a science fiction movie in the 1980s?" While the pre-trained entities may identify the presence of `PERSON`, `DATE`, and `PRODUCT`, a custom model should be able to detect `ACTOR`, `GENRE`, and `DATE`. In the following sections, we will compare the results of applying a pre-trained and a custom-trained model to the MIT movies corpus.

### Install and import required packages

In [53]:
import sys
!{sys.executable} -m pip install spacy # !{sys.executable} ensures package installation in conda env

import random
import time
import numpy as np



### Load and transform data

Create the data directory if it doesn't exist

In [1]:
from os import path, mkdir
if not path.isdir("data/"):
    mkdir("data/")
if not path.isdir("models/"):
    mkdir("models/")

Download the test and training dataset from MIT's Computer Science and Aritficial Intelligence Laboratory (CSAIL)

In [3]:
!curl https://groups.csail.mit.edu/sls/downloads/movie/engtest.bio -o data/test.txt
!curl https://groups.csail.mit.edu/sls/downloads/movie/engtrain.bio -o data/train.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  246k  100  246k    0     0  90713      0  0:00:02  0:00:02 --:--:-- 90680
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  989k  100  989k    0     0   162k      0  0:00:06  0:00:06 --:--:--  213k


<img src="https://storage.googleapis.com/kf-pipeline-contrib-public/release-0.1.3/kfp-components/notebooks/entity_extraction/assets/fig1.png" width="700" align = "left"/>

SpaCy requires training data to be in the format of `TRAIN_DATA = [(Sentence, {entities: [(start, end, label)]}, ...]`. The `load_data` function parses and transforms the input data into the required format for spaCy.

In [54]:
def load_data_spacy(file_path):
    ''' Converts data from:
    label \t word \n label \t word \n \n label \t word
    to: sentence, {entities : [(start, end, label), (stard, end, label)]}
    '''
    file = open(file_path, 'r')
    training_data, entities, sentence, unique_labels = [], [], [], []
    current_annotation = None
    end = 0 # initialize counter to keep track of start and end characters
    for line in file:
        line = line.strip("\n").split("\t")
        # lines with len > 1 are words
        if len(line) > 1:
            label = line[0][2:]     # the .txt is formatted: label \t word, label[0:2] = label_type
            label_type = line[0][0] # beginning of annotations - "B", intermediate - "I"
            word = line[1]
            sentence.append(word)
            end += (len(word) + 1)  # length of the word + trailing space
            
            if label_type != 'I' and current_annotation:  # if at the end of an annotation
                entities.append((start, end - 2 - len(word), current_annotation))  # append the annotation
                current_annotation = None                 # reset the annotation
            if label_type == 'B':                         # if beginning new annotation
                start = end - len(word) - 1  # start annotation at beginning of word
                current_annotation = label   # append the word to the current annotation
            if label_type == 'I':            # if the annotation is multi-word
                current_annotation = label   # append the word
            
            if label != 'O' and label not in unique_labels:
                unique_labels.append(label)
 
        # lines with len == 1 are breaks between sentences
        if len(line) == 1: 
            if current_annotation:
                entities.append((start, end - 1, current_annotation))
            sentence = " ".join(sentence)
            training_data.append([sentence, {'entities' : entities}])
            # reset the counters and temporary lists
            end = 0            
            entities, sentence = [], []
            current_annotation = None
    file.close()
    return training_data, unique_labels            
            
TRAIN_DATA, LABELS = load_data_spacy("data/train.txt")

### Data overview

Sample sentences from the training data, which contains queries about movie information

In [55]:
[x[0] for x in TRAIN_DATA[1:10]]

['show me films with drew barrymore from the 1980s',
 'what movies starred both al pacino and robert deniro',
 'find me all of the movies that starred harold ramis and bill murray',
 'find me a movie with a quote about baseball in it',
 'what movies have mississippi in the title',
 'show me science fiction films directed by steven spielberg',
 'do you have any thrillers directed by sofia coppola',
 'what leonard cohen songs have been used in a movie',
 'show me films elvis films set in hawaii']

Sample labeled annotations for the training data

In [56]:
[x[1] for x in TRAIN_DATA[1:10]]

[{'entities': [(19, 33, 'ACTOR'), (43, 48, 'YEAR')]},
 {'entities': [(25, 34, 'ACTOR'), (39, 52, 'ACTOR')]},
 {'entities': [(39, 51, 'ACTOR'), (56, 67, 'ACTOR')]},
 {'entities': []},
 {'entities': [(17, 28, 'TITLE')]},
 {'entities': [(8, 29, 'GENRE'), (42, 58, 'DIRECTOR')]},
 {'entities': [(16, 25, 'GENRE'), (38, 51, 'DIRECTOR')]},
 {'entities': [(5, 24, 'SONG')]},
 {'entities': [(14, 19, 'ACTOR'), (26, 39, 'PLOT')]}]

### Test pre-trained NER Model

First, download the pre-trained model with a subprocess call.

The pretrained model fails to identify any genres, plots, actors, directors, characters, movie titles, or ratings present in the movie queries. Interestingly, it also fails to identify persons, works of art, and products! Clearly, the pretrained model does not fit this domain application, so we will train our own model from scratch.


### Evaluation Metrics

Model performance is assessed on the entirety of the test dataset (2,443 sentences) based on the following metrics and their definitions.

   * Precision: true positives / (true positives + false positives)
   * Recall: true positives / (true positives + false negatives)
   * F1-score: harmonic average of precision and recall

In [57]:
def calc_precision(pred, true):        
    precision = len([x for x in pred if x in true]) / (len(pred) + 1e-20) # true positives / total pred
    return precision

def calc_recall(pred, true):
    recall = len([x for x in true if x in pred]) / (len(true) + 1e-20)    # true positives / total test
    return recall

def calc_f1(precision, recall):
    f1 = 2 * ((precision * recall) / (precision + recall + 1e-20))
    return f1

<img src="https://storage.googleapis.com/kf-pipeline-contrib-public/release-0.1.3/kfp-components/notebooks/entity_extraction/assets/fig7.png" align = "left" width="200"/>

# 3. Named entity recognition with Tensorflow

This section focuses on developing, training, and serving a custom NER architecture with Tensorflow 1.9.0. We will implement an LSTM-CRF model as described in [Huang, Xu, and Yu, 2015](https://arxiv.org/pdf/1508.01991.pdf).

This approach can be broken down into its constituent parts as follows:
   * Embedding: Generating a dense vector representation of words
   * LSTM: Incorporating past and future features to generate a representation of each time step
   * CRF: Make use of neighboring information to predict current tags. The CRF approach has been shown to provide higher accuracy than maximum entropy models because CRF considers the entire sentence rather than relying on beam search to find optimal context sizes. 


In [58]:
#import tensorflow as tf
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
import tensorflow_addons as tfa

from functools import partial
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
import pickle

params = {
    'dim' : 300,            # dimension of embeddings
    'maximum_steps' : 1000, # number of training steps        
    'lstm_size' : 150,      # dimension of LSTM
    'batch_size' : 25,      # batch size
    'max_words' : 10000,    # maximum number of words to embed
    'padding_size' : 20,    # maximum sentence size
    'num_classes' : 14,     # number of unique classes
    'save_dir' : 'models/' # directory to save hash tables, model weights, etc.
}

The first step in implementing a tensorflow named entity recognition architecture is to specify the data loading and transformation process. The words and labels need to be transformed to an integer vector format that tensorflow can process. Tokenization is used to do this, where unique words and labels are mapped to integers and the mapping is stored in a hashtable for back-conversion. 

For this process to work, however, we have to see all of the training data all at once to prevent overlapping hashes. This means that this tokenization process needs to happen separately from the training process. The `make_tokenizer` function takes in the training data and labels and returns two dictionaries, `word_index`, and `labels_index`. The former specifies integer mappings for the words, and the latter specifies integer mappings for the labels. 

In [59]:
def save_obj(directory, obj, name):
    '''Helper function using pickle to save and load objects'''
    with open(directory + name + '.pkl', 'wb+') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)

def load_obj(directory, name):
    '''Helper function using pickle to save and load objects'''
    with open(directory + name + ".pkl", "rb") as f:
        return pickle.load(f)
    
def load_data(file = "data/train.txt"):
    '''Helper function to load and transform inputs and labels
    included as a separate function due to NER-specific evaluation needs:
        tensorflow does not have multi-class precision/accuracy as a metric
        so data_y is needed to manually calculate evaluations'''
    file = open(file, 'r')
    sentence, labels = [], []
    data_x, data_y = [], []
    for line in file:
        line = line.strip("\n").split("\t")
        
        # lines with len > 1 are words
        if len(line) > 1:
            sentence.append(line[1])
            labels.append(line[0][2:]) if len(line[0]) > 1 else labels.append(line[0])
        
        # lins with len == 1 are sentence breaks
        if len(line) == 1: 
            data_x.append(' '.join(sentence))
            data_y.append(labels)
            sentence, labels = [], []
    return data_x, data_y

def make_tokenizer(file = "data/train.txt", params = params):
    ''' In order for one hot encoding of words and labels to work, 
    every word and label has to be seen at least once to make a hashing table.
    This function outputs hash tables for the words and the labels
    that can be used to one-hot-encode them in the generator
    '''
    # Load parameters and data
    max_words = params['max_words']
    padding_size = params['padding_size']
    save_dir = params['save_dir']
    data_x, data_y = load_data(file)
            
    # Use the Keras tokenizer API to generate hashing table for data_x
    tokenizer = Tokenizer(num_words = max_words)
    
    tokenizer.fit_on_texts(data_x)
    word_index = tokenizer.word_index
    
    # Flatten data_y and create hashing table using set logic
    data_y_flattened = [item for sublist in data_y for item in sublist]
    data_x_flattened = [item for sublist in data_x for item in sublist]
    
    labels_index = dict([(y, x + 1) for x, y in enumerate(sorted(set(data_y_flattened)))])
    labels = []
    for item in data_y:
        labels.append([labels_index.get(i) for i in item])
    labels_lookup = {v : k for k, v in labels_index.items()} # reverse dictionary for lookup
    # save hash tables to disk for model serving
    for item, name in zip([word_index, labels_index, labels_lookup],
                          ["word_index", "labels_index", "labels_lookup"]):
        save_obj(save_dir, item, name)
    return word_index, labels_index, labels_lookup

word_index, labels_index, labels_lookup = make_tokenizer()

In [60]:
labels_index

{'ACTOR': 1,
 'CHARACTER': 2,
 'DIRECTOR': 3,
 'GENRE': 4,
 'O': 5,
 'PLOT': 6,
 'RATING': 7,
 'RATINGS_AVERAGE': 8,
 'REVIEW': 9,
 'SONG': 10,
 'TITLE': 11,
 'TRAILER': 12,
 'YEAR': 13}

Next, we create a generator to serve as input to the tensorflow DataSet API. The `generate_batches` function takes training data in BIO format and yields batches as input to the model function. The DataSet API requires two inputs - features and labels. For a recurrent neural network, we also need to specify sequence lengths to mask variable length sequences. This length is returned as a tuple in the features, as `(batch_x, lengths)`.

In [61]:
def generate_batches(file = "data/train.txt", params = params, train = True):
    ''' Generate minibatch with dimensions:
    batch_x : (batch_size, max_len)
    lengths : (batch_size,)
    batch_y : (batch_size, num_classes)
    
    file : path to .txt containing training data in BIO format
    '''
    
    batch_size = params['batch_size']
    max_len = params['padding_size']
    save_dir = params['save_dir']
    
    # load hash tables for tokenization
    for item, name in zip([word_index, labels_index, labels_lookup],
                          ["word_index", "labels_index", "labels_lookup"]):
        item = load_obj(save_dir, name)
    
    while True:
        with open(file, 'r') as f:
            batch_x, lengths, batch_y = [], [], []
            words, labels = [], []
            for line in f:
                line = line.strip("\n").split("\t")
                # lines with len > 1 are words
                if len(line) > 1:
                    labels.append(line[0][2:]) if len(line[0]) > 1 else labels.append(line[0])
                    words.append(line[1])

                # lines with len == 1 are breaks between sentences
                if len(line) == 1: 
                    words = [word_index.get(x) if x in word_index.keys() else 0 for x in words]
                    labels = [labels_index.get(y) for y in labels]
                    batch_x.append(words)
                    batch_y.append(labels)
                    lengths.append(min(len(words), max_len))
                    words, labels = [], []

                if len(batch_x) == batch_size:
                    batch_x = pad_sequences(batch_x, maxlen = max_len, value = 0, padding = "post")
                    batch_y = pad_sequences(batch_y, maxlen = max_len, value = 0, padding = "post")
                    yield (batch_x, lengths), batch_y 
                    batch_x, lengths, batch_y = [], [], []
            if train == False:
                break

The estimator API requires an input function and a model function. The `input_fn` maps the `generate_batches` generator to a tensorflow Dataset.

In [62]:
# For model training, we need an input function that will feed a tf.Dataset
def input_fn(file, params = None, train = True):
    params = params if params is not None else {}
    shapes = (([None, None], [None]), [None, None]) # batch_x, lengths, batch_y shapes
    types = ((tf.int32, tf.int32), tf.int32)        # batch_x, lengths, batch_y data types
    
    generator = partial(generate_batches, file, train = train)
    dataset = tf.data.Dataset.from_generator(generator, types, shapes)
    return dataset

# For model serving, we need a serving function that will feed tf.placeholders
def serving_input_fn():
    words = tf.placeholder(dtype=tf.int32, shape=[None, None], name='words')
    length = tf.placeholder(dtype=tf.int32, shape=[None], name='length')
    receiver_tensors = {'words': words, 'length': length}
    features = {'words': words, 'length': length}
    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)

The `model_fn` unpacks features and labels to create the specified model architecture, which is an LSTM-CRF.

In [63]:
def model_fn(features, labels, mode, params = params):
    # import the data and unpack the features
    # serving input_fn returns a dict, convert to multivalue obj
    if isinstance(features, dict):
        features = features['words'], features['length']
    
    words, length = features
    
    # Embedding
    embedding = tf.Variable(tf.random.normal([params['max_words'], params['dim']]))
    embedding_lookup_for_x = tf.nn.embedding_lookup(embedding, words)
    
    # LSTM
    lstm_cell_fw = tf.nn.rnn_cell.BasicLSTMCell(params['lstm_size'], state_is_tuple = True)
    lstm_cell_bw = tf.nn.rnn_cell.BasicLSTMCell(params['lstm_size'], state_is_tuple = True)
    states, final_state = tf.nn.bidirectional_dynamic_rnn(
                                        cell_fw = lstm_cell_fw, 
                                        cell_bw = lstm_cell_bw,
                                        inputs = embedding_lookup_for_x, 
                                        dtype = tf.float32,
                                        time_major = False,
                                        sequence_length = length)
    lstm_out = tf.concat([states[0], states[1]], axis = 2)
        
    # Conditional random fields
    logits = tf.layers.dense(lstm_out, params['num_classes'])
    crf_params = tf.get_variable("crf", [params['num_classes'], params['num_classes']], dtype=tf.float32)
    pred_ids, _ = tfa.text.crf_decode(logits, crf_params, length)
    training = (mode == tf.estimator.ModeKeys.TRAIN)
    
    # Prediction
    if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = { 
            'pred_ids': pred_ids,
            'tags': words,
            'length' : length,
        }
        export_outputs = {
          'prediction': tf.estimator.export.PredictOutput(predictions)
      }
        
    return tf.estimator.EstimatorSpec(mode, predictions=predictions, export_outputs=export_outputs)
    
    # Loss functions and optimizers
    log_likelihood, _ = tfa.text.crf_log_likelihood(logits, labels, length, crf_params)
    
    loss = tf.reduce_mean(-log_likelihood)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step = tf.train.get_or_create_global_step())
        
    # Training
    if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode = mode,
                                          loss = loss,
                                          
                                          train_op = train_op)

The Estimator API requires separate "Spec" objects, through `tf.estimator.TrainSpec` and `EvalSpec` for training and evaluation configuration. We use `functools.partial` to modify the input to the `input_fn` to create separate training and evaluation inputs, and then create separate `Spec` objects for training and evaluation.

In [64]:
# Spin up the estimator
config = tf.estimator.RunConfig()
estimator = tf.estimator.Estimator(model_fn, 'models/model', config, params)

# Create train spec
train_input_fn = partial(input_fn, "data/train.txt", params = params)
train_spec = tf.estimator.TrainSpec(train_input_fn)

# Create evaluation spec
eval_input_fn = partial(input_fn, "data/test.txt", params = params, train = False)
eval_spec = tf.estimator.EvalSpec(eval_input_fn)

INFO:tensorflow:Using config: {'_model_dir': 'models/model', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


### Training

In [46]:
ts = time.time()
estimator.train(input_fn = train_input_fn, max_steps = 1000)
te = time.time()
print("Completed in {} seconds".format(int(te - ts)))
estimator.export_savedmodel('models/saved_model/', serving_input_fn)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into models/model/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 25.085701, step = 1
INFO:tensorflow:global_step/sec: 19.995
INFO:tensorflow:loss = 5.4256563, step = 101 (5.002 sec)
INFO:tensorflow:global_step/sec: 18.9515
INFO:tensorflow:loss = 2.4849465, step = 201 (5.277 sec)
INFO:tensorflow:global_step/sec: 18.3072
INFO:tensorflow:loss = 1.2047207, step = 301 (5.462 sec)
INFO:tensorflow:global_step/sec: 18.8533
INFO:tensorflow:loss = 4.007087, step = 401 (5.304 sec)
INFO:tensorflow:global_step/sec: 20.4696
INFO:tensorflow:loss = 5.614001, step = 501 (4.885 sec)
INFO:tensorfl

b'models/saved_model/1597835460'

### Evaluation

Although one might normally use `if mode == tf.estimator.ModeKeys.EVAL` in the `model_fn` to specify evaluation metrics with `tf.metrics`, NER requires multi-class precision, recall, and F1-score which are not available in `tf.metrics`. Instead, we load the true test labels and calculate precision, recall, and F1-score based upon the model predictions for each sentence at the entity-level (discarding non-entity words).

In [47]:
# Generate predictions
predictions = estimator.predict(eval_input_fn)

# Load hash tables and true labels
labels_index = load_obj(params['save_dir'], "labels_index")
_, true = load_data("data/test.txt")

# Specify which label_index is non-entity
dummy_label = labels_index.get("O") 

# Convert [[string, string], [string, string] ...] to [[int, int], [int, int]]
# with hashing table for label indexes
labels = []
for row in true:
    labels.append([labels_index.get(y) for y in row])
    
# Loop through preds, labels and calculate metrics
precisions, recalls, f1s = [], [], []
for pred, true in zip(predictions, labels):
    pred = pred['pred_ids'][:pred['length']] # undo pad_sequences
    pred = [x for x in pred if x != dummy_label] # remove preds that aren't entities
    true = np.asarray([x for x in true if x != dummy_label])
    recall = calc_recall(true, pred)
    recalls.append(recall)
    precision = calc_precision(true, pred)
    precisions.append(precision)
    f1s.append(calc_f1(precision, recall))
    
print("Precision: {} \nRecall: {} \nF1-score: {}".format(np.around(np.mean(precisions), 3),
                                                         np.around(np.mean(recalls), 3),
                                                         np.around(np.mean(f1s), 3)))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from models/model/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Precision: 0.881 
Recall: 0.895 
F1-score: 0.878


### Serving model for on-the-fly predictions


In [65]:
from pathlib import Path
from tensorflow.contrib import predictor

LINE = 'did george clooney make a science fiction movie in the 1980s'


def predict(line, export_dir = 'models/saved_model/', params = params):
    # Load hash tables
    word_index = load_obj(params['save_dir'], "word_index")
    labels_lookup = load_obj(params['save_dir'], "labels_lookup")
   
    # Identify and load model weights
    subdirs = [x for x in Path(export_dir).iterdir()
                   if x.is_dir() and 'temp' not in str(x)]
    latest_model = str(sorted(subdirs)[-1])
    predict_fn = predictor.from_saved_model(latest_model)
               
    # Preprocess sentence input
    line = line.strip().split()
    vector = [word_index.get(x) if x in word_index.keys() else 0 for x in line] # tokenize
    vector[len(vector):20] = [0] * (20 - len(vector)) # pad prediction
       
    # Calculate precision and transform for display
    predictions = predict_fn({'words': [vector], 'length': [len(line)]})
    tags = predictions.get('tags')
    preds = predictions.get('pred_ids')
    for tag, pred in zip(tags, preds):
        tag = [word for word in tag if word != 0] # unpad
        pred = pred[:len(tag)]
        pred = [labels_lookup.get(num) for num in pred] #untokenize
        print(line, "\n", pred)
   
predict(LINE)

ModuleNotFoundError: No module named 'tensorflow.contrib'

<img src="https://storage.googleapis.com/kf-pipeline-contrib-public/release-0.1.3/kfp-components/notebooks/entity_extraction/assets/fig10.png" align = "left" width="800"/>

## Conclusion

This notebook shows two end-to-end approaches of training and serving custom NER models including loading and transforming data, creating an NER pipeline, and calculating performance metrics. The custom architecture developed in Tensorflow was competitive with out-of-the-box algorithms while being an order of magnitude faster to train and being able to use the powerful high level APIs in tensorflow like Dataset and Estimator for scalable serving.

Both approaches perform well on diverse queries about movies with spelling mistakes and complicated query structures. NER pipelines like the ones presented in this notebook can be integrated into recommender systems, search engines, NLP feature engineering, and customer support / chatbots, among many other business applications.