### Recognizing named entities in tweets using an LSTM

The idea of Named Entity Recognition (NER) is to extract "named entities" from a text corpus. Examples of such entities include person names, locations, calendar months, and course IDs specific to a department. <br>
So if you were to extract the named entities *person (PER)* and *course ID (C_ID)* from the sentence below: <br>
**L645 is taught by Francis Tyers** <br>
The NER model that you're going to build would output a sequence of tags, one per token of the sentence, as shown below: <br>
**B-C_ID    O    O    O    B-PER    I-PER**

The B, I and O that you see above represent a prefix scheme known as *BIO markup*. B stands for Beginning, I for Inside and O for Outside.
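To make the BIO scheme concrete, here is a minimal sketch that turns a tag sequence back into entity spans. The helper name `bio_to_spans` is hypothetical and not part of this notebook; it also assumes that an `I-` tag without a preceding matching `B-` tag is simply ignored.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (entity_type, start, end) spans; end is exclusive."""
    spans = []
    start, current = None, None
    # A sentinel 'O' at the end closes an entity that runs to the last token.
    for i, tag in enumerate(tags + ['O']):
        inside = tag.startswith('I-') and current == tag[2:]
        if current is not None and not inside:
            spans.append((current, start, i))
            start, current = None, None
        if tag.startswith('B-'):
            start, current = i, tag[2:]
    return spans

tokens = ['L645', 'is', 'taught', 'by', 'Francis', 'Tyers']
tags = ['B-C_ID', 'O', 'O', 'O', 'B-PER', 'I-PER']
for entity_type, start, end in bio_to_spans(tags):
    print(entity_type, ' '.join(tokens[start:end]))
```

The B prefix is what lets two adjacent entities of the same type be told apart: `['B-PER', 'B-PER']` yields two one-token spans rather than one two-token span.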




In this notebook, you will use a recurrent neural network (RNN) model to carry out the aforementioned NER task. <br>
The notebook has been broken down into 15 different sections that you will run sequentially to complete this exercise. <br> 
There are a few lines of code that you're expected to complete in those sections to keep you engaged. You'll find them labeled as either **COMPLETE** or **YOUR CODE HERE** in their respective places.

Specifically, you will complete the following:
1. Replace tokens starting with '@' with the \<USR\> token in section 1
2. Create mappings from tokens to indices and vice versa in section 2
3. Create the embedding variable in section 6
4. Create the softmax and argmax operations in section 7
5. Create the Adam optimizer in section 8
6. Adjust the hyperparameters in section 13


We recommend you use [Google Colab](https://colab.research.google.com/) for this exercise. <br>
To get started, run the cell below to download the data corpus you'll be using in the exercise.

In [1]:
!wget https://raw.githubusercontent.com/heisenberg967/ner_twitter/main/data/train.txt # train data
!wget https://raw.githubusercontent.com/heisenberg967/ner_twitter/main/data/validation.txt # validation data
!wget https://raw.githubusercontent.com/heisenberg967/ner_twitter/main/data/test.txt # test data
!wget https://raw.githubusercontent.com/heisenberg967/ner_twitter/main/evaluation.py # used in evaluation function to calculate f-score

--2022-09-29 21:07:45--  https://raw.githubusercontent.com/heisenberg967/ner_twitter/main/data/train.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 849548 (830K) [text/plain]
Saving to: ‘train.txt’


2022-09-29 21:07:45 (94.2 MB/s) - ‘train.txt’ saved [849548/849548]

--2022-09-29 21:07:45--  https://raw.githubusercontent.com/heisenberg967/ner_twitter/main/data/validation.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 103771 (101K) [text/plain]
Saving to: ‘validation.txt’


2022-09-29 21:07:45 (34.9 MB/s) - ‘validation.t

#### 1. Load data

In this section, you're going to load the dataset by invoking the read_data function. Within read_data, you're going to replace every token that starts with '@' (a user mention) with the token \<USR\>. For reference, notice how URLs in tweets are already replaced with the token \<URL\>.
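The normalization step can be tried in isolation before reading whole files. The sketch below uses a hypothetical helper, `normalize_token`, which is not defined anywhere in this notebook; the example mention and URL are made up.

```python
def normalize_token(token):
    """Map URLs and user mentions to placeholder tokens; leave other tokens unchanged."""
    if token.lower().startswith(('http://', 'https://')):
        return '<URL>'
    if token.startswith('@'):
        return '<USR>'
    return token

sample = ['RT', '@someuser', ':', 'https://example.com/x', 'tickets']
print([normalize_token(t) for t in sample])
```

Collapsing all mentions and links into two placeholder tokens keeps the vocabulary small, since individual usernames and URLs rarely repeat across tweets.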

In [2]:
### Function to read data
def read_data(file_path):
    tokens = []
    tags = []
    
    tweet_tokens = []
    tweet_tags = []
    for line in open(file_path, encoding='utf-8'):
        line = line.strip()
        if not line:
            if tweet_tokens:
                tokens.append(tweet_tokens)
                tags.append(tweet_tags)
            tweet_tokens = []
            tweet_tags = []
        else:
            token, tag = line.split()
            
            if token.lower().startswith('https://') or token.lower().startswith('http://'):
                token = '<URL>'
            # Replace user mentions, i.e. tokens starting with '@', with the <USR> token
            ######### YOUR CODE HERE #############
            elif token.startswith('@'):
                token = '<USR>'
              
            tweet_tokens.append(token)
            tweet_tags.append(tag)
            
    return tokens, tags

In [3]:
train_tokens, train_tags = read_data('train.txt')
validation_tokens, validation_tags = read_data('validation.txt')
test_tokens, test_tags = read_data('test.txt')

Run the cell below to see what your data looks like. Feel free to play around with what you're printing.

In [7]:
print(train_tokens[0])
print(train_tags[0])

['RT', '<USR>', ':', 'Online', 'ticket', 'sales', 'for', 'Ghostland', 'Observatory', 'extended', 'until', '6', 'PM', 'EST', 'due', 'to', 'high', 'demand', '.', 'Get', 'them', 'before', 'they', 'sell', 'out', '...']
['O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-musicartist', 'I-musicartist', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']


In [31]:
tagL = []

for i in range(len(train_tags)):
  for j in train_tags[i]:
    tagL.append(j)

print(set(tagL))

{'I-facility', 'I-geo-loc', 'I-tvshow', 'I-product', 'B-other', 'I-company', 'I-movie', 'B-movie', 'B-company', 'B-tvshow', 'I-sportsteam', 'I-other', 'B-geo-loc', 'B-person', 'I-person', 'I-musicartist', 'B-product', 'O', 'B-sportsteam', 'B-facility', 'B-musicartist'}


#### 2. Create dictionaries

In this section, you're going to generate two dictionaries (mappings): <br>
\{token\} -> \{tokenID\}: the ID selects the row of the embedding matrix for a token <br>
\{tag\} -> \{tagID\}: the ID is turned into a one-hot encoded vector to help compute the loss at the output of the network <br>
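As a small illustration of these mappings, the sketch below builds them for a toy corpus. The sentences are made up, and sorting the vocabulary is an assumption made here only to get a reproducible order.

```python
from collections import defaultdict

special = ['<UNK>', '<PAD>']
sentences = [['RT', '<USR>', 'tickets'], ['tickets', 'on', 'sale']]

# Special tokens come first, so '<UNK>' gets index 0.
vocab = sorted({tok for sent in sentences for tok in sent} - set(special))
idx2tok = special + vocab

# defaultdict(int) returns 0, the '<UNK>' index, for tokens it has never seen
# (and stores the missing key on first lookup).
tok2idx = defaultdict(int, {tok: i for i, tok in enumerate(idx2tok)})

print(idx2tok)
print([tok2idx[t] for t in ['tickets', 'never_seen']])
```

Because the default value is 0, any token that was not in the training vocabulary is looked up as `<UNK>` without raising a `KeyError`.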

In [8]:
from collections import defaultdict

def build_dict(tokens_or_tags, special_tokens):
    """
        tokens_or_tags: a list of lists of tokens or tags
        special_tokens: some special tokens
    """
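    # Create a dictionary that maps unknown keys to index 0
    tok2idx = defaultdict(lambda: 0)
    
    # Sort for a reproducible order, and drop the special tokens from the vocabulary
    # so that each of them keeps the single index assigned to it below
    # (otherwise 'O', which also occurs in the data, would be indexed twice)
    voc = sorted(set(x for ele in tokens_or_tags for x in ele) - set(special_tokens))
    voc_size = len(voc) + len(special_tokens)
    idx2tok = [''] * voc_size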
    
    # Create mappings from tokens to indices and vice versa
    # Add special tokens to dictionaries
    # The first special token must have index 0. You may want to use voc to help create tok2idx and idx2tok mappings.
    ######################################
    ######### YOUR CODE HERE #############
    ######################################
    for i,token in enumerate(special_tokens):
        tok2idx[token] = i
        idx2tok[i] = token
    
    for i, token in enumerate(voc, len(special_tokens)):
        tok2idx[token] = i
        idx2tok[i] = token
    
    return tok2idx, idx2tok

special_tokens = ['<UNK>', '<PAD>']
special_tags = ['O']

# Create dictionaries 
token2idx, idx2token = build_dict(train_tokens + validation_tokens, special_tokens)
tag2idx, idx2tag = build_dict(train_tags, special_tags)

# These functions help you map between tokens/tags and their ids for a sentence
def words2idxs(tokens_list):
    return [token2idx[word] for word in tokens_list]

def tags2idxs(tags_list):
    return [tag2idx[tag] for tag in tags_list]

def idxs2words(idxs):
    return [idx2token[idx] for idx in idxs]

def idxs2tags(idxs):
    return [idx2tag[idx] for idx in idxs]

In [None]:
##test
#print(token2idx)
#print(idx2token)

#### 3. Generate Batches

The function below helps us train our model in batches. Since all sequences within a batch must have the same length, shorter sequences are padded with the *\<PAD\>* token, as you may notice in the function.
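The padding idea on its own can be sketched in a few lines of plain Python. The helper `pad_batch` is hypothetical, and `PAD_ID = 1` is an assumption that matches the order of `special_tokens = ['<UNK>', '<PAD>']` used in this notebook.

```python
def pad_batch(sequences, pad_value):
    """Right-pad integer sequences to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]
    lengths = [len(seq) for seq in sequences]  # true lengths, before padding
    return padded, lengths

PAD_ID = 1  # assumed index of <PAD>
batch, lengths = pad_batch([[5, 8, 2], [7], [4, 9]], PAD_ID)
print(batch)
print(lengths)
```

The true lengths are returned alongside the padded batch because the RNN and the loss both need to know where the real tokens end.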

In [10]:
# generate batches
def batches_generator(batch_size, tokens, tags,
                      shuffle=True, allow_smaller_last_batch=True):
    """Generates padded batches of tokens and tags."""
    
    n_samples = len(tokens)
    if shuffle:
        order = np.random.permutation(n_samples)
    else:
        order = np.arange(n_samples)

    n_batches = n_samples // batch_size
    if allow_smaller_last_batch and n_samples % batch_size:
        n_batches += 1

    for k in range(n_batches):
        batch_start = k * batch_size
        batch_end = min((k + 1) * batch_size, n_samples)
        current_batch_size = batch_end - batch_start
        x_list = []
        y_list = []
        max_len_token = 0
        for idx in order[batch_start: batch_end]:
            x_list.append(words2idxs(tokens[idx]))
            y_list.append(tags2idxs(tags[idx]))
            max_len_token = max(max_len_token, len(tags[idx]))
            
        x = np.ones([current_batch_size, max_len_token], dtype=np.int32) * token2idx['<PAD>'] # pad token to ensure equal length
        y = np.ones([current_batch_size, max_len_token], dtype=np.int32) * tag2idx['O']
        lengths = np.zeros(current_batch_size, dtype=np.int32)
        for n in range(current_batch_size):
            utt_len = len(x_list[n])
            x[n, :utt_len] = x_list[n]
            lengths[n] = utt_len
            y[n, :utt_len] = y_list[n]
        yield x, y, lengths

#### 4. Define empty LSTM class

To carry out our NER task, we're going to build an LSTM model whose purpose is to output a probability distribution over tags for each token in a sentence. Since we're concerned with both the left and right contexts of a token, we're using a bi-directional LSTM. We're also using a dense layer to perform tag classification.

In [11]:
### Import tf and Init LSTM Class 

# We're using version 1 of the TensorFlow API (via tf.compat.v1) for this exercise.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
import numpy as np

class BiLSTMModel():
    pass

Instructions for updating:
non-resource variables are not supported in the long term


#### 5. Define placeholders for network model

We're first going to declare the following [placeholders](https://www.tensorflow.org/api_docs/python/tf/compat/v1/placeholder) to specify what data goes into the network.
 - *input_batch*: sequences of words (shape: [batch_size, sequence_len]);
 - *ground_truth_tags*: sequences of tags (shape: [batch_size, sequence_len]);
 - *lengths*: lengths of the unpadded sequences (shape: [batch_size]);
 - *dropout_ph*: dropout keep probability; this placeholder has a default value of 1.0;
 - *learning_rate_ph*: learning rate; we need this placeholder because we want to change the value during training.

Defining shape as None lets you feed in data of variable size.

In [12]:
def declare_placeholders(self):
    """Specifies placeholders for the model."""

    # Placeholders for input and ground truth output.
    self.input_batch = tf.placeholder(dtype=tf.int32, shape=[None, None], name='input_batch') 
    self.ground_truth_tags = tf.placeholder(dtype=tf.int32, shape=[None, None], name='ground_truth_tags')
  
    # Placeholder for lengths of the sequences.
    self.lengths = tf.placeholder(dtype=tf.int32, shape=[None], name='lengths') 
    
    # Placeholder for a dropout keep probability. If we don't feed
    # a value for this placeholder, it will be equal to 1.0.
    self.dropout_ph = tf.placeholder_with_default(tf.cast(1.0, tf.float32), shape=[])
    
    # Placeholder for a learning rate (tf.float32).
    self.learning_rate_ph = tf.placeholder(dtype=tf.float32, shape=[], name='learning_rate')

BiLSTMModel.__declare_placeholders = classmethod(declare_placeholders)

#### 6. Define Layers of the network model.

In [13]:
def build_layers(self, vocabulary_size, embedding_dim, n_hidden_rnn, n_tags):
    """Specifies bi-LSTM architecture and computes logits for inputs."""
    
    # Create embedding variable (tf.Variable) with dtype tf.float32
    # https://www.tensorflow.org/api_docs/python/tf/compat/v1/Variable
    initial_embedding_matrix = np.random.randn(vocabulary_size, embedding_dim) / np.sqrt(embedding_dim)
    embedding_matrix_variable = tf.Variable(initial_value=initial_embedding_matrix, name='embedding_matrix', dtype=tf.float32) ######### Replace None with the intended value for initial_value #############
    
    # Create RNN cells (for example, tf.nn.rnn_cell.BasicLSTMCell) with n_hidden_rnn number of units 
    # and dropout (tf.nn.rnn_cell.DropoutWrapper), initializing all *_keep_prob with dropout placeholder.
    forward_cell =  tf.nn.rnn_cell.DropoutWrapper(
        tf.nn.rnn_cell.BasicLSTMCell(num_units=n_hidden_rnn), input_keep_prob=self.dropout_ph, output_keep_prob=self.dropout_ph, state_keep_prob=self.dropout_ph)
    backward_cell =  tf.nn.rnn_cell.DropoutWrapper(
        tf.nn.rnn_cell.BasicLSTMCell(num_units=n_hidden_rnn), input_keep_prob=self.dropout_ph, output_keep_prob=self.dropout_ph, state_keep_prob=self.dropout_ph)

    # Look up embeddings for self.input_batch (tf.nn.embedding_lookup).
    # Shape: [batch_size, sequence_len, embedding_dim].
    embeddings =  tf.nn.embedding_lookup(embedding_matrix_variable, self.input_batch)
    
    # Pass them through Bidirectional Dynamic RNN (tf.nn.bidirectional_dynamic_rnn).
    # Shape: [batch_size, sequence_len, 2 * n_hidden_rnn]. 
    # Also don't forget to initialize sequence_length as self.lengths and dtype as tf.float32.
    (rnn_output_fw, rnn_output_bw), _ =  tf.nn.bidirectional_dynamic_rnn(forward_cell, backward_cell, embeddings, self.lengths, dtype=tf.float32)
    rnn_output = tf.concat([rnn_output_fw, rnn_output_bw], axis=2)

    # Dense layer on top.
    # Shape: [batch_size, sequence_len, n_tags].   
    self.logits = tf.layers.dense(rnn_output, n_tags, activation=None)

BiLSTMModel.__build_layers = classmethod(build_layers)

#### 7. Compute Predictions and Loss

In this section, we're going to apply softmax to the last layer and use argmax to determine the most probable tags as defined in the function compute_predictions. <br>
The compute_loss function is used to create our loss function for which we're making use of tensorflow's [cross entropy with logits](https://www.tensorflow.org/api_docs/python/tf/compat/v1/nn/softmax_cross_entropy_with_logits_v2). Also, we're going to be ignoring loss coming from the \<PAD\> tokens we'd created earlier.
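To see what softmax, argmax and a padding-aware loss compute, here is a plain-Python sketch for a single sentence. All logits and indices are made up, and the helper names are hypothetical. Note one deliberate difference: this sketch averages over the non-padding positions only, whereas `tf.reduce_mean` over a masked tensor also counts the zeroed-out positions in the denominator.

```python
import math

def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    top = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

def masked_cross_entropy(logits_seq, tag_ids, token_ids, pad_index):
    """Mean cross-entropy over the positions whose token is not padding."""
    losses = [
        -math.log(softmax(logits)[tag])
        for logits, tag, token in zip(logits_seq, tag_ids, token_ids)
        if token != pad_index
    ]
    return sum(losses) / len(losses)

PAD_ID = 1  # assumed index of <PAD>
logits_seq = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 9.0]]  # 3 positions, 3 tags
tag_ids = [0, 1, 0]         # the last entry belongs to a padded position
token_ids = [5, 7, PAD_ID]

print([argmax(softmax(logits)) for logits in logits_seq])
print(masked_cross_entropy(logits_seq, tag_ids, token_ids, PAD_ID))
```

The third position has a confidently wrong prediction, but since its token is padding it does not affect the loss.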

In [14]:
def compute_predictions(self):
    """Transforms logits to probabilities and finds the most probable tags."""
    
    # Create softmax (tf.nn.softmax) function
    # https://www.tensorflow.org/api_docs/python/tf/compat/v1/math/softmax
    # Your task is to create a softmax function and assign it to softmax_output.
    # Replace None with intended value to invoke tf.nn.softmax
    # Remember, you'd defined your logits value earlier in build_layers
    softmax_output = tf.nn.softmax(self.logits) ######### YOUR CODE HERE #############
    
    # Use argmax (tf.argmax) to get the most probable tags with axis=-1
    # https://www.tensorflow.org/api_docs/python/tf/compat/v1/argmax
    # Replace None with intended value to invoke tf.argmax
    self.predictions = tf.argmax(softmax_output, axis=-1) ######### YOUR CODE HERE #############

BiLSTMModel.__compute_predictions = classmethod(compute_predictions)

def compute_loss(self, n_tags, PAD_index):
    """Computes masked cross-entropy loss with logits."""
    
    # Create cross entropy function (tf.nn.softmax_cross_entropy_with_logits_v2)
    ground_truth_tags_one_hot = tf.one_hot(self.ground_truth_tags, n_tags)
    loss_tensor = tf.nn.softmax_cross_entropy_with_logits_v2(labels=ground_truth_tags_one_hot, logits=self.logits)
    
    # Create loss function which doesn't operate with <PAD> tokens (tf.reduce_mean).
    # The mask is built from the input tokens, not from the loss values:
    # positions whose input token is <PAD> get weight 0.
    mask = tf.cast(tf.not_equal(self.input_batch, PAD_index), tf.float32)
    self.loss = tf.reduce_mean(tf.multiply(loss_tensor, mask))

BiLSTMModel.__compute_loss = classmethod(compute_loss)

#### 8. Optimize loss using Adam optimizer

In [15]:
def perform_optimization(self):
    """Specifies the optimizer and train_op for the model."""
    
    # Your task is to create an optimizer (tf.train.AdamOptimizer)
    # https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/AdamOptimizer
    # replace None with tf.train.AdamOptimizer
    self.optimizer =  tf.train.AdamOptimizer(learning_rate=self.learning_rate_ph) ######### YOUR CODE HERE #############
    self.grads_and_vars = self.optimizer.compute_gradients(self.loss)
    
    # Gradient clipping (tf.clip_by_norm) for self.grads_and_vars
    # Note that this operation must be applied only to the gradients,
    # because self.grads_and_vars also contains the variables.
    # A list comprehension might be useful in this case.
    clip_norm = tf.cast(1.0, tf.float32)
    self.grads_and_vars = [(tf.clip_by_norm(grad, clip_norm), var) for grad, var in self.grads_and_vars]
    
    self.train_op = self.optimizer.apply_gradients(self.grads_and_vars)

BiLSTMModel.__perform_optimization = classmethod(perform_optimization)
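The gradient clipping used in this section rescales a gradient whose L2 norm exceeds the threshold and leaves smaller gradients untouched. Here is a plain-Python sketch of that rule for a flat list of values; the function below is an illustration of the idea, not TensorFlow's implementation.

```python
import math

def clip_by_norm(values, clip_norm):
    """Rescale a gradient vector so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm <= clip_norm:
        return list(values)  # already small enough, keep as is
    return [v * clip_norm / norm for v in values]

print(clip_by_norm([3.0, 4.0], 1.0))  # norm 5.0, so the vector is scaled down
print(clip_by_norm([0.3, 0.4], 1.0))  # norm 0.5, so the vector is unchanged
```

Clipping keeps the direction of the gradient and only limits its length, which helps avoid the occasional very large update that RNNs can produce.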

#### 9. Build LSTM class

Great! Now that we've defined all the components of our network model, we call these functions from our LSTM class's constructor.

In [16]:
## Build LSTM class

def init_model(self, vocabulary_size, n_tags, embedding_dim, n_hidden_rnn, PAD_index):
    self.__declare_placeholders()
    self.__build_layers(vocabulary_size, embedding_dim, n_hidden_rnn, n_tags)
    self.__compute_predictions()
    self.__compute_loss(n_tags, PAD_index)
    self.__perform_optimization()

BiLSTMModel.__init__ = classmethod(init_model)

#### 10. Train Neural Network Model

To train the network model that we've built, we run *self.train_op*, which we declared within the *perform_optimization* function. We feed our actual data into the placeholders we defined earlier through *feed_dict*.

In [17]:
## TRAIN NN

def train_on_batch(self, session, x_batch, y_batch, lengths, learning_rate, dropout_keep_probability):
    feed_dict = {self.input_batch: x_batch,
                 self.ground_truth_tags: y_batch,
                 self.learning_rate_ph: learning_rate,
                 self.dropout_ph: dropout_keep_probability,
                 self.lengths: lengths}
    
    # https://www.tensorflow.org/api_docs/python/tf/compat/v1/Session
    session.run(self.train_op, feed_dict=feed_dict)

BiLSTMModel.train_on_batch = classmethod(train_on_batch)

#### 11. Generate tag predictions

To predict tags, we run *self.predictions*, feeding a dropout keep probability of 1.0 so that dropout is disabled at prediction time.

In [18]:
## PREDICT

def predict_for_batch(self, session, x_batch, lengths):    
    feed_dict = {self.input_batch: x_batch,
                 self.dropout_ph: 1.0,
                 self.lengths: lengths}

    predictions = session.run(self.predictions, feed_dict=feed_dict)
    
    return predictions

BiLSTMModel.predict_for_batch = classmethod(predict_for_batch)

#### 12. Evaluate Model

To help in evaluating our model, we're going to create two functions. <br>
The *predict_tags* function gets predictions from a network model and then transforms indices to tokens and tags. <br>
The *eval_conll* function calculates precision, recall and F1 score. <br>
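As a reminder of how the three scores relate, the sketch below computes them from phrase counts, assuming the usual definitions (precision = correct / found, recall = correct / true phrases, F1 = their harmonic mean). The helper name is hypothetical and is not the function from `evaluation.py`; the counts are the ones from the train-set report in section 15.

```python
def scores_from_counts(n_correct, n_found, n_true):
    """Phrase-level precision, recall and F1 from raw counts."""
    precision = n_correct / n_found if n_found else 0.0
    recall = n_correct / n_true if n_true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# 4489 true phrases; 4878 phrases found by the model; 3148 of them correct
p, r, f1 = scores_from_counts(n_correct=3148, n_found=4878, n_true=4489)
print('precision: {:.2%}; recall: {:.2%}; F1: {:.2f}'.format(p, r, 100 * f1))
```

A model that tags almost everything as an entity has many found phrases and therefore low precision, which is what the untrained model's scores in the first epoch reflect.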


In [19]:
## Evaluate model 

from evaluation import precision_recall_f1

def predict_tags(model, session, token_idxs_batch, lengths):
    """Performs predictions and transforms indices to tokens and tags."""
    
    tag_idxs_batch = model.predict_for_batch(session, token_idxs_batch, lengths)
    
    tags_batch, tokens_batch = [], []
    for tag_idxs, token_idxs in zip(tag_idxs_batch, token_idxs_batch):
        tags, tokens = [], []
        for tag_idx, token_idx in zip(tag_idxs, token_idxs):
            tags.append(idx2tag[tag_idx])
            tokens.append(idx2token[token_idx])
        tags_batch.append(tags)
        tokens_batch.append(tokens)
    return tags_batch, tokens_batch
    
    
def eval_conll(model, session, tokens, tags, short_report=True):
    """Computes NER quality measures using the CoNLL shared task script."""
    
    y_true, y_pred = [], []
    for x_batch, y_batch, lengths in batches_generator(1, tokens, tags):
        tags_batch, tokens_batch = predict_tags(model, session, x_batch, lengths)
        if len(x_batch[0]) != len(tags_batch[0]):
            raise Exception("Incorrect length of prediction for the input, "
                            "expected length: %i, got: %i" % (len(x_batch[0]), len(tags_batch[0])))
        predicted_tags = []
        ground_truth_tags = []
        for gt_tag_idx, pred_tag, token in zip(y_batch[0], tags_batch[0], tokens_batch[0]): 
            if token != '<PAD>':
                ground_truth_tags.append(idx2tag[gt_tag_idx])
                predicted_tags.append(pred_tag)

        # We extend every prediction and ground truth sequence with 'O' tag to indicate a possible end of entity.
        y_true.extend(ground_truth_tags + ['O'])
        y_pred.extend(predicted_tags + ['O'])
        
    results = precision_recall_f1(y_true, y_pred, print_results=True, short_report=short_report)
    return results

#### 13. Set/Adjust hyperparameters for BiLSTM

So, in this section, you're going to see all the different pieces come together. <br>

Our *BiLSTMModel* model has the following parameters:
 - *vocabulary_size* — number of tokens;
 - *n_tags* — number of tags;
 - *embedding_dim* — dimension of embeddings, recommended value: 200;
 - *n_hidden_rnn* — size of hidden layers for RNN, recommended value: 200;
 - *PAD_index* — an index of the padding token (\<PAD\>).

Run your model with different hyperparameter settings. You could start with the following values:
- *batch_size*: 32 (alternatively, anything between 8 and 128);
- *n_epochs*: 4 (in the interest of time, you may want to use no more than 8);
- starting value of *learning_rate*: 0.005 (try values between 0.001 and 0.1);
- *learning_rate_decay*: the square root of 2;
- *dropout_keep_probability*: try several values: 0.1, 0.5, 0.9.
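To get a feel for the decay setting, the sketch below lists the learning rate used in each epoch when the rate is divided by the decay factor after every epoch, which is how the training loop in section 14 applies it. The helper name is hypothetical.

```python
import math

def learning_rate_schedule(initial_rate, decay, n_epochs):
    """Learning rate used in each epoch when dividing by `decay` after every epoch."""
    return [initial_rate / decay ** epoch for epoch in range(n_epochs)]

rates = learning_rate_schedule(0.005, math.sqrt(2), 4)
for epoch, rate in enumerate(rates, start=1):
    print('epoch {}: learning rate {:.5f}'.format(epoch, rate))
```

With a decay of the square root of 2, the learning rate halves every two epochs.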



In [20]:
### RUN NN by setting different hyperparameters
### BUILD the model

tf.reset_default_graph()

model = BiLSTMModel(vocabulary_size=len(token2idx), n_tags=len(tag2idx), embedding_dim=200, n_hidden_rnn=200, PAD_index=token2idx['<PAD>'])

batch_size = 32 ######### YOUR CODE HERE #############
n_epochs = 8 ######### YOUR CODE HERE #############
learning_rate = 0.04 ######### YOUR CODE HERE #############
learning_rate_decay = np.sqrt(2) ######### YOUR CODE HERE #############
dropout_keep_probability = 0.5 ######### YOUR CODE HERE #############



#### 14. Run your model with the train/validation data

Okay! We're now ready to run our model!

In [21]:
### Train model using dataset

sess = tf.Session()
sess.run(tf.global_variables_initializer())

print('Start training... \n')
for epoch in range(n_epochs):
    # For each epoch evaluate the model on train and validation data
    print('-' * 20 + ' Epoch {} '.format(epoch+1) + 'of {} '.format(n_epochs) + '-' * 20)
    print('Train data evaluation:')
    eval_conll(model, sess, train_tokens, train_tags, short_report=True)
    print('Validation data evaluation:')
    eval_conll(model, sess, validation_tokens, validation_tags, short_report=True)
    
    # Train the model
    for x_batch, y_batch, lengths in batches_generator(batch_size, train_tokens, train_tags):
        model.train_on_batch(sess, x_batch, y_batch, lengths, learning_rate, dropout_keep_probability)
        
    # Decaying the learning rate
    learning_rate = learning_rate / learning_rate_decay
    
print('...training finished.')

Start training... 

-------------------- Epoch 1 of 8 --------------------
Train data evaluation:
processed 105778 tokens with 4489 phrases; found: 67140 phrases; correct: 165.

precision:  0.25%; recall:  3.68%; F1:  0.46

Validation data evaluation:
processed 12836 tokens with 537 phrases; found: 8159 phrases; correct: 25.

precision:  0.31%; recall:  4.66%; F1:  0.57

-------------------- Epoch 2 of 8 --------------------
Train data evaluation:
processed 105778 tokens with 4489 phrases; found: 1854 phrases; correct: 553.

precision:  29.83%; recall:  12.32%; F1:  17.44

Validation data evaluation:
processed 12836 tokens with 537 phrases; found: 170 phrases; correct: 56.

precision:  32.94%; recall:  10.43%; F1:  15.84

-------------------- Epoch 3 of 8 --------------------
Train data evaluation:
processed 105778 tokens with 4489 phrases; found: 5127 phrases; correct: 1901.

precision:  37.08%; recall:  42.35%; F1:  39.54

Validation data evaluation:
processed 12836 tokens with 537 p

#### 15. Look at Results

Congrats on making it to the end! You can inspect your results by running the cell below. If everything went well, the F1 score on your validation set should be close to 40%.

In [23]:
## RESULT

print('-' * 20 + ' Train set quality: ' + '-' * 20)
train_results = eval_conll(model, sess, train_tokens, train_tags, short_report=False)

print('-' * 20 + ' Validation set quality: ' + '-' * 20)
validation_results = eval_conll(model, sess, validation_tokens, validation_tags, short_report=False)

print('-' * 20 + ' Test set quality: ' + '-' * 20)
test_results = eval_conll(model, sess, test_tokens, test_tags, short_report=False)

-------------------- Train set quality: --------------------
processed 105778 tokens with 4489 phrases; found: 4878 phrases; correct: 3148.

precision:  64.53%; recall:  70.13%; F1:  67.21

	     company: precision:   79.15%; recall:   80.87%; F1:   80.00; predicted:   657

	    facility: precision:   56.65%; recall:   57.01%; F1:   56.83; predicted:   316

	     geo-loc: precision:   75.02%; recall:   91.67%; F1:   82.51; predicted:  1217

	       movie: precision:    9.09%; recall:    2.94%; F1:    4.44; predicted:    22

	 musicartist: precision:    1.30%; recall:    0.43%; F1:    0.65; predicted:    77

	       other: precision:   52.32%; recall:   63.94%; F1:   57.55; predicted:   925

	      person: precision:   73.94%; recall:   90.63%; F1:   81.44; predicted:  1086

	     product: precision:   39.03%; recall:   43.08%; F1:   40.96; predicted:   351

	  sportsteam: precision:   50.46%; recall:   50.23%; F1:   50.35; predicted:   216

	      tvshow: precision:    0.00%; recall:  