# End to End Sequence Labelling using BiLSTM CNN CRF for NER
This project performs **end-to-end sequence labelling** on the English CoNLL data, using a BiLSTM CNN CRF model for Named Entity Recognition (NER).

It reimplements the PyTorch [model](https://github.com/jayavardhanr/End-to-end-Sequence-Labeling-via-Bi-directional-LSTM-CNNs-CRF-Tutorial/blob/master/Named_Entity_Recognition-LSTM-CNN-CRF-Tutorial.ipynb) of BiLSTM CNN CRF in TensorFlow Keras. We use a Convolutional Neural Network to build character-level representations of words, a Bidirectional LSTM for word-level encoding, and a Conditional Random Field (CRF layer) for output decoding.

The following libraries are imported.

In [81]:
# Import Libraries
# !pip install tensorflow-gpu
!pip install git+https://www.github.com/keras-team/keras-contrib.git
# !pip install sklearn-crfsuite
import tensorflow as tf
import keras
from keras.layers import TimeDistributed, Conv1D, Dense, Embedding, Input, Dropout, LSTM, Bidirectional, MaxPooling1D,Flatten, concatenate
from keras_contrib.losses import crf_loss
from keras_contrib.metrics import crf_viterbi_accuracy
from keras_contrib.layers.crf import CRF
from keras.utils import plot_model
from keras.initializers import RandomUniform
from keras.optimizers import SGD, Nadam
import numpy as np
import os
import sys
import codecs
import re
import pickle
from sklearn_crfsuite.metrics import flat_classification_report
import matplotlib.pyplot as plt

Collecting git+https://www.github.com/keras-team/keras-contrib.git
  Cloning https://www.github.com/keras-team/keras-contrib.git to /tmp/pip-req-build-lh9thwq_
  Running command git clone --filter=blob:none --quiet https://www.github.com/keras-team/keras-contrib.git /tmp/pip-req-build-lh9thwq_
  Resolved https://www.github.com/keras-team/keras-contrib.git to commit 3fc5ef709e061416f4bc8a92ca3750c824b5d2b0
  Preparing metadata (setup.py) ... done


## Download Data

In [2]:
# Downloading Data
!mkdir data
!wget https://raw.githubusercontent.com/mxhofer/Named-Entity-Recognition-BidirectionalLSTM-CNN-CoNLL/master/data/train.txt -P /content/data
!wget https://raw.githubusercontent.com/mxhofer/Named-Entity-Recognition-BidirectionalLSTM-CNN-CoNLL/master/data/dev.txt -P /content/data
!wget https://raw.githubusercontent.com/mxhofer/Named-Entity-Recognition-BidirectionalLSTM-CNN-CoNLL/master/data/test.txt -P /content/data

mkdir: cannot create directory ‘data’: File exists
--2025-03-21 12:39:33--  https://raw.githubusercontent.com/mxhofer/Named-Entity-Recognition-BidirectionalLSTM-CNN-CoNLL/master/data/train.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3283420 (3.1M) [text/plain]
Saving to: ‘/content/data/train.txt.1’


2025-03-21 12:39:34 (57.6 MB/s) - ‘/content/data/train.txt.1’ saved [3283420/3283420]

--2025-03-21 12:39:34--  https://raw.githubusercontent.com/mxhofer/Named-Entity-Recognition-BidirectionalLSTM-CNN-CoNLL/master/data/dev.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP r

## Data Preprocessing
Data preprocessing includes loading the data, updating the tagging scheme, creating mappings for words, characters and tags, and finally preparing the data that is passed into the model.

### Custom Data Loading
This step loads the train, validation and test data into lists of sentences.

The following code defines the loader and applies it to the three files.

In [2]:
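def load_sentences(filename):
    sentences = []
    sentence = []
    # Use a context manager so the file is closed after reading
    with open(filename) as f:
        for line in f:
            # Blank lines and -DOCSTART- lines mark sentence boundaries
            if len(line) == 0 or line.startswith('-DOCSTART') or line[0] == "\n":
                if len(sentence) > 0:
                    sentences.append(sentence)
                    sentence = []
                continue
            splits = line.split(' ')
            # Keep the word (first column) and the NER tag (last column, still ending in '\n')
            sentence.append([splits[0], splits[-1]])

    if len(sentence) > 0:
        sentences.append(sentence)
        sentence = []
    return sentences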
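
The loader expects CoNLL-style input: one token per line with space-separated columns, the word first and the NER tag last, and blank lines or `-DOCSTART-` lines separating sentences. A minimal sketch of the same parsing on an in-memory sample follows. The sample text and the `parse_conll` helper are illustrative assumptions, and unlike `load_sentences` this version strips the trailing newline from each tag.

```python
import io

# Hand-written sample in the assumed "word POS chunk NER" column layout
SAMPLE = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
"""

def parse_conll(handle):
    """Group (word, tag) pairs into sentences; hypothetical helper for illustration."""
    sentences, sentence = [], []
    for line in handle:
        line = line.rstrip("\n")
        # Blank lines and document markers close the current sentence
        if not line or line.startswith("-DOCSTART"):
            if sentence:
                sentences.append(sentence)
                sentence = []
            continue
        columns = line.split(" ")
        sentence.append((columns[0], columns[-1]))
    if sentence:
        sentences.append(sentence)
    return sentences

sentences = parse_conll(io.StringIO(SAMPLE))
print(len(sentences))
print(sentences[0])
```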

In [3]:
train_sentences = load_sentences("/content/data/train.txt")
dev_sentences = load_sentences("/content/data/dev.txt")
test_sentences = load_sentences("/content/data/test.txt")
len(train_sentences), len(dev_sentences), len(test_sentences)

(14041, 3250, 3453)

### Add Character Information

In [4]:
def add_char_info(sentences):
    for i, sentence in enumerate(sentences):
        for j, data in enumerate(sentence):
            chars = [c for c in data[0]]
            sentences[i][j] = [data[0], chars, data[1]]
    return sentences

In [5]:
train_sentences = add_char_info(train_sentences)
dev_sentences = add_char_info(dev_sentences)
test_sentences = add_char_info(test_sentences)
train_sentences[0]

[['EU', ['E', 'U'], 'B-ORG\n'],
 ['rejects', ['r', 'e', 'j', 'e', 'c', 't', 's'], 'O\n'],
 ['German', ['G', 'e', 'r', 'm', 'a', 'n'], 'B-MISC\n'],
 ['call', ['c', 'a', 'l', 'l'], 'O\n'],
 ['to', ['t', 'o'], 'O\n'],
 ['boycott', ['b', 'o', 'y', 'c', 'o', 't', 't'], 'O\n'],
 ['British', ['B', 'r', 'i', 't', 'i', 's', 'h'], 'B-MISC\n'],
 ['lamb', ['l', 'a', 'm', 'b'], 'O\n'],
 ['.', ['.'], 'O\n']]

### Tag Mappings

In [6]:
labels_set = set()
words = {}

# unique words and labels in data
for dataset in [train_sentences, dev_sentences, test_sentences]:
  for sentence in dataset:
    for token, char, label in sentence:
      # token ... token, char ... list of chars, label ... BIO labels
      labels_set.add(label)
      words[token.lower()] = True

In [7]:
# mapping for labels
indexes = {"PADDING":0}
for label in labels_set:
  indexes[label] = len(indexes)


# Tags read from the files still end in '\n'; strip it for the clean mapping
tag_to_id = {}
for tag, index in indexes.items():
  tag_to_id[tag.rstrip('\n')] = index

tag_to_id

{'PADDING': 0,
 'B-ORG': 1,
 'I-MISC': 2,
 'I-ORG': 3,
 'I-LOC': 4,
 'B-PER': 5,
 'B-LOC': 6,
 'O': 7,
 'I-PER': 8,
 'B-MISC': 9}
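
Because `labels_set` is a Python set of strings, its iteration order, and therefore the integer assigned to each tag, can differ between sessions. That matters if a trained model is reloaded later. One possible way to make the mapping reproducible is to sort the labels first. The sketch below uses a hypothetical `build_tag_mapping` helper and a hand-written label list.

```python
def build_tag_mapping(labels):
    """Map each tag to a stable integer id, reserving 0 for padding."""
    tag_to_id = {"PADDING": 0}
    # Sorting removes the dependence on set iteration order
    for label in sorted(set(labels)):
        tag_to_id[label.strip()] = len(tag_to_id)
    id_to_tag = {index: tag for tag, index in tag_to_id.items()}
    return tag_to_id, id_to_tag

# Labels as they come out of the loader, trailing newline included
labels = ["B-ORG\n", "O\n", "B-MISC\n", "O\n", "I-PER\n", "B-PER\n"]
tag_to_id, id_to_tag = build_tag_mapping(labels)
print(tag_to_id)
```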

In [8]:
id_to_tag = {v: k for k, v in tag_to_id.items()}
len(id_to_tag)

10

### Word and Character Embeddings

In [10]:
# Download Glove Word Embeddings
!wget http://nlp.stanford.edu/data/glove.6B.zip && unzip glove.6B.zip -d /content/

--2025-03-21 12:39:37--  http://nlp.stanford.edu/data/glove.6B.zip
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/data/glove.6B.zip [following]
--2025-03-21 12:39:37--  https://nlp.stanford.edu/data/glove.6B.zip
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://downloads.cs.stanford.edu/nlp/data/glove.6B.zip [following]
--2025-03-21 12:39:37--  https://downloads.cs.stanford.edu/nlp/data/glove.6B.zip
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862182613 (822M) [application/zip]
Saving to: ‘glove.6B.zip.1’


2

In [9]:
word_to_id = {}
word_embeddings = []

EMBEDDINGS_FILE = open("/content/glove.6B.50d.txt", encoding="utf-8")


# loop through each word in embeddings
for line in EMBEDDINGS_FILE:
    split = line.strip().split(" ")
    word = split[0]  # embedding word entry

    if len(word_to_id) == 0:  # add padding+unknown
        word_to_id["PADDING_TOKEN"] = len(word_to_id)
        vector = np.zeros(len(split) - 1)  # zero vector for 'PADDING' word
        word_embeddings.append(vector)

        word_to_id["UNKNOWN_TOKEN"] = len(word_to_id)
        vector = np.random.uniform(-0.25, 0.25, len(split) - 1)
        word_embeddings.append(vector)

    if split[0].lower() in words:
        vector = np.array([float(num) for num in split[1:]])
        word_embeddings.append(vector)  # word embedding vector
        word_to_id[split[0]] = len(word_to_id)  # corresponding word dict

EMBEDDINGS_FILE.close()

word_embeddings = np.array(word_embeddings)
word_embeddings.shape, len(word_to_id)

((22949, 50), 22949)

In [10]:
id_to_word = {v: k for k, v in word_to_id.items()}
len(id_to_word)

22949

In [11]:
# dictionary of all possible characters
char_to_id = {"PADDING": 0, "UNKNOWN": 1}
for c in " 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.,-_()[]{}!?:;#'\"/\\%$`&=*+@^~|<>":
    char_to_id[c] = len(char_to_id)

len(char_to_id)

97

### Prepare Dataset

In [12]:
def create_dataset(sentences, word_to_id, tag_to_id, char_to_id):
    unk_index = word_to_id['UNKNOWN_TOKEN']
    pad_index = word_to_id['PADDING_TOKEN']

    dataset = []

    word_count = 0
    unk_word_count = 0

    for sentence in sentences:
        word_indices = []
        char_indices = []
        tag_indices = []

        for word, char, tag in sentence:
            word_count += 1
            if word in word_to_id:
                word_index = word_to_id[word]
            elif word.lower() in word_to_id:
                word_index = word_to_id[word.lower()]
            else:
                word_index = unk_index
                unk_word_count += 1
            # Characters missing from char_to_id fall back to the UNKNOWN index
            char_index = [char_to_id.get(x, char_to_id["UNKNOWN"]) for x in char]
            # Get the label and map to int
            word_indices.append(word_index)
            char_indices.append(char_index)
            tag_indices.append(tag_to_id[tag])

        dataset.append([word_indices, char_indices, tag_indices])

    return dataset
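
The lookup order used in `create_dataset` is: exact match, then lowercase match, then the unknown index. The sketch below isolates that logic on toy vocabularies. The helpers `lookup_word` and `encode_chars` and the tiny dictionaries are assumptions for illustration only.

```python
def lookup_word(word, word_to_id, unk_token="UNKNOWN_TOKEN"):
    """Exact match first, then lowercase, then the unknown id."""
    if word in word_to_id:
        return word_to_id[word]
    if word.lower() in word_to_id:
        return word_to_id[word.lower()]
    return word_to_id[unk_token]

def encode_chars(word, char_to_id, unk_char="UNKNOWN"):
    """Map each character to its id, using the unknown id for unseen characters."""
    return [char_to_id.get(c, char_to_id[unk_char]) for c in word]

# Toy vocabularies, not the ones built in this notebook
word_to_id = {"PADDING_TOKEN": 0, "UNKNOWN_TOKEN": 1, "the": 2, "german": 3}
char_to_id = {"PADDING": 0, "UNKNOWN": 1, "a": 2, "b": 3}

word_ids = [lookup_word(w, word_to_id) for w in ["The", "German", "Zyx"]]
char_ids = encode_chars("abz", char_to_id)
print(word_ids)
print(char_ids)
```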

In [13]:
def padding(sentences, max_word_len=52):
  # Pad (or truncate) the character ids of every word to a fixed length
  for i, sentence in enumerate(sentences):
      sentences[i][1] = keras.preprocessing.sequence.pad_sequences(sentence[1], max_word_len, padding='post')
  return sentences

In [14]:
# `indexes` (not `tag_to_id`) is passed because the tags in the sentences still end in '\n'
train_set = padding(create_dataset(train_sentences, word_to_id, indexes, char_to_id))
dev_set = padding(create_dataset(dev_sentences, word_to_id, indexes, char_to_id))
test_set = padding(create_dataset(test_sentences, word_to_id, indexes, char_to_id))


In [15]:
def unpack(dataset):
  words=[]
  chars=[]
  tags = []
  for word, char, tag in dataset:
    words.append(word)
    chars.append(char)
    tags.append(tag)

  words = keras.preprocessing.sequence.pad_sequences(words, 52, padding='post')
  chars = keras.preprocessing.sequence.pad_sequences(chars, 52, padding='post')
  tags = keras.preprocessing.sequence.pad_sequences(tags, 52, padding='post')
  tags = keras.utils.to_categorical(tags, num_classes=10)

  return words, chars, tags

In [16]:
train_words, train_chars, train_tags = unpack(train_set)
valid_words, valid_chars, valid_tags = unpack(dev_set)
test_words, test_chars, test_tags = unpack(test_set)

In [17]:
train_words.shape, train_chars.shape, train_tags.shape

((14041, 52), (14041, 52, 52), (14041, 52, 10))
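
These shapes come from two rounds of post-padding: every word is padded to 52 character ids, and every sentence is padded to 52 tokens. The following pure-Python sketch mimics that on a toy input with smaller limits. `pad_post` is a hypothetical stand-in for `pad_sequences(..., padding='post')`, and the limits 4 and 3 are arbitrary.

```python
def pad_post(seq, length, value=0):
    """Right-pad seq with value up to length; longer inputs keep their last `length` items."""
    seq = list(seq)[-length:]
    return seq + [value] * (length - len(seq))

MAX_WORD_LEN = 4   # 52 in the notebook
MAX_SENT_LEN = 3   # 52 in the notebook

# One sentence of two words, each word given as a list of character ids
char_ids = [[5, 6], [7, 8, 9, 10, 11]]
word_ids = [12, 40]

# First pad every word, then pad the sentence with all-zero "words"
padded_chars = [pad_post(w, MAX_WORD_LEN) for w in char_ids]
padded_chars = pad_post(padded_chars, MAX_SENT_LEN, value=[0] * MAX_WORD_LEN)
padded_words = pad_post(word_ids, MAX_SENT_LEN)

print(padded_words)
print(padded_chars)
```

The second word is longer than the limit, so it is cut from the front, which mirrors the default `truncating='pre'` behaviour of `pad_sequences`.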

## Define Model
The model implemented in this project combines the following components:
- Convolutional Neural Network for character-level representation of words
- Bidirectional LSTM for word-level encoding
- Conditional Random Field (CRF layer) for output decoding
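
To make the role of the CRF layer concrete: at decoding time it picks the tag sequence that maximises the sum of per-token emission scores and tag-to-tag transition scores, which is done with the Viterbi algorithm. Below is a minimal pure-Python sketch of that recursion. The function name `viterbi` and the toy scores are assumptions for illustration, not part of this notebook's model.

```python
def viterbi(emissions, transitions):
    """Return (best_score, best_path) for one sequence.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(emissions[0])
    scores = list(emissions[0])          # best score of any path ending in each tag
    backpointers = []
    for emission in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for arriving at tag j
            best_i = max(range(num_tags), key=lambda i: scores[i] + transitions[i][j])
            pointers.append(best_i)
            new_scores.append(scores[best_i] + transitions[best_i][j] + emission[j])
        scores = new_scores
        backpointers.append(pointers)
    # Backtrack from the best final tag
    last = max(range(num_tags), key=lambda j: scores[j])
    best_score = scores[last]
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best_score, path

# Two tags (0 and 1), three positions
emissions = [[2.0, 1.0], [1.0, 1.5], [1.0, 1.2]]
transitions = [[0.5, -2.0], [-2.0, 0.5]]   # switching tags is penalised
score, path = viterbi(emissions, transitions)
print(path, score)
```

In this toy case a per-position argmax over the emissions would choose `[0, 1, 1]`, while Viterbi returns `[0, 0, 0]` because the transition penalty outweighs the small emission gain from switching tags.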

In [78]:
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer
from tensorflow.keras import constraints, initializers, regularizers

class CRF(Layer):
    """Conditional Random Field layer.
    Implementation of CRF layer with similar API to keras-contrib's CRF.
    """

    def __init__(self, units=None, sparse_target=False,
                 learn_mode='join', test_mode='viterbi',
                 use_boundary=True, use_bias=True,
                 kernel_initializer='glorot_uniform',
                 chain_initializer='orthogonal',
                 bias_initializer='zeros',
                 boundary_initializer='zeros',
                 kernel_regularizer=None,
                 chain_regularizer=None,
                 boundary_regularizer=None,
                 bias_regularizer=None,
                 kernel_constraint=None,
                 chain_constraint=None,
                 boundary_constraint=None,
                 bias_constraint=None,
                 **kwargs):
        super(CRF, self).__init__(**kwargs)
        self.units = units
        self.sparse_target = sparse_target
        self.learn_mode = learn_mode
        self.test_mode = test_mode
        self.use_boundary = use_boundary
        self.use_bias = use_bias

        self.kernel_initializer = initializers.get(kernel_initializer)
        self.chain_initializer = initializers.get(chain_initializer)
        self.boundary_initializer = initializers.get(boundary_initializer)
        self.bias_initializer = initializers.get(bias_initializer)

        self.kernel_regularizer = regularizers.get(kernel_regularizer)
        self.chain_regularizer = regularizers.get(chain_regularizer)
        self.boundary_regularizer = regularizers.get(boundary_regularizer)
        self.bias_regularizer = regularizers.get(bias_regularizer)

        self.kernel_constraint = constraints.get(kernel_constraint)
        self.chain_constraint = constraints.get(chain_constraint)
        self.boundary_constraint = constraints.get(boundary_constraint)
        self.bias_constraint = constraints.get(bias_constraint)

        self.supports_masking = True

    def build(self, input_shape):
        self.input_dim = input_shape[-1]
        self.input_spec = tf.keras.layers.InputSpec(min_ndim=3, axes={-1: self.input_dim})

        if self.units is None:
            self.units = self.input_dim

        self.kernel = self.add_weight(
            shape=(self.input_dim, self.units),
            name='kernel',
            initializer=self.kernel_initializer,
            regularizer=self.kernel_regularizer,
            constraint=self.kernel_constraint
        )

        self.chain_kernel = self.add_weight(
            shape=(self.units, self.units),
            name='chain_kernel',
            initializer=self.chain_initializer,
            regularizer=self.chain_regularizer,
            constraint=self.chain_constraint
        )

        if self.use_bias:
            self.bias = self.add_weight(
                shape=(self.units,),
                name='bias',
                initializer=self.bias_initializer,
                regularizer=self.bias_regularizer,
                constraint=self.bias_constraint
            )
        else:
            self.bias = None

        if self.use_boundary:
            self.left_boundary = self.add_weight(
                shape=(self.units,),
                name='left_boundary',
                initializer=self.boundary_initializer,
                regularizer=self.boundary_regularizer,
                constraint=self.boundary_constraint
            )
            self.right_boundary = self.add_weight(
                shape=(self.units,),
                name='right_boundary',
                initializer=self.boundary_initializer,
                regularizer=self.boundary_regularizer,
                constraint=self.boundary_constraint
            )

        self.built = True

    def call(self, inputs, mask=None, training=None):
        potentials = tf.matmul(inputs, self.kernel)
        if self.use_bias:
            potentials = potentials + self.bias

        if mask is None:
            mask = tf.ones_like(inputs[:, :, 0], dtype=tf.bool)
        else:
            mask = tf.cast(mask, dtype=tf.bool)

        sequence_lengths = tf.reduce_sum(tf.cast(mask, tf.int64), axis=1)

        if training:
            return potentials
        else:
            viterbi_sequence, _ = self.viterbi_decode(potentials, sequence_lengths)
            return tf.one_hot(viterbi_sequence, self.units)

    def loss_function(self, y_true, y_pred):
        if self.sparse_target:
            y_true = tf.one_hot(tf.cast(y_true, tf.int32), self.units)

        log_likelihood, _ = self.forward_algorithm(y_pred, y_true)
        return -log_likelihood

    def viterbi_decode(self, potentials, sequence_lengths):
        # Placeholder decoder: the recurrence in _viterbi_step below is defined but never run.
        # A complete implementation would iterate it over the time steps (e.g. with tf.scan)
        # and backtrack through the stored best_paths.

        batch_size = tf.shape(potentials)[0]
        max_seq_len = tf.shape(potentials)[1]

        # Initialize with left boundary if used
        if self.use_boundary:
            initial_state = self.left_boundary
        else:
            initial_state = tf.zeros([self.units], dtype=potentials.dtype)

        # Create a mask for valid positions
        mask = tf.sequence_mask(sequence_lengths, maxlen=max_seq_len)

        def _viterbi_step(previous, current):
            emissions = current[0]
            mask_t = current[1]

            previous = tf.expand_dims(previous, 2)  # (batch, num_tags, 1)
            transition_scores = tf.expand_dims(self.chain_kernel, 0)  # (1, num_tags, num_tags)

            # Calculate scores for all possible paths
            scores = previous + transition_scores

            # Find the best path
            best_scores = tf.reduce_max(scores, axis=1)
            best_paths = tf.argmax(scores, axis=1)

            # Add emission scores
            scores_with_emissions = best_scores + emissions

            # Apply mask
            mask_t = tf.expand_dims(mask_t, 1)
            scores_masked = scores_with_emissions * tf.cast(mask_t, scores_with_emissions.dtype)

            return scores_masked, best_paths

        # Score of the first time step (computed here but not used by the placeholder below)
        initial_scores = initial_state + potentials[:, 0]

        # Placeholder: take the per-position argmax of the emission scores.
        # This ignores chain_kernel and the boundary weights, so it is not Viterbi decoding.
        best_paths = tf.argmax(potentials, axis=-1)
        return best_paths, None

    def get_config(self):
        config = {
            'units': self.units,
            'sparse_target': self.sparse_target,
            'learn_mode': self.learn_mode,
            'test_mode': self.test_mode,
            'use_boundary': self.use_boundary,
            'use_bias': self.use_bias,
            'kernel_initializer': initializers.serialize(self.kernel_initializer),
            'chain_initializer': initializers.serialize(self.chain_initializer),
            'boundary_initializer': initializers.serialize(self.boundary_initializer),
            'bias_initializer': initializers.serialize(self.bias_initializer),
            'kernel_regularizer': regularizers.serialize(self.kernel_regularizer),
            'chain_regularizer': regularizers.serialize(self.chain_regularizer),
            'boundary_regularizer': regularizers.serialize(self.boundary_regularizer),
            'bias_regularizer': regularizers.serialize(self.bias_regularizer),
            'kernel_constraint': constraints.serialize(self.kernel_constraint),
            'chain_constraint': constraints.serialize(self.chain_constraint),
            'boundary_constraint': constraints.serialize(self.boundary_constraint),
            'bias_constraint': constraints.serialize(self.bias_constraint)
        }
        base_config = super(CRF, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    def forward_algorithm(self, potentials, y_true):
        # Forward algorithm for computing the log-likelihood.
        # This is a stub that still needs a full implementation: it returns only the
        # emission score of the gold tags, without the log-partition term.
        sequence_lengths = tf.reduce_sum(tf.cast(tf.not_equal(tf.argmax(y_true, axis=-1), 0), tf.int32), axis=-1)
        log_likelihood = tf.reduce_sum(potentials * y_true, axis=[1, 2])
        return log_likelihood, sequence_lengths


def crf_loss(y_true, y_pred):
    """CRF loss function.

    Args:
        y_true: True target tensor.
        y_pred: Predicted tensor from CRF layer.

    Returns:
        Negative log-likelihood loss.
    """
    # Use tf.cond for conditional execution in graph mode
    y_true = tf.cond(tf.equal(tf.rank(y_true), 2),
                     lambda: tf.one_hot(tf.cast(y_true, tf.int32), tf.shape(y_pred)[-1]),
                     lambda: y_true)

    # Placeholder for the actual implementation: this is the negated emission score
    # of the gold tags, without the log-partition term, so it is unbounded below
    log_likelihood = -tf.reduce_sum(y_true * y_pred, axis=-1)
    return tf.reduce_mean(log_likelihood)

def crf_viterbi_accuracy(y_true, y_pred):
    """Accuracy based on Viterbi path.

    Args:
        y_true: True target tensor.
        y_pred: Predicted tensor from CRF layer.

    Returns:
        Accuracy metric.
    """
    # Convert to class indices
    y_pred_argmax = tf.argmax(y_pred, axis=-1)

    # If y_true is one-hot (rank 3), convert it to class indices.
    # The static rank is used so the check also works on symbolic tensors.
    if len(y_true.shape) == 3:
        y_true = tf.argmax(y_true, axis=-1)

    # Create mask for valid positions (non-padding)
    mask = tf.not_equal(y_true, 0)

    # Calculate accuracy only on valid positions
    correct = tf.cast(tf.equal(y_true, y_pred_argmax), tf.float32) * tf.cast(mask, tf.float32)
    accuracy = tf.reduce_sum(correct) / tf.reduce_sum(tf.cast(mask, tf.float32))

    return accuracy
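
The `forward_algorithm` and `crf_loss` above are placeholders that only sum the emission scores of the gold tags. For reference, the linear-chain CRF loss is `log Z - score(gold path)`, where `log Z` is computed with the forward algorithm, and it is never negative. A small pure-Python sketch under simplifying assumptions (one unpadded sequence, no boundary weights; `crf_nll` and the toy scores are hypothetical) could look like this:

```python
import math
from itertools import product

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def path_score(emissions, transitions, tags):
    """Sum of emission and transition scores along one tag sequence."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score

def log_partition(emissions, transitions):
    """Forward algorithm: log of the summed exp-scores of all tag sequences."""
    num_tags = len(emissions[0])
    alphas = list(emissions[0])
    for emission in emissions[1:]:
        alphas = [
            logsumexp([alphas[i] + transitions[i][j] for i in range(num_tags)]) + emission[j]
            for j in range(num_tags)
        ]
    return logsumexp(alphas)

def crf_nll(emissions, transitions, tags):
    """Negative log-likelihood of one gold tag sequence."""
    return log_partition(emissions, transitions) - path_score(emissions, transitions, tags)

emissions = [[2.0, 1.0], [1.0, 1.5], [1.0, 1.2]]
transitions = [[0.5, -2.0], [-2.0, 0.5]]
loss = crf_nll(emissions, transitions, [0, 0, 0])

# Brute-force check of log Z over all 2**3 tag sequences
brute = logsumexp([path_score(emissions, transitions, list(p))
                   for p in product(range(2), repeat=3)])
print(loss >= 0, abs(brute - log_partition(emissions, transitions)) < 1e-9)
```

A loss that only rewards the gold emission scores has no lower bound, whereas the normalised version above bottoms out at zero when all probability mass sits on the gold path.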

In [79]:
# Define Model Layers
char_input = Input(shape=(None, 52,), name="char_input")
char_embed = TimeDistributed(Embedding(len(char_to_id), 30, embeddings_initializer=RandomUniform(minval=-0.5, maxval=0.5)), name="char_embed")(char_input)
char_dropout = Dropout(0.5)(char_embed)
char_cnn = TimeDistributed(Conv1D(kernel_size=3, filters=30, padding='same', activation='tanh', strides=1), name="conv1d")(char_dropout)
maxpool_out = TimeDistributed(MaxPooling1D(52), name="maxpool")(char_cnn)
char_flat = TimeDistributed(Flatten(), name="flatten")(maxpool_out)
char = Dropout(0.5)(char_flat)
words_input = Input(shape=(None,), dtype='int32', name='words_input')
words = Embedding(input_dim=word_embeddings.shape[0], output_dim=word_embeddings.shape[1], weights=[word_embeddings],trainable=False)(words_input)
concat = concatenate([words, char])
lstm = Bidirectional(LSTM(200, return_sequences=True, dropout=0.5,recurrent_dropout=0.25), name="bilstm")(concat)
dense_out = TimeDistributed(Dense(len(tag_to_id)), name="dense_layer")(lstm)

# Use the custom CRF layer
crf_layer = CRF(len(tag_to_id))
output = crf_layer(dense_out)

# Define Model Inputs and Output
model = keras.models.Model([char_input, words_input], output)
# Compile Model
model.compile(optimizer='adam', loss=crf_loss, metrics=[crf_viterbi_accuracy])
# Model Summary
model.summary()


## Training

In [80]:
history = model.fit([train_chars, train_words], train_tags, validation_data=([valid_chars, valid_words], valid_tags), batch_size=64, epochs=20, verbose=1)

Epoch 1/20
220/220 ━━━━━━━━━━━━━━━━━━━━ 128s 464ms/step - crf_viterbi_accuracy: 0.0029 - loss: -92.0694 - val_crf_viterbi_accuracy: 0.0000e+00 - val_loss: -0.6980
Epoch 2/20
220/220 ━━━━━━━━━━━━━━━━━━━━ 130s 433ms/step - crf_viterbi_accuracy: 0.0000e+00 - loss: -915.6615 - val_crf_viterbi_accuracy: 0.0000e+00 - val_loss: -0.6980
Epoch 3/20
220/220 ━━━━━━━━━━━━━━━━━━━━ 144s 441ms/step - crf_viterbi_accuracy: 0.0000e+00 - loss: -2696.1038 - val_crf_viterbi_accuracy: 0.0000e+00 - val_loss: -0.6980
Epoch 4/20


KeyboardInterrupt: 

## Training and Testing Accuracy

In [58]:
_, train_acc = model.evaluate([train_chars, train_words],train_tags)
_, val_acc = model.evaluate([valid_chars, valid_words], valid_tags)
_, test_acc = model.evaluate([test_chars, test_words],test_tags)

print('Training Accuracy: ', train_acc * 100)
print('Validation Accuracy: ', val_acc * 100)
print('Testing Accuracy: ', test_acc * 100)

AttributeError: Exception encountered when calling CRF.call().

module 'tensorflow' has no attribute 'contrib'

Arguments received by CRF.call():
  • inputs=tf.Tensor(shape=(None, 52, 10), dtype=float32)
  • mask=None
  • training=False

## Evaluation Metrics Report
Now we will use Precision, Recall and F1 score to evaluate the performance of our model on each tag.

In [None]:
# Evaluate
pred_cat = model.predict([test_chars,test_words])
# Take the highest-scoring tag index at every position
predicted = np.argmax(pred_cat, axis=-1)
actual = np.argmax(test_tags, axis=-1)
# Convert the index to tag
predicted_tag = [[id_to_tag[i] for i in row] for row in predicted]
actual_tag = [[id_to_tag[i] for i in row] for row in actual]

# Metrics Report
report = flat_classification_report(y_pred=predicted_tag, y_true=actual_tag)
print(report)
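
For reference, the per-tag numbers in such a report are token-level precision, recall and F1. The sketch below computes them by hand for a toy pair of flattened tag lists. `per_tag_scores` is a hypothetical helper, not the `sklearn_crfsuite` implementation, and it skips padding positions, which the cell above does not.

```python
from collections import Counter

def per_tag_scores(y_true, y_pred, ignore=("PADDING",)):
    """Token-level precision, recall and F1 for every tag."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_tag, pred_tag in zip(y_true, y_pred):
        if true_tag in ignore:
            continue
        if true_tag == pred_tag:
            tp[true_tag] += 1
        else:
            fp[pred_tag] += 1   # predicted tag was wrong here
            fn[true_tag] += 1   # true tag was missed here
    scores = {}
    for tag in sorted(set(tp) | set(fp) | set(fn)):
        precision = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        recall = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[tag] = (precision, recall, f1)
    return scores

y_true = ["B-PER", "I-PER", "O", "B-LOC", "O", "PADDING"]
y_pred = ["B-PER", "O", "O", "B-LOC", "B-LOC", "PADDING"]
scores = per_tag_scores(y_true, y_pred)
for tag, (p, r, f1) in scores.items():
    print(f"{tag:6s} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```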

## Accuracy Curve

In [None]:
# Plotting Training vs Validation Accuracy
plt.plot(history.history['crf_viterbi_accuracy'])
plt.plot(history.history['val_crf_viterbi_accuracy'])
plt.title('Train vs Validation Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'])
plt.show()

## Loss Curve

In [None]:
# Plotting Training vs Validation Loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Train vs Validation Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'])
plt.show()

## Testing

In [None]:
good_example = []
bad_example = []

for i in range(5):
  for j in range(5):
    if test_words[i][j] != 0:
      word = id_to_word[test_words[i][j]]
      a_tag = actual_tag[i][j]
      p_tag = predicted_tag[i][j]
      if a_tag == p_tag:
        good_example.append([word, p_tag, a_tag])
      else:
        bad_example.append([word, p_tag, a_tag])

def print_examples(title, examples):
  # Each example is [word, predicted tag, actual tag]
  print(title)
  if not examples:
    print("(none)")
    return
  # Column widths cover both the header text and the longest entry
  col1_width = max(len("Word"), max(len(x[0]) for x in examples))
  col2_width = max(len("Actual"), max(len(x[2]) for x in examples))
  col3_width = max(len("Predicted"), max(len(x[1]) for x in examples))
  row = "|{0:<{col1}}  |{1:<{col2}}  |{2:<{col3}}  |"
  print(row.format("Word","Actual","Predicted",col1=col1_width,col2=col2_width,col3=col3_width))
  for word, p_tag, a_tag in examples:
    print(row.format(word,a_tag,p_tag,col1=col1_width,col2=col2_width,col3=col3_width))

print_examples("-------- Good Examples --------", good_example)
print_examples("\n\n-------- Bad Examples --------", bad_example)