<a href="https://colab.research.google.com/github/hdang20/NER_CoNLL2003/blob/main/ner.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

How to run the notebook:

    1. Upload the datasets and the pretrained embeddings to Google Colab: train.txt, test.txt, valid.txt, GoogleNews-vectors-negative300.bin
    2. Open ner.ipynb in Google Colab
    3. Go to Edit > Notebook settings > Hardware accelerator and select GPU


Detailed Procedure: 

i. Import necessary libraries

ii. Def preprocessing & embeddings

    1. def readword()
    2. def embed()
    3. def readtag()
    4. def tag_indexing()

iii. Def training

    1. def simple_rnn_model()
    2. def bi_rnn_model()
    3. def simple_lstm_model()
    4. def bi_lstm_model()
    5. def simple_gru_model()
    6. def bi_gru_model()

    Note: After training the 6 models (RNN, bi-RNN, LSTM, bi-LSTM, GRU, bi-GRU), bi-GRU was the best performer (highest val_accuracy: 0.9920).
    We therefore tuned hyperparameters on bi-GRU.
    We ran 4 tuning experiments and obtained the following val_accuracy values:
    (i) bi-GRU with 512 hidden units: val_accuracy = 0.9921
    (ii) bi-GRU with learning rate = 0.001: val_accuracy = 0.9926
    (iii) bi-GRU with batch_size = 32: val_accuracy = 0.9901
    (iv) stacked-bi-GRU with learning rate = 0.001: val_accuracy = 0.9937

    The stacked-bi-GRU model with learning rate = 0.001 was chosen as the best model.

    7. def stacked_bi_gru_model() (best tuned model)
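
The model selection described in the note can be reproduced from the reported numbers alone. The sketch below is illustrative: the dictionary keys are labels made up here, the values are the val_accuracy figures quoted above, and the baseline settings (256 units, learning rate 0.0001) are taken from the `bi_gru_model()` docstring.

```python
# Validation accuracies copied from the note above; the labels are illustrative.
BASELINE = "bi-GRU baseline (256 units, lr=0.0001)"
tuning_runs = {
    BASELINE: 0.9920,
    "bi-GRU, 512 hidden units": 0.9921,
    "bi-GRU, lr=0.001": 0.9926,
    "bi-GRU, batch_size=32": 0.9901,
    "stacked bi-GRU, lr=0.001": 0.9937,
}

# Pick the run with the highest validation accuracy.
best_name = max(tuning_runs, key=tuning_runs.get)

# Print every run with its difference from the untuned bi-GRU baseline.
for name, acc in sorted(tuning_runs.items(), key=lambda kv: kv[1], reverse=True):
    delta = acc - tuning_runs[BASELINE]
    print(f"{name:42s} val_accuracy={acc:.4f}  delta={delta:+.4f}")
print("best:", best_name)
```

The differences between runs are in the third or fourth decimal place, so a single training run per setting may not be enough to rank them reliably.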

iv. Def testing

    1. def flatten()
    2. def get_id()
    3. def to_file()

v. Main

    Preprocessing & embeddings
    1. Create x_train, x_val, x_test
    2. Word padding: find the maximum sentence length in the data and append '0' to shorter sentences
    3. Word embeddings: convert each word to a 300-dimensional embedding vector from word2vec embeddings trained on the Google News dataset
    4. Create y_train, y_val, y_test
    5. Gold standard tag padding: append ['<pad>'] to shorter sentences
    6. Tag indexing: map each tag to an integer from 0 to 9
    7. Convert x_train, x_val, x_test, y_train, y_val, y_test to NumPy arrays

    For each model:
    Training
    8. Train for 10 epochs with about 2000 mini-batches per epoch (batch_size = int(14041/2000) = 7), using the training and validation datasets
    9. Save the model
    Testing
    10. Evaluate the model on the test dataset
    11. Write the test results to a file in the required format
    12. Print the F1 score by tag
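
Steps 2, 5 and 8 can be sketched in a few lines of plain Python. This is a minimal illustration rather than the notebook's code: `pad_sentences` is a hypothetical helper, the two sample sentences are made up, and tags are kept as flat strings here whereas the notebook wraps each tag in a one-element list.

```python
def pad_sentences(sentences, pad_token, maxlen):
    """Return a padded copy of `sentences`, each row extended to `maxlen`."""
    return [s + [pad_token] * (maxlen - len(s)) for s in sentences]

# Toy data standing in for the CoNLL-2003 sentences and their gold tags.
words = [["EU", "rejects", "german", "call"], ["peter", "blackburn"]]
tags = [["B-ORG", "O", "B-MISC", "O"], ["B-PER", "I-PER"]]

# The notebook takes the maximum over train, validation and test together.
maxlen = max(len(s) for s in words)
padded_words = pad_sentences(words, "0", maxlen)
padded_tags = pad_sentences(tags, "<pad>", maxlen)
print(padded_words[1])  # ['peter', 'blackburn', '0', '0']
print(padded_tags[1])   # ['B-PER', 'I-PER', '<pad>', '<pad>']

# Batch-size arithmetic from step 8.
n_train = 14041
batch_size = int(n_train / 2000)              # truncates 7.0205 to 7
steps_per_epoch = -(-n_train // batch_size)   # ceiling division
print(batch_size, steps_per_epoch)            # 7 2006
```

Because the batch size is truncated to 7, an epoch takes 2006 steps rather than exactly 2000.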




**i. Import necessary libraries**

In [None]:
#install necessary libraries
!pip install seqeval[gpu]
import numpy as np
from gensim.models.keyedvectors import KeyedVectors
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.layers import TimeDistributed
from tensorflow.keras import optimizers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, SimpleRNN, LSTM, GRU, Dropout, Activation, Bidirectional, Conv2D, MaxPooling2D
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

Collecting seqeval[gpu]
  Downloading seqeval-0.0.12.tar.gz
Collecting tensorflow-gpu
  Downloading tensorflow_gpu-2.0.0-cp36-cp36m-manylinux2010_x86_64.whl (380.8MB)
Collecting tensorflow-estimator<2.1.0,>=2.0.0
  Downloading tensorflow_estimator-2.0.1-py2.py3-none-any.whl (449kB)
Collecting tensorboard<2.1.0,>=2.0.0
  Downloading tensorboard-2.0.1-py3-none-any.whl (3.8MB)


Mounted at /content/gdrive


**ii. Def preprocessing & embeddings**

In [None]:
def readword(filename, *, encoding="UTF8"):
    '''
    read the words in the first column of the data set
    lowercase every word that is not written entirely in capitals (e.g., 'German' -> 'german', but 'USA' stays 'USA')
    return format: [['sen1_word1', 'sen1_word2', ..., 'sen1_lastword'], ..., [...] ]
    '''
    with open(filename, mode='rt', encoding=encoding) as f:
        sentences = []
        sentence = []
        for line in f:
            if len(line) == 0 or line.startswith('-DOCSTART') or line[0] == "\n":
                if len(sentence) > 0:
                    sentences.append(sentence)
                    sentence = []
                continue
            splits = line.split(' ')
            if not splits[0].isupper():
                splits[0]=splits[0].lower()
            sentence.append(splits[0])

    if len(sentence) > 0:
        sentences.append(sentence)
        sentence = []
    return sentences

def embed(datasets):
  '''
    convert each word to a 300 dimensional embedding vector from word2vec embeddings trained on the google news dataset
  '''
  word2Idx = {}
  words = {}
  # unique words in data  
  for dataset in datasets:
    for sentence in dataset:
      for word in sentence: 
        words[word] = True
  model = KeyedVectors.load_word2vec_format("gdrive/My Drive/Colab Notebooks/data/GoogleNews-vectors-negative300.bin", binary=True)
  # register the unknown and padding tokens once, before indexing real words
  word2Idx["UNKNOWN_TOKEN"] = len(word2Idx)
  model["UNKNOWN_TOKEN"] = np.random.uniform(-0.25, 0.25, 300)

  word2Idx["0"] = len(word2Idx)
  model["0"] = np.zeros(300)  # zero vector for "0" padding word

  # index every data word that has a pretrained vector
  # (previously this check was nested in the first-iteration branch, so only one word was indexed)
  for word in words:
    if word in model and word not in word2Idx:
      word2Idx[word] = len(word2Idx)  # corresponding word dict
  
  for dataset in datasets:
    for i, sentence in enumerate(dataset):
      embedded_sentence = []
      for word in sentence: 
          if word in model:
            embedded_sentence.append(model[word])
          else:
            embedded_sentence.append(model["UNKNOWN_TOKEN"])
      dataset[i] = embedded_sentence
  return word2Idx, words

def readtag(filename, *, encoding="UTF8"):
    '''
    read the gold standard tags in the last column of the data set
    each tag is wrapped in a one-element list
    return format: [[['sen1_tag1'], ['sen1_tag2'], ..., ['sen1_lasttag']], ..., [...] ]
    '''
    with open(filename, mode='rt', encoding=encoding) as f:
        sentences = []
        sentence = []
        for line in f:
            if len(line) == 0 or line.startswith('-DOCSTART') or line[0] == "\n":
                if len(sentence) > 0:
                    sentences.append(sentence)
                    sentence = []
                continue
            splits = line.split(' ')
            sentence.append([splits[-1].rstrip()])

    if len(sentence) > 0:
        sentences.append(sentence)
        sentence = []
    return sentences


def tag_indexing(y_trainSentences, y_valSentences, y_testSentences):
  '''
  create a list of unique tags: ['B-MISC', '<pad>', 'B-LOC', 'B-PER', 'B-ORG', 'I-ORG', 'I-MISC', 'O', 'I-LOC', 'I-PER']
  mapping each tag to number from 0-9
  '''
  # Create list of unique Tag
  tags=[]
  for dataset in [y_trainSentences, y_valSentences, y_testSentences]:
      for sentence in dataset:
        for tag in sentence:
          tags.append(tag[0])
  tag_set = set(tags)

  print(tag_set)
  n_tags=len(tag_set)
  #['B-MISC', '<pad>', 'B-LOC', 'B-PER', 'B-ORG', 'I-ORG', 'I-MISC', 'O', 'I-LOC', 'I-PER']

  # mapping for Tags
  tag2Idx = {}
  for tag in tag_set:
    tag2Idx[tag] = len(tag2Idx)

  for dataset in [y_trainSentences, y_valSentences, y_testSentences]:
    for i, sentence in enumerate(dataset):
      tag_id_sentence = []
      for tag in sentence: 
        tag_id_sentence.append([tag2Idx[tag[0]]])
      dataset[i] = tag_id_sentence
  print(y_trainSentences[0])
  return tag2Idx
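
`tag_indexing()` builds its mapping by iterating over a Python `set`, and the iteration order of a set of strings can differ between interpreter sessions, so a tag may receive a different index when the notebook is rerun. That matters if a saved `.h5` model is reloaded in a new session. The sketch below shows one possible alternative, sorting the tags first, together with the one-hot encoding that `to_categorical` later applies. The helper names `build_tag_index` and `one_hot` are hypothetical and are not part of the notebook.

```python
def build_tag_index(tag_sentences):
    """Map each distinct tag to an integer, in sorted (reproducible) order."""
    tags = sorted({tag for sentence in tag_sentences for tag in sentence})
    return {tag: i for i, tag in enumerate(tags)}

def one_hot(index, n_classes):
    """Encode a class index as a one-hot list of floats."""
    return [1.0 if i == index else 0.0 for i in range(n_classes)]

# Toy tag sequences (flat strings; the notebook wraps each tag in a list).
tag_sentences = [["B-ORG", "O", "<pad>"], ["O", "B-PER"]]
tag_index = build_tag_index(tag_sentences)
print(tag_index)  # {'<pad>': 0, 'B-ORG': 1, 'B-PER': 2, 'O': 3}

encoded = [one_hot(tag_index[t], len(tag_index)) for t in tag_sentences[0]]
print(encoded[0])  # [0.0, 1.0, 0.0, 0.0]
```

With sorted tags, `<pad>` always gets index 0 because `<` sorts before the uppercase letters.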

**iii. Def training**

In [None]:
def simple_rnn_model():
    '''
    a vanilla RNN with: 
    one layer of 256 hidden units, 
    a fully connected output layer using softmax as activation function,
    Adam optimizer, 
    cross-entropy for the loss function,
    learning rate 0.0001
    '''
    adam = optimizers.Adam(lr = 0.0001)
    model = Sequential()
    model.add(SimpleRNN(256, return_sequences = True, input_shape = (124,300)))
    model.add(Dense(len(tag2Idx), activation='softmax'))
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    return model

def bi_rnn_model():
    '''
    a bidirectional RNN with: 
    one layer of 256 hidden units, 
    a fully connected output layer using softmax as activation function,
    Adam optimizer, 
    cross-entropy for the loss function,
    learning rate 0.0001
    '''
    adam = optimizers.Adam(lr = 0.0001)
    model = Sequential()
    model.add(Bidirectional(SimpleRNN(256, return_sequences = True), input_shape = (124,300)))
    model.add(Dense(len(tag2Idx), activation='softmax'))
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    return model

def simple_lstm_model():
    '''
    a simple LSTM with: 
    one layer of 256 hidden units, 
    a fully connected output layer using softmax as activation function,
    Adam optimizer, 
    cross-entropy for the loss function,
    learning rate 0.0001
    '''
    adam = optimizers.Adam(lr = 0.0001)
    model = Sequential()
    model.add(LSTM(256, return_sequences = True, input_shape = (124,300)))
    model.add(Dense(len(tag2Idx), activation='softmax'))
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    return model

def bi_lstm_model():
    '''
    a bidirectional LSTM with: 
    one layer of 256 hidden units, 
    a fully connected output layer using softmax as activation function,
    Adam optimizer, 
    cross-entropy for the loss function,
    learning rate 0.0001
    '''
    adam = optimizers.Adam(lr = 0.0001)
    model = Sequential()
    model.add(Bidirectional(LSTM(256, return_sequences = True), input_shape = (124,300)))
    model.add(Dense(len(tag2Idx), activation='softmax'))
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    return model

def simple_gru_model():
    '''
    a simple GRU with: 
    one layer of 256 hidden units, 
    a fully connected output layer using softmax as activation function,
    Adam optimizer, 
    cross-entropy for the loss function,
    learning rate 0.0001
    '''
    adam = optimizers.Adam(lr = 0.0001)
    model = Sequential()
    model.add(GRU(256, return_sequences = True, input_shape = (124,300)))
    model.add(Dense(len(tag2Idx), activation='softmax'))
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    return model

def bi_gru_model():
    '''
    a bidirectional GRU with: 
    one layer of 256 hidden units, 
    a fully connected output layer using softmax as activation function,
    Adam optimizer, 
    cross-entropy for the loss function,
    learning rate 0.0001
    '''
    adam = optimizers.Adam(lr = 0.0001)
    model = Sequential()
    model.add(Bidirectional(GRU(256, return_sequences = True), input_shape = (124,300)))
    model.add(Dense(len(tag2Idx), activation='softmax'))
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    return model

def stacked_bi_gru_model():
    '''
    a stacked bidirectional GRU with: 
    two bidirectional layers of 256 hidden units each, 
    dropout of 0.2 between the two layers,
    a fully connected output layer using softmax as activation function,
    Adam optimizer, 
    cross-entropy for the loss function,
    learning rate 0.001
    '''
    adam = optimizers.Adam(lr = 0.001)
    model = Sequential()
    model.add(Bidirectional(GRU(256, return_sequences = True), input_shape = (124,300)))
    model.add(Dropout(0.2))
    # input shape is inferred from the previous layer, so input_shape is not repeated here
    model.add(Bidirectional(GRU(256, return_sequences = True)))
    model.add(Dense(len(tag2Idx), activation='softmax'))
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    return model
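
The layer sizes defined above determine the parameter counts that `model.summary()` prints later in the notebook. The sketch below recomputes them by hand, assuming the standard Keras parameterisation and, for GRU, `reset_after=True` (the TensorFlow 2 default, which keeps two bias vectors per gate). The function names are illustrative.

```python
def simple_rnn_params(units, input_dim):
    # one weight matrix for the input, one for the recurrent state, one bias
    return units * (input_dim + units + 1)

def lstm_params(units, input_dim):
    # four gates, each shaped like a SimpleRNN cell
    return 4 * units * (input_dim + units + 1)

def gru_params(units, input_dim):
    # three gates; assumes reset_after=True, i.e. two bias vectors per gate
    return 3 * units * (input_dim + units + 2)

def dense_params(units, input_dim):
    return units * (input_dim + 1)

EMB, HIDDEN, TAGS = 300, 256, 10

print(simple_rnn_params(HIDDEN, EMB))   # 142592
print(lstm_params(HIDDEN, EMB))         # 570368
print(gru_params(HIDDEN, EMB))          # 428544
print(2 * gru_params(HIDDEN, EMB))      # 857088, a Bidirectional wrapper doubles the count

# Stacked bi-GRU: the second layer reads the 512-wide output of the first.
stacked_total = (2 * gru_params(HIDDEN, EMB)
                 + 2 * gru_params(HIDDEN, 2 * HIDDEN)
                 + dense_params(TAGS, 2 * HIDDEN))
print(stacked_total)                    # 2044938
```

These values agree with the `Param #` column in the summaries printed for each model, which is a quick check that the layers are wired as intended.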

**iv. Def testing**

In [None]:
def flatten(l):
    '''
    convert data from 3D to 1D
    '''
    flatten_l = []
    for sentence in l:
      for word in sentence:
        flatten_l.append(word[0])
    return flatten_l

def get_id(predict_result):
  '''
  in 10-element tag vector, find the maximum element and return the id corresponding to predicted tag
  '''
  # print(predict_result)
  predict_result = predict_result.tolist()
  return(predict_result.index(max(predict_result)))


# Print to file
PATH = "gdrive/My Drive/Colab Notebooks/"
def to_file(orig_x_testSentences, orig_y_testSentences, y_predict, filename):
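  '''
  convert the predicted probability vectors to tags and write one "word gold_tag predicted_tag" line per token
  positions whose gold tag is <pad> are skipped; a predicted <pad> on a real token is written as 'O'
  a blank line separates sentences in the output .txt file
  '''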
  idx2tag = {i: w for w, i in tag2Idx.items()}
  #convert y_predict to tag
  y_predict_tag = []
  for sentences in y_predict:
    tag_sentence = []
    for word in sentences:
      tag_sentence.append([idx2tag[get_id(word)]])
    y_predict_tag.append(tag_sentence)

  # the context manager closes the file even if a write fails
  with open(PATH + filename, "w+") as f:
    for i, sentence in enumerate(orig_x_testSentences):
      for j, word in enumerate(sentence):
        if orig_y_testSentences[i][j][0] != "<pad>":
          if y_predict_tag[i][j][0] == "<pad>":
            f.write("%s %s %s \n" %(word, orig_y_testSentences[i][j][0], "O"))
          else:
            f.write("%s %s %s \n" %(word, orig_y_testSentences[i][j][0], y_predict_tag[i][j][0]))
      f.write("\n")

  return y_predict_tag
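
The decoding logic in `get_id()` and `to_file()` amounts to an argmax over each probability vector followed by two padding rules. The sketch below shows that logic for a single sentence in plain Python. `decode_sentence` and its parameters are hypothetical names, gold tags are flat strings here, and the probabilities are made up.

```python
def decode_sentence(probabilities, idx2tag, gold_tags, pad_tag="<pad>", fallback="O"):
    """Turn per-token probability vectors into tags, dropping padded positions."""
    decoded = []
    for probs, gold in zip(probabilities, gold_tags):
        if gold == pad_tag:
            continue  # padded position: no real token to label
        best = max(range(len(probs)), key=probs.__getitem__)  # argmax
        tag = idx2tag[best]
        # a real token predicted as padding is reported as 'O'
        decoded.append(fallback if tag == pad_tag else tag)
    return decoded

idx2tag = {0: "<pad>", 1: "O", 2: "B-PER"}
probabilities = [
    [0.1, 0.2, 0.7],    # argmax 2 -> B-PER
    [0.6, 0.3, 0.1],    # argmax 0 -> <pad>, rewritten to O
    [0.9, 0.05, 0.05],  # gold tag is <pad>, skipped
]
gold_tags = ["B-PER", "O", "<pad>"]

decoded = decode_sentence(probabilities, idx2tag, gold_tags)
print(decoded)  # ['B-PER', 'O']
```

On NumPy arrays, `np.argmax(probs)` gives the same index as the `max(...)` expression used here.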


**v. Main**

In [None]:
################## LOAD DATA ###################################################
#read word, create x_train, x_val, x_test
x_trainSentences = readword("gdrive/My Drive/Colab Notebooks/train.txt")
x_valSentences = readword("gdrive/My Drive/Colab Notebooks/valid.txt")
x_testSentences = readword("gdrive/My Drive/Colab Notebooks/test.txt")

################## PADDING WORD #####################################################
# find the maximum sentence length in the data
maxlen = 0
for dataset in [x_trainSentences, x_valSentences, x_testSentences]:
    for sentence in dataset:
        maxlen = max(maxlen, len(sentence))

#add ['0'] in shorter sentences
for sentence in x_trainSentences:
    while len(sentence)<maxlen:
        sentence.append('0')

for sentence in x_valSentences:
    while len(sentence)<maxlen:
        sentence.append('0')

for sentence in x_testSentences:
    while len(sentence)<maxlen:
        sentence.append('0')

#keep the original (padded, not yet embedded) x_testSentences
orig_x_testSentences = x_testSentences.copy()

################## EMBEDDINGS ##################################################
# convert each word to a 300 dimensional embedding vector from word2vec embeddings trained on the google news dataset
word2Idx, word = embed([x_trainSentences, x_valSentences, x_testSentences])
print(len(x_trainSentences[0]))



################## LOAD DATA ###################################################
#read tag, create y_train, y_val, y_test
y_trainSentences = readtag("gdrive/My Drive/Colab Notebooks/train.txt")
y_valSentences = readtag("gdrive/My Drive/Colab Notebooks/valid.txt")
y_testSentences = readtag("gdrive/My Drive/Colab Notebooks/test.txt")

################## PADDING TAG #################################################

#add ['<pad>'] in shorter sentences
for sentence in y_trainSentences:
    while len(sentence)<maxlen:
        sentence.append(['<pad>'])

for sentence in y_valSentences:
    while len(sentence)<maxlen:
        sentence.append(['<pad>'])

for sentence in y_testSentences:
    while len(sentence)<maxlen:
        sentence.append(['<pad>'])
print(y_trainSentences[0])

orig_y_testSentences = y_testSentences.copy()

#################### TAG INDEXING #############################################
tag2Idx = tag_indexing(y_trainSentences, y_valSentences, y_testSentences)

#### CONVERT x_train, x_val, x_test, y_train, y_val, y_test  TO NP ARRAY #######
from tensorflow.keras.utils import to_categorical
x_trainSentences = np.array(x_trainSentences)
x_valSentences = np.array(x_valSentences)
x_testSentences = np.array(x_testSentences)
print(x_trainSentences.shape)

y_trainSentences = to_categorical(np.array(y_trainSentences))
y_valSentences = to_categorical(np.array(y_valSentences))
y_testSentences = to_categorical(np.array(y_testSentences))
print(y_trainSentences.shape)



124
[['B-ORG'], ['O'], ['B-MISC'], ['O'], ['O'], ['O'], ['B-MISC'], ['O'], ['O'], ['<pad>'], ['<pad>'], ['<pad>'], ... (repeated ['<pad>'] entries; the rest of the output is truncated)
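
The Main cell keeps `orig_x_testSentences` and `orig_y_testSentences` with `list.copy()`, which is a shallow copy. This works because `embed()` and `tag_indexing()` assign a new inner list to each position (`dataset[i] = ...`) instead of modifying the inner lists in place. The sketch below illustrates the difference with made-up data; the word-length conversion merely stands in for the embedding lookup.

```python
sentences = [["EU", "rejects"], ["peter", "0"]]

# Shallow copy: a new outer list that shares the same inner lists.
orig = sentences.copy()

# Rebinding each slot (as embed() does) leaves the copy untouched.
for i, sentence in enumerate(sentences):
    sentences[i] = [len(word) for word in sentence]  # stand-in for embedding lookup

print(sentences)  # [[2, 7], [5, 1]]
print(orig)       # [['EU', 'rejects'], ['peter', '0']]

# In-place mutation of an inner list, by contrast, is visible through both names.
shared = [["a"], ["b"]]
alias = shared.copy()
shared[0].append("x")
print(alias)      # [['a', 'x'], ['b']]
```

The ordering also matters: the copies are taken after padding, so the kept originals still contain the `'0'` and `['<pad>']` entries that `to_file()` later filters out.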

**RNN**

In [None]:
#Train with 10 epochs, 2000 mini batches per epoch (batch_size=14041/2000=7)
#Using training and validation dataset
simple_rnn = simple_rnn_model()
simple_rnn.fit(x=x_trainSentences,y=y_trainSentences, batch_size=int(x_trainSentences.shape[0]/2000), epochs=10, verbose=1,validation_data=(x_valSentences,y_valSentences))
simple_rnn.summary()
#Save model
simple_rnn.save("gdrive/My Drive/Colab Notebooks/RNN.h5")
#Evaluate model by test dataset
result_simple_rnn = simple_rnn.evaluate(x_testSentences,y_testSentences)
print(result_simple_rnn)
#Print test result to file with required format
y_predict_simple_rnn = simple_rnn.predict(x_testSentences)
y_predict_simple_rnn = to_file(orig_x_testSentences, orig_y_testSentences, y_predict_simple_rnn, "rnn_log.txt")
#Print F1 score by tag
print(classification_report(flatten(orig_y_testSentences), flatten(y_predict_simple_rnn)))

Train on 14041 samples, validate on 3250 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
simple_rnn (SimpleRNN)       (None, 124, 256)          142592    
_________________________________________________________________
dense (Dense)                (None, 124, 10)           2570      
Total params: 145,162
Trainable params: 145,162
Non-trainable params: 0
_________________________________________________________________
[0.03757357570919098, 0.9888339]
              precision    recall  f1-score   support

       <pad>       1.00      1.00      1.00    381737
       B-LOC       0.74      0.75      0.75      1668
      B-MISC       0.69      0.48      0.56       702
       B-ORG       0.67      0.37      0.48      1661
       B-PER       0.79      0.52      0.62     
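
In the report above, `<pad>` has a support of 381737 while B-LOC has 1668, so padded positions dominate any token-level accuracy computed over the full padded arrays, including the accuracy value returned by `evaluate()`. The sketch below, on made-up tags, shows how far accuracy can move once padding is excluded. `token_accuracy` is a hypothetical helper, not a function used in the notebook.

```python
def token_accuracy(gold, pred, ignore=None):
    """Fraction of matching positions, optionally skipping one gold label."""
    pairs = [(g, p) for g, p in zip(gold, pred) if g != ignore]
    if not pairs:
        return 0.0
    return sum(g == p for g, p in pairs) / len(pairs)

# Four real tokens followed by sixteen padded positions.
gold = ["B-PER", "O", "B-LOC", "O"] + ["<pad>"] * 16
pred = ["B-PER", "O", "O", "B-ORG"] + ["<pad>"] * 16

with_padding = token_accuracy(gold, pred)
without_padding = token_accuracy(gold, pred, ignore="<pad>")
print(with_padding)     # 0.9
print(without_padding)  # 0.5
```

For this reason the per-tag precision, recall and F1 rows are more informative than the overall accuracy when comparing the models.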

**biRNN**

In [None]:
#Train with 10 epochs, 2000 mini batches per epoch (batch_size=14041/2000=7)
#Using training and validation dataset
bi_rnn = bi_rnn_model()  # separate name so the bi_rnn_model() function is not overwritten on rerun
bi_rnn.fit(x=x_trainSentences,y=y_trainSentences, batch_size=int(x_trainSentences.shape[0]/2000), epochs=10, verbose=1,validation_data=(x_valSentences,y_valSentences))
bi_rnn.summary()
#Save model
bi_rnn.save("gdrive/My Drive/Colab Notebooks/Bi-RNN.h5")
#Evaluate model by test dataset
result_bi_rnn = bi_rnn.evaluate(x_testSentences,y_testSentences)
print(result_bi_rnn)
#Print test result to file with required format
y_predict_bi_rnn = bi_rnn.predict(x_testSentences)
y_predict_bi_rnn = to_file(orig_x_testSentences, orig_y_testSentences, y_predict_bi_rnn, "bi_rnn_log.txt")
#Print F1 score by tag
print(classification_report(flatten(orig_y_testSentences), flatten(y_predict_bi_rnn)))

Train on 14041 samples, validate on 3250 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
bidirectional (Bidirectional (None, 124, 512)          285184    
_________________________________________________________________
dense_2 (Dense)              (None, 124, 10)           5130      
Total params: 290,314
Trainable params: 290,314
Non-trainable params: 0
_________________________________________________________________
[0.03167785642425323, 0.9908471]
              precision    recall  f1-score   support

       <pad>       1.00      1.00      1.00    381737
       B-LOC       0.83      0.73      0.78      1668
      B-MISC       0.69      0.56      0.62       702
       B-ORG       0.73      0.59      0.65      1661
       B-PER       0.75      0.69      0.72   

**LSTM**

In [None]:
#Train with 10 epochs, 2000 mini batches per epoch (batch_size=14041/2000=7)
#Using training and validation dataset
simple_lstm=simple_lstm_model()
simple_lstm.summary()
simple_lstm.fit(x=x_trainSentences,y=y_trainSentences, batch_size=int(x_trainSentences.shape[0]/2000), epochs=10, verbose=1,validation_data=(x_valSentences,y_valSentences))
#Save model
simple_lstm.save("gdrive/My Drive/Colab Notebooks/LSTM.h5")
#Evaluate model by test dataset
result = simple_lstm.evaluate(x_testSentences,y_testSentences)
print(result)
#Print test result to file with required format
y_predict_simple_lstm = simple_lstm.predict(x_testSentences)
y_predict_simple_lstm = to_file(orig_x_testSentences, orig_y_testSentences, y_predict_simple_lstm, "lstm_log.txt")
#Print F1 score by tag
print(classification_report(flatten(orig_y_testSentences), flatten(y_predict_simple_lstm)))

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_7 (LSTM)                (None, 124, 256)          570368    
_________________________________________________________________
dense_7 (Dense)              (None, 124, 10)           2570      
Total params: 572,938
Trainable params: 572,938
Non-trainable params: 0
_________________________________________________________________
Train on 14041 samples, validate on 3250 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
[0.034506713557754295, 0.98972607]
              precision    recall  f1-score   support

       <pad>       1.00      1.00      1.00    381737
       B-LOC       0.80      0.73      0.76      1668
      B-MISC       0.70      0.52      0.60       702
       B-ORG       0.73      0.41      0.52      1661
       B-PER       0.84      0.53      0.65 

**biLSTM**

In [None]:
#Train with 10 epochs, 2000 mini batches per epoch (batch_size=14041/2000=7)
#Using training and validation dataset
bi_lstm = bi_lstm_model()
bi_lstm.fit(x=x_trainSentences,y=y_trainSentences, batch_size=int(x_trainSentences.shape[0]/2000), epochs=10, verbose=1,validation_data=(x_valSentences,y_valSentences))
bi_lstm.summary()
#Save model
bi_lstm.save("gdrive/My Drive/Colab Notebooks/Bi-LSTM.h5")
#Evaluate model by test dataset
result_bi_lstm = bi_lstm.evaluate(x_testSentences,y_testSentences)
print(result_bi_lstm)

#Print test result to file with required format
y_predict_bi_lstm = bi_lstm.predict(x_testSentences)
y_predict_bi_lstm = to_file(orig_x_testSentences, orig_y_testSentences, y_predict_bi_lstm, "bi_lstm_log.txt")
#Print F1 score by tag
print(classification_report(flatten(orig_y_testSentences), flatten(y_predict_bi_lstm)))

Train on 14041 samples, validate on 3250 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
bidirectional_2 (Bidirection (None, 124, 512)          1140736   
_________________________________________________________________
dense_6 (Dense)              (None, 124, 10)           5130      
Total params: 1,145,866
Trainable params: 1,145,866
Non-trainable params: 0
_________________________________________________________________
[0.02835212469364193, 0.9916272]
              precision    recall  f1-score   support

       <pad>       1.00      1.00      1.00    381737
       B-LOC       0.77      0.80      0.78      1668
      B-MISC       0.76      0.55      0.63       702
       B-ORG       0.78      0.61      0.68      1661
       B-PER       0.78      0.70      0.7

**GRU**

In [None]:
#Train with 10 epochs, 2000 mini batches per epoch (batch_size=14041/2000=7)
#Using training and validation dataset
simple_gru = simple_gru_model()
simple_gru.fit(x=x_trainSentences,y=y_trainSentences, batch_size=int(x_trainSentences.shape[0]/2000), epochs=10, verbose=1,validation_data=(x_valSentences,y_valSentences))
simple_gru.summary()
#Save model
simple_gru.save("gdrive/My Drive/Colab Notebooks/GRU.h5")
#Evaluate model by test dataset
result_simple_gru = simple_gru.evaluate(x_testSentences,y_testSentences)
print(result_simple_gru)
#Print test result to file with required format
y_predict_simple_gru = simple_gru.predict(x_testSentences)
y_predict_simple_gru = to_file(orig_x_testSentences, orig_y_testSentences, y_predict_simple_gru, "gru_log.txt")
#Print F1 score by tag
print(classification_report(flatten(orig_y_testSentences), flatten(y_predict_simple_gru)))

Train on 14041 samples, validate on 3250 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
gru (GRU)                    (None, 124, 256)          428544    
_________________________________________________________________
dense_1 (Dense)              (None, 124, 10)           2570      
Total params: 431,114
Trainable params: 431,114
Non-trainable params: 0
_________________________________________________________________
[0.03428423858460502, 0.9896747]
              precision    recall  f1-score   support

       <pad>       1.00      1.00      1.00    381737
       B-LOC       0.79      0.74      0.76      1668
      B-MISC       0.73      0.50      0.59       702
       B-ORG       0.68      0.42      0.52      1661
       B-PER       0.82      0.54      0.65   

**biGRU**

In [None]:
#Train with 10 epochs, 2000 mini batches per epoch (batch_size=14041/2000=7)
#Using training and validation dataset
bi_gru = bi_gru_model()
bi_gru.fit(x=x_trainSentences,y=y_trainSentences, batch_size=int(x_trainSentences.shape[0]/2000), epochs=10, verbose=1,validation_data=(x_valSentences,y_valSentences))
bi_gru.summary()
#Save model
bi_gru.save("gdrive/My Drive/Colab Notebooks/BI_GRU.h5")
#Evaluate model by test dataset
result_bi_gru = bi_gru.evaluate(x_testSentences,y_testSentences)
print(result_bi_gru)
#Print test result to file with required format
y_predict_bi_gru = bi_gru.predict(x_testSentences)
y_predict_bi_gru = to_file(orig_x_testSentences, orig_y_testSentences, y_predict_bi_gru, "bi_gru_log.txt")
#Print F1 score by tag
print(classification_report(flatten(orig_y_testSentences), flatten(y_predict_bi_gru)))

Train on 14041 samples, validate on 3250 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
bidirectional_1 (Bidirection (None, 124, 512)          857088    
_________________________________________________________________
dense_3 (Dense)              (None, 124, 10)           5130      
Total params: 862,218
Trainable params: 862,218
Non-trainable params: 0
_________________________________________________________________
[0.02779109433591107, 0.99197054]
              precision    recall  f1-score   support

       <pad>       1.00      1.00      1.00    381737
       B-LOC       0.82      0.78      0.80      1668
      B-MISC       0.76      0.59      0.66       702
       B-ORG       0.77      0.62      0.69      1661
       B-PER       0.81      0.71      0.75  

**Stacked biGRU**

In [None]:
#Train with 10 epochs, 2000 mini batches per epoch (batch_size=14041/2000=7)
#Using training and validation dataset
stacked_bi_gru = stacked_bi_gru_model()
stacked_bi_gru.fit(x=x_trainSentences,y=y_trainSentences, batch_size=int(x_trainSentences.shape[0]/2000), epochs=10, verbose=1,validation_data=(x_valSentences,y_valSentences))
stacked_bi_gru.summary()
#Save model
stacked_bi_gru.save("gdrive/My Drive/Colab Notebooks/Stacked_Bi_GRU.h5")
#Print test result to file with required format
y_predict_stacked_bi_gru = stacked_bi_gru.predict(x_testSentences)
y_predict_stacked_bi_gru = to_file(orig_x_testSentences, orig_y_testSentences, y_predict_stacked_bi_gru, "stacked_bi_gru_log.txt")
#Print F1 score by tag
print(classification_report(flatten(orig_y_testSentences), flatten(y_predict_stacked_bi_gru)))

Train on 14041 samples, validate on 3250 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
bidirectional_5 (Bidirection (None, 124, 512)          857088    
_________________________________________________________________
dropout_1 (Dropout)          (None, 124, 512)          0         
_________________________________________________________________
bidirectional_6 (Bidirection (None, 124, 512)          1182720   
_________________________________________________________________
dense_9 (Dense)              (None, 124, 10)           5130      
Total params: 2,044,938
Trainable params: 2,044,938
Non-trainable params: 0
_________________________________________________________________
              precision    recall  f1-score   support

       <pad>       1.00    