In this chapter, we’ll discuss various IE tasks and the methods for
implementing them for our applications. We’ll start with a brief
historical background, followed by an overview of different IE tasks
and applications of IE in the real world. We’ll then introduce the
typical NLP processing pipeline for solving any IE task and move on
to discuss how to solve specific IE tasks—key phrase extraction,
named entity recognition, named entity disambiguation and linking, and
relationship extraction—along with some practical advice on
implementing them in your projects. We’ll then present a case study of
how IE is used in a real-world scenario and briefly cover other
advanced IE tasks.

# 1. IE Applications
- Tagging news and other content
- Chatbots
- Applications in social media
- Extracting data from forms and receipts

# 2. IE Tasks
The overarching goal of IE is to extract “knowledge” from
text, and each of these tasks provides different information to do that.
- Keyword or keyphrase extraction (KPE).
- Named entity recognition (NER).
- Event extraction
- Temporal information extraction
- Template filling

# 3. The General Pipeline for IE
![](images/pipeline-IE.png)
<center>IE pipeline illustrating NLP processing needed for some IE tasks</center>

# 4. Keyphrase Extraction
Keyword and phrase extraction, as the name indicates, is the IE task
concerned with extracting important words and phrases that capture the
gist of the text from a given text document.

## Implementing KPE
The Python library textacy, built on top of the well-known library
spaCy, contains implementations for some of the common graph based keyword and phrase extraction algorithms.

In [None]:
import spacy
import textacy.ke
from textacy import *

#Load a spacy model, which will be used for all further processing.
en = textacy.load_spacy_lang("en_core_web_sm")

In [None]:
#Let us use a sample text file, nlphistory.txt, which is the text from the history section of Wikipedia's
#page on Natural Language Processing 
#https://en.wikipedia.org/wiki/Natural_language_processing
try :
    from google.colab import files
    uploaded=files.upload()
    mytext = open('nlphistory.txt').read()

except ModuleNotFoundError :
    mytext = open('Data/nlphistory.txt').read()

Upload widget is only available when the cell has been executed in the current browser session. Please rerun this cell to enable.
Saving nlphistory.txt to nlphistory.txt

In [None]:
#convert the text into a spacy document.
doc = textacy.make_spacy_doc(mytext, lang=en)
textacy.ke.textrank(doc, topn=5)

In [None]:
#Print the keywords using TextRank algorithm, as implemented in Textacy.
print("Textrank output: ", [kps for kps, weights in textacy.ke.textrank(doc, normalize="lemma", topn=5)])\
#Print the key words and phrases, using SGRank algorithm, as implemented in Textacy
print("SGRank output: ", [kps for kps, weights in textacy.ke.sgrank(doc, topn=5)])

In [None]:
#To address the issue of overlapping key phrases, textacy has a function: aggregage_term_variants.
#Choosing one of the grouped terms per item will give us a list of non-overlapping key phrases!
terms = set([term for term,weight in textacy.ke.sgrank(doc)])
print(textacy.ke.utils.aggregate_term_variants(terms))

In [None]:
#A way to look at key phrases is just consider all noun chunks as potential ones. 
#However, keep in mind this will result in a lot of phrases, and no way to rank them!

print([chunk for chunk in textacy.extract.noun_chunks(doc)])

Textacy also has a bunch of other information extraction functions, many of them based on regular expression patterns and heuristics to address extracting specific expressions such as acronyms and quotations. Apart from these, we can also extract matching custom regular expressions including POS tag patterns, or look for statements involving an entity, subject-verb-object tuples etc. We will discuss some of these as they come, in this chapter.

# 5. Named Entity Recognition
NER refers to the IE task of identifying the entities in a document.
Entities are typically names of persons, locations, and organizations,
and other specialized strings, such as money expressions, dates,
products, names/numbers of laws or articles, and so on. NER is an
important step in the pipeline of several NLP applications involving
information extraction.

![](images/ner-example.png)
<center>NER example using the displaCy visualizer</center>

## Building an NER System
**Conditional random fields (CRFs)** is one of the popular sequence
classifier training algorithms.
To perform sequence classification, we need data in a format that
allows us to model the context. Typical training data for NER looks
like Figure below, which is a sentence from the CONLL-03 dataset.
The labels in the figure follow what’s known as a BIO notation: **B**
indicates the beginning of an entity; **I**, inside an entity, indicates when
entities comprise more than one word; and **O**, other, indicates non entities.

![](images/NER-tag.png)
<center>NER training data format example</center>

In [31]:
#Make the necessary imports
from nltk.tag import pos_tag
from sklearn_crfsuite import CRF, metrics
from sklearn.metrics import make_scorer,confusion_matrix
from pprint import pprint
from sklearn.metrics import f1_score,classification_report
from sklearn.pipeline import Pipeline
import string
import warnings
warnings.filterwarnings('ignore')

### Loading The Data

In [3]:
"""
Load the training/testing data. 
input: conll format data, but with only 2 tab separated colums - words and NEtags.
output: A list where each item is 2 lists.  sentence as a list of tokens, NER tags as a list for each token.
"""
def load__data_conll(file_path):
    myoutput,words,tags = [],[],[]
    fh = open(file_path)
    for line in fh:
        line = line.strip()
        if "\t" not in line:
            #Sentence ended.
            myoutput.append([words,tags])
            words,tags = [],[]
        else:
            word, tag = line.split("\t")
            words.append(word)
            tags.append(tag)
    fh.close()
    return myoutput

In [4]:
"""
Get features for all words in the sentence
Features:
- word context: a window of 2 words on either side of the current word, and current word.
- POS context: a window of 2 POS tags on either side of the current word, and current tag. 
input: sentence as a list of tokens.
output: list of dictionaries. each dict represents features for that word.
"""
def sent2feats(sentence):
    feats = []
    sen_tags = pos_tag(sentence) #This format is specific to this POS tagger!
    for i in range(0,len(sentence)):
        word = sentence[i]
        wordfeats = {}
       #word features: word, prev 2 words, next 2 words in the sentence.
        wordfeats['word'] = word
        if i == 0:
            wordfeats["prevWord"] = wordfeats["prevSecondWord"] = "<S>"
        elif i==1:
            wordfeats["prevWord"] = sentence[0]
            wordfeats["prevSecondWord"] = "</S>"
        else:
            wordfeats["prevWord"] = sentence[i-1]
            wordfeats["prevSecondWord"] = sentence[i-2]
        #next two words as features
        if i == len(sentence)-2:
            wordfeats["nextWord"] = sentence[i+1]
            wordfeats["nextNextWord"] = "</S>"
        elif i==len(sentence)-1:
            wordfeats["nextWord"] = "</S>"
            wordfeats["nextNextWord"] = "</S>"
        else:
            wordfeats["nextWord"] = sentence[i+1]
            wordfeats["nextNextWord"] = sentence[i+2]
        
        #POS tag features: current tag, previous and next 2 tags.
        wordfeats['tag'] = sen_tags[i][1]
        if i == 0:
            wordfeats["prevTag"] = wordfeats["prevSecondTag"] = "<S>"
        elif i == 1:
            wordfeats["prevTag"] = sen_tags[0][1]
            wordfeats["prevSecondTag"] = "</S>"
        else:
            wordfeats["prevTag"] = sen_tags[i - 1][1]

            wordfeats["prevSecondTag"] = sen_tags[i - 2][1]
            # next two words as features
        if i == len(sentence) - 2:
            wordfeats["nextTag"] = sen_tags[i + 1][1]
            wordfeats["nextNextTag"] = "</S>"
        elif i == len(sentence) - 1:
            wordfeats["nextTag"] = "</S>"
            wordfeats["nextNextTag"] = "</S>"
        else:
            wordfeats["nextTag"] = sen_tags[i + 1][1]
            wordfeats["nextNextTag"] = sen_tags[i + 2][1]
        #That is it! You can add whatever you want!
        feats.append(wordfeats)
    return feats

### Extracting Features

In [5]:
#Extract features from the conll data, after loading it.
def get_feats_conll(conll_data):
    feats = []
    labels = []
    for sentence in conll_data:
        feats.append(sent2feats(sentence[0]))
        labels.append(sentence[1])
    return feats, labels

### Training a Model

In [40]:
#Train a sequence model
def train_seq(X_train,Y_train,X_dev,Y_dev):
   # crf = CRF(algorithm='lbfgs', c1=0.1, c2=0.1, max_iterations=50, all_possible_states=True)
    crf = CRF(algorithm='lbfgs', c1=0.1, c2=10, max_iterations=50)#, all_possible_states=True)
    #Just to fit on training data
    crf.fit(X_train, Y_train)
    labels = list(crf.classes_)
    #testing:
    y_pred = crf.predict(X_dev)
    sorted_labels = sorted(labels, key=lambda name: (name[1:], name[0]))
    print(metrics.flat_f1_score(Y_dev, y_pred,average='weighted'))
    # print(metrics.flat_classification_report(Y_dev, y_pred))
    #print(metrics.sequence_accuracy_score(Y_dev, y_pred))
    get_confusion_matrix(Y_dev, y_pred,labels=sorted_labels)

In [7]:
def print_cm(cm, labels):
    print("\n")
    """pretty print for confusion matrixes"""
    columnwidth = max([len(x) for x in labels] + [5])  # 5 is value length
    empty_cell = " " * columnwidth
    # Print header
    print("    " + empty_cell, end=" ")
    for label in labels:
        print("%{0}s".format(columnwidth) % label, end=" ")
    print()
    # Print rows
    for i, label1 in enumerate(labels):
        print("    %{0}s".format(columnwidth) % label1, end=" ")
        sum = 0
        for j in range(len(labels)):
            cell = "%{0}.0f".format(columnwidth) % cm[i, j]
            sum =  sum + int(cell)
            print(cell, end=" ")
        print(sum) #Prints the total number of instances per cat at the end.

In [29]:
#python-crfsuite does not have a confusion matrix function, 
#so writing it using sklearn's confusion matrix and print_cm from github
def get_confusion_matrix(y_true,y_pred,labels):
    trues,preds = [], []
    for yseq_true, yseq_pred in zip(y_true, y_pred):
        trues.extend(yseq_true)
        preds.extend(yseq_pred)
    print_cm(confusion_matrix(trues,preds),labels)

### Call all our functions inside the main method

In [41]:
def main():
    
    try:
        from google.colab import files
        uploaded = files.upload()
        # files are present in Data/conlldata
        train_path = 'train.txt'
        test_path = 'test.txt'
    except:
        train_path = 'data/conlldata/train.txt'
        test_path = 'data/conlldata/test.txt'
        
    conll_train = load__data_conll(train_path)
    conll_dev = load__data_conll(test_path)
    
    print("Training a Sequence classification model with CRF")
    feats, labels = get_feats_conll(conll_train)
    devfeats, devlabels = get_feats_conll(conll_dev)
    train_seq(feats, labels, devfeats, devlabels)
    print("Done with sequence model")

if __name__=="__main__":
    main()

Training a Sequence classification model with CRF
0.9255103670420659


                O  B-LOC  I-LOC B-MISC I-MISC  B-ORG  I-ORG  B-PER  I-PER 
         O   1276     36     95     98      1      1     14      4    143 1668
     B-LOC     48    217     56     19      1      2     13      2    344 702
     I-LOC    236     48    932    151      0      3     20      6    265 1661
    B-MISC    138      5     90   1238      1      3     44     12     86 1617
    I-MISC      6      1      0      0    124      5     52     37     32 257
     B-ORG      1      4      2      0      4    109     29      9     58 216
     I-ORG     15      2     21      8     18     15    588     92     76 835
     B-PER      1      0      2      0     16      4     83   1024     26 1156
     I-PER    118     22    193     88      3     32    224     64  37579 38323
Done with sequence model


This is pretty good. We already have a model which has an F-score of 92%!!!

## NER Using an Existing Library

In [1]:
#Problems of NER illustration through Spacy.
import spacy
nlp = spacy.load("en_core_web_lg")
mytext = """SAN FRANCISCO — Shortly after Apple used a new tax law last year to bring back most of the $252 billion it had held abroad, the company said it would buy back $100 billion of its stock.

On Tuesday, Apple announced its plans for another major chunk of the money: It will buy back a further $75 billion in stock.

“Our first priority is always looking after the business and making sure we continue to grow and invest,” Luca Maestri, Apple’s finance chief, said in an interview. “If there is excess cash, then obviously we want to return it to investors.”

Apple’s record buybacks should be welcome news to shareholders, as the stock price is likely to climb. But the buybacks could also expose the company to more criticism that the tax cuts it received have mostly benefited investors and executives.
"""
doc = nlp(mytext)
for ent in doc.ents:
    print(ent.text, "\t", ent.label_)

SAN FRANCISCO 	 GPE
Apple 	 ORG
last year 	 DATE
$252 billion 	 MONEY
$100 billion 	 MONEY
Tuesday 	 DATE
Apple 	 ORG
$75 billion 	 MONEY
first 	 ORDINAL
Luca Maestri 	 PERSON
Apple 	 ORG


## Practical Advice
Despite the fact that state-of-the-art NER
is highly accurate (with F1 scores over 90% using standard evaluation
frameworks for NER in NLP research), there are several issues to
keep in mind when using NER in our own software applications. Here
are a couple caveats based on our own experience with developing
NER systems:
- NER is very sensitive to the format of its input.
- NER is also very sensitive to the accuracy of the prior steps
in its processing pipeline: sentence splitting, tokenization, and
POS tagging.

Despite such shortcomings, NER is immensely useful for many IE
scenarios, such as content tagging, search, and mining social media to
identify customer feedback about specific products, to name a few.

In this notebook we demonstrate how we can leverage BERT to perform NER on conll2003 dataset.
This notebook requires a GPU to get setup. We suggest you to run this on your local machine only if you have a GPU setup or else you can use google colab.

In [None]:
#Installing required packages
try :
    from google.colab import files
    %tensorflow_version 1.x
    
except ModuleNotFoundError :
    Print("Not Using Colab")
    
!pip install pytorch-pretrained-bert==0.4.0
!pip install seqeval==0.0.12

#importing packages for string processing,dataframe handling, array manipulations, etc
import string
import pandas as pd
import numpy as np
from tqdm import tqdm, trange

import warnings
warnings.filterwarnings("ignore")

#importing all the pytorch packages
import torch
from torch.optim import Adam
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from pytorch_pretrained_bert import BertTokenizer, BertConfig
from pytorch_pretrained_bert import BertForTokenClassification, BertAdam

#importing additonal packages to aid preprocessing of data
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

#importing packages to calculate the f1_score of our model
from seqeval.metrics import f1_score

In [None]:
#uploading data into google colab
#upload the test.txt and train.txt files respectively
try :
    from google.colab import files
    uploaded = files.upload()

except ModuleNotFoundError :
    print("Not using Colab")

In [None]:
"""
Load the training/testing data. 
input: conll format data, but with only 2 tab separated colums - words and NEtags.
output: A list where each item is 2 lists.  sentence as a list of tokens, NER tags as a list for each token.
"""
#functions for preparing the data in the *.txt files
def load__data_conll(file_path):
    myoutput,words,tags = [],[],[]
    fh = open(file_path)
    for line in fh:
        line = line.strip()
        if "\t" not in line:
            #Sentence ended.
            myoutput.append([words,tags])
            words,tags = [],[]
        else:
            word, tag = line.split("\t")
            words.append(word)
            tags.append(tag)
    fh.close()
    return myoutput

"""
Get features for all words in the sentence
Features:
- word context: a window of 2 words on either side of the current word, and current word.
- POS context: a window of 2 POS tags on either side of the current word, and current tag. 
input: sentence as a list of tokens.
output: list of dictionaries. each dict represents features for that word.
"""
def sent2feats(sentence):
    feats = []
    sen_tags = pos_tag(sentence) #This format is specific to this POS tagger!
    for i in range(0,len(sentence)):
        word = sentence[i]
        wordfeats = {}
       #word features: word, prev 2 words, next 2 words in the sentence.
        wordfeats['word'] = word
        if i == 0:
            wordfeats["prevWord"] = wordfeats["prevSecondWord"] = "<S>"
        elif i==1:
            wordfeats["prevWord"] = sentence[0]
            wordfeats["prevSecondWord"] = "</S>"
        else:
            wordfeats["prevWord"] = sentence[i-1]
            wordfeats["prevSecondWord"] = sentence[i-2]
        #next two words as features
        if i == len(sentence)-2:
            wordfeats["nextWord"] = sentence[i+1]
            wordfeats["nextNextWord"] = "</S>"
        elif i==len(sentence)-1:
            wordfeats["nextWord"] = "</S>"
            wordfeats["nextNextWord"] = "</S>"
        else:
            wordfeats["nextWord"] = sentence[i+1]
            wordfeats["nextNextWord"] = sentence[i+2]
        
        #POS tag features: current tag, previous and next 2 tags.
        wordfeats['tag'] = sen_tags[i][1]
        if i == 0:
            wordfeats["prevTag"] = wordfeats["prevSecondTag"] = "<S>"
        elif i == 1:
            wordfeats["prevTag"] = sen_tags[0][1]
            wordfeats["prevSecondTag"] = "</S>"
        else:
            wordfeats["prevTag"] = sen_tags[i - 1][1]

            wordfeats["prevSecondTag"] = sen_tags[i - 2][1]
            # next two words as features
        if i == len(sentence) - 2:
            wordfeats["nextTag"] = sen_tags[i + 1][1]
            wordfeats["nextNextTag"] = "</S>"
        elif i == len(sentence) - 1:
            wordfeats["nextTag"] = "</S>"
            wordfeats["nextNextTag"] = "</S>"
        else:
            wordfeats["nextTag"] = sen_tags[i + 1][1]
            wordfeats["nextNextTag"] = sen_tags[i + 2][1]
        #That is it! You can add whatever you want!
        feats.append(wordfeats)
    return feats

In [None]:
#preprocess the data by calling the functions
try :
    from google.colab import files
    train_path = 'train.txt'
    test_path = 'test.txt' 

except ModuleNotFoundError :
    train_path = 'Data/conlldata/train.txt'
    test_path = 'Data/conlldata/test.txt'

    
conll_train = load__data_conll(train_path)
conll_test = load__data_conll(test_path) 

In [None]:
#BERT needs us to pre-process the data in a particular way.
#Lets take the raw data from the txt files
df_train = pd.read_csv(train_path, engine="python",delimiter="\t",header=None,encoding='utf-8',error_bad_lines=False)
df_test = pd.read_csv(test_path, engine="python",delimiter="\t",encoding='utf-8',header=None, error_bad_lines=False)

In [None]:
#merge 
df = pd.merge(df_train,df_test)
label = list(df[1].values)#we will be using this to make a set of all unique labels

In [None]:
np.array(conll_train).shape#calculating the size

In [None]:
np.array(conll_test).shape#calculating the size

In [None]:
import re
def untokenize(words):
    """
    Untokenizing a text undoes the tokenizing operation, restoring
    punctuation and spaces to the places that people expect them to be.
    Ideally, `untokenize(tokenize(text))` should be identical to `text`,
    except for line breaks.
    """
    text = ' '.join(words)
    step1 = text.replace("`` ", '"').replace(" ''", '"').replace('. . .',  '...')
    step2 = step1.replace(" ( ", " (").replace(" ) ", ") ")
    step3 = re.sub(r' ([.,:;?!%]+)([ \'"`])', r"\1\2", step2)
    step4 = re.sub(r' ([.,:;?!%]+)$', r"\1", step3)
    step5 = step4.replace(" '", "'").replace(" n't", "n't").replace(
         "can not", "cannot")
    step6 = step5.replace(" ` ", " '")
    return step6.strip()

In [None]:
#lets convert them to dataframs for easier handling
df_train = pd.DataFrame(conll_train,columns=["sentence","labels"])
df_test = pd.DataFrame(conll_test,columns=["sentence","labels"])

In [None]:
#getting all the sentences and labels present in both test and train
sentences = list(df_train['sentence'])+list(df_test['sentence'])
print("No of sentences:",len(sentences))
labels = list(df_train['labels'])+list(df_test['labels']) 
print("No of labels:",len(labels))

In [None]:
sentences = [untokenize(sent) for sent in sentences]
sentences[0]

In [None]:
#setting up pytorch to use GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count()

#prescribed configurations that we need to fix for BERT.
MAX_LEN = 75
bs = 32

#BERT's implementation comes with a pretained tokenizer and a defined vocabulary
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

#tokenizing the text 
tokenized_texts = list(map(lambda x: ['[CLS]'] + tokenizer.tokenize(x) + ['[SEP]'] , sentences))
print(tokenized_texts[0])

In [None]:
labels[0]

In [None]:
#pre-processing the labels
#converting tags to indices 
tags_vals = list(set(label))  
tag2idx = {t: i for i, t in enumerate(tags_vals)}


In [None]:
#We now need to give BERT input ids,ie, a sequence of integers which uniquely identify each input token to its index number.
#cutting and padding the tokens and labels to our desired length

input_ids = pad_sequences([tokenizer.convert_tokens_to_ids(txt) for txt in tokenized_texts],
                          maxlen=MAX_LEN, dtype="long", truncating="post", padding="post")

tags = pad_sequences([[tag2idx.get(l) for l in lab] for lab in labels],
                     maxlen=MAX_LEN, value=tag2idx["O"], padding="post",
                     dtype="long", truncating="post")

In [None]:
#BERT supports something called attention masks
#Tells the model which tokens should be attended to, and which should not.
#learn more about this at https://huggingface.co/transformers/glossary.html#attention-mask

attention_masks = [[float(i>0) for i in ii] for ii in input_ids]

In [None]:
#split the dataset to use 20% to validate the model.
tr_inputs, val_inputs, tr_tags, val_tags = train_test_split(input_ids, tags, 
                                                            random_state=2020, test_size=0.2)
tr_masks, val_masks, _, _ = train_test_split(attention_masks, input_ids,
                                             random_state=2020, test_size=0.2)

In [None]:
#pytorch requires inputs to be in the form of torch tensors
#Learn more about torch tensors at https://pytorch.org/docs/stable/tensors.html
tr_inputs = torch.tensor(tr_inputs)
val_inputs = torch.tensor(val_inputs)
tr_tags = torch.tensor(tr_tags)
val_tags = torch.tensor(val_tags)
tr_masks = torch.tensor(tr_masks)
val_masks = torch.tensor(val_masks)

In [None]:
#Define the Data Loaders
#Shuffle the data at training time
#Pass them sequentially during test time
train_data = TensorDataset(tr_inputs, tr_masks, tr_tags)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=bs)
print("Train Data Loaders Ready")
valid_data = TensorDataset(val_inputs, val_masks, val_tags)
valid_sampler = SequentialSampler(valid_data)
valid_dataloader = DataLoader(valid_data, sampler=valid_sampler, batch_size=bs)
print("Test Data Loaders Ready")

In [None]:
# BertForTokenClassification class of pytorch-pretrained-bert package provides  for token-level predictions
model = BertForTokenClassification.from_pretrained("bert-base-uncased", num_labels=len(tag2idx))#loading pre trained bert
print("BERT model ready to use")

In [None]:
#Passing model parameters into GPU
if torch.cuda.is_available():    
    print("Passing Model parameters in GPU")
    print(model.cuda()) 
else: 
    print(model)

In [None]:
#Before starting fine tuing we need to add the optimizer. Generally Adam is used
#weight_decay is added as regularization to the main weight matrices
print("Fine Tuning BERT")
FULL_FINETUNING = True
if FULL_FINETUNING:
    param_optimizer = list(model.named_parameters())
    no_decay = ['bias', 'gamma', 'beta']
    optimizer_grouped_parameters = [
        {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
         'weight_decay_rate': 0.01},
        {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
         'weight_decay_rate': 0.0}
    ]
else:
    param_optimizer = list(model.classifier.named_parameters()) 
    optimizer_grouped_parameters = [{"params": [p for n, p in param_optimizer]}]
optimizer = Adam(optimizer_grouped_parameters, lr=3e-5)

In [None]:
#accuracy
def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=2).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)

In [None]:
#Add the epoch number. The bert paper recomends 3-4
epochs = 4
max_grad_norm = 1.0
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count()
train_loss_set=[]
for _ in trange(epochs, desc="Epoch"):
    # TRAIN loop
    model.train()
    tr_loss = 0
    nb_tr_examples, nb_tr_steps = 0, 0
    for step, batch in enumerate(train_dataloader):
        # add batch to gpu
        batch = tuple(t.to(device) for t in batch)
        b_input_ids, b_input_mask, b_labels = batch
        # forward pass
        loss = model(b_input_ids, token_type_ids=None,
                     attention_mask=b_input_mask, labels=b_labels)
        train_loss_set.append(loss)
        # backward pass
        loss.backward()
        # track train loss
        tr_loss += loss.item()
        nb_tr_examples += b_input_ids.size(0)
        nb_tr_steps += 1
        # gradient clipping
        torch.nn.utils.clip_grad_norm_(parameters=model.parameters(), max_norm=max_grad_norm)
        # update parameters
        optimizer.step()
        model.zero_grad()
    # print train loss per epoch
    print("Train loss: {}".format(tr_loss/nb_tr_steps))
    # VALIDATION on validation set
    model.eval()
    eval_loss, eval_accuracy = 0, 0
    nb_eval_steps, nb_eval_examples = 0, 0
    predictions , true_labels = [], []
    for batch in valid_dataloader:
        batch = tuple(t.to(device) for t in batch)
        b_input_ids, b_input_mask, b_labels = batch
        
        with torch.no_grad():
            tmp_eval_loss = model(b_input_ids, token_type_ids=None,
                                  attention_mask=b_input_mask, labels=b_labels)
            logits = model(b_input_ids, token_type_ids=None,
                           attention_mask=b_input_mask)
        logits = logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()
        predictions.extend([list(p) for p in np.argmax(logits, axis=2)])
        true_labels.append(label_ids)
        
        tmp_eval_accuracy = flat_accuracy(logits, label_ids)
        
        eval_loss += tmp_eval_loss.mean().item()
        eval_accuracy += tmp_eval_accuracy
        
        nb_eval_examples += b_input_ids.size(0)
        nb_eval_steps += 1
    eval_loss = eval_loss/nb_eval_steps
    print("Validation loss: {}".format(eval_loss))
    print("Validation Accuracy: {}".format(eval_accuracy/nb_eval_steps))
    pred_tags = [tags_vals[p_i] for p in predictions for p_i in p]
    valid_tags = [tags_vals[l_ii] for l in true_labels for l_i in l for l_ii in l_i]
    print("F1-Score: {}".format(f1_score(pred_tags, valid_tags)))

In [None]:
import matplotlib.pyplot as plt
plt.figure(figsize=(15,8))
plt.title("Training loss")
plt.xlabel("Batch")
plt.ylabel("Loss")
plt.ylim(0,0.25)
plt.plot(train_loss_set)
plt.show()

In [None]:
#Evaluate the model
model.eval()
predictions = []
true_labels = []
eval_loss, eval_accuracy = 0, 0
nb_eval_steps, nb_eval_examples = 0, 0
for batch in valid_dataloader:
    batch = tuple(t.to(device) for t in batch)
    b_input_ids, b_input_mask, b_labels = batch

    with torch.no_grad():
        tmp_eval_loss = model(b_input_ids, token_type_ids=None,
                              attention_mask=b_input_mask, labels=b_labels)
        logits = model(b_input_ids, token_type_ids=None,
                       attention_mask=b_input_mask)
        
    logits = logits.detach().cpu().numpy()
    predictions.extend([list(p) for p in np.argmax(logits, axis=2)])
    label_ids = b_labels.to('cpu').numpy()
    true_labels.append(label_ids)
    tmp_eval_accuracy = flat_accuracy(logits, label_ids)

    eval_loss += tmp_eval_loss.mean().item()
    eval_accuracy += tmp_eval_accuracy

    nb_eval_examples += b_input_ids.size(0)
    nb_eval_steps += 1

pred_tags = [[tags_vals[p_i] for p_i in p] for p in predictions]
valid_tags = [[tags_vals[l_ii] for l_ii in l_i] for l in true_labels for l_i in l ]
print("Validation loss: {}".format(eval_loss/nb_eval_steps))
print("Validation Accuracy: {}".format(eval_accuracy/nb_eval_steps))
print("Validation F1-Score: {}".format(f1_score(pred_tags, valid_tags)))

# 6.Named Entity Disambiguation and Linking
![](images/entity-link.png)
<center>Entity linking by IBM</center>

*Named entity disambiguation (NED)* refers to the NLP task of
achieving exactly this: assigning a unique identity to entities mentioned
in the text. It’s also the first step in moving toward more sophisticated
tasks to address the scenario mentioned above by identifying
relationships between entities. NER and NED together are known as
named entity linking (NEL). Some other NLP applications that would
need NEL include question answering and constructing large
knowledge bases of connected events and entities, such as the Google
Knowledge Graph

## NEL Using Azure API
The Azure Text Analytics API is one of the popular APIs for NEL.
DBpedia Spotlight is a freely available tool to do the same.

In [None]:
import requests
import pprint

my_api_key = 'xxxx' #replace this with your api key

In [None]:
def print_entities(text):
    url = "https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.1/entities"
    documents = {'documents':[{'id':'1', 'language':'en', 'text':text}]}
    headers = {'Ocp-Apim-Subscription-Key': my_api_key}
    response = requests.post(url, headers=headers, json=documents)
    entities = response.json()
    return entities

In [None]:
mytext = open("nytarticle.txt").read() #This file is in the same folder. 
entities = print_entities(mytext)
for document in entities["documents"]:
    pprint.pprint(document["entities"])
#This above code will print you a whole lot of stuff you may or may not use later.

In [None]:
#Let us clean up a little bit, and not print the whole lot of messy stuff it gives us?
for document in entities['documents']:
    print("Entities in this document: ")
    for entity in document['entities']:
        if entity['type'] in ["Person", "Location", "Organization"]:
            print(entity['name'], "\t", entity['type'])
            if 'wikipediaUrl' in entity.keys():
                print(entity['wikipediaUrl'])

There are a few important things to keep in mind while using
NEL in your project:
- Existing NEL approaches are not perfect, and they’re unlikely
to fare well with new names or domain-specific terms. Since
NEL also requires further linguistic processing, including
syntactic parsing, its accuracy is also affected by how well
the different processing steps are done.
- Like with other IE tasks, the first step in any NLP pipeline—
text extraction and cleanup—affects what we see as output for
NEL as well. When we use third-party services, we have
little control over adapting them to our domain, if needed, or
understanding their internal workings to modify them to our
needs.

# 7. Relationship Extraction
*Relationship extraction (RE)* is the IE task that deals with extracting
entities and relationships between them from text documents. It’s an
important step in building a knowledge base, and it’s also useful in
improving search and developing question-answering systems.

## Approaches to RE
RE is often treated as a supervised classification problem. The
datasets used to train RE systems contain a set of pre-defined
relations, similar to classification datasets. This consists of modeling
it as a two-step classification problem:
1. Whether two entities in a text are related (binary
classification).
2. If they are related, what is the relation between them
(multiclass classification)?

## RE with the Watson API
RE is a hard problem, and it would be challenging and time consuming
to develop our own relation extraction systems from scratch. A
solution commonly used in NLP projects in the industry is to rely on
the Natural Language Understanding service provided by IBM Watson.

# 8. Other Advanced IE Tasks
## Temporal Information Extraction
![](images/temporal-event.png)
<center>Identifying and extracting temporal events from emails</center>

## Event Extraction
![](images/life-event.png)
<center>Examples of extracting life events from Twitter data</center>

## Template Filling
![](images/template-filling.png)
<center>Example of template filling</center>

# 9. Case Study
![](images/meeting-extraction.png)
<center>Meeting information extraction from email (representative image)</center>

![](images/pipeline-meeting-extraction.png)
<center>Pipeline for meeting information extraction system development</center>