Now let us build a custom named entity recognition (NER) model that recognizes our own set of entities.

In [None]:
import spacy

Let us load the en_core_web_sm model into spaCy.

In [None]:
nlp=spacy.load('en_core_web_sm')
nlp.pipe_names

['tagger', 'parser', 'ner']

First, I test the existing pretrained spaCy model on a sentence of my own choice. It recognizes India as a GPE and "Facebook and Twitter" as a single ORG span, as the output below shows.

In [None]:
doc=nlp("India wants social media platforms like Facebook and Twitter to adhere to their new policies")

In [None]:
for ent in doc.ents:
  print(ent.text, ent.start_char, ent.end_char, ent.label_)

India 0 5 GPE
Facebook and Twitter 40 60 ORG
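
The two numbers printed for each entity are character offsets into the original string, with the end offset exclusive. As a quick illustration in plain Python (no spaCy needed), slicing the sentence with those offsets gives back the entity text. The list `reported` is simply copied from the output above.

```python
text = "India wants social media platforms like Facebook and Twitter to adhere to their new policies"

# (start_char, end_char, label) values copied from the printed output above
reported = [(0, 5, "GPE"), (40, 60, "ORG")]

for start, end, label in reported:
    # Slicing with [start:end] recovers the entity text
    print(repr(text[start:end]), label)
```

The same (start, end, label) convention is used later when we write our own training annotations.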


Now let us take a few custom sentences from our own domain and see which entities the model identifies.

In [None]:
doc=nlp("I do have enough money to pay my credit card bills")

In [None]:
for ent in doc.ents:
  print(ent.text, ent.start_char, ent.end_char, ent.label_)

In [None]:
doc=nlp("How to open a new savings account")

In [None]:
for ent in doc.ents:
  print(ent.text, ent.start_char, ent.end_char, ent.label_)

We get no output for these sentences because the pretrained spaCy model has not yet been trained to identify our new entities. Now let us train the model by providing it with a small set of annotated examples of the new entities. The data is given in the format required by spaCy: each item is a (text, annotations) tuple, and the "entities" list holds (start_char, end_char, label) triples with an exclusive end offset.

In [None]:
train = [
         ("Money transfer from my checking account is not working", {"entities": [(6, 14, "ACTIVITY"), (23, 39, 'PRODUCT')]}),
         ("I want to check balance in my savings account", {"entities": [(16, 23, "ACTIVITY"), (30, 45, 'PRODUCT')]}),
         ("I suspect a fraud in my credit card account", {"entities": [(12, 17, "ACTIVITY"), (24, 35, 'PRODUCT')]}),
         ("I am here for opening a new savings account", {"entities": [(14, 21, "ACTIVITY"), (28, 43, 'PRODUCT')]}),
         ("Your mortgage is in delinquent status", {"entities": [(20, 30, "ACTIVITY"), (5, 13, 'PRODUCT')]}),
         ("Your credit card is in past due status", {"entities": [(23, 31, "ACTIVITY"), (5, 16, 'PRODUCT')]}),
         ("My loan account is still not approved and funded", {"entities": [(25, 37, "ACTIVITY"), (3, 15, 'PRODUCT'), (42, 48, "ACTIVITY")]}),
         ("How do I open a new loan account", {"entities": [(9, 13, "ACTIVITY"), (20, 32, 'PRODUCT')]}),
         ("What are the charges on Investment account", {"entities": [(13, 20, "ACTIVITY"), (24, 42, 'PRODUCT')]}),
         ("Can you explain late charges on my credit card", {"entities": [(21, 28, "ACTIVITY"), (35, 46, 'PRODUCT')]}),
         ("I want to open a new loan account", {"entities": [(10, 14, "ACTIVITY"), (21, 33, 'PRODUCT')]}),
         ("Can you help updating payment on my credit card", {"entities": [(22, 29, "ACTIVITY"), (36, 47, 'PRODUCT')]}),
         ("When is the payment due date on my card", {"entities": [(12, 19, "ACTIVITY"), (35, 39, 'PRODUCT')]})
        ]
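
Hand-counted character offsets are easy to get wrong by one, and a span that stops in the middle of a word may not be usable for training. Below is a minimal sketch of a sanity check, assuming that whitespace-separated words are a good enough stand-in for spaCy's tokens on sentences like these. The helper `check_spans` and the deliberately wrong third span are hypothetical, added only for illustration.

```python
import re

def check_spans(text, entities):
    """Return (start, end, label, covered_text, aligned) for each span."""
    # Offsets where whitespace-delimited words start and end
    # (an approximation of spaCy's tokenization, not the real tokenizer)
    starts = {m.start() for m in re.finditer(r"\S+", text)}
    ends = {m.end() for m in re.finditer(r"\S+", text)}
    report = []
    for start, end, label in entities:
        aligned = start in starts and end in ends
        report.append((start, end, label, text[start:end], aligned))
    return report

sample_text = "I want to check balance in my savings account"
sample_entities = [
    (16, 23, "ACTIVITY"),
    (30, 45, "PRODUCT"),
    (30, 44, "PRODUCT"),  # hypothetical off-by-one span, for illustration only
]

for row in check_spans(sample_text, sample_entities):
    print(row)
```

Running the same helper over every item of `train` before training is a cheap way to catch spans whose covered text is not what was intended.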

In [None]:
nlp.pipe_names

['tagger', 'parser', 'ner']

In [None]:
ner=nlp.get_pipe("ner")

In [None]:
# Register every custom label found in the training data with the NER component
for _, annotations in train:
    for ent in annotations.get("entities"):
        ner.add_label(ent[2])
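
The loop above registers a label once per annotation, so the same label is added many times. To see which distinct labels the data actually contains, a small plain-Python sketch can help. The helper `collect_labels` and the two-item `sample` list (copied from the training data) are used here only for illustration.

```python
def collect_labels(examples):
    """Collect the distinct entity labels from (text, annotations) tuples."""
    labels = set()
    for _text, annotations in examples:
        for _start, _end, label in annotations.get("entities", []):
            labels.add(label)
    return sorted(labels)

# Two items in the same format as the training data
sample = [
    ("How do I open a new loan account",
     {"entities": [(9, 13, "ACTIVITY"), (20, 32, "PRODUCT")]}),
    ("Your mortgage is in delinquent status",
     {"entities": [(20, 30, "ACTIVITY"), (5, 13, "PRODUCT")]}),
]

print(collect_labels(sample))
```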

In [None]:
disable_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']

We now train the model on the training list above. All pipes other than ner are disabled during training, so only the entity recognizer of the existing spaCy model is updated to identify the custom entities.

In [None]:
import random
from spacy.util import minibatch, compounding

# Train only the NER component; the other pipes are disabled inside this block
with nlp.disable_pipes(*disable_pipes):
    optimizer = nlp.resume_training()

    for iteration in range(100):
        random.shuffle(train)
        losses = {}  # reset every pass, so printed values accumulate within a pass

        # Batch size starts at 1 and grows slowly towards 4
        batches = minibatch(train, size=compounding(1.0, 4.0, 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(
                texts,
                annotations,
                drop=0.5,
                losses=losses,
                sgd=optimizer,
            )
            print("Losses", losses)

Losses {'ner': 7.933005571368804}
Losses {'ner': 18.967684130781585}
Losses {'ner': 33.15721891414493}
Losses {'ner': 39.968622080287936}
Losses {'ner': 49.7040590242982}
Losses {'ner': 56.73179444020488}
Losses {'ner': 68.7188342524455}
Losses {'ner': 77.60269355653293}
Losses {'ner': 81.62554499483032}
Losses {'ner': 89.64080249494238}
Losses {'ner': 99.69587036923907}
Losses {'ner': 106.11476266072206}
Losses {'ner': 115.41285321546677}
Losses {'ner': 8.944212122936733}
Losses {'ner': 19.35319158032828}
Losses {'ner': 29.655029924495032}
Losses {'ner': 38.92352556838161}
Losses {'ner': 45.607614339244066}
Losses {'ner': 54.909518739242316}
Losses {'ner': 60.57358339671117}
Losses {'ner': 67.76446241276284}
Losses {'ner': 76.71563632363399}
Losses {'ner': 80.8394254023629}
Losses {'ner': 86.87959103494956}
Losses {'ner': 96.01408441837117}
Losses {'ner': 101.67140561426055}
Losses {'ner': 4.730300008935814}
Losses {'ner': 15.579514799419343}
Losses {'ner': 24.888568637672364}
Losses 
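
The batching arguments in the training loop can be hard to picture. The following pure-Python sketch imitates what `minibatch` and `compounding` appear to do with these arguments. The functions `compounding_sketch` and `minibatch_sketch` are hypothetical stand-ins written for illustration, not spaCy's own code. Under that assumption, 13 examples are split into 13 batches of one example each, which would be consistent with the log above: the running loss is printed 13 times and then starts over when `losses` is reset for the next pass.

```python
import itertools

def compounding_sketch(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop."""
    value = start
    while True:
        yield min(value, stop)
        value *= compound

def minibatch_sketch(items, sizes):
    """Cut items into batches whose sizes come from the sizes generator."""
    iterator = iter(items)
    while True:
        batch = list(itertools.islice(iterator, int(next(sizes))))
        if not batch:
            break
        yield batch

# 13 placeholder items stand in for the 13 training sentences
batches = list(minibatch_sketch(range(13), compounding_sketch(1.0, 4.0, 1.001)))
print(len(batches), [len(b) for b in batches])
```

With a growth factor of 1.001 the size stays below 2 for several hundred batches, so on a dataset this small every update sees a single sentence.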

Once the spaCy model has been updated, we run it over the training sentences to check that it recognizes the entities it was trained on.

In [None]:
for text, _ in train:
    doc = nlp(text)
    print('Entities', [(ent.text, ent.label_) for ent in doc.ents])

Entities [('open', 'ACTIVITY'), ('loan account', 'PRODUCT')]
Entities [('checking account', 'PRODUCT')]
Entities [('opening', 'ACTIVITY'), ('savings account', 'PRODUCT')]
Entities [('charges', 'ACTIVITY'), ('credit card', 'PRODUCT')]
Entities [('balance', 'ACTIVITY'), ('savings account', 'PRODUCT')]
Entities [('mortgage', 'PRODUCT'), ('delinquent', 'ACTIVITY')]
Entities [('payment', 'ACTIVITY'), ('credit card', 'PRODUCT')]
Entities [('charges', 'ACTIVITY'), ('Investment account', 'PRODUCT')]
Entities [('open', 'ACTIVITY'), ('loan account', 'PRODUCT')]
Entities [('credit card', 'PRODUCT'), ('past due', 'ACTIVITY')]
Entities [('loan account', 'PRODUCT'), ('not approved', 'ACTIVITY')]
Entities [('payment', 'ACTIVITY'), ('card', 'PRODUCT')]
Entities [('fraud', 'ACTIVITY'), ('credit card', 'PRODUCT')]
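
Reading the list by eye works for thirteen sentences, but a small helper makes the comparison with the annotations explicit. The sketch below is plain Python. The function `span_scores` is hypothetical, and the `gold` and `predicted` lists are copied from the annotation and the printed output for the sentence "My loan account is still not approved and funded".

```python
def span_scores(gold, predicted):
    """Compare (text, label) pairs; return precision, recall, and missed spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, sorted(gold - predicted)

gold = [("loan account", "PRODUCT"), ("not approved", "ACTIVITY"), ("funded", "ACTIVITY")]
predicted = [("loan account", "PRODUCT"), ("not approved", "ACTIVITY")]

precision, recall, missed = span_scores(gold, predicted)
print(precision, recall, missed)
```

For this sentence the annotated span "funded" is missing from the prediction. Note that scores measured on the training sentences say little about how the model behaves on unseen text.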


Below are a few test cases to which we apply our newly trained model; it recognizes the custom entities in them.

In [None]:
from spacy import displacy

doc = nlp("what is the process to open a new savings account")
displacy.render(doc, style='ent', jupyter=True)

In [None]:
doc = nlp("My credit card payment will be delayed")
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

credit card 3 14 PRODUCT
payment 15 22 ACTIVITY


In [None]:
doc = nlp("what are the charges on credit card late payment in Bank of America")
displacy.render(doc, style='ent', jupyter=True)

In [None]:
doc = nlp("I lost my investment account password and cannot open my account now")
displacy.render(doc, style='ent', jupyter=True)

In [None]:
doc = nlp("what is the status of my loan account")
displacy.render(doc, style='ent', jupyter=True)