## Annotate your data for NER Training 📣

#### Author: Deepak John Reji



YouTube: https://youtu.be/Zi9DR4hRQrE

LinkedIn: https://www.linkedin.com/in/deepak-john-reji/

### Training data format for spaCy NER

In [None]:
train_data = [('The F15 aircraft uses a lot of fuel', {'entities': [(4, 7, 'aircraft')]}),
 ('did you see the F16 landing?', {'entities': [(16, 19, 'aircraft')]}),
 ('how many missiles can a F35 carry', {'entities': [(24, 27, 'aircraft')]}),
 ('is the F15 outdated', {'entities': [(7, 10, 'aircraft')]}),
 ('does the US still train pilots to dog fight?',
  {'entities': [(0, 0, 'aircraft')]}),
 ('how long does it take to train a F16 pilot',
  {'entities': [(33, 36, 'aircraft')]}),
 ('how much does a F35 cost', {'entities': [(16, 19, 'aircraft')]}),
 ('would it be possible to steal a F15', {'entities': [(32, 35, 'aircraft')]}),
 ('who manufactures the F16', {'entities': [(21, 24, 'aircraft')]}),
 ('how many countries have bought the F35',
  {'entities': [(35, 38, 'aircraft')]}),
 ('is the F35 a waste of money', {'entities': [(7, 10, 'aircraft')]})]
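
Each item above pairs a sentence with a dict whose `'entities'` list holds `(start, end, label)` character offsets, with `end` exclusive. A minimal sketch for checking offsets by slicing the text; `show_spans` is a hypothetical helper written for this illustration, not a spaCy function:

```python
# Hypothetical helper (not part of spaCy): slice each annotated span out of its sentence.
def show_spans(examples):
    spans = []
    for text, annotations in examples:
        for start, end, label in annotations['entities']:
            # start is inclusive, end is exclusive; both count characters
            spans.append((text[start:end], label))
    return spans


# Two items copied from the training data above
sample = [
    ('The F15 aircraft uses a lot of fuel', {'entities': [(4, 7, 'aircraft')]}),
    ('does the US still train pilots to dog fight?', {'entities': [(0, 0, 'aircraft')]}),
]

print(show_spans(sample))
# [('F15', 'aircraft'), ('', 'aircraft')]
```

The `(0, 0, 'aircraft')` entry slices to an empty string, which is how the sentence without an aircraft is represented in this data.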

#### Step-1 : Create your annotations with the help of an external tool
https://tecoholic.github.io/ner-annotator/

#### Step-2 : Read the annotated JSON file

In [1]:
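import pandas as pd
import json
import os

# move to the folder where the annotation tool saved its export
os.chdir(r'C:\Users\deepa\Downloads')

# load the exported annotations into a Python dict
with open('annotations.json', 'r') as f:
    data = json.load(f)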

In [2]:
data

{'classes': ['AIRCRAFT'],
 'annotations': [['The F15 aircraft uses a lot of fuel\r',
   {'entities': [[4, 7, 'AIRCRAFT']]}],
  ['did you see the F16 landing?\r', {'entities': [[16, 19, 'AIRCRAFT']]}],
  ['how many missiles can a F35 carry\r',
   {'entities': [[24, 27, 'AIRCRAFT']]}],
  ['is the F15 outdated\r', {'entities': [[7, 10, 'AIRCRAFT']]}],
  ['does the US still train pilots to dog fight\r', {'entities': []}],
  ['how long does it take to train a F16 pilot\r',
   {'entities': [[33, 36, 'AIRCRAFT']]}],
  ['how much does a F35 cost\r', {'entities': [[16, 19, 'AIRCRAFT']]}],
  ['would it be possible to steal a F15\r',
   {'entities': [[32, 35, 'AIRCRAFT']]}],
  ['who manufactures the F16\r', {'entities': [[21, 24, 'AIRCRAFT']]}],
  ['how many countries have bought the F35\r',
   {'entities': [[35, 38, 'AIRCRAFT']]}],
  ['is the F35 a waste of money', {'entities': [[7, 10, 'AIRCRAFT']]}]]}
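
The trailing `\r` on most sentences most likely comes from Windows line endings in the text that was annotated. Because the offsets count from the start of each sentence, trailing whitespace can be removed without shifting them. A minimal sketch, assuming the structure shown above; `strip_trailing_whitespace` is a hypothetical helper, not part of the annotation tool:

```python
# Hypothetical helper: drop trailing whitespace such as '\r' from every annotated sentence.
def strip_trailing_whitespace(data):
    cleaned = []
    for text, annotations in data['annotations']:
        # offsets are measured from the start of the text, so they stay valid
        cleaned.append([text.rstrip(), {'entities': annotations['entities']}])
    return {'classes': data['classes'], 'annotations': cleaned}


# Small sample mirroring the structure of the loaded file
sample = {
    'classes': ['AIRCRAFT'],
    'annotations': [
        ['is the F15 outdated\r', {'entities': [[7, 10, 'AIRCRAFT']]}],
        ['does the US still train pilots to dog fight\r', {'entities': []}],
    ],
}

cleaned = strip_trailing_whitespace(sample)
print(cleaned['annotations'][0])
# ['is the F15 outdated', {'entities': [[7, 10, 'AIRCRAFT']]}]
```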

#### Step-3 : Run this custom code to convert to the required format

In [3]:
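# label used for the placeholder entry on sentences without any entity
entity_name = "AIRCRAFT"

# each annotation is a [text, {'entities': [...]}] list; the training format uses tuples
train_data = data['annotations']
train_data = [tuple(i) for i in train_data]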

In [4]:
train_data

[('The F15 aircraft uses a lot of fuel\r', {'entities': [[4, 7, 'AIRCRAFT']]}),
 ('did you see the F16 landing?\r', {'entities': [[16, 19, 'AIRCRAFT']]}),
 ('how many missiles can a F35 carry\r', {'entities': [[24, 27, 'AIRCRAFT']]}),
 ('is the F15 outdated\r', {'entities': [[7, 10, 'AIRCRAFT']]}),
 ('does the US still train pilots to dog fight\r', {'entities': []}),
 ('how long does it take to train a F16 pilot\r',
  {'entities': [[33, 36, 'AIRCRAFT']]}),
 ('how much does a F35 cost\r', {'entities': [[16, 19, 'AIRCRAFT']]}),
 ('would it be possible to steal a F15\r',
  {'entities': [[32, 35, 'AIRCRAFT']]}),
 ('who manufactures the F16\r', {'entities': [[21, 24, 'AIRCRAFT']]}),
 ('how many countries have bought the F35\r',
  {'entities': [[35, 38, 'AIRCRAFT']]}),
 ('is the F35 a waste of money', {'entities': [[7, 10, 'AIRCRAFT']]})]

In [5]:
for i in train_data:
    if i[1]['entities'] == []:
        # no entity in this sentence: add a (0, 0, label) placeholder, kept inside a list
        i[1]['entities'] = [(0, 0, entity_name)]
    else:
        # convert every [start, end, label] list to a tuple, not only the first one
        i[1]['entities'] = [tuple(e) for e in i[1]['entities']]

In [6]:
train_data

[('The F15 aircraft uses a lot of fuel\r', {'entities': [(4, 7, 'AIRCRAFT')]}),
 ('did you see the F16 landing?\r', {'entities': [(16, 19, 'AIRCRAFT')]}),
 ('how many missiles can a F35 carry\r', {'entities': [(24, 27, 'AIRCRAFT')]}),
 ('is the F15 outdated\r', {'entities': [(7, 10, 'AIRCRAFT')]}),
 ('does the US still train pilots to dog fight\r',
  {'entities': [(0, 0, 'AIRCRAFT')]}),
 ('how long does it take to train a F16 pilot\r',
  {'entities': [(33, 36, 'AIRCRAFT')]}),
 ('how much does a F35 cost\r', {'entities': [(16, 19, 'AIRCRAFT')]}),
 ('would it be possible to steal a F15\r',
  {'entities': [(32, 35, 'AIRCRAFT')]}),
 ('who manufactures the F16\r', {'entities': [(21, 24, 'AIRCRAFT')]}),
 ('how many countries have bought the F35\r',
  {'entities': [(35, 38, 'AIRCRAFT')]}),
 ('is the F35 a waste of money', {'entities': [(7, 10, 'AIRCRAFT')]})]
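
A minimal sketch of the same conversion as a reusable function that builds a new list instead of changing the loaded data in place. It converts every entity in a sentence and keeps the `(0, 0, label)` placeholder inside a list, matching the training data format at the top of this notebook. `to_training_tuples` is a hypothetical helper, and the two-entity sentence in the sample is made up for illustration:

```python
# Hypothetical helper: convert an annotation export into (text, {'entities': [...]}) tuples.
def to_training_tuples(data, placeholder_label):
    train_data = []
    for text, annotations in data['annotations']:
        # turn every [start, end, label] list into a tuple
        entities = [tuple(entity) for entity in annotations['entities']]
        if not entities:
            # follow this notebook's convention of a (0, 0, label) placeholder
            entities = [(0, 0, placeholder_label)]
        train_data.append((text, {'entities': entities}))
    return train_data


# Sample input: the first sentence is invented to show two entities in one text
sample = {
    'classes': ['AIRCRAFT'],
    'annotations': [
        ['is the F15 faster than the F16',
         {'entities': [[7, 10, 'AIRCRAFT'], [27, 30, 'AIRCRAFT']]}],
        ['does the US still train pilots to dog fight', {'entities': []}],
    ],
}

for item in to_training_tuples(sample, 'AIRCRAFT'):
    print(item)
# ('is the F15 faster than the F16', {'entities': [(7, 10, 'AIRCRAFT'), (27, 30, 'AIRCRAFT')]})
# ('does the US still train pilots to dog fight', {'entities': [(0, 0, 'AIRCRAFT')]})
```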

#### Step-4 : Continue from Step-2 in this notebook 
https://github.com/dreji18/NER-Training-Spacy-3.0/blob/main/NER%20Training%20with%20Spacy%20v3%20Notebook.ipynb

Video tutorial for NER training: https://www.youtube.com/watch?v=9mXoGxAn6pM&t=57s

## Yessss!!! we made it 😀😀😀
