# NAMED ENTITY RECOGNITION using BERT

### NAMED ENTITY RECOGNITION:

1. Named entities are predefined categories chosen for the use case, such as names of people, organizations, places, codes, time expressions, and monetary values.

1. NER assigns a class to each token (usually a single word) in a sequence, which is why it is also called token classification.
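
To make token classification concrete, here is a minimal sketch in plain Python. It assumes the BIO tagging scheme (`B-` opens an entity, `I-` continues it, `O` is outside any entity), which matches the tags that appear in the dataset later in this notebook. The helper name `bio_to_spans` and the example sentence are illustrative and not part of any library.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Close the entity that is currently open, if any
        nonlocal current_type, current_tokens
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity
            # (such a stray token is dropped in this sketch)
            flush()
    flush()
    return spans


tokens = ["Thousands", "marched", "through", "New", "York", "on", "Monday"]
tags = ["O", "O", "O", "B-geo", "I-geo", "O", "B-tim"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

Each token still receives exactly one tag; grouping into spans is only a convenience for reading the result.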

## Simple Transformers
The Simple Transformers library is built on the Transformers library by Hugging Face and lets you quickly train and evaluate Transformer models. Initializing, training, and evaluating a model takes only three lines of code. It currently supports sequence classification, token classification (NER), and question answering.

In [2]:
!pip install simpletransformers


In [1]:
import pandas as pd
data = pd.read_csv("ner_dataset.csv", encoding="latin1")

In [2]:
data.head(30)

Unnamed: 0,Sentence #,Word,POS,Tag
0,Sentence: 1,Thousands,NNS,O
1,,of,IN,O
2,,demonstrators,NNS,O
3,,have,VBP,O
4,,marched,VBN,O
5,,through,IN,O
6,,London,NNP,B-geo
7,,to,TO,O
8,,protest,VB,O
9,,the,DT,O


In [3]:
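# Forward-fill so every row carries its sentence label
# (fillna(method="ffill") is deprecated in recent pandas versions)
data = data.ffill()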

In [4]:
data.head(30)

Unnamed: 0,Sentence #,Word,POS,Tag
0,Sentence: 1,Thousands,NNS,O
1,Sentence: 1,of,IN,O
2,Sentence: 1,demonstrators,NNS,O
3,Sentence: 1,have,VBP,O
4,Sentence: 1,marched,VBN,O
5,Sentence: 1,through,IN,O
6,Sentence: 1,London,NNP,B-geo
7,Sentence: 1,to,TO,O
8,Sentence: 1,protest,VB,O
9,Sentence: 1,the,DT,O


In [5]:
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [6]:
data["Sentence #"] = LabelEncoder().fit_transform(data["Sentence #"])

In [7]:
data.head(30)

Unnamed: 0,Sentence #,Word,POS,Tag
0,0,Thousands,NNS,O
1,0,of,IN,O
2,0,demonstrators,NNS,O
3,0,have,VBP,O
4,0,marched,VBN,O
5,0,through,IN,O
6,0,London,NNP,B-geo
7,0,to,TO,O
8,0,protest,VB,O
9,0,the,DT,O
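
One detail worth knowing: `LabelEncoder` assigns ids in sorted order of the class values, and the sentence labels are strings, so they sort lexicographically ("Sentence: 10" comes before "Sentence: 2"). The ids are still unique per sentence, which is all the model needs, but they do not follow the original sentence order. The sketch below reproduces the effect in plain Python and shows a hypothetical alternative, `sentence_number`, that parses the integer instead.

```python
labels = ["Sentence: 1", "Sentence: 2", "Sentence: 10"]

# LabelEncoder-style ids: position of each value among the sorted unique values
sorted_classes = sorted(set(labels))
lexicographic_ids = {label: i for i, label in enumerate(sorted_classes)}
print(lexicographic_ids)


def sentence_number(label):
    """Parse the integer out of a 'Sentence: N' label (hypothetical helper)."""
    return int(label.split(":")[1])


numeric_ids = {label: sentence_number(label) for label in labels}
print(numeric_ids)
```

If the original order matters, something like `data["Sentence #"].map(sentence_number)` applied after the forward fill would be one way to keep it.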


In [8]:
data.rename(columns={"Sentence #": "sentence_id", "Word": "words", "Tag": "labels"}, inplace=True)

In [9]:
data["labels"] = data["labels"].str.upper()

### Separate dependent and independent variables

In [10]:
X = data[["sentence_id", "words"]]
Y = data["labels"]

### Train and test split

In [11]:
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)

In [12]:
# Build the train and test DataFrames with the columns Simple Transformers expects
train_data = pd.DataFrame({"sentence_id": x_train["sentence_id"], "words": x_train["words"], "labels": y_train})
test_data = pd.DataFrame({"sentence_id": x_test["sentence_id"], "words": x_test["words"], "labels": y_test})

In [13]:
train_data

Unnamed: 0,sentence_id,words,labels
399047,9149,his,O
155196,19906,'s,O
289693,3633,they,O
16147,20140,'s,O
456959,12102,of,O
...,...,...,...
168910,20616,Affairs,I-ORG
259163,2067,said,O
506203,14604,antagonist,O
63673,15255,conservative,O


In [14]:
test_data

Unnamed: 0,sentence_id,words,labels
233869,787,countries,O
191995,21822,mid-May,B-TIM
319455,5143,decided,O
397549,9071,.,O
391030,8754,will,O
...,...,...,...
316148,4970,of,O
343085,6325,a,O
498455,14209,on,O
326190,5486,pollution,O
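
Note that `train_test_split` above shuffles individual token rows, so words from the same sentence can land in both sets; the scattered row indices in the two tables reflect this. A common alternative is to split on `sentence_id` so that each sentence stays whole. The following is a minimal sketch of that idea, not the split used to produce the results in this notebook; `split_sentence_ids` is a hypothetical helper and the seed is arbitrary.

```python
import random


def split_sentence_ids(sentence_ids, test_size=0.2, seed=42):
    """Assign whole sentences to train or test (hypothetical helper)."""
    unique_ids = sorted(set(sentence_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_test = max(1, round(len(unique_ids) * test_size))
    test_ids = set(unique_ids[:n_test])
    train_ids = set(unique_ids[n_test:])
    return train_ids, test_ids


# Toy stand-in for data["sentence_id"]: 10 sentences with 3 tokens each
toy_sentence_ids = [sid for sid in range(10) for _ in range(3)]
train_ids, test_ids = split_sentence_ids(toy_sentence_ids)
print(len(train_ids), len(test_ids))

# With the DataFrame from this notebook, the id sets could be applied as:
#   cols = ["sentence_id", "words", "labels"]
#   train_data = data.loc[data["sentence_id"].isin(train_ids), cols]
#   test_data = data.loc[data["sentence_id"].isin(test_ids), cols]
```

scikit-learn's `GroupShuffleSplit` offers the same behaviour when `sentence_id` is passed as the `groups` argument.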


# Model Training


In [15]:
from simpletransformers.ner import NERModel, NERArgs

In [16]:
# Unique labels in the target column
label = data["labels"].unique().tolist()
label

['O',
 'B-GEO',
 'B-GPE',
 'B-PER',
 'I-GEO',
 'B-ORG',
 'I-ORG',
 'B-TIM',
 'B-ART',
 'I-ART',
 'I-PER',
 'I-GPE',
 'I-TIM',
 'B-NAT',
 'B-EVE',
 'I-EVE',
 'I-NAT']

In [19]:
# NER arguments for model training
args = NERArgs()
args.num_train_epochs = 2
args.learning_rate = 1e-4
args.overwrite_output_dir = True
args.train_batch_size = 32
args.eval_batch_size = 32


In [20]:
# Initialize a BERT model with the specified arguments and labels
model = NERModel('bert', 'bert-base-cased', labels=label, args=args)



Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForTokenClassification: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-cas


In [21]:
model.train_model(train_data, eval_data=test_data, acc=accuracy_score)


(1448, 0.1799740410739175)

In [22]:
result, model_outputs, preds_list = model.eval_model(test_data)


In [23]:
result

{'eval_loss': 0.18141045550111334,
 'precision': 0.8183743518586138,
 'recall': 0.7549178119105362,
 'f1_score': 0.7853663664715982}
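
The reported `f1_score` is the harmonic mean of precision and recall, which can be checked directly from the numbers above. As far as I know, Simple Transformers computes these metrics with the `seqeval` package at the entity level (a predicted entity counts only if both its type and its full span match), which makes them stricter than per-token accuracy; treat that as an assumption to verify against the library documentation.

```python
precision = 0.8183743518586138
recall = 0.7549178119105362

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f1)
```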

In [28]:
prediction, model_output = model.predict(["Bengaluru is the capital of India's southern Karnataka state."])


In [29]:
prediction

[[{'Bengaluru': 'B-GEO'},
  {'is': 'O'},
  {'the': 'O'},
  {'capital': 'O'},
  {'of': 'O'},
  {"India's": 'B-GEO'},
  {'southern': 'O'},
  {'Karnataka': 'B-GEO'},
  {'state.': 'O'}]]
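
In the output above, `India's` and `state.` are tagged as single tokens because the input string is split on whitespace, whereas the training data keeps clitics and punctuation as separate rows (for example `'s` and `.`). One way to narrow that gap is to pre-tokenize the text before predicting. The regex below is a rough sketch, not the tokenizer used to build the dataset; `simple_tokenize` is a hypothetical helper.

```python
import re

# Match, in order: the clitic 's, a word (optionally hyphenated), or a single
# punctuation character.
TOKEN_PATTERN = re.compile(r"'s\b|\w+(?:-\w+)*|[^\w\s]")


def simple_tokenize(text):
    """Split text into word and punctuation tokens (hypothetical helper)."""
    return TOKEN_PATTERN.findall(text)


sentence = "Bengaluru is the capital of India's southern Karnataka state."
tokens = simple_tokenize(sentence)
print(tokens)

# With the trained model from this notebook, the token list could then be
# passed directly (assumes NERModel.predict accepts split_on_space=False,
# which should be checked against the installed Simple Transformers version):
#   prediction, model_output = model.predict([tokens], split_on_space=False)
```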

In [31]:
model_outputs

[[[[-1.951,
    7.25,
    2.73,
    0.4739,
    0.6235,
    2.932,
    0.7095,
    -0.6973,
    -1.131,
    -2.244,
    -1.934,
    -2.49,
    -0.8564,
    -2.592,
    -1.614,
    -2.375,
    -2.793]],
  [[9.98,
    -0.5225,
    -2.27,
    -1.34,
    -0.5522,
    -0.8857,
    1.526,
    -0.2722,
    -1.689,
    -1.552,
    -0.61,
    -1.791,
    0.735,
    -1.866,
    -2.346,
    -1.823,
    -1.728]]],
 [[[10.5,
    -0.2125,
    -2.35,
    -1.3545,
    -1.074,
    -0.7583,
    1.731,
    0.4468,
    -1.643,
    -1.64,
    -0.857,
    -2.408,
    0.306,
    -2.031,
    -2.234,
    -2.27,
    -2.09]],
  [[8.695,
    -0.05872,
    -2.312,
    -1.956,
    0.4377,
    0.648,
    2.705,
    -0.1885,
    -1.91,
    -1.834,
    -0.862,
    -2.623,
    0.812,
    -3.1,
    -2.97,
    -2.549,
    -2.734]],
  [[11.09,
    0.00897,
    -1.546,
    -0.9814,
    -1.62,
    -0.3552,
    0.2822,
    0.0621,
    -1.282,
    -1.55,
    -0.9326,
    -2.057,
    -0.3806,
    -1.899,
    -1.81,
    -2.531,
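
Each innermost list in `model_outputs` holds one raw score (logit) per label, in the same order as the `label` list passed to `NERModel`. The predicted tag is the label with the highest score. The sketch below applies that rule to the first two score vectors printed above (values copied from the output); the softmax step is optional and only converts scores into probabilities.

```python
import math

labels = ["O", "B-GEO", "B-GPE", "B-PER", "I-GEO", "B-ORG", "I-ORG", "B-TIM",
          "B-ART", "I-ART", "I-PER", "I-GPE", "I-TIM", "B-NAT", "B-EVE",
          "I-EVE", "I-NAT"]

# First two logit vectors from the model_outputs printout
logits = [
    [-1.951, 7.25, 2.73, 0.4739, 0.6235, 2.932, 0.7095, -0.6973, -1.131,
     -2.244, -1.934, -2.49, -0.8564, -2.592, -1.614, -2.375, -2.793],
    [9.98, -0.5225, -2.27, -1.34, -0.5522, -0.8857, 1.526, -0.2722, -1.689,
     -1.552, -0.61, -1.791, 0.735, -1.866, -2.346, -1.823, -1.728],
]


def softmax(scores):
    """Convert logits to probabilities, shifting by the max for stability."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


predicted = []
for scores in logits:
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    predicted.append(labels[best])

print(predicted)
```

The result can be compared with the first entry of `preds_list`, which covers the same two tokens.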

In [30]:
preds_list

[['B-GEO', 'O'],
 ['O', 'O', 'O', 'B-ORG', 'O'],
 ['O', 'B-TIM', 'O', 'O'],
 ['O', 'O'],
 ['O', 'B-TIM', 'I-PER', 'O', 'O', 'O', 'B-GPE', 'O', 'O', 'O', 'O'],
 ['O', 'O', 'O', 'O', 'B-GEO', 'O', 'B-GEO', 'I-GEO', 'O', 'O'],
 ['O', 'O'],
 ['O', 'O', 'O', 'O', 'O'],
 ['O', 'O', 'O', 'O', 'O', 'O', 'O'],
 ['O', 'O', 'O', 'O', 'B-GEO', 'O', 'O', 'I-PER'],
 ['O', 'O', 'O', 'O', 'O'],
 ['O', 'O', 'O', 'O', 'O', 'O', 'B-PER'],
 ['I-PER', 'B-GEO', 'O', 'O', 'O', 'O', 'O', 'O'],
 ['O', 'O'],
 ['O', 'B-TIM', 'O', 'O', 'O', 'O', 'B-PER', 'O'],
 ['B-PER', 'O', 'O', 'O'],
 ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O'],
 ['O', 'O', 'B-PER'],
 ['O', 'O', 'O'],
 ['O', 'O'],
 ['O', 'O', 'O', 'O'],
 ['O', 'O', 'O', 'B-PER', 'O'],
 ['O', 'O', 'B-TIM'],
 ['O', 'O', 'O', 'B-GEO', 'O'],
 ['O', 'O', 'O', 'O', 'O', 'O'],
 ['O', 'O'],
 ['O', 'O'],
 ['O', 'O'],
 ['B-ORG', 'O', 'O'],
 ['O', 'O', 'O', 'O', 'O'],
 ['O', 'O', 'O', 'O', 'O'],
 ['O', 'O'],
 ['O', 'O', 'O', 'O', 'O'],
 ['O', 'B-GEO', 'B-ORG', 'O'