<a href="https://colab.research.google.com/github/muhanangmahrub/named-entity-recognizer-aps/blob/main/training_ner_using_bert.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Read the dataset into a DataFrame

In [None]:
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/data/NER dataset.csv', encoding='latin1')

Check a sample of the dataset

In [None]:
df.head()

Unnamed: 0,Sentence #,Word,POS,Tag
0,Sentence: 1,Thousands,NNS,O
1,,of,IN,O
2,,demonstrators,NNS,O
3,,have,VBP,O
4,,marched,VBN,O


Show the unique values in the POS column

In [None]:
df.POS.unique()

array(['NNS', 'IN', 'VBP', 'VBN', 'NNP', 'TO', 'VB', 'DT', 'NN', 'CC',
       'JJ', '.', 'VBD', 'WP', '``', 'CD', 'PRP', 'VBZ', 'POS', 'VBG',
       'RB', ',', 'WRB', 'PRP$', 'MD', 'WDT', 'JJR', ':', 'JJS', 'WP$',
       'RP', 'PDT', 'NNPS', 'EX', 'RBS', 'LRB', 'RRB', '$', 'RBR', ';',
       'UH', 'FW'], dtype=object)

Count how often each tag occurs in the entire dataset

In [None]:
df.Tag.value_counts()

Tag,count
O,887908
B-geo,37644
B-tim,20333
B-org,20143
I-per,17251
B-per,16990
I-org,16784
B-gpe,15870
I-geo,7414
I-tim,6528


Convert the dataset into lists of sentences, POS tags (`posses`), and labels

In [None]:
def convert_ner_format(df):
    sentences = []
    labels = []
    posses = []

    current_sentence = []
    current_labels = []
    current_posses = []

    for _, row in df.iterrows():
        sentence_marker = row["Sentence #"]

        if isinstance(sentence_marker, str) and sentence_marker.startswith("Sentence:"):
            # A new sentence starts, save the previous one if not empty
            if current_sentence:
                sentences.append(current_sentence)
                labels.append(current_labels)
                posses.append(current_posses)

            # Reset for the new sentence
            current_sentence = []
            current_labels = []
            current_posses = []

        # Add words and labels to the current sentence
        current_sentence.append(row["Word"])
        current_labels.append(row["Tag"])
        current_posses.append(row["POS"])

    # Append the last sentence if not empty
    if current_sentence:
        sentences.append(current_sentence)
        labels.append(current_labels)
        posses.append(current_posses)

    return sentences, labels, posses

sentences, labels, posses = convert_ner_format(df)

Check one sample from the `sentences`, `labels`, and `posses` lists

In [None]:
print(' '.join(sentences[5000]))
print(labels[5000])
print(posses[5000])

Separately , officials say a policeman was killed in Mosul when he tried to move a decapitated body that was rigged with explosives .
['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-geo', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
['RB', ',', 'NNS', 'VBP', 'DT', 'NN', 'VBD', 'VBN', 'IN', 'NNP', 'WRB', 'PRP', 'VBD', 'TO', 'VB', 'DT', 'JJ', 'NN', 'WDT', 'VBD', 'VBN', 'IN', 'NNS', '.']


Create the list of tags

In [None]:
tags_vals = list(set(df["Tag"].values))
tags_vals

['B-gpe',
 'I-per',
 'I-geo',
 'I-eve',
 'O',
 'I-gpe',
 'I-tim',
 'I-art',
 'B-eve',
 'B-tim',
 'B-nat',
 'B-org',
 'B-art',
 'B-geo',
 'I-nat',
 'I-org',
 'B-per']

In [None]:
tags_vals.append('X')
tags_vals.append('[CLS]')
tags_vals.append('[SEP]')
tags_vals = set(tags_vals)
tags_vals

{'B-art',
 'B-eve',
 'B-geo',
 'B-gpe',
 'B-nat',
 'B-org',
 'B-per',
 'B-tim',
 'I-art',
 'I-eve',
 'I-geo',
 'I-gpe',
 'I-nat',
 'I-org',
 'I-per',
 'I-tim',
 'O',
 'X',
 '[CLS]',
 '[SEP]'}

Create the tag-to-index dictionary (`tag2idx`) and its inverse (`tag2name`)

In [None]:
tag2idx = {t: i for i, t in enumerate(tags_vals)}
tag2idx

{'[SEP]': 0,
 'I-nat': 1,
 'B-gpe': 2,
 'I-tim': 3,
 'B-eve': 4,
 'B-nat': 5,
 'I-org': 6,
 'I-geo': 7,
 'I-eve': 8,
 'O': 9,
 'I-gpe': 10,
 '[CLS]': 11,
 'B-art': 12,
 'I-art': 13,
 'B-per': 14,
 'I-per': 15,
 'B-tim': 16,
 'B-org': 17,
 'B-geo': 18,
 'X': 19}

In [None]:
tag2name={tag2idx[key] : key for key in tag2idx.keys()}
tag2name

{0: '[SEP]',
 1: 'I-nat',
 2: 'B-gpe',
 3: 'I-tim',
 4: 'B-eve',
 5: 'B-nat',
 6: 'I-org',
 7: 'I-geo',
 8: 'I-eve',
 9: 'O',
 10: 'I-gpe',
 11: '[CLS]',
 12: 'B-art',
 13: 'I-art',
 14: 'B-per',
 15: 'I-per',
 16: 'B-tim',
 17: 'B-org',
 18: 'B-geo',
 19: 'X'}
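
Because `tags_vals` is built from a Python `set`, the order of the tags, and therefore the integer assigned to each tag, can change between interpreter sessions. In the inference output near the end of this notebook, `O` is printed as index 14 while the mapping above assigns it 9, which would be consistent with the mapping having been rebuilt in a different session. The sketch below shows one possible way to make the mapping reproducible by sorting the tags and storing it as JSON. The tag list is shortened for illustration, and the file name `tag2idx.json` is an assumption, not something the notebook uses.

```python
import json
import os
import tempfile

# Shortened, illustrative tag list (the notebook has 17 dataset tags)
raw_tags = {"O", "B-geo", "I-geo", "B-per", "I-per"}
special_tags = ["X", "[CLS]", "[SEP]"]

# Sorting makes the tag -> index assignment identical in every session
tag_list = sorted(raw_tags) + special_tags
tag2idx_sorted = {tag: idx for idx, tag in enumerate(tag_list)}
idx2tag_sorted = {idx: tag for tag, idx in tag2idx_sorted.items()}

# Persist the mapping so inference code can reload exactly the same indices.
# "tag2idx.json" is a hypothetical file name.
out_dir = tempfile.mkdtemp()
mapping_path = os.path.join(out_dir, "tag2idx.json")
with open(mapping_path, "w") as f:
    json.dump(tag2idx_sorted, f, indent=2)

with open(mapping_path) as f:
    reloaded = json.load(f)

print(tag2idx_sorted)
print(reloaded == tag2idx_sorted)
```

Saving the mapping next to the model weights means the index-to-tag lookup used at inference time does not depend on how the set happened to be ordered when the model was trained.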

Prepare the device for training

In [None]:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count()

In [None]:
n_gpu

1

Load the BERT tokenizer from `transformers`

In [None]:
from transformers import BertTokenizer

max_len = 45
# Load the tokenizer from a pretrained model name or a local path
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')


Tokenize the texts and create word-piece labels

In [None]:
tokenized_texts = []
word_piece_labels = []
i_inc = 0
for word_list, label in zip(sentences, labels):
    temp_label = []
    temp_token = []

    # Add [CLS] at the front
    temp_label.append('[CLS]')
    temp_token.append('[CLS]')

    for word, lab in zip(word_list, label):
        token_list = tokenizer.tokenize(str(word))
        for m, token in enumerate(token_list):
            temp_token.append(token)
            if m == 0:
                # The first word piece keeps the word's tag
                temp_label.append(lab)
            else:
                # Continuation pieces get the placeholder tag 'X'
                temp_label.append('X')

    # Add [SEP] at the end
    temp_label.append('[SEP]')
    temp_token.append('[SEP]')

    tokenized_texts.append(temp_token)
    word_piece_labels.append(temp_label)

    # Print the first five examples as a sanity check
    if 5 > i_inc:
        print("No.%d,len:%d"%(i_inc,len(temp_token)))
        print("texts:%s"%(" ".join(temp_token)))
        print("No.%d,len:%d"%(i_inc,len(temp_label)))
        print("labels:%s"%(" ".join(temp_label)))
    i_inc += 1

No.0,len:26
texts:[CLS] thousands of demonstrators have marched through london to protest the war in iraq and demand the withdrawal of british troops from that country . [SEP]
No.0,len:26
labels:[CLS] O O O O O O B-geo O O O O O B-geo O O O O O B-gpe O O O O O [SEP]
No.1,len:33
texts:[CLS] families of soldiers killed in the conflict joined the protesters who carried banners with such slogan ##s as " bush number one terrorist " and " stop the bombings . " [SEP]
No.1,len:33
labels:[CLS] O O O O O O O O O O O O O O O O X O O B-per O O O O O O O O O O O [SEP]
No.2,len:16
texts:[CLS] they marched from the houses of parliament to a rally in hyde park . [SEP]
No.2,len:16
labels:[CLS] O O O O O O O O O O O B-geo I-geo O [SEP]
No.3,len:24
texts:[CLS] police put the number of marche ##rs at 10 , 000 while organizers claimed it was 1 , 00 , 000 . [SEP]
No.3,len:24
labels:[CLS] O O O O O O X O O X X O O O O O O X X X X O [SEP]
No.4,len:28
texts:[CLS] the protest comes on the eve of the annual conf
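
The labelling rule used above (the first word piece keeps the word's tag, continuation pieces get `X`) can be isolated in a small function. This is a minimal sketch: `toy_tokenize` is a hypothetical stand-in for `tokenizer.tokenize` so that the example runs without downloading a vocabulary, and `align_labels` is an illustrative helper, not part of the notebook.

```python
def toy_tokenize(word):
    """Hypothetical stand-in for tokenizer.tokenize: splits off a trailing 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s"):
        return [word[:-1], "##s"]
    return [word]


def align_labels(words, word_labels, tokenize):
    """Expand word-level labels to word pieces, marking continuations with 'X'."""
    tokens, piece_labels = ["[CLS]"], ["[CLS]"]
    for word, label in zip(words, word_labels):
        pieces = tokenize(word)
        if not pieces:
            # Skip words for which the tokenizer produces no pieces
            continue
        tokens.extend(pieces)
        # First piece keeps the word's label; the rest are continuation pieces
        piece_labels.extend([label] + ["X"] * (len(pieces) - 1))
    tokens.append("[SEP]")
    piece_labels.append("[SEP]")
    return tokens, piece_labels


tokens, piece_labels = align_labels(["such", "slogans", "as"], ["O", "O", "O"], toy_tokenize)
print(tokens)
print(piece_labels)
```

The two returned lists always have the same length, which is the property the later padding and evaluation steps rely on.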

`pad_sequences` gives every input sequence the same length (`max_len`, here 45) by right-padding shorter sequences with 0 and truncating longer ones. Because `truncating="post"` cuts from the end, sentences longer than 45 word pieces lose their tail, including the closing `[SEP]` token.

In [None]:
from keras.preprocessing.sequence import pad_sequences

input_ids = pad_sequences([tokenizer.convert_tokens_to_ids(txt) for txt in tokenized_texts],
                          maxlen=max_len, dtype="long", truncating="post", padding="post")
print((input_ids[0]))

[  101  5190  1997 28337  2031  9847  2083  2414  2000  6186  1996  2162
  1999  5712  1998  5157  1996 10534  1997  2329  3629  2013  2008  2406
  1012   102     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0]
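
To make the padding and truncation behaviour concrete, here is a minimal pure-Python sketch of what happens to one sequence with `padding="post"` and `truncating="post"`. `pad_or_truncate` is a hypothetical helper written for illustration; it is not the Keras implementation.

```python
def pad_or_truncate(ids, max_len, pad_value=0):
    """Cut a sequence to max_len from the end, or right-pad it with pad_value."""
    clipped = list(ids[:max_len])
    return clipped + [pad_value] * (max_len - len(clipped))


short_seq = [101, 5190, 1997, 102]
long_seq = [101, 1, 2, 3, 4, 5, 6, 102]

print(pad_or_truncate(short_seq, 6))  # padded with zeros on the right
print(pad_or_truncate(long_seq, 6))   # tail removed, so the final 102 ([SEP]) is lost
```

The second call shows why a small `max_len` deserves a check against the sentence-length distribution: truncated inputs no longer end with `[SEP]`.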


This cell converts the word-piece labels into integer IDs via `tag2idx` and pads or truncates them to `max_len`, so that the label array has the same shape as `input_ids`. Padded positions are filled with the index of the `O` tag.

In [None]:
tags = pad_sequences([[tag2idx.get(l) for l in lab] for lab in word_piece_labels],
                     maxlen=max_len, value=tag2idx["O"], padding="post",
                     dtype="long", truncating="post")
print((tags[0]))

[11  9  9  9  9  9  9 18  9  9  9  9  9 18  9  9  9  9  9  2  9  9  9  9
  9  0  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9]


The attention mask tells BERT which positions hold real tokens (1) and which are padding (0), so that the padding is ignored during self-attention. Here a position counts as real when its input ID is greater than 0, which works because the padding value is 0.

In [None]:
attention_masks = [[int(i>0) for i in ii] for ii in input_ids]
attention_masks[0];
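
The same mask can be written against an explicit padding ID instead of relying on `> 0`. The sketch below is a small pure-Python illustration; `build_attention_mask` and the `PAD_ID` constant are names introduced here, and the value 0 is taken from the padded output shown earlier.

```python
PAD_ID = 0  # padding value used by pad_sequences above


def build_attention_mask(batch_ids, pad_id=PAD_ID):
    """Return 1 for real tokens and 0 for padding, one list per sequence."""
    return [[int(tok != pad_id) for tok in seq] for seq in batch_ids]


batch = [[101, 5190, 102, 0, 0], [101, 1996, 6186, 1012, 102]]
masks = build_attention_mask(batch)
print(masks)
print([sum(m) for m in masks])  # number of real tokens per sequence
```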

Segment IDs (token type IDs) mark which sentence each token belongs to. Every input here is a single sentence, so all segment IDs are 0. They are created for completeness but are not passed to the model later in this notebook.

In [None]:
segment_ids = [[0] * len(input_id) for input_id in input_ids]
segment_ids[1]

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Split the data into training and validation sets

In [None]:
from sklearn.model_selection import train_test_split

(tr_inputs, val_inputs, tr_tags, val_tags,
 tr_masks, val_masks, tr_segs, val_segs) = train_test_split(
    input_ids, tags, attention_masks, segment_ids,
    random_state=4, test_size=0.3)

In [None]:
tr_inputs = torch.tensor(tr_inputs)
val_inputs = torch.tensor(val_inputs)
tr_tags = torch.tensor(tr_tags)
val_tags = torch.tensor(val_tags)
tr_masks = torch.tensor(tr_masks)
val_masks = torch.tensor(val_masks)
tr_segs = torch.tensor(tr_segs)
val_segs = torch.tensor(val_segs)

Define the batch size

In [None]:
batch_num = 32

Create the training and validation dataloaders

In [None]:
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler

# Use only input IDs, attention masks, and tags; segment IDs are not passed to the model
train_data = TensorDataset(tr_inputs, tr_masks, tr_tags)
train_sampler = RandomSampler(train_data)
# drop_last=True discards the final incomplete batch, so every training batch has batch_num examples
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_num, drop_last=True)

valid_data = TensorDataset(val_inputs, val_masks, val_tags)
valid_sampler = SequentialSampler(valid_data)
valid_dataloader = DataLoader(valid_data, sampler=valid_sampler, batch_size=batch_num)

This cell loads a pre-trained BERT model with a token-classification head (suitable for NER) and sets the number of output labels to the size of `tag2idx`.

In [None]:
from transformers import BertForTokenClassification

model = BertForTokenClassification.from_pretrained("bert-base-uncased",num_labels=len(tag2idx))


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
model;

In [None]:
model.to(device);  # Uses the GPU when available and falls back to the CPU otherwise

In [None]:
if n_gpu >1:
    model = torch.nn.DataParallel(model)

In [None]:
epochs = 5
max_grad_norm = 1.0

In [None]:
import math

num_train_optimization_steps = int( math.ceil(len(tr_inputs) / batch_num) / 1) * epochs
num_train_optimization_steps

5250
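
The step count above uses `math.ceil`, which counts the final partial batch. The training dataloader in this notebook is created with `drop_last=True`, so that partial batch is discarded and the loop performs slightly fewer updates than the printed number. A quick check with the figures reported by the training cell (33571 examples, batch size 32, 5 epochs):

```python
import math

num_examples = 33571  # "Num examples" printed by the training cell
batch_size = 32
epochs = 5

# Counting the last, incomplete batch (what the cell above computes)
steps_with_partial_batch = math.ceil(num_examples / batch_size) * epochs
# Dropping the last, incomplete batch (what drop_last=True does)
steps_with_drop_last = (num_examples // batch_size) * epochs

print(steps_with_partial_batch)  # 5250
print(steps_with_drop_last)      # 5245
```

The difference is harmless here because the value is only printed, but it would matter if the number were used to configure a learning-rate scheduler.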

In [None]:
FULL_FINETUNING = False

This cell sets up the optimizer. With `FULL_FINETUNING = True`, all model parameters are optimized, with weight decay disabled for biases and layer-normalization parameters. Otherwise, as in this run, only the classifier head's parameters are passed to the optimizer.

In [None]:
# transformers.AdamW is deprecated in favor of the PyTorch implementation
from torch.optim import AdamW

if FULL_FINETUNING:
    # Fine-tune all layers; skip weight decay for biases and LayerNorm weights
    param_optimizer = list(model.named_parameters())
    no_decay = ['bias', 'LayerNorm.weight']
    optimizer_grouped_parameters = [
        {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
         'weight_decay': 0.01},
        {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
         'weight_decay': 0.0}
    ]
else:
    # Only optimize the classifier head
    param_optimizer = list(model.classifier.named_parameters())
    optimizer_grouped_parameters = [{"params": [p for n, p in param_optimizer]}]
# weight_decay=0.0 matches the default of the deprecated transformers.AdamW;
# the per-group values above override it when FULL_FINETUNING is True
optimizer = AdamW(optimizer_grouped_parameters, lr=3e-5, weight_decay=0.0)
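
The weight-decay grouping in the full fine-tuning branch works purely by substring matching on parameter names. The sketch below applies that rule to a handful of names written out by hand. They follow the naming I would expect from the Hugging Face BERT implementation, but treat them as an assumption and check `model.named_parameters()` on the real model. `split_by_decay` is a hypothetical helper.

```python
def split_by_decay(param_names, no_decay):
    """Partition names the way the optimizer cell does: a name is exempt from
    weight decay if any no_decay substring occurs in it."""
    decay = [n for n in param_names if not any(nd in n for nd in no_decay)]
    exempt = [n for n in param_names if any(nd in n for nd in no_decay)]
    return decay, exempt


# Hand-written example names (assumed, not read from a real model)
names = [
    "bert.encoder.layer.0.attention.self.query.weight",
    "bert.encoder.layer.0.attention.self.query.bias",
    "bert.encoder.layer.0.attention.output.LayerNorm.weight",
    "bert.encoder.layer.0.attention.output.LayerNorm.bias",
    "classifier.weight",
    "classifier.bias",
]

decay, exempt = split_by_decay(names, ["bias", "LayerNorm.weight"])
print("decay:", decay)
print("exempt:", exempt)

# A list such as ['bias', 'gamma', 'beta'] only helps if the names contain
# 'gamma' or 'beta'; with the names above, LayerNorm.weight is not matched.
decay_old, exempt_old = split_by_decay(names, ["bias", "gamma", "beta"])
print("LayerNorm.weight exempt with gamma/beta list:", names[2] in exempt_old)
```

Printing the two groups for the real model is a cheap way to confirm that the substrings match what is intended before starting a long training run.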



In [None]:
model.train();

Train the model

In [None]:
from tqdm import tqdm,trange

print("***** Running training *****")
print("  Num examples = %d"%(len(tr_inputs)))
print("  Batch size = %d"%(batch_num))
print("  Num steps = %d"%(num_train_optimization_steps))
for _ in trange(epochs,desc="Epoch"):
    tr_loss = 0
    nb_tr_examples, nb_tr_steps = 0, 0
    for step, batch in enumerate(train_dataloader):
        # add batch to gpu
        batch = tuple(t.to(device) for t in batch)
        b_input_ids, b_input_mask, b_labels = batch

        # forward pass
        outputs = model(b_input_ids, token_type_ids=None,
        attention_mask=b_input_mask, labels=b_labels)
        loss, scores = outputs[:2]
        if n_gpu>1:
            # When multi gpu, average it
            loss = loss.mean()

        # backward pass
        loss.backward()

        # track train loss
        tr_loss += loss.item()
        nb_tr_examples += b_input_ids.size(0)
        nb_tr_steps += 1

        # gradient clipping
        torch.nn.utils.clip_grad_norm_(parameters=model.parameters(), max_norm=max_grad_norm)

        # update parameters
        optimizer.step()
        optimizer.zero_grad()

    # print train loss per epoch
    print("Train loss: {}".format(tr_loss/nb_tr_steps))

***** Running training *****
  Num examples = 33571
  Batch size = 32
  Num steps = 5250


Epoch:  20%|██        | 1/5 [04:43<18:55, 283.85s/it]

Train loss: 0.1078703172391261


Epoch:  40%|████      | 2/5 [09:31<14:18, 286.24s/it]

Train loss: 0.04841629621191918


Epoch:  60%|██████    | 3/5 [14:20<09:34, 287.28s/it]

Train loss: 0.03755891674133399


Epoch:  80%|████████  | 4/5 [19:09<04:47, 287.86s/it]

Train loss: 0.028951950824941137


Epoch: 100%|██████████| 5/5 [23:57<00:00, 287.54s/it]

Train loss: 0.02169505424453061





This cell defines the directory where the trained model will be saved and creates it if it does not already exist.

In [None]:
import os

bert_out_address = 'models/bert_out_model/en09'
# Create the output directory if it does not exist
os.makedirs(bert_out_address, exist_ok=True)

In [None]:
model_to_save = model.module if hasattr(model, 'module') else model  # Unwrap DataParallel so only the model itself is saved

In [None]:
output_model_file = os.path.join(bert_out_address, "pytorch_model.bin")
output_config_file = os.path.join(bert_out_address, "config.json")

# Save model into file
torch.save(model_to_save.state_dict(), output_model_file)
model_to_save.config.to_json_file(output_config_file)
tokenizer.save_vocabulary(bert_out_address)

('models/bert_out_model/en09/vocab.txt',)

In [None]:
model = BertForTokenClassification.from_pretrained(bert_out_address,num_labels=len(tag2idx))

model.to(device);  # Move the model to the selected device

if n_gpu >1:
    model = torch.nn.DataParallel(model)

In [None]:
model.eval();

This cell runs the trained model on the validation data and compares predictions with the true labels at every non-padding position whose true label is not `X`, `[CLS]`, or `[SEP]`. It then reports token-level weighted F1, accuracy, and a per-tag classification report, and saves the results to a file.

In [None]:
from sklearn.metrics import f1_score, accuracy_score, classification_report
import torch.nn.functional as F

y_true = []
y_pred = []
# Word-piece continuations and special tokens are not scored
ignored_labels = {"X", "[CLS]", "[SEP]"}

print("***** Running evaluation *****")
print("  Num examples = {}".format(len(val_inputs)))
print("  Batch size = {}".format(batch_num))
for step, batch in enumerate(valid_dataloader):
    batch = tuple(t.to(device) for t in batch)
    input_ids, input_mask, label_ids = batch

    with torch.no_grad():
        outputs = model(input_ids, token_type_ids=None,
                        attention_mask=input_mask)
        # No labels are passed, so the first element of outputs is the logits
        logits = outputs[0]

    # Predicted tag index for every token
    logits = torch.argmax(F.log_softmax(logits, dim=2), dim=2)
    logits = logits.detach().cpu().numpy()

    # True tag indices
    label_ids = label_ids.to('cpu').numpy()

    # Positions with mask 0 are padding and are not scored
    input_mask = input_mask.to('cpu').numpy()

    for i, mask in enumerate(input_mask):
        true_tags = []
        pred_tags = []

        for j, m in enumerate(mask):
            if not m:
                # First padding position reached; the rest is padding too
                break
            true_tag = tag2name[label_ids[i][j]]
            if true_tag not in ignored_labels:
                true_tags.append(true_tag)
                pred_tags.append(tag2name[logits[i][j]])

        y_true.append(true_tags)
        y_pred.append(pred_tags)

# Flatten the per-sentence lists for the token-level scikit-learn metrics
y_true_flat = [item for sublist in y_true for item in sublist]
y_pred_flat = [item for sublist in y_pred for item in sublist]

f1 = f1_score(y_true_flat, y_pred_flat, average='weighted')
accuracy = accuracy_score(y_true_flat, y_pred_flat)
print("f1 score: %f" % f1)
print("Accuracy score: %f" % accuracy)

# Per-tag precision, recall, and F1
report = classification_report(y_true_flat, y_pred_flat, digits=4)

# Save the report to a file
output_eval_file = os.path.join(bert_out_address, "eval_results.txt")
with open(output_eval_file, "w") as writer:
    print("***** Eval results *****")
    print("\n%s"%(report))
    print("f1 score: %f" % f1)
    print("Accuracy score: %f" % accuracy)

    writer.write("f1 score:\n")
    writer.write(str(f1))
    writer.write("\n\nAccuracy score:\n")
    writer.write(str(accuracy))
    writer.write("\n\n")
    writer.write(report)

***** Running evaluation *****
  Num examples = 14388
  Batch size = 32

f1 score: 0.967120
Accuracy score: 0.967487




***** Eval results *****

              precision    recall  f1-score   support

       B-art     0.4333    0.1844    0.2587       141
       B-eve     0.5333    0.4000    0.4571       100
       B-geo     0.8633    0.8969    0.8798     11182
       B-gpe     0.9606    0.9396    0.9500      4750
       B-nat     0.5172    0.2419    0.3297        62
       B-org     0.7720    0.7014    0.7350      6064
       B-per     0.8593    0.8381    0.8486      5065
       B-tim     0.8747    0.9164    0.8951      6089
       I-art     0.5000    0.0568    0.1020        88
       I-eve     0.2667    0.1333    0.1778        90
       I-geo     0.7929    0.7907    0.7918      2193
       I-gpe     0.9375    0.6923    0.7965        65
       I-nat     0.8000    0.1739    0.2857        23
       I-org     0.7683    0.7258    0.7464      5054
       I-per     0.8412    0.9188    0.8783      5150
       I-tim     0.7590    0.8297    0.7928      1985
           O     0.9906    0.9903    0.9904    264666
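
The report above is computed per token. For NER it is also common to score whole entities, where a prediction only counts if both the span boundaries and the type match. A minimal sketch of that idea under the BIO scheme follows. `extract_entities` and `entity_f1` are hypothetical helpers written for illustration; they are not part of the notebook or of scikit-learn, and a maintained library such as `seqeval` would be the usual choice in practice.

```python
def extract_entities(tags):
    """Return a set of (start, end, type) spans from a BIO tag sequence."""
    entities = set()
    start, ent_type = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and ent_type == tag[2:]
        if start is not None and not continues:
            entities.add((start, i - 1, ent_type))
            start, ent_type = None, None
        if tag.startswith("B-"):
            start, ent_type = i, tag[2:]
    return entities


def entity_f1(true_seqs, pred_seqs):
    """Micro-averaged entity-level precision, recall, and F1 over sentences."""
    n_true = n_pred = n_correct = 0
    for true_tags, pred_tags in zip(true_seqs, pred_seqs):
        gold = extract_entities(true_tags)
        pred = extract_entities(pred_tags)
        n_true += len(gold)
        n_pred += len(pred)
        n_correct += len(gold & pred)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_true if n_true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["O", "B-geo", "I-geo", "O", "B-per"]]
pred = [["O", "B-geo", "O", "O", "B-per"]]
print(entity_f1(gold, pred))
```

In this toy example four of five tokens are correct, yet only one of the two entities is recovered exactly, which is why entity-level scores are typically lower than token-level ones.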
 

This cell prepares a new sentence for inference: it splits the sentence into words with NLTK, tokenizes each word into word pieces, converts the tokens into the numerical IDs that BERT expects, and pads the sequence to `max_len`.

In [None]:
import nltk
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')

# Use a new name so the dataset's `sentences` list is not overwritten
test_sentence = "The Israeli army has killed a Palestinian youth in the northern Gaza Strip and wounded at least three other people ."
word_tokens = nltk.word_tokenize(test_sentence)
pos_tags = nltk.pos_tag(word_tokens)  # POS tags are computed but not fed to the model

tokenized_texts = []
temp_token = ['[CLS]']  # [CLS] goes at the front
for word, _pos in pos_tags:
    temp_token.extend(tokenizer.tokenize(word))
temp_token.append('[SEP]')  # [SEP] goes at the end
tokenized_texts.append(temp_token)
print("texts:%s"%(" ".join(temp_token)))

input_ids = pad_sequences([tokenizer.convert_tokens_to_ids(txt) for txt in tokenized_texts],
                          maxlen=max_len, dtype="long", truncating="post", padding="post")
print(input_ids[0])

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.


texts:[CLS] the israeli army has killed a palestinian youth in the northern gaza strip and wounded at least three other people . [SEP]
[  101  1996  5611  2390  2038  2730  1037  9302  3360  1999  1996  2642
 14474  6167  1998  5303  2012  2560  2093  2060  2111  1012   102     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0]


Use the trained BERT model to make predictions (inference) on the new sentence.

In [None]:
# Run inference with the trained model

attention_masks = [[int(i>0) for i in ii] for ii in input_ids]
input_ids = torch.tensor(input_ids).to(device)
attention_masks = torch.tensor(attention_masks).to(device)

with torch.no_grad():
    outputs = model(input_ids, token_type_ids=None,
                    attention_mask=attention_masks)
    logits = outputs[0]

logits = torch.argmax(F.log_softmax(logits,dim=2),dim=2)
logits = logits.detach().cpu().numpy()

print(logits)

predicted_labels = [tag2name[logit] for logit in logits[0]]
predicted_labels


[[ 6 14 17 14 14 14 14 17 14 14 14 14  2 12 14 14 14 14 14 14 14 14 15 14
  14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14]]


['[CLS]',
 'O',
 'B-gpe',
 'O',
 'O',
 'O',
 'O',
 'B-gpe',
 'O',
 'O',
 'O',
 'O',
 'B-geo',
 'I-geo',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 '[SEP]',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O']
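
The predicted list above still contains `[CLS]`, `[SEP]`, and the padding positions. A sketch of turning it into word-level output is shown below, using the tokens and predicted labels printed above. `merge_word_pieces` is a hypothetical helper; it keeps the label of the first word piece of each word, mirroring how the training labels were built.

```python
def merge_word_pieces(tokens, labels):
    """Pair each word with the label predicted for its first word piece.

    [CLS] and [SEP] are skipped; zip() stops at the shorter input,
    which drops the labels predicted for padding positions.
    """
    words, word_labels = [], []
    for token, label in zip(tokens, labels):
        if token in ("[CLS]", "[SEP]"):
            continue
        if token.startswith("##") and words:
            words[-1] += token[2:]  # glue a continuation piece onto the previous word
        else:
            words.append(token)
            word_labels.append(label)
    return list(zip(words, word_labels))


# Tokens and labels copied from the outputs above (45 labels, 23 real tokens)
tokens = ("[CLS] the israeli army has killed a palestinian youth in the northern "
          "gaza strip and wounded at least three other people . [SEP]").split()
labels = (["[CLS]", "O", "B-gpe"] + ["O"] * 4 + ["B-gpe"] + ["O"] * 4
          + ["B-geo", "I-geo"] + ["O"] * 8 + ["[SEP]"] + ["O"] * 22)

word_predictions = merge_word_pieces(tokens, labels)
entities = [(word, label) for word, label in word_predictions if label != "O"]
print(entities)
```

Because the tokenizer lowercases its input, the words come back in lowercase; mapping them back to the original `word_tokens` would be a natural next step.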