This notebook shows how to train, evaluate, and run inference with a named entity recognition (NER) model that tags animal names in text.

In [12]:
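import json
import logging
import os
import sys
from pathlib import Path

# Add the project root (parent of the current working directory) to sys.path
# so that the local `ner` package can be imported.
project_root = Path.cwd().parent
sys.path.append(str(project_root))

from ner.inference_ner import extract_animals
from ner.trainer_ner import AnimalNERTrainer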

In [3]:
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(levelname)s - %(message)s',
                    force=True)
logger = logging.getLogger(__name__)

In [None]:
trainer = AnimalNERTrainer(
    data_dir='../data/texts',
    model_path='../models/ner_model',
)

2025-07-15 14:56:20,651 - INFO - Using device: cpu
Some weights of DistilBertForTokenClassification were not initialized from the model checkpoint at dslim/distilbert-NER and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([9]) in the checkpoint and torch.Size([2]) in the model instantiated
- classifier.weight: found shape torch.Size([9, 768]) in the checkpoint and torch.Size([2, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [8]:
logger.info("Starting training...")
trainer.train()

2025-07-15 14:56:38,960 - INFO - Starting training...
2025-07-15 14:56:39,061 - INFO - Loaded 872 records from ../data/texts\train_ner.json
2025-07-15 14:56:39,105 - INFO - Loaded 187 records from ../data/texts\val_ner.json
2025-07-15 14:56:39,126 - INFO - Epoch 1/3
Training: 100%|██████████| 55/55 [05:55<00:00,  6.47s/it]
2025-07-15 15:02:35,225 - INFO - Training Loss: 0.0823
Evaluating: 100%|██████████| 12/12 [00:14<00:00,  1.22s/it]
2025-07-15 15:02:50,581 - INFO - Validation Loss: 0.0039, F1 Score: 0.9986
2025-07-15 15:02:52,888 - INFO - Save best model to models/ner_model with F1: 0.9986.
2025-07-15 15:02:52,890 - INFO - Epoch 2/3
Training: 100%|██████████| 55/55 [05:49<00:00,  6.36s/it]
2025-07-15 15:08:42,814 - INFO - Training Loss: 0.0015
Evaluating: 100%|██████████| 12/12 [00:21<00:00,  1.83s/it]
2025-07-15 15:09:05,362 - INFO - Validation Loss: 0.0020, F1 Score: 0.9993
2025-07-15 15:09:07,633 - INFO - Save best model to models/ner_model with F1: 0.9993.
2025-07-15 15:09:07,63
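
The "Save best model" log lines above suggest that a checkpoint is written whenever the validation F1 improves. A minimal sketch of that pattern follows. It is not the code of `AnimalNERTrainer`, whose internals are not shown in this notebook; `run_epochs`, `save_fn`, and the F1 values are hypothetical and serve only as an illustration.

```python
def run_epochs(f1_per_epoch, save_fn):
    """Call save_fn only for epochs whose validation F1 beats the best so far."""
    best_f1 = float("-inf")
    for epoch, f1 in enumerate(f1_per_epoch, start=1):
        if f1 > best_f1:
            best_f1 = f1
            save_fn(epoch, f1)  # stand-in for writing the model to disk
    return best_f1


# Hypothetical validation F1 scores for three epochs (not taken from the run above).
saved = []
best = run_epochs([0.90, 0.95, 0.93], lambda epoch, f1: saved.append((epoch, f1)))
print(f"best F1: {best}, checkpoints written at: {saved}")
```

With these made-up scores the third epoch does not improve on the second, so no checkpoint is written for it.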

In [9]:
logger.info("Loading best model for evaluation...")
trainer.load_model()

2025-07-15 15:15:47,774 - INFO - Loading best model for evaluation...
2025-07-15 15:15:49,025 - INFO - Model loaded from models/ner_model


In [10]:
def load_json(path):
    """Read a UTF-8 encoded JSON file and return the parsed content."""
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)

In [13]:
logger.info("Evaluating on test set...")
test_data = load_json(os.path.join(trainer.data_dir, 'test_ner.json'))
test_loader = trainer.create_dataloader(test_data,
                                        batch_size=trainer.batch_size,
                                        shuffle=False)
test_loss, metrics = trainer.evaluate(test_loader)

logger.info(f"Test Loss: {test_loss:.4f}")
logger.info(f"Test F1 Score: {metrics['f1_score']:.4f}")
logger.info(f"Test precision: {metrics['precision']:.4f}")
logger.info(f"Test recall: {metrics['recall']:.4f}")

2025-07-15 15:17:25,876 - INFO - Evaluating on test set...
Evaluating: 100%|██████████| 12/12 [00:31<00:00,  2.60s/it]
2025-07-15 15:17:57,281 - INFO - Test Loss: 0.0005
2025-07-15 15:17:57,284 - INFO - Test F1 Score: 1.0000
2025-07-15 15:17:57,287 - INFO - Test precision: 1.0000
2025-07-15 15:17:57,291 - INFO - Test recall: 1.0000
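
To make the reported metrics easier to interpret, the sketch below computes precision, recall, and F1 for a single label at token level in plain Python. It is an illustration under assumptions: this notebook does not show how `trainer.evaluate` computes its metrics (it may, for example, score whole entities rather than tokens), and `token_prf` and the sample label lists are hypothetical.

```python
def token_prf(true_labels, pred_labels, positive="B-ANIMAL"):
    """Token-level precision, recall and F1 for one positive label."""
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical gold and predicted labels; the second animal token is missed.
gold = ["O", "O", "O", "B-ANIMAL", "O", "B-ANIMAL"]
pred = ["O", "O", "O", "B-ANIMAL", "O", "O"]
precision, recall, f1 = token_prf(gold, pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

A score of 1.0000 on all three metrics, as logged above, means that under whatever scheme the trainer uses there were no false positives and no false negatives on the test set.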


In [14]:
test_text = "I saw a cat playing with a ball."

# Token-level predictions as (token, label) pairs.
predictions = trainer.predict(test_text)
# Animal mentions extracted from the same text.
animals = trainer.extract_entities(test_text)

print("\nPrediction Results:")
for token, label in predictions:
    print(f"{token}: {label}")

print(f"\nExtracted Animals: {animals}")


Prediction Results:
i: O
saw: O
a: O
cat: B-ANIMAL
playing: O
with: O
a: O
ball: O

Extracted Animals: ['cat']
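
The step from token labels to the extracted list can be sketched as follows. This is not the implementation of `trainer.extract_entities`, which is not shown in this notebook; `collect_entities` is a hypothetical helper. The output above contains only `B-ANIMAL` and `O` labels, so the handling of `I-ANIMAL` continuation tags is an assumption added for completeness.

```python
def collect_entities(token_label_pairs, entity_type="ANIMAL"):
    """Group (token, label) pairs into entity strings using B-/I- tags."""
    entities, current = [], []
    for token, label in token_label_pairs:
        if label == f"B-{entity_type}":
            # A new entity starts; close any entity still open.
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif label == f"I-{entity_type}" and current:
            # Continuation of the open entity (assumed tag, see lead-in).
            current.append(token)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


# The (token, label) pairs printed in the prediction results above.
pairs = [("i", "O"), ("saw", "O"), ("a", "O"), ("cat", "B-ANIMAL"),
         ("playing", "O"), ("with", "O"), ("a", "O"), ("ball", "O")]
print(collect_entities(pairs))
```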


In [15]:
animals_from_text = extract_animals(test_text)
print(f"Animals found in text: {animals_from_text}")

2025-07-15 15:22:40,735 - INFO - Initializing trainer...
2025-07-15 15:22:40,742 - INFO - Using device: cpu
Some weights of DistilBertForTokenClassification were not initialized from the model checkpoint at dslim/distilbert-NER and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([9]) in the checkpoint and torch.Size([2]) in the model instantiated
- classifier.weight: found shape torch.Size([9, 768]) in the checkpoint and torch.Size([2, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2025-07-15 15:22:42,535 - INFO - Loading best model for prediction...
2025-07-15 15:22:45,448 - INFO - Model loaded from ./models/ner_model


Animals found in text: ['cat']
