# NER Model Training and Inference Demo

This notebook demonstrates how to train a DistilBERT-based Named Entity Recognition (NER) model and use it for inference on new text.

## Overview

The NER system recognizes **MOUNTAIN** entities in text using:
- **Model**: DistilBERT
- **Task**: Token classification with BIO tagging
- **Evaluation**: Precision, Recall, F1 Score
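To make the BIO scheme concrete, here is a minimal sketch of how a sentence might be tagged. The label names `B-MOUNTAIN` and `I-MOUNTAIN` and the helper `bio_tags` are assumptions for illustration; the notebook does not show the project's actual label set.

```python
def bio_tags(tokens, entity_spans, label="MOUNTAIN"):
    """Assign BIO tags from (start, end) token-index spans (end exclusive)."""
    tags = ["O"] * len(tokens)            # O = outside any entity
    for start, end in entity_spans:
        tags[start] = f"B-{label}"        # B = first token of an entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"        # I = continuation of the same entity
    return tags


tokens = ["Mount", "Everest", "is", "the", "highest", "peak", "."]
tags = bio_tags(tokens, [(0, 2)])         # "Mount Everest" covers tokens 0 and 1

for token, tag in zip(tokens, tags):
    print(f"{token:10s} {tag}")
```

The B/I distinction lets two adjacent mountain names be decoded as separate entities rather than one long span.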


## Setup and Installation


In [1]:
# Change into the project directory, then install the required packages
%cd NER
!pip install -r requirements.txt

import warnings
warnings.filterwarnings('ignore')

[Errno 2] No such file or directory: 'NER'
/Users/vika/PycharmProjects/Quantum/NER

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


## Part 1: Model Training

Train a DistilBERT-based NER model on the mountain dataset. The trainer splits the 3,000 loaded samples into 2,400 training and 600 validation samples.
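Token classification with a subword tokenizer usually needs word-level labels aligned to subword tokens. The sketch below shows one common convention: label only the first subword of each word and mask the rest with `-100`. Whether `NERTrainer` does this internally is an assumption. The `word_ids` list is a hand-written stand-in for what a Hugging Face fast tokenizer's `word_ids()` returns, and the subword split shown is illustrative.

```python
IGNORE_INDEX = -100  # PyTorch's cross-entropy loss skips this label value by default


def align_labels(word_ids, word_labels):
    """Map word-level label ids onto subword tokens.

    Special tokens (word id None) and non-initial subwords get IGNORE_INDEX,
    so only the first subword of each word contributes to the loss.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(word_labels[word_id])
        previous = word_id
    return aligned


# Words: ["I", "climbed", "Kilimanjaro"] with assumed label ids O=0, B-MOUNTAIN=1
word_labels = [0, 0, 1]
# Illustrative tokenization: [CLS] i climbed kili ##man ##jar ##o [SEP]
word_ids = [None, 0, 1, 2, 2, 2, 2, None]

aligned = align_labels(word_ids, word_labels)
print(aligned)
```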


In [2]:
# Initialize the trainer
from NER.src.model.ner_train import NERTrainer

trainer = NERTrainer(model_name="distilbert-base-uncased")

# Load training data
print("Loading training data...")
train_data = trainer.load_data_from_csv('results/data/train_dataset.csv')
print(f"Loaded {len(train_data['tokens'])} samples")

Loading training data...

Loaded 3000 samples


In [3]:
# Train the model
from NER.src.model.ner_inference import NERInference

print("Starting training...")

trained_trainer = trainer.train(
    training_data=train_data,
    output_dir="results/ner-model",
    num_train_epochs=3,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    learning_rate=2e-5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1"
)

Starting training...
Training samples: 2400
Validation samples: 600



Some weights of DistilBertForTokenClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Starting training...


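| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
|-------|---------------|-----------------|-----------|--------|----|
| 1 | No log | 0.096089 | 0.714286 | 0.816667 | 0.762053 |
| 2 | No log | 0.037464 | 0.927632 | 0.940000 | 0.933775 |
| 3 | No log | 0.031527 | 0.931373 | 0.950000 | 0.940594 |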


Model saved to results/ner-model
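As a sanity check on the per-epoch results reported above, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the logged precision and recall. The helper name `f1_score` is illustrative, and small differences come from the logged values being rounded to six decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (epoch, precision, recall, reported F1) copied from the training log
epochs = [
    (1, 0.714286, 0.816667, 0.762053),
    (2, 0.927632, 0.940000, 0.933775),
    (3, 0.931373, 0.950000, 0.940594),
]

for epoch, precision, recall, reported in epochs:
    computed = f1_score(precision, recall)
    print(f"Epoch {epoch}: computed F1 = {computed:.6f}, reported F1 = {reported:.6f}")
```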


### Visualize Training Metrics

Plot the training metrics to see how the model performed:


In [4]:
NERTrainer.plot_training_metrics(trained_trainer, show_plot=True)

## Part 2: Model Inference

Load the trained model from `results/ner-model` and run it on new text.


In [5]:
# Initialize the inference pipeline
print("Loading trained model...")
ner_inference = NERInference(model_path="results/ner-model")
print("✓ Model loaded successfully!")


Device set to use cpu


Loading trained model...
✓ Model loaded successfully!


### Single Text Prediction

Test the model on individual sentences:


In [9]:
test_texts = [
    "Mount Everest is the highest peak in the world.",
    "I climbed Kilimanjaro last year and it was amazing.",
    "The Matterhorn is a famous mountain in the Swiss Alps.",
    "Denali National Park features the tallest mountain in North America."
]

for text in test_texts:
    print(f"Text: {text}")
    entities = ner_inference.predict_single(text)
    if entities:
        for entity in entities:
            print(f"  Mountain: {entity['word']:20s} | Confidence: {entity['score']:.3f}")
    else:
        print("  No mountains detected")
    print("-"*50)


Text: Mount Everest is the highest peak in the world.
  Mountain: mount everest        | Confidence: 0.962
--------------------------------------------------
Text: I climbed Kilimanjaro last year and it was amazing.
  Mountain: kilimanjaro          | Confidence: 0.900
--------------------------------------------------
Text: The Matterhorn is a famous mountain in the Swiss Alps.
  Mountain: matterhorn           | Confidence: 0.983
--------------------------------------------------
Text: Denali National Park features the tallest mountain in North America.
  Mountain: denali               | Confidence: 0.965
--------------------------------------------------
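The predicted entity strings above are lowercased, which is expected from an uncased model. One way to recover the original casing and character offsets is a case-insensitive search in the source text, sketched below. The helper `locate_entity` is hypothetical. If the entity dicts returned by `predict_single` already carry character offsets, those would be preferable to searching.

```python
def locate_entity(text, word):
    """Return (start, end, surface) for the first case-insensitive match, or None.

    Assumes lowercasing does not change string length, which holds for ASCII
    text but not for every Unicode character.
    """
    start = text.lower().find(word.lower())
    if start == -1:
        return None
    end = start + len(word)
    return start, end, text[start:end]


text = "Mount Everest is the highest peak in the world."
match = locate_entity(text, "mount everest")
print(match)
```

Note that this returns only the first occurrence, so a name that appears twice in a sentence needs extra handling.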


### Prediction from a Prepared CSV


In [10]:
import pandas as pd

df = pd.read_csv('results/data/inference_dataset.csv')
sample_texts = df['text'].tolist()

results_df = ner_inference.predict_to_dataframe(sample_texts)
print("Prediction Results:")
print(results_df.head())


Prediction Results:
                                                text             entity  \
0  The Matterhorn, a famous mountain with a disti...         matterhorn   
1  Fuji, Japan's highest mountain, is an active s...               fuji   
2  Aspiring Mountain, also known as Aspiring, is ...  aspiring mountain   
3  Aspiring Mountain, also known as Aspiring, is ...           aspiring   
4  Mount Whitney, at 14,505 feet, is the highest ...      mount whitney   

      label  confidence  
0  MOUNTAIN    0.983667  
1  MOUNTAIN    0.968383  
2  MOUNTAIN    0.816696  
3  MOUNTAIN    0.807276  
4  MOUNTAIN    0.974698  
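The output above has one row per detected entity, so a text with two mentions appears twice (rows 2 and 3). A possible way to get one row per text is to group on `text`, as in this sketch. It uses a small hand-built frame with the column names shown above and the truncated display strings as stand-in texts; the aggregated column names are illustrative.

```python
import pandas as pd

# Hand-built stand-in for a few rows of the prediction output
rows = pd.DataFrame({
    "text": [
        "Aspiring Mountain, also known as Aspiring, is ...",
        "Aspiring Mountain, also known as Aspiring, is ...",
        "Fuji, Japan's highest mountain, is an active s...",
    ],
    "entity": ["aspiring mountain", "aspiring", "fuji"],
    "confidence": [0.816696, 0.807276, 0.968383],
})

# Collapse to one row per text: collect entities, keep the weakest confidence
per_text = (
    rows.groupby("text", sort=False)
        .agg(entities=("entity", list), min_confidence=("confidence", "min"))
        .reset_index()
)
print(per_text)
```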


### Analyze Inference Results


In [12]:
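# results_df appears to hold one row per detected entity (rows 2 and 3 above
# share a text), so the counts below are rows, not unique texts.
predicted_results = results_df[results_df['entity'].notna()]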

print("Inference Statistics:")
print(f"Total texts analyzed:       {len(results_df):,}")
print(f"Texts with entities found:  {len(predicted_results):,}")
print(f"Detection rate:             {len(predicted_results)/len(results_df)*100:.1f}%")

if len(predicted_results) > 0:
    print(f"\nAverage confidence:        {predicted_results['confidence'].mean():.3f}")
    print(f"Highest confidence:        {predicted_results['confidence'].max():.3f}")
    print(f"Lowest confidence:         {predicted_results['confidence'].min():.3f}")


Inference Statistics:
Total texts analyzed:       210
Texts with entities found:  204
Detection rate:             97.1%

Average confidence:        0.937
Highest confidence:        0.984
Lowest confidence:         0.450
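Because the statistics above count rows, a per-text detection rate may differ from the figure shown. The sketch below counts unique texts instead and also lists low-confidence rows for manual review. It runs on toy data, not the notebook's results. It assumes that texts without entities appear as a row with a missing `entity`, which the `notna()` filter above suggests, and the 0.5 threshold is arbitrary.

```python
import pandas as pd

# Toy data: text "a" has two entities, text "c" has none
results = pd.DataFrame({
    "text": ["a", "a", "b", "c"],
    "entity": ["x", "y", "z", None],
    "confidence": [0.98, 0.45, 0.97, None],
})

# Count unique texts rather than rows
total_texts = results["text"].nunique()
texts_with_entities = results.loc[results["entity"].notna(), "text"].nunique()
rate = texts_with_entities / total_texts * 100 if total_texts else 0.0

print(f"Unique texts analyzed:     {total_texts}")
print(f"Texts with entities found: {texts_with_entities}")
print(f"Per-text detection rate:   {rate:.1f}%")

# Rows below an arbitrary confidence threshold, for manual inspection
THRESHOLD = 0.5
low_conf = results[results["confidence"] < THRESHOLD]
print(low_conf)
```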
