# **Geoparsing: Prediction Evaluation**
---
**Prepared by**: Feyi Adesanya

**Submission Date**: April 30, 2024

In [1]:
import textwrap

In [2]:
from Pre.Preprocess import Preprocess
from ML.CRF_Manager import CRF_Manager
from Gaz.Gazetteer import Gazetteer
from ML.Baseline_Manager import Baseline_Manager
from ML.BI_LSTM_Manager import BI_LSTM_Manager
from ML.SVM_Manager import SVM_Manager
from ML.BERT_Manager import BERT_Manager


In [3]:
gaz = Gazetteer()

Retrieving Locations Array from Saved Data
Corpus has 133639 Locations
Retrieving BK Tree from Saved Data

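The gazetteer reports loading a BK tree over its 133,639 locations. A BK (Burkhard-Keller) tree indexes strings by edit distance, so a fuzzy lookup such as a misspelled place name can prune most candidates instead of comparing against every entry. The sketch below is a generic BK tree over Levenshtein distance. The names `BKTree`, `add`, `search` and `levenshtein` are hypothetical, and this is not necessarily how `Gazetteer` builds or queries its tree.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


class BKTree:
    """Each child edge stores the child's distance to its parent word."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None  # node = (word, {edge_distance: child_node})

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            parent_word, children = node
            d = self.distance(word, parent_word)
            if d == 0:
                return  # word already stored
            if d not in children:
                children[d] = (word, {})
                return
            node = children[d]

    def search(self, query: str, max_dist: int) -> list[tuple[int, str]]:
        """Return (distance, word) pairs within max_dist of query, closest first."""
        if self.root is None:
            return []
        results, stack = [], [self.root]
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            # Triangle inequality: only edges in [d - max_dist, d + max_dist] can hold matches.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for name in ["alexandria", "alexander", "toronto", "calgary", "dublin"]:
    tree.add(name)
print(tree.search("alexandra", 1))  # [(1, 'alexandria')]
```

Raising `max_dist` trades lookup speed for tolerance to heavier misspellings, because fewer subtrees can be pruned.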

In [4]:
preprocess = Preprocess(gaz)
preprocess.extract_train_data()

Retrieving Corpus from Saved Data
Corpus has 588 documents


In [5]:
print("Sample Corpus Entry")
print("-"*50)
print("Text: ", end="")
print(textwrap.fill(preprocess.corpus[0]['text'], width=120))
print()
print(f"Toponyms: {preprocess.corpus[0]['toponyms']}")
print(f"GeoNames IDs: {preprocess.corpus[0]['geoIDs']}")
print(f"Labels: {preprocess.corpus[0]['labels']}")
print(f"Features: {preprocess.corpus[0]['features'][3]}...........")


Sample Corpus Entry
--------------------------------------------------
Text: Alexandria woman charged in connection with Kelleyland fire. Chiquita Raquel Henry, 19, of 1935 Orchard St., Alexandria,
was arrested and charged with aggravated arson and unauthorized entry of an inhabited dwelling. Both the Sheriff’s
Office and Rapides Parish Fire District 2 investigated the March 7 fire at 6016 Dublin Road that led to Henry’s arrest.
According to the Sheriff’s Office, the man who lives in the mobile home at that address was inside the home when Henry
came in and set the bed and couch on fire with a lighter. The man only knew the woman by a nickname but investigation
led detectives to Henry.

Toponyms: [{'phrase': 'Alexandria', 'start': '0', 'end': '10', 'geonameid': 4314550, 'name': 'Alexandria', 'fclass': 'P', 'fcode': 'PPL', 'lat': 31.3113, 'lon': -92.4451, 'country': 'United States', 'admin1': 'Louisiana'}, {'phrase': 'Alexandria', 'start': '109', 'end': '119', 'geonameid': 4314550, 'nam
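Each toponym annotation carries character offsets (`start`, `end`, stored as strings, with `end` exclusive: "Alexandria" spans 0 to 10). One common way to turn such span annotations into the token-level `B-LOC` / `I-LOC` / `O` labels scored below is to tokenize with character offsets and tag the tokens that fall inside a span. This is a minimal sketch assuming a simple regex tokenizer and spans that align with token boundaries; `spans_to_bio` is a hypothetical helper, not the `Preprocess` implementation, and the example annotations are made up.

```python
import re


def spans_to_bio(text, toponyms):
    """Tag regex tokens as B-LOC / I-LOC / O from character-offset annotations.

    Assumes each toponym dict has 'start' and 'end' (end exclusive, possibly
    stored as strings) and that every span starts and ends on token boundaries.
    """
    spans = [(int(t["start"]), int(t["end"])) for t in toponyms]
    tokens, labels = [], []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tok_start, tok_end = match.span()
        label = "O"
        for span_start, span_end in spans:
            if span_start <= tok_start and tok_end <= span_end:
                # First token of a span opens it (B-), later tokens continue it (I-).
                label = "B-LOC" if tok_start == span_start else "I-LOC"
                break
        tokens.append(match.group())
        labels.append(label)
    return tokens, labels


text = "Fire in Rapides Parish near Alexandria."
toponyms = [{"start": "8", "end": "22"}, {"start": "28", "end": "38"}]
print(list(zip(*spans_to_bio(text, toponyms))))
```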

# Prediction Model Results Analysis

### Baseline Model

In [6]:
test_baseline = Baseline_Manager(gaz, preprocess)
test_baseline.predict_corpus()

--------------------------------------------------
Base Scores
Accuracy: 0.8113634283067943
Precision: 0.9581236525580117
Recall: 0.8113634283067943
F1 Score: 0.8716944829103495
--------------------------------------------------
Relevant Scores: Labels of Interest: B-LOC and I-LOC
Accuracy: 0.6012752075919335
Precision: 0.9418148305098311
Recall: 0.6012752075919335
F1 Score: 0.732003186515096



--------------------------------------------------
Classification Report:
              precision    recall  f1-score   support

       B-LOC       0.09      0.63      0.16      4980
       I-LOC       0.82      0.51      0.63      1764
           O       0.98      0.82      0.89    178957

    accuracy                           0.81    185701
   macro avg       0.63      0.66      0.56    185701
weighted avg       0.96      0.81      0.87    185701


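The baseline's per-class B-LOC precision in the classification report is 0.09, yet its "Relevant" precision is 0.94, so the Relevant Scores do not appear to be scikit-learn's `labels=["B-LOC", "I-LOC"]` restriction applied over all tokens. One computation consistent with every model's output is to keep only the tokens whose *gold* label is B-LOC or I-LOC and then take support-weighted averages. False positives on gold-`O` tokens drop out, which inflates precision, and weighted recall collapses to the share of gold location tokens tagged correctly, which would explain why the relevant Accuracy and Recall are identical for every model. This is an inference from the printed numbers, not the confirmed implementation; the plain-Python sketch below uses made-up tags.

```python
from collections import Counter

LABELS_OF_INTEREST = {"B-LOC", "I-LOC"}


def relevant_scores(y_true, y_pred):
    """Accuracy and support-weighted P/R/F1 over tokens whose gold tag is a location tag."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t in LABELS_OF_INTEREST]
    support = Counter(t for t, _ in pairs)
    predicted = Counter(p for _, p in pairs)
    correct = Counter(t for t, p in pairs if t == p)
    n = len(pairs)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        weight = count / n
        # A label that is never predicted gets precision 0, like scikit-learn's default zero_division.
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    accuracy = sum(correct.values()) / n
    return accuracy, precision, recall, f1


y_true = ["O", "B-LOC", "I-LOC", "O", "B-LOC", "B-LOC", "O"]
y_pred = ["B-LOC", "B-LOC", "O", "O", "B-LOC", "O", "B-LOC"]
acc, p, r, f1 = relevant_scores(y_true, y_pred)
print(f"Acc={acc:.3f} P={p:.3f} R={r:.3f} F1={f1:.3f}")  # Acc=0.500 P=0.750 R=0.500 F1=0.600

# For contrast, B-LOC precision over *all* tokens also counts the gold-O false positives.
full_b_precision = sum(t == p == "B-LOC" for t, p in zip(y_true, y_pred)) / y_pred.count("B-LOC")
print(f"B-LOC precision over all tokens: {full_b_precision:.3f}")  # 0.500
```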

In [7]:
text = "I'm leaving for Toronto tomorrow, I'll be sure to go into Calgary too"
test_baseline.new_prediction(text)

['toronto', 'be', 'to', 'calgary']
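The baseline returns "be" and "to" alongside Toronto and Calgary. A plausible cause, assumed here because the `Baseline_Manager` internals are not shown, is unfiltered exact-match lookup against a gazetteer that contains short names colliding with common English words. The toy sketch below reproduces that failure mode with a hypothetical four-name gazetteer and shows a simple stopword and capitalization filter. Neither the name set nor the filter is taken from the project.

```python
import re

# Hypothetical lowercase gazetteer; short entries like "be" and "to" collide with ordinary words.
GAZETTEER = {"toronto", "calgary", "be", "to"}
STOPWORDS = {"be", "to", "in", "on", "the", "a", "i", "go", "too"}


def tokenize(text):
    return re.findall(r"[A-Za-z']+", text)


def naive_lookup(text):
    """Flag every token whose lowercase form is a gazetteer entry."""
    return [tok.lower() for tok in tokenize(text) if tok.lower() in GAZETTEER]


def filtered_lookup(text):
    """Same lookup, but skip stopwords and require an initial capital letter."""
    return [
        tok.lower()
        for tok in tokenize(text)
        if tok.lower() in GAZETTEER and tok.lower() not in STOPWORDS and tok[0].isupper()
    ]


text = "I'm leaving for Toronto tomorrow, I'll be sure to go into Calgary too"
print(naive_lookup(text))     # ['toronto', 'be', 'to', 'calgary']
print(filtered_lookup(text))  # ['toronto', 'calgary']
```

Heuristic filters like these remove the obvious collisions but cannot resolve genuinely ambiguous capitalized names, which is where the learned models below have an advantage.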

### CRF Model

In [8]:
test_CRF = CRF_Manager(gaz, preprocess)
test_CRF.predict_corpus()

Weights loaded successfully.
--------------------------------------------------
Base Scores
Accuracy: 0.9888516011603435
Precision: 0.9882925795120062
Recall: 0.9888516011603435
F1 Score: 0.9882769035825909
--------------------------------------------------
Relevant Scores: Labels of Interest: B-LOC and I-LOC
Accuracy: 0.7364864864864865
Precision: 0.9839289947842579
Recall: 0.7364864864864865
F1 Score: 0.8422852691145374
--------------------------------------------------
Classification Report:
              precision    recall  f1-score   support

       B-LOC       0.91      0.75      0.82       425
       I-LOC       0.88      0.69      0.78       167
           O       0.99      1.00      0.99     16989

    accuracy                           0.99     17581
   macro avg       0.93      0.82      0.87     17581
weighted avg       0.99      0.99      0.99     17581




In [9]:
text = "I'm leaving for Toronto tomorrow, I'll be sure to go into Calgary too"
test_CRF.new_prediction(text)

['toronto', 'calgary']
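`new_prediction` returns lowercased toponym strings rather than per-token tags, so some step has to merge `B-LOC` / `I-LOC` runs into phrases. Below is a minimal sketch of that decoding, assuming one tag per token; `bio_to_phrases` is a hypothetical name, and treating a stray `I-LOC` (one with no preceding `B-LOC`) as the start of a new phrase is one design choice among several.

```python
def bio_to_phrases(tokens, tags):
    """Merge B-LOC/I-LOC runs into lowercase phrases; an orphan I-LOC starts a new phrase."""
    phrases, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-LOC" or (tag == "I-LOC" and not current):
            if current:  # a B-LOC directly after a phrase closes that phrase
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I-LOC":
            current.append(token)
        else:  # "O" closes any open phrase
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return [phrase.lower() for phrase in phrases]


tokens = ["Flights", "from", "Toronto", "to", "New", "York", "and", "Calgary"]
tags = ["O", "O", "B-LOC", "O", "B-LOC", "I-LOC", "O", "B-LOC"]
print(bio_to_phrases(tokens, tags))  # ['toronto', 'new york', 'calgary']
```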

### Bi-LSTM Model

In [10]:
test_BILSTM = BI_LSTM_Manager(gaz, preprocess)
test_BILSTM.predict_corpus()


Weights loaded successfully.
[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 4s/step



--------------------------------------------------
Base Scores
Accuracy: 0.9890777043320378
Precision: 0.9903665888851142
Recall: 0.9890777043320378
F1 Score: 0.9895388029513108
--------------------------------------------------
Relevant Scores: Labels of Interest: B-LOC and I-LOC
Accuracy: 0.9219474497681608
Precision: 0.9830343599073025
Recall: 0.9219474497681608
F1 Score: 0.9514484837074295
--------------------------------------------------
Classification Report:
              precision    recall  f1-score   support

       B-LOC       0.82      0.92      0.87       939
       I-LOC       0.74      0.92      0.82       355
           O       1.00      0.99      0.99     33955

   micro avg       0.99      0.99      0.99     35249
   macro avg       0.85      0.94      0.89     35249
weighted avg       0.99      0.99      0.99     35249




In [11]:
text = "I'm leaving for Toronto tomorrow, I'll be sure to go into Calgary too"
test_BILSTM.new_prediction(text)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 468ms/step


['toronto', 'calgary']
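The Bi-LSTM scores the corpus in batches (the `4/4` progress bar), which suggests documents are converted to integer IDs and padded to a fixed length before prediction; predictions at padded positions then have to be discarded before scoring. The sketch below shows one such encode, pad and unpad round trip in plain Python. The vocabulary scheme, `max_len`, and the reserved `PAD_ID` / `UNK_ID` values are assumptions, not values read from `BI_LSTM_Manager`.

```python
PAD_ID, UNK_ID = 0, 1  # assumed reserved indices


def build_vocab(sentences):
    """Map each lowercase token to an ID, reserving 0 for padding and 1 for unknown words."""
    vocab = {}
    for sentence in sentences:
        for token in sentence:
            if token.lower() not in vocab:
                vocab[token.lower()] = len(vocab) + 2
    return vocab


def encode_and_pad(sentence, vocab, max_len):
    """Convert tokens to IDs, truncate to max_len, right-pad with PAD_ID, and return a mask."""
    ids = [vocab.get(tok.lower(), UNK_ID) for tok in sentence][:max_len]
    pad = max_len - len(ids)
    return ids + [PAD_ID] * pad, [1] * len(ids) + [0] * pad


def unpad(predicted_tags, mask):
    """Drop predictions at padded positions so they never reach the metrics."""
    return [tag for tag, keep in zip(predicted_tags, mask) if keep]


vocab = build_vocab([["Leaving", "for", "Toronto"]])
ids, mask = encode_and_pad(["Leaving", "for", "Calgary"], vocab, max_len=5)
print(ids, mask)  # [2, 3, 1, 0, 0] [1, 1, 1, 0, 0]
print(unpad(["O", "O", "B-LOC", "O", "O"], mask))  # ['O', 'O', 'B-LOC']
```

If padded positions were scored, every model would be credited with easy `O` predictions, so dropping them keeps the reported support tied to real tokens.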

### SVM Model

In [12]:
test_SVM = SVM_Manager(gaz, preprocess)
test_SVM.predict_corpus()

Weights loaded successfully.
--------------------------------------------------
Base Scores
Accuracy: 0.996176834850035
Precision: 0.9961753389769372
Recall: 0.996176834850035
F1 Score: 0.9961678439607572
--------------------------------------------------
Relevant Scores: Labels of Interest: B-LOC and I-LOC
Accuracy: 0.9400584795321637
Precision: 0.9953923269712742
Recall: 0.9400584795321637
F1 Score: 0.9668993795265762
--------------------------------------------------
Classification Report:
              precision    recall  f1-score   support

       B-LOC       0.96      0.93      0.95       478
       I-LOC       0.97      0.93      0.95       206
           O       1.00      1.00      1.00     17887

   micro avg       1.00      1.00      1.00     18571
   macro avg       0.98      0.95      0.96     18571
weighted avg       1.00      1.00      1.00     18571
 samples avg       1.00      1.00      1.00     18571





In [13]:
text = "I'm leaving for Toronto tomorrow, I'll be sure to go into Calgary too"
test_SVM.new_prediction(text)

['toronto', 'calgary']
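Classical token classifiers such as the CRF and SVM do not see raw text: each token is described by a feature dictionary, and the corpus entries carry such a `features` field. The exact features built by `Preprocess` are not shown in this notebook, so the sketch below uses a typical hand-crafted set (lowercase form, shape cues, suffix, neighbouring words, gazetteer membership). Every feature name and the `gazetteer_names` argument are hypothetical.

```python
def token_features(tokens, i, gazetteer_names):
    """Return a feature dict for tokens[i] (hypothetical feature set)."""
    word = tokens[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "word.suffix3": word[-3:].lower(),
        "in_gazetteer": word.lower() in gazetteer_names,
    }
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
        features["prev.istitle"] = tokens[i - 1].istitle()
    else:
        features["BOS"] = True  # first token of the sequence
    if i < len(tokens) - 1:
        features["next.lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True  # last token of the sequence
    return features


tokens = ["go", "into", "Calgary", "too"]
feats = token_features(tokens, 2, gazetteer_names={"calgary", "toronto"})
print(feats["word.lower"], feats["word.istitle"], feats["in_gazetteer"], feats["prev.lower"])
# calgary True True into
```

Lists of dictionaries like these can be passed to sklearn-crfsuite for a CRF, or vectorized with scikit-learn's `DictVectorizer` before fitting an SVM.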

### Custom BERT Model

In [14]:
test_BERT = BERT_Manager(gaz, preprocess)
test_BERT.predict_corpus()

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Weights loaded successfully.


Validation: 100%|██████████| 1/1 [03:27<00:00, 207.91s/it]


Validation Loss: 0.01
Accuracy: 0.9974201749794361
Relevant Accuracy: 0.9937733499377335
--------------------------------------------------
Base Scores
Accuracy: 0.9974201749794361
Precision: 0.9975015088583858
Recall: 0.9974201749794361
F1 Score: 0.9974420204073391

Relevant Scores: Labels of Interest: B-LOC and I-LOC
Accuracy: 0.9937733499377335
Precision: 1.0
Recall: 0.9937733499377335
F1 Score: 0.9968751270273566
--------------------------------------------------
Classification Report:
              precision    recall  f1-score   support

       B-LOC       0.97      0.99      0.98      1353
       I-LOC       0.93      1.00      0.96       253
           O       1.00      1.00      1.00     25140

   micro avg       1.00      1.00      1.00     26746
   macro avg       0.97      1.00      0.98     26746
weighted avg       1.00      1.00      1.00     26746



In [15]:
text = "I'm leaving for Toronto tomorrow, I'll be sure to go into Calgary too"
test_BERT.new_prediction(text)

Prediction: 100%|██████████| 1/1 [00:04<00:00,  4.10s/it]


['toronto', 'calgary']
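The transformer model is initialized from `roberta-base`, which works on subword tokens, while the corpus labels are word-level. Labels therefore have to be aligned to subwords before training and collapsed back to words after prediction. A common convention, assumed here rather than confirmed from `BERT_Manager`, labels only the first subword of each word and marks the remaining subwords and special tokens with -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The `word_ids` list mimics what a Hugging Face fast tokenizer's `word_ids()` returns; the label mapping and the example split are made up.

```python
IGNORE = -100  # default ignore_index of torch.nn.CrossEntropyLoss
LABEL2ID = {"O": 0, "B-LOC": 1, "I-LOC": 2}  # assumed label mapping


def align_labels(word_labels, word_ids):
    """Give each word's label to its first subword; later subwords and special tokens get IGNORE."""
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            aligned.append(IGNORE)
        else:
            aligned.append(LABEL2ID[word_labels[word_id]])
        previous = word_id
    return aligned


def first_subword_predictions(pred_ids, word_ids):
    """Collapse subword-level predictions back to one label per word (first subword wins)."""
    id2label = {v: k for k, v in LABEL2ID.items()}
    words, previous = [], None
    for pred, word_id in zip(pred_ids, word_ids):
        if word_id is not None and word_id != previous:
            words.append(id2label[pred])
        previous = word_id
    return words


# Hypothetical split: "into Calgary" -> <s> into Cal gary </s>
word_ids = [None, 0, 1, 1, None]
print(align_labels(["O", "B-LOC"], word_ids))                # [-100, 0, 1, -100, -100]
print(first_subword_predictions([0, 0, 1, 2, 0], word_ids))  # ['O', 'B-LOC']
```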