This notebook contains a qualitative comparison of the best NER models identified through research, in order to decide which one to implement.

In [None]:
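import nltk
# Resources used by sent_tokenize, word_tokenize, pos_tag and ne_chunk.
# On NLTK 3.9+ the equivalents are 'punkt_tab', 'averaged_perceptron_tagger_eng'
# and 'maxent_ne_chunker_tab'.
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

import spacy

# Both pipelines loaded below must be installed before spacy.load() is called
!python -m spacy download en_core_web_sm
!python -m spacy download en_core_web_md
# !python -m spacy download en_core_web_trf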


## Model Comparisons
spaCy's `en_core_web_sm` and `en_core_web_md` pipelines are compared with NLTK's `ne_chunk` named-entity chunker.

In [None]:
nlp_sm = spacy.load("en_core_web_sm")
nlp_md = spacy.load("en_core_web_md")

In [None]:
sentences = ["WASHINGTON -- In the wake of a string of abuses by New York police officers in the 1990s, Loretta E. Lynch, the top federal prosecutor in Brooklyn, spoke forcefully about the pain of a broken trust that African-Americans felt and said the responsibility for repairing generations of miscommunication and mistrust fell to law enforcement.",
             "Nigerian President Muhammadu Buhari has condemned the killing of one of two Catholic priests by kidnappers in the northern state of Kaduna.",
             'He said the armed groups behind the spate of kidnappings of Christian clerics "seem to be bent on creating chaos and disorder in the country"',
             "Emmanuel Onuarah, the president of the Premium Bread Makers Association, told the BBC that some bakeries had had to fire their staff as they were not able to pay salaries.",
             "He suggested that the government should stop charging a 15% tax on imported wheat - the price of which has already shot up this year because of the war in Ukraine.",
             "The youths had attended a wedding in Imo state and were on their way home when they were shot, Amnesty said.",
             "The famous Nigerian author Chimamanda Ngozi Adichie has thrown her support behind the Labour Party's presidential candidate, Peter Obi, ahead of the country's February 2023 elections.",
             "He was dubbed Super Mario for his handling of the eurozone crisis as head of the European Central Bank. In February last year, he was given the task of guiding Italy through the Covid pandemic and economic recovery, bolstered by a big EU package conditional on major reforms."]

In [None]:
# spaCy small NER model
for sentence in sentences:
  doc = nlp_sm(sentence)
  print("\n")
  for entity in doc.ents:
    print(entity.text, entity.label_)



WASHINGTON GPE
New York GPE
the 1990s DATE
Loretta E. Lynch PERSON
Brooklyn GPE
African-Americans NORP


Nigerian NORP
Muhammadu Buhari PERSON
one CARDINAL
two CARDINAL
Catholic NORP
Kaduna GPE


Christian NORP


Emmanuel Onuarah PERSON
the Premium Bread Makers Association ORG
BBC ORG


15% PERCENT
this year DATE
Ukraine GPE


Amnesty ORG


Nigerian NORP
Chimamanda Ngozi Adichie PERSON
the Labour Party's ORG
Peter Obi PERSON
February 2023 DATE


the European Central Bank ORG
February last year DATE
Italy GPE
EU GPE
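The loop above only prints entities, so comparing models means reading the printouts side by side. A minimal sketch that collects them into `(text, label)` tuples per sentence instead; `collect_entities` is a hypothetical helper, and it relies only on the `doc.ents`, `ent.text` and `ent.label_` attributes already used above, so it should work with either `nlp_sm` or `nlp_md`.

```python
from typing import Callable, Iterable, List, Tuple

Entity = Tuple[str, str]  # (entity text, entity label)


def collect_entities(nlp: Callable, sentences: Iterable[str]) -> List[List[Entity]]:
    """Run a spaCy-style pipeline over each sentence and return its entities.

    `nlp` is assumed to be any callable returning an object with an `.ents`
    sequence whose items expose `.text` and `.label_` (as spaCy's Doc/Span do).
    """
    results = []
    for sentence in sentences:
        doc = nlp(sentence)
        # One list of (text, label) tuples per input sentence, in document order
        results.append([(ent.text, ent.label_) for ent in doc.ents])
    return results
```

Usage would be `sm_entities = collect_entities(nlp_sm, sentences)`; keeping the results as data rather than printed text makes it possible to diff models programmatically.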


In [None]:
# spaCy medium NER model
for sentence in sentences:
  doc = nlp_md(sentence)
  print("\n")
  for entity in doc.ents:
    print(entity.text, entity.label_)



WASHINGTON GPE
New York GPE
the 1990s DATE
Loretta E. Lynch PERSON
Brooklyn GPE
African-Americans NORP


Nigerian NORP
Muhammadu Buhari PERSON
one CARDINAL
two CARDINAL
Catholic NORP
Kaduna GPE


Christian NORP


Emmanuel Onuarah PERSON
the Premium Bread Makers Association ORG
BBC ORG


15% PERCENT
this year DATE
Ukraine GPE


Amnesty ORG


Nigerian NORP
Chimamanda Ngozi Adichie PERSON
the Labour Party's ORG
Peter Obi PERSON
February 2023 DATE


the European Central Bank ORG
February last year DATE
Italy GPE
EU ORG
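Read side by side, the two outputs differ in only one place: "EU" in the last sentence is labelled GPE by `en_core_web_sm` and ORG by `en_core_web_md`. A small sketch for finding such disagreements automatically, assuming each model's entities for a sentence are available as `(text, label)` tuples (here copied by hand from the two outputs above); `diff_entities` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

Entity = Tuple[str, str]  # (entity text, entity label)


def diff_entities(a: List[Entity], b: List[Entity]) -> Dict[str, List[Entity]]:
    """Return entities found only by model `a` and only by model `b`.

    A label disagreement shows up as the same text in both lists with
    different labels. Sets are used, so repeated entities collapse.
    """
    set_a, set_b = set(a), set(b)
    return {
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
    }


# Entities for the last sentence, copied from the two outputs above
sm_last = [("the European Central Bank", "ORG"), ("February last year", "DATE"),
           ("Italy", "GPE"), ("EU", "GPE")]
md_last = [("the European Central Bank", "ORG"), ("February last year", "DATE"),
           ("Italy", "GPE"), ("EU", "ORG")]

print(diff_entities(sm_last, md_last))
# {'only_a': [('EU', 'GPE')], 'only_b': [('EU', 'ORG')]}
```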


In [None]:
# NLTK named-entity chunker
for sentence in sentences:
  for sent in nltk.sent_tokenize(sentence):
    for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
      # Named-entity chunks are subtrees with a label; other items are (word, tag) tuples
      if hasattr(chunk, 'label'):
        print(chunk.label(), ' '.join(c[0] for c in chunk))

GPE WASHINGTON
GPE New York
PERSON Loretta E. Lynch
GPE Brooklyn
GPE Nigerian
PERSON Muhammadu Buhari
ORGANIZATION Catholic
GPE Kaduna
GPE Christian
PERSON Emmanuel
ORGANIZATION Onuarah
ORGANIZATION BBC
GPE Ukraine
GPE Imo
PERSON Amnesty
GPE Nigerian
ORGANIZATION Chimamanda Ngozi Adichie
ORGANIZATION Labour Party
PERSON Peter Obi
PERSON Super Mario
ORGANIZATION European Central Bank
GPE Italy
GPE Covid
GPE EU
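NLTK names its labels differently from spaCy (for example ORGANIZATION rather than ORG), so its output has to be normalised before it can be compared with the spaCy results. A minimal sketch that flattens an `ne_chunk` tree into `(text, label)` tuples using the same `hasattr(chunk, 'label')` test as the cell above; the `NLTK_TO_SPACY` mapping is an assumption (only ORGANIZATION appears in the output above, and the LOCATION and FACILITY entries are guesses at the spaCy equivalents), and `nltk_entities` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Tuple

Entity = Tuple[str, str]  # (entity text, spaCy-style label)

# Assumed mapping from NLTK ne_chunk labels to spaCy's label names;
# labels not listed here (e.g. PERSON, GPE) already match and pass through.
NLTK_TO_SPACY: Dict[str, str] = {
    "ORGANIZATION": "ORG",
    "LOCATION": "LOC",
    "FACILITY": "FAC",
}


def nltk_entities(tree: Iterable) -> List[Entity]:
    """Flatten an nltk.ne_chunk tree into (text, spaCy-style label) tuples.

    Entity chunks are subtrees exposing .label() whose leaves are
    (word, POS tag) pairs; plain (word, tag) tuples outside chunks are skipped.
    """
    entities = []
    for chunk in tree:
        if hasattr(chunk, "label"):
            text = " ".join(word for word, _tag in chunk)
            label = chunk.label()
            entities.append((text, NLTK_TO_SPACY.get(label, label)))
    return entities
```

Usage would be `nltk_entities(nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))))`. After mapping, NLTK's output for the fourth sentence becomes `[("Emmanuel", "PERSON"), ("Onuarah", "ORG"), ("BBC", "ORG")]`, which makes it clear that only "BBC" agrees with `en_core_web_sm`.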


These results are analysed [here](https://1drv.ms/w/s!AhOCUnnVayEAe73EAJl1oQExymo?e=xqqqU6).


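`en_core_web_sm` is small (12MB), produced the same entities as `en_core_web_md` on these sentences apart from one label ("EU" as GPE rather than ORG), and outperformed NLTK, which, for example, labelled "Amnesty" as PERSON and split "Emmanuel Onuarah" into two separate entities. It therefore appears to be the more efficient choice for implementation.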