# **NAMED ENTITY RECOGNITION**

# **Table of Contents**

1.   [Introduction](#Introduction)
2.   [Prerequisites](#Prerequisites)
3.   [Applications](#Applications)
4.   [Rule-based-model](#Rule-based-model)
5.   [Machinelearning-model](#Machinelearning-model)
6.   [Conclusion](#Conclusion)
7.   [References](#References)

## **Introduction**

Named entity recognition (NER) is a subtask of natural language processing (NLP) that focuses on identifying and classifying entities within a text or document.
Entities represent specific objects or names such as persons, organizations, dates and times, countries, drugs, and other unique information within a document.
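
To make the idea concrete, the sketch below shows one common way to represent NER output: each entity is a character span plus a label. The `Entity` class, the `annotate` helper, and the hand-labelled spans are illustrative assumptions, not the output of any model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str    # surface form as it appears in the document
    start: int   # character offset where the entity begins
    end: int     # character offset just past the entity
    label: str   # entity type, e.g. PERSON or GPE


def annotate(text, entities):
    """Return the text with each entity rewritten as [surface](LABEL)."""
    out, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e.start):
        out.append(text[cursor:ent.start])
        out.append(f"[{text[ent.start:ent.end]}]({ent.label})")
        cursor = ent.end
    out.append(text[cursor:])
    return "".join(out)


sentence = "Kwame flew to Kigali on Thursday."
# Hand-labelled spans, for illustration only
entities = [
    Entity("Kwame", 0, 5, "PERSON"),
    Entity("Kigali", 14, 20, "GPE"),
    Entity("Thursday", 24, 32, "DATE"),
]
print(annotate(sentence, entities))
```

Storing character offsets alongside the label keeps each entity tied to its exact position in the original text, which is the same information the later examples print.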


## **Prerequisites**

Before diving into the coding exercises and examples, one should have basic knowledge of the following:
1. Python programming
2. Writing and designing algorithms
3. Problem solving, critical thinking, and creativity

<a id='guide'></a>
## **Applications**

NER is applied in many sectors of daily life; some of these application areas are:
1. Spam detection in emails 
2. 

<a id='guide'></a>
## **Types-of-NER-models**

NER models generally fall into three categories:
1. Rule based NER models
2. Machine learning (ML) models
3. Deep learning models 

## **Rule-based-model**
A rule-based NER model is a .....
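
As a rough illustration of the rule-based idea, the sketch below tags entities using hand-written regular expressions instead of a trained model. The rule set, the labels, and the `regex_ner` helper are assumptions made for this example only.

```python
import re

# Each rule pairs a label with a hand-written regular expression.
RULES = [
    ("TIME", re.compile(
        r"\b(?:one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|\d{1,2}) o'clock\b",
        re.IGNORECASE)),
    ("DAY", re.compile(r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b")),
    ("AGE", re.compile(r"\b\d{1,3} years old\b", re.IGNORECASE)),
]


def regex_ner(text):
    """Return (matched_text, start, end, label) tuples sorted by position."""
    found = []
    for label, pattern in RULES:
        for match in pattern.finditer(text):
            found.append((match.group(), match.start(), match.end(), label))
    return sorted(found, key=lambda item: item[1])


for hit in regex_ner("At eight o'clock on Thursday morning Arthur didn't feel very good."):
    print(hit)
```

Rules like these are easy to read and adjust, but they only find what the patterns explicitly describe; the name `Arthur`, for instance, is not covered by any rule here.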

Examples of rule-based approaches:

a. spaCy EntityRuler

b. NLTK


In [None]:
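import nltk
import svgling  # lets Jupyter render the resulting tree as a diagram

# download the tokenizer, POS tagger, and named-entity chunker resources
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('maxent_ne_chunker_tab')
nltk.download('words')


sentence = "At eight o'clock on Thursday morning Arthur didn't feel very good. can arthur go to the Ghana"
tokens = nltk.word_tokenize(sentence)  # split the sentence into tokens
tagged = nltk.pos_tag(tokens)          # assign a part-of-speech tag to each token

# group the tagged tokens into named-entity chunks (returns an nltk Tree)
entities = nltk.chunk.ne_chunk(tagged)
entities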

### **code-examples-for-Spacy** 

In [None]:
%%capture
# installations (the %%capture cell magic must be the first line of the cell)
!pip install spacy
!pip install nltk
!python -m spacy download en_core_web_sm

In [None]:
import spacy
from spacy.pipeline import EntityRuler

In [None]:
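def spacy_rb_ner(patterns, text, model_name='en'):
    """Run a rule-based NER pass over `text` and print each entity with its label."""

    # create a blank pipeline for the given language
    nlp = spacy.blank(model_name)

    # add an entity ruler component to the pipeline
    ruler = nlp.add_pipe("entity_ruler")

    # register the user-defined patterns with the ruler
    ruler.add_patterns(patterns)

    # extract entities from the text
    doc = nlp(text)
    for ent in doc.ents:
        print(ent.text, ent.label_)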

In [None]:
patterns = [{"label": "AGE", "pattern": [{"like_num": True}, {"lower": "years"}, {"lower": "old"}]}]
text = "John is 25 years old"
spacy_rb_ner(patterns,text)
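
The pattern above reads as three consecutive token conditions: a number-like token, then `years`, then `old`. The sketch below mimics that logic in plain Python to show the idea. It is a simplified illustration that assumes whitespace tokenization, not a description of how spaCy implements matching, and the helper names `like_num` and `match_pattern` are hypothetical.

```python
def like_num(token):
    """Rough number check: digits, optionally with commas and one decimal point."""
    return token.replace(",", "").replace(".", "", 1).isdigit()


# One predicate per token position, in order
AGE_PATTERN = [
    like_num,
    lambda tok: tok.lower() == "years",
    lambda tok: tok.lower() == "old",
]


def match_pattern(tokens, pattern):
    """Return (start, end) token index pairs where every predicate matches in order."""
    spans = []
    width = len(pattern)
    for i in range(len(tokens) - width + 1):
        window = tokens[i:i + width]
        if all(check(tok) for check, tok in zip(pattern, window)):
            spans.append((i, i + width))
    return spans


tokens = "John is 25 years old".split()
for start, end in match_pattern(tokens, AGE_PATTERN):
    print(" ".join(tokens[start:end]), "AGE")
```

Thinking of a pattern as a sliding window of per-token checks can help when designing your own patterns for the exercises that follow.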

### **Exercises 1** 

Please list some advantages and disadvantages you notice as you try out these rule-based named entity recognition models.

Advantages
1. 
2. 

Disadvantages
1. 
2. 

### **Exercises 2**

Let's try our hand at some sample exercises for better understanding.

In [None]:
# Easy
#customize your own pattern and provide your text for testing
pattern =[]
text = ""
spacy_rb_ner(pattern,text)

In [None]:
#Hard

#1. download the dataset from Kaggle
import kagglehub

path = kagglehub.dataset_download("remakia/drugs-dictionary")
print("Path to dataset files:", path)

#2. load the json dictionary
import json
def read_json(json_file):
  # placeholder: implement the loading logic here
  return 0

#3. convert the drug dict into patterns
def generate_pattern(drug_dict):
  # placeholder: build the list of patterns expected by spacy_rb_ner here
  return 0

json_file = "drug.json"
drug_dict = read_json(json_file)
pattern = generate_pattern(drug_dict)
text = "Perfusion d'une ampoule de prexidine de lithium et introduction d'un antihistaminique par Cétirizine 10 mg x 2 par jour, avec diminution puis disparition de l'oedème."
text = text.lower()
spacy_rb_ner(pattern,text)

## **Machinelearning-model**

1. Conditional Random Fields - (provide explanation)
2. SVM - (provide explanation)
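
Classical models such as these are commonly trained on per-token features paired with token-level labels. The sketch below shows one possible setup, assuming BIO tags and a few hand-picked features. The function name `token_features`, the feature set, and the sample sentence are illustrative choices, not a prescribed recipe.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i] from the word itself and its neighbours."""
    word = tokens[i]
    features = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
        "BOS": i == 0,                # beginning of sentence
        "EOS": i == len(tokens) - 1,  # end of sentence
    }
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
    if i < len(tokens) - 1:
        features["next.lower"] = tokens[i + 1].lower()
    return features


tokens = ["Esi", "lives", "in", "Accra"]
# BIO tags: B- marks the first token of an entity, I- a continuation, O a non-entity token
tags = ["B-PER", "O", "O", "B-LOC"]

X = [token_features(tokens, i) for i in range(len(tokens))]
for tok, tag, feats in zip(tokens, tags, X):
    print(tok, tag, feats)
```

A sequence model would then learn from pairs of feature dicts and tags like these; including neighbouring words gives the model some context beyond the current token.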

### **Example-code**

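Below is example code using spaCy's pretrained `en_core_web_sm` pipeline, whose NER component is learned from annotated data rather than hand-written rules. Note that spaCy's built-in NER is a transition-based neural model rather than a conditional random field.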

In [None]:
import pandas as pd
import spacy
import requests

nlp = spacy.load("en_core_web_sm")
pd.set_option("display.max_rows", 200)

content ="Esi is a 27-years-old individual who came back home from school. she is meant to go back to school soon to see her friends. Have you heard from Kwame because the last time i spoke to him, he said he was going to the Ghana, Kigali"

doc = nlp(content)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

### **Exercise 1**

We will train our own spaCy model.

In [None]:

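def processed_data():
    # placeholder: load and return the training data here
    return 0

def normalization():
    # placeholder: implement text normalization here
    return 0

def dataset_preprocessing(TRAIN_DATA, ner):
    # register every entity label found in the training data with the NER component
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get('entities'):
            ner.add_label(ent[2])
    return ner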
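
The `dataset_preprocessing` function reads `ent[2]` as the label, which suggests training records shaped like `(text, {"entities": [(start, end, label), ...]})`. As a sketch under that assumption, the hypothetical helper `make_example` below builds one such record by locating each labelled phrase in the text. The sample sentence and labels are made up for illustration.

```python
def make_example(text, labelled_phrases):
    """Turn (phrase, label) pairs into character-offset entity annotations.

    Only the first occurrence of each phrase in the text is used.
    """
    entities = []
    for phrase, label in labelled_phrases:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in text")
        entities.append((start, start + len(phrase), label))
    return (text, {"entities": entities})


SAMPLE_TRAIN_DATA = [
    make_example("Kwame travelled to Kigali", [("Kwame", "PERSON"), ("Kigali", "GPE")]),
]
print(SAMPLE_TRAIN_DATA)
```

A list of records in this shape is what `processed_data` would need to return for the label registration loop to work.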


In [None]:
import random

import spacy
from spacy.training import Example
from tqdm import tqdm

n_iter = 30  # number of training iterations (passes over the data)
model = "name-of-blank-model"  # replace with a language code such as "en"
nlp = spacy.blank(model)


# create the NER component and add it to the pipeline (spaCy v3 API)
ner = nlp.add_pipe("ner")

# data preprocessing: register the entity labels found in the training data
TRAIN_DATA = processed_data()
dataset_preprocessing(TRAIN_DATA, ner)

# training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.select_pipes(disable=other_pipes):  # only train NER
    optimizer = nlp.initialize()
    for itn in range(n_iter):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in tqdm(TRAIN_DATA):
            # spaCy v3 expects Example objects rather than raw (text, annotations) pairs
            example = Example.from_dict(nlp.make_doc(text), annotations)
            nlp.update([example], drop=0.5, sgd=optimizer, losses=losses)
        print(losses)


## **Conclusion**



## **References**


1. https://www.kaggle.com/code/remakia/introduction-to-ner-part-i-rule-based
2. https://spacy.io/
3. https://www.nltk.org/


# **Facilitator(s) Details**

**Facilitator(s):**

*   Name: [FELIX TETTEH AKWERH,Adwoa Asantewaa Bremang]
*   Email: [felix.akwerh@knust.edu.gh,adwoabremang@gmail.com]
*   LinkedIn: 



# **Reviewer’s Name**

*   Name: [Reviewer’s Name]