## Custom Named Entity Recognition using spaCy
Named entity recognition (NER) is a core task in Natural Language Processing (NLP) and plays a pivotal role in processing text data. The goal is to locate and categorize significant pieces of information, called entities, within text. An entity is a word or sequence of words, typically a proper noun, that consistently refers to the same thing. For example, an NER system might detect the term "NewsCatcher" in a text and label it as an "Organization."

At its core, all entity recognition systems have two steps:

- Detecting the entities in text 
- Categorizing the entities into named classes

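1. In the first step, the system finds where each entity starts and ends in the text. A common way to encode these boundaries is inside-outside-beginning (IOB) chunking, which tags every token as the beginning of an entity, inside an entity, or outside any entity.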

2. The second step involves putting entities into categories. These categories can change based on what you're looking for, but common ones include people, organizations, locations, time, measurements, and patterns like emails or phone numbers.
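
The two steps can be made concrete with a small tagging sketch. The code below is a minimal illustration, not part of any library: the `to_iob` helper, the made-up sentence, and the `ORG` and `GPE` labels are all assumptions chosen for the example. It takes entity boundaries given as token positions and produces one inside-outside-beginning (IOB) tag per token.

```python
def to_iob(tokens, entities):
    """Return one IOB tag per token.

    entities is a list of (start_token, end_token_exclusive, label) tuples.
    This helper is hypothetical and only illustrates the tagging scheme.
    """
    tags = ["O"] * len(tokens)            # "O" = outside any entity
    for start, end, label in entities:
        tags[start] = f"B-{label}"        # "B" = first token of an entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"        # "I" = later tokens of the same entity
    return tags


# Made-up sentence with hand-assigned entity boundaries and labels.
tokens = ["NewsCatcher", "opened", "an", "office", "in", "New", "York", "."]
entities = [(0, 1, "ORG"), (5, 7, "GPE")]

iob_tags = to_iob(tokens, entities)
for token, tag in zip(tokens, iob_tags):
    print(f"{token:12} {tag}")
```

The `B-`/`I-` prefix carries the boundary (step one), while the suffix after the hyphen carries the category (step two).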

While there are some rule-based approaches, most modern systems use machine learning or deep learning. Text is often ambiguous: the word 'Sydney' can be a place or a person's name, and learned models use the surrounding context to decide which one is meant.
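
As a concrete illustration of the rule-based approach, pattern-like entities such as email addresses can be matched with a regular expression. The sketch below is a minimal example; the `EMAIL_RE` pattern, the `find_emails` helper, the sample text, and the `EMAIL` label are assumptions made for illustration, and the pattern is deliberately simplified rather than a full address validator.

```python
import re

# Simplified email pattern (an assumption for this sketch, not a full validator).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return (start, end, label, value) for each email-like match in text."""
    return [(m.start(), m.end(), "EMAIL", m.group()) for m in EMAIL_RE.finditer(text)]


sample = "Contact support@example.com or sales@example.org for details."
matches = find_emails(sample)
for start, end, label, value in matches:
    print(start, end, label, value)
```

A rule like this works well for rigid patterns, but it has no way to tell whether 'Sydney' refers to a city or a person, which is where statistical models become useful.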

### Applications of Named Entity Recognition
NER is useful whenever you need a quick, high-level overview of a large amount of text. A good NER system helps the computer capture the subject or main idea of a text and sort documents by how relevant they are, which makes it an efficient organizer for a mountain of information.

Common applications include:
- Information Extraction And Summarization
- Optimizing Search Engines
- Machine Translation
- Content Classification
- Customer Support

### Practical Implementation

**NER in spaCy**

spaCy is a fast Python library for natural language processing that comes with handy tools for understanding text. Version 3.0 added transformer-based pipelines and a configuration-driven training system. When you load a pretrained spaCy pipeline, it brings in components for part-of-speech tagging, dependency parsing, and named entity recognition, so a single call processes the text end to end.

In [1]:
!python -m spacy download en_core_web_lg

Collecting en-core-web-lg==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl (587.7 MB)
Installing collected packages: en-core-web-lg
Successfully installed en-core-web-lg-3.7.1
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')


#### 1. Imports

In [19]:
import spacy
import json

nlp = spacy.load("en_core_web_lg")

print(nlp.pipe_names)

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']


In [14]:
text = "Samantha loves her cozy home in Green Valley. Every morning, she sips hot cocoa on her favorite blue couch and chats with her best friend, Benny the Bunny. Outside, the sun shines on tall trees, and birds sing sweet melodies. In the evening, Samantha and Benny enjoy tasty carrot snacks together. Life in Green Valley is simple and joyful, filled with warmth and friendship."
doc = nlp(text)
print(doc)
print(type(doc))

Samantha loves her cozy home in Green Valley. Every morning, she sips hot cocoa on her favorite blue couch and chats with her best friend, Benny the Bunny. Outside, the sun shines on tall trees, and birds sing sweet melodies. In the evening, Samantha and Benny enjoy tasty carrot snacks together. Life in Green Valley is simple and joyful, filled with warmth and friendship.
<class 'spacy.tokens.doc.Doc'>


In [15]:
# entities
print(doc.ents)

(Samantha, Green Valley, Every morning, Benny the Bunny, Samantha, Benny, Green Valley)


In [16]:
print(type(doc.ents))

<class 'tuple'>


In [17]:
print(doc.ents[0], type(doc.ents[0]), sep="\n")

Samantha
<class 'spacy.tokens.span.Span'>


In [18]:
from spacy import displacy
displacy.render(doc, style="ent", jupyter=True)

#### 2. Data Loading and Processing
The data used in this experiment comes directly from [this Kaggle medical NER dataset](https://www.kaggle.com/datasets/finalepoch/medical-ner).

In [20]:
with open('./Corona2.json', 'r') as f:
    data = json.load(f)

In [21]:
data['examples'][0]

{'id': '18c2f619-f102-452f-ab81-d26f7e283ffe',
 'content': "While bismuth compounds (Pepto-Bismol) decreased the number of bowel movements in those with travelers' diarrhea, they do not decrease the length of illness.[91] Anti-motility agents like loperamide are also effective at reducing the number of stools but not the duration of disease.[8] These agents should be used only if bloody diarrhea is not present.[92]\n\nDiosmectite, a natural aluminomagnesium silicate clay, is effective in alleviating symptoms of acute diarrhea in children,[93] and also has some effects in chronic functional diarrhea, radiation-induced diarrhea, and chemotherapy-induced diarrhea.[45] Another absorbent agent used for the treatment of mild diarrhea is kaopectate.\n\nRacecadotril an antisecretory medication may be used to treat diarrhea in children and adults.[86] It has better tolerability than loperamide, as it causes less constipation and flatulence.[94]",
 'metadata': {},
 'annotations': [{'id': '0825a1

In [22]:
data['examples'][0].keys()

dict_keys(['id', 'content', 'metadata', 'annotations', 'classifications'])

In [23]:
data['examples'][0]['content']

"While bismuth compounds (Pepto-Bismol) decreased the number of bowel movements in those with travelers' diarrhea, they do not decrease the length of illness.[91] Anti-motility agents like loperamide are also effective at reducing the number of stools but not the duration of disease.[8] These agents should be used only if bloody diarrhea is not present.[92]\n\nDiosmectite, a natural aluminomagnesium silicate clay, is effective in alleviating symptoms of acute diarrhea in children,[93] and also has some effects in chronic functional diarrhea, radiation-induced diarrhea, and chemotherapy-induced diarrhea.[45] Another absorbent agent used for the treatment of mild diarrhea is kaopectate.\n\nRacecadotril an antisecretory medication may be used to treat diarrhea in children and adults.[86] It has better tolerability than loperamide, as it causes less constipation and flatulence.[94]"

In [24]:
data['examples'][0]['annotations'][0]

{'id': '0825a1bf-6a6e-4fa2-be77-8d104701eaed',
 'tag_id': 'c06bd022-6ded-44a5-8d90-f17685bb85a1',
 'end': 371,
 'start': 360,
 'example_id': '18c2f619-f102-452f-ab81-d26f7e283ffe',
 'tag_name': 'Medicine',
 'value': 'Diosmectite',
 'correct': None,
 'human_annotations': [{'timestamp': '2020-03-21T00:24:32.098000Z',
   'annotator_id': 1,
   'tagged_token_id': '0825a1bf-6a6e-4fa2-be77-8d104701eaed',
   'name': 'Ashpat123',
   'reason': 'exploration'}],
 'model_annotations': []}
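
Each annotation stores character offsets (`start`, `end`), a label (`tag_name`), and the annotated text (`value`), which is enough to build `(text, {"entities": [(start, end, label)]})` tuples, an intermediate format often used when preparing NER training data. The sketch below is one possible way to do that, not the dataset's official loader: the `to_training_example` helper and the tiny `sample_example` dictionary are hypothetical, and the sample only mirrors the keys shown above so that the block runs without the JSON file.

```python
def to_training_example(example):
    """Convert one annotated example into (text, {"entities": [...]}).

    Hypothetical helper: relies only on the 'content', 'annotations',
    'start', 'end', 'tag_name', and 'value' keys shown in the output above.
    """
    text = example["content"]
    entities = []
    for ann in example["annotations"]:
        start, end, label = ann["start"], ann["end"], ann["tag_name"]
        # Skip annotations whose offsets do not reproduce the stored value.
        if text[start:end] != ann["value"]:
            continue
        entities.append((start, end, label))
    return text, {"entities": sorted(entities)}


# Made-up miniature example with the same structure as data['examples'][0].
sample_example = {
    "content": "Diosmectite is used to treat acute diarrhea.",
    "annotations": [
        {"start": 0, "end": 11, "tag_name": "Medicine", "value": "Diosmectite"},
        # Deliberately inconsistent offsets to show the validation step.
        {"start": 5, "end": 9, "tag_name": "Medicine", "value": "wrong"},
    ],
}

training_example = to_training_example(sample_example)
print(training_example)
```

With the real file loaded, the same helper could be applied as `[to_training_example(ex) for ex in data['examples']]`. Checking `text[start:end]` against `value` is a cheap guard against misaligned offsets before any training data is built.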