### Named Entity Recognition

- A named entity is a word or phrase, usually a proper noun, that refers to a specific instance of a person, place, or thing (for example "Paris" rather than "city").

Named Entity Recognition (NER) in Natural Language Processing (NLP) is a technique used to locate named entities in text and classify them into predefined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, and more.

NER can be framed as a multiclass classification problem: for each candidate entity, the model estimates how likely it is to belong to each class and assigns the most probable one. The classes are predefined categories, each representing a group of real-world instances.

For example, the categories can be organized hierarchically:
- Places
    - Countries
    - States
- Institutions
    - Schools
    - Colleges
    - Offices
- Persons
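To make the multiclass framing concrete, here is a minimal sketch that converts raw per-class scores into probabilities with a softmax and picks the most probable class. The scores for the span "Paris" are made up for illustration, and the helper names `softmax` and `classify_entity` are hypothetical; this is a simplified picture of the idea, not how spaCy's entity recognizer computes its predictions internally.

```python
import math


def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max score for numerical stability
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def classify_entity(scores):
    """Return the most probable class and its probability."""
    probs = softmax(scores)
    best = max(probs, key=probs.get)
    return best, probs[best]


# Hypothetical scores a model might assign to the span "Paris"
scores = {"PERSON": 0.5, "GPE": 3.2, "ORG": 1.1}
label, p = classify_entity(scores)
print(label, round(p, 3))
```

Because the softmax is monotonic, the chosen class is always the one with the highest raw score; the probabilities are useful when you also want a confidence value.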

- NER serves as a bridge between unstructured text and structured data: by labelling spans such as names, places, and dates, it gives downstream systems machine-readable context about the text.
- This makes NER a building block for tasks such as data analysis, information retrieval, and knowledge graph construction.
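As a minimal sketch of the text-to-structured-data step, the code below takes entity spans and turns them into JSON records that could be loaded into a database or used as knowledge-graph nodes. The spans are hard-coded as hypothetical model output, shaped like spaCy's `ent.text`, `ent.label_`, `ent.start_char` and `ent.end_char`; the function name `to_records` is illustrative.

```python
import json

text = "Apple opened an office in London in 2023."

# Hypothetical NER output as (text, label, start_char, end_char) tuples
entities = [("Apple", "ORG", 0, 5), ("London", "GPE", 26, 32), ("2023", "DATE", 36, 40)]


def to_records(text, entities):
    """Turn entity spans into a list of dictionaries (one structured record per entity)."""
    records = []
    for ent_text, label, start, end in entities:
        # Sanity check: the character offsets must point at the entity text
        if text[start:end] != ent_text:
            raise ValueError(f"offsets {start}:{end} do not match {ent_text!r}")
        records.append({"text": ent_text, "label": label, "start": start, "end": end})
    return records


records = to_records(text, entities)
print(json.dumps(records, indent=2))
```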

```python
# Read the raw text to be analysed
with open('./input.txt', 'r', encoding='utf-8') as f:
    input_txt = f.read()
```

```python
# Approximate word count: the number of space-separated chunks
len(input_txt.split(' '))
```

```
266
```
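Note that `str.split(' ')` splits only on the single space character, so words joined by a newline count as one chunk and consecutive spaces produce empty strings. `str.split()` with no argument splits on any run of whitespace. The sample string below is made up to show the difference:

```python
sample = "Apple is looking\nat buying U.K. startups."

chunks_on_space = sample.split(' ')  # splits on ' ' only
words = sample.split()               # splits on any whitespace run

print(len(chunks_on_space), chunks_on_space)  # 6 chunks: 'looking\nat' stays together
print(len(words), words)                      # 7 words
```

So `len(input_txt.split())` is usually the closer word count. Neither number equals the count of spaCy tokens, since spaCy's tokenizer also splits off punctuation such as the final period.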

```python
import spacy
from collections import defaultdict

# Load spaCy's small English pipeline
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
doc = nlp(input_txt)

# Group every detected entity's text under its label, e.g. "GPE" -> ["London", ...]
categories = defaultdict(list)
for ent in doc.ents:
    categories[ent.label_].append(ent.text)

# Reference table of the entity labels predicted by en_core_web_sm
LABEL_DESCRIPTIONS = '''
-------------------------------------
Named Entities Description:

PERSON:      People, including fictional.
NORP:        Nationalities or religious or political groups.
FAC:         Buildings, airports, highways, bridges, etc.
ORG:         Companies, agencies, institutions, etc.
GPE:         Countries, cities, states (GeoPolitical Entities)
LOC:         Non-GPE locations, mountain ranges, bodies of water.
PRODUCT:     Objects, vehicles, foods, etc. (Not services.)
EVENT:       Named hurricanes, battles, wars, sports events, etc.
WORK_OF_ART: Titles of books, songs, etc.
LAW:         Named documents made into laws.
LANGUAGE:    Any named language.
DATE:        Absolute or relative dates or periods.
TIME:        Times smaller than a day.
PERCENT:     Percentage, including “%”.
MONEY:       Monetary values, including unit.
QUANTITY:    Measurements, as of weight or distance.
ORDINAL:     “first”, “second”, etc.
CARDINAL:    Numerals that do not fall under another type.
'''

# Write one section per label (label, blank line, entities, separator),
# then append the label reference table; UTF-8 keeps the curly quotes intact
with open('output.txt', 'w', encoding='utf-8') as out:
    for label, entity_texts in categories.items():
        out.write(label + '\n')
        out.write('\n')
        for entity_text in entity_texts:
            out.write(entity_text + '\n')
        out.write('-------------------------------------\n')
    out.write(LABEL_DESCRIPTIONS)
```
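The grouping and report-writing logic can be exercised without loading a model. The sketch below is a hypothetical variant: `group_entities` and `format_report` are illustrative names, the `(text, label)` pairs stand in for `[(ent.text, ent.label_) for ent in doc.ents]`, and the optional `dedupe` flag (an addition not in the cell above) drops repeated mentions of the same entity, which the cell above would otherwise list once per occurrence.

```python
from collections import defaultdict


def group_entities(pairs, dedupe=True):
    """Group (text, label) pairs by label, optionally dropping repeated mentions."""
    categories = defaultdict(list)
    seen = set()
    for text, label in pairs:
        if dedupe and (label, text) in seen:
            continue  # this mention is already recorded under this label
        seen.add((label, text))
        categories[label].append(text)
    return dict(categories)


def format_report(categories, separator="-" * 37):
    """Build a label / blank line / entities / separator layout similar to output.txt."""
    lines = []
    for label, texts in categories.items():
        lines.append(label)
        lines.append("")
        lines.extend(texts)
        lines.append(separator)
    return "\n".join(lines) + "\n"


# Hypothetical entity pairs; "Apple" is mentioned twice
pairs = [("Apple", "ORG"), ("London", "GPE"), ("Apple", "ORG"), ("2023", "DATE")]
categories = group_entities(pairs)
print(format_report(categories))
```

spaCy also provides `spacy.explain(label)`, which returns a short description for a label, so the hard-coded description table could be generated from the labels actually found instead.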