# Introduction to NER and its importance in NLP

---


### Overview of popular NER algorithms and approaches

---


### Hands-on exercise: Implementing a basic NER system using libraries like spaCy

https://spacy.io/usage/processing-pipelines

---
https://www.youtube.com/watch?v=1ePkOSGoIFI



spaCy's English pipelines (such as `en_core_web_sm`, which is trained on OntoNotes 5.0) recognize the following built-in entity types:

---


PERSON - People, including fictional.

NORP - Nationalities or religious or political groups.

FAC - Buildings, airports, highways, bridges, etc.

ORG - Companies, agencies, institutions, etc.

GPE - Countries, cities, states.

LOC - Non-GPE locations, mountain ranges, bodies of water.

PRODUCT - Objects, vehicles, foods, etc. (Not services.)

EVENT - Named hurricanes, battles, wars, sports events, etc.

WORK_OF_ART - Titles of books, songs, etc. (Example: "To Kill a Mockingbird" in "I recently read 'To Kill a Mockingbird' by Harper Lee.")

LAW - Named documents made into laws.

LANGUAGE - Any named language.

DATE - Absolute or relative dates or periods.

TIME - Times smaller than a day.
PERCENT - Percentage, including "%".

MONEY - Monetary values, including unit.

QUANTITY - Measurements, as of weight or distance.

ORDINAL - "first", "second", etc.

CARDINAL - Numerals that do not fall under another type. (Example: "five" in "I bought five books from the store yesterday.")
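The descriptions above mirror the label glossary that ships with spaCy. As a small sketch, any label can be looked up programmatically with `spacy.explain`, which needs only the `spacy` package and no trained pipeline; the particular labels printed here are an arbitrary choice.

```python
import spacy

# spacy.explain() reads spaCy's built-in glossary, so no model download is needed
for label in ["PERSON", "NORP", "GPE", "LOC", "CARDINAL"]:
    print(f"{label}: {spacy.explain(label)}")
```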

# Named Entities vs. Parts of Speech

Named entities and parts of speech are both linguistic concepts, but they serve different purposes and capture different aspects of language.

Named Entities: Named entities are words or phrases that refer to specific items such as people, organizations, locations, dates, and quantities. They carry the named or specific information in a text, which is why recognizing them is central to information extraction and to understanding what a text is about. In spaCy's scheme, entity types include PERSON, ORG, GPE, LOC, DATE, CARDINAL, WORK_OF_ART, and the others listed above.
Example: In the sentence "I visited Paris last summer," "Paris" is a named entity referring to a specific location (GPE in spaCy's scheme). Because spaCy's DATE type also covers relative dates, "last summer" fits the DATE type as well.

Parts of Speech: Parts of speech refer to the grammatical categories or roles that words play in a sentence. They classify words based on their syntactic function and relationship with other words in a sentence. Parts of speech help determine the structure and meaning of a sentence. Common parts of speech include nouns, verbs, adjectives, adverbs, pronouns, prepositions, conjunctions, etc.
Example: In the sentence "The cat chased the mouse," the parts of speech are "The" (article), "cat" (noun), "chased" (verb), "the" (article), and "mouse" (noun).

In summary, named entities focus on identifying specific entities within a text, while parts of speech focus on categorizing words based on their grammatical functions within a sentence.
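To see the two layers side by side, the sketch below builds a spaCy `Doc` by hand for the Paris sentence, supplying both universal POS tags and IOB entity tags. The annotations are hand-written assumptions, not model output, which keeps the example runnable without downloading a trained pipeline: every token receives a part of speech, while only some spans form entities.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hand-written annotations (assumed for illustration, not predicted by a model)
words = ["I", "visited", "Paris", "last", "summer"]
pos = ["PRON", "VERB", "PROPN", "ADJ", "NOUN"]   # one POS tag per token
ents = ["O", "O", "B-GPE", "B-DATE", "I-DATE"]   # IOB entity tags

doc = Doc(nlp.vocab, words=words, pos=pos, ents=ents)

# Parts of speech cover every token
print([(token.text, token.pos_) for token in doc])
# Named entities cover only some spans, which may be several tokens long
print([(ent.text, ent.label_) for ent in doc.ents])  # [('Paris', 'GPE'), ('last summer', 'DATE')]
```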

# NER System using spaCy

## Exercise 1: Load and Process Text


```python
import spacy

# Load the small English pipeline (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')
```

```python
# Define the input text
text = "Apple Inc. is looking to buy a startup in the United States for $1 billion."

# Process the text using spaCy
doc = nlp(text)
```

```python
# Print the entities found in the text
for ent in doc.ents:
    print(ent.text, ent.label_)
```

```
Apple Inc. ORG
the United States GPE
$1 billion MONEY
```
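Each item in `doc.ents` is a `Span` carrying token offsets, character offsets, and a label, and the same information is also stored on every token as an IOB tag (`B` = beginning, `I` = inside, `O` = outside). The sketch below rebuilds the Exercise 1 sentence as a hand-annotated `Doc` whose entities are copied from the output above (an assumption made so it runs without `en_core_web_sm`), then prints both views.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Tokens, trailing-space flags, and IOB tags written by hand to mirror the output above
words = ["Apple", "Inc.", "is", "looking", "to", "buy", "a", "startup", "in",
         "the", "United", "States", "for", "$", "1", "billion", "."]
spaces = [True] * 13 + [False, True, False, False]
ents = (["B-ORG", "I-ORG"] + ["O"] * 7
        + ["B-GPE", "I-GPE", "I-GPE", "O"]
        + ["B-MONEY", "I-MONEY", "I-MONEY", "O"])
doc = Doc(nlp.vocab, words=words, spaces=spaces, ents=ents)

# Span view: text, label, token offsets [start, end), character offsets [start_char, end_char)
for ent in doc.ents:
    print(ent.text, ent.label_, ent.start, ent.end, ent.start_char, ent.end_char)

# Token view: the tokens of "the United States" carry IOB tags B, I, I
for token in doc[9:12]:
    print(token.text, token.ent_iob_, token.ent_type_)
```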


# Importance of NER in NLP
Named Entity Recognition (NER) is a crucial component of Natural Language Processing (NLP) that plays a significant role in various applications. Here are a few points highlighting the importance of NER in NLP:

---



Information Extraction: NER helps extract and identify named entities such as names of people, organizations, locations, dates, and other significant information from unstructured text. This extracted information can be used for various purposes like populating knowledge bases, summarization, and data analysis.

---


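Entity Disambiguation: NER helps disambiguate mentions by assigning them a type based on context. For example, "Apple" can refer to the company or the fruit: in "Apple released a new phone," a good NER model labels "Apple" as ORG, whereas in "I ate an apple," it assigns no entity label. Linking a mention to one specific real-world entity (for example, an entry in a knowledge base) is a related but separate task called entity linking.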

---


Text Understanding and Classification: NER aids in understanding the semantics of text by identifying and categorizing entities. This information is valuable for tasks such as sentiment analysis, document classification, and topic modeling, enabling more accurate analysis and interpretation of text data.

---


Question Answering and Chatbots: NER is crucial in question-answering systems and chatbots that need to identify specific entities mentioned in user queries. It helps in extracting relevant information and providing precise and context-aware responses to user queries.
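A minimal sketch of the chatbot case: group the entities found in a user query by label so that downstream code can read them as slots. To stay runnable without a trained model, it uses spaCy's rule-based `entity_ruler` with a few hypothetical patterns as a stand-in for a statistical NER model; the `extract_slots` helper is likewise hypothetical.

```python
import spacy

# Rule-based stand-in for a trained NER model (patterns are hypothetical)
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "GPE", "pattern": "Boston"},
    {"label": "GPE", "pattern": "Paris"},
    {"label": "DATE", "pattern": [{"LOWER": "next"}, {"LOWER": "friday"}]},
])

def extract_slots(query):
    """Group entity texts by label, in the order they appear in the query."""
    doc = nlp(query)
    slots = {}
    for ent in doc.ents:
        slots.setdefault(ent.label_, []).append(ent.text)
    return slots

print(extract_slots("Book a flight from Boston to Paris next Friday"))
# {'GPE': ['Boston', 'Paris'], 'DATE': ['next Friday']}
```

Deciding which GPE is the origin and which is the destination would need more than entity labels (for example, the prepositions "from" and "to"), which this sketch does not attempt.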

---


Information Retrieval: NER can enhance information retrieval systems by identifying named entities in documents. It enables better indexing and searching capabilities, allowing users to find relevant documents or information more efficiently.
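One simple way NER supports retrieval is an entity index: a mapping from each (entity, label) pair to the documents that mention it, so a search for an organization can go straight to the documents that name it. The sketch below is plain Python and assumes the entities were already extracted (for instance from `doc.ents`); the document IDs, entity lists, and `build_entity_index` are hypothetical, and lowercasing the entity text is just one possible normalization choice.

```python
from collections import defaultdict

def build_entity_index(docs):
    """Map each (lowercased entity text, label) pair to the set of document IDs mentioning it."""
    index = defaultdict(set)
    for doc_id, entities in docs.items():
        for text, label in entities:
            index[(text.lower(), label)].add(doc_id)
    return index

# Hypothetical pre-extracted entities per document
docs = {
    "d1": [("Apple Inc.", "ORG"), ("the United States", "GPE")],
    "d2": [("Apple Inc.", "ORG"), ("Cupertino", "GPE")],
    "d3": [("Microsoft", "ORG"), ("Redmond", "GPE")],
}
index = build_entity_index(docs)
print(sorted(index[("apple inc.", "ORG")]))  # ['d1', 'd2']
```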

---


Machine Translation and Language Generation: NER helps machine translation systems handle named entities correctly, for example by keeping a person's or company's name intact (or transliterating it) instead of translating it word by word, which preserves the meaning and context of the entity. Similarly, in text generation tasks, identified named entities can be used to keep the generated text consistent with the people, places, and organizations in the source.

---


Social Media Analysis: NER is widely used to analyze social media data, where posts constantly mention people, organizations, products, and places, often alongside platform-specific markers such as user mentions, hashtags, and location tags. It supports trend tracking, entity-level sentiment analysis, user profiling, and other social media analytics tasks.

---


Information Security and Fraud Detection: NER can assist in identifying sensitive information like personal names, addresses, or financial details in texts. It helps in information security by automatically redacting or anonymizing such data. NER can also be employed in fraud detection systems to identify suspicious patterns or entities associated with fraudulent activities.
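A minimal redaction sketch: replace every entity whose label is in a chosen set with a `[LABEL]` placeholder, using the entities' character offsets. As a stand-in for a trained model, it uses spaCy's rule-based `entity_ruler` with hypothetical patterns; with a real pipeline such as `en_core_web_sm`, the same `redact` helper (also hypothetical) would work on its `doc.ents`.

```python
import spacy

# Rule-based stand-in for a trained NER model (patterns are hypothetical)
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Jane Doe"},
    {"label": "GPE", "pattern": "Boston"},
])

def redact(doc, labels):
    """Replace every entity whose label is in `labels` with a [LABEL] placeholder."""
    parts, last = [], 0
    for ent in doc.ents:
        if ent.label_ in labels:
            parts.append(doc.text[last:ent.start_char])  # text before the entity
            parts.append(f"[{ent.label_}]")
            last = ent.end_char
    parts.append(doc.text[last:])  # text after the last redacted entity
    return "".join(parts)

doc = nlp("Jane Doe moved to Boston last year.")
print(redact(doc, {"PERSON"}))         # [PERSON] moved to Boston last year.
print(redact(doc, {"PERSON", "GPE"}))  # [PERSON] moved to [GPE] last year.
```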

---


Overall, NER is a vital component of NLP, enabling better understanding, extraction, and utilization of named entities in various applications, leading to more accurate and valuable insights from textual data.

## Exercise 2: Count Named Entity Labels
Given a list of sentences, write a Python function that uses spaCy's NER model to count the occurrences of each named entity label (e.g., ORG, PERSON, LOC) across all the sentences.

---

The function count_named_entities takes a list of sentences as input, processes each sentence using spaCy's NER model, and counts the occurrences of each named entity label. It returns a Counter object with the named entity labels as keys and their respective counts as values.

```python
import spacy
from collections import Counter

# Load the pipeline once at module level instead of on every call
nlp = spacy.load('en_core_web_sm')

def count_named_entities(sentences):
    """Count how often each entity label (ORG, GPE, PERSON, ...) occurs across all sentences."""
    entity_counts = Counter()

    for sentence in sentences:
        doc = nlp(sentence)
        # Add the label of every entity found in this sentence
        entity_counts.update(ent.label_ for ent in doc.ents)

    return entity_counts

# Test the function
sentences = ["Microsoft is based in Redmond, Washington.",
             "Google's headquarters are in Mountain View, California.",
             "Apple Inc. was founded by Steve Jobs in Cupertino, California."]
entity_counts = count_named_entities(sentences)
print(entity_counts)
```

```
Counter({'GPE': 6, 'ORG': 3, 'PERSON': 1})
```
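When the list of sentences is long, spaCy's `nlp.pipe` streams texts through the pipeline in batches, which is usually faster than calling `nlp()` once per sentence. The sketch below applies that to the same counting task. To run without `en_core_web_sm`, it substitutes a rule-based `entity_ruler` with hypothetical patterns, so its counts come from those patterns rather than from the statistical model; `count_entity_labels` is a hypothetical name chosen to avoid clashing with the exercise's function.

```python
from collections import Counter
import spacy

# Rule-based stand-in for en_core_web_sm (patterns are hypothetical)
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Microsoft"},
    {"label": "ORG", "pattern": "Google"},
    {"label": "GPE", "pattern": "Redmond"},
    {"label": "GPE", "pattern": "Washington"},
    {"label": "GPE", "pattern": "Mountain View"},
    {"label": "GPE", "pattern": "California"},
])

def count_entity_labels(sentences, nlp, batch_size=64):
    """Count entity labels across sentences, processing them in batches with nlp.pipe."""
    counts = Counter()
    for doc in nlp.pipe(sentences, batch_size=batch_size):
        counts.update(ent.label_ for ent in doc.ents)
    return counts

sentences = ["Microsoft is based in Redmond, Washington.",
             "Google's headquarters are in Mountain View, California."]
print(count_entity_labels(sentences, nlp))
```

Passing `nlp` in as a parameter also makes the function easy to test with a lightweight pipeline like this one.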


## Exercise 3: Find the Most Frequent Named Entity
Write a Python function that takes a text document as input and identifies the most frequent named entity in the document.

---


The function find_most_frequent_named_entity uses spaCy to process the input document and identify the most frequent named entity within it. It extracts the named entities from the document, counts their occurrences using a Counter object, and returns the most common entity. If there are multiple entities with the same frequency, the function returns the first one encountered.

```python
import spacy
from collections import Counter

# Load the pipeline once at module level instead of on every call
nlp = spacy.load('en_core_web_sm')

def find_most_frequent_named_entity(document):
    """Return the text of the most frequent named entity, or None if none are found."""
    doc = nlp(document)
    entity_counts = Counter(ent.text for ent in doc.ents)
    # most_common(1) returns [(text, count)], or [] when the document has no entities
    most_common_entity = entity_counts.most_common(1)
    return most_common_entity[0][0] if most_common_entity else None

# Test the function
document = ("Microsoft Corporation is a Washington American multinational technology company "
            "that produces software and other products. "
            "Its headquarters is located in Redmond, Washington.")
most_frequent_entity = find_most_frequent_named_entity(document)
print("Most frequent named entity:", most_frequent_entity)
```

```
Most frequent named entity: Washington
```
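Two details of `Counter.most_common` matter here: when counts tie, the entity encountered first wins, and `most_common(1)` hides any other entity with the same count. The plain-Python sketch below uses a hypothetical list of entity strings to show the tie-breaking and one way to recover every tied entity.

```python
from collections import Counter

# Hypothetical entity strings, as they might be collected from doc.ents
entities = ["Redmond", "Washington", "Microsoft", "Washington", "Redmond"]
counts = Counter(entities)

# Ties are broken by first occurrence: "Redmond" appears before "Washington"
print(counts.most_common(1))  # [('Redmond', 2)]

# Collect every entity that shares the top count
top_count = counts.most_common(1)[0][1]
tied = [entity for entity, count in counts.items() if count == top_count]
print(tied)  # ['Redmond', 'Washington']
```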


## Exercise 4: Train a Custom NER Model

(Refer to the video linked above.)
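A minimal sketch of a training loop with spaCy v3's Python API, assuming spaCy v3 is installed: it starts from a blank English pipeline, adds an `ner` component, wraps a toy dataset in `Example` objects, and runs a few `nlp.update` passes. The two training sentences, their character offsets, and the number of epochs are hypothetical; with this little data the model will not generalize, so its predictions are printed without an expected result. For real projects, spaCy v3 documents config-driven training with the `spacy train` command as the standard workflow rather than a hand-written loop.

```python
import random
import spacy
from spacy.training import Example
from spacy.util import fix_random_seed

fix_random_seed(0)
random.seed(0)

# Toy training data (hypothetical): (text, {"entities": [(start_char, end_char, label)]})
TRAIN_DATA = [
    ("Uber is opening an office in Berlin",
     {"entities": [(0, 4, "ORG"), (29, 35, "GPE")]}),
    ("Jane Doe joined Acme in Paris",
     {"entities": [(0, 8, "PERSON"), (16, 20, "ORG"), (24, 29, "GPE")]}),
]

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")

# Pair each unannotated Doc with its gold-standard entity annotations
examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]

# initialize() infers the label set from the examples and returns an optimizer
optimizer = nlp.initialize(lambda: examples)

for epoch in range(20):
    random.shuffle(examples)
    losses = {}
    nlp.update(examples, drop=0.2, sgd=optimizer, losses=losses)
    print(epoch, losses)

doc = nlp("Acme is opening an office in Paris")
print([(ent.text, ent.label_) for ent in doc.ents])
```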
