
# **Named Entity Recognition**
Named Entity Recognition (NER) is a fundamental task in Natural Language Processing that aims to identify and classify entities such as people, places, dates, and organizations in unstructured text, transforming it into structured data for computational analysis. A key component of this process is manual entity annotation, which, although time-consuming, is essential for training accurate machine learning models. Tools such as NLTK and spaCy support NER with text-processing pipelines and pre-trained models (spaCy offers trained pipelines for many languages). NER has wide-ranging applications, including question answering, information extraction, machine translation, and intelligent assistants, where it improves accuracy, helps preserve meaning, and enables systems to understand and act on user input. Overall, NER plays a central role in NLP by turning raw text into information that downstream systems can use.
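To make "transforming it into structured data" concrete, here is a minimal sketch, assuming the recognizer has already returned `(entity text, label)` pairs. The pairs below are the ones spaCy prints for the Barack Obama sentence in the first example of this notebook; the helper name `entities_to_records` is hypothetical and not part of spaCy or NLTK.

```python
from collections import defaultdict

def entities_to_records(pairs):
  # Hypothetical helper: group (entity_text, label) pairs into a
  # {label: [entity_text, ...]} dict, keeping first-seen order and
  # skipping repeated mentions of the same entity.
  records = defaultdict(list)
  for entity_text, label in pairs:
    if entity_text not in records[label]:
      records[label].append(entity_text)
  return dict(records)

pairs = [('Barack Obama', 'PERSON'), ('Honolulu', 'GPE'),
         ('Hawaii', 'GPE'), ('August 4, 1961', 'DATE')]
print(entities_to_records(pairs))
# {'PERSON': ['Barack Obama'], 'GPE': ['Honolulu', 'Hawaii'], 'DATE': ['August 4, 1961']}
```

A dictionary keyed by label is only one possible target structure; rows for a table or a database are equally common.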

In [None]:
! pip3 install wikipedia



In [None]:
# Importing all the necessary libraries and resources:
import spacy
import re
import nltk
import wikipedia
import os

## **Example: Named Entity Recognition in Texts**
Named Entity Recognition (NER) is an essential task in the field of Natural Language Processing (NLP) that consists of identifying and classifying significant semantic elements in texts, such as names of people, organizations, places, and dates. The importance of NER lies in its ability to assign meaning and structure to raw textual data, facilitating the understanding and automatic analysis of such data.
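Statistical NER models commonly treat "identifying and classifying" as token classification: each token gets a tag such as `B-PERSON` (beginning of a person entity), `I-PERSON` (inside one), or `O` (outside any entity). spaCy, for example, uses the closely related BILUO scheme internally. The sketch below decodes BIO tags back into entity spans. The tags are hand-written for illustration, not real model output, and `bio_to_spans` is a hypothetical name.

```python
def bio_to_spans(tokens, tags):
  # Hypothetical helper: decode BIO tags into (entity_text, label) spans.
  # B-X starts an entity of type X, I-X continues it, O is outside any entity.
  spans = []
  current_tokens, current_label = [], None
  for token, tag in zip(tokens, tags):
    if tag.startswith('B-') or (tag.startswith('I-') and tag[2:] != current_label):
      # Close any open entity and start a new one
      # (a stray I-X without a matching open entity is treated like B-X).
      if current_tokens:
        spans.append((' '.join(current_tokens), current_label))
      current_tokens, current_label = [token], tag[2:]
    elif tag.startswith('I-'):
      current_tokens.append(token)
    else:  # 'O' closes any open entity
      if current_tokens:
        spans.append((' '.join(current_tokens), current_label))
      current_tokens, current_label = [], None
  if current_tokens:
    spans.append((' '.join(current_tokens), current_label))
  return spans

tokens = ['Barack', 'Obama', 'was', 'born', 'in', 'Honolulu', ',', 'Hawaii', '.']
tags = ['B-PERSON', 'I-PERSON', 'O', 'O', 'O', 'B-GPE', 'O', 'B-GPE', 'O']
print(bio_to_spans(tokens, tags))
# [('Barack Obama', 'PERSON'), ('Honolulu', 'GPE'), ('Hawaii', 'GPE')]
```

Joining tokens with spaces is a simplification. Real pipelines keep character offsets so that an entity like "August 4, 1961" is reproduced exactly.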

In [None]:
# Loading spaCy's small English pipeline (if missing, install it with: python -m spacy download en_core_web_sm):
nlp = spacy.load('en_core_web_sm')

# Example text for entity recognition:
text = 'Barack Obama was born in Honolulu, Hawaii, on August 4, 1961.'

# Text processing:
doc = nlp(text)

# Displaying the recognized entities:
for entity in doc.ents:
  print(entity.text, entity.label_)

Barack Obama PERSON
Honolulu GPE
Hawaii GPE
August 4, 1961 DATE


## **Example: Named Entity Annotation**
Named entity annotation is a critical process in Natural Language Processing (NLP), involving the identification and classification of text segments as meaningful entities, such as names of people, places, organizations, and others. This step is fundamental for training machine learning models capable of automatically processing and interpreting large volumes of text.
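Besides inline tags, annotations are often stored as character offsets ("standoff" annotation), which leaves the original text untouched. The `(start, end, label)` layout below resembles the character-offset format commonly used to prepare spaCy training data. This is a minimal sketch that reuses the rule dictionary from the next example; `find_entity_offsets` is a hypothetical name.

```python
import re

def find_entity_offsets(text, rules):
  # Hypothetical helper: return sorted (start, end, label) character spans
  # for every occurrence of every rule term in the text.
  spans = []
  for label, terms in rules.items():
    for term in terms:
      # \b word boundaries stop a term matching inside a longer word
      # (this assumes each term starts and ends with a word character).
      for match in re.finditer(r'\b' + re.escape(term) + r'\b', text):
        spans.append((match.start(), match.end(), label))
  return sorted(spans)

text = 'Cristiano Ronaldo played for Real Madrid.'
rules = {'player': ['Cristiano Ronaldo'], 'club': ['Real Madrid']}
annotations = find_entity_offsets(text, rules)
print(annotations)  # [(0, 17, 'player'), (29, 40, 'club')]
for start, end, label in annotations:
  print(text[start:end], label)
```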

In [None]:
text = 'Cristiano Ronaldo played for Real Madrid.'
rules = {'player': ['Cristiano Ronaldo'], 'club': ['Real Madrid']}

def annotate_text(text, rules):
  # Wrap every occurrence of each rule term in <label>...</label> tags.
  for entity, terms in rules.items():
    for term in terms:
      text = text.replace(term, f'<{entity}>{term}</{entity}>')
  # Return only after every rule has been applied.
  return text

annotated_text = annotate_text(text, rules)
print(annotated_text)

<player>Cristiano Ronaldo</player> played for <club>Real Madrid</club>.
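The `str.replace` approach re-scans the partly tagged text for each term. If one term is contained in another (for example, a hypothetical `'city': ['Madrid']` rule alongside `'Real Madrid'`), the shorter term gets wrapped again inside the existing tag, which produces nested markup. A minimal alternative sketch makes a single regex pass with the longer terms tried first. `annotate_text_single_pass` and the `city` rule are hypothetical additions for illustration.

```python
import re

def annotate_text_single_pass(text, rules):
  # Hypothetical helper: wrap every rule term in <label>...</label> tags
  # in one regex pass, so text that is already tagged is never re-scanned.
  term_to_label = {term: label for label, terms in rules.items() for term in terms}
  # Longer terms come first in the alternation, so 'Real Madrid' wins over 'Madrid'.
  ordered = sorted(term_to_label, key=len, reverse=True)
  pattern = re.compile(r'\b(?:' + '|'.join(re.escape(t) for t in ordered) + r')\b')

  def wrap(match):
    label = term_to_label[match.group(0)]
    return f'<{label}>{match.group(0)}</{label}>'

  return pattern.sub(wrap, text)

rules = {'player': ['Cristiano Ronaldo'], 'club': ['Real Madrid'], 'city': ['Madrid']}
print(annotate_text_single_pass('Cristiano Ronaldo played for Real Madrid in Madrid.', rules))
# <player>Cristiano Ronaldo</player> played for <club>Real Madrid</club> in <city>Madrid</city>.
```

If the same term appears under two labels, the last label in `rules` wins. A real annotation scheme would need an explicit policy for such conflicts.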


## **Example: NLTK and spaCy Libraries for NER**
In the context of Named Entity Recognition (NER) in Natural Language Processing (NLP), the NLTK (Natural Language Toolkit) and spaCy libraries are essential tools. Both ship pre-trained models that identify and classify entities in text. However, these generic models do not cover the specifics of every domain, so specialized applications sometimes require manual annotation and the training of domain-specific models.
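Manual annotations also show whether a generic model is good enough for a domain: compare its predicted spans with hand-labelled "gold" spans. Below is a minimal sketch of exact-match, entity-level precision, recall, and F1. The gold and predicted spans are hypothetical, chosen to mimic a model that splits one name into two entities, and `span_prf` is a hypothetical name.

```python
def span_prf(gold, predicted):
  # Hypothetical helper: exact-match entity-level precision, recall and F1.
  # A predicted span is correct only if start, end and label all match a gold span.
  gold_set, pred_set = set(gold), set(predicted)
  true_positives = len(gold_set & pred_set)
  precision = true_positives / len(pred_set) if pred_set else 0.0
  recall = true_positives / len(gold_set) if gold_set else 0.0
  f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
  return precision, recall, f1

# Hypothetical spans for 'Cristiano Ronaldo played for Real Madrid.'
gold = [(0, 17, 'PERSON'), (29, 40, 'ORG')]
predicted = [(0, 9, 'PERSON'), (10, 17, 'PERSON'), (29, 40, 'ORG')]
precision, recall, f1 = span_prf(gold, predicted)
print(f'precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}')
```

Exact matching is strict: the split name earns no credit at all. Some evaluations also report partial-overlap scores.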

In [None]:
nltk.download('punkt_tab') # Tokenizer models used by nltk.word_tokenize
nltk.download('averaged_perceptron_tagger_eng') # English part-of-speech tagger used by nltk.pos_tag
nltk.download('maxent_ne_chunker_tab') # Named-entity chunker used by nltk.ne_chunk

text = 'Henrikh Mkhitaryan was born in Yerevan.'
tokens = nltk.word_tokenize(text)
tags = nltk.pos_tag(tokens)
entities = nltk.chunk.ne_chunk(tags)

print(entities)

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package maxent_ne_chunker_tab to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package maxent_ne_chunker_tab is already up-to-date!


(S
  (PERSON Henrikh/NNP)
  (PERSON Mkhitaryan/NNP)
  was/VBD
  born/VBN
  in/IN
  (GPE Yerevan/NNP)
  ./.)
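The tree above splits "Henrikh Mkhitaryan" into two separate `PERSON` chunks. In the notebook, `nltk.chunk.tree2conlltags(entities)` flattens the tree into `(word, POS, IOB tag)` triples, where each of those chunks starts with its own `B-PERSON`. A minimal post-processing sketch merges adjacent tokens that share an entity type. This is a heuristic and an assumption on our part, not NLTK behaviour: it will also wrongly merge two genuinely distinct neighbouring entities. `merge_adjacent_entities` is a hypothetical name, and the triples are transcribed from the output above.

```python
def merge_adjacent_entities(conll_tags):
  # Hypothetical helper: collapse consecutive tokens whose IOB tags share an
  # entity type, even when the chunker began a new chunk (B-X) for each token.
  # conll_tags: list of (word, pos, iob_tag) triples, e.g. ('Yerevan', 'NNP', 'B-GPE').
  entities = []
  prev_label = None
  for word, _pos, tag in conll_tags:
    label = tag[2:] if tag != 'O' else None
    if label is not None and label == prev_label:
      text, _ = entities[-1]
      entities[-1] = (f'{text} {word}', label)  # extend the previous entity
    elif label is not None:
      entities.append((word, label))  # start a new entity
    prev_label = label
  return entities

triples = [('Henrikh', 'NNP', 'B-PERSON'), ('Mkhitaryan', 'NNP', 'B-PERSON'),
           ('was', 'VBD', 'O'), ('born', 'VBN', 'O'), ('in', 'IN', 'O'),
           ('Yerevan', 'NNP', 'B-GPE'), ('.', '.', 'O')]
print(merge_adjacent_entities(triples))
# [('Henrikh Mkhitaryan', 'PERSON'), ('Yerevan', 'GPE')]
```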
