<a href="https://colab.research.google.com/github/siddhesh1503/NLP/blob/main/NLP_EXP_9.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

NER using NLTK (English Language)

In [15]:
!pip install -q nltk

import nltk
import os
import shutil

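# Remove any existing 'punkt_tab' directory at the Colab default path
# before downloading the tokenizer data below
punkt_tab_dir = '/root/nltk_data/tokenizers/punkt_tab'
if os.path.exists(punkt_tab_dir):
    shutil.rmtree(punkt_tab_dir)
    print("Removed invalid 'punkt_tab' directory.")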

# Download required NLTK datasets
required_nltk_datasets = [
    'punkt',
    'averaged_perceptron_tagger',
    'maxent_ne_chunker',
    'words'
]

for dataset in required_nltk_datasets:
    nltk.download(dataset)

from nltk import word_tokenize, pos_tag, ne_chunk
from nltk.tree import Tree

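def extract_named_entities(tree):
    """Return (entity_name, entity_type) pairs for each named-entity subtree."""
    entities = []
    for chunk in tree:
        # ne_chunk wraps recognised entities in Tree nodes; other tokens stay as (word, tag) tuples
        if isinstance(chunk, Tree):
            entity_name = " ".join(token for token, _pos in chunk.leaves())
            entity_type = chunk.label()
            entities.append((entity_name, entity_type))
    return entities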

# Main code
if __name__ == "__main__":
    text_en = input("Enter English text: ")

    # Tokenize and tag
    tokens = word_tokenize(text_en)
    pos_tags = pos_tag(tokens)

    # Named Entity Recognition
    ner_tree = ne_chunk(pos_tags)
    entities_nltk = extract_named_entities(ner_tree)

    # Display results
    print("\n🔹 NLTK NER Result:")
    if entities_nltk:
        for entity_name, entity_type in entities_nltk:
            print(f"Entity: {entity_name}, Type: {entity_type}")
    else:
        print("No entities found.")


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Package words is already up-to-date!


Enter English text: Barack Obama was the 44th President of the United States.

🔹 NLTK NER Result:
Entity: Barack, Type: PERSON
Entity: Obama, Type: PERSON
Entity: United States, Type: GPE
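
In the result above, NLTK reports "Barack" and "Obama" as two separate PERSON entities. One possible post-processing step, sketched below, is to merge neighbouring chunks that carry the same label. The `Chunk` class is a hypothetical stand-in for `nltk.tree.Tree` (it exposes only `label()` and `leaves()`), used so the sketch runs without any NLTK data. `merge_adjacent_entities` is likewise an illustrative name, not part of NLTK.

```python
class Chunk:
    """Hypothetical stand-in for nltk.tree.Tree with only label() and leaves()."""

    def __init__(self, label, leaves):
        self._label = label
        self._leaves = leaves

    def label(self):
        return self._label

    def leaves(self):
        return list(self._leaves)


def merge_adjacent_entities(tree):
    """Join consecutive entity chunks that share a label into one entity."""
    merged = []
    previous_was_entity = False
    for chunk in tree:
        if hasattr(chunk, "label"):  # entity subtree; plain tokens are tuples
            name = " ".join(token for token, _pos in chunk.leaves())
            label = chunk.label()
            if previous_was_entity and merged[-1][1] == label:
                # Directly follows an entity of the same type: extend it
                merged[-1] = (merged[-1][0] + " " + name, label)
            else:
                merged.append((name, label))
            previous_was_entity = True
        else:
            previous_was_entity = False
    return merged


# Hand-built chunk sequence mirroring the NLTK result shown above
sentence = [
    Chunk("PERSON", [("Barack", "NNP")]),
    Chunk("PERSON", [("Obama", "NNP")]),
    ("was", "VBD"), ("the", "DT"), ("44th", "CD"),
    ("President", "NNP"), ("of", "IN"), ("the", "DT"),
    Chunk("GPE", [("United", "NNP"), ("States", "NNPS")]),
    (".", "."),
]

merged = merge_adjacent_entities(sentence)
for entity_name, entity_type in merged:
    print(f"Entity: {entity_name}, Type: {entity_type}")
```

Passing the `ner_tree` from the cell above should work the same way, since the function relies only on `label()` and `leaves()`. The heuristic can over-merge when two distinct entities of the same type sit side by side with no token between them.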


NER using spaCy (English Language)

In [8]:
# Install spaCy and download the model
!pip install -q spacy

import spacy

# Download the small English model
!python -m spacy download en_core_web_sm

# Load the model
nlp = spacy.load("en_core_web_sm")

# Run the pipeline and collect (text, label) pairs for each entity
def perform_ner(text):
    doc = nlp(text)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return entities

if __name__ == "__main__":
    text_en = input("Enter English text: ")
    entities = perform_ner(text_en)

    print("\n🔹 spaCy NER Result:")
    if entities:
        for entity_text, entity_label in entities:
            print(f"Entity: {entity_text}, Type: {entity_label}")
    else:
        print("No entities found.")


Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.
Enter English text: Barack Obama was the 44th President of the United States.

🔹 spaCy NER Result:
Entity: Barack Obama, Type: PERSON
Entity: 44th, Type: ORDINAL
Entity: the United States, Type: GPE
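
When a text yields many entities, it can help to group them by type before printing. The following is a minimal sketch that works on the `(text, label)` pairs returned by `perform_ner`. The `group_by_label` helper is a hypothetical name introduced here, and the input list simply repeats the spaCy result shown above.

```python
from collections import defaultdict


def group_by_label(entities):
    """Map each entity label to the list of entity texts carrying it."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


# The (text, label) pairs from the spaCy result above
spacy_entities = [
    ("Barack Obama", "PERSON"),
    ("44th", "ORDINAL"),
    ("the United States", "GPE"),
]

grouped = group_by_label(spacy_entities)
for label, texts in grouped.items():
    print(f"{label}: {', '.join(texts)}")
```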


NER using Stanza (English Language)

In [12]:
# Install stanza
!pip install -q stanza

import stanza

# Download the English model if not already present
stanza.download('en')

# Initialize the pipeline with NER processor
nlp = stanza.Pipeline(lang='en', processors='tokenize,pos,ner')

def perform_ner(text):
    doc = nlp(text)
    entities = []
    for sentence in doc.sentences:
        for ent in sentence.ents:
            entities.append((ent.text, ent.type))
    return entities

if __name__ == "__main__":
    text_en = input("Enter English text: ")
    entities = perform_ner(text_en)

    print("\n🔹 Stanza NER Result:")
    if entities:
        for entity_text, entity_label in entities:
            print(f"Entity: {entity_text}, Type: {entity_label}")
    else:
        print("No entities found.")


Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.10.0.json:   0%|  …

INFO:stanza:Downloaded file to /root/stanza_resources/resources.json
INFO:stanza:Downloading default packages for language: en (English) ...
INFO:stanza:File exists: /root/stanza_resources/en/default.zip
INFO:stanza:Finished downloading models and saved to /root/stanza_resources
INFO:stanza:Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES


INFO:stanza:Downloaded file to /root/stanza_resources/resources.json
INFO:stanza:Loading these models for language: en (English):
| Processor | Package                   |
-----------------------------------------
| tokenize  | combined                  |
| mwt       | combined                  |
| pos       | combined_charlm           |
| ner       | ontonotes-ww-multi_charlm |

INFO:stanza:Using device: cpu
INFO:stanza:Loading: tokenize
INFO:stanza:Loading: mwt
INFO:stanza:Loading: pos
INFO:stanza:Loading: ner
INFO:stanza:Done loading processors!


Enter English text: Barack Obama was the 44th President of the United States.

🔹 Stanza NER Result:
Entity: Barack Obama, Type: PERSON
Entity: 44th, Type: ORDINAL
Entity: the United States, Type: GPE
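
The three English results differ in small ways: NLTK splits the name and omits the ordinal, while spaCy and Stanza include the article in "the United States". A rough way to compare them is sketched below. It assumes that lowercasing and dropping a leading "the" is an acceptable normalization. `normalize` and `compare` are hypothetical helpers, and the input lists are copied from the outputs above.

```python
def normalize(entity):
    """Lowercase the entity text and drop a leading article."""
    text, label = entity
    text = text.lower()
    if text.startswith("the "):
        text = text[len("the "):]
    return (text, label)


def compare(results):
    """Return entities shared by all libraries and those left over per library."""
    normalized = {
        name: {normalize(entity) for entity in entities}
        for name, entities in results.items()
    }
    shared = set.intersection(*normalized.values())
    unique = {name: entities - shared for name, entities in normalized.items()}
    return shared, unique


# Results copied from the three English cells above
results = {
    "NLTK": [("Barack", "PERSON"), ("Obama", "PERSON"), ("United States", "GPE")],
    "spaCy": [("Barack Obama", "PERSON"), ("44th", "ORDINAL"), ("the United States", "GPE")],
    "Stanza": [("Barack Obama", "PERSON"), ("44th", "ORDINAL"), ("the United States", "GPE")],
}

shared, unique = compare(results)
print("Shared by all:", sorted(shared))
for name, entities in unique.items():
    print(f"Not shared, {name}:", sorted(entities))
```

Exact matching after normalization is deliberately strict. Partial overlaps such as "Barack" versus "Barack Obama" count as disagreements here.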


NER using Stanza (Regional Languages)

In [13]:
# Install stanza
!pip install -q stanza

import stanza

# Download models for Hindi and Marathi
stanza.download('hi')
stanza.download('mr')

# Initialize pipelines for Hindi and Marathi with NER
nlp_hi = stanza.Pipeline(lang='hi', processors='tokenize,pos,ner')
nlp_mr = stanza.Pipeline(lang='mr', processors='tokenize,pos,ner')

def perform_ner(doc):
    """Collect (text, type) pairs for every entity in a processed Stanza document."""
    entities = []
    for sentence in doc.sentences:
        for ent in sentence.ents:
            entities.append((ent.text, ent.type))
    return entities

if __name__ == "__main__":
    print("Choose language:")
    print("1. Hindi")
    print("2. Marathi")
    choice = input("Enter 1 or 2: ")

    if choice == '1':
        lang_name = "Hindi"
        text = input("Enter text in Hindi: ")
        doc = nlp_hi(text)
    elif choice == '2':
        lang_name = "Marathi"
        text = input("Enter text in Marathi: ")
        doc = nlp_mr(text)
    else:
        # Stop the cell without shutting down the notebook kernel
        raise SystemExit("Invalid choice!")

    entities = perform_ner(doc)

    print(f"\n🔹 Stanza NER Result ({lang_name}):")
    if entities:
        for entity_text, entity_label in entities:
            print(f"Entity: {entity_text}, Type: {entity_label}")
    else:
        print("No entities found.")


Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.10.0.json:   0%|  …

INFO:stanza:Downloaded file to /root/stanza_resources/resources.json
INFO:stanza:Downloading default packages for language: hi (Hindi) ...


Downloading https://huggingface.co/stanfordnlp/stanza-hi/resolve/v1.10.0/models/default.zip:   0%|          | …

INFO:stanza:Downloaded file to /root/stanza_resources/hi/default.zip
INFO:stanza:Finished downloading models and saved to /root/stanza_resources


INFO:stanza:Downloaded file to /root/stanza_resources/resources.json
INFO:stanza:Downloading default packages for language: mr (Marathi) ...


Downloading https://huggingface.co/stanfordnlp/stanza-mr/resolve/v1.10.0/models/default.zip:   0%|          | …

INFO:stanza:Downloaded file to /root/stanza_resources/mr/default.zip
INFO:stanza:Finished downloading models and saved to /root/stanza_resources
INFO:stanza:Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES


INFO:stanza:Downloaded file to /root/stanza_resources/resources.json
INFO:stanza:Loading these models for language: hi (Hindi):
| Processor | Package      |
----------------------------
| tokenize  | hdtb         |
| pos       | hdtb_charlm  |
| ner       | ilner_charlm |

INFO:stanza:Using device: cpu
INFO:stanza:Loading: tokenize
INFO:stanza:Loading: pos
INFO:stanza:Loading: ner
INFO:stanza:Done loading processors!
INFO:stanza:Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES


INFO:stanza:Downloaded file to /root/stanza_resources/resources.json
INFO:stanza:Loading these models for language: mr (Marathi):
| Processor | Package     |
---------------------------
| tokenize  | ufal        |
| mwt       | ufal        |
| pos       | ufal_charlm |
| ner       | l3cube      |

INFO:stanza:Using device: cpu
INFO:stanza:Loading: tokenize
INFO:stanza:Loading: mwt
INFO:stanza:Loading: pos
INFO:stanza:Loading: ner
INFO:stanza:Done loading processors!


Choose language:
1. Hindi
2. Marathi
Enter 1 or 2: 1
Enter text in Hindi: नरेंद्र मोदी भारत के प्रधानमंत्री हैं।

🔹 Stanza NER Result (Hindi):
Entity: नरेंद्र मोदी, Type: NEP
Entity: भारत, Type: NEL
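
The Hindi model reports the tags `NEP` and `NEL` rather than the `PERSON` and `GPE` labels seen in the English cells. The sketch below relabels them for display. The mapping is an assumption inferred from the output above (a person's name tagged `NEP`, a country tagged `NEL`), not taken from the Stanza documentation. `READABLE_LABELS` and `relabel` are hypothetical names, and any tag missing from the mapping is passed through unchanged.

```python
# Assumed meanings, inferred from the Hindi output above
READABLE_LABELS = {
    "NEP": "PERSON",
    "NEL": "LOCATION",
}


def relabel(entities, mapping=READABLE_LABELS):
    """Replace known tags with readable names; keep unknown tags as they are."""
    return [(text, mapping.get(label, label)) for text, label in entities]


# The (text, type) pairs from the Hindi result above
hindi_entities = [("नरेंद्र मोदी", "NEP"), ("भारत", "NEL")]

readable = relabel(hindi_entities)
for entity_text, entity_label in readable:
    print(f"Entity: {entity_text}, Type: {entity_label}")
```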
