# Named Entity Recognition (NER)
Named Entity Recognition (NER) is a natural language processing (NLP) task that identifies and categorizes named entities within a text. Named entities are real-world objects referred to by a proper name, such as people, organizations, and locations. Most NER systems also tag related expressions such as dates, times, quantities, and monetary values. An NER system detects these mentions automatically and assigns each one a category, turning unstructured text into structured information.

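A typical NER pipeline involves the following steps:

1. Tokenization: The text is split into individual words or tokens.
2. Part-of-Speech (POS) Tagging: Each token is assigned a part-of-speech tag (e.g., noun, verb). Classical feature-based systems use these tags as input features. Many neural models learn similar cues directly and do not need a separate tagging step.
3. Named Entity Classification: Using contextual clues and patterns, the system finds spans of one or more tokens that form an entity (for example, "Cape Canaveral") and assigns each span a category such as person, organization, location, or date.
4. Output: The system returns the identified entities with their categories, often together with their positions in the text.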
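The following sketch makes the steps above concrete. It is a deliberately simplified, rule-based illustration in plain Python and is not how spaCy works, because spaCy uses a trained statistical model. The `GAZETTEER` lookup table, the helpers `tokenize`, `tag_bio`, and `bio_to_spans`, and the BIO (Begin/Inside/Outside) tagging scheme are all illustrative choices. The POS-tagging step is omitted for brevity. The code tokenizes a sentence, labels each token, merges labeled tokens into multi-token entity spans, and prints them in the same `text: LABEL` format that the notebook uses below.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lookup table mapping known token sequences to entity labels.
# A real NER system learns these decisions from annotated data instead.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Elon", "Musk"): "PERSON",
    ("SpaceX",): "ORG",
    ("NASA",): "ORG",
    ("Cape", "Canaveral"): "GPE",
    ("Florida",): "GPE",
}


def tokenize(text: str) -> List[str]:
    """Step 1: split text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag_bio(tokens: List[str]) -> List[str]:
    """Classification step: give each token a B-/I-/O tag.

    At each position the longest gazetteer match wins.
    """
    tags = ["O"] * len(tokens)
    max_len = max(len(key) for key in GAZETTEER)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(tokens[i:i + n]))
            if label:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                break
        else:
            i += 1  # no match starting here; move to the next token
    return tags


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Output step: merge B-/I- runs into (entity text, label) pairs.

    Entity text is re-joined with single spaces.
    """
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    label = ""
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)
        else:  # "O" closes any open entity
            if current:
                spans.append((" ".join(current), label))
            current, label = [], ""
    if current:
        spans.append((" ".join(current), label))
    return spans


text = "Elon Musk said SpaceX will launch from Cape Canaveral, Florida."
tokens = tokenize(text)
for entity, label in bio_to_spans(tokens, tag_bio(tokens)):
    print(f"{entity}: {label}")
```

Run on the sample sentence, this prints four entities: `Elon Musk: PERSON`, `SpaceX: ORG`, `Cape Canaveral: GPE`, and `Florida: GPE`. A dictionary lookup like this cannot handle unseen names or ambiguous words, which is why practical systems replace `tag_bio` with a learned model.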

```
!pip install spacy
```



```
!python -m spacy download en_core_web_sm
```

```
Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
     ---------------------------------------- 12.8/12.8 MB 1.0 MB/s eta 0:00:00
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
```


```python
import sys

import spacy

# Load the small English pipeline, which includes a pretrained NER component
nlp = spacy.load('en_core_web_sm')

# Input and output file paths
input_file_path = 'input_text.txt'
output_file_path = 'output_entities.txt'

# Read input text from file; stop with an error message if it is missing
try:
    with open(input_file_path, 'r', encoding='utf-8') as file:
        input_text = file.read()
except FileNotFoundError:
    sys.exit(f"Error: Input file '{input_file_path}' not found.")

# Run the spaCy pipeline; recognized entities are available in doc.ents
doc = nlp(input_text)

# Write one "entity text: label" line per entity to the output file
with open(output_file_path, 'w', encoding='utf-8') as output_file:
    for ent in doc.ents:
        output_file.write(f'{ent.text}: {ent.label_}\n')

print(f"Named entities extracted from '{input_file_path}' and saved to '{output_file_path}'.")

# Display NER output
for ent in doc.ents:
    print(f'{ent.text}: {ent.label_}')
```

```
Named entities extracted from 'input_text.txt' and saved to 'output_entities.txt'.
AI: ORG
recent years: DATE
AI: ORG
One: CARDINAL
AI: ORG
AI: ORG
AI: ORG
AI: ORG
AI: ORG
AI: ORG
AI: ORG
```
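The output above lists every mention separately, so `AI: ORG` appears many times. Labeling "AI" as an organization also looks like a misclassification by the small model. The following sketch summarizes such results. It assumes the one-entity-per-line `text: LABEL` format written to `output_entities.txt` above. `summarize_entities` is a hypothetical helper name. The sample lines are copied from the printed output above, so the sketch runs without the file.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_entities(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count (entity, label) pairs and labels from 'text: LABEL' lines."""
    pair_counts: Counter = Counter()
    label_counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the last ": " so entity text containing ": " stays intact
        text, label = line.rsplit(": ", 1)
        pair_counts[(text, label)] += 1
        label_counts[label] += 1
    return pair_counts, label_counts


# Lines copied from the notebook output above
sample = """AI: ORG
recent years: DATE
AI: ORG
One: CARDINAL
AI: ORG
AI: ORG
AI: ORG
AI: ORG
AI: ORG
AI: ORG
AI: ORG""".splitlines()

pairs, labels = summarize_entities(sample)
print(pairs.most_common(3))
print(labels)
```

To summarize the saved file instead, pass it the open file object, for example `with open('output_entities.txt', encoding='utf-8') as f: pairs, labels = summarize_entities(f)`.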


```python
import sys

import spacy

# Load the small English pipeline (same model as above)
nlp = spacy.load('en_core_web_sm')

# Input and output file paths for the second example
input_file_path = 'input_text1.txt'
output_file_path = 'output_entities1.txt'

# Read input text from file; stop with an error message if it is missing
try:
    with open(input_file_path, 'r', encoding='utf-8') as file:
        input_text = file.read()
except FileNotFoundError:
    sys.exit(f"Error: Input file '{input_file_path}' not found.")

# Run the spaCy pipeline; recognized entities are available in doc.ents
doc = nlp(input_text)

# Write one "entity text: label" line per entity to the output file
with open(output_file_path, 'w', encoding='utf-8') as output_file:
    for ent in doc.ents:
        output_file.write(f'{ent.text}: {ent.label_}\n')

print(f"Named entities extracted from '{input_file_path}' and saved to '{output_file_path}'.")

# Display NER output
for ent in doc.ents:
    print(f'{ent.text}: {ent.label_}')
```

```
Named entities extracted from 'input_text1.txt' and saved to 'output_entities1.txt'.
September 10th, 2023: DATE
Falcon 9: LAW
Cape Canaveral: GPE
Florida: GPE
Mars 2023: DATE
the Red Planet Rover to Mars: LOC
NASA: ORG
Martian: NORP
Elon Musk: PERSON
SpaceX: ORG
Hawthorne: GPE
California: GPE
Mars: LOC
Falcon 9: LAW
the International Space Station: ORG
```
