# Named Entity Recognition (NER)

Named Entity Recognition (NER) is a fundamental Natural Language Processing (NLP) task that involves identifying and classifying named entities (e.g., names of people, organizations, locations, and dates) in text data. NER underpins many NLP applications, from search to information extraction.


The main **difference between NER and "Ctrl+F"** is that NER is an automated, model-driven technique that recognizes and labels entities in text, whereas "Ctrl+F" is a manual search that finds only the exact string the user types. NER is context-aware and can surface entities the user never asked for; "Ctrl+F" is a literal text match for user-initiated queries.
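To make the contrast concrete, here is a minimal sketch of what a literal, Ctrl+F-style search does. The helper name `find_all` and the sample sentence are illustrative assumptions, not part of any library.

```python
def find_all(text, query):
    """Return the start offset of every literal occurrence of query in text."""
    positions = []
    start = text.find(query)
    while start != -1:
        positions.append(start)
        start = text.find(query, start + 1)
    return positions

sample = "Apple released a new phone. I ate an apple. Apple Inc. is in Cupertino."

# A literal search only finds the exact string it is given.
hits = find_all(sample, "Apple")
print(hits)

# It returns positions, not types: nothing says these hits are organizations,
# and "Cupertino" is never found because nobody searched for it.
for position in hits:
    print(position, repr(sample[position:position + len("Apple")]))
```

The search is case-sensitive and blind to meaning: it skips the lowercase "apple", and it cannot tell a company from a fruit. An NER model would instead scan the whole text and return typed spans without being told what to look for.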

NER is a subtask of information extraction that focuses on identifying and categorizing entities within unstructured text. The primary goal is to locate and classify words or phrases that represent specific types of entities. Common entity types include:

**Person**: Names of individuals (e.g., "John Smith").

**Organization**: Names of companies, institutions, or groups (e.g., "Google Inc.").

**Location**: Names of places (e.g., "New York City").

**Date**: Temporal expressions, including dates (e.g., "January 1, 2022").

**Time**: Expressions representing time (e.g., "3:00 PM").

**Money**: Monetary values (e.g., "$100 million").

**Percentage**: Percentage values (e.g., "20%").

**Miscellaneous**: Custom entity types or other categories.
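Some of these types, such as Money, Percentage, and Time, follow fairly regular surface patterns. The sketch below uses hand-written regular expressions rather than a trained model, purely to show what a typed entity span looks like. The patterns, the `PATTERNS` table, and the `tag_patterned_entities` helper are illustrative assumptions and will miss many real-world formats.

```python
import re

# Hypothetical patterns for three of the entity types listed above.
PATTERNS = {
    "MONEY": re.compile(r"\$\d[\d,]*(?:\.\d+)?(?:\s(?:million|billion))?"),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?%"),
    "TIME": re.compile(r"\b\d{1,2}:\d{2}\s?(?:AM|PM)\b"),
}

def tag_patterned_entities(text):
    """Return (start, end, label, value) tuples sorted by position."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label, match.group()))
    return sorted(spans)

sample = "At 3:00 PM the firm reported $100 million in sales, up 20%."
for start, end, label, value in tag_patterned_entities(sample):
    print(f"{label}: {value!r} at [{start}, {end})")
```

Types like Person, Organization, and Location have no such fixed shape, which is why they are usually left to statistical or neural models.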


Deep learning methods, particularly neural networks, have shown exceptional performance in NER tasks. Here's an overview of how deep learning is applied to NER:

**Bidirectional LSTM (BiLSTM)**: BiLSTM networks process the input sequence in both forward and backward directions, so each token's representation reflects both its left and right context. This architecture is often used for sequence labeling tasks like NER.

**Transformer-Based Models**: Models like BERT, GPT, and their variants have achieved state-of-the-art results in NER tasks. They use attention mechanisms to capture long-range dependencies in text.

**Conditional Random Fields (CRF)**: A CRF layer is often used on top of a neural network to model the dependencies between neighboring entity labels. It helps ensure that the predicted labels form coherent named entities.
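One way to see why label dependencies matter: under the common BIO tagging scheme (an assumption here, since the text does not name a scheme), an `I-PER` tag should never directly follow `O` or `B-ORG`. The sketch below runs Viterbi decoding over hypothetical per-token scores with hand-set transition rules that forbid such sequences. In a real CRF layer the transition scores are learned during training, not hard-coded; the labels, scores, and function names are all illustrative.

```python
import math

LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]

def transition_score(prev, curr):
    """0.0 for allowed BIO transitions, -inf for incoherent ones."""
    if curr.startswith("I-"):
        entity_type = curr[2:]
        if prev not in (f"B-{entity_type}", f"I-{entity_type}"):
            return -math.inf
    return 0.0

def viterbi(emissions):
    """emissions: one {label: score} dict per token. Returns the best label path."""
    # An I- tag cannot open a sequence, so treat the start as following "O".
    best = {lab: emissions[0][lab] + transition_score("O", lab) for lab in LABELS}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for curr in LABELS:
            prev = max(LABELS, key=lambda p: best[p] + transition_score(p, curr))
            new_best[curr] = best[prev] + transition_score(prev, curr) + scores[curr]
            pointers[curr] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda lab: best[lab])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical scores for the tokens "Steve", "Jobs", "founded", "Apple".
zero = dict.fromkeys(LABELS, 0.0)
emissions = [
    {**zero, "B-PER": 3.0},
    {**zero, "I-PER": 1.8, "I-ORG": 2.0},  # the per-token winner is the wrong type
    {**zero, "O": 2.5},
    {**zero, "B-ORG": 3.0},
]

greedy = [max(scores, key=scores.get) for scores in emissions]
print("greedy :", greedy)
print("viterbi:", viterbi(emissions))
```

Picking the best label per token independently yields `B-PER, I-ORG, O, B-ORG`, which is not a valid entity sequence. Decoding with the transition constraints yields `B-PER, I-PER, O, B-ORG` instead, trading a slightly lower score on one token for a coherent whole.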

NER has wide-ranging applications across industries, including:

- Information Retrieval: NER helps search engines identify and highlight relevant named entities in search results.

- Question Answering: In QA systems, NER identifies entities in both questions and documents, improving the accuracy of answers.

- Document Summarization: NER is used to extract key entities from documents, aiding in document summarization.

- Social Media Monitoring: NER helps track mentions of brands, products, and individuals on social media platforms.

- Named Entity Linking: Linking recognized entities to knowledge bases like Wikipedia enriches the understanding of text.

- Financial Analysis: In finance, NER can extract company names, ticker symbols, and financial figures from news articles to inform investment decisions.

- Medical Records Analysis: NER is used to identify medical entities (e.g., diseases, medications) in electronic health records for healthcare analytics.

- Geospatial Analysis: In geospatial applications, NER extracts location names and coordinates from text for mapping and geolocation purposes.

- Legal Document Analysis: Legal professionals use NER to extract and categorize names, dates, and legal terms from legal documents.

- Content Tagging: Content creators use NER to automatically tag articles, blog posts, or products with relevant keywords.
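Several of these applications, such as content tagging and social media monitoring, come down to aggregating recognized entities across many documents. The following is a minimal sketch under the assumption that entities have already been extracted as `(text, label)` pairs by some NER system; the sample data and the `top_tags` helper are hypothetical.

```python
from collections import Counter

def top_tags(docs_entities, labels=("ORG", "PERSON"), n=3):
    """Count entity mentions across documents and return the n most frequent."""
    counts = Counter(
        (text, label)
        for entities in docs_entities
        for text, label in entities
        if label in labels
    )
    return counts.most_common(n)

# Hypothetical output of an NER system for three short documents.
docs_entities = [
    [("Apple Inc.", "ORG"), ("Steve Jobs", "PERSON"), ("Cupertino", "GPE")],
    [("Apple Inc.", "ORG"), ("Tim Cook", "PERSON")],
    [("Steve Jobs", "PERSON"), ("Apple Inc.", "ORG")],
]

for (text, label), count in top_tags(docs_entities, n=2):
    print(f"{text} ({label}): {count} mentions")
```

Filtering by label is what separates this from keyword counting: the same pipeline can track only organizations for brand monitoring, or only locations for geospatial use.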

NER is a versatile and essential component of NLP. By turning unstructured text into typed, machine-readable entities, it lets downstream systems extract valuable information across a wide range of real-world scenarios.


Creating a Named Entity Recognition (NER) tool involves using pre-trained models from libraries like spaCy or Hugging Face Transformers to identify and label named entities in text. Below is a Python code example that uses the spaCy library to perform NER. In this code:

- We load the pre-trained English model provided by spaCy, `en_core_web_sm`, with `nlp = spacy.load("en_core_web_sm")`.

- The extract_named_entities function processes the input text and extracts named entities, including their text, start and end positions, and labels.

- We provide an example text and call the extract_named_entities function to extract and print the named entities and their labels.

You can **replace the example text with your own** to extract named entities from any input.

In [4]:
!pip install spacy
!python -m spacy download en_core_web_sm


Collecting en-core-web-sm==3.6.0
  Downloading https:

In [5]:
import spacy

# Load the pre-trained NER model
nlp = spacy.load("en_core_web_sm")

def extract_named_entities(text):
    # Process the text with the NER model
    doc = nlp(text)

    # Extract named entities and their labels
    named_entities = []
    for ent in doc.ents:
        named_entities.append({
            "text": ent.text,
            "start": ent.start_char,
            "end": ent.end_char,
            "label": ent.label_
        })

    return named_entities

# Example text
text = "Apple Inc. was founded by Steve Jobs and Steve Wozniak in Cupertino, California."

# Extract named entities from the example text
named_entities = extract_named_entities(text)

# Print the extracted named entities
for entity in named_entities:
    print(f"Text: {entity['text']}, Label: {entity['label']}, Start: {entity['start']}, End: {entity['end']}")


Text: Apple Inc., Label: ORG, Start: 0, End: 10
Text: Steve Jobs, Label: PERSON, Start: 26, End: 36
Text: Steve Wozniak, Label: PERSON, Start: 41, End: 54
Text: Cupertino, Label: GPE, Start: 58, End: 67
Text: California, Label: GPE, Start: 69, End: 79
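The Start and End values in this output are character offsets into the input string, with End exclusive, so slicing the text with them recovers each entity. The sketch below checks this using the printed values copied by hand (no model is loaded), then uses the offsets to mark up the sentence. The `annotate` helper and its bracket format are illustrative assumptions.

```python
text = "Apple Inc. was founded by Steve Jobs and Steve Wozniak in Cupertino, California."

# (text, label, start, end) copied from the printed output above.
printed = [
    ("Apple Inc.", "ORG", 0, 10),
    ("Steve Jobs", "PERSON", 26, 36),
    ("Steve Wozniak", "PERSON", 41, 54),
    ("Cupertino", "GPE", 58, 67),
    ("California", "GPE", 69, 79),
]

# Each offset pair slices out exactly the entity text.
for entity_text, label, start, end in printed:
    assert text[start:end] == entity_text

def annotate(text, entities):
    """Wrap each entity span as [span|LABEL], working right to left
    so that earlier offsets stay valid while the string grows."""
    out = text
    for _, label, start, end in sorted(entities, key=lambda e: e[2], reverse=True):
        out = out[:start] + f"[{out[start:end]}|{label}]" + out[end:]
    return out

print(annotate(text, printed))
```

Storing offsets rather than only the entity strings is what makes this kind of highlighting, or linking back to the source document, possible.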
