Named Entity Recognition (NER) identifies words or phrases in text that refer to entities such as people, places, organizations, dates, and monetary values.

In [1]:
import spacy

In [2]:
nlp = spacy.load("en_core_web_sm")
# Load spaCy's small English pipeline for processing English text

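The pipeline includes components trained on English text, allowing it to perform tasks like part-of-speech tagging, dependency parsing, and named entity recognition. Note that the small model (en_core_web_sm) does not ship static word vectors; those come with the medium and large models (en_core_web_md, en_core_web_lg).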

In [6]:
text = "I was sick for few days in Mumbai. yesterday I ate  an orange, a tomato and a cucumber which I bought for 100 rupees."

In [7]:
doc = nlp(text)   # process the text

The result, doc, is a spaCy document object, containing tokens and annotations such as entities, parts of speech, and dependencies.
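To make the idea of per-token annotations concrete, here is a minimal sketch that lists each token with its part-of-speech and dependency labels. The helper name `token_table` is hypothetical, and the blank pipeline is an assumption made only so the snippet runs without a model download.

```python
import spacy

def token_table(doc):
    """Return (text, part-of-speech, dependency label) for each token in a Doc."""
    return [(token.text, token.pos_, token.dep_) for token in doc]

# Assumption: a blank English pipeline (tokenizer only) keeps this runnable
# without a model download, so pos_ and dep_ come back as empty strings here.
# Passing the notebook's `doc` from nlp(text) instead fills them in.
blank_nlp = spacy.blank("en")
sample = blank_nlp("I bought a tomato for 100 rupees.")
rows = token_table(sample)
for row in rows:
    print(row)
```

Calling `token_table(doc)` on the document created above should show populated tags, since en_core_web_sm includes a tagger and a parser.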

In [8]:
for ent in doc.ents:
    print(ent.text, ent.label_)

few days DATE
Mumbai GPE
yesterday DATE
100 CARDINAL


For each ent (entity) in doc.ents, ent.text gives the entity text, and ent.label_ provides the entity type label (e.g., PERSON for people, ORG for organizations, GPE for geopolitical entities such as countries, cities, and states, MONEY for monetary values).
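Once the (text, label) pairs are available, a common next step is to group them by label. The sketch below does this in plain Python; `group_by_label` is a hypothetical helper, and the pairs are copied from the output above rather than computed.

```python
from collections import defaultdict

def group_by_label(pairs):
    """Group (entity_text, label) pairs into {label: [texts, ...]}."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)

# Pairs copied from the output above; with a live Doc you could pass
# ((ent.text, ent.label_) for ent in doc.ents) instead.
pairs = [
    ("few days", "DATE"),
    ("Mumbai", "GPE"),
    ("yesterday", "DATE"),
    ("100", "CARDINAL"),
]
grouped = group_by_label(pairs)
print(grouped)
```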


- Mumbai - GPE (Geopolitical Entity):

- The model identifies "Mumbai" as a GPE, which represents a geopolitical entity such as a city, state, or country. Here, it's correctly recognized as a city in India.

- yesterday - DATE:

- The word "yesterday" is classified as DATE because it refers to a point in time. spaCy recognizes relative expressions such as "yesterday" and "tomorrow" as well as specific dates like "March 1st." The same label is applied to "few days," which denotes a span of time.

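- 100 - CARDINAL:

- In the output above, the model labels only "100" as CARDINAL (a plain number) and does not attach "rupees" to it. Ideally the phrase "100 rupees" would be tagged MONEY, a monetary amount in which "rupees" names the currency (Indian rupee, INR), but the small model does not produce that label for this sentence.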
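The output above labels "100" as CARDINAL rather than tagging "100 rupees" as MONEY. One possible way to handle such cases is a rule-based pattern. The sketch below assumes spaCy v3 (where `add_pipe` takes a component name) and uses a blank pipeline so it runs without a model download; the pattern itself is illustrative, not part of the original notebook.

```python
import spacy

# Assumption: blank English pipeline (tokenizer only, no trained NER),
# so the only entities come from the rule defined below.
nlp_rules = spacy.blank("en")
ruler = nlp_rules.add_pipe("entity_ruler")

# A number-like token followed by "rupee" or "rupees" is labeled MONEY.
ruler.add_patterns([
    {
        "label": "MONEY",
        "pattern": [{"LIKE_NUM": True}, {"LOWER": {"IN": ["rupee", "rupees"]}}],
    }
])

doc_rules = nlp_rules("I bought a cucumber for 100 rupees.")
money = [(ent.text, ent.label_) for ent in doc_rules.ents]
print(money)
```

With the trained pipeline, the same ruler could be added with `nlp.add_pipe("entity_ruler", before="ner")` so that the rule is applied before the statistical NER component runs.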

- Named Entity Recognition (NER) with spaCy is commonly used for extracting structured information from unstructured text, making it a powerful tool in various NLP applications.
- NER identifies specific types of information within a text, such as names, locations, dates, monetary amounts, and more.

- spaCy is designed for speed and efficiency, which makes it suitable for real-time applications and for handling large volumes of text.

- Typical applications: information extraction, text classification, improving search relevance, and structuring data for machine learning.


- Other libraries that can be used instead of spaCy:
- NLTK
- TextBlob
- Hugging Face Transformers
- Flair
- Gensim