### Named Entity Recognition (NER) with spaCy

Named Entity Recognition (NER) is the task of identifying named entities in text and classifying them into categories such as persons, locations, organizations, and dates. It is a key component of natural language processing (NLP) because recognizing these specific, meaningful entities helps machines understand the context and structure of the text.

Common entity types include:

- Person: Names of individuals (e.g., "Barack Obama").
- Location: Geographic locations (e.g., "Paris").
- Organization: Names of organizations (e.g., "Microsoft").
- Date: Specific dates (e.g., "January 1, 2025").
- Miscellaneous: Other entities, such as events or products.

In short, NER focuses on recognizing and extracting named entities from text, making it easier to perform further analysis, like categorization, information retrieval, or relationship extraction.

#### Named Entity Recognition Examples:

- Person: "Barack Obama"
- Location: "Paris"
- Organization: "Microsoft"
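The informal categories above correspond roughly to the label strings that spaCy's English pipelines attach to entities (for example `PERSON`, `GPE`, `ORG`). The lookup below is a hand-written sketch for orientation only: `CATEGORY_TO_SPACY_LABELS` and `category_for` are hypothetical names, not part of spaCy, and the grouping of labels under each category is an assumption.

```python
# Hypothetical lookup table (not part of spaCy): maps the informal categories
# used in this section to label strings emitted by spaCy's English pipelines.
CATEGORY_TO_SPACY_LABELS = {
    "Person": ["PERSON"],
    "Location": ["GPE", "LOC", "FAC"],   # countries/cities, other locations, facilities
    "Organization": ["ORG"],
    "Date": ["DATE"],
    "Miscellaneous": ["EVENT", "PRODUCT"],
}


def category_for(label):
    """Return the informal category for a spaCy label, or 'Other' if unmapped."""
    for category, labels in CATEGORY_TO_SPACY_LABELS.items():
        if label in labels:
            return category
    return "Other"


print(category_for("GPE"))    # Location
print(category_for("MONEY"))  # Other
```

Labels outside this table (such as `MONEY` or `CARDINAL`) fall through to `"Other"`, which keeps the sketch usable even when the pipeline emits labels the section does not discuss.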

#### Named Entity Recognition Flowchart:

![Named entity classification flowchart](../images/5/5-named-entity-classification.png)


---


```python
import pandas as pd
import spacy

# Load spaCy's small English pipeline
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

content = "John works at Microsoft and lives in New York. He visited the National History Museum."
doc = nlp(content)

# Collect each entity's text, label, and lemma into a DataFrame
entities = [(ent.text, ent.label_, ent.lemma_) for ent in doc.ents]
df = pd.DataFrame(entities, columns=["text", "type", "lemma"])
print(df)
```

```text
                          text    type                        lemma
0                         John  PERSON                         John
1                    Microsoft     ORG                    Microsoft
2                     New York     GPE                     New York
3  the National History Museum     ORG  the National History Museum
```
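Once entities are extracted, a common next step toward categorization or retrieval is to group them by type. The sketch below hard-codes the `(text, type)` pairs from the output above so that it runs without a spaCy model; `group_by_label` is a hypothetical helper name, and in practice the pairs would come from `doc.ents`.

```python
from collections import defaultdict

# (text, label) pairs copied from the output above; hard-coded so the
# sketch runs without loading a spaCy model.
entities = [
    ("John", "PERSON"),
    ("Microsoft", "ORG"),
    ("New York", "GPE"),
    ("the National History Museum", "ORG"),
]


def group_by_label(pairs):
    """Group entity texts by label, preserving their order of appearance."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


by_label = group_by_label(entities)
print(by_label["ORG"])  # ['Microsoft', 'the National History Museum']
```

Grouping by label makes it straightforward to answer questions such as "which organizations does this text mention?" without rescanning the document.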
