### 1. Load the spaCy Model

We start by importing the `spacy` library and loading the `en_core_web_sm` model.

In [2]:
import spacy
nlp = spacy.load('en_core_web_sm')
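`spacy.load` raises an `OSError` when the requested model package is not installed, which is a common stumbling block in a fresh environment. The sketch below wraps the call in a hypothetical helper, `load_model` (a name chosen for this notebook, not a spaCy API), that reports the install command instead.

```python
import spacy


def load_model(name="en_core_web_sm"):
    """Load a spaCy pipeline, with a clearer error if it is not installed.

    `load_model` is a hypothetical helper for this notebook, not part of spaCy.
    """
    try:
        return spacy.load(name)
    except OSError as err:
        # spacy.load raises OSError when the model package cannot be found.
        raise RuntimeError(
            f"Model '{name}' is not installed. "
            f"Try: python -m spacy download {name}"
        ) from err
```

When the model is available, `nlp = load_model()` should behave like the cell above.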

### 2. Check Pipeline Components
Let's examine the components of the spaCy pipeline used by the loaded model.

In [3]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

This list is the sequence of components in the processing pipeline of the `en_core_web_sm` model, in the order they run. Each component adds a specific kind of annotation to the input text:

1. `tok2vec`: Converts each token into a dense vector that reflects its surrounding context. Downstream components such as the tagger and parser use these vectors as input features.
2. `tagger`: Assigns a part-of-speech (POS) tag to each token, indicating its grammatical category (e.g., noun, verb, adjective).
3. `parser`: Analyzes the syntactic structure of each sentence and builds a dependency tree that shows how the words relate to one another.
4. `attribute_ruler`: Applies pattern-based rules to set or override token attributes. In spaCy's English pipelines it is used, for example, to map the tagger's fine-grained tags to coarse-grained POS tags.
5. `lemmatizer`: Reduces each word to its base or dictionary form (lemma). For example, "running" becomes "run", and an irregular form such as "better" can map to "good". This helps standardize tokens.
6. `ner`: The named entity recognizer (NER) identifies spans that refer to named entities (e.g., persons, organizations, locations) and labels each one with a category such as PERSON, ORG, or GPE, based on the surrounding context.
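To see what these components contribute, it can help to print the attributes they set on each token. The following is a minimal sketch; `token_annotations` is a hypothetical helper, and the comments linking each attribute to a component assume the `en_core_web_sm` pipeline listed above.

```python
from spacy.tokens import Doc


def token_annotations(doc: Doc):
    """Return one dict per token with the annotations set by the pipeline."""
    return [
        {
            "text": token.text,
            "tag": token.tag_,        # fine-grained tag (tagger)
            "pos": token.pos_,        # coarse-grained POS (mapped by attribute_ruler, assumed)
            "dep": token.dep_,        # dependency label (parser)
            "head": token.head.text,  # syntactic head of the token (parser)
            "lemma": token.lemma_,    # base form (lemmatizer)
        }
        for token in doc
    ]
```

In this notebook, `token_annotations(nlp('Apple is headquartered in Cupertino.'))` would return one row per token. Attributes whose component is disabled or missing come back as empty strings.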

### 3. Process Text and Extract Named Entities
We'll run several short texts through the spaCy pipeline, print each named entity with its label, and visualize the result.

In [4]:
# Import displacy up front so that later cells can reuse it
from spacy import displacy

# Process a document about Apple's headquarters
doc = nlp('Apple is headquartered in Cupertino, California.')

# Extract named entities and print details
for ent in doc.ents:
    print(ent.text, '|', ent.label_, '|', spacy.explain(ent.label_))

# Visualize named entities in the document using displacy
displacy.render(doc, style='ent')

Apple | ORG | Companies, agencies, institutions, etc.
Cupertino | GPE | Countries, cities, states
California | GPE | Countries, cities, states
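
Inside Jupyter, `displacy.render` shows the highlighted text inline. Outside a notebook, passing `jupyter=False` makes it return the HTML markup as a string, which can then be written to a file. A minimal sketch, where `save_entity_html` and the default output path are hypothetical names chosen for illustration:

```python
from pathlib import Path

from spacy import displacy


def save_entity_html(doc, path="entities.html"):
    """Render the entities of `doc` to an HTML file and return the file path."""
    # jupyter=False makes displacy return the markup instead of displaying it;
    # page=True wraps the markup in a complete HTML page.
    html = displacy.render(doc, style="ent", page=True, jupyter=False)
    out = Path(path)
    out.write_text(html, encoding="utf-8")
    return out
```

Calling `save_entity_html(doc)` after the cell above would write the same visualization to `entities.html` in the working directory.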


In [5]:
# Process a document about Microsoft's founders
doc = nlp('Microsoft was founded by Bill Gates and Paul Allen.')

# Extract named entities and print details
for ent in doc.ents:
    print(ent.text, '|', ent.label_, '|', spacy.explain(ent.label_))

# Visualize named entities in the document using displacy
displacy.render(doc, style='ent')

Microsoft | ORG | Companies, agencies, institutions, etc.
Bill Gates | PERSON | People, including fictional
Paul Allen | PERSON | People, including fictional


In [6]:
# Process a document mentioning the Eiffel Tower's location
doc = nlp('The Eiffel Tower is located in Paris, France.')

# Extract named entities and print details
for ent in doc.ents:
    print(ent.text, '|', ent.label_, '|', spacy.explain(ent.label_))

# Visualize named entities in the document using displacy
displacy.render(doc, style='ent')

The Eiffel Tower | FAC | Buildings, airports, highways, bridges, etc.
Paris | GPE | Countries, cities, states
France | GPE | Countries, cities, states
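
The three cells above repeat the same loop for each sentence. For more than a handful of texts, `nlp.pipe` processes them as a stream, which is usually more efficient than calling `nlp` on each text separately. The following is a minimal sketch; `entities_for_texts` is a hypothetical helper, and it assumes `nlp` is any loaded spaCy pipeline that sets `doc.ents`.

```python
def entities_for_texts(nlp, texts):
    """Map each text in the list `texts` to its (entity text, label) pairs."""
    results = {}
    # nlp.pipe yields Doc objects in the same order as the input texts.
    for text, doc in zip(texts, nlp.pipe(texts)):
        results[text] = [(ent.text, ent.label_) for ent in doc.ents]
    return results
```

Passing the three example sentences as a list, with the `nlp` object loaded earlier, should yield the same entity and label pairs as the printed output above.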
