# 🧠 spaCy POS Tagging - Beginner to Advanced Guide

spaCy provides a robust and efficient way to perform Part-of-Speech tagging, a key NLP task that identifies the grammatical role of each word in a sentence (e.g., noun, verb, adjective).

---

## 📦 Step 1: Install and Import spaCy

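```python
# Notebook shell commands: install spaCy and download the small English model
!pip install spacy
!python -m spacy download en_core_web_sm

import spacy

# Load English core model
nlp = spacy.load("en_core_web_sm")
```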
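`spacy.load` raises an `OSError` when the model package has not been downloaded. The following is a minimal sketch of a guarded load. The helper name `load_model_or_none` and the variable `maybe_nlp` are hypothetical (not part of spaCy); the check relies on `spacy.util.is_package`.

```python
import spacy
from spacy.util import is_package


def load_model_or_none(name="en_core_web_sm"):
    """Return the loaded pipeline, or None if the model package is missing.

    Hypothetical helper for illustration; not part of spaCy itself.
    """
    if not is_package(name):
        print(f"Model '{name}' is not installed. Try: python -m spacy download {name}")
        return None
    return spacy.load(name)


maybe_nlp = load_model_or_none()
if maybe_nlp is not None:
    # Lists the pipeline components that will run on each text
    print(maybe_nlp.pipe_names)
```

If the model is installed, the printed component list should include the tagger that produces the POS annotations used in the rest of this guide.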


## 🧠 Step 2: Understanding POS Tags with Examples

| POS Tag | Description               | Example Words    |
| ------- | ------------------------- | ---------------- |
| NOUN    | Noun                      | dog, car         |
| VERB    | Verb                      | run, buy         |
| ADJ     | Adjective                 | beautiful        |
| ADV     | Adverb                    | quickly          |
| PRON    | Pronoun                   | she, it          |
| DET     | Determiner                | the, a           |
| ADP     | Adposition (preposition)  | in, on           |
| AUX     | Auxiliary verb            | is, was          |
| CCONJ   | Coordinating conjunction  | and, but         |
| SCONJ   | Subordinating conjunction | although         |
| PART    | Particle                  | not, to          |
| INTJ    | Interjection              | wow, ouch        |
| NUM     | Numeral                   | one, 50          |
| PUNCT   | Punctuation               | ., ?             |
| PROPN   | Proper noun               | India, John      |
| SYM     | Symbol                    | \$, %, =         |
| X       | Other (unclassifiable)    | xfgh (gibberish) |
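The descriptions in this table can also be looked up programmatically. As a small sketch, `spacy.explain` is called here on each of the 17 Universal POS tags; the list name `UPOS_TAGS` is just an illustrative variable, and no model download is needed for this lookup.

```python
import spacy

# The 17 Universal POS tags listed in the table above
UPOS_TAGS = [
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
]

# spacy.explain reads from spaCy's built-in glossary
descriptions = {tag: spacy.explain(tag) for tag in UPOS_TAGS}

for tag, description in descriptions.items():
    print(f"{tag:<6} {description}")
```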


## 📝 Step 3: Apply POS Tagging
### 📚 POS vs TAG
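- <code>.pos_</code>: Coarse-grained POS tag from the Universal POS tag set
- <code>.tag_</code>: Fine-grained tag (more detailed and language-specific; Penn Treebank-style for the English models)
- <code>spacy.explain(tag)</code>: Returns a short description of a tag, or <code>None</code> if the tag is not in spaCy's glossary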

```python
text = "Apple is looking at buying a startup in the UK for $1 billion."
doc = nlp(text)

# Display each token with its coarse tag, fine-grained tag, and explanation
for token in doc:
    print(f"{token.text:<12} | {token.pos_:<10} | {token.tag_:<6} | {spacy.explain(token.tag_)}")
```


```
Apple        | PROPN      | NNP    | noun, proper singular
is           | AUX        | VBZ    | verb, 3rd person singular present
looking      | VERB       | VBG    | verb, gerund or present participle
at           | ADP        | IN     | conjunction, subordinating or preposition
buying       | VERB       | VBG    | verb, gerund or present participle
a            | DET        | DT     | determiner
startup      | NOUN       | NN     | noun, singular or mass
in           | ADP        | IN     | conjunction, subordinating or preposition
the          | DET        | DT     | determiner
UK           | PROPN      | NNP    | noun, proper singular
for          | ADP        | IN     | conjunction, subordinating or preposition
$            | SYM        | $      | symbol, currency
1            | NUM        | CD     | cardinal number
billion      | NUM        | CD     | cardinal number
.            | PUNCT      | .      | punctuation mark, sentence closer
```
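To see how the fine-grained tags nest under the coarse ones, the tags can be grouped by `.pos_`. This is a sketch only: `fine_tags_by_pos` is a hypothetical helper, and `sample_doc` is a hand-annotated stand-in whose values are copied from the first seven rows of the output above, so the snippet runs without a downloaded model.

```python
from collections import defaultdict

from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-annotated stand-in (assumption: values copied from the output above)
words = ["Apple", "is", "looking", "at", "buying", "a", "startup"]
pos = ["PROPN", "AUX", "VERB", "ADP", "VERB", "DET", "NOUN"]
tags = ["NNP", "VBZ", "VBG", "IN", "VBG", "DT", "NN"]
sample_doc = Doc(Vocab(), words=words, pos=pos, tags=tags)


def fine_tags_by_pos(doc):
    """Map each coarse tag to the sorted fine-grained tags seen under it."""
    grouped = defaultdict(set)
    for token in doc:
        grouped[token.pos_].add(token.tag_)
    return {coarse: sorted(fine) for coarse, fine in grouped.items()}


print(fine_tags_by_pos(sample_doc))
```

With a loaded pipeline, `fine_tags_by_pos(nlp(text))` should work the same way on real predictions.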


## Step 4: Visualize POS Tags with displaCy

```python
from spacy import displacy

# Draw the dependency tree; each token is labeled with its coarse POS tag
displacy.render(doc, style="dep", jupyter=True, options={"compact": True, "color": "blue"})
```
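Outside a notebook, `displacy.render` can return the markup as a string instead of displaying it. The sketch below assumes a tiny hand-annotated `sample_doc` (three tokens, with heads and labels taken from the outputs in this guide) so it runs without a model. It also uses the `fine_grained` option to show `.tag_` values in place of `.pos_`.

```python
from spacy import displacy
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-annotated stand-in for a parsed Doc (assumption, for illustration only)
sample_doc = Doc(
    Vocab(),
    words=["Apple", "is", "looking"],
    pos=["PROPN", "AUX", "VERB"],
    tags=["NNP", "VBZ", "VBG"],
    heads=[2, 2, 2],  # absolute index of each token's head
    deps=["nsubj", "aux", "ROOT"],
)

# jupyter=False makes render() return the SVG markup as a string
svg_coarse = displacy.render(sample_doc, style="dep", jupyter=False)
svg_fine = displacy.render(
    sample_doc, style="dep", jupyter=False, options={"fine_grained": True}
)

print(len(svg_coarse), len(svg_fine))
```

The returned string can then be written to a file, for example with `pathlib.Path("pos_tags.svg").write_text(svg_fine, encoding="utf-8")`, where the file name is an arbitrary choice.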


## 🔍 Step 5: Filter by Specific POS

```python
# Get only common nouns and main verbs
# (proper nouns are tagged PROPN and auxiliaries AUX, so both are skipped)
for token in doc:
    if token.pos_ in ("NOUN", "VERB"):
        print(token.text, "→", token.pos_)
```


```
looking → VERB
buying → VERB
startup → NOUN
```
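Note that "Apple" and "UK" are missing from this output because they are tagged `PROPN`, not `NOUN`. A possible generalization is a reusable filter over a set of tags. In this sketch, `CONTENT_POS` and `content_words` are hypothetical names, and `sample_doc` is a shortened, hand-tagged sentence so the code runs without a model.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Assumed definition of "content words" for this example
CONTENT_POS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}


def content_words(doc, wanted=CONTENT_POS):
    """Return the text of every token whose coarse tag is in `wanted`."""
    return [token.text for token in doc if token.pos_ in wanted]


# Hand-tagged stand-in for nlp("Apple is buying a startup in the UK")
words = ["Apple", "is", "buying", "a", "startup", "in", "the", "UK"]
pos = ["PROPN", "AUX", "VERB", "DET", "NOUN", "ADP", "DET", "PROPN"]
sample_doc = Doc(Vocab(), words=words, pos=pos)

print(content_words(sample_doc))
```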


## 📌 Step 6: Count POS Frequencies

```python
from collections import Counter

# Count how often each coarse POS tag occurs in the document
pos_counts = Counter(token.pos_ for token in doc)
print(pos_counts)
```


```
Counter({'ADP': 3, 'PROPN': 2, 'VERB': 2, 'DET': 2, 'NUM': 2, 'AUX': 1, 'NOUN': 1, 'SYM': 1, 'PUNCT': 1})
```
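Raw counts are easier to compare across texts as proportions. A short sketch, using the counts copied from the output above (the variable name `observed_counts` is illustrative):

```python
from collections import Counter

# Counts copied from the output above
observed_counts = Counter({
    "ADP": 3, "PROPN": 2, "VERB": 2, "DET": 2, "NUM": 2,
    "AUX": 1, "NOUN": 1, "SYM": 1, "PUNCT": 1,
})

total = sum(observed_counts.values())

# Show the three most frequent tags with their share of all tokens
for tag, count in observed_counts.most_common(3):
    print(f"{tag:<6} {count:>2}  {count / total:.1%}")
```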


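```python
from spacy.attrs import POS

# Doc.count_by returns a dict keyed by integer attribute IDs, not tag names
pos_id_counts = doc.count_by(POS)
print(pos_id_counts)
print(type(pos_id_counts))

# Resolve each integer ID back to its tag name through the vocab
print(f"{'POS':<15} {'NumberOfOccurrence':<15}")
for key, value in pos_id_counts.items():
    print(f"{doc.vocab[key].text:<15} {value}")
```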

```
{96: 2, 87: 1, 100: 2, 85: 3, 90: 2, 92: 1, 99: 1, 93: 2, 97: 1}
<class 'dict'>
POS             NumberOfOccurrence
PROPN           2
AUX             1
VERB            2
ADP             3
DET             2
NOUN            1
SYM             1
NUM             2
PUNCT           1
```
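The integer keys can also be turned back into tag names in one step through the string store. This sketch assumes a small hand-tagged `sample_doc` so that it runs without a model; the sentence and its tags are illustrative.

```python
from collections import Counter

from spacy.attrs import POS
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-tagged stand-in document (assumption, for illustration only)
sample_doc = Doc(
    Vocab(),
    words=["Apple", "buys", "a", "startup"],
    pos=["PROPN", "VERB", "DET", "NOUN"],
)

id_counts = sample_doc.count_by(POS)

# vocab.strings resolves an integer ID to the corresponding tag name
named_counts = {
    sample_doc.vocab.strings[pos_id]: count for pos_id, count in id_counts.items()
}
print(named_counts)
```

The result should match what `Counter(token.pos_ for token in sample_doc)` produces.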


## 🧪 Step 7: Custom POS Tagging Function

```python
def print_pos_tags(text):
    """Print a table of token, coarse tag, fine-grained tag, and explanation."""
    doc = nlp(text)
    print(f"{'Token':<15} {'POS':<10} {'Tag':<7} {'Explanation'}")
    print("=" * 50)
    for token in doc:
        print(f"{token.text:<15} {token.pos_:<10} {token.tag_:<7} {spacy.explain(token.tag_)}")

print_pos_tags("Elon Musk founded SpaceX and Tesla Motors in the early 2000s.")
```


```
Token           POS        Tag     Explanation
==================================================
Elon            PROPN      NNP     noun, proper singular
Musk            PROPN      NNP     noun, proper singular
founded         VERB       VBD     verb, past tense
SpaceX          PROPN      NNP     noun, proper singular
and             CCONJ      CC      conjunction, coordinating
Tesla           PROPN      NNP     noun, proper singular
Motors          PROPN      NNPS    noun, proper plural
in              ADP        IN      conjunction, subordinating or preposition
the             DET        DT      determiner
early           ADJ        JJ      adjective (English), other noun-modifier (Chinese)
2000s           NUM        CD      cardinal number
.               PUNCT      .       punctuation mark, sentence closer
```
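`spacy.explain` returns `None` for a tag that is not in its glossary, which would print as the literal text `None` in a table like the one above. One way to handle that is a small wrapper with a fallback string. The name `describe_tag` and the fallback text are hypothetical, and the warning filter is there because some spaCy versions emit a warning for unknown terms.

```python
import warnings

import spacy


def describe_tag(tag, fallback="no description available"):
    """Return spaCy's glossary entry for `tag`, or `fallback` if there is none."""
    with warnings.catch_warnings():
        # Silence the "term not found" warning that some versions raise
        warnings.simplefilter("ignore")
        explanation = spacy.explain(tag)
    return explanation if explanation is not None else fallback


print(describe_tag("NNP"))
print(describe_tag("NOT_A_REAL_TAG"))
```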


## 🧬 Step 8: Combining POS with Dependency Parsing

```python
# `doc` is still the "Apple is looking at buying..." document from Step 3
for token in doc:
    print(f"{token.text:<12} | POS: {token.pos_:<6} | DEP: {token.dep_:<10} | Head: {token.head.text}")
```


```
Apple        | POS: PROPN  | DEP: nsubj      | Head: looking
is           | POS: AUX    | DEP: aux        | Head: looking
looking      | POS: VERB   | DEP: ROOT       | Head: looking
at           | POS: ADP    | DEP: prep       | Head: looking
buying       | POS: VERB   | DEP: pcomp      | Head: at
a            | POS: DET    | DEP: det        | Head: startup
startup      | POS: NOUN   | DEP: dobj       | Head: buying
in           | POS: ADP    | DEP: prep       | Head: startup
the          | POS: DET    | DEP: det        | Head: UK
UK           | POS: PROPN  | DEP: pobj       | Head: in
for          | POS: ADP    | DEP: prep       | Head: buying
$            | POS: SYM    | DEP: quantmod   | Head: billion
1            | POS: NUM    | DEP: compound   | Head: billion
billion      | POS: NUM    | DEP: pobj       | Head: for
.            | POS: PUNCT  | DEP: punct      | Head: looking
```
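Combining both annotations makes it possible to pull out simple relations, such as a verb together with its subject or direct object. The sketch below uses a hypothetical helper, `verb_argument_pairs`, on a hand-annotated `sample_doc` whose heads and labels are copied from the first seven rows of the output above, so no model is required.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-annotated stand-in (assumption: values copied from the output above)
sample_doc = Doc(
    Vocab(),
    words=["Apple", "is", "looking", "at", "buying", "a", "startup"],
    pos=["PROPN", "AUX", "VERB", "ADP", "VERB", "DET", "NOUN"],
    heads=[2, 2, 2, 2, 3, 6, 4],  # absolute index of each token's head
    deps=["nsubj", "aux", "ROOT", "prep", "pcomp", "det", "dobj"],
)


def verb_argument_pairs(doc, dep_labels=("nsubj", "dobj")):
    """Return (verb, relation, argument) triples for subjects and objects of verbs."""
    return [
        (token.head.text, token.dep_, token.text)
        for token in doc
        if token.dep_ in dep_labels and token.head.pos_ == "VERB"
    ]


print(verb_argument_pairs(sample_doc))
```

With a loaded pipeline, the same function can be applied to `nlp(text)`; the dependency label names depend on the model.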


## ⚙️ Advanced: Custom Function to Extract POS-based Patterns

```python
def extract_adj_noun_pairs(doc):
    """Return (adjective, noun) pairs for every ADJ immediately followed by a NOUN."""
    pairs = []
    for i in range(len(doc) - 1):
        if doc[i].pos_ == "ADJ" and doc[i + 1].pos_ == "NOUN":
            pairs.append((doc[i].text, doc[i + 1].text))
    return pairs

text = "The quick brown fox jumps over the lazy dog."
doc = nlp(text)
print(extract_adj_noun_pairs(doc))
```


```
[('brown', 'fox'), ('lazy', 'dog')]
```
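The same adjective-noun pattern can also be expressed declaratively with spaCy's rule-based `Matcher`, which is an alternative to the index loop rather than the method used above. In this sketch the POS tags for `sample_doc` are assigned by hand (an assumption consistent with the output shown) so that it runs without a model, and the pattern name `"ADJ_NOUN"` is arbitrary.

```python
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.vocab import Vocab

vocab = Vocab()

# Hand-tagged stand-in for nlp("The quick brown fox jumps over the lazy dog.")
words = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."]
pos = ["DET", "ADJ", "ADJ", "NOUN", "VERB", "ADP", "DET", "ADJ", "NOUN", "PUNCT"]
sample_doc = Doc(vocab, words=words, pos=pos)

# One pattern: an adjective token immediately followed by a noun token
matcher = Matcher(vocab)
matcher.add("ADJ_NOUN", [[{"POS": "ADJ"}, {"POS": "NOUN"}]])

# as_spans=True returns Span objects; sort them by position in the document
spans = sorted(matcher(sample_doc, as_spans=True), key=lambda span: span.start)
print([span.text for span in spans])
```

With a loaded pipeline, the matcher would be created from `nlp.vocab` and called on `nlp(text)`.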


## 🧾 References
- [spaCy POS Documentation](https://spacy.io/api/data-formats#pos-tagging)
- [Universal POS Tags](https://universaldependencies.org/u/pos/)