<h1 style="text-align:center;color:mediumvioletred">Parts of Speech Tagging</h1>

### Parts of Speech in English
---

| Part of Speech        | Description                                             | Example                    |
|-----------------------|---------------------------------------------------------|----------------------------|
| Noun                  | Names a person, place, thing, or idea                   | *dog, London, book*        |
| Pronoun               | Replaces a noun                                         | *he, she, it, they*        |
| Verb                  | Expresses action or state of being                      | *run, eat, is, seem*       |
| Adjective             | Describes a noun or pronoun                             | *happy, tall, blue*        |
| Adverb                | Describes a verb, adjective, or another adverb          | *quickly, very, well*      |
| Preposition           | Shows the relationship between a noun/pronoun and another word | *in, on, under, with* |
| Conjunction           | Joins words, phrases, or clauses                        | *and, but, although*       |
| Interjection          | Expresses sudden emotion or feeling                     | *oh!, wow!, hey!*          |
| Determiner (optional) | Introduces nouns (sometimes grouped with adjectives)     | *the, a, this, my*         |
| Article (optional)    | Special kind of determiner                              | *a, an, the*               |


### Parts of Speech in spaCy (coarse-grained `token.pos_` tags)
---

| Tag    | Meaning                        | Example                  |
|--------|--------------------------------|--------------------------|
| ADJ    | Adjective                      | *happy, quick*           |
| ADP    | Adposition (pre/postposition)  | *in, on, at*             |
| ADV    | Adverb                         | *quickly, very*          |
| AUX    | Auxiliary verb                 | *is, was, have*          |
| CCONJ  | Coordinating conjunction       | *and, but, or*           |
| DET    | Determiner                     | *the, a, an, this*       |
| INTJ   | Interjection                   | *oh, wow, hey*           |
| NOUN   | Noun                           | *dog, house*             |
| NUM    | Numeral                        | *one, 20, million*       |
| PART   | Particle                       | *to (in to go), not*     |
| PRON   | Pronoun                        | *he, she, it, they*      |
| PROPN  | Proper noun                    | *John, London*           |
| PUNCT  | Punctuation                    | *.,!?*                   |
| SCONJ  | Subordinating conjunction      | *because, although*      |
| SYM    | Symbol                         | *$, %, =*                |
| VERB   | Verb                           | *run, eat, play*         |
| X      | Other / unknown                | *etc., lorem*            |
| SPACE  | Whitespace                     | `"   "`                  |

___


## POS tagging

```python
import spacy

# Medium English pipeline; install it once with: python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")
```

```python
doc = nlp("Although it was raining heavily, the children played football in the park.")

# Print each token with its coarse POS tag and a human-readable description
for token in doc:
    print(token.text, " | ", token.pos_, " | ", spacy.explain(token.pos_))
```

```text
Although  |  SCONJ  |  subordinating conjunction
it  |  PRON  |  pronoun
was  |  AUX  |  auxiliary
raining  |  VERB  |  verb
heavily  |  ADV  |  adverb
,  |  PUNCT  |  punctuation
the  |  DET  |  determiner
children  |  NOUN  |  noun
played  |  VERB  |  verb
football  |  NOUN  |  noun
in  |  ADP  |  adposition
the  |  DET  |  determiner
park  |  NOUN  |  noun
.  |  PUNCT  |  punctuation
```
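The loop above prints one row per token. If you would rather collect the words under each tag, a dictionary of lists does the job. This is a plain-Python sketch: the `(text, pos)` pairs are copied from the output above so it runs without a model, and with spaCy loaded you would build the same pairs with `[(token.text, token.pos_) for token in doc]`.

```python
from collections import defaultdict

# (text, pos_) pairs copied from the output above; with spaCy you would use
# pairs = [(token.text, token.pos_) for token in doc]
pairs = [
    ("Although", "SCONJ"), ("it", "PRON"), ("was", "AUX"), ("raining", "VERB"),
    ("heavily", "ADV"), (",", "PUNCT"), ("the", "DET"), ("children", "NOUN"),
    ("played", "VERB"), ("football", "NOUN"), ("in", "ADP"), ("the", "DET"),
    ("park", "NOUN"), (".", "PUNCT"),
]

# Map each POS tag to the words that received it, keeping sentence order
words_by_pos = defaultdict(list)
for text, pos in pairs:
    words_by_pos[pos].append(text)

print(words_by_pos["NOUN"])  # ['children', 'football', 'park']
print(words_by_pos["VERB"])  # ['raining', 'played']
```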


```python
doc = nlp("Wow! Dr. Strange made 265 million $ on the very first day")

# token.pos_ is the coarse (Universal) tag; token.tag_ is the fine-grained Penn Treebank tag
for token in doc:
    print(token.text, " | ", token.pos_, " | ", spacy.explain(token.pos_),
          " | ", token.tag_, " | ", spacy.explain(token.tag_))
```

```text
Wow  |  INTJ  |  interjection  |  UH  |  interjection
!  |  PUNCT  |  punctuation  |  .  |  punctuation mark, sentence closer
Dr.  |  PROPN  |  proper noun  |  NNP  |  noun, proper singular
Strange  |  PROPN  |  proper noun  |  NNP  |  noun, proper singular
made  |  VERB  |  verb  |  VBD  |  verb, past tense
265  |  NUM  |  numeral  |  CD  |  cardinal number
million  |  NUM  |  numeral  |  CD  |  cardinal number
$  |  NOUN  |  noun  |  NN  |  noun, singular or mass
on  |  ADP  |  adposition  |  IN  |  conjunction, subordinating or preposition
the  |  DET  |  determiner  |  DT  |  determiner
very  |  ADV  |  adverb  |  RB  |  adverb
first  |  ADJ  |  adjective  |  JJ  |  adjective (English), other noun-modifier (Chinese)
day  |  NOUN  |  noun  |  NN  |  noun, singular or mass
```


### Difference between past and present tense

```python
doc = nlp("He quits the job")

# doc[1] is the verb; its fine-grained tag encodes tense and person
print(doc[1].text, " | ", doc[1].tag_, " | ", spacy.explain(doc[1].tag_))
```

```text
quits  |  VBZ  |  verb, 3rd person singular present
```


```python
doc = nlp("He quit the job")

print(doc[1].text, " | ", doc[1].tag_, " | ", spacy.explain(doc[1].tag_))
```

```text
quit  |  VBD  |  verb, past tense
```
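Because *quit* is spelled the same in the present and the past, the tagger has to rely on context: with the third-person subject *He*, a present-tense reading would require *quits*, so VBD is the consistent choice. Since the fine-grained `tag_` values follow the Penn Treebank tag set, the verb tags alone tell you the tense or form. The sketch below makes that explicit with a lookup table; the helper name `verb_form` is hypothetical, and the table covers only the six standard Penn Treebank verb tags.

```python
# Penn Treebank verb tags, as found in token.tag_ for spaCy's English pipelines
VERB_FORMS = {
    "VB": "base form",
    "VBD": "past tense",
    "VBG": "gerund or present participle",
    "VBN": "past participle",
    "VBP": "present tense, not 3rd person singular",
    "VBZ": "present tense, 3rd person singular",
}


def verb_form(tag: str) -> str:
    """Hypothetical helper: describe the verb form encoded in a fine-grained tag."""
    return VERB_FORMS.get(tag, "not a verb tag")


# Tags taken from the two outputs above
print("quits ->", verb_form("VBZ"))  # quits -> present tense, 3rd person singular
print("quit  ->", verb_form("VBD"))  # quit  -> past tense
print(verb_form("NN"))               # not a verb tag
```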


### Counting POS tags in a text

```python
earning_report = '''Apple today announced financial results for its fiscal 2025 third quarter ended June 28, 2025.
The Company posted quarterly revenue of $94.0 billion, up 10 percent year over year, and quarterly diluted earnings per share of $1.57, up 12 percent year over year.
"Today Apple is proud to report a June quarter revenue record with double-digit growth in iPhone, Mac and Services and growth around the world, in every geographic segment," said Tim Cook, Apple’s CEO.
"At WWDC25, we were excited to introduce a beautiful new software design that extends across all of our platforms, and we announced even more great Apple Intelligence features."'''
```

```python
doc = nlp(earning_report)

for token in doc:
    print(token.text, " | ", token.pos_, " | ", spacy.explain(token.pos_))
```

```text
Apple  |  PROPN  |  proper noun
today  |  NOUN  |  noun
announced  |  VERB  |  verb
financial  |  ADJ  |  adjective
results  |  NOUN  |  noun
for  |  ADP  |  adposition
its  |  PRON  |  pronoun
fiscal  |  ADJ  |  adjective
2025  |  NUM  |  numeral
third  |  ADJ  |  adjective
quarter  |  NOUN  |  noun
ended  |  VERB  |  verb
June  |  PROPN  |  proper noun
28  |  NUM  |  numeral
,  |  PUNCT  |  punctuation
2025  |  NUM  |  numeral
.  |  PUNCT  |  punctuation

  |  SPACE  |  space
The  |  DET  |  determiner
Company  |  PROPN  |  proper noun
posted  |  VERB  |  verb
quarterly  |  ADJ  |  adjective
revenue  |  NOUN  |  noun
of  |  ADP  |  adposition
$  |  SYM  |  symbol
94.0  |  NUM  |  numeral
billion  |  NUM  |  numeral
,  |  PUNCT  |  punctuation
up  |  ADV  |  adverb
10  |  NUM  |  numeral
percent  |  NOUN  |  noun
year  |  NOUN  |  noun
over  |  ADP  |  adposition
year  |  NOUN  |  noun
,  |  PUNCT  |  punctuation
and  |  CCONJ  |  coordinating conjunction
quarterly  |  ADJ  |  adjective
...
```

```python
doc = nlp(earning_report)

# Keep every token except whitespace, punctuation and "other" (X) tokens
filtered_tokens = []
for token in doc:
    if token.pos_ not in ["SPACE", "PUNCT", "X"]:
        filtered_tokens.append(token)

filtered_tokens
```

```text
[Apple,
 today,
 announced,
 financial,
 results,
 for,
 its,
 fiscal,
 2025,
 third,
 quarter,
 ended,
 June,
 28,
 2025,
 The,
 Company,
 posted,
 quarterly,
 revenue,
 of,
 $,
 94.0,
 billion,
 up,
 10,
 percent,
 year,
 over,
 year,
 and,
 quarterly,
 diluted,
 earnings,
 per,
 share,
 of,
 $,
 1.57,
 up,
 12,
 percent,
 year,
 over,
 year,
 Today,
 Apple,
 is,
 proud,
 to,
 report,
 a,
 June,
 quarter,
 revenue,
 record,
 with,
 double,
 digit,
 growth,
 in,
 iPhone,
 Mac,
 and,
 Services,
 and,
 growth,
 around,
 the,
 world,
 in,
 every,
 geographic,
 segment,
 said,
 Tim,
 Cook,
 Apple,
 ’s,
 CEO,
 At,
 WWDC25,
 we,
 were,
 excited,
 to,
 introduce,
 a,
 beautiful,
 new,
 software,
 design,
 that,
 extends,
 across,
 all,
 of,
 our,
 platforms,
 and,
 we,
 announced,
 even,
 more,
 great,
 Apple,
 Intelligence,
 features]
```
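The same filter can be written as a list comprehension over a set of excluded tags, which keeps the exclusions in one place and makes each membership check a constant-time lookup. In this sketch the pairs are the first tokens of the earning-report output above (so it runs without a model), and `"\n"` stands for the newline token that spaCy tagged as SPACE; with spaCy the comprehension would read `[token for token in doc if token.pos_ not in EXCLUDED]`.

```python
# Coarse tags to drop, as in the loop above
EXCLUDED = {"SPACE", "PUNCT", "X"}

# (text, pos_) pairs copied from the start of the earning-report output
pairs = [
    ("Apple", "PROPN"), ("today", "NOUN"), ("announced", "VERB"),
    ("financial", "ADJ"), ("results", "NOUN"), ("for", "ADP"), ("its", "PRON"),
    ("fiscal", "ADJ"), ("2025", "NUM"), ("third", "ADJ"), ("quarter", "NOUN"),
    ("ended", "VERB"), ("June", "PROPN"), ("28", "NUM"), (",", "PUNCT"),
    ("2025", "NUM"), (".", "PUNCT"), ("\n", "SPACE"), ("The", "DET"),
]

# Keep the text of every token whose tag is not excluded
filtered = [text for text, pos in pairs if pos not in EXCLUDED]
print(len(pairs), "->", len(filtered))  # 19 -> 16
print(filtered[-3:])                    # ['28', '2025', 'The']
```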

```python
# count_by returns {attribute ID: count}; the keys are integer IDs, not tag names
counts = doc.count_by(spacy.attrs.POS)
counts
```

```text
{96: 15,
 92: 24,
 100: 10,
 84: 12,
 85: 13,
 95: 6,
 93: 8,
 97: 19,
 103: 3,
 90: 5,
 99: 2,
 86: 4,
 89: 4,
 87: 2,
 94: 3}
```

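```python
# Translate each integer POS ID back to its tag name
# (a separate loop variable avoids overwriting the counts dict)
for pos_id, count in counts.items():
    print(doc.vocab[pos_id].text, " | ", count)
```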

```text
PROPN  |  15
NOUN  |  24
VERB  |  10
ADJ  |  12
ADP  |  13
PRON  |  6
NUM  |  8
PUNCT  |  19
SPACE  |  3
DET  |  5
SYM  |  2
ADV  |  4
CCONJ  |  4
AUX  |  2
PART  |  3
```
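`doc.count_by` keys its result by integer attribute IDs, which is why the extra loop is needed to turn them back into names. If you only need counts per tag name, `collections.Counter` over `token.pos_` gives readable keys directly, and `most_common()` sorts them. The sketch below uses the POS sequence of the first example sentence ("Although it was raining heavily, ...") so it runs without a model; with spaCy you would write `Counter(token.pos_ for token in doc)`.

```python
from collections import Counter

# token.pos_ values from the first example sentence in this notebook
pos_tags = ["SCONJ", "PRON", "AUX", "VERB", "ADV", "PUNCT", "DET",
            "NOUN", "VERB", "NOUN", "ADP", "DET", "NOUN", "PUNCT"]

# Counter keys are the tag names themselves, so no ID lookup is needed
counts_by_name = Counter(pos_tags)
print(counts_by_name.most_common(1))  # [('NOUN', 3)]
print(sum(counts_by_name.values()))   # 14
```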


# Exercise

### Question:
You are parsing a news story from cnbc.com. The story is stored in `news_story.txt`, which is available in the same folder on GitHub. You need to:

- Extract all NOUN tokens from this story. You will have to read the file in Python first to collect all the text, and then extract the NOUNs into a Python list.
- Extract all numbers (NUM POS type) into a Python list.
- Print a count of all POS tags in this story.

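```python
# Tip: open("news_story.txt", encoding="utf-8") avoids garbled sequences such as "â€™"
# in the outputs below; without it, Windows typically falls back to cp1252.
with open("news_story.txt") as f:
    text = f.read()
text
```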

```text
'Inflation rose again in April, continuing a climb that has pushed consumers to the brink and is threatening the economic expansion, the Bureau of Labor Statistics reported Wednesday.\n\nThe consumer price index, a broad-based measure of prices for goods and services, increased 8.3% from a year ago, higher than the Dow Jones estimate for an 8.1% gain. That represented a slight ease from Marchâ€™s peak but was still close to the highest level since the summer of 1982.\n\nRemoving volatile food and energy prices, so-called core CPI still rose 6.2%, against expectations for a 6% gain, clouding hopes that inflation had peaked in March.\n\nThe month-over-month gains also were higher than expectations â€” 0.3% on headline CPI versus the 0.2% estimate and a 0.6% increase for core, against the outlook for a 0.4% gain.\n\nThe price gains also meant that workers continued to lose ground. Real wages adjusted for inflation decreased 0.1% on the month despite a nominal increase of 0.3% in average h
```
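Sequences such as `â€™` in the text above are mojibake: they are what you get when the UTF-8 bytes of the apostrophe `’` are decoded as Windows-1252 (cp1252), which `open()` uses by default on many Windows machines when no `encoding` is given. The sketch below reproduces the effect and shows the fix. A temporary file stands in for `news_story.txt`, and its one-sentence content is an assumption taken from the story above.

```python
import tempfile
from pathlib import Path

apostrophe = "\u2019"  # RIGHT SINGLE QUOTATION MARK, as in "March’s"

# The UTF-8 bytes E2 80 99, decoded as cp1252, become three separate characters
garbled = apostrophe.encode("utf-8").decode("cp1252")
print(garbled)  # â€™

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a small UTF-8 file (stand-in for news_story.txt)
    path = Path(tmp_dir) / "story.txt"
    path.write_text("That represented a slight ease from March\u2019s peak.", encoding="utf-8")

    # Decoding with the wrong codec reproduces the garbled output above
    garbled_text = path.read_bytes().decode("cp1252")

    # Decoding with the file's real encoding keeps the apostrophe intact
    with open(path, encoding="utf-8") as f:
        sample_text = f.read()

print(garbled_text)  # That represented a slight ease from Marchâ€™s peak.
print(sample_text)   # That represented a slight ease from March’s peak.
```

The name `sample_text` is deliberately different from `text` so that running this cell does not overwrite the story loaded above.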

```python
doc = nlp(text)

nouns = []
numbers = []

# Each token has exactly one coarse POS, so elif skips a redundant second check
for token in doc:
    if token.pos_ == "NOUN":
        nouns.append(token)
    elif token.pos_ == "NUM":
        numbers.append(token)
```

```python
nouns
```

```text
[Inflation,
 climb,
 consumers,
 brink,
 expansion,
 consumer,
 price,
 index,
 measure,
 prices,
 goods,
 services,
 %,
 year,
 estimate,
 %,
 gain,
 ease,
 Marchâ€,
 ™,
 peak,
 level,
 summer,
 food,
 energy,
 prices,
 core,
 %,
 expectations,
 %,
 gain,
 inflation,
 month,
 month,
 gains,
 expectations,
 %,
 headline,
 %,
 estimate,
 %,
 increase,
 core,
 outlook,
 %,
 gain,
 price,
 gains,
 workers,
 ground,
 wages,
 inflation,
 %,
 month,
 increase,
 %,
 earnings,
 year,
 earnings,
 %,
 earnings,
 %,
 Inflation,
 threat,
 recovery,
 pandemic,
 economy,
 year,
 growth,
 level,
 prices,
 pump,
 grocery,
 stores,
 problem,
 inflation,
 areas,
 housing,
 auto,
 sales,
 host,
 areas,
 officials,
 problem,
 interest,
 rate,
 hikes,
 year,
 pledges,
 inflation,
 bankâ€,
 ™,
 s,
 %,
 goal,
 ™,
 data,
 job,
 Credits]
```
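The NOUN list above includes tokens that are not words: `%` is tagged NOUN by this model, and the mis-decoded apostrophes surface as `Marchâ€`, `bankâ€` and `™`. A simple post-filter keeps only alphabetic tokens. spaCy exposes a similar check as the `token.is_alpha` attribute; the sketch applies the equivalent `str.isalpha()` to a sample copied from the output so it runs without a model.

```python
# Sample of the NOUN tokens printed above, as plain strings
nouns_sample = ["Inflation", "climb", "consumers", "%", "year", "Marchâ€", "™",
                "peak", "bankâ€", "™", "s", "%", "goal", "Credits"]

# Keep only purely alphabetic tokens ("€", "™" and "%" are not letters)
clean_nouns = [t for t in nouns_sample if t.isalpha()]
print(clean_nouns)
# ['Inflation', 'climb', 'consumers', 'year', 'peak', 's', 'goal', 'Credits']
```

The fragment `s` (left over from the split `bank’s`) still passes, so reading the file with the correct encoding remains the better cure; the filter only catches symbols.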

```python
numbers
```

```text
[8.3,
 8.1,
 1982,
 6.2,
 6,
 â€,
 0.3,
 0.2,
 0.6,
 0.4,
 0.1,
 0.3,
 2.6,
 5.5,
 2021,
 1984,
 one,
 two,
 two,
 2]
```
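The NUM tokens are still text, and they mix digits (`8.3`), number words (`one`, `two`) and one mis-decoded fragment (`â€`). To use them as values you need a conversion step. This sketch assumes a tiny hand-written word table, `NUMBER_WORDS`, and a helper `to_number`; both names are hypothetical and the table is deliberately incomplete. Anything that cannot be parsed is reported instead of guessed. spaCy's `token.like_num` can serve as a first check, but it only flags number-like text and does not convert it.

```python
# NUM tokens printed above, as plain strings
num_tokens = ["8.3", "8.1", "1982", "6.2", "6", "â€", "0.3", "0.2", "0.6", "0.4",
              "0.1", "0.3", "2.6", "5.5", "2021", "1984", "one", "two", "two", "2"]

# Hypothetical, deliberately small lookup table for spelled-out numbers
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}


def to_number(text):
    """Return a float for numeric text, or None if it cannot be parsed."""
    try:
        # Drop thousands separators such as "1,000" before parsing
        return float(text.replace(",", ""))
    except ValueError:
        word = NUMBER_WORDS.get(text.lower())
        return float(word) if word is not None else None


values = [to_number(t) for t in num_tokens]
skipped = [t for t, v in zip(num_tokens, values) if v is None]
print(skipped)                             # ['â€']
print(sum(v is not None for v in values))  # 19
```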

```python
counts = doc.count_by(spacy.attrs.POS)
counts
```

```text
{92: 99,
 100: 29,
 86: 15,
 85: 39,
 96: 15,
 97: 32,
 90: 34,
 95: 4,
 87: 13,
 89: 10,
 84: 23,
 103: 7,
 93: 20,
 94: 3,
 98: 8,
 101: 1}
```

```python
for k, v in counts.items():
    print(doc.vocab[k].text, " | ", v)
```

```text
NOUN  |  99
VERB  |  29
ADV  |  15
ADP  |  39
PROPN  |  15
PUNCT  |  32
DET  |  34
PRON  |  4
AUX  |  13
CCONJ  |  10
ADJ  |  23
SPACE  |  7
NUM  |  20
PART  |  3
SCONJ  |  8
X  |  1
```
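The raw counts are easier to compare when sorted and expressed as a share of all tokens. The dictionary below copies the tag names and counts printed above; with spaCy you could build it directly as `{doc.vocab[k].text: v for k, v in doc.count_by(spacy.attrs.POS).items()}`, mirroring the loop above. The percentage formatting is one possible presentation choice, not part of the exercise.

```python
# Tag names and counts copied from the output above
pos_counts = {
    "NOUN": 99, "VERB": 29, "ADV": 15, "ADP": 39, "PROPN": 15, "PUNCT": 32,
    "DET": 34, "PRON": 4, "AUX": 13, "CCONJ": 10, "ADJ": 23, "SPACE": 7,
    "NUM": 20, "PART": 3, "SCONJ": 8, "X": 1,
}

total = sum(pos_counts.values())
print("total tokens:", total)  # total tokens: 352

# Most frequent tags first, each with its share of all tokens
ranked = sorted(pos_counts.items(), key=lambda item: item[1], reverse=True)
for tag, n in ranked[:3]:
    print(f"{tag:<6}{n:>4}  {n / total:.1%}")
```

The total of 352 also includes the 32 PUNCT and 7 SPACE tokens; drop those keys first if you want shares among words only.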
