<a href="https://colab.research.google.com/github/samvillasmith/ML-for-NLP/blob/main/POS_Tagging.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Parts of Speech Tagging

Part-of-speech (POS) tagging is the process of labeling each word in a text (corpus) with its part of speech, based on both its definition and its context. It is a fundamental task in Natural Language Processing (NLP) and supports applications such as information extraction, sentiment analysis, and machine translation.

The tags below are the Penn Treebank tags, which NLTK's default English tagger produces:

1. **CC**: Coordinating conjunction (e.g., `and`, `but`, `or`)
2. **CD**: Cardinal number (e.g., `one`, `two`, `three`)
3. **DT**: Determiner (e.g., `the`, `a`, `this`)
4. **EX**: Existential `there` (e.g., `there` is)
5. **FW**: Foreign word (e.g., `zeitgeist`)
6. **IN**: Preposition or subordinating conjunction (e.g., `in`, `of`, `on`, `if`, `while`)
7. **JJ**: Adjective (e.g., `big`, `red`, `happy`)
8. **JJR**: Adjective, comparative (e.g., `bigger`, `redder`, `happier`)
9. **JJS**: Adjective, superlative (e.g., `biggest`, `reddest`, `happiest`)
10. **LS**: List item marker (e.g., `1)`, `a)`)
11. **MD**: Modal (e.g., `can`, `will`, `may`)
12. **NN**: Noun, singular or mass (e.g., `dog`, `cat`, `air`)
13. **NNS**: Noun, plural (e.g., `dogs`, `cats`)
14. **NNP**: Proper noun, singular (e.g., `John`, `London`)
15. **NNPS**: Proper noun, plural (e.g., `Americans`, `Canadians`)
16. **PDT**: Predeterminer (e.g., `all` the, `both` my)
17. **POS**: Possessive ending (e.g., the `'s` in `dog's`)
18. **PRP**: Personal pronoun (e.g., `I`, `he`, `she`)
19. **PRP\$**: Possessive pronoun (e.g., `my`, `his`, `her`)
20. **RB**: Adverb (e.g., `quickly`, `happily`, `very`)
21. **RBR**: Adverb, comparative (e.g., `faster`, `better`, `more`)
22. **RBS**: Adverb, superlative (e.g., `fastest`, `best`, `most`)
23. **RP**: Particle (e.g., `up`, `down`, `off`)
24. **SYM**: Symbol (e.g., `%`, `+`, `=`)
25. **TO**: `to` (e.g., `to` the store)
26. **UH**: Interjection (e.g., `oh`, `wow`, `ouch`)
27. **VB**: Verb, base form (e.g., `run`, `eat`, `sing`)
28. **VBD**: Verb, past tense (e.g., `ran`, `ate`, `sang`)
29. **VBG**: Verb, gerund or present participle (e.g., `running`, `eating`, `singing`)
30. **VBN**: Verb, past participle (e.g., `run`, `eaten`, `sung`)
31. **VBP**: Verb, non-3rd person singular present (e.g., `run`, `eat`, `sing`)
32. **VBZ**: Verb, 3rd person singular present (e.g., `runs`, `eats`, `sings`)
33. **WDT**: Wh-determiner (e.g., `which`, `what`)
34. **WP**: Wh-pronoun (e.g., `who`, `whom`)
35. **WP\$**: Possessive wh-pronoun (e.g., `whose`)
36. **WRB**: Wh-adverb (e.g., `where`, `when`, `why`)
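
When reading tagger output it can help to turn tags back into descriptions, or to collapse them into broader word classes. The following is a minimal sketch, not part of NLTK: `TAG_DESCRIPTIONS`, `describe_tag`, and `coarse_category` are hypothetical names, the dictionary covers only a few of the tags listed above, and the `tagged` list is hand-written for illustration rather than real tagger output.

```python
# A hand-picked subset of the tags listed above (not the full tagset).
TAG_DESCRIPTIONS = {
    "DT": "Determiner",
    "JJ": "Adjective",
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "NNP": "Proper noun, singular",
    "RB": "Adverb",
    "VB": "Verb, base form",
    "VBD": "Verb, past tense",
    "VBZ": "Verb, 3rd person singular present",
}


def describe_tag(tag: str) -> str:
    """Return the description for a tag, or a placeholder if it is not in the subset."""
    return TAG_DESCRIPTIONS.get(tag, "Unknown tag")


def coarse_category(tag: str) -> str:
    """Collapse a fine-grained tag into a broad word class using its prefix."""
    prefixes = (("NN", "noun"), ("VB", "verb"), ("JJ", "adjective"), ("RB", "adverb"))
    for prefix, category in prefixes:
        if tag.startswith(prefix):
            return category
    return "other"


# Hand-written (word, tag) pairs for illustration only.
tagged = [("The", "DT"), ("dogs", "NNS"), ("ran", "VBD"), ("quickly", "RB")]
for word, tag in tagged:
    print(f"{word:10} {tag:5} {describe_tag(tag):35} {coarse_category(tag)}")
```

The prefix check works because related tags share their first two letters: `NN`, `NNS`, `NNP`, and `NNPS` are all nouns, and every `VB*` tag is a verb form.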

In [1]:
paragraph = '''
  So, first of all, let me assert my firm belief that the only thing we have to fear is fear itself — nameless, unreasoning, unjustified terror which paralyzes needed efforts to convert retreat into advance. In every dark hour of our national life a leadership of frankness and of vigor has met with that understanding and support of the people themselves which is essential to victory. And I am convinced that you will again give that support to leadership in these critical days. More important, a host of unemployed citizens face the grim problem of existence, and an equally great number toil with little return. Only a foolish optimist can deny the dark realities of the moment. Our greatest primary task is to put people to work. This is no unsolvable problem if we face it wisely and courageously. There are many ways in which it can be helped, but it can never be helped merely by talking about it. We must act and act quickly.
I am prepared under my constitutional duty to recommend the measures that a stricken Nation in the midst of a stricken world may require. These measures, or such other measures as the Congress may build out of its experience and wisdom, I shall seek, within my constitutional authority, to bring to speedy adoption. But in the event that the Congress shall fail to take one of these two courses, and in the event that the national emergency is still critical, I shall not evade the clear course of duty that will then confront me. I shall ask the Congress for the one remaining instrument to meet the crisis — broad Executive power to wage a war against the emergency, as great as the power that would be given to me if we were in fact invaded by a foreign foe.
'''

In [19]:
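import nltk
from nltk.corpus import stopwords
# Model for nltk.pos_tag. The tokenizer data and stop word list are also needed below;
# if one is missing, NLTK raises a LookupError naming the resource to download.
nltk.download('averaged_perceptron_tagger_eng')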

[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!


True

In [20]:
sentences = nltk.sent_tokenize(paragraph)

In [21]:
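# Find the POS tags for each sentence
stop_words = set(stopwords.words('english'))  # build the set once, not once per word
for sentence in sentences:
  words = nltk.word_tokenize(sentence)
  # Case-sensitive filter: capitalized stop words such as 'So' and 'In' are kept
  words = [word for word in words if word not in stop_words]
  # The tagger only sees the filtered words, so it loses the context the removed words gave
  tagged = nltk.pos_tag(words)
  print(tagged)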

[('So', 'RB'), (',', ','), ('first', 'RB'), (',', ','), ('let', 'VB'), ('assert', 'JJ'), ('firm', 'JJ'), ('belief', 'NN'), ('thing', 'NN'), ('fear', 'NN'), ('fear', 'VBP'), ('—', 'NNP'), ('nameless', 'NN'), (',', ','), ('unreasoning', 'JJ'), (',', ','), ('unjustified', 'JJ'), ('terror', 'NN'), ('paralyzes', 'VBZ'), ('needed', 'VBN'), ('efforts', 'NNS'), ('convert', 'VBP'), ('retreat', 'NN'), ('advance', 'NN'), ('.', '.')]
[('In', 'IN'), ('every', 'DT'), ('dark', 'JJ'), ('hour', 'NN'), ('national', 'JJ'), ('life', 'NN'), ('leadership', 'NN'), ('frankness', 'JJ'), ('vigor', 'NN'), ('met', 'VBD'), ('understanding', 'JJ'), ('support', 'NN'), ('people', 'NNS'), ('essential', 'JJ'), ('victory', 'NN'), ('.', '.')]
[('And', 'CC'), ('I', 'PRP'), ('convinced', 'VBD'), ('give', 'JJ'), ('support', 'NN'), ('leadership', 'NN'), ('critical', 'JJ'), ('days', 'NNS'), ('.', '.')]
[('More', 'RBR'), ('important', 'JJ'), (',', ','), ('host', 'NN'), ('unemployed', 'JJ'), ('citizens', 'NNS'), ('face', 'VBP')
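
The loop above removes stop words before tagging, so the tagger never sees words such as `me` or `to`; that is one plausible reason `assert` comes out as `JJ` in the first sentence. An alternative is to tag the full token sequence first and drop stop words afterwards. The sketch below assumes a hypothetical helper, `tag_then_filter`, that accepts any tagger callable. Unlike the cell above, it lowercases each word before the stop word check, which is a design choice and not something the original code does.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

TaggedToken = Tuple[str, str]
Tagger = Callable[[List[str]], List[TaggedToken]]


def tag_then_filter(
    tokens: Sequence[str],
    tagger: Tagger,
    stop_words: Iterable[str],
) -> List[TaggedToken]:
    """Tag every token in context, then drop the stop words from the result."""
    stop_set = {word.lower() for word in stop_words}
    tagged = tagger(list(tokens))  # the tagger sees the complete sentence
    return [(word, tag) for word, tag in tagged if word.lower() not in stop_set]
```

With NLTK, this could be called as `tag_then_filter(nltk.word_tokenize(sentence), nltk.pos_tag, stopwords.words('english'))`. The resulting tags depend on the installed tagger model.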

In [22]:
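# Tag each word on its own: with no surrounding context, the tagger has only the word itself to go on
for word in "The Eiffel Tower is a beautiful monument".split():
  print(nltk.pos_tag([word]))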

[('The', 'DT')]
[('Eiffel', 'NN')]
[('Tower', 'NN')]
[('is', 'VBZ')]
[('a', 'DT')]
[('beautiful', 'NN')]
[('monument', 'NN')]
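
Because each word is passed to the tagger on its own, the output above reflects tagging without context (note `beautiful` tagged as `NN`). To see how much context matters, the same tokens can be tagged in a single call and the two results compared. This is a minimal sketch with hypothetical helper names (`tag_in_isolation`, `tag_in_context`, `changed_tags`); it accepts any tagger callable and is not tied to a particular model.

```python
from typing import Callable, List, Sequence, Tuple

TaggedToken = Tuple[str, str]
Tagger = Callable[[List[str]], List[TaggedToken]]


def tag_in_isolation(tokens: Sequence[str], tagger: Tagger) -> List[TaggedToken]:
    """One tagger call per token, as in the cell above: no context is available."""
    return [tagger([token])[0] for token in tokens]


def tag_in_context(tokens: Sequence[str], tagger: Tagger) -> List[TaggedToken]:
    """A single tagger call over the whole sentence."""
    return list(tagger(list(tokens)))


def changed_tags(
    isolated: Sequence[TaggedToken],
    in_context: Sequence[TaggedToken],
) -> List[Tuple[str, str, str]]:
    """Return (word, isolated_tag, context_tag) for every token whose tag differs."""
    return [
        (word, iso_tag, ctx_tag)
        for (word, iso_tag), (_, ctx_tag) in zip(isolated, in_context)
        if iso_tag != ctx_tag
    ]
```

With NLTK, setting `tokens = "The Eiffel Tower is a beautiful monument".split()` and calling `changed_tags(tag_in_isolation(tokens, nltk.pos_tag), tag_in_context(tokens, nltk.pos_tag))` would list the words whose tags shift once the tagger can see the whole sentence.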
