# Parts of Speech Tagging

Source: https://towardsdatascience.com/a-practitioners-guide-to-natural-language-processing-part-i-processing-understanding-text-9f4abfd13e72

For any language, syntax and structure usually go hand in hand: a set of specific rules, conventions, and principles governs the way words are combined into phrases, phrases into clauses, and clauses into sentences.

Knowing the structure and syntax of a language helps in many areas, such as text processing, annotation, and parsing, which in turn support downstream tasks like text classification or summarization.

__Parts of speech (POS)__ are specific lexical categories to which words are assigned, based on their syntactic context and role. Most words fall into one of the following major categories:

+ __N(oun)__: This usually denotes words that depict some object or entity, which may be living or nonliving. Some examples would be _fox_, _dog_, _book_, and so on. The POS tag symbol for nouns is N.

+ __V(erb)__: Verbs are words that are used to describe certain actions, states, or occurrences. There are many further subcategories, such as auxiliary, reflexive, and transitive verbs. Some typical examples of verbs would be _running_, _jumping_, _read_, and _write_. The POS tag symbol for verbs is V.

+ __Adj(ective)__: Adjectives are words used to describe or qualify other words, typically nouns and noun phrases. In the phrase _beautiful flower_, the noun (N) _flower_ is described or qualified by the adjective (ADJ) _beautiful_. The POS tag symbol for adjectives is ADJ.

+ __Adv(erb)__: Adverbs usually act as modifiers for other words, including verbs, adjectives, and other adverbs. In the phrase _very beautiful flower_, the adverb (ADV) _very_ modifies the adjective (ADJ) _beautiful_, indicating the degree to which the flower is beautiful. The POS tag symbol for adverbs is ADV.

Besides these four major categories, other parts of speech occur frequently in English, including pronouns, prepositions, interjections, conjunctions, determiners, and many others. Furthermore, each POS tag such as the noun (N) can be subdivided into finer categories, like __singular or mass nouns (NN)__, __singular proper nouns (NNP)__, and __plural nouns (NNS)__. These fine-grained symbols come from the Penn Treebank tag set.
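Because the fine-grained Penn Treebank tags share a prefix with their coarse category (NN, NNS, NNP, and NNPS are all nouns), collapsing them back to the four major categories above can be done with a prefix check. The sketch below assumes that prefix rule and lumps every remaining tag (determiners, pronouns, and so on) under a hypothetical `OTHER` label; `to_coarse` is an illustrative helper, not part of NLTK or spaCy.

```python
# Assumed prefix rule: each Penn Treebank tag starts with its coarse category's prefix.
COARSE_PREFIXES = {
    "NN": "N",    # NN, NNS, NNP, NNPS
    "VB": "V",    # VB, VBD, VBG, VBN, VBP, VBZ
    "JJ": "ADJ",  # JJ, JJR, JJS
    "RB": "ADV",  # RB, RBR, RBS (wh-adverb WRB does not match)
}


def to_coarse(fine_tag: str) -> str:
    """Map a Penn Treebank tag to N, V, ADJ, or ADV; anything else becomes OTHER."""
    for prefix, coarse in COARSE_PREFIXES.items():
        if fine_tag.startswith(prefix):
            return coarse
    return "OTHER"


# Tags that NLTK assigns to the example sentence used later in this section.
print([to_coarse(tag) for tag in ["DT", "VBZ", "NNP", "NN", "NN", "NN"]])
# ['OTHER', 'V', 'N', 'N', 'N', 'N']
```

A fuller mapping would need explicit cases for tags such as WRB, which this prefix rule sends to `OTHER`.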

The process of classifying words and labeling them with POS tags is called __parts of speech tagging__, or __POS tagging__.
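As a deliberately naive illustration of the tagging step, the sketch below labels each token by looking it up in a small lexicon and falls back to NN for unknown words. `LEXICON` and `lookup_tag` are hypothetical names invented for this example. The taggers used below are statistical models that also consider surrounding words, which a pure lookup cannot do: it would give _book_ the same tag in "read a book" and "book a flight".

```python
# Hypothetical toy lexicon mapping lowercase words to Penn Treebank tags.
# Illustration only: real taggers learn these decisions from annotated text.
LEXICON = {
    "the": "DT",
    "very": "RB",
    "beautiful": "JJ",
    "flower": "NN",
    "fox": "NN",
    "dogs": "NNS",
    "jumps": "VBZ",
}


def lookup_tag(tokens, default="NN"):
    """Pair each token with its lexicon tag, falling back to `default` for unknown words."""
    return [(token, LEXICON.get(token.lower(), default)) for token in tokens]


print(lookup_tag("The very beautiful flower".split()))
# [('The', 'DT'), ('very', 'RB'), ('beautiful', 'JJ'), ('flower', 'NN')]
print(lookup_tag(["checker"]))
# [('checker', 'NN')]
```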

In [1]:
```python
sentence = 'This is NLP text processsing checker'
sentence
```

```
'This is NLP text processsing checker'
```

In [2]:
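```python
import nltk

# Split the sentence into tokens, then assign each token a Penn Treebank tag.
# Assumes NLTK's tokenizer and tagger models are installed (see nltk.download()).
nltk_pos_tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
nltk_pos_tagged
```

```
[('This', 'DT'),
 ('is', 'VBZ'),
 ('NLP', 'NNP'),
 ('text', 'NN'),
 ('processsing', 'NN'),
 ('checker', 'NN')]
```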

In [3]:
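```python
import pandas as pd

# Build one row per (word, tag) pair, then transpose so each token becomes a column.
pd.DataFrame(nltk_pos_tagged,
             columns=['Word', 'POS tag']).T
```

|         | 0    | 1   | 2   | 3    | 4           | 5       |
|---------|------|-----|-----|------|-------------|---------|
| Word    | This | is  | NLP | text | processsing | checker |
| POS tag | DT   | VBZ | NNP | NN   | NN          | NN      |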
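The list of `(word, tag)` tuples returned by `nltk.pos_tag` is easy to filter further. As a small sketch, the snippet below keeps only the tokens whose Penn Treebank tag starts with NN (any kind of noun), for instance to collect candidate keywords. The tagged list is copied from the NLTK output above so the example runs without NLTK installed; the variable names are illustrative.

```python
# (word, tag) pairs copied from the nltk.pos_tag output above.
tagged = [
    ("This", "DT"),
    ("is", "VBZ"),
    ("NLP", "NNP"),
    ("text", "NN"),
    ("processsing", "NN"),
    ("checker", "NN"),
]

# NN, NNS, NNP and NNPS all start with "NN", so one prefix test catches every noun tag.
nouns = [word for word, tag in tagged if tag.startswith("NN")]
print(nouns)  # ['NLP', 'text', 'processsing', 'checker']
```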


In [5]:
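```python
import spacy

# Load the small English pipeline, disabling the dependency parser and the named
# entity recognizer, which POS tagging does not need (the tagger stays enabled).
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])
```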

In [6]:
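```python
# Iterating over a Doc yields Token objects: tag_ holds the fine-grained tag,
# pos_ the coarse-grained (Universal) part of speech.
sentence_nlp = nlp(sentence)
spacy_pos_tagged = [(word, word.tag_, word.pos_) for word in sentence_nlp]
spacy_pos_tagged
```

```
[(This, 'DT', 'DET'),
 (is, 'VBZ', 'VERB'),
 (NLP, 'NNP', 'PROPN'),
 (text, 'NN', 'NOUN'),
 (processsing, 'VBG', 'VERB'),
 (checker, 'NN', 'NOUN')]
```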

In [7]:
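```python
pd.DataFrame(spacy_pos_tagged,
             columns=['Word', 'POS tag', 'Tag type']).T
```

|          | 0    | 1    | 2     | 3    | 4           | 5       |
|----------|------|------|-------|------|-------------|---------|
| Word     | This | is   | NLP   | text | processsing | checker |
| POS tag  | DT   | VBZ  | NNP   | NN   | VBG         | NN      |
| Tag type | DET  | VERB | PROPN | NOUN | VERB        | NOUN    |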
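Both libraries emit Penn Treebank tags (for spaCy, the `tag_` values in the POS tag row), while spaCy's `pos_` adds a coarser Universal POS label (the Tag type row). Comparing the fine-grained tags position by position shows where the two taggers disagree. The tags below are copied from the outputs of In [2] and In [6], so this sketch runs without either library; the variable names are illustrative.

```python
# Fine-grained tags copied from the NLTK (In [2]) and spaCy (In [6]) outputs.
words = ["This", "is", "NLP", "text", "processsing", "checker"]
nltk_tags = ["DT", "VBZ", "NNP", "NN", "NN", "NN"]
spacy_tags = ["DT", "VBZ", "NNP", "NN", "VBG", "NN"]

# Keep only the positions where the two taggers assign different tags.
disagreements = [
    (word, nltk_tag, spacy_tag)
    for word, nltk_tag, spacy_tag in zip(words, nltk_tags, spacy_tags)
    if nltk_tag != spacy_tag
]
print(disagreements)  # [('processsing', 'NN', 'VBG')]
```

Here the only disagreement is the misspelled _processsing_, which NLTK tags as a singular noun (NN) and spaCy as a gerund or present participle (VBG).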
