C035
Krisha Goti

# Implementing POS tagging using Python

**B.1 Tasks given in PART A to be completed here**


In [None]:
import nltk

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

from nltk.tokenize import word_tokenize
from nltk import pos_tag

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [None]:
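# Sentence to tag
sentence = "krisha is in love with Python."

# Split the sentence into word and punctuation tokens
words = word_tokenize(sentence)
# Assign a Penn Treebank POS tag to each token
pos_tags = pos_tag(words)
pos_tags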

[('krisha', 'NN'),
 ('is', 'VBZ'),
 ('in', 'IN'),
 ('love', 'NN'),
 ('with', 'IN'),
 ('Python', 'NNP'),
 ('.', '.')]

# Implementing NER using Python

In [None]:
import nltk

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk import ne_chunk

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping chunkers/maxent_ne_chunker.zip.
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.


In [None]:
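sentence = "John is in love with python language and is working at Google."

# Tokenize, POS-tag, then chunk the tagged tokens into a named-entity tree
words = word_tokenize(sentence)
pos_tags = pos_tag(words)
ner_tags = ne_chunk(pos_tags)

# Collect (entity text, label) pairs from the labelled subtrees.
# Only these three labels are kept; other chunker labels such as 'GPE' are skipped.
named_entities = []
for subtree in ner_tags.subtrees():
    if subtree.label() in ['ORGANIZATION', 'PERSON', 'LOCATION']:
        entity = ' '.join([word for word, tag in subtree.leaves()])
        named_entities.append((entity, subtree.label()))

print(named_entities)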

[('John', 'PERSON'), ('Google', 'ORGANIZATION')]
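
The `(entity, label)` pairs printed above are often easier to use once grouped by label. The following is a minimal sketch of one way to do that; the helper name `group_entities` is hypothetical and not part of NLTK, and the input list is copied from the output above so the snippet runs on its own.

```python
from collections import defaultdict


def group_entities(named_entities):
    """Group (entity, label) pairs into a {label: [entities]} dictionary."""
    grouped = defaultdict(list)
    for entity, label in named_entities:
        grouped[label].append(entity)
    return dict(grouped)


# Pairs copied from the NER output above
named_entities = [('John', 'PERSON'), ('Google', 'ORGANIZATION')]
grouped = group_entities(named_entities)
print(grouped)
# {'PERSON': ['John'], 'ORGANIZATION': ['Google']}
```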


**B.2 Observations and Learning:**

In this experiment we observed and studied the following:
1. How POS tagging and NER work
2. How to design and implement POS tagging and NER


**B.3 Conclusion:**

After successfully completing this experiment we are able to:

a) Perform part-of-speech tagging, assigning tags such as noun, pronoun, verb, adjective, adverb, preposition, conjunction, and interjection to each word.

b) Identify named entities in text data using Named Entity Recognition (NER).


**B.4 Question of curiosity:**


1. **How can HMMs be used in POS tagging?**
* Hidden Markov Models (HMMs) can be used in Part-of-Speech (POS) tagging to predict the most likely sequence of POS tags for the words in a sentence. An HMM is a probabilistic sequence model whose parameters are estimated from annotated training data: transition probabilities (how likely one tag is to follow another) and emission probabilities (how likely a tag is to produce a given word).
* In POS tagging, each word in the sentence is an observation and the POS tags are the hidden states. The probability of a tag sequence is the product of the transition and emission probabilities along it. Rather than scoring every possible tag sequence separately, the Viterbi algorithm uses dynamic programming to find the most likely one efficiently, and that sequence is returned as the predicted POS tags for the sentence.
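
The Viterbi decoding step described above can be sketched in plain Python. This is a toy illustration, not a trained tagger: the two-tag set and all probabilities below are invented for the example rather than estimated from a corpus, and the `floor` value used for unseen words and transitions is an assumed, very crude form of smoothing.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely tag sequence for `words` under a toy HMM."""
    if not words:
        return []

    def lp(p):
        # Log-probability, with a small floor for unseen events
        return math.log(p if p > 0 else floor)

    # best[i][t]: log-probability of the best tag path ending in tag t at word i
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Invented toy parameters (N = noun, V = verb)
tags = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {
    "N": {"krisha": 0.4, "love": 0.3, "python": 0.3},
    "V": {"loves": 0.7, "love": 0.3},
}

print(viterbi(["krisha", "love", "python"], tags, start_p, trans_p, emit_p))
# ['N', 'V', 'N']
```

In this toy model "love" is equally likely as a noun or a verb, so the transition probabilities decide: a verb is more likely than a noun after the noun "krisha", which is how an HMM uses context to resolve an ambiguous word.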


2. **What are the applications of NER?**
* Information Extraction: NER can be used to automatically extract structured information from unstructured text, such as identifying names of people, organizations, locations, and other entities from news articles, social media posts, and other sources.
* Search and recommendation: NER can improve the accuracy of search results and recommendations by identifying relevant entities in a query or user profile, and matching them with relevant documents or products.
* Machine Translation: NER can be used to identify named entities in the source language and map them to their corresponding entities in the target language, which can improve the quality of machine translation.
* Chatbots and virtual assistants: NER can be used to identify entities in user queries, such as names of places, dates, and times, and provide relevant responses or actions.
* Sentiment Analysis: NER can help identify entities that are related to specific opinions or sentiments in text, which can improve the accuracy of sentiment analysis.

3. **‘That former Sri Lanka skipper and ace batsman Aravinda De Silva is a man of few words was very much evident on Wednesday when the legendary batsman, who has always let his bat talk, struggled to answer a barrage of questions at a function to promote the cricket league in the city’.**

* The above is a news item in Times of India (9/8/12). Assume you have only 4 tags N (noun), V (verb), J (adjective), R (adverb). Manually POS tag the above text.

The four tags have no category for function words, so determiners and possessives are tagged J, and prepositions, conjunctions, and particles are tagged R.

That (R) former (J) Sri Lanka (N) skipper (N) and (R) ace (J) batsman (N) Aravinda De Silva (N) is (V) a (J) man (N) of (R) few (J) words (N) was (V) very (R) much (R) evident (J) on (R) Wednesday (N) when (R) the (J) legendary (J) batsman (N), who (N) has (V) always (R) let (V) his (J) bat (N) talk (V), struggled (V) to (R) answer (V) a (J) barrage (N) of (R) questions (N) at (R) a (J) function (N) to (R) promote (V) the (J) cricket (N) league (N) in (R) the (J) city (N).
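
For comparison with the manual tagging, the Penn Treebank tags returned by `pos_tag` can be collapsed onto the same four coarse tags. The sketch below is one possible mapping, not a standard one: the function name `coarse_tag` and the prefix table are assumptions made for illustration, and tags with no close counterpart (prepositions, determiners, punctuation) are returned as `None` rather than forced into a category. The input is the POS output from Part B.1, copied here so the snippet runs without NLTK.

```python
def coarse_tag(penn_tag):
    """Map a Penn Treebank tag to N, V, J, or R; return None if nothing fits."""
    # Order matters: 'PRP$' (possessive pronoun) must be checked before 'PRP'
    prefix_map = [
        ("PRP$", "J"),  # his, her: treated like adjectives
        ("PRP", "N"),   # he, she: treated like nouns
        ("NN", "N"),    # NN, NNS, NNP, NNPS
        ("VB", "V"),    # VB, VBD, VBG, VBN, VBP, VBZ
        ("MD", "V"),    # modal verbs
        ("JJ", "J"),    # JJ, JJR, JJS
        ("RB", "R"),    # RB, RBR, RBS
    ]
    for prefix, coarse in prefix_map:
        if penn_tag.startswith(prefix):
            return coarse
    return None


# POS output from Part B.1
pos_tags = [('krisha', 'NN'), ('is', 'VBZ'), ('in', 'IN'), ('love', 'NN'),
            ('with', 'IN'), ('Python', 'NNP'), ('.', '.')]

coarse = [(word, coarse_tag(tag)) for word, tag in pos_tags]
print(coarse)
# [('krisha', 'N'), ('is', 'V'), ('in', None), ('love', 'N'), ('with', None), ('Python', 'N'), ('.', None)]
```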


