<a href="https://colab.research.google.com/github/prad69/NLP/blob/main/Applications_using_NLTK_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Analysis
### Using the Pre-Trained VADER Model
https://github.com/cjhutto/vaderSentiment

VADER (Valence Aware Dictionary and sEntiment Reasoner)

This notebook first downloads the VADER lexicon using NLTK and creates an instance of `SentimentIntensityAnalyzer`, which is used to score the sentiment of a text. It then defines a small dataset of example sentences with their sentiment labels.

When classifying by the compound score, a text is labelled positive if the score is at least 0.05, negative if it is at most -0.05, and neutral otherwise. The evaluation at the end of this section instead picks the largest of the `neg`, `neu`, and `pos` scores.

Note that VADER is a lexicon- and rule-based approach, so it will not give the best results in every case, and you may need to adapt the thresholds or the code to your data.

In [3]:
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# Download the VADER lexicon
nltk.download('vader_lexicon')

# Instantiate the SentimentIntensityAnalyzer
vader = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


VADER is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. It uses a list of lexical features (e.g., words) that are labelled by their semantic orientation as either positive or negative. As a result, VADER reports not only the polarity of a text but also how positive or negative it is.

* VADER assigns a sentiment rating to each entry in its lexicon. Every word in the lexicon is rated for whether it is positive or negative, and for how positive or negative it is.

* Note: Not every word (token) is present in the lexicon. The lexicon needs good coverage of the sentiment-bearing words in your text; otherwise the scores will not be very accurate.
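
One rough way to check coverage is to count how many alphabetic tokens of a text appear in the lexicon. The sketch below uses a tiny hand-made lexicon so that it runs without NLTK; `lexicon_coverage`, `toy_lexicon`, and the ratings in it are hypothetical names and values chosen for illustration.

```python
def lexicon_coverage(tokens, lexicon):
    """Return the fraction of alphabetic tokens that have an entry in the lexicon."""
    words = [t.lower() for t in tokens if t.isalpha()]  # Ignore punctuation tokens
    if not words:
        return 0.0
    return sum(w in lexicon for w in words) / len(words)

# Hypothetical mini-lexicon: word -> rating (values are illustrative only)
toy_lexicon = {"amazing": 2.8, "terrible": -2.1}

tokens = ["This", "is", "an", "amazing", "product", "!"]
coverage = lexicon_coverage(tokens, toy_lexicon)
print(coverage)  # 1 of the 5 words is covered
```

NLTK's analyzer appears to expose its lexicon as a dictionary attribute (`vader.lexicon`); treat that attribute name as an assumption and check it before passing it in place of `toy_lexicon`. A low fraction is not a problem by itself, since most words carry no sentiment; what matters is whether the words that do carry sentiment are covered.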

## Let's apply VADER on a sample text.

In [4]:
text = 'This is an amazing product!'
vader.polarity_scores(text)  # Score the polarity of the text
# VADER returns a dictionary with four values.

{'neg': 0.0, 'neu': 0.494, 'pos': 0.506, 'compound': 0.6239}

> The first three keys (`neg`, `neu`, `pos`) are the proportions of the text that fall into the negative, neutral, and positive categories. They are ratios that add up to about 1, not probabilities.

> The last key, `compound`, is the sum of the lexicon ratings of the words in the text, adjusted by VADER's rules and then **normalized to lie between -1 (most extreme negative) and +1 (most extreme positive)**.
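
The `neg`, `neu`, and `pos` values for the sample sentence can be reproduced by hand. The sketch below follows my understanding of the vaderSentiment reference implementation: each positive or negative word contributes its rating plus one unit of weight, each unrated word contributes one unit to the neutral count, and the punctuation boost goes to the dominant side. The rating of 2.8 for 'amazing' and the boost of 0.292 for one '!' are assumptions not shown in this notebook, and `proportions` is a hypothetical helper name.

```python
def proportions(valences, emphasis=0.0):
    """Split a list of per-word ratings into neg/neu/pos shares (sketch)."""
    pos_sum = sum(v + 1 for v in valences if v > 0)  # Rating plus one unit per positive word
    neg_sum = sum(v - 1 for v in valences if v < 0)  # Rating minus one unit per negative word
    neu_count = sum(1 for v in valences if v == 0)   # One unit per unrated word

    # Punctuation emphasis strengthens whichever side already dominates
    if pos_sum > abs(neg_sum):
        pos_sum += emphasis
    elif pos_sum < abs(neg_sum):
        neg_sum -= emphasis

    total = pos_sum + abs(neg_sum) + neu_count
    if total == 0:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0}
    return {
        "neg": round(abs(neg_sum) / total, 3),
        "neu": round(neu_count / total, 3),
        "pos": round(pos_sum / total, 3),
    }

# 'This is an amazing product!': four unrated words and 'amazing' (assumed rating 2.8)
shares = proportions([0, 0, 0, 2.8, 0], emphasis=0.292)
print(shares)  # {'neg': 0.0, 'neu': 0.494, 'pos': 0.506}
```

Under these assumptions the result matches the dictionary returned by `polarity_scores` for the same sentence, which supports reading the three values as shares of the text rather than as probabilities.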

## Let's analyze the ratings for the tokens in the input text

In [8]:
import nltk
nltk.download('punkt_tab')

text = 'This is an amazing product!'
tokenized_sentence = nltk.word_tokenize(text)
print(tokenized_sentence)

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


['This', 'is', 'an', 'amazing', 'product', '!']


When VADER examines a piece of text, it checks whether each word in the text is present in its lexicon.

In [9]:
pos_word_list = []
neu_word_list = []
neg_word_list = []

for word in tokenized_sentence:
    score = vader.polarity_scores(word)['compound']  # Score each token on its own
    if score >= 0.05:     # Positive: compound score of at least 0.05
        pos_word_list.extend([word, score])
    elif score <= -0.05:  # Negative: compound score of at most -0.05
        neg_word_list.extend([word, score])
    else:                 # Neutral: anything in between
        neu_word_list.extend([word, score])

print('Positive:', pos_word_list)
print('Neutral:', neu_word_list)
print('Negative:', neg_word_list)

# Only the word 'amazing' is present in the lexicon; scored on its own, its compound score is +0.5859.
# A compound score of 0.0 here indicates that the token was not found in the lexicon.

Positive: ['amazing', 0.5859]
Neutral: ['This', 0.0, 'is', 0.0, 'an', 0.0, 'product', 0.0, '!', 0.0]
Negative: []


In [10]:
sentiment = vader.polarity_scores(text)
sentiment

{'neg': 0.0, 'neu': 0.494, 'pos': 0.506, 'compound': 0.6239}

The compound score is computed by summing the lexicon ratings of the words in the text, adjusting the sum according to VADER's rules, and then **normalizing it to lie between -1 (most extreme negative) and +1 (most extreme positive)**.
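
The normalization step can be sketched in a few lines. The formula `x / sqrt(x**2 + alpha)` with `alpha = 15` is what the vaderSentiment reference implementation uses, as far as I can tell. The lexicon rating of 2.8 for 'amazing' and the boost of 0.292 for a single '!' are assumptions that are not shown in this notebook, although they reproduce the scores printed above.

```python
import math

def normalize(score, alpha=15):
    """Squash an unbounded rating sum into the range [-1, 1]."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))  # Clamp to guard against rounding past the bounds

amazing_only = normalize(2.8)           # The token 'amazing' scored on its own
with_exclaim = normalize(2.8 + 0.292)   # The full sentence, including the '!' boost

print(round(amazing_only, 4))  # 0.5859
print(round(with_exclaim, 4))  # 0.6239
```

Because the function flattens out as the sum grows, piling up more sentiment words moves the compound score towards +1 or -1 ever more slowly, which is why thresholds as small as 0.05 still separate neutral text from mildly opinionated text.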

In [12]:
# Decide the sentiment using the neg/neu/pos proportions:
# choose the category with the largest proportion

text = 'This is an amazing product!'
sentiment_dict = vader.polarity_scores(text)
print(sentiment_dict)

neg_prob  = sentiment_dict["neg"]
pos_prob = sentiment_dict["pos"]
neu_prob  = sentiment_dict["neu"]

# Find the category with the largest score (ties resolve in the order negative, positive, neutral)
scores = {"negative": neg_prob, "positive": pos_prob, "neutral": neu_prob}
predicted_sentiment = max(scores, key=scores.get)

predicted_sentiment

{'neg': 0.0, 'neu': 0.494, 'pos': 0.506, 'compound': 0.6239}


'positive'

In [13]:
# You can also decide the sentiment using the compound score

text = 'This is an amazing product!'
sentiment_dict = vader.polarity_scores(text)
print(sentiment_dict)

if sentiment_dict['compound'] >= 0.05: #The threshold should be chosen carefully!
    print("Positive")

elif sentiment_dict['compound'] <= -0.05:
    print("Negative")

else:
    print("Neutral")

{'neg': 0.0, 'neu': 0.494, 'pos': 0.506, 'compound': 0.6239}
Positive


In [14]:
# Taking a sample dataset
dataset = [("This is an amazing product!", "positive"),
           ("I hate this product!", "negative"),
           ("I am indifferent to this product.", "neutral"),
           ("I'm not sure how I feel about this product.", "neutral"),
           ("This product is terrible.", "negative"),
           ("Doesn't match match my expectations", "negative"),
           ("The product could be much better", "neutral"),
           ("I appreciate how terrible it is!", "negative"),
           ("I hate this product but I love the design", "neutral")]

In [15]:
# Test the pre-trained VADER classifier on the dataset above
# and evaluate its performance using accuracy

correct = 0    # Counter for correctly predicted samples
incorrect = 0  # Counter for incorrectly predicted samples
total = 0      # Counter for total samples
for text, sentiment in dataset:  # Each tuple holds the <text> and its <sentiment> label
    prediction = vader.polarity_scores(text)

    # Pick the category with the largest proportion
    # (ties resolve in the order negative, positive, neutral)
    scores = {"negative": prediction["neg"],
              "positive": prediction["pos"],
              "neutral": prediction["neu"]}
    predicted_sentiment = max(scores, key=scores.get)

    total += 1
    if predicted_sentiment == sentiment:
        correct += 1
    else:
        # Report misclassified samples
        incorrect += 1
        print('-----------------------------')
        print(str(incorrect)+')',"Text:",text, '\nActual Sentiment:',sentiment, "\nPredicted Sentiment:",predicted_sentiment)

print('-----------------------------\n')
print(f"Accuracy on the dataset: {correct/total}")

-----------------------------
1) Text: Doesn't match match my expectations 
Actual Sentiment: negative 
Predicted Sentiment: neutral
-----------------------------
2) Text: I hate this product but I love the design 
Actual Sentiment: neutral 
Predicted Sentiment: positive
-----------------------------

Accuracy on the dataset: 0.7777777777777778
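
The evaluation above labels each text by the largest of the `neg`, `neu`, and `pos` scores. A minimal sketch of the alternative, thresholding the compound score, is given below with the decision rule passed in as a function so that the two rules can be compared on the same data. `label_from_compound`, `accuracy`, and the stand-in scores in `fake_compound` are hypothetical; the stand-in exists only so that the sketch runs without NLTK.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score to a label using symmetric thresholds."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

def accuracy(dataset, predict):
    """Fraction of (text, label) pairs for which predict(text) equals the label."""
    hits = sum(predict(text) == label for text, label in dataset)
    return hits / len(dataset)

# Stand-in for a real scorer: made-up compound scores for three toy texts
fake_compound = {"great": 0.62, "awful": -0.46, "okay": 0.0}
toy_dataset = [("great", "positive"), ("awful", "negative"), ("okay", "positive")]

acc = accuracy(toy_dataset, lambda t: label_from_compound(fake_compound[t]))
print(acc)  # 2 of the 3 toy labels match
```

With the analyzer from this notebook, the same rule could be plugged in as `accuracy(dataset, lambda t: label_from_compound(vader.polarity_scores(t)["compound"]))`. The resulting accuracy is not shown here because it was not run as part of this notebook.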


### More about VADER scoring here
https://github.com/cjhutto/vaderSentiment#code-examples

# Word Sense Disambiguation (WSD)
Word sense disambiguation deals with determining the intended meaning of a word in a given context: it identifies the correct sense of a word from a set of possible senses, based on the context in which the word appears.

In [17]:
# Install pre-requisites

# Download the WordNet corpus
nltk.download('wordnet')

# Download the Open Multilingual WordNet data
# (the Lesk algorithm itself ships with NLTK in nltk.wsd and needs no download)
nltk.download('omw')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package omw to /root/nltk_data...


True


## Lesk Algorithm

NLTK's `lesk` function performs the classic Lesk algorithm for Word Sense Disambiguation (WSD) using the dictionary definitions of the ambiguous word.

Given an ambiguous word and the context in which it occurs, Lesk compares the context sentence with the definition of each candidate Synset and returns the Synset whose definition shares the most words with the context.
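
The overlap idea can be sketched without NLTK. The version below is a simplified illustration, not NLTK's implementation: `GLOSSES` is a hand-copied subset of WordNet definitions for 'bank', and `gloss_overlaps` and `simple_lesk` are hypothetical helper names.

```python
# Simplified Lesk: choose the sense whose gloss shares the most tokens with the context.
GLOSSES = {
    "bank.n.01": "sloping land (especially the slope beside a body of water)",
    "depository_financial_institution.n.01": (
        "a financial institution that accepts deposits and channels "
        "the money into lending activities"
    ),
    "savings_bank.n.02": (
        "a container (usually with a slot in the top) for keeping money at home"
    ),
}

def gloss_overlaps(context_tokens, glosses):
    """Count the exact-match tokens shared by the context and each gloss."""
    context = set(context_tokens)
    return {sense: len(context & set(gloss.split())) for sense, gloss in glosses.items()}

def simple_lesk(context_tokens, glosses):
    """Return the sense with the largest overlap (first one wins on a tie)."""
    overlaps = gloss_overlaps(context_tokens, glosses)
    return max(overlaps, key=overlaps.get)

sentence = ['I', 'went', 'to', 'the', 'bank', 'to', 'deposit', 'money', '.']
overlaps = gloss_overlaps(sentence, GLOSSES)
best = simple_lesk(sentence, GLOSSES)
print(overlaps)
print(best)
```

In this sketch the financial-institution sense and the money-box sense both share two tokens with the sentence ('the' and 'money'), so the winner depends purely on how ties are broken. Exact token matching also treats 'deposit' and 'deposits' as different words. Both effects suggest why a plain overlap count can settle on a less intuitive sense for short sentences.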

In [18]:
from nltk.wsd import lesk
sentence = ['I', 'went', 'to', 'the', 'bank', 'to', 'deposit', 'money', '.']

print(lesk(sentence, 'bank'))  # 'bank' is the ambiguous word here.

Synset('savings_bank.n.02')


In [19]:
from nltk.corpus import wordnet as wn
# Let's see all the possible senses of the word 'bank' and their definitions
for ss in wn.synsets('bank'):
    print(ss, ss.definition())

Synset('bank.n.01') sloping land (especially the slope beside a body of water)
Synset('depository_financial_institution.n.01') a financial institution that accepts deposits and channels the money into lending activities
Synset('bank.n.03') a long ridge or pile
Synset('bank.n.04') an arrangement of similar objects in a row or in tiers
Synset('bank.n.05') a supply or stock held in reserve for future use (especially in emergencies)
Synset('bank.n.06') the funds held by a gambling house or the dealer in some gambling games
Synset('bank.n.07') a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal force
Synset('savings_bank.n.02') a container (usually with a slot in the top) for keeping money at home
Synset('bank.n.09') a building in which the business of banking transacted
Synset('bank.n.10') a flight maneuver; aircraft tips laterally about its longitudinal axis (especially in turning)
Synset('bank.v.01') tip laterally
Sy

In [20]:
sent = 'people should be able to marry a person of their choice'.split()
print(lesk(sent, 'able'))
print(lesk(sent, 'able').definition())

Synset('able.s.04')
having a strong healthy body


In [21]:
# Passing the part of speech of the target word to lesk can give better results.
print(lesk(sent, 'able', pos='a'))  # The pos argument is the part of speech; 'a' denotes an adjective.
print(lesk(sent, 'able', pos='a').definition())

Synset('able.a.01')
(usually followed by `to') having the necessary means or skill or know-how or authority to do something


In [22]:
# Input sentences, each paired with the word to disambiguate
text_examples = [("I have a bank account with the SBI bank", "bank"),
                 ("We went for a picnic on the eve of new year", "eve"),
                 ("I play the guitar", "play"),
                 ("I prepared a report for my manager", "report")]

# Perform word sense disambiguation using the Lesk algorithm
from nltk.wsd import lesk
for text, target_word in text_examples:
    tokens = nltk.word_tokenize(text)
    sense = lesk(tokens, target_word)  # No pos is passed here; supplying one may give better results
    if sense:
        print(f'{target_word} : {sense.definition()}')
        print(f'Examples: {sense.examples()}')
        print()

bank : do business with a bank or keep an account at a bank
Examples: ['Where do you bank in this town?']

eve : the latter part of the day (the period of decreasing daylight from late afternoon until nightfall)
Examples: ['he enjoyed the evening light across the lake']

play : (in games or plays or other performances) the time during which play proceeds
Examples: ['rain stopped play in the 4th inning']

report : the general estimation that the public has for a person
Examples: ['he acquired a reputation as an actor before he started writing', 'he was a person of bad report']

