# Sentiment Analysis of Reviews using NLTK

Sentiment analysis is a technique used to determine the sentiment or opinion expressed in a piece of text. It has various applications, such as analyzing customer reviews, tracking social media sentiment, and supporting market research. In this post, we will perform sentiment analysis on a dataset of reviews using the NLTK library and Pandas.

To begin, we need the NLTK library and its VADER (Valence Aware Dictionary and sEntiment Reasoner) module, a lexicon- and rule-based sentiment analysis tool that works out of the box without any training on our data. The VADER lexicon is downloaded separately with `nltk.download("vader_lexicon")`. We also need the Pandas library for data manipulation and analysis. Make sure both libraries are installed before running the code.

### Setup and Dependencies

In [1]:
# import libraries
import nltk
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# download the VADER lexicon used by SentimentIntensityAnalyzer
nltk.download("vader_lexicon")

# load the reviews dataset and drop rows with missing values
data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/reviews%20data.csv")
data = data.dropna()

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


### Sentiment Analysis using NLTK

Next, we initialize the SentimentIntensityAnalyzer from NLTK's VADER module and use it to score each review in the dataset. Each result contains four values: positive, negative, neutral, and compound. The positive, negative, and neutral scores are the proportions of the text that fall into each category, so they sum to approximately 1. The compound score is a separate measure: the summed valence of the words in the review, normalized to a single value between -1 (most negative) and +1 (most positive).
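
For intuition about why the compound score always stays between -1 and +1, here is a minimal sketch of the normalization step VADER is documented to apply to the summed word valences. The function name `normalize` and the constant `alpha = 15` follow the VADER reference implementation as far as I know, but treat this as an illustration rather than the library's exact code path; the raw sums below are made up.

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded summed valence into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


# Made-up raw valence sums, from strongly negative to strongly positive.
for raw in (-8.0, -1.0, 0.0, 1.0, 8.0):
    print(f"raw valence sum {raw:+.1f} -> compound {normalize(raw):+.4f}")
```

Because the result saturates as the raw sum grows, long reviews with many positive words tend to end up with compound scores very close to 1.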

In [2]:
# score each review once, then add one column per score
sentiments = SentimentIntensityAnalyzer()
scores = [sentiments.polarity_scores(review) for review in data["Review"]]
data["Positive"] = [s["pos"] for s in scores]
data["Negative"] = [s["neg"] for s in scores]
data["Neutral"] = [s["neu"] for s in scores]
data["Compound"] = [s["compound"] for s in scores]
data.head()

| | Review | Positive | Negative | Neutral | Compound |
|---|---|---|---|---|---|
| 0 | nice hotel expensive parking got good deal sta... | 0.285 | 0.072 | 0.643 | 0.9747 |
| 1 | ok nothing special charge diamond member hilto... | 0.189 | 0.11 | 0.701 | 0.9787 |
| 2 | nice rooms not 4* experience hotel monaco seat... | 0.219 | 0.081 | 0.7 | 0.9889 |
| 3 | unique, great stay, wonderful time hotel monac... | 0.385 | 0.06 | 0.555 | 0.9912 |
| 4 | great stay great stay, went seahawk game aweso... | 0.221 | 0.135 | 0.643 | 0.9797 |
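
As a quick sanity check on the output above, the positive, negative, and neutral scores in each row should add up to roughly 1, since they are proportions of the same text. The sketch below copies the five displayed rows by hand; the small tolerance allows for the rounding in the printed values.

```python
# (positive, negative, neutral) scores copied from the five rows shown above.
rows = [
    (0.285, 0.072, 0.643),
    (0.189, 0.110, 0.701),
    (0.219, 0.081, 0.700),
    (0.385, 0.060, 0.555),
    (0.221, 0.135, 0.643),
]

# Sum the three proportions per row; each total should be close to 1.
totals = [round(pos + neg + neu, 3) for pos, neg, neu in rows]
print(totals)
```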


### Categorizing Sentiments

To categorize the sentiment of each review, we assign labels based on the compound score. If the compound score is greater than or equal to 0.05, we classify the review as "Positive." If the compound score is less than or equal to -0.05, we classify it as "Negative." Otherwise, we label it as "Neutral."

In [3]:
# assign a sentiment label to each review based on its compound score
score = data["Compound"].values
sentiment = []
for i in score:
    if i >= 0.05:
        sentiment.append('Positive')
    elif i <= -0.05:
        sentiment.append('Negative')
    else:
        sentiment.append('Neutral')
data["Sentiment"] = sentiment
data.head()

| | Review | Positive | Negative | Neutral | Compound | Sentiment |
|---|---|---|---|---|---|---|
| 0 | nice hotel expensive parking got good deal sta... | 0.285 | 0.072 | 0.643 | 0.9747 | Positive |
| 1 | ok nothing special charge diamond member hilto... | 0.189 | 0.11 | 0.701 | 0.9787 | Positive |
| 2 | nice rooms not 4* experience hotel monaco seat... | 0.219 | 0.081 | 0.7 | 0.9889 | Positive |
| 3 | unique, great stay, wonderful time hotel monac... | 0.385 | 0.06 | 0.555 | 0.9912 | Positive |
| 4 | great stay great stay, went seahawk game aweso... | 0.221 | 0.135 | 0.643 | 0.9797 | Positive |
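
The thresholding rule can also be written as a small reusable function, which makes the boundary cases easy to check. This is only a sketch: the name `label_sentiment` and its `threshold` parameter are hypothetical, and the example scores are made up apart from the first one, which comes from the table above.

```python
def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score to a label using symmetric thresholds."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# Example compound scores, including both boundary values.
examples = [0.9747, 0.05, 0.0, -0.05, -0.6]
labels = [label_sentiment(c) for c in examples]
print(list(zip(examples, labels)))
```

With the DataFrame from this post, `data["Compound"].apply(label_sentiment)` should produce the same labels as the loop.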


### Analyzing the Results

After assigning sentiment labels to each review, we can examine the distribution of sentiments in the dataset. By calling value_counts() on the "Sentiment" column, we obtain a count of reviews for each sentiment category.

In [4]:
# count the reviews in each sentiment class
data["Sentiment"].value_counts()

Positive    18831
Negative     1569
Neutral        91
Name: Sentiment, dtype: int64
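
Raw counts are easier to interpret as shares of the whole dataset. The sketch below recomputes them from the counts printed above using plain Python; in Pandas, `data["Sentiment"].value_counts(normalize=True)` returns the same proportions directly.

```python
# Class counts copied from the value_counts() output above.
counts = {"Positive": 18831, "Negative": 1569, "Neutral": 91}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Print each class with its count and its share of all reviews.
for label, share in shares.items():
    print(f"{label:<8} {counts[label]:>6} {share:6.1%}")
```

The positive class accounts for more than 90% of the reviews, so the dataset is heavily skewed toward positive sentiment.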

### Conclusion

Sentiment analysis is a powerful technique for understanding the opinions and attitudes expressed in text data. In this post, we used NLTK's VADER module and Pandas to score a dataset of reviews and label each one as positive, negative, or neutral. The large majority of reviews (18,831 of 20,491) were labeled positive. The same approach can help businesses gain insights from customer reviews, track social media sentiment, and make data-driven decisions.

Keep in mind that sentiment analysis has limitations. VADER relies on a predefined lexicon and a fixed set of rules, so it may miss context, sarcasm, or domain-specific language. Nevertheless, it provides a valuable starting point for understanding sentiment patterns in text data.

From here, you can experiment with different models, preprocessing techniques, and data visualization approaches to extract deeper insights. The code in this post provides a foundation for conducting sentiment analysis on your own text data.