# CLASSWORK-4 SENTIMENT ANALYSIS

## SENTIMENT ANALYSIS

Sentiment analysis, also known as opinion mining, is a natural language processing (NLP) technique used to determine the sentiment or emotion expressed in a piece of text. The goal is to identify whether the expressed sentiment is positive, negative, or neutral.

In [11]:
# Load the Arts, Crafts and Sewing review dataset (one JSON record per line)
import pandas as pd
df = pd.read_json('Arts_Crafts_and_Sewing_5.json', lines=True)
df.sample(10)  # show 10 random rows


index,overall,verified,reviewTime,reviewerID,asin,style,reviewerName,reviewText,summary,unixReviewTime,vote,image
476419,5,True,"12 23, 2014",A29XP7EH860KL6,B00LQL4D46,,Mary Kay,Looked through them when the arrived and are v...,Very nice fabric,1419292800,2.0,
248900,5,True,"09 1, 2013",AVUZS1HR1YQIA,B006VO5NRE,,vickie evans,I am new to painting and I have had some brush...,Great starter burshes.,1377993600,,
462633,5,True,"03 5, 2015",AO1TA54MWD5FV,B00D8IWZRM,,judith a.,Great!!!,great,1425513600,,
122978,5,True,"11 18, 2013",A287IGR5MHN6BB,B001687ZDA,{'Size:': ' By The Yard'},MeganD,"Again, this was for my Halloween Costume and m...",Stayed on like a champ.,1384732800,,
389753,5,True,"09 2, 2016",A1P46Y314WIU65,B015VW38FU,"{'Size:': ' 6 Sizes (4-10mm)', 'Color:': ' Sil...",M Nelson,Lots of rings! Great for projects. It was smal...,Great for projects,1472774400,,
485894,5,True,"01 8, 2017",AFL3ZU1KWP3LD,B0107VDKEO,{'Color:': ' Metal Style#1 Clear CrystalAB'},Linda Stiles,KIND OF EXPENSIVE... BUT GREAT LOOKING!!!,BUT GREAT LOOKING!,1483833600,,
115387,5,True,"09 22, 2015",A1PAGHECG401K1,B0013JNBIK,{'Color:': ' Dove Gray'},Chel Micheline,This gray ink is so light you can barely see i...,Dove Grey: great faint gray ink for no-line co...,1442880000,,
149605,5,True,"02 8, 2014",A1OMG69AAAJPJO,B001H83B9Q,{'Size:': ' 1 Pack'},Linda S. Marche,thanks alot,Five Stars,1391817600,,
299242,5,True,"11 8, 2017",A3SCMXMA72PCX9,B00DG8VAYS,{'Size:': ' 5-Inch'},Star,These rings are sturdy and great for crafts!,Five Stars,1510099200,,
152012,4,True,"08 30, 2016",A3O44KT3HDB7ZB,B001KZH232,{'Size:': ' 50-Yard by 36-Inch Wide'},mg,Serves its purpose. Bought it for tracing out ...,Four Stars,1472515200,,
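
The `lines=True` argument used above tells `pd.read_json` to treat the file as JSON Lines, that is, one JSON object per line rather than a single JSON array. The following is a minimal sketch of that idea using only the standard library. The two records are invented for illustration (only the field names are borrowed from the dataset), and the loop is a rough approximation of the parsing step, not pandas' actual implementation.

```python
import io
import json

# Two made-up records in JSON Lines form: one JSON object per line.
# The field names mirror the dataset's columns; the values are invented.
raw = io.StringIO(
    '{"overall": 5, "reviewText": "Great yarn", "summary": "Five Stars"}\n'
    '{"overall": 2, "reviewText": "Poor quality", "summary": "Two Stars"}\n'
)

# Parse each non-empty line independently, which is roughly what
# lines=True asks pd.read_json to do before building the DataFrame.
records = [json.loads(line) for line in raw if line.strip()]

print(len(records))
print(records[0]["reviewText"])
```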


## IMPORTING LIBRARIES

In [12]:
#Importing preprocessing module from sklearn
from sklearn import preprocessing
import nltk

## DOWNLOADING OPINION LEXICON

In [13]:
#downloads the Opinion Lexicon dataset from NLTK
nltk.download('opinion_lexicon')


[nltk_data] Downloading package opinion_lexicon to
[nltk_data]     /Users/likithareddykotla/nltk_data...
[nltk_data]   Package opinion_lexicon is already up-to-date!


True

## IMPORTING COMPONENTS FROM NLTK

In [14]:
#opinion_lexicon provides access to positive and negative words
from nltk.corpus import opinion_lexicon
from nltk.tokenize import word_tokenize #word_tokenize is used for tokenization.



## PRINTING INFORMATION

In [15]:
# Print the total number of words in the opinion lexicon
print('Total number of words in opinion lexicon', len(opinion_lexicon.words()))
# Display the first 10 positive and the first 10 negative words
print('Examples of positive words in opinion lexicon',
      opinion_lexicon.positive()[:10])
print('Examples of negative words in opinion lexicon',
      opinion_lexicon.negative()[:10])


Total number of words in opinion lexicon 6789
Examples of positive words in opinion lexicon ['a+', 'abound', 'abounds', 'abundance', 'abundant', 'accessable', 'accessible', 'acclaim', 'acclaimed', 'acclamation']
Examples of negative words in opinion lexicon ['2-faced', '2-faces', 'abnormal', 'abolish', 'abominable', 'abominably', 'abominate', 'abomination', 'abort', 'aborted']


## DOWNLOADING 'PUNKT' TOKENIZER

In [16]:
# 'punkt' is a pre-trained, unsupervised sentence tokenizer; word_tokenize relies on it to split text into sentences before splitting those into words
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/likithareddykotla/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

## PREPARING THE REVIEW TEXT COLUMN FOR SENTIMENT SCORING

In [17]:
# Rename the 'reviewText' column to 'text'; this is the column that will be scored
df.rename(columns={"reviewText": "text"}, inplace=True)

## INITIALIZING SCORES

In [18]:
#set the positive and negative scores to be assigned to words in the word_dict dictionary.
pos_score = 1
neg_score = -1


## CREATING WORD DICTIONARY

In [24]:
# Build the scoring dictionary, starting with the positive words
word_dict = {}
for word in opinion_lexicon.positive():
    word_dict[word] = pos_score  # each positive word gets pos_score

# Add the negative words
for word in opinion_lexicon.negative():
    word_dict[word] = neg_score  # each negative word gets neg_score
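
One consequence of filling the dictionary in two passes is worth noting: if a word were to appear in both the positive and the negative list, the second loop would overwrite its positive score. The sketch below illustrates this with two hypothetical mini-lexicons (`toy_positive` and `toy_negative` are made up, and "envious" is placed in both on purpose); it makes no claim about which words overlap in the real lexicon.

```python
# Hypothetical mini-lexicons; "envious" is deliberately placed in both lists.
toy_positive = ["great", "sturdy", "envious"]
toy_negative = ["terrible", "flimsy", "envious"]

pos_score, neg_score = 1, -1

toy_dict = {}
for word in toy_positive:
    toy_dict[word] = pos_score
for word in toy_negative:
    toy_dict[word] = neg_score  # overwrites any earlier positive entry

# Words present in both lists end up with the negative score.
overlap = set(toy_positive) & set(toy_negative)
print(overlap)
print(toy_dict["envious"])
```

Running the same set intersection on `opinion_lexicon.positive()` and `opinion_lexicon.negative()` would show whether the real dictionary is affected.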


## BING_LIU_SCORE 

The bing_liu_score function assigns a sentiment score to a piece of text by tokenizing it and summing the word_dict scores of its tokens. Tokens that are not in the dictionary contribute nothing to the score.

In [25]:
# Sentiment analysis function using a simple bag-of-words approach
def bing_liu_score(text):
    sentiment_score = 0
    bag_of_words = word_tokenize(text.lower())  #word_tokenize splits the text into individual words.
    for word in bag_of_words:
        if word in word_dict:
            sentiment_score += word_dict[word]
    return sentiment_score  
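
To see how this kind of scoring behaves on individual sentences, here is a small self-contained sketch. It is not the function above: `toy_word_dict` is a hypothetical four-word lexicon and a simple regular expression stands in for `word_tokenize`, so that the example runs without any NLTK data.

```python
import re

# Hypothetical stand-in for word_dict, so the sketch runs without NLTK data.
toy_word_dict = {"great": 1, "sturdy": 1, "terrible": -1, "expensive": -1}

def toy_score(text, lexicon=toy_word_dict):
    """Sum the lexicon scores of every token found in the text."""
    # A crude regex tokenizer, standing in for word_tokenize.
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)

print(toy_score("Great book but the index is terrible."))         # +1 and -1 cancel out
print(toy_score("These rings are sturdy and great for crafts!"))  # two positive words
print(toy_score("This is not great."))                            # negation is ignored
```

The last call shows a limitation of the bag-of-words approach: "not great" still counts as positive, because each word is scored in isolation.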


## APPLYING BING_LIU_SCORE() TO THE TEXT COLUMN

In [21]:
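# Replace missing values in the 'text' column with the string 'no review'
# (assigning the result back avoids an in-place fillna on a selected column)
df['text'] = df['text'].fillna('no review')
# Score every review and store the result in a new column
df['Bing_Liu_Score'] = df['text'].apply(bing_liu_score)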


## FIRST 10 ROWS

In [22]:
# Preview the sentiment scores for the first 10 rows
df[['overall', 'text', 'Bing_Liu_Score']].head(10)


index,overall,text,Bing_Liu_Score
0,4,Contains some interesting stitches.,1
1,5,I'm a fairly experienced knitter of the one-co...,22
2,4,Great book but the index is terrible. Had to w...,0
3,5,I purchased the Kindle edition which is incred...,4
4,5,Very well laid out and very easy to read.\n\nT...,5
5,5,"Beginning her career as a freelance knitter, M...",15
6,5,This is a terrific stitch handbook (and I have...,9
7,4,The book needs to be coil bound. The content i...,1
8,5,I really am enjoying this book! I like the siz...,12
9,5,Just received this book and looked over it cov...,6
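
The introduction frames the goal as deciding whether a text is positive, negative, or neutral, whereas the scores above are plain integers. One possible way to bridge the two is sketched below. The helper `score_to_label` and its cut-off at zero are assumptions made for illustration; they are not part of the original notebook.

```python
def score_to_label(score):
    """Map a numeric lexicon score to a coarse sentiment label.

    The threshold of zero is an assumption made for this sketch.
    """
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for score in (22, 0, -3):
    print(score, score_to_label(score))
```

In the notebook this could be applied with `df['Bing_Liu_Score'].apply(score_to_label)`.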


## AVERAGE BING LIU SENTIMENT SCORES GROUPED BY OVERALL RATINGS

In [23]:
# Calculate the mean Bing Liu sentiment score for each overall rating
df.groupby('overall').agg({'Bing_Liu_Score': 'mean'})


overall,Bing_Liu_Score
1,-0.255049
2,0.566098
3,1.158796
4,2.028146
5,2.130005
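
To make the aggregation explicit, the per-rating mean can also be worked out by hand. The sketch below uses only the standard library and a handful of invented (rating, score) pairs standing in for the `overall` and `Bing_Liu_Score` columns, so the numbers are unrelated to the table above.

```python
from collections import defaultdict
from statistics import mean

# Invented (rating, score) pairs standing in for the 'overall' and
# 'Bing_Liu_Score' columns.
rows = [(5, 4), (5, 2), (4, 1), (4, 3), (1, -2), (1, 0)]

# Collect the scores that belong to each rating.
scores_by_rating = defaultdict(list)
for rating, score in rows:
    scores_by_rating[rating].append(score)

# Average each group, ordered by rating.
mean_by_rating = {
    rating: mean(scores) for rating, scores in sorted(scores_by_rating.items())
}
print(mean_by_rating)
```

This grouping and averaging is what `groupby('overall')` followed by `agg({'Bing_Liu_Score': 'mean'})` does in one step.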
