## CLASSWORK 4
## SENTIMENT ANALYSIS

Sentiment analysis gauges the emotion expressed in text, typically classifying it as positive, negative, or neutral, which helps in understanding user feedback or market sentiment. Approaches range from machine learning models to lexicon-based scoring; this classwork uses a lexicon, assigning each review a sentiment score based on the positive and negative opinion words it contains.

### IMPORT PANDAS AND READ .JSON FILE

In [6]:
# Import the pandas package and read the line-delimited .json file

import pandas as pd
df = pd.read_json('Arts_Crafts_and_Sewing_5.json', lines=True)
df.sample(10)

index,overall,verified,reviewTime,reviewerID,asin,style,reviewerName,reviewText,summary,unixReviewTime,vote,image
340047,5,True,"02 15, 2016",AY50B98IYQDZS,B00KN9Q3W2,,Ron,Wow! a great product.,a great product.,1455494400,,
397686,5,True,"01 9, 2017",A2HUQ6PDINCHJQ,B0197F6XX6,{'Color:': ' Decadent Pies'},Shelby P,These paints are such amazing quality! I absol...,Must Buy!,1483920000,,
339701,5,True,"09 17, 2016",A1WV8CKF9ZAIHK,B00KKWCRB8,,Kindle Customer,Very pretty,Five Stars,1474070400,,
347332,5,True,"03 13, 2015",A3N0IVMFP86TA7,B00MMSLEIA,,Linda J. Vincent,I love crafting with the Hampton Art Impressio...,Great stamps for great fun cards.,1426204800,,
411101,4,True,"02 9, 2017",AZ2QB6OHV12HP,B01EO701N0,{'Color:': ' 8x8mm/Ball Flower 02'},Amazon Customer,good,Four Stars,1486598400,,
89624,5,True,"04 23, 2012",A1HYD5ONR0BCTZ,B000WWML6W,{'Color:': ' Cloud White'},Nicole Michelle Stagg,It's perfect for light and dark colors. Even o...,Very White,1335139200,3.0,
481649,5,True,"01 30, 2017",A1HGYLF2PVMBGR,B00S16WTLS,"{'Size:': ' 500 yd/475m', 'Color:': ' Iced Cof...",Cheryl,Th I is the best thread,Threa,1485734400,,
230580,5,True,"01 2, 2014",A3684FFQPI7LRE,B005D7WTK4,,antgranma,"I got hooked on beading, and this Caddy is so ...",Excellent,1388620800,,
140905,5,False,"10 3, 2014",A2MDKZSZ7RCD0R,B001CE38Z2,,Miss Melinda,"Miracle worker, Is what I call it. Anyone who ...",Anyone who gave it bad marks Don't know what t...,1412294400,,
90429,5,True,"07 23, 2014",A2RBZ2EC25FFA6,B000WYZV6W,{'Size:': ' Size 11'},Happy in the OC,I have knitted quite a few blankets and sweate...,Loyal to Clover Bamboo Needles,1406073600,11.0,


### NLTK AND OPINION LEXICON 

The Natural Language Toolkit (NLTK) is used to access the Opinion Lexicon, a list of positive and negative opinion (sentiment) words.

In [None]:
# Importing nltk package

from sklearn import preprocessing
import nltk

### DOWNLOADING OPINION LEXICON

In [None]:
# Downloading opinion lexicon

nltk.download('opinion_lexicon')

### PRINTING INFORMATION

In [7]:
# Importing components and printing

from nltk.corpus import opinion_lexicon                     # using opinion lexicon dataset from nltk.corpus
from nltk.tokenize import word_tokenize

print('Total number of words in opinion lexicon', len(opinion_lexicon.words()))
print('Examples of positive words in opinion lexicon',      # printing the first 10 positive opinion words
      opinion_lexicon.positive()[:10])
print('Examples of negative words in opinion lexicon',      # printing the first 10 negative opinion words
      opinion_lexicon.negative()[:10])


Total number of words in opinion lexicon 6789
Examples of positive words in opinion lexicon ['a+', 'abound', 'abounds', 'abundance', 'abundant', 'accessable', 'accessible', 'acclaim', 'acclaimed', 'acclamation']
Examples of negative words in opinion lexicon ['2-faced', '2-faces', 'abnormal', 'abolish', 'abominable', 'abominably', 'abominate', 'abomination', 'abort', 'aborted']


[nltk_data] Downloading package opinion_lexicon to
[nltk_data]     /Users/sid/nltk_data...
[nltk_data]   Package opinion_lexicon is already up-to-date!


### RENAMING COLUMN

In [None]:
# Download the punkt tokenizer (used by word_tokenize) and rename the reviewText column to text

nltk.download('punkt')
df.rename(columns={"reviewText": "text"}, inplace=True)

### INITIALIZING SCORES

In [None]:
# Initializing Scores

pos_score = 1
neg_score = -1

### DICTIONARY FOR SCORING

The code creates a sentiment scoring dictionary by leveraging NLTK's opinion lexicon, assigning positive and negative scores to words for analyzing review texts.

In [8]:
# Create a dictionary that we can use to score the review text

word_dict = {}

# Adding the positive words to the dictionary

for word in opinion_lexicon.positive():
    word_dict[word] = pos_score

# Adding the negative words to the dictionary

for word in opinion_lexicon.negative():
    word_dict[word] = neg_score


[nltk_data] Downloading package punkt to /Users/sid/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
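
One consequence of the two loops above is that a word listed in both the positive and the negative list would end up with the negative score, because the second loop overwrites the first. The sketch below illustrates this with hypothetical toy lists (`toy_positive`, `toy_negative`, `toy_dict`); it does not use the real lexicon, and the overlapping word is chosen only for illustration.

```python
# Hypothetical toy word lists standing in for the positive and negative lexicon lists.
toy_positive = ["good", "sharp"]
toy_negative = ["bad", "sharp"]

toy_dict = {}

# Same order as in the notebook: positive words first, then negative words.
for word in toy_positive:
    toy_dict[word] = 1
for word in toy_negative:
    toy_dict[word] = -1   # overwrites any score set by the first loop

# Words present in both lists can be found with a set intersection.
overlap = sorted(set(toy_positive) & set(toy_negative))

print(overlap)
print(toy_dict["sharp"])
```

Running the same intersection on the real lists is one way to check whether any overlap exists before relying on the scores.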


### bing_liu_score FUNCTION

The function lowercases and tokenizes a review, then sums the scores from word_dict for every token found in the lexicon. Words that are not in the lexicon contribute nothing, so the result is the number of positive words minus the number of negative words.

In [9]:
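# bing_liu_score function

def bing_liu_score(text):
    sentiment_score = 0
    # Lowercase the text and split it into word tokens
    bag_of_words = word_tokenize(text.lower())
    for word in bag_of_words:
        # Add +1 or -1 for words in the lexicon; other words are ignored
        if word in word_dict:
            sentiment_score += word_dict[word]
    return sentiment_score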
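
To make the scoring rule concrete, here is a minimal self-contained sketch of the same idea. It is only an illustration under stated assumptions: `toy_word_dict` is a hypothetical four-word lexicon rather than the Opinion Lexicon, and `toy_tokenize` uses a simple regular expression in place of `word_tokenize`, so it runs without any NLTK downloads.

```python
import re

# Hypothetical toy lexicon standing in for word_dict (not the real Opinion Lexicon).
toy_word_dict = {"great": 1, "love": 1, "terrible": -1, "broke": -1}

def toy_tokenize(text):
    # Assumption: a simple regex stands in for nltk's word_tokenize.
    return re.findall(r"[a-z']+", text.lower())

def toy_lexicon_score(text):
    # Sum +1 / -1 for every token found in the lexicon; unknown tokens count as 0.
    return sum(toy_word_dict.get(token, 0) for token in toy_tokenize(text))

reviews = [
    "Great book but the index is terrible.",   # one positive, one negative word
    "I love it, great quality!",               # two positive words
    "It broke after a week.",                  # one negative word
]
scores = [toy_lexicon_score(review) for review in reviews]
print(scores)  # [0, 2, -1]
```

Because the score is a plain sum, a review that mixes praise and criticism can cancel out to 0, and longer reviews have more chances to accumulate a large score.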

### REPLACING NULL VALUES

In [10]:
# Filling empty values with 'no review'

df['text'] = df['text'].fillna('no review')
df['Bing_Liu_Score'] = df['text'].apply(bing_liu_score)

### HEAD METHOD


In [11]:
# Show the rating, text, and score for the first 10 rows

df[['overall', 'text', 'Bing_Liu_Score']].head(10)

index,overall,text,Bing_Liu_Score
0,4,Contains some interesting stitches.,1
1,5,I'm a fairly experienced knitter of the one-co...,22
2,4,Great book but the index is terrible. Had to w...,0
3,5,I purchased the Kindle edition which is incred...,4
4,5,Very well laid out and very easy to read.\n\nT...,5
5,5,"Beginning her career as a freelance knitter, M...",15
6,5,This is a terrific stitch handbook (and I have...,9
7,4,The book needs to be coil bound. The content i...,1
8,5,I really am enjoying this book! I like the siz...,12
9,5,Just received this book and looked over it cov...,6


### GROUPBY OVERALL

In [12]:
# Group by the star rating in 'overall' and take the mean Bing_Liu_Score of each group

df.groupby('overall').agg({'Bing_Liu_Score':'mean'})

overall,Bing_Liu_Score
1,-0.255049
2,0.566098
3,1.158796
4,2.028146
5,2.130005
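
To see what the grouped mean computes, here is a minimal sketch on a toy frame. The ratings and scores in `toy` are made up for illustration and are not taken from the review dataset; only the column names follow the notebook.

```python
import pandas as pd

# Hypothetical toy data: two 1-star reviews and three 5-star reviews.
toy = pd.DataFrame({
    "overall": [1, 1, 5, 5, 5],
    "Bing_Liu_Score": [-2, 0, 1, 3, 2],
})

# One mean score per distinct rating, indexed by the rating value.
mean_by_rating = toy.groupby("overall")["Bing_Liu_Score"].mean()
print(mean_by_rating)
```

Selecting the column before calling `mean()` returns a Series; the `agg({'Bing_Liu_Score': 'mean'})` form used in the notebook returns a one-column DataFrame with the same values.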
