# NLTK VADER model for sentiment analysis of an Amazon customer review dataset

In [17]:
import nltk
import pandas as pd

from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# download all the nltk corpora (this only needs to be done once)
nltk.download('all')

# load the amazon review dataset
df = pd.read_csv('https://raw.githubusercontent.com/pycaret/pycaret/master/datasets/amazon.csv')
df

Unnamed: 0,reviewText,Positive
0,This is a one of the best apps acording to a b...,1
1,This is a pretty good version of the game for ...,1
2,this is a really cool game. there are a bunch ...,1
3,"This is a silly game and can be frustrating, b...",1
4,This is a terrific game on any pad. Hrs of fun...,1
...,...,...
19995,this app is fricken stupid.it froze on the kin...,0
19996,Please add me!!!!! I need neighbors! Ginger101...,1
19997,love it! this game. is awesome. wish it had m...,1
19998,I love love love this app on my side of fashio...,1


The most common methods for sentiment analysis:
* lexicon-based - using a set of predefined rules
* ML - training a model to identify the sentiment of a piece of text based on a set of labeled training data
* pre-trained transformer-based deep learning - BERT, GPT-4

## Pre-processing Text

Pre-processing first cleans the raw text and then transforms it with:

* tokenization - breaking text down into individual tokens
* stop word removal - filtering out very common words; nltk has a built-in list of stop words to be used as a filter
* stemming - reducing words to their root forms by stripping suffixes
* lemmatization - reducing words to their dictionary form (lemma) based on their part of speech

![Preprocessing Text](preprocessing_text.png)
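
To make the four steps above concrete, here is a minimal, dependency-free sketch on a single sentence. The stop word set, the suffix rules and the lemma table are tiny hypothetical stand-ins chosen for illustration. They are not nltk's resources, and the real `word_tokenize`, `stopwords` and `WordNetLemmatizer` are considerably more careful.

```python
import re

# Hypothetical miniature resources, for illustration only (not nltk's data)
TOY_STOP_WORDS = {"the", "are", "in", "a", "is"}
TOY_LEMMAS = {"children": "child", "running": "run", "better": "good"}

def toy_tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def toy_stem(token):
    """Crude suffix stripping; it can produce non-words, as stemmers often do."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def toy_lemmatize(token):
    """Dictionary lookup; returns a real word, or the token unchanged."""
    return TOY_LEMMAS.get(token, token)

tokens = toy_tokenize("The children are running in the gardens.")
content = [t for t in tokens if t not in TOY_STOP_WORDS]
stems = [toy_stem(t) for t in content]
lemmas = [toy_lemmatize(t) for t in content]

print(tokens)   # all tokens, punctuation included
print(content)  # stop words removed
print(stems)    # suffixes stripped
print(lemmas)   # mapped to dictionary forms
```

With these toy rules the stemmer turns "running" into `runn`, while the lemma table gives `run`. "children" is left untouched by the stemmer but mapped to `child` by the lemma lookup, which is the practical difference between the two steps.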

In [18]:
def preprocess_text(text):
    """ A function that tokenizes the input text, removes stop words and lemmatizes the tokens"""
    
    # tokenize the text
    tokens = word_tokenize(text.lower())
    
    # remove stop words
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [token for token in tokens if token not in stop_words]
    
    # lemmatize the tokens (lemmatize() treats each token as a noun unless a pos argument is given)
    lemmatizer = WordNetLemmatizer()
    lemmatized_tokens = [lemmatizer.lemmatize(token) for token in filtered_tokens]

    # join tokens back into a string
    preprocessed_text = ' '.join(lemmatized_tokens)
    
    return preprocessed_text

In [19]:
# preprocess the text
df['preprocessText'] = df['reviewText'].apply(preprocess_text)
df

Unnamed: 0,reviewText,Positive,preprocessText
0,This is a one of the best apps acording to a b...,1,one best apps acording bunch people agree bomb...
1,This is a pretty good version of the game for ...,1,pretty good version game free . lot different ...
2,this is a really cool game. there are a bunch ...,1,really cool game . bunch level find golden egg...
3,"This is a silly game and can be frustrating, b...",1,"silly game frustrating , lot fun definitely re..."
4,This is a terrific game on any pad. Hrs of fun...,1,terrific game pad . hr fun . grandkids love . ...
...,...,...,...
19995,this app is fricken stupid.it froze on the kin...,0,app fricken stupid.it froze kindle wont allow ...
19996,Please add me!!!!! I need neighbors! Ginger101...,1,please add ! ! ! ! ! need neighbor ! ginger101...
19997,love it! this game. is awesome. wish it had m...,1,love ! game . awesome . wish free stuff house ...
19998,I love love love this app on my side of fashio...,1,love love love app side fashion story fight wo...


## Lexicon-based Bag of Words Model

In a Bag of Words model each document is represented as a "bag of words": word order and the meaning of words in context are discarded, and each word becomes a separate numerical feature whose value is the number of times the word appears in the text. In the lexicon-based approach, each word has a sentiment score that is looked up in a sentiment lexicon. A sentence is tokenized and each token is matched against the words in the lexicon to find its sentiment. A combining function such as the sum or the average then produces the final prediction for the whole text.
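
The lookup-and-combine idea can be sketched in a few lines of plain Python. The lexicon below is a hypothetical toy with made-up scores, not the VADER lexicon, and the sum is used as the combining function.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> sentiment score (not the VADER lexicon)
TOY_LEXICON = {"love": 3.0, "great": 3.0, "fun": 2.0, "stupid": -2.5, "froze": -1.5}

def bag_of_words(text):
    """Count how often each whitespace-separated token occurs; word order is discarded."""
    return Counter(text.lower().split())

def lexicon_score(text):
    """Sum the lexicon score of every known word, weighted by its count."""
    bag = bag_of_words(text)
    return sum(TOY_LEXICON.get(word, 0.0) * count for word, count in bag.items())

print(bag_of_words("love love love app"))
print(lexicon_score("love love love app"))       # positive total
print(lexicon_score("app stupid froze kindle"))  # negative total
```

Words missing from the lexicon contribute nothing, so the result depends entirely on lexicon coverage. That is one reason VADER adds rules on top of the plain lookup.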

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based pretrained sentiment analysis tool that is specifically attuned to sentiments expressed in social media: short sentences with some slang and abbreviations.

In [21]:
# initialize a SentimentIntensityAnalyzer object from the nltk.sentiment.vader module
sia = SentimentIntensityAnalyzer()

# get back a dictionary of different scores
sia.polarity_scores('I love this movie')

{'neg': 0.0, 'neu': 0.323, 'pos': 0.677, 'compound': 0.6369}
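
The `compound` value is a normalized version of the summed word valences. The sketch below reproduces it for this sentence under two assumptions that are not stated in this notebook: that "love" has a valence of 3.2 in the VADER lexicon while the other words score 0, and that the sum is squashed with a constant `alpha = 15`, as in the reference VADER implementation.

```python
import math

def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)

# Assumed valence of "love"; "I", "this" and "movie" are assumed to score 0
valence_sum = 3.2
compound = round(normalize(valence_sum), 4)
print(compound)  # 0.6369, matching the 'compound' entry above
```

Because of this squashing, `compound` always stays between -1 and 1, whatever the length of the text.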

In [20]:
def get_sentiment(text):
    """ A function that returns the sentiment label of the input text:
        1 if the compound score is greater than 0, and 0 otherwise. """
        
    # polarity_scores() of the SentimentIntensityAnalyzer object returns a dictionary
    # with the negative, neutral, positive and compound scores; keep the compound score
    score = sia.polarity_scores(text)['compound']
        
    # determine sentiment 
    if score > 0:
        sentiment = 1
    else:
        sentiment = 0
        
    return sentiment


# apply get_sentiment() function to the preprocessText column to obtain a new column called sentiment
df['sentiment'] = df['preprocessText'].apply(get_sentiment)
df

Unnamed: 0,reviewText,Positive,preprocessText,sentiment
0,This is a one of the best apps acording to a b...,1,one best apps acording bunch people agree bomb...,1
1,This is a pretty good version of the game for ...,1,pretty good version game free . lot different ...,1
2,this is a really cool game. there are a bunch ...,1,really cool game . bunch level find golden egg...,1
3,"This is a silly game and can be frustrating, b...",1,"silly game frustrating , lot fun definitely re...",1
4,This is a terrific game on any pad. Hrs of fun...,1,terrific game pad . hr fun . grandkids love . ...,1
...,...,...,...,...
19995,this app is fricken stupid.it froze on the kin...,0,app fricken stupid.it froze kindle wont allow ...,0
19996,Please add me!!!!! I need neighbors! Ginger101...,1,please add ! ! ! ! ! need neighbor ! ginger101...,1
19997,love it! this game. is awesome. wish it had m...,1,love ! game . awesome . wish free stuff house ...,1
19998,I love love love this app on my side of fashio...,1,love love love app side fashion story fight wo...,1
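
Since the `Positive` column holds the true label, a natural check is how often `sentiment` agrees with it. The helper below is a hypothetical sketch, not part of the notebook, and it is applied only to the nine rows visible in the output above, so the result says nothing about the full dataset.

```python
def agreement(true_labels, predicted_labels):
    """Return the fraction of positions where the two label sequences match."""
    pairs = list(zip(true_labels, predicted_labels))
    if not pairs:
        raise ValueError("need at least one label pair")
    return sum(t == p for t, p in pairs) / len(pairs)

# The nine rows visible in the output above
positive = [1, 1, 1, 1, 1, 0, 1, 1, 1]   # Positive column
sentiment = [1, 1, 1, 1, 1, 0, 1, 1, 1]  # sentiment column
print(agreement(positive, sentiment))
```

On the full frame the same helper could be called as `agreement(df['Positive'], df['sentiment'])`, since iterating over a pandas Series yields its values.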
