# ---- NLP SENTIMENT ANALYSIS ----

In [1]:
import pandas as pd
import nltk                                                         # ---> Libraries to be used 
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\jairo\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [2]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import warnings
warnings.filterwarnings('ignore')

In [3]:
df_csvreviews = pd.read_csv('./Datasets/processing/reviews.csv')    # ---> Reading of CSV file
pd.set_option('display.max_colwidth', None)                         # ---> Option to show full column content without truncation
df_csvreviews.head()

Unnamed: 0,item_id,recommend,review,user_id,posted_year
0,1250,True,"Simple yet with great replayability. In my opinion does ""zombie"" hordes and team work better than left 4 dead plus has a global leveling system. Alot of down to earth ""zombie"" splattering fun for the whole family. Amazed this sort of FPS is so rare.",76561197970982479,2011
1,22200,True,It's unique and worth a playthrough.,76561197970982479,2011
2,43110,True,Great atmosphere. The gunplay can be a bit chunky at times but at the end of the day this game is definitely worth it and I hope they do a sequel...so buy the game so I get a sequel!,76561197970982479,2011
3,251610,True,"I know what you think when you see this title ""Barbie Dreamhouse Party"" but do not be intimidated by it's title, this is easily one of my GOTYs. You don't get any of that cliche game mechanics that all the latest games have, this is simply good core gameplay. Yes, you can't 360 noscope your friends, but what you can do is show them up with your bad ♥♥♥ dance moves and put them to shame as you show them what true fashion and color combinations are.I know this game says for kids but, this is easily for any age range and any age will have a blast playing this.8/8",js41637,2014
4,227300,True,"For a simple (it's actually not all that simple but it can be!) truck driving Simulator, it is quite a fun and relaxing game. Playing on simple (or easy?) its just the basic WASD keys for driving but (if you want) the game can be much harder and realistic with having to manually change gears, much harder turning, etc. And reversing in this game is a ♥♥♥♥♥, as I imagine it would be with an actual truck. Luckily, you don't have to reverse park it but you get extra points if you do cause it is bloody hard. But this is suprisingly a nice truck driving game and I had a bit of fun with it.",js41637,2013


In [4]:
df_csvreviews.info()                                                

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48498 entries, 0 to 48497
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   item_id      48498 non-null  int64 
 1   recommend    48498 non-null  bool  
 2   review       48463 non-null  object
 3   user_id      48498 non-null  object
 4   posted_year  48498 non-null  int64 
dtypes: bool(1), int64(2), object(2)
memory usage: 1.5+ MB


In [5]:
print((df_csvreviews['review'] == '').sum())                        # ---> Number of empty fields in 'review' column

0


In [6]:
print(df_csvreviews['review'].isna().sum())                         # ---> Number of NaN fields in 'review' column

35


In [7]:
nan_review = df_csvreviews[df_csvreviews['review'].isna() | (df_csvreviews['review'] == '')]  # ---> Rows whose 'review' field is NaN or empty
nan_review.head(38)

Unnamed: 0,item_id,recommend,review,user_id,posted_year
482,570,True,,76561198070263209,2013
716,215530,True,,Azrafael,2013
3577,550,True,,76561198093337643,2014
7177,233840,True,,BomberThink,2013
7178,211820,True,,BomberThink,2014
12271,218620,True,,terencemok,2014
17009,211820,True,,shez13,2014
17010,227320,True,,shez13,2014
17667,208090,True,,rpsntc,2014
17695,620,True,,damo4lyf,2014


##  I. Sentiment Analysis Conditions

The 'sentiment' column must be created by applying NLP sentiment analysis (the 'sentiment_analysis' function below) to the 'review' column, according to the following scale:
-> '0' if the review is negative.
-> '1' if the review is neutral.
-> '2' if the review is positive.
If the written review is missing or empty, the value recorded is 1.

In [8]:
df_csvreviews['review'] = df_csvreviews['review'].fillna('').astype(str)  # ---> NaN replaced by '' and 'review' column converted to 'str' type
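
Depending on the pandas version, `astype(str)` alone may turn a missing value into the literal text 'nan' instead of an empty string, so the empty-review rule would then only apply by accident. A minimal sketch on toy data (the series below is illustrative, not taken from the dataset) of normalizing missing reviews to '' before the conversion:

```python
import pandas as pd

# Toy stand-in for the 'review' column: one real text, one missing value, one blank text.
reviews = pd.Series(["Great game", float("nan"), "   "], name="review")

# Replace missing values with '' first, then make sure everything is a string.
cleaned = reviews.fillna("").astype(str)

# A review counts as empty when nothing is left after stripping whitespace.
is_empty = cleaned.str.strip() == ""

print(cleaned.tolist())
print(is_empty.tolist())
```

With this normalization, both the missing value and the whitespace-only text are flagged as empty, which is the case the scale above maps to 1.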

In [9]:
analyzer = SentimentIntensityAnalyzer()                             # ---> Single analyzer instance, reused for every review

def sentiment_analysis(paragraph):                                  # ---> Creation of function 'sentiment_analysis'
    '''
    Classify the sentiment of one text from the 'review' column with NLTK's VADER analyzer.
    'paragraph' is the review text (type str); None or blank text is treated as neutral.
    The return value is a numeric category (type int) based on the VADER compound score:
    -> 0: Negative (compound <= -0.05)
    -> 1: Neutral  (any other score, or an empty review)
    -> 2: Positive (compound >= 0.05)
    '''
    if paragraph is None or paragraph.strip() == '':                # ---> Empty reviews are neutral; checked before scoring
        return 1

    scores = analyzer.polarity_scores(paragraph)

    if scores['compound'] >= 0.05:
        return 2
    elif scores['compound'] <= -0.05:
        return 0
    else:
        return 1

`About NLTK and VADER's sentiment analyzer`

Ease of Implementation: 
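NLTK is a widely used Python library for natural language processing, and its VADER sentiment analyzer is easy to use: once the 'vader_lexicon' resource is downloaded, a single call to polarity_scores() returns the scores for a text. It needs no training data and little configuration compared with more advanced approaches, which makes it suitable for practical applications.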

Computational Efficiency: 
VADER is a lexicon- and rule-based analyzer, so scoring a text is computationally cheap and it can handle large data sets, such as the 48,498 reviews used here, without excessive cost. This matters in enterprise environments where processing time is a constraint.

Compound Polarity Score: 
The compound score returned by VADER is a single normalized value between -1 (most negative) and +1 (most positive), which gives an easy-to-interpret representation of review polarity. This score is used to assign the numerical sentiment categories of the proposed scale: 2 if compound >= 0.05, 0 if compound <= -0.05, and 1 otherwise.
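
The mapping from score to category can be sketched in isolation, which makes the boundary cases easy to check. The helper name `label_from_compound` and the `threshold` parameter are assumptions introduced for illustration; the 0.05 cut-off is the one used in the function above.

```python
NEGATIVE, NEUTRAL, POSITIVE = 0, 1, 2

def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to the 0/1/2 scale (hypothetical helper)."""
    if compound >= threshold:
        return POSITIVE
    if compound <= -threshold:
        return NEGATIVE
    return NEUTRAL

# Scores exactly on the threshold fall into the positive/negative classes,
# and everything strictly in between is neutral.
for score in (0.80, 0.05, 0.0, -0.04, -0.05):
    print(score, label_from_compound(score))
```

Keeping the thresholds in one small function makes it simple to try a wider neutral band later without touching the scoring code.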

Acceptable Levels of Accuracy: 
VADER is not perfect: it was designed for short social-media text, and it can misread sarcasm, domain slang, or censored words (such as the ♥♥♥ masks in these reviews). Accuracy varies by text type and domain, but in general it gives reasonable results for coarse negative/neutral/positive labelling.
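
One rough way to get a feel for accuracy on this kind of data is to compare the sentiment labels with the 'recommend' flag. The sketch below uses a small toy frame, not the notebook's results, and assumes that 'recommend' is only a loose proxy for sentiment rather than a ground-truth label.

```python
import pandas as pd

# Toy data for illustration only; values are not taken from the real dataset.
toy = pd.DataFrame({
    "recommend": [True, True, False, True, False],
    "sentiment": [2, 1, 0, 2, 2],
})

# Cross-tabulate the two columns to see where they agree and disagree.
table = pd.crosstab(toy["recommend"], toy["sentiment"])
print(table)

# Share of recommended reviews that were labelled positive (2).
positive_share = (toy.loc[toy["recommend"], "sentiment"] == 2).mean()
print(round(positive_share, 3))
```

A large number of non-recommended reviews labelled positive (or the reverse) would be a hint to inspect those texts manually.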

##  II. Applying Sentiment Analysis function to 'review' column

In [10]:
df_csvreviews['sentiment'] = df_csvreviews['review'].apply(sentiment_analysis)
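
After applying the function, a quick look at the label distribution helps confirm that all three categories occur and that none dominates unexpectedly. A minimal sketch on a toy series (the values are illustrative, not the notebook's output):

```python
import pandas as pd

# Toy stand-in for the 'sentiment' column.
sentiment = pd.Series([2, 2, 1, 0, 2, 1], name="sentiment")

# Absolute counts and relative shares per category, ordered 0, 1, 2.
counts = sentiment.value_counts().sort_index()
shares = sentiment.value_counts(normalize=True).sort_index()

print(counts.to_dict())
print(shares.round(2).to_dict())
```

On the real data the same two calls would be made on `df_csvreviews['sentiment']`.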

In [11]:
df_csvreviews.head()

Unnamed: 0,item_id,recommend,review,user_id,posted_year,sentiment
0,1250,True,"Simple yet with great replayability. In my opinion does ""zombie"" hordes and team work better than left 4 dead plus has a global leveling system. Alot of down to earth ""zombie"" splattering fun for the whole family. Amazed this sort of FPS is so rare.",76561197970982479,2011,2
1,22200,True,It's unique and worth a playthrough.,76561197970982479,2011,2
2,43110,True,Great atmosphere. The gunplay can be a bit chunky at times but at the end of the day this game is definitely worth it and I hope they do a sequel...so buy the game so I get a sequel!,76561197970982479,2011,2
3,251610,True,"I know what you think when you see this title ""Barbie Dreamhouse Party"" but do not be intimidated by it's title, this is easily one of my GOTYs. You don't get any of that cliche game mechanics that all the latest games have, this is simply good core gameplay. Yes, you can't 360 noscope your friends, but what you can do is show them up with your bad ♥♥♥ dance moves and put them to shame as you show them what true fashion and color combinations are.I know this game says for kids but, this is easily for any age range and any age will have a blast playing this.8/8",js41637,2014,2
4,227300,True,"For a simple (it's actually not all that simple but it can be!) truck driving Simulator, it is quite a fun and relaxing game. Playing on simple (or easy?) its just the basic WASD keys for driving but (if you want) the game can be much harder and realistic with having to manually change gears, much harder turning, etc. And reversing in this game is a ♥♥♥♥♥, as I imagine it would be with an actual truck. Luckily, you don't have to reverse park it but you get extra points if you do cause it is bloody hard. But this is suprisingly a nice truck driving game and I had a bit of fun with it.",js41637,2013,2


In [12]:
print(df_csvreviews['review'].isna().sum())                         # ---> Check that no NaN fields remain in 'review' column

0


##  III. Removal of the 'review' column and export of the new CSV file 

In [13]:
df_csvreviews_sa = df_csvreviews.drop(columns = 'review')
df_csvreviews_sa.head()

Unnamed: 0,item_id,recommend,user_id,posted_year,sentiment
0,1250,True,76561197970982479,2011,2
1,22200,True,76561197970982479,2011,2
2,43110,True,76561197970982479,2011,2
3,251610,True,js41637,2014,2
4,227300,True,js41637,2013,2


In [14]:
df_csvreviews_sa.shape

(48498, 5)

In [15]:
df_csvreviews_sa.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48498 entries, 0 to 48497
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   item_id      48498 non-null  int64 
 1   recommend    48498 non-null  bool  
 2   user_id      48498 non-null  object
 3   posted_year  48498 non-null  int64 
 4   sentiment    48498 non-null  int64 
dtypes: bool(1), int64(3), object(1)
memory usage: 1.5+ MB


In [16]:
df_csvreviews_sa1 = df_csvreviews_sa.drop(columns = 'user_id')      # ---> 'user_id' column is also removed before export

In [17]:
df_csvreviews_sa1.to_csv('./Datasets/processing/reviews_sa.csv', encoding='utf-8', index=False)
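
The `index=False` argument keeps the row index out of the file. A small round-trip sketch, using an in-memory buffer and toy rows instead of the real path, shows the difference when the file is read back:

```python
import io
import pandas as pd

# Toy frame with the same columns as the exported file.
toy = pd.DataFrame({
    "item_id": [1250, 22200],
    "recommend": [True, True],
    "posted_year": [2011, 2011],
    "sentiment": [2, 2],
})

def round_trip(frame, **to_csv_kwargs):
    """Write the frame to an in-memory CSV and read it back."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)

without_index = round_trip(toy, index=False)
with_index = round_trip(toy, index=True)

print(without_index.columns.tolist())  # only the four data columns
print(with_index.columns.tolist())     # an extra 'Unnamed: 0' column comes first
```

Writing without the index means later notebooks can read the file with a plain `pd.read_csv` and get exactly the saved columns.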