# Sentiment Analysis

> Perform sentiment analyses on the free text responses that are scraped from Glassdoor

In this notebook, I will use 3 sentiment analysis techniques to provide sentiment scores to the pros, cons, advice to management, and combination free text response sections.

https://www.analyticsvidhya.com/blog/2021/06/rule-based-sentiment-analysis-in-python/

In [1]:
import warnings
warnings.simplefilter('ignore')

import pandas as pd
import numpy as np
import statistics

# sentiment and word processing
import re
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize
from nltk import pos_tag
nltk.download('stopwords')
from nltk.corpus import stopwords
nltk.download('wordnet')
from nltk.corpus import wordnet
import nltk
nltk.download('averaged_perceptron_tagger')
from nltk.stem import WordNetLemmatizer
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
nltk.download('sentiwordnet')
from nltk.corpus import sentiwordnet as swn
from nltk.probability import FreqDist

# plotting
import seaborn as sns
import matplotlib.pyplot as plt

pd.set_option('display.max_columns', None)

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\19012\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\19012\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\19012\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\19012\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package sentiwordnet to
[nltk_data]     C:\Users\19012\AppData\Roaming\nltk_data...
[nltk_data]   Package sentiwordnet is already up-to-date!


In [2]:
# load data from the clean scrape notebook
data = pd.read_csv('cleaned_reviews.csv')
print(data.shape)
data.head()

(668597, 23)


Unnamed: 0,company,rating,sub_work_life_balance,sub_culture_values,sub_diversity_inclusion,sub_career_opportunities,sub_compensation_benefits,sub_senior_management,recommend,ceo_approval,outlook,pros,cons,advice_management,date,title,city,state,years,current_employee,free_text_response,year,month
0,AMERICAN AIRLINES GROUP INC,5.0,,,,,,,,,,flexiblitiy is great amongest staff,the work load is overwhelming at times,,2022-03-15,Customer Relations,,,,1.0,flexiblitiy is great amongest staff the work l...,2022,3.0
1,AMERICAN AIRLINES GROUP INC,3.0,4.0,3.0,5.0,3.0,3.0,3.0,yes,no,neutral,"Flexible schedules, great flight benefits, gre...","Constantly understaffed in all areas, Poor man...",Hire more people so we aren't constantly delay...,2022-03-15,American Airlines Flight Attendant,New York,NY,5.0,1.0,"Flexible schedules, great flight benefits, gre...",2022,3.0
2,AMERICAN AIRLINES GROUP INC,5.0,5.0,5.0,5.0,5.0,5.0,5.0,yes,yes,yes,Love my job . Love the freedom not being micro...,Reserve reserve reserve not being able to hold...,,2022-03-15,Flight Attendant,,,,1.0,Love my job . Love the freedom not being micro...,2022,3.0
3,AMERICAN AIRLINES GROUP INC,1.0,,,,,,,,,,Quit after one month with the company.,"Working conditions, customer service, Terrance",Stop being a company full of uneducated low-li...,2022-03-15,Pilot,,,,0.0,Quit after one month with the company. Working...,2022,3.0
4,AMERICAN AIRLINES GROUP INC,3.0,4.0,2.0,5.0,3.0,3.0,3.0,,yes,yes,Resources love to stay with this company for d...,Retention of resources Didn’t seem important I...,Offer retention bonuses for those loyal people...,2022-03-14,Senior Project Manager,Fort Worth,TX,10.0,0.0,Resources love to stay with this company for d...,2022,3.0


In [3]:
# find the number of NA values in our subratings to determine if we need to handle these or just exclude those observations
print('Number of rows:', data.shape[0])
# work/life
print('NAs in work/life:', data.sub_work_life_balance.isna().sum()/data.shape[0])

# culture
print('NAs in culture:', data.sub_culture_values.isna().sum()/data.shape[0])

# diversity inclusion
print('NAs in diversity inclusion:', data.sub_diversity_inclusion.isna().sum()/data.shape[0])

# career opportunities
print('NAs in career opportunities:', data.sub_career_opportunities.isna().sum()/data.shape[0])

# comp benefits
print('NAs in comp benefits:', data.sub_compensation_benefits.isna().sum()/data.shape[0])

# senior management
print('NAs in senior management:', data.sub_senior_management.isna().sum()/data.shape[0])

Number of rows: 668597
NAs in work/life: 0.21570692061136978
NAs in culture: 0.2241469824124248
NAs in diversity inclusion: 0.3031706693269638
NAs in career opportunities: 0.22542877099358807
NAs in comp benefits: 0.2320994560250794
NAs in senior management: 0.5148183434864351


In [4]:
# make a bar graph to represent this

In [5]:
# Define a function to clean the text
def clean(text):
    '''
    removes all special characters and repplaces with a space
    input: a string column
    output: string column with special characters removed
    '''
    # Removes all special characters and numericals, replace with a space
    text = re.sub(r'[^A-Za-z]+', ' ', str(text))
    return text

# Cleaning the text in the review column
data['pros'] = data['pros'].apply(clean)
data['cons'] = data['cons'].apply(clean)
data['advice_management'] = data['advice_management'].apply(clean)
data['free_text_response'] = data['pros'] + ' ' + data['cons'] + ' ' + data['advice_management']#data['free_text_response'].apply(clean)

data.head()

Unnamed: 0,company,rating,sub_work_life_balance,sub_culture_values,sub_diversity_inclusion,sub_career_opportunities,sub_compensation_benefits,sub_senior_management,recommend,ceo_approval,outlook,pros,cons,advice_management,date,title,city,state,years,current_employee,free_text_response,year,month
0,AMERICAN AIRLINES GROUP INC,5.0,,,,,,,,,,flexiblitiy is great amongest staff,the work load is overwhelming at times,,2022-03-15,Customer Relations,,,,1.0,flexiblitiy is great amongest staff the work l...,2022,3.0
1,AMERICAN AIRLINES GROUP INC,3.0,4.0,3.0,5.0,3.0,3.0,3.0,yes,no,neutral,Flexible schedules great flight benefits great...,Constantly understaffed in all areas Poor mana...,Hire more people so we aren t constantly delay...,2022-03-15,American Airlines Flight Attendant,New York,NY,5.0,1.0,Flexible schedules great flight benefits great...,2022,3.0
2,AMERICAN AIRLINES GROUP INC,5.0,5.0,5.0,5.0,5.0,5.0,5.0,yes,yes,yes,Love my job Love the freedom not being microma...,Reserve reserve reserve not being able to hold...,,2022-03-15,Flight Attendant,,,,1.0,Love my job Love the freedom not being microma...,2022,3.0
3,AMERICAN AIRLINES GROUP INC,1.0,,,,,,,,,,Quit after one month with the company,Working conditions customer service Terrance,Stop being a company full of uneducated low li...,2022-03-15,Pilot,,,,0.0,Quit after one month with the company Working...,2022,3.0
4,AMERICAN AIRLINES GROUP INC,3.0,4.0,2.0,5.0,3.0,3.0,3.0,,yes,yes,Resources love to stay with this company for d...,Retention of resources Didn t seem important I...,Offer retention bonuses for those loyal people...,2022-03-14,Senior Project Manager,Fort Worth,TX,10.0,0.0,Resources love to stay with this company for d...,2022,3.0


In [6]:
# POS tagger dictionary
pos_dict = {'J':wordnet.ADJ, 'V':wordnet.VERB, 'N':wordnet.NOUN, 'R':wordnet.ADV}
def token_stop_pos(text):
    '''
    assigns each word in a column to a part of speech
    input: column
    output: column with parts of speech tagged
    '''
    tags = pos_tag(word_tokenize(text))
    newlist = []
    for word, tag in tags:
        if word.lower() not in set(stopwords.words('english')):
            newlist.append(tuple([word, pos_dict.get(tag[0])]))
    return newlist

# remove stop words
def remove_stop(text):
    '''
    removes stop words from a column
    input: column
    output: coulumn without stop words
    '''
    words = word_tokenize(text)
    newlist = []
    words_list = ' '.join([word for word in words if not word.lower() in stopwords.words('english')])
    return words_list

data['pros_pos_tagged'] = data['pros'].apply(token_stop_pos)
data['cons_pos_tagged'] = data['cons'].apply(token_stop_pos)
data['advice_management_pos_tagged'] = data['advice_management'].apply(token_stop_pos)
data['free_text_pos_tagged'] = data['free_text_response'].apply(token_stop_pos)

data['pros_remove_stop'] = data['pros'].apply(remove_stop)
data['cons_remove_stop'] = data['cons'].apply(remove_stop)
data['advice_management_remove_stop'] = data['advice_management'].apply(remove_stop)
data['free_text_remove_stop'] = data['pros_remove_stop'] + ' ' + data['cons_remove_stop'] + ' ' + data['advice_management_remove_stop']#data['free_text_response'].apply(remove_stop)

data.head()

Unnamed: 0,company,rating,sub_work_life_balance,sub_culture_values,sub_diversity_inclusion,sub_career_opportunities,sub_compensation_benefits,sub_senior_management,recommend,ceo_approval,outlook,pros,cons,advice_management,date,title,city,state,years,current_employee,free_text_response,year,month,pros_pos_tagged,cons_pos_tagged,advice_management_pos_tagged,free_text_pos_tagged,pros_remove_stop,cons_remove_stop,advice_management_remove_stop,free_text_remove_stop
0,AMERICAN AIRLINES GROUP INC,5.0,,,,,,,,,,flexiblitiy is great amongest staff,the work load is overwhelming at times,,2022-03-15,Customer Relations,,,,1.0,flexiblitiy is great amongest staff the work l...,2022,3.0,"[(flexiblitiy, n), (great, a), (amongest, a), ...","[(work, n), (load, n), (overwhelming, v), (tim...","[(nan, n)]","[(flexiblitiy, n), (great, a), (amongest, a), ...",flexiblitiy great amongest staff,work load overwhelming times,,flexiblitiy great amongest staff work load ove...
1,AMERICAN AIRLINES GROUP INC,3.0,4.0,3.0,5.0,3.0,3.0,3.0,yes,no,neutral,Flexible schedules great flight benefits great...,Constantly understaffed in all areas Poor mana...,Hire more people so we aren t constantly delay...,2022-03-15,American Airlines Flight Attendant,New York,NY,5.0,1.0,Flexible schedules great flight benefits great...,2022,3.0,"[(Flexible, a), (schedules, n), (great, a), (f...","[(Constantly, r), (understaffed, v), (areas, n...","[(Hire, n), (people, n), (constantly, r), (del...","[(Flexible, a), (schedules, n), (great, a), (f...",Flexible schedules great flight benefits great...,Constantly understaffed areas Poor management ...,Hire people constantly delayed cancelled,Flexible schedules great flight benefits great...
2,AMERICAN AIRLINES GROUP INC,5.0,5.0,5.0,5.0,5.0,5.0,5.0,yes,yes,yes,Love my job Love the freedom not being microma...,Reserve reserve reserve not being able to hold...,,2022-03-15,Flight Attendant,,,,1.0,Love my job Love the freedom not being microma...,2022,3.0,"[(Love, v), (job, n), (Love, n), (freedom, n),...","[(Reserve, n), (reserve, v), (reserve, n), (ab...","[(nan, n)]","[(Love, v), (job, n), (Love, n), (freedom, n),...",Love job Love freedom micromanage,Reserve reserve reserve able hold fll,,Love job Love freedom micromanage Reserve rese...
3,AMERICAN AIRLINES GROUP INC,1.0,,,,,,,,,,Quit after one month with the company,Working conditions customer service Terrance,Stop being a company full of uneducated low li...,2022-03-15,Pilot,,,,0.0,Quit after one month with the company Working...,2022,3.0,"[(Quit, n), (one, None), (month, n), (company,...","[(Working, v), (conditions, n), (customer, n),...","[(Stop, n), (company, n), (full, a), (uneducat...","[(Quit, n), (one, None), (month, n), (company,...",Quit one month company,Working conditions customer service Terrance,Stop company full uneducated low lives,Quit one month company Working conditions cust...
4,AMERICAN AIRLINES GROUP INC,3.0,4.0,2.0,5.0,3.0,3.0,3.0,,yes,yes,Resources love to stay with this company for d...,Retention of resources Didn t seem important I...,Offer retention bonuses for those loyal people...,2022-03-14,Senior Project Manager,Fort Worth,TX,10.0,0.0,Resources love to stay with this company for d...,2022,3.0,"[(Resources, n), (love, v), (stay, v), (compan...","[(Retention, n), (resources, n), (seem, v), (i...","[(Offer, n), (retention, n), (bonuses, n), (lo...","[(Resources, n), (love, v), (stay, v), (compan...",Resources love stay company decades,Retention resources seem important saw good re...,Offer retention bonuses loyal people stay,Resources love stay company decades Retention ...


In [7]:
wordnet_lemmatizer = WordNetLemmatizer()
def lemmatize(pos_data):
    lemma_rew = " "
    for word, pos in pos_data:
        if not pos:
            lemma = word
            lemma_rew = lemma_rew + " " + lemma
        else:
            lemma = wordnet_lemmatizer.lemmatize(word, pos=pos)
            lemma_rew = lemma_rew + " " + lemma
    return lemma_rew

data['pros_Lemma'] = data['pros_pos_tagged'].apply(lemmatize)
data['cons_Lemma'] = data['cons_pos_tagged'].apply(lemmatize)
data['advice_management_Lemma'] = data['advice_management_pos_tagged'].apply(lemmatize)
data['free_text_Lemma'] = data['pros_Lemma'] + ' ' + data['cons_Lemma'] + ' ' + data['advice_management_Lemma']#data['free_text_pos_tagged'].apply(lemmatize)

data.head()

Unnamed: 0,company,rating,sub_work_life_balance,sub_culture_values,sub_diversity_inclusion,sub_career_opportunities,sub_compensation_benefits,sub_senior_management,recommend,ceo_approval,outlook,pros,cons,advice_management,date,title,city,state,years,current_employee,free_text_response,year,month,pros_pos_tagged,cons_pos_tagged,advice_management_pos_tagged,free_text_pos_tagged,pros_remove_stop,cons_remove_stop,advice_management_remove_stop,free_text_remove_stop,pros_Lemma,cons_Lemma,advice_management_Lemma,free_text_Lemma
0,AMERICAN AIRLINES GROUP INC,5.0,,,,,,,,,,flexiblitiy is great amongest staff,the work load is overwhelming at times,,2022-03-15,Customer Relations,,,,1.0,flexiblitiy is great amongest staff the work l...,2022,3.0,"[(flexiblitiy, n), (great, a), (amongest, a), ...","[(work, n), (load, n), (overwhelming, v), (tim...","[(nan, n)]","[(flexiblitiy, n), (great, a), (amongest, a), ...",flexiblitiy great amongest staff,work load overwhelming times,,flexiblitiy great amongest staff work load ove...,flexiblitiy great amongest staff,work load overwhelm time,,flexiblitiy great amongest staff work load...
1,AMERICAN AIRLINES GROUP INC,3.0,4.0,3.0,5.0,3.0,3.0,3.0,yes,no,neutral,Flexible schedules great flight benefits great...,Constantly understaffed in all areas Poor mana...,Hire more people so we aren t constantly delay...,2022-03-15,American Airlines Flight Attendant,New York,NY,5.0,1.0,Flexible schedules great flight benefits great...,2022,3.0,"[(Flexible, a), (schedules, n), (great, a), (f...","[(Constantly, r), (understaffed, v), (areas, n...","[(Hire, n), (people, n), (constantly, r), (del...","[(Flexible, a), (schedules, n), (great, a), (f...",Flexible schedules great flight benefits great...,Constantly understaffed areas Poor management ...,Hire people constantly delayed cancelled,Flexible schedules great flight benefits great...,Flexible schedule great flight benefit great...,Constantly understaffed area Poor management...,Hire people constantly delay cancel,Flexible schedule great flight benefit great...
2,AMERICAN AIRLINES GROUP INC,5.0,5.0,5.0,5.0,5.0,5.0,5.0,yes,yes,yes,Love my job Love the freedom not being microma...,Reserve reserve reserve not being able to hold...,,2022-03-15,Flight Attendant,,,,1.0,Love my job Love the freedom not being microma...,2022,3.0,"[(Love, v), (job, n), (Love, n), (freedom, n),...","[(Reserve, n), (reserve, v), (reserve, n), (ab...","[(nan, n)]","[(Love, v), (job, n), (Love, n), (freedom, n),...",Love job Love freedom micromanage,Reserve reserve reserve able hold fll,,Love job Love freedom micromanage Reserve rese...,Love job Love freedom micromanage,Reserve reserve reserve able hold fll,,Love job Love freedom micromanage Reserve ...
3,AMERICAN AIRLINES GROUP INC,1.0,,,,,,,,,,Quit after one month with the company,Working conditions customer service Terrance,Stop being a company full of uneducated low li...,2022-03-15,Pilot,,,,0.0,Quit after one month with the company Working...,2022,3.0,"[(Quit, n), (one, None), (month, n), (company,...","[(Working, v), (conditions, n), (customer, n),...","[(Stop, n), (company, n), (full, a), (uneducat...","[(Quit, n), (one, None), (month, n), (company,...",Quit one month company,Working conditions customer service Terrance,Stop company full uneducated low lives,Quit one month company Working conditions cust...,Quit one month company,Working condition customer service Terrance,Stop company full uneducated low life,Quit one month company Working condition c...
4,AMERICAN AIRLINES GROUP INC,3.0,4.0,2.0,5.0,3.0,3.0,3.0,,yes,yes,Resources love to stay with this company for d...,Retention of resources Didn t seem important I...,Offer retention bonuses for those loyal people...,2022-03-14,Senior Project Manager,Fort Worth,TX,10.0,0.0,Resources love to stay with this company for d...,2022,3.0,"[(Resources, n), (love, v), (stay, v), (compan...","[(Retention, n), (resources, n), (seem, v), (i...","[(Offer, n), (retention, n), (bonuses, n), (lo...","[(Resources, n), (love, v), (stay, v), (compan...",Resources love stay company decades,Retention resources seem important saw good re...,Offer retention bonuses loyal people stay,Resources love stay company decades Retention ...,Resources love stay company decade,Retention resource seem important saw good r...,Offer retention bonus loyal people stay,Resources love stay company decade Retenti...


# Sentiment Analysis using TextBlob

In [8]:
# polarity-positive or negative
# subjectivity-how subjective the review is
# function to calculate subjectivity
# def getSubjectivity(review):
#     return TextBlob(review).sentiment.subjectivity
# function to calculate polarity
def getPolarity(review):
    return TextBlob(review).sentiment.polarity

# function to analyze the reviews
def analysis(score):
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'
    
# fin_data = pd.DataFrame(data[['free_text_response', 'Lemma']])

# data['pros_Subjectivity'] = data['pros_Lemma'].apply(getSubjectivity) 
data['pros_Polarity'] = data['pros_Lemma'].apply(getPolarity) 
data['pros_Analysis'] = data['pros_Polarity'].apply(analysis)

# data['cons_Subjectivity'] = data['cons_Lemma'].apply(getSubjectivity) 
data['cons_Polarity'] = data['cons_Lemma'].apply(getPolarity) 
data['cons_Analysis'] = data['cons_Polarity'].apply(analysis)

# data['advice_Subjectivity'] = data['advice_management_Lemma'].apply(getSubjectivity) 
data['advice_Polarity'] = data['advice_management_Lemma'].apply(getPolarity) 
data['advice_Analysis'] = data['advice_Polarity'].apply(analysis)

# data['free_text_Subjectivity'] = data['free_text_Lemma'].apply(getSubjectivity) 
data['free_text_Polarity'] = data['free_text_Lemma'].apply(getPolarity) 
data['free_text_Analysis'] = data['free_text_Polarity'].apply(analysis)

data.head()

Unnamed: 0,company,rating,sub_work_life_balance,sub_culture_values,sub_diversity_inclusion,sub_career_opportunities,sub_compensation_benefits,sub_senior_management,recommend,ceo_approval,outlook,pros,cons,advice_management,date,title,city,state,years,current_employee,free_text_response,year,month,pros_pos_tagged,cons_pos_tagged,advice_management_pos_tagged,free_text_pos_tagged,pros_remove_stop,cons_remove_stop,advice_management_remove_stop,free_text_remove_stop,pros_Lemma,cons_Lemma,advice_management_Lemma,free_text_Lemma,pros_Polarity,pros_Analysis,cons_Polarity,cons_Analysis,advice_Polarity,advice_Analysis,free_text_Polarity,free_text_Analysis
0,AMERICAN AIRLINES GROUP INC,5.0,,,,,,,,,,flexiblitiy is great amongest staff,the work load is overwhelming at times,,2022-03-15,Customer Relations,,,,1.0,flexiblitiy is great amongest staff the work l...,2022,3.0,"[(flexiblitiy, n), (great, a), (amongest, a), ...","[(work, n), (load, n), (overwhelming, v), (tim...","[(nan, n)]","[(flexiblitiy, n), (great, a), (amongest, a), ...",flexiblitiy great amongest staff,work load overwhelming times,,flexiblitiy great amongest staff work load ove...,flexiblitiy great amongest staff,work load overwhelm time,,flexiblitiy great amongest staff work load...,0.8,Positive,0.0,Neutral,0.0,Neutral,0.8,Positive
1,AMERICAN AIRLINES GROUP INC,3.0,4.0,3.0,5.0,3.0,3.0,3.0,yes,no,neutral,Flexible schedules great flight benefits great...,Constantly understaffed in all areas Poor mana...,Hire more people so we aren t constantly delay...,2022-03-15,American Airlines Flight Attendant,New York,NY,5.0,1.0,Flexible schedules great flight benefits great...,2022,3.0,"[(Flexible, a), (schedules, n), (great, a), (f...","[(Constantly, r), (understaffed, v), (areas, n...","[(Hire, n), (people, n), (constantly, r), (del...","[(Flexible, a), (schedules, n), (great, a), (f...",Flexible schedules great flight benefits great...,Constantly understaffed areas Poor management ...,Hire people constantly delayed cancelled,Flexible schedules great flight benefits great...,Flexible schedule great flight benefit great...,Constantly understaffed area Poor management...,Hire people constantly delay cancel,Flexible schedule great flight benefit great...,0.8,Positive,-0.266667,Negative,0.0,Neutral,0.133333,Positive
2,AMERICAN AIRLINES GROUP INC,5.0,5.0,5.0,5.0,5.0,5.0,5.0,yes,yes,yes,Love my job Love the freedom not being microma...,Reserve reserve reserve not being able to hold...,,2022-03-15,Flight Attendant,,,,1.0,Love my job Love the freedom not being microma...,2022,3.0,"[(Love, v), (job, n), (Love, n), (freedom, n),...","[(Reserve, n), (reserve, v), (reserve, n), (ab...","[(nan, n)]","[(Love, v), (job, n), (Love, n), (freedom, n),...",Love job Love freedom micromanage,Reserve reserve reserve able hold fll,,Love job Love freedom micromanage Reserve rese...,Love job Love freedom micromanage,Reserve reserve reserve able hold fll,,Love job Love freedom micromanage Reserve ...,0.5,Positive,0.5,Positive,0.0,Neutral,0.5,Positive
3,AMERICAN AIRLINES GROUP INC,1.0,,,,,,,,,,Quit after one month with the company,Working conditions customer service Terrance,Stop being a company full of uneducated low li...,2022-03-15,Pilot,,,,0.0,Quit after one month with the company Working...,2022,3.0,"[(Quit, n), (one, None), (month, n), (company,...","[(Working, v), (conditions, n), (customer, n),...","[(Stop, n), (company, n), (full, a), (uneducat...","[(Quit, n), (one, None), (month, n), (company,...",Quit one month company,Working conditions customer service Terrance,Stop company full uneducated low lives,Quit one month company Working conditions cust...,Quit one month company,Working condition customer service Terrance,Stop company full uneducated low life,Quit one month company Working condition c...,0.0,Neutral,0.0,Neutral,0.175,Positive,0.175,Positive
4,AMERICAN AIRLINES GROUP INC,3.0,4.0,2.0,5.0,3.0,3.0,3.0,,yes,yes,Resources love to stay with this company for d...,Retention of resources Didn t seem important I...,Offer retention bonuses for those loyal people...,2022-03-14,Senior Project Manager,Fort Worth,TX,10.0,0.0,Resources love to stay with this company for d...,2022,3.0,"[(Resources, n), (love, v), (stay, v), (compan...","[(Retention, n), (resources, n), (seem, v), (i...","[(Offer, n), (retention, n), (bonuses, n), (lo...","[(Resources, n), (love, v), (stay, v), (compan...",Resources love stay company decades,Retention resources seem important saw good re...,Offer retention bonuses loyal people stay,Resources love stay company decades Retention ...,Resources love stay company decade,Retention resource seem important saw good r...,Offer retention bonus loyal people stay,Resources love stay company decade Retenti...,0.5,Positive,0.55,Positive,0.333333,Positive,0.483333,Positive


Something to think about: is a review that receives a Polarity score of -0.001 that different than a Polarity of 0. One receives a 'negative' while the other receives' neutral'. Better yet, how different are two reviews that receive -0.001 and 0.001 respectively. Probably similar in tone, yet one is marked positive, the other negative. This would be fun to play around with the thresholds.

# Sentiment Analysis using Vader

Changes the threshold for sentiment.

In [9]:
# should be positive > 0.5 is positive, negative < 0.5 is negative, and everything else is neutral
analyzer = SentimentIntensityAnalyzer()
# function to calculate vader sentiment
def vadersentimentanalysis(review):
    vs = analyzer.polarity_scores(review)
    return vs['compound']

data['pros_vader_sentiment'] = data['pros_Lemma'].apply(vadersentimentanalysis)
data['cons_vader_sentiment'] = data['cons_Lemma'].apply(vadersentimentanalysis)
data['advice_management_vader_sentiment'] = data['advice_management_Lemma'].apply(vadersentimentanalysis)
data['free_text_vader_sentiment'] = data['free_text_Lemma'].apply(vadersentimentanalysis)

# function to analyse
def vader_analysis(compound):
    if compound > 0.5:
        return 'Positive'
    elif compound < -0.5:
        return 'Negative'
    else:
        return 'Neutral'
    
data['pros_vader_analysis'] = data['pros_vader_sentiment'].apply(vader_analysis)
data['cons_vader_analysis'] = data['cons_vader_sentiment'].apply(vader_analysis)
data['advice_management_vader_analysis'] = data['advice_management_vader_sentiment'].apply(vader_analysis)
data['free_text_vader_analysis'] = data['free_text_vader_sentiment'].apply(vader_analysis)

data.head()

Unnamed: 0,company,rating,sub_work_life_balance,sub_culture_values,sub_diversity_inclusion,sub_career_opportunities,sub_compensation_benefits,sub_senior_management,recommend,ceo_approval,outlook,pros,cons,advice_management,date,title,city,state,years,current_employee,free_text_response,year,month,pros_pos_tagged,cons_pos_tagged,advice_management_pos_tagged,free_text_pos_tagged,pros_remove_stop,cons_remove_stop,advice_management_remove_stop,free_text_remove_stop,pros_Lemma,cons_Lemma,advice_management_Lemma,free_text_Lemma,pros_Polarity,pros_Analysis,cons_Polarity,cons_Analysis,advice_Polarity,advice_Analysis,free_text_Polarity,free_text_Analysis,pros_vader_sentiment,cons_vader_sentiment,advice_management_vader_sentiment,free_text_vader_sentiment,pros_vader_analysis,cons_vader_analysis,advice_management_vader_analysis,free_text_vader_analysis
0,AMERICAN AIRLINES GROUP INC,5.0,,,,,,,,,,flexiblitiy is great amongest staff,the work load is overwhelming at times,,2022-03-15,Customer Relations,,,,1.0,flexiblitiy is great amongest staff the work l...,2022,3.0,"[(flexiblitiy, n), (great, a), (amongest, a), ...","[(work, n), (load, n), (overwhelming, v), (tim...","[(nan, n)]","[(flexiblitiy, n), (great, a), (amongest, a), ...",flexiblitiy great amongest staff,work load overwhelming times,,flexiblitiy great amongest staff work load ove...,flexiblitiy great amongest staff,work load overwhelm time,,flexiblitiy great amongest staff work load...,0.8,Positive,0.0,Neutral,0.0,Neutral,0.8,Positive,0.6249,-0.1779,0.0,0.5267,Positive,Neutral,Neutral,Positive
1,AMERICAN AIRLINES GROUP INC,3.0,4.0,3.0,5.0,3.0,3.0,3.0,yes,no,neutral,Flexible schedules great flight benefits great...,Constantly understaffed in all areas Poor mana...,Hire more people so we aren t constantly delay...,2022-03-15,American Airlines Flight Attendant,New York,NY,5.0,1.0,Flexible schedules great flight benefits great...,2022,3.0,"[(Flexible, a), (schedules, n), (great, a), (f...","[(Constantly, r), (understaffed, v), (areas, n...","[(Hire, n), (people, n), (constantly, r), (del...","[(Flexible, a), (schedules, n), (great, a), (f...",Flexible schedules great flight benefits great...,Constantly understaffed areas Poor management ...,Hire people constantly delayed cancelled,Flexible schedules great flight benefits great...,Flexible schedule great flight benefit great...,Constantly understaffed area Poor management...,Hire people constantly delay cancel,Flexible schedule great flight benefit great...,0.8,Positive,-0.266667,Negative,0.0,Neutral,0.133333,Positive,0.9201,-0.8176,-0.5106,0.3182,Positive,Negative,Negative,Neutral
2,AMERICAN AIRLINES GROUP INC,5.0,5.0,5.0,5.0,5.0,5.0,5.0,yes,yes,yes,Love my job Love the freedom not being microma...,Reserve reserve reserve not being able to hold...,,2022-03-15,Flight Attendant,,,,1.0,Love my job Love the freedom not being microma...,2022,3.0,"[(Love, v), (job, n), (Love, n), (freedom, n),...","[(Reserve, n), (reserve, v), (reserve, n), (ab...","[(nan, n)]","[(Love, v), (job, n), (Love, n), (freedom, n),...",Love job Love freedom micromanage,Reserve reserve reserve able hold fll,,Love job Love freedom micromanage Reserve rese...,Love job Love freedom micromanage,Reserve reserve reserve able hold fll,,Love job Love freedom micromanage Reserve ...,0.5,Positive,0.5,Positive,0.0,Neutral,0.5,Positive,0.9274,0.0,0.0,0.9274,Positive,Neutral,Neutral,Positive
3,AMERICAN AIRLINES GROUP INC,1.0,,,,,,,,,,Quit after one month with the company,Working conditions customer service Terrance,Stop being a company full of uneducated low li...,2022-03-15,Pilot,,,,0.0,Quit after one month with the company Working...,2022,3.0,"[(Quit, n), (one, None), (month, n), (company,...","[(Working, v), (conditions, n), (customer, n),...","[(Stop, n), (company, n), (full, a), (uneducat...","[(Quit, n), (one, None), (month, n), (company,...",Quit one month company,Working conditions customer service Terrance,Stop company full uneducated low lives,Quit one month company Working conditions cust...,Quit one month company,Working condition customer service Terrance,Stop company full uneducated low life,Quit one month company Working condition c...,0.0,Neutral,0.0,Neutral,0.175,Positive,0.175,Positive,0.0,0.0,-0.5106,-0.5106,Neutral,Neutral,Negative,Negative
4,AMERICAN AIRLINES GROUP INC,3.0,4.0,2.0,5.0,3.0,3.0,3.0,,yes,yes,Resources love to stay with this company for d...,Retention of resources Didn t seem important I...,Offer retention bonuses for those loyal people...,2022-03-14,Senior Project Manager,Fort Worth,TX,10.0,0.0,Resources love to stay with this company for d...,2022,3.0,"[(Resources, n), (love, v), (stay, v), (compan...","[(Retention, n), (resources, n), (seem, v), (i...","[(Offer, n), (retention, n), (bonuses, n), (lo...","[(Resources, n), (love, v), (stay, v), (compan...",Resources love stay company decades,Retention resources seem important saw good re...,Offer retention bonuses loyal people stay,Resources love stay company decades Retention ...,Resources love stay company decade,Retention resource seem important saw good r...,Offer retention bonus loyal people stay,Resources love stay company decade Retenti...,0.5,Positive,0.55,Positive,0.333333,Positive,0.483333,Positive,0.6369,0.7096,0.765,0.9493,Positive,Positive,Positive,Positive


In [10]:
# potentiallly making a column of the negative/positive words per observation

In [11]:
# export data with sentiment analysis in it
data.to_csv('C:/Users/19012/Documents/sp22-capstone/sentiment_analysis.csv', index=False)