# Sentiment Analysis for Employee Satisfaction during COVID-19 Pandemic on Glassdoor Panel Dataset

The COVID-19 pandemic and the lockdowns that followed disrupted company operations and had an unprecedented impact on employees. Companies that adopted different strategies for dealing with this emergency may have seen different effects on employee satisfaction. Employee reviews written during the pandemic may therefore contain useful information for investors. In this homework, we will use sentiment analysis to analyze employee reviews of major Silicon Valley firms in the Glassdoor dataset.


In [1]:
# print all the outputs in a cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [2]:
import pandas as pd
import string

## Step 1: Import the datasets
1. data_glassdoor_hw1.csv
2. positive.txt
3. negative.txt
4. stopwords.txt

### Import data_glassdoor_hw1.csv

In [3]:
df = pd.read_csv('data_glassdoor.csv')

In [4]:
df.head(10)

| Unnamed: 0 | Firm | date | employee_title | location | review_title | pros | cons |
|---|---|---|---|---|---|---|---|
| 0 | Facebook | Aug 31, 2020 | Engineering Manager | Menlo Park, CA | Neat place. Org struggles | Well funded, lots of resources, focus on engin... | Organization is a challenge, senior people fig... |
| 1 | Facebook | Aug 31, 2020 | Software Engineer | Menlo Park, CA | Feel lucky to join FB | - Lovely people must be the first pros.\n- Boo... | I don't feel any cons so far\nBe the first to ... |
| 2 | Facebook | Aug 31, 2020 | Senior Data Analyst | Menlo Park, CA | great, but could be better | great tech, smartest ppl, great food\n | lack of policing political bigotry\n |
| 3 | Facebook | Aug 31, 2020 | Critical Facility Engineer | Huntsville, AL | Facebook is a great place to work. | you can be yourself, more diversity than my ot... | no cons to report at all.\n |
| 4 | Facebook | Aug 30, 2020 | Anonymous | San Francisco, CA | Good management | Good managers with experience in hand\nLot to ... | Too much politics in the company\nBe the first... |
| 5 | Facebook | Aug 30, 2020 | Software Engineer | San Jose, CA | Nice job | Competitive compensation and good benefits\n | a little bit overtime possibly\nBe the first t... |
| 6 | Facebook | Aug 28, 2020 | Privacy Program Manager | New York, NY | Phenomenal place to work! | Compensation, benefits and work-life balance a... | Learning curve can be steep, particularly in t... |
| 7 | Facebook | Aug 28, 2020 | Manager | Austin, TX | Good Place to Grow and Employee Focused | Solid Growth Trend, Lots of great perks, Compa... | Back office tools and processes need some work... |
| 8 | Facebook | Aug 28, 2020 | Director | New York, NY | Left but Returned | - Best people\n- Challenging / interesting wor... | - None to think of\n |
| 9 | Facebook | Aug 27, 2020 | Product Specialist | Seattle, WA | Ambiguity | Lots of documentation available online\nTalent... | You’re a generalist in a sea of specialists. V... |


#### Get 6 lists from the dataframe: 
1. apple_pros 
2. apple_cons 
3. facebook_pros 
4. facebook_cons 
5. google_pros 
6. google_cons

In [5]:
apple_pros=list(df.loc[(df['Firm'] == 'Apple')]['pros'])
apple_cons=list(df.loc[(df['Firm'] == 'Apple')]['cons'])

facebook_pros=list(df.loc[(df['Firm'] == 'Facebook')]['pros'])
facebook_cons=list(df.loc[(df['Firm'] == 'Facebook')]['cons'])

google_pros=list(df.loc[(df['Firm'] == 'Google')]['pros'])
google_cons=list(df.loc[(df['Firm'] == 'Google')]['cons'])

Please print out the first 10 reviews for each list

In [6]:
apple_pros[:10]

['fun ppl and exciting work days\n',
 'Easy paced\nCustomer interactions\nTechnology based\nFun to work in\nFrequent breaks\n',
 "Infinite complex work, never-ending series of problems to solve, never bored. Great for the under achiever types, because you'll never actually finish anything.\n",
 'Cool people and decent benefits\n',
 'Best retail company to work for\n',
 "It's very awesome, indeed it is.\n",
 'flexible schedule great starting pay many perks diverse environment\n',
 'Great at everything yeah oh yah\n',
 'It was pretty good time and staff was great.\n',
 'Get to work with a lot of different people with different interests/expertises, healthy work life balance, discounts on stocks & other benefits\n']

### Import Positive words 

Save the imported words in a list

Hint: Files I/O

<div class="alert alert-block alert-info">Open 'positive.txt' file in read mode. Create a list 'import_positive_words' using the file object method <font color=blue>'f.readlines()'</font> and strip whitespace.</div>

In [7]:
with open('positive.txt', 'r') as f:
    import_positive_words = [word.strip() for word in f.readlines()]

### Import Negative words

Save the imported words in a list

Hint: Files I/O

<div class="alert alert-block alert-info">Open 'negative.txt' file in read mode. Create a list 'import_negative_words' using the file object method <font color=blue>'f.readlines()'</font> and strip whitespace.</div>

In [8]:
with open('negative.txt', 'r') as f:
    import_negative_words = [word.strip() for word in f.readlines()]

### Import Stopwords

Save the imported words in a list

Hint: Files I/O

<div class="alert alert-block alert-info">Open 'stopwords.txt' file in read mode. Create a list 'import_stop_words' using the file object method <font color=blue>'f.readlines()'</font> and strip whitespace.</div>

In [9]:
with open('stopwords.txt','r') as f:
    import_stop_words = [word.strip() for word in f.readlines()]
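
The three import cells above repeat the same pattern, so they could be folded into one helper. The following is a minimal sketch, not the assignment's required form: `load_word_list` is a hypothetical name, and the demonstration writes a throwaway file so that it runs without the homework data. Skipping blank lines is an added assumption; the cells above keep them as empty strings.

```python
from pathlib import Path
import tempfile

def load_word_list(path):
    """Read one word per line, stripping whitespace and skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# Demonstration on a temporary file (hypothetical content, not the real lexicon).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_words.txt"
    demo_path.write_text("good\n great \n\nexcellent\n", encoding="utf-8")
    demo_words = load_word_list(demo_path)

print(demo_words)  # ['good', 'great', 'excellent']
```

With the real files, the same helper would be called once per lexicon, for example `load_word_list('positive.txt')`.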

## Step 2: Text data preprocessing

Write one or more functions that preprocess the 6 lists (apple_pros, apple_cons, facebook_pros, facebook_cons, google_pros, and google_cons) as follows:
1. Convert the text to lower case
2. Remove leading and trailing spaces
3. Remove punctuation such as @ : . , ?
4. Replace the newline character \n with a space
5. Split all reviews into words

Explain how your functions work using Markdown cells

<div class="alert alert-block alert-info">
    
<font color=blue>review_preprocess(reviews)</font>: This function takes a list of reviews as its parameter. It iterates through each element (review) of the list using a 'for' loop and performs the following steps:
    <ul>
    <li><b>review.lower()</b> converts the review to lower case.</li>
    <li><b>review.strip()</b> removes whitespace at the beginning and at the end of the string.</li>
    <li><font color=blue>remove_punctuations(review)</font>: This function iterates through each character in <i>string.punctuation</i> (a constant of Python's built-in string module) and uses replace() to replace that character with an empty string.</li>
    <li><b>review.replace()</b> replaces each newline character '\n' with a space.</li>
    <li>To convert the reviews to words, an inner for loop splits the review on whitespace and appends each word to the list 'preprocessed_reviews', which starts out empty and is returned at the end.</li>
    </ul>
    
</div>

In [10]:
def review_preprocess(reviews):
    preprocessed_reviews = []
    
    for review in reviews:
        review = review.lower()
        review = review.strip()
        review = remove_punctuations(review)
        review = review.replace('\n',' ')
        for word in review.split():
            preprocessed_reviews.append(word)  
    return preprocessed_reviews     
        
def remove_punctuations(review):
    for p in string.punctuation:
        review = review.replace(p,'')
    return review
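
As an alternative to calling replace() once per punctuation character, the same cleaning steps can be sketched with `str.translate`, which removes all listed characters in a single pass. This is an illustrative variant under the same assumptions as the cell above (ASCII punctuation only); `review_to_words` is a hypothetical name, and the sample string is taken from the `apple_pros` output shown earlier.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def review_to_words(review):
    """Lowercase, strip, drop ASCII punctuation, and split into words."""
    cleaned = review.lower().strip().translate(_PUNCT_TABLE)
    # split() with no argument splits on any whitespace, including '\n',
    # so no separate newline replacement is needed.
    return cleaned.split()

sample = "Easy paced\nCustomer interactions\nTechnology based\n"
sample_words = review_to_words(sample)
print(sample_words)
# ['easy', 'paced', 'customer', 'interactions', 'technology', 'based']
```

For a whole list of reviews, the per-review results can be concatenated, for example `[w for r in reviews for w in review_to_words(r)]`.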

Plug the 6 lists into your functions and get 6 preprocessed word lists

In [11]:
apple_pros_preprocessed_words_list = review_preprocess(apple_pros)
apple_cons_preprocessed_words_list = review_preprocess(apple_cons)

facebook_pros_preprocessed_words_list = review_preprocess(facebook_pros)
facebook_cons_preprocessed_words_list = review_preprocess(facebook_cons)

google_pros_preprocessed_words_list = review_preprocess(google_pros)
google_cons_preprocessed_words_list = review_preprocess(google_cons)

Please print out the **number of words** in each word list

In [12]:
print("Apple's pros words list: %s" %len(apple_pros_preprocessed_words_list))
print("Apple's cons words list: %s" %len(apple_cons_preprocessed_words_list))

print("Facebook's pros words list: %s" %len(facebook_pros_preprocessed_words_list))
print("Facebook's cons words list: %s" %len(facebook_cons_preprocessed_words_list))

print("Google's pros words list: %s" %len(google_pros_preprocessed_words_list))
print("Google's cons words list: %s" %len(google_cons_preprocessed_words_list))

Apple's pros words list: 5216
Apple's cons words list: 10377
Facebook's pros words list: 5687
Facebook's cons words list: 7265
Google's pros words list: 3848
Google's cons words list: 8231


## Step 3: Employee Satisfaction Score Calculation

### Goal: 
1. Calculate the **positive score** for each firm from their **pro reviews**
2. Calculate the **negative score** for each firm from their **con reviews**
3. Calculate **Employee Satisfaction Score** = positive score - negative score

### Analyze Apple, Facebook, and Google's pro review word lists

Generate a function or functions to get the

**1. positive word count**, 
**2. stop word count**, 
**3. total word count** 

from each firm's **pro** review word list.

 <div class="alert alert-block alert-info">

1. <font color=blue>positive_words_count(preprocessed_words_list)</font>: This function takes 'preprocessed_words_list' as a parameter, iterates through each word, and counts the words that appear in import_positive_words (loaded from the 'positive.txt' file). It returns the total number of positive words in preprocessed_words_list.

2. <font color=blue>stop_words_count(preprocessed_words_list)</font>: This function takes 'preprocessed_words_list' as a parameter, iterates through each word, and counts the words that appear in import_stop_words (loaded from the 'stopwords.txt' file). It returns the total number of stop words in preprocessed_words_list.

3. <font color=blue>total_words_count(preprocessed_words_list)</font>: This function takes 'preprocessed_words_list' as a parameter and returns its length, which is the total word count.

4. <font color=blue>get_pros_score(preprocessed_words_list)</font>: This function takes 'preprocessed_words_list' as a parameter and calculates the positive score of a company by dividing the positive word count by (total word count - stop word count).</div>

In [13]:
def positive_words_count(preprocessed_words_list):
    positive_words_count = 0
    for word in preprocessed_words_list:
        if word in import_positive_words:
            positive_words_count += 1      
    return positive_words_count

def stop_words_count(preprocessed_words_list):
    stop_words_count = 0
    for word in preprocessed_words_list:
        if word in import_stop_words:
            stop_words_count += 1
    return stop_words_count

def total_words_count(preprocessed_words_list):
    return len(preprocessed_words_list) 

def get_pros_score(preprocessed_words_list):
    positive_count = positive_words_count(preprocessed_words_list)
    stop_count = stop_words_count(preprocessed_words_list)
    total_count = total_words_count(preprocessed_words_list)
    positive_score = positive_count / (total_count - stop_count)
    return positive_score

1. Use the functions to calculate the **1. positive word count**, **2. stop word count**, **3. total word count**
2. Calculate each firm's **Positive score = Positive word count/ (Total word count - Stop word count)**

Print out Apple's 
1. Positive word count
2. Stop word count
3. Total word count 
4. Firm positive score

In [14]:
print("Apple's positive words count is: %s" %positive_words_count(apple_pros_preprocessed_words_list))
print("Apple's stop words count is: %s" %stop_words_count(apple_pros_preprocessed_words_list))
print("Apple's total words count is: %s" %total_words_count(apple_pros_preprocessed_words_list))
print("Apple's positive score is: %s" %get_pros_score(apple_pros_preprocessed_words_list))

Apple's positive words count is: 1121
Apple's stop words count is: 1987
Apple's total words count is: 5216
Apple's positive score is: 0.3471663053576959


Print out Facebook's
1. Positive word count
2. Stop word count
3. Total word count 
4. Firm positive score

In [15]:
print("Facebook's positive words count is: %s" %positive_words_count(facebook_pros_preprocessed_words_list))
print("Facebook's stop words count is: %s" %stop_words_count(facebook_pros_preprocessed_words_list))
print("Facebook's total words count is: %s" %total_words_count(facebook_pros_preprocessed_words_list))
print("Facebook's positive score is: %s" %get_pros_score(facebook_pros_preprocessed_words_list))

Facebook's positive words count is: 849
Facebook's stop words count is: 2434
Facebook's total words count is: 5687
Facebook's positive score is: 0.2609898555179834


Print out Google's 
1. Positive word count
2. Stop word count
3. Total word count 
4. Firm positive score

In [16]:
print("Google's positive words count is: %s" %positive_words_count(google_pros_preprocessed_words_list))
print("Google's stop words count is: %s" %stop_words_count(google_pros_preprocessed_words_list))
print("Google's total words count is: %s" %total_words_count(google_pros_preprocessed_words_list))
print("Google's positive score is: %s" %get_pros_score(google_pros_preprocessed_words_list))

Google's positive words count is: 911
Google's stop words count is: 1370
Google's total words count is: 3848
Google's positive score is: 0.367635189669088


### Analyze Apple, Facebook, and Google's con review word lists

Generate a function or functions to get the

**1. Negative word count**, 
**2. stop word count**, 
**3. total word count** 

from each firm's **con** review word list.

<div class="alert alert-block alert-info">
    
1. <font color=blue>negative_words_count(preprocessed_words_list)</font>: This function takes 'preprocessed_words_list' as a parameter, iterates through each word, and counts the words that appear in import_negative_words (loaded from the 'negative.txt' file). It returns the total number of negative words in preprocessed_words_list.

2. <font color=blue>get_cons_score(preprocessed_words_list)</font>: This function takes 'preprocessed_words_list' as a parameter and calculates the negative score of a company by dividing the negative word count by (total word count - stop word count).
</div>

In [17]:
def negative_words_count(preprocessed_words_list):    
    negative_words_count = 0
    for word in preprocessed_words_list:
        if word in import_negative_words:
            negative_words_count += 1
    return negative_words_count

def get_cons_score(preprocessed_words_list):
    negative_count = negative_words_count(preprocessed_words_list)
    stop_count = stop_words_count(preprocessed_words_list)
    total_count = total_words_count(preprocessed_words_list)
    negative_score = negative_count / (total_count - stop_count)
    return negative_score
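
The positive and negative scores use the same formula and differ only in which lexicon is counted, so both could be expressed through one shared helper. The sketch below is illustrative: `count_in_lexicon`, `lexicon_score`, and the toy word lists are hypothetical and not part of the homework files. Converting each lexicon to a set is a design choice that makes each membership test constant time on average instead of a scan of the whole list; the guard against a zero denominator is an added assumption.

```python
def count_in_lexicon(words, lexicon):
    """Count how many words in `words` appear in `lexicon`."""
    lexicon_set = set(lexicon)  # set lookup avoids scanning the list per word
    return sum(1 for word in words if word in lexicon_set)

def lexicon_score(words, lexicon, stop_words):
    """Lexicon hits divided by (total words - stop words)."""
    hits = count_in_lexicon(words, lexicon)
    stops = count_in_lexicon(words, stop_words)
    denominator = len(words) - stops
    # Assumption: return 0.0 when every word is a stop word (or the list is empty).
    return hits / denominator if denominator else 0.0

# Toy data for illustration only.
toy_words = ["great", "people", "and", "great", "benefits"]
toy_positive = ["great", "good"]
toy_stop = ["and", "the"]

toy_score = lexicon_score(toy_words, toy_positive, toy_stop)
print(toy_score)  # 2 hits / (5 words - 1 stop word) = 0.5
```

Under this sketch, a positive score would pass the positive lexicon and a negative score the negative lexicon, with the same stop word list in both calls.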

1. Use the function to calculate the **1. Negative word count**, **2. stop word count**, **3. total word count**
2. Calculate each firm's **Negative score = Negative word count/ (Total word count - Stop word count)**

Print out Apple's 
1. Negative word count
2. Stop word count
3. Total word count 
4. Firm negative score

In [18]:
print("Apple's negative words count is: %s" %negative_words_count(apple_cons_preprocessed_words_list))
print("Apple's stop words count is: %s" %stop_words_count(apple_cons_preprocessed_words_list))
print("Apple's total words count is: %s" %total_words_count(apple_cons_preprocessed_words_list))
print("Apple's negative score is: %s" %get_cons_score(apple_cons_preprocessed_words_list))

Apple's negative words count is: 340
Apple's stop words count is: 5099
Apple's total words count is: 10377
Apple's negative score is: 0.06441834028040924


Print out Facebook's 
1. Negative word count
2. Stop word count
3. Total word count 
4. Firm negative score

In [19]:
print("Facebook's negative words count is: %s" %negative_words_count(facebook_cons_preprocessed_words_list))
print("Facebook's stop words count is: %s" %stop_words_count(facebook_cons_preprocessed_words_list))
print("Facebook's total words count is: %s" %total_words_count(facebook_cons_preprocessed_words_list))
print("Facebook's negative score is: %s" %get_cons_score(facebook_cons_preprocessed_words_list))

Facebook's negative words count is: 266
Facebook's stop words count is: 3601
Facebook's total words count is: 7265
Facebook's negative score is: 0.07259825327510917


Print out Google's 
1. Negative word count
2. Stop word count
3. Total word count 
4. Firm negative score

In [20]:
print("Google's negative words count is: %s" %negative_words_count(google_cons_preprocessed_words_list))
print("Google's stop words count is: %s" %stop_words_count(google_cons_preprocessed_words_list))
print("Google's total words count is: %s" %total_words_count(google_cons_preprocessed_words_list))
print("Google's negative score is: %s" %get_cons_score(google_cons_preprocessed_words_list))

Google's negative words count is: 268
Google's stop words count is: 4129
Google's total words count is: 8231
Google's negative score is: 0.06533398342272062


### Calculate Apple, Facebook, and Google's employee satisfaction scores

Recall the Employee Satisfaction Score = Firm positive score - Firm negative score
<br>Print out each firm's Employee Satisfaction Score

<div class="alert alert-block alert-info">

1. <font color=blue>employee_satisfaction_score(pros_preprocessed_reviews, cons_preprocessed_reviews)</font>: This function takes the preprocessed pros and cons word lists as parameters, calculates the positive and negative scores by calling the previously defined functions get_pros_score() and get_cons_score(), and returns the employee satisfaction score, which is the positive score minus the negative score.
    
2. Print each firm's Employee Satisfaction Score
</div>    

In [21]:
def employee_satisfaction_score(pros_preprocessed_reviews, cons_preprocessed_reviews):
    # Satisfaction score = positive score (from pros) - negative score (from cons)
    positive_score = get_pros_score(pros_preprocessed_reviews)
    negative_score = get_cons_score(cons_preprocessed_reviews)
    return positive_score - negative_score

print("Apple's Satisfaction Score: %s" %employee_satisfaction_score(apple_pros_preprocessed_words_list, apple_cons_preprocessed_words_list))
print("Facebook's Satisfaction Score: %s" %employee_satisfaction_score(facebook_pros_preprocessed_words_list, facebook_cons_preprocessed_words_list))
print("Google's Satisfaction Score: %s" %employee_satisfaction_score(google_pros_preprocessed_words_list, google_cons_preprocessed_words_list))

Apple's Satisfaction Score: 0.2827479650772866
Facebook's Satisfaction Score: 0.18839160224287427
Google's Satisfaction Score: 0.30230120624636736
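
As a sanity check, the satisfaction scores can be recomputed directly from the counts printed in the earlier cells, without access to the dataset. The sketch below hard-codes those printed counts; the dictionary and function names are illustrative.

```python
# Counts copied from the printed outputs above:
# (lexicon word count, stop word count, total word count)
pros_counts = {
    "Apple": (1121, 1987, 5216),
    "Facebook": (849, 2434, 5687),
    "Google": (911, 1370, 3848),
}
cons_counts = {
    "Apple": (340, 5099, 10377),
    "Facebook": (266, 3601, 7265),
    "Google": (268, 4129, 8231),
}

def score(hits, stops, total):
    """Lexicon word count / (total word count - stop word count)."""
    return hits / (total - stops)

satisfaction = {
    firm: score(*pros_counts[firm]) - score(*cons_counts[firm])
    for firm in pros_counts
}

# Print firms from highest to lowest satisfaction score.
for firm, value in sorted(satisfaction.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{firm}: {value:.4f}")
# Google: 0.3023
# Apple: 0.2827
# Facebook: 0.1884
```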


## Conclusion

Which firm has better employee satisfaction during COVID-19 pandemic?

<div class="alert alert-block alert-success">Based on the sentiment analysis of Glassdoor employee reviews written during the pandemic, <b>Google has the highest Employee Satisfaction Score</b> (about 0.302), followed by Apple (about 0.283) and Facebook (about 0.188).</div>

## Limitation

<div class="alert alert-block alert-warning">

I noticed two issues that could make a slight difference to the numbers in this analysis:
    <ul>
        <li>Punctuation is replaced with an empty string (''), so hyphenated words in the pros and cons reviews such as 'hard-working', 'self-improvement', 'fast-paced', and 'micro-management' lose their hyphen (for example, 'hard-working' becomes 'hardworking', or 'hard working' if a space (' ') were used instead). They then no longer match the hyphenated entries listed in the positive.txt/negative.txt files and are not counted.</li>
        <li>Certain special characters such as typographic apostrophes, quotation marks, and bullets (’, ”, “, •) are present in the reviews. They are not part of string.punctuation, and the assignment does not specify their removal, so they remain attached to words and may slightly change the Employee Satisfaction Score.</li>
</ul>
    
Because the dataset in this analysis is small, these issues may not change the Employee Satisfaction Scores significantly. On a larger dataset, however, they might have a more noticeable impact on the numbers.
</div>
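
Both limitations could be addressed in the tokenizer. The following is a minimal sketch of one possible approach, not something the assignment asks for: it keeps hyphens so that hyphenated lexicon entries can still match, and it also removes the typographic characters named above. `tokenize_keep_hyphens`, `EXTRA_CHARS`, and the demo lexicon are hypothetical, and the scores reported in this notebook were not produced this way.

```python
import string

# Typographic characters named in the limitation above; string.punctuation is ASCII-only.
EXTRA_CHARS = "’‘”“•"
# Remove all punctuation except '-' so hyphenated words stay intact.
_REMOVE = (string.punctuation + EXTRA_CHARS).replace("-", "")
_TABLE = str.maketrans("", "", _REMOVE)

def tokenize_keep_hyphens(review):
    """Lowercase, drop punctuation except hyphens, and split into words."""
    words = review.lower().translate(_TABLE).split()
    # Trim stray leading/trailing hyphens (e.g. '-' used as a list bullet).
    return [w.strip("-") for w in words if w.strip("-")]

demo_lexicon = {"hard-working", "great"}  # hypothetical lexicon entries
demo_tokens = tokenize_keep_hyphens("- Hard-working, “great” people • you’re welcome")
demo_hits = sum(1 for w in demo_tokens if w in demo_lexicon)

print(demo_tokens)  # ['hard-working', 'great', 'people', 'youre', 'welcome']
print(demo_hits)    # 2
```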