# HW 1 - Sentiment Analysis for Employee Satisfaction during COVID-19 Pandemic on Glassdoor Panel Dataset

The COVID-19 pandemic and the lockdowns that followed disrupted company operations and had an unprecedented impact on employees. Companies that adopted different strategies to cope with the crisis may have seen different effects on employee satisfaction, and employee reviews written during the pandemic may contain information useful to investors. In this homework, we will use Sentiment Analysis to analyze employee satisfaction reviews of major Silicon Valley firms from the Glassdoor dataset.

**Submission**
Submit one Jupyter notebook file (.ipynb) with your code; be sure to explain your code using Markdown cells and comments. Do not attach any other files.

In [4]:
# print all the outputs in a cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [7]:
import pandas as pd

## Step 1: Import the datasets
1. data_glassdoor_hw1.csv
2. positive.txt
3. negative.txt
4. stopwords.txt

### Import data_glassdoor_hw1.csv

In [8]:
df = pd.read_csv('data_glassdoor_hw1.csv')

In [9]:
df.head(10)

| Unnamed: 0 | Firm | date | employee_title | location | review_title | pros | cons |
|---|---|---|---|---|---|---|---|
| 0 | Facebook | Aug 31, 2020 | Engineering Manager | Menlo Park, CA | Neat place. Org struggles | Well funded, lots of resources, focus on engin... | Organization is a challenge, senior people fig... |
| 1 | Facebook | Aug 31, 2020 | Software Engineer | Menlo Park, CA | Feel lucky to join FB | - Lovely people must be the first pros.\n- Boo... | I don't feel any cons so far\nBe the first to ... |
| 2 | Facebook | Aug 31, 2020 | Senior Data Analyst | Menlo Park, CA | great, but could be better | great tech, smartest ppl, great food\n | lack of policing political bigotry\n |
| 3 | Facebook | Aug 31, 2020 | Critical Facility Engineer | Huntsville, AL | Facebook is a great place to work. | you can be yourself, more diversity than my ot... | no cons to report at all.\n |
| 4 | Facebook | Aug 30, 2020 | Anonymous | San Francisco, CA | Good management | Good managers with experience in hand\nLot to ... | Too much politics in the company\nBe the first... |
| 5 | Facebook | Aug 30, 2020 | Software Engineer | San Jose, CA | Nice job | Competitive compensation and good benefits\n | a little bit overtime possibly\nBe the first t... |
| 6 | Facebook | Aug 28, 2020 | Privacy Program Manager | New York, NY | Phenomenal place to work! | Compensation, benefits and work-life balance a... | Learning curve can be steep, particularly in t... |
| 7 | Facebook | Aug 28, 2020 | Manager | Austin, TX | Good Place to Grow and Employee Focused | Solid Growth Trend, Lots of great perks, Compa... | Back office tools and processes need some work... |
| 8 | Facebook | Aug 28, 2020 | Director | New York, NY | Left but Returned | - Best people\n- Challenging / interesting wor... | - None to think of\n |
| 9 | Facebook | Aug 27, 2020 | Product Specialist | Seattle, WA | Ambiguity | Lots of documentation available online\nTalent... | You’re a generalist in a sea of specialists. V... |


#### Get 6 lists from the dataframe: 
1. apple_pros 
2. apple_cons 
3. facebook_pros 
4. facebook_cons 
5. google_pros 
6. google_cons

In [10]:
apple_pros=list(df.loc[(df['Firm'] == 'Apple')]['pros'])
apple_cons=list(df.loc[(df['Firm'] == 'Apple')]['cons'])

facebook_pros=list(df.loc[(df['Firm'] == 'Facebook')]['pros'])
facebook_cons=list(df.loc[(df['Firm'] == 'Facebook')]['cons'])

google_pros=list(df.loc[(df['Firm'] == 'Google')]['pros'])
google_cons=list(df.loc[(df['Firm'] == 'Google')]['cons'])

Please print out the first 10 reviews of each list.

In [11]:
apple_pros[:10]

['fun ppl and exciting work days\n',
 'Easy paced\nCustomer interactions\nTechnology based\nFun to work in\nFrequent breaks\n',
 "Infinite complex work, never-ending series of problems to solve, never bored. Great for the under achiever types, because you'll never actually finish anything.\n",
 'Cool people and decent benefits\n',
 'Best retail company to work for\n',
 "It's very awesome, indeed it is.\n",
 'flexible schedule great starting pay many perks diverse environment\n',
 'Great at everything yeah oh yah\n',
 'It was pretty good time and staff was great.\n',
 'Get to work with a lot of different people with different interests/expertises, healthy work life balance, discounts on stocks & other benefits\n']
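The cell above prints only `apple_pros`. A minimal sketch for printing the first 10 reviews of every list, assuming the lists are collected in a dictionary keyed by name; the helper `print_first_reviews` and the toy reviews are hypothetical stand-ins for the six lists built from the dataframe.

```python
def print_first_reviews(named_lists, n=10):
    """Print the first n reviews of every named list and return what was printed."""
    shown = {}
    for name, reviews in named_lists.items():
        head = reviews[:n]  # slicing is safe even when a list has fewer than n reviews
        shown[name] = head
        print(f"--- {name}: first {len(head)} reviews ---")
        for i, review in enumerate(head):
            print(i, repr(review))  # repr keeps '\n' visible, like the notebook output
    return shown

# Toy stand-ins (hypothetical); in the notebook, pass the six real lists instead.
toy_lists = {
    "apple_pros": ["toy pro review 1\n", "toy pro review 2\n"],
    "apple_cons": ["toy con review 1\n"],
}
shown = print_first_reviews(toy_lists, n=10)
```

In the notebook, the call would be `print_first_reviews({"apple_pros": apple_pros, "apple_cons": apple_cons, "facebook_pros": facebook_pros, "facebook_cons": facebook_cons, "google_pros": google_pros, "google_cons": google_cons})`.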

### Import Positive words 

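Save the imported words in a list.

Hint: file I/O

Open the 'positive.txt' file in read mode and build the list 'import_positive_words' from the file object method 'f.readlines()', stripping the trailing newline from each line and skipping blank lines.

In [27]:
with open('positive.txt', 'r') as f:
    # strip '\n' so entries such as 'good\n' can match the preprocessed review words
    import_positive_words = [line.strip() for line in f.readlines() if line.strip()]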

### Import Negative words

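Save the imported words in a list.

Hint: file I/O

Open the 'negative.txt' file in read mode and build the list 'import_negative_words' from the file object method 'f.readlines()', stripping the trailing newline from each line and skipping blank lines.

In [26]:
with open('negative.txt', 'r') as f:
    # strip '\n' so entries such as 'bad\n' can match the preprocessed review words
    import_negative_words = [line.strip() for line in f.readlines() if line.strip()]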

### Import Stopwords

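Save the imported words in a list.

Hint: file I/O

Open the 'stopwords.txt' file in read mode and build the list 'import_stop_words' from the file object method 'f.readlines()', stripping the trailing newline from each line and skipping blank lines.

In [25]:
with open('stopwords.txt', 'r') as f:
    # strip '\n' so entries such as 'the\n' can match the preprocessed review words
    import_stop_words = [line.strip() for line in f.readlines() if line.strip()]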
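`f.readlines()` keeps the trailing newline on every entry (for example `'good\n'`), so membership tests against preprocessed words fail unless the entries are stripped. A minimal sketch of one reusable loader for all three files, assuming one word per line; `load_word_list` is a hypothetical name, and skipping lines that start with `;` is an assumption about the lexicon files (some published opinion-word lists use `;` for header comments). Returning a set makes later lookups fast.

```python
import os
import tempfile

def load_word_list(path, comment_prefix=";"):
    """Read one word per line, drop blank and comment lines, and return a lowercase set."""
    words = set()
    # The encoding is an assumption; errors="ignore" silently drops undecodable bytes.
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        for line in f:
            word = line.strip().lower()  # remove '\n' and surrounding spaces
            if not word or word.startswith(comment_prefix):
                continue
            words.add(word)
    return words

# Demo: a temporary file stands in for positive.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("; header comment\ngood\n\nGreat\n")
    demo_path = tmp.name

demo_words = load_word_list(demo_path)
os.remove(demo_path)
print(sorted(demo_words))  # ['good', 'great']
```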

## Step 2: Text data preprocessing

Generate a function or functions to preprocess the 6 lists (i.e., 1. apple_pros, 2. apple_cons, 3. facebook_pros, 4. facebook_cons, 5. google_pros, and 6. google_cons) with the following steps:
1. Convert to lower case
2. Remove leading and trailing whitespace
3. Remove punctuation such as @ : . , ?
4. Replace the newline character \n with a space
5. Split each review into words

Explain how your functions work using Markdown cells.

In [38]:
def review_preprocess(reviews):
    # Steps 1-2 only: lowercase each review and strip leading/trailing whitespace.
    # Returns a new list, so the input list is not modified in place.
    return [review.lower().strip() for review in reviews]

review_preprocess(apple_pros[:10])

['fun ppl and exciting work days',
 'easy paced\ncustomer interactions\ntechnology based\nfun to work in\nfrequent breaks',
 "infinite complex work, never-ending series of problems to solve, never bored. great for the under achiever types, because you'll never actually finish anything.",
 'cool people and decent benefits',
 'best retail company to work for',
 "it's very awesome, indeed it is.",
 'flexible schedule great starting pay many perks diverse environment',
 'great at everything yeah oh yah',
 'it was pretty good time and staff was great.',
 'get to work with a lot of different people with different interests/expertises, healthy work life balance, discounts on stocks & other benefits']
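The cell above implements only the first two steps. A minimal sketch of all five steps, assuming "punctuation" means the ASCII characters in `string.punctuation` and that a review is split into words on whitespace; `preprocess_to_words` is a hypothetical name. It returns one flat word list per input list, so the number of words is simply `len()` of the result. Skipping non-string entries (such as `NaN` from empty review cells) is also an assumption about how missing values should be handled.

```python
import string

# Translation table that deletes every ASCII punctuation character (@ : . , ? etc.)
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def preprocess_to_words(reviews):
    """Lowercase, clean, and split a list of reviews into one flat list of words."""
    words = []
    for review in reviews:
        if not isinstance(review, str):      # skip NaN / missing reviews
            continue
        text = review.lower()                # 1. lower case
        text = text.strip()                  # 2. remove leading/trailing whitespace
        text = text.translate(_PUNCT_TABLE)  # 3. remove punctuation
        text = text.replace("\n", " ")       # 4. replace newlines with spaces
        words.extend(text.split())           # 5. split the review into words
    return words

sample = ["Easy paced\nCustomer interactions\n", "It's very awesome, indeed it is.\n", float("nan")]
sample_words = preprocess_to_words(sample)
print(sample_words)
print(len(sample_words))
```

Deleting punctuation turns "it's" into "its" and "never-ending" into "neverending", which may no longer match stop words or lexicon entries that keep the apostrophe or hyphen. Non-ASCII punctuation such as the curly apostrophe in "You’re" is not in `string.punctuation` and survives this step.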

Plug the 6 lists into your functions to get 6 preprocessed word lists.

Please print out the **number of words** in each word list.

In [None]:
print()
print()

print()
print()

print()
print()

## Step 3: Employee Satisfaction Score Calculation

### Goal: 
1. Calculate the **positive score** for each firm from their **pro reviews**
2. Calculate the **negative score** for each firm from their **con reviews**
3. Calculate **Employee Satisfaction Score** = positive score - negative score
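Written as formulas, using notation introduced here for firm $f$: $P_f$ is the positive word count in its pro reviews, $N_f$ the negative word count in its con reviews, and $T_f$ and $S_f$ the total and stop word counts of the corresponding review word list (stop words are excluded from the denominator, as in the score definitions later in this step):

```latex
\begin{aligned}
\text{PositiveScore}_f &= \frac{P_f}{T^{\text{pro}}_f - S^{\text{pro}}_f} \\
\text{NegativeScore}_f &= \frac{N_f}{T^{\text{con}}_f - S^{\text{con}}_f} \\
\text{ESS}_f &= \text{PositiveScore}_f - \text{NegativeScore}_f
\end{aligned}
```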

### Analyze Apple, Facebook, and Google's pro review word lists

Generate a function or functions to get the

**1. positive word count**, 
**2. stop word count**, 
**3. total word count** 

from each firm's **pro** review word list.

In [None]:
def get_pro_scores(preprocessed_reviews):
    '''
    Generate a function or functions to get 
    1. positive word count
    2. stop word count
    3. total word count
    from each firm's pro review word list
    
    You can combine all steps in one function or separate them into multiple functions
    '''

    return 

1. Use the functions to calculate the **1. positive word count**, **2. stop word count**, **3. total word count**
2. Calculate each firm's **Positive score = Positive word count/ (Total word count - Stop word count)**
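A minimal sketch of the counting and the score, assuming the reviews have already been preprocessed into a flat list of lowercase words and the word lists were loaded with trailing newlines stripped; `count_words` and `lexicon_score` are hypothetical names. Converting the word lists to sets makes each membership test constant time on average. Passing the negative word list instead of the positive one gives the con-review counts and negative score.

```python
def count_words(words, lexicon, stop_words):
    """Return (lexicon word count, stop word count, total word count) for a flat word list."""
    lexicon = set(lexicon)        # set lookups are O(1) on average
    stop_words = set(stop_words)
    lexicon_count = sum(1 for w in words if w in lexicon)
    stop_count = sum(1 for w in words if w in stop_words)
    return lexicon_count, stop_count, len(words)

def lexicon_score(lexicon_count, stop_count, total_count):
    """Score = lexicon word count / (total word count - stop word count)."""
    non_stop = total_count - stop_count
    if non_stop == 0:
        return 0.0  # hypothetical convention: no non-stop words means a score of 0
    return lexicon_count / non_stop

# Toy example with made-up words, not results from the Glassdoor data
toy_words = ["great", "people", "and", "great", "food", "the", "perks"]
pos, stop, total = count_words(toy_words, {"great", "perks"}, {"and", "the"})
print(pos, stop, total, lexicon_score(pos, stop, total))  # 3 2 7 0.6
```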

Print out Apple's 
1. Positive word count
2. Stop word count
3. Total word count 
4. Firm positive score

In [None]:
print ()
print ()
print ()
print ()

Print out Facebook's
1. Positive word count
2. Stop word count
3. Total word count 
4. Firm positive score

In [None]:
print ()
print ()
print ()
print ()

Print out Google's 
1. Positive word count
2. Stop word count
3. Total word count 
4. Firm positive score

In [None]:
print ()
print ()
print ()
print ()

### Analyze Apple, Facebook, and Google's con review word lists

Generate a function or functions to get the

**1. Negative word count**, 
**2. stop word count**, 
**3. total word count** 

from each firm's **con** review word list.

In [None]:
def get_con_scores(preprocessed_reviews):
    '''
    Generate a function or functions to get 
    1. negative word count
    2. stop word count
    3. total word count
    from each firm's con review word list
    
    You can combine all steps in one function or separate them into multiple functions
    '''
    return 

1. Use the function to calculate the **1. Negative word count**, **2. stop word count**, **3. total word count**
2. Calculate each firm's **Negative score = Negative word count/ (Total word count - Stop word count)**
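A worked example with made-up counts (not figures from the dataset) shows the effect of removing stop words from the denominator: suppose a firm's con reviews contain 200 words in total, 80 of them stop words and 12 of them negative words.

```latex
\text{Negative score} = \frac{12}{200 - 80} = \frac{12}{120} = 0.1
```

Dividing by all 200 words would give 0.06 instead, so excluding stop words makes the score the share of content words that are negative.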

Print out Apple's 
1. Negative word count
2. Stop word count
3. Total word count 
4. Firm negative score

In [None]:
print ()
print ()
print ()
print ()

Print out Facebook's 
1. Negative word count
2. Stop word count
3. Total word count 
4. Firm negative score

In [None]:
print ()
print ()
print ()
print ()

Print out Google's 
1. Negative word count
2. Stop word count
3. Total word count 
4. Firm negative score

In [None]:
print ()
print ()
print ()
print ()

### Calculate Apple, Facebook, and Google's employee satisfaction scores

Recall that Employee Satisfaction Score = Firm positive score - Firm negative score.
<br>Print out each firm's Employee Satisfaction Score.
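A minimal sketch that combines the two scores and ranks the firms, assuming each firm's positive and negative scores are stored in dictionaries keyed by firm name; `satisfaction_scores` is a hypothetical helper, and the firm names and numbers in the demo are placeholders, not results.

```python
def satisfaction_scores(positive_scores, negative_scores):
    """Return ({firm: positive - negative}, firms sorted from highest to lowest score)."""
    ess = {firm: positive_scores[firm] - negative_scores[firm] for firm in positive_scores}
    ranking = sorted(ess, key=ess.get, reverse=True)
    return ess, ranking

# Placeholder scores for illustration only
ess, ranking = satisfaction_scores(
    {"FirmA": 0.30, "FirmB": 0.25, "FirmC": 0.40},
    {"FirmA": 0.10, "FirmB": 0.15, "FirmC": 0.10},
)
for firm in ranking:
    print(f"{firm}: {ess[firm]:.3f}")
```

Sorting the firm names by their score (`key=ess.get`) puts the firm with the highest Employee Satisfaction Score first, which is the comparison the conclusion asks for.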

In [None]:
print ()
print ()
print ()

## Conclusion (Write your conclusion in the markdown cell below)

Which firm had better employee satisfaction during the COVID-19 pandemic?

## Limitation

Please list at least one limitation of this analysis