# Question: Does the company treat its employees well? 

This section relates to cultural aspects of the companies being examined, including the chief concern: how employees are treated. We also consider worker safety; fair pay and benefits; opportunities for development, training, and advancement; and other aspects that impact the company's workers.

Employees are the lifeblood of a company. Happy, healthy, and valued employees are more willing and able to do higher-quality work, be enthusiastic brand evangelists, unleash their creativity to invent better services and solutions, and innovate to improve the company. Employees should be viewed as valuable assets to invest in continually, not expendable "resources" or drags on the bottom line. (The "bottom line" is net profit -- a company's income after all expenses have been deducted from revenue.)

Companies that excel at engaging their employees actually achieve per-share earnings growth more than four times that of their rivals, according to Gallup. Compared to companies in the bottom quartile, the top-quartile companies (based on employee engagement) generate higher customer engagement, higher productivity, better retention, fewer accidents, and 21% higher profitability.

Even though Wall Street tends to cheer when companies lay off workers, high employee turnover is actually an expense to be avoided. Not only is it a financial cost -- think about severance packages, and the costs of recruiting and training new employees as well as retraining remaining workers -- but the loss of intellectual capital is also a poor outcome for employers.

Company websites and sustainability reports can help you assess this factor. Also look for publications from organizations that rate companies on worker treatment, such as Fortune's annual list of "100 Best Companies to Work For" and Forbes' "Just 100" list.

We also look for negative elements, such as poor treatment of employees; contentious relationships with unions; lawsuits or controversies about discrimination, harassment, or wage theft; and other behavior that indicates poor employee relations, like serial layoffs or constant restructuring. These kinds of red flags might disqualify a company from inclusion in our ESG portfolio.

# Data Sources

Possible Sites to Aggregate Employee Reviews:
- Glassdoor 
- Indeed
- Vault 
- CareerBliss
- Kununu
- JobAdvisor
- Ratemyemployer
- TheJobCrowd
- LookBeforeYouLeap
- Comparably
- Yelp

# Imports

In [109]:
# All necessary imports
from bs4 import BeautifulSoup
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
import time
import requests
import pandas as pd
pd.set_option("display.max_columns", 50)
import pickle
import newspaper
import re
from functools import reduce

# Data Collection

For the time being, the analysis focuses on Apple. Once the process works end to end for a single stock, we can rerun it for the other companies in the industry.

In [110]:
companies = ['Apple',
             'Best Buy Co Inc',
             'GameStop Corporation',
             'Sony Corporation']

## Indeed

In [128]:
def get_indeed_reviews(companies):
    
    # Create empty dataframe
    indeed_reviews = pd.DataFrame(columns=['company','work_life_balance','pay_and_benefits','job_security_and_advancement','management','culture','composite_score'])
    
    # Set path to chromedriver
    PATH = "/Users/MichaelWirtz/Desktop/pathfile/chromedriver_2"
    # Define options 
    options = Options()
    # Remove pop up window
    options.add_argument("--headless")
    # Define driver
    driver = webdriver.Chrome(PATH, options=options)
    # # Define driver
    # driver = webdriver.Chrome(PATH)
    driver.set_window_size(1080,800)
    # Define url
    url= "https://www.indeed.com/companies?from=gnav-acme--discovery-webapp"
    # Get website
    driver.get(url)
    
    for company in companies:
        try:
            # Find search bar
            search_bar = driver.find_element_by_xpath('//*[@id="exploreCompaniesWhat"]')
            # Clear search bar
            search_bar.clear()
            # Enter company name into search bar
            search_bar.send_keys(company)
            # Search company
            search_bar.send_keys(Keys.ENTER)
            search_bar.send_keys(Keys.ENTER)
            time.sleep(3)
            try:
                exit_button = driver.find_element_by_xpath('//*[@id="popover-x"]/button')
                exit_button.click()
            except:
                pass
            # Get company
            driver.find_element_by_xpath('//*[@id="cmp-discovery"]/div[2]/div/div[2]/div/div[1]/div[2]').click()
            time.sleep(2)
            # Define variables for employee ratings
            work_life_balance = float(driver.find_element_by_xpath('//*[@id="cmp-container"]/div/div[1]/main/div/div[2]/div[1]/div[2]/div[2]/div[1]/a/span[1]').text)
            pay_and_benefits = float(driver.find_element_by_xpath('//*[@id="cmp-container"]/div/div[1]/main/div/div[2]/div[1]/div[2]/div[2]/div[2]/a/span[1]').text)
            job_security_and_advancement = float(driver.find_element_by_xpath('//*[@id="cmp-container"]/div/div[1]/main/div/div[2]/div[1]/div[2]/div[2]/div[3]/a/span[1]').text)
            management = float(driver.find_element_by_xpath('//*[@id="cmp-container"]/div/div[1]/main/div/div[2]/div[1]/div[2]/div[2]/div[4]/a/span[1]').text)
            culture = float(driver.find_element_by_xpath('//*[@id="cmp-container"]/div/div[1]/main/div/div[2]/div[1]/div[2]/div[2]/div[5]/a/span[1]').text)
            composite_score = (work_life_balance+pay_and_benefits+job_security_and_advancement+management+culture) / 25

            # Append new data to dataframe
            indeed_reviews = indeed_reviews.append({'company': company,
                                                    'work_life_balance': work_life_balance,
                                                    'pay_and_benefits': pay_and_benefits,
                                                    'job_security_and_advancement': job_security_and_advancement,
                                                    'management': management,
                                                    'culture': culture,
                                                    'composite_score': composite_score}, ignore_index=True)
        
            time.sleep(1)
            driver.find_element_by_link_text('Company Reviews').click()
            time.sleep(2)
#             driver.back()
            time.sleep(1)
            
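        except Exception as err:
            # Skip this company but report why, rather than failing silently
            print(f"Indeed: could not collect ratings for {company}: {err}")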
    
    return indeed_reviews
    

In [129]:
indeed_reviews = get_indeed_reviews(companies)
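
The `composite_score` column expresses the five Indeed ratings (each out of 5) as a fraction of the maximum possible total, which is why the sum is divided by 25. A minimal sketch of that calculation follows; the helper name `composite_fraction` and its `scale_max` parameter are illustrative and not part of the notebook.

```python
def composite_fraction(ratings, scale_max=5.0):
    """Return the sum of the ratings as a fraction of the maximum possible sum."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / (scale_max * len(ratings))


# Apple's five Indeed ratings as they appear in the merged table in this notebook
apple = [3.8, 4.1, 3.7, 3.7, 4.1]
apple_score = composite_fraction(apple)
print(round(apple_score, 3))
```

Deriving the divisor from the number of ratings keeps the score between 0 and 1 even if a category is added or dropped later.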

## Vault

In [131]:
def get_vault_reviews(companies):
    
    vault_reviews = pd.DataFrame(columns=['company','uppers','downers'])
    
    # Set path to chromedriver
    PATH = "/Users/MichaelWirtz/Desktop/pathfile/chromedriver_2"
    # Define options 
    options = Options()
    # Remove pop up window
    options.add_argument("disable-infobars")
    options.add_argument("--headless")
    # Define driver
    driver = webdriver.Chrome(PATH, options=options)
    # # Define driver
    # driver = webdriver.Chrome(PATH)
    driver.set_window_size(1080,800)
    # Define url
    url= "https://www.vault.com/"
    # Get website
    driver.get(url)
    time.sleep(2)

    for company in companies:
        # Find search bar
        search_bar = driver.find_element_by_xpath('//*[@id="HeroSearchBox"]')
        # Clear search bar
        search_bar.clear()
        # Enter company name into search bar
        search_bar.send_keys(company)
        time.sleep(3)
        # Search company
        search_bar.send_keys(Keys.ENTER)
        time.sleep(4)
        # Find first company listed
        likely_company = driver.find_element_by_xpath('//*[@id="feed-article-1"]/div/h2/a')
        # Click on company
        likely_company.click()
        time.sleep(3)
        # Get company positives; remove the leading "Uppers" label (lstrip() strips a set of characters, not a prefix)
        uppers = re.sub(r'^Uppers\s*', '', driver.find_element_by_xpath('//*[@id="main-content"]/div[1]/div[2]/div[1]/div[1]').text.replace('\n',' ')).strip()
        # Get company negatives; remove the leading "Downers" label
        downers = re.sub(r'^Downers\s*', '', driver.find_element_by_xpath('//*[@id="main-content"]/div[1]/div[2]/div[1]/div[2]').text.replace('\n',' ')).strip()
        
        # Append new data to dataframe
        vault_reviews = vault_reviews.append({'company': company,
                                                'uppers': uppers,
                                                'downers': downers}, ignore_index=True)
        
        time.sleep(2)
        driver.back()
        time.sleep(1)
        driver.back()
        time.sleep(3)
    
    vault_reviews.uppers = vault_reviews.uppers.str.lower()
    vault_reviews.downers = vault_reviews.downers.str.lower()
        
    return vault_reviews   

In [132]:
vault_reviews = get_vault_reviews(companies)

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="main-content"]/div[1]/div[2]/div[1]/div[2]"}
  (Session info: headless chrome=88.0.4324.150)
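
The traceback above shows that a single missing element aborts the whole run. One way to keep going is to wrap the per-company scrape and record failures separately. The sketch below is generic: `scrape_one` stands in for the body of the loop in `get_vault_reviews`, and both it and `collect_reviews` are hypothetical names, not functions defined in this notebook.

```python
def collect_reviews(companies, scrape_one):
    """Call scrape_one(company) for each company, keeping successes and failures apart."""
    rows, failures = [], {}
    for company in companies:
        try:
            row = scrape_one(company)
        except Exception as err:  # e.g. Selenium's NoSuchElementException
            failures[company] = repr(err)
            continue
        rows.append({"company": company, **row})
    return rows, failures


# Stand-in scraper used only to demonstrate the control flow (no browser involved)
def fake_scrape(company):
    if company == "GameStop Corporation":
        raise LookupError("no such element")
    return {"uppers": "great brand", "downers": "long hours"}


rows, failures = collect_reviews(["Apple", "GameStop Corporation"], fake_scrape)
print(len(rows), list(failures))
```

The `failures` mapping makes it easy to see which companies need a different selector or a manual check, instead of losing the rows already collected.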


## CareerBliss

In [133]:
def get_careerbliss_reviews(companies):
    
    careerbliss_reviews = pd.DataFrame(columns=['company','company_culture','coworkers','rewards','way_you_work','growth_opp','person_you_work_for','support','work_setting'])
    
    # Set path to chromedriver
    PATH = "/Users/MichaelWirtz/Desktop/pathfile/chromedriver_2"
    # Define options 
    options = Options()
    # Remove pop up window
    options.add_argument("disable-infobars")
    options.add_argument("--headless")
    # Define driver
    driver = webdriver.Chrome(PATH, options=options)
    # # Define driver
    # driver = webdriver.Chrome(PATH)
    driver.set_window_size(1080,800)
    # Define url
    url= "https://www.careerbliss.com/reviews/"
    # Get website
    driver.get(url)
    time.sleep(2)
    for company in companies:
        # Find search bar
        search_bar = driver.find_element_by_xpath('//*[@id="search-q"]')
        # Clear search bar
        search_bar.clear()
        # Enter company name into search bar
        search_bar.send_keys(company)
        time.sleep(3)
        # Search company
        search_bar.send_keys(Keys.ENTER)
        time.sleep(4)
        try:
            # Find first company listed
            likely_company = driver.find_element_by_xpath('//*[@id="bodyContainer"]/div/div[1]/div[3]/div[2]/div[2]/div[2]/div/div/span/a')
            # Click on company
            likely_company.click()
        except:
            driver.refresh()
            time.sleep(3)
            # Find first company listed
            likely_company = driver.find_element_by_xpath('//*[@id="bodyContainer"]/div/div[1]/div[3]/div[2]/div[2]/div[2]/div/div/span/a')
            # Click on company
            likely_company.click()
        time.sleep(3)
        company_culture = float(driver.find_element_by_xpath('//*[@id="bodyContainer"]/div/div/div[6]/div[1]/div[1]/div/span[3]').text)
        coworkers = float(driver.find_element_by_xpath('//*[@id="bodyContainer"]/div/div/div[6]/div[2]/div[1]/div/span[3]').text)
        rewards = float(driver.find_element_by_xpath('//*[@id="bodyContainer"]/div/div/div[6]/div[3]/div[1]/div/span[3]').text)
        way_you_work = float(driver.find_element_by_xpath('//*[@id="bodyContainer"]/div/div/div[6]/div[4]/div[1]/div/span[3]').text)
        growth_opp = float(driver.find_element_by_xpath('//*[@id="bodyContainer"]/div/div/div[6]/div[1]/div[2]/div/span[3]').text)
        person_you_work_for = float(driver.find_element_by_xpath('//*[@id="bodyContainer"]/div/div/div[6]/div[2]/div[2]/div/span[3]').text)
        support = float(driver.find_element_by_xpath('//*[@id="bodyContainer"]/div/div/div[6]/div[3]/div[2]/div/span[3]').text)
        work_setting = float(driver.find_element_by_xpath('//*[@id="bodyContainer"]/div/div/div[6]/div[4]/div[2]/div/span[3]').text)
        
        time.sleep(2)
        driver.back()
        time.sleep(2)
        driver.back()
        time.sleep(2)
        
        careerbliss_reviews = careerbliss_reviews.append({'company': company,
                                                          'company_culture': company_culture,
                                                          'coworkers': coworkers,
                                                          'rewards': rewards,
                                                          'way_you_work': way_you_work,
                                                          'growth_opp': growth_opp,
                                                          'person_you_work_for': person_you_work_for,
                                                          'support': support,
                                                          'work_setting': work_setting}, ignore_index=True)
    
    return careerbliss_reviews

In [134]:
careerbliss_reviews = get_careerbliss_reviews(companies)
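
The eight rating XPaths in `get_careerbliss_reviews` differ only in two indices, `div[6]/div[i]/div[j]`, with `i` running from 1 to 4 and `j` from 1 to 2. Under that reading, the paths could be generated from a small lookup table instead of being written out by hand. This is a sketch only: `RATING_GRID` and `rating_xpath` are illustrative names, and the grid layout is inferred from the hard-coded paths above.

```python
# (i, j) positions in div[6]/div[i]/div[j] for each CareerBliss rating
RATING_GRID = {
    "company_culture": (1, 1),
    "coworkers": (2, 1),
    "rewards": (3, 1),
    "way_you_work": (4, 1),
    "growth_opp": (1, 2),
    "person_you_work_for": (2, 2),
    "support": (3, 2),
    "work_setting": (4, 2),
}


def rating_xpath(name):
    """Build the XPath for one rating from its grid position."""
    i, j = RATING_GRID[name]
    return f'//*[@id="bodyContainer"]/div/div/div[6]/div[{i}]/div[{j}]/div/span[3]'


print(rating_xpath("growth_opp"))
```

If the page layout changes, only the template string or the grid needs updating, rather than eight separate lines.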

## Comparably

In [136]:
def get_comparably_reviews(companies):
    
    comparably_reviews = pd.DataFrame(columns=['company','culture','ceo_score','net_promoter_scale','perks_and_benefits','outlook','executive_team','work_culture','compensation','leadership','diversity','team','happiness','environment','gender','manager','retention','meetings','professional_development','office_culture'])
    
    # Set path to chromedriver
    PATH = "/Users/MichaelWirtz/Desktop/pathfile/chromedriver_2"
    # Define options 
    options = Options()
    # Remove pop up window
    options.add_argument("disable-infobars")
    options.add_argument("--headless")
    # Define driver
    driver = webdriver.Chrome(PATH, options=options)
    # # Define driver
    # driver = webdriver.Chrome(PATH)
    driver.set_window_size(1080,800)
    # Define url
    url= "https://www.comparably.com/"
    # Get website
    driver.get(url)
    time.sleep(2)
    for company in companies:
        # Find search bar
        search_bar = driver.find_element_by_xpath('//*[@id="bodyContent"]/div[2]/div/div/div/div/form/div/div/input')
        # Clear search bar
        search_bar.clear()
        # Enter company name into search bar
        search_bar.send_keys(company)
        time.sleep(1)
        # Search company
        search_bar.send_keys(Keys.ENTER)
        time.sleep(4)
        # Find first company listed
        likely_company = driver.find_element_by_xpath('//*[@id="bodyContent"]/div[2]/div/div[2]/div[1]/div/div[2]/a[1]/div/div/div[1]')
        # Click on company
        likely_company.click()
        time.sleep(2)
        # Fall back to NaN (a float, so later arithmetic still works) when a score is missing
        try:
            culture = float(driver.find_element_by_xpath('//*[@id="bodyContent"]/div[2]/div[4]/div/div/div/div/div[1]/div[1]/a/div[2]/div[1]/div[2]/b').text)
        except Exception:
            culture = float('nan')
        try:
            ceo_score = float(driver.find_element_by_xpath('//*[@id="bodyContent"]/div[2]/div[4]/div/div/div/div/div[1]/div[3]/a/div[4]/span/span[1]/span').text)
        except Exception:
            ceo_score = float('nan')
        try:
            net_promoter_scale = float(driver.find_element_by_xpath('//*[@id="bodyContent"]/div[2]/div[4]/div/div/div/div/div[2]/div[2]/a[3]/div[2]/div/div[2]/div[1]').text)
        except Exception:
            net_promoter_scale = float('nan')
        time.sleep(2)
        driver.find_element_by_link_text('Culture').click()
        time.sleep(2)
        # Getting page content
        content = driver.page_source.encode('utf-8').strip()
        # Getting page content in html
        soup = BeautifulSoup(content,"html.parser")
        # perks_and_benefits = soup.find_all("a", href=re.compile("perks-and-benefits"))
        table = soup.find(class_="gs-row offset cppCultureGrades-Grades")
        ratings_squares = table.find_all(class_="gs-col gs-col-1-2")
        # Map each grade label on the page to its numeric score
        grades = {}
        for square in ratings_squares:
            label = square.find(class_='section-subtitle')
            score = square.find(class_='numberGrade-score')
            if label is not None and score is not None:
                grades[label.text.strip()] = float(score.text)
        # Missing grades become NaN instead of raising or reusing a previous company's value
        nan = float('nan')
        perks_and_benefits = grades.get('Perks And Benefits', nan)
        outlook = grades.get('Outlook', nan)
        executive_team = grades.get('Executive Team', nan)
        work_culture = grades.get('Work Culture', nan)
        compensation = grades.get('Compensation', nan)
        leadership = grades.get('Leadership', nan)
        diversity = grades.get('Diversity', nan)
        team = grades.get('Team', nan)
        happiness = grades.get('Happiness', nan)
        environment = grades.get('Environment', nan)
        gender = grades.get('Gender', nan)
        manager = grades.get('Manager', nan)
        retention = grades.get('Retention', nan)
        meetings = grades.get('Meetings', nan)
        professional_development = grades.get('Professional Development', nan)
        office_culture = grades.get('Office Culture', nan)
        # Click the header link once per company (not once per grade) before the next search
        time.sleep(3)
        driver.find_element_by_xpath('//*[@id="body"]/header/div/div/div[1]/div/a').click()
        time.sleep(2)
        
        comparably_reviews = comparably_reviews.append({'company': company,
                                                        'culture': culture,
                                                        'ceo_score': ceo_score,
                                                        'net_promoter_scale': net_promoter_scale,
                                                        'perks_and_benefits': perks_and_benefits,
                                                        'outlook': outlook,
                                                        'executive_team': executive_team,
                                                        'work_culture': work_culture,
                                                        'compensation': compensation,
                                                        'leadership': leadership,
                                                        'diversity': diversity,
                                                        'team': team,
                                                        'happiness': happiness,
                                                        'environment': environment,
                                                        'gender': gender,
                                                        'manager': manager,
                                                        'retention': retention,
                                                        'meetings': meetings,
                                                        'professional_development': professional_development,
                                                        'office_culture': office_culture}, ignore_index=True)

    return comparably_reviews

In [137]:
comparably_reviews = get_comparably_reviews(companies)
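
The Comparably grade labels ('Perks And Benefits', 'Office Culture', and so on) correspond one-to-one to the snake_case column names of `comparably_reviews`. Assuming that convention holds for every label, the column name can be derived from the label rather than listed by hand. The helper name `label_to_column` is hypothetical.

```python
import re


def label_to_column(label):
    """Convert a grade label such as 'Perks And Benefits' to 'perks_and_benefits'."""
    return re.sub(r"\s+", "_", label.strip().lower())


labels = ["Perks And Benefits", "Executive Team", "Professional Development", "Gender"]
columns = [label_to_column(label) for label in labels]
print(columns)
```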

# Merging Dataframes

In [15]:
data_frames = [indeed_reviews, vault_reviews, careerbliss_reviews, comparably_reviews]

In [16]:
question_1 = reduce(lambda  left,right: pd.merge(left,right,on=['company'],how='outer'), data_frames)

In [19]:
question_1

Unnamed: 0,company,work_life_balance,pay_and_benefits,job_security_and_advancement,management,culture_x,composite_score,uppers,downers,company_culture,coworkers,rewards,way_you_work,growth_opp,person_you_work_for,support,work_setting,culture_y,ceo_score,net_promoter_scale,perks_and_benefits,outlook,executive_team,work_culture,compensation,leadership,diversity,team,happiness,environment,gender,manager,retention,meetings,professional_development,office_culture
0,Apple,3.8,4.1,3.7,3.7,4.1,0.776,working for a top brand that helps influence p...,some employees say that promotion and finding ...,4.1,4.6,4.0,4.2,3.7,4.3,4.3,4.1,4.3,80.0,24.0,81.0,78.0,74.0,76.0,74.0,73.0,73.0,77.0,75.0,73.0,73.0,69.0,69.0,68.0,58.0,59.0
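
Both `indeed_reviews` and `comparably_reviews` contain a `culture` column, so `pd.merge` disambiguates them with its default suffixes; that is where `culture_x` and `culture_y` in the table above come from. The toy frames below reproduce the effect with three single-row frames that hold Apple's culture values from that table. The frame names are shortened for illustration.

```python
from functools import reduce

import pandas as pd

indeed = pd.DataFrame({"company": ["Apple"], "culture": [4.1]})
careerbliss = pd.DataFrame({"company": ["Apple"], "company_culture": [4.1]})
comparably = pd.DataFrame({"company": ["Apple"], "culture": [4.3]})

# Same reduce/merge pattern as above: outer-join every frame on 'company'
merged = reduce(
    lambda left, right: pd.merge(left, right, on="company", how="outer"),
    [indeed, careerbliss, comparably],
)
print(list(merged.columns))
```

The `suffixes` parameter of `pd.merge` can give the colliding columns more descriptive names than `_x` and `_y` if that makes the later steps easier to read.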


# Making Sense Out of Data

In [93]:
question_1['culture'] = ((question_1.culture_x *20) + 
                         (question_1.culture_y *20) + 
                         question_1.work_culture + 
                         question_1.office_culture + 
                         (question_1.company_culture * 20)) / 5
org_data = question_1.drop(columns=['culture_x',
                                    'culture_y',
                                    'work_culture',
                                    'office_culture',
                                    'company_culture',
                                    'composite_score'], axis=1)
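
The derived scores mix two scales: some inputs are ratings out of 5 and others are grades out of 100. Multiplying a 5-point rating by 20 puts it on the 100-point scale before averaging. The sketch below shows that arithmetic with Apple's culture values from the merged table; `to_percent` and `blended_score` are illustrative helper names, not part of the notebook.

```python
def to_percent(rating, scale_max=5.0):
    """Rescale a rating on a 0..scale_max scale to 0..100."""
    return rating * 100.0 / scale_max


def blended_score(five_point, hundred_point):
    """Average 5-point ratings (rescaled to 100) together with 100-point grades."""
    values = [to_percent(r) for r in five_point] + list(hundred_point)
    return sum(values) / len(values)


# culture_x, culture_y, company_culture (out of 5); work_culture, office_culture (out of 100)
culture = blended_score([4.1, 4.3, 4.1], [76.0, 59.0])
print(round(culture, 2))
```

Dividing by `len(values)` keeps the divisor in step with the number of inputs, which is easy to get wrong when the terms are written out by hand.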

In [94]:
org_data['opportunity'] = ((org_data.job_security_and_advancement * 20) + 
                           (org_data.growth_opp * 20) + 
                           org_data.professional_development) / 3
org_data.drop(columns = ['job_security_and_advancement',
                         'growth_opp',
                         'professional_development'], axis=1, inplace=True)

In [95]:
org_data['leadership_and_management'] = ((org_data.management*20) + 
                                         (org_data.person_you_work_for*20) + 
                                         org_data.ceo_score + 
                                         org_data.executive_team + 
                                         org_data.leadership + 
                                         org_data.manager) / 6
org_data.drop(columns=['management',
                       'person_you_work_for',
                       'ceo_score',
                       'executive_team',
                       'leadership',
                       'manager'], axis=1, inplace=True)

In [96]:
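# Five inputs, so divide by 5 (dividing by 4 produces the inflated 98.25 shown in the outputs below)
org_data['pay_perks_and_benefits'] = ((org_data.work_life_balance*20) +
                                      (org_data.pay_and_benefits*20) +
                                      org_data.perks_and_benefits +
                                      org_data.compensation +
                                      (org_data.rewards*20)) / 5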
org_data.drop(columns=['work_life_balance',
                       'pay_and_benefits',
                       'perks_and_benefits',
                       'compensation',
                       'rewards'], axis=1, inplace=True)

In [97]:
org_data['day_to_day'] = ((org_data.coworkers*20) + 
                          (org_data.way_you_work*20) + 
                          org_data.team + 
                          org_data.meetings + 
                          (org_data.work_setting*20)) / 5
org_data.drop(columns=['coworkers',
                       'way_you_work',
                       'team',
                       'meetings',
                       'work_setting'], axis=1, inplace=True)

In [98]:
org_data['mental_health'] = ((org_data.support*20) +
                             org_data.outlook + 
                             org_data.happiness + 
                             org_data.retention) / 4
org_data.drop(columns=['support',
                       'outlook',
                       'happiness',
                       'retention'], axis=1, inplace=True)

In [99]:
org_data['diversity_inclusion'] = (org_data.diversity + 
                                   org_data.gender) / 2
org_data.drop(columns = ['diversity','gender'], axis=1, inplace=True)

In [100]:
org_data.rename(columns={'net_promoter_scale': 'refer_to_friend'}, inplace=True)

In [102]:
org_data.drop(columns='uppers',axis=1, inplace=True)

In [103]:
org_data

Unnamed: 0,company,downers,refer_to_friend,environment,culture,opportunity,leadership_and_management,pay_perks_and_benefits,day_to_day,mental_health,diversity_inclusion
0,Apple,some employees say that promotion and finding ...,24.0,73.0,77.0,68.666667,76.0,98.25,80.6,77.0,73.0


In [106]:
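def check_downers_list(input_str):
    # Red-flag keywords to look for in the Vault "downers" text
    downers_list = {'discrimination',
                    'safety',
                    'diversity',
                    'training',
                    'development',
                    'fair'}
    # Extract lowercase words so trailing punctuation (e.g. "safety,") still matches
    words = set(re.findall(r'[a-z]+', input_str.lower()))
    # 1 if any keyword is present, otherwise 0
    return int(bool(words & downers_list))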

In [107]:
org_data.downers = org_data.downers.apply(check_downers_list)

In [108]:
org_data

Unnamed: 0,company,downers,refer_to_friend,environment,culture,opportunity,leadership_and_management,pay_perks_and_benefits,day_to_day,mental_health,diversity_inclusion
0,Apple,0,24.0,73.0,77.0,68.666667,76.0,98.25,80.6,77.0,73.0
