# Question: Does the company treat its employees well? 

This section relates to cultural aspects of the companies being examined, including the chief concern: how employees are treated. We also consider worker safety; fair pay and benefits; opportunities for development, training, and advancement; and other aspects that impact the company's workers.

Employees are the lifeblood of a company. Happy, healthy, and valued employees are more willing and able to do higher-quality work, be enthusiastic brand evangelists, unleash their creativity to invent better services and solutions, and innovate to improve the company. Employees should be viewed as valuable assets to invest in continually, not expendable "resources" or drags on the bottom line. (The "bottom line" is net profit -- a company's income after all expenses have been deducted from revenue.)

Companies that excel at engaging their employees actually achieve per-share earnings growth more than four times that of their rivals, according to Gallup. Compared to companies in the bottom quartile, the top-quartile companies (based on employee engagement) generate higher customer engagement, higher productivity, better retention, fewer accidents, and 21% higher profitability.

Even though Wall Street tends to cheer when companies lay off workers, high employee turnover is actually an expense to be avoided. Not only is it a financial cost -- think about severance packages, and the costs of recruiting and training new employees as well as retraining remaining workers -- but the loss of intellectual capital is also a poor outcome for employers.

Company websites and sustainability reports can help you assess this factor. Also look for publications from organizations that rate companies on worker treatment, such as Fortune's annual list of "100 Best Companies to Work For" and Forbes' "Just 100" list.

We also look for negative elements, such as poor treatment of employees; contentious relationships with unions; lawsuits or controversies involving discrimination, harassment, or wage theft; and other behavior that indicates poor employee relations, like serial layoffs or constant restructuring. These kinds of red flags might disqualify a company from inclusion in our ESG portfolio.

Source:  
https://www.fool.com/investing/2019/04/09/going-for-great-returns-and-the-greater-good-fools.aspx

# Data Sources

**Indeed**  

https://www.indeed.com/companies

Metrics Available:
- Work-Life Balance
- Pay & Benefits
- Job Security & Advancement
- Management
- Culture

**Comparably**  

https://www.comparably.com/companies

Metrics Available:
- Culture Score
- CEO Score
- Perks And Benefits
- Executive Team
- Outlook
- Work Culture
- Compensation
- Leadership
- Diversity
- Team
- Happiness
- Environment
- Gender
- Manager
- Retention
- Meetings
- Professional Development
- Office Culture


# Imports

In [2]:
# All necessary imports
from bs4 import BeautifulSoup
import requests

import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys

import numpy as np
import pandas as pd
pd.set_option("display.max_columns", 50)
pd.set_option("display.max_rows", 100)

import re
import time
import pickle

import warnings
warnings.simplefilter("ignore")

# Getting Companies

In [3]:
# Load in pickled list of companies and corresponding tickers
with open('./generated_data/companies_n_tickers.pickle','rb') as f:
    companies_n_tickers = pickle.load(f)

# Getting Employee Data 

In [6]:
'''
Indeed and Comparably may block traffic that looks automated, so the
scrapers pause between actions. The pause length (in seconds) is drawn
at random from `delays`. Note that it is drawn only once here, so every
pause in a run uses the same value.
'''
delays = [5, 2, 4, 7, 8, 9]
delay = np.random.choice(delays)
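Because `delay` is sampled once, every pause in a run has the same length. A minimal sketch of drawing a fresh pause before each action instead; the helper name `random_pause` and the uniform 2-9 second range (chosen to match the spread of `delays`) are assumptions, not part of the original notebook:

```python
import random
import time


def random_pause(low=2.0, high=9.0, sleep=time.sleep):
    """Sleep for a freshly drawn random number of seconds in [low, high].

    Hypothetical helper: returns the pause length so it can be logged.
    `sleep` can be swapped out (e.g. in tests) to avoid actually waiting.
    """
    seconds = random.uniform(low, high)
    sleep(seconds)
    return seconds
```

Each `time.sleep(delay)` call in the scrapers could then become `random_pause()`, so consecutive pauses differ.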

## Indeed

In [7]:
from selenium.webdriver.chrome.service import Service

def get_indeed_reviews(companies):
    
    '''
    This function takes in a list of (company, ticker) pairs and returns a
    data frame with all available Indeed ratings (each on a 1-5 scale)
    '''
    
    columns = ['company','work_life_balance','pay_and_benefits','job_security_and_advancement','management','culture','composite_score']
    # Collect one dict per company; DataFrame.append was removed in pandas 2.0
    rows = []
    
    # The five ratings appear in this order; the i-th one sits in div[i]
    metric_names = ['work_life_balance','pay_and_benefits','job_security_and_advancement','management','culture']
    rating_xpath = '//*[@id="cmp-container"]/div/div[1]/main/div/div[2]/div[1]/div[2]/div[2]/div[{}]/a/span[1]'
    
    # Set path to chromedriver
    PATH = "/Users/MichaelWirtz/Desktop/pathfile/chromedriver_2"
    options = Options()
    # Run Chrome headless (no visible browser window)
    options.add_argument("--headless")
    # Selenium 4 takes the driver path through a Service object
    driver = webdriver.Chrome(service=Service(PATH), options=options)
    driver.set_window_size(1080,800)
    driver.get("https://www.indeed.com/companies?from=gnav-acme--discovery-webapp")
    
    for company, _ticker in companies:
        try:
            # Find and clear the search bar
            search_bar = driver.find_element(By.XPATH, '//*[@id="exploreCompaniesWhat"]')
            search_bar.clear()
            time.sleep(delay)
            # Type the company name in two chunks to look less bot-like
            search_bar.send_keys(company[:3])
            time.sleep(delay)
            search_bar.send_keys(company[3:])
            time.sleep(delay)
            # Submit the search (Enter is sent twice, as in the original run)
            search_bar.send_keys(Keys.ENTER)
            search_bar.send_keys(Keys.ENTER)
            time.sleep(delay)
            try:
                # Open the first search result
                driver.find_element(By.XPATH, '//*[@id="cmp-discovery"]/div[2]/div/div[2]/div/div[1]/div[2]').click()
                time.sleep(delay)
            except Exception:
                # A pop-up blocked the click; close it
                time.sleep(delay)
                driver.find_element(By.XPATH, '//*[@id="popover-x"]/button').click()
            time.sleep(delay)
            
            # Read each rating; a missing rating is stored as the string 'NaN'
            row = {'company': company}
            for i, name in enumerate(metric_names, start=1):
                time.sleep(delay)
                try:
                    row[name] = float(driver.find_element(By.XPATH, rating_xpath.format(i)).text)
                except Exception:
                    row[name] = 'NaN'
            
            # Composite score: sum of the five 1-5 ratings divided by the
            # maximum possible total (25). 'NaN' if any rating is missing.
            try:
                row['composite_score'] = sum(row[name] for name in metric_names) / 25
            except TypeError:
                row['composite_score'] = 'NaN'
            rows.append(row)
            
            # Go back to the company search page
            time.sleep(delay)
            try:
                driver.find_element(By.LINK_TEXT, 'Company reviews').click()
            except Exception:
                time.sleep(12)
                driver.find_element(By.XPATH, '//*[@id="popover-x"]/button').click()
                time.sleep(delay)
                driver.find_element(By.LINK_TEXT, 'Company reviews').click()
            time.sleep(delay)
            
        except Exception:
            # Scraping failed for this company: skip it and return to the search page
            time.sleep(delay)
            driver.find_element(By.LINK_TEXT, 'Company reviews').click()
            time.sleep(delay)

        print(company)
    
    driver.quit()
    indeed_reviews = pd.DataFrame(rows, columns=columns)
    return indeed_reviews
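The composite score divides the sum of the five 1-5 ratings by 25, the largest possible total, so it falls between 0.2 and 1. A small standalone sketch of that arithmetic, checked against the 3M Company ratings in the Indeed output; the name `indeed_composite` is illustrative, and it returns a float NaN rather than the notebook's `'NaN'` string:

```python
import math


def indeed_composite(ratings):
    """Return the sum of five 1-5 ratings divided by 25 (the maximum total).

    Follows the notebook's rule: if any rating is missing (None, a float
    NaN, or the string 'NaN'), the composite is missing too (float NaN here).
    """
    total = 0.0
    for rating in ratings:
        if rating is None or rating == 'NaN' or (isinstance(rating, float) and math.isnan(rating)):
            return float('nan')
        total += rating
    return total / 25


# 3M Company: work-life balance, pay, job security, management, culture
score = indeed_composite([3.8, 3.9, 3.5, 3.5, 3.8])
print(round(score, 3))  # 0.74
```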
    

In [8]:
# Get Indeed reviews
indeed_reviews = get_indeed_reviews(companies_n_tickers)

3M Company
Abbott Laboratories
AbbVie Inc.
Abiomed
Accenture
Activision Blizzard
Adobe Inc.
Advanced Micro Devices
Advance Auto Parts
AES Corp
Aflac
Agilent Technologies
Air Products & Chemicals
Akamai Technologies
Alaska Air Group
Albemarle Corporation
Alexandria Real Estate Equities
Alexion Pharmaceuticals
Align Technology
Allegion
Alliant Energy
Allstate Corp
Altria Group Inc
Amazon.com Inc.
Amcor plc
Ameren Corp
American Airlines Group
American Electric Power
American Express
American International Group
American Tower Corp.
American Water Works
Ameriprise Financial
AmerisourceBergen
Ametek
Amgen Inc.
Amphenol Corp
Analog Devices, Inc.
ANSYS, Inc.
Anthem
Aon plc
A.O. Smith Corp
APA Corporation
Apple Inc.
Applied Materials Inc.
Aptiv PLC
Archer-Daniels-Midland Co
Arista Networks
Arthur J. Gallagher & Co.
Assurant
AT&T Inc.
Atmos Energy
Autodesk Inc.
Automatic Data Processing
AutoZone Inc
AvalonBay Communities
Avery Dennison Corp
Baker Hughes Co
Ball Corp
Bank of America Corp
The Ban

West Pharmaceutical Services
Western Digital
Western Union Co
WestRock
Weyerhaeuser
Whirlpool Corp.
Williams Companies
Willis Towers Watson
Wynn Resorts Ltd
Xcel Energy Inc
Xerox
Xilinx
Xylem Inc.
Yum! Brands Inc
Zebra Technologies
Zimmer Biomet
Zions Bancorp
Zoetis


In [9]:
# Checking Indeed reviews data frame
indeed_reviews

| | company | work_life_balance | pay_and_benefits | job_security_and_advancement | management | culture | composite_score |
|---|---|---|---|---|---|---|---|
| 0 | 3M Company | 3.8 | 3.9 | 3.5 | 3.5 | 3.8 | 0.74 |
| 1 | Abbott Laboratories | 3.8 | 3.9 | 3.4 | 3.5 | | |
| 2 | AbbVie Inc. | 3.8 | 4 | 3.4 | 3.5 | 3.7 | 0.736 |
| 3 | Abiomed | | | | 3.2 | 3.5 | |
| 4 | Accenture | 3.7 | 3.6 | 3.8 | 3.6 | 3.9 | 0.744 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 478 | Yum! Brands Inc | 3.4 | 3.4 | 3.3 | 3.4 | | |
| 479 | Zebra Technologies | | | | 3.3 | 3.5 | |
| 480 | Zimmer Biomet | 3.5 | 3.6 | 3.1 | 3.1 | | |
| 481 | Zions Bancorp | 3.6 | 3.2 | 3.1 | 3.2 | 3.4 | 0.66 |


## Comparably

In [12]:
from selenium.webdriver.chrome.service import Service

def get_comparably_reviews(companies):
    '''
    This function takes in a list of (company, ticker) pairs and returns a
    data frame with the available metrics from Comparably
    '''
    # Culture grades: section subtitle on the page -> column name
    grade_columns = {'Perks And Benefits': 'perks_and_benefits',
                     'Outlook': 'outlook',
                     'Executive Team': 'executive_team',
                     'Work Culture': 'work_culture',
                     'Compensation': 'compensation',
                     'Leadership': 'leadership',
                     'Diversity': 'diversity',
                     'Team': 'team',
                     'Happiness': 'happiness',
                     'Environment': 'environment',
                     'Gender': 'gender',
                     'Manager': 'manager',
                     'Retention': 'retention',
                     'Meetings': 'meetings',
                     'Professional Development': 'professional_development',
                     'Office Culture': 'office_culture'}
    columns = ['company','culture','ceo_score','net_promoter_scale'] + list(grade_columns.values())
    # Collect one dict per company; DataFrame.append was removed in pandas 2.0
    rows = []
    
    # Headline scores shown on the company overview page
    headline_xpaths = {'culture': '//*[@id="bodyContent"]/div[2]/div[4]/div/div/div/div/div[1]/div[1]/a/div[2]/div[1]/div[2]/b',
                       'ceo_score': '//*[@id="bodyContent"]/div[2]/div[4]/div/div/div/div/div[1]/div[3]/a/div[4]/span/span[1]/span',
                       'net_promoter_scale': '//*[@id="bodyContent"]/div[2]/div[4]/div/div/div/div/div[2]/div[2]/a[3]/div[2]/div/div[2]/div[1]'}
    
    # Set path to chromedriver
    PATH = "/Users/MichaelWirtz/Desktop/pathfile/chromedriver_2"
    options = Options()
    # Suppress Chrome's info bar and run headless (no visible browser window)
    options.add_argument("disable-infobars")
    options.add_argument("--headless")
    # Selenium 4 takes the driver path through a Service object
    driver = webdriver.Chrome(service=Service(PATH), options=options)
    driver.set_window_size(1080,800)
    driver.get("https://www.comparably.com/")
    time.sleep(delay)
    for company, _ticker in companies:
        # Every metric starts as 'NaN' for each company, so a value scraped
        # for the previous company can never carry over into this row
        row = {column: 'NaN' for column in columns}
        row['company'] = company
        try:
            # Find and clear the search bar
            search_bar = driver.find_element(By.XPATH, '//*[@id="bodyContent"]/div[2]/div/div/div/div/form/div/div/input')
            search_bar.clear()
            time.sleep(delay)
            # Type the company name in two chunks, then search
            search_bar.send_keys(company[:3])
            time.sleep(delay)
            search_bar.send_keys(company[3:])
            search_bar.send_keys(Keys.ENTER)
            time.sleep(delay)
            # Open the first company listed
            driver.find_element(By.XPATH, '//*[@id="bodyContent"]/div[2]/div/div[2]/div[1]/div/div[2]/a[1]/div/div/div[1]').click()
            time.sleep(delay)
            try:
                # Close the pop-up if one appears
                driver.find_element(By.XPATH, '/html/body/div[4]/a').click()
            except Exception:
                pass
            for column, xpath in headline_xpaths.items():
                try:
                    row[column] = float(driver.find_element(By.XPATH, xpath).text)
                except Exception:
                    pass  # leave as 'NaN'
            time.sleep(delay)
            try:
                # Open the Culture tab, which lists the individual grades
                try:
                    driver.find_element(By.LINK_TEXT, 'Culture').click()
                except Exception:
                    driver.find_element(By.XPATH, '//*[@id="bodyContent"]/div[2]/div[2]/div[3]/div/div/ul/li[2]/div/a').click()
                time.sleep(delay)
                soup = BeautifulSoup(driver.page_source, "html.parser")
                table = soup.find(class_="gs-row offset cppCultureGrades-Grades")
                # Match each grade square by its subtitle instead of assuming
                # exactly 18 squares in a fixed order
                for square in table.find_all(class_="gs-col gs-col-1-2"):
                    subtitle = square.find(class_='section-subtitle')
                    score = square.find(class_='numberGrade-score')
                    if subtitle is None or score is None:
                        continue
                    column = grade_columns.get(subtitle.text.strip())
                    if column is not None:
                        row[column] = float(score.text)
            except Exception:
                # No culture grades: keep the headline scores, grades stay 'NaN'
                pass
        except Exception:
            # Company not found: the row keeps 'NaN' for every metric
            pass

        rows.append(row)
        # Return to the home page through the header logo
        time.sleep(delay)
        driver.find_element(By.XPATH, '//*[@id="body"]/header/div/div/div[1]/div/a').click()
        time.sleep(delay)
        print(company)
    
    driver.quit()
    comparably_reviews = pd.DataFrame(rows, columns=columns)
    return comparably_reviews
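The fragile part of the Comparably scraper is the per-company bookkeeping: every grade has to start out missing for each company, or a value from the previous company can leak into the next row. A stdlib-only sketch of that step, assuming the page has already been reduced to (subtitle, score text) pairs; the helper `grades_to_row`, the trimmed column mapping, and the company names are illustrative:

```python
# Subtitles as they appear on a Comparably culture page -> column names
# (a subset of the notebook's 16 grades, for illustration)
GRADE_COLUMNS = {
    'Perks And Benefits': 'perks_and_benefits',
    'Compensation': 'compensation',
    'Happiness': 'happiness',
    'Retention': 'retention',
}


def grades_to_row(company, pairs, grade_columns=GRADE_COLUMNS):
    """Build one output row from (subtitle, score_text) pairs.

    Every column starts as 'NaN' (the notebook's missing-value marker), so a
    grade absent from this company's page cannot inherit a value from the
    previous company. Unknown subtitles are ignored.
    """
    row = {column: 'NaN' for column in grade_columns.values()}
    row['company'] = company
    for subtitle, score_text in pairs:
        column = grade_columns.get(subtitle.strip())
        if column is not None:
            row[column] = float(score_text)
    return row


first = grades_to_row('Company A', [('Compensation', '71'), ('Happiness', '65')])
second = grades_to_row('Company B', [('Retention', '58')])
print(second['compensation'])  # prints NaN (not 71.0)
```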

In [None]:
# Getting Comparably reviews
comparably_reviews = get_comparably_reviews(companies_n_tickers)

3M Company
Abbott Laboratories
AbbVie Inc.
Abiomed
Accenture
Activision Blizzard
Adobe Inc.
Advanced Micro Devices
Advance Auto Parts
AES Corp
Aflac
Agilent Technologies
Air Products & Chemicals
Akamai Technologies
Alaska Air Group
Albemarle Corporation
Alexandria Real Estate Equities
Alexion Pharmaceuticals
Align Technology
Allegion
Alliant Energy
Allstate Corp
Altria Group Inc
Amazon.com Inc.
Amcor plc
Ameren Corp
American Airlines Group
American Electric Power
American Express
American International Group
American Tower Corp.
American Water Works
Ameriprise Financial
AmerisourceBergen
Ametek
Amgen Inc.
Amphenol Corp
Analog Devices, Inc.
ANSYS, Inc.
Anthem
Aon plc
A.O. Smith Corp
APA Corporation
Apple Inc.
Applied Materials Inc.
Aptiv PLC
Archer-Daniels-Midland Co
Arista Networks
Arthur J. Gallagher & Co.
Assurant
AT&T Inc.
Atmos Energy
Autodesk Inc.
Automatic Data Processing
AutoZone Inc
AvalonBay Communities
Avery Dennison Corp
Baker Hughes Co
Ball Corp
Bank of America Corp
The Ban

In [None]:
# Checking Comparably reviews
comparably_reviews

# Making Final Composite Metric

In [188]:
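from functools import reduce

# List of data frames to merge
data_frames = [indeed_reviews, comparably_reviews]

# Outer-merge on company name so companies found by only one source are kept.
# Both sources have a 'culture' column, so pandas renames them
# culture_x (Indeed) and culture_y (Comparably).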
data = reduce(lambda  left,right: pd.merge(left,right,on=['company'],how='outer'), data_frames)
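An outer merge keeps every company that appears in either source, with two side effects the later cells depend on: the shared `culture` column is split into `culture_x` and `culture_y`, and companies missing from one source get real `NaN` values rather than the `'NaN'` strings the scrapers write. A toy illustration with made-up scores (company names `A`, `B`, `C` are placeholders):

```python
from functools import reduce

import pandas as pd

# Toy frames with made-up numbers, shaped like the two scraper outputs
indeed = pd.DataFrame({'company': ['A', 'B'], 'culture': [3.8, 'NaN']})
comparably = pd.DataFrame({'company': ['A', 'C'], 'culture': [72.0, 65.0]})

merged = reduce(lambda left, right: pd.merge(left, right, on=['company'], how='outer'),
                [indeed, comparably])
print(merged)
```

Company `B` ends up with the string `'NaN'` in `culture_x` but a real `NaN` in `culture_y`, so any missing-value check downstream has to recognize both forms.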

In [190]:
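def combine_employee_metrics(df_row, input_list):
    '''
    Averages the metrics in `input_list` that are available for one company,
    after putting them on a common 0-100 scale: values of 5 or less are
    treated as 1-5 star ratings and multiplied by 20; larger values are
    assumed to already be 0-100 scores. Returns 'NaN' if none are available.
    '''
    # A metric is missing if it is the scrapers' 'NaN' string or a real NaN
    # (the outer merge fills companies absent from one source with NaN)
    available_metrics = [i for i in input_list
                         if not (pd.isna(df_row[i]) or df_row[i] == 'NaN')]
    
    total = 0
    for i in available_metrics:
        if df_row[i] > 5:
            total += df_row[i]
        else:
            total += df_row[i] * 20
            
    if len(available_metrics) > 0:
        metric = total / len(available_metrics)
    else:
        metric = 'NaN'
    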
    return metric
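The rescaling rule can be checked in isolation. A stdlib sketch of the same logic on a plain list of values, skipping both the `'NaN'` marker and real NaN; the names `is_missing` and `combine_scores` are illustrative:

```python
import math


def is_missing(value):
    """True for None, a float NaN, or the notebook's 'NaN' string marker."""
    return value is None or value == 'NaN' or (isinstance(value, float) and math.isnan(value))


def combine_scores(values):
    """Average the available values on a 0-100 scale; NaN if none are available."""
    rescaled = []
    for value in values:
        if is_missing(value):
            continue
        # Values of 5 or less are read as 1-5 stars; larger values as 0-100 scores
        rescaled.append(value if value > 5 else value * 20)
    if not rescaled:
        return float('nan')
    return sum(rescaled) / len(rescaled)


# A 3.8-star rating (76 on the 0-100 scale) and an 84/100 score
print(combine_scores([3.8, 'NaN', float('nan'), 84]))  # 80.0
```

The threshold of 5 is a heuristic: a genuine 0-100 score of 5 or lower would be misread as a star rating and inflated twentyfold.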

In [191]:
culture_list = ['culture_x','culture_y','office_culture','work_culture','meetings','team']
opportunity_list = ['job_security_and_advancement','professional_development']
leadership_and_management_list = ['management','ceo_score','executive_team','leadership','manager']
pay_perks_and_benefits_list = ['work_life_balance','pay_and_benefits','compensation']
employee_happiness_list = ['outlook','happiness','retention','diversity','gender','environment']

# New category column -> source metrics averaged into it
df_columns = [('company_culture',culture_list),
              ('company_opportunity',opportunity_list),
              ('company_benefits_and_perks',pay_perks_and_benefits_list),
              ('company_executive_team',leadership_and_management_list),
              ('company_employee_treatment',employee_happiness_list)]

# Compute each category row by row and assign the whole column at once.
# This avoids chained assignment (data[column][i] = ...), which pandas warns
# about and which does not update `data` under copy-on-write.
for column, column_list in df_columns:
    data[column] = [combine_employee_metrics(data.iloc[i], column_list) for i in range(len(data))]
        
data = data[['company','company_culture','company_opportunity','company_benefits_and_perks','company_executive_team','company_employee_treatment']]

In [193]:
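def overall_employee_metric(df_row):
    '''
    Averages the category scores (each on a 0-100 scale) that are available
    for one company. Returns 'NaN' if no category is available.
    '''
    metric_columns = ['company_culture',
                      'company_opportunity',
                      'company_benefits_and_perks',
                      'company_executive_team',
                      'company_employee_treatment']
    
    # Compare with == rather than `is not`: identity checks on strings are
    # unreliable. Real NaN values are skipped too.
    available = [df_row[column] for column in metric_columns
                 if not (pd.isna(df_row[column]) or df_row[column] == 'NaN')]
    
    if len(available) == 0:
        final_metric = 'NaN'
    else:
        final_metric = sum(available) / len(available)
        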
    return final_metric
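The final metric is a plain average over whichever category scores exist for a company, so the denominator shrinks when a category is missing. Writing $A$ for the set of available categories and $s_c$ for the 0-100 score of category $c$:

$$\text{final\_metric} = \frac{1}{|A|}\sum_{c \in A} s_c$$

For American Water Works in the final table, employee treatment is missing, so $|A| = 4$:

$$\frac{84 + 84 + 82 + 84}{4} = \frac{334}{4} = 83.5$$

One consequence of this design is that a company scored on only a few categories competes on equal footing with one scored on all five: Biogen reaches 75.0 from just two categories (66 and 84).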

In [194]:
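# Compute the overall metric row by row; rows where it cannot be computed get 'NaN'.
# Building a list and assigning it once avoids chained assignment.
final_metrics = []
for _, row in data.iterrows():
    try:
        final_metrics.append(overall_employee_metric(row))
    except (TypeError, ZeroDivisionError):
        final_metrics.append('NaN')
data['final_metric'] = final_metrics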

In [197]:
# Give companies without a final metric a score of 0 so they sort to the bottom
data.final_metric = data.final_metric.apply(lambda x: 0 if (pd.isna(x) or x == 'NaN') else x) 

In [198]:
data = data.sort_values(by='final_metric', ascending=False)
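Replacing missing metrics with 0 pushes them to the bottom of the ranking, but it also makes them indistinguishable from real zeros. An alternative sketch on toy data (company names and scores are placeholders): convert the `'NaN'` strings to real missing values with `pd.to_numeric(..., errors='coerce')`, then let `sort_values` place them last through its `na_position` argument:

```python
import pandas as pd

scores = pd.DataFrame({'company': ['A', 'B', 'C'],
                       'final_metric': [75.0, 'NaN', 83.5]})

# Turn the 'NaN' marker strings into real missing values (float column)
scores['final_metric'] = pd.to_numeric(scores['final_metric'], errors='coerce')

# Highest score first; companies without a score go to the end and stay NaN
ranked = scores.sort_values(by='final_metric', ascending=False, na_position='last')
print(ranked['company'].tolist())  # ['C', 'A', 'B']
```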

In [204]:
data[:20]

| | company | company_culture | company_opportunity | company_benefits_and_perks | company_executive_team | company_employee_treatment | final_metric |
|---|---|---|---|---|---|---|---|
| 32 | American Water Works | 84.0 | 84 | 82.0 | 84.0 | | 83.5 |
| 38 | Analog Devices, Inc. | 84.0 | 78 | 83.0 | 78.0 | | 80.75 |
| 44 | Apple Inc. | 82.0 | 74 | 79.0 | 74.0 | | 77.25 |
| 29 | American Express | 80.0 | 72 | 80.0 | 74.0 | | 76.5 |
| 53 | Autodesk Inc. | 78.0 | 66 | 82.0 | | | 75.333333 |
| 23 | Altria Group Inc | 76.0 | 70 | 83.0 | 72.0 | | 75.25 |
| 67 | Biogen Inc. | | 66 | 84.0 | | | 75.0 |
| 64 | Berkshire Hathaway | 78.0 | 70 | | 76.0 | | 74.666667 |
| 48 | Arista Networks | 80.0 | 70 | 77.0 | 70.0 | | 74.25 |
| 4 | Accenture | 78.0 | 76 | 73.0 | 72.0 | | 74.0 |
4,Accenture,78.0,76,73.0,72.0,,74.0


# EXTRA

## Glassdoor Scraper

In [165]:
# # Set path to chromedriver
# PATH = "/Users/MichaelWirtz/Desktop/pathfile/chromedriver_2"
# # Define options 
# options = Options()
# # Make headless
# options.add_argument("--headless")
# # Define driver
# driver = webdriver.Chrome(PATH, options=options)
# # Set headless window size
# driver.set_window_size(1080,800)
# # Define url
# url= "https://www.glassdoor.com/index.htm"
# # Get website
# driver.get(url)
# # Random time delay
# time.sleep(delay)
# # Signing into glassdoor with google
# driver.find_element_by_xpath('//*[@id="InlineLoginModule"]/div/div/div/div/div/div[1]/div/div[2]/button/span[2]').click()
# # Random time delay
# time.sleep(delay)
# # Switching to google signin popup window
# driver.switch_to.window(driver.window_handles[-1])
# # Random time delay
# time.sleep(delay)
# # Input email 
# email_input = driver.find_element_by_xpath('//*[@id="identifierId"]')
# email_input.send_keys('xxxxxxxxxxx@gmail.com')
# # Random time delay
# time.sleep(delay)
# # Clicking next
# driver.find_element_by_xpath('//*[@id="identifierNext"]/div/button/div[2]').click()
# # Random time delay
# time.sleep(delay)
# # Inputting password
# password = driver.find_element_by_xpath('//*[@id="password"]/div[1]/div/div[1]/input')
# password.send_keys('xxxxxxxxx')
# # Random time delay
# time.sleep(delay)
# # Clicking next
# driver.find_element_by_xpath('//*[@id="passwordNext"]/div/button/div[2]').click()
# # Random time delay
# time.sleep(15)
# # Switching back to glassdoor window
# driver.switch_to.window(driver.window_handles[-1])
# # Random time delay
# time.sleep(delay)
# # Clicking companies for targeted search
# driver.find_element_by_xpath('//*[@id="SiteNav"]/nav[2]/div/div/div[2]/div[2]/div[1]/div/a/div/h3').click()
# # Random time delay
# time.sleep(delay)
# # Entering company into search bar
# search_bar = driver.find_element_by_xpath('//*[@id="sc.keyword"]')
# search_bar.send_keys('Apple Inc')
# # Random time delay
# time.sleep(delay)
# # Searching company
# search_bar.send_keys(Keys.ENTER)
# # Random time delay
# time.sleep(delay)
# # Clicking on searched company
# search_result = driver.find_element_by_xpath('//*[@id="MainCol"]/div/div[1]/div/div[1]/div/div[2]/h2/a')
# search_result.click()
# # Random time delay
# time.sleep(delay)
# # Clicking on reviews
# driver.find_element_by_xpath('//*[@id="EIProductHeaders"]/div/a[1]/span[2]').click()

## Career Bliss Scraper

In [260]:
# def get_careerbliss_reviews(companies):
    
#     # Collect one dict per company and build the DataFrame once at the end
#     # (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0)
#     rows = []
#     columns = ['company','company_culture','coworkers','rewards','way_you_work','growth_opp','person_you_work_for','support','work_setting']
#     # Position (column div, row div) of each rating in the ratings panel of a company page
#     rating_positions = {'company_culture': (1, 1), 'coworkers': (2, 1), 'rewards': (3, 1), 'way_you_work': (4, 1),
#                         'growth_opp': (1, 2), 'person_you_work_for': (2, 2), 'support': (3, 2), 'work_setting': (4, 2)}
#     rating_xpath = '//*[@id="bodyContainer"]/div/div/div[6]/div[{}]/div[{}]/div/span[3]'
#     # XPath of the first company in the search results
#     first_result_xpath = '//*[@id="bodyContainer"]/div/div[1]/div[3]/div[2]/div[2]/div[2]/div/div/span/a'
    
#     # Set path to chromedriver
#     PATH = "/Users/MichaelWirtz/Desktop/pathfile/chromedriver_2"
#     # Define options 
#     options = Options()
#     # Hide Chrome's "controlled by automated test software" infobar
#     options.add_argument("disable-infobars")
#     # Run Chrome without a visible window
#     options.add_argument("--headless")
#     # Define driver
#     driver = webdriver.Chrome(PATH, options=options)
#     driver.set_window_size(1080,800)
#     # Define url
#     url = "https://www.careerbliss.com/reviews/"
#     # Get website
#     driver.get(url)
#     time.sleep(2)
#     for company in companies:
#         # Start every company at NaN so a failed lookup never reuses the previous company's ratings
#         row = {'company': company}
#         row.update({name: float('nan') for name in rating_positions})
#         try:
#             # Find search bar
#             search_bar = driver.find_element_by_xpath('//*[@id="search-q"]')
#             # Clear search bar
#             search_bar.clear()
#             # Enter company name into search bar
#             search_bar.send_keys(company)
#             time.sleep(3)
#             # Search company
#             search_bar.send_keys(Keys.ENTER)
#             time.sleep(4)
#             try:
#                 # Click on the first company listed
#                 driver.find_element_by_xpath(first_result_xpath).click()
#             except Exception:
#                 # Results sometimes fail to render; reload once and retry
#                 driver.refresh()
#                 time.sleep(3)
#                 driver.find_element_by_xpath(first_result_xpath).click()
#             time.sleep(3)
#             # Read each rating from its position in the ratings panel
#             for name, (col, pos) in rating_positions.items():
#                 row[name] = float(driver.find_element_by_xpath(rating_xpath.format(col, pos)).text)
#         except Exception:
#             # Leave the remaining ratings as NaN and move on to the next company
#             pass
#         time.sleep(2)
#         driver.back()
#         time.sleep(2)
#         driver.back()
#         time.sleep(2)
        
#         # Record this company's ratings
#         rows.append(row)
    
#     # Close the browser once all companies are scraped
#     driver.quit()
#     return pd.DataFrame(rows, columns=columns)
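Because the scraper needs a live browser, its failure handling is hard to check directly. As a rough offline sketch, the loop below replaces the Selenium lookups with a hypothetical `fetch_rating(company, field)` callable (an assumption, not part of this notebook) and keeps the two ideas that matter: each company starts with NaN ratings, and rows are collected in a list before a single `pd.DataFrame` call. The company names and the rating `'4.2'` are made-up sample inputs, not CareerBliss data.

```python
import math

import pandas as pd

# Rating fields scraped from a CareerBliss company page (same names as the notebook's columns)
FIELDS = ['company_culture', 'coworkers', 'rewards', 'way_you_work',
          'growth_opp', 'person_you_work_for', 'support', 'work_setting']


def collect_ratings(companies, fetch_rating):
    """Build one row per company; a failed lookup leaves the remaining ratings as NaN.

    fetch_rating(company, field) is a hypothetical stand-in for the Selenium lookups:
    it returns the rating text or raises if the element cannot be found.
    """
    rows = []
    for company in companies:
        # Fresh NaN defaults each iteration, so values never carry over from the previous company
        row = {'company': company, **{field: math.nan for field in FIELDS}}
        try:
            for field in FIELDS:
                row[field] = float(fetch_rating(company, field))
        except Exception:
            pass  # keep whatever was read before the failure; the rest stay NaN
        rows.append(row)
    # Build the DataFrame once instead of calling DataFrame.append inside the loop
    return pd.DataFrame(rows, columns=['company'] + FIELDS)


def fake_fetch(company, field):
    # Hypothetical data: "Acme" has every rating, "Globex" has no review page
    if company == 'Globex':
        raise LookupError('no review page')
    return '4.2'


ratings = collect_ratings(['Acme', 'Globex'], fake_fetch)
print(ratings[['company', 'rewards']])
```

Collecting plain dicts and converting once also avoids the quadratic copying that repeated `DataFrame.append` calls caused in older pandas versions.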

## Vault

# def get_vault_reviews(companies):
    
#     # Collect one dict per company and build the DataFrame once at the end
#     # (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0)
#     rows = []
    
#     # Set path to chromedriver
#     PATH = "/Users/MichaelWirtz/Desktop/pathfile/chromedriver_2"
#     # Define options 
#     options = Options()
#     # Hide Chrome's "controlled by automated test software" infobar
#     options.add_argument("disable-infobars")
# #     options.add_argument("--headless")
#     # Define driver
#     driver = webdriver.Chrome(PATH, options=options)
#     driver.set_window_size(1080,800)
#     # Define url
#     url = "https://www.vault.com/"
#     # Get website
#     driver.get(url)
#     time.sleep(2)

#     for company in companies:
#         # Stay None if the lookup below fails for this company
#         uppers, downers = None, None
#         try:
#             # Find search bar
#             search_bar = driver.find_element_by_xpath('//*[@id="HeroSearchBox"]')
#             # Clear search bar
#             search_bar.clear()
#             # Enter company name into search bar
#             search_bar.send_keys(company)
#             time.sleep(3)
#             # Search company
#             search_bar.send_keys(Keys.ENTER)
#             time.sleep(4)
#             # Click on the first company listed
#             driver.find_element_by_xpath('//*[@id="feed-article-1"]/div/h2/a').click()
#             time.sleep(3)
#             # Get company positives; removeprefix (Python 3.9+) drops the literal "Uppers" label,
#             # whereas lstrip('Uppers') strips any leading run of the characters U, p, e, r, s
#             uppers = driver.find_element_by_xpath('//*[@id="main-content"]/div[1]/div[2]/div[1]/div[1]').text.replace('\n',' ').removeprefix('Uppers').strip()
#             # Get company negatives
#             downers = driver.find_element_by_xpath('//*[@id="main-content"]/div[1]/div[2]/div[1]/div[2]').text.replace('\n',' ').removeprefix('Downers').strip()
#         except Exception:
#             # Skip companies whose page or sections can't be found instead of aborting the whole run
#             pass
        
#         # Record this company's data
#         rows.append({'company': company, 'uppers': uppers, 'downers': downers})
        
#         time.sleep(2)
#         driver.back()
#         time.sleep(1)
#         driver.back()
#         time.sleep(3)
    
#     # Close the browser once all companies are scraped
#     driver.quit()
#     vault_reviews = pd.DataFrame(rows, columns=['company','uppers','downers'])
#     # Lowercase the review text; .str.lower() leaves missing values missing
#     vault_reviews['uppers'] = vault_reviews['uppers'].str.lower()
#     vault_reviews['downers'] = vault_reviews['downers'].str.lower()
        
#     return vault_reviews
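The Vault cleanup step is easy to get subtly wrong, because `str.lstrip` removes any leading run of the characters in its argument rather than a literal prefix. The hypothetical `clean_section` helper below is a minimal sketch of the same cleanup (join lines, drop the label, trim, lowercase) using `str.removeprefix` (Python 3.9+); the input strings are invented examples, not scraped reviews.

```python
def clean_section(text, label):
    """Flatten a scraped Vault section and drop its leading label, e.g. 'Uppers' or 'Downers'."""
    # Join lines, remove the exact label prefix (Python 3.9+), trim whitespace, and lowercase
    return text.replace('\n', ' ').removeprefix(label).strip().lower()


# lstrip treats its argument as a set of characters, not a prefix
print("Uppersresources are great".lstrip("Uppers"))          # ources are great
print(clean_section("Uppersresources are great", "Uppers"))  # resources are great
print(clean_section("Downers\nLong hours", "Downers"))       # long hours
```

When the label is absent, `removeprefix` returns the text unchanged, so sections without a label pass through untouched.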