# Washington Post Newspaper Guild Pay Study 2022


This is the study of Washington Post Guild members' salaries based on data turned over by management of The Washington Post on May 28, 2021, pursuant to a request by members of the Guild. Management turned over two Excel files: one file detailing the salaries of current guild members working for The Post (as of the date of transmission) and one file detailing the salaries of past guild members who worked for The Post and have left the organization in the past six years.

What follows is an attempt to understand pay at The Washington Post. No individual analysis should be taken on its own to mean that disparities in pay do or do not exist. This study will start with summary analysis of trends and will dive deeper as the study goes on. 

The only data manipulation done prior to analysis was taking the data out of Excel and putting the files into CSV files, converting dates from 'MM/DD/YYYY' to 'YYYY-MM-DD' and removing commas from monetary columns where values exceeded 1,000.

## Importing data

In [1]:
from pathlib import Path

import re
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.iolib.summary2 import summary_col
from linearmodels.iv import IV2SLS
import seaborn as sns

pd.options.display.max_columns = None

pd.set_option('display.float_format', lambda x: '%.2f' % x)

In [2]:
BASEDIR = Path.cwd()
CSVPATH = BASEDIR.joinpath('csvs')

In [3]:
active_wd_schema = {
    'department': str,
    'employee_id': str,
    'gender': str,
    'race_ethnicity': str,
    'education': str,
    'military_status': str,
    'date_of_birth': str,
    'original_hire_date': str,
    'hire_date': str,
    'pay_rate_type': str,
    'current_base_pay': np.float64,
    'job_profile_current': str,
    'time_type_current': str,
    'cost_center_current': str,
    'effective_date1': str,
    'business_process_type1': str,
    'business_process_reason1': str,
    'pay_rate_type1': str,
    'base_pay_change1': np.float64,
    'job_profile1': str,
    'time_type1': str,
    'cost_center1': str,
    'effective_date2': str,
    'business_process_type2': str,
    'business_process_reason2': str,
    'pay_rate_type2': str,
    'base_pay_change2': np.float64,
    'job_profile2': str,
    'time_type2': str,
    'cost_center2': str,
    'effective_date3': str,
    'business_process_type3': str,
    'business_process_reason3': str,
    'pay_rate_type3': str,
    'base_pay_change3': np.float64,
    'job_profile3': str,
    'time_type3': str,
    'cost_center3': str,
    'effective_date4': str,
    'business_process_type4': str,
    'business_process_reason4': str,
    'pay_rate_type4': str,
    'base_pay_change4': np.float64,
    'job_profile4': str,
    'time_type4': str,
    'cost_center4': str,
    'effective_date5': str,
    'business_process_type5': str,
    'business_process_reason5': str,
    'pay_rate_type5': str,
    'base_pay_change5': np.float64,
    'job_profile5': str,
    'time_type5': str,
    'cost_center5': str,
    'effective_date6': str,
    'business_process_type6': str,
    'business_process_reason6': str,
    'pay_rate_type6': str,
    'base_pay_change6': np.float64,
    'job_profile6': str,
    'time_type6': str,
    'cost_center6': str,
    'effective_date7': str,
    'business_process_type7': str,
    'business_process_reason7': str,
    'pay_rate_type7': str,
    'base_pay_change7': np.float64,
    'job_profile7': str,
    'time_type7': str,
    'cost_center7': str,
    'effective_date8': str,
    'business_process_type8': str,
    'business_process_reason8': str,
    'pay_rate_type8': str,
    'base_pay_change8': np.float64,
    'job_profile8': str,
    'time_type8': str,
    'cost_center8': str,
    'effective_date9': str,
    'business_process_type9': str,
    'business_process_reason9': str,
    'pay_rate_type9': str,
    'base_pay_change9': np.float64,
    'job_profile9': str,
    'time_type9': str,
    'cost_center9': str,
    'effective_date10': str,
    'business_process_type10': str,
    'business_process_reason10': str,
    'pay_rate_type10': str,
    'base_pay_change10': np.float64,
    'job_profile10': str,
    'time_type10': str,
    'cost_center10': str,
    'effective_date11': str,
    'business_process_type11': str,
    'business_process_reason11': str,
    'pay_rate_type11': str,
    'base_pay_change11': np.float64,
    'job_profile11': str,
    'time_type11': str,
    'cost_center11': str,
    'effective_date12': str,
    'business_process_type12': str,
    'business_process_reason12': str,
    'pay_rate_type12': str,
    'base_pay_change12': np.float64,
    'job_profile12': str,
    'time_type12': str,
    'cost_center12': str,
    'effective_date13': str,
    'business_process_type13': str,
    'business_process_reason13': str,
    'pay_rate_type13': str,
    'base_pay_change13': np.float64,
    'job_profile13': str,
    'time_type13': str,
    'cost_center13': str,
    'effective_date14': str,
    'business_process_type14': str,
    'business_process_reason14': str,
    'pay_rate_type14': str,
    'base_pay_change14': np.float64,
    'job_profile14': str,
    'time_type14': str,
    'cost_center14': str,
    'effective_date15': str,
    'business_process_type15': str,
    'business_process_reason15': str,
    'pay_rate_type15': str,
    'base_pay_change15': np.float64,
    'job_profile15': str,
    'time_type15': str,
    'cost_center15': str,
    'effective_date16': str,
    'business_process_type16': str,
    'business_process_reason16': str,
    'pay_rate_type16': str,
    'base_pay_change16': np.float64,
    'job_profile16': str,
    'time_type16': str,
    'cost_center16': str,
    'effective_date17': str,
    'business_process_type17': str,
    'business_process_reason17': str,
    'pay_rate_type17': str,
    'base_pay_change17': np.float64,
    'job_profile17': str,
    'time_type17': str,
    'cost_center17': str,
    'effective_date18': str,
    'business_process_type18': str,
    'business_process_reason18': str,
    'pay_rate_type18': str,
    'base_pay_change18': np.float64,
    'job_profile18': str,
    'time_type18': str,
    'cost_center18': str,
    'effective_date19': str,
    'business_process_type19': str,
    'business_process_reason19': str,
    'pay_rate_type19': str,
    'base_pay_change19': np.float64,
    'job_profile19': str,
    'time_type19': str,
    'cost_center19': str,
    'effective_date20': str,
    'business_process_type20': str,
    'business_process_reason20': str,
    'pay_rate_type20': str,
    'base_pay_change20': np.float64,
    'job_profile20': str,
    'time_type20': str,
    'cost_center20': str,
    'effective_date21': str,
    'business_process_type21': str,
    'business_process_reason21': str,
    'pay_rate_type21': str,
    'base_pay_change21': np.float64,
    'job_profile21': str,
    'time_type21': str,
    'cost_center21': str,
    'effective_date22': str,
    'business_process_type22': str,
    'business_process_reason22': str,
    'pay_rate_type22': str,
    'base_pay_change22': np.float64,
    'job_profile22': str,
    'time_type22': str,
    'cost_center22': str,
    'effective_date23': str,
    'business_process_type23': str,
    'business_process_reason23': str,
    'pay_rate_type23': str,
    'base_pay_change23': np.float64,
    'job_profile23': str,
    'time_type23': str,
    'cost_center23': str,
    'effective_date24': str,
    'business_process_type24': str,
    'business_process_reason24': str,
    'pay_rate_type24': str,
    'base_pay_change24': np.float64,
    'job_profile24': str,
    'time_type24': str,
    'cost_center24': str,
    'effective_date25': str,
    'business_process_type25': str,
    'business_process_reason25': str,
    'job_profile25': str,
    'cost_center25': str,
    'effective_date26': str,
    'business_process_type26': str,
    'business_process_reason26': str,
    'job_profile26': str,
    'cost_center26': str,
    '2008_annual_performance_rating': np.float64,
    '2009_annual_performance_rating': np.float64,
    '2010_annual_performance_rating': np.float64,
    '2011_annual_performance_rating': np.float64,
    '2012_annual_performance_rating': np.float64,
    '2013_annual_performance_rating': np.float64,
    '2014_annual_performance_rating': np.float64,
    '2015_annual_performance_rating': np.float64,
    '2016_annual_performance_rating': np.float64,
    '2017_annual_performance_rating': np.float64,
    '2018_annual_performance_rating': np.float64,
    '2019_annual_performance_rating': np.float64,
    '2020_annual_performance_rating': np.float64
}

parse_dates = ['date_of_birth', 'original_hire_date', 'hire_date','effective_date1','effective_date2','effective_date3','effective_date4','effective_date5','effective_date6','effective_date7','effective_date8','effective_date9','effective_date10','effective_date11','effective_date12','effective_date13','effective_date14','effective_date15','effective_date16','effective_date17','effective_date18','effective_date19','effective_date20','effective_date21','effective_date22','effective_date23','effective_date24','effective_date25','effective_date26']

In [4]:
terminated_wd_schema = {
    'department': str,
    'employee_id': str,
    'gender': str,
    'race_ethnicity': str,
    'education': str,
    'military_status': str,
    'date_of_birth': str,
    'original_hire_date': str,
    'hire_date': str,
    'termination_date': str,
    'pay_rate_type': str,
    'current_base_pay': np.float64,
    'job_profile_current': str,
    'time_type_current': str,
    'cost_center_current': str,
    'effective_date1': str,
    'business_process_type1': str,
    'business_process_reason1': str,
    'pay_rate_type1': str,
    'base_pay_change1': np.float64,
    'job_profile1': str,
    'time_type1': str,
    'cost_center1': str,
    'effective_date2': str,
    'business_process_type2': str,
    'business_process_reason2': str,
    'pay_rate_type2': str,
    'base_pay_change2': np.float64,
    'job_profile2': str,
    'time_type2': str,
    'cost_center2': str,
    'effective_date3': str,
    'business_process_type3': str,
    'business_process_reason3': str,
    'pay_rate_type3': str,
    'base_pay_change3': np.float64,
    'job_profile3': str,
    'time_type3': str,
    'cost_center3': str,
    'effective_date4': str,
    'business_process_type4': str,
    'business_process_reason4': str,
    'pay_rate_type4': str,
    'base_pay_change4': np.float64,
    'job_profile4': str,
    'time_type4': str,
    'cost_center4': str,
    'effective_date5': str,
    'business_process_type5': str,
    'business_process_reason5': str,
    'pay_rate_type5': str,
    'base_pay_change5': np.float64,
    'job_profile5': str,
    'time_type5': str,
    'cost_center5': str,
    'effective_date6': str,
    'business_process_type6': str,
    'business_process_reason6': str,
    'pay_rate_type6': str,
    'base_pay_change6': np.float64,
    'job_profile6': str,
    'time_type6': str,
    'cost_center6': str,
    'effective_date7': str,
    'business_process_type7': str,
    'business_process_reason7': str,
    'pay_rate_type7': str,
    'base_pay_change7': np.float64,
    'job_profile7': str,
    'time_type7': str,
    'cost_center7': str,
    'effective_date8': str,
    'business_process_type8': str,
    'business_process_reason8': str,
    'pay_rate_type8': str,
    'base_pay_change8': np.float64,
    'job_profile8': str,
    'time_type8': str,
    'cost_center8': str,
    'effective_date9': str,
    'business_process_type9': str,
    'business_process_reason9': str,
    'pay_rate_type9': str,
    'base_pay_change9': np.float64,
    'job_profile9': str,
    'time_type9': str,
    'cost_center9': str,
    'effective_date10': str,
    'business_process_type10': str,
    'business_process_reason10': str,
    'pay_rate_type10': str,
    'base_pay_change10': np.float64,
    'job_profile10': str,
    'time_type10': str,
    'cost_center10': str,
    'effective_date11': str,
    'business_process_type11': str,
    'business_process_reason11': str,
    'pay_rate_type11': str,
    'base_pay_change11': np.float64,
    'job_profile11': str,
    'time_type11': str,
    'cost_center11': str,
    'effective_date12': str,
    'business_process_type12': str,
    'business_process_reason12': str,
    'pay_rate_type12': str,
    'base_pay_change12': np.float64,
    'job_profile12': str,
    'time_type12': str,
    'cost_center12': str,
    'effective_date13': str,
    'business_process_type13': str,
    'business_process_reason13': str,
    'pay_rate_type13': str,
    'base_pay_change13': np.float64,
    'job_profile13': str,
    'time_type13': str,
    'cost_center13': str,
    'effective_date14': str,
    'business_process_type14': str,
    'business_process_reason14': str,
    'pay_rate_type14': str,
    'base_pay_change14': np.float64,
    'job_profile14': str,
    'time_type14': str,
    'cost_center14': str,
    'effective_date15': str,
    'business_process_type15': str,
    'business_process_reason15': str,
    'pay_rate_type15': str,
    'base_pay_change15': np.float64,
    'job_profile15': str,
    'time_type15': str,
    'cost_center15': str,
    '2008_annual_performance_rating': np.float64,
    '2009_annual_performance_rating': np.float64,
    '2010_annual_performance_rating': np.float64,
    '2011_annual_performance_rating': np.float64,
    '2012_annual_performance_rating': np.float64,
    '2013_annual_performance_rating': np.float64,
    '2014_annual_performance_rating': np.float64,
    '2015_annual_performance_rating': np.float64,
    '2016_annual_performance_rating': np.float64,
    '2017_annual_performance_rating': np.float64,
    '2018_annual_performance_rating': np.float64,
    '2019_annual_performance_rating': np.float64,
    '2020_annual_performance_rating': np.float64
}

parse_dates2 = ['date_of_birth', 'original_hire_date', 'hire_date','termination_date','effective_date1','effective_date2','effective_date3','effective_date4','effective_date5','effective_date6','effective_date7','effective_date8','effective_date9','effective_date10','effective_date11','effective_date12','effective_date13','effective_date14','effective_date15']

In [5]:
df = pd.read_csv(CSVPATH.joinpath('active_wd_2021.csv'), dtype=active_wd_schema, parse_dates=parse_dates)
df2 = pd.read_csv(CSVPATH.joinpath('terminated_wd_2021.csv'), dtype=terminated_wd_schema, parse_dates=parse_dates2)

## Add fields for analysis

In [6]:
date_received = np.datetime64('2021-05-28')

df['age'] = (date_received - df['date_of_birth']).astype('<m8[Y]')
df['years_of_service'] = (date_received - df['hire_date']).astype('<m8[Y]')
df2['age'] = (date_received - df2['date_of_birth']).astype('<m8[Y]')
df2['years_of_service'] = (date_received - df2['hire_date']).astype('<m8[Y]')

### Add field for 5-year age groups

In [7]:
bins= [0,25,30,35,40,45,50,55,60,65,100]
labels = ['<25','25-29','30-34','35-39','40-44', '45-49','50-54','55-59','60-64','65+']
df['age_group_5'] = pd.cut(df['age'], bins=bins, labels=labels, right=False)
df2['age_group_5'] = pd.cut(df2['age'], bins=bins, labels=labels, right=False)

### Add field for 10-year age groups

In [8]:
bins= [0,25,35,45,55,65,100]
labels = ['<25','25-34','35-44','45-54','55-64','65+']
df['age_group_10'] = pd.cut(df['age'], bins=bins, labels=labels, right=False)
df2['age_group_10'] = pd.cut(df2['age'], bins=bins, labels=labels, right=False)

### Add field for years-of-service groups

In [9]:
bins= [0,1,3,6,11,16,21,26,100]
labels = ['0','1-2','3-5','6-10','11-15','16-20','21-25','25+']
df['years_of_service_grouped'] = pd.cut(df['years_of_service'], bins=bins, labels=labels, right=False)
df2['years_of_service_grouped'] = pd.cut(df2['years_of_service'], bins=bins, labels=labels, right=False)

### Group departments

In [10]:
def dept(row):
    NEWS_DEPTS = ['News', 'Editorial', 'News Service and Syndicate']
    COMMERCIAL_DEPTS = [
        'Client Solutions', 'Circulation', 'Finance', 'Marketing', 'WP News Media Services', 'Production', 'Public Relations', 'Administration', 'Product', 'Audience Development and Insights', 'Customer Care and Logistics', 'Legal', 'Washington Post Live'
    ]
    if row['department'] in NEWS_DEPTS:
        return 'News'
    elif row['department'] in COMMERCIAL_DEPTS:
        return 'Commercial'
    else:
        return 'Unknown'

df['dept'] = df.apply(lambda row: dept(row), axis=1)
df2['dept'] = df2.apply(lambda row: dept(row), axis=1)

### Group desks

In [11]:
def desk(row):
    OPERATIONS = ['110000 News Operations','110001 News Digital Operations']
    AUDIENCE = ['110610 Audience Development and Engagement']
    AUDIO = ['110620 News Audio']
    DESIGN = ['110604 Presentation Design']
    PHOTO = ['110605 Presentation']
    EMERGING = ['110664 News National Apps','110665 News The Lily','110666 News Snapchat','110667 News By The Way']
    FINANCIAL = ['113210 Economy and Business']
    FOREIGN = ['114000 Foreign Administration','114095 News Foreign Brazil','114100 Foreign Latam','114220 News Foreign Istanbul','114235 Foreign Western Europe','114300 News Foreign West Africa','114415 Foreign Hong Kong','114405 Foreign Beijing Bureau','114105 Foreign Mexico Bureau','114005 Foreign Beirut Bureau','114400 Foreign India Bureau','114410 Foreign Tokyo Bureau','114205 Foreign Islamabad Bureau','114305 Foreign Nairobi Bureau','114240 Foreign Rome Bureau','114200 Foreign London Bureau','114230 Foreign Moscow Bureau','114225 Foreign Cairo Bureau','114215 Foreign Berlin Bureau','114310 Foreign Baghdad Bureau','114315 Foreign Jerusalem Bureau']
    GRAPHICS = ['110603 Presentation Graphics']
    INVESTIGATIVE = ['110450 Investigative']
    LOCAL = ['112300 Local Politics and Government']
    MULTI = ['110601 Multiplatform Desk']
    NATIONAL = ['110500 Magazine','113200 National Politics and Government','113205 National Security','113215 News National Health & Science','113220 National Enterprise','113235 National America','113240 News National Environment']
    RESEARCH = ['110006 News Content & Research']
    LOGISTICS = ['110455 News Logistics']
    OUTLOOK = ['110410 Book World','110460 Outlook']
    POLLING = ['110475 Polling']
    SPORTS = ['110015 Sports Main']
    STYLE = ['110300 Style','110435 Food','110485 Travel','110495 Local Living','110505 Weekend']
    UNIVERSAL = ['110600 Universal Desk']
    VIDEO = ['110652 News Video - General']
    OTHER = ['110663 Wake Up Report']
    EDITORIAL = ['115000 Editorial Administration']
    if row['cost_center_current'] in OPERATIONS:
        return 'Operations'
    elif row['cost_center_current'] in AUDIENCE:
        return 'Audience Development and Engagement'
    elif row['cost_center_current'] in AUDIO:
        return 'Audio'
    elif row['cost_center_current'] in DESIGN:
        return 'Design'
    elif row['cost_center_current'] in EMERGING:
        return 'Emerging News Products'
    elif row['cost_center_current'] in FINANCIAL:
        return 'Financial'
    elif row['cost_center_current'] in FOREIGN:
        return 'Foreign'
    elif row['cost_center_current'] in GRAPHICS:
        return 'Graphics'
    elif row['cost_center_current'] in LOCAL:
        return 'Local'
    elif row['cost_center_current'] in MULTI:
        return 'Multiplatform'
    elif row['cost_center_current'] in NATIONAL:
        return 'National'
    elif row['cost_center_current'] in RESEARCH:
        return 'News Content and Research'
    elif row['cost_center_current'] in LOGISTICS:
        return 'News Logistics'
    elif row['cost_center_current'] in OUTLOOK:
        return 'Outlook'
    elif row['cost_center_current'] in POLLING:
        return 'Polling'
    elif row['cost_center_current'] in PHOTO:
        return 'Photography'
    elif row['cost_center_current'] in SPORTS:
        return 'Sports'
    elif row['cost_center_current'] in STYLE:
        return 'Style'
    elif row['cost_center_current'] in UNIVERSAL:
        return 'Universal Desk'
    elif row['cost_center_current'] in VIDEO:
        return 'Video'
    elif row['cost_center_current'] in OTHER:
        return 'Other'
    elif row['cost_center_current'] in EDITORIAL:
        return 'Editorial'
    else:
        return 'non-newsroom'

df['desk'] = df.apply(lambda row: desk(row), axis=1)
df2['desk'] = df2.apply(lambda row: desk(row), axis=1)

### Group desks by median salary ranges

In [12]:
def tier(row):
    TIER1 = ['National','Foreign','Financial','Investigative']
    TIER2 = ['Style','Local','Graphics','Universal Desk','Sports','Outlook','Editorial']
    TIER3 = ['Audio','Polling','Design','Operations','Multiplatform','Video','Audience Development and Engagement','Photography']
    TIER4 = ['News Logistics','News Content and Research','Emerging News Products','Other']
    if row['desk'] in TIER1:
        return 'Tier 1'
    elif row['desk'] in TIER2:
        return 'Tier 2'
    elif row['desk'] in TIER3:
        return 'Tier 3'
    elif row['desk'] in TIER4:
        return 'Tier 4'
    else:
        return 'other'

df['tier'] = df.apply(lambda row: tier(row), axis=1)
df2['tier'] = df2.apply(lambda row: tier(row), axis=1)

### Group race and ethnicity

In [13]:
def race_groups(row):
    WHITE = ['White (United States of America)']
    NONWHITE = [
        'Black or African American (United States of America)', 'Asian (United States of America)', 'Hispanic or Latino (United States of America)', 'Two or More Races (United States of America)', 'American Indian or Alaska Native (United States of America)', 'Native Hawaiian or Other Pacific Islander (United States of America)'
    ]
    if row['race_ethnicity'] in WHITE:
        return 'white'
    elif row['race_ethnicity'] in NONWHITE:
        return 'person of color'
    else:
        return 'unknown'

df['race_grouping'] = df.apply(lambda row: race_groups(row), axis=1)
df2['race_grouping'] = df2.apply(lambda row: race_groups(row), axis=1)

### Employee pay change grouping

In [14]:
reason_for_change1 = df[['business_process_reason1','base_pay_change1','effective_date1','pay_rate_type1','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason1':'business_process_reason','base_pay_change1':'base_pay_change','effective_date1':'effective_date','pay_rate_type1':'pay_rate_type'})
reason_for_change2 = df[['business_process_reason2','base_pay_change2','effective_date2','pay_rate_type2','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason2':'business_process_reason','base_pay_change2':'base_pay_change','effective_date2':'effective_date','pay_rate_type2':'pay_rate_type'})
reason_for_change3 = df[['business_process_reason3','base_pay_change3','effective_date3','pay_rate_type3','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason3':'business_process_reason','base_pay_change3':'base_pay_change','effective_date3':'effective_date','pay_rate_type3':'pay_rate_type'})
reason_for_change4 = df[['business_process_reason4','base_pay_change4','effective_date4','pay_rate_type4','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason4':'business_process_reason','base_pay_change4':'base_pay_change','effective_date4':'effective_date','pay_rate_type4':'pay_rate_type'})
reason_for_change5 = df[['business_process_reason5','base_pay_change5','effective_date5','pay_rate_type5','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason5':'business_process_reason','base_pay_change5':'base_pay_change','effective_date5':'effective_date','pay_rate_type5':'pay_rate_type'})
reason_for_change6 = df[['business_process_reason6','base_pay_change6','effective_date6','pay_rate_type6','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason6':'business_process_reason','base_pay_change6':'base_pay_change','effective_date6':'effective_date','pay_rate_type6':'pay_rate_type'})
reason_for_change7 = df[['business_process_reason7','base_pay_change7','effective_date7','pay_rate_type7','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason7':'business_process_reason','base_pay_change7':'base_pay_change','effective_date7':'effective_date','pay_rate_type7':'pay_rate_type'})
reason_for_change8 = df[['business_process_reason8','base_pay_change8','effective_date8','pay_rate_type8','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason8':'business_process_reason','base_pay_change8':'base_pay_change','effective_date8':'effective_date','pay_rate_type8':'pay_rate_type'})
reason_for_change9 = df[['business_process_reason9','base_pay_change9','effective_date9','pay_rate_type9','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason9':'business_process_reason','base_pay_change9':'base_pay_change','effective_date9':'effective_date','pay_rate_type9':'pay_rate_type'})
reason_for_change10 = df[['business_process_reason10','base_pay_change10','effective_date10','pay_rate_type10','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason10':'business_process_reason','base_pay_change10':'base_pay_change','effective_date10':'effective_date','pay_rate_type10':'pay_rate_type'})
reason_for_change11 = df[['business_process_reason11','base_pay_change11','effective_date11','pay_rate_type11','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason11':'business_process_reason','base_pay_change11':'base_pay_change','effective_date11':'effective_date','pay_rate_type11':'pay_rate_type'})
reason_for_change12 = df[['business_process_reason12','base_pay_change12','effective_date12','pay_rate_type12','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason12':'business_process_reason','base_pay_change12':'base_pay_change','effective_date12':'effective_date','pay_rate_type12':'pay_rate_type'})
reason_for_change13 = df[['business_process_reason13','base_pay_change13','effective_date13','pay_rate_type13','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason13':'business_process_reason','base_pay_change13':'base_pay_change','effective_date13':'effective_date','pay_rate_type13':'pay_rate_type'})
reason_for_change14 = df[['business_process_reason14','base_pay_change14','effective_date14','pay_rate_type14','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason14':'business_process_reason','base_pay_change14':'base_pay_change','effective_date14':'effective_date','pay_rate_type14':'pay_rate_type'})
reason_for_change15 = df[['business_process_reason15','base_pay_change15','effective_date15','pay_rate_type15','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason15':'business_process_reason','base_pay_change15':'base_pay_change','effective_date15':'effective_date','pay_rate_type15':'pay_rate_type'})
reason_for_change16 = df[['business_process_reason16','base_pay_change16','effective_date16','pay_rate_type16','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason16':'business_process_reason','base_pay_change16':'base_pay_change','effective_date16':'effective_date','pay_rate_type16':'pay_rate_type'})
reason_for_change17 = df[['business_process_reason17','base_pay_change17','effective_date17','pay_rate_type17','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason17':'business_process_reason','base_pay_change17':'base_pay_change','effective_date17':'effective_date','pay_rate_type17':'pay_rate_type'})
reason_for_change18 = df[['business_process_reason18','base_pay_change18','effective_date18','pay_rate_type18','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason18':'business_process_reason','base_pay_change18':'base_pay_change','effective_date18':'effective_date','pay_rate_type18':'pay_rate_type'})
reason_for_change19 = df[['business_process_reason19','base_pay_change19','effective_date19','pay_rate_type19','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason19':'business_process_reason','base_pay_change19':'base_pay_change','effective_date19':'effective_date','pay_rate_type19':'pay_rate_type'})
reason_for_change20 = df[['business_process_reason20','base_pay_change20','effective_date20','pay_rate_type20','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason20':'business_process_reason','base_pay_change20':'base_pay_change','effective_date20':'effective_date','pay_rate_type20':'pay_rate_type'})
reason_for_change21 = df[['business_process_reason21','base_pay_change21','effective_date21','pay_rate_type21','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason21':'business_process_reason','base_pay_change21':'base_pay_change','effective_date21':'effective_date','pay_rate_type21':'pay_rate_type'})
reason_for_change22 = df[['business_process_reason22','base_pay_change22','effective_date22','pay_rate_type22','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason22':'business_process_reason','base_pay_change22':'base_pay_change','effective_date22':'effective_date','pay_rate_type22':'pay_rate_type'})
reason_for_change23 = df[['business_process_reason23','base_pay_change23','effective_date23','pay_rate_type23','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason23':'business_process_reason','base_pay_change23':'base_pay_change','effective_date23':'effective_date','pay_rate_type23':'pay_rate_type'})
reason_for_change24 = df[['business_process_reason24','base_pay_change24','effective_date24','pay_rate_type24','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason24':'business_process_reason','base_pay_change24':'base_pay_change','effective_date24':'effective_date','pay_rate_type24':'pay_rate_type'})
reason_for_change25 = df2[['business_process_reason1','base_pay_change1','effective_date1','pay_rate_type1','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason1':'business_process_reason','base_pay_change1':'base_pay_change','effective_date1':'effective_date','pay_rate_type1':'pay_rate_type'})
reason_for_change26 = df2[['business_process_reason2','base_pay_change2','effective_date2','pay_rate_type2','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason2':'business_process_reason','base_pay_change2':'base_pay_change','effective_date2':'effective_date','pay_rate_type2':'pay_rate_type'})
reason_for_change27 = df2[['business_process_reason3','base_pay_change3','effective_date3','pay_rate_type3','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason3':'business_process_reason','base_pay_change3':'base_pay_change','effective_date3':'effective_date','pay_rate_type3':'pay_rate_type'})
reason_for_change28 = df2[['business_process_reason4','base_pay_change4','effective_date4','pay_rate_type4','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason4':'business_process_reason','base_pay_change4':'base_pay_change','effective_date4':'effective_date','pay_rate_type4':'pay_rate_type'})
reason_for_change29 = df2[['business_process_reason5','base_pay_change5','effective_date5','pay_rate_type5','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason5':'business_process_reason','base_pay_change5':'base_pay_change','effective_date5':'effective_date','pay_rate_type5':'pay_rate_type'})
reason_for_change30 = df2[['business_process_reason6','base_pay_change6','effective_date6','pay_rate_type6','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason6':'business_process_reason','base_pay_change6':'base_pay_change','effective_date6':'effective_date','pay_rate_type6':'pay_rate_type'})
reason_for_change31 = df2[['business_process_reason7','base_pay_change7','effective_date7','pay_rate_type7','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason7':'business_process_reason','base_pay_change7':'base_pay_change','effective_date7':'effective_date','pay_rate_type7':'pay_rate_type'})
reason_for_change32 = df2[['business_process_reason8','base_pay_change8','effective_date8','pay_rate_type8','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason8':'business_process_reason','base_pay_change8':'base_pay_change','effective_date8':'effective_date','pay_rate_type8':'pay_rate_type'})
reason_for_change33 = df2[['business_process_reason9','base_pay_change9','effective_date9','pay_rate_type9','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason9':'business_process_reason','base_pay_change9':'base_pay_change','effective_date9':'effective_date','pay_rate_type9':'pay_rate_type'})
reason_for_change34 = df2[['business_process_reason10','base_pay_change10','effective_date10','pay_rate_type10','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason10':'business_process_reason','base_pay_change10':'base_pay_change','effective_date10':'effective_date','pay_rate_type10':'pay_rate_type'})
reason_for_change35 = df2[['business_process_reason11','base_pay_change11','effective_date11','pay_rate_type11','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason11':'business_process_reason','base_pay_change11':'base_pay_change','effective_date11':'effective_date','pay_rate_type11':'pay_rate_type'})
reason_for_change36 = df2[['business_process_reason12','base_pay_change12','effective_date12','pay_rate_type12','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason12':'business_process_reason','base_pay_change12':'base_pay_change','effective_date12':'effective_date','pay_rate_type12':'pay_rate_type'})
reason_for_change37 = df2[['business_process_reason13','base_pay_change13','effective_date13','pay_rate_type13','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason13':'business_process_reason','base_pay_change13':'base_pay_change','effective_date13':'effective_date','pay_rate_type13':'pay_rate_type'})
reason_for_change38 = df2[['business_process_reason14','base_pay_change14','effective_date14','pay_rate_type14','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason14':'business_process_reason','base_pay_change14':'base_pay_change','effective_date14':'effective_date','pay_rate_type14':'pay_rate_type'})
reason_for_change39 = df2[['business_process_reason15','base_pay_change15','effective_date15','pay_rate_type15','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason15':'business_process_reason','base_pay_change15':'base_pay_change','effective_date15':'effective_date','pay_rate_type15':'pay_rate_type'})
reason_for_change1 = pd.DataFrame(reason_for_change1)
reason_for_change2 = pd.DataFrame(reason_for_change2)
reason_for_change3 = pd.DataFrame(reason_for_change3)
reason_for_change4 = pd.DataFrame(reason_for_change4)
reason_for_change5 = pd.DataFrame(reason_for_change5)
reason_for_change6 = pd.DataFrame(reason_for_change6)
reason_for_change7 = pd.DataFrame(reason_for_change7)
reason_for_change8 = pd.DataFrame(reason_for_change8)
reason_for_change9 = pd.DataFrame(reason_for_change9)
reason_for_change10 = pd.DataFrame(reason_for_change10)
reason_for_change11 = pd.DataFrame(reason_for_change11)
reason_for_change12 = pd.DataFrame(reason_for_change12)
reason_for_change13 = pd.DataFrame(reason_for_change13)
reason_for_change14 = pd.DataFrame(reason_for_change14)
reason_for_change15 = pd.DataFrame(reason_for_change15)
reason_for_change16 = pd.DataFrame(reason_for_change16)
reason_for_change17 = pd.DataFrame(reason_for_change17)
reason_for_change18 = pd.DataFrame(reason_for_change18)
reason_for_change19 = pd.DataFrame(reason_for_change19)
reason_for_change20 = pd.DataFrame(reason_for_change20)
reason_for_change21 = pd.DataFrame(reason_for_change21)
reason_for_change22 = pd.DataFrame(reason_for_change22)
reason_for_change23 = pd.DataFrame(reason_for_change23)
reason_for_change24 = pd.DataFrame(reason_for_change24)
reason_for_change25 = pd.DataFrame(reason_for_change25)
reason_for_change26 = pd.DataFrame(reason_for_change26)
reason_for_change27 = pd.DataFrame(reason_for_change27)
reason_for_change28 = pd.DataFrame(reason_for_change28)
reason_for_change29 = pd.DataFrame(reason_for_change29)
reason_for_change30 = pd.DataFrame(reason_for_change30)
reason_for_change31 = pd.DataFrame(reason_for_change31)
reason_for_change32 = pd.DataFrame(reason_for_change32)
reason_for_change33 = pd.DataFrame(reason_for_change33)
reason_for_change34 = pd.DataFrame(reason_for_change34)
reason_for_change35 = pd.DataFrame(reason_for_change35)
reason_for_change36 = pd.DataFrame(reason_for_change36)
reason_for_change37 = pd.DataFrame(reason_for_change37)
reason_for_change38 = pd.DataFrame(reason_for_change38)
reason_for_change39 = pd.DataFrame(reason_for_change39)

reason_for_change_combined = pd.concat([reason_for_change1,reason_for_change2,reason_for_change3,reason_for_change4,reason_for_change5,reason_for_change6,reason_for_change7,reason_for_change8,reason_for_change9,reason_for_change10,reason_for_change11,reason_for_change12,reason_for_change13,reason_for_change14,reason_for_change15,reason_for_change16,reason_for_change17,reason_for_change18,reason_for_change19,reason_for_change20,reason_for_change21,reason_for_change22,reason_for_change23,reason_for_change24,reason_for_change25,reason_for_change26,reason_for_change27,reason_for_change28,reason_for_change29,reason_for_change30,reason_for_change31,reason_for_change32,reason_for_change33,reason_for_change34,reason_for_change35,reason_for_change36,reason_for_change37,reason_for_change38,reason_for_change39])

### Employee performance evaluation grouping

In [15]:
eight1 = df[['2008_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2008_annual_performance_rating':'performance_rating'})
eight2 = df2[['2008_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2008_annual_performance_rating':'performance_rating'})
nine1 = df[['2009_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2009_annual_performance_rating':'performance_rating'})
nine2 = df2[['2009_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2009_annual_performance_rating':'performance_rating'})
ten1 = df[['2010_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2010_annual_performance_rating':'performance_rating'})
ten2 = df2[['2010_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2010_annual_performance_rating':'performance_rating'})
eleven1 = df[['2011_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2011_annual_performance_rating':'performance_rating'})
eleven2 = df2[['2011_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2011_annual_performance_rating':'performance_rating'})
twelve1 = df[['2012_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2012_annual_performance_rating':'performance_rating'})
twelve2 = df2[['2012_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2012_annual_performance_rating':'performance_rating'})
thirteen1 = df[['2013_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2013_annual_performance_rating':'performance_rating'})
thirteen2 = df2[['2013_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2013_annual_performance_rating':'performance_rating'})
fourteen1 = df[['2014_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2014_annual_performance_rating':'performance_rating'})
fourteen2 = df2[['2014_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2014_annual_performance_rating':'performance_rating'})
fifteen1 = df[['2015_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2015_annual_performance_rating':'performance_rating'})
fifteen2 = df2[['2015_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2015_annual_performance_rating':'performance_rating'})
sixteen1 = df[['2016_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2016_annual_performance_rating':'performance_rating'})
sixteen2 = df2[['2016_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2016_annual_performance_rating':'performance_rating'})
seventeen1 = df[['2017_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2017_annual_performance_rating':'performance_rating'})
seventeen2 = df2[['2017_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2017_annual_performance_rating':'performance_rating'})
eighteen1 = df[['2018_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2018_annual_performance_rating':'performance_rating'})
eighteen2 = df2[['2018_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2018_annual_performance_rating':'performance_rating'})
nineteen1 = df[['2019_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2019_annual_performance_rating':'performance_rating'})
nineteen2 = df2[['2019_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2019_annual_performance_rating':'performance_rating'})
twenty1 = df[['2020_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2020_annual_performance_rating':'performance_rating'})
twenty2 = df2[['2020_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2020_annual_performance_rating':'performance_rating'})
eight1 = pd.DataFrame(eight1)
eight2 = pd.DataFrame(eight2)
nine1 = pd.DataFrame(nine1)
nine2 = pd.DataFrame(nine2)
ten1 = pd.DataFrame(ten1)
ten2 = pd.DataFrame(ten2)
eleven1 = pd.DataFrame(eleven1)
eleven2 = pd.DataFrame(eleven2)
twelve1 = pd.DataFrame(twelve1)
twelve2 = pd.DataFrame(twelve2)
thirteen1 = pd.DataFrame(thirteen1)
thirteen2 = pd.DataFrame(thirteen2)
fourteen1 = pd.DataFrame(fourteen1)
fourteen2 = pd.DataFrame(fourteen2)
fifteen1 = pd.DataFrame(fifteen1)
fifteen2 = pd.DataFrame(fifteen2)
sixteen1 = pd.DataFrame(sixteen1)
sixteen2 = pd.DataFrame(sixteen2)
seventeen1 = pd.DataFrame(seventeen1)
seventeen2 = pd.DataFrame(seventeen2)
eighteen1 = pd.DataFrame(eighteen1)
eighteen2 = pd.DataFrame(eighteen2)
nineteen1 = pd.DataFrame(nineteen1)
nineteen2 = pd.DataFrame(nineteen2)
twenty1 = pd.DataFrame(twenty1)
twenty2 = pd.DataFrame(twenty2)

ratings_combined = pd.concat([eight1,eight2,nine1,nine2,ten1,ten2,eleven1,eleven2,twelve1,twelve2,thirteen1,thirteen2,fourteen1,fourteen2,fifteen1,fifteen2,sixteen1,sixteen2,seventeen1,seventeen2,eighteen1,eighteen2,nineteen1,nineteen2,twenty1,twenty2])

### Create departmental data frames

In [16]:
news_salaried = df[(df['dept'] == 'News') & (df['pay_rate_type'] == 'Salaried')]
news_hourly = df[(df['dept'] == 'News') & (df['pay_rate_type'] == 'Hourly')]
commercial_salaried = df[(df['dept'] == 'Commercial') & (df['pay_rate_type'] == 'Salaried')]
commercial_hourly = df[(df['dept'] == 'Commercial') & (df['pay_rate_type'] == 'Hourly')]

news_salaried2 = df2[(df2['dept'] == 'News') & (df2['pay_rate_type'] == 'Salaried')]
news_hourly2 = df2[(df2['dept'] == 'News') & (df2['pay_rate_type'] == 'Hourly')]
commercial_salaried2 = df2[(df2['dept'] == 'Commercial') & (df2['pay_rate_type'] == 'Salaried')]
commercial_hourly2 = df2[(df2['dept'] == 'Commercial') & (df2['pay_rate_type'] == 'Hourly')]

## Supress Results

### Suppress results where there are less than five employees

In [17]:
df['count'] = 1
df2['count'] = 1

def suppress(results):
    results.columns = results.columns.get_level_values(1)
    return results[results['count_nonzero'] >= 5]

### Suppress results and order them by count of employees

In [18]:
def suppress_count(results):
    results.columns = results.columns.get_level_values(1)
    return results[results['count_nonzero'] >= 5].sort_values('count_nonzero', ascending=False)

### Suppress results and order them by median salary of employees

In [19]:
def suppress_median(results):
    results.columns = results.columns.get_level_values(1)
    return results[results['count_nonzero'] >= 5].sort_values('median', ascending=False)

## Summary Analysis

### Employee counts

In [20]:
current_employee_count = df.shape[0]
terminated_employee_count = df2.shape[0]

print('Total employees in data: ' + str(current_employee_count + terminated_employee_count))
print('Current employees: ' + str(current_employee_count))
print('Terminated employees: ' + str(terminated_employee_count))

Total employees in data: 1466
Current employees: 1003
Terminated employees: 463


In [21]:
current_salaried_employee_count = df[df['pay_rate_type'] == 'Salaried'].shape[0]
terminated_salaried_employee_count = df2[df2['pay_rate_type'] == 'Salaried'].shape[0]

print('Total salaried employees in data: ' + str(current_salaried_employee_count + terminated_salaried_employee_count))
print('Current salaried employees: ' + str(current_salaried_employee_count))
print('Terminated salaried employees: ' + str(terminated_salaried_employee_count))

Total salaried employees in data: 1049
Current salaried employees: 783
Terminated salaried employees: 266


In [22]:
current_hourly_employee_count = df[df['pay_rate_type'] == 'Hourly'].shape[0]
terminated_hourly_employee_count = df2[df2['pay_rate_type'] == 'Hourly'].shape[0]

print('Total hourly employees in data: ' + str(current_hourly_employee_count + terminated_hourly_employee_count))
print('Current hourly employees: ' + str(current_hourly_employee_count))
print('Terminated hourly employees: ' + str(terminated_hourly_employee_count))

Total hourly employees in data: 417
Current hourly employees: 220
Terminated hourly employees: 197


### Salary information

In [23]:
current_mean_salary = df[df['pay_rate_type'] == 'Salaried']['current_base_pay'].mean()
current_median_salary = df[df['pay_rate_type'] == 'Salaried']['current_base_pay'].median()

print('The mean yearly pay for current salaried employees is $' + str(current_mean_salary) + '.')
print('The median yearly pay for current salaried employees is $' + str(current_median_salary) + '.')

The mean yearly pay for current salaried employees is $118500.28141762452.
The median yearly pay for current salaried employees is $105298.56.


In [24]:
current_mean_hourly = df[df['pay_rate_type'] == 'Hourly']['current_base_pay'].mean()
current_median_hourly = df[df['pay_rate_type'] == 'Hourly']['current_base_pay'].median()

print('The mean rate for current hourly employees at The Washington Post is $' + str(current_mean_hourly) + '.')
print('The median rate for current hourly employees at The Washington Post is $' + str(current_median_hourly) + '.')

The mean rate for current hourly employees at The Washington Post is $31.94181818181818.
The median rate for current hourly employees at The Washington Post is $31.16.


### Employee gender

In [25]:
current_employee_gender = df.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_gender)

Unnamed: 0_level_0,count_nonzero
gender,Unnamed: 1_level_1
Female,546
Male,456


In [26]:
terminated_employee_gender = df2.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(terminated_employee_gender)

Unnamed: 0_level_0,count_nonzero
gender,Unnamed: 1_level_1
Female,261
Male,201


In [27]:
current_median_salary_gender = df[df['pay_rate_type'] == 'Salaried'].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_salary_gender)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,421,99465.35
Male,361,114921.16


In [28]:
current_median_hourly_gender = df[df['pay_rate_type'] == 'Hourly'].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_hourly_gender)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,125,32.65
Male,95,28.65


In [29]:
current_age_gender_salaried = df[df['pay_rate_type'] == 'Salaried'].groupby(['gender'])['age'].median().sort_values(ascending=False)
current_age_gender_salaried

gender
Male                     42.00
Female                   35.00
Prefer not to disclose   30.00
Name: age, dtype: float64

### Employee race and ethnicity

In [30]:
current_employee_race_ethnicity = df.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_employee_race_ethnicity)

Unnamed: 0_level_0,count_nonzero
race_ethnicity,Unnamed: 1_level_1
White (United States of America),624
Black or African American (United States of America),159
Asian (United States of America),91
Hispanic or Latino (United States of America),54
Two or More Races (United States of America),28
Prefer Not to Disclose (United States of America),15


In [31]:
terminated_employee_race_ethnicity = df2.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(terminated_employee_race_ethnicity)

Unnamed: 0_level_0,count_nonzero
race_ethnicity,Unnamed: 1_level_1
White (United States of America),252
Black or African American (United States of America),119
Asian (United States of America),45
Hispanic or Latino (United States of America),23
Two or More Races (United States of America),13
Prefer Not to Disclose (United States of America),6


In [32]:
current_median_salary_race = df[df['pay_rate_type'] == 'Salaried'].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_median_salary_race)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),525,110453.45
Black or African American (United States of America),76,95000.0
Prefer Not to Disclose (United States of America),13,95000.0
Asian (United States of America),72,94920.0
Hispanic or Latino (United States of America),47,94780.0
Two or More Races (United States of America),21,90000.0


In [33]:
current_median_hourly_race = df[df['pay_rate_type'] == 'Hourly'].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_median_hourly_race)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
Two or More Races (United States of America),7,35.0
White (United States of America),99,33.77
Asian (United States of America),19,31.92
Black or African American (United States of America),83,27.88
Hispanic or Latino (United States of America),7,26.56


In [34]:
current_age_race_salaried = df[df['pay_rate_type'] == 'Salaried'].groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_age_race_salaried

race_ethnicity
American Indian or Alaska Native (United States of America)            46.00
Native Hawaiian or Other Pacific Islander (United States of America)   45.00
Black or African American (United States of America)                   40.50
White (United States of America)                                       39.00
Hispanic or Latino (United States of America)                          35.00
Asian (United States of America)                                       33.00
Prefer Not to Disclose (United States of America)                      31.00
Two or More Races (United States of America)                           29.00
Name: age, dtype: float64

In [35]:
current_age_race_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_age_race_hourly

race_ethnicity
American Indian or Alaska Native (United States of America)   55.50
Black or African American (United States of America)          47.00
White (United States of America)                              38.00
Prefer Not to Disclose (United States of America)             34.00
Two or More Races (United States of America)                  32.00
Hispanic or Latino (United States of America)                 29.00
Asian (United States of America)                              28.00
Name: age, dtype: float64

### Employee gender x race/ethnicity

In [36]:
current_employee_race_gender = df.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_ethnicity,gender,Unnamed: 2_level_1
Asian (United States of America),Female,64
Asian (United States of America),Male,27
Black or African American (United States of America),Female,84
Black or African American (United States of America),Male,75
Hispanic or Latino (United States of America),Female,29
Hispanic or Latino (United States of America),Male,25
Prefer Not to Disclose (United States of America),Female,6
Prefer Not to Disclose (United States of America),Male,9
Two or More Races (United States of America),Female,20
Two or More Races (United States of America),Male,8


In [37]:
current_salaried_race_gender = df[df['pay_rate_type'] == 'Salaried'].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_salaried_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_ethnicity,gender,Unnamed: 2_level_1
Asian (United States of America),Female,51
Asian (United States of America),Male,21
Black or African American (United States of America),Female,40
Black or African American (United States of America),Male,36
Hispanic or Latino (United States of America),Female,24
Hispanic or Latino (United States of America),Male,23
Prefer Not to Disclose (United States of America),Female,6
Prefer Not to Disclose (United States of America),Male,7
Two or More Races (United States of America),Female,15
Two or More Races (United States of America),Male,6


In [38]:
current_hourly_race_gender = df[df['pay_rate_type'] == 'Hourly'].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_hourly_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_ethnicity,gender,Unnamed: 2_level_1
Asian (United States of America),Female,13
Asian (United States of America),Male,6
Black or African American (United States of America),Female,44
Black or African American (United States of America),Male,39
Hispanic or Latino (United States of America),Female,5
Two or More Races (United States of America),Female,5
White (United States of America),Female,56
White (United States of America),Male,43


In [39]:
current_median_salary_race_gender = df[df['pay_rate_type'] == 'Salaried'].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_salary_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Asian (United States of America),Female,51,94840.0
Asian (United States of America),Male,21,95691.99
Black or African American (United States of America),Female,40,91959.9
Black or African American (United States of America),Male,36,99375.0
Hispanic or Latino (United States of America),Female,24,91254.94
Hispanic or Latino (United States of America),Male,23,98411.56
Prefer Not to Disclose (United States of America),Female,6,92500.0
Prefer Not to Disclose (United States of America),Male,7,98340.0
Two or More Races (United States of America),Female,15,90780.0
Two or More Races (United States of America),Male,6,86890.0


In [40]:
current_median_hourly_race_gender = df[df['pay_rate_type'] == 'Hourly'].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_hourly_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Asian (United States of America),Female,13,31.92
Asian (United States of America),Male,6,32.83
Black or African American (United States of America),Female,44,29.42
Black or African American (United States of America),Male,39,26.66
Hispanic or Latino (United States of America),Female,5,33.85
Two or More Races (United States of America),Female,5,35.9
White (United States of America),Female,56,35.9
White (United States of America),Male,43,32.12


### Employee age

In [41]:
current_employee_age_5 = df.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_age_5)

Unnamed: 0_level_0,count_nonzero
age_group_5,Unnamed: 1_level_1
<25,43
25-29,193
30-34,147
35-39,150
40-44,94
45-49,85
50-54,92
55-59,87
60-64,70
65+,42


In [42]:
terminated_employee_age_5 = df2.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero]})
suppress(terminated_employee_age_5)

Unnamed: 0_level_0,count_nonzero
age_group_5,Unnamed: 1_level_1
<25,8
25-29,89
30-34,110
35-39,64
40-44,43
45-49,25
50-54,25
55-59,37
60-64,21
65+,40


In [43]:
current_employee_age_10 = df.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_age_10)

Unnamed: 0_level_0,count_nonzero
age_group_10,Unnamed: 1_level_1
<25,43
25-34,340
35-44,244
45-54,177
55-64,157
65+,42


In [44]:
terminated_employee_age_10 = df2.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero]})
suppress(terminated_employee_age_10)

Unnamed: 0_level_0,count_nonzero
age_group_10,Unnamed: 1_level_1
<25,8
25-34,199
35-44,107
45-54,50
55-64,58
65+,40


In [45]:
current_median_salary_age_5 = df[df['pay_rate_type'] == 'Salaried'].groupby(['age_group_5']).agg({'current_base_pay': [np.median, np.count_nonzero]})
suppress(current_median_salary_age_5)

Unnamed: 0_level_0,median,count_nonzero
age_group_5,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,72250.0,22
25-29,83389.64,150
30-34,95104.86,120
35-39,115000.0,137
40-44,128280.0,77
45-49,126572.33,63
50-54,117924.49,77
55-59,117935.68,61
60-64,145686.55,52
65+,120304.93,24


In [46]:
current_median_hourly_age_5 = df[df['pay_rate_type'] == 'Hourly'].groupby(['age_group_5']).agg({'current_base_pay': [np.median, np.count_nonzero]})
suppress(current_median_hourly_age_5)

Unnamed: 0_level_0,median,count_nonzero
age_group_5,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,31.79,21
25-29,30.14,43
30-34,33.33,27
35-39,33.08,13
40-44,28.74,17
45-49,32.27,22
50-54,28.61,15
55-59,29.7,26
60-64,27.68,18
65+,31.26,18


In [47]:
current_median_salary_age_10 = df[df['pay_rate_type'] == 'Salaried'].groupby(['age_group_10']).agg({'current_base_pay': [np.median, np.count_nonzero]})
suppress(current_median_salary_age_10)

Unnamed: 0_level_0,median,count_nonzero
age_group_10,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,72250.0,22
25-34,89980.0,270
35-44,117780.0,214
45-54,124398.47,140
55-64,129999.56,113
65+,120304.93,24


In [48]:
current_median_hourly_age_10 = df[df['pay_rate_type'] == 'Hourly'].groupby(['age_group_10']).agg({'current_base_pay': [np.median, np.count_nonzero]})
suppress(current_median_hourly_age_10)

Unnamed: 0_level_0,median,count_nonzero
age_group_10,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,31.79,21
25-34,32.23,70
35-44,31.36,30
45-54,31.06,37
55-64,29.34,44
65+,31.26,18


### Employee department

In [49]:
current_employee_dept = df.groupby(['dept']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_employee_dept)

Unnamed: 0_level_0,count_nonzero
dept,Unnamed: 1_level_1
News,740
Commercial,263


In [50]:
current_employee_department = df.groupby(['department']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_employee_department)

Unnamed: 0_level_0,count_nonzero
department,Unnamed: 1_level_1
News,699
Client Solutions,139
Editorial,41
Production,38
Finance,32
Audience Development and Insights,18
Customer Care and Logistics,12
Washington Post Live,9
Marketing,7
Public Relations,7


In [51]:
current_employee_dept_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['dept']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_dept_salary)

Unnamed: 0_level_0,count_nonzero,median
dept,Unnamed: 1_level_1,Unnamed: 2_level_1
News,657,109884.02
Commercial,126,90000.0


In [52]:
current_employee_department_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['department']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_department_salary)

Unnamed: 0_level_0,count_nonzero,median
department,Unnamed: 1_level_1,Unnamed: 2_level_1
Editorial,33,128560.27
News,624,109577.5
Finance,9,95000.0
Client Solutions,91,90241.58
Audience Development and Insights,18,90000.0
Production,5,75000.51


In [53]:
current_employee_dept_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['dept']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_dept_hourly)

Unnamed: 0_level_0,count_nonzero,median
dept,Unnamed: 1_level_1,Unnamed: 2_level_1
News,83,34.28
Commercial,137,29.18


In [54]:
current_employee_department_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['department']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_department_hourly)

Unnamed: 0_level_0,count_nonzero,median
department,Unnamed: 1_level_1,Unnamed: 2_level_1
Editorial,8,43.27
Marketing,7,39.64
Public Relations,7,38.4
News,75,33.85
Washington Post Live,7,33.33
Client Solutions,48,31.23
Finance,23,30.26
Production,33,25.41
Customer Care and Logistics,12,21.67


### Employee cost center

In [55]:
current_employee_desk = df.groupby(['desk']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_employee_desk)

Unnamed: 0_level_0,count_nonzero
desk,Unnamed: 1_level_1
non-newsroom,281
National,125
Local,72
Sports,59
Style,56
Multiplatform,51
Video,49
Financial,43
Editorial,41
Foreign,34


In [56]:
current_employee_cost_center = df.groupby(['cost_center_current']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_employee_cost_center)

Unnamed: 0_level_0,count_nonzero
cost_center_current,Unnamed: 1_level_1
112300 Local Politics and Government,72
113200 National Politics and Government,61
110015 Sports Main,59
110601 Multiplatform Desk,51
110652 News Video - General,49
113210 Economy and Business,43
110300 Style,42
115000 Editorial Administration,41
110604 Presentation Design,34
119065 Dispatch Operations (Night Circulation),32


In [57]:
current_employee_desk_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['desk']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_desk_salary)

Unnamed: 0_level_0,count_nonzero,median
desk,Unnamed: 1_level_1,Unnamed: 2_level_1
National,114,158242.11
Foreign,33,142780.0
Financial,43,137140.0
Editorial,33,128560.27
Local,68,113140.0
Style,50,111833.47
Universal Desk,6,107876.01
Sports,43,107560.0
Outlook,6,105497.5
Graphics,21,104320.14


In [58]:
current_employee_cost_center_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['cost_center_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_cost_center_salary)

Unnamed: 0_level_0,count_nonzero,median
cost_center_current,Unnamed: 1_level_1,Unnamed: 2_level_1
113205 National Security,18,180737.5
117682 Global Sales,17,174956.0
113215 News National Health & Science,14,164322.11
113200 National Politics and Government,55,157840.0
110450 Investigative,17,144340.0
113210 Economy and Business,43,137140.0
113240 News National Environment,8,133387.5
117900 Agency Partner,5,129999.56
113235 National America,15,129990.0
115000 Editorial Administration,33,128560.27


In [59]:
current_employee_desk_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['desk']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_desk_hourly)

Unnamed: 0_level_0,count_nonzero,median
desk,Unnamed: 1_level_1,Unnamed: 2_level_1
Universal Desk,8,44.16
Editorial,8,43.27
National,11,33.85
Multiplatform,7,32.77
non-newsroom,137,29.18
Sports,16,27.74
Operations,8,22.68
Style,6,20.95


In [60]:
current_employee_cost_center_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['cost_center_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_cost_center_hourly)

Unnamed: 0_level_0,count_nonzero,median
cost_center_current,Unnamed: 1_level_1,Unnamed: 2_level_1
110600 Universal Desk,8,44.16
115000 Editorial Administration,8,43.27
128150 Consumer Mktg - Digital Subscription,5,39.64
129100 Community,7,38.4
118150 WP Live,6,34.1
113200 National Politics and Government,6,33.59
110601 Multiplatform Desk,7,32.77
126020 Revenue Administration,19,30.14
117310 Consumer to Consumer Team I,5,28.61
110015 Sports Main,16,27.74


### Employee years of service

In [61]:
current_employee_yos = df.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_yos)

Unnamed: 0_level_0,count_nonzero
years_of_service_grouped,Unnamed: 1_level_1
0,99
1-2,234
3-5,213
6-10,161
11-15,77
16-20,79
21-25,78
25+,62


In [62]:
terminated_employee_yos = df2.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero]})
suppress(terminated_employee_yos)

Unnamed: 0_level_0,count_nonzero
years_of_service_grouped,Unnamed: 1_level_1
1-2,50
3-5,181
6-10,128
11-15,32
16-20,29
21-25,22
25+,18


In [63]:
current_employee_yos_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_salary)

Unnamed: 0_level_0,count_nonzero,median
years_of_service_grouped,Unnamed: 1_level_1,Unnamed: 2_level_1
0,70,90000.0
1-2,182,90260.79
3-5,178,100780.0
6-10,135,115162.62
11-15,55,116560.0
16-20,58,118372.75
21-25,60,135024.69
25+,45,131075.9


In [64]:
current_employee_yos_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_hourly)

Unnamed: 0_level_0,count_nonzero,median
years_of_service_grouped,Unnamed: 1_level_1,Unnamed: 2_level_1
0,29,34.87
1-2,52,28.81
3-5,35,29.66
6-10,26,31.03
11-15,22,31.63
16-20,21,31.92
21-25,18,32.27
25+,17,32.21


In [65]:
current_employee_yos_gender = df.groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_yos_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
years_of_service_grouped,gender,Unnamed: 2_level_1
0,Female,68.0
0,Male,31.0
1-2,Female,145.0
1-2,Male,88.0
3-5,Female,114.0
3-5,Male,99.0
6-10,Female,72.0
6-10,Male,89.0
11-15,Female,32.0
11-15,Male,45.0


In [66]:
current_employee_yos_gender_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_gender_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Female,46.0,89000.0
0,Male,24.0,93500.0
1-2,Female,116.0,88321.3
1-2,Male,65.0,95000.0
3-5,Female,97.0,95000.0
3-5,Male,81.0,107855.6
6-10,Female,64.0,111074.25
6-10,Male,71.0,123340.0
11-15,Female,25.0,104618.47
11-15,Male,30.0,134800.96


In [67]:
current_employee_yos_gender_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Female,22,35.45
0,Male,7,29.23
1-2,Female,29,30.14
1-2,Male,23,23.83
3-5,Female,17,34.61
3-5,Male,18,26.37
6-10,Female,8,43.36
6-10,Male,18,28.12
11-15,Female,7,32.51
11-15,Male,15,28.96


In [68]:
current_employee_yos_race = df.groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_yos_race)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
years_of_service_grouped,race_ethnicity,Unnamed: 2_level_1
0,Asian (United States of America),13.0
0,Black or African American (United States of America),22.0
0,Hispanic or Latino (United States of America),12.0
0,Two or More Races (United States of America),8.0
0,White (United States of America),39.0
1-2,Asian (United States of America),31.0
1-2,Black or African American (United States of America),30.0
1-2,Hispanic or Latino (United States of America),14.0
1-2,Prefer Not to Disclose (United States of America),9.0
1-2,Two or More Races (United States of America),10.0


In [69]:
current_employee_yos_race_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_race_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Asian (United States of America),8.0,87500.0
0,Black or African American (United States of America),14.0,91000.0
0,Hispanic or Latino (United States of America),10.0,96250.0
0,Two or More Races (United States of America),5.0,90000.0
0,White (United States of America),28.0,85000.0
1-2,Asian (United States of America),24.0,86662.55
1-2,Black or African American (United States of America),13.0,92780.0
1-2,Hispanic or Latino (United States of America),12.0,80890.0
1-2,Prefer Not to Disclose (United States of America),7.0,94780.0
1-2,Two or More Races (United States of America),8.0,81890.0


In [70]:
current_employee_yos_race_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Asian (United States of America),5.0,37.44
0,Black or African American (United States of America),8.0,29.69
0,White (United States of America),11.0,35.9
1-2,Asian (United States of America),7.0,28.61
1-2,Black or African American (United States of America),17.0,29.01
1-2,White (United States of America),22.0,32.53
3-5,Black or African American (United States of America),12.0,27.18
3-5,White (United States of America),19.0,33.77
6-10,Black or African American (United States of America),9.0,26.75
6-10,White (United States of America),14.0,32.28


### Employee performance evaluations

In [71]:
fifteen = pd.concat([fifteen1,fifteen2])
fifteenrating_gender = fifteen.groupby(['gender'])['performance_rating'].median().sort_values(ascending=False)
fifteenrating_gender

gender
Female                   3.40
Male                     3.40
Prefer not to disclose    NaN
Name: performance_rating, dtype: float64

In [72]:
sixteen = pd.concat([sixteen1,sixteen2])
sixteenrating_gender = sixteen.groupby(['gender'])['performance_rating'].median().sort_values(ascending=False)
sixteenrating_gender

gender
Female                   3.30
Male                     3.30
Prefer not to disclose    NaN
Name: performance_rating, dtype: float64

In [73]:
seventeen = pd.concat([seventeen1,seventeen2])
seventeenrating_gender = seventeen.groupby(['gender'])['performance_rating'].median().sort_values(ascending=False)
seventeenrating_gender

gender
Female                   3.40
Male                     3.40
Prefer not to disclose    NaN
Name: performance_rating, dtype: float64

In [74]:
eighteen = pd.concat([eighteen1,eighteen2])
eighteenrating_gender = eighteen.groupby(['gender'])['performance_rating'].median().sort_values(ascending=False)
eighteenrating_gender

gender
Female                   3.40
Male                     3.40
Prefer not to disclose   3.20
Name: performance_rating, dtype: float64

In [75]:
fifteenrating_race_ethnicity = fifteen.groupby(['race_ethnicity'])['performance_rating'].median().sort_values(ascending=False)
fifteenrating_race_ethnicity

race_ethnicity
American Indian or Alaska Native (United States of America)            3.50
Asian (United States of America)                                       3.40
White (United States of America)                                       3.40
Prefer Not to Disclose (United States of America)                      3.30
Two or More Races (United States of America)                           3.30
Native Hawaiian or Other Pacific Islander (United States of America)   3.25
Hispanic or Latino (United States of America)                          3.20
Black or African American (United States of America)                   3.10
Name: performance_rating, dtype: float64

In [76]:
sixteenrating_race_ethnicity = sixteen.groupby(['race_ethnicity'])['performance_rating'].median().sort_values(ascending=False)
sixteenrating_race_ethnicity

race_ethnicity
Native Hawaiian or Other Pacific Islander (United States of America)   3.70
Asian (United States of America)                                       3.40
White (United States of America)                                       3.40
Prefer Not to Disclose (United States of America)                      3.30
American Indian or Alaska Native (United States of America)            3.25
Black or African American (United States of America)                   3.20
Hispanic or Latino (United States of America)                          3.20
Two or More Races (United States of America)                           3.20
Name: performance_rating, dtype: float64

In [77]:
seventeenrating_race_ethnicity = seventeen.groupby(['race_ethnicity'])['performance_rating'].median().sort_values(ascending=False)
seventeenrating_race_ethnicity

race_ethnicity
American Indian or Alaska Native (United States of America)            3.55
Native Hawaiian or Other Pacific Islander (United States of America)   3.50
Asian (United States of America)                                       3.40
Prefer Not to Disclose (United States of America)                      3.40
White (United States of America)                                       3.40
Hispanic or Latino (United States of America)                          3.30
Two or More Races (United States of America)                           3.30
Black or African American (United States of America)                   3.20
Name: performance_rating, dtype: float64

In [78]:
eighteenrating_race_ethnicity = eighteen.groupby(['race_ethnicity'])['performance_rating'].median().sort_values(ascending=False)
eighteenrating_race_ethnicity

race_ethnicity
American Indian or Alaska Native (United States of America)            3.55
White (United States of America)                                       3.50
Asian (United States of America)                                       3.40
Native Hawaiian or Other Pacific Islander (United States of America)   3.40
Black or African American (United States of America)                   3.30
Hispanic or Latino (United States of America)                          3.30
Prefer Not to Disclose (United States of America)                      3.30
Two or More Races (United States of America)                           3.30
Name: performance_rating, dtype: float64

In [79]:
fifteenrating_gender_race = fifteen.groupby(['race_ethnicity','gender'])['performance_rating'].median().sort_values(ascending=False)
fifteenrating_gender_race

race_ethnicity                                                        gender                
American Indian or Alaska Native (United States of America)           Female                   3.50
Asian (United States of America)                                      Male                     3.50
White (United States of America)                                      Male                     3.50
American Indian or Alaska Native (United States of America)           Male                     3.40
Asian (United States of America)                                      Female                   3.40
White (United States of America)                                      Female                   3.40
Native Hawaiian or Other Pacific Islander (United States of America)  Male                     3.30
Prefer Not to Disclose (United States of America)                     Female                   3.30
Two or More Races (United States of America)                          Female                   3.30
Hispani

In [80]:
sixteenrating_gender_race = sixteen.groupby(['race_ethnicity','gender'])['performance_rating'].median().sort_values(ascending=False)
sixteenrating_gender_race

race_ethnicity                                                        gender                
Native Hawaiian or Other Pacific Islander (United States of America)  Female                   4.10
Asian (United States of America)                                      Female                   3.40
White (United States of America)                                      Female                   3.40
                                                                      Male                     3.40
American Indian or Alaska Native (United States of America)           Female                   3.30
Asian (United States of America)                                      Male                     3.30
Native Hawaiian or Other Pacific Islander (United States of America)  Male                     3.30
Prefer Not to Disclose (United States of America)                     Female                   3.30
Black or African American (United States of America)                  Female                   3.25
America

In [81]:
seventeenrating_gender_race = seventeen.groupby(['race_ethnicity','gender'])['performance_rating'].median().sort_values(ascending=False)
seventeenrating_gender_race

race_ethnicity                                                        gender                
Native Hawaiian or Other Pacific Islander (United States of America)  Female                   4.00
American Indian or Alaska Native (United States of America)           Female                   3.70
Prefer Not to Disclose (United States of America)                     Female                   3.50
Asian (United States of America)                                      Female                   3.40
White (United States of America)                                      Female                   3.40
                                                                      Male                     3.40
Asian (United States of America)                                      Male                     3.35
Hispanic or Latino (United States of America)                         Female                   3.30
                                                                      Male                     3.30
Two or 

In [82]:
eighteenrating_gender_race = eighteen.groupby(['race_ethnicity','gender'])['performance_rating'].median().sort_values(ascending=False)
eighteenrating_gender_race

race_ethnicity                                                        gender                
American Indian or Alaska Native (United States of America)           Female                   3.70
Prefer Not to Disclose (United States of America)                     Female                   3.55
White (United States of America)                                      Male                     3.50
Asian (United States of America)                                      Female                   3.40
                                                                      Male                     3.40
Native Hawaiian or Other Pacific Islander (United States of America)  Male                     3.40
White (United States of America)                                      Female                   3.40
Two or More Races (United States of America)                          Female                   3.35
Black or African American (United States of America)                  Male                     3.30
Hispani

### Employee pay changes

In [83]:
reason_for_change = reason_for_change_combined.groupby(['business_process_reason']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(reason_for_change)

Unnamed: 0_level_0,count_nonzero
business_process_reason,Unnamed: 1_level_1
Request Compensation Change > Adjustment > Contract Increase,4149
Merit > Performance > Annual Performance Appraisal,2723
Request Compensation Change > Adjustment > Change Plan Assignment,1399
Data Change > Data Change > Change Job Details,800
Request Compensation Change > Adjustment > Market Adjustment,639
Transfer > Transfer > Move to another manager,527
Promotion > Promotion > Promotion,489
Hire Employee > New Hire > Fill Vacancy,345
Hire Employee > New Hire > New Position,244
Request Compensation Change > Adjustment > Increased Job Responsibilities,76


In [84]:
reason_for_change_gender = reason_for_change_combined.groupby(['business_process_reason','gender']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(reason_for_change_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
business_process_reason,gender,Unnamed: 2_level_1
Request Compensation Change > Adjustment > Contract Increase,Female,2169
Request Compensation Change > Adjustment > Contract Increase,Male,1975
Merit > Performance > Annual Performance Appraisal,Female,1399
Merit > Performance > Annual Performance Appraisal,Male,1322
Request Compensation Change > Adjustment > Change Plan Assignment,Female,809
Request Compensation Change > Adjustment > Change Plan Assignment,Male,588
Data Change > Data Change > Change Job Details,Female,433
Request Compensation Change > Adjustment > Market Adjustment,Female,389
Data Change > Data Change > Change Job Details,Male,367
Promotion > Promotion > Promotion,Female,321


In [85]:
reason_for_change_race = reason_for_change_combined.groupby(['business_process_reason','race_ethnicity']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(reason_for_change_race)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
business_process_reason,race_ethnicity,Unnamed: 2_level_1
Request Compensation Change > Adjustment > Contract Increase,White (United States of America),2656
Merit > Performance > Annual Performance Appraisal,White (United States of America),1775
Request Compensation Change > Adjustment > Change Plan Assignment,White (United States of America),882
Request Compensation Change > Adjustment > Contract Increase,Black or African American (United States of America),761
Data Change > Data Change > Change Job Details,White (United States of America),538
...,...,...
Hire Employee > New Hire > Convert Contingent,Asian (United States of America),5
Lateral Move > Lateral Move > Change Job Profile,Black or African American (United States of America),5
Hire Employee > Rehire > Fill Vacancy,Asian (United States of America),5
Data Change > Data Change > Change Job Details,Prefer Not to Disclose (United States of America),5


In [86]:
reason_for_change_race_gender = reason_for_change_combined.groupby(['business_process_reason','race_ethnicity','gender']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(reason_for_change_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero
business_process_reason,race_ethnicity,gender,Unnamed: 3_level_1
Request Compensation Change > Adjustment > Contract Increase,White (United States of America),Female,1356
Request Compensation Change > Adjustment > Contract Increase,White (United States of America),Male,1295
Merit > Performance > Annual Performance Appraisal,White (United States of America),Male,899
Merit > Performance > Annual Performance Appraisal,White (United States of America),Female,874
Request Compensation Change > Adjustment > Change Plan Assignment,White (United States of America),Female,497
...,...,...,...
Hire Employee > New Hire > Convert Contingent,Black or African American (United States of America),Female,5
Request Compensation Change > Adjustment > Contract Increase,White (United States of America),Prefer not to disclose,5
Request Compensation Change > Adjustment > Contract Increase,Native Hawaiian or Other Pacific Islander (United States of America),Male,5
Transfer > Transfer > Transfer between companies,White (United States of America),Male,5


## News

### Gender

In [87]:
current_news_gender_salaried = news_salaried.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_news_gender_salaried)

Unnamed: 0_level_0,count_nonzero
gender,Unnamed: 1_level_1
Female,336
Male,320


In [88]:
current_news_gender_hourly = news_hourly.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_news_gender_hourly)

Unnamed: 0_level_0,count_nonzero
gender,Unnamed: 1_level_1
Female,50
Male,33


In [89]:
current_news_gender_salaried_median = news_salaried.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_salaried_median)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,336,102700.04
Male,320,120976.6


In [90]:
current_news_gender_hourly_median = news_hourly.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_hourly_median)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,50,35.9
Male,33,33.77


In [91]:
current_news_gender_age_salaried = news_salaried.groupby(['gender'])['age'].median().sort_values(ascending=False)
current_news_gender_age_salaried

gender
Male                     42.00
Female                   35.50
Prefer not to disclose   30.00
Name: age, dtype: float64

In [92]:
current_news_gender_age_hourly = news_hourly.groupby(['gender'])['age'].median().sort_values(ascending=False)
current_news_gender_age_hourly

gender
Female   34.00
Male     34.00
Name: age, dtype: float64

In [93]:
current_news_gender_age_5_salary = news_salaried.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_age_5_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,13.0,69280.0
25-29,Female,79.0,83780.0
25-29,Male,32.0,80830.0
30-34,Female,61.0,93000.0
30-34,Male,44.0,99170.0
35-39,Female,58.0,105790.0
35-39,Male,60.0,127373.27
40-44,Female,28.0,121560.0
40-44,Male,43.0,131410.0
45-49,Female,23.0,133000.0


In [94]:
current_news_gender_age_5_hourly = news_hourly.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_age_5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,11,32.65
25-29,Female,8,33.85
25-29,Male,7,17.3
30-34,Female,7,33.33
30-34,Male,7,37.18
50-54,Female,5,44.67
65+,Female,5,44.46


In [95]:
current_news_gender_age_10_salary = news_salaried.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_age_10_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,13.0,69280.0
25-34,Female,140.0,90000.0
25-34,Male,76.0,90140.0
35-44,Female,86.0,110016.26
35-44,Male,103.0,130000.0
45-54,Female,50.0,125350.0
45-54,Male,71.0,127840.0
55-64,Female,42.0,140669.27
55-64,Male,54.0,144111.84
65+,Female,5.0,162355.42


In [96]:
current_news_gender_age_10_hourly = news_hourly.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_age_10_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,11,32.65
25-34,Female,15,33.85
25-34,Male,14,23.56
35-44,Female,5,36.05
35-44,Male,5,32.77
45-54,Female,7,51.3
55-64,Female,7,42.94
55-64,Male,7,34.77
65+,Female,5,44.46


In [97]:
current_news_gender_salaried_under_40 = news_salaried[news_salaried['age'] < 40].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_salaried_under_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,211,90440.0
Male,139,103400.0


In [98]:
current_news_gender_salaried_over_40 = news_salaried[news_salaried['age'] > 39].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_salaried_over_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,125,128280.0
Male,181,131410.0


In [99]:
current_news_gender_hourly_under_40 = news_hourly[news_hourly['age'] < 40].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_hourly_under_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,28,33.85
Male,20,31.12


In [100]:
current_news_gender_hourly_over_40 = news_hourly[news_hourly['age'] > 39].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_hourly_over_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,22,44.56
Male,13,34.18


### Race and ethnicity

In [101]:
current_news_race_salaried = news_salaried.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_news_race_salaried)

Unnamed: 0_level_0,count_nonzero
race_ethnicity,Unnamed: 1_level_1
White (United States of America),437
Black or African American (United States of America),60
Asian (United States of America),59
Hispanic or Latino (United States of America),42
Two or More Races (United States of America),18
Prefer Not to Disclose (United States of America),12


In [102]:
current_news_race_hourly = news_hourly.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_news_race_hourly)

Unnamed: 0_level_0,count_nonzero
race_ethnicity,Unnamed: 1_level_1
White (United States of America),58
Asian (United States of America),11
Black or African American (United States of America),8


In [103]:
current_news_race_group_salaried = news_salaried.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_news_race_group_salaried)

Unnamed: 0_level_0,count_nonzero
race_grouping,Unnamed: 1_level_1
white,437
person of color,181
unknown,39


In [104]:
current_news_race_group_hourly = news_hourly.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_news_race_group_hourly)

Unnamed: 0_level_0,count_nonzero
race_grouping,Unnamed: 1_level_1
white,58
person of color,24


In [105]:
current_news_race_median_salaried = news_salaried.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_median_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),437,113809.68
Asian (United States of America),59,103970.05
Black or African American (United States of America),60,102700.04
Hispanic or Latino (United States of America),42,95780.04
Prefer Not to Disclose (United States of America),12,94890.0
Two or More Races (United States of America),18,91090.0


In [106]:
current_news_race_median_hourly = news_hourly.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_median_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),58,35.4
Asian (United States of America),11,34.28
Black or African American (United States of America),8,23.22


In [107]:
current_news_race_group_median_salaried = news_salaried.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_group_median_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
unknown,39,130000.0
white,437,113809.68
person of color,181,98435.0


In [108]:
current_news_race_group_median_hourly = news_hourly.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_group_median_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
white,58,35.4
person of color,24,33.59


In [109]:
current_news_race_age_salaried = news_salaried.groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_news_race_age_salaried

race_ethnicity
American Indian or Alaska Native (United States of America)            46.00
Native Hawaiian or Other Pacific Islander (United States of America)   45.00
White (United States of America)                                       41.00
Black or African American (United States of America)                   39.00
Hispanic or Latino (United States of America)                          35.50
Asian (United States of America)                                       35.00
Prefer Not to Disclose (United States of America)                      31.00
Two or More Races (United States of America)                           30.00
Name: age, dtype: float64

In [110]:
current_news_race_age_hourly = news_hourly.groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_news_race_age_hourly

race_ethnicity
American Indian or Alaska Native (United States of America)   71.00
White (United States of America)                              36.50
Black or African American (United States of America)          30.50
Asian (United States of America)                              28.00
Hispanic or Latino (United States of America)                 25.00
Two or More Races (United States of America)                  22.00
Name: age, dtype: float64

In [111]:
current_news_race_age_5_salary = news_salaried.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_age_5_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),9.0,68000.0
25-29,Asian (United States of America),17.0,90000.0
25-29,Black or African American (United States of America),10.0,82000.0
25-29,Hispanic or Latino (United States of America),12.0,86690.0
25-29,Two or More Races (United States of America),7.0,79000.0
25-29,White (United States of America),60.0,80670.0
30-34,Asian (United States of America),9.0,120000.0
30-34,Black or African American (United States of America),11.0,92000.0
30-34,Hispanic or Latino (United States of America),8.0,93894.94
30-34,Prefer Not to Disclose (United States of America),5.0,95000.0


In [112]:
current_news_race_age_5_hourly = news_hourly.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_age_5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),6.0,32.7
25-29,White (United States of America),10.0,20.88
30-34,White (United States of America),10.0,34.36
35-39,White (United States of America),5.0,36.05
50-54,White (United States of America),6.0,39.42
55-59,White (United States of America),5.0,39.04
60-64,White (United States of America),7.0,35.69


In [113]:
current_news_race_age_10_salary = news_salaried.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_age_10_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),9.0,68000.0
25-34,Asian (United States of America),26.0,90530.0
25-34,Black or African American (United States of America),21.0,90000.0
25-34,Hispanic or Latino (United States of America),20.0,91254.94
25-34,Prefer Not to Disclose (United States of America),9.0,90000.0
25-34,Two or More Races (United States of America),10.0,89500.0
25-34,White (United States of America),122.0,88870.0
35-44,Asian (United States of America),20.0,110016.26
35-44,Black or African American (United States of America),17.0,131090.0
35-44,Hispanic or Latino (United States of America),15.0,96780.08


In [114]:
current_news_race_age_10_hourly = news_hourly.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_age_10_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),6.0,32.7
25-34,White (United States of America),20.0,32.48
35-44,White (United States of America),7.0,35.4
45-54,White (United States of America),9.0,44.67
55-64,White (United States of America),12.0,37.36


In [115]:
current_news_race_group_age_5_salary = news_salaried.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_group_age_5_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,7.0,80060.0
<25,white,9.0,68000.0
25-29,person of color,46.0,85641.0
25-29,unknown,5.0,90000.0
25-29,white,60.0,80670.0
30-34,person of color,31.0,93000.0
30-34,unknown,13.0,98340.0
30-34,white,62.0,95104.86
35-39,person of color,29.0,108000.0
35-39,unknown,11.0,133560.0


In [116]:
current_news_race_group_age_5_hourly = news_hourly.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_group_age_5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,8.0,33.7
<25,white,6.0,32.7
25-29,person of color,5.0,25.41
25-29,white,10.0,20.88
30-34,white,10.0,34.36
35-39,white,5.0,36.05
50-54,white,6.0,39.42
55-59,white,5.0,39.04
60-64,white,7.0,35.69


In [117]:
current_news_race_group_age_10_salary = news_salaried.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_group_age_10_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,7.0,80060.0
<25,white,9.0,68000.0
25-34,person of color,77.0,90000.0
25-34,unknown,18.0,92500.0
25-34,white,122.0,88870.0
35-44,person of color,57.0,110560.0
35-44,unknown,12.0,135060.0
35-44,white,120.0,119280.0
45-54,person of color,28.0,116206.39
45-54,white,89.0,129690.0


In [118]:
current_news_race_group_age_10_hourly = news_hourly.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_group_age_10_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,8.0,33.7
<25,white,6.0,32.7
25-34,person of color,8.0,29.37
25-34,white,20.0,32.48
35-44,white,7.0,35.4
45-54,white,9.0,44.67
55-64,white,12.0,37.36


In [119]:
current_news_race_under_40_salaried = news_salaried[news_salaried['age'] < 40].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_under_40_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),209,96000.0
Prefer Not to Disclose (United States of America),11,94780.0
Asian (United States of America),41,94280.0
Hispanic or Latino (United States of America),28,92780.0
Black or African American (United States of America),30,92250.0
Two or More Races (United States of America),14,89500.0


In [120]:
current_news_race_over_40_salaried = news_salaried[news_salaried['age'] > 39].groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_over_40_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
unknown,10,151785.44
white,228,130799.71
person of color,68,121874.57


In [121]:
current_news_race_under_40_hourly = news_hourly[news_hourly['age'] < 40].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_under_40_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),31,33.33
Asian (United States of America),7,32.4
Black or African American (United States of America),5,21.71


In [122]:
current_news_race_over_40_hourly = news_hourly[news_hourly['age'] > 39].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_over_40_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),27,39.62


### Gender x race/ethnicity

In [123]:
current_news_race_gender_salaried = news_salaried.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_news_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_ethnicity,gender,Unnamed: 2_level_1
Asian (United States of America),Female,43
Asian (United States of America),Male,16
Black or African American (United States of America),Female,31
Black or African American (United States of America),Male,29
Hispanic or Latino (United States of America),Female,21
Hispanic or Latino (United States of America),Male,21
Prefer Not to Disclose (United States of America),Female,5
Prefer Not to Disclose (United States of America),Male,7
Two or More Races (United States of America),Female,13
Two or More Races (United States of America),Male,5


In [124]:
current_news_race_gender_hourly = news_hourly.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_news_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_ethnicity,gender,Unnamed: 2_level_1
Asian (United States of America),Female,8
White (United States of America),Female,33
White (United States of America),Male,25


In [125]:
current_news_race_gender_median_salaried = news_salaried.groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_median_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,109,94840.0
person of color,Male,72,104420.04
unknown,Female,20,116890.0
unknown,Male,19,133560.0
white,Female,207,105780.0
white,Male,229,123796.94


In [126]:
current_news_race_gender_median_hourly = news_hourly.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_median_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Asian (United States of America),Female,8,33.34
White (United States of America),Female,33,38.46
White (United States of America),Male,25,33.77


In [127]:
current_news_race_gender_under_40_salaried = news_salaried[news_salaried['age'] < 40].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_under_40_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Asian (United States of America),Female,31,94280.0
Asian (United States of America),Male,10,100200.0
Black or African American (United States of America),Female,18,87281.0
Black or African American (United States of America),Male,12,110000.0
Hispanic or Latino (United States of America),Female,20,92780.0
Hispanic or Latino (United States of America),Male,8,91260.0
Prefer Not to Disclose (United States of America),Female,5,90000.0
Prefer Not to Disclose (United States of America),Male,6,96560.0
Two or More Races (United States of America),Female,10,90390.0
White (United States of America),Female,116,90030.0


In [128]:
current_news_race_gender_under_40_hourly = news_hourly[news_hourly['age'] < 40].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_under_40_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Asian (United States of America),Female,5,32.4
White (United States of America),Female,16,33.59
White (United States of America),Male,15,32.75


In [129]:
current_news_race_gender_over_40_salaried = news_salaried[news_salaried['age'] > 39].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_over_40_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Asian (United States of America),Female,12,114484.76
Asian (United States of America),Male,6,125812.5
Black or African American (United States of America),Female,13,125700.0
Black or African American (United States of America),Male,17,114488.29
Hispanic or Latino (United States of America),Male,13,98411.56
White (United States of America),Female,91,127340.0
White (United States of America),Male,137,137799.88


In [130]:
current_news_race_gender_over_40_hourly = news_hourly[news_hourly['age'] > 39].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_over_40_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
White (United States of America),Female,17,44.67
White (United States of America),Male,10,33.98


### Years of service

In [131]:
current_news_yos_salary = news_salaried.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_salary)

Unnamed: 0_level_0,count_nonzero,median
years_of_service_grouped,Unnamed: 1_level_1,Unnamed: 2_level_1
0,56,90000.0
1-2,139,92780.0
3-5,154,102210.0
6-10,117,115000.0
11-15,48,125342.5
16-20,51,120250.77
21-25,53,140206.92
25+,39,138517.37


In [132]:
current_news_yos_hourly = news_hourly.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_hourly)

Unnamed: 0_level_0,count_nonzero,median
years_of_service_grouped,Unnamed: 1_level_1,Unnamed: 2_level_1
0,13,35.9
1-2,21,21.31
3-5,13,33.85
6-10,10,38.89
11-15,6,37.55
16-20,8,38.19
21-25,7,35.69
25+,5,39.62


In [133]:
current_news_yos_gender_salary = news_salaried.groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_gender_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Female,38.0,90000.0
0,Male,18.0,107890.0
1-2,Female,82.0,88870.0
1-2,Male,56.0,105170.0
3-5,Female,78.0,95000.0
3-5,Male,76.0,108395.0
6-10,Female,53.0,110148.5
6-10,Male,64.0,118235.5
11-15,Female,21.0,104618.47
11-15,Male,27.0,139410.0


In [134]:
current_news_yos_gender_hourly = news_hourly.groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Female,12,35.9
1-2,Female,12,21.16
1-2,Male,9,21.71
3-5,Female,5,40.8
3-5,Male,8,33.77
6-10,Male,6,35.38
16-20,Female,5,44.46
21-25,Female,5,40.63


In [135]:
current_news_yos_race_salary = news_salaried.groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_race_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Asian (United States of America),7.0,90000.0
0,Black or African American (United States of America),11.0,95000.0
0,Hispanic or Latino (United States of America),8.0,102890.0
0,Two or More Races (United States of America),5.0,90000.0
0,White (United States of America),20.0,81390.0
1-2,Asian (United States of America),17.0,90000.0
1-2,Black or African American (United States of America),12.0,92640.0
1-2,Hispanic or Latino (United States of America),12.0,80890.0
1-2,Prefer Not to Disclose (United States of America),6.0,92390.0
1-2,Two or More Races (United States of America),5.0,89000.0


In [136]:
current_news_yos_race_hourly = news_hourly.groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Asian (United States of America),5.0,37.44
1-2,White (United States of America),15.0,25.91
3-5,White (United States of America),10.0,35.48
6-10,White (United States of America),9.0,37.99
16-20,White (United States of America),6.0,47.92
21-25,White (United States of America),7.0,35.69


In [137]:
current_news_yos_race_gender_salary = news_salaried.groupby(['years_of_service_grouped','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_race_gender_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
years_of_service_grouped,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
0,person of color,Female,20.0,95000.0
0,person of color,Male,11.0,92000.0
0,white,Female,14.0,76250.0
0,white,Male,6.0,162500.0
1-2,person of color,Female,31.0,84780.0
1-2,person of color,Male,15.0,104560.0
1-2,unknown,Female,9.0,102780.0
1-2,white,Female,42.0,89120.0
1-2,white,Male,37.0,107560.0
3-5,person of color,Female,26.0,93420.0


In [138]:
current_news_yos_race_gender_hourly = news_hourly.groupby(['years_of_service_grouped','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
years_of_service_grouped,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
0,person of color,Female,8.0,35.45
1-2,white,Female,9.0,25.91
1-2,white,Male,6.0,24.57
3-5,white,Male,7.0,33.77
6-10,white,Male,5.0,32.77
21-25,white,Female,5.0,40.63


### Age

In [139]:
current_median_news_age_5_salaried = news_salaried.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_salaried)

Unnamed: 0_level_0,count_nonzero,median
age_group_5,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,16,70000.0
25-29,111,82780.0
30-34,106,95000.0
35-39,118,115975.0
40-44,71,128280.0
45-49,51,126840.0
50-54,70,121579.96
55-59,47,130000.0
60-64,49,155522.35
65+,18,140134.9


In [140]:
current_median_news_age_5_hourly = news_hourly.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_hourly)

Unnamed: 0_level_0,count_nonzero,median
age_group_5,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,14,32.7
25-29,15,21.71
30-34,14,34.62
35-39,5,36.05
40-44,5,27.41
50-54,7,44.67
55-59,7,39.04
60-64,7,35.69
65+,6,48.15


In [141]:
current_median_news_age_10_salaried = news_salaried.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_salaried)

Unnamed: 0_level_0,count_nonzero,median
age_group_10,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,16,70000.0
25-34,217,90000.0
35-44,189,120000.0
45-54,121,126572.33
55-64,96,143592.68
65+,18,140134.9


In [142]:
current_median_news_age_10_hourly = news_hourly.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_hourly)

Unnamed: 0_level_0,count_nonzero,median
age_group_10,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,14,32.7
25-34,29,33.33
35-44,10,34.09
45-54,10,47.98
55-64,14,37.36
65+,6,48.15


In [143]:
current_news_age_5_yos_salary = news_salaried.groupby(['age_group_5','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_age_5_yos_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,0,7.0,68000.0
<25,1-2,9.0,75780.0
25-29,0,24.0,90000.0
25-29,1-2,40.0,79780.0
25-29,3-5,41.0,82780.0
25-29,6-10,6.0,102972.57
30-34,0,11.0,90000.0
30-34,1-2,36.0,90000.0
30-34,3-5,33.0,98340.0
30-34,6-10,25.0,100000.0


In [144]:
current_news_age_5_yos_hourly = news_hourly.groupby(['age_group_5','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_age_5_yos_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,0,7.0,35.9
<25,1-2,7.0,21.0
25-29,1-2,9.0,21.31
30-34,3-5,6.0,37.48


In [145]:
current_news_age_10_yos_salary = news_salaried.groupby(['age_group_10','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_age_10_yos_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,0,7.0,68000.0
<25,1-2,9.0,75780.0
25-34,0,35.0,90000.0
25-34,1-2,76.0,85921.0
25-34,3-5,74.0,90000.0
25-34,6-10,31.0,101625.0
35-44,0,13.0,120780.0
35-44,1-2,38.0,111485.0
35-44,3-5,51.0,120840.0
35-44,6-10,44.0,120560.0


In [146]:
current_news_age_10_yos_hourly = news_hourly.groupby(['age_group_10','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_age_10_yos_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,0,7.0,35.9
<25,1-2,7.0,21.0
25-34,1-2,12.0,21.11
25-34,3-5,10.0,35.52
55-64,21-25,5.0,35.69


In [147]:
current_median_news_age_5_gender_salaried = news_salaried.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,13.0,69280.0
25-29,Female,79.0,83780.0
25-29,Male,32.0,80830.0
30-34,Female,61.0,93000.0
30-34,Male,44.0,99170.0
35-39,Female,58.0,105790.0
35-39,Male,60.0,127373.27
40-44,Female,28.0,121560.0
40-44,Male,43.0,131410.0
45-49,Female,23.0,133000.0


In [148]:
current_median_news_age_5_gender_hourly = news_hourly.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,11,32.65
25-29,Female,8,33.85
25-29,Male,7,17.3
30-34,Female,7,33.33
30-34,Male,7,37.18
50-54,Female,5,44.67
65+,Female,5,44.46


In [149]:
current_median_news_age_10_gender_salaried = news_salaried.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,13.0,69280.0
25-34,Female,140.0,90000.0
25-34,Male,76.0,90140.0
35-44,Female,86.0,110016.26
35-44,Male,103.0,130000.0
45-54,Female,50.0,125350.0
45-54,Male,71.0,127840.0
55-64,Female,42.0,140669.27
55-64,Male,54.0,144111.84
65+,Female,5.0,162355.42


In [150]:
current_median_news_age_10_gender_hourly = news_hourly.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,11,32.65
25-34,Female,15,33.85
25-34,Male,14,23.56
35-44,Female,5,36.05
35-44,Male,5,32.77
45-54,Female,7,51.3
55-64,Female,7,42.94
55-64,Male,7,34.77
65+,Female,5,44.46


In [151]:
current_median_news_age_5_race_salaried = news_salaried.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),9.0,68000.0
25-29,Asian (United States of America),17.0,90000.0
25-29,Black or African American (United States of America),10.0,82000.0
25-29,Hispanic or Latino (United States of America),12.0,86690.0
25-29,Two or More Races (United States of America),7.0,79000.0
25-29,White (United States of America),60.0,80670.0
30-34,Asian (United States of America),9.0,120000.0
30-34,Black or African American (United States of America),11.0,92000.0
30-34,Hispanic or Latino (United States of America),8.0,93894.94
30-34,Prefer Not to Disclose (United States of America),5.0,95000.0


In [152]:
current_median_news_age_5_race_hourly = news_hourly.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),6.0,32.7
25-29,White (United States of America),10.0,20.88
30-34,White (United States of America),10.0,34.36
35-39,White (United States of America),5.0,36.05
50-54,White (United States of America),6.0,39.42
55-59,White (United States of America),5.0,39.04
60-64,White (United States of America),7.0,35.69


In [153]:
current_median_news_age_5_race_group_salaried = news_salaried.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_group_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,7.0,80060.0
<25,white,9.0,68000.0
25-29,person of color,46.0,85641.0
25-29,unknown,5.0,90000.0
25-29,white,60.0,80670.0
30-34,person of color,31.0,93000.0
30-34,unknown,13.0,98340.0
30-34,white,62.0,95104.86
35-39,person of color,29.0,108000.0
35-39,unknown,11.0,133560.0


In [154]:
current_median_news_age_5_race_group_hourly = news_hourly.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_group_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,8.0,33.7
<25,white,6.0,32.7
25-29,person of color,5.0,25.41
25-29,white,10.0,20.88
30-34,white,10.0,34.36
35-39,white,5.0,36.05
50-54,white,6.0,39.42
55-59,white,5.0,39.04
60-64,white,7.0,35.69


In [155]:
current_median_news_age_10_race_salaried = news_salaried.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),9.0,68000.0
25-34,Asian (United States of America),26.0,90530.0
25-34,Black or African American (United States of America),21.0,90000.0
25-34,Hispanic or Latino (United States of America),20.0,91254.94
25-34,Prefer Not to Disclose (United States of America),9.0,90000.0
25-34,Two or More Races (United States of America),10.0,89500.0
25-34,White (United States of America),122.0,88870.0
35-44,Asian (United States of America),20.0,110016.26
35-44,Black or African American (United States of America),17.0,131090.0
35-44,Hispanic or Latino (United States of America),15.0,96780.08


In [156]:
current_median_news_age_10_race_hourly = news_hourly.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),6.0,32.7
25-34,White (United States of America),20.0,32.48
35-44,White (United States of America),7.0,35.4
45-54,White (United States of America),9.0,44.67
55-64,White (United States of America),12.0,37.36


In [157]:
current_median_news_age_10_race_group_salaried = news_salaried.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_group_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,7.0,80060.0
<25,white,9.0,68000.0
25-34,person of color,77.0,90000.0
25-34,unknown,18.0,92500.0
25-34,white,122.0,88870.0
35-44,person of color,57.0,110560.0
35-44,unknown,12.0,135060.0
35-44,white,120.0,119280.0
45-54,person of color,28.0,116206.39
45-54,white,89.0,129690.0


In [158]:
current_median_news_age_10_race_group_hourly = news_hourly.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_group_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,8.0,33.7
<25,white,6.0,32.7
25-34,person of color,8.0,29.37
25-34,white,20.0,32.48
35-44,white,7.0,35.4
45-54,white,9.0,44.67
55-64,white,12.0,37.36


In [159]:
current_median_news_age_5_race_gender_salaried = news_salaried.groupby(['age_group_5','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,White (United States of America),Female,8.0,66640.0
25-29,Asian (United States of America),Female,14.0,90000.0
25-29,Black or African American (United States of America),Female,8.0,79640.0
25-29,Hispanic or Latino (United States of America),Female,9.0,92500.0
25-29,White (United States of America),Female,41.0,80855.0
25-29,White (United States of America),Male,19.0,80560.0
30-34,Asian (United States of America),Female,6.0,128420.0
30-34,Black or African American (United States of America),Female,8.0,90780.0
30-34,Hispanic or Latino (United States of America),Female,5.0,97780.0
30-34,White (United States of America),Female,32.0,90383.33


In [160]:
current_median_news_age_5_race_gender_hourly = news_hourly.groupby(['age_group_5','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
25-29,White (United States of America),Female,6.0,34.88
30-34,White (United States of America),Male,6.0,37.48
50-54,White (United States of America),Female,5.0,44.67


In [161]:
current_median_news_age_5_race_group_gender_salaried = news_salaried.groupby(['age_group_5','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,person of color,Female,5.0,72500.0
<25,white,Female,8.0,66640.0
25-29,person of color,Female,35.0,90000.0
25-29,person of color,Male,11.0,80880.0
25-29,white,Female,41.0,80855.0
25-29,white,Male,19.0,80560.0
30-34,person of color,Female,21.0,94280.0
30-34,person of color,Male,10.0,90500.0
30-34,unknown,Female,8.0,92500.0
30-34,unknown,Male,5.0,130000.0


In [162]:
current_median_news_age_5_race_group_gender_hourly = news_hourly.groupby(['age_group_5','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,person of color,Female,7.0,32.4
25-29,white,Female,6.0,34.88
30-34,white,Male,6.0,37.48
50-54,white,Female,5.0,44.67


In [163]:
current_median_news_age_10_race_gender_salaried = news_salaried.groupby(['age_group_10','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,White (United States of America),Female,8.0,66640.0
25-34,Asian (United States of America),Female,20.0,92670.0
25-34,Asian (United States of America),Male,6.0,86500.0
25-34,Black or African American (United States of America),Female,16.0,87281.0
25-34,Black or African American (United States of America),Male,5.0,92000.0
25-34,Hispanic or Latino (United States of America),Female,14.0,92780.0
25-34,Hispanic or Latino (United States of America),Male,6.0,80330.0
25-34,Prefer Not to Disclose (United States of America),Male,5.0,98340.0
25-34,Two or More Races (United States of America),Female,6.0,90390.0
25-34,White (United States of America),Female,73.0,85740.0


In [164]:
current_median_news_age_10_race_gender_hourly = news_hourly.groupby(['age_group_10','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
25-34,White (United States of America),Female,10.0,32.48
25-34,White (United States of America),Male,10.0,28.05
45-54,White (United States of America),Female,7.0,51.3
55-64,White (United States of America),Female,5.0,42.94
55-64,White (United States of America),Male,7.0,34.77


In [165]:
current_median_news_age_10_race_group_gender_salaried = news_salaried.groupby(['age_group_10','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,person of color,Female,5.0,72500.0
<25,white,Female,8.0,66640.0
25-34,person of color,Female,56.0,90394.94
25-34,person of color,Male,21.0,88000.0
25-34,unknown,Female,11.0,90000.0
25-34,unknown,Male,7.0,102780.0
25-34,white,Female,73.0,85740.0
25-34,white,Male,48.0,92140.0
35-44,person of color,Female,32.0,109112.01
35-44,person of color,Male,25.0,124345.0


In [166]:
current_median_news_age_10_race_group_gender_hourly = news_hourly.groupby(['age_group_10','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,person of color,Female,7.0,32.4
25-34,white,Female,10.0,32.48
25-34,white,Male,10.0,28.05
45-54,white,Female,7.0,51.3
55-64,white,Female,5.0,42.94
55-64,white,Male,7.0,34.77


### Desks

In [167]:
current_news_median_desk_salaried = news_salaried.groupby(['desk']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_salaried)

Unnamed: 0_level_0,count_nonzero,median
desk,Unnamed: 1_level_1,Unnamed: 2_level_1
National,114,158242.11
Foreign,33,142780.0
non-newsroom,18,140840.0
Financial,43,137140.0
Editorial,33,128560.27
Local,68,113140.0
Style,50,111833.47
Universal Desk,6,107876.01
Sports,43,107560.0
Outlook,6,105497.5


In [168]:
current_news_median_desk_hourly = news_hourly.groupby(['desk']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_hourly)

Unnamed: 0_level_0,count_nonzero,median
desk,Unnamed: 1_level_1,Unnamed: 2_level_1
Universal Desk,8,44.16
Editorial,8,43.27
National,11,33.85
Multiplatform,7,32.77
Sports,16,27.74
Operations,8,22.68
Style,6,20.95


In [169]:
current_news_median_desk_gender_salaried = news_salaried.groupby(['desk','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
desk,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
National,Male,60,178737.5
Editorial,Male,18,156476.68
non-newsroom,Male,11,151780.0
Foreign,Male,16,150780.0
National,Female,54,145602.97
Financial,Male,27,137340.0
Financial,Female,16,133666.0
Foreign,Female,17,129780.0
non-newsroom,Female,7,129060.0
Sports,Female,10,125935.0


In [170]:
current_news_median_desk_gender_hourly = news_hourly.groupby(['desk','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
desk,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Universal Desk,Female,5,43.87
Editorial,Female,6,43.27
National,Female,8,39.42
Sports,Male,13,33.77
Style,Female,5,20.91


In [171]:
current_news_median_desk_race_salaried = news_salaried.groupby(['desk','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
desk,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
National,white,83,177838.45
non-newsroom,white,13,151780.0
Foreign,unknown,25,144340.0
Financial,white,33,137799.88
National,person of color,28,135595.0
Editorial,white,25,134806.25
Financial,person of color,8,131420.0
Foreign,white,6,124843.27
Style,white,38,115471.31
Local,white,49,113280.0


In [172]:
current_news_median_desk_race_hourly = news_hourly.groupby(['desk','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
desk,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
Universal Desk,White (United States of America),6,44.16
Editorial,White (United States of America),6,39.49
National,White (United States of America),7,33.85
Sports,White (United States of America),15,33.77
Operations,White (United States of America),5,28.54


In [173]:
current_news_median_desk_race_gender_salaried = news_salaried.groupby(['desk','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
desk,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
National,White (United States of America),Male,47,186065.0
Sports,White (United States of America),Female,5,160089.98
Editorial,White (United States of America),Male,15,158673.36
non-newsroom,White (United States of America),Male,9,151840.0
National,White (United States of America),Female,36,148550.0
National,Asian (United States of America),Female,8,144985.0
Financial,White (United States of America),Male,23,142027.5
Sports,Black or African American (United States of America),Male,5,136340.0
National,Black or African American (United States of America),Male,7,136190.0
Financial,White (United States of America),Female,10,129876.0


In [174]:
current_news_median_desk_race_gender_hourly = news_hourly.groupby(['desk','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
desk,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Editorial,White (United States of America),Female,5,42.94
National,White (United States of America),Female,5,33.85
Sports,White (United States of America),Male,12,33.77


In [175]:
current_news_median_desk_race_group_gender_salaried = news_salaried.groupby(['desk','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
desk,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
National,white,Male,47,186065.0
Sports,white,Female,5,160089.98
Editorial,white,Male,15,158673.36
non-newsroom,white,Male,9,151840.0
National,white,Female,36,148550.0
Foreign,unknown,Male,12,147670.0
Financial,white,Male,23,142027.5
Financial,person of color,Female,6,141070.0
National,person of color,Female,16,137890.0
Foreign,unknown,Female,13,136560.0


In [176]:
current_news_median_desk_race_group_gender_hourly = news_hourly.groupby(['desk','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
desk,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Editorial,white,Female,5,42.94
National,white,Female,5,33.85
Sports,white,Male,12,33.77


In [177]:
current_news_median_desk_race_gender_age5_salaried = news_salaried.groupby(['desk','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
desk,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
National,White (United States of America),Female,50-54,5.0,194690.0
National,White (United States of America),Male,50-54,5.0,191325.0
National,White (United States of America),Male,60-64,6.0,185984.24
National,White (United States of America),Male,35-39,12.0,184065.0
National,White (United States of America),Male,40-44,10.0,177347.15
National,White (United States of America),Female,55-59,6.0,153515.0
National,White (United States of America),Female,45-49,5.0,147910.0
Financial,White (United States of America),Male,40-44,5.0,146315.0
Sports,White (United States of America),Male,35-39,7.0,143680.0
Sports,White (United States of America),Male,50-54,7.0,140206.92


In [178]:
current_news_median_desk_race_gender_age5_hourly = news_hourly.groupby(['desk','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
desk,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1


In [179]:
current_news_median_desk_race_group_gender_age5_salaried = news_salaried.groupby(['desk','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_group_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
desk,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
National,white,Female,50-54,5.0,194690.0
National,white,Male,50-54,5.0,191325.0
National,white,Male,60-64,6.0,185984.24
National,white,Male,35-39,12.0,184065.0
National,white,Male,40-44,10.0,177347.15
National,white,Female,55-59,6.0,153515.0
National,white,Female,45-49,5.0,147910.0
Financial,white,Male,40-44,5.0,146315.0
Sports,white,Male,35-39,7.0,143680.0
Sports,white,Male,50-54,7.0,140206.92


In [180]:
current_news_median_desk_race_group_gender_age5_hourly = news_hourly.groupby(['desk','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_group_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
desk,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1


In [181]:
current_news_median_desk_tier_salaried = news_salaried.groupby(['tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_salaried)

Unnamed: 0_level_0,count_nonzero,median
tier,Unnamed: 1_level_1,Unnamed: 2_level_1
Tier 1,190,144973.59
other,18,140840.0
Tier 2,227,110453.45
Tier 3,189,91625.25
Tier 4,33,78000.0


In [182]:
current_news_median_desk_tier_gender_salaried = news_salaried.groupby(['tier','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
tier,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Tier 1,Male,103,155780.0
other,Male,11,151780.0
Tier 1,Female,87,140780.0
other,Female,7,129060.0
Tier 2,Male,120,116620.55
Tier 2,Female,106,105539.28
Tier 3,Male,77,96780.08
Tier 3,Female,112,89080.0
Tier 4,Male,9,78340.0
Tier 4,Female,24,77280.0


In [183]:
current_news_median_desk_tier_race_salaried = news_salaried.groupby(['tier','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
tier,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
Tier 1,White (United States of America),122,161670.0
other,White (United States of America),13,151780.0
Tier 1,Black or African American (United States of America),12,141307.97
Tier 1,Asian (United States of America),18,138960.0
Tier 1,Hispanic or Latino (United States of America),6,131455.0
Tier 2,White (United States of America),159,114669.36
Tier 2,Black or African American (United States of America),23,113000.0
Tier 2,Hispanic or Latino (United States of America),14,103710.0
Tier 1,Prefer Not to Disclose (United States of America),5,102780.0
Tier 2,Two or More Races (United States of America),8,96107.5


In [184]:
current_news_median_desk_tier_race_gender_salaried = news_salaried.groupby(['tier','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
tier,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Tier 1,White (United States of America),Male,72,173325.41
other,White (United States of America),Male,9,151840.0
Tier 1,White (United States of America),Female,50,143210.02
Tier 1,Asian (United States of America),Female,12,142890.0
Tier 1,Black or African American (United States of America),Male,8,134550.0
Tier 1,Asian (United States of America),Male,6,129735.0
Tier 2,Hispanic or Latino (United States of America),Male,8,127582.4
Tier 2,Black or African American (United States of America),Male,12,121454.57
Tier 2,White (United States of America),Male,91,116681.11
Tier 2,White (United States of America),Female,67,108840.0


In [185]:
current_news_median_desk_tier_race_group_gender_salaried = news_salaried.groupby(['tier','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
tier,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Tier 1,white,Male,72,173325.41
other,white,Male,9,151840.0
Tier 1,unknown,Male,15,144340.0
Tier 1,white,Female,50,143210.02
Tier 1,person of color,Female,22,138960.0
Tier 1,unknown,Female,15,136560.0
Tier 1,person of color,Male,16,135050.0
Tier 2,white,Male,91,116681.11
Tier 2,person of color,Male,28,115355.0
Tier 2,white,Female,67,108840.0


In [186]:
current_news_median_desk_tier_race_gender_age5_salaried = news_salaried.groupby(['tier','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_race_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
tier,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
Tier 1,White (United States of America),Male,65+,5.0,192244.02
Tier 1,White (United States of America),Male,50-54,7.0,191325.0
Tier 1,White (United States of America),Male,60-64,8.0,182534.24
Tier 1,White (United States of America),Male,35-39,16.0,179875.0
Tier 1,White (United States of America),Male,45-49,6.0,176422.5
Tier 1,White (United States of America),Female,50-54,7.0,175340.0
Tier 1,White (United States of America),Male,40-44,15.0,174409.31
Tier 1,White (United States of America),Female,45-49,7.0,167910.0
Tier 2,White (United States of America),Female,60-64,7.0,155522.35
Tier 1,White (United States of America),Female,55-59,6.0,153515.0


In [187]:
current_news_median_desk_tier_race_group_gender_age5_salaried = news_salaried.groupby(['tier','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_race_group_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
tier,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
Tier 1,white,Male,65+,5.0,192244.02
Tier 1,white,Male,50-54,7.0,191325.0
Tier 1,white,Male,60-64,8.0,182534.24
Tier 1,white,Male,35-39,16.0,179875.0
Tier 1,white,Male,45-49,6.0,176422.5
Tier 1,white,Female,50-54,7.0,175340.0
Tier 1,white,Male,40-44,15.0,174409.31
Tier 1,white,Female,45-49,7.0,167910.0
Tier 1,person of color,Male,35-39,5.0,158340.0
Tier 2,white,Female,60-64,7.0,155522.35


### Job profiles

In [188]:
current_news_median_job_salaried = news_salaried.groupby(['job_profile_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_salaried)

Unnamed: 0_level_0,count_nonzero,median
job_profile_current,Unnamed: 1_level_1,Unnamed: 2_level_1
300313 - Columnist - Editorial,13,225000.0
300113 - Columnist,14,182044.43
320113 - Critic,9,155522.35
330113 - Editorial Writer,7,138191.93
280212 - Staff Writer,351,129690.0
390510 - Graphics Editor,6,118747.0
360114 - Photographer,18,112837.62
370301 - Librarian,5,110780.0
126902 - Topic Editor,6,110581.73
392210 - Multiplatform Editor - Editorial,5,105215.73


In [189]:
current_news_median_job_hourly = news_hourly.groupby(['job_profile_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_hourly)

Unnamed: 0_level_0,count_nonzero,median
job_profile_current,Unnamed: 1_level_1,Unnamed: 2_level_1
397210 - Multiplatform Editor - Editorial (PT/PTOC),5,51.28
581709 - Administrative Assistant,6,38.34
280225 - Producer,11,37.44
397110 - Multiplatform Editor (PT/PTOC),15,35.0
410251 - Editorial Aide,12,23.02
430117 - News Aide,10,17.23


In [190]:
current_news_median_job_gender_salaried = news_salaried.groupby(['job_profile_current','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
300313 - Columnist - Editorial,Male,9,225000.0
300113 - Columnist,Female,8,184898.4
300113 - Columnist,Male,6,182044.43
320113 - Critic,Male,6,158340.0
280212 - Staff Writer,Male,184,136095.0
280212 - Staff Writer,Female,167,121910.0
360114 - Photographer,Male,13,114488.29
280226 - Video Journalist,Male,8,107000.0
390610 - Graphics Reporter,Male,6,100000.0
280226 - Video Journalist,Female,11,100000.0


In [191]:
current_news_median_job_gender_hourly = news_hourly.groupby(['job_profile_current','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
581709 - Administrative Assistant,Female,6,38.34
280225 - Producer,Female,8,36.67
397110 - Multiplatform Editor (PT/PTOC),Female,5,35.0
397110 - Multiplatform Editor (PT/PTOC),Male,10,34.79
410251 - Editorial Aide,Female,8,21.16
430117 - News Aide,Male,7,17.3


In [192]:
current_news_median_job_race_salaried = news_salaried.groupby(['job_profile_current','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
300313 - Columnist - Editorial,White (United States of America),9,234560.0
300113 - Columnist,White (United States of America),9,187340.0
320113 - Critic,White (United States of America),8,153431.17
330113 - Editorial Writer,White (United States of America),6,142293.98
280212 - Staff Writer,White (United States of America),237,130189.42
280212 - Staff Writer,Black or African American (United States of America),26,125350.0
280212 - Staff Writer,Asian (United States of America),31,124345.0
280226 - Video Journalist,White (United States of America),12,114500.0
360114 - Photographer,White (United States of America),12,113790.63
280212 - Staff Writer,Hispanic or Latino (United States of America),17,105780.0


In [193]:
current_news_median_job_race_hourly = news_hourly.groupby(['job_profile_current','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
280225 - Producer,White (United States of America),6,38.23
397110 - Multiplatform Editor (PT/PTOC),White (United States of America),12,34.79
410251 - Editorial Aide,White (United States of America),7,25.64
430117 - News Aide,White (United States of America),8,16.77


In [194]:
current_news_median_job_race_gender_salaried = news_salaried.groupby(['job_profile_current','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
300113 - Columnist,White (United States of America),Female,5,230020.51
300313 - Columnist - Editorial,White (United States of America),Male,6,229780.0
320113 - Critic,White (United States of America),Male,6,158340.0
280212 - Staff Writer,Hispanic or Latino (United States of America),Male,7,154710.0
280212 - Staff Writer,White (United States of America),Male,135,139410.0
280212 - Staff Writer,Black or African American (United States of America),Male,17,129951.95
280212 - Staff Writer,Asian (United States of America),Male,10,124952.5
280212 - Staff Writer,White (United States of America),Female,102,124720.0
280212 - Staff Writer,Asian (United States of America),Female,21,121910.0
360114 - Photographer,White (United States of America),Male,7,120816.68


In [195]:
current_news_median_job_race_gender_hourly = news_hourly.groupby(['job_profile_current','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
280225 - Producer,White (United States of America),Female,5,38.46
397110 - Multiplatform Editor (PT/PTOC),White (United States of America),Male,10,34.79
410251 - Editorial Aide,White (United States of America),Female,5,25.64
430117 - News Aide,White (United States of America),Male,6,16.85


In [196]:
current_news_median_job_race_group_gender_salaried = news_salaried.groupby(['desk','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
desk,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
National,white,Male,47,186065.0
Sports,white,Female,5,160089.98
Editorial,white,Male,15,158673.36
non-newsroom,white,Male,9,151840.0
National,white,Female,36,148550.0
Foreign,unknown,Male,12,147670.0
Financial,white,Male,23,142027.5
Financial,person of color,Female,6,141070.0
National,person of color,Female,16,137890.0
Foreign,unknown,Female,13,136560.0


In [197]:
current_news_median_job_race_group_gender_hourly = news_hourly.groupby(['job_profile_current','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
job_profile_current,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
280225 - Producer,white,Female,5,38.46
397110 - Multiplatform Editor (PT/PTOC),white,Male,10,34.79
410251 - Editorial Aide,white,Female,5,25.64
430117 - News Aide,white,Male,6,16.85


In [198]:
current_news_median_job_race_gender_age5_salaried = news_salaried.groupby(['job_profile_current','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
280212 - Staff Writer,White (United States of America),Male,60-64,16.0,167440.92
280212 - Staff Writer,White (United States of America),Female,65+,5.0,162355.42
280212 - Staff Writer,White (United States of America),Female,60-64,7.0,160089.98
280212 - Staff Writer,White (United States of America),Female,45-49,10.0,157910.0
280212 - Staff Writer,White (United States of America),Male,35-39,29.0,151780.0
280212 - Staff Writer,White (United States of America),Male,50-54,17.0,144410.0
280212 - Staff Writer,White (United States of America),Male,40-44,21.0,144340.0
280212 - Staff Writer,White (United States of America),Female,55-59,7.0,144190.0
280212 - Staff Writer,White (United States of America),Male,55-59,10.0,134154.41
280212 - Staff Writer,White (United States of America),Male,45-49,12.0,133814.4


In [199]:
current_news_median_job_race_gender_age5_hourly = news_hourly.groupby(['job_profile_current','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1


In [200]:
current_news_median_job_race_group_gender_age5_salaried = news_salaried.groupby(['job_profile_current','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_group_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
280212 - Staff Writer,white,Male,60-64,16.0,167440.92
280212 - Staff Writer,white,Female,65+,5.0,162355.42
280212 - Staff Writer,white,Female,60-64,7.0,160089.98
280212 - Staff Writer,white,Female,45-49,10.0,157910.0
280212 - Staff Writer,white,Male,35-39,29.0,151780.0
280212 - Staff Writer,white,Male,50-54,17.0,144410.0
280212 - Staff Writer,white,Male,40-44,21.0,144340.0
280212 - Staff Writer,white,Female,55-59,7.0,144190.0
280212 - Staff Writer,unknown,Female,35-39,5.0,136560.0
280212 - Staff Writer,white,Male,55-59,10.0,134154.41


In [201]:
current_news_median_job_race_group_gender_age5_hourly = news_hourly.groupby(['job_profile_current','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_group_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1


### Performance evaluations

In [202]:
news_ratings = ratings_combined[ratings_combined['dept'] == 'News']

In [203]:
news_ratings_gender = news_ratings.groupby(['gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress_median(news_ratings_gender)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Male,5772,3.5
Female,6539,3.4
Prefer not to disclose,26,3.2


In [204]:
news_ratings_race = news_ratings.groupby(['race_ethnicity']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress_median(news_ratings_race)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
American Indian or Alaska Native (United States of America),39,3.5
White (United States of America),8138,3.5
Asian (United States of America),1222,3.4
Hispanic or Latino (United States of America),715,3.4
Prefer Not to Disclose (United States of America),182,3.4
Black or African American (United States of America),1287,3.3
Native Hawaiian or Other Pacific Islander (United States of America),13,3.3
Two or More Races (United States of America),338,3.3


In [205]:
news_ratings_race_gender = news_ratings.groupby(['race_ethnicity','gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(news_ratings_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
American Indian or Alaska Native (United States of America),Female,26,3.6
American Indian or Alaska Native (United States of America),Male,13,3.2
Asian (United States of America),Female,884,3.4
Asian (United States of America),Male,338,3.4
Black or African American (United States of America),Female,676,3.3
Black or African American (United States of America),Male,611,3.3
Hispanic or Latino (United States of America),Female,351,3.4
Hispanic or Latino (United States of America),Male,364,3.4
Native Hawaiian or Other Pacific Islander (United States of America),Male,13,3.3
Prefer Not to Disclose (United States of America),Female,78,3.6


In [206]:
news_ratings_race_gender_under3 = news_ratings[news_ratings['performance_rating'] < 3.1].groupby(['race_grouping','gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(news_ratings_race_gender_under3)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,109,3.0
person of color,Male,112,3.0
unknown,Female,5,3.0
unknown,Male,9,2.9
white,Female,195,3.0
white,Male,169,3.0


In [207]:
news_ratings_race_gender_over4 = news_ratings[news_ratings['performance_rating'] > 3.9].groupby(['race_grouping','gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(news_ratings_race_gender_over4)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,50,4.1
person of color,Male,22,4.1
unknown,Female,8,4.1
unknown,Male,26,4.1
white,Female,205,4.1
white,Male,320,4.2


### Pay changes

In [208]:
news_change = reason_for_change_combined[reason_for_change_combined['dept'] == 'News']

In [209]:
news_change_gender = news_change.groupby(['business_process_reason','gender']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(news_change_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
business_process_reason,gender,Unnamed: 2_level_1
Request Compensation Change > Adjustment > Contract Increase,Female,1449
Request Compensation Change > Adjustment > Contract Increase,Male,1445
Merit > Performance > Annual Performance Appraisal,Male,1029
Merit > Performance > Annual Performance Appraisal,Female,962
Request Compensation Change > Adjustment > Change Plan Assignment,Female,440
Request Compensation Change > Adjustment > Change Plan Assignment,Male,401
Data Change > Data Change > Change Job Details,Female,346
Data Change > Data Change > Change Job Details,Male,308
Request Compensation Change > Adjustment > Market Adjustment,Female,292
Transfer > Transfer > Move to another manager,Male,203


In [210]:
news_change_race = news_change.groupby(['business_process_reason','race_ethnicity']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(news_change_race)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
business_process_reason,race_ethnicity,Unnamed: 2_level_1
Request Compensation Change > Adjustment > Contract Increase,White (United States of America),2057
Merit > Performance > Annual Performance Appraisal,White (United States of America),1457
Request Compensation Change > Adjustment > Change Plan Assignment,White (United States of America),565
Data Change > Data Change > Change Job Details,White (United States of America),448
Request Compensation Change > Adjustment > Market Adjustment,White (United States of America),309
...,...,...
Data Change > Data Change > Change Job Profile,Black or African American (United States of America),5
Request Compensation Change > Adjustment > Contract Increase,Native Hawaiian or Other Pacific Islander (United States of America),5
Request Compensation Change > Adjustment > Job Change,Black or African American (United States of America),5
Request Compensation Change > Adjustment > Job Change,Asian (United States of America),5


### Performance evaluations x merit raises

In [211]:
reason_for_change_combined['merit_raises'] = reason_for_change_combined['business_process_reason'].str.contains('Merit', re.IGNORECASE)

In [212]:
twenty14 = np.datetime64('2016-04-01')
twenty15 = np.datetime64('2017-04-01')
twenty16 = np.datetime64('2018-04-01')
twenty17 = np.datetime64('2019-04-01')
twenty18 = np.datetime64('2020-04-01')

def raise_time(row):
    if row['effective_date'] < twenty14:
        return 'before 2015'
    if row['effective_date'] < twenty15:
        return '2015'
    if row['effective_date'] < twenty16:
        return '2016'
    if row['effective_date'] < twenty17:
        return '2017'
    if row['effective_date'] < twenty18:
        return '2018'
    return 'unknown'

reason_for_change_combined['raise_after'] = reason_for_change_combined.apply(lambda row: raise_time(row), axis=1)

In [213]:
merit_raises_news_gender_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['gender']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(merit_raises_news_gender_salaried)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,742,3000.0
Male,843,3000.0


In [214]:
merit_raises_news_gender_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['gender']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(merit_raises_news_gender_hourly)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,103,1.04
Male,60,1.03


In [215]:
merit_raises_news_race_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['race_ethnicity']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_race_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
American Indian or Alaska Native (United States of America),7,3500.0
Asian (United States of America),116,3000.0
White (United States of America),1185,3000.0
Two or More Races (United States of America),16,2850.0
Black or African American (United States of America),142,2844.0
Prefer Not to Disclose (United States of America),12,2610.0
Hispanic or Latino (United States of America),77,2500.0


In [216]:
merit_raises_news_race_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['race_ethnicity']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_race_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),118,1.1
Asian (United States of America),21,1.06
Black or African American (United States of America),18,1.02


In [217]:
merit_raises_news_race_group_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_race_group_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
person of color,360,3000.0
unknown,42,3000.0
white,1185,3000.0


In [218]:
merit_raises_news_race_group_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_race_group_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
white,118,1.1
person of color,45,1.03


In [219]:
merit_raises_news_gender_race_group_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_gender_race_group_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Male,unknown,22,3750.0
Female,person of color,198,3000.0
Female,unknown,20,3000.0
Female,white,524,3000.0
Male,person of color,162,3000.0
Male,white,659,3000.0


In [220]:
merit_raises_news_gender_race_group_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_gender_race_group_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Male,white,42,1.12
Female,white,76,1.1
Female,person of color,27,1.03
Male,person of color,18,1.03


In [221]:
fifteen_raises_amount = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2015')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2015_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(fifteen_raises_amount)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,16,2881.5
Female,white,43,2500.0
Male,person of color,10,2162.5
Male,white,62,2848.5


In [222]:
fifteen_raises_score = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2015')].groupby(['gender','race_grouping']).agg({'2015_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(fifteen_raises_score)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,16,3.4
Female,white,43,3.7
Male,person of color,10,3.5
Male,white,62,3.6


In [223]:
sixteen_raises_amount = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2016')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2016_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(sixteen_raises_amount)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,24,2750.0
Female,white,57,3000.0
Male,person of color,16,2900.0
Male,white,82,3000.0


In [224]:
sixteen_raises_score = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2016')].groupby(['gender','race_grouping']).agg({'2016_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(sixteen_raises_score)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,24,3.4
Female,white,57,3.5
Male,person of color,16,3.4
Male,white,82,3.6


In [225]:
seventeen_raises_amount = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2017')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2017_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(seventeen_raises_amount)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,23,3000.0
Female,white,56,2500.0
Male,person of color,25,3000.0
Male,white,79,3000.0


In [226]:
seventeen_raises_score = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2017')].groupby(['gender','race_grouping']).agg({'2017_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(seventeen_raises_score)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,23,3.45
Female,white,56,3.4
Male,person of color,25,3.4
Male,white,79,3.6


In [227]:
eighteen_raises_amount = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2018')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2018_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(eighteen_raises_amount)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,25,3000.0
Female,white,94,3000.0
Male,person of color,25,2500.0
Male,white,110,3000.0


In [228]:
eighteen_raises_score = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2018')].groupby(['gender','race_grouping']).agg({'2018_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(eighteen_raises_score)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,25,3.4
Female,white,94,3.5
Male,person of color,25,3.4
Male,white,110,3.6


In [229]:
merit_raises_15 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2015') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_16 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2016') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_17 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2017') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_18 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2018') & (reason_for_change_combined['merit_raises'] == True)]

merit_raises_15 = merit_raises_15[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating']].rename(columns={'2015_annual_performance_rating':'performance_rating'})
merit_raises_16 = merit_raises_16[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2016_annual_performance_rating']].rename(columns={'2016_annual_performance_rating':'performance_rating'})
merit_raises_17 = merit_raises_17[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2017_annual_performance_rating']].rename(columns={'2017_annual_performance_rating':'performance_rating'})
merit_raises_18 = merit_raises_18[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2018_annual_performance_rating']].rename(columns={'2018_annual_performance_rating':'performance_rating'})

merit_raises_15 = pd.DataFrame(merit_raises_15)
merit_raises_16 = pd.DataFrame(merit_raises_16)
merit_raises_17 = pd.DataFrame(merit_raises_17)
merit_raises_18 = pd.DataFrame(merit_raises_18)

merit_raises_combined = pd.concat([merit_raises_15,merit_raises_16,merit_raises_17,merit_raises_18])

In [230]:
news_salaried_raises = merit_raises_combined[(merit_raises_combined['pay_rate_type'] == 'Salaried') & (merit_raises_combined['dept'] == 'News')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(news_salaried_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,88,3000.0
Female,unknown,8,2860.0
Female,white,250,2961.16
Male,person of color,76,2500.0
Male,unknown,6,3250.0
Male,white,333,3000.0


In [231]:
news_salaried_raises_scores = merit_raises_combined[(merit_raises_combined['pay_rate_type'] == 'Salaried') & (merit_raises_combined['dept'] == 'News')].groupby(['gender','race_grouping']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(news_salaried_raises_scores)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,88,3.4
Female,unknown,8,4.0
Female,white,250,3.5
Male,person of color,76,3.4
Male,unknown,6,3.75
Male,white,333,3.6


In [232]:
news_hourly_raises = merit_raises_combined[(merit_raises_combined['pay_rate_type'] == 'Hourly') & (merit_raises_combined['dept'] == 'News')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(news_hourly_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,15,1.26
Female,white,51,1.4
Male,person of color,13,1.03
Male,white,26,1.02


In [233]:
news_hourly_raises_scores = merit_raises_combined[(merit_raises_combined['pay_rate_type'] == 'Hourly') & (merit_raises_combined['dept'] == 'News')].groupby(['gender','race_grouping']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(news_hourly_raises_scores)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,15,3.4
Female,white,51,3.5
Male,person of color,13,3.3
Male,white,26,3.45


### Era

In [234]:
bezos = df[(df['hire_date'] > '2013-10-04') & (df['dept'] == 'News') & (df['pay_rate_type'] == 'Salaried')]
graham = df[(df['hire_date'] < '2013-10-05') & (df['dept'] == 'News') & (df['pay_rate_type'] == 'Salaried')]

In [235]:
bezos_gender = bezos.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_gender)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Male,188,108395.0
Female,230,94060.0


In [236]:
graham_gender = graham.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_gender)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Male,132,133375.0
Female,106,117930.08


In [237]:
bezos_race = bezos.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_race)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),253,104000.0
Black or African American (United States of America),39,100000.0
Prefer Not to Disclose (United States of America),12,94890.0
Hispanic or Latino (United States of America),35,94780.0
Asian (United States of America),44,94560.0
Two or More Races (United States of America),18,91090.0


In [238]:
graham_race = graham.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_race)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
Hispanic or Latino (United States of America),7,134324.81
White (United States of America),184,128076.79
Asian (United States of America),15,118821.01
Black or African American (United States of America),21,114488.29


In [239]:
bezos_race_group = bezos.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_race_group)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
unknown,30,115920.0
white,253,104000.0
person of color,136,94530.0


In [240]:
graham_race_group = graham.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_race_group)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
unknown,9,158340.0
white,184,128076.79
person of color,45,117924.49


In [241]:
bezos_gender_race_group = bezos.groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_gender_race_group)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
unknown,Male,12,128420.0
white,Male,128,110455.0
unknown,Female,18,103890.0
person of color,Male,48,99217.5
white,Female,124,94920.0
person of color,Female,88,93030.0


In [242]:
graham_gender_race_group = graham.groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_gender_race_group)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
unknown,Male,7,152730.88
white,Male,101,137799.88
person of color,Male,24,118698.72
person of color,Female,21,117924.49
white,Female,83,117600.0


In [243]:
bezos_gender_race_group_age5 = bezos.groupby(['race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_gender_race_group_age5)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
race_grouping,gender,age_group_5,Unnamed: 3_level_1,Unnamed: 4_level_1
white,Female,45-49,10.0,168735.0
white,Male,60-64,6.0,165060.0
white,Male,55-59,6.0,157077.97
white,Male,40-44,17.0,142780.0
unknown,Female,35-39,5.0,136560.0
white,Male,45-49,13.0,134806.25
person of color,Male,40-44,10.0,131735.0
person of color,Female,40-44,10.0,130545.0
unknown,Male,30-34,5.0,130000.0
white,Female,50-54,7.0,129560.0


In [244]:
graham_gender_race_group_age5 = graham.groupby(['race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_gender_race_group_age5)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
race_grouping,gender,age_group_5,Unnamed: 3_level_1,Unnamed: 4_level_1
white,Female,65+,5.0,162355.42
white,Male,35-39,11.0,157260.0
white,Female,60-64,13.0,143908.85
white,Male,50-54,20.0,142308.46
white,Male,60-64,20.0,141732.27
white,Male,55-59,14.0,140697.96
white,Male,40-44,12.0,137750.42
white,Male,65+,12.0,134610.4
person of color,Male,50-54,11.0,131075.9
white,Female,55-59,16.0,119149.34


In [245]:
bezos_gender_race_group_age5_tier = bezos.groupby(['race_grouping','gender','age_group_5','tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_gender_race_group_age5_tier)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
race_grouping,gender,age_group_5,tier,Unnamed: 4_level_1,Unnamed: 5_level_1
white,Male,45-49,Tier 1,5.0,192065.0
white,Male,40-44,Tier 1,9.0,187910.0
white,Female,45-49,Tier 1,5.0,169560.0
person of color,Female,40-44,Tier 1,6.0,153570.0
white,Male,35-39,Tier 1,10.0,152218.27
person of color,Male,40-44,Tier 2,6.0,145310.0
unknown,Female,35-39,Tier 1,5.0,136560.0
unknown,Female,30-34,Tier 1,5.0,128780.0
white,Female,35-39,Tier 1,8.0,125110.0
white,Male,30-34,Tier 1,8.0,121895.0


In [246]:
graham_gender_race_group_age5_tier = graham.groupby(['race_grouping','gender','age_group_5','tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_gender_race_group_age5_tier)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
race_grouping,gender,age_group_5,tier,Unnamed: 4_level_1,Unnamed: 5_level_1
white,Female,50-54,Tier 1,5.0,194690.0
white,Male,35-39,Tier 1,6.0,184065.0
white,Male,50-54,Tier 1,6.0,182922.79
white,Male,60-64,Tier 1,6.0,182534.24
white,Male,40-44,Tier 1,6.0,159250.08
white,Female,55-59,Tier 1,6.0,153515.0
white,Male,55-59,Tier 2,13.0,143276.51
white,Male,50-54,Tier 2,10.0,133726.66
person of color,Male,50-54,Tier 2,7.0,131075.9
white,Male,60-64,Tier 2,7.0,127410.0


### Overall disparity calculations

In [247]:
news_groups = news_salaried.groupby(['age_group_5','tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
expected_medians = pd.merge(news_salaried, news_groups, on=['age_group_5', 'tier'])

  expected_medians = pd.merge(news_salaried, news_groups, on=['age_group_5', 'tier'])


In [248]:
below_expected_medians = expected_medians[expected_medians['current_base_pay'] < expected_medians[('current_base_pay', 'median')]].groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(below_expected_medians)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_grouping,gender,Unnamed: 2_level_1
person of color,Female,51
person of color,Male,33
unknown,Female,10
unknown,Male,11
white,Female,108
white,Male,98


In [249]:
above_expected_medians = expected_medians[expected_medians['current_base_pay'] > expected_medians[('current_base_pay', 'median')]].groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(above_expected_medians)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_grouping,gender,Unnamed: 2_level_1
person of color,Female,48
person of color,Male,35
unknown,Female,9
unknown,Male,7
white,Female,93
white,Male,122


In [250]:
expected_medians['disparity'] = expected_medians['current_base_pay'] - expected_medians[('current_base_pay', 'median')]
expected_medians['disparity_pct'] = (expected_medians['current_base_pay'] - expected_medians[('current_base_pay', 'median')])/expected_medians[('current_base_pay', 'median')]

In [251]:
disparity = expected_medians.groupby(['race_grouping','gender']).agg({'disparity': [np.count_nonzero, np.median]})
suppress(disparity)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,99,0.0
person of color,Male,68,0.0
unknown,Female,19,-587.5
unknown,Male,18,-955.0
white,Female,201,-1175.0
white,Male,220,3780.0


In [252]:
disparity_pct_above = expected_medians[expected_medians['disparity_pct'] > .05].groupby(['race_grouping','gender']).agg({'disparity': [np.count_nonzero, np.median]})
suppress(disparity_pct_above)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,32,14330.07
person of color,Male,26,23107.5
unknown,Female,5,11825.0
unknown,Male,6,16715.0
white,Female,74,18798.44
white,Male,103,33500.0


In [253]:
disparity_pct_below = expected_medians[expected_medians['disparity_pct'] < -.05].groupby(['race_grouping','gender']).agg({'disparity': [np.count_nonzero, np.median]})
suppress(disparity_pct_below)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,36,-10209.65
person of color,Male,25,-17205.06
unknown,Female,8,-28562.5
unknown,Male,8,-13360.0
white,Female,79,-13639.97
white,Male,72,-18719.56


In [254]:
expected_medians.groupby(['race_grouping','gender']).agg({'disparity_pct': [np.count_nonzero, np.average]})

Unnamed: 0_level_0,Unnamed: 1_level_0,disparity_pct,disparity_pct
Unnamed: 0_level_1,Unnamed: 1_level_1,count_nonzero,average
race_grouping,gender,Unnamed: 2_level_2,Unnamed: 3_level_2
person of color,Female,99,0.02
person of color,Male,68,0.05
unknown,Female,19,-0.05
unknown,Male,18,-0.01
white,Female,201,0.05
white,Male,220,0.09
white,Prefer not to disclose,1,0.09


In [255]:
bezos_news_groups = bezos.groupby(['age_group_5','tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
bezos_expected_medians = pd.merge(bezos, bezos_news_groups, on=['age_group_5', 'tier'])
graham_news_groups = graham.groupby(['age_group_5','tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
graham_expected_medians = pd.merge(graham, graham_news_groups, on=['age_group_5', 'tier'])

  bezos_expected_medians = pd.merge(bezos, bezos_news_groups, on=['age_group_5', 'tier'])
  graham_expected_medians = pd.merge(graham, graham_news_groups, on=['age_group_5', 'tier'])


In [256]:
bezos_expected_medians['disparity'] = bezos_expected_medians['current_base_pay'] - bezos_expected_medians[('current_base_pay', 'median')]
bezos_expected_medians['disparity_pct'] = (bezos_expected_medians['current_base_pay'] - bezos_expected_medians[('current_base_pay', 'median')])/bezos_expected_medians[('current_base_pay', 'median')]
graham_expected_medians['disparity'] = graham_expected_medians['current_base_pay'] - graham_expected_medians[('current_base_pay', 'median')]
graham_expected_medians['disparity_pct'] = (graham_expected_medians['current_base_pay'] - graham_expected_medians[('current_base_pay', 'median')])/graham_expected_medians[('current_base_pay', 'median')]

  bezos_expected_medians['disparity'] = bezos_expected_medians['current_base_pay'] - bezos_expected_medians[('current_base_pay', 'median')]
  bezos_expected_medians['disparity_pct'] = (bezos_expected_medians['current_base_pay'] - bezos_expected_medians[('current_base_pay', 'median')])/bezos_expected_medians[('current_base_pay', 'median')]
  graham_expected_medians['disparity'] = graham_expected_medians['current_base_pay'] - graham_expected_medians[('current_base_pay', 'median')]
  graham_expected_medians['disparity_pct'] = (graham_expected_medians['current_base_pay'] - graham_expected_medians[('current_base_pay', 'median')])/graham_expected_medians[('current_base_pay', 'median')]


In [257]:
bezos_disparity_gender = bezos_expected_medians.groupby(['gender']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(bezos_disparity_gender)

Unnamed: 0_level_0,count_nonzero,average
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,212,0.04
Male,171,0.08


In [258]:
bezos_disparity_race_group = bezos_expected_medians.groupby(['race_grouping']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(bezos_disparity_race_group)

Unnamed: 0_level_0,count_nonzero,average
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
person of color,124,0.04
unknown,27,-0.04
white,233,0.07


In [259]:
bezos_disparity_gender_race_group = bezos_expected_medians.groupby(['race_grouping','gender']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(bezos_disparity_gender_race_group)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,average
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,79,0.03
person of color,Male,45,0.07
unknown,Female,17,-0.05
unknown,Male,10,-0.02
white,Female,116,0.05
white,Male,116,0.09


In [260]:
graham_disparity_gender = graham_expected_medians.groupby(['gender']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(graham_disparity_gender)

Unnamed: 0_level_0,count_nonzero,average
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,99,0.03
Male,127,0.07


In [261]:
graham_disparity_race_group = graham_expected_medians.groupby(['race_grouping']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(graham_disparity_race_group)

Unnamed: 0_level_0,count_nonzero,average
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
person of color,42,0.01
unknown,8,-0.03
white,176,0.07


In [262]:
graham_disparity_gender_race_group = graham_expected_medians.groupby(['race_grouping','gender']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(graham_disparity_gender_race_group)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,average
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,19,0.01
person of color,Male,23,0.0
unknown,Male,6,-0.03
white,Female,78,0.04
white,Male,98,0.09


### Regression

In [263]:
news_salaried_regression = news_salaried[['department','gender','race_ethnicity','current_base_pay','job_profile_current','cost_center_current','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','age','years_of_service','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping']]
news_salaried_regression = pd.get_dummies(news_salaried_regression, columns=['gender','race_ethnicity','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping'])

In [264]:
news_salaried_regression = news_salaried_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over','tier_Tier 1':'tier_Tier_1','tier_Tier 2':'tier_Tier_2','tier_Tier 3':'tier_Tier_3','tier_Tier 4':'tier_Tier_4','years_of_service_grouped_0':'years_of_service_grouped_0','years_of_service_grouped_1-2':'years_of_service_grouped_1to2','years_of_service_grouped_3-5':'years_of_service_grouped_3to5','years_of_service_grouped_6-10':'years_of_service_grouped_6to10','years_of_service_grouped_11-15':'years_of_service_grouped_11to15','years_of_service_grouped_16-20':'years_of_service_grouped_16to20','years_of_service_grouped_21-25':'years_of_service_grouped_21to25','years_of_service_grouped_25+':'years_of_service_grouped_25_over'})
import statsmodels.formula.api as sm
model1 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male')
result1 = model1.fit()
result1.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.036
Model:,OLS,Adj. R-squared:,0.033
Method:,Least Squares,F-statistic:,12.13
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,6.74e-06
Time:,20:31:52,Log-Likelihood:,-7942.3
No. Observations:,657,AIC:,15890.0
Df Residuals:,654,BIC:,15900.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.063e+05,4.31e+04,2.465,0.014,2.16e+04,1.91e+05
gender_Female,7338.7585,4.32e+04,0.170,0.865,-7.75e+04,9.22e+04
gender_Male,2.389e+04,4.32e+04,0.553,0.580,-6.09e+04,1.09e+05

0,1,2,3
Omnibus:,179.942,Durbin-Watson:,1.702
Prob(Omnibus):,0.0,Jarque-Bera (JB):,429.949
Skew:,1.429,Prob(JB):,4.34e-94
Kurtosis:,5.746,Cond. No.,54.4


In [265]:
model2 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color')
result2 = model2.fit()
result2.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.027
Model:,OLS,Adj. R-squared:,0.024
Method:,Least Squares,F-statistic:,8.957
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.000145
Time:,20:31:52,Log-Likelihood:,-7945.3
No. Observations:,657,AIC:,15900.0
Df Residuals:,654,BIC:,15910.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.256e+05,6939.188,18.102,0.000,1.12e+05,1.39e+05
race_grouping_white,574.5152,7242.215,0.079,0.937,-1.36e+04,1.48e+04
race_grouping_person_of_color,-1.549e+04,7650.339,-2.024,0.043,-3.05e+04,-464.535

0,1,2,3
Omnibus:,167.823,Durbin-Watson:,1.711
Prob(Omnibus):,0.0,Jarque-Bera (JB):,374.307
Skew:,1.366,Prob(JB):,5.25e-82
Kurtosis:,5.492,Cond. No.,9.02


In [266]:
model3 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result3 = model3.fit()
result3.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.056
Model:,OLS,Adj. R-squared:,0.051
Method:,Least Squares,F-statistic:,9.737
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,1.18e-07
Time:,20:31:52,Log-Likelihood:,-7935.2
No. Observations:,657,AIC:,15880.0
Df Residuals:,652,BIC:,15900.0
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.063e+05,4.33e+04,2.453,0.014,2.12e+04,1.91e+05
gender_Female,1.194e+04,4.28e+04,0.279,0.780,-7.21e+04,9.6e+04
gender_Male,2.708e+04,4.28e+04,0.633,0.527,-5.7e+04,1.11e+05
race_grouping_white,44.0015,7143.588,0.006,0.995,-1.4e+04,1.41e+04
race_grouping_person_of_color,-1.413e+04,7550.212,-1.872,0.062,-2.9e+04,692.197

0,1,2,3
Omnibus:,174.613,Durbin-Watson:,1.707
Prob(Omnibus):,0.0,Jarque-Bera (JB):,410.327
Skew:,1.394,Prob(JB):,7.919999999999999e-90
Kurtosis:,5.687,Cond. No.,63.6


In [267]:
new_news_salaried_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1], 'age': [40,40,40,40]})
new_news_salaried_regression['predicted'] = result3.predict(new_news_salaried_regression)
new_news_salaried_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,age,predicted
0,1,0,1,0,40,118280.62
1,0,1,1,0,40,133419.52
2,1,0,0,1,40,104103.15
3,0,1,0,1,40,119242.05


In [268]:
model4 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result4 = model4.fit()
result4.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.237
Model:,OLS,Adj. R-squared:,0.224
Method:,Least Squares,F-statistic:,18.22
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,7.96e-32
Time:,20:31:53,Log-Likelihood:,-7865.4
No. Observations:,657,AIC:,15750.0
Df Residuals:,645,BIC:,15810.0
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.119e+05,3.53e+04,3.170,0.002,4.26e+04,1.81e+05
gender_Female,-2099.5121,3.88e+04,-0.054,0.957,-7.84e+04,7.42e+04
gender_Male,4711.0354,3.89e+04,0.121,0.904,-7.16e+04,8.1e+04
age_group_5_25_under,-3.77e+04,9634.532,-3.913,0.000,-5.66e+04,-1.88e+04
age_group_5_25to29,-2.255e+04,5195.953,-4.340,0.000,-3.28e+04,-1.23e+04
age_group_5_30to34,-5545.8004,4968.360,-1.116,0.265,-1.53e+04,4210.313
age_group_5_35to39,1.177e+04,5092.608,2.312,0.021,1773.930,2.18e+04
age_group_5_40to44,2.259e+04,5736.192,3.938,0.000,1.13e+04,3.39e+04
age_group_5_45to49,2.294e+04,6289.504,3.648,0.000,1.06e+04,3.53e+04

0,1,2,3
Omnibus:,190.345,Durbin-Watson:,1.9
Prob(Omnibus):,0.0,Jarque-Bera (JB):,541.3
Skew:,1.426,Prob(JB):,2.87e-118
Kurtosis:,6.412,Cond. No.,1500000000000000.0


In [269]:
model5 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result5 = model5.fit()
result5.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.238
Model:,OLS,Adj. R-squared:,0.225
Method:,Least Squares,F-statistic:,18.35
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,4.6100000000000005e-32
Time:,20:31:53,Log-Likelihood:,-7864.8
No. Observations:,657,AIC:,15750.0
Df Residuals:,645,BIC:,15810.0
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.196e+05,5861.680,20.403,0.000,1.08e+05,1.31e+05
race_grouping_white,-5562.1439,6542.297,-0.850,0.396,-1.84e+04,7284.629
race_grouping_person_of_color,-1.303e+04,6928.255,-1.881,0.060,-2.66e+04,574.645
age_group_5_25_under,-3.741e+04,8954.320,-4.178,0.000,-5.5e+04,-1.98e+04
age_group_5_25to29,-2.199e+04,3810.252,-5.772,0.000,-2.95e+04,-1.45e+04
age_group_5_30to34,-5447.2684,3800.522,-1.433,0.152,-1.29e+04,2015.622
age_group_5_35to39,1.23e+04,3644.703,3.376,0.001,5146.772,1.95e+04
age_group_5_40to44,2.533e+04,4567.605,5.546,0.000,1.64e+04,3.43e+04
age_group_5_45to49,2.342e+04,5203.606,4.502,0.000,1.32e+04,3.36e+04

0,1,2,3
Omnibus:,189.869,Durbin-Watson:,1.897
Prob(Omnibus):,0.0,Jarque-Bera (JB):,531.171
Skew:,1.43,Prob(JB):,4.55e-116
Kurtosis:,6.351,Cond. No.,1610000000000000.0


In [270]:
model6 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result6 = model6.fit()
result6.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.243
Model:,OLS,Adj. R-squared:,0.228
Method:,Least Squares,F-statistic:,15.89
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,1.24e-31
Time:,20:31:53,Log-Likelihood:,-7862.7
No. Observations:,657,AIC:,15750.0
Df Residuals:,643,BIC:,15820.0
Df Model:,13,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.171e+05,3.58e+04,3.271,0.001,4.68e+04,1.87e+05
gender_Female,-529.3607,3.88e+04,-0.014,0.989,-7.67e+04,7.56e+04
gender_Male,5773.4074,3.88e+04,0.149,0.882,-7.04e+04,8.19e+04
race_grouping_white,-5564.1416,6533.784,-0.852,0.395,-1.84e+04,7265.991
race_grouping_person_of_color,-1.25e+04,6922.302,-1.805,0.072,-2.61e+04,1098.023
age_group_5_25_under,-3.578e+04,9663.473,-3.703,0.000,-5.48e+04,-1.68e+04
age_group_5_25to29,-2.099e+04,5237.642,-4.007,0.000,-3.13e+04,-1.07e+04
age_group_5_30to34,-5182.1108,4958.720,-1.045,0.296,-1.49e+04,4555.130
age_group_5_35to39,1.201e+04,5090.717,2.359,0.019,2013.977,2.2e+04

0,1,2,3
Omnibus:,190.682,Durbin-Watson:,1.89
Prob(Omnibus):,0.0,Jarque-Bera (JB):,541.397
Skew:,1.429,Prob(JB):,2.74e-118
Kurtosis:,6.407,Cond. No.,1230000000000000.0


In [271]:
model7 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over + tier_Tier_1 + tier_Tier_2 + tier_Tier_3 + tier_Tier_4')
result7 = model7.fit()
result7.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.445
Model:,OLS,Adj. R-squared:,0.43
Method:,Least Squares,F-statistic:,30.13
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,2.33e-70
Time:,20:31:53,Log-Likelihood:,-7760.8
No. Observations:,657,AIC:,15560.0
Df Residuals:,639,BIC:,15640.0
Df Model:,17,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.076e+05,3.17e+04,3.393,0.001,4.53e+04,1.7e+05
gender_Female,3898.5269,3.34e+04,0.117,0.907,-6.16e+04,6.94e+04
gender_Male,7604.6247,3.34e+04,0.228,0.820,-5.79e+04,7.32e+04
race_grouping_white,1.53e+04,5815.295,2.630,0.009,3876.083,2.67e+04
race_grouping_person_of_color,1.033e+04,6175.636,1.673,0.095,-1795.658,2.25e+04
age_group_5_25_under,-2.7e+04,8441.069,-3.199,0.001,-4.36e+04,-1.04e+04
age_group_5_25to29,-1.282e+04,4616.125,-2.778,0.006,-2.19e+04,-3757.902
age_group_5_30to34,-1384.4169,4303.793,-0.322,0.748,-9835.703,7066.870
age_group_5_35to39,8654.0619,4451.474,1.944,0.052,-87.224,1.74e+04

0,1,2,3
Omnibus:,274.112,Durbin-Watson:,1.895
Prob(Omnibus):,0.0,Jarque-Bera (JB):,1550.092
Skew:,1.788,Prob(JB):,0.0
Kurtosis:,9.62,Cond. No.,1330000000000000.0


In [272]:
model8 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over + tier_Tier_1 + tier_Tier_2 + tier_Tier_3 + tier_Tier_4 + years_of_service_grouped_0 + years_of_service_grouped_1to2 + years_of_service_grouped_3to5 + years_of_service_grouped_6to10 + years_of_service_grouped_11to15 + years_of_service_grouped_16to20 + years_of_service_grouped_21to25 + years_of_service_grouped_25_over')
result8 = model8.fit()
result8.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.449
Model:,OLS,Adj. R-squared:,0.428
Method:,Least Squares,F-statistic:,21.42
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,2.98e-66
Time:,20:31:53,Log-Likelihood:,-7758.7
No. Observations:,657,AIC:,15570.0
Df Residuals:,632,BIC:,15680.0
Df Model:,24,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,9.685e+04,2.87e+04,3.374,0.001,4.05e+04,1.53e+05
gender_Female,3210.6982,3.35e+04,0.096,0.924,-6.27e+04,6.91e+04
gender_Male,6696.3049,3.36e+04,0.200,0.842,-5.92e+04,7.26e+04
race_grouping_white,1.546e+04,5884.428,2.627,0.009,3901.135,2.7e+04
race_grouping_person_of_color,1.03e+04,6214.884,1.658,0.098,-1899.931,2.25e+04
age_group_5_25_under,-3.108e+04,8815.438,-3.526,0.000,-4.84e+04,-1.38e+04
age_group_5_25to29,-1.684e+04,4828.850,-3.488,0.001,-2.63e+04,-7358.232
age_group_5_30to34,-4151.1843,4408.927,-0.942,0.347,-1.28e+04,4506.734
age_group_5_35to39,6480.4160,4467.519,1.451,0.147,-2292.561,1.53e+04

0,1,2,3
Omnibus:,267.644,Durbin-Watson:,1.9
Prob(Omnibus):,0.0,Jarque-Bera (JB):,1473.437
Skew:,1.749,Prob(JB):,0.0
Kurtosis:,9.449,Cond. No.,2590000000000000.0


In [273]:
merit_raises_combined_salaried_regression = merit_raises_combined[(merit_raises_combined['dept'] == 'News') & (merit_raises_combined['pay_rate_type'] == 'Salaried')]
merit_raises_combined_salaried_regression = pd.get_dummies(merit_raises_combined_salaried_regression, columns=['gender','race_grouping','age_group_5'])

In [274]:
merit_raises_combined_salaried_regression = merit_raises_combined_salaried_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over'})
model9 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male')
result9 = model9.fit()
result9.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.003
Model:,OLS,Adj. R-squared:,0.001
Method:,Least Squares,F-statistic:,1.232
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.292
Time:,20:31:53,Log-Likelihood:,-6627.8
No. Observations:,761,AIC:,13260.0
Df Residuals:,758,BIC:,13280.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,-2.122e+15,9.84e+15,-0.216,0.829,-2.14e+16,1.72e+16
gender_Female,2.122e+15,9.84e+15,0.216,0.829,-1.72e+16,2.14e+16
gender_Male,2.122e+15,9.84e+15,0.216,0.829,-1.72e+16,2.14e+16

0,1,2,3
Omnibus:,347.322,Durbin-Watson:,1.876
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2241.002
Skew:,1.962,Prob(JB):,0.0
Kurtosis:,10.435,Cond. No.,393000000000000.0


In [275]:
model10 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color')
result10 = model10.fit()
result10.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.007
Model:,OLS,Adj. R-squared:,0.004
Method:,Least Squares,F-statistic:,2.619
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.0735
Time:,20:31:53,Log-Likelihood:,-6626.4
No. Observations:,761,AIC:,13260.0
Df Residuals:,758,BIC:,13270.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,3487.7143,391.906,8.899,0.000,2718.364,4257.064
race_grouping_white,-353.2394,396.584,-0.891,0.373,-1131.772,425.293
race_grouping_person_of_color,-617.5097,408.291,-1.512,0.131,-1419.025,184.006

0,1,2,3
Omnibus:,337.805,Durbin-Watson:,1.86
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2092.674
Skew:,1.912,Prob(JB):,0.0
Kurtosis:,10.168,Cond. No.,16.6


In [276]:
model11 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result11 = model11.fit()
result11.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.01
Model:,OLS,Adj. R-squared:,0.004
Method:,Least Squares,F-statistic:,1.831
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.121
Time:,20:31:53,Log-Likelihood:,-6625.4
No. Observations:,761,AIC:,13260.0
Df Residuals:,756,BIC:,13280.0
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,-2.279e+15,9.78e+15,-0.233,0.816,-2.15e+16,1.69e+16
gender_Female,2.279e+15,9.78e+15,0.233,0.816,-1.69e+16,2.15e+16
gender_Male,2.279e+15,9.78e+15,0.233,0.816,-1.69e+16,2.15e+16
race_grouping_white,-372.8852,397.016,-0.939,0.348,-1152.271,406.501
race_grouping_person_of_color,-620.5414,408.416,-1.519,0.129,-1422.306,181.224

0,1,2,3
Omnibus:,342.727,Durbin-Watson:,1.857
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2178.294
Skew:,1.936,Prob(JB):,0.0
Kurtosis:,10.329,Cond. No.,470000000000000.0


In [277]:
new_reason_for_change_combined_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1]})
new_reason_for_change_combined_regression['predicted'] = result11.predict(new_reason_for_change_combined_regression)
new_reason_for_change_combined_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,predicted
0,1,0,1,0,3058.11
1,0,1,1,0,3209.11
2,1,0,0,1,2810.46
3,0,1,0,1,2961.46


In [278]:
model12 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result12 = model12.fit()
result12.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.047
Model:,OLS,Adj. R-squared:,0.035
Method:,Least Squares,F-statistic:,4.078
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,3.96e-05
Time:,20:31:53,Log-Likelihood:,-6610.9
No. Observations:,761,AIC:,13240.0
Df Residuals:,751,BIC:,13290.0
Df Model:,9,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,-6.917e+14,1e+16,-0.069,0.945,-2.03e+16,1.89e+16
gender_Female,6.676e+14,9.65e+15,0.069,0.945,-1.83e+16,1.96e+16
gender_Male,6.676e+14,9.65e+15,0.069,0.945,-1.83e+16,1.96e+16
age_group_5_25_under,-1.208e+13,1.75e+14,-0.069,0.945,-3.55e+14,3.31e+14
age_group_5_25to29,2.418e+13,3.5e+14,0.069,0.945,-6.62e+14,7.11e+14
age_group_5_30to34,2.418e+13,3.5e+14,0.069,0.945,-6.62e+14,7.11e+14
age_group_5_35to39,2.418e+13,3.5e+14,0.069,0.945,-6.62e+14,7.11e+14
age_group_5_40to44,2.418e+13,3.5e+14,0.069,0.945,-6.62e+14,7.11e+14
age_group_5_45to49,2.418e+13,3.5e+14,0.069,0.945,-6.62e+14,7.11e+14

0,1,2,3
Omnibus:,352.442,Durbin-Watson:,1.894
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2391.173
Skew:,1.977,Prob(JB):,0.0
Kurtosis:,10.731,Cond. No.,4.14e+16


In [279]:
model13 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result13 = model13.fit()
result13.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.05
Model:,OLS,Adj. R-squared:,0.038
Method:,Least Squares,F-statistic:,3.968
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,2.67e-05
Time:,20:31:53,Log-Likelihood:,-6609.4
No. Observations:,761,AIC:,13240.0
Df Residuals:,750,BIC:,13290.0
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,5.394e+15,1.06e+16,0.510,0.610,-1.54e+16,2.61e+16
race_grouping_white,-271.7360,404.582,-0.672,0.502,-1065.984,522.512
race_grouping_person_of_color,-594.1019,412.284,-1.441,0.150,-1403.470,215.266
age_group_5_25_under,-2.268e+13,4.44e+13,-0.510,0.610,-1.1e+14,6.45e+13
age_group_5_25to29,-5.394e+15,1.06e+16,-0.510,0.610,-2.61e+16,1.54e+16
age_group_5_30to34,-5.394e+15,1.06e+16,-0.510,0.610,-2.61e+16,1.54e+16
age_group_5_35to39,-5.394e+15,1.06e+16,-0.510,0.610,-2.61e+16,1.54e+16
age_group_5_40to44,-5.394e+15,1.06e+16,-0.510,0.610,-2.61e+16,1.54e+16
age_group_5_45to49,-5.394e+15,1.06e+16,-0.510,0.610,-2.61e+16,1.54e+16

0,1,2,3
Omnibus:,341.711,Durbin-Watson:,1.876
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2213.427
Skew:,1.921,Prob(JB):,0.0
Kurtosis:,10.419,Cond. No.,4.17e+16


In [280]:
model14 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result14 = model14.fit()
result14.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.054
Model:,OLS,Adj. R-squared:,0.04
Method:,Least Squares,F-statistic:,3.905
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,1.61e-05
Time:,20:31:53,Log-Likelihood:,-6607.8
No. Observations:,761,AIC:,13240.0
Df Residuals:,749,BIC:,13300.0
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,-9.183e+14,1e+16,-0.092,0.927,-2.06e+16,1.87e+16
gender_Female,8.943e+14,9.75e+15,0.092,0.927,-1.82e+16,2e+16
gender_Male,8.943e+14,9.75e+15,0.092,0.927,-1.82e+16,2e+16
race_grouping_white,-252.9941,394.499,-0.641,0.522,-1027.450,521.461
race_grouping_person_of_color,-560.3133,404.519,-1.385,0.166,-1354.439,233.812
age_group_5_25_under,1.721e+13,1.88e+14,0.092,0.927,-3.51e+14,3.86e+14
age_group_5_25to29,2.405e+13,2.62e+14,0.092,0.927,-4.91e+14,5.39e+14
age_group_5_30to34,2.405e+13,2.62e+14,0.092,0.927,-4.91e+14,5.39e+14
age_group_5_35to39,2.405e+13,2.62e+14,0.092,0.927,-4.91e+14,5.39e+14

0,1,2,3
Omnibus:,348.013,Durbin-Watson:,1.876
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2330.961
Skew:,1.951,Prob(JB):,0.0
Kurtosis:,10.634,Cond. No.,3.59e+17


In [281]:
model15 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male')
result15 = model15.fit()
result15.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.006
Model:,OLS,Adj. R-squared:,0.003
Method:,Least Squares,F-statistic:,2.077
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.126
Time:,20:31:53,Log-Likelihood:,-225.7
No. Observations:,721,AIC:,457.4
Df Residuals:,718,BIC:,471.1
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,-7.958e+11,1.71e+12,-0.466,0.641,-4.15e+12,2.56e+12
gender_Female,7.958e+11,1.71e+12,0.466,0.641,-2.56e+12,4.15e+12
gender_Male,7.958e+11,1.71e+12,0.466,0.641,-2.56e+12,4.15e+12

0,1,2,3
Omnibus:,26.473,Durbin-Watson:,1.816
Prob(Omnibus):,0.0,Jarque-Bera (JB):,28.572
Skew:,0.486,Prob(JB):,6.25e-07
Kurtosis:,3.088,Cond. No.,294000000000000.0


In [282]:
model16 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color')
result16 = model16.fit()
result16.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.042
Model:,OLS,Adj. R-squared:,0.04
Method:,Least Squares,F-statistic:,15.84
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,1.86e-07
Time:,20:31:53,Log-Likelihood:,-212.21
No. Observations:,721,AIC:,430.4
Df Residuals:,718,BIC:,444.2
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,3.7786,0.087,43.441,0.000,3.608,3.949
race_grouping_white,-0.1898,0.088,-2.155,0.031,-0.363,-0.017
race_grouping_person_of_color,-0.3378,0.091,-3.721,0.000,-0.516,-0.160

0,1,2,3
Omnibus:,17.586,Durbin-Watson:,1.801
Prob(Omnibus):,0.0,Jarque-Bera (JB):,18.233
Skew:,0.384,Prob(JB):,0.00011
Kurtosis:,3.13,Cond. No.,16.2


In [283]:
model17 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result17 = model17.fit()
result17.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.046
Model:,OLS,Adj. R-squared:,0.041
Method:,Least Squares,F-statistic:,8.598
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,8.83e-07
Time:,20:31:54,Log-Likelihood:,-210.86
No. Observations:,721,AIC:,431.7
Df Residuals:,716,BIC:,454.6
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,-8.661e+11,1.67e+12,-0.518,0.605,-4.15e+12,2.42e+12
gender_Female,8.661e+11,1.67e+12,0.518,0.605,-2.42e+12,4.15e+12
gender_Male,8.661e+11,1.67e+12,0.518,0.605,-2.42e+12,4.15e+12
race_grouping_white,-0.1978,0.088,-2.245,0.025,-0.371,-0.025
race_grouping_person_of_color,-0.3403,0.091,-3.750,0.000,-0.518,-0.162

0,1,2,3
Omnibus:,18.649,Durbin-Watson:,1.79
Prob(Omnibus):,0.0,Jarque-Bera (JB):,19.45
Skew:,0.398,Prob(JB):,5.98e-05
Kurtosis:,3.111,Cond. No.,352000000000000.0


In [284]:
model18 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result18 = model18.fit()
result18.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.043
Model:,OLS,Adj. R-squared:,0.029
Method:,Least Squares,F-statistic:,3.172
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.000542
Time:,20:31:54,Log-Likelihood:,-212.02
No. Observations:,721,AIC:,446.0
Df Residuals:,710,BIC:,496.4
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,-9.526e+11,1.43e+12,-0.664,0.507,-3.77e+12,1.86e+12
gender_Female,1.109e+12,1.67e+12,0.664,0.507,-2.17e+12,4.39e+12
gender_Male,1.109e+12,1.67e+12,0.664,0.507,-2.17e+12,4.39e+12
age_group_5_25_under,-6.634e+09,9.99e+09,-0.664,0.507,-2.62e+10,1.3e+10
age_group_5_25to29,-1.565e+11,2.36e+11,-0.664,0.507,-6.19e+11,3.06e+11
age_group_5_30to34,-1.565e+11,2.36e+11,-0.664,0.507,-6.19e+11,3.06e+11
age_group_5_35to39,-1.565e+11,2.36e+11,-0.664,0.507,-6.19e+11,3.06e+11
age_group_5_40to44,-1.565e+11,2.36e+11,-0.664,0.507,-6.19e+11,3.06e+11
age_group_5_45to49,-1.565e+11,2.36e+11,-0.664,0.507,-6.19e+11,3.06e+11

0,1,2,3
Omnibus:,19.66,Durbin-Watson:,1.843
Prob(Omnibus):,0.0,Jarque-Bera (JB):,20.7
Skew:,0.414,Prob(JB):,3.2e-05
Kurtosis:,3.05,Cond. No.,2.89e+16


In [285]:
model19 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result19 = model19.fit()
result19.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.063
Model:,OLS,Adj. R-squared:,0.05
Method:,Least Squares,F-statistic:,4.805
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,1.05e-06
Time:,20:31:54,Log-Likelihood:,-204.17
No. Observations:,721,AIC:,430.3
Df Residuals:,710,BIC:,480.7
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,6.792e+11,2.69e+12,0.253,0.801,-4.6e+12,5.96e+12
race_grouping_white,-0.2004,0.092,-2.168,0.030,-0.382,-0.019
race_grouping_person_of_color,-0.3231,0.092,-3.523,0.000,-0.503,-0.143
age_group_5_25_under,-4.609e+08,1.82e+09,-0.253,0.801,-4.04e+09,3.12e+09
age_group_5_25to29,-6.792e+11,2.69e+12,-0.253,0.801,-5.96e+12,4.6e+12
age_group_5_30to34,-6.792e+11,2.69e+12,-0.253,0.801,-5.96e+12,4.6e+12
age_group_5_35to39,-6.792e+11,2.69e+12,-0.253,0.801,-5.96e+12,4.6e+12
age_group_5_40to44,-6.792e+11,2.69e+12,-0.253,0.801,-5.96e+12,4.6e+12
age_group_5_45to49,-6.792e+11,2.69e+12,-0.253,0.801,-5.96e+12,4.6e+12

0,1,2,3
Omnibus:,14.925,Durbin-Watson:,1.832
Prob(Omnibus):,0.001,Jarque-Bera (JB):,15.278
Skew:,0.349,Prob(JB):,0.000481
Kurtosis:,3.146,Cond. No.,2.53e+16


In [286]:
model20 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result20 = model20.fit()
result20.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.073
Model:,OLS,Adj. R-squared:,0.058
Method:,Least Squares,F-statistic:,5.06
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,1.22e-07
Time:,20:31:54,Log-Likelihood:,-200.53
No. Observations:,721,AIC:,425.1
Df Residuals:,709,BIC:,480.0
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,-1.035e+12,1.37e+12,-0.756,0.450,-3.72e+12,1.65e+12
gender_Female,1.223e+12,1.62e+12,0.756,0.450,-1.95e+12,4.4e+12
gender_Male,1.223e+12,1.62e+12,0.756,0.450,-1.95e+12,4.4e+12
race_grouping_white,-0.2016,0.088,-2.281,0.023,-0.375,-0.028
race_grouping_person_of_color,-0.3224,0.091,-3.555,0.000,-0.500,-0.144
age_group_5_25_under,-6.881e+09,9.11e+09,-0.756,0.450,-2.48e+10,1.1e+10
age_group_5_25to29,-1.878e+11,2.49e+11,-0.756,0.450,-6.76e+11,3e+11
age_group_5_30to34,-1.878e+11,2.49e+11,-0.756,0.450,-6.76e+11,3e+11
age_group_5_35to39,-1.878e+11,2.49e+11,-0.756,0.450,-6.76e+11,3e+11

0,1,2,3
Omnibus:,14.641,Durbin-Watson:,1.826
Prob(Omnibus):,0.001,Jarque-Bera (JB):,15.004
Skew:,0.349,Prob(JB):,0.000552
Kurtosis:,3.109,Cond. No.,1.13e+17


In [287]:
news_hourly_regression = news_hourly[['department','gender','race_ethnicity','current_base_pay','job_profile_current','cost_center_current','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','age','years_of_service','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping']]
news_hourly_regression = pd.get_dummies(news_hourly_regression, columns=['gender','race_ethnicity','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping'])

In [288]:
news_hourly_regression = news_hourly_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over','tier_Tier 1':'tier_Tier_1','tier_Tier 2':'tier_Tier_2','tier_Tier 3':'tier_Tier_3','tier_Tier 4':'tier_Tier_4','years_of_service_grouped_0':'years_of_service_grouped_0','years_of_service_grouped_1-2':'years_of_service_grouped_1to2','years_of_service_grouped_3-5':'years_of_service_grouped_3to5','years_of_service_grouped_6-10':'years_of_service_grouped_6to10','years_of_service_grouped_11-15':'years_of_service_grouped_11to15','years_of_service_grouped_16-20':'years_of_service_grouped_16to20','years_of_service_grouped_21-25':'years_of_service_grouped_21to25','years_of_service_grouped_25+':'years_of_service_grouped_25_over'})
import statsmodels.formula.api as sm
model21 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male')
result21 = model2.fit()
result21.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.027
Model:,OLS,Adj. R-squared:,0.024
Method:,Least Squares,F-statistic:,8.957
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.000145
Time:,20:31:54,Log-Likelihood:,-7945.3
No. Observations:,657,AIC:,15900.0
Df Residuals:,654,BIC:,15910.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.256e+05,6939.188,18.102,0.000,1.12e+05,1.39e+05
race_grouping_white,574.5152,7242.215,0.079,0.937,-1.36e+04,1.48e+04
race_grouping_person_of_color,-1.549e+04,7650.339,-2.024,0.043,-3.05e+04,-464.535

0,1,2,3
Omnibus:,167.823,Durbin-Watson:,1.711
Prob(Omnibus):,0.0,Jarque-Bera (JB):,374.307
Skew:,1.366,Prob(JB):,5.25e-82
Kurtosis:,5.492,Cond. No.,9.02


In [289]:
model22 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color')
result22 = model22.fit()
result22.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.02
Model:,OLS,Adj. R-squared:,-0.005
Method:,Least Squares,F-statistic:,0.8111
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.448
Time:,20:31:54,Log-Likelihood:,-330.05
No. Observations:,83,AIC:,666.1
Df Residuals:,80,BIC:,673.4
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,40.8000,13.144,3.104,0.003,14.643,66.957
race_grouping_white,-4.7002,13.256,-0.355,0.724,-31.081,21.681
race_grouping_person_of_color,-8.5133,13.415,-0.635,0.527,-35.209,18.183

0,1,2,3
Omnibus:,10.792,Durbin-Watson:,1.417
Prob(Omnibus):,0.005,Jarque-Bera (JB):,11.225
Skew:,0.743,Prob(JB):,0.00365
Kurtosis:,4.02,Cond. No.,20.0


In [290]:
model23 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result23 = model23.fit()
result23.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.064
Model:,OLS,Adj. R-squared:,0.029
Method:,Least Squares,F-statistic:,1.813
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.152
Time:,20:31:54,Log-Likelihood:,-328.12
No. Observations:,83,AIC:,664.2
Df Residuals:,79,BIC:,673.9
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,25.3109,8.670,2.919,0.005,8.054,42.568
gender_Female,15.4891,4.416,3.507,0.001,6.699,24.279
gender_Male,9.8218,4.728,2.078,0.041,0.412,19.232
race_grouping_white,-2.2573,13.094,-0.172,0.864,-28.321,23.806
race_grouping_person_of_color,-6.6242,13.225,-0.501,0.618,-32.948,19.699

0,1,2,3
Omnibus:,7.875,Durbin-Watson:,1.345
Prob(Omnibus):,0.019,Jarque-Bera (JB):,7.48
Skew:,0.612,Prob(JB):,0.0238
Kurtosis:,3.816,Cond. No.,3520000000000000.0


In [291]:
new_news_hourly_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1], 'age': [40,40,40,40]})
new_news_hourly_regression['predicted'] = result23.predict(new_news_hourly_regression)
new_news_hourly_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,age,predicted
0,1,0,1,0,40,38.54
1,0,1,1,0,40,32.88
2,1,0,0,1,40,34.18
3,0,1,0,1,40,28.51


In [292]:
model24 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result24 = model24.fit()
result24.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.363
Model:,OLS,Adj. R-squared:,0.275
Method:,Least Squares,F-statistic:,4.108
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.00017
Time:,20:31:54,Log-Likelihood:,-312.15
No. Observations:,83,AIC:,646.3
Df Residuals:,72,BIC:,672.9
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,23.2877,0.882,26.410,0.000,21.530,25.046
gender_Female,14.1092,1.294,10.905,0.000,11.530,16.688
gender_Male,9.1785,1.454,6.314,0.000,6.281,12.076
age_group_5_25_under,-7.0396,3.027,-2.325,0.023,-13.075,-1.005
age_group_5_25to29,-8.5046,2.910,-2.923,0.005,-14.305,-2.704
age_group_5_30to34,-3.2016,2.998,-1.068,0.289,-9.177,2.774
age_group_5_35to39,3.6055,4.713,0.765,0.447,-5.790,13.001
age_group_5_40to44,-5.4847,4.685,-1.171,0.246,-14.824,3.855
age_group_5_45to49,11.0866,5.957,1.861,0.067,-0.789,22.963

0,1,2,3
Omnibus:,2.772,Durbin-Watson:,1.665
Prob(Omnibus):,0.25,Jarque-Bera (JB):,2.066
Skew:,0.283,Prob(JB):,0.356
Kurtosis:,3.525,Cond. No.,8060000000000000.0


In [293]:
model25 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result25 = model25.fit()
result25.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.338
Model:,OLS,Adj. R-squared:,0.235
Method:,Least Squares,F-statistic:,3.294
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.00108
Time:,20:31:54,Log-Likelihood:,-313.77
No. Observations:,83,AIC:,651.5
Df Residuals:,71,BIC:,680.6
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,43.1529,10.818,3.989,0.000,21.582,64.724
race_grouping_white,-9.7635,11.921,-0.819,0.416,-33.533,14.006
race_grouping_person_of_color,-9.7817,12.133,-0.806,0.423,-33.974,14.410
age_group_5_25_under,-4.0782,3.419,-1.193,0.237,-10.896,2.739
age_group_5_25to29,-6.7920,3.202,-2.121,0.037,-13.178,-0.406
age_group_5_30to34,-2.3529,3.075,-0.765,0.447,-8.484,3.778
age_group_5_35to39,4.6546,5.011,0.929,0.356,-5.338,14.647
age_group_5_40to44,-3.4384,5.070,-0.678,0.500,-13.548,6.671
age_group_5_45to49,13.4506,6.282,2.141,0.036,0.926,25.976

0,1,2,3
Omnibus:,6.397,Durbin-Watson:,1.685
Prob(Omnibus):,0.041,Jarque-Bera (JB):,5.827
Skew:,0.524,Prob(JB):,0.0543
Kurtosis:,3.766,Cond. No.,6940000000000000.0


In [294]:
model26 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result26 = model26.fit()
result26.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.367
Model:,OLS,Adj. R-squared:,0.258
Method:,Least Squares,F-statistic:,3.377
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.000642
Time:,20:31:54,Log-Likelihood:,-311.92
No. Observations:,83,AIC:,649.8
Df Residuals:,70,BIC:,681.3
Df Model:,12,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,27.7804,7.402,3.753,0.000,13.017,42.544
gender_Female,16.2585,3.749,4.336,0.000,8.781,23.736
gender_Male,11.5220,4.107,2.805,0.007,3.330,19.714
race_grouping_white,-7.1864,11.831,-0.607,0.546,-30.782,16.410
race_grouping_person_of_color,-7.3203,12.030,-0.608,0.545,-31.314,16.674
age_group_5_25_under,-6.4603,3.329,-1.941,0.056,-13.099,0.179
age_group_5_25to29,-8.0062,3.060,-2.616,0.011,-14.109,-1.903
age_group_5_30to34,-3.2389,3.033,-1.068,0.289,-9.288,2.810
age_group_5_35to39,4.0334,4.883,0.826,0.412,-5.706,13.773

0,1,2,3
Omnibus:,3.153,Durbin-Watson:,1.648
Prob(Omnibus):,0.207,Jarque-Bera (JB):,2.433
Skew:,0.314,Prob(JB):,0.296
Kurtosis:,3.555,Cond. No.,8540000000000000.0


In [295]:
# model27 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over + tier_Tier_1 + tier_Tier_2 + tier_Tier_3 + tier_Tier_4')
# result27 = model27.fit()
# result27.summary()

In [296]:
# model28 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over + tier_Tier_1 + tier_Tier_2 + tier_Tier_3 + tier_Tier_4 + years_of_service_grouped_0 + years_of_service_grouped_1to2 + years_of_service_grouped_3to5 + years_of_service_grouped_6to10 + years_of_service_grouped_11to15 + years_of_service_grouped_16to20 + years_of_service_grouped_21to25 + years_of_service_grouped_25_over')
# result28 = model28.fit()
# result28.summary()

In [297]:
merit_raises_combined_hourly_regression = merit_raises_combined[(merit_raises_combined['dept'] == 'News') & (merit_raises_combined['pay_rate_type'] == 'Hourly')]
merit_raises_combined_hourly_regression = pd.get_dummies(merit_raises_combined_hourly_regression, columns=['gender','race_grouping','age_group_5'])

In [298]:
merit_raises_combined_hourly_regression = merit_raises_combined_hourly_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over'})
model29 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male')
result29 = model29.fit()
result29.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.007
Model:,OLS,Adj. R-squared:,-0.003
Method:,Least Squares,F-statistic:,0.716
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.399
Time:,20:31:54,Log-Likelihood:,-197.46
No. Observations:,105,AIC:,398.9
Df Residuals:,103,BIC:,404.2
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.0379,0.108,9.624,0.000,0.824,1.252
gender_Female,0.6559,0.157,4.182,0.000,0.345,0.967
gender_Male,0.3821,0.183,2.085,0.040,0.019,0.745

0,1,2,3
Omnibus:,123.881,Durbin-Watson:,1.909
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2564.506
Skew:,4.023,Prob(JB):,0.0
Kurtosis:,25.835,Cond. No.,3520000000000000.0


In [299]:
model30 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color')
result30 = model30.fit()
result30.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.029
Model:,OLS,Adj. R-squared:,0.019
Method:,Least Squares,F-statistic:,3.036
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.0844
Time:,20:31:54,Log-Likelihood:,-196.3
No. Observations:,105,AIC:,396.6
Df Residuals:,103,BIC:,401.9
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.9666,0.117,8.294,0.000,0.735,1.198
race_grouping_white,0.7879,0.156,5.039,0.000,0.478,1.098
race_grouping_person_of_color,0.1787,0.208,0.857,0.393,-0.235,0.592

0,1,2,3
Omnibus:,123.45,Durbin-Watson:,1.778
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2635.259
Skew:,3.98,Prob(JB):,0.0
Kurtosis:,26.216,Cond. No.,2380000000000000.0


In [300]:
model31 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result31 = model31.fit()
result31.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.033
Model:,OLS,Adj. R-squared:,0.014
Method:,Least Squares,F-statistic:,1.725
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.183
Time:,20:31:54,Log-Likelihood:,-196.08
No. Observations:,105,AIC:,398.2
Df Residuals:,102,BIC:,406.1
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.7145,0.089,8.019,0.000,0.538,0.891
gender_Female,0.4633,0.160,2.901,0.005,0.147,0.780
gender_Male,0.2512,0.175,1.435,0.154,-0.096,0.599
race_grouping_white,0.6484,0.162,3.999,0.000,0.327,0.970
race_grouping_person_of_color,0.0661,0.200,0.330,0.742,-0.331,0.463

0,1,2,3
Omnibus:,122.482,Durbin-Watson:,1.806
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2547.85
Skew:,3.943,Prob(JB):,0.0
Kurtosis:,25.808,Cond. No.,4380000000000000.0


In [301]:
new_reason_for_change_combined_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1]})
new_reason_for_change_combined_regression['predicted'] = result31.predict(new_reason_for_change_combined_regression)
new_reason_for_change_combined_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,predicted
0,1,0,1,0,1.83
1,0,1,1,0,1.61
2,1,0,0,1,1.24
3,0,1,0,1,1.03


In [302]:
model32 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result32 = model32.fit()
result32.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.059
Model:,OLS,Adj. R-squared:,-0.03
Method:,Least Squares,F-statistic:,0.6599
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.743
Time:,20:31:54,Log-Likelihood:,-194.64
No. Observations:,105,AIC:,409.3
Df Residuals:,95,BIC:,435.8
Df Model:,9,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.0026,0.109,9.210,0.000,0.786,1.219
gender_Female,0.6319,0.176,3.600,0.001,0.283,0.980
gender_Male,0.3707,0.194,1.909,0.059,-0.015,0.756
age_group_5_25_under,3.923e-16,2.08e-16,1.885,0.063,-2.1e-17,8.05e-16
age_group_5_25to29,0.3173,0.421,0.753,0.453,-0.519,1.153
age_group_5_30to34,-0.2924,0.369,-0.793,0.430,-1.024,0.439
age_group_5_35to39,-0.0689,0.448,-0.154,0.878,-0.959,0.821
age_group_5_40to44,0.0943,0.613,0.154,0.878,-1.122,1.310
age_group_5_45to49,0.3428,0.448,0.765,0.446,-0.547,1.233

0,1,2,3
Omnibus:,121.722,Durbin-Watson:,1.8
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2568.325
Skew:,3.893,Prob(JB):,0.0
Kurtosis:,25.944,Cond. No.,1.85e+17


In [303]:
model33 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result33 = model33.fit()
result33.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.087
Model:,OLS,Adj. R-squared:,0.001
Method:,Least Squares,F-statistic:,1.012
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.436
Time:,20:31:55,Log-Likelihood:,-193.02
No. Observations:,105,AIC:,406.0
Df Residuals:,95,BIC:,432.6
Df Model:,9,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.9062,0.121,7.505,0.000,0.667,1.146
race_grouping_white,0.8027,0.165,4.866,0.000,0.475,1.130
race_grouping_person_of_color,0.1036,0.221,0.468,0.641,-0.336,0.543
age_group_5_25_under,-4.956e-17,2.88e-16,-0.172,0.864,-6.22e-16,5.23e-16
age_group_5_25to29,0.4679,0.417,1.123,0.264,-0.359,1.295
age_group_5_30to34,-0.2560,0.364,-0.704,0.483,-0.978,0.466
age_group_5_35to39,-0.2739,0.451,-0.607,0.545,-1.169,0.622
age_group_5_40to44,0.1658,0.603,0.275,0.784,-1.031,1.363
age_group_5_45to49,0.2543,0.442,0.575,0.566,-0.623,1.132

0,1,2,3
Omnibus:,118.288,Durbin-Watson:,1.675
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2407.066
Skew:,3.731,Prob(JB):,0.0
Kurtosis:,25.238,Cond. No.,1.07e+17


In [304]:
model34 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result34 = model34.fit()
result34.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.088
Model:,OLS,Adj. R-squared:,-0.008
Method:,Least Squares,F-statistic:,0.9125
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.525
Time:,20:31:55,Log-Likelihood:,-192.96
No. Observations:,105,AIC:,407.9
Df Residuals:,94,BIC:,437.1
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.6901,0.093,7.445,0.000,0.506,0.874
gender_Female,0.4033,0.184,2.197,0.030,0.039,0.768
gender_Male,0.2869,0.188,1.527,0.130,-0.086,0.660
race_grouping_white,0.6803,0.175,3.883,0.000,0.332,1.028
race_grouping_person_of_color,0.0098,0.217,0.045,0.964,-0.421,0.441
age_group_5_25_under,-5.684e-17,1.2e-16,-0.473,0.638,-2.96e-16,1.82e-16
age_group_5_25to29,0.4240,0.424,1.001,0.320,-0.417,1.265
age_group_5_30to34,-0.2809,0.366,-0.768,0.445,-1.007,0.446
age_group_5_35to39,-0.2806,0.456,-0.615,0.540,-1.186,0.625

0,1,2,3
Omnibus:,118.098,Durbin-Watson:,1.695
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2383.378
Skew:,3.725,Prob(JB):,0.0
Kurtosis:,25.119,Cond. No.,7.28e+16


In [305]:
model35 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male')
result35 = model35.fit()
result35.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.01
Model:,OLS,Adj. R-squared:,-0.001
Method:,Least Squares,F-statistic:,0.9173
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.341
Time:,20:31:55,Log-Likelihood:,-39.114
No. Observations:,97,AIC:,82.23
Df Residuals:,95,BIC:,87.38
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.3409,0.026,90.769,0.000,2.290,2.392
gender_Female,1.2075,0.037,32.445,0.000,1.134,1.281
gender_Male,1.1334,0.044,25.729,0.000,1.046,1.221

0,1,2,3
Omnibus:,7.85,Durbin-Watson:,1.804
Prob(Omnibus):,0.02,Jarque-Bera (JB):,7.255
Skew:,0.6,Prob(JB):,0.0266
Kurtosis:,2.404,Cond. No.,5170000000000000.0


In [306]:
model36 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color')
result36 = model36.fit()
result36.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.057
Model:,OLS,Adj. R-squared:,0.047
Method:,Least Squares,F-statistic:,5.767
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.0183
Time:,20:31:55,Log-Likelihood:,-36.722
No. Observations:,97,AIC:,77.44
Df Residuals:,95,BIC:,82.59
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.3191,0.027,86.016,0.000,2.266,2.373
race_grouping_white,1.2566,0.037,34.408,0.000,1.184,1.329
race_grouping_person_of_color,1.0624,0.048,22.150,0.000,0.967,1.158

0,1,2,3
Omnibus:,4.825,Durbin-Watson:,1.889
Prob(Omnibus):,0.09,Jarque-Bera (JB):,4.105
Skew:,0.409,Prob(JB):,0.128
Kurtosis:,2.412,Cond. No.,2.01e+16


In [307]:
model37 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result37 = model37.fit()
result37.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.061
Model:,OLS,Adj. R-squared:,0.041
Method:,Least Squares,F-statistic:,3.052
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.052
Time:,20:31:55,Log-Likelihood:,-36.529
No. Observations:,97,AIC:,79.06
Df Residuals:,94,BIC:,86.78
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.7369,0.021,84.064,0.000,1.696,1.778
gender_Female,0.8919,0.038,23.617,0.000,0.817,0.967
gender_Male,0.8450,0.042,20.336,0.000,0.762,0.927
race_grouping_white,0.9616,0.038,25.216,0.000,0.886,1.037
race_grouping_person_of_color,0.7753,0.046,16.782,0.000,0.684,0.867

0,1,2,3
Omnibus:,5.208,Durbin-Watson:,1.901
Prob(Omnibus):,0.074,Jarque-Bera (JB):,4.606
Skew:,0.451,Prob(JB):,0.1
Kurtosis:,2.431,Cond. No.,2.28e+16


In [308]:
model38 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result38 = model38.fit()
result38.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.145
Model:,OLS,Adj. R-squared:,0.056
Method:,Least Squares,F-statistic:,1.637
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.117
Time:,20:31:55,Log-Likelihood:,-31.993
No. Observations:,97,AIC:,83.99
Df Residuals:,87,BIC:,109.7
Df Model:,9,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.1907,0.025,89.100,0.000,2.142,2.240
gender_Female,1.1670,0.041,28.529,0.000,1.086,1.248
gender_Male,1.0238,0.046,22.311,0.000,0.933,1.115
age_group_5_25_under,-8.104e-17,3.6e-17,-2.253,0.027,-1.53e-16,-9.56e-18
age_group_5_25to29,0.0677,0.096,0.705,0.483,-0.123,0.258
age_group_5_30to34,0.1601,0.085,1.888,0.062,-0.008,0.329
age_group_5_35to39,0.1113,0.103,1.084,0.281,-0.093,0.315
age_group_5_40to44,0.2567,0.134,1.912,0.059,-0.010,0.524
age_group_5_45to49,0.4222,0.099,4.286,0.000,0.226,0.618

0,1,2,3
Omnibus:,3.628,Durbin-Watson:,1.942
Prob(Omnibus):,0.163,Jarque-Bera (JB):,3.556
Skew:,0.422,Prob(JB):,0.169
Kurtosis:,2.591,Cond. No.,1.19e+18


In [309]:
model39 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result39 = model39.fit()
result39.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.163
Model:,OLS,Adj. R-squared:,0.077
Method:,Least Squares,F-statistic:,1.885
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.0647
Time:,20:31:55,Log-Likelihood:,-30.94
No. Observations:,97,AIC:,81.88
Df Residuals:,87,BIC:,107.6
Df Model:,9,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.1715,0.027,80.597,0.000,2.118,2.225
race_grouping_white,1.1795,0.038,31.191,0.000,1.104,1.255
race_grouping_person_of_color,0.9920,0.050,19.838,0.000,0.893,1.091
age_group_5_25_under,-1.19e-16,5.67e-17,-2.100,0.039,-2.32e-16,-6.38e-18
age_group_5_25to29,0.1278,0.095,1.343,0.183,-0.061,0.317
age_group_5_30to34,0.1740,0.084,2.068,0.042,0.007,0.341
age_group_5_35to39,0.0399,0.104,0.386,0.701,-0.166,0.246
age_group_5_40to44,0.2782,0.133,2.095,0.039,0.014,0.542
age_group_5_45to49,0.3886,0.098,3.983,0.000,0.195,0.583

0,1,2,3
Omnibus:,2.252,Durbin-Watson:,1.971
Prob(Omnibus):,0.324,Jarque-Bera (JB):,2.27
Skew:,0.346,Prob(JB):,0.321
Kurtosis:,2.713,Cond. No.,3.2e+16


In [310]:
model40 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result40 = model40.fit()
result40.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.175
Model:,OLS,Adj. R-squared:,0.079
Method:,Least Squares,F-statistic:,1.82
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.0689
Time:,20:31:55,Log-Likelihood:,-30.271
No. Observations:,97,AIC:,82.54
Df Residuals:,86,BIC:,110.9
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.6562,0.021,80.560,0.000,1.615,1.697
gender_Female,0.8756,0.044,19.823,0.000,0.788,0.963
gender_Male,0.7807,0.045,17.328,0.000,0.691,0.870
race_grouping_white,0.9065,0.041,22.069,0.000,0.825,0.988
race_grouping_person_of_color,0.7497,0.050,15.064,0.000,0.651,0.849
age_group_5_25_under,6.399e-17,5.21e-17,1.228,0.223,-3.96e-17,1.68e-16
age_group_5_25to29,0.0483,0.097,0.496,0.621,-0.145,0.242
age_group_5_30to34,0.1150,0.084,1.365,0.176,-0.052,0.282
age_group_5_35to39,0.0044,0.105,0.042,0.967,-0.205,0.214

0,1,2,3
Omnibus:,2.988,Durbin-Watson:,1.997
Prob(Omnibus):,0.224,Jarque-Bera (JB):,2.992
Skew:,0.402,Prob(JB):,0.224
Kurtosis:,2.693,Cond. No.,3.83e+16


## Commercial

### Gender

In [311]:
current_commercial_gender_salaried = commercial_salaried.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_commercial_gender_salaried)

Unnamed: 0_level_0,count_nonzero
gender,Unnamed: 1_level_1
Female,85
Male,41


In [312]:
current_commercial_gender_hourly = commercial_hourly.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_commercial_gender_hourly)

Unnamed: 0_level_0,count_nonzero
gender,Unnamed: 1_level_1
Female,75
Male,62


In [313]:
current_commercial_gender_salaried_median = commercial_salaried.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_salaried_median)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,85,90702.3
Male,41,89382.07


In [314]:
current_commercial_gender_hourly_median = commercial_hourly.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_hourly_median)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,75,31.57
Male,62,26.78


In [315]:
current_commercial_gender_age_salaried = commercial_salaried.groupby(['gender'])['age'].median().sort_values(ascending=False)
current_commercial_gender_age_salaried

gender
Male     40.00
Female   34.00
Name: age, dtype: float64

In [316]:
current_commercial_gender_age_hourly = commercial_hourly.groupby(['gender'])['age'].median().sort_values(ascending=False)
current_commercial_gender_age_hourly

gender
Male     47.00
Female   40.00
Name: age, dtype: float64

In [317]:
current_commercial_gender_age_5_salary = commercial_salaried.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_age_5_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
25-29,Female,30.0,83389.64
25-29,Male,9.0,90000.0
30-34,Female,13.0,97695.6
35-39,Female,14.0,133890.0
35-39,Male,5.0,80000.0
45-49,Female,11.0,123000.0
50-54,Male,5.0,89382.07
55-59,Female,8.0,97813.0
55-59,Male,6.0,92503.83
65+,Male,6.0,93162.67


In [318]:
current_commercial_gender_age_5_hourly = commercial_hourly.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_age_5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,6,30.12
25-29,Female,18,35.13
25-29,Male,10,26.98
30-34,Female,9,33.33
40-44,Male,8,27.65
45-49,Female,10,32.27
45-49,Male,9,25.41
50-54,Male,5,24.24
55-59,Female,10,28.82
55-59,Male,9,28.65


In [319]:
current_commercial_gender_age_10_salary = commercial_salaried.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_age_10_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
25-34,Female,43.0,85000.0
25-34,Male,10.0,91500.0
35-44,Female,17.0,153000.0
35-44,Male,8.0,89031.01
45-54,Female,13.0,96170.0
45-54,Male,6.0,89303.54
55-64,Female,10.0,95154.04
55-64,Male,7.0,95050.62
65+,Male,6.0,93162.67


In [320]:
current_commercial_gender_age_10_hourly = commercial_hourly.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_age_10_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,6,30.12
25-34,Female,27,34.61
25-34,Male,14,26.98
35-44,Female,8,32.83
35-44,Male,12,28.85
45-54,Female,13,31.06
45-54,Male,14,25.09
55-64,Female,15,26.83
55-64,Male,15,28.65
65+,Female,6,31.26


In [321]:
current_commercial_gender_salaried_under_40 = commercial_salaried[commercial_salaried['age'] < 40].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_salaried_under_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,59,90000.0
Male,19,84780.0


In [322]:
current_commercial_gender_salaried_over_40 = commercial_salaried[commercial_salaried['age'] > 39].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_salaried_over_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,26,98381.29
Male,22,92236.74


In [323]:
current_commercial_gender_hourly_under_40 = commercial_hourly[commercial_hourly['age'] < 40].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_hourly_under_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,37,33.11
Male,19,27.79


In [324]:
current_commercial_gender_hourly_over_40 = commercial_hourly[commercial_hourly['age'] > 39].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_hourly_over_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,38,29.36
Male,43,26.75


### Race and ethnicity

In [325]:
current_commercial_race_salaried = commercial_salaried.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_commercial_race_salaried)

Unnamed: 0_level_0,count_nonzero
race_ethnicity,Unnamed: 1_level_1
White (United States of America),88
Black or African American (United States of America),16
Asian (United States of America),13
Hispanic or Latino (United States of America),5


In [326]:
current_commercial_race_hourly = commercial_hourly.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_commercial_race_hourly)

Unnamed: 0_level_0,count_nonzero
race_ethnicity,Unnamed: 1_level_1
Black or African American (United States of America),75
White (United States of America),41
Asian (United States of America),8
Hispanic or Latino (United States of America),6


In [327]:
current_commercial_race_group_salaried = commercial_salaried.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_commercial_race_group_salaried)

Unnamed: 0_level_0,count_nonzero
race_grouping,Unnamed: 1_level_1
white,88
person of color,37


In [328]:
current_commercial_race_group_hourly = commercial_hourly.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_commercial_race_group_hourly)

Unnamed: 0_level_0,count_nonzero
race_grouping,Unnamed: 1_level_1
person of color,94
white,41


In [329]:
current_commercial_race_median_salaried = commercial_salaried.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_median_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),88,95415.31
Black or African American (United States of America),16,83584.64
Hispanic or Latino (United States of America),5,83048.54
Asian (United States of America),13,80000.0


In [330]:
current_commercial_race_median_hourly = commercial_hourly.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_median_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),41,32.36
Asian (United States of America),8,29.12
Black or African American (United States of America),75,28.23
Hispanic or Latino (United States of America),6,25.06


In [331]:
current_commercial_race_group_median_salaried = commercial_salaried.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_group_median_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
white,88,95415.31
person of color,37,80000.56


In [332]:
current_commercial_race_group_median_hourly = commercial_hourly.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_group_median_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
white,41,32.36
person of color,94,28.42


In [333]:
current_commercial_race_age_salaried = commercial_salaried.groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_commercial_race_age_salaried

race_ethnicity
Black or African American (United States of America)   48.00
White (United States of America)                       36.00
Prefer Not to Disclose (United States of America)      33.00
Asian (United States of America)                       32.00
Hispanic or Latino (United States of America)          29.00
Two or More Races (United States of America)           28.00
Name: age, dtype: float64

In [334]:
current_commercial_race_age_hourly = commercial_hourly.groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_commercial_race_age_hourly

race_ethnicity
Black or African American (United States of America)          48.00
American Indian or Alaska Native (United States of America)   40.00
White (United States of America)                              40.00
Prefer Not to Disclose (United States of America)             34.00
Two or More Races (United States of America)                  33.00
Hispanic or Latino (United States of America)                 30.00
Asian (United States of America)                              28.50
Name: age, dtype: float64

In [335]:
current_commercial_race_age_5_salary = commercial_salaried.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_age_5_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),5.0,80780.0
25-29,Asian (United States of America),6.0,77625.0
25-29,White (United States of America),26.0,86356.5
30-34,White (United States of America),10.0,98847.8
35-39,White (United States of America),15.0,155780.0
45-49,Black or African American (United States of America),5.0,89225.0
45-49,White (United States of America),6.0,151956.75
50-54,White (United States of America),5.0,89382.07
55-59,White (United States of America),12.0,92503.83


In [336]:
current_commercial_race_age_5_hourly = commercial_hourly.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_age_5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
25-29,Black or African American (United States of America),8.0,29.94
25-29,White (United States of America),13.0,35.9
30-34,Black or African American (United States of America),5.0,27.88
35-39,Black or African American (United States of America),5.0,32.51
40-44,Black or African American (United States of America),5.0,28.96
45-49,Black or African American (United States of America),15.0,30.14
50-54,Black or African American (United States of America),6.0,23.99
55-59,Black or African American (United States of America),14.0,28.99
55-59,White (United States of America),5.0,27.98
60-64,Black or African American (United States of America),8.0,25.75


In [337]:
current_commercial_race_age_10_salary = commercial_salaried.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_age_10_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),5.0,80780.0
25-34,Asian (United States of America),9.0,80000.0
25-34,White (United States of America),36.0,89181.3
35-44,White (United States of America),19.0,153000.0
45-54,Black or African American (United States of America),7.0,83000.0
45-54,White (United States of America),11.0,141678.09
55-64,White (United States of America),14.0,96872.7


In [338]:
current_commercial_race_age_10_hourly = commercial_hourly.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_age_10_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
25-34,Black or African American (United States of America),13.0,29.74
25-34,White (United States of America),16.0,35.38
35-44,Black or African American (United States of America),10.0,30.96
35-44,White (United States of America),5.0,31.57
45-54,Black or African American (United States of America),21.0,27.79
45-54,White (United States of America),5.0,35.66
55-64,Black or African American (United States of America),22.0,26.81
55-64,White (United States of America),8.0,28.31
65+,Black or African American (United States of America),6.0,26.7


In [339]:
current_commercial_race_group_age_5_salary = commercial_salaried.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_group_age_5_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,white,5.0,80780.0
25-29,person of color,13.0,78500.0
25-29,white,26.0,86356.5
30-34,white,10.0,98847.8
35-39,white,15.0,155780.0
45-49,person of color,6.0,86112.5
45-49,white,6.0,151956.75
50-54,white,5.0,89382.07
55-59,white,12.0,92503.83


In [340]:
current_commercial_race_group_age_5_hourly = commercial_hourly.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_group_age_5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
25-29,person of color,14.0,29.68
25-29,white,13.0,35.9
30-34,person of color,10.0,29.07
35-39,person of color,7.0,32.51
40-44,person of color,7.0,28.96
45-49,person of color,16.0,28.96
50-54,person of color,6.0,23.99
55-59,person of color,14.0,28.99
55-59,white,5.0,27.98
60-64,person of color,8.0,25.75


In [341]:
current_commercial_race_group_age_10_salary = commercial_salaried.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_group_age_10_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,white,5.0,80780.0
25-34,person of color,16.0,80000.0
25-34,white,36.0,89181.3
35-44,person of color,6.0,87845.99
35-44,white,19.0,153000.0
45-54,person of color,8.0,81500.28
45-54,white,11.0,141678.09
55-64,white,14.0,96872.7


In [342]:
current_commercial_race_group_age_10_hourly = commercial_hourly.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_group_age_10_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
25-34,person of color,24.0,29.68
25-34,white,16.0,35.38
35-44,person of color,14.0,30.96
35-44,white,5.0,31.57
45-54,person of color,22.0,26.6
45-54,white,5.0,35.66
55-64,person of color,22.0,26.81
55-64,white,8.0,28.31
65+,person of color,8.0,26.78


In [343]:
current_commercial_race_under_40_salaried = commercial_salaried[commercial_salaried['age'] < 40].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_under_40_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),56,93852.75
Asian (United States of America),12,80000.0


In [344]:
current_commercial_race_over_40_salaried = commercial_salaried[commercial_salaried['age'] > 39].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_over_40_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),32,104022.79
Black or African American (United States of America),12,85573.5


In [345]:
current_commercial_race_under_40_hourly = commercial_hourly[commercial_hourly['age'] < 40].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_under_40_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),20,34.74
Black or African American (United States of America),21,30.19
Asian (United States of America),5,29.63
Hispanic or Latino (United States of America),5,23.56


In [346]:
current_commercial_race_over_40_hourly = commercial_hourly[commercial_hourly['age'] > 39].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_over_40_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),21,31.29
Black or African American (United States of America),54,26.81


### Gender x race/ethnicity

In [347]:
current_commercial_race_gender_salaried = commercial_salaried.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_commercial_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_ethnicity,gender,Unnamed: 2_level_1
Asian (United States of America),Female,8
Asian (United States of America),Male,5
Black or African American (United States of America),Female,9
Black or African American (United States of America),Male,7
White (United States of America),Female,62
White (United States of America),Male,26


In [348]:
current_commercial_race_gender_hourly = commercial_hourly.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_commercial_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_ethnicity,gender,Unnamed: 2_level_1
Asian (United States of America),Female,5
Black or African American (United States of America),Female,40
Black or African American (United States of America),Male,35
White (United States of America),Female,23
White (United States of America),Male,18


In [349]:
current_commercial_race_gender_median_salaried = commercial_salaried.groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_median_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,22,80879.99
person of color,Male,15,80000.56
white,Female,62,98047.8
white,Male,26,94693.21


In [350]:
current_commercial_race_gender_median_hourly = commercial_hourly.groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_median_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,52,29.7
person of color,Male,42,26.7
white,Female,23,34.61
white,Male,18,28.7


In [351]:
current_commercial_race_gender_under_40_salaried = commercial_salaried[commercial_salaried['age'] < 40].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_under_40_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Asian (United States of America),Female,8,81350.05
White (United States of America),Female,45,94202.1
White (United States of America),Male,11,93000.0


In [352]:
current_commercial_race_gender_under_40_hourly = commercial_hourly[commercial_hourly['age'] < 40].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_under_40_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Black or African American (United States of America),Female,14,30.67
Black or African American (United States of America),Male,7,29.74
White (United States of America),Female,14,35.38
White (United States of America),Male,6,29.26


In [353]:
current_commercial_race_gender_over_40_salaried = commercial_salaried[commercial_salaried['age'] > 39].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_over_40_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Black or African American (United States of America),Female,7,83000.0
Black or African American (United States of America),Male,5,86977.73
White (United States of America),Female,17,123000.0
White (United States of America),Male,15,95050.62


In [354]:
current_commercial_race_gender_over_40_hourly = commercial_hourly[commercial_hourly['age'] > 39].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_over_40_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Black or African American (United States of America),Female,26,27.79
Black or African American (United States of America),Male,28,26.66
White (United States of America),Female,9,33.46
White (United States of America),Male,12,28.7


### Years of service

In [355]:
current_commercial_yos_salary = commercial_salaried.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_salary)

Unnamed: 0_level_0,count_nonzero,median
years_of_service_grouped,Unnamed: 1_level_1,Unnamed: 2_level_1
0,14,84890.0
1-2,43,87075.0
3-5,24,99107.68
6-10,18,134000.05
11-15,7,83000.0
16-20,7,89957.05
21-25,7,86977.73
25+,6,89609.95


In [356]:
current_commercial_yos_hourly = commercial_hourly.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_hourly)

Unnamed: 0_level_0,count_nonzero,median
years_of_service_grouped,Unnamed: 1_level_1,Unnamed: 2_level_1
0,16,31.92
1-2,31,29.63
3-5,22,27.88
6-10,16,27.31
11-15,16,28.85
16-20,13,28.61
21-25,11,29.74
25+,12,28.64


In [357]:
current_commercial_yos_gender_salary = commercial_salaried.groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_gender_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Female,8,84890.0
0,Male,6,82500.0
1-2,Female,34,86877.54
1-2,Male,9,89382.07
3-5,Female,19,99465.35
3-5,Male,5,98750.0
6-10,Female,11,112000.0
6-10,Male,7,144476.99
16-20,Female,6,96685.35
21-25,Male,6,82338.57


In [358]:
current_commercial_yos_gender_hourly = commercial_hourly.groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Female,10.0,34.1
0,Male,6.0,27.32
1-2,Female,17.0,31.06
1-2,Male,14.0,24.62
3-5,Female,12.0,33.09
3-5,Male,10.0,25.57
6-10,Male,12.0,26.61
11-15,Male,12.0,28.73
16-20,Female,8.0,27.33
16-20,Male,5.0,32.36


In [359]:
current_commercial_yos_race_salary = commercial_salaried.groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_race_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
0,White (United States of America),8.0,92890.0
1-2,Asian (United States of America),7.0,77655.0
1-2,White (United States of America),31.0,89382.07
3-5,White (United States of America),18.0,99732.68
6-10,White (United States of America),15.0,144476.99
16-20,White (United States of America),5.0,114610.73
21-25,White (United States of America),5.0,99347.61


In [360]:
current_commercial_yos_race_hourly = commercial_hourly.groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Black or African American (United States of America),8.0,29.69
0,White (United States of America),7.0,35.9
1-2,Asian (United States of America),5.0,29.63
1-2,Black or African American (United States of America),13.0,29.18
1-2,White (United States of America),7.0,33.11
3-5,Black or African American (United States of America),11.0,26.57
3-5,White (United States of America),9.0,31.57
6-10,Black or African American (United States of America),9.0,26.75
6-10,White (United States of America),5.0,31.29
11-15,Black or African American (United States of America),8.0,30.05


In [361]:
current_commercial_yos_race_gender_salary = commercial_salaried.groupby(['years_of_service_grouped','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_race_gender_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
years_of_service_grouped,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
0,white,Female,5.0,110000.0
1-2,person of color,Female,10.0,77717.5
1-2,white,Female,23.0,88362.6
1-2,white,Male,8.0,91191.04
3-5,white,Female,16.0,109732.68
6-10,white,Female,10.0,125000.27
6-10,white,Male,5.0,155942.78


In [362]:
current_commercial_yos_race_gender_hourly = commercial_hourly.groupby(['years_of_service_grouped','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
years_of_service_grouped,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
0,person of color,Male,5.0,25.41
0,white,Female,6.0,35.38
1-2,person of color,Female,12.0,30.16
1-2,person of color,Male,10.0,25.41
1-2,white,Female,5.0,33.11
3-5,person of color,Female,6.0,27.33
3-5,person of color,Male,7.0,26.57
3-5,white,Female,6.0,36.11
6-10,person of color,Male,8.0,26.61
11-15,person of color,Male,6.0,28.84


### Age

In [363]:
current_median_commercial_age_5_salaried = commercial_salaried.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_salaried)

Unnamed: 0_level_0,count_nonzero,median
age_group_5,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,6,80390.0
25-29,39,84780.0
30-34,14,98847.8
35-39,19,112000.0
40-44,6,125875.0
45-49,12,109585.0
50-54,7,80000.56
55-59,14,92503.83
65+,6,93162.67


In [364]:
current_median_commercial_age_5_hourly = commercial_hourly.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_hourly)

Unnamed: 0_level_0,count_nonzero,median
age_group_5,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,7,30.06
25-29,28,31.61
30-34,13,33.11
35-39,8,32.44
40-44,12,28.85
45-49,19,31.06
50-54,8,24.2
55-59,19,28.65
60-64,11,26.0
65+,12,27.52


In [365]:
current_median_commercial_age_10_salaried = commercial_salaried.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_salaried)

Unnamed: 0_level_0,count_nonzero,median
age_group_10,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,6,80390.0
25-34,53,85045.1
35-44,25,112000.0
45-54,19,94516.42
55-64,17,95050.62
65+,6,93162.67


In [366]:
current_median_commercial_age_10_hourly = commercial_hourly.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_hourly)

Unnamed: 0_level_0,count_nonzero,median
age_group_10,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,7,30.06
25-34,41,32.05
35-44,20,30.96
45-54,27,28.61
55-64,30,27.14
65+,12,27.52


In [367]:
current_commercial_age_5_yos_salary = commercial_salaried.groupby(['age_group_5','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_age_5_yos_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
25-29,0,9.0,84780.0
25-29,1-2,22.0,81217.5
25-29,3-5,7.0,83730.75
30-34,1-2,7.0,90702.3
30-34,3-5,5.0,109568.81
35-39,6-10,8.0,133890.0


In [368]:
current_commercial_age_5_yos_hourly = commercial_hourly.groupby(['age_group_5','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_age_5_yos_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
25-29,0,9.0,35.9
25-29,1-2,12.0,30.45
25-29,3-5,7.0,27.79
30-34,1-2,5.0,33.11
55-59,3-5,5.0,27.98
55-59,25+,5.0,32.21


In [369]:
current_commercial_age_10_yos_salary = commercial_salaried.groupby(['age_group_10','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_age_10_yos_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
25-34,0,10.0,84780.0
25-34,1-2,29.0,85045.1
25-34,3-5,12.0,97232.68
35-44,3-5,5.0,153000.0
35-44,6-10,9.0,155780.0
45-54,1-2,5.0,94516.42


In [370]:
current_commercial_age_10_yos_hourly = commercial_hourly.groupby(['age_group_10','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_age_10_yos_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
25-34,0,13.0,33.33
25-34,1-2,17.0,31.17
25-34,3-5,9.0,27.79
35-44,11-15,7.0,32.12
45-54,1-2,5.0,23.83
45-54,6-10,6.0,24.67
45-54,21-25,5.0,32.47
55-64,3-5,6.0,26.49
55-64,16-20,5.0,26.0
55-64,21-25,5.0,27.35


In [371]:
current_median_commercial_age_5_gender_salaried = commercial_salaried.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
25-29,Female,30.0,83389.64
25-29,Male,9.0,90000.0
30-34,Female,13.0,97695.6
35-39,Female,14.0,133890.0
35-39,Male,5.0,80000.0
45-49,Female,11.0,123000.0
50-54,Male,5.0,89382.07
55-59,Female,8.0,97813.0
55-59,Male,6.0,92503.83
65+,Male,6.0,93162.67


In [372]:
current_median_commercial_age_5_gender_hourly = commercial_hourly.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,6,30.12
25-29,Female,18,35.13
25-29,Male,10,26.98
30-34,Female,9,33.33
40-44,Male,8,27.65
45-49,Female,10,32.27
45-49,Male,9,25.41
50-54,Male,5,24.24
55-59,Female,10,28.82
55-59,Male,9,28.65


In [373]:
current_median_commercial_age_10_gender_salaried = commercial_salaried.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
25-34,Female,43.0,85000.0
25-34,Male,10.0,91500.0
35-44,Female,17.0,153000.0
35-44,Male,8.0,89031.01
45-54,Female,13.0,96170.0
45-54,Male,6.0,89303.54
55-64,Female,10.0,95154.04
55-64,Male,7.0,95050.62
65+,Male,6.0,93162.67


In [374]:
current_median_commercial_age_10_gender_hourly = commercial_hourly.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,6,30.12
25-34,Female,27,34.61
25-34,Male,14,26.98
35-44,Female,8,32.83
35-44,Male,12,28.85
45-54,Female,13,31.06
45-54,Male,14,25.09
55-64,Female,15,26.83
55-64,Male,15,28.65
65+,Female,6,31.26


In [375]:
current_median_commercial_age_5_race_salaried = commercial_salaried.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),5.0,80780.0
25-29,Asian (United States of America),6.0,77625.0
25-29,White (United States of America),26.0,86356.5
30-34,White (United States of America),10.0,98847.8
35-39,White (United States of America),15.0,155780.0
45-49,Black or African American (United States of America),5.0,89225.0
45-49,White (United States of America),6.0,151956.75
50-54,White (United States of America),5.0,89382.07
55-59,White (United States of America),12.0,92503.83


In [376]:
current_median_commercial_age_5_race_hourly = commercial_hourly.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
25-29,Black or African American (United States of America),8.0,29.94
25-29,White (United States of America),13.0,35.9
30-34,Black or African American (United States of America),5.0,27.88
35-39,Black or African American (United States of America),5.0,32.51
40-44,Black or African American (United States of America),5.0,28.96
45-49,Black or African American (United States of America),15.0,30.14
50-54,Black or African American (United States of America),6.0,23.99
55-59,Black or African American (United States of America),14.0,28.99
55-59,White (United States of America),5.0,27.98
60-64,Black or African American (United States of America),8.0,25.75


In [377]:
current_median_commercial_age_5_race_group_salaried = commercial_salaried.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_group_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,white,5.0,80780.0
25-29,person of color,13.0,78500.0
25-29,white,26.0,86356.5
30-34,white,10.0,98847.8
35-39,white,15.0,155780.0
45-49,person of color,6.0,86112.5
45-49,white,6.0,151956.75
50-54,white,5.0,89382.07
55-59,white,12.0,92503.83


In [378]:
current_median_commercial_age_5_race_group_hourly = commercial_hourly.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_group_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
25-29,person of color,14.0,29.68
25-29,white,13.0,35.9
30-34,person of color,10.0,29.07
35-39,person of color,7.0,32.51
40-44,person of color,7.0,28.96
45-49,person of color,16.0,28.96
50-54,person of color,6.0,23.99
55-59,person of color,14.0,28.99
55-59,white,5.0,27.98
60-64,person of color,8.0,25.75


In [379]:
current_median_commercial_age_10_race_salaried = commercial_salaried.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),5.0,80780.0
25-34,Asian (United States of America),9.0,80000.0
25-34,White (United States of America),36.0,89181.3
35-44,White (United States of America),19.0,153000.0
45-54,Black or African American (United States of America),7.0,83000.0
45-54,White (United States of America),11.0,141678.09
55-64,White (United States of America),14.0,96872.7


In [380]:
current_median_commercial_age_10_race_hourly = commercial_hourly.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
25-34,Black or African American (United States of America),13.0,29.74
25-34,White (United States of America),16.0,35.38
35-44,Black or African American (United States of America),10.0,30.96
35-44,White (United States of America),5.0,31.57
45-54,Black or African American (United States of America),21.0,27.79
45-54,White (United States of America),5.0,35.66
55-64,Black or African American (United States of America),22.0,26.81
55-64,White (United States of America),8.0,28.31
65+,Black or African American (United States of America),6.0,26.7


In [381]:
current_median_commercial_age_10_race_group_salaried = commercial_salaried.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_group_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,white,5.0,80780.0
25-34,person of color,16.0,80000.0
25-34,white,36.0,89181.3
35-44,person of color,6.0,87845.99
35-44,white,19.0,153000.0
45-54,person of color,8.0,81500.28
45-54,white,11.0,141678.09
55-64,white,14.0,96872.7


In [382]:
current_median_commercial_age_10_race_group_hourly = commercial_hourly.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_group_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
25-34,person of color,24.0,29.68
25-34,white,16.0,35.38
35-44,person of color,14.0,30.96
35-44,white,5.0,31.57
45-54,person of color,22.0,26.6
45-54,white,5.0,35.66
55-64,person of color,22.0,26.81
55-64,white,8.0,28.31
65+,person of color,8.0,26.78


In [383]:
current_median_commercial_age_5_race_gender_salaried = commercial_salaried.groupby(['age_group_5','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
25-29,Asian (United States of America),Female,5.0,77595.0
25-29,White (United States of America),Female,20.0,84809.55
25-29,White (United States of America),Male,6.0,93935.0
30-34,White (United States of America),Female,9.0,97695.6
35-39,White (United States of America),Female,14.0,133890.0
45-49,White (United States of America),Female,6.0,151956.75
55-59,White (United States of America),Female,6.0,97813.0
55-59,White (United States of America),Male,6.0,92503.83


In [384]:
current_median_commercial_age_5_race_gender_hourly = commercial_hourly.groupby(['age_group_5','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
25-29,Black or African American (United States of America),Female,6.0,32.9
25-29,White (United States of America),Female,8.0,37.77
25-29,White (United States of America),Male,5.0,26.16
45-49,Black or African American (United States of America),Female,6.0,31.57
45-49,Black or African American (United States of America),Male,9.0,25.41
55-59,Black or African American (United States of America),Female,9.0,29.66
55-59,Black or African American (United States of America),Male,5.0,28.97
60-64,Black or African American (United States of America),Female,5.0,25.5


In [385]:
current_median_commercial_age_5_race_group_gender_salaried = commercial_salaried.groupby(['age_group_5','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
25-29,person of color,Female,10.0,77625.0
25-29,white,Female,20.0,84809.55
25-29,white,Male,6.0,93935.0
30-34,white,Female,9.0,97695.6
35-39,white,Female,14.0,133890.0
45-49,person of color,Female,5.0,83000.0
45-49,white,Female,6.0,151956.75
55-59,white,Female,6.0,97813.0
55-59,white,Male,6.0,92503.83


In [386]:
current_median_commercial_age_5_race_group_gender_hourly = commercial_hourly.groupby(['age_group_5','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
25-29,person of color,Female,10.0,30.66
25-29,white,Female,8.0,37.77
25-29,white,Male,5.0,26.16
30-34,person of color,Female,6.0,30.6
40-44,person of color,Male,5.0,26.56
45-49,person of color,Female,7.0,31.06
45-49,person of color,Male,9.0,25.41
55-59,person of color,Female,9.0,29.66
55-59,person of color,Male,5.0,28.97
60-64,person of color,Female,5.0,25.5


In [387]:
current_median_commercial_age_10_race_gender_salaried = commercial_salaried.groupby(['age_group_10','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
25-34,Asian (United States of America),Female,8.0,81350.05
25-34,White (United States of America),Female,29.0,86680.09
25-34,White (United States of America),Male,7.0,94870.0
35-44,White (United States of America),Female,17.0,153000.0
45-54,Black or African American (United States of America),Female,5.0,83000.0
45-54,White (United States of America),Female,7.0,148624.0
55-64,White (United States of America),Female,7.0,100592.59
55-64,White (United States of America),Male,7.0,95050.62


In [388]:
current_median_commercial_age_10_race_gender_hourly = commercial_hourly.groupby(['age_group_10','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
25-34,Black or African American (United States of America),Female,9.0,30.14
25-34,White (United States of America),Female,11.0,35.9
25-34,White (United States of America),Male,5.0,26.16
35-44,Black or African American (United States of America),Male,6.0,29.09
45-54,Black or African American (United States of America),Female,9.0,30.14
45-54,Black or African American (United States of America),Male,12.0,25.09
55-64,Black or African American (United States of America),Female,14.0,26.81
55-64,Black or African American (United States of America),Male,8.0,27.81
55-64,White (United States of America),Male,7.0,28.65


In [389]:
current_median_commercial_age_10_race_group_gender_salaried = commercial_salaried.groupby(['age_group_10','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
25-34,person of color,Female,13.0,78500.0
25-34,white,Female,29.0,86680.09
25-34,white,Male,7.0,94870.0
35-44,person of color,Male,6.0,87845.99
35-44,white,Female,17.0,153000.0
45-54,person of color,Female,6.0,80879.99
45-54,white,Female,7.0,148624.0
55-64,white,Female,7.0,100592.59
55-64,white,Male,7.0,95050.62


In [390]:
current_median_commercial_age_10_race_group_gender_hourly = commercial_hourly.groupby(['age_group_10','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
25-34,person of color,Female,16.0,30.66
25-34,person of color,Male,8.0,28.0
25-34,white,Female,11.0,35.9
25-34,white,Male,5.0,26.16
35-44,person of color,Female,6.0,33.25
35-44,person of color,Male,8.0,27.76
45-54,person of color,Female,10.0,29.38
45-54,person of color,Male,12.0,25.09
55-64,person of color,Female,14.0,26.81
55-64,person of color,Male,8.0,27.81


### Departments

In [391]:
current_commercial_median_department_salaried = commercial_salaried.groupby(['department']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_salaried)

Unnamed: 0_level_0,count_nonzero,median
department,Unnamed: 1_level_1,Unnamed: 2_level_1
Finance,9,95000.0
Client Solutions,91,90241.58
Audience Development and Insights,18,90000.0
Production,5,75000.51


In [392]:
current_commercial_median_department_hourly = commercial_hourly.groupby(['department']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_hourly)

Unnamed: 0_level_0,count_nonzero,median
department,Unnamed: 1_level_1,Unnamed: 2_level_1
Marketing,7,39.64
Public Relations,7,38.4
Washington Post Live,7,33.33
Client Solutions,48,31.23
Finance,23,30.26
Production,33,25.41
Customer Care and Logistics,12,21.67


In [393]:
current_commercial_median_department_gender_salaried = commercial_salaried.groupby(['department','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
department,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Finance,Female,6,95585.0
Audience Development and Insights,Female,8,94391.7
Client Solutions,Female,66,90471.94
Client Solutions,Male,25,89957.05
Audience Development and Insights,Male,10,87500.0


In [394]:
current_commercial_median_department_gender_hourly = commercial_hourly.groupby(['department','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
department,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Marketing,Female,7,39.64
Public Relations,Female,7,38.4
Washington Post Live,Female,5,33.33
Client Solutions,Male,18,32.24
Finance,Female,16,30.6
Finance,Male,7,30.26
Client Solutions,Female,30,30.16
Production,Male,32,25.41
Customer Care and Logistics,Female,9,21.63


In [395]:
current_commercial_median_department_race_salaried = commercial_salaried.groupby(['department','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
department,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
Client Solutions,White (United States of America),68,99406.48
Audience Development and Insights,White (United States of America),14,93935.0
Client Solutions,Black or African American (United States of America),9,78759.98
Client Solutions,Asian (United States of America),10,77677.21


In [396]:
current_commercial_median_department_race_hourly = commercial_hourly.groupby(['department','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
department,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
Marketing,White (United States of America),5,39.64
Client Solutions,White (United States of America),16,33.28
Client Solutions,Asian (United States of America),5,31.17
Finance,Black or African American (United States of America),17,30.26
Client Solutions,Black or African American (United States of America),23,29.18
Production,White (United States of America),7,25.23
Production,Black or African American (United States of America),22,25.2
Customer Care and Logistics,Black or African American (United States of America),8,21.24


In [397]:
current_commercial_median_department_race_gender_salaried = commercial_salaried.groupby(['department','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
department,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Client Solutions,White (United States of America),Female,51,99465.35
Client Solutions,White (United States of America),Male,17,99347.61
Audience Development and Insights,White (United States of America),Female,7,98783.4
Audience Development and Insights,White (United States of America),Male,7,93000.0
Client Solutions,Black or African American (United States of America),Female,6,78629.99
Client Solutions,Asian (United States of America),Female,6,77625.0


In [398]:
current_commercial_median_department_race_gender_hourly = commercial_hourly.groupby(['department','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
department,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Marketing,White (United States of America),Female,5,39.64
Client Solutions,White (United States of America),Female,8,34.03
Client Solutions,White (United States of America),Male,8,32.24
Client Solutions,Black or African American (United States of America),Male,8,31.41
Finance,Black or African American (United States of America),Female,12,30.6
Finance,Black or African American (United States of America),Male,5,30.26
Client Solutions,Black or African American (United States of America),Female,15,28.61
Production,Black or African American (United States of America),Male,21,25.41
Production,White (United States of America),Male,7,25.23
Customer Care and Logistics,Black or African American (United States of America),Female,8,21.24


In [399]:
current_commercial_median_department_race_group_gender_salaried = commercial_salaried.groupby(['department','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
department,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Client Solutions,white,Female,51,99465.35
Client Solutions,white,Male,17,99347.61
Audience Development and Insights,white,Female,7,98783.4
Audience Development and Insights,white,Male,7,93000.0
Client Solutions,person of color,Male,8,82084.64
Client Solutions,person of color,Female,15,77655.0


In [400]:
current_commercial_median_department_race_group_gender_hourly = commercial_hourly.groupby(['department','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
department,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Marketing,white,Female,5,39.64
Client Solutions,white,Female,8,34.03
Client Solutions,white,Male,8,32.24
Client Solutions,person of color,Male,10,31.41
Finance,person of color,Female,13,30.14
Finance,person of color,Male,6,29.75
Client Solutions,person of color,Female,22,28.61
Production,person of color,Male,24,25.41
Production,white,Male,7,25.23
Customer Care and Logistics,person of color,Female,8,21.24


In [401]:
current_commercial_median_department_race_gender_age5_salaried = commercial_salaried.groupby(['department','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
department,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
Client Solutions,White (United States of America),Female,35-39,12.0,158686.7
Client Solutions,White (United States of America),Female,45-49,5.0,155289.5
Client Solutions,White (United States of America),Female,55-59,5.0,107453.0
Client Solutions,White (United States of America),Female,30-34,8.0,93029.1
Client Solutions,White (United States of America),Female,25-29,16.0,84284.93
Client Solutions,Asian (United States of America),Female,25-29,5.0,77595.0


In [402]:
current_commercial_median_department_race_gender_age5_hourly = commercial_hourly.groupby(['department','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
department,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
Marketing,White (United States of America),Female,25-29,5.0,39.64
Finance,Black or African American (United States of America),Female,45-49,5.0,31.06
Production,Black or African American (United States of America),Male,45-49,6.0,24.67


In [403]:
current_commercial_median_department_race_group_gender_age5_salaried = commercial_salaried.groupby(['department','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_group_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
department,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
Client Solutions,white,Female,35-39,12.0,158686.7
Client Solutions,white,Female,45-49,5.0,155289.5
Client Solutions,white,Female,55-59,5.0,107453.0
Client Solutions,white,Female,30-34,8.0,93029.1
Client Solutions,white,Female,25-29,16.0,84284.93
Client Solutions,person of color,Female,25-29,9.0,77595.0


In [404]:
current_commercial_median_department_race_group_gender_age5_hourly = commercial_hourly.groupby(['department','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_group_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
department,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
Marketing,white,Female,25-29,5.0,39.64
Finance,person of color,Female,45-49,5.0,31.06
Production,person of color,Male,65+,5.0,26.75
Production,person of color,Male,45-49,6.0,24.67


### Job profiles

In [405]:
current_commercial_median_job_salaried = commercial_salaried.groupby(['job_profile_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_salaried)

Unnamed: 0_level_0,count_nonzero,median
job_profile_current,Unnamed: 1_level_1,Unnamed: 2_level_1
450220 - Sales Representative,24,164497.8
551104 - Senior Financial Accountant,5,96170.0
450120 - Account Manager,19,93503.4
280428 - Designer - Content,6,87000.3
481205 - Data Analyst,8,84780.0
340227 - Artist,5,77699.41
660127 - Make-Up Person,5,75000.51
231303 - Client Service Manager,14,73114.02


In [406]:
current_commercial_median_job_hourly = commercial_hourly.groupby(['job_profile_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_hourly)

Unnamed: 0_level_0,count_nonzero,median
job_profile_current,Unnamed: 1_level_1,Unnamed: 2_level_1
240101 - Digital Marketing Specialist,5,41.03
341027 - Desktop Publisher,6,32.24
574504 - Senior Accounting Specialist,16,31.68
565005 - Accounting Specialist,7,28.74
470121 - Account Executive,11,26.05
600318 - Circulation Driver (Class A),29,25.23


In [407]:
current_commercial_median_job_gender_salaried = commercial_salaried.groupby(['job_profile_current','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
450220 - Sales Representative,Female,20,162794.5
450120 - Account Manager,Male,7,95691.99
450120 - Account Manager,Female,12,91609.44
481205 - Data Analyst,Male,6,82780.0
231303 - Client Service Manager,Female,12,73114.02


In [408]:
current_commercial_median_job_gender_hourly = commercial_hourly.groupby(['job_profile_current','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
574504 - Senior Accounting Specialist,Female,12,31.82
470121 - Account Executive,Female,10,25.77
600318 - Circulation Driver (Class A),Male,28,25.32


In [409]:
current_commercial_median_job_race_salaried = commercial_salaried.groupby(['job_profile_current','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
450220 - Sales Representative,White (United States of America),23,163995.6
450120 - Account Manager,White (United States of America),11,107453.0
481205 - Data Analyst,White (United States of America),5,84780.0
450120 - Account Manager,Black or African American (United States of America),5,84169.27
231303 - Client Service Manager,White (United States of America),10,71229.54


In [410]:
current_commercial_median_job_race_hourly = commercial_hourly.groupby(['job_profile_current','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
574504 - Senior Accounting Specialist,Black or African American (United States of America),11,32.08
565005 - Accounting Specialist,Black or African American (United States of America),5,29.01
600318 - Circulation Driver (Class A),White (United States of America),6,25.77
470121 - Account Executive,Black or African American (United States of America),7,25.5
600318 - Circulation Driver (Class A),Black or African American (United States of America),19,24.78


In [411]:
current_commercial_median_job_race_gender_salaried = commercial_salaried.groupby(['job_profile_current','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
450220 - Sales Representative,White (United States of America),Female,20,162794.5
450120 - Account Manager,White (United States of America),Female,7,100592.59
231303 - Client Service Manager,White (United States of America),Female,8,71229.54


In [412]:
current_commercial_median_job_race_gender_hourly = commercial_hourly.groupby(['job_profile_current','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
574504 - Senior Accounting Specialist,Black or African American (United States of America),Female,9,32.08
600318 - Circulation Driver (Class A),White (United States of America),Male,6,25.77
470121 - Account Executive,Black or African American (United States of America),Female,7,25.5
600318 - Circulation Driver (Class A),Black or African American (United States of America),Male,18,24.67


In [413]:
current_commercial_median_job_race_group_gender_salaried = commercial_salaried.groupby(['desk','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
desk,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
non-newsroom,white,Female,62,98047.8
non-newsroom,white,Male,26,94693.21
non-newsroom,person of color,Female,22,80879.99
non-newsroom,person of color,Male,15,80000.56


In [414]:
current_commercial_median_job_race_group_gender_hourly = commercial_hourly.groupby(['job_profile_current','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
job_profile_current,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
574504 - Senior Accounting Specialist,person of color,Female,10,31.57
600318 - Circulation Driver (Class A),white,Male,6,25.77
470121 - Account Executive,person of color,Female,8,25.5
600318 - Circulation Driver (Class A),person of color,Male,21,25.41


In [415]:
current_commercial_median_job_race_gender_age5_salaried = commercial_salaried.groupby(['job_profile_current','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
450220 - Sales Representative,White (United States of America),Female,35-39,9.0,163995.6
231303 - Client Service Manager,White (United States of America),Female,25-29,6.0,70448.15


In [416]:
current_commercial_median_job_race_gender_age5_hourly = commercial_hourly.groupby(['job_profile_current','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
574504 - Senior Accounting Specialist,Black or African American (United States of America),Female,45-49,5.0,31.06
600318 - Circulation Driver (Class A),Black or African American (United States of America),Male,45-49,5.0,24.56


In [417]:
current_commercial_median_job_race_group_gender_age5_salaried = commercial_salaried.groupby(['job_profile_current','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_group_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
450220 - Sales Representative,white,Female,35-39,9.0,163995.6
231303 - Client Service Manager,white,Female,25-29,6.0,70448.15


In [418]:
current_commercial_median_job_race_group_gender_age5_hourly = commercial_hourly.groupby(['job_profile_current','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_group_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
574504 - Senior Accounting Specialist,person of color,Female,45-49,5.0,31.06
600318 - Circulation Driver (Class A),person of color,Male,45-49,5.0,24.56


### Performance evaluations

In [419]:
commercial_ratings = ratings_combined[ratings_combined['dept'] == "Commercial"]

In [420]:
commercial_ratings_gender = commercial_ratings.groupby(['gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
commercial_ratings_gender

Unnamed: 0_level_0,performance_rating,performance_rating
Unnamed: 0_level_1,count_nonzero,median
gender,Unnamed: 1_level_2,Unnamed: 2_level_2
Female,3952,3.3
Male,2769,3.2


In [421]:
commercial_ratings_race = commercial_ratings.groupby(['race_ethnicity']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress_median(commercial_ratings_race)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
Native Hawaiian or Other Pacific Islander (United States of America),13,3.7
American Indian or Alaska Native (United States of America),13,3.4
Two or More Races (United States of America),195,3.4
Prefer Not to Disclose (United States of America),91,3.3
White (United States of America),3250,3.3
Asian (United States of America),546,3.2
Black or African American (United States of America),2327,3.2
Hispanic or Latino (United States of America),286,3.2


In [422]:
commercial_ratings_race_gender = commercial_ratings.groupby(['race_ethnicity','gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(commercial_ratings_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
American Indian or Alaska Native (United States of America),Female,13,3.4
Asian (United States of America),Female,377,3.3
Asian (United States of America),Male,169,3.15
Black or African American (United States of America),Female,1144,3.3
Black or African American (United States of America),Male,1183,3.1
Hispanic or Latino (United States of America),Female,156,3.2
Hispanic or Latino (United States of America),Male,130,3.2
Native Hawaiian or Other Pacific Islander (United States of America),Female,13,3.7
Prefer Not to Disclose (United States of America),Female,52,3.3
Prefer Not to Disclose (United States of America),Male,39,3.0


### Pay changes

In [423]:
commercial_change = reason_for_change_combined[reason_for_change_combined['dept'] == 'Commercial']

In [424]:
commercial_change_gender = commercial_change.groupby(['business_process_reason','gender']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(commercial_change_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
business_process_reason,gender,Unnamed: 2_level_1
Request Compensation Change > Adjustment > Contract Increase,Female,720
Request Compensation Change > Adjustment > Contract Increase,Male,530
Merit > Performance > Annual Performance Appraisal,Female,437
Request Compensation Change > Adjustment > Change Plan Assignment,Female,369
Merit > Performance > Annual Performance Appraisal,Male,293
Request Compensation Change > Adjustment > Change Plan Assignment,Male,187
Promotion > Promotion > Promotion,Female,173
Transfer > Transfer > Move to another manager,Female,121
Request Compensation Change > Adjustment > Market Adjustment,Female,97
Hire Employee > New Hire > Fill Vacancy,Female,93


In [425]:
commercial_change_race = commercial_change[commercial_change['business_process_reason'] == 'Merit > Performance > Annual Performance Appraisal'].groupby(['business_process_reason','race_ethnicity']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(commercial_change_race)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
business_process_reason,race_ethnicity,Unnamed: 2_level_1
Merit > Performance > Annual Performance Appraisal,White (United States of America),318
Merit > Performance > Annual Performance Appraisal,Black or African American (United States of America),316
Merit > Performance > Annual Performance Appraisal,Asian (United States of America),46
Merit > Performance > Annual Performance Appraisal,Hispanic or Latino (United States of America),26
Merit > Performance > Annual Performance Appraisal,Two or More Races (United States of America),14


In [426]:
commercial_change_race_gender = commercial_change[commercial_change['business_process_reason'] == 'Merit > Performance > Annual Performance Appraisal'].groupby(['business_process_reason','race_ethnicity','gender']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(commercial_change_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero
business_process_reason,race_ethnicity,gender,Unnamed: 3_level_1
Merit > Performance > Annual Performance Appraisal,White (United States of America),Female,198
Merit > Performance > Annual Performance Appraisal,Black or African American (United States of America),Female,173
Merit > Performance > Annual Performance Appraisal,Black or African American (United States of America),Male,143
Merit > Performance > Annual Performance Appraisal,White (United States of America),Male,120
Merit > Performance > Annual Performance Appraisal,Asian (United States of America),Female,28
Merit > Performance > Annual Performance Appraisal,Asian (United States of America),Male,18
Merit > Performance > Annual Performance Appraisal,Hispanic or Latino (United States of America),Female,18
Merit > Performance > Annual Performance Appraisal,Two or More Races (United States of America),Female,11
Merit > Performance > Annual Performance Appraisal,Hispanic or Latino (United States of America),Male,8


### Performance evaluations x merit raises

In [427]:
import re
reason_for_change_combined['merit_raises'] = reason_for_change_combined['business_process_reason'].str.contains('Merit', re.IGNORECASE)

In [428]:
twenty14 = np.datetime64('2016-04-01')
twenty15 = np.datetime64('2017-04-01')
twenty16 = np.datetime64('2018-04-01')
twenty17 = np.datetime64('2019-04-01')
twenty18 = np.datetime64('2020-04-01')

def raise_time(row):
    if row['effective_date'] < twenty14:
        return 'before 2015'
    if row['effective_date'] < twenty15:
        return '2015'
    if row['effective_date'] < twenty16:
        return '2016'
    if row['effective_date'] < twenty17:
        return '2017'
    if row['effective_date'] < twenty18:
        return '2018'
    return 'unknown'

reason_for_change_combined['raise_after'] = reason_for_change_combined.apply(lambda row: raise_time(row), axis=1)

In [429]:
merit_raises_commercial_gender_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['gender']).agg({'base_pay_change': [np.count_nonzero, np.median]})
merit_raises_commercial_gender_salaried

Unnamed: 0_level_0,base_pay_change,base_pay_change
Unnamed: 0_level_1,count_nonzero,median
gender,Unnamed: 1_level_2,Unnamed: 2_level_2
Female,170,1612.05
Male,82,1281.37


In [430]:
merit_raises_commercial_gender_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['gender']).agg({'base_pay_change': [np.count_nonzero, np.median]})
merit_raises_commercial_gender_hourly

Unnamed: 0_level_0,base_pay_change,base_pay_change
Unnamed: 0_level_1,count_nonzero,median
gender,Unnamed: 1_level_2,Unnamed: 2_level_2
Female,213,0.45
Male,192,0.35


In [431]:
merit_raises_commercial_race_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['race_ethnicity']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_race_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),166,1611.8
Asian (United States of America),30,1403.82
Black or African American (United States of America),42,1178.33
Two or More Races (United States of America),7,1000.0


In [432]:
merit_raises_commercial_race_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['race_ethnicity']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_race_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
Two or More Races (United States of America),5,0.58
White (United States of America),121,0.46
Hispanic or Latino (United States of America),16,0.43
Asian (United States of America),13,0.42
Black or African American (United States of America),246,0.35


In [433]:
merit_raises_commercial_race_group_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_race_group_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
white,166,1611.8
person of color,83,1225.0


In [434]:
merit_raises_commercial_race_group_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_race_group_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
white,121,0.46
person of color,283,0.37


In [435]:
merit_raises_commercial_gender_race_group_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_gender_race_group_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,white,119,1719.5
Male,white,47,1303.24
Female,person of color,48,1273.68
Male,person of color,35,1134.24


In [436]:
merit_raises_commercial_gender_race_group_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_gender_race_group_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,white,55,0.56
Male,white,66,0.41
Female,person of color,158,0.41
Male,person of color,125,0.33


In [437]:
fifteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2015')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2015_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(fifteen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,white,6,814.24


In [438]:
fifteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2015')].groupby(['gender','race_grouping']).agg({'2015_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(fifteen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,white,6,3.45


In [439]:
sixteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2016')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2016_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(sixteen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,5,1729.4
Female,white,6,1527.03
Male,person of color,5,1442.77
Male,white,5,1355.89


In [440]:
sixteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2016')].groupby(['gender','race_grouping']).agg({'2016_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(sixteen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,5,3.5
Female,white,6,3.8
Male,person of color,5,3.3
Male,white,5,3.2


In [441]:
seventeen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2017')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2017_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(seventeen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,white,10,1425.74
Male,person of color,6,950.1


In [442]:
seventeen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2017')].groupby(['gender','race_grouping']).agg({'2017_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(seventeen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,white,10,3.3
Male,person of color,6,3.25


In [443]:
eighteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2018')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2018_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(eighteen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,8,1307.8
Female,white,17,1844.28
Male,person of color,5,1050.0


In [444]:
eighteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2018')].groupby(['gender','race_grouping']).agg({'2018_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(eighteen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,8,3.4
Female,white,17,3.5
Male,person of color,5,3.4


In [445]:
merit_raises_15 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2015') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_16 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2016') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_17 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2017') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_18 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2018') & (reason_for_change_combined['merit_raises'] == True)]

merit_raises_15 = merit_raises_15[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating']].rename(columns={'2015_annual_performance_rating':'performance_rating'})
merit_raises_16 = merit_raises_16[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2016_annual_performance_rating']].rename(columns={'2016_annual_performance_rating':'performance_rating'})
merit_raises_17 = merit_raises_17[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2017_annual_performance_rating']].rename(columns={'2017_annual_performance_rating':'performance_rating'})
merit_raises_18 = merit_raises_18[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2018_annual_performance_rating']].rename(columns={'2018_annual_performance_rating':'performance_rating'})

merit_raises_15 = pd.DataFrame(merit_raises_15)
merit_raises_16 = pd.DataFrame(merit_raises_16)
merit_raises_17 = pd.DataFrame(merit_raises_17)
merit_raises_18 = pd.DataFrame(merit_raises_18)

merit_raises_combined = pd.concat([merit_raises_15,merit_raises_16,merit_raises_17,merit_raises_18])

In [446]:
commercial_salaried_raises = merit_raises_combined[merit_raises_combined['pay_rate_type'] == 'Salaried'].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(commercial_salaried_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,108,2500.0
Female,unknown,9,2720.0
Female,white,289,2500.0
Male,person of color,95,2059.93
Male,unknown,6,3250.0
Male,white,348,3000.0


In [447]:
commercial_salaried_raises_scores = merit_raises_combined[merit_raises_combined['pay_rate_type'] == 'Salaried'].groupby(['gender','race_grouping']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(commercial_salaried_raises_scores)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,108,3.4
Female,unknown,9,3.9
Female,white,289,3.5
Male,person of color,95,3.4
Male,unknown,6,3.75
Male,white,348,3.6


In [448]:
commercial_hourly_raises = merit_raises_combined[merit_raises_combined['pay_rate_type'] == 'Hourly'].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(commercial_hourly_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,102,0.38
Female,white,80,0.94
Male,person of color,100,0.34
Male,white,63,0.45


In [449]:
commercial_hourly_raises_scores = merit_raises_combined[merit_raises_combined['pay_rate_type'] == 'Hourly'].groupby(['gender','race_grouping']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(commercial_hourly_raises_scores)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,102,3.3
Female,white,80,3.5
Male,person of color,100,3.2
Male,white,63,3.3


### Regression

In [450]:
commercial_salaried_regression = commercial_salaried[['department','gender','race_ethnicity','current_base_pay','job_profile_current','cost_center_current','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','age','years_of_service','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping']]
commercial_salaried_regression = pd.get_dummies(commercial_salaried_regression, columns=['gender','race_ethnicity','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping'])

In [451]:
commercial_salaried_regression = commercial_salaried_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over','tier_Tier 1':'tier_Tier_1','tier_Tier 2':'tier_Tier_2','tier_Tier 3':'tier_Tier_3','tier_Tier 4':'tier_Tier_4','years_of_service_grouped_0':'years_of_service_grouped_0','years_of_service_grouped_1-2':'years_of_service_grouped_1to2','years_of_service_grouped_3-5':'years_of_service_grouped_3to5','years_of_service_grouped_6-10':'years_of_service_grouped_6to10','years_of_service_grouped_11-15':'years_of_service_grouped_11to15','years_of_service_grouped_16-20':'years_of_service_grouped_16to20','years_of_service_grouped_21-25':'years_of_service_grouped_21to25','years_of_service_grouped_25+':'years_of_service_grouped_25_over'})
import statsmodels.formula.api as sm
model41 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male')
result41 = model41.fit()
result41.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.011
Model:,OLS,Adj. R-squared:,-0.005
Method:,Least Squares,F-statistic:,0.6759
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.511
Time:,20:32:01,Log-Likelihood:,-1494.7
No. Observations:,126,AIC:,2995.0
Df Residuals:,123,BIC:,3004.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,6.689e+04,2201.909,30.377,0.000,6.25e+04,7.12e+04
gender_Female,3.728e+04,3095.378,12.045,0.000,3.12e+04,4.34e+04
gender_Male,2.96e+04,3828.922,7.732,0.000,2.2e+04,3.72e+04

0,1,2,3
Omnibus:,23.432,Durbin-Watson:,1.617
Prob(Omnibus):,0.0,Jarque-Bera (JB):,30.119
Skew:,1.176,Prob(JB):,2.88e-07
Kurtosis:,3.456,Cond. No.,1210000000000000.0


In [452]:
model42 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color')
result42 = model42.fit()
result42.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.099
Model:,OLS,Adj. R-squared:,0.085
Method:,Least Squares,F-statistic:,6.794
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.00159
Time:,20:32:01,Log-Likelihood:,-1488.8
No. Observations:,126,AIC:,2984.0
Df Residuals:,123,BIC:,2992.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.04e+05,3.31e+04,3.137,0.002,3.84e+04,1.7e+05
race_grouping_white,4738.6158,3.33e+04,0.142,0.887,-6.12e+04,7.07e+04
race_grouping_person_of_color,-1.92e+04,3.36e+04,-0.571,0.569,-8.57e+04,4.73e+04

0,1,2,3
Omnibus:,20.566,Durbin-Watson:,1.586
Prob(Omnibus):,0.0,Jarque-Bera (JB):,25.036
Skew:,1.063,Prob(JB):,3.66e-06
Kurtosis:,3.497,Cond. No.,24.7


In [453]:
model43 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result43 = model43.fit()
result43.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.104
Model:,OLS,Adj. R-squared:,0.082
Method:,Least Squares,F-statistic:,4.745
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.00364
Time:,20:32:01,Log-Likelihood:,-1488.4
No. Observations:,126,AIC:,2985.0
Df Residuals:,122,BIC:,2996.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,6.758e+04,2.22e+04,3.040,0.003,2.36e+04,1.12e+05
gender_Female,3.642e+04,1.13e+04,3.233,0.002,1.41e+04,5.87e+04
gender_Male,3.116e+04,1.18e+04,2.630,0.010,7709.059,5.46e+04
race_grouping_white,6291.0373,3.34e+04,0.188,0.851,-5.99e+04,7.25e+04
race_grouping_person_of_color,-1.707e+04,3.37e+04,-0.506,0.614,-8.38e+04,4.97e+04

0,1,2,3
Omnibus:,19.907,Durbin-Watson:,1.581
Prob(Omnibus):,0.0,Jarque-Bera (JB):,23.987
Skew:,1.042,Prob(JB):,6.18e-06
Kurtosis:,3.479,Cond. No.,1440000000000000.0


In [454]:
new_commercial_salaried_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1], 'age': [40,40,40,40]})
new_commercial_salaried_regression['predicted'] = result43.predict(new_commercial_salaried_regression)
new_commercial_salaried_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,age,predicted
0,1,0,1,0,40,110291.04
1,0,1,1,0,40,105036.69
2,1,0,0,1,40,86932.53
3,0,1,0,1,40,81678.18


In [455]:
model44 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result44 = model44.fit()
result44.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.243
Model:,OLS,Adj. R-squared:,0.177
Method:,Least Squares,F-statistic:,3.691
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.000263
Time:,20:32:01,Log-Likelihood:,-1477.8
No. Observations:,126,AIC:,2978.0
Df Residuals:,115,BIC:,3009.0
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,6.431e+04,2243.933,28.660,0.000,5.99e+04,6.88e+04
gender_Female,3.499e+04,3475.108,10.068,0.000,2.81e+04,4.19e+04
gender_Male,2.932e+04,3765.430,7.787,0.000,2.19e+04,3.68e+04
age_group_5_25_under,-1.534e+04,1.21e+04,-1.263,0.209,-3.94e+04,8720.201
age_group_5_25to29,-1.306e+04,5801.246,-2.252,0.026,-2.46e+04,-1573.642
age_group_5_30to34,8777.5553,8641.844,1.016,0.312,-8340.274,2.59e+04
age_group_5_35to39,2.904e+04,7399.029,3.925,0.000,1.44e+04,4.37e+04
age_group_5_40to44,3.197e+04,1.2e+04,2.654,0.009,8105.201,5.58e+04
age_group_5_45to49,2.064e+04,9156.128,2.254,0.026,2503.649,3.88e+04

0,1,2,3
Omnibus:,8.578,Durbin-Watson:,1.762
Prob(Omnibus):,0.014,Jarque-Bera (JB):,8.412
Skew:,0.612,Prob(JB):,0.0149
Kurtosis:,3.321,Cond. No.,4760000000000000.0


In [456]:
model45 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result45 = model45.fit()
result45.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.346
Model:,OLS,Adj. R-squared:,0.282
Method:,Least Squares,F-statistic:,5.473
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,5.85e-07
Time:,20:32:01,Log-Likelihood:,-1468.7
No. Observations:,126,AIC:,2961.0
Df Residuals:,114,BIC:,2995.0
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,9.203e+04,2.77e+04,3.327,0.001,3.72e+04,1.47e+05
race_grouping_white,9848.9444,3.05e+04,0.323,0.747,-5.06e+04,7.02e+04
race_grouping_person_of_color,-1.569e+04,3.08e+04,-0.509,0.611,-7.67e+04,4.53e+04
age_group_5_25_under,-1.744e+04,1.17e+04,-1.495,0.138,-4.05e+04,5665.917
age_group_5_25to29,-8438.9348,6003.368,-1.406,0.163,-2.03e+04,3453.691
age_group_5_30to34,1.197e+04,7761.857,1.542,0.126,-3406.884,2.73e+04
age_group_5_35to39,3.035e+04,7435.141,4.082,0.000,1.56e+04,4.51e+04
age_group_5_40to44,3.507e+04,1.16e+04,3.013,0.003,1.2e+04,5.81e+04
age_group_5_45to49,3.036e+04,8837.574,3.435,0.001,1.29e+04,4.79e+04

0,1,2,3
Omnibus:,6.245,Durbin-Watson:,1.832
Prob(Omnibus):,0.044,Jarque-Bera (JB):,5.876
Skew:,0.517,Prob(JB):,0.053
Kurtosis:,3.22,Cond. No.,6360000000000000.0


In [457]:
model46 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result46 = model46.fit()
result46.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.346
Model:,OLS,Adj. R-squared:,0.277
Method:,Least Squares,F-statistic:,4.987
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,1.37e-06
Time:,20:32:01,Log-Likelihood:,-1468.6
No. Observations:,126,AIC:,2963.0
Df Residuals:,113,BIC:,3000.0
Df Model:,12,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,6.306e+04,1.91e+04,3.301,0.001,2.52e+04,1.01e+05
gender_Female,3.259e+04,9988.518,3.263,0.001,1.28e+04,5.24e+04
gender_Male,3.047e+04,1.02e+04,2.989,0.003,1.03e+04,5.07e+04
race_grouping_white,9953.6974,3.06e+04,0.325,0.746,-5.07e+04,7.06e+04
race_grouping_person_of_color,-1.533e+04,3.09e+04,-0.496,0.621,-7.66e+04,4.6e+04
age_group_5_25_under,-1.98e+04,1.16e+04,-1.704,0.091,-4.28e+04,3221.269
age_group_5_25to29,-1.176e+04,5856.156,-2.009,0.047,-2.34e+04,-162.446
age_group_5_30to34,8343.5878,8102.615,1.030,0.305,-7709.155,2.44e+04
age_group_5_35to39,2.712e+04,7267.676,3.732,0.000,1.27e+04,4.15e+04

0,1,2,3
Omnibus:,6.419,Durbin-Watson:,1.826
Prob(Omnibus):,0.04,Jarque-Bera (JB):,6.039
Skew:,0.522,Prob(JB):,0.0488
Kurtosis:,3.245,Cond. No.,6960000000000000.0


In [458]:
merit_raises_combined_salaried_regression = merit_raises_combined[(merit_raises_combined['dept'] == 'Commercial') & (merit_raises_combined['pay_rate_type'] == 'Salaried')]
merit_raises_combined_salaried_regression = pd.get_dummies(merit_raises_combined_salaried_regression, columns=['gender','race_grouping','age_group_5'])

In [459]:
merit_raises_combined_salaried_regression = merit_raises_combined_salaried_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over'})
model47 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male')
result47 = model47.fit()
result47.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.065
Model:,OLS,Adj. R-squared:,0.055
Method:,Least Squares,F-statistic:,6.387
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.0132
Time:,20:32:01,Log-Likelihood:,-782.8
No. Observations:,94,AIC:,1570.0
Df Residuals:,92,BIC:,1575.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,982.3188,72.396,13.569,0.000,838.535,1126.103
gender_Female,765.5935,104.539,7.324,0.000,557.971,973.216
gender_Male,216.7253,123.602,1.753,0.083,-28.758,462.209

0,1,2,3
Omnibus:,53.058,Durbin-Watson:,2.115
Prob(Omnibus):,0.0,Jarque-Bera (JB):,163.539
Skew:,2.005,Prob(JB):,3.08e-36
Kurtosis:,8.067,Cond. No.,7780000000000000.0


In [460]:
model48 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color')
result48 = model48.fit()
result48.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.022
Model:,OLS,Adj. R-squared:,0.0
Method:,Least Squares,F-statistic:,1.02
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.365
Time:,20:32:01,Log-Likelihood:,-784.91
No. Observations:,94,AIC:,1576.0
Df Residuals:,91,BIC:,1583.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1400.0000,1040.448,1.346,0.182,-666.723,3466.723
race_grouping_white,281.2502,1050.038,0.268,0.789,-1804.521,2367.021
race_grouping_person_of_color,-29.3659,1053.703,-0.028,0.978,-2122.417,2063.686

0,1,2,3
Omnibus:,57.261,Durbin-Watson:,2.031
Prob(Omnibus):,0.0,Jarque-Bera (JB):,190.464
Skew:,2.155,Prob(JB):,4.38e-42
Kurtosis:,8.482,Cond. No.,20.7


In [461]:
model49 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result49 = model49.fit()
result49.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.075
Model:,OLS,Adj. R-squared:,0.044
Method:,Least Squares,F-statistic:,2.434
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.07
Time:,20:32:01,Log-Likelihood:,-782.29
No. Observations:,94,AIC:,1573.0
Df Residuals:,90,BIC:,1583.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,763.3484,682.375,1.119,0.266,-592.309,2119.005
gender_Female,636.6516,347.276,1.833,0.070,-53.274,1326.577
gender_Male,126.6968,370.633,0.342,0.733,-609.631,863.024
race_grouping_white,422.9043,1028.666,0.411,0.682,-1620.721,2466.529
race_grouping_person_of_color,219.0736,1036.139,0.211,0.833,-1839.398,2277.545

0,1,2,3
Omnibus:,51.713,Durbin-Watson:,2.131
Prob(Omnibus):,0.0,Jarque-Bera (JB):,156.955
Skew:,1.951,Prob(JB):,8.27e-35
Kurtosis:,7.985,Cond. No.,9860000000000000.0


In [462]:
new_reason_for_change_combined_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1]})
new_reason_for_change_combined_regression['predicted'] = result49.predict(new_reason_for_change_combined_regression)
new_reason_for_change_combined_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,predicted
0,1,0,1,0,1822.9
1,0,1,1,0,1312.95
2,1,0,0,1,1619.07
3,0,1,0,1,1109.12


In [463]:
model50 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result50 = model50.fit()
result50.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.127
Model:,OLS,Adj. R-squared:,0.033
Method:,Least Squares,F-statistic:,1.352
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.223
Time:,20:32:01,Log-Likelihood:,-779.6
No. Observations:,94,AIC:,1579.0
Df Residuals:,84,BIC:,1605.0
Df Model:,9,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,868.0815,101.906,8.518,0.000,665.431,1070.732
gender_Female,717.9315,122.709,5.851,0.000,473.912,961.951
gender_Male,150.1500,129.752,1.157,0.250,-107.877,408.177
age_group_5_25_under,5.5e-14,2.48e-13,0.222,0.825,-4.37e-13,5.47e-13
age_group_5_25to29,-161.2607,331.113,-0.487,0.628,-819.716,497.194
age_group_5_30to34,248.7116,255.316,0.974,0.333,-259.012,756.435
age_group_5_35to39,98.1318,297.637,0.330,0.742,-493.753,690.017
age_group_5_40to44,587.8483,295.562,1.989,0.050,0.091,1175.606
age_group_5_45to49,520.5110,339.894,1.531,0.129,-155.406,1196.428

0,1,2,3
Omnibus:,44.764,Durbin-Watson:,2.013
Prob(Omnibus):,0.0,Jarque-Bera (JB):,116.271
Skew:,1.73,Prob(JB):,5.65e-26
Kurtosis:,7.209,Cond. No.,7.67e+16


In [464]:
model51 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result51 = model51.fit()
result51.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.095
Model:,OLS,Adj. R-squared:,-0.014
Method:,Least Squares,F-statistic:,0.8693
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.565
Time:,20:32:01,Log-Likelihood:,-781.27
No. Observations:,94,AIC:,1585.0
Df Residuals:,83,BIC:,1613.0
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1499.1252,997.506,1.503,0.137,-484.875,3483.125
race_grouping_white,-52.3007,1105.092,-0.047,0.962,-2250.284,2145.683
race_grouping_person_of_color,-469.6346,1126.458,-0.417,0.678,-2710.115,1770.845
age_group_5_25_under,3.666e-13,6.68e-13,0.549,0.584,-9.61e-13,1.69e-12
age_group_5_25to29,-99.1252,340.998,-0.291,0.772,-777.356,579.106
age_group_5_30to34,371.9523,283.450,1.312,0.193,-191.819,935.723
age_group_5_35to39,176.8397,331.634,0.533,0.595,-482.766,836.446
age_group_5_40to44,701.2750,324.481,2.161,0.034,55.896,1346.654
age_group_5_45to49,685.5748,373.068,1.838,0.070,-56.442,1427.591

0,1,2,3
Omnibus:,49.612,Durbin-Watson:,1.991
Prob(Omnibus):,0.0,Jarque-Bera (JB):,143.081
Skew:,1.886,Prob(JB):,8.52e-32
Kurtosis:,7.722,Cond. No.,2.45e+16


In [465]:
model52 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result52 = model52.fit()
result52.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.148
Model:,OLS,Adj. R-squared:,0.034
Method:,Least Squares,F-statistic:,1.296
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.242
Time:,20:32:01,Log-Likelihood:,-778.42
No. Observations:,94,AIC:,1581.0
Df Residuals:,82,BIC:,1611.0
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,930.5934,672.984,1.383,0.170,-408.187,2269.374
gender_Female,729.0746,348.541,2.092,0.040,35.715,1422.434
gender_Male,201.5188,363.462,0.554,0.581,-521.523,924.561
race_grouping_white,57.5424,1079.680,0.053,0.958,-2090.284,2205.369
race_grouping_person_of_color,-293.2674,1102.198,-0.266,0.791,-2485.890,1899.355
age_group_5_25_under,-8.07e-14,1.3e-13,-0.620,0.537,-3.4e-13,1.78e-13
age_group_5_25to29,-259.6680,338.569,-0.767,0.445,-933.190,413.854
age_group_5_30to34,208.1681,271.233,0.767,0.445,-331.400,747.736
age_group_5_35to39,185.0484,315.193,0.587,0.559,-441.972,812.069

0,1,2,3
Omnibus:,42.404,Durbin-Watson:,2.058
Prob(Omnibus):,0.0,Jarque-Bera (JB):,109.02
Skew:,1.628,Prob(JB):,2.12e-24
Kurtosis:,7.152,Cond. No.,1.49e+17


In [466]:
model53 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male')
result53 = model53.fit()
result53.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.004
Model:,OLS,Adj. R-squared:,-0.007
Method:,Least Squares,F-statistic:,0.3292
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.568
Time:,20:32:01,Log-Likelihood:,-26.919
No. Observations:,92,AIC:,57.84
Df Residuals:,90,BIC:,62.88
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.3005,0.024,96.855,0.000,2.253,2.348
gender_Female,1.1707,0.034,34.208,0.000,1.103,1.239
gender_Male,1.1298,0.041,27.818,0.000,1.049,1.210

0,1,2,3
Omnibus:,2.489,Durbin-Watson:,1.823
Prob(Omnibus):,0.288,Jarque-Bera (JB):,2.472
Skew:,0.383,Prob(JB):,0.291
Kurtosis:,2.757,Cond. No.,2740000000000000.0


In [467]:
model54 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color')
result54 = model54.fit()
result54.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.005
Model:,OLS,Adj. R-squared:,-0.018
Method:,Least Squares,F-statistic:,0.2113
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.81
Time:,20:32:02,Log-Likelihood:,-26.869
No. Observations:,92,AIC:,59.74
Df Residuals:,89,BIC:,67.3
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,3.4000,0.329,10.320,0.000,2.745,4.055
race_grouping_white,0.0755,0.333,0.227,0.821,-0.585,0.736
race_grouping_person_of_color,0.0316,0.334,0.095,0.925,-0.632,0.695

0,1,2,3
Omnibus:,2.415,Durbin-Watson:,1.849
Prob(Omnibus):,0.299,Jarque-Bera (JB):,2.41
Skew:,0.374,Prob(JB):,0.3
Kurtosis:,2.74,Cond. No.,20.5


In [468]:
model55 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result55 = model55.fit()
result55.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.007
Model:,OLS,Adj. R-squared:,-0.027
Method:,Least Squares,F-statistic:,0.2059
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.892
Time:,20:32:02,Log-Likelihood:,-26.765
No. Observations:,92,AIC:,61.53
Df Residuals:,88,BIC:,71.62
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.2556,0.222,10.159,0.000,1.814,2.697
gender_Female,1.1444,0.113,10.121,0.000,0.920,1.369
gender_Male,1.1112,0.121,9.188,0.000,0.871,1.352
race_grouping_white,0.0842,0.335,0.252,0.802,-0.581,0.749
race_grouping_person_of_color,0.0482,0.337,0.143,0.887,-0.622,0.719

0,1,2,3
Omnibus:,2.301,Durbin-Watson:,1.833
Prob(Omnibus):,0.316,Jarque-Bera (JB):,2.285
Skew:,0.368,Prob(JB):,0.319
Kurtosis:,2.764,Cond. No.,3220000000000000.0


In [469]:
model56 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result56 = model56.fit()
result56.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.139
Model:,OLS,Adj. R-squared:,0.045
Method:,Least Squares,F-statistic:,1.477
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.17
Time:,20:32:02,Log-Likelihood:,-20.177
No. Observations:,92,AIC:,60.35
Df Residuals:,82,BIC:,85.57
Df Model:,9,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.1485,0.032,67.444,0.000,2.085,2.212
gender_Female,1.0953,0.038,28.512,0.000,1.019,1.172
gender_Male,1.0532,0.041,25.866,0.000,0.972,1.134
age_group_5_25_under,1.693e-16,9.48e-17,1.787,0.078,-1.92e-17,3.58e-16
age_group_5_25to29,0.0547,0.103,0.529,0.598,-0.151,0.260
age_group_5_30to34,0.2218,0.081,2.748,0.007,0.061,0.382
age_group_5_35to39,0.2141,0.095,2.245,0.027,0.024,0.404
age_group_5_40to44,0.2725,0.092,2.954,0.004,0.089,0.456
age_group_5_45to49,0.2083,0.106,1.964,0.053,-0.003,0.419

0,1,2,3
Omnibus:,4.627,Durbin-Watson:,1.906
Prob(Omnibus):,0.099,Jarque-Bera (JB):,4.258
Skew:,0.526,Prob(JB):,0.119
Kurtosis:,3.077,Cond. No.,8.8e+16


In [470]:
model57 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result57 = model57.fit()
result57.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.14
Model:,OLS,Adj. R-squared:,0.033
Method:,Least Squares,F-statistic:,1.313
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.237
Time:,20:32:02,Log-Likelihood:,-20.174
No. Observations:,92,AIC:,62.35
Df Residuals:,81,BIC:,90.09
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,3.2369,0.306,10.589,0.000,2.629,3.845
race_grouping_white,-0.1183,0.339,-0.349,0.728,-0.792,0.555
race_grouping_person_of_color,-0.1539,0.346,-0.445,0.657,-0.841,0.534
age_group_5_25_under,1.298e-15,3.56e-16,3.649,0.000,5.9e-16,2.01e-15
age_group_5_25to29,0.1631,0.105,1.557,0.123,-0.045,0.371
age_group_5_30to34,0.3452,0.088,3.904,0.000,0.169,0.521
age_group_5_35to39,0.3414,0.105,3.240,0.002,0.132,0.551
age_group_5_40to44,0.3978,0.099,3.999,0.000,0.200,0.596
age_group_5_45to49,0.3384,0.114,2.958,0.004,0.111,0.566

0,1,2,3
Omnibus:,4.713,Durbin-Watson:,1.923
Prob(Omnibus):,0.095,Jarque-Bera (JB):,4.379
Skew:,0.534,Prob(JB):,0.112
Kurtosis:,3.056,Cond. No.,9.97e+16


In [471]:
model58 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result58 = model58.fit()
result58.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.142
Model:,OLS,Adj. R-squared:,0.024
Method:,Least Squares,F-statistic:,1.205
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.297
Time:,20:32:02,Log-Likelihood:,-20.034
No. Observations:,92,AIC:,64.07
Df Residuals:,80,BIC:,94.33
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.2248,0.212,10.477,0.000,1.802,2.647
gender_Female,1.1308,0.110,10.285,0.000,0.912,1.350
gender_Male,1.0941,0.115,9.517,0.000,0.865,1.323
race_grouping_white,-0.1108,0.341,-0.325,0.746,-0.788,0.567
race_grouping_person_of_color,-0.1404,0.348,-0.403,0.688,-0.833,0.553
age_group_5_25_under,-1.047e-17,1.66e-16,-0.063,0.950,-3.41e-16,3.2e-16
age_group_5_25to29,0.0444,0.107,0.415,0.679,-0.168,0.257
age_group_5_30to34,0.2267,0.087,2.609,0.011,0.054,0.400
age_group_5_35to39,0.2321,0.103,2.254,0.027,0.027,0.437

0,1,2,3
Omnibus:,4.573,Durbin-Watson:,1.918
Prob(Omnibus):,0.102,Jarque-Bera (JB):,4.209
Skew:,0.523,Prob(JB):,0.122
Kurtosis:,3.071,Cond. No.,6.67e+16


In [472]:
commercial_hourly_regression = commercial_hourly[['department','gender','race_ethnicity','current_base_pay','job_profile_current','cost_center_current','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','age','years_of_service','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping']]
commercial_hourly_regression = pd.get_dummies(commercial_hourly_regression, columns=['gender','race_ethnicity','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping'])

In [473]:
commercial_hourly_regression = commercial_hourly_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over','tier_Tier 1':'tier_Tier_1','tier_Tier 2':'tier_Tier_2','tier_Tier 3':'tier_Tier_3','tier_Tier 4':'tier_Tier_4','years_of_service_grouped_0':'years_of_service_grouped_0','years_of_service_grouped_1-2':'years_of_service_grouped_1to2','years_of_service_grouped_3-5':'years_of_service_grouped_3to5','years_of_service_grouped_6-10':'years_of_service_grouped_6to10','years_of_service_grouped_11-15':'years_of_service_grouped_11to15','years_of_service_grouped_16-20':'years_of_service_grouped_16to20','years_of_service_grouped_21-25':'years_of_service_grouped_21to25','years_of_service_grouped_25+':'years_of_service_grouped_25_over'})
import statsmodels.formula.api as sm
model59 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male')
result59 = model59.fit()
result59.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.091
Model:,OLS,Adj. R-squared:,0.084
Method:,Least Squares,F-statistic:,13.47
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.000348
Time:,20:32:02,Log-Likelihood:,-435.67
No. Observations:,137,AIC:,875.3
Df Residuals:,135,BIC:,881.2
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,19.9208,0.335,59.396,0.000,19.258,20.584
gender_Female,11.8069,0.515,22.927,0.000,10.788,12.825
gender_Male,8.1139,0.545,14.883,0.000,7.036,9.192

0,1,2,3
Omnibus:,7.871,Durbin-Watson:,1.51
Prob(Omnibus):,0.02,Jarque-Bera (JB):,7.596
Skew:,0.511,Prob(JB):,0.0224
Kurtosis:,3.536,Cond. No.,2250000000000000.0


In [474]:
model60 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color')
result60 = model60.fit()
result60.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.101
Model:,OLS,Adj. R-squared:,0.087
Method:,Least Squares,F-statistic:,7.504
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.000814
Time:,20:32:02,Log-Likelihood:,-434.91
No. Observations:,137,AIC:,875.8
Df Residuals:,134,BIC:,884.6
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,22.9600,4.138,5.549,0.000,14.777,31.143
race_grouping_white,9.8456,4.237,2.324,0.022,1.465,18.226
race_grouping_person_of_color,6.0483,4.181,1.447,0.150,-2.222,14.318

0,1,2,3
Omnibus:,15.499,Durbin-Watson:,1.38
Prob(Omnibus):,0.0,Jarque-Bera (JB):,17.712
Skew:,0.742,Prob(JB):,0.000143
Kurtosis:,3.949,Cond. No.,18.1


In [475]:
model61 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result61 = model61.fit()
result61.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.181
Model:,OLS,Adj. R-squared:,0.162
Method:,Least Squares,F-statistic:,9.778
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,7.12e-06
Time:,20:32:02,Log-Likelihood:,-428.53
No. Observations:,137,AIC:,865.1
Df Residuals:,133,BIC:,876.7
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,16.4729,2.662,6.187,0.000,11.207,21.739
gender_Female,9.9858,1.471,6.787,0.000,7.075,12.896
gender_Male,6.4871,1.360,4.768,0.000,3.796,9.178
race_grouping_white,7.8829,4.096,1.925,0.056,-0.219,15.985
race_grouping_person_of_color,4.1128,4.042,1.018,0.311,-3.882,12.107

0,1,2,3
Omnibus:,11.81,Durbin-Watson:,1.592
Prob(Omnibus):,0.003,Jarque-Bera (JB):,12.735
Skew:,0.617,Prob(JB):,0.00172
Kurtosis:,3.842,Cond. No.,2770000000000000.0


In [476]:
new_commercial_hourly_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1], 'age': [40,40,40,40]})
new_commercial_hourly_regression['predicted'] = result61.predict(new_commercial_hourly_regression)
new_commercial_hourly_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,age,predicted
0,1,0,1,0,40,34.34
1,0,1,1,0,40,30.84
2,1,0,0,1,40,30.57
3,0,1,0,1,40,27.07


In [477]:
model62 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result62 = model62.fit()
result62.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.156
Model:,OLS,Adj. R-squared:,0.089
Method:,Least Squares,F-statistic:,2.323
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.0152
Time:,20:32:02,Log-Likelihood:,-430.59
No. Observations:,137,AIC:,883.2
Df Residuals:,126,BIC:,915.3
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,18.5330,0.342,54.261,0.000,17.857,19.209
gender_Female,10.9683,0.533,20.572,0.000,9.913,12.023
gender_Male,7.5647,0.558,13.550,0.000,6.460,8.669
age_group_5_25_under,0.4121,2.083,0.198,0.843,-3.710,4.534
age_group_5_25to29,3.4150,1.125,3.037,0.003,1.189,5.641
age_group_5_30to34,4.1637,1.557,2.675,0.008,1.083,7.244
age_group_5_35to39,4.5093,1.933,2.333,0.021,0.685,8.334
age_group_5_40to44,2.0586,1.619,1.272,0.206,-1.145,5.262
age_group_5_45to49,2.5494,1.313,1.942,0.054,-0.048,5.147

0,1,2,3
Omnibus:,5.241,Durbin-Watson:,1.586
Prob(Omnibus):,0.073,Jarque-Bera (JB):,4.725
Skew:,0.4,Prob(JB):,0.0942
Kurtosis:,3.434,Cond. No.,4010000000000000.0


In [478]:
model63 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result63 = model63.fit()
result63.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.185
Model:,OLS,Adj. R-squared:,0.113
Method:,Least Squares,F-statistic:,2.583
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.00547
Time:,20:32:02,Log-Likelihood:,-428.16
No. Observations:,137,AIC:,880.3
Df Residuals:,125,BIC:,915.4
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,20.3525,3.811,5.341,0.000,12.810,27.895
race_grouping_white,10.2349,4.254,2.406,0.018,1.817,18.653
race_grouping_person_of_color,6.3766,4.231,1.507,0.134,-1.998,14.751
age_group_5_25_under,1.0445,2.080,0.502,0.617,-3.073,5.162
age_group_5_25to29,3.4080,1.152,2.959,0.004,1.129,5.687
age_group_5_30to34,4.9982,1.588,3.148,0.002,1.856,8.141
age_group_5_35to39,5.0974,1.965,2.594,0.011,1.208,8.987
age_group_5_40to44,1.8070,1.587,1.139,0.257,-1.333,4.947
age_group_5_45to49,3.1001,1.371,2.261,0.025,0.387,5.814

0,1,2,3
Omnibus:,9.273,Durbin-Watson:,1.467
Prob(Omnibus):,0.01,Jarque-Bera (JB):,9.459
Skew:,0.533,Prob(JB):,0.00883
Kurtosis:,3.722,Cond. No.,4700000000000000.0


In [479]:
model64 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result64 = model64.fit()
result64.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.251
Model:,OLS,Adj. R-squared:,0.179
Method:,Least Squares,F-statistic:,3.466
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.000202
Time:,20:32:02,Log-Likelihood:,-422.37
No. Observations:,137,AIC:,870.7
Df Residuals:,124,BIC:,908.7
Df Model:,12,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,14.9662,2.539,5.895,0.000,9.941,19.991
gender_Female,9.1237,1.416,6.445,0.000,6.322,11.926
gender_Male,5.8425,1.308,4.466,0.000,3.253,8.432
race_grouping_white,8.5728,4.125,2.078,0.040,0.409,16.737
race_grouping_person_of_color,4.6437,4.106,1.131,0.260,-3.484,12.771
age_group_5_25_under,-0.5216,2.000,-0.261,0.795,-4.479,3.436
age_group_5_25to29,2.4806,1.094,2.266,0.025,0.314,4.647
age_group_5_30to34,3.9869,1.506,2.648,0.009,1.006,6.968
age_group_5_35to39,4.7246,1.869,2.527,0.013,1.025,8.424

0,1,2,3
Omnibus:,8.226,Durbin-Watson:,1.675
Prob(Omnibus):,0.016,Jarque-Bera (JB):,8.103
Skew:,0.506,Prob(JB):,0.0174
Kurtosis:,3.628,Cond. No.,5470000000000000.0


In [480]:
merit_raises_combined_hourly_regression = merit_raises_combined[(merit_raises_combined['dept'] == 'Commercial') & (merit_raises_combined['pay_rate_type'] == 'Hourly')]
merit_raises_combined_hourly_regression = pd.get_dummies(merit_raises_combined_hourly_regression, columns=['gender','race_grouping','age_group_5'])

In [481]:
merit_raises_combined_hourly_regression = merit_raises_combined_hourly_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over'})
model65 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male')
result65 = model65.fit()
result65.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.057
Model:,OLS,Adj. R-squared:,0.049
Method:,Least Squares,F-statistic:,7.107
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.00101
Time:,20:32:02,Log-Likelihood:,34.616
No. Observations:,240,AIC:,-63.23
Df Residuals:,237,BIC:,-52.79
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,-4.561e+11,3.07e+12,-0.148,0.882,-6.51e+12,5.6e+12
gender_Female,4.561e+11,3.07e+12,0.148,0.882,-5.6e+12,6.51e+12
gender_Male,4.561e+11,3.07e+12,0.148,0.882,-5.6e+12,6.51e+12

0,1,2,3
Omnibus:,112.16,Durbin-Watson:,1.893
Prob(Omnibus):,0.0,Jarque-Bera (JB):,489.149
Skew:,1.908,Prob(JB):,6.06e-107
Kurtosis:,8.862,Cond. No.,479000000000000.0


In [482]:
model66 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color')
result66 = model66.fit()
result66.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.038
Model:,OLS,Adj. R-squared:,0.03
Method:,Least Squares,F-statistic:,4.669
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.0103
Time:,20:32:02,Log-Likelihood:,32.263
No. Observations:,240,AIC:,-58.53
Df Residuals:,237,BIC:,-48.08
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.2777,0.010,27.077,0.000,0.258,0.298
race_grouping_white,0.1859,0.018,10.170,0.000,0.150,0.222
race_grouping_person_of_color,0.0919,0.014,6.628,0.000,0.065,0.119

0,1,2,3
Omnibus:,106.312,Durbin-Watson:,1.948
Prob(Omnibus):,0.0,Jarque-Bera (JB):,402.969
Skew:,1.857,Prob(JB):,3.14e-88
Kurtosis:,8.148,Cond. No.,1020000000000000.0


In [483]:
model67 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result67 = model67.fit()
result67.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.1
Model:,OLS,Adj. R-squared:,0.088
Method:,Least Squares,F-statistic:,8.726
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,1.64e-05
Time:,20:32:02,Log-Likelihood:,40.249
No. Observations:,240,AIC:,-72.5
Df Residuals:,236,BIC:,-58.58
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,-2.4e+11,2.61e+12,-0.092,0.927,-5.39e+12,4.91e+12
gender_Female,2.907e+11,3.16e+12,0.092,0.927,-5.94e+12,6.52e+12
gender_Male,2.907e+11,3.16e+12,0.092,0.927,-5.94e+12,6.52e+12
race_grouping_white,-5.07e+10,5.52e+11,-0.092,0.927,-1.14e+12,1.04e+12
race_grouping_person_of_color,-5.07e+10,5.52e+11,-0.092,0.927,-1.14e+12,1.04e+12

0,1,2,3
Omnibus:,94.745,Durbin-Watson:,1.98
Prob(Omnibus):,0.0,Jarque-Bera (JB):,324.494
Skew:,1.67,Prob(JB):,3.44e-71
Kurtosis:,7.614,Cond. No.,1040000000000000.0


In [484]:
new_reason_for_change_combined_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1]})
new_reason_for_change_combined_regression['predicted'] = result67.predict(new_reason_for_change_combined_regression)
new_reason_for_change_combined_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,predicted
0,1,0,1,0,0.52
1,0,1,1,0,0.42
2,1,0,0,1,0.42
3,0,1,0,1,0.32


In [485]:
model68 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result68 = model68.fit()
result68.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.136
Model:,OLS,Adj. R-squared:,0.098
Method:,Least Squares,F-statistic:,3.594
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.000188
Time:,20:32:02,Log-Likelihood:,45.12
No. Observations:,240,AIC:,-68.24
Df Residuals:,229,BIC:,-29.95
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,-2.181e+11,2.76e+12,-0.079,0.937,-5.66e+12,5.23e+12
gender_Female,2.386e+11,3.02e+12,0.079,0.937,-5.72e+12,6.19e+12
gender_Male,2.386e+11,3.02e+12,0.079,0.937,-5.72e+12,6.19e+12
age_group_5_25_under,-2.051e+10,2.6e+11,-0.079,0.937,-5.33e+11,4.92e+11
age_group_5_25to29,-2.051e+10,2.6e+11,-0.079,0.937,-5.33e+11,4.92e+11
age_group_5_30to34,-2.051e+10,2.6e+11,-0.079,0.937,-5.33e+11,4.92e+11
age_group_5_35to39,-2.051e+10,2.6e+11,-0.079,0.937,-5.33e+11,4.92e+11
age_group_5_40to44,-2.051e+10,2.6e+11,-0.079,0.937,-5.33e+11,4.92e+11
age_group_5_45to49,-2.051e+10,2.6e+11,-0.079,0.937,-5.33e+11,4.92e+11

0,1,2,3
Omnibus:,100.582,Durbin-Watson:,1.864
Prob(Omnibus):,0.0,Jarque-Bera (JB):,390.178
Skew:,1.727,Prob(JB):,1.88e-85
Kurtosis:,8.204,Cond. No.,2820000000000000.0


In [486]:
model69 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result69 = model69.fit()
result69.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.119
Model:,OLS,Adj. R-squared:,0.081
Method:,Least Squares,F-statistic:,3.105
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.000988
Time:,20:32:02,Log-Likelihood:,42.885
No. Observations:,240,AIC:,-63.77
Df Residuals:,229,BIC:,-25.48
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.727e+12,4.54e+12,0.381,0.704,-7.21e+12,1.07e+13
race_grouping_white,-2.165e+12,5.69e+12,-0.381,0.704,-1.34e+13,9.04e+12
race_grouping_person_of_color,-2.165e+12,5.69e+12,-0.381,0.704,-1.34e+13,9.04e+12
age_group_5_25_under,4.382e+11,1.15e+12,0.381,0.704,-1.83e+12,2.71e+12
age_group_5_25to29,4.382e+11,1.15e+12,0.381,0.704,-1.83e+12,2.71e+12
age_group_5_30to34,4.382e+11,1.15e+12,0.381,0.704,-1.83e+12,2.71e+12
age_group_5_35to39,4.382e+11,1.15e+12,0.381,0.704,-1.83e+12,2.71e+12
age_group_5_40to44,4.382e+11,1.15e+12,0.381,0.704,-1.83e+12,2.71e+12
age_group_5_45to49,4.382e+11,1.15e+12,0.381,0.704,-1.83e+12,2.71e+12

0,1,2,3
Omnibus:,94.296,Durbin-Watson:,1.922
Prob(Omnibus):,0.0,Jarque-Bera (JB):,322.52
Skew:,1.662,Prob(JB):,9.24e-71
Kurtosis:,7.605,Cond. No.,3550000000000000.0


In [487]:
model70 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result70 = model70.fit()
result70.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.166
Model:,OLS,Adj. R-squared:,0.125
Method:,Least Squares,F-statistic:,4.113
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,1.57e-05
Time:,20:32:02,Log-Likelihood:,49.35
No. Observations:,240,AIC:,-74.7
Df Residuals:,228,BIC:,-32.93
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,-4.648e+11,2.37e+12,-0.196,0.845,-5.14e+12,4.21e+12
gender_Female,6.083e+11,3.1e+12,0.196,0.845,-5.51e+12,6.72e+12
gender_Male,6.083e+11,3.1e+12,0.196,0.845,-5.51e+12,6.72e+12
race_grouping_white,-1.043e+11,5.32e+11,-0.196,0.845,-1.15e+12,9.45e+11
race_grouping_person_of_color,-1.043e+11,5.32e+11,-0.196,0.845,-1.15e+12,9.45e+11
age_group_5_25_under,-3.917e+10,2e+11,-0.196,0.845,-4.33e+11,3.55e+11
age_group_5_25to29,-3.917e+10,2e+11,-0.196,0.845,-4.33e+11,3.55e+11
age_group_5_30to34,-3.917e+10,2e+11,-0.196,0.845,-4.33e+11,3.55e+11
age_group_5_35to39,-3.917e+10,2e+11,-0.196,0.845,-4.33e+11,3.55e+11

0,1,2,3
Omnibus:,88.933,Durbin-Watson:,1.924
Prob(Omnibus):,0.0,Jarque-Bera (JB):,290.286
Skew:,1.576,Prob(JB):,9.230000000000001e-64
Kurtosis:,7.369,Cond. No.,3980000000000000.0


In [488]:
model71 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male')
result71 = model71.fit()
result71.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,-0.054
Model:,OLS,Adj. R-squared:,-0.063
Method:,Least Squares,F-statistic:,-6.036
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,1.0
Time:,20:32:03,Log-Likelihood:,-20.79
No. Observations:,239,AIC:,47.58
Df Residuals:,236,BIC:,58.01
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,4.178e+12,5.09e+12,0.821,0.412,-5.85e+12,1.42e+13
gender_Female,-4.178e+12,5.09e+12,-0.821,0.412,-1.42e+13,5.85e+12
gender_Male,-4.178e+12,5.09e+12,-0.821,0.412,-1.42e+13,5.85e+12

0,1,2,3
Omnibus:,14.859,Durbin-Watson:,1.508
Prob(Omnibus):,0.001,Jarque-Bera (JB):,16.505
Skew:,0.642,Prob(JB):,0.000261
Kurtosis:,2.907,Cond. No.,628000000000000.0


In [489]:
model72 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color')
result72 = model72.fit()
result72.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,-0.023
Model:,OLS,Adj. R-squared:,-0.031
Method:,Least Squares,F-statistic:,-2.609
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,1.0
Time:,20:32:03,Log-Likelihood:,-17.186
No. Observations:,239,AIC:,40.37
Df Residuals:,236,BIC:,50.8
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,5.618e+12,5.04e+12,1.115,0.266,-4.3e+12,1.55e+13
race_grouping_white,-5.618e+12,5.04e+12,-1.115,0.266,-1.55e+13,4.3e+12
race_grouping_person_of_color,-5.618e+12,5.04e+12,-1.115,0.266,-1.55e+13,4.3e+12

0,1,2,3
Omnibus:,13.647,Durbin-Watson:,1.601
Prob(Omnibus):,0.001,Jarque-Bera (JB):,14.973
Skew:,0.604,Prob(JB):,0.00056
Kurtosis:,2.788,Cond. No.,659000000000000.0


In [490]:
model73 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result73 = model73.fit()
result73.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.069
Model:,OLS,Adj. R-squared:,0.057
Method:,Least Squares,F-statistic:,5.825
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.000744
Time:,20:32:03,Log-Likelihood:,-5.943
No. Observations:,239,AIC:,19.89
Df Residuals:,235,BIC:,33.79
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,3.635e+12,4.99e+12,0.729,0.467,-6.19e+12,1.35e+13
gender_Female,-1.775e+12,2.43e+12,-0.729,0.467,-6.57e+12,3.02e+12
gender_Male,-1.775e+12,2.43e+12,-0.729,0.467,-6.57e+12,3.02e+12
race_grouping_white,-1.861e+12,2.55e+12,-0.729,0.467,-6.89e+12,3.17e+12
race_grouping_person_of_color,-1.861e+12,2.55e+12,-0.729,0.467,-6.89e+12,3.17e+12

0,1,2,3
Omnibus:,16.513,Durbin-Watson:,1.714
Prob(Omnibus):,0.0,Jarque-Bera (JB):,18.001
Skew:,0.661,Prob(JB):,0.000123
Kurtosis:,3.248,Cond. No.,1720000000000000.0


In [491]:
model74 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result74 = model74.fit()
result74.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.015
Model:,OLS,Adj. R-squared:,-0.028
Method:,Least Squares,F-statistic:,0.3433
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.968
Time:,20:32:03,Log-Likelihood:,-12.729
No. Observations:,239,AIC:,47.46
Df Residuals:,228,BIC:,85.7
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,4.753e+12,4.48e+12,1.061,0.290,-4.07e+12,1.36e+13
gender_Female,-5.392e+12,5.08e+12,-1.061,0.290,-1.54e+13,4.62e+12
gender_Male,-5.392e+12,5.08e+12,-1.061,0.290,-1.54e+13,4.62e+12
age_group_5_25_under,6.392e+11,6.02e+11,1.061,0.290,-5.48e+11,1.83e+12
age_group_5_25to29,6.392e+11,6.02e+11,1.061,0.290,-5.48e+11,1.83e+12
age_group_5_30to34,6.392e+11,6.02e+11,1.061,0.290,-5.48e+11,1.83e+12
age_group_5_35to39,6.392e+11,6.02e+11,1.061,0.290,-5.48e+11,1.83e+12
age_group_5_40to44,6.392e+11,6.02e+11,1.061,0.290,-5.48e+11,1.83e+12
age_group_5_45to49,6.392e+11,6.02e+11,1.061,0.290,-5.48e+11,1.83e+12

0,1,2,3
Omnibus:,9.682,Durbin-Watson:,1.513
Prob(Omnibus):,0.008,Jarque-Bera (JB):,10.171
Skew:,0.504,Prob(JB):,0.00619
Kurtosis:,2.942,Cond. No.,3410000000000000.0


In [492]:
model75 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result75 = model75.fit()
result75.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.098
Model:,OLS,Adj. R-squared:,0.059
Method:,Least Squares,F-statistic:,2.481
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.0077
Time:,20:32:03,Log-Likelihood:,-2.1689
No. Observations:,239,AIC:,26.34
Df Residuals:,228,BIC:,64.58
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,6.591e+12,4.41e+12,1.495,0.136,-2.09e+12,1.53e+13
race_grouping_white,-7.266e+12,4.86e+12,-1.495,0.136,-1.68e+13,2.31e+12
race_grouping_person_of_color,-7.266e+12,4.86e+12,-1.495,0.136,-1.68e+13,2.31e+12
age_group_5_25_under,6.747e+11,4.51e+11,1.495,0.136,-2.14e+11,1.56e+12
age_group_5_25to29,6.747e+11,4.51e+11,1.495,0.136,-2.14e+11,1.56e+12
age_group_5_30to34,6.747e+11,4.51e+11,1.495,0.136,-2.14e+11,1.56e+12
age_group_5_35to39,6.747e+11,4.51e+11,1.495,0.136,-2.14e+11,1.56e+12
age_group_5_40to44,6.747e+11,4.51e+11,1.495,0.136,-2.14e+11,1.56e+12
age_group_5_45to49,6.747e+11,4.51e+11,1.495,0.136,-2.14e+11,1.56e+12

0,1,2,3
Omnibus:,8.581,Durbin-Watson:,1.662
Prob(Omnibus):,0.014,Jarque-Bera (JB):,8.884
Skew:,0.472,Prob(JB):,0.0118
Kurtosis:,2.962,Cond. No.,3390000000000000.0


In [493]:
model76 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result76 = model76.fit()
result76.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.119
Model:,OLS,Adj. R-squared:,0.076
Method:,Least Squares,F-statistic:,2.776
Date:,"Tue, 12 Apr 2022",Prob (F-statistic):,0.0021
Time:,20:32:03,Log-Likelihood:,0.56895
No. Observations:,239,AIC:,22.86
Df Residuals:,227,BIC:,64.58
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,5.021e+12,4.47e+12,1.124,0.262,-3.78e+12,1.38e+13
gender_Female,-2.754e+12,2.45e+12,-1.124,0.262,-7.58e+12,2.07e+12
gender_Male,-2.754e+12,2.45e+12,-1.124,0.262,-7.58e+12,2.07e+12
race_grouping_white,-2.841e+12,2.53e+12,-1.124,0.262,-7.82e+12,2.14e+12
race_grouping_person_of_color,-2.841e+12,2.53e+12,-1.124,0.262,-7.82e+12,2.14e+12
age_group_5_25_under,5.731e+11,5.1e+11,1.124,0.262,-4.32e+11,1.58e+12
age_group_5_25to29,5.731e+11,5.1e+11,1.124,0.262,-4.32e+11,1.58e+12
age_group_5_30to34,5.731e+11,5.1e+11,1.124,0.262,-4.32e+11,1.58e+12
age_group_5_35to39,5.731e+11,5.1e+11,1.124,0.262,-4.32e+11,1.58e+12

0,1,2,3
Omnibus:,12.228,Durbin-Watson:,1.654
Prob(Omnibus):,0.002,Jarque-Bera (JB):,12.731
Skew:,0.556,Prob(JB):,0.00172
Kurtosis:,3.202,Cond. No.,6620000000000000.0
