-Next step: relate job satisfaction to (un)fair treatment
-Does being treated unfairly affect job satisfaction? Is this relation dependent on the level of negative reciprocity?

Again, first read in relevant datasets and variables: reciprocity measures from 2005 and outcome and controls from 2006.

In [4]:
import pandas as pd
import numpy as np
import seaborn as sns
import math

import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.formula.api import ols

In [5]:
# define path: insert the path where the SOEP data is stored on your computer here
from pathlib import Path
data_folder = Path("/Volumes/dohmen_soep/SOEP-CORE.v36eu_STATA/Stata/raw")
# define relevant subsets of SOEP-data, vp: 2005, wp: 2006
file_names = ['vp', 'wp']

file_paths = [data_folder / f"{file_name}.dta" for file_name in file_names]
# some controls are in gen data
file_paths_2 = [data_folder / f"{file_name}gen.dta" for file_name in file_names]

In [6]:
# read in 2005 data for the reciprocity measures
data05 = pd.read_stata(file_paths[0], columns=["pid","hid", "syear","vp12602", "vp12603", "vp12605"]).set_index(['pid', 'hid'])
df_05 = data05.rename(columns={ 'vp12602': 'take_revenge', 'vp12603': 'similar_problems', 'vp12605': 'insult_back'})

In [7]:
data06 = pd.read_stata(file_paths[1], columns=["pid", "hid", "syear","wp12402", "wp12401","wp0102", "wp5902", 'wp43b01', "wp43b03", "wp43b05", "wp43b07"]).set_index(['pid', 'hid'])
df_06 = data06.rename(columns={"wp12401": "gender", "wp12402": "year_birth", 'wp43b01': "recog_sup", "wp43b03": "recog_effort", "wp43b05": "recog_personal", "wp43b07": "recog_pay", "wp5902": "wage_lastmonth", "wp0102": "satisfaction_work"})

In [8]:

hours06 = pd.read_stata(file_paths_2[1], columns=["pid","hid", "syear", "wvebzeit", "betr06", "wpsbil", "nace06", "werwzeit", "wbilzeit"]).set_index(['pid', 'hid'])
work06 = hours06.rename(columns={'wvebzeit': 'working_hours', "betr06": "firmsize", "wpsbil": "school_degree", "nace06": "sector", "werwzeit": "tenure" , "wbilzeit" : "years_educ"})

In [9]:
# mapping for reciprocity questions: same scale for all
reciprocity_questions_mapping = {
    '[1] Trifft ueberhaupt nicht zu': 1,
    '[2] Skala 1-7': 2,
    '[3] Skala 1-7': 3,
    '[4] Skala 1-7': 4,
    '[5] Skala 1-7': 5,
    '[6] Skala 1-7': 6,
    '[7] Trifft voll zu': 7,
    '[-1] keine Angabe': -1,
}
recog_mapping = {
    '[-2] trifft nicht zu': -2,
    '[-1] keine Angabe': -1,
    '[1] Ja': 2,
    '[2] Nein': 1,
}
# mapping for firmsize -> we need to recode this in a sensible way: jumps are the same: first change: selbstständig to 0
firmsize_mapping = {
    '[-2] trifft nicht zu': -2,
    '[-1] keine Angabe': -1,
    '[1] Unter  5': 1,
    '[2] 5 bis 10': 2,
    '[3] 11 bis unter 20': 3,
    '[4] bis 90: unter 20': 4,
    '[5] 91-04: 5 bis unter 20': 5,
    '[6] 20 bis unter 100': 6,
    '[7] 100 bis unter 200': 7,
    '[8] bis 98: 20 bis unter 200': 8,
    '[9] 200 bis unter 2000': 9,
    '[10] 2000 und mehr': 10,
    '[11] Selbstaendig-ohne Mitarb.': 0,
}

# mapping for sectors: only to easier remove negatives
sector_map = {
    "[1] Landwirtschaft und  Jagd": 1,
    "[2] Forstwirtschaft": 2,
    "[5] Fischerei und Fischzucht": 5,
    "[10] Kohlenbergbau, Torfgewinnung": 10,
    "[11] Gewinnung von Erdöl und Erdgas, Erbringung damit verbundener Dienstleistungen": 11,
    "[12] Bergbau auf Uran- und Thoriumerze": 12,
    "[13] Erzbergbau": 13,
    "[14] Gewinnung von Steinen und Erden, sonstiger Bergbau": 14,
    "[15] Herstellung von Nahrungs- und Futtermitteln sowie Getränken": 15,
    "[16] Tabakverarbeitung": 16,
    "[17] Herstellung von Textilien": 17,
    "[18] Herstellung von Bekleidung": 18,
    "[19] Herstellung von Leder und Lederwaren": 19,
    "[20] Herstellung von Holz sowie Holz-, Kork- und Flechtwaren (ohne Herstellung von Möbeln)": 20,
    "[21] Herstellung von Papier, Pappe und Waren daraus": 21,
    '[22] Herstellung von Verlags- und Druckerzeugnissen,  Vervielfältigung von bespielten Ton-, Bild- und Datenträgern': 22,
    "[23] Kokerei, Mineralölverarbeitung, Herstellung und Verarbeitung von Spalt- und Brutstoffen": 23,
    "[24] Herstellung von chemischen Erzeugnissen": 24,
    "[25] Herstellung von Gummi- und Kunststoffwaren": 25,
    "[26] Herstellung von Glas und Glaswaren, Keramik, Verarbeitung von Steinen und Erden": 26,
    "[27] Metallerzeugung und -bearbeitung": 27,
    "[28] Herstellung von Metallerzeugnissen": 28,
    "[29] Maschinenbau": 29,
    "[31] Herstellung von Geräten der Elektrizitätserzeugung, -verteilung u. Ä.": 31,
    "[30] Herstellung von Büromaschinen, Datenverarbeitungsgeräten und -einrichtungen": 30,
    "[32] Rundfunk- und Nachrichtentechnik": 32,
    "[33] Medizin-, Mess-, Steuer- und Regelungstechnik, Optik, Herstellung von Uhren": 33,
    "[34] Herstellung von Kraftwagen und Kraftwagenteilen": 34,
    "[35] Sonstiger Fahrzeugbau": 35,
    "[36] Herstellung von Möbeln, Schmuck, Musikinstrumenten, Sportgeräten, Spielwaren und sonstigen Erzeugnissen": 36,
    "[37] Rückgewinnung": 37,
    "[40] Energieversorgung": 40,
    "[41] Wasserversorgung": 41,
    "[45] Bau": 45,
    "[50] Kraftfahrzeughandel; Instandhaltung und Reparatur von Kraftfahrzeugen; Tankstellen": 50,
    "[51] Handelsvermittlung und Großhandel (ohne Handel mit Kraftfahrzeugen)": 51,
    "[52] Einzelhandel (ohne Handel mit Kraftfahrzeugen und ohne Tankstellen); Reparatur von Gebrauchsgütern": 52,
    "[55] Beherbergungs- und Gaststätten": 55,
    "[60] Landverkehr; Transport in Rohrfernleitungen": 60,
    "[61] Schifffahrt": 61,
    "[62] Luftfahrt": 62,
    "[63] Hilfs- und Nebentätigkeiten für den Verkehr; Verkehrsvermittlung": 63,
    "[64] Nachrichtenübermittlung": 64,
    "[65] Kreditinstitute": 65,
    "[66] Versicherungen (ohne Sozialversicherung)": 66,
    "[67] Mit den Kreditinstituten und Versicherungen verbundene Tätigkeiten": 67,
    "[70] Grundstücks- und Wohnungswesen": 70,
    "[71] Vermietung beweglicher Sachen ohne Bedienungspersonal": 71,
    "[72] Datenverarbeitung und Datenbanken": 72,
    "[73] Forschung und Entwicklung": 73,
    "[74] Erbringung von unternehmensbezogenen Dienstleistungen": 74,
    "[75] Öffentliche Verwaltung, Verteidigung, Sozialversicherung": 75,
    "[80] Erziehung und Unterricht": 80,
    "[85] Gesundheits-, Veterinär- und Sozialwesen": 85,
    "[90] Abwasser- und Abfallbeseitigung und sonstige Entsorgung": 90,
    "[91] Interessenvertretungen sowie kirchliche und sonstige Vereinigungen (ohne Sozialwesen, Kultur und Sport)": 91,
    "[92] Kultur, Sport und Unterhaltung": 92,
    "[93] Erbringung von sonstigen Dienstleistungen": 93,
    "[95] Private Haushalte mit Hauspersonal": 95,					
    "[96] Industrie - ohne weitere Zuordnung": 96,					
    "[97] Handwerk - ohne weitere Zuordnung": 97,					
    "[98] Dienstleistungen ohne weitere Zuordnung": 98,					
    "[99] Exterritoriale Organisationen und Körperschaften": 99,				
    "[100] Produzierendes Gewerbe ohne w.Zuordnung": 100,
    "[-1] keine Angabe": -1,
    '[-2] trifft nicht zu': -2, 
    "[-3] unplausibler Wert": -3,
    "[-4] unzulaessige Mehrfachantwort": -4, 
    "[-5] in Fragebogenversion nicht enthalten": -5,
    "[-6] Fragebogenversion mit geaenderter Filterfuehrung": -6, 
    "[-7] nur in weniger eingeschraenkter Edition verfuegbar": -7,
    "[-8] Frage in diesem Jahr nicht Teil des Frageprogramms": -8,
}
# mapping for school degree: to easier remove negatives
school_degree_mapping = {
    '[-2] trifft nicht zu': -2,
    '[-1] keine Angabe':-1,
    '[1] Hauptschulabschluss': 1,
    '[2] Realschulabschluss': 2,
    '[3] Fachhochschulreife': 3,
    '[4] Abitur': 4,
    '[5] Anderer Abschluss': 5,
    '[6] Ohne Abschluss verlassen': 6,
    '[7] Noch kein Abschluss': 7,
    '[8] Keine Schule besucht': 8,
}
satisfaction_mapping = {
    '[0] 0 Zufrieden: Skala 0-Niedrig bis 10-Hoch': 2,
    '[1] 1 Zufrieden: Skala 0-Niedrig bis 10-Hoch': 2,
    '[2] 2 Zufrieden: Skala 0-Niedrig bis 10-Hoch': 2,
    '[3] 3 Zufrieden: Skala 0-Niedrig bis 10-Hoch': 2,
    '[4] 4 Zufrieden: Skala 0-Niedrig bis 10-Hoch': 2,
    '[5] 5 Zufrieden: Skala 0-Niedrig bis 10-Hoch': 2,
    '[6] 6 Zufrieden: Skala 0-Niedrig bis 10-Hoch': 1,
    '[7] 7 Zufrieden: Skala 0-Niedrig bis 10-Hoch': 1,
    '[8] 8 Zufrieden: Skala 0-Niedrig bis 10-Hoch': 1,
    '[9] 9 Zufrieden: Skala 0-Niedrig bis 10-Hoch': 1,
    '[10] 10 Zufrieden: Skala 0-Niedrig bis 10-Hoc': 1,
    '[-2] trifft nicht zu': -2,
    '[-1] keine Angabe': -1,
}

In [10]:
def recode_categoricals(inputdf):

    merged = inputdf  
    
    # recode Gender variable
    merged['gender'].replace('[2] Weiblich', 2,inplace=True)
    merged['gender'].replace('[1] Maennlich', 1,inplace=True)
    # recode reciprocity variables
    merged[["similar_problems","take_revenge","insult_back"]] = merged[["similar_problems","take_revenge","insult_back"]].apply(lambda x: x.map(reciprocity_questions_mapping))
    merged[["recog_sup","recog_effort","recog_personal","recog_pay"]] = merged[["recog_sup","recog_effort","recog_personal","recog_pay"]].apply(lambda x: x.map(recog_mapping))
    merged['firmsize'] = merged['firmsize'].map(firmsize_mapping)
    merged['sector'] = merged['sector'].map(sector_map)
    # recode school degree
    merged['school_degree'] = merged['school_degree'].map(school_degree_mapping)
    merged['satisfaction_work'] = merged['satisfaction_work'].map(satisfaction_mapping)
    output = merged
    return output

In [24]:
# Merge dataframes: a bit tough to read as its nested, merges 3 dataframes: 2005,2006,and 2006gen
allmerged_df = pd.merge(pd.merge(df_05, df_06, on=['pid', 'hid']), work06, on=['pid', 'hid'])
recoded = recode_categoricals(allmerged_df).astype('int')
# construct industry-relative wage
sector_wage_averages = recoded.groupby('sector')['wage_lastmonth'].mean()
recoded["sector_avg_wage"] = recoded["sector"].map(sector_wage_averages)
recoded["relative_wage"] = recoded["wage_lastmonth"] / recoded["sector_avg_wage"]
# replaces negative values with n.a.n 
recoded = recoded.mask(recoded < 0, np.nan) 
# construct avg reciprocity measure
recoded['avg_rec'] = recoded[['take_revenge', 'similar_problems', 'insult_back']].mean(axis=1)
# construct age, potential experience and age^2
recoded['age'] = 2006 - recoded['year_birth']
recoded["potential_experience"] = pow((recoded["age"] - 18), 2)
recoded["age_squared"] = (recoded["age"] ** 2) / 100
recoded["tenure_squared"] = (recoded["tenure"] ** 2) / 100
# recode categoricals back to make it better readable
#recoded["reason_new_job"] = recoded["reason_new_job"].map(reversed_mapping_reason)
#recoded["sector"]=recoded["sector"].map(reversed_mapping_sector)
#recoded["school_degree"] = recoded["school_degree"].map(reversed_mapping_schoold)

# transform binary variables with 1 and 2 into 1 and 0
columns_to_transform = ["recog_sup","recog_effort", "recog_pay", "recog_personal" ,"gender", "satisfaction_work"]

# Iterate over the columns and replace the values 2 with 0
for col in columns_to_transform:
    recoded[col] = recoded[col].replace({2: 0})

# save df somewhere so its not muted when repeatedly executing this cell: Can later transform that into functions
dfnan = recoded


dfnan.head(3)

Unnamed: 0_level_0,Unnamed: 1_level_0,syear_x,take_revenge,similar_problems,insult_back,syear_y,year_birth,gender,satisfaction_work,wage_lastmonth,recog_sup,...,sector,tenure,years_educ,sector_avg_wage,relative_wage,avg_rec,age,potential_experience,age_squared,tenure_squared
pid,hid,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
201,27,2005,1.0,5.0,1.0,2006,1926,0,,,,...,,,10.0,94.115693,,2.333333,80,3844,64.0,
203,60313,2005,2.0,3.0,2.0,2006,1960,1,1.0,2300.0,0.0,...,72.0,1.0,18.0,2196.757962,1.046997,2.333333,46,784,21.16,0.01
602,60,2005,5.0,4.0,3.0,2006,1958,0,,200.0,1.0,...,80.0,1.0,18.0,1773.037897,0.112801,4.0,48,900,23.04,0.01


In [12]:
dfnan.columns

Index(['syear_x', 'take_revenge', 'similar_problems', 'insult_back', 'syear_y',
       'year_birth', 'gender', 'satisfaction_work', 'wage_lastmonth',
       'recog_sup', 'recog_effort', 'recog_personal', 'recog_pay', 'syear',
       'working_hours', 'firmsize', 'school_degree', 'sector', 'tenure',
       'years_educ', 'sector_avg_wage', 'relative_wage', 'avg_rec', 'age',
       'potential_experience', 'age_squared'],
      dtype='object')

### Mincer Wage Regression

In [26]:
df_mincer = dfnan.drop(columns=['syear_x', 'similar_problems', 'take_revenge', 'insult_back','syear_y',
       'recog_personal', 'recog_pay', 'syear_y', 'year_birth','sector_avg_wage', 'school_degree', 'recog_sup', 'age', 'relative_wage', 'recog_effort', 'working_hours', 'syear', 'satisfaction_work', 'tenure_squared'])

In [27]:
# Convert 'gender' and 'sector' columns to categorical data type
for col in ['gender', 'sector']:
    df_mincer[col] = df_mincer[col].astype('category')

In [28]:
df_mincer = df_mincer.dropna()

In [30]:
# Define the dependent variable
y = df_mincer['wage_lastmonth']

# Define the independent variables
X = df_mincer[['gender', 'firmsize', 'tenure', 'years_educ', 'potential_experience', 'age_squared']]

# Add a constant term to the independent variables
X = sm.add_constant(X)

# Fit the Mincer wage regression model
mincer_model = sm.OLS(y, X).fit()

# Print the summary statistics of the model
#mincer_model.summary()

# Get the residuals of the model
residuals_mincer = mincer_model.resid

In [41]:
# drop all columns that we dont need for the regression.
df_regression = dfnan.drop(columns=['syear_x', 'similar_problems', 'take_revenge', 'insult_back','syear_y', 'recog_effort',
       'recog_personal', 'recog_pay', 'syear_y', 'year_birth', 'sector_avg_wage', 'sector', 'wage_lastmonth', 'school_degree', "syear"])

In [42]:
# Create a new column in the dataframe with the same name as the residuals array
df_regression['relative_wage'] = None

# Match the rows of the dataframe with the values in the residuals array using the index
df_regression.loc[df_regression.index, 'relative_wage'] = residuals_mincer

# Rename the column "relative_wage" to "mincer_residuals"
df_regression = df_regression.rename(columns={'relative_wage': 'mincer_residuals'})

  df_regression.loc[df_regression.index, 'relative_wage'] = residuals_mincer


In [43]:
#again, add interaction term reciprocity x unfair treatment 
df_regression["interaction"] = df_regression["recog_sup"] * df_regression["avg_rec"]

In [44]:
#drop nans
df_regression = df_regression.dropna()


In [45]:
from statsmodels.discrete.discrete_model import Probit

Y = df_regression["satisfaction_work"]
X = df_regression.drop(columns=["satisfaction_work"])
X = sm.add_constant(X)
model = Probit(Y, X.astype(float))
probit_model = model.fit(cov_type= "HC3")

probit_model.summary()
#print(probit_model.summary().as_latex())

         Current function value: 0.482957
         Iterations: 35




0,1,2,3
Dep. Variable:,satisfaction_work,No. Observations:,6832.0
Model:,Probit,Df Residuals:,6818.0
Method:,MLE,Df Model:,13.0
Date:,"Mon, 09 Jan 2023",Pseudo R-squ.:,0.09318
Time:,19:15:42,Log-Likelihood:,-3299.6
converged:,False,LL-Null:,-3638.6
Covariance Type:,HC3,LLR p-value:,1.6499999999999999e-136

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
const,0.1042,9.9e+04,1.05e-06,1.000,-1.94e+05,1.94e+05
gender,0.1495,0.043,3.464,0.001,0.065,0.234
recog_sup,-0.7773,0.087,-8.894,0.000,-0.949,-0.606
working_hours,-0.0051,0.002,-2.073,0.038,-0.010,-0.000
firmsize,0.0176,0.007,2.489,0.013,0.004,0.031
tenure,0.0209,0.006,3.531,0.000,0.009,0.032
years_educ,0.0427,0.007,6.188,0.000,0.029,0.056
mincer_residuals,0.0002,3.17e-05,6.089,0.000,0.000,0.000
avg_rec,-0.0577,0.017,-3.383,0.001,-0.091,-0.024


negative interaction: combination of unfair treatment and negative reciprocity is associated with lower levels of job satisfaction, however insignificant

Now, instead of using a probit model, we use OLS.

In [48]:
from statsmodels.regression.linear_model import OLS

Y = df_regression["satisfaction_work"]
X = df_regression.drop(columns=["satisfaction_work"])
X = sm.add_constant(X)
model = OLS(Y, X.astype(float))
lpm_model = model.fit(cov_type= "HC3")

lpm_model.summary()
#print(lpm_model.summary().as_latex())

0,1,2,3
Dep. Variable:,satisfaction_work,R-squared:,0.099
Model:,OLS,Adj. R-squared:,0.098
Method:,Least Squares,F-statistic:,2705.0
Date:,"Mon, 09 Jan 2023",Prob (F-statistic):,0.0
Time:,19:17:03,Log-Likelihood:,-3366.4
No. Observations:,6832,AIC:,6759.0
Df Residuals:,6819,BIC:,6848.0
Df Model:,12,,
Covariance Type:,HC3,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
const,0.0830,0.006,14.483,0.000,0.072,0.094
gender,0.0344,0.011,3.079,0.002,0.012,0.056
recog_sup,-0.2044,0.027,-7.558,0.000,-0.257,-0.151
working_hours,-0.0012,0.001,-1.832,0.067,-0.003,8.54e-05
firmsize,0.0052,0.002,2.654,0.008,0.001,0.009
tenure,0.0056,0.002,3.432,0.001,0.002,0.009
years_educ,0.0101,0.002,6.030,0.000,0.007,0.013
mincer_residuals,4.281e-05,6.57e-06,6.519,0.000,2.99e-05,5.57e-05
avg_rec,-0.0130,0.004,-3.286,0.001,-0.021,-0.005

0,1,2,3
Omnibus:,971.958,Durbin-Watson:,1.922
Prob(Omnibus):,0.0,Jarque-Bera (JB):,1450.638
Skew:,-1.127,Prob(JB):,0.0
Kurtosis:,2.86,Cond. No.,1.78e+16
