# Mental health (MH) of older workers, retirement age and working conditions

## Preprocessing SHARE data

Load libraries

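```python
import os
import sys

import pandas as pd  # used directly below (pd.concat); the utils star imports may also provide it

# Make the project's utils package importable from this notebook's folder
src_path = os.path.abspath("../")
sys.path.append(src_path)

from utils.common import *
from utils.retirement import *
from utils.share import *

import_libraries()
```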

Preprocess **SHARELIFE data** (wave 7): apply the initial filters and create the first variables

```python
file_names = ["cv_r.dta", "technical_variables.dta", "dn.dta", "re.dta"]
sharelife = import_share_stata1(file_names=file_names, waves=[7])
```

```python
sharelife = sharelife_preprocessing(sharelife)
```

```
Initial n obs: 63248
Gender, country, 1st year in country - formatted, age 50+ filter - applied
N obs after processing gender and age: 56486
Years of education - calculated
N obs after processing education years: 56486
Current ISCO - identified
N obs after isco job changes: 42255
Years of contribution, 1st year of contribution - calculated
N obs after contribution years: 42255
```
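The log reports the sample size after each filter. `sharelife_preprocessing` itself is not shown, so the following is only a minimal sketch of an age 50+ filter that logs in the same style. It assumes that the interview year and year of birth are stored in columns named `int_year` and `yrbirth` (the names used in SHARE's `cv_r` module, but treat them as an assumption). It also computes age as a plain difference of years, which ignores the month of birth.

```python
import pandas as pd


def filter_age_50_plus(df: pd.DataFrame) -> pd.DataFrame:
    """Keep respondents aged 50 or older at the interview and log the sample size.

    Assumed columns: int_year (interview year) and yrbirth (year of birth).
    A missing year gives a missing age, and NaN >= 50 is False, so such rows are dropped.
    """
    age = df["int_year"] - df["yrbirth"]
    out = df.loc[age >= 50].copy()
    out["age"] = age.loc[out.index]
    print(f"N obs after age 50+ filter: {len(out)}")
    return out


# Toy data, not SHARE values
toy = pd.DataFrame(
    {
        "mergeid": ["A", "B", "C"],
        "int_year": [2017, 2017, 2017],
        "yrbirth": [1960, 1970, 1967],
    }
)
kept = filter_age_50_plus(toy)  # prints "N obs after age 50+ filter: 2"
```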


Preprocess **additional data from SHARE waves 6-8** for respondents with available ISCO occupation codes

```python
file_names = ["cv_r.dta", "dn.dta", "ep.dta"]
sharelife_add = import_share_stata1(
    file_names=file_names, waves=[6, 7, 8], convert_categoricals=True
)
```
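`import_share_stata1` is defined in the project's `utils.share` module and is not shown here. The sketch below illustrates the general pattern such a loader might follow, not its actual code. For each wave, it reads every requested module with `pandas.read_stata` (whose `convert_categoricals` option turns Stata value labels into pandas categoricals), merges the modules on the respondent id `mergeid`, tags the rows with the wave number, and stacks the waves. The helper name `load_share_modules` and the `path_for` callable, which maps a wave and a module name to a file path, are hypothetical, because real SHARE file names depend on the release.

```python
import os
import tempfile
from typing import Callable, Iterable, Sequence

import pandas as pd


def load_share_modules(
    file_names: Sequence[str],
    waves: Iterable[int],
    path_for: Callable[[int, str], str],
    convert_categoricals: bool = False,
) -> pd.DataFrame:
    """Hypothetical loader: merge the modules of each wave on mergeid, then stack waves."""
    frames = []
    for wave in waves:
        merged = None
        for name in file_names:
            module = pd.read_stata(
                path_for(wave, name), convert_categoricals=convert_categoricals
            )
            if merged is None:
                merged = module
            else:
                # Outer join keeps respondents absent from some modules;
                # overlapping columns from later modules get a module suffix
                suffix = "_" + os.path.splitext(name)[0]
                merged = merged.merge(
                    module, on="mergeid", how="outer", suffixes=("", suffix)
                )
        merged["wave"] = wave
        frames.append(merged)
    return pd.concat(frames, axis=0, ignore_index=True)


# Toy round trip: write two tiny modules per wave to a temporary folder, then load them
with tempfile.TemporaryDirectory() as tmp:

    def path_for(wave: int, name: str) -> str:
        return os.path.join(tmp, f"w{wave}_{name}")

    for w in (6, 7):
        pd.DataFrame({"mergeid": ["A-1", "B-2"], "age": [55.0, 60.0]}).to_stata(
            path_for(w, "cv_r.dta"), write_index=False
        )
        pd.DataFrame({"mergeid": ["A-1"], "isco": [2310.0]}).to_stata(
            path_for(w, "ep.dta"), write_index=False
        )
    combined = load_share_modules(["cv_r.dta", "ep.dta"], [6, 7], path_for)

print(combined)
```

The outer join is a design choice for this sketch. An inner join would silently drop respondents who skipped a module, such as those without an employment (`ep`) record.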

```python
sharelife_add = sharelife_add_preprocessing(sharelife_add, sharelife)
```

```
N obs initial: 192020
N obs dropping missing isco: 10679
N obs after drop already present in Sharelife: 6695
Gender, country, 1st year in country - formatted, age 50+ filter - applied
N obs after gender and age: 3132
Years of education - calculated
N obs after education: 3132
Current ISCO - identified, those changed job - deleted
N obs after job and isco: 2413
Years of contribution, 1st year of contribution - calculated
N obs after contribution years: 2413
```
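The step logged as "drop already present in Sharelife" keeps only respondents who are not already in the SHARELIFE sample, which is an anti-join on `mergeid`. The following is a minimal sketch using the `indicator` option of `DataFrame.merge`, with toy data. It is not necessarily how `sharelife_add_preprocessing` implements the step, and `anti_join` is a hypothetical helper name.

```python
import pandas as pd


def anti_join(left: pd.DataFrame, right: pd.DataFrame, key: str = "mergeid") -> pd.DataFrame:
    """Return the rows of `left` whose key does not appear in `right`."""
    marked = left.merge(
        right[[key]].drop_duplicates(), on=key, how="left", indicator=True
    )
    # "_merge" is "left_only" for rows with no match in `right`
    return marked.loc[marked["_merge"] == "left_only"].drop(columns="_merge")


additional = pd.DataFrame({"mergeid": ["A", "B", "C", "C"], "wave": [6, 6, 6, 8]})
life = pd.DataFrame({"mergeid": ["B", "D"]})
new_only = anti_join(additional, life)
print(new_only)
```

`additional[~additional["mergeid"].isin(life["mergeid"])]` selects the same rows more concisely. The `indicator` version is handy when you also want to count how many rows matched.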


```python
# Concatenate the main SHARELIFE sample and the additional respondents from waves 6-8
df = pd.concat([sharelife, sharelife_add], axis=0).reset_index(drop=True)
```
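`pd.concat` keeps each input's original row labels. Without `reset_index(drop=True)`, the stacked frame would therefore contain duplicate index values, because both inputs start at 0. The toy illustration below also shows a uniqueness check on `mergeid`, since the next step merges on that key. Passing `ignore_index=True` to `pd.concat` is an equivalent way to get a fresh index.

```python
import pandas as pd

main = pd.DataFrame({"mergeid": ["A", "B"]})
extra = pd.DataFrame({"mergeid": ["C"]})

stacked = pd.concat([main, extra], axis=0)
print(stacked.index.tolist())  # [0, 1, 0]: label 0 appears twice

clean = pd.concat([main, extra], axis=0).reset_index(drop=True)
print(clean.index.tolist())  # [0, 1, 2]

# Repeated ids here would multiply rows in a later left merge on mergeid
print(clean["mergeid"].is_unique)  # True
```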

Preprocess **main data from SHARE waves 4 and 6**

```python
file_names = ["cv_r.dta", "dn.dta", "ep.dta", "ch.dta", "gv_health.dta", "as.dta"]
share = import_share_stata1(
    file_names=file_names, waves=[4, 6], convert_categoricals=True
)
```

```python
share = share_preprocessing(share, df)
```

```
Initial n obs: 126085
Those without ISCO codes - deleted
N obs with ISCO: 48858
N obs after age calculation: 48858
N obs after defining number of children: 48858
Current year, age, number of children and living with a partner - imputed
N obs after leaving only employed: 15142
N obs after deleting special conditions pension: 12596
Currently not working and eligible to special pensions - deleted
N obs after defining industry: 12596
Job status, industry of employment - added
N obs after defining finance: 12596
Household income, investments, life insurance - added
N obs after dropping missing sphus:12590
N obs after dropping missing chronic:12590
N obs after dropping missing eurod:12303
Physical and mental health indicators - added
N obs after health: 12303
```
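The health step drops respondents with missing `sphus`, `chronic` and `eurod`. In SHARE, `sphus` is self-perceived health and `eurod` is the EURO-D depression scale. The following is a minimal sketch of that pattern on toy data. `drop_missing_logged` is a hypothetical helper, not the project's code. Dropping one column at a time, rather than calling `dropna(subset=[...])` once, is what produces a separate count per variable. The final sample is the same either way.

```python
import pandas as pd


def drop_missing_logged(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Drop rows missing each column in turn, printing the remaining count."""
    for col in columns:
        df = df.dropna(subset=[col])
        print(f"N obs after dropping missing {col}: {len(df)}")
    return df


# Toy data, not SHARE values
toy = pd.DataFrame(
    {
        "sphus": [1.0, 2.0, None, 3.0],
        "chronic": [0.0, 1.0, 2.0, None],
        "eurod": [3.0, None, 1.0, 0.0],
    }
)
health = drop_missing_logged(toy, ["sphus", "chronic", "eurod"])
# prints 3, then 2, then 1 remaining rows
```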


```python
# Merge SHARE waves 4 and 6 with the SHARELIFE-based data; the left join keeps every SHARE row
df = share.merge(df, on=["mergeid"], how="left")
```
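A left merge on `mergeid` silently duplicates SHARE rows if the right-hand table has repeated ids. The unchanged row count (12303 before and after, per the logs) is consistent with unique ids here. pandas can enforce that expectation with `validate="many_to_one"`: several SHARE waves per respondent, at most one SHARELIFE-based row each. If the keys on the right are not unique, the merge raises `pandas.errors.MergeError`. Toy example (the `yrs_contrib` column is hypothetical):

```python
import pandas as pd
from pandas.errors import MergeError

waves = pd.DataFrame({"mergeid": ["A", "A", "B"], "wave": [4, 6, 6]})
life_ok = pd.DataFrame({"mergeid": ["A", "B"], "yrs_contrib": [30, 25]})
life_dup = pd.DataFrame({"mergeid": ["A", "A"], "yrs_contrib": [30, 31]})

merged = waves.merge(life_ok, on=["mergeid"], how="left", validate="many_to_one")
print(len(merged))  # 3: row count unchanged

raised_on_duplicates = False
try:
    waves.merge(life_dup, on=["mergeid"], how="left", validate="many_to_one")
except MergeError as err:
    raised_on_duplicates = True
    print("Duplicate keys on the right:", err)
```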

Final preprocessing of the **full SHARE dataset**

```python
df = share_final_preprocessing(df)
```

```
N obs initial: 12303
Current years of contribution - calculated, those with less 10 years - deleted
Data types - corrected
N obs after data types: 11349
Retirement age, work horizon and work horizon change by reforms - calculated
N obs after work horizon change: 5444
Longitudinal and crossectional weights - added
N obs after weights: 5381
```
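`share_final_preprocessing` is not shown, so the exact definitions used here are not visible. As an illustration only, one plausible definition treats the *work horizon* as the years left until the statutory retirement age. The reform-induced change is then the post-reform horizon minus the pre-reform horizon. The sketch below also applies the logged minimum of 10 contribution years. All column names (`age`, `contrib_years`, `ret_age_pre`, `ret_age_post`) and the function name are hypothetical.

```python
import pandas as pd


def add_work_horizon(df: pd.DataFrame, min_contrib_years: float = 10) -> pd.DataFrame:
    """Hypothetical work-horizon definitions; column names are illustrative only."""
    out = df.loc[df["contrib_years"] >= min_contrib_years].copy()
    # Years left until the statutory retirement age, floored at zero
    out["work_horizon_pre"] = (out["ret_age_pre"] - out["age"]).clip(lower=0)
    out["work_horizon_post"] = (out["ret_age_post"] - out["age"]).clip(lower=0)
    # Positive values mean a reform lengthened the remaining working life
    out["work_horizon_change"] = out["work_horizon_post"] - out["work_horizon_pre"]
    return out


# Toy data, not SHARE values
toy = pd.DataFrame(
    {
        "age": [55.0, 62.0, 60.0],
        "contrib_years": [30.0, 8.0, 35.0],
        "ret_age_pre": [65.0, 65.0, 60.0],
        "ret_age_post": [67.0, 67.0, 61.5],
    }
)
result = add_work_horizon(toy)
print(result[["work_horizon_pre", "work_horizon_post", "work_horizon_change"]])
```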


```python
df.country.unique()
```

```
array(['Austria', 'Belgium', 'Czech Republic', 'Switzerland', 'Germany',
       'Denmark', 'Estonia', 'Spain', 'France', 'Hungary', 'Italy',
       'Netherlands', 'Poland', 'Portugal', 'Sweden', 'Slovenia',
       'Greece', 'Luxembourg'], dtype=object)
```

```python
df.mergeid.nunique()
```

```
3956
```

```python
df["mergeid"].value_counts().value_counts()
```

```
count
1    2531
2    1425
Name: count, dtype: int64
```
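These counts are consistent with the totals above. 2531 respondents appear once and 1425 appear twice, presumably once in each of waves 4 and 6. That gives 2531 + 2 × 1425 = 5381 rows, matching the final N obs, and 2531 + 1425 = 3956 distinct `mergeid`s, matching `nunique()`. The same check can be automated. A toy version:

```python
import pandas as pd

toy = pd.DataFrame({"mergeid": ["A", "A", "B", "C", "C"], "wave": [4, 6, 6, 4, 6]})

# Number of respondents observed once, twice, ...
per_id = toy["mergeid"].value_counts().value_counts().sort_index()
print(per_id.to_dict())  # {1: 1, 2: 2}

# Rows = sum over k of k * (number of ids observed k times)
n_rows = sum(k * n for k, n in per_id.items())
n_ids = int(per_id.sum())
print(n_rows == len(toy), n_ids == toy["mergeid"].nunique())  # True True
```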

```python
df.groupby("country").work_horizon_change.describe()
```

| country | count | mean | std | min | 25% | 50% | 75% | max |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Austria | 429 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Belgium | 828 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Czech Republic | 477 | 0.900126 | 0.362362 | 0.0 | 0.66 | 1.0 | 1.0 | 4.0 |
| Denmark | 232 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Estonia | 630 | 0.897619 | 0.735913 | 0.0 | 0.0 | 1.5 | 1.5 | 1.5 |
| France | 347 | 0.144467 | 0.19945 | 0.0 | 0.0 | 0.0 | 0.33 | 0.5 |
| Germany | 358 | 0.17 | 0.0 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 |
| Greece | 43 | 4.55814 | 2.528903 | 2.0 | 2.0 | 7.0 | 7.0 | 7.0 |
| Hungary | 34 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Italy | 351 | 1.635328 | 1.722599 | 0.0 | 0.25 | 0.25 | 3.75 | 4.75 |
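The table summarises the unweighted distribution. Because the final step also adds longitudinal and cross-sectional weights, a weighted per-country mean may be more appropriate for population-level statements. The following is a minimal sketch that computes the weighted sum divided by the total weight. It assumes a hypothetical weight column `cs_weight`; the actual weight column names created by `share_final_preprocessing` are not shown.

```python
import pandas as pd


def weighted_mean_by(df: pd.DataFrame, group: str, value: str, weight: str) -> pd.Series:
    """Weighted mean of `value` within each `group`; rows missing value or weight are skipped."""
    valid = df.dropna(subset=[value, weight])
    weighted_sum = (valid[value] * valid[weight]).groupby(valid[group]).sum()
    total_weight = valid[weight].groupby(valid[group]).sum()
    return weighted_sum / total_weight


# Toy data, not SHARE values
toy = pd.DataFrame(
    {
        "country": ["Italy", "Italy", "France", "France"],
        "work_horizon_change": [0.25, 3.75, 0.0, 0.5],
        "cs_weight": [3.0, 1.0, 1.0, 1.0],  # hypothetical weight column
    }
)
print(weighted_mean_by(toy, "country", "work_horizon_change", "cs_weight"))
# France 0.25, Italy 1.125
```

Cross-sectional weights suit per-wave descriptive statistics. Longitudinal weights suit analyses that follow the same respondents across waves.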


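```python
# Save the resulting dataset. The path assumes the notebook sits one folder below the
# repository root (the same assumption as src_path above); adjust if your layout differs.
output_path = os.path.join(src_path, "data", "datasets", "results", "share_clean_w46.csv")
df.to_csv(output_path, index=False)
```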