# Mental health (MH) of older workers, retirement age and working conditions

## Preprocessing SHARE data

Load libraries

In [1]:
import os
import sys

src_path = os.path.abspath("../")
sys.path.append(src_path)

from utils.common import *
from utils.retirement import *
from utils.share import *

import_libraries()

Preprocess **SHARELIFE data**: apply the initial filters and create the initial variables

In [2]:
file_names = ["cv_r.dta", "technical_variables.dta", "dn.dta", "re.dta"]
sharelife = import_share_stata1(file_names=file_names, waves=[7])

In [3]:
sharelife = sharelife_preprocessing(sharelife)

Initial n obs: 63248
Gender, country, 1st year in country - formatted, age 50+ filter - applied
N obs after processing gender and age: 56486
Years of education - calculated
N obs after processing education years: 56486
Current ISCO - identified
N obs after isco job changes: 42255
Years of contribution, 1st year of contribution - calculated
Those worked less than 10 years / started work before age of 10 - deleted
N obs after contribution years: 42255


Preprocess **additional data from SHARE waves 6-8** where ISCO codes are available

In [4]:
file_names = ["cv_r.dta", "dn.dta", "ep.dta"]
sharelife_add = import_share_stata1(
    file_names=file_names, waves=[6, 7, 8], convert_categoricals=False
)

In [5]:
sharelife_add = sharelife_add_preprocessing(sharelife_add, sharelife)

N obs initial: 192020
N obs dropping missing isco: 11541
N obs after drop already present in Sharelife: 7323
Gender, country, 1st year in country - formatted, age 50+ filter - applied
N obs after gender and age: 3422
Years of education - calculated
N obs after education: 3422
Current ISCO - identified, those changed job - deleted
N obs after job and isco: 2701
Years of contribution, 1st year of contribution - calculated
Those worked less than 10 years / started work before age of 10 - deleted
N obs after contribution years: 2701


In [6]:
# Concat main and additional datasets
df = pd.concat([sharelife, sharelife_add], axis=0).reset_index(drop=True)

Preprocess **main data from SHARE waves 4 and 6**

In [7]:
file_names = ["cv_r.dta", "dn.dta", "ep.dta", "ch.dta", "gv_health.dta", "as.dta"]
share = import_share_stata1(
    file_names=file_names, waves=[4, 6], convert_categoricals=True
)

In [8]:
share = share_preprocessing(share, df)

Initial n obs: 126085
Those without ISCO codes - deleted
N obs with ISCO: 49238
N obs after age calculation: 49238
N obs after defining number of children: 49238
Current year, age, number of children and living with a partner - imputed
N obs after defining industry: 49238
Job status, industry of employment - added
N obs after defining finance: 49238
Household income, investments, life insurance - added
N obs after dropping missing sphus:49165
N obs after dropping missing chronic:49164
N obs after dropping missing eurod:47618
Physical and mental health indicators - added
N obs after health: 47618


In [9]:
# Merge with Sharelife data
df = share.merge(df, on=["mergeid"], how="left")
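
The left merge above keeps every row of `share`, and the row count reported at the start of the final preprocessing step (47618) matches the count after the health filters. A minimal sketch of how such a merge could be guarded, assuming the right-hand frame holds at most one row per `mergeid` (an assumption, not confirmed by the notebook). The toy frames, their values and the `wave` and `yrs_educ` columns are hypothetical.

```python
import pandas as pd

# Toy stand-ins for `share` (left) and the SHARELIFE-based `df` (right).
# All values and the non-key columns are hypothetical.
share_toy = pd.DataFrame(
    {"mergeid": ["AT-001", "AT-001", "BE-002", "CZ-003"], "wave": [4, 6, 4, 6]}
)
life_toy = pd.DataFrame({"mergeid": ["AT-001", "BE-002"], "yrs_educ": [12, 9]})

merged = share_toy.merge(
    life_toy,
    on="mergeid",
    how="left",
    validate="many_to_one",  # raises MergeError if a right-hand key repeats
    indicator=True,          # adds a `_merge` column: "both" or "left_only"
)

# A left merge with unique right-hand keys must not change the row count.
n_rows_kept = len(merged) == len(share_toy)
# Rows of the left frame that found no SHARELIFE match.
n_unmatched = int((merged["_merge"] == "left_only").sum())
print(n_rows_kept, n_unmatched)
```

With `validate` set, a duplicated key on the right raises an error instead of silently multiplying rows, and the `_merge` column shows how many observations lack job-history information before later filters drop them.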

Final preprocessing for **full SHARE dataset**

In [10]:
df = share_final_preprocessing(df)

N obs initial: 47618
Current years of contribution - calculated
Data types - corrected
N obs after data types: 47618
N obs retirement age (and filter to be under it): 47618
Retirement age, work horizon and work horizon change by reforms - calculated
N obs after work horizon change: 6058
Longitudinal and crossectional weights - added
N obs after weights: 6058


In [21]:
df.duplicated().sum()

0

In [22]:
df.country.unique()

array(['Austria', 'Belgium', 'Czech Republic', 'Switzerland', 'Germany',
       'Denmark', 'Estonia', 'Spain', 'France', 'Italy', 'Sweden',
       'Slovenia'], dtype=object)

In [23]:
df.mergeid.nunique()

3029
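
The final log reports 6058 rows and the cell above 3029 distinct `mergeid` values, an average of exactly two rows per respondent, which is what a balanced two-wave panel would give. The notebook does not show whether every respondent really has exactly two rows. A sketch of that check on toy data, where the `wave` column and all values are hypothetical:

```python
import pandas as pd

# Hypothetical toy panel: two respondents observed twice, one only once.
panel = pd.DataFrame(
    {
        "mergeid": ["AT-001", "AT-001", "BE-002", "BE-002", "CZ-003"],
        "wave": [4, 6, 4, 6, 4],
    }
)

# Number of rows per respondent.
rows_per_person = panel.groupby("mergeid").size()
# Respondents that break the "exactly two rows" pattern.
unbalanced_ids = rows_per_person[rows_per_person != 2].index.tolist()
print(unbalanced_ids)
```

An empty list on the real data would confirm a balanced panel.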

In [17]:
df.groupby("country").work_horizon_change.describe()

| country | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Austria | 604.0 | 1.993377 | 0.114992 | 0.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| Belgium | 1034.0 | 1.194391 | 1.485466 | 0.0 | 0.0 | 0.0 | 2.0 | 5.0 |
| Czech Republic | 648.0 | 4.03034 | 0.629199 | 2.0 | 4.0 | 4.0 | 4.0 | 8.0 |
| Denmark | 354.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Estonia | 876.0 | 1.496575 | 0.071632 | 0.0 | 1.5 | 1.5 | 1.5 | 1.5 |
| France | 766.0 | 0.170444 | 0.221876 | 0.0 | 0.0 | 0.0 | 0.34 | 1.0 |
| Germany | 32.0 | 0.031875 | 0.067415 | 0.0 | 0.0 | 0.0 | 0.0 | 0.17 |
| Italy | 196.0 | 2.690816 | 1.449851 | 0.0 | 1.6 | 3.6 | 3.75 | 4.75 |
| Slovenia | 580.0 | 2.296552 | 0.671391 | 2.0 | 2.0 | 2.0 | 2.0 | 4.0 |
| Spain | 76.0 | 0.217105 | 0.08507 | 0.0 | 0.25 | 0.25 | 0.25 | 0.25 |


In [25]:
df.to_csv(
    "/Users/alexandralugova/Documents/GitHub/MH-old-workers/data/datasets/results/share_clean_w46.csv",
    index=False,
)  # Save resulting dataset
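
The save call above uses an absolute path that only resolves on one machine. A minimal sketch of a portable alternative, assuming the results folder sits at `data/datasets/results` under a project root. Here the root is a temporary directory so the example runs anywhere; in the notebook it might instead be derived from `src_path`, which is an assumption about the repository layout.

```python
from pathlib import Path
import tempfile

import pandas as pd

# Hypothetical project root; a temporary directory keeps the sketch runnable.
project_root = Path(tempfile.mkdtemp())
out_dir = project_root / "data" / "datasets" / "results"
out_dir.mkdir(parents=True, exist_ok=True)  # create missing folders

# Toy frame standing in for the cleaned dataset (values are hypothetical).
toy = pd.DataFrame({"mergeid": ["AT-001"], "work_horizon_change": [2.0]})
out_path = out_dir / "share_clean_w46.csv"
toy.to_csv(out_path, index=False)

# Read the file back to confirm the round trip.
reloaded = pd.read_csv(out_path)
print(reloaded.shape)
```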