This notebook checks the microdata harmonized across 2000, 2005, 2010, 2014, and 2021-22 (generated from `khm_dhsYY_microdata_hmn.do`). Specifically, it confirms that, for 2005, 2010, and 2014, only the drinking water indicator differs from a previous version that was harmonized across those three years only.

In [1]:
from pathlib import Path
import pandas as pd
import numpy as np

code_path = Path(r'C:\Users\tianc\OneDrive\Documents\SIG\DISES\code\MPI')
datafd_path = code_path.parent.parent / 'data' / 'MPI'

In [2]:
for year in ['05', '10', '14']:
    print(year)
    # Read in the two versions of the microdata
    ## previous version, harmonized across 2005, 2010, and 2014 only
    df = pd.read_stata(datafd_path / f'khm_dhs{year}_cot_nowall' / f'khm_dhs{year}.dta')
    ## new version, harmonized across all years
    df_hmn = pd.read_stata(datafd_path / 'khm_hmn' / f'khm_dhs{year}.dta')
    
    # Drop the drinking water indicator columns from both versions
    df_bulk = df.drop(columns=['d_wtr', 'd_wtr_01'])
    df_hmn_bulk = df_hmn.drop(columns=['d_wtr', 'd_wtr_01'])
    # Everything else must be identical across the two versions (raises AssertionError otherwise)
    pd.testing.assert_frame_equal(df_bulk, df_hmn_bulk)

    # Difference in the drinking water indicator.
    # The harmonized version is a bit more deprived, as expected, since protected
    # spring is recategorized as non-improved for harmonization purposes.
    print(
        f'Before harmonization: {df.d_wtr.mean()}\n'
        f'After harmonization: {df_hmn.d_wtr.mean()}'
    )
    print()

05
Before harmonization: 0.5159689313854695
After harmonization: 0.5204834141827664

10
Before harmonization: 0.44904720256552966
After harmonization: 0.45180355675704326

14
Before harmonization: 0.3581239041496201
After harmonization: 0.361901978792686
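
The check above relies on knowing in advance which columns are expected to differ. As a complement, the differing columns can be discovered rather than assumed. The following is a minimal sketch on toy data, not part of the original workflow: the helper `differing_columns` and the columns `hh_id` and `d_elct` are hypothetical names introduced for illustration, and only `d_wtr` comes from the notebook.

```python
import pandas as pd


def differing_columns(left, right):
    """Return the shared columns whose values differ between two frames.

    Hypothetical helper for illustration; assumes both frames have the
    same row order and index.
    """
    shared = [c for c in left.columns if c in right.columns]
    return [c for c in shared if not left[c].equals(right[c])]


# Toy stand-ins for the two microdata versions (made-up values)
old = pd.DataFrame({'hh_id': [1, 2, 3], 'd_wtr': [0, 1, 0], 'd_elct': [1, 1, 0]})
new = pd.DataFrame({'hh_id': [1, 2, 3], 'd_wtr': [1, 1, 0], 'd_elct': [1, 1, 0]})

# Discover which columns changed instead of hard-coding them
changed = differing_columns(old, new)
print(changed)

# The remaining columns should then pass the same equality check used above
pd.testing.assert_frame_equal(old.drop(columns=changed), new.drop(columns=changed))
```

If this were applied to the real files, the expectation from the check above is that only the drinking water columns would be reported; any other column name would point to an unintended change.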

