<a href="https://colab.research.google.com/github/fayshaw/data_preprocessing/blob/main/data_preprocessing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Preprocessing Workshop
## LivWell Dataset: Women and their Well-being for 52 Countries
### Women in Data and PyLadies Boston
August 21, 2025

Together, we will explore the LivWell dataset from Belmin et al.'s 2022 paper in *Scientific Data*, <a href="https://www.nature.com/articles/s41597-022-01824-2">LivWell: a sub-national Dataset on the Living Conditions of Women and their Well-being for 52 Countries</a>. The authors aggregated Demographic and Health Surveys (DHS) data into a longitudinal dataset for subnational regions. Much of their work went into geographically harmonizing region boundaries.

We will wrangle some raw data into a form closer to their published dataset.

Step 1: look at the data!

1. Open the LivWell dataset
2. Look at the raw data
3. Try to get the raw data into a usable form

## Read files

In [None]:
import pandas as pd

from google.colab import files
uploaded = files.upload()

Saving GDL-Mean-International-Wealth-Index-(IWI)-score-of-region-data.csv to GDL-Mean-International-Wealth-Index-(IWI)-score-of-region-data.csv


### Read raw demographics file

Note the empty (NaN) rows above the real header (row 2) and the indicator definitions appended at the bottom as a footer.

In [None]:
demographics = pd.read_csv("STATcompilerExport_demographic.csv")
demographics

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9
0,,,,,,,,,,
1,,,,,,,,,,
2,Country,Survey,Characteristic,Current marital status [Women]: Never married,Current marital status [Women]: Married,Median age at first marriage [Women]: 25-49,Age difference between man and woman is 10+ years,Age difference between man and woman is 5-9 years,Age difference between man and woman is <5 years,Man and woman are the same age
3,Armenia,2015-16 DHS,Total,,,21.4,7.7,47.4,38.4,6
4,Armenia,2015-16 DHS,Total 15-49,29.9,63.3,,,,,
...,...,...,...,...,...,...,...,...,...,...
748,Age difference between man and woman is 10+ years,Percentage of currently married women age 15-2...,,,,,,,,
749,Age difference between man and woman is 5-9 years,Percentage of currently married women age 15-2...,,,,,,,,
750,Age difference between man and woman is <5 years,Percentage of currently married women age 15-2...,,,,,,,,
751,Man and woman are the same age,Percentage of currently married women age 15-2...,,,,,,,,


In [None]:
demographics = pd.read_csv("STATcompilerExport_demographic.csv", skiprows=3, skipfooter=9, engine="python")
demographics

Unnamed: 0,Country,Survey,Characteristic,Current marital status [Women]: Never married,Current marital status [Women]: Married,Median age at first marriage [Women]: 25-49,Age difference between man and woman is 10+ years,Age difference between man and woman is 5-9 years,Age difference between man and woman is <5 years,Man and woman are the same age
0,Armenia,2015-16 DHS,Total,,,21.4,7.7,47.4,38.4,6.0
1,Armenia,2015-16 DHS,Total 15-49,29.9,63.3,,,,,
2,Armenia,2015-16 DHS,Region : Aragatsotn,,,21.0,,,,
3,Armenia,2015-16 DHS,Region : Ararat,,,21.3,16.4,38.6,37.5,7.4
4,Armenia,2015-16 DHS,Region : Armavir,,,20.0,4.2,55.2,34.1,4.3
...,...,...,...,...,...,...,...,...,...,...
736,Vietnam,1997 DHS,Region : ..Central Highlands,,,21.9,0.0,40.8,48.8,10.4
737,Vietnam,1997 DHS,Region : ..Southeast,,,22.9,14.8,32.5,48.2,3.3
738,Vietnam,1997 DHS,Region : ..Mekong River Delta,,,20.9,5.2,22.1,45.7,19.3
739,Vietnam,1997 DHS,Region : North,,,21.0,3.8,23.8,56.8,9.0
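To see what `skiprows` and `skipfooter` do in isolation, here is a minimal sketch on a made-up miniature of the export (toy values, not the real file). The default C parser does not support `skipfooter`, which is why the cell above passes `engine="python"`.

```python
import io

import pandas as pd

# Toy export mimicking the STATcompiler layout: junk lines on top,
# then the real header, the data, and note lines at the bottom.
raw_text = """,,,
,,,
,,,
Country,Survey,Characteristic,Value
Armenia,2015-16 DHS,Region : Ararat,21.3
Armenia,2015-16 DHS,Region : Lori,21.3
Note: definitions,,,
Source: toy example,,,
"""

# skiprows=3 skips the three junk lines, so the fourth line becomes the header;
# skipfooter=2 drops the two note lines (this needs the Python engine).
df = pd.read_csv(io.StringIO(raw_text), skiprows=3, skipfooter=2, engine="python")
print(df)
```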


Filter to one country (Armenia)

Note the survey years: the 2015-16 survey spans two calendar years

In [None]:
armenia_raw = demographics[demographics['Country'] == "Armenia"].copy()
armenia_raw.head()

Unnamed: 0,Country,Survey,Characteristic,Current marital status [Women]: Never married,Current marital status [Women]: Married,Median age at first marriage [Women]: 25-49,Age difference between man and woman is 10+ years,Age difference between man and woman is 5-9 years,Age difference between man and woman is <5 years,Man and woman are the same age
0,Armenia,2015-16 DHS,Total,,,21.4,7.7,47.4,38.4,6.0
1,Armenia,2015-16 DHS,Total 15-49,29.9,63.3,,,,,
2,Armenia,2015-16 DHS,Region : Aragatsotn,,,21.0,,,,
3,Armenia,2015-16 DHS,Region : Ararat,,,21.3,16.4,38.6,37.5,7.4
4,Armenia,2015-16 DHS,Region : Armavir,,,20.0,4.2,55.2,34.1,4.3


In [None]:
set(armenia_raw['Survey'])

{'2000 DHS', '2005 DHS', '2010 DHS', '2015-16 DHS'}

In [None]:
armenia_df = armenia_raw.copy()
armenia_df[['year_text','source']] = armenia_raw.loc[:, 'Survey'].str.split(expand=True)
armenia_df

Unnamed: 0,Country,Survey,Characteristic,Current marital status [Women]: Never married,Current marital status [Women]: Married,Median age at first marriage [Women]: 25-49,Age difference between man and woman is 10+ years,Age difference between man and woman is 5-9 years,Age difference between man and woman is <5 years,Man and woman are the same age,year_text,source
0,Armenia,2015-16 DHS,Total,,,21.4,7.7,47.4,38.4,6.0,2015-16,DHS
1,Armenia,2015-16 DHS,Total 15-49,29.9,63.3,,,,,,2015-16,DHS
2,Armenia,2015-16 DHS,Region : Aragatsotn,,,21.0,,,,,2015-16,DHS
3,Armenia,2015-16 DHS,Region : Ararat,,,21.3,16.4,38.6,37.5,7.4,2015-16,DHS
4,Armenia,2015-16 DHS,Region : Armavir,,,20.0,4.2,55.2,34.1,4.3,2015-16,DHS
5,Armenia,2015-16 DHS,Region : Gegharkunik,,,20.1,4.0,58.7,31.4,6.0,2015-16,DHS
6,Armenia,2015-16 DHS,Region : Lori,,,21.3,15.7,31.2,49.2,3.9,2015-16,DHS
7,Armenia,2015-16 DHS,Region : Kotayk,,,20.7,1.9,40.4,49.4,6.4,2015-16,DHS
8,Armenia,2015-16 DHS,Region : Shirak,,,21.0,4.3,45.3,44.2,6.3,2015-16,DHS
9,Armenia,2015-16 DHS,Region : Syunik,,,20.8,,,,,2015-16,DHS


In [None]:
armenia_df['year'] = armenia_df['year_text'].str.split('-').str[0]
armenia_df

Unnamed: 0,Country,Survey,Characteristic,Current marital status [Women]: Never married,Current marital status [Women]: Married,Median age at first marriage [Women]: 25-49,Age difference between man and woman is 10+ years,Age difference between man and woman is 5-9 years,Age difference between man and woman is <5 years,Man and woman are the same age,year_text,source,year
0,Armenia,2015-16 DHS,Total,,,21.4,7.7,47.4,38.4,6.0,2015-16,DHS,2015
1,Armenia,2015-16 DHS,Total 15-49,29.9,63.3,,,,,,2015-16,DHS,2015
2,Armenia,2015-16 DHS,Region : Aragatsotn,,,21.0,,,,,2015-16,DHS,2015
3,Armenia,2015-16 DHS,Region : Ararat,,,21.3,16.4,38.6,37.5,7.4,2015-16,DHS,2015
4,Armenia,2015-16 DHS,Region : Armavir,,,20.0,4.2,55.2,34.1,4.3,2015-16,DHS,2015
5,Armenia,2015-16 DHS,Region : Gegharkunik,,,20.1,4.0,58.7,31.4,6.0,2015-16,DHS,2015
6,Armenia,2015-16 DHS,Region : Lori,,,21.3,15.7,31.2,49.2,3.9,2015-16,DHS,2015
7,Armenia,2015-16 DHS,Region : Kotayk,,,20.7,1.9,40.4,49.4,6.4,2015-16,DHS,2015
8,Armenia,2015-16 DHS,Region : Shirak,,,21.0,4.3,45.3,44.2,6.3,2015-16,DHS,2015
9,Armenia,2015-16 DHS,Region : Syunik,,,20.8,,,,,2015-16,DHS,2015
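The `year` column above is still text (e.g. `'2015'`). As an alternative sketch, `str.extract` with a regular expression can pull out the start year and the survey program in one step, then convert the year to an integer. The pattern assumes every label looks like `YYYY` or `YYYY-YY` followed by a program name.

```python
import pandas as pd

# Survey labels in the STATcompiler style (taken from the output above).
surveys = pd.Series(["2000 DHS", "2005 DHS", "2010 DHS", "2015-16 DHS"])

# One regex does both splits: a 4-digit start year, an optional "-16"-style
# suffix, whitespace, then the survey program name.
parts = surveys.str.extract(r"^(?P<year>\d{4})(?:-\d{2})?\s+(?P<source>\w+)$")
parts["year"] = parts["year"].astype(int)  # str.extract returns strings
print(parts)
```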


In [None]:
armenia_df['region'] = armenia_df.loc[:, 'Characteristic'].str.split(" : ").str[1]
armenia_df

Unnamed: 0,Country,Survey,Characteristic,Current marital status [Women]: Never married,Current marital status [Women]: Married,Median age at first marriage [Women]: 25-49,Age difference between man and woman is 10+ years,Age difference between man and woman is 5-9 years,Age difference between man and woman is <5 years,Man and woman are the same age,year_text,source,year,region
0,Armenia,2015-16 DHS,Total,,,21.4,7.7,47.4,38.4,6.0,2015-16,DHS,2015,
1,Armenia,2015-16 DHS,Total 15-49,29.9,63.3,,,,,,2015-16,DHS,2015,
2,Armenia,2015-16 DHS,Region : Aragatsotn,,,21.0,,,,,2015-16,DHS,2015,Aragatsotn
3,Armenia,2015-16 DHS,Region : Ararat,,,21.3,16.4,38.6,37.5,7.4,2015-16,DHS,2015,Ararat
4,Armenia,2015-16 DHS,Region : Armavir,,,20.0,4.2,55.2,34.1,4.3,2015-16,DHS,2015,Armavir
5,Armenia,2015-16 DHS,Region : Gegharkunik,,,20.1,4.0,58.7,31.4,6.0,2015-16,DHS,2015,Gegharkunik
6,Armenia,2015-16 DHS,Region : Lori,,,21.3,15.7,31.2,49.2,3.9,2015-16,DHS,2015,Lori
7,Armenia,2015-16 DHS,Region : Kotayk,,,20.7,1.9,40.4,49.4,6.4,2015-16,DHS,2015,Kotayk
8,Armenia,2015-16 DHS,Region : Shirak,,,21.0,4.3,45.3,44.2,6.3,2015-16,DHS,2015,Shirak
9,Armenia,2015-16 DHS,Region : Syunik,,,20.8,,,,,2015-16,DHS,2015,Syunik
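Rows without a region (`Total`) get `NaN` from the split. Some rows also prefix region names with dots, as in the Vietnam output (`Region : ..Central Highlands`), which presumably mark sub-regions. A sketch that strips them, assuming the dots carry no information you need to keep:

```python
import pandas as pd

characteristic = pd.Series([
    "Total",
    "Region : North",
    "Region : ..Central Highlands",
    "Region : ..Mekong River Delta",
])

# Splitting on " : " leaves NaN where there is no region part (e.g. "Total").
region = characteristic.str.split(" : ").str[1]
# Assumption: leading dots are only an indentation marker, so drop them.
region_clean = region.str.lstrip(".")
print(region_clean)
```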


In [None]:
# Drop the national "Total" row (index 57) so only regions remain
armenia_gdl.drop(index=57, inplace=True)

In [None]:
armenia_gdl.drop(columns=['Level'], inplace=True)

In [None]:
armenia_gdl

Unnamed: 0,Country,ISO_Code,GDLCODE,Region,2000,2010,2016
58,Armenia,ARM,ARMr101,Aragatsotn,55.4,68.4,83.0
59,Armenia,ARM,ARMr102,Ararat,66.6,76.6,84.8
60,Armenia,ARM,ARMr103,Armavir,64.0,72.3,81.1
61,Armenia,ARM,ARMr104,Gegharkunik,59.0,80.9,79.9
62,Armenia,ARM,ARMr106,Kotayk,75.1,84.2,89.0
63,Armenia,ARM,ARMr105,Lori,64.8,82.1,81.8
64,Armenia,ARM,ARMr107,Shirak,69.7,76.2,85.6
65,Armenia,ARM,ARMr108,Syunik,75.0,82.5,85.9
66,Armenia,ARM,ARMr110,Tavush,63.3,76.9,85.2
67,Armenia,ARM,ARMr109,Vayots Dzor,70.7,78.4,85.8
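Dropping by `index=57` depends on where Armenia's national row happens to sit in the GDL file. A sketch of a label-based alternative, on a toy copy of a few rows, keeps subnational rows via the `Level` column:

```python
import pandas as pd

# Toy rows shaped like the GDL file (values copied from the Armenia output above).
gdl_toy = pd.DataFrame({
    "Country": ["Armenia", "Armenia", "Armenia"],
    "Level": ["National", "Subnat", "Subnat"],
    "Region": ["Total", "Aragatsotn", "Ararat"],
    "2016": [86.2, 83.0, 84.8],
}, index=[57, 58, 59])

# Select rows by their label rather than by a hard-coded index, so the
# code still works if row positions in the file change.
subnat = gdl_toy[gdl_toy["Level"] == "Subnat"].drop(columns=["Level"])
print(subnat)
```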


Drop the national "Total" rows so that only regional rows remain

In [None]:
armenia_df = armenia_df[~armenia_df['Characteristic'].str.contains('Total')]
armenia_df

Unnamed: 0,Country,Survey,Characteristic,Current marital status [Women]: Never married,Current marital status [Women]: Married,Median age at first marriage [Women]: 25-49,Age difference between man and woman is 10+ years,Age difference between man and woman is 5-9 years,Age difference between man and woman is <5 years,Man and woman are the same age,year_text,source,year,region
2,Armenia,2015-16 DHS,Region : Aragatsotn,,,21.0,,,,,2015-16,DHS,2015,Aragatsotn
3,Armenia,2015-16 DHS,Region : Ararat,,,21.3,16.4,38.6,37.5,7.4,2015-16,DHS,2015,Ararat
4,Armenia,2015-16 DHS,Region : Armavir,,,20.0,4.2,55.2,34.1,4.3,2015-16,DHS,2015,Armavir
5,Armenia,2015-16 DHS,Region : Gegharkunik,,,20.1,4.0,58.7,31.4,6.0,2015-16,DHS,2015,Gegharkunik
6,Armenia,2015-16 DHS,Region : Lori,,,21.3,15.7,31.2,49.2,3.9,2015-16,DHS,2015,Lori
7,Armenia,2015-16 DHS,Region : Kotayk,,,20.7,1.9,40.4,49.4,6.4,2015-16,DHS,2015,Kotayk
8,Armenia,2015-16 DHS,Region : Shirak,,,21.0,4.3,45.3,44.2,6.3,2015-16,DHS,2015,Shirak
9,Armenia,2015-16 DHS,Region : Syunik,,,20.8,,,,,2015-16,DHS,2015,Syunik
10,Armenia,2015-16 DHS,Region : Vayots Dzor,,,20.2,9.2,31.7,56.4,2.6,2015-16,DHS,2015,Vayots Dzor
11,Armenia,2015-16 DHS,Region : Tavush,,,20.4,4.4,47.1,35.9,12.7,2015-16,DHS,2015,Tavush
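`str.contains('Total')` removes any row whose characteristic mentions "Total" anywhere. A sketch of a stricter filter, assuming every row you want to keep starts with the `Region : ` prefix:

```python
import pandas as pd

characteristic = pd.Series(["Total", "Total 15-49", "Region : Ararat", "Region : Lori"])

# Keep only rows that start with the region prefix, which states the intent
# directly instead of excluding anything that mentions "Total".
is_region = characteristic.str.startswith("Region : ")
print(characteristic[is_region].tolist())
```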


In [None]:
armenia_df = armenia_df.drop(columns=['Survey', 'Characteristic', 'year_text'])


In [None]:
rename_cols = {
    'Current marital status [Women]: Never married' : 'Never_married',
    'Current marital status [Women]: Married' : 'Married',
    'Median age at first marriage [Women]: 25-49' : 'Median_age_marriage',
    'Age difference between man and woman is 10+ years' : 'Age_diff_10+',
    'Age difference between man and woman is 5-9 years' : 'Age_diff_5-9',
    'Age difference between man and woman is <5 years'  : 'Age_diff_<5',
    'Man and woman are the same age' : 'Same_age'
}

In [None]:
# Rename the columns and keep the returned DataFrame
armenia_df = armenia_df.rename(columns=rename_cols)
armenia_df


In [None]:
armenia_df
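`DataFrame.rename` has two easy pitfalls: `mapper=` without `axis=` renames the index (row labels), and `inplace=True` returns `None`, so assigning its result replaces the DataFrame with `None`. A small sketch on a one-column toy DataFrame makes both visible:

```python
import pandas as pd

df = pd.DataFrame({"Man and woman are the same age": [6.0, 7.4]})
mapping = {"Man and woman are the same age": "Same_age"}

# mapper= targets the index unless axis="columns" is given,
# so the column keeps its old name.
unchanged = df.rename(mapper=mapping)
# inplace=True modifies the object and returns None.
returned = df.copy().rename(columns=mapping, inplace=True)
# Usual pattern: pass columns= and assign the returned copy.
renamed = df.rename(columns=mapping)
print(list(unchanged.columns), returned, list(renamed.columns))
```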

In [None]:
armenia_gdl.dropna()

Unnamed: 0,Country,ISO_Code,Level,GDLCODE,Region,1992,1993,1994,1995,1996,...,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
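`dropna()` works on rows by default, and every GDL row is missing at least one year, so the result above is empty. A sketch on a toy wide table compares the options:

```python
import numpy as np
import pandas as pd

# Toy wide table: one column per year, many of them empty, like the GDL export.
wide = pd.DataFrame({
    "Region": ["Aragatsotn", "Ararat"],
    "1999": [np.nan, np.nan],
    "2000": [55.4, 66.6],
    "2001": [np.nan, 70.0],
    "2016": [83.0, 84.8],
})

# Default axis=0: drop every row with at least one NaN, so nothing survives.
print(wide.dropna().shape)
# axis=1: drop every column that contains any NaN.
print(wide.dropna(axis=1).columns.tolist())
# how="all": drop a column only if it is entirely empty.
print(wide.dropna(axis=1, how="all").columns.tolist())
```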


In [None]:
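# data_only is an openpyxl load_workbook option, not a pd.read_excel argument,
# hence the TypeError below
pd.read_excel("/content/drive/MyDrive/PyLadies/Colab Notebooks/STATcompilerExport_education.xlsx", engine="openpyxl", data_only=True)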

TypeError: read_excel() got an unexpected keyword argument 'data_only'

In [None]:
import requests

url = "https://gitlab.pik-potsdam.de/belmin/livwelldata-paper/-/raw/main/analysis/data/raw_data/validation_data/STATcompilerExport_education.xlsx"
with open("temp.xlsx", "wb") as f:
    f.write(requests.get(url).content)

df = pd.read_excel("temp.xlsx")

TypeError: expected <class 'int'>

In [None]:
# engine="odf" reads OpenDocument files (e.g. .ods), not .xlsx, and needs the odfpy package
pd.read_excel("/content/drive/MyDrive/PyLadies/Colab Notebooks/STATcompilerExport_education.xlsx", engine="odf")

ImportError: Missing optional dependency 'odfpy'.  Use pip or conda to install odfpy.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
armenia_gdl = armenia_gdl.dropna(axis=1)
armenia_gdl

Unnamed: 0,Country,ISO_Code,Level,GDLCODE,Region,2000,2010,2016
57,Armenia,ARM,National,ARMt,Total,71.4,81.9,86.2
58,Armenia,ARM,Subnat,ARMr101,Aragatsotn,55.4,68.4,83.0
59,Armenia,ARM,Subnat,ARMr102,Ararat,66.6,76.6,84.8
60,Armenia,ARM,Subnat,ARMr103,Armavir,64.0,72.3,81.1
61,Armenia,ARM,Subnat,ARMr104,Gegharkunik,59.0,80.9,79.9
62,Armenia,ARM,Subnat,ARMr106,Kotayk,75.1,84.2,89.0
63,Armenia,ARM,Subnat,ARMr105,Lori,64.8,82.1,81.8
64,Armenia,ARM,Subnat,ARMr107,Shirak,69.7,76.2,85.6
65,Armenia,ARM,Subnat,ARMr108,Syunik,75.0,82.5,85.9
66,Armenia,ARM,Subnat,ARMr110,Tavush,63.3,76.9,85.2


In [None]:
import pandas as pd

In [None]:
armenia_gdl.melt(id_vars=['Country', 'ISO_Code','GDLCODE', 'Region'], var_name='year', value_name='iwi')

Unnamed: 0,Country,ISO_Code,GDLCODE,Region,year,iwi
0,Armenia,ARM,ARMr101,Aragatsotn,2000,55.4
1,Armenia,ARM,ARMr102,Ararat,2000,66.6
2,Armenia,ARM,ARMr103,Armavir,2000,64.0
3,Armenia,ARM,ARMr104,Gegharkunik,2000,59.0
4,Armenia,ARM,ARMr106,Kotayk,2000,75.1
5,Armenia,ARM,ARMr105,Lori,2000,64.8
6,Armenia,ARM,ARMr107,Shirak,2000,69.7
7,Armenia,ARM,ARMr108,Syunik,2000,75.0
8,Armenia,ARM,ARMr110,Tavush,2000,63.3
9,Armenia,ARM,ARMr109,Vayots Dzor,2000,70.7
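After melting, the `year` values are the old column names, i.e. strings. A sketch on a toy subset (values from the Armenia table) that also converts them to integers, matching the integer `year` column in LivWell; the `iwi` column name is our own choice:

```python
import pandas as pd

# Toy subset of the cleaned Armenia GDL table (values from the output above).
armenia_wide = pd.DataFrame({
    "GDLCODE": ["ARMr101", "ARMr102"],
    "Region": ["Aragatsotn", "Ararat"],
    "2000": [55.4, 66.6],
    "2010": [68.4, 76.6],
    "2016": [83.0, 84.8],
})

# Wide -> long: one row per region and year.
long = armenia_wide.melt(id_vars=["GDLCODE", "Region"], var_name="year", value_name="iwi")
# Year labels came from column names (strings); make them integers.
long["year"] = long["year"].astype(int)
long = long.sort_values(["Region", "year"]).reset_index(drop=True)
print(long)
```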


## Read files from a URL

In [None]:
livwell_df = pd.read_csv("https://zenodo.org/records/7277104/files/livwell.csv")
livwell_df

Unnamed: 0,country_name,country_code,year,region_num_harmonized,region_name_harmonized,SurveyId,interview_year_mean,interview_month_mean,CMC_interview_mean,DM_age_mean,...,drought_spei03_n1_share36,drought_spei03_n1_share60,drought_spei03_n1.5_share12,drought_spei03_n1.5_share36,drought_spei03_n1.5_share60,drought_spei03_n2_share12,drought_spei03_n2_share36,drought_spei03_n2_share60,hdi,gdp_pc
0,Armenia,ARM,2000,1,Aragatsotn,AM2000DHS,2000.0,11.0,1210.53,30.71,...,0.388889,0.316667,0.333333,0.250000,0.166667,0.083333,0.083333,0.050000,0.644083,2938.187500
1,Armenia,ARM,2000,2,Ararat,AM2000DHS,2000.0,11.0,1210.55,30.38,...,0.416667,0.316667,0.333333,0.277778,0.233333,0.083333,0.083333,0.050000,0.644127,3053.040283
2,Armenia,ARM,2000,3,Armavir,AM2000DHS,2000.0,10.0,1210.43,31.10,...,0.361111,0.300000,0.333333,0.250000,0.166667,0.083333,0.083333,0.050000,0.644415,3003.245605
3,Armenia,ARM,2000,4,Gegharkunik,AM2000DHS,2000.0,11.0,1210.58,30.65,...,0.416667,0.316667,0.250000,0.194444,0.166667,0.083333,0.083333,0.050000,0.643942,2945.085449
4,Armenia,ARM,2000,5,Lori,AM2000DHS,2000.0,10.0,1210.43,31.57,...,0.388889,0.316667,0.333333,0.222222,0.150000,0.083333,0.083333,0.050000,0.645256,2925.469727
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1827,Zimbabwe,ZWE,2015,6,Matabeleland South,ZW2015DHS,2015.0,9.0,1389.00,27.65,...,0.388889,0.333333,0.333333,0.250000,0.216667,0.250000,0.083333,0.066667,0.516884,1864.769000
1828,Zimbabwe,ZWE,2015,7,Midlands,ZW2015DHS,2015.0,9.0,1388.60,27.89,...,0.388889,0.316667,0.250000,0.138889,0.150000,0.250000,0.083333,0.050000,0.516000,1687.976000
1829,Zimbabwe,ZWE,2015,8,Masvingo,ZW2015DHS,2015.0,9.0,1388.91,28.69,...,0.250000,0.216667,0.166667,0.055556,0.066667,0.083333,0.027778,0.016667,0.515188,1687.113000
1830,Zimbabwe,ZWE,2015,9,Harare/Chitungwiza,ZW2015DHS,2015.0,9.0,1388.71,28.67,...,0.416667,0.333333,0.416667,0.194444,0.116667,0.250000,0.083333,0.050000,0.516000,1687.976000


In [None]:
livwell_df.columns

Index(['country_name', 'country_code', 'year', 'region_num_harmonized',
       'region_name_harmonized', 'SurveyId', 'interview_year_mean',
       'interview_month_mean', 'CMC_interview_mean', 'DM_age_mean',
       ...
       'drought_spei03_n1_share36', 'drought_spei03_n1_share60',
       'drought_spei03_n1.5_share12', 'drought_spei03_n1.5_share36',
       'drought_spei03_n1.5_share60', 'drought_spei03_n2_share12',
       'drought_spei03_n2_share36', 'drought_spei03_n2_share60', 'hdi',
       'gdp_pc'],
      dtype='object', length=409)

In [None]:
countries = livwell_df['country_name'].unique()

print(f"{len(countries)} countries: ", countries)

52 countries:  ['Armenia' 'Burundi' 'Benin' 'Burkina Faso' 'Bangladesh' 'Bolivia'
 "Cote d'Ivoire" 'Cameroon' 'Congo Democratic Republic' 'Colombia' 'Egypt'
 'Ethiopia' 'Gabon' 'Ghana' 'Guinea' 'Guatemala' 'Honduras' 'Haiti'
 'Indonesia' 'India' 'Jordan' 'Kenya' 'Cambodia' 'Liberia' 'Lesotho'
 'Morocco' 'Madagascar' 'Maldives' 'Mali' 'Mozambique' 'Malawi' 'Namibia'
 'Niger' 'Nigeria' 'Nicaragua' 'Nepal' 'Pakistan' 'Peru' 'Philippines'
 'Rwanda' 'Senegal' 'Sierra Leone' 'Togo' 'Tajikistan' 'Timor-Leste'
 'Turkey' 'Tanzania' 'Uganda' 'Vietnam' 'South Africa' 'Zambia' 'Zimbabwe']
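A quick way to see how much each country contributes is to count its distinct survey years and harmonized regions. A sketch on toy rows with the same key columns as `livwell_df`:

```python
import pandas as pd

# Toy rows with the same key columns as livwell_df (values illustrative).
toy = pd.DataFrame({
    "country_name": ["Armenia", "Armenia", "Armenia", "Zimbabwe", "Zimbabwe"],
    "year": [2000, 2000, 2005, 2015, 2015],
    "region_num_harmonized": [1, 2, 1, 1, 2],
})

# Named aggregation: number of distinct years and regions per country.
summary = toy.groupby("country_name").agg(
    n_years=("year", "nunique"),
    n_regions=("region_num_harmonized", "nunique"),
)
print(summary)
```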


In [None]:
livwell_df.notnull().sum().to_frame().T

Unnamed: 0,country_name,country_code,year,region_num_harmonized,region_name_harmonized,SurveyId,interview_year_mean,interview_month_mean,CMC_interview_mean,DM_age_mean,...,drought_spei03_n1_share36,drought_spei03_n1_share60,drought_spei03_n1.5_share12,drought_spei03_n1.5_share36,drought_spei03_n1.5_share60,drought_spei03_n2_share12,drought_spei03_n2_share36,drought_spei03_n2_share60,hdi,gdp_pc
0,1832,1832,1832,1832,1832,1832,1824,1824,1824,1824,...,1699,1699,1699,1699,1699,1699,1699,1699,1578,1578


In [None]:
gdl = pd.read_csv("GDL-Mean-International-Wealth-Index-(IWI)-score-of-region-data.csv")
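# .copy() makes armenia_gdl an independent DataFrame, so later in-place drops
# do not trigger SettingWithCopyWarning
armenia_gdl = gdl[gdl['Country'] == 'Armenia'].copy()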
armenia_gdl

Unnamed: 0,Country,ISO_Code,Level,GDLCODE,Region,1992,1993,1994,1995,1996,...,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
57,Armenia,ARM,National,ARMt,Total,,,,,,...,,,,,,86.2,,,,
58,Armenia,ARM,Subnat,ARMr101,Aragatsotn,,,,,,...,,,,,,83.0,,,,
59,Armenia,ARM,Subnat,ARMr102,Ararat,,,,,,...,,,,,,84.8,,,,
60,Armenia,ARM,Subnat,ARMr103,Armavir,,,,,,...,,,,,,81.1,,,,
61,Armenia,ARM,Subnat,ARMr104,Gegharkunik,,,,,,...,,,,,,79.9,,,,
62,Armenia,ARM,Subnat,ARMr106,Kotayk,,,,,,...,,,,,,89.0,,,,
63,Armenia,ARM,Subnat,ARMr105,Lori,,,,,,...,,,,,,81.8,,,,
64,Armenia,ARM,Subnat,ARMr107,Shirak,,,,,,...,,,,,,85.6,,,,
65,Armenia,ARM,Subnat,ARMr108,Syunik,,,,,,...,,,,,,85.9,,,,
66,Armenia,ARM,Subnat,ARMr110,Tavush,,,,,,...,,,,,,85.2,,,,


In [None]:
livwell_df.head()

Unnamed: 0,country_name,country_code,year,region_num_harmonized,region_name_harmonized,SurveyId,interview_year_mean,interview_month_mean,CMC_interview_mean,DM_age_mean,...,drought_spei03_n1_share36,drought_spei03_n1_share60,drought_spei03_n1.5_share12,drought_spei03_n1.5_share36,drought_spei03_n1.5_share60,drought_spei03_n2_share12,drought_spei03_n2_share36,drought_spei03_n2_share60,hdi,gdp_pc
0,Armenia,ARM,2000,1,Aragatsotn,AM2000DHS,2000.0,11.0,1210.53,30.71,...,0.388889,0.316667,0.333333,0.25,0.166667,0.083333,0.083333,0.05,0.644083,2938.1875
1,Armenia,ARM,2000,2,Ararat,AM2000DHS,2000.0,11.0,1210.55,30.38,...,0.416667,0.316667,0.333333,0.277778,0.233333,0.083333,0.083333,0.05,0.644127,3053.040283
2,Armenia,ARM,2000,3,Armavir,AM2000DHS,2000.0,10.0,1210.43,31.1,...,0.361111,0.3,0.333333,0.25,0.166667,0.083333,0.083333,0.05,0.644415,3003.245605
3,Armenia,ARM,2000,4,Gegharkunik,AM2000DHS,2000.0,11.0,1210.58,30.65,...,0.416667,0.316667,0.25,0.194444,0.166667,0.083333,0.083333,0.05,0.643942,2945.085449
4,Armenia,ARM,2000,5,Lori,AM2000DHS,2000.0,10.0,1210.43,31.57,...,0.388889,0.316667,0.333333,0.222222,0.15,0.083333,0.083333,0.05,0.645256,2925.469727


In [None]:
livwell_df.describe()

Unnamed: 0,year,region_num_harmonized,interview_year_mean,interview_month_mean,CMC_interview_mean,DM_age_mean,DM_age_mean_se,DM_age_15.19_p,DM_age_15.19_p_se,DM_age_20.24_p,...,drought_spei03_n1_share36,drought_spei03_n1_share60,drought_spei03_n1.5_share12,drought_spei03_n1.5_share36,drought_spei03_n1.5_share60,drought_spei03_n2_share12,drought_spei03_n2_share36,drought_spei03_n2_share60,hdi,gdp_pc
count,1832.0,1832.0,1824.0,1824.0,1824.0,1824.0,1832.0,1824.0,1832.0,1824.0,...,1699.0,1699.0,1699.0,1699.0,1699.0,1699.0,1699.0,1699.0,1578.0,1578.0
mean,2005.924672,8.695415,2005.723136,7.54057,1276.217736,28.213701,0.425246,18.346595,1.216725,16.854419,...,0.190913,0.188248,0.078183,0.074259,0.071326,0.019914,0.018835,0.017785,0.532819,4044.719926
std,7.181717,10.345797,7.274086,3.26898,87.419613,2.38112,0.401818,7.013974,0.64413,3.704709,...,0.132136,0.113024,0.117895,0.077254,0.063944,0.060229,0.034432,0.027606,0.123282,3959.567698
min,1990.0,1.0,1990.0,1.0,1085.41,17.68,0.0,0.36,0.0,4.8,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.218,259.568787
25%,2000.0,3.0,2000.0,5.0,1204.115,27.68,0.24,16.52,0.83,14.98,...,0.083333,0.1,0.0,0.0,0.016667,0.0,0.0,0.0,0.441577,1645.63975
50%,2006.0,5.0,2006.0,8.0,1281.375,28.6,0.31,20.175,1.19,17.27,...,0.166667,0.183333,0.0,0.055556,0.066667,0.0,0.0,0.0,0.533,2658.960254
75%,2012.0,11.0,2012.0,10.0,1347.3125,29.68,0.41,22.89,1.54,19.2125,...,0.277778,0.266667,0.083333,0.111111,0.1,0.0,0.027778,0.033333,0.63292,5102.347
max,2019.0,93.0,2019.0,13.0,1436.54,33.58,4.62,33.21,9.31,34.33,...,0.722222,0.633333,0.833333,0.444444,0.416667,0.833333,0.361111,0.216667,0.926206,37156.628906


In [None]:
livwell_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1832 entries, 0 to 1831
Columns: 409 entries, country_name to gdp_pc
dtypes: float64(403), int64(2), object(4)
memory usage: 5.7+ MB


In [None]:
livwell_df.dtypes

Unnamed: 0,0
country_name,object
country_code,object
year,int64
region_num_harmonized,int64
region_name_harmonized,object
...,...
drought_spei03_n2_share12,float64
drought_spei03_n2_share36,float64
drought_spei03_n2_share60,float64
hdi,float64


In [None]:
livwell_df.shape

(1832, 409)

In [None]:
cols = ['country_name', 'country_code', 'year', 'region_num_harmonized']
livwell_df[cols]

Unnamed: 0,country_name,country_code,year,region_num_harmonized
0,Armenia,ARM,2000,1
1,Armenia,ARM,2000,2
2,Armenia,ARM,2000,3
3,Armenia,ARM,2000,4
4,Armenia,ARM,2000,5
...,...,...,...,...
1827,Zimbabwe,ZWE,2015,6
1828,Zimbabwe,ZWE,2015,7
1829,Zimbabwe,ZWE,2015,8
1830,Zimbabwe,ZWE,2015,9


In [None]:
livwell_df.isnull().sum()

Unnamed: 0,0
country_name,0
country_code,0
year,0
region_num_harmonized,0
region_name_harmonized,0
...,...
drought_spei03_n2_share12,133
drought_spei03_n2_share36,133
drought_spei03_n2_share60,133
hdi,254
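Raw counts are hard to compare across columns; the share of missing values is often easier to read (for example, `hdi` is missing in 254 of 1832 rows, about 14%). A sketch on toy data:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "country_name": ["Armenia", "Armenia", "Zimbabwe", "Zimbabwe"],
    "hdi": [0.644, np.nan, 0.516, np.nan],
    "EI_internet_day_p": [np.nan, np.nan, np.nan, 16.80],
})

# isnull().mean() is the fraction of missing values per column.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```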


In [None]:
livwell_df[livwell_df['country_name'] == "Armenia"]

Unnamed: 0,country_name,country_code,year,region_num_harmonized,region_name_harmonized,SurveyId,interview_year_mean,interview_month_mean,CMC_interview_mean,DM_age_mean,...,drought_spei03_n1_share36,drought_spei03_n1_share60,drought_spei03_n1.5_share12,drought_spei03_n1.5_share36,drought_spei03_n1.5_share60,drought_spei03_n2_share12,drought_spei03_n2_share36,drought_spei03_n2_share60,hdi,gdp_pc
0,Armenia,ARM,2000,1,Aragatsotn,AM2000DHS,2000.0,11.0,1210.53,30.71,...,0.388889,0.316667,0.333333,0.25,0.166667,0.083333,0.083333,0.05,0.644083,2938.1875
1,Armenia,ARM,2000,2,Ararat,AM2000DHS,2000.0,11.0,1210.55,30.38,...,0.416667,0.316667,0.333333,0.277778,0.233333,0.083333,0.083333,0.05,0.644127,3053.040283
2,Armenia,ARM,2000,3,Armavir,AM2000DHS,2000.0,10.0,1210.43,31.1,...,0.361111,0.3,0.333333,0.25,0.166667,0.083333,0.083333,0.05,0.644415,3003.245605
3,Armenia,ARM,2000,4,Gegharkunik,AM2000DHS,2000.0,11.0,1210.58,30.65,...,0.416667,0.316667,0.25,0.194444,0.166667,0.083333,0.083333,0.05,0.643942,2945.085449
4,Armenia,ARM,2000,5,Lori,AM2000DHS,2000.0,10.0,1210.43,31.57,...,0.388889,0.316667,0.333333,0.222222,0.15,0.083333,0.083333,0.05,0.645256,2925.469727
5,Armenia,ARM,2000,6,Kotayk,AM2000DHS,2000.0,10.0,1210.48,31.15,...,0.416667,0.316667,0.333333,0.277778,0.183333,0.083333,0.083333,0.05,0.644,2918.557617
6,Armenia,ARM,2000,7,Shirak,AM2000DHS,2000.0,10.0,1210.43,31.8,...,0.388889,0.35,0.333333,0.194444,0.133333,0.083333,0.083333,0.05,0.645674,3053.684326
7,Armenia,ARM,2000,8,Syunik,AM2000DHS,2000.0,10.0,1210.42,31.37,...,0.416667,0.316667,0.166667,0.222222,0.183333,0.083333,0.055556,0.066667,0.644479,3086.177002
8,Armenia,ARM,2000,9,Vayots Dzor,AM2000DHS,2000.0,10.0,1210.41,31.69,...,0.388889,0.3,0.333333,0.277778,0.216667,0.083333,0.194444,0.133333,0.643944,2969.26123
9,Armenia,ARM,2000,10,Tavush,AM2000DHS,2000.0,10.0,1210.46,31.28,...,0.416667,0.3,0.333333,0.194444,0.133333,0.083333,0.083333,0.05,0.644377,3000.003906


In [None]:
livwell_df.duplicated().sum()

np.int64(0)
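`duplicated()` compares whole rows, so a region-year that appears twice with slightly different values would not be flagged. A sketch that checks duplicates on key columns instead, assuming `country_code`, `year` and `region_num_harmonized` should identify a row uniquely:

```python
import pandas as pd

toy = pd.DataFrame({
    "country_code": ["ARM", "ARM", "ARM"],
    "year": [2000, 2000, 2000],
    "region_num_harmonized": [1, 2, 2],
    "hdi": [0.644083, 0.644127, 0.650000],
})

key = ["country_code", "year", "region_num_harmonized"]
# Whole-row check: the rows differ in hdi, so no exact duplicates.
print(toy.duplicated().sum())
# Key check: keep=False marks every row involved in a clash.
print(toy.duplicated(subset=key, keep=False).sum())
```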

In [None]:
pd.read_excel("https://gitlab.pik-potsdam.de/belmin/livwelldata-paper/-/raw/main/analysis/data/raw_data/validation_data/STATcompilerExport_education.xlsx")

TypeError: expected <class 'int'>

In [None]:
url = "https://gitlab.pik-potsdam.de/belmin/livwelldata-paper/-/raw/main/analysis/data/raw_data/populationWB.csv"
pd.read_csv(url)

Unnamed: 0,Country.Name,Country.Code,Indicator.Name,Indicator.Code,1960,1961,1962,1963,1964,1965,...,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
0,Aruba,ABW,"Population, total",SP.POP.TOTL,54208.0,55435.0,56226.0,56697.0,57029.0,57360.0,...,101218.0,101342.0,101416.0,101597.0,101936.0,102393.0,102921.0,103441.0,103889.0,
1,Andorra,AND,"Population, total",SP.POP.TOTL,13414.0,14376.0,15376.0,16410.0,17470.0,18551.0,...,84878.0,85616.0,85474.0,84419.0,82326.0,79316.0,75902.0,72786.0,70473.0,
2,Afghanistan,AFG,"Population, total",SP.POP.TOTL,8994793.0,9164945.0,9343772.0,9531555.0,9728645.0,9935358.0,...,25877544.0,26528741.0,27207291.0,27962207.0,28809167.0,29726803.0,30682500.0,31627506.0,32526562.0,
3,Angola,AGO,"Population, total",SP.POP.TOTL,5270844.0,5367287.0,5465905.0,5565808.0,5665701.0,5765025.0,...,19183907.0,19842251.0,20520103.0,21219954.0,21942296.0,22685632.0,23448202.0,24227524.0,25021974.0,
4,Albania,ALB,"Population, total",SP.POP.TOTL,1608800.0,1659800.0,1711319.0,1762621.0,1814135.0,1864791.0,...,2970017.0,2947314.0,2927519.0,2913021.0,2904780.0,2900247.0,2896652.0,2893654.0,2889167.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
259,"Yemen, Rep.",YEM,"Population, total",SP.POP.TOTL,5166311.0,5251663.0,5339285.0,5429501.0,5522690.0,5619170.0,...,21701105.0,22322699.0,22954226.0,23591972.0,24234940.0,24882792.0,25533217.0,26183676.0,26832215.0,
260,South Africa,ZAF,"Population, total",SP.POP.TOTL,17396000.0,17949962.0,18459442.0,18936138.0,19390554.0,19832000.0,...,48596781.0,49296223.0,50020918.0,50771826.0,51549958.0,52356381.0,53192216.0,54058647.0,54956920.0,
261,"Congo, Dem. Rep.",COD,"Population, total",SP.POP.TOTL,15248246.0,15637715.0,16041247.0,16461914.0,16903899.0,17369859.0,...,59834875.0,61809278.0,63845097.0,65938712.0,68087376.0,70291160.0,72552861.0,74877030.0,77266814.0,
262,Zambia,ZMB,"Population, total",SP.POP.TOTL,3049586.0,3142848.0,3240664.0,3342894.0,3449266.0,3559687.0,...,12738676.0,13114579.0,13507849.0,13917439.0,14343526.0,14786581.0,15246086.0,15721343.0,16211767.0,


In [None]:
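# "/-/blob/" points at GitLab's HTML viewer page rather than the file itself,
# so pandas cannot detect an Excel format; raw files live under "/-/raw/"
url = "https://gitlab.pik-potsdam.de/belmin/livwelldata-paper/-/blob/main/analysis/data/raw_data/validation_data/STATcompilerExport_energy_information.xlsx"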
pd.read_excel(url)

ValueError: Excel file format cannot be determined, you must specify an engine manually.
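A hypothetical helper that rewrites a GitLab `/-/blob/` page URL into the matching `/-/raw/` file URL. Even with a raw URL, another STATcompiler `.xlsx` export from this repository failed to parse above (`TypeError: expected <class 'int'>`), so this alone may not be enough:

```python
def gitlab_raw_url(url: str) -> str:
    """Hypothetical helper: swap GitLab's "/-/blob/" viewer path for "/-/raw/".

    It only rewrites the first occurrence of the path segment and does not
    check that the file exists.
    """
    return url.replace("/-/blob/", "/-/raw/", 1)


blob = ("https://gitlab.pik-potsdam.de/belmin/livwelldata-paper/-/blob/main/"
        "analysis/data/raw_data/validation_data/STATcompilerExport_energy_information.xlsx")
raw = gitlab_raw_url(blob)
print(raw)
```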

In [None]:
livwell_df['EI_internet_day_p']

Unnamed: 0,EI_internet_day_p
0,
1,
2,
3,
4,
...,...
1827,16.80
1828,13.36
1829,9.39
1830,36.96


In [None]:
livwell_lin_df = pd.read_csv("https://zenodo.org/records/7277104/files/livwell_lin_interpolated.csv")
livwell_lin_df

Unnamed: 0,country_name,country_code,year,region_num_harmonized,region_name_harmonized,SurveyId,interview_year_mean,interview_month_mean,CMC_interview_mean,DM_age_15.19_p,...,EI_computer_p,EI_computer_p_se,DP_decide_no_contraception_p,DP_decide_no_contraception_p_se,EI_internet_day_p,EI_internet_day_p_se,EI_internet_week_p,EI_internet_week_p_se,EI_mobile_p,EI_mobile_p_se
0,Colombia,COL,1990,1,Atlantica,CO1990DHS,1990.0,7.0,1086.75,21.52,...,,,,,,,,,,
1,Colombia,COL,1990,2,Oriental,CO1990DHS,1990.0,7.0,1086.93,22.84,...,,,,,,,,,,
2,Colombia,COL,1990,3,Central,CO1990DHS,1990.0,7.0,1086.76,19.49,...,,,,,,,,,,
3,Colombia,COL,1990,4,Pacifica,CO1990DHS,1990.0,7.0,1086.59,21.86,...,,,,,,,,,,
4,Colombia,COL,1990,5,Bogota,CO1990DHS,1990.0,5.0,1085.41,19.51,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7365,Senegal,SEN,2019,4,Kolda + Ziguinchor + Sedhiou (South),SN2019DHS,2019.0,8.0,1435.61,24.41,...,10.67,2.09,76.89,2.18,13.12,1.77,23.30,2.13,55.54,2.54
7366,Sierra Leone,SLE,2019,1,Eastern,SL2019DHS,2019.0,7.0,1434.66,21.11,...,2.06,0.34,44.78,2.50,3.01,0.50,5.56,0.62,33.34,1.65
7367,Sierra Leone,SLE,2019,2,Northern + North Western,SL2019DHS,2019.0,7.0,1434.77,22.77,...,2.36,0.30,52.67,2.31,3.43,0.41,5.44,0.54,34.53,1.36
7368,Sierra Leone,SLE,2019,3,Southern,SL2019DHS,2019.0,7.0,1434.62,21.73,...,2.33,0.36,63.38,2.04,5.50,0.50,7.19,0.66,32.44,1.54


In [None]:
indicators_df = pd.read_csv("https://zenodo.org/records/7277104/files/indicators.csv")
indicators_df

Unnamed: 0,indicator_category,indicator_code,indicator_description
0,Individual demographic information,DM_age_mean,Average age of women
1,Individual demographic information,DM_age_15-19_p,Women in age category 15-19 (%)
2,Individual demographic information,DM_age_20-24_p,Women in age category 20-24 (%)
3,Individual demographic information,DM_age_25-29_p,Women in age category 25-29 (%)
4,Individual demographic information,DM_age_30-34_p,Women in age category 30-34 (%)
...,...,...,...
260,Drought,drought_spei03_n1.5_share36,Share of months in the past 36 months with dro...
261,Drought,drought_spei03_n1.5_share60,Share of months in the past 60 months with dro...
262,Drought,drought_spei03_n2_share12,Share of months in the past 12 months with dro...
263,Drought,drought_spei03_n2_share36,Share of months in the past 36 months with dro...


In [None]:
livwell_lin_df.describe()

Unnamed: 0,year,region_num_harmonized,interview_year_mean,interview_month_mean,CMC_interview_mean,DM_age_15.19_p,DM_age_15.19_p_se,DM_age_20.24_p,DM_age_20.24_p_se,DM_age_25.29_p,...,EI_computer_p,EI_computer_p_se,DP_decide_no_contraception_p,DP_decide_no_contraception_p_se,EI_internet_day_p,EI_internet_day_p_se,EI_internet_week_p,EI_internet_week_p_se,EI_mobile_p,EI_mobile_p_se
count,7370.0,7370.0,1824.0,1824.0,1824.0,7322.0,7370.0,7322.0,7370.0,7322.0,...,222.0,222.0,206.0,206.0,211.0,211.0,211.0,211.0,216.0,216.0
mean,2005.662822,7.895929,2005.723136,7.54057,1276.217736,18.330098,1.222957,16.999084,1.234551,16.169934,...,15.772973,1.394955,33.876602,2.769515,17.39545,1.473602,24.755308,1.74,63.19,2.133102
std,6.248214,8.938954,7.274086,3.26898,87.419613,6.916951,0.615721,3.556045,0.505939,2.251282,...,18.211635,0.853807,17.430315,1.155579,18.324917,0.934186,22.842431,0.910105,20.510084,1.013954
min,1990.0,1.0,1990.0,1.0,1085.41,0.36,0.0,4.8,0.0,8.82,...,0.14,0.1,1.25,0.39,0.35,0.15,0.69,0.21,10.12,0.33
25%,2001.0,3.0,2000.0,5.0,1204.115,16.55,0.836719,15.400417,0.91,14.74,...,3.055,0.7025,21.7175,1.985,3.0,0.66,5.375,1.015,50.9075,1.4875
50%,2006.0,5.0,2006.0,8.0,1281.375,20.25,1.21,17.45667,1.176667,16.118,...,7.97,1.24,33.22,2.605,10.23,1.35,16.48,1.68,62.14,1.95
75%,2011.0,10.0,2012.0,10.0,1347.3125,22.854217,1.546,19.262,1.466667,17.68,...,21.8675,1.9975,43.7475,3.4575,26.66,2.025,40.715,2.32,80.0175,2.6
max,2019.0,93.0,2019.0,13.0,1436.54,33.21,9.31,34.33,7.31,27.67,...,86.85,3.69,76.89,7.35,78.2,4.13,89.16,4.49,99.37,6.8


In [None]:
livwell_df.describe()

Unnamed: 0,year,region_num_harmonized,interview_year_mean,interview_month_mean,CMC_interview_mean,DM_age_mean,DM_age_mean_se,DM_age_15.19_p,DM_age_15.19_p_se,DM_age_20.24_p,...,drought_spei03_n1_share36,drought_spei03_n1_share60,drought_spei03_n1.5_share12,drought_spei03_n1.5_share36,drought_spei03_n1.5_share60,drought_spei03_n2_share12,drought_spei03_n2_share36,drought_spei03_n2_share60,hdi,gdp_pc
count,1832.0,1832.0,1824.0,1824.0,1824.0,1824.0,1832.0,1824.0,1832.0,1824.0,...,1699.0,1699.0,1699.0,1699.0,1699.0,1699.0,1699.0,1699.0,1578.0,1578.0
mean,2005.924672,8.695415,2005.723136,7.54057,1276.217736,28.213701,0.425246,18.346595,1.216725,16.854419,...,0.190913,0.188248,0.078183,0.074259,0.071326,0.019914,0.018835,0.017785,0.532819,4044.719926
std,7.181717,10.345797,7.274086,3.26898,87.419613,2.38112,0.401818,7.013974,0.64413,3.704709,...,0.132136,0.113024,0.117895,0.077254,0.063944,0.060229,0.034432,0.027606,0.123282,3959.567698
min,1990.0,1.0,1990.0,1.0,1085.41,17.68,0.0,0.36,0.0,4.8,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.218,259.568787
25%,2000.0,3.0,2000.0,5.0,1204.115,27.68,0.24,16.52,0.83,14.98,...,0.083333,0.1,0.0,0.0,0.016667,0.0,0.0,0.0,0.441577,1645.63975
50%,2006.0,5.0,2006.0,8.0,1281.375,28.6,0.31,20.175,1.19,17.27,...,0.166667,0.183333,0.0,0.055556,0.066667,0.0,0.0,0.0,0.533,2658.960254
75%,2012.0,11.0,2012.0,10.0,1347.3125,29.68,0.41,22.89,1.54,19.2125,...,0.277778,0.266667,0.083333,0.111111,0.1,0.0,0.027778,0.033333,0.63292,5102.347
max,2019.0,93.0,2019.0,13.0,1436.54,33.58,4.62,33.21,9.31,34.33,...,0.722222,0.633333,0.833333,0.444444,0.416667,0.833333,0.361111,0.216667,0.926206,37156.628906


In [None]:
livwell_df['year'].min()

1990

In [None]:
set(livwell_df['year'])

{1990,
 1991,
 1992,
 1993,
 1994,
 1995,
 1996,
 1997,
 1998,
 1999,
 2000,
 2001,
 2002,
 2003,
 2004,
 2005,
 2006,
 2007,
 2008,
 2009,
 2010,
 2011,
 2012,
 2013,
 2014,
 2015,
 2016,
 2017,
 2018,
 2019}
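A `set` prints one year per line and hides how many regions each year contributes. A minimal alternative, shown on a toy frame: `drop_duplicates().sort_values()` for the distinct years, `value_counts().sort_index()` for rows per year, and `agg(['min', 'max'])` for the range in one call.

```python
import pandas as pd

# Toy stand-in for livwell_df: one row per region and survey year.
df = pd.DataFrame({"year": [1990, 1990, 2000, 2000, 2000, 2019]})

# Distinct years in ascending order, as a plain Python list.
years = df["year"].drop_duplicates().sort_values().tolist()
print(years)

# Number of region rows per year, ordered by year rather than by count.
rows_per_year = df["year"].value_counts().sort_index()
print(rows_per_year)

# First and last year together.
year_range = df["year"].agg(["min", "max"]).tolist()
print(year_range)
```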

In [None]:
livwell_lin_df

Unnamed: 0,country_name,country_code,year,region_num_harmonized,region_name_harmonized,SurveyId,interview_year_mean,interview_month_mean,CMC_interview_mean,DM_age_15.19_p,...,EI_computer_p,EI_computer_p_se,DP_decide_no_contraception_p,DP_decide_no_contraception_p_se,EI_internet_day_p,EI_internet_day_p_se,EI_internet_week_p,EI_internet_week_p_se,EI_mobile_p,EI_mobile_p_se
0,Colombia,COL,1990,1,Atlantica,CO1990DHS,1990.0,7.0,1086.75,21.52,...,,,,,,,,,,
1,Colombia,COL,1990,2,Oriental,CO1990DHS,1990.0,7.0,1086.93,22.84,...,,,,,,,,,,
2,Colombia,COL,1990,3,Central,CO1990DHS,1990.0,7.0,1086.76,19.49,...,,,,,,,,,,
3,Colombia,COL,1990,4,Pacifica,CO1990DHS,1990.0,7.0,1086.59,21.86,...,,,,,,,,,,
4,Colombia,COL,1990,5,Bogota,CO1990DHS,1990.0,5.0,1085.41,19.51,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7365,Senegal,SEN,2019,4,Kolda + Ziguinchor + Sedhiou (South),SN2019DHS,2019.0,8.0,1435.61,24.41,...,10.67,2.09,76.89,2.18,13.12,1.77,23.30,2.13,55.54,2.54
7366,Sierra Leone,SLE,2019,1,Eastern,SL2019DHS,2019.0,7.0,1434.66,21.11,...,2.06,0.34,44.78,2.50,3.01,0.50,5.56,0.62,33.34,1.65
7367,Sierra Leone,SLE,2019,2,Northern + North Western,SL2019DHS,2019.0,7.0,1434.77,22.77,...,2.36,0.30,52.67,2.31,3.43,0.41,5.44,0.54,34.53,1.36
7368,Sierra Leone,SLE,2019,3,Southern,SL2019DHS,2019.0,7.0,1434.62,21.73,...,2.33,0.36,63.38,2.04,5.50,0.50,7.19,0.66,32.44,1.54


In [None]:
livwell_df[livwell_df['year'] > 2005]

Unnamed: 0,country_name,country_code,year,region_num_harmonized,region_name_harmonized,SurveyId,interview_year_mean,interview_month_mean,CMC_interview_mean,DM_age_mean,...,drought_spei03_n1_share36,drought_spei03_n1_share60,drought_spei03_n1.5_share12,drought_spei03_n1.5_share36,drought_spei03_n1.5_share60,drought_spei03_n2_share12,drought_spei03_n2_share36,drought_spei03_n2_share60,hdi,gdp_pc
22,Armenia,ARM,2010,1,Aragatsotn,AM2010DHS,2010.0,11.0,1331.05,31.32,...,0.055556,0.066667,0.000000,0.027778,0.016667,0.000000,0.027778,0.016667,0.729074,6511.500000
23,Armenia,ARM,2010,2,Ararat,AM2010DHS,2010.0,11.0,1330.96,30.26,...,0.083333,0.100000,0.000000,0.055556,0.066667,0.000000,0.027778,0.016667,0.729265,6988.957031
24,Armenia,ARM,2010,3,Armavir,AM2010DHS,2010.0,11.0,1330.85,31.88,...,0.055556,0.066667,0.000000,0.027778,0.016667,0.000000,0.027778,0.016667,0.729369,6523.384277
25,Armenia,ARM,2010,4,Gegharkunik,AM2010DHS,2010.0,11.0,1330.85,31.07,...,0.111111,0.116667,0.000000,0.027778,0.033333,0.000000,0.027778,0.016667,0.729347,6670.520508
26,Armenia,ARM,2010,5,Lori,AM2010DHS,2010.0,11.0,1330.84,31.44,...,0.027778,0.050000,0.000000,0.027778,0.016667,0.000000,0.000000,0.000000,0.729563,6512.382324
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1827,Zimbabwe,ZWE,2015,6,Matabeleland South,ZW2015DHS,2015.0,9.0,1389.00,27.65,...,0.388889,0.333333,0.333333,0.250000,0.216667,0.250000,0.083333,0.066667,0.516884,1864.769000
1828,Zimbabwe,ZWE,2015,7,Midlands,ZW2015DHS,2015.0,9.0,1388.60,27.89,...,0.388889,0.316667,0.250000,0.138889,0.150000,0.250000,0.083333,0.050000,0.516000,1687.976000
1829,Zimbabwe,ZWE,2015,8,Masvingo,ZW2015DHS,2015.0,9.0,1388.91,28.69,...,0.250000,0.216667,0.166667,0.055556,0.066667,0.083333,0.027778,0.016667,0.515188,1687.113000
1830,Zimbabwe,ZWE,2015,9,Harare/Chitungwiza,ZW2015DHS,2015.0,9.0,1388.71,28.67,...,0.416667,0.333333,0.416667,0.194444,0.116667,0.250000,0.083333,0.050000,0.516000,1687.976000
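Filters can be combined. With boolean masks, each comparison needs its own parentheses, and you use `&` / `|` instead of `and` / `or`. `query()` accepts the same condition as a string, and `between()` covers inclusive ranges. A toy sketch:

```python
import pandas as pd

# Toy stand-in for livwell_df with two countries.
df = pd.DataFrame({
    "country_name": ["Armenia", "Armenia", "Zimbabwe", "Zimbabwe"],
    "year": [2000, 2010, 2005, 2015],
})

# Boolean masks: parenthesize each comparison, combine with &.
recent = df[(df["year"] > 2005) & (df["country_name"] == "Zimbabwe")]

# The same filter written as a query string.
recent_q = df.query("year > 2005 and country_name == 'Zimbabwe'")

# Inclusive year range: keeps 2005 and 2010.
mid = df[df["year"].between(2005, 2010)]
print(recent, recent_q, mid, sep="\n\n")
```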


In [None]:
livwell_df.columns[101:200]

Index(['EI_mobile_p', 'EI_mobile_p_se', 'EI_internet_week_p',
       'EI_internet_week_p_se', 'EI_internet_day_p', 'EI_internet_day_p_se',
       'EI_news_week_p', 'EI_news_week_p_se', 'EI_radio_week_p',
       'EI_radio_week_p_se', 'EI_tv_week_p', 'EI_tv_week_p_se',
       'DP_decide_money_p', 'DP_decide_money_p_se', 'DP_decide_health_p',
       'DP_decide_health_p_se', 'DP_decide_large_purchase_p',
       'DP_decide_large_purchase_p_se', 'DP_decide_visits_p',
       'DP_decide_visits_p_se', 'DP_owns_house_p', 'DP_owns_house_p_se',
       'DP_owns_land_p', 'DP_owns_land_p_se', 'DP_decide_contraception_p',
       'DP_decide_contraception_p_se', 'DP_decide_no_contraception_p',
       'DP_decide_no_contraception_p_se', 'DP_earn_more_equal_p',
       'DP_earn_more_equal_p_se', 'DP_earn_more_p', 'DP_earn_more_p_se',
       'DV_phys_partner_p', 'DV_phys_partner_p_se', 'DV_phys_partner_12m_p',
       'DV_phys_partner_12m_p_se', 'DV_sex_partner_p', 'DV_sex_partner_p_se',
       'DV_sex_partne

In [None]:
indicators_df.columns

Index(['indicator_category', 'indicator_code', 'indicator_description'], dtype='object')
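`indicators_df` pairs each `indicator_code` with a description. Assuming the codes match the column names in `livwell_df`, indexing by code turns it into a lookup table. In the sketch below the codes are copied from the column listing in this notebook, but the categories and descriptions are placeholders, not the real LivWell text.

```python
import pandas as pd

# Placeholder rows; the real categories and descriptions come from the LivWell indicator file.
indicators_df = pd.DataFrame({
    "indicator_category": ["category A", "category A", "category B"],
    "indicator_code": ["DM_age_mean", "DM_urban_p", "EI_mobile_p"],
    "indicator_description": [
        "placeholder description 1",
        "placeholder description 2",
        "placeholder description 3",
    ],
})

# Index by code so a column name can be used as the key.
lookup = indicators_df.set_index("indicator_code")["indicator_description"]
print(lookup["EI_mobile_p"])

# Look up several columns at once; codes absent from the table come back as NaN.
cols = ["DM_age_mean", "EI_mobile_p", "hdi"]
described = lookup.reindex(cols)
print(described)
```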

In [None]:
livwell_df.columns.str.contains('age')

array([False, False, False, False, False, False, False, False, False,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True, False, False,
       False, False, False, False, False, False,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False,
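The boolean array from `str.contains` can index the columns directly, which is what `filter(like=...)` does for you. One pitfall for this dataset: `str.contains` treats its pattern as a regular expression by default, and names like `DM_age_15.19_p` contain dots, which a regex reads as "any character". Passing `regex=False` matches literally. The column `DM_age_15x19_p` below is hypothetical, added only to show the difference.

```python
import pandas as pd

# Empty toy frame; only the column names matter here.
# "DM_age_15x19_p" is a hypothetical column used to show the regex pitfall.
df = pd.DataFrame(columns=["country_name", "DM_age_mean", "DM_age_15.19_p", "DM_age_15x19_p", "hdi"])

# Use the boolean mask to keep only matching columns.
mask = df.columns.str.contains("age")
age_df = df.loc[:, mask]
print(list(age_df.columns))

# Regex (the default): "." matches any character, so both columns match.
regex_hits = df.columns[df.columns.str.contains("15.19")].tolist()

# Literal match: only the column that really contains "15.19".
literal_hits = df.columns[df.columns.str.contains("15.19", regex=False)].tolist()
print(regex_hits, literal_hits)
```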

In [None]:
livwell_df.filter(like='age').columns

Index(['DM_age_mean', 'DM_age_mean_se', 'DM_age_15.19_p', 'DM_age_15.19_p_se',
       'DM_age_20.24_p', 'DM_age_20.24_p_se', 'DM_age_25.29_p',
       'DM_age_25.29_p_se', 'DM_age_30.34_p', 'DM_age_30.34_p_se',
       'DM_age_35.39_p', 'DM_age_35.39_p_se', 'DM_age_40.44_p',
       'DM_age_40.44_p_se', 'DM_age_45.49_p', 'DM_age_45.49_p_se',
       'DM_age_marr_mean', 'DM_age_marr_mean_se', 'DM_age_diff_mean',
       'DM_age_diff_mean_se', 'DM_age_diff_10plus_p',
       'DM_age_diff_10plus_p_se', 'DM_age_diff_5_9_p', 'DM_age_diff_5_9_p_se',
       'DM_age_diff_5minus_p', 'DM_age_diff_5minus_p_se', 'DM_age_diff_0_p',
       'DM_age_diff_0_p_se', 'RH_age_first_birth_mean',
       'RH_age_first_birth_mean_se', 'RH_age_first_sex_mean',
       'RH_age_first_sex_mean_se'],
      dtype='object')
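The `age` match mixes estimates with their standard errors (the `_se` columns). To work with one or the other, anchor a regex at the end of the name, or combine `str.startswith` and `str.endswith` masks. A toy sketch using a few column names from the listing:

```python
import pandas as pd

# Empty toy frame with a handful of estimate / standard-error pairs.
df = pd.DataFrame(columns=["DM_age_mean", "DM_age_mean_se", "DM_urban_p", "DM_urban_p_se", "hdi"])
cols = df.columns

# Standard-error columns end in "_se".
se_cols = df.filter(regex=r"_se$").columns.tolist()

# DM_ estimates: names that start with "DM_" and do not end in "_se".
dm_estimates = cols[cols.str.startswith("DM_") & ~cols.str.endswith("_se")].tolist()

# Pair each estimate with its standard error when both exist.
pairs = {c: c + "_se" for c in dm_estimates if c + "_se" in cols}
print(pairs)
```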

In [None]:
livwell_df.columns[:100]

Index(['country_name', 'country_code', 'year', 'region_num_harmonized',
       'region_name_harmonized', 'SurveyId', 'interview_year_mean',
       'interview_month_mean', 'CMC_interview_mean', 'DM_age_mean',
       'DM_age_mean_se', 'DM_age_15.19_p', 'DM_age_15.19_p_se',
       'DM_age_20.24_p', 'DM_age_20.24_p_se', 'DM_age_25.29_p',
       'DM_age_25.29_p_se', 'DM_age_30.34_p', 'DM_age_30.34_p_se',
       'DM_age_35.39_p', 'DM_age_35.39_p_se', 'DM_age_40.44_p',
       'DM_age_40.44_p_se', 'DM_age_45.49_p', 'DM_age_45.49_p_se',
       'DM_urban_p', 'DM_urban_p_se', 'DM_born_rural_p', 'DM_born_rural_p_se',
       'DM_nvr_marr_p', 'DM_nvr_marr_p_se', 'DM_marr_p', 'DM_marr_p_se',
       'DM_age_marr_mean', 'DM_age_marr_mean_se', 'DM_age_diff_mean',
       'DM_age_diff_mean_se', 'DM_age_diff_10plus_p',
       'DM_age_diff_10plus_p_se', 'DM_age_diff_5_9_p', 'DM_age_diff_5_9_p_se',
       'DM_age_diff_5minus_p', 'DM_age_diff_5minus_p_se', 'DM_age_diff_0_p',
       'DM_age_diff_0_p_se', 'HH_w

In [None]:
len(livwell_df.columns)

409
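`df.shape` gives rows and columns together (`len(df.columns)` is `df.shape[1]`). With 409 columns it helps to count them by prefix: the names listed above start with codes such as `DM_`, `HH_`, `EI_`, `DP_`, `DV_` and `RH_`, which appear to group indicators by theme. A toy sketch:

```python
import pandas as pd

# Empty toy frame; only the column names matter here.
df = pd.DataFrame(columns=["country_name", "year", "DM_age_mean", "DM_urban_p", "EI_mobile_p", "hdi", "gdp_pc"])

print(df.shape)  # (rows, columns)

# First underscore-separated token of each name, e.g. "DM" for "DM_age_mean".
prefixes = pd.Series(df.columns).str.split("_").str[0]
prefix_counts = prefixes.value_counts()
print(prefix_counts)
```

Names without an indicator code, such as `country_name` or `gdp_pc`, are split too, so they show up as their own small groups.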

In [None]:
livwell_df.isnull().sum()

Unnamed: 0,0
country_name,0
country_code,0
year,0
region_num_harmonized,0
region_name_harmonized,0
...,...
drought_spei03_n2_share12,133
drought_spei03_n2_share36,133
drought_spei03_n2_share60,133
hdi,254

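The printed Series is truncated, so most columns are hidden. Keeping only the columns with at least one missing value and sorting them makes gaps easier to spot, and `isna().mean()` gives the share missing. The counts agree with `describe()`: `hdi` has 254 missing values and 1,578 non-null ones, which add up to the 1,832 rows. A toy sketch:

```python
import numpy as np
import pandas as pd

# Toy frame with gaps in two columns.
df = pd.DataFrame({
    "country_name": ["A", "B", "C", "D"],
    "hdi": [0.5, np.nan, 0.6, np.nan],
    "gdp_pc": [1000.0, 2000.0, np.nan, 3000.0],
})

# Missing-value counts, only for columns that have any, largest first.
missing = df.isna().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)

# Percentage missing per column.
missing_pct = (df.isna().mean() * 100).round(1)
print(missing_pct)
```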

In [None]:
livwell_df['country_name']

Unnamed: 0,country_name
0,Armenia
1,Armenia
2,Armenia
3,Armenia
4,Armenia
...,...
1827,Zimbabwe
1828,Zimbabwe
1829,Zimbabwe
1830,Zimbabwe


In [None]:
livwell_df[livwell_df['country_name'] == 'Armenia']

Unnamed: 0,country_name,country_code,year,region_num_harmonized,region_name_harmonized,SurveyId,interview_year_mean,interview_month_mean,CMC_interview_mean,DM_age_mean,...,drought_spei03_n1_share36,drought_spei03_n1_share60,drought_spei03_n1.5_share12,drought_spei03_n1.5_share36,drought_spei03_n1.5_share60,drought_spei03_n2_share12,drought_spei03_n2_share36,drought_spei03_n2_share60,hdi,gdp_pc
0,Armenia,ARM,2000,1,Aragatsotn,AM2000DHS,2000.0,11.0,1210.53,30.71,...,0.388889,0.316667,0.333333,0.25,0.166667,0.083333,0.083333,0.05,0.644083,2938.1875
1,Armenia,ARM,2000,2,Ararat,AM2000DHS,2000.0,11.0,1210.55,30.38,...,0.416667,0.316667,0.333333,0.277778,0.233333,0.083333,0.083333,0.05,0.644127,3053.040283
2,Armenia,ARM,2000,3,Armavir,AM2000DHS,2000.0,10.0,1210.43,31.1,...,0.361111,0.3,0.333333,0.25,0.166667,0.083333,0.083333,0.05,0.644415,3003.245605
3,Armenia,ARM,2000,4,Gegharkunik,AM2000DHS,2000.0,11.0,1210.58,30.65,...,0.416667,0.316667,0.25,0.194444,0.166667,0.083333,0.083333,0.05,0.643942,2945.085449
4,Armenia,ARM,2000,5,Lori,AM2000DHS,2000.0,10.0,1210.43,31.57,...,0.388889,0.316667,0.333333,0.222222,0.15,0.083333,0.083333,0.05,0.645256,2925.469727
5,Armenia,ARM,2000,6,Kotayk,AM2000DHS,2000.0,10.0,1210.48,31.15,...,0.416667,0.316667,0.333333,0.277778,0.183333,0.083333,0.083333,0.05,0.644,2918.557617
6,Armenia,ARM,2000,7,Shirak,AM2000DHS,2000.0,10.0,1210.43,31.8,...,0.388889,0.35,0.333333,0.194444,0.133333,0.083333,0.083333,0.05,0.645674,3053.684326
7,Armenia,ARM,2000,8,Syunik,AM2000DHS,2000.0,10.0,1210.42,31.37,...,0.416667,0.316667,0.166667,0.222222,0.183333,0.083333,0.055556,0.066667,0.644479,3086.177002
8,Armenia,ARM,2000,9,Vayots Dzor,AM2000DHS,2000.0,10.0,1210.41,31.69,...,0.388889,0.3,0.333333,0.277778,0.216667,0.083333,0.194444,0.133333,0.643944,2969.26123
9,Armenia,ARM,2000,10,Tavush,AM2000DHS,2000.0,10.0,1210.46,31.28,...,0.416667,0.3,0.333333,0.194444,0.133333,0.083333,0.083333,0.05,0.644377,3000.003906

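Each row of the Armenia output is one region in one survey year. To see how many regions every survey contributes without printing each country, group by country and year; `query()` is an alternative to the boolean mask used above. A toy sketch:

```python
import pandas as pd

# Toy stand-in for livwell_df: one row per region and survey.
df = pd.DataFrame({
    "country_name": ["Armenia", "Armenia", "Armenia", "Zimbabwe", "Zimbabwe"],
    "year": [2000, 2000, 2005, 2015, 2015],
    "region_name_harmonized": ["Aragatsotn", "Ararat", "Aragatsotn", "Midlands", "Masvingo"],
})

# Same selection as df[df['country_name'] == 'Armenia'].
armenia = df.query("country_name == 'Armenia'")

# Number of region rows in each country-year survey.
regions_per_survey = df.groupby(["country_name", "year"]).size()
print(regions_per_survey)
```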

In [None]:
set(livwell_df['country_name'])

{'Armenia',
 'Bangladesh',
 'Benin',
 'Bolivia',
 'Burkina Faso',
 'Burundi',
 'Cambodia',
 'Cameroon',
 'Colombia',
 'Congo Democratic Republic',
 "Cote d'Ivoire",
 'Egypt',
 'Ethiopia',
 'Gabon',
 'Ghana',
 'Guatemala',
 'Guinea',
 'Haiti',
 'Honduras',
 'India',
 'Indonesia',
 'Jordan',
 'Kenya',
 'Lesotho',
 'Liberia',
 'Madagascar',
 'Malawi',
 'Maldives',
 'Mali',
 'Morocco',
 'Mozambique',
 'Namibia',
 'Nepal',
 'Nicaragua',
 'Niger',
 'Nigeria',
 'Pakistan',
 'Peru',
 'Philippines',
 'Rwanda',
 'Senegal',
 'Sierra Leone',
 'South Africa',
 'Tajikistan',
 'Tanzania',
 'Timor-Leste',
 'Togo',
 'Turkey',
 'Uganda',
 'Vietnam',
 'Zambia',
 'Zimbabwe'}
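The set above has 52 names, matching the 52 countries in the dataset title. `nunique()` returns that count directly, and a `groupby` with `agg` summarizes each country's coverage: first year, last year and number of distinct survey years. A toy sketch:

```python
import pandas as pd

# Toy stand-in for livwell_df with two countries.
df = pd.DataFrame({
    "country_name": ["Armenia", "Armenia", "Armenia", "Zimbabwe", "Zimbabwe"],
    "year": [2000, 2000, 2005, 2010, 2015],
})

n_countries = df["country_name"].nunique()
print(n_countries)

# One row per country: first year, last year, and number of distinct years.
coverage = df.groupby("country_name")["year"].agg(["min", "max", "nunique"])
print(coverage)
```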