# Exploring data

Retrieving datasets, filtering, pivoting and merging

In [1]:
import pandas as pd

## Merging happiness datasets from years 2015-2019

Since the CSV files for the different years use different column names, we build a dictionary that maps each variant to one common name

In [2]:
column_dict = {"Happiness.Rank" : "Happiness Rank",
               "Overall rank" : "Happiness Rank",
               "Country or region" : "Country",
               "Happiness.Score" : "Happiness Score",
               "Score" : "Happiness Score",
               "Economy (GDP per Capita)" : "GDP",
               "Economy..GDP.per.Capita." : "GDP",
               "GDP per capita" : "GDP",
               "Health (Life Expectancy)" : "Life expectancy",
               "Health..Life.Expectancy." : "Life expectancy",
               "Healthy life expectancy" : "Life expectancy",
               "Trust (Government Corruption)" : "Corruption",
               "Trust..Government.Corruption." : "Corruption",
               "Perceptions of corruption" : "Corruption"
              }
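As a small illustration of how such a mapping behaves, the sketch below renames two toy dataframes that stand in for two yearly files. The frames `df_2015` and `df_2019` and their values are made up for the example; only the column names come from the dictionary above. Keys that do not occur in a given dataframe are simply ignored by `rename`.

```python
import pandas as pd

# Hypothetical stand-ins for two yearly files with different header styles
df_2015 = pd.DataFrame({"Country": ["A"],
                        "Happiness Rank": [1],
                        "Economy (GDP per Capita)": [1.2]})
df_2019 = pd.DataFrame({"Country or region": ["A"],
                        "Overall rank": [2],
                        "GDP per capita": [1.3]})

# A subset of the mapping used in the notebook
mapping = {
    "Overall rank": "Happiness Rank",
    "Country or region": "Country",
    "Economy (GDP per Capita)": "GDP",
    "GDP per capita": "GDP",
}

# rename() skips mapping keys that are not present in the frame
renamed = [d.rename(columns=mapping) for d in (df_2015, df_2019)]
for d in renamed:
    print(sorted(d.columns))
```

Both frames end up with the same three column names, which is what makes the later concatenation line up.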

Then we read all five datasets and concatenate them into one `happiness` dataframe

In [3]:
years = range(2015, 2020)

ind_years = []
columns_to_keep = ["Country", "Year", "Happiness Rank", "Happiness Score", "GDP", "Life expectancy", "Corruption"]

for year in years:
    path = "../data/worldhappiness/" + str(year) + ".csv"
    df = pd.read_csv(path)
    df.rename(columns=column_dict, inplace=True)
    df["Year"] = year
    df = df[columns_to_keep]
    ind_years.append(df)
    
happiness = pd.concat(ind_years, ignore_index=True)

And that's all we need to do with the happiness data for now

In [4]:
happiness.head() 

Unnamed: 0,Country,Year,Happiness Rank,Happiness Score,GDP,Life expectancy,Corruption
0,Switzerland,2015,1,7.587,1.39651,0.94143,0.41978
1,Iceland,2015,2,7.561,1.30232,0.94784,0.14145
2,Denmark,2015,3,7.527,1.32548,0.87464,0.48357
3,Norway,2015,4,7.522,1.459,0.88521,0.36503
4,Canada,2015,5,7.427,1.32629,0.90563,0.32957
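If a yearly file uses a header that is missing from the rename dictionary, selecting `columns_to_keep` raises a `KeyError`. One possible guard is to list the missing columns first. The helper `missing_columns` and the toy frame below are hypothetical and not part of the original notebook.

```python
import pandas as pd

columns_to_keep = ["Country", "Year", "Happiness Score"]

def missing_columns(df, required):
    """Return the required column names that are absent from df."""
    return [c for c in required if c not in df.columns]

# Toy frame where "Score" was not renamed to "Happiness Score"
toy = pd.DataFrame({"Country": ["A"], "Score": [7.5]})
toy["Year"] = 2019

print(missing_columns(toy, columns_to_keep))  # ['Happiness Score']
```

Calling such a check inside the loop, before the column selection, would point directly at the file and header that need a new dictionary entry.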


## Preparing SDG Data

First, read in the Excel sheet with the SDG data

In [5]:
sgd_df = pd.read_excel("../data/sdgindicators/data.xlsx")
sgd_df.head()

Unnamed: 0,setting,date,source,indicator_abbr,indicator_name,dimension,subgroup,estimate,se,ci_lb,...,iso3,favourable_indicator,indicator_scale,ordered_dimension,subgroup_order,reference_subgroup,whoreg6,wbincome2023,dataset_id,update
0,Afghanistan,2000,UN SDG Indicators Database,SI_POV_EMP1,1.1.1 Employed population below international ...,Age (2 groups) (15-25+),15-24 years,66.300003,,,...,AFG,0,100,0,0,0,Eastern Mediterranean,Low income,rep_sdg,24 March 2023
1,Afghanistan,2000,UN SDG Indicators Database,SI_POV_EMP1,1.1.1 Employed population below international ...,Age (2 groups) (15-25+),25+ years,66.300003,,,...,AFG,0,100,0,0,1,Eastern Mediterranean,Low income,rep_sdg,24 March 2023
2,Afghanistan,2000,UN SDG Indicators Database,SI_POV_EMP1,1.1.1 Employed population below international ...,Sex,Female,71.599998,,,...,AFG,0,100,0,0,0,Eastern Mediterranean,Low income,rep_sdg,24 March 2023
3,Afghanistan,2000,UN SDG Indicators Database,SI_POV_EMP1,1.1.1 Employed population below international ...,Sex,Male,65.400002,,,...,AFG,0,100,0,0,1,Eastern Mediterranean,Low income,rep_sdg,24 March 2023
4,Afghanistan,2001,UN SDG Indicators Database,SI_POV_EMP1,1.1.1 Employed population below international ...,Age (2 groups) (15-25+),15-24 years,66.900002,,,...,AFG,0,100,0,0,0,Eastern Mediterranean,Low income,rep_sdg,24 March 2023


Then we check the column names and make a list for the indicators we're interested in

In [6]:
sgd_df.columns

Index(['setting', 'date', 'source', 'indicator_abbr', 'indicator_name',
       'dimension', 'subgroup', 'estimate', 'se', 'ci_lb', 'ci_ub',
       'population', 'flag', 'setting_average', 'iso3', 'favourable_indicator',
       'indicator_scale', 'ordered_dimension', 'subgroup_order',
       'reference_subgroup', 'whoreg6', 'wbincome2023', 'dataset_id',
       'update'],
      dtype='object')

In [7]:
interesting_indicators = [
    "1.1.1 Employed population below international poverty line (%)",
    "1.1.1 Employed population below international poverty line (%) - Female",
    "1.1.1 Employed population below international poverty line (%) - Male",
    "1.1.1 Population below international poverty line (%)",
    '3.4.2 Suicide mortality rate (deaths per 100 000 population)',
    '4.1.2 Completion rate (%)',
    '4.1.2 Completion rate (%) - Lower secondary education',
    '4.1.2 Completion rate (%) - Primary education',
    '4.1.2 Completion rate (%) - Upper secondary education',
    '8.5.2 Unemployment rate (%)',
    '8.5.2 Unemployment rate (%) - Female',
    '8.5.2 Unemployment rate (%) - Male',
]

columns_keep = [
    'setting', 
    'date',
    'indicator_name',
    'dimension', 
    'subgroup', 
    'estimate',
    'setting_average',
    'indicator_scale'
]

Then we filter for the years 2015-2019, since that matches our happiness dataset, and filter only the indicators and columns we want

In [28]:
# keep only the years, indicators and columns of interest;
# .copy() makes sgd independent of sgd_df, so adding columns later is safe
sgd = sgd_df[(sgd_df["date"] >= 2015) & 
             (sgd_df["date"] <= 2019) & 
             (sgd_df["indicator_name"].isin(interesting_indicators))][columns_keep].copy()
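The same kind of filter can also be written with `Series.between`, which is inclusive on both ends by default, together with `.loc`. The sketch below uses a made-up toy frame and a hypothetical `wanted` list, so it only illustrates the pattern.

```python
import pandas as pd

toy = pd.DataFrame({
    "date": [2014, 2015, 2019, 2020, 2017],
    "indicator_name": ["a", "a", "b", "a", "c"],
    "estimate": [1.0, 2.0, 3.0, 4.0, 5.0],
})
wanted = ["a", "b"]  # hypothetical indicators of interest

# between() keeps 2015 and 2019 themselves; isin() keeps the wanted indicators
mask = toy["date"].between(2015, 2019) & toy["indicator_name"].isin(wanted)

# select rows and columns in one step and take an independent copy
subset = toy.loc[mask, ["date", "estimate"]].copy()
print(subset)
```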

Some values in the `estimate` column are on a 0-100 scale and others on a 0-10000 scale, so we normalize every value to a 0-100 scale using the `indicator_scale` column. We also rename some columns to make them easier to understand

In [30]:
# normalize estimate based on indicator scale
sgd['value'] = sgd.apply(lambda row: (row['estimate'] / row['indicator_scale']) * 100, axis=1)
sgd['average'] = sgd.apply(lambda row: (row['setting_average'] / row['indicator_scale']) * 100, axis=1)
# drop non-normalized columns
sgd_norm = sgd.drop(columns=["estimate", "setting_average", "indicator_scale", "dimension"])
# rename columns for convenience
sgd_norm.rename(columns={"setting" : "Country", 
                         "date" : "Year",
                         "indicator_name" : "Data"},
                inplace=True)
# display new head
sgd_norm.head()

Unnamed: 0,Country,Year,Data,subgroup,value,average
60,Afghanistan,2015,1.1.1 Employed population below international ...,15-24 years,46.0,43.700001
61,Afghanistan,2015,1.1.1 Employed population below international ...,25+ years,42.799999,43.700001
62,Afghanistan,2015,1.1.1 Employed population below international ...,Female,50.700001,43.700001
63,Afghanistan,2015,1.1.1 Employed population below international ...,Male,42.099998,43.700001
64,Afghanistan,2016,1.1.1 Employed population below international ...,15-24 years,45.099998,42.700001
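The row-wise `apply` used for the normalization can also be expressed as plain column arithmetic, which pandas evaluates for the whole column at once. This is a sketch on invented numbers; only the column names are taken from the notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "estimate": [50.0, 2500.0],
    "setting_average": [25.0, 5000.0],
    "indicator_scale": [100, 10000],
})

# divide by the scale of each row, then express the result as a percentage
toy["value"] = toy["estimate"] / toy["indicator_scale"] * 100
toy["average"] = toy["setting_average"] / toy["indicator_scale"] * 100
print(toy[["value", "average"]])
```

The result should be the same as with the lambda version, and on a large dataframe the column form is usually noticeably faster.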


The average is currently stored in its own column, but we want it as a separate `average` subgroup with its number in the `value` column. We build a new dataframe with one such row for each (country, year, indicator) group and concatenate it with the original, which results in a cleaner dataframe

In [10]:
# Create a new DataFrame to store the results
average_sgd = pd.DataFrame()

# Create new rows with the average stored in the "value" column
for group, group_df in sgd_norm.groupby(['Country', 'Year', 'Data']):
    if 'average' not in group_df['subgroup'].values:
        new_row = group_df.iloc[0].copy()
        new_row['subgroup'] = 'average'
        new_row['value'] = new_row['average']
        average_sgd = pd.concat([average_sgd, new_row.to_frame().T], ignore_index=True)

# Concatenate with original and sort
sgd_w_average = pd.concat([sgd_norm, average_sgd], ignore_index=True)
sgd_w_average = sgd_w_average.sort_values(by=['Country', 'Year', 'Data', 'subgroup']).reset_index(drop=True)

# Drop average column
sgd_w_average.drop(columns="average", inplace=True)

sgd_w_average.head()

Unnamed: 0,Country,Year,Data,subgroup,value
0,Afghanistan,2015,1.1.1 Employed population below international ...,15-24 years,46.0
1,Afghanistan,2015,1.1.1 Employed population below international ...,25+ years,42.799999
2,Afghanistan,2015,1.1.1 Employed population below international ...,Female,50.700001
3,Afghanistan,2015,1.1.1 Employed population below international ...,Male,42.099998
4,Afghanistan,2015,1.1.1 Employed population below international ...,average,43.700001
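Growing a dataframe with `pd.concat` inside a loop can get slow for many groups. A possible alternative is to take one row per group with `drop_duplicates` and overwrite `subgroup` and `value` in one step. The sketch below uses made-up data and assumes that no group already contains an `average` subgroup, a case the loop above checks for explicitly.

```python
import pandas as pd

toy = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "Year": [2015, 2015, 2015],
    "Data": ["x", "x", "x"],
    "subgroup": ["Female", "Male", "Female"],
    "value": [50.0, 40.0, 30.0],
    "average": [45.0, 45.0, 30.0],
})
keys = ["Country", "Year", "Data"]

# one row per group, relabelled as the "average" subgroup
avg_rows = (
    toy.drop_duplicates(subset=keys)
       .assign(subgroup="average", value=lambda d: d["average"])
)

# append the new rows, drop the helper column and sort as in the notebook
result = (
    pd.concat([toy, avg_rows], ignore_index=True)
      .drop(columns="average")
      .sort_values(keys + ["subgroup"])
      .reset_index(drop=True)
)
print(result)
```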


Next, we pivot the table so that each combination of indicator (the "Data" column) and subgroup becomes its own column

In [11]:
pivot_df = sgd_w_average.pivot_table(
    index=['Country', 'Year'],
    columns=['Data', 'subgroup'],
    values=['value'],
    aggfunc='first'  # Use 'first' to handle duplicate entries, if any
).reset_index()

pivot_df.head()

Unnamed: 0_level_0,Country,Year,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value,value
Data,Unnamed: 1_level_1,Unnamed: 2_level_1,1.1.1 Employed population below international poverty line (%),1.1.1 Employed population below international poverty line (%),1.1.1 Employed population below international poverty line (%),1.1.1 Employed population below international poverty line (%),1.1.1 Employed population below international poverty line (%),1.1.1 Employed population below international poverty line (%) - Female,1.1.1 Employed population below international poverty line (%) - Female,1.1.1 Employed population below international poverty line (%) - Female,...,8.5.2 Unemployment rate (%) - Female,8.5.2 Unemployment rate (%) - Female,8.5.2 Unemployment rate (%) - Female,8.5.2 Unemployment rate (%) - Female,8.5.2 Unemployment rate (%) - Female,8.5.2 Unemployment rate (%) - Male,8.5.2 Unemployment rate (%) - Male,8.5.2 Unemployment rate (%) - Male,8.5.2 Unemployment rate (%) - Male,8.5.2 Unemployment rate (%) - Male
subgroup,Unnamed: 1_level_2,Unnamed: 2_level_2,15-24 years,25+ years,Female,Male,average,15-24 years,25+ years,average,...,15-24 years,25+ years,Persons with disability,Persons without disability,average,15-24 years,25+ years,Persons with disability,Persons without disability,average
0,Afghanistan,2015,46.0,42.799999,50.700001,42.099998,43.700001,50.5,50.799999,50.700001,...,,,,,,,,,,
1,Afghanistan,2016,45.099998,41.700001,49.5,41.0,42.700001,49.5,49.599998,49.5,...,,,,,,,,,,
2,Afghanistan,2017,43.900002,40.200001,48.0,39.5,41.299999,48.099998,48.0,48.0,...,21.4,10.1,13.8,14.0,14.0,16.299999,7.9,12.4,10.3,10.4
3,Afghanistan,2018,43.400002,39.400002,47.400002,38.799999,40.599998,47.5,47.299999,47.400002,...,,,,,,,,,,
4,Afghanistan,2019,42.700001,38.599998,46.599998,38.0,39.799999,46.799999,46.5,46.599998,...,,,,,,,,,,
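The pivoted frame has three column levels (`value`, `Data`, `subgroup`). Flattening them into plain strings can make later selection and merging simpler, and recent pandas releases are stricter about merging frames whose columns have different numbers of levels. The helper `flatten` and the `" | "` separator below are assumptions chosen for this sketch, not something the notebook uses.

```python
import pandas as pd

toy = pd.DataFrame({
    "Country": ["A", "A"],
    "Year": [2015, 2015],
    "Data": ["x", "x"],
    "subgroup": ["Female", "Male"],
    "value": [1.0, 2.0],
})
wide = toy.pivot_table(
    index=["Country", "Year"],
    columns=["Data", "subgroup"],
    values=["value"],
    aggfunc="first",
).reset_index()

def flatten(col):
    """Join the non-empty levels of a column tuple, dropping a leading 'value'."""
    parts = [str(p) for p in col if str(p) != ""]
    if len(parts) > 1 and parts[0] == "value":
        parts = parts[1:]
    return " | ".join(parts)

# iterating over MultiIndex columns yields one tuple per column
wide.columns = [flatten(c) for c in wide.columns]
print(list(wide.columns))
```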


## Merging both datasets

Firstly, we check if the countries roughly align

In [12]:
hap_countries = set(happiness["Country"])
sgd_countries = set(pivot_df["Country"])
print(len(hap_countries))
print(len(sgd_countries))

len(hap_countries.intersection(sgd_countries))
print(hap_countries.difference(sgd_countries), "\n")
print(sgd_countries.difference(hap_countries))

170
190
{'Macedonia', 'Somaliland region', 'Czech Republic', 'Iran', 'Vietnam', 'United States', 'Palestinian Territories', 'Bolivia', 'Taiwan Province of China', 'Congo (Kinshasa)', 'Laos', 'Trinidad & Tobago', 'Syria', 'Ivory Coast', 'South Korea', 'United Kingdom', 'Turkey', 'Taiwan', 'Russia', 'Northern Cyprus', 'Hong Kong', 'Congo (Brazzaville)', 'Moldova', 'North Cyprus', 'Kosovo', 'Hong Kong S.A.R., China', 'Venezuela', 'Swaziland', 'Somaliland Region', 'Tanzania', 'Netherlands'} 

{'occupied Palestinian territory', 'United States of America', 'Tuvalu', 'Cabo Verde', 'Bahamas', 'Timor-Leste', 'Equatorial Guinea', 'Marshall Islands', 'Venezuela (Bolivarian Republic of)', 'San Marino', 'Monaco', 'Republic of Korea', 'Congo', 'United Republic of Tanzania', 'Cuba', 'Cook Islands', 'Saint Vincent and the Grenadines', "Democratic People's Republic of Korea", 'Solomon Islands', 'Guyana', 'Kiribati', 'Grenada', "Côte d'Ivoire", 'Syrian Arab Republic', 'Vanuatu', 'Barbados', 'Seychelles'

We can fix some of these mismatches by renaming countries in the SDG dataset so that they match the names in the happiness dataset

In [13]:
sgd_country_rename = {
    'Eswatini':'Swaziland',
    "Lao People's Democratic Republic":"Laos",
    'United States of America':'United States',
    'The United Kingdom':'United Kingdom',
    'Czechia':'Czech Republic',
    'Netherlands (Kingdom of the)':'Netherlands',
    "Viet nam" : "Vietnam",
    'Russian Federation' : 'Russia',
    "Republic of Moldova" : 'Moldova',
    'Bolivia (Plurinational State of)':'Bolivia',
    'Republic of Korea': 'South Korea',
    'Iran (Islamic Republic of)': 'Iran',
    'Syrian Arab Republic': 'Syria',
    'Türkiye':'Turkey',
    'United Republic of Tanzania': 'Tanzania',
    'occupied Palestinian territory':'Palestinian Territories',
    'Congo':'Congo (Brazzaville)',
    'Democratic Republic of the Congo':'Congo (Kinshasa)',
    'Venezuela (Bolivarian Republic of)':'Venezuela',
    
}

In [14]:
# replace names in SDG dataset
pivot_df['Country'] = pivot_df['Country'].replace(sgd_country_rename)

hap_countries = set(happiness["Country"])
sgd_countries = set(pivot_df["Country"])
print(len(hap_countries))
print(len(sgd_countries))
print(len(hap_countries.intersection(sgd_countries)))

170
190
157


So there are `157` countries that are present in both datasets

Now, we can merge the datasets on year and country

In [15]:
combined = pd.merge(happiness, pivot_df, on=['Country', 'Year'], how="inner").sort_values(by=['Country', 'Year']).reset_index(drop=True)
combined.head()


Unnamed: 0,Country,Year,Happiness Rank,Happiness Score,GDP,Life expectancy,Corruption,"(value, 1.1.1 Employed population below international poverty line (%), 15-24 years)","(value, 1.1.1 Employed population below international poverty line (%), 25+ years)","(value, 1.1.1 Employed population below international poverty line (%), Female)",...,"(value, 8.5.2 Unemployment rate (%) - Female, 15-24 years)","(value, 8.5.2 Unemployment rate (%) - Female, 25+ years)","(value, 8.5.2 Unemployment rate (%) - Female, Persons with disability)","(value, 8.5.2 Unemployment rate (%) - Female, Persons without disability)","(value, 8.5.2 Unemployment rate (%) - Female, average)","(value, 8.5.2 Unemployment rate (%) - Male, 15-24 years)","(value, 8.5.2 Unemployment rate (%) - Male, 25+ years)","(value, 8.5.2 Unemployment rate (%) - Male, Persons with disability)","(value, 8.5.2 Unemployment rate (%) - Male, Persons without disability)","(value, 8.5.2 Unemployment rate (%) - Male, average)"
0,Afghanistan,2015,153,3.575,0.31982,0.30335,0.09719,46.0,42.799999,50.700001,...,,,,,,,,,,
1,Afghanistan,2016,154,3.36,0.38227,0.17344,0.07112,45.099998,41.700001,49.5,...,,,,,,,,,,
2,Afghanistan,2017,141,3.794,0.401477,0.180747,0.061158,43.900002,40.200001,48.0,...,21.4,10.1,13.8,14.0,14.0,16.299999,7.9,12.4,10.3,10.4
3,Afghanistan,2018,145,3.632,0.332,0.255,0.036,43.400002,39.400002,47.400002,...,,,,,,,,,,
4,Afghanistan,2019,154,3.203,0.35,0.361,0.025,42.700001,38.599998,46.599998,...,,,,,,,,,,
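An inner merge silently drops every (country, year) pair that is missing from one side. To see which pairs would be lost, one option is an outer merge with `indicator=True`, which adds a `_merge` column marking each row as `both`, `left_only` or `right_only`. The two toy frames below are invented for the illustration.

```python
import pandas as pd

left = pd.DataFrame({"Country": ["A", "B"], "Year": [2015, 2015], "score": [1.0, 2.0]})
right = pd.DataFrame({"Country": ["A", "C"], "Year": [2015, 2015], "rate": [10.0, 30.0]})

# the outer merge keeps all rows; _merge records where each row came from
check = pd.merge(left, right, on=["Country", "Year"], how="outer", indicator=True)

# rows that an inner merge would have dropped
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Country", "Year", "_merge"]])
```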


This gives us the full dataset, which we save as a pickle file for easy reuse.

To read it back later, use `df = pd.read_pickle('../data/combined_df.pkl')`

In [16]:
# DataFrame.to_pickle is part of pandas, so no separate pickle import is needed
combined.to_pickle('../data/combined_df.pkl')
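A pickle round trip keeps dtypes, missing values and column labels as they are, which a CSV export would not guarantee. The sketch below writes a toy frame to a temporary directory instead of `../data/`, so the path is only an assumption for the example.

```python
import os
import tempfile

import pandas as pd

toy = pd.DataFrame({"Country": ["A", "B"],
                    "Year": [2015, 2016],
                    "value": [1.5, None]})

# write to a throwaway directory and read the file straight back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.pkl")
    toy.to_pickle(path)
    restored = pd.read_pickle(path)

# equals() compares values, dtypes and labels, treating NaN as equal to NaN
print(restored.equals(toy))
```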

In [114]:
df = pd.read_pickle('../data/combined_df.pkl')
df['Country'].unique()
df.columns

Index([                                                                                          'Country',
                                                                                                    'Year',
                                                                                          'Happiness Rank',
                                                                                         'Happiness Score',
                                                                                                     'GDP',
                                                                                         'Life expectancy',
                                                                                              'Corruption',
                ('value', '1.1.1 Employed population below international poverty line (%)', '15-24 years'),
                  ('value', '1.1.1 Employed population below international poverty line (%)', '25+ years'),
                     ('value

In [120]:
columns_to_check_nan = df.columns[7:]
for country in df['Country'].unique():
    for column in columns_to_check_nan:
        nan_values = df[(df['Country'] == country) & df[column].isna()]
        
        if not nan_values.empty:
            print(f"Country: {country}, Column: {column}, Years with NaN: {nan_values['Year'].tolist()}")

Country: Afghanistan, Column: ('value', '1.1.1 Population below international poverty line (%)', '15-64 years'), Years with NaN: [2015, 2016, 2017, 2018, 2019]
Country: Afghanistan, Column: ('value', '1.1.1 Population below international poverty line (%)', '65+ years'), Years with NaN: [2015, 2016, 2017, 2018, 2019]
Country: Afghanistan, Column: ('value', '1.1.1 Population below international poverty line (%)', '<15 years'), Years with NaN: [2015, 2016, 2017, 2018, 2019]
Country: Afghanistan, Column: ('value', '1.1.1 Population below international poverty line (%)', 'Female'), Years with NaN: [2015, 2016, 2017, 2018, 2019]
Country: Afghanistan, Column: ('value', '1.1.1 Population below international poverty line (%)', 'Male'), Years with NaN: [2015, 2016, 2017, 2018, 2019]
Country: Afghanistan, Column: ('value', '1.1.1 Population below international poverty line (%)', 'Rural'), Years with NaN: [2015, 2016, 2017, 2018, 2019]
Country: Afghanistan, Column: ('value', '1.1.1 Population belo

In [131]:
for country in df['Country'].unique():
    nan_data = df.loc[df['Country'] == country, columns_to_check_nan].isnull().sum()
    nan_columns = nan_data[nan_data > 0].index
    
    if not nan_columns.empty:
        print(f"Country: {country}")
        for column in nan_columns:
            nan_years = df.loc[(df['Country'] == country) & df[column].isnull(), 'Year'].tolist()
            nan_count = nan_data[column]
            print(f"  - Column with NaN: {column}")
            print(f"    - Years with NaN: {nan_years}")
            print(f"    - Number of NaN values: {nan_count}")

Country: Afghanistan
  - Column with NaN: ('value', '1.1.1 Population below international poverty line (%)', '15-64 years')
    - Years with NaN: [2015, 2016, 2017, 2018, 2019]
    - Number of NaN values: 5
  - Column with NaN: ('value', '1.1.1 Population below international poverty line (%)', '65+ years')
    - Years with NaN: [2015, 2016, 2017, 2018, 2019]
    - Number of NaN values: 5
  - Column with NaN: ('value', '1.1.1 Population below international poverty line (%)', '<15 years')
    - Years with NaN: [2015, 2016, 2017, 2018, 2019]
    - Number of NaN values: 5
  - Column with NaN: ('value', '1.1.1 Population below international poverty line (%)', 'Female')
    - Years with NaN: [2015, 2016, 2017, 2018, 2019]
    - Number of NaN values: 5
  - Column with NaN: ('value', '1.1.1 Population below international poverty line (%)', 'Male')
    - Years with NaN: [2015, 2016, 2017, 2018, 2019]
    - Number of NaN values: 5
  - Column with NaN: ('value', '1.1.1 Population below internatio
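The nested loops above print one line per country and column. For a compact overview, the same counts can be collected in a single table by grouping the boolean `isna()` mask by country. This is a sketch on made-up data; `value_cols` stands in for `columns_to_check_nan`.

```python
import pandas as pd

toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Year": [2015, 2016, 2015, 2016],
    "x": [1.0, None, 2.0, 3.0],
    "y": [None, None, 4.0, None],
})
value_cols = ["x", "y"]  # hypothetical stand-in for columns_to_check_nan

# True/False mask of missing values, summed per country
nan_counts = toy[value_cols].isna().groupby(toy["Country"]).sum()

# keep only countries that have at least one missing value
summary = nan_counts[nan_counts.sum(axis=1) > 0]
print(summary)
```

Each cell holds the number of years with a missing value for that country and column, so a 5 in the real data would mean the indicator is missing for the whole 2015-2019 range.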

In [133]:
# .copy() makes poverty_rate_cols an independent dataframe, so adding a column is safe
poverty_rate_cols = df.filter(like='1.1.1').copy()
poverty_rate_cols['Country'] = df['Country']
#poverty_rate_cols.loc[:, 'Year'] = df['Year'].copy()
display(poverty_rate_cols)


Unnamed: 0,"(value, 1.1.1 Employed population below international poverty line (%), 15-24 years)","(value, 1.1.1 Employed population below international poverty line (%), 25+ years)","(value, 1.1.1 Employed population below international poverty line (%), Female)","(value, 1.1.1 Employed population below international poverty line (%), Male)","(value, 1.1.1 Employed population below international poverty line (%), average)","(value, 1.1.1 Employed population below international poverty line (%) - Female, 15-24 years)","(value, 1.1.1 Employed population below international poverty line (%) - Female, 25+ years)","(value, 1.1.1 Employed population below international poverty line (%) - Female, average)","(value, 1.1.1 Employed population below international poverty line (%) - Male, 15-24 years)","(value, 1.1.1 Employed population below international poverty line (%) - Male, 25+ years)","(value, 1.1.1 Employed population below international poverty line (%) - Male, average)","(value, 1.1.1 Population below international poverty line (%), 15-64 years)","(value, 1.1.1 Population below international poverty line (%), 65+ years)","(value, 1.1.1 Population below international poverty line (%), <15 years)","(value, 1.1.1 Population below international poverty line (%), Female)","(value, 1.1.1 Population below international poverty line (%), Male)","(value, 1.1.1 Population below international poverty line (%), Rural)","(value, 1.1.1 Population below international poverty line (%), Urban)","(value, 1.1.1 Population below international poverty line (%), average)",Country
0,46.000000,42.799999,50.700001,42.099998,43.700001,50.500000,50.799999,50.700001,44.799999,41.099998,42.099998,,,,,,,,,Afghanistan
1,45.099998,41.700001,49.500000,41.000000,42.700001,49.500000,49.599998,49.500000,43.900002,39.799999,41.000000,,,,,,,,,Afghanistan
2,43.900002,40.200001,48.000000,39.500000,41.299999,48.099998,48.000000,48.000000,42.599998,38.299999,39.500000,,,,,,,,,Afghanistan
3,43.400002,39.400002,47.400002,38.799999,40.599998,47.500000,47.299999,47.400002,42.099998,37.500000,38.799999,,,,,,,,,Afghanistan
4,42.700001,38.599998,46.599998,38.000000,39.799999,46.799999,46.500000,46.599998,41.400002,36.599998,38.000000,,,,,,,,,Afghanistan
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
739,28.700001,23.900000,26.100000,24.700001,25.400000,28.400000,25.200001,26.100000,28.900000,22.500000,24.700001,,,,,,,,,Zimbabwe
740,32.200001,27.299999,29.600000,27.900000,28.799999,31.900000,28.700001,29.600000,32.400002,25.700001,27.900000,,,,,,,,,Zimbabwe
741,33.599998,28.700001,31.000000,29.299999,30.200001,33.299999,30.100000,31.000000,33.900002,27.100000,29.299999,29.735781,29.662500,41.195629,34.278091,34.546928,47.927280,4.78471,34.200001,Zimbabwe
742,33.099998,28.100000,30.500000,28.700001,29.600000,32.900002,29.500000,30.500000,33.299999,26.500000,28.700001,,,,,,,,,Zimbabwe


In [113]:
poverty_rate_cols.columns

Index([         ('value', '1.1.1 Employed population below international poverty line (%)', '15-24 years'),
                  ('value', '1.1.1 Employed population below international poverty line (%)', '25+ years'),
                     ('value', '1.1.1 Employed population below international poverty line (%)', 'Female'),
                       ('value', '1.1.1 Employed population below international poverty line (%)', 'Male'),
                    ('value', '1.1.1 Employed population below international poverty line (%)', 'average'),
       ('value', '1.1.1 Employed population below international poverty line (%) - Female', '15-24 years'),
         ('value', '1.1.1 Employed population below international poverty line (%) - Female', '25+ years'),
           ('value', '1.1.1 Employed population below international poverty line (%) - Female', 'average'),
         ('value', '1.1.1 Employed population below international poverty line (%) - Male', '15-24 years'),
           ('value', '1.1.1 

In [112]:
poverty_rate_cols['Country'].unique()

array(['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Argentina',
       'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bahrain',
       'Bangladesh', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bhutan',
       'Bolivia', 'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon',
       'Canada', 'Central African Republic', 'Chad', 'Chile', 'China',
       'Colombia', 'Comoros', 'Congo (Brazzaville)', 'Congo (Kinshasa)',
       'Costa Rica', 'Croatia', 'Cyprus', 'Czech Republic', 'Denmark',
       'Djibouti', 'Dominican Republic', 'Ecuador', 'Egypt',
       'El Salvador', 'Estonia', 'Ethiopia', 'Finland', 'France', 'Gabon',
       'Gambia', 'Georgia', 'Germany', 'Ghana', 'Greece', 'Guatemala',
       'Guinea', 'Haiti', 'Honduras', 'Hungary', 'Iceland', 'India',
       'Indonesia', 'Iran', 'Iraq', 'Ireland', 'Israel', 'Italy',
       'Jamaica', 'Japan', 'Jordan', 'Kazakhstan', 'Kenya', 'Kuwait',
       'Kyrgyzstan', 'Laos', 'L

In [111]:
countries_b = poverty_rate_cols[poverty_rate_cols['Country'].str.contains('Bangladesh')]
display(countries_b)

Unnamed: 0,"(value, 1.1.1 Employed population below international poverty line (%), 15-24 years)","(value, 1.1.1 Employed population below international poverty line (%), 25+ years)","(value, 1.1.1 Employed population below international poverty line (%), Female)","(value, 1.1.1 Employed population below international poverty line (%), Male)","(value, 1.1.1 Employed population below international poverty line (%), average)","(value, 1.1.1 Employed population below international poverty line (%) - Female, 15-24 years)","(value, 1.1.1 Employed population below international poverty line (%) - Female, 25+ years)","(value, 1.1.1 Employed population below international poverty line (%) - Female, average)","(value, 1.1.1 Employed population below international poverty line (%) - Male, 15-24 years)","(value, 1.1.1 Employed population below international poverty line (%) - Male, 25+ years)","(value, 1.1.1 Employed population below international poverty line (%) - Male, average)","(value, 1.1.1 Population below international poverty line (%), 15-64 years)","(value, 1.1.1 Population below international poverty line (%), 65+ years)","(value, 1.1.1 Population below international poverty line (%), <15 years)","(value, 1.1.1 Population below international poverty line (%), Female)","(value, 1.1.1 Population below international poverty line (%), Male)","(value, 1.1.1 Population below international poverty line (%), Rural)","(value, 1.1.1 Population below international poverty line (%), Urban)","(value, 1.1.1 Population below international poverty line (%), average)",Country
49,13.8,11.9,13.8,11.6,12.2,15.3,13.5,13.8,13.2,11.3,11.6,,,,,,,,,Bangladesh
50,13.6,11.6,13.4,11.3,11.9,14.9,13.1,13.4,13.1,11.0,11.3,11.79989,13.1953,16.760679,13.76815,13.16496,16.352051,5.77174,13.5,Bangladesh
51,10.9,9.2,10.6,9.1,9.5,12.0,10.3,10.6,10.5,8.8,9.1,,,,,,,,,Bangladesh
52,7.4,6.1,7.1,6.0,6.4,8.2,6.9,7.1,7.1,5.8,6.0,,,,,,,,,Bangladesh
53,6.5,5.4,6.2,5.3,5.6,7.2,6.0,6.2,6.2,5.1,5.3,,,,,,,,,Bangladesh
