# Part 1: Data Sourcing
## 1.1 Dataset Sources
The three datasets are available on Kaggle:

__[Inequality in Income Across the Globe](https://www.kaggle.com/datasets/iamsouravbanerjee/inequality-in-income-across-the-globe)__

__[Gender Inequality Index by Country](https://www.kaggle.com/datasets/iamsouravbanerjee/gender-inequality-index-dataset)__

__[Inequality in Education Around the World](https://www.kaggle.com/datasets/iamsouravbanerjee/inequality-in-education-around-the-world)__

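The datasets describe each country with the following columns (the education dataset omits Continent and Hemisphere):

- ISO3 - Three-letter country code for the Country/Territory

- Country - Name of the Country/Territory

- Continent - Name of the Continent

- Hemisphere - Name of the Hemisphere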

- Human Development Groups - Country grouping (Low, Medium, High, Very High) based on the Human Development Index, a summary measure of key dimensions of human development: a long and healthy life, a good education, and a decent standard of living

- UNDP Developing Regions - United Nations Development Programme Developing Regions
    - SSA - Sub-Saharan Africa
    - LAC - Latin America and the Caribbean
    - EAP - East Asia and the Pacific
    - AS - Arab States
    - ECA - Europe and Central Asia
    - SA - South Asia

- HDI Rank (2021) - Human Development Index Rank for 2021

## 1.2 Reason for selecting the datasets about inequality

I am interested in conducting data exploration and analysis on inequality in income, gender and education.

According to __[Oxfam](https://www.oxfam.org/en/press-releases/richest-1-bag-nearly-twice-much-wealth-rest-world-put-together-over-past-two-years)__, the richest one per cent have captured nearly two-thirds of all new wealth created since 2020 (new wealth worth $42 trillion in total), almost twice as much as the bottom 99 per cent of the world’s population.

I want to find out whether inequality has risen sharply in recent years or gradually over a longer period, and whether there are trends in specific areas.

In addition, I hope to gain insight into how the different kinds of inequality relate to one another, and whether they influence each other.

# Part 2: Data Cleaning with Python

## 2.1 First Dataset - Inequality in Income Across the Globe

In [1]:
# import Pandas
import pandas as pd

In [2]:
# import income data
income = pd.read_csv("C:/Users/snzk2/Documents/global_inequality/Inequality_in_Income_Across_the_Globe/Inequality_in_Income.csv")

In [3]:
# print the first 10 rows of dataset
income.head(10)

Unnamed: 0,ISO3,Country,Continent,Hemisphere,Human Development Groups,UNDP Developing Regions,HDI Rank (2021),Inequality in income (2010),Inequality in income (2011),Inequality in income (2012),Inequality in income (2013),Inequality in income (2014),Inequality in income (2015),Inequality in income (2016),Inequality in income (2017),Inequality in income (2018),Inequality in income (2019),Inequality in income (2020),Inequality in income (2021)
0,AFG,Afghanistan,Asia,Northern Hemisphere,Low,SA,180.0,10.767,10.767,10.767,,,,,,,,,
1,AGO,Angola,Africa,Southern Hemisphere,Medium,SSA,148.0,24.454,28.94534,28.94534,28.94534,28.94534,28.94534,28.87733,28.87733,28.87733,28.87733,28.87733,28.87733
2,ALB,Albania,Europe,Northern Hemisphere,High,ECA,67.0,18.33096,18.33096,12.237393,12.237393,12.237393,12.237393,12.237393,12.237393,13.17898,13.17898,13.17898,13.17898
3,AND,Andorra,Europe,Northern Hemisphere,Very High,,40.0,,,,,,,,,,,,
4,ARE,United Arab Emirates,Asia,Northern Hemisphere,Very High,AS,26.0,,,,,,,,,,,,
5,ARG,Argentina,America,Southern Hemisphere,Very High,LAC,47.0,28.898242,27.705847,26.45734,25.898772,26.390125,25.531354,26.977223,25.978698,26.405909,28.099767,28.099767,28.099767
6,ARM,Armenia,Asia,Northern Hemisphere,High,ECA,85.0,14.25436,14.25436,14.25436,14.25436,14.25436,14.25436,17.43837,17.43837,17.43837,17.43837,17.43837,17.43837
7,ATG,Antigua and Barbuda,America,Northern Hemisphere,High,LAC,71.0,,,,,,,,,,,,
8,AUS,Australia,Oceania,Southern Hemisphere,Very High,,5.0,17.7,17.7,17.7,17.7,17.296,17.296,17.134,17.134,17.126,17.126,17.126,17.126
9,AUT,Austria,Europe,Northern Hemisphere,Very High,,25.0,12.69666,13.84098,15.4998,15.50253,14.165,13.61598,14.77588,15.90224,13.39422,13.39422,14.93291,14.93291


In [4]:
# examine missing values
print(income.isnull().mean())
print("")

ISO3                           0.000000
Country                        0.000000
Continent                      0.000000
Hemisphere                     0.000000
Human Development Groups       0.020513
UNDP Developing Regions        0.225641
HDI Rank (2021)                0.020513
Inequality in income (2010)    0.317949
Inequality in income (2011)    0.271795
Inequality in income (2012)    0.246154
Inequality in income (2013)    0.215385
Inequality in income (2014)    0.210256
Inequality in income (2015)    0.215385
Inequality in income (2016)    0.210256
Inequality in income (2017)    0.215385
Inequality in income (2018)    0.200000
Inequality in income (2019)    0.194872
Inequality in income (2020)    0.179487
Inequality in income (2021)    0.179487
dtype: float64
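
The per-column shares above show how much is missing, but not which countries are affected. Below is a minimal sketch of one way to list countries with no observation in any year column. It runs on a small stand-in frame rather than the real file. `sample`, `year_cols`, `missing_share`, and `all_missing` are hypothetical names, and the values are copied from the preview rows for illustration only.

```python
import pandas as pd

# Illustrative stand-in for the income dataframe (three countries, two years);
# the values mirror rows shown in the preview above.
sample = pd.DataFrame({
    "ISO3": ["AFG", "AND", "ARG"],
    "Inequality in income (2010)": [10.767, None, 28.898242],
    "Inequality in income (2011)": [10.767, None, 27.705847],
})

# Year columns are identified by their shared name prefix.
year_cols = [c for c in sample.columns if c.startswith("Inequality in income")]

# Share of missing values per year column, largest first.
missing_share = sample[year_cols].isnull().mean().sort_values(ascending=False)

# Countries that have no value in any year column.
all_missing = sample.loc[sample[year_cols].isnull().all(axis=1), "ISO3"].tolist()

print(missing_share)
print(all_missing)
```

Applied to the real `income` dataframe, the same two expressions would separate countries that are missing entirely (such as Andorra in the preview) from those missing only some years.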



In [5]:
# check datatype in each column
print(income.dtypes)

ISO3                            object
Country                         object
Continent                       object
Hemisphere                      object
Human Development Groups        object
UNDP Developing Regions         object
HDI Rank (2021)                float64
Inequality in income (2010)    float64
Inequality in income (2011)    float64
Inequality in income (2012)    float64
Inequality in income (2013)    float64
Inequality in income (2014)    float64
Inequality in income (2015)    float64
Inequality in income (2016)    float64
Inequality in income (2017)    float64
Inequality in income (2018)    float64
Inequality in income (2019)    float64
Inequality in income (2020)    float64
Inequality in income (2021)    float64
dtype: object
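
`HDI Rank (2021)` is reported as `float64` even though ranks are whole numbers, which is what pandas does by default when an integer column contains missing values. If integer ranks are preferred, one option is the nullable `Int64` dtype. The sketch below uses a made-up three-value series, not the real column.

```python
import pandas as pd

# Illustrative ranks, including one missing value as in the real column.
hdi_rank = pd.Series([180.0, 148.0, None], name="HDI Rank (2021)")

# A missing value forces the default dtype to float64.
print(hdi_rank.dtype)

# The nullable integer dtype "Int64" (capital I) keeps whole numbers
# and represents the gap as <NA>.
hdi_rank_int = hdi_rank.astype("Int64")
print(hdi_rank_int)
```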


## 2.2 Second Dataset - Gender Inequality Index by Country

In [6]:
# import gender data
gender = pd.read_csv("C:/Users/snzk2/Documents/global_inequality/Gender_Inequality_Index_by_Country/Gender_Inequality_Index.csv")

In [7]:
# print the first 10 rows of dataset
gender.head(10)

Unnamed: 0,ISO3,Country,Continent,Hemisphere,Human Development Groups,UNDP Developing Regions,HDI Rank (2021),GII Rank (2021),Gender Inequality Index (1990),Gender Inequality Index (1991),...,Gender Inequality Index (2012),Gender Inequality Index (2013),Gender Inequality Index (2014),Gender Inequality Index (2015),Gender Inequality Index (2016),Gender Inequality Index (2017),Gender Inequality Index (2018),Gender Inequality Index (2019),Gender Inequality Index (2020),Gender Inequality Index (2021)
0,AFG,Afghanistan,Asia,Northern Hemisphere,Low,SA,180.0,167.0,,,...,0.738,0.728,0.718,0.706,0.692,0.678,0.671,0.665,0.674,0.678
1,AGO,Angola,Africa,Southern Hemisphere,Medium,SSA,148.0,136.0,0.725,0.723,...,0.545,0.54,0.531,0.53,0.529,0.538,0.537,0.537,0.537,0.537
2,ALB,Albania,Europe,Northern Hemisphere,High,ECA,67.0,39.0,,,...,0.235,0.225,0.219,0.204,0.191,0.17,0.164,0.156,0.156,0.144
3,AND,Andorra,Europe,Northern Hemisphere,Very High,,40.0,,,,...,,,,,,,,,,
4,ARE,United Arab Emirates,Asia,Northern Hemisphere,Very High,AS,26.0,11.0,0.659,0.647,...,0.171,0.161,0.151,0.126,0.118,0.112,0.103,0.056,0.05,0.049
5,ARG,Argentina,America,Southern Hemisphere,Very High,LAC,47.0,69.0,0.442,0.439,...,0.36,0.356,0.351,0.345,0.338,0.328,0.315,0.306,0.293,0.287
6,ARM,Armenia,Asia,Northern Hemisphere,High,ECA,85.0,53.0,0.47,0.468,...,0.327,0.305,0.313,0.309,0.306,0.261,0.26,0.239,0.239,0.216
7,ATG,Antigua and Barbuda,America,Northern Hemisphere,High,LAC,71.0,,,,...,,,,,,,,,,
8,AUS,Australia,Oceania,Southern Hemisphere,Very High,,5.0,19.0,0.185,0.184,...,0.13,0.123,0.117,0.111,0.101,0.096,0.092,0.08,0.074,0.073
9,AUT,Austria,Europe,Northern Hemisphere,Very High,,25.0,12.0,0.202,0.204,...,0.098,0.086,0.083,0.084,0.082,0.072,0.061,0.054,0.052,0.053


In [8]:
# examine missing values
print(gender.isnull().mean())
print("")

ISO3                              0.000000
Country                           0.000000
Continent                         0.000000
Hemisphere                        0.000000
Human Development Groups          0.020513
UNDP Developing Regions           0.225641
HDI Rank (2021)                   0.020513
GII Rank (2021)                   0.128205
Gender Inequality Index (1990)    0.343590
Gender Inequality Index (1991)    0.343590
Gender Inequality Index (1992)    0.343590
Gender Inequality Index (1993)    0.343590
Gender Inequality Index (1994)    0.343590
Gender Inequality Index (1995)    0.333333
Gender Inequality Index (1996)    0.333333
Gender Inequality Index (1997)    0.333333
Gender Inequality Index (1998)    0.302564
Gender Inequality Index (1999)    0.276923
Gender Inequality Index (2000)    0.261538
Gender Inequality Index (2001)    0.235897
Gender Inequality Index (2002)    0.230769
Gender Inequality Index (2003)    0.215385
Gender Inequality Index (2004)    0.200000
Gender Ineq

In [9]:
# check datatype in each column
print(gender.dtypes)

ISO3                               object
Country                            object
Continent                          object
Hemisphere                         object
Human Development Groups           object
UNDP Developing Regions            object
HDI Rank (2021)                   float64
GII Rank (2021)                   float64
Gender Inequality Index (1990)    float64
Gender Inequality Index (1991)    float64
Gender Inequality Index (1992)    float64
Gender Inequality Index (1993)    float64
Gender Inequality Index (1994)    float64
Gender Inequality Index (1995)    float64
Gender Inequality Index (1996)    float64
Gender Inequality Index (1997)    float64
Gender Inequality Index (1998)    float64
Gender Inequality Index (1999)    float64
Gender Inequality Index (2000)    float64
Gender Inequality Index (2001)    float64
Gender Inequality Index (2002)    float64
Gender Inequality Index (2003)    float64
Gender Inequality Index (2004)    float64
Gender Inequality Index (2005)    

## 2.3 Third Dataset - Inequality in Education Around the World

In [10]:
# import education data
education = pd.read_csv("C:/Users/snzk2/Documents/global_inequality/Inequality_in_Education_Around_the_World/Inequality_in_Education.csv")

In [11]:
# print the first 10 rows of dataset
education.head(10)

Unnamed: 0,ISO3,Country,Human Development Groups,UNDP Developing Regions,HDI Rank (2021),Inequality in Education (2010),Inequality in Education (2011),Inequality in Education (2012),Inequality in Education (2013),Inequality in Education (2014),Inequality in Education (2015),Inequality in Education (2016),Inequality in Education (2017),Inequality in Education (2018),Inequality in Education (2019),Inequality in Education (2020),Inequality in Education (2021)
0,AFG,Afghanistan,Low,SA,180.0,42.809,44.82338,44.82338,44.82338,44.82338,45.36517,45.36517,45.36517,45.36517,45.36517,45.36517,45.36517
1,AGO,Angola,Medium,SSA,148.0,,,,,,34.17144,34.17144,34.17144,34.17144,34.17144,34.17144,34.17144
2,ALB,Albania,High,ECA,67.0,11.9,11.9,11.9,11.9,11.9,11.9,11.9,12.33344,12.33344,12.33344,12.33344,12.33344
3,AND,Andorra,Very High,,40.0,15.160302,15.160302,15.160302,15.160302,9.965681,10.083815,10.008154,10.008154,10.008154,10.008154,10.008154,10.008154
4,ARE,United Arab Emirates,Very High,AS,26.0,,,,,,,18.241437,14.475335,12.634355,12.634355,12.634355,12.634355
5,ARG,Argentina,Very High,LAC,47.0,6.91431,6.8309,6.43056,6.51336,6.19228,6.33131,6.15194,6.20843,5.95181,5.78729,5.78729,5.78729
6,ARM,Armenia,High,ECA,85.0,3.685,3.685,3.685,3.685,3.685,3.685,2.93495,2.93495,2.93495,2.93495,2.93495,2.93495
7,ATG,Antigua and Barbuda,High,LAC,71.0,,,,,,,,,,,,
8,AUS,Australia,Very High,,5.0,2.746427,2.477821,2.475142,2.935539,2.832901,2.720964,2.672566,3.043233,3.05657,3.05657,3.05657,3.05657
9,AUT,Austria,Very High,,25.0,2.48836,3.66724,3.48986,4.29767,2.65811,2.60756,2.57064,2.9996,2.91745,2.91745,2.52241,2.52241


In [12]:
# examine missing values
print(education.isnull().mean())
print("")

ISO3                              0.000000
Country                           0.000000
Human Development Groups          0.020513
UNDP Developing Regions           0.225641
HDI Rank (2021)                   0.020513
Inequality in Education (2010)    0.297436
Inequality in Education (2011)    0.230769
Inequality in Education (2012)    0.194872
Inequality in Education (2013)    0.153846
Inequality in Education (2014)    0.138462
Inequality in Education (2015)    0.138462
Inequality in Education (2016)    0.138462
Inequality in Education (2017)    0.138462
Inequality in Education (2018)    0.117949
Inequality in Education (2019)    0.107692
Inequality in Education (2020)    0.097436
Inequality in Education (2021)    0.097436
dtype: float64



In [13]:
# check datatype in each column
print(education.dtypes)

ISO3                               object
Country                            object
Human Development Groups           object
UNDP Developing Regions            object
HDI Rank (2021)                   float64
Inequality in Education (2010)    float64
Inequality in Education (2011)    float64
Inequality in Education (2012)    float64
Inequality in Education (2013)    float64
Inequality in Education (2014)    float64
Inequality in Education (2015)    float64
Inequality in Education (2016)    float64
Inequality in Education (2017)    float64
Inequality in Education (2018)    float64
Inequality in Education (2019)    float64
Inequality in Education (2020)    float64
Inequality in Education (2021)    float64
dtype: object


## 2.4 Combine Datasets into One

In [14]:
# delete duplicate columns from gender dataframe
gender_dropped = gender.drop(columns=['Country',
                                      'Continent',
                                      'Hemisphere',
                                      'Human Development Groups',
                                      'UNDP Developing Regions',
                                      'HDI Rank (2021)'
                                       ])
gender_dropped

Unnamed: 0,ISO3,GII Rank (2021),Gender Inequality Index (1990),Gender Inequality Index (1991),Gender Inequality Index (1992),Gender Inequality Index (1993),Gender Inequality Index (1994),Gender Inequality Index (1995),Gender Inequality Index (1996),Gender Inequality Index (1997),...,Gender Inequality Index (2012),Gender Inequality Index (2013),Gender Inequality Index (2014),Gender Inequality Index (2015),Gender Inequality Index (2016),Gender Inequality Index (2017),Gender Inequality Index (2018),Gender Inequality Index (2019),Gender Inequality Index (2020),Gender Inequality Index (2021)
0,AFG,167.0,,,,,,,,,...,0.738,0.728,0.718,0.706,0.692,0.678,0.671,0.665,0.674,0.678
1,AGO,136.0,0.725,0.723,0.721,0.719,0.717,0.714,0.712,0.708,...,0.545,0.540,0.531,0.530,0.529,0.538,0.537,0.537,0.537,0.537
2,ALB,39.0,,,,,,,,,...,0.235,0.225,0.219,0.204,0.191,0.170,0.164,0.156,0.156,0.144
3,AND,,,,,,,,,,...,,,,,,,,,,
4,ARE,11.0,0.659,0.647,0.640,0.632,0.635,0.627,0.624,0.616,...,0.171,0.161,0.151,0.126,0.118,0.112,0.103,0.056,0.050,0.049
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
190,WSM,99.0,,,,,,,,,...,0.482,0.475,0.447,0.443,0.409,0.407,0.407,0.405,0.404,0.418
191,YEM,170.0,,,,,,,,,...,0.800,0.802,0.806,0.815,0.810,0.808,0.806,0.785,0.784,0.820
192,ZAF,97.0,0.511,0.502,0.496,0.489,0.482,0.476,0.472,0.468,...,0.432,0.431,0.433,0.427,0.418,0.407,0.405,0.410,0.408,0.405
193,ZMB,138.0,0.666,0.660,0.654,0.651,0.647,0.642,0.645,0.646,...,0.585,0.581,0.579,0.567,0.545,0.540,0.537,0.534,0.535,0.540


In [15]:
# delete duplicate columns from education dataframe
education_dropped = education.drop(columns=['Country',
                                            'Human Development Groups',
                                            'UNDP Developing Regions',
                                            'HDI Rank (2021)'
                                            ])
education_dropped

Unnamed: 0,ISO3,Inequality in Education (2010),Inequality in Education (2011),Inequality in Education (2012),Inequality in Education (2013),Inequality in Education (2014),Inequality in Education (2015),Inequality in Education (2016),Inequality in Education (2017),Inequality in Education (2018),Inequality in Education (2019),Inequality in Education (2020),Inequality in Education (2021)
0,AFG,42.809000,44.823380,44.823380,44.823380,44.823380,45.365170,45.365170,45.365170,45.365170,45.365170,45.365170,45.365170
1,AGO,,,,,,34.171440,34.171440,34.171440,34.171440,34.171440,34.171440,34.171440
2,ALB,11.900000,11.900000,11.900000,11.900000,11.900000,11.900000,11.900000,12.333440,12.333440,12.333440,12.333440,12.333440
3,AND,15.160302,15.160302,15.160302,15.160302,9.965681,10.083815,10.008154,10.008154,10.008154,10.008154,10.008154,10.008154
4,ARE,,,,,,,18.241437,14.475335,12.634355,12.634355,12.634355,12.634355
...,...,...,...,...,...,...,...,...,...,...,...,...,...
190,WSM,,,,,,,,,,,7.029960,7.029960
191,YEM,48.090120,48.090120,48.090120,46.137650,46.137650,46.137650,46.137650,46.137650,46.137650,46.137650,46.137650,46.137650
192,ZAF,,,16.060770,16.060770,16.060770,14.265590,17.294130,17.294130,17.294130,17.294130,17.294130,17.294130
193,ZMB,23.760000,23.760000,23.760000,23.760000,21.734910,21.734910,21.734910,21.734910,20.435720,20.435720,20.435720,20.435720


In [16]:
# merge the three dataframes on the ISO3 country code
merged = (
    income
    .merge(gender_dropped, on='ISO3', how='left')
    .merge(education_dropped, on='ISO3', how='outer')
)
merged

Unnamed: 0,ISO3,Country,Continent,Hemisphere,Human Development Groups,UNDP Developing Regions,HDI Rank (2021),Inequality in income (2010),Inequality in income (2011),Inequality in income (2012),...,Inequality in Education (2012),Inequality in Education (2013),Inequality in Education (2014),Inequality in Education (2015),Inequality in Education (2016),Inequality in Education (2017),Inequality in Education (2018),Inequality in Education (2019),Inequality in Education (2020),Inequality in Education (2021)
0,AFG,Afghanistan,Asia,Northern Hemisphere,Low,SA,180.0,10.76700,10.76700,10.767000,...,44.823380,44.823380,44.823380,45.365170,45.365170,45.365170,45.365170,45.365170,45.365170,45.365170
1,AGO,Angola,Africa,Southern Hemisphere,Medium,SSA,148.0,24.45400,28.94534,28.945340,...,,,,34.171440,34.171440,34.171440,34.171440,34.171440,34.171440,34.171440
2,ALB,Albania,Europe,Northern Hemisphere,High,ECA,67.0,18.33096,18.33096,12.237393,...,11.900000,11.900000,11.900000,11.900000,11.900000,12.333440,12.333440,12.333440,12.333440,12.333440
3,AND,Andorra,Europe,Northern Hemisphere,Very High,,40.0,,,,...,15.160302,15.160302,9.965681,10.083815,10.008154,10.008154,10.008154,10.008154,10.008154,10.008154
4,ARE,United Arab Emirates,Asia,Northern Hemisphere,Very High,AS,26.0,,,,...,,,,,18.241437,14.475335,12.634355,12.634355,12.634355,12.634355
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
190,WSM,Samoa,Oceania,Southern Hemisphere,High,EAP,111.0,,,,...,,,,,,,,,7.029960,7.029960
191,YEM,Yemen,Asia,Northern Hemisphere,Low,AS,183.0,20.57543,20.57543,20.575430,...,48.090120,46.137650,46.137650,46.137650,46.137650,46.137650,46.137650,46.137650,46.137650,46.137650
192,ZAF,South Africa,Africa,Southern Hemisphere,High,SSA,109.0,57.26800,57.26800,54.705000,...,16.060770,16.060770,16.060770,14.265590,17.294130,17.294130,17.294130,17.294130,17.294130,17.294130
193,ZMB,Zambia,Africa,Southern Hemisphere,Medium,SSA,154.0,42.62731,42.62731,42.627310,...,23.760000,23.760000,21.734910,21.734910,21.734910,21.734910,20.435720,20.435720,20.435720,20.435720
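
The merge uses `how='left'` for the gender data and `how='outer'` for the education data. An outer join keeps rows whose `ISO3` code appears in only one of the two frames, so it is worth checking that the keys line up. The following is a minimal sketch on toy frames. `income_s`, `education_s`, the numeric values, and the code `XXX` are made up for illustration. `validate` and `indicator` are standard `DataFrame.merge` parameters.

```python
import pandas as pd

# Toy frames: "ALB" exists only on the left, the made-up code "XXX" only on the right.
income_s = pd.DataFrame({"ISO3": ["AFG", "AGO", "ALB"], "income": [10.8, 24.5, 18.3]})
education_s = pd.DataFrame({"ISO3": ["AFG", "AGO", "XXX"], "education": [42.8, 34.2, 1.0]})

checked = income_s.merge(
    education_s,
    on="ISO3",
    how="outer",
    validate="one_to_one",  # raises MergeError if a key is duplicated on either side
    indicator=True,         # adds a "_merge" column: left_only / right_only / both
)

# Rows that did not find a partner in the other frame.
only_one_side = checked.loc[checked["_merge"] != "both", ["ISO3", "_merge"]]
print(only_one_side)
```

If `only_one_side` is empty for the real data, the left and outer joins give the same rows and the choice between them does not matter.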


In [17]:
# check column names and duplicates
merged.columns

Index(['ISO3', 'Country', 'Continent', 'Hemisphere',
       'Human Development Groups', 'UNDP Developing Regions',
       'HDI Rank (2021)', 'Inequality in income (2010)',
       'Inequality in income (2011)', 'Inequality in income (2012)',
       'Inequality in income (2013)', 'Inequality in income (2014)',
       'Inequality in income (2015)', 'Inequality in income (2016)',
       'Inequality in income (2017)', 'Inequality in income (2018)',
       'Inequality in income (2019)', 'Inequality in income (2020)',
       'Inequality in income (2021)', 'GII Rank (2021)',
       'Gender Inequality Index (1990)', 'Gender Inequality Index (1991)',
       'Gender Inequality Index (1992)', 'Gender Inequality Index (1993)',
       'Gender Inequality Index (1994)', 'Gender Inequality Index (1995)',
       'Gender Inequality Index (1996)', 'Gender Inequality Index (1997)',
       'Gender Inequality Index (1998)', 'Gender Inequality Index (1999)',
       'Gender Inequality Index (2000)', 'Gender In

In [18]:
# check datatype in each column
print(merged.dtypes)

ISO3                               object
Country                            object
Continent                          object
Hemisphere                         object
Human Development Groups           object
                                   ...   
Inequality in Education (2017)    float64
Inequality in Education (2018)    float64
Inequality in Education (2019)    float64
Inequality in Education (2020)    float64
Inequality in Education (2021)    float64
Length: 64, dtype: object


In [19]:
# select columns with string/mixed type values
merged_str_columns = merged[['ISO3',
                             'Country',
                             'Continent',
                             'Hemisphere',
                             'Human Development Groups',
                             'UNDP Developing Regions']]

# remove leading and trailing whitespace from strings
for i in merged_str_columns:
    merged[i] = merged[i].str.strip()

In [20]:
# replace NaN with ""
merged = merged.fillna('')
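
One side effect of `fillna('')` is that numeric columns containing gaps stop being numeric, because they then hold a mix of floats and empty strings. If the empty cells are only needed in the exported file, an alternative is to keep `NaN` in the dataframe and rely on `to_csv`, which writes missing values as empty fields by default (`na_rep=''`). The sketch below shows both behaviours on a toy frame. `sample`, `filled`, and `csv_lines` are hypothetical names.

```python
import pandas as pd

# Toy frame with one missing numeric value.
sample = pd.DataFrame({"ISO3": ["AFG", "AND"], "score": [10.767, None]})

# Filling with "" mixes strings and floats, so the column is no longer float64.
filled = sample.fillna("")
print(filled["score"].dtype)

# to_csv writes missing values as empty fields by default,
# so the numeric dtype can be kept until export.
csv_lines = sample.to_csv(index=False).splitlines()
print(csv_lines)
```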

In [21]:
merged

Unnamed: 0,ISO3,Country,Continent,Hemisphere,Human Development Groups,UNDP Developing Regions,HDI Rank (2021),Inequality in income (2010),Inequality in income (2011),Inequality in income (2012),...,Inequality in Education (2012),Inequality in Education (2013),Inequality in Education (2014),Inequality in Education (2015),Inequality in Education (2016),Inequality in Education (2017),Inequality in Education (2018),Inequality in Education (2019),Inequality in Education (2020),Inequality in Education (2021)
0,AFG,Afghanistan,Asia,Northern Hemisphere,Low,SA,180.0,10.767,10.767,10.767,...,44.82338,44.82338,44.82338,45.36517,45.36517,45.36517,45.36517,45.36517,45.36517,45.36517
1,AGO,Angola,Africa,Southern Hemisphere,Medium,SSA,148.0,24.454,28.94534,28.94534,...,,,,34.17144,34.17144,34.17144,34.17144,34.17144,34.17144,34.17144
2,ALB,Albania,Europe,Northern Hemisphere,High,ECA,67.0,18.33096,18.33096,12.237393,...,11.9,11.9,11.9,11.9,11.9,12.33344,12.33344,12.33344,12.33344,12.33344
3,AND,Andorra,Europe,Northern Hemisphere,Very High,,40.0,,,,...,15.160302,15.160302,9.965681,10.083815,10.008154,10.008154,10.008154,10.008154,10.008154,10.008154
4,ARE,United Arab Emirates,Asia,Northern Hemisphere,Very High,AS,26.0,,,,...,,,,,18.241437,14.475335,12.634355,12.634355,12.634355,12.634355
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
190,WSM,Samoa,Oceania,Southern Hemisphere,High,EAP,111.0,,,,...,,,,,,,,,7.02996,7.02996
191,YEM,Yemen,Asia,Northern Hemisphere,Low,AS,183.0,20.57543,20.57543,20.57543,...,48.09012,46.13765,46.13765,46.13765,46.13765,46.13765,46.13765,46.13765,46.13765,46.13765
192,ZAF,South Africa,Africa,Southern Hemisphere,High,SSA,109.0,57.268,57.268,54.705,...,16.06077,16.06077,16.06077,14.26559,17.29413,17.29413,17.29413,17.29413,17.29413,17.29413
193,ZMB,Zambia,Africa,Southern Hemisphere,Medium,SSA,154.0,42.62731,42.62731,42.62731,...,23.76,23.76,21.73491,21.73491,21.73491,21.73491,20.43572,20.43572,20.43572,20.43572


In [22]:
# import latitude and longitude data from the Flourish template
geo = pd.read_csv("C:/Users/snzk2/Documents/global_inequality/latitude_longitude.csv")
geo

Unnamed: 0,Country,Longitude,Latitude
0,New Zealand,174.77600,-41.2865
1,Australia,149.12900,-35.2820
2,Uruguay,-56.06750,-34.8941
3,Argentina,-58.41730,-34.6118
4,Chile,-70.64750,-33.4750
...,...,...,...
205,Norway,10.73870,59.9138
206,Finland,24.95250,60.1608
207,Faroe Islands,-6.91181,61.8926
208,Iceland,-21.89520,64.1353


In [23]:
# add latitude and longitude data to merged dataframe
merged_gcs = merged.merge(geo, on='Country', how='left')
merged_gcs

Unnamed: 0,ISO3,Country,Continent,Hemisphere,Human Development Groups,UNDP Developing Regions,HDI Rank (2021),Inequality in income (2010),Inequality in income (2011),Inequality in income (2012),...,Inequality in Education (2014),Inequality in Education (2015),Inequality in Education (2016),Inequality in Education (2017),Inequality in Education (2018),Inequality in Education (2019),Inequality in Education (2020),Inequality in Education (2021),Longitude,Latitude
0,AFG,Afghanistan,Asia,Northern Hemisphere,Low,SA,180.0,10.767,10.767,10.767,...,44.82338,45.36517,45.36517,45.36517,45.36517,45.36517,45.36517,45.36517,69.1761,34.52280
1,AGO,Angola,Africa,Southern Hemisphere,Medium,SSA,148.0,24.454,28.94534,28.94534,...,,34.17144,34.17144,34.17144,34.17144,34.17144,34.17144,34.17144,13.2420,-8.81155
2,ALB,Albania,Europe,Northern Hemisphere,High,ECA,67.0,18.33096,18.33096,12.237393,...,11.9,11.9,11.9,12.33344,12.33344,12.33344,12.33344,12.33344,19.8172,41.33170
3,AND,Andorra,Europe,Northern Hemisphere,Very High,,40.0,,,,...,9.965681,10.083815,10.008154,10.008154,10.008154,10.008154,10.008154,10.008154,1.5218,42.50750
4,ARE,United Arab Emirates,Asia,Northern Hemisphere,Very High,AS,26.0,,,,...,,,18.241437,14.475335,12.634355,12.634355,12.634355,12.634355,54.3705,24.47640
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
190,WSM,Samoa,Oceania,Southern Hemisphere,High,EAP,111.0,,,,...,,,,,,,7.02996,7.02996,-171.7520,-13.83140
191,YEM,Yemen,Asia,Northern Hemisphere,Low,AS,183.0,20.57543,20.57543,20.57543,...,46.13765,46.13765,46.13765,46.13765,46.13765,46.13765,46.13765,46.13765,,
192,ZAF,South Africa,Africa,Southern Hemisphere,High,SSA,109.0,57.268,57.268,54.705,...,16.06077,14.26559,17.29413,17.29413,17.29413,17.29413,17.29413,17.29413,28.1871,-25.74600
193,ZMB,Zambia,Africa,Southern Hemisphere,Medium,SSA,154.0,42.62731,42.62731,42.62731,...,21.73491,21.73491,21.73491,21.73491,20.43572,20.43572,20.43572,20.43572,28.2937,-15.39820
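
In the output above, Yemen has no longitude or latitude. Because the join key is the country name, this can happen either when a country is absent from the coordinate file or when its name is spelled differently there. A minimal sketch for listing such rows follows. The frames are toy data, and the alternative spelling and the second pair of coordinates are placeholders invented for illustration, not taken from the real file.

```python
import pandas as pd

countries = pd.DataFrame({"Country": ["Afghanistan", "Yemen"]})

# Hypothetical coordinate table in which one name is spelled differently.
geo_s = pd.DataFrame({
    "Country": ["Afghanistan", "Yemen, Rep."],
    "Longitude": [69.1761, 44.2],
    "Latitude": [34.5228, 15.35],
})

with_geo = countries.merge(geo_s, on="Country", how="left")

# Countries that received no coordinates from the left join.
unmatched = with_geo.loc[with_geo["Latitude"].isna(), "Country"].tolist()
print(unmatched)
```

Running the same check on `merged_gcs` would give the full list of countries to fix by hand or to map via `ISO3` before exporting.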


In [24]:
# export the cleaned dataset
merged_gcs.to_csv(r"C:/Users/snzk2/Documents/global_inequality/merged_gcs.csv", index=False)

## 2.5 Datasets for Average by Continent

### 2.5.1 Income Inequality Average by Continent

In [25]:
# delete columns not needed for the continent average
income_dropped = income.drop([
    'ISO3',
    'Country',
    'Hemisphere',
    'Human Development Groups',
    'UNDP Developing Regions',
    'HDI Rank (2021)'],
    axis=1
)

income_dropped.head(10)

Unnamed: 0,Continent,Inequality in income (2010),Inequality in income (2011),Inequality in income (2012),Inequality in income (2013),Inequality in income (2014),Inequality in income (2015),Inequality in income (2016),Inequality in income (2017),Inequality in income (2018),Inequality in income (2019),Inequality in income (2020),Inequality in income (2021)
0,Asia,10.767,10.767,10.767,,,,,,,,,
1,Africa,24.454,28.94534,28.94534,28.94534,28.94534,28.94534,28.87733,28.87733,28.87733,28.87733,28.87733,28.87733
2,Europe,18.33096,18.33096,12.237393,12.237393,12.237393,12.237393,12.237393,12.237393,13.17898,13.17898,13.17898,13.17898
3,Europe,,,,,,,,,,,,
4,Asia,,,,,,,,,,,,
5,America,28.898242,27.705847,26.45734,25.898772,26.390125,25.531354,26.977223,25.978698,26.405909,28.099767,28.099767,28.099767
6,Asia,14.25436,14.25436,14.25436,14.25436,14.25436,14.25436,17.43837,17.43837,17.43837,17.43837,17.43837,17.43837
7,America,,,,,,,,,,,,
8,Oceania,17.7,17.7,17.7,17.7,17.296,17.296,17.134,17.134,17.126,17.126,17.126,17.126
9,Europe,12.69666,13.84098,15.4998,15.50253,14.165,13.61598,14.77588,15.90224,13.39422,13.39422,14.93291,14.93291


In [26]:
# examine missing values
print(income_dropped.isnull().mean())
print("")

Continent                      0.000000
Inequality in income (2010)    0.317949
Inequality in income (2011)    0.271795
Inequality in income (2012)    0.246154
Inequality in income (2013)    0.215385
Inequality in income (2014)    0.210256
Inequality in income (2015)    0.215385
Inequality in income (2016)    0.210256
Inequality in income (2017)    0.215385
Inequality in income (2018)    0.200000
Inequality in income (2019)    0.194872
Inequality in income (2020)    0.179487
Inequality in income (2021)    0.179487
dtype: float64



In [27]:
# average each year by Continent (mean() skips missing values)
income_grouped = income_dropped.groupby('Continent').mean().reset_index()
print(income_grouped)

  Continent  Inequality in income (2010)  Inequality in income (2011)  \
0    Africa                    30.852983                    30.411029   
1   America                    32.518300                    32.385245   
2      Asia                    20.517847                    20.051855   
3    Europe                    14.718189                    15.453847   
4   Oceania                    22.115326                    20.711818   

   Inequality in income (2012)  Inequality in income (2013)  \
0                    30.116782                    29.038429   
1                    32.419759                    32.315194   
2                    19.193918                    19.557435   
3                    15.445028                    15.569503   
4                    20.247181                    20.879674   

   Inequality in income (2014)  Inequality in income (2015)  \
0                    29.268111                    28.516585   
1                    31.584749                    31.233
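
Since `mean()` skips missing values, a continent average for a given year may rest on only a few countries, and on a different set of countries from year to year. A small sketch of how the number of contributing countries could be reported next to each mean is shown here. `sample` and `summary` are hypothetical names and the numbers are invented.

```python
import pandas as pd

# Toy data: Asia has one observed value and one gap, Europe has two values.
sample = pd.DataFrame({
    "Continent": ["Asia", "Asia", "Europe", "Europe"],
    "Inequality in income (2010)": [10.0, None, 12.0, 18.0],
})

col = "Inequality in income (2010)"

# "count" counts non-missing values, i.e. the countries behind each mean.
summary = sample.groupby("Continent")[col].agg(["mean", "count"]).reset_index()
print(summary)
```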

In [28]:
# export the cleaned dataset
income_grouped.to_csv(r"C:/Users/snzk2/Documents/global_inequality/Inequality_in_Income_Across_the_Globe/income_grouped.csv", index=False)

### 2.5.2 Gender Inequality Average by Continent

In [29]:
# delete columns not needed for the continent average; note that this overwrites gender_dropped from In [14]
gender_dropped = gender.drop([
    'ISO3',
    'Country',
    'Hemisphere',
    'Human Development Groups',
    'UNDP Developing Regions',
    'HDI Rank (2021)',
    'GII Rank (2021)'],
    axis=1
)

gender_dropped.head(10)

Unnamed: 0,Continent,Gender Inequality Index (1990),Gender Inequality Index (1991),Gender Inequality Index (1992),Gender Inequality Index (1993),Gender Inequality Index (1994),Gender Inequality Index (1995),Gender Inequality Index (1996),Gender Inequality Index (1997),Gender Inequality Index (1998),...,Gender Inequality Index (2012),Gender Inequality Index (2013),Gender Inequality Index (2014),Gender Inequality Index (2015),Gender Inequality Index (2016),Gender Inequality Index (2017),Gender Inequality Index (2018),Gender Inequality Index (2019),Gender Inequality Index (2020),Gender Inequality Index (2021)
0,Asia,,,,,,,,,,...,0.738,0.728,0.718,0.706,0.692,0.678,0.671,0.665,0.674,0.678
1,Africa,0.725,0.723,0.721,0.719,0.717,0.714,0.712,0.708,0.683,...,0.545,0.54,0.531,0.53,0.529,0.538,0.537,0.537,0.537,0.537
2,Europe,,,,,,,,,,...,0.235,0.225,0.219,0.204,0.191,0.17,0.164,0.156,0.156,0.144
3,Europe,,,,,,,,,,...,,,,,,,,,,
4,Asia,0.659,0.647,0.64,0.632,0.635,0.627,0.624,0.616,0.609,...,0.171,0.161,0.151,0.126,0.118,0.112,0.103,0.056,0.05,0.049
5,America,0.442,0.439,0.436,0.43,0.429,0.426,0.422,0.419,0.415,...,0.36,0.356,0.351,0.345,0.338,0.328,0.315,0.306,0.293,0.287
6,Asia,0.47,0.468,0.487,0.491,0.481,0.469,0.463,0.446,0.438,...,0.327,0.305,0.313,0.309,0.306,0.261,0.26,0.239,0.239,0.216
7,America,,,,,,,,,,...,,,,,,,,,,
8,Oceania,0.185,0.184,0.184,0.181,0.179,0.176,0.174,0.172,0.169,...,0.13,0.123,0.117,0.111,0.101,0.096,0.092,0.08,0.074,0.073
9,Europe,0.202,0.204,0.201,0.195,0.18,0.17,0.163,0.155,0.151,...,0.098,0.086,0.083,0.084,0.082,0.072,0.061,0.054,0.052,0.053


In [30]:
# examine missing values
print(gender_dropped.isnull().mean())
print("")

Continent                         0.000000
Gender Inequality Index (1990)    0.343590
Gender Inequality Index (1991)    0.343590
Gender Inequality Index (1992)    0.343590
Gender Inequality Index (1993)    0.343590
Gender Inequality Index (1994)    0.343590
Gender Inequality Index (1995)    0.333333
Gender Inequality Index (1996)    0.333333
Gender Inequality Index (1997)    0.333333
Gender Inequality Index (1998)    0.302564
Gender Inequality Index (1999)    0.276923
Gender Inequality Index (2000)    0.261538
Gender Inequality Index (2001)    0.235897
Gender Inequality Index (2002)    0.230769
Gender Inequality Index (2003)    0.215385
Gender Inequality Index (2004)    0.200000
Gender Inequality Index (2005)    0.184615
Gender Inequality Index (2006)    0.169231
Gender Inequality Index (2007)    0.164103
Gender Inequality Index (2008)    0.164103
Gender Inequality Index (2009)    0.164103
Gender Inequality Index (2010)    0.153846
Gender Inequality Index (2011)    0.143590
Gender Ineq

In [31]:
# average each year's index across the countries in each Continent
gender_grouped = gender_dropped.groupby('Continent').mean().reset_index()
print(gender_grouped)

  Continent  Gender Inequality Index (1990)  Gender Inequality Index (1991)  \
0    Africa                        0.668088                        0.665029   
1   America                        0.495240                        0.493320   
2      Asia                        0.526433                        0.522733   
3    Europe                        0.270382                        0.269412   
4   Oceania                        0.465600                        0.460800   

   Gender Inequality Index (1992)  Gender Inequality Index (1993)  \
0                        0.662471                        0.660265   
1                        0.490440                        0.487600   
2                        0.520933                        0.518000   
3                        0.264029                        0.259118   
4                        0.455200                        0.451400   

   Gender Inequality Index (1994)  Gender Inequality Index (1995)  \
0                        0.657265        
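
Because roughly a third of the early-year values are missing, it helps to recall how `groupby(...).mean()` treats them. The following is a minimal sketch on made-up toy values (not taken from the dataset): pandas skips `NaN` when averaging, so a continent's mean is based only on the countries that reported a value, and a group with no values at all yields `NaN`.

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical values, only to illustrate NaN handling
toy_gender = pd.DataFrame({
    "Continent": ["Europe", "Europe", "Oceania", "Oceania"],
    "Gender Inequality Index (1990)": [0.2, np.nan, np.nan, np.nan],
})

# mean() ignores NaN by default: Europe is averaged over one country only,
# while Oceania has no reported values and therefore stays NaN
toy_gender_grouped = toy_gender.groupby("Continent").mean().reset_index()
print(toy_gender_grouped)
```

The early-year continent averages above should therefore be read as averages over the reporting countries only, not over every country in the continent.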

In [32]:
# export the cleaned dataset
gender_grouped.to_csv(r"C:/Users/snzk2/Documents/global_inequality/Gender_Inequality_Index_by_Country/gender_grouped.csv", index=False)

### 2.5.3 Education Inequality Average by Continent

In [33]:
# add the Continent column from the income dataset, matching on ISO3
education_merged = pd.merge(
    education, income[['ISO3', 'Continent']], on='ISO3', how='left')

education_merged

Unnamed: 0,ISO3,Country,Human Development Groups,UNDP Developing Regions,HDI Rank (2021),Inequality in Education (2010),Inequality in Education (2011),Inequality in Education (2012),Inequality in Education (2013),Inequality in Education (2014),Inequality in Education (2015),Inequality in Education (2016),Inequality in Education (2017),Inequality in Education (2018),Inequality in Education (2019),Inequality in Education (2020),Inequality in Education (2021),Continent
0,AFG,Afghanistan,Low,SA,180.0,42.809000,44.823380,44.823380,44.823380,44.823380,45.365170,45.365170,45.365170,45.365170,45.365170,45.365170,45.365170,Asia
1,AGO,Angola,Medium,SSA,148.0,,,,,,34.171440,34.171440,34.171440,34.171440,34.171440,34.171440,34.171440,Africa
2,ALB,Albania,High,ECA,67.0,11.900000,11.900000,11.900000,11.900000,11.900000,11.900000,11.900000,12.333440,12.333440,12.333440,12.333440,12.333440,Europe
3,AND,Andorra,Very High,,40.0,15.160302,15.160302,15.160302,15.160302,9.965681,10.083815,10.008154,10.008154,10.008154,10.008154,10.008154,10.008154,Europe
4,ARE,United Arab Emirates,Very High,AS,26.0,,,,,,,18.241437,14.475335,12.634355,12.634355,12.634355,12.634355,Asia
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
190,WSM,Samoa,High,EAP,111.0,,,,,,,,,,,7.029960,7.029960,Oceania
191,YEM,Yemen,Low,AS,183.0,48.090120,48.090120,48.090120,46.137650,46.137650,46.137650,46.137650,46.137650,46.137650,46.137650,46.137650,46.137650,Asia
192,ZAF,South Africa,High,SSA,109.0,,,16.060770,16.060770,16.060770,14.265590,17.294130,17.294130,17.294130,17.294130,17.294130,17.294130,Africa
193,ZMB,Zambia,Medium,SSA,154.0,23.760000,23.760000,23.760000,23.760000,21.734910,21.734910,21.734910,21.734910,20.435720,20.435720,20.435720,20.435720,Africa


In [34]:
# drop identifier and metadata columns that are not needed for the continent averages
education_dropped = education_merged.drop([
    'ISO3',
    'Country',
    'Human Development Groups',
    'UNDP Developing Regions',
    'HDI Rank (2021)'],
    axis=1
)
# reorder so that Continent is the first column
education_dropped = education_dropped[[
    'Continent',
    'Inequality in Education (2010)',
    'Inequality in Education (2011)',
    'Inequality in Education (2012)',
    'Inequality in Education (2013)',
    'Inequality in Education (2014)',
    'Inequality in Education (2015)',
    'Inequality in Education (2016)',
    'Inequality in Education (2017)',
    'Inequality in Education (2018)',
    'Inequality in Education (2019)',
    'Inequality in Education (2020)',
    'Inequality in Education (2021)']]

education_dropped.head(10)

Unnamed: 0,Continent,Inequality in Education (2010),Inequality in Education (2011),Inequality in Education (2012),Inequality in Education (2013),Inequality in Education (2014),Inequality in Education (2015),Inequality in Education (2016),Inequality in Education (2017),Inequality in Education (2018),Inequality in Education (2019),Inequality in Education (2020),Inequality in Education (2021)
0,Asia,42.809,44.82338,44.82338,44.82338,44.82338,45.36517,45.36517,45.36517,45.36517,45.36517,45.36517,45.36517
1,Africa,,,,,,34.17144,34.17144,34.17144,34.17144,34.17144,34.17144,34.17144
2,Europe,11.9,11.9,11.9,11.9,11.9,11.9,11.9,12.33344,12.33344,12.33344,12.33344,12.33344
3,Europe,15.160302,15.160302,15.160302,15.160302,9.965681,10.083815,10.008154,10.008154,10.008154,10.008154,10.008154,10.008154
4,Asia,,,,,,,18.241437,14.475335,12.634355,12.634355,12.634355,12.634355
5,America,6.91431,6.8309,6.43056,6.51336,6.19228,6.33131,6.15194,6.20843,5.95181,5.78729,5.78729,5.78729
6,Asia,3.685,3.685,3.685,3.685,3.685,3.685,2.93495,2.93495,2.93495,2.93495,2.93495,2.93495
7,America,,,,,,,,,,,,
8,Oceania,2.746427,2.477821,2.475142,2.935539,2.832901,2.720964,2.672566,3.043233,3.05657,3.05657,3.05657,3.05657
9,Europe,2.48836,3.66724,3.48986,4.29767,2.65811,2.60756,2.57064,2.9996,2.91745,2.91745,2.52241,2.52241


In [35]:
# examine missing values
print(education_dropped.isnull().mean())
print("")

Continent                         0.000000
Inequality in Education (2010)    0.297436
Inequality in Education (2011)    0.230769
Inequality in Education (2012)    0.194872
Inequality in Education (2013)    0.153846
Inequality in Education (2014)    0.138462
Inequality in Education (2015)    0.138462
Inequality in Education (2016)    0.138462
Inequality in Education (2017)    0.138462
Inequality in Education (2018)    0.117949
Inequality in Education (2019)    0.107692
Inequality in Education (2020)    0.097436
Inequality in Education (2021)    0.097436
dtype: float64
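
The fractions above are computed over all countries together. To see whether the gaps are concentrated in particular continents, the same check can be broken down by group. This is a minimal sketch on made-up toy values, not the author's method:

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical values
toy_education = pd.DataFrame({
    "Continent": ["Africa", "Africa", "Europe", "Europe"],
    "Inequality in Education (2010)": [np.nan, 34.0, 11.9, 15.2],
    "Inequality in Education (2011)": [np.nan, np.nan, 11.9, 15.2],
})

# Share of missing values per continent and year:
# isnull() gives booleans, and their mean within each group is the missing fraction
missing_by_continent = (
    toy_education.drop(columns="Continent")
    .isnull()
    .groupby(toy_education["Continent"])
    .mean()
)
print(missing_by_continent)
```

A continent with a high missing share in a given year has an average backed by few countries, which matters when comparing trends across continents.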



In [36]:
# average each year's inequality value across the countries in each Continent
education_grouped = education_dropped.groupby('Continent').mean().reset_index()
print(education_grouped)

  Continent  Inequality in Education (2010)  Inequality in Education (2011)  \
0    Africa                       35.159143                       34.778685   
1   America                       17.641636                       16.745422   
2      Asia                       22.475217                       22.107345   
3    Europe                        6.190862                        5.797983   
4   Oceania                       12.623022                        9.420280   

   Inequality in Education (2012)  Inequality in Education (2013)  \
0                       33.902994                       34.190573   
1                       16.011335                       15.798229   
2                       21.392876                       20.143041   
3                        5.573296                        5.428061   
4                        9.418993                        8.665900   

   Inequality in Education (2014)  Inequality in Education (2015)  \
0                       34.159310        

In [37]:
# export the cleaned dataset
education_grouped.to_csv(r"C:/Users/snzk2/Documents/global_inequality/Inequality_in_Education_Around_the_World/education_grouped.csv", index=False)
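
The export above uses an absolute path that exists only on one machine. As an optional alternative, a path built with `pathlib` keeps the notebook portable. The sketch below assumes a hypothetical output folder (a temporary directory is used so it runs anywhere) and a one-row toy frame in place of `education_grouped`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for education_grouped (hypothetical single row)
grouped_toy = pd.DataFrame({
    "Continent": ["Africa"],
    "Inequality in Education (2010)": [35.159143],
})

# Hypothetical output folder; replace the temp dir with a project-relative folder
out_dir = Path(tempfile.mkdtemp()) / "Inequality_in_Education_Around_the_World"
out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing

out_path = out_dir / "education_grouped.csv"
grouped_toy.to_csv(out_path, index=False)

# Read the file back to confirm the export succeeded
round_trip = pd.read_csv(out_path)
print(round_trip.shape)
```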