
# **Final Project (IA 6650 - Python for Public Policy)**

**Name**: Fajri Anggraeni Ramadhani <br>
**UNI**: far2127 <br>
**Email**: far2127@columbia.edu <br>
**Program**: MPA - International Finance and Economic Policy <br>


# **Inequality and Shared Prosperity Across Countries**

## **A. Introduction**

Inequality reflects how economic welfare is shared among people, and one effective way to study it is to look at the welfare share of each population decile. This project uses data from the [World Bank Poverty and Inequality Platform (PIP)](https://pip.worldbank.org/poverty-calculator) in 2021 Purchasing Power Parity (PPP) terms. The dataset provides welfare shares by decile for each country in the years covered by PIP. By focusing on the welfare share held by each decile rather than on average incomes or headcount poverty rates, the project enables a consistent cross-country comparison of how evenly welfare is distributed across populations. The main objective is to assess inequality across countries using these decile welfare shares as a standardized and comparable indicator.

## **B. Methodology**

The analysis in this project begins with data extraction from the PIP API. The data are then filtered by welfare measure type and reporting year. Cross-country inequality is assessed through graphical comparisons and simple inequality indicators derived from the decile data, such as the ratio and the gap between the top and bottom decile welfare shares. <p>

For this project, the following libraries will be used:
* **Pandas** for data collection (by importing the dataset from the World Bank), data cleaning, and data analysis; and
* **Plotly** for interactive data visualization.
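As a minimal sketch of the two indicators named above, assume a DataFrame with `decile1` and `decile10` columns stored as fractions of total welfare. The column names match those used later in this notebook, but the country labels and values below are made up for illustration and are not PIP data.

```python
import pandas as pd

# Hypothetical decile shares (fractions of total welfare), not PIP data
toy = pd.DataFrame({
    "country_name": ["Country A", "Country B"],
    "decile1": [0.02, 0.04],
    "decile10": [0.40, 0.24],
})

# Gap: top decile share minus bottom decile share, in percentage points
toy["gap_pp"] = (toy["decile10"] - toy["decile1"]) * 100

# Ratio: how many times larger the top decile share is than the bottom one
toy["ratio_10_1"] = toy["decile10"] / toy["decile1"]

print(toy[["country_name", "gap_pp", "ratio_10_1"]])
```

The gap is easy to read off a chart, while the ratio is more sensitive to very small bottom-decile shares, so the two can rank countries differently.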




## **C. Data Cleaning**

### 1. Load & Inspect

Prior to the analysis, I use the Pandas library to clean the dataset and check for duplicates, missing values, and datatypes.

In [None]:
import pandas as pd

# Define the URL; the World Bank PIP API can return the data in CSV format
url='https://api.worldbank.org/pip/v1/pip?country=all&year=all&povline=3&ppp_version=2021&format=csv&fill_gaps=false'

# Load the CSV from the URL and assign it to a variable called 'df'
df = pd.read_csv(url)

# Inspect data
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2556 entries, 0 to 2555
Data columns (total 44 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   region_name           2556 non-null   object 
 1   region_code           2556 non-null   object 
 2   country_name          2556 non-null   object 
 3   country_code          2556 non-null   object 
 4   reporting_year        2556 non-null   int64  
 5   reporting_level       2556 non-null   object 
 6   survey_acronym        2556 non-null   object 
 7   survey_coverage       2556 non-null   object 
 8   survey_year           2556 non-null   float64
 9   welfare_type          2556 non-null   object 
 10  survey_comparability  2556 non-null   int64  
 11  comparable_spell      2556 non-null   object 
 12  poverty_line          2556 non-null   int64  
 13  headcount             2556 non-null   float64
 14  poverty_gap           2556 non-null   float64
 15  poverty_severity     
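The request URL used in the loading cell can also be assembled from named parameters, which makes it easier to see and change each query setting. This is only a sketch: the endpoint and parameter values are copied from that URL, and the names `base_url` and `params` are introduced here for illustration.

```python
from urllib.parse import urlencode

# Endpoint and query parameters copied from the URL used in this notebook
base_url = "https://api.worldbank.org/pip/v1/pip"
params = {
    "country": "all",
    "year": "all",
    "povline": 3,
    "ppp_version": 2021,
    "format": "csv",
    "fill_gaps": "false",
}

# urlencode joins the key/value pairs with '&' in insertion order
url = f"{base_url}?{urlencode(params)}"
print(url)
```

The resulting string can be passed to `pd.read_csv(url)` exactly as in the loading cell.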

### 2. Data Cleansing

#### a. Change variable datatype and removing duplicate
Here I convert the variable `reporting_year` from integer to datetime format and store the result in a new column, `year_dt`. This is not needed for the analysis; I do it for practice. I also check that the data contain no duplicated rows.



In [None]:
df['year_dt'] = pd.to_datetime(df['reporting_year'], format='%Y')
df.dtypes

Unnamed: 0,0
region_name,object
region_code,object
country_name,object
country_code,object
reporting_year,int64
reporting_level,object
survey_acronym,object
survey_coverage,object
survey_year,float64
welfare_type,object


In [None]:
# Check for duplicated rows and display any in order of year and country
df[df.duplicated(keep=False)].sort_values(by=['reporting_year', 'country_code'])

Unnamed: 0,region_name,region_code,country_name,country_code,reporting_year,reporting_level,survey_acronym,survey_coverage,survey_year,welfare_type,...,reporting_gdp,reporting_pce,is_interpolated,distribution_type,estimation_type,spl,spr,pg,estimate_type,year_dt
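The output above shows only column headers, which suggests that no fully duplicated rows were found. To illustrate what the check would return if duplicates existed, here is a small sketch on made-up rows (the country codes and values are hypothetical).

```python
import pandas as pd

# Hypothetical rows: the first and third are exact duplicates
toy = pd.DataFrame({
    "country_code": ["AAA", "BBB", "AAA"],
    "reporting_year": [2020, 2020, 2020],
    "decile10": [0.30, 0.25, 0.30],
})

# keep=False marks every member of a duplicate group, not just the later copies
mask = toy.duplicated(keep=False)
print(toy[mask])

# drop_duplicates keeps the first copy of each duplicated row by default
deduped = toy.drop_duplicates()
print(len(deduped))
```

Using `keep=False` for inspection shows all copies side by side, which helps in deciding which one to keep before dropping anything.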


#### b. Prioritizing Income over Consumption Data

The PIP data primarily use a monetary measure of welfare, estimated from an aggregation of households’ income or the monetary value of their consumption. These aggregates are jointly referred to as welfare aggregates. Consumption is “better” for low- and middle-income countries, where material deprivation is a priority. Income, on the other hand, is “better” in contexts where living standards are ‘high’ and/or the focus is on minimum rights to resources and on **inequality** ([World Bank, 2021](https://thedocs.worldbank.org/en/doc/5fa9e60429d489daf00df5eb2784fed7-0050012022/related/20221109-gv-iariw-web.pdf)). Therefore, whenever both income and consumption estimates are available for a country in a given reference year, income estimates are preferred because the focus of this project is on inequality.

In [None]:
# Find countries that have both welfare types by counting the welfare type for the same year
welfare_counts = (
    df.groupby(['country_code','reporting_year'])['welfare_type']
      .nunique()
      .reset_index(name='welfare_type_count'))

welfare_counts.sort_values(by='welfare_type_count',ascending=False).head()

Unnamed: 0,country_code,reporting_year,welfare_type_count
2018,SVK,2005,2
2017,SVK,2004,2
1425,LVA,2009,2
1424,LVA,2008,2
1423,LVA,2007,2


In [None]:
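# This particular part of prioritizing income data is made with help of ChatGPT
# Prompt: welfare measure for each country can be consumption, income, or both.
# if there are both, I want to use consumption and drop the income. how do I do this?
# Note: the prompt asked to keep consumption, but this project prefers income, so the
# code below sorts welfare_type alphabetically and keeps the last value ('income').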

df_income = (
    df.sort_values(by=['country_code', 'reporting_year', 'welfare_type'], ascending=[True, False, True])
      .groupby(['country_code','reporting_year'])
      .last()  # keeps the last non-null value of each column after sorting
      .reset_index()
      )

# double check that income is selected, referring to previous result and taking SVK as sample
print(df_income.loc[(df_income['country_code'] == 'SVK') & (df_income['reporting_year'] == 2005),
 ['country_code', 'reporting_year', 'welfare_type']])

     country_code  reporting_year welfare_type
2018          SVK            2005       income
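One subtlety of the `groupby(...).last()` approach used above is that `last()` skips missing values by default, so a result row can combine values from different source rows. The sketch below, on made-up data, contrasts it with `drop_duplicates(keep="last")`, which keeps each selected row intact. Whether this matters for the PIP data depends on where missing values occur, which is not checked here.

```python
import numpy as np
import pandas as pd

# Hypothetical country-year with both welfare types; the income row lacks a value
toy = pd.DataFrame({
    "country_code": ["AAA", "AAA"],
    "reporting_year": [2005, 2005],
    "welfare_type": ["consumption", "income"],
    "decile10": [0.25, np.nan],
})

keys = ["country_code", "reporting_year"]
# 'consumption' sorts before 'income', so income ends up last in each group
toy_sorted = toy.sort_values(keys + ["welfare_type"])

# GroupBy.last() takes the last non-null entry per column, so values can mix rows
mixed = toy_sorted.groupby(keys).last().reset_index()

# drop_duplicates(keep="last") keeps the whole income row, NaN included
intact = toy_sorted.drop_duplicates(subset=keys, keep="last").reset_index(drop=True)

print(mixed)
print(intact)
```

In `mixed`, the row is labelled income but carries the consumption value of `decile10`; in `intact`, the missing value stays visible and can be handled explicitly.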


In [None]:
# Repeat the previous step to confirm that no country-year still has both welfare types
welfare_counts_check = (
    df_income.groupby(['country_code','reporting_year'])['welfare_type']
      .nunique()
      .reset_index(name='welfare_type_count'))

welfare_counts_check.sort_values(by='welfare_type_count',ascending=False).head()

Unnamed: 0,country_code,reporting_year,welfare_type_count
2421,ZWE,2019,1
0,AGO,2000,1
1,AGO,2008,1
2,AGO,2018,1
3,ALB,1996,1


#### c. Filter recent data available
Occasionally, a survey that fulfills the World Bank's criteria is not incorporated into PIP because the World Bank does not have access to the data, because of concerns about the survey design or welfare aggregate, or because the auxiliary data needed are unavailable ([World Bank, 2025](https://datanalytics.worldbank.org/PIP-Methodology/welfareaggregate.html)). This is why not all countries have PIP data every year; for 2024, data are available for only 13 countries. Therefore, for this project I keep the latest available observation for each country.

In [None]:
# Look at the summary level of reporting year
df_by_year = df_income.groupby(['reporting_year'])['country_code'].count().reset_index().sort_values('reporting_year',ascending=False)
df_by_year.head(10)

Unnamed: 0,reporting_year,country_code
61,2024,13
60,2023,53
59,2022,69
58,2021,80
57,2020,70
56,2019,79
55,2018,94
54,2017,79
53,2016,84
52,2015,86


In [None]:
# This particular part of taking the latest available data is made with help of ChatGPT
# Prompt: how do I take the latest available data from each country? I have imported the data using pandas

df_latest = df_income.loc[df_income.groupby('country_code')['reporting_year'].idxmax()].sort_values('country_code').reset_index(drop=True)
df_latest.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 172 entries, 0 to 171
Data columns (total 45 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   country_code          172 non-null    object        
 1   reporting_year        172 non-null    int64         
 2   region_name           172 non-null    object        
 3   region_code           172 non-null    object        
 4   country_name          172 non-null    object        
 5   reporting_level       172 non-null    object        
 6   survey_acronym        172 non-null    object        
 7   survey_coverage       172 non-null    object        
 8   survey_year           172 non-null    float64       
 9   welfare_type          172 non-null    object        
 10  survey_comparability  172 non-null    int64         
 11  comparable_spell      172 non-null    object        
 12  poverty_line          172 non-null    int64         
 13  headcount           

In [None]:
# See the summary of distribution by reporting year
df_by_year_latest = df_latest.groupby(['reporting_year'])['country_code'].count().reset_index().sort_values('reporting_year',ascending=False)
print(df_by_year_latest)

    reporting_year  country_code
19            2024            13
18            2023            40
17            2022            23
16            2021            25
15            2020             9
14            2019            12
13            2018            10
12            2017             8
11            2016             5
10            2015             4
9             2014             6
8             2013             3
7             2012             3
6             2011             3
5             2010             2
4             2009             1
3             2006             1
2             2005             1
1             1998             2
0             1992             1
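To see how the `idxmax` selection used above picks one row per country, here is a small sketch on made-up data (the country codes, years, and shares are hypothetical).

```python
import pandas as pd

# Hypothetical observations: AAA has two survey years, BBB has one
toy = pd.DataFrame({
    "country_code": ["AAA", "AAA", "BBB"],
    "reporting_year": [2018, 2023, 2015],
    "decile10": [0.31, 0.29, 0.40],
})

# idxmax returns, per country, the index label of the row with the largest year
latest_idx = toy.groupby("country_code")["reporting_year"].idxmax()

# .loc with those labels keeps exactly one full row per country
latest = toy.loc[latest_idx].reset_index(drop=True)
print(latest)
```

Because whole rows are selected by label, all other columns stay consistent with the chosen year. The trade-off, visible in the summary above, is that the "latest" year can differ widely between countries.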


#### d. Filter data for last 3 years
For the sake of comparability, this project only considers observations from the three most recent reporting years in the dataset. Since the latest reporting year is 2024, the analysis uses data from 2022 to 2024.

In [None]:
# Define the cut-off year as the maximum year minus 3 instead of hard-coding 2022, so the filter still works when newer data arrive
df_last_3y = df_latest[df_latest['reporting_year'] > df_latest['reporting_year'].max()-3]

# Check the data to ensure it only includes data from last 3 years
df_by_year_last3y = df_last_3y.groupby(['reporting_year'])['country_code'].count().reset_index().sort_values('reporting_year',ascending=False)
print(df_by_year_last3y)

   reporting_year  country_code
2            2024            13
1            2023            40
0            2022            23
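The strict inequality in the filter above is what makes the window exactly three years wide. A short sketch with made-up years shows the arithmetic (the names `window` and `cutoff` are introduced here for illustration).

```python
import pandas as pd

# Hypothetical reporting years
years = pd.Series([2019, 2021, 2022, 2023, 2024])

window = 3
cutoff = years.max() - window   # 2024 - 3 = 2021

# Strictly greater than the cut-off keeps max-2, max-1 and max
kept = years[years > cutoff]
print(kept.tolist())
```

Using `>=` instead would keep four years (2021 to 2024), so the choice of operator should match the intended window.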


In [None]:
# Check the data to ensure that each country has only one record
df_by_country_last3y = df_last_3y.groupby(['country_code'])['reporting_year'].count().reset_index().sort_values('reporting_year',ascending=False)
print(df_by_country_last3y)

   country_code  reporting_year
0           ARG               1
1           ARM               1
2           AUT               1
3           BEL               1
4           BGD               1
..          ...             ...
71          USA               1
72          UZB               1
73          VNM               1
74          XKX               1
75          ZMB               1

[76 rows x 2 columns]


## **D. Data Analysis**

#### **Welfare measure type across countries**

Both income and consumption approaches to measuring monetary poverty have advantages and disadvantages. Countries typically choose the concept that can be more accurately measured and that is more relevant to their context, while balancing concerns about respondent burden in surveys. This part gives a broad picture of the welfare measure type used by each country.

In [None]:
import plotly.express as px

map_graph = px.choropleth(
    df_latest,
    locations='country_code',
    color='welfare_type',
    hover_data=['reporting_year','welfare_type','country_name'],  # a list keeps the hover fields in a fixed order
    title="Welfare Measure Type by Country<br><sup>Based on each country's latest available data; ties favor income</sup>")

map_graph.update_layout(
    legend_title_text='Welfare Measure Type',
    title_x=0.5)

map_graph.add_annotation(
    text='Source: World Bank PIP (2021 PPP) version [20250930_2021_01_02_PROD] accessed on 2025-10-20',
    x=0.5, y=-0.1,
    showarrow=False,
    font=dict(size=12, color='gray')
    )

map_graph.show()

From the graph above, we can see the pattern of welfare measure types across countries. Most developed countries use income, while most Asian and African countries use consumption to measure welfare. This is consistent with the World Bank's argument that income aggregates are difficult to obtain when a large fraction of the population works in the informal sector or is self-employed, which is frequently the case in poorer economies ([World Bank, 2025](https://datanalytics.worldbank.org/PIP-Methodology/welfareaggregate.html)).<p>

Another thing to note from the graph above is that many developed countries have not reported recent welfare data. The latest available data for Australia, Japan, and Germany are from 2020, while those for the UK, Canada, and South Korea are from 2021. This affects the results of the analysis in this project, since we only use data from 2022-2024.

In [None]:
map_graph_last3y = px.choropleth(
    df_last_3y,
    locations='country_code',
    color='welfare_type',
    hover_data=['reporting_year','welfare_type','country_name'],  # a list keeps the hover fields in a fixed order
    title="Welfare Measure Type by Country<br><sup>Based on each country's latest available data between 2022-2024; ties favor income</sup>")

map_graph_last3y.update_layout(
    legend_title_text='Welfare Measure Type',
    title_x=0.5)

map_graph_last3y.add_annotation(
    text='Source: World Bank PIP (2021 PPP) version [20250930_2021_01_02_PROD] accessed on 2025-10-20',
    x=0.5, y=-0.1,  # position below chart (y < 0 moves below the plot)
    showarrow=False,
    font=dict(size=12, color='gray')
    )

map_graph_last3y.show()

The graph above shows that restricting the data to 2022-2024 reduces the number of data points considerably. Africa is under-represented, with many countries excluded from the dataset because their latest data predate 2022. This is important context for interpreting the analysis that follows.

#### **Top-20 Countries with the Highest Welfare Share of the Richest 10%**

In a perfectly equal country, the richest-10% would account for exactly 10% of total welfare. Therefore, a higher welfare share held by the richest-10% indicates higher inequality within the country. This part assesses which countries have the highest welfare share held by the richest-10%.
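The 10% benchmark can be checked with a simplified sketch that computes decile shares from individual welfare values. This is an illustration on synthetic data, assuming unweighted individuals split into ten equal-sized groups. It is not the procedure PIP uses, and `decile_shares` is a helper defined here only for this example.

```python
import numpy as np

def decile_shares(incomes):
    """Return the share of total welfare held by each population decile."""
    ordered = np.sort(np.asarray(incomes, dtype=float))
    # Split the sorted population into ten equal-sized groups, poorest first
    groups = np.array_split(ordered, 10)
    totals = np.array([g.sum() for g in groups])
    return totals / totals.sum()

# Perfect equality: every person has the same welfare, so each decile holds 10%
equal = decile_shares(np.full(1000, 50.0))
print(equal[-1])

# Hypothetical unequal population: welfare rises steadily from person to person
unequal = decile_shares(np.arange(1, 1001))
print(unequal[-1])
```

In the unequal case the top decile's share rises above 10%, and the further it rises, the more concentrated welfare is at the top.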

In [None]:
# create a new dataset for the top 20, using the latest available data from all countries
df_latest_dc10top20 = df_latest.sort_values('decile10', ascending=False).head(20)
order = list(df_latest_dc10top20.sort_values('decile10', ascending=False)['country_name'])

dc10top20 = px.bar(df_latest_dc10top20,
            x='country_name',
            y='decile10',
            color='region_name',
            hover_data=['reporting_year','welfare_type'],
            title="Top-20 Countries with Highest Welfare Share of the Richest 10%<br><sup>Based on each country's latest available data; ties favor income</sup>",
            category_orders={'country_name': order})          # <-- enforce the x order, do this with help of ChatGPT

dc10top20.update_layout(
    yaxis_tickformat='.0%',
    xaxis_title='Country',
    yaxis_title='Welfare Share of the Richest 10%')

dc10top20.show()

In [None]:
# create a new dataset for the top 20, using only data between 2022-2024
df_latest_dc10top20 = df_last_3y.sort_values('decile10', ascending=False).head(20)
order = list(df_latest_dc10top20.sort_values('decile10', ascending=False)['country_name'])

dc10top20 = px.bar(df_latest_dc10top20,
            x='country_name',
            y='decile10',
            color='region_name',
            hover_data=['reporting_year','welfare_type'],
            title="Top-20 Countries with Highest Welfare Share of the Richest 10%<br><sup>Based on each country's latest available data between 2022-2024; ties favor income</sup>",
            category_orders={'country_name': order})          # <-- enforce the x order, do this with help of ChatGPT

dc10top20.update_layout(
    yaxis_tickformat='.0%',
    xaxis_title='Country',
    yaxis_title='Welfare Share of the Richest 10%')

dc10top20.show()

Here we can see that the majority of the Top-20 countries with the highest welfare share of the richest-10% are from Sub-Saharan Africa and Latin America & Caribbean. If we only consider data between 2022-2024, the number of Sub-Saharan African countries drops because many countries have no data for those years. Another notable point is that the United States makes it into the Top-20 (ranked 19th) and is the only developed country on the list.

#### **Inequality across populous countries**
The bigger the population, the harder it is to manage a country. I think it is unfair to compare Indonesia, with a population of around 270 million people, with a country whose population might be less than 5 million. This part compares inequality across countries with populations of more than 50 million people.

In [None]:
# Setting parameter for population threshold
pop_threshold = 50_000_000

# Keep only countries whose population exceeds threshold
big_country_3y = df_last_3y[df_last_3y['reporting_pop'] > pop_threshold]

# See the list of countries whose population exceeds the threshold
by_country_biglast3y = big_country_3y.groupby(['country_code','reporting_year'])['reporting_pop'].sum().reset_index().sort_values('reporting_pop',ascending=False)
by_country_biglast3y['reporting_pop'] = by_country_biglast3y['reporting_pop'] / (10**6) #population in million
print(by_country_biglast3y)

   country_code  reporting_year  reporting_pop
6           IND            2022    1432.758115
2           CHN            2022     897.578430
16          USA            2023     336.806231
5           IDN            2024     283.487931
11          NGA            2022     226.510651
1           BRA            2023     211.140729
0           BGD            2022     169.384897
13          RUS            2023     143.826130
10          MEX            2022     128.613117
12          PHL            2023     114.891199
17          VNM            2022      99.680655
7           IRN            2023      90.829284
15          TUR            2022      84.979913
14          THA            2023      71.702435
4           FRA            2023      68.287487
8           ITA            2023      58.993475
9           KEN            2022      54.980444
3           COL            2023      52.321152


In [None]:
# Create new metrics for groups of population deciles
# Work on an explicit copy so the new columns are not written to a slice of
# df_last_3y; this avoids pandas' SettingWithCopyWarning
big_country_3y = big_country_3y.copy()
big_country_3y['bottom50'] = (big_country_3y['decile1']
                           +big_country_3y['decile2']
                           +big_country_3y['decile3']
                           +big_country_3y['decile4']
                           +big_country_3y['decile5'])
big_country_3y['middle40'] = (big_country_3y['decile6']
                           +big_country_3y['decile7']
                           +big_country_3y['decile8']
                           +big_country_3y['decile9'])
big_country_3y['top10'] = big_country_3y['decile10']
big_country_3y.info()

<class 'pandas.core.frame.DataFrame'>
Index: 18 entries, 31 to 68
Data columns (total 48 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   country_code          18 non-null     object        
 1   reporting_year        18 non-null     int64         
 2   region_name           18 non-null     object        
 3   region_code           18 non-null     object        
 4   country_name          18 non-null     object        
 5   reporting_level       18 non-null     object        
 6   survey_acronym        18 non-null     object        
 7   survey_coverage       18 non-null     object        
 8   survey_year           18 non-null     float64       
 9   welfare_type          18 non-null     object        
 10  survey_comparability  18 non-null     int64         
 11  comparable_spell      18 non-null     object        
 12  poverty_line          18 non-null     int64         
 13  headcount             18 n

All the new variables are in separate columns, which means the data is in wide format. I need to reshape it into long format so that it is easier to plot. This is not covered in class, so I do it with the help of ChatGPT.
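As a small illustration of the wide-to-long reshape, the sketch below applies `melt` to a made-up two-country table (country labels and shares are hypothetical) so the resulting layout is easy to inspect.

```python
import pandas as pd

# Wide format: one row per country, one column per welfare group
wide = pd.DataFrame({
    "country_name": ["Country A", "Country B"],
    "bottom50": [0.20, 0.30],
    "middle40": [0.45, 0.45],
    "top10": [0.35, 0.25],
})

# Long format: one row per (country, group) pair
long = wide.melt(
    id_vars=["country_name"],
    value_vars=["bottom50", "middle40", "top10"],
    var_name="Group",
    value_name="Share",
)
print(long)
```

The two wide rows become six long rows, one for each country and group. Plotly Express can then map `Group` to color and `Share` to bar height to build the stacked bars.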

In [None]:
# This part is made with help of ChatGPT
# Prompt: I just add new column of bottom50, middle40, and top10. how do I put all these three columns into stacked bar for each country?

# Sort data based on bottom 50
big_country_3y = big_country_3y.sort_values('bottom50')

# Reshape the dataframe to long format (helped by ChatGPT)
df_melt = big_country_3y.melt(
    id_vars=['country_name','welfare_type'],
    value_vars=['bottom50','middle40','top10'],
    var_name='Group',
    value_name='Share')

# Build stacked bar chart
big_country_pareto = px.bar(df_melt,
                      x='country_name',
                      y='Share',
                      color='Group',
                      text='Share',
                      title='Welfare Distribution by Country<br><sup>Selected countries with population >50 million, data from the latest available year between 2022-2024</sup>')

# Format percentages nicely (this also helped by ChatGPT)
big_country_pareto.update_traces(texttemplate='%{text:.1%}',
                                 textposition='inside',
                                 insidetextanchor='middle',
                                 textfont=dict(color='white'))

big_country_pareto.update_layout(yaxis_tickformat='.0%',  # display y-axis as percentages
                                legend_title_text='Welfare Group',
                                yaxis_title='Share of Welfare',
                                xaxis_title='Country')

big_country_pareto.show()

From the above graph, the welfare share of the Middle-40% group appears relatively stable across countries at around 45%. What differentiates the countries is the welfare share of the Richest-10%, which directly affects the welfare share of the Bottom-50%. The difference across countries can be large: the Richest-10% in Colombia take about 43% of welfare, almost double the share of the Richest-10% in India. In terms of inequality, Indonesia performs relatively well compared to other countries whose population is more than 50 million. <p>
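The link between the top and bottom groups follows from the three shares summing to one. As a rough worked example, take the approximate Middle-40% share of 45% read from the chart as given (an approximation, not an exact value for any country):

$$
\text{bottom50} + \text{middle40} + \text{top10} = 1
\quad\Rightarrow\quad
\text{bottom50} \approx 0.55 - \text{top10}
$$

Under this approximation, a Richest-10% share of about 0.43 implies a Bottom-50% share of roughly 0.12, so each percentage point gained by the top decile is, to a first approximation, lost by the bottom half.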

India's case in particular is also fascinating, as the country has a reputation for stark inequality between the rich and the poor, but the data seem to say the opposite. One possible explanation is that India's PIP data are based on consumption, which might not describe the wealth distribution very well. However, wealth data are beyond the scope of this project. <p>


#### **Welfare gap between the Rich and the Poor**

A larger welfare gap between the rich and the poor indicates higher inequality within the country. This part compares the welfare gap across countries using data between 2022-2024.

In [None]:
# This part is made with help of ChatGPT
# Prompt: Can I make a combo graph in Python? I'm thinking on calculating the difference between decile10-decile1 and display it on bar chart.
# (cont.) ..then I indicate the decile 1 and decile 10 by a dot in the same graph. how do I do it?

import plotly.graph_objects as go

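# Work on an explicit copy so the new column is not written to a slice of df_latest
df_last_3y = df_last_3y.copy()

# Calculate the difference between Decile 10 and Decile 1 (in percentage points)
df_last_3y['diff_10_1'] = (df_last_3y['decile10'] - df_last_3y['decile1'])*100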

# Sort by difference for nicer visualization
df_last_3y = df_last_3y.sort_values('diff_10_1', ascending=False)

# Build Plotly figure (go.Figure for layering)
fig = go.Figure()

# Bar: difference (decile10 - decile1)
fig.add_trace(
    go.Bar(
        x=df_last_3y['country_name'],
        y=df_last_3y['diff_10_1'],
        name='Richest 10% - Poorest 10%',
        marker_color='lightgrey',
        hovertemplate='<b>%{x}</b><br>Welfare Gap: %{y:.1f}%<br>Year: %{customdata[0]}<br>Welfare Type: %{customdata[1]}<extra></extra>',
        customdata=df_last_3y[['reporting_year', 'welfare_type']]
    )
)

# Scatter: decile1 (little dot)
fig.add_trace(
    go.Scatter(
        x=df_last_3y['country_name'],
        y=df_last_3y['decile1']*100,
        mode='markers',
        name='Poorest 10%',
        marker=dict(symbol='circle', size=10),
        hovertemplate='<b>%{x}</b><br>Poorest 10%: %{y:.1f}%<br>Year: %{customdata[0]}<br>Welfare Type: %{customdata[1]}<extra></extra>',
        customdata=df_last_3y[['reporting_year', 'welfare_type']]
    )
)

# Scatter: decile10 (little dot)
fig.add_trace(
    go.Scatter(
        x=df_last_3y['country_name'],
        y=df_last_3y['decile10']*100,
        mode='markers',
        name='Richest 10%',
        marker=dict(symbol='diamond', size=10, color='green'),
        hovertemplate='<b>%{x}</b><br>Richest 10%: %{y:.1f}%<br>Year: %{customdata[0]}<br>Welfare Type: %{customdata[1]}<extra></extra>',
        customdata=df_last_3y[['reporting_year', 'welfare_type']]
    )
)

# setting graph layout
fig.update_layout(
    title="Welfare Gap between the Richest 10% and the Poorest 10%<br><sup>Data from the latest available year between 2022-2024</sup>",
    xaxis_title="Country",
    yaxis_title="Share of Welfare (%)",
    template="plotly_white",
    xaxis=dict(range=[-0.5, len(df_last_3y['country_name']) - 0.5]),  # remove side gaps
    bargap=0.15,                      # keep a small gap *between* bars
    width=1700,height=500,
    legend=dict(
        yanchor="top",
        y=0.99,
        xanchor="left",
        x=0.85
    )
)

# compute median
median_gap = df_last_3y['diff_10_1'].median()
median_poorest = df_last_3y['decile1'].median()*100
median_richest = df_last_3y['decile10'].median()*100

# add horizontal line as median
fig.add_hline(y=median_gap, line_dash="dot", line_color="black", annotation_text=f"Median Gap = {median_gap:.1f}%", annotation_position="bottom left", annotation_font_color="black")
fig.add_hline(y=median_poorest, line_dash="dot", line_color="red", annotation_text=f"Median Poorest 10% = {median_poorest:.1f}%", annotation_position="top left", annotation_font_color="red")
fig.add_hline(y=median_richest, line_dash="dot", line_color="green", annotation_text=f"Median Richest 10% = {median_richest:.1f}%", annotation_position="top left", annotation_font_color="green")

# add footer
fig.add_annotation(
    text="Source: World Bank PIP (2021 PPP) version [20250930_2021_01_02_PROD] accessed on 2025-10-20",
    xref="paper", yref="paper",
    x=0, y=-0.7,
    showarrow=False,
    font=dict(size=11, color="gray"),
    xanchor='left'
)

fig.show()

From the graph above, the richest-10% capture between 20% and more than 40% of total welfare, while the poorest-10% consistently receive less than 5%, underscoring deep structural inequality. Colombia has the highest gap, at 41.5 percentage points: the richest-10% capture 42.7% while the poorest-10% get only 1.1%. The United States is the developed country with the highest welfare gap. On the other hand, Nordic countries (e.g. Denmark, Norway, Sweden, Finland), which are known for extensive social welfare programs and high-quality public services such as universal healthcare, education, and strong social safety nets, have relatively low welfare gaps.


## **E. Conclusion and Discussion**

This project highlights **differences in welfare measurement** and inequality patterns across countries using World Bank PIP data. Developed countries mostly rely on income-based welfare measures, while many Asian and African nations use consumption-based indicators due to large informal sectors. However, **data availability declines sharply after 2021**, which limits the number of observations for the 2022–2024 period and leads to underrepresentation of Sub-Saharan Africa and of some notable developed countries such as the UK, Japan, Canada, and Australia.<p>

The analysis shows that the richest-10% capture between 20% and more than 40% of total welfare, while the poorest-10% receive less than 5%. The highest inequality levels appear in **Sub-Saharan Africa and Latin America**, with **Colombia** being the most unequal and the **United States** standing out among developed nations. In contrast, Nordic countries display lower welfare gaps, consistent with their strong social welfare systems.<p>

Among populous countries, **Indonesia performs relatively well**, with inequality levels lower than many peers. In this group, the middle-40% hold a stable share of about 45%, suggesting that inequality differences mainly stem from the top and bottom groups. India’s unexpectedly moderate inequality may reflect its use of consumption data, which may not capture wealth gaps. Overall, the findings **underscore the importance of consistent and up-to-date data** for accurate cross-country inequality comparisons.<p>

Nevertheless, several caveats must be noted. Limited recent data (2022–2024) reduce the overall sample coverage and may bias cross-country comparisons. In addition, welfare measures based on income or consumption may underestimate actual wealth inequality. Future work should expand the analysis by incorporating Gini coefficients, wealth-based indicators, and a more detailed time-series and regional approach to better capture long-term inequality dynamics and data variation across countries.
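As a pointer for the Gini extension mentioned above, a Gini index can be approximated from the ten decile shares alone by treating the Lorenz curve as piecewise linear. The sketch below is an illustration under that assumption, not part of this project's analysis, and `gini_from_deciles` is a hypothetical helper name.

```python
import numpy as np

def gini_from_deciles(shares):
    """Approximate the Gini index from ten decile shares (poorest first)."""
    shares = np.asarray(shares, dtype=float)
    # Cumulative welfare share at population shares 0.0, 0.1, ..., 1.0
    lorenz = np.concatenate([[0.0], np.cumsum(shares)])
    # Trapezoid rule: area under the piecewise-linear Lorenz curve
    area = np.sum((lorenz[:-1] + lorenz[1:]) / 2) / len(shares)
    # Gini = 1 - 2 * (area under the Lorenz curve)
    return 1 - 2 * area

# Perfect equality: every decile holds 10% of welfare
gini_equal = gini_from_deciles([0.1] * 10)

# Extreme case: all welfare held by the top decile
gini_extreme = gini_from_deciles([0.0] * 9 + [1.0])

print(gini_equal, gini_extreme)
```

Because inequality within each decile is ignored, this grouped estimate tends to sit somewhat below a Gini computed from the full distribution.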

