# Tidy Tuesday: Income Inequality Before and After Taxes
**August 5, 2025**

Today's goal is to answer the [5 questions](https://github.com/rfordatascience/tidytuesday/blob/main/data/2025/2025-08-05/readme.md#:~:text=Which%20countries%20have,the%20available%20data%3F) from the readme file. They are:
* Which countries have the highest Gini coefficient before taxes?
* Which countries have the highest Gini coefficient after taxes?
* Which countries have the highest shifts in Gini coefficient?
* Which countries have the lowest shifts in Gini coefficient?
* Which countries have had the highest changes in redistribution in the available data?

## Prepare
### Knowing the data
From the Tidy Tuesday page:

>The Gini coefficient measures inequality on a scale from 0 to 1. Higher values indicate higher inequality ... Income has been equivalized – adjusted to account for the fact that people in the same household can share costs like rent and heating.
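To make the definition concrete, here is a minimal sketch of how a Gini coefficient can be computed from a list of incomes, using the mean absolute difference over all pairs. It also equivalizes household income with the square-root scale (household income divided by the square root of household size). That scale is one common choice and an assumption here: the Tidy Tuesday values are precomputed, and the `equivalize` and `gini` helpers and the toy households are hypothetical.

```python
import numpy as np

def equivalize(household_income, household_size):
    """Divide household income by sqrt(household size).

    The square-root scale is an assumption for this sketch, not
    necessarily the scale used to build the Tidy Tuesday data.
    """
    total = np.asarray(household_income, dtype=float)
    size = np.asarray(household_size, dtype=float)
    return total / np.sqrt(size)

def gini(incomes):
    """Gini coefficient from the mean absolute difference:
    G = sum_i sum_j |x_i - x_j| / (2 * n**2 * mean(x))
    """
    x = np.asarray(incomes, dtype=float)
    if x.size == 0:
        raise ValueError("need at least one income")
    mean = x.mean()
    if mean <= 0:
        raise ValueError("mean income must be positive")
    # Sum of absolute differences over every ordered pair (i, j)
    pairwise = np.abs(x[:, None] - x[None, :]).sum()
    return pairwise / (2 * x.size**2 * mean)

print(gini([10, 10, 10, 10]))  # 0.0: everyone has the same income
print(gini([0, 0, 0, 100]))    # 0.75: one person has everything, (n - 1) / n for n = 4

# Hypothetical households as (income, size) pairs
eq = equivalize([40_000, 60_000, 90_000], [1, 4, 2])
print(round(gini(eq), 3))
```

The two extreme cases show the 0 to 1 scale: identical incomes give 0, and concentrating everything in one person pushes the value toward 1 as the population grows.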

```python
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

```python
# Import data
income = pd.read_csv('income_inequality_processed.csv')
income
```

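| | Entity | Code | Year | gini_mi_eq | gini_dhi_eq |
|---|---|---|---|---|---|
| 0 | Australia | AUS | 1989 | 0.431 | 0.304 |
| 1 | Australia | AUS | 1995 | 0.470 | 0.311 |
| 2 | Australia | AUS | 2001 | 0.481 | 0.320 |
| 3 | Australia | AUS | 2003 | 0.469 | 0.316 |
| 4 | Australia | AUS | 2004 | 0.467 | 0.316 |
| ... | ... | ... | ... | ... | ... |
| 942 | Vietnam | VNM | 2005 | NaN | 0.369 |
| 943 | Vietnam | VNM | 2007 | NaN | 0.401 |
| 944 | Vietnam | VNM | 2009 | NaN | 0.398 |
| 945 | Vietnam | VNM | 2011 | NaN | 0.364 |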


## Cleaning the data

```python
print('Data Types:')
print(income.dtypes)
print()
print('Count of NAs:')
print(income.isna().sum())
```

```
Data Types:
Entity          object
Code            object
Year             int64
gini_mi_eq     float64
gini_dhi_eq    float64
dtype: object

Count of NAs:
Entity           0
Code             0
Year             0
gini_mi_eq     398
gini_dhi_eq      0
dtype: int64
```


Data types look right. The number of NAs is concerning: 398 rows lack a pre-tax value (`gini_mi_eq`), while every row has a post-tax value (`gini_dhi_eq`). Since every question in the goal depends on comparing the pre-tax and post-tax Gini coefficients, only rows with both values are useful, so I will remove the rows with NAs.
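Before dropping anything, it can help to see where the missing pre-tax values come from, because a country with no pre-tax values at all disappears from the analysis entirely. Vietnam, for example, has no pre-tax values in the rows shown above and is absent from the country counts later on. A minimal sketch on a made-up toy frame (the values are invented; only the column names match the real file):

```python
import numpy as np
import pandas as pd

# Toy data (invented values); column names mirror the real file.
toy = pd.DataFrame({
    'Entity': ['Australia', 'Australia', 'Vietnam', 'Vietnam', 'Brazil'],
    'Year': [1989, 1995, 2005, 2007, 1992],
    'gini_mi_eq': [0.43, 0.47, np.nan, np.nan, 0.59],
    'gini_dhi_eq': [0.30, 0.31, 0.37, 0.40, 0.54],
})

# Share of each country's rows that lack a pre-tax value.
missing_share = toy['gini_mi_eq'].isna().groupby(toy['Entity']).mean()
print(missing_share)

# Countries that would vanish completely after dropna().
lost = missing_share[missing_share == 1].index.tolist()
print(lost)  # ['Vietnam']

# Keep only rows where both Gini values are present.
complete = toy.dropna(subset=['gini_mi_eq', 'gini_dhi_eq'])
print(len(toy), '->', len(complete))  # 5 -> 3
```

Passing `subset` makes it explicit which columns must be present; here it matches a plain `dropna()` because those are the only columns with gaps.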

```python
income = income.dropna().reset_index()
income
```

| | index | Entity | Code | Year | gini_mi_eq | gini_dhi_eq |
|---|---|---|---|---|---|---|
| 0 | 0 | Australia | AUS | 1989 | 0.431 | 0.304 |
| 1 | 1 | Australia | AUS | 1995 | 0.470 | 0.311 |
| 2 | 2 | Australia | AUS | 2001 | 0.481 | 0.320 |
| 3 | 3 | Australia | AUS | 2003 | 0.469 | 0.316 |
| 4 | 4 | Australia | AUS | 2004 | 0.467 | 0.316 |
| ... | ... | ... | ... | ... | ... | ... |
| 544 | 912 | United States | USA | 2019 | 0.505 | 0.394 |
| 545 | 913 | United States | USA | 2020 | 0.521 | 0.376 |
| 546 | 914 | United States | USA | 2021 | 0.517 | 0.371 |
| 547 | 915 | United States | USA | 2022 | 0.512 | 0.393 |


I will also rename `gini_mi_eq` and `gini_dhi_eq` to `PreTax` and `PostTax`, respectively. This will help with readability and referencing.

```python
income = income.rename(columns={'gini_mi_eq': 'PreTax', 'gini_dhi_eq': 'PostTax'})
income.columns
```

```
Index(['index', 'Entity', 'Code', 'Year', 'PreTax', 'PostTax'], dtype='object')
```

The column `Code` is redundant: it holds a three-letter code identifying each country, and `Entity` already does that job. The leftover `index` column created by `reset_index()` is not needed either.

```python
# Drop the country code and the old row labels left behind by reset_index()
income = income.drop(columns=['index', 'Code'])
```
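As an aside, the temporary `index` column only exists because `reset_index()` keeps the old row labels as a column by default. Passing `drop=True` discards them, which lets the cleaning steps be written as a single chain. A sketch on a toy frame (invented values, same column names as the real file):

```python
import numpy as np
import pandas as pd

# Toy data (invented values); column names mirror the real file.
toy = pd.DataFrame({
    'Entity': ['Australia', 'Vietnam', 'Brazil'],
    'Code': ['AUS', 'VNM', 'BRA'],
    'Year': [1989, 2005, 1992],
    'gini_mi_eq': [0.43, np.nan, 0.59],
    'gini_dhi_eq': [0.30, 0.37, 0.54],
})

clean = (
    toy.dropna()
       .reset_index(drop=True)  # renumber rows 0..n-1 without adding an 'index' column
       .rename(columns={'gini_mi_eq': 'PreTax', 'gini_dhi_eq': 'PostTax'})
       .drop(columns=['Code'])
)
print(clean.columns.tolist())  # ['Entity', 'Year', 'PreTax', 'PostTax']
```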

Lastly, answering the questions about shifts and redistribution requires an additional column holding the shift between `PreTax` and `PostTax`, computed as `PostTax - PreTax`. A negative value means inequality is lower after taxes.

```python
# Shift from the pre-tax to the post-tax Gini coefficient.
# Column-wise subtraction computes the difference for every row at once.
income['Shifts'] = income['PostTax'] - income['PreTax']
income
```

| | Entity | Year | PreTax | PostTax | Shifts |
|---|---|---|---|---|---|
| 0 | Australia | 1989 | 0.431 | 0.304 | -0.127 |
| 1 | Australia | 1995 | 0.470 | 0.311 | -0.159 |
| 2 | Australia | 2001 | 0.481 | 0.320 | -0.161 |
| 3 | Australia | 2003 | 0.469 | 0.316 | -0.153 |
| 4 | Australia | 2004 | 0.467 | 0.316 | -0.151 |
| ... | ... | ... | ... | ... | ... |
| 544 | United States | 2019 | 0.505 | 0.394 | -0.111 |
| 545 | United States | 2020 | 0.521 | 0.376 | -0.145 |
| 546 | United States | 2021 | 0.517 | 0.371 | -0.146 |
| 547 | United States | 2022 | 0.512 | 0.393 | -0.119 |


Data is now ready to use.

## Analysis
Recall the goals:
1. Which countries have the highest Gini coefficient before taxes?
2. Which countries have the highest Gini coefficient after taxes?
3. Which countries have the highest shifts in Gini coefficient?
4. Which countries have the lowest shifts in Gini coefficient?
5. Which countries have had the highest changes in redistribution in the available data?

### 1) Which countries have the highest Gini coefficient before taxes? ###
In other words, which countries had the highest level of income inequality before taxes?

```python
# Sort records by PreTax, highest first (the display shows the top and bottom rows)
income[['Entity', 'Year', 'PreTax']].sort_values(['PreTax'], ascending=False)
```

| | Entity | Year | PreTax |
|---|---|---|---|
| 377 | South Africa | 2010 | 0.754 |
| 376 | South Africa | 2008 | 0.754 |
| 378 | South Africa | 2012 | 0.723 |
| 379 | South Africa | 2015 | 0.717 |
| 380 | South Africa | 2017 | 0.706 |
| ... | ... | ... | ... |
| 226 | Iceland | 2003 | 0.333 |
| 442 | United Kingdom | 1968 | 0.332 |
| 163 | Finland | 1991 | 0.330 |
| 162 | Finland | 1987 | 0.327 |


**South Africa occupies all of the top 5 records for the highest before-tax Gini coefficient**, in 2010, 2008, 2012, 2015, and 2017.

At the other end, Finland in 1987 has the lowest recorded pre-tax Gini coefficient (0.327), followed by Finland in 1991 (0.330), the United Kingdom in 1968 (0.332), and Iceland in 2003 (0.333). Another Iceland record (2017) rounds out the bottom 5.
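The sorted table ranks country-year records, so one country with several high years can fill the entire top 5. To rank countries instead, one option is to take each country's maximum first and then sort; `nlargest` and `nsmallest` also return the top and bottom records directly instead of relying on the truncated display. A sketch on a few records copied from the outputs above:

```python
import pandas as pd

# A few records copied from the outputs in this notebook
rows = pd.DataFrame({
    'Entity': ['South Africa', 'South Africa', 'Brazil', 'Brazil', 'Iceland', 'Finland'],
    'Year': [2010, 2008, 1995, 1996, 2003, 1987],
    'PreTax': [0.754, 0.754, 0.617, 0.615, 0.333, 0.327],
})

# Record level: the highest and lowest country-year values
print(rows.nlargest(3, 'PreTax'))
print(rows.nsmallest(3, 'PreTax'))

# Country level: each country's highest pre-tax Gini, ranked
peak_by_country = rows.groupby('Entity')['PreTax'].max().sort_values(ascending=False)
print(peak_by_country)
```

On the full data, the same calls would run on `income` itself.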

### 2) Which countries have the highest Gini coefficient after taxes? ###

```python
# Sort records by PostTax, highest first (the display shows the top and bottom rows)
income[['Entity', 'Year', 'PostTax']].sort_values(['PostTax'], ascending=False)
```

| | Entity | Year | PostTax |
|---|---|---|---|
| 377 | South Africa | 2010 | 0.664 |
| 376 | South Africa | 2008 | 0.658 |
| 379 | South Africa | 2015 | 0.623 |
| 378 | South Africa | 2012 | 0.621 |
| 380 | South Africa | 2017 | 0.616 |
| ... | ... | ... | ... |
| 142 | Denmark | 1995 | 0.220 |
| 374 | Slovakia | 2017 | 0.219 |
| 164 | Finland | 1995 | 0.217 |
| 163 | Finland | 1991 | 0.209 |


**South Africa also occupies all of the top 5 records for the highest after-tax Gini coefficient.** The years are the same, although the 2012 and 2015 records swap places.

The countries with the lowest income inequality after taxes are Finland (1987, 1991, 1995), Slovakia (2017), and Denmark (1995), with Finland in 1991 lowest of all at 0.209.
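Since Finland appears several times at the bottom, it can be handy to keep only each country's best record together with its year. One way is to sort and then keep the first row per country with `drop_duplicates`. A sketch on a few rows copied from the output above:

```python
import pandas as pd

# A few records copied from the PostTax output above
rows = pd.DataFrame({
    'Entity': ['South Africa', 'South Africa', 'Finland', 'Finland', 'Slovakia', 'Denmark'],
    'Year': [2010, 2008, 1995, 1991, 2017, 1995],
    'PostTax': [0.664, 0.658, 0.217, 0.209, 0.219, 0.220],
})

# Lowest after-tax Gini per country, keeping the year it occurred
lowest_per_country = (
    rows.sort_values('PostTax')
        .drop_duplicates(subset='Entity', keep='first')
        .reset_index(drop=True)
)
print(lowest_per_country)
```

Sorting first matters: `keep='first'` keeps whichever row comes first, so the sort order decides which record survives.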

### 3) Which countries have the highest shifts in Gini coefficient? ###

```python
# Top 10: most negative shifts, i.e. the largest drops in inequality after taxes
income.sort_values(['Shifts'], ascending=True).head(10)
```

| | Entity | Year | PreTax | PostTax | Shifts |
|---|---|---|---|---|---|
| 250 | Ireland | 2010 | 0.577 | 0.298 | -0.279 |
| 252 | Ireland | 2012 | 0.581 | 0.308 | -0.273 |
| 249 | Ireland | 2009 | 0.577 | 0.311 | -0.266 |
| 251 | Ireland | 2011 | 0.569 | 0.305 | -0.264 |
| 253 | Ireland | 2013 | 0.574 | 0.31 | -0.264 |
| 254 | Ireland | 2014 | 0.549 | 0.297 | -0.252 |
| 50 | Belgium | 2020 | 0.495 | 0.247 | -0.248 |
| 255 | Ireland | 2015 | 0.544 | 0.298 | -0.246 |
| 400 | Sweden | 1995 | 0.467 | 0.221 | -0.246 |
| 248 | Ireland | 2008 | 0.537 | 0.295 | -0.242 |


**Ireland** dominates the largest shifts in Gini coefficient from before to after taxes, holding 8 of the top 10 records (every year from 2008 to 2015). Belgium (2020) and Sweden (1995) each make a single appearance.
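`nsmallest` gives the same top 10 as sorting ascending and slicing, and chaining `value_counts` on `Entity` turns "Ireland dominates" into a count. A sketch on the ten rows shown above (copied from the output, keeping only `Entity` and `Shifts`):

```python
import pandas as pd

# The ten rows from the output above, in the same order
top = pd.DataFrame({
    'Entity': ['Ireland'] * 6 + ['Belgium', 'Ireland', 'Sweden', 'Ireland'],
    'Shifts': [-0.279, -0.273, -0.266, -0.264, -0.264, -0.252,
               -0.248, -0.246, -0.246, -0.242],
})

# Ten most negative shifts, then how many belong to each country
largest_drops = top.nsmallest(10, 'Shifts')
print(largest_drops['Entity'].value_counts())
```

On the full data, `income.nsmallest(10, 'Shifts')['Entity'].value_counts()` would do the same in one line.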

### 4) Which countries have the lowest shifts in Gini coefficient? ###

```python
# Bottom 10: shifts closest to zero, i.e. the smallest drops in inequality after taxes
income.sort_values(['Shifts'], ascending=False).head(10)
```

| | Entity | Year | PreTax | PostTax | Shifts |
|---|---|---|---|---|---|
| 156 | Dominican Republic | 2007 | 0.523 | 0.515 | -0.008 |
| 54 | Brazil | 1995 | 0.617 | 0.568 | -0.049 |
| 52 | Brazil | 1992 | 0.59 | 0.54 | -0.05 |
| 231 | Iceland | 2008 | 0.364 | 0.314 | -0.05 |
| 55 | Brazil | 1996 | 0.615 | 0.564 | -0.051 |
| 53 | Brazil | 1993 | 0.615 | 0.562 | -0.053 |
| 56 | Brazil | 1997 | 0.614 | 0.561 | -0.053 |
| 57 | Brazil | 1998 | 0.614 | 0.557 | -0.057 |
| 229 | Iceland | 2006 | 0.365 | 0.307 | -0.058 |
| 230 | Iceland | 2007 | 0.356 | 0.298 | -0.058 |


The Dominican Republic barely registers a shift in 2007: a difference of -0.008, less than a hundredth. Brazil in 1995 comes second at -0.049, and Brazil's 1992 and Iceland's 2008 records share third place with a shift of -0.05.
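Because every shift here is negative, "lowest shift" means the value closest to zero, which is why the descending sort works. Ranking by absolute value makes that intent explicit and would still behave if a record ever had a positive shift. Rounding first is a precaution: computed differences such as `0.54 - 0.59` carry floating-point noise, so two shifts that display as -0.05 may not compare equal. A sketch on rows copied from the output above:

```python
import pandas as pd

# Rows copied from the Shifts outputs above
rows = pd.DataFrame({
    'Entity': ['Dominican Republic', 'Brazil', 'Brazil', 'Iceland', 'Ireland'],
    'Year': [2007, 1995, 1992, 2008, 2010],
    'Shifts': [-0.008, -0.049, -0.05, -0.05, -0.279],
})

smallest = (
    rows.assign(AbsShift=rows['Shifts'].abs().round(3))  # round away float noise so ties stay ties
        .nsmallest(3, 'AbsShift', keep='all')           # keep='all' keeps every row tied for 3rd place
)
print(smallest[['Entity', 'Year', 'Shifts']])
```

With `keep='all'`, the result has four rows because Brazil 1992 and Iceland 2008 tie for third.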

### 5) Which countries have had the highest changes in redistribution in the available data? ###

```python
income.Entity.value_counts()
```

```
Entity
United Kingdom        54
United States         53
Canada                45
Germany               39
Israel                25
Sweden                23
Netherlands           23
Belgium               21
Brazil                21
Ireland               21
Austria               20
Luxembourg            19
Switzerland           19
Spain                 19
Bulgaria              16
Denmark               16
Greece                16
Iceland               15
Japan                 13
Lithuania             13
Australia             11
Slovakia               9
Finland                9
Czechia                6
Norway                 6
South Africa           5
Estonia                5
Romania                3
Italy                  3
Dominican Republic     1
Name: count, dtype: int64
```
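These counts show how many years of data each country contributes. The Dominican Republic has only one, so no change over time can be measured for it. As a sketch of one possible approach, assuming "change in redistribution" means a country's shift in its latest available year minus its shift in its earliest (an assumption; the range of shifts over time would be another reasonable reading), the calculation could look like this. The countries `A`, `B`, `C`, their values, and the `redistribution_change` helper are all hypothetical:

```python
import pandas as pd

# Toy data (invented values), same columns as the cleaned `income` frame
toy = pd.DataFrame({
    'Entity': ['A', 'A', 'A', 'B', 'B', 'C'],
    'Year':   [1990, 2000, 2010, 1995, 2015, 2007],
    'Shifts': [-0.10, -0.15, -0.20, -0.12, -0.11, -0.01],
})

def redistribution_change(df, min_years=2):
    """Latest-year shift minus earliest-year shift, per country (hypothetical helper).

    Negative = redistribution grew (the post-tax drop in inequality got bigger).
    """
    ordered = df.sort_values(['Entity', 'Year'])
    grouped = ordered.groupby('Entity')
    summary = pd.DataFrame({
        'FirstYear': grouped['Year'].first(),
        'LastYear': grouped['Year'].last(),
        'Change': grouped['Shifts'].last() - grouped['Shifts'].first(),
        'Years': grouped.size(),
    })
    # Countries with a single record (like C here) cannot show a change
    summary = summary[summary['Years'] >= min_years]
    # Largest changes in either direction first
    return summary.sort_values('Change', key=abs, ascending=False)

print(redistribution_change(toy))
```

Comparing only the first and last years ignores what happens in between; a country whose redistribution swung and then returned would show little change under this definition.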