# Measuring Income Equality with the Gini Coefficient

As we discussed in our numpy exercises, one frequently used measure of inequality is the Gini Coefficient. The Gini Coefficient takes on a value of 1 when the distribution of some property is maximally unequal across a set of entities, and a value of 0 when it is evenly distributed.

In this exercise, we will calculate the Gini Coefficient for income across the countries of the world. The goal is to get a sense not of income inequality *within* a country, but of income inequality *across* countries.

## The Gini Coefficient In Detail

To visualize the Gini Coefficient, we plot the cumulative share of the population (ordered from poorest to richest) on the x-axis, and the cumulative share of income earned by that share of the population on the y-axis. The Gini Coefficient is then defined as $$\frac{A}{A + B}$$, where the areas A and B are labeled below:

![gini_coefficient](https://upload.wikimedia.org/wikipedia/commons/thumb/5/59/Economics_Gini_coefficient2.svg/800px-Economics_Gini_coefficient2.svg.png)

If income is evenly distributed, then the poorest 20% of a population will also have 20% of the income, the poorest 40% will have 40% of the income, and so forth, resulting in a perfect 45-degree line. In this situation, there is no area between the 45-degree line and the actual income distribution, so $A=0$, and the Gini Coefficient is 0.

If, by contrast, the top 10% of people hold all the income in a country, then the curve stays at zero for the poorest 90% of people and jumps up only at the far right side of the graph. This generates a very large gap between the 45-degree line and the actual income distribution for most of the graph, which means a large value for the area $A$ and a very high Gini Coefficient.
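To make these two extreme cases concrete, here is a minimal sketch that computes the points of the curve (often called a Lorenz curve) for a perfectly equal and a maximally unequal toy population. The function name `lorenz_points` and the toy income lists are assumptions made purely for illustration; they are not part of the exercise data.

```python
import numpy as np

def lorenz_points(incomes):
    """Return cumulative population shares and cumulative income shares.

    Both arrays start at 0 so that the curve begins at the origin.
    """
    sorted_incomes = np.sort(np.asarray(incomes, dtype=float))
    n = len(sorted_incomes)
    # Share of the population covered after each entity (poorest first)
    pop_share = np.arange(n + 1) / n
    # Share of total income held by that part of the population
    cumulative = np.cumsum(sorted_incomes) / sorted_incomes.sum()
    income_share = np.concatenate([[0.0], cumulative])
    return pop_share, income_share

# Hypothetical toy populations of five entities each
equal = [10, 10, 10, 10, 10]
unequal = [0, 0, 0, 0, 50]

for name, data in [("equal", equal), ("unequal", unequal)]:
    pop_share, income_share = lorenz_points(data)
    print(name, income_share)
```

For the equal population the income shares match the population shares exactly (the 45-degree line), while for the unequal population the income share stays at zero until the last entity.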

Note that this is only one of many ways that one can measure inequality, and there is no "correct" measure. Most of the time we are interested in measuring inequality because we have an ethical concern about inequality itself, or because we worry that inequality may give rise to other negative phenomena we care about (e.g., one may worry that economic inequality gives rise to inequality in political influence). Because one may care about inequality for a range of reasons, the "correct" measure of inequality depends entirely on your substantive application! You can learn more about [different measures of inequality here](https://en.wikipedia.org/wiki/Income_inequality_metrics).

In this case, we are trying to get a general sense of inequality that puts equal weight on inequities at any point in the income distribution, which is what we get with the Gini Coefficient. If we were more concerned with just understanding inequality at the very top of the income distribution, we might use a measure [like an income ratio.](https://en.wikipedia.org/wiki/Income_inequality_metrics#Share_of_income)

To illustrate what Gini curves look like, here are a few different Gini plots. (These come from someone studying inequality of participation, so to adapt them to our study of income, just imagine that the y-axis plots share of income.)

![gini_distributions](https://miro.medium.com/max/595/0*3DTcZnzDwS6A6AtP)




### Exercise 1

To begin, load data on countries' Gross Domestic Product (GDP) per capita. GDP is a measure of how much is produced by a country's entire economy in a year, and "per capita" just means the number has been divided by each country's population, giving a measure of the amount of economic production generated per person in a country. Essentially, this is a very crude measure of the wealth of each country.

Note that GDP per capita is measured in US dollars (so you can make comparisons across countries), and conversions from local currency to the US dollar were made using [Purchasing Power Parity (PPP)](https://en.wikipedia.org/wiki/Purchasing_power_parity) exchange rates (don't worry if that doesn't mean anything to you—just an added detail for any economists reading this).

```python
import pandas as pd
gdppercap = pd.Series([34605, 34493, 12393, 44200, 10041, 
                       58138, 4709, 49284, 10109, 42536],
                      index=['Bahrain', 'Belgium', 'Bulgaria',
                             'Ireland', 'Macedonia', 'Norway', 
                             'Paraguay', 'Singapore', 
                             'South Africa', 'Switzerland']
                      )                   
```

### Exercise 2

Find the mean, median, minimum and maximum values of GDP per capita in this data. 

### Exercise 3

Programmatically, determine which country in our data has the highest income per capita, and which has the lowest income per capita.

(Obviously, this is easier to do by just looking at the data, but that's only because this dataset is very small. With a real dataset, you would need to do it with code, so please write code to accomplish this task.)

Hint: Country names form the index for this Series, so to get country names you'll need to access the index. 
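As a small illustration of the hint, here is how index labels can be read from a Series. The `temperatures` Series and its city labels are hypothetical toy data, unrelated to the GDP data; this is a sketch of one possible approach, not the only way to solve the exercise.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the GDP data in this exercise
temperatures = pd.Series([21, 35, 14], index=['Lima', 'Cairo', 'Oslo'])

# All labels in the index
print(temperatures.index)

# A single label, selected by position
print(temperatures.index[0])

# The label attached to the largest value
print(temperatures.idxmax())
```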


### Exercise 4

Get Python to print out the names of all the countries that have GDP per capita of less than \$10,000.

### Exercise 5 

Get Python to print out the GDP per capita of Switzerland.

## Calculating Gini Coefficients

In our previous exercise, we used a library function to calculate the Gini Coefficient for our data; today we are going to calculate it ourselves!

For discrete data, the Gini Coefficient can be calculated with the following formula: 

$$\frac{2 \sum_{i=1}^n i y_i}{n \sum_{i=1}^n y_i} -\frac{n+1}{n}$$

Where $i$ is each country's rank ordering from poorest to richest, $y_i$ is the income of country $i$, and $n$ is the number of countries.
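As a quick sanity check of the formula, consider a hypothetical set of three countries with incomes 1, 2, and 3 (toy numbers, already sorted from poorest to richest, so $n = 3$):

$$\frac{2 (1 \cdot 1 + 2 \cdot 2 + 3 \cdot 3)}{3 (1 + 2 + 3)} - \frac{3 + 1}{3} = \frac{28}{18} - \frac{24}{18} = \frac{2}{9} \approx 0.22$$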

### Exercise 6

Begin by writing a function to calculate the Gini Coefficient for our data *by looping over the entries in our Series*. In other words, try to embrace the spirit of how you might normally think about interpreting the summation notation written above.

**HINT**: Be careful with 0-indexing! Python counts from 0, but mathematical formulas (like $\sum$) start from 1!
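One way to avoid off-by-one mistakes is to let Python hand you a rank that starts at 1. The sketch below uses the built-in `enumerate` with `start=1` on a hypothetical toy list; the variable names are illustrative only.

```python
# Hypothetical toy values, already sorted from smallest to largest
values = [5, 7, 9]

weighted_sum = 0
# enumerate(..., start=1) yields ranks 1, 2, 3 instead of 0, 1, 2
for rank, value in enumerate(values, start=1):
    weighted_sum += rank * value

print(weighted_sum)  # 1*5 + 2*7 + 3*9 = 46
```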

### Exercise 7

Excellent! But as we've seen in [our readings](../11_vectorization.ipynb), in data science we generally strive to *not* loop over the entries in our arrays; instead, we aspire to write *vectorized code* that naturally applies a simple operation to each observation.

So now write a new function to calculate the Gini Coefficient that *doesn't* use loops, and instead relies on vectorized code.

**HINT:** you will probably have to create some new arrays.
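As an illustration of what "creating a new array" can look like, the sketch below builds an array of ranks with `np.arange` and multiplies it element by element with a hypothetical toy array. It shows the general vectorized pattern only, under the assumption that the values are already sorted.

```python
import numpy as np

# Hypothetical toy values, already sorted from smallest to largest
values = np.array([5, 7, 9])

# A new array holding the ranks 1, 2, ..., n
ranks = np.arange(1, len(values) + 1)

# Element-wise multiplication followed by a sum, with no explicit loop
weighted_sum = (ranks * values).sum()
print(weighted_sum)  # 1*5 + 2*7 + 3*9 = 46
```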

### Exercise 8

The result we just generated offers a snapshot of inequality for this subset of countries. But what are the dynamics of inequality for these countries?

There is an idea in economics called the "convergence hypothesis", which argues that poorer countries are likely to grow faster, and as a result global inequality is likely to decline. Economists advocating for this hypothesis pointed out that while rich countries had to invent new technologies in order to grow, many poor countries simply had to take advantage of innovations already developed by rich countries. 

To test this hypothesis, let's do a small analysis of the dynamics of income inequality in our sample. Using the code below, load a new Series with the average GDP growth rate for our countries from 2000 to 2018.


```python
avg_growth = pd.Series([-0.29768835, 0.980299584, 4.52991925, 
                        3.686556736, 2.621416804, 0.775132075, 
                        2.015489468, 3.345793635, 1.349993318, 
                        0.982775018],
                        index=['Bahrain', 'Belgium', 'Bulgaria', 
                               'Ireland', 'Macedonia', 'Norway', 
                               'Paraguay', 'Singapore', 
                               'South Africa', 'Switzerland']
                      )
```

### Exercise 9

Using this data on average growth rates in GDP per capita, and assuming that growth rates from 2000 to 2018 continue into the future, estimate what our Gini Coefficient may look like in 2025. Remember that income in our data is from 2008, so we're extrapolating ahead 17 years.

**Hint:** the formula for compound growth (i.e. value of something growing at a rate of `x` percent for $t$ periods) is:

$$future\_value = current\_value \times \left(1 + \frac{percentage\_growth\_rate}{100}\right)^t$$
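The compound growth formula can be applied to a whole Series at once. The sketch below uses hypothetical toy values (not the exercise data) and illustrative variable names. Note that pandas matches the two Series by index label rather than by position, so the order of the entries does not need to agree.

```python
import pandas as pd

# Hypothetical toy values, not the exercise data
current = pd.Series([100.0, 200.0], index=['A', 'B'])
# Growth rates in percent, deliberately listed in a different order
growth_pct = pd.Series([10.0, 5.0], index=['B', 'A'])
periods = 2

# future = current * (1 + rate / 100) ** t, aligned on index labels
future = current * (1 + growth_pct / 100) ** periods
print(future)
```

Here `A` grows at 5% for two periods (to roughly 110.25) and `B` grows at 10% for two periods (to roughly 242).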

### Exercise 10

Interpret your result. Does it seem to imply that we are seeing convergence or not?

[After you're done, you can see a more systematic version of this analysis here!](https://www.cgdev.org/blog/everything-you-know-about-cross-country-convergence-now-wrong)

## Absolutely positively need the solutions?

*Don't use this link until you've really, really spent time struggling with your code!* Doing so only results in you cheating yourself. 

[Link](../solutions_warning.ipynb)