
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%load_ext google.colab.data_table

In [25]:
gini_file = 'economic-inequality-gini-index.csv'
gdp_file = 'gdp-per-capita-maddison-2020.csv'

gini_df = pd.read_csv(gini_file)
gdp_df = pd.read_csv(gdp_file)

# Inner join: keep only country-year pairs present in both data sets
df = pd.merge(gini_df, gdp_df, on=["Entity", "Code", "Year"])
df = df.drop(columns=['417485-annotations', 'Code'])
df.columns = ['country', 'year', 'gini', 'gdp']
# Store the year as a datetime (1 January of that year)
df['year'] = pd.to_datetime(df['year'], format='%Y')
df

Unnamed: 0,country,year,gini,gdp
0,Albania,1996-01-01,0.270103,3965.685303
1,Albania,2002-01-01,0.317390,5608.962402
2,Albania,2005-01-01,0.305957,6858.466797
3,Albania,2008-01-01,0.299847,8522.129883
4,Albania,2012-01-01,0.289605,9592.000000
...,...,...,...,...
1805,Zambia,2006-01-01,0.546175,2133.593994
1806,Zambia,2010-01-01,0.556215,3032.067871
1807,Zambia,2015-01-01,0.571361,3478.000000
1808,Zimbabwe,2011-01-01,0.431536,1515.000000


# Is there a relation between a country's Gross Domestic Product (GDP) and its income inequality?

In [95]:
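group_df = df.groupby('country')

# corr() returns a 2x2 matrix per country; every second row (the 'gini' row)
# of the last column ('gdp') is that country's gini-gdp Pearson coefficient
correlation = group_df[['gini','gdp']].corr().iloc[0::2,-1]
# Attach each country's coefficient to its rows (it arrives as a second 'gdp' column)
new_df = pd.merge(df, correlation, on = 'country')
new_df.columns = ['country', 'year', 'gini', 'gdp', 'pearson']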

# new_df[new_df['country'].str.contains('France')]
new_df

Unnamed: 0,country,year,gini,gdp,pearson
0,Albania,1996-01-01,0.270103,3965.685303,0.567214
1,Albania,2002-01-01,0.317390,5608.962402,0.567214
2,Albania,2005-01-01,0.305957,6858.466797,0.567214
3,Albania,2008-01-01,0.299847,8522.129883,0.567214
4,Albania,2012-01-01,0.289605,9592.000000,0.567214
...,...,...,...,...,...
1805,Zambia,2006-01-01,0.546175,2133.593994,0.392477
1806,Zambia,2010-01-01,0.556215,3032.067871,0.392477
1807,Zambia,2015-01-01,0.571361,3478.000000,0.392477
1808,Zimbabwe,2011-01-01,0.431536,1515.000000,1.000000


In [62]:
avg_gini = group_df['gini'].mean()
avg_gdp = group_df['gdp'].mean()
df1 = pd.merge(avg_gini, avg_gdp, on = 'country')

pearson = df1.corr().iloc[0::2,-1]
pearson

gini   -0.457807
Name: gdp, dtype: float64

Correlating each country's average Gini index with its average GDP per capita gives a Pearson coefficient of -0.457807. According to the Rea and Parker (2014) interpretations, this is a 'relatively strong' negative correlation: countries with a higher average GDP tend to have lower income inequality, and vice versa.
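As a reminder of what the coefficient measures, the sketch below computes Pearson's r by hand (covariance divided by the product of the standard deviations) and compares it with the pandas result. The numbers in `toy` are made up for illustration and are not taken from the data sets used here; `pearson_r` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Made-up illustrative values, not taken from the notebook's data sets.
toy = pd.DataFrame({
    "gini": [0.55, 0.48, 0.40, 0.33, 0.30],
    "gdp": [2000.0, 5000.0, 12000.0, 30000.0, 45000.0],
})


def pearson_r(x, y):
    """Pearson r: sum of co-deviations over the product of deviation norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


manual_r = pearson_r(toy["gini"], toy["gdp"])
pandas_r = toy["gini"].corr(toy["gdp"])  # pandas uses Pearson by default

# The two values should agree up to floating-point error.
print(manual_r, pandas_r)
```

A negative value here simply reflects that, in the toy data, the Gini index falls as GDP rises.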

However, when the same test is run per country instead of on the averages, the values change significantly. Even among the first few countries alphabetically, Albania has a Pearson coefficient of 0.567214, which is already very different from the overall result. This in turn is vastly different from the second country, Algeria, which has a coefficient of -0.934937.

Of course, the amount of data available affects these results (Algeria, for example, has only 3 measures while Albania has 10), and a coefficient based on more observations is more reliable. Assessing each country on its own may also miss significant changes over time that affect this value, such as changes in technology, political structures or other major events. It may therefore be relevant to assess how these values change over time.
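One way to account for the differing amounts of data is to report the number of observations next to each country's coefficient and set aside countries with too few. The sketch below uses made-up rows in place of `df`, and the `MIN_OBS` threshold is an arbitrary assumption rather than a value taken from this analysis.

```python
import pandas as pd

# Made-up rows standing in for the merged `df`; not real data.
toy = pd.DataFrame({
    "country": ["A"] * 5 + ["B"] * 2,
    "gini": [0.30, 0.32, 0.31, 0.35, 0.36, 0.50, 0.45],
    "gdp": [4000.0, 5000.0, 5500.0, 7000.0, 8000.0, 2000.0, 2500.0],
})

MIN_OBS = 4  # hypothetical cut-off; choose to suit the data

grouped = toy.groupby("country")

# Number of observations available for each country
n_obs = grouped["gini"].count()

# Pearson coefficient between gini and gdp within each country
per_country_r = grouped[["gini", "gdp"]].apply(
    lambda g: g["gini"].corr(g["gdp"])
)

summary = pd.DataFrame({"n_obs": n_obs, "pearson": per_country_r})

# Keep only countries with enough observations to take the coefficient seriously
reliable = summary[summary["n_obs"] >= MIN_OBS]
print(summary)
print(reliable)
```

Showing `n_obs` alongside `pearson` makes it easier to see which coefficients rest on only a handful of measurements.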

In [None]:
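time_df = df.groupby('year')

# corr() returns a 2x2 matrix per year; every second row (the 'gini' row)
# of the last column ('gdp') is that year's gini-gdp Pearson coefficient
correlation = time_df[['gini','gdp']].corr().iloc[0::2,-1]
# Attach each year's coefficient to its rows (it arrives as a second 'gdp' column)
df2 = pd.merge(df, correlation, on = 'year')
df2.columns = ['country', 'year', 'gini', 'gdp', 'pearson']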

# df2[df2['country'].str.contains('United Kingdom')]
df2

In [88]:
time_gini = time_df['gini'].mean()
time_gdp = time_df['gdp'].mean()
df3 = pd.merge(time_gini, time_gdp, on = 'year')

pearson = df3.corr().iloc[0::2,-1]
pearson

gini   -0.486979
Name: gdp, dtype: float64

In [None]:
df4 = df2.groupby('year')
df4['pearson'].mean()

Correlating the yearly averages of the Gini index and GDP across all countries gives a Pearson coefficient of -0.486979. According to the Rea and Parker (2014) interpretations, this is still a 'relatively strong' negative correlation: in years when average GDP is higher, average income inequality tends to be lower.

Compared with the overall value, the difference is small and does not change our overall view of the relationship. However, it does show that the relationship stays more consistently negative across years than it does across individual countries. While individual years deviate somewhat from the coefficient above, the yearly values are much closer to each other than the per-country values were. They also appear to become more stable after 2003, with only 2018 standing further apart since then. The coefficients for the earliest years should not be considered reliable: most countries have no data for those years, and the results are clearly unusual (a coefficient of 1 or -1 indicates a perfect linear relationship, which is very unlikely).
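The coefficients of exactly 1 or -1 in sparse years are what one would expect when a group holds only two observations, because a straight line always fits two points perfectly. The sketch below illustrates this with random pairs of points; the data are generated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw many random two-point samples and record their Pearson coefficient.
results = []
for _ in range(1000):
    x = rng.normal(size=2)
    y = rng.normal(size=2)
    results.append(np.corrcoef(x, y)[0, 1])

results = np.array(results)

# With only two observations the coefficient is always +1 or -1,
# regardless of any real relationship between the variables.
all_perfect = bool(np.all(np.isclose(np.abs(results), 1.0)))
print(all_perfect)
```

Groups with a single observation are even less informative, since no correlation can be estimated from one point.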

Still, despite the values calculated, it is difficult to be certain of the correlation between the two. To explore further, I compared the per-country coefficients of the countries with the lowest GDP against those with the highest, and there is still variation. While countries with low GDP tend towards a negative Pearson coefficient and those with higher GDP tend towards a positive one, there are notable outliers such as Bangladesh (low GDP but a coefficient of 0.543098) and Ireland and Switzerland (high GDP but coefficients of -0.883508 and -0.641813 respectively). This may mean that the relationship is more closely linked to the socio-economic structures within individual countries, which, while often reflected in GDP, are much harder to gauge without a far more extensive study. Something similar can be seen in the time-related data: as technology has become more widely available in many countries and improved many aspects of everyday life, GDP has also increased, particularly in the last 20 years.
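The comparison between low-GDP and high-GDP countries is not shown as code above. One possible way to set it up is sketched below, under the assumption that countries are split at the median of their average GDP; the split rule, the country names and all values in `toy` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the merged `df`; not real data.
toy = pd.DataFrame({
    "country": ["Low1"] * 3 + ["Low2"] * 3 + ["High1"] * 3 + ["High2"] * 3,
    "gdp": [1000.0, 1500.0, 2000.0,
            1200.0, 1800.0, 2500.0,
            30000.0, 35000.0, 42000.0,
            40000.0, 50000.0, 60000.0],
    "gini": [0.50, 0.48, 0.45,
             0.40, 0.42, 0.47,
             0.28, 0.30, 0.33,
             0.35, 0.33, 0.30],
})

grouped = toy.groupby("country")

# Average GDP and within-country Pearson coefficient for each country
mean_gdp = grouped["gdp"].mean()
pearson = grouped[["gini", "gdp"]].apply(lambda g: g["gini"].corr(g["gdp"]))
summary = pd.DataFrame({"mean_gdp": mean_gdp, "pearson": pearson})

# Assumed split: countries at or above the median average GDP count as "high"
median_gdp = summary["mean_gdp"].median()
summary["gdp_band"] = np.where(summary["mean_gdp"] >= median_gdp, "high", "low")

# Sorting by GDP puts the sign of each coefficient next to its GDP band,
# which makes outliers within a band easy to spot
print(summary.sort_values("mean_gdp"))
```

In the toy data each band contains one positive and one negative coefficient, mirroring the kind of outliers described above.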

In conclusion, while the data does show an overall relationship between GDP and income inequality, the variation between countries, both at a single point in time and over time, makes it difficult to treat this as a simple cause-and-effect relationship. It is much more likely to be more complicated than this and to depend on factors not considered in this data set.