# Ageing population and GDP
by Llewellyn Deeprose-Morrison, 14 August 2020  

In this document we look at the relationships between:
- Percentage of population over 65 years old and GDP (\$)
- Percentage of population over 65 years old and population size
- Percentage of population over 65 years old and GDP per capita (\$)

## Collecting the data
The data is collected from the World Bank's website <https://data.worldbank.org/>. For each country, we collect the percentage of the population aged over 65, the total population, and the GDP in \$, all for 2019.

In [16]:
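import warnings
warnings.simplefilter('ignore', FutureWarning)
from pandas import merge  # the only pandas function called directly in this notebook
from pandas_datareader.wb import download

# World Bank indicator codes
age_indicator = 'SP.POP.65UP.TO.ZS'  # percentage of population aged 65 and above
pop_indicator = 'SP.POP.TOTL'        # total population
gdp_indicator = 'NY.GDP.MKTP.CD'     # GDP in current US$
YEAR = 2019

# Skip the first 47 rows, which are assumed to be World Bank aggregates
# (regions, income groups) rather than countries, then drop missing values.
ages = download(indicator=age_indicator, country='all', start=YEAR, end=YEAR)[47:].dropna().reset_index()
pop = download(indicator=pop_indicator, country='all', start=YEAR, end=YEAR)[47:].dropna().reset_index()
gdp = download(indicator=gdp_indicator, country='all', start=YEAR, end=YEAR)[47:].dropna().reset_index()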
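
The `[47:]` slice relies on the aggregate rows always occupying the first 47 positions of the response, which may not hold if the World Bank changes its list of aggregates. As a rough sketch of an alternative, rows could be filtered by name instead. The helper `drop_aggregates` is hypothetical, and the sketch assumes a metadata frame with `name` and `region` columns in which aggregates carry the region `'Aggregates'` (the shape that `pandas_datareader.wb.get_countries()` is assumed to return). The toy values are placeholders, not real data.

```python
import pandas as pd

def drop_aggregates(frame, countries):
    """Keep only rows of `frame` whose 'country' is not listed as an aggregate.

    `countries` is assumed to have 'name' and 'region' columns, with
    aggregates marked by region == 'Aggregates'.
    """
    aggregate_names = countries.loc[countries['region'] == 'Aggregates', 'name']
    return frame[~frame['country'].isin(aggregate_names)].reset_index(drop=True)

# Toy stand-ins for a downloaded indicator and the country metadata
# (placeholder values, for illustration only).
toy_ages = pd.DataFrame({
    'country': ['World', 'Euro area', 'Albania', 'Zambia'],
    'SP.POP.65UP.TO.ZS': [10.0, 20.0, 15.0, 2.0],
})
toy_countries = pd.DataFrame({
    'name': ['World', 'Euro area', 'Albania', 'Zambia'],
    'region': ['Aggregates', 'Aggregates', 'Europe & Central Asia', 'Sub-Saharan Africa'],
})

countries_only = drop_aggregates(toy_ages, toy_countries)
print(countries_only)
```

Filtering by name costs one extra metadata lookup but does not depend on row order.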

To make the figures easier to work with, we convert GDP to millions of dollars and population to thousands, rounding each to the nearest whole number. This is done below.

In [17]:
def roundToMillions(g):
    return round(g/1000000)
GDP = 'GDP in $m'
COUNTRY = 'country'
gdp[GDP] = gdp[gdp_indicator].apply(roundToMillions)
gdpClean = gdp[[COUNTRY, GDP]]
gdpClean

| | country | GDP in $m |
|---|---|---|
| 0 | Afghanistan | 19291 |
| 1 | Albania | 15279 |
| 2 | Algeria | 171091 |
| 3 | Andorra | 3154 |
| 4 | Angola | 88816 |
| ... | ... | ... |
| 181 | Vanuatu | 934 |
| 182 | Vietnam | 261921 |
| 183 | Yemen, Rep. | 22581 |
| 184 | Zambia | 23310 |


In [18]:
def roundToThousands(p):
    return round(p/1000)

POP = 'Population (1000s)'
pop[POP] = pop[pop_indicator].apply(roundToThousands)
popClean = pop[[COUNTRY, POP]]
popClean

| | country | Population (1000s) |
|---|---|---|
| 0 | Afghanistan | 38042 |
| 1 | Albania | 2854 |
| 2 | Algeria | 43053 |
| 3 | American Samoa | 55 |
| 4 | Andorra | 77 |
| ... | ... | ... |
| 211 | Virgin Islands (U.S.) | 107 |
| 212 | West Bank and Gaza | 4685 |
| 213 | Yemen, Rep. | 29162 |
| 214 | Zambia | 17861 |
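
The two rounding cells apply a Python function to each value in turn. A minimal sketch of an equivalent vectorised form is shown here, on the assumption that the column holds no missing values (the notebook calls `dropna()` first). The dollar figures are made-up placeholders.

```python
import pandas as pd

gdp_indicator = 'NY.GDP.MKTP.CD'

# Placeholder dollar values, for illustration only.
toy = pd.DataFrame({
    'country': ['A', 'B'],
    gdp_indicator: [19_291_104_000.0, 934_240_000.0],
})

# Divide the whole column at once, round it, and store whole numbers.
toy['GDP in $m'] = (toy[gdp_indicator] / 1_000_000).round().astype(int)
print(toy[['country', 'GDP in $m']])
```

The same pattern with a divisor of 1000 would cover the population column.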


We can use this data to calculate each country's GDP per capita.

In [19]:
gdpVsPop = merge(gdp, pop, on=COUNTRY, how='inner')
GDP_PC = 'GDP per capita ($)'
gdpVsPop[GDP_PC] = round(gdpVsPop[gdp_indicator]/gdpVsPop[pop_indicator])

gdp_pcClean = gdpVsPop[[COUNTRY, GDP_PC]]
gdp_pcClean

| | country | GDP per capita ($) |
|---|---|---|
| 0 | Afghanistan | 507.0 |
| 1 | Albania | 5353.0 |
| 2 | Algeria | 3974.0 |
| 3 | Andorra | 40886.0 |
| 4 | Angola | 2791.0 |
| ... | ... | ... |
| 181 | Vanuatu | 3115.0 |
| 182 | Vietnam | 2715.0 |
| 183 | Yemen, Rep. | 774.0 |
| 184 | Zambia | 1305.0 |
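
As a quick sanity check, the Afghanistan figure can be approximately reproduced by hand from the rounded values in the GDP and population tables. The notebook itself divides the unrounded indicator values, so for other countries this shortcut can differ slightly from the table.

```python
gdp_millions = 19291    # Afghanistan, GDP in $m (from the GDP table)
pop_thousands = 38042   # Afghanistan, population in 1000s (from the population table)

# Convert both back to base units before dividing.
gdp_per_capita = (gdp_millions * 1_000_000) / (pop_thousands * 1_000)
print(round(gdp_per_capita))  # 507
```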


Now we will round the percentage of over-65s to 1 decimal place and merge all of our tables to create the dataframe `df`.

In [20]:
AGE = 'Percentage of population above 65'
def round1dp(x):
    return round(x, 1)
ages[AGE] = ages[age_indicator].apply(round1dp)
agesClean = ages[[COUNTRY, AGE]]

In [None]:
ages_gdp = merge(agesClean, gdpClean, on=COUNTRY, how='inner')
ages_gdp_pop = merge(ages_gdp, popClean, on=COUNTRY, how='inner')
df = merge(ages_gdp_pop, gdp_pcClean, on=COUNTRY, how='inner')
df
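
Because every merge uses `how='inner'`, a country missing from any one table is silently left out of `df`. The GDP table has fewer rows than the population table, and American Samoa, for example, appears in the population table but not among the GDP rows shown. A small sketch of how the dropped countries could be listed, using an outer merge with `indicator=True` on a three-row excerpt of the tables above:

```python
import pandas as pd

# Excerpts of the cleaned tables shown earlier.
toy_gdp = pd.DataFrame({'country': ['Albania', 'Zambia'],
                        'GDP in $m': [15279, 23310]})
toy_pop = pd.DataFrame({'country': ['Albania', 'American Samoa', 'Zambia'],
                        'Population (1000s)': [2854, 55, 17861]})

# The inner merge keeps only countries present in both tables.
inner = pd.merge(toy_gdp, toy_pop, on='country', how='inner')

# The outer merge keeps everything; '_merge' records where each row came from.
outer = pd.merge(toy_gdp, toy_pop, on='country', how='outer', indicator=True)
dropped = outer.loc[outer['_merge'] != 'both', 'country'].tolist()

print(len(inner), dropped)
```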

We are now ready to examine the three relationships listed in the introduction using the Spearman rank correlation coefficient. For each one we also calculate the p-value and treat the result as statistically significant if it is below 0.05.
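
For readers unfamiliar with it, the Spearman coefficient is the Pearson correlation of the ranks of the values, so it measures how consistently one variable rises as the other does, without assuming a straight-line relationship. The sketch below illustrates this on two short made-up series (not World Bank data) by comparing a rank-based Pearson correlation with the result of `spearmanr`.

```python
import pandas as pd
from scipy.stats import spearmanr

# Made-up values, for illustration only.
x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([2, 1, 4, 3, 5])

# Pearson correlation computed on the ranks of x and y.
rank_pearson = x.rank().corr(y.rank())

# The same quantity from scipy, along with its p-value.
r, p = spearmanr(x, y)

print(rank_pearson, r)  # both approximately 0.8
```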

## Age and GDP

In [None]:
from scipy.stats import spearmanr
(r, p) = spearmanr(df[AGE], df[GDP])
print('The correlation is', r)
if p < 0.05:
    print('Statistically significant.')
else:
    print('Not statistically significant.')

In [None]:
%matplotlib inline
df.plot(x=GDP, y=AGE, logx=True, kind='scatter', grid=True, figsize=(10,5))

## Age and population

In [None]:
(r, p) = spearmanr(df[AGE], df[POP])
print('The correlation is', r)
if p < 0.05:
    print('Statistically significant.')
else:
    print('Not statistically significant.')

In [None]:
df.plot(x=POP, y=AGE, logx=True, grid=True, kind='scatter', figsize=(10,5))

## Age and GDP per capita

In [None]:
(r, p) = spearmanr(df[AGE], df[GDP_PC])
print('The correlation is', r)
if p < 0.05:
    print('Statistically significant.')
else:
    print('Not statistically significant.')

In [None]:
df.plot(x=GDP_PC, y=AGE, logx=True, kind='scatter', grid=True, figsize=(10,5))
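
The three correlation cells above repeat the same steps with a different column each time. One possible way to factor them out is sketched below. The helper name `report_correlation` and its `alpha` parameter are hypothetical, and the toy frame is made up so that the example runs on its own.

```python
import pandas as pd
from scipy.stats import spearmanr

def report_correlation(frame, x, y, alpha=0.05):
    """Print and return the Spearman correlation between two columns."""
    r, p = spearmanr(frame[x], frame[y])
    if p < alpha:
        verdict = 'Statistically significant.'
    else:
        verdict = 'Not statistically significant.'
    print(f'The correlation between {x!r} and {y!r} is {r:.3f}. {verdict}')
    return r, p

# Made-up, perfectly monotonic data, for illustration only.
toy = pd.DataFrame({'a': range(1, 11), 'b': [v ** 2 for v in range(1, 11)]})
r, p = report_correlation(toy, 'a', 'b')
```

In the notebook, a call such as `report_correlation(df, AGE, GDP_PC)` would then stand in for one of the repeated cells.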

## Conclusion
Of the three relationships examined, the strongest appears to be between the percentage of the population over 65 and the country's GDP per capita. The correlation is strong and positive, which is unsurprising: a higher GDP per capita suggests that a country's citizens tend to be richer, so they may have easier access to healthcare, safer jobs, and a better standard of living, all of which may help more people live past 65.