In [1]:
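import pandas as pd

# Kaggle input path for the 2024 population projections dataset
POPULATION = '/kaggle/input/2024-population-projections-by-country/2024Populations.csv'

# Use the 'rank' column (1 = most populous in 2024) as the row index
df = pd.read_csv(filepath_or_buffer=POPULATION, index_col='rank')
df.head()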

rank,country,TwoLetterID,unMember,pop1980,pop2000,pop2010,pop2023,pop2024,pop2030,pop2050,landAreaKm,2024YoYChange,2024YoYGrowthRate,2024WorldPercentage,Density_2024
1,India,IN,True,696828385,1059633675,1240613620,1428627663,1441719852,1514994080,1670490596,2973190.0,13092189,0.009164,0.177614,484.906734
2,China,CN,True,982372466,1264099069,1348191368,1425671352,1425178782,1415605906,1312636325,9424702.9,-492570,-0.000346,0.175577,151.21737
3,United States,US,True,223140018,282398554,311182845,339996563,341814420,352162301,375391963,9147420.0,1817857,0.005347,0.04211,37.367304
4,Indonesia,ID,True,148177096,214072421,244016173,277534122,279798049,292150100,317225213,1877519.0,2263927,0.008157,0.03447,149.025416
5,Pakistan,PK,True,80624057,154369924,194454498,240485658,245209815,274029836,367808468,770880.0,4724157,0.019644,0.030209,318.090773


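Almost every column either is a population count or is derived from one (for example 2024YoYChange and 2024WorldPercentage), and population is very unevenly distributed across countries. So let's plot the two variables that are not just proxies for population size, the population density and the 2024 growth rate, against each other.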
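To check which columns are population in disguise, here is a small sketch that rebuilds a few columns from the five rows printed by df.head() (typed in by hand as a hypothetical frame called head, so no file is needed). It confirms that 2024YoYChange is simply pop2024 - pop2023, while 2024YoYGrowthRate (the change divided by pop2023) and Density_2024 (pop2024 divided by landAreaKm) are ratios, which is what makes them comparable across countries of very different sizes. The 1e-6 tolerance is an assumption chosen to absorb the rounding in the CSV.

```python
import pandas as pd

# Hand-typed subset of the five rows shown by df.head() (hypothetical name: head).
head = pd.DataFrame(
    {
        "country": ["India", "China", "United States", "Indonesia", "Pakistan"],
        "pop2023": [1428627663, 1425671352, 339996563, 277534122, 240485658],
        "pop2024": [1441719852, 1425178782, 341814420, 279798049, 245209815],
        "landAreaKm": [2973190.0, 9424702.9, 9147420.0, 1877519.0, 770880.0],
        "2024YoYChange": [13092189, -492570, 1817857, 2263927, 4724157],
        "2024YoYGrowthRate": [0.009164, -0.000346, 0.005347, 0.008157, 0.019644],
        "Density_2024": [484.906734, 151.21737, 37.367304, 149.025416, 318.090773],
    }
).set_index("country")

# Absolute change: a raw head count, so it scales with country size.
change_matches = (head["2024YoYChange"] == head["pop2024"] - head["pop2023"]).all()

# Growth rate: the same change divided by the 2023 population (a size-free ratio).
rate = head["2024YoYChange"] / head["pop2023"]
rate_matches = ((rate - head["2024YoYGrowthRate"]).abs() < 1e-6).all()

# Density: 2024 population divided by land area (also a size-free ratio).
density = head["pop2024"] / head["landAreaKm"]
density_matches = ((density / head["Density_2024"] - 1).abs() < 1e-6).all()

print(change_matches, rate_matches, density_matches)  # True True True
```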

In [2]:
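import warnings
from plotly import express

# Silence FutureWarnings (e.g. pandas deprecation notices triggered inside plotly)
warnings.filterwarnings(action='ignore', category=FutureWarning)

# Density vs. 2024 growth rate, one point per country, colored by UN membership
express.scatter(data_frame=df, x='Density_2024', y='2024YoYGrowthRate', hover_name='country', log_x=True, color='unMember')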

A few countries have extreme population densities, so we use a log scale on the density axis (log_x=True); on a linear axis those outliers would squeeze the other countries into a narrow band.
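To put a number on "outliers in the density direction", a pair of small helpers (hypothetical, not part of the original notebook) can measure how many orders of magnitude a column spans and list the countries at each extreme. A span of several decades is the usual signal that a log axis is needed.

```python
import numpy as np
import pandas as pd


def decades_spanned(values: pd.Series) -> float:
    """Return log10(max / min) over the strictly positive entries.

    A result of 3 means the largest value is 1,000 times the smallest.
    Non-positive values are dropped because a log axis cannot show them either.
    """
    positive = values[values > 0]
    return float(np.log10(positive.max() / positive.min()))


def extremes(frame: pd.DataFrame, column: str, n: int = 3) -> pd.DataFrame:
    """Return the n smallest rows followed by the n largest rows by `column`."""
    return pd.concat([frame.nsmallest(n, column), frame.nlargest(n, column)])
```

In the notebook these would be called as decades_spanned(df['Density_2024']) and extremes(df, 'Density_2024')[['country', 'Density_2024']].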

In [3]:
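# Sort by 2024 population; reset_index() replaces the rank index with 0..n-1 sort positions,
# which plotly uses as the x-axis because no x column is given
express.scatter(data_frame=df.sort_values(by='pop2024').reset_index(), y='pop2024', hover_name='country', log_y=True, color='unMember')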

If we sort the countries by 2024 population and plot population on a log scale against sort position, we get a fairly smooth curve, apart from a few outliers.
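The plot relies on an implicit x-axis: after sort_values(...).reset_index(), the index is just the sort position. The sketch below, using a hypothetical helper name, makes that position explicit, along with the log10 height a log-y axis would draw, and checks it on the five rows printed by df.head(). Unlike the notebook it uses reset_index(drop=True), so the old index is discarded rather than kept as a column.

```python
import numpy as np
import pandas as pd


def rank_size_frame(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Sort by `column` ascending and add explicit plotting coordinates.

    position:    0-based sort position (the x-axis in the sorted plots)
    log10_value: the height a log-y axis draws for each point
    """
    ordered = frame.sort_values(by=column).reset_index(drop=True)
    ordered["position"] = np.arange(len(ordered))
    ordered["log10_value"] = np.log10(ordered[column])
    return ordered


# The five rows shown by df.head(), pop2024 only.
head = pd.DataFrame(
    {
        "country": ["India", "China", "United States", "Indonesia", "Pakistan"],
        "pop2024": [1441719852, 1425178782, 341814420, 279798049, 245209815],
    }
)
print(rank_size_frame(head, "pop2024")[["position", "country", "log10_value"]])
```

With the full data, express.scatter(data_frame=rank_size_frame(df, 'pop2024'), x='position', y='pop2024', log_y=True, hover_name='country', color='unMember') should draw the same curve with the x-axis named explicitly.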

In [4]:
# Same sorted, log-scale view for population density
express.scatter(data_frame=df.sort_values(by='Density_2024').reset_index(), y='Density_2024', hover_name='country', log_y=True, color='unMember')

Population density, sorted and plotted the same way, shows a similar shape, although the outliers are more pronounced and the curve is flatter overall.
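One way to make "flatter" and "more pronounced outliers" concrete (an assumed measure, not the only reasonable one) is to describe each sorted log-scale curve in decades: the width of its middle half, its full width, and the difference between the two, which is the range covered by the lowest and highest quarters of the points. The helper name is hypothetical.

```python
import numpy as np
import pandas as pd


def log_spread_summary(values: pd.Series) -> dict[str, float]:
    """Summarize a sorted log-scale curve in decades (powers of ten).

    middle: interquartile range of the log10 values; smaller means a flatter middle
    full:   distance from the minimum to the maximum log10 value
    tails:  full minus middle, the range covered by the lowest and highest quarters
    """
    logs = np.log10(values[values > 0])
    q1, q3 = logs.quantile([0.25, 0.75])
    middle = float(q3 - q1)
    full = float(logs.max() - logs.min())
    return {"middle": middle, "full": full, "tails": full - middle}
```

Comparing log_spread_summary(df['pop2024']) with log_spread_summary(df['Density_2024']) would test the observation: a flatter density curve should show up as a smaller middle, and more pronounced outliers as a larger tails relative to middle.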

Let's look at the population at the two ends of the period of interest: the projected 2050 population as a function of the historical 1980 population, on log axes and colored by the 2024 growth rate.
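On log-log axes, the diagonal pop2050 = pop1980 separates countries that grow over the period from those that shrink, and a point's vertical distance from that diagonal is log10(pop2050 / pop1980). The sketch below, with a hypothetical helper name, computes that total growth factor and the equivalent constant (compound) annual rate over the 70 years, checked on the five rows printed by df.head().

```python
import pandas as pd


def growth_over_period(frame: pd.DataFrame, start: str = "pop1980",
                       end: str = "pop2050", years: int = 70) -> pd.DataFrame:
    """Total growth factor and equivalent constant annual rate between two columns.

    factor > 1 means the point sits above the y = x diagonal on log-log axes.
    annual_rate is the compound yearly rate that turns `start` into `end`.
    """
    factor = frame[end] / frame[start]
    annual = factor ** (1 / years) - 1
    return pd.DataFrame({"factor": factor, "annual_rate": annual})


# The five rows shown by df.head(), indexed by country.
head = pd.DataFrame(
    {
        "pop1980": [696828385, 982372466, 223140018, 148177096, 80624057],
        "pop2050": [1670490596, 1312636325, 375391963, 317225213, 367808468],
    },
    index=["India", "China", "United States", "Indonesia", "Pakistan"],
)
print(growth_over_period(head).round(4))
```

For the full dataset, growth_over_period(df.set_index('country')) would give the same columns for every country.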

In [5]:
# Log-log scatter of 1980 vs. projected 2050 population, colored by the 2024 growth rate
express.scatter(data_frame=df, x='pop1980', y='pop2050', log_x=True, log_y=True, hover_name='country', color='2024YoYGrowthRate')

Coloring by the 2024 growth rate, roughly the middle of the period, gives a good sense of which countries are growing and which are shrinking over the period of interest, although there are some anomalies (e.g. Ukraine, the UAE).
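Such anomalies can also be listed rather than spotted by eye. The sketch below uses a hypothetical helper that flags countries whose 2024 growth rate has the opposite sign to their overall 1980 to 2050 change, i.e. points whose color disagrees with which side of the diagonal they sit on. This sign test is an assumption and only captures one kind of anomaly. On the five rows printed by df.head() it flags China, which has a slightly negative 2024 growth rate but a larger projected 2050 population than in 1980; that is a turning point rather than a data error, so a flag means "worth a look", not "wrong".

```python
import numpy as np
import pandas as pd


def growth_sign_mismatches(frame: pd.DataFrame, rate: str = "2024YoYGrowthRate",
                           start: str = "pop1980", end: str = "pop2050") -> pd.Index:
    """Return index labels where the sign of `rate` differs from the sign of end - start."""
    midpoint_sign = np.sign(frame[rate])
    period_sign = np.sign(frame[end] - frame[start])
    mismatch = (midpoint_sign != period_sign).to_numpy()
    return frame.index[mismatch]


# The five rows shown by df.head(), indexed by country.
head = pd.DataFrame(
    {
        "pop1980": [696828385, 982372466, 223140018, 148177096, 80624057],
        "pop2050": [1670490596, 1312636325, 375391963, 317225213, 367808468],
        "2024YoYGrowthRate": [0.009164, -0.000346, 0.005347, 0.008157, 0.019644],
    },
    index=["India", "China", "United States", "Indonesia", "Pakistan"],
)
print(list(growth_sign_mismatches(head)))  # ['China']
```

On the full data, growth_sign_mismatches(df.set_index('country')) would list every country that fails this sign test.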