We know our data is synthetic, so let's load it up and take a look.

In [1]:
import pandas as pd

DATA = '/kaggle/input/global-population-dataset-20142024/population_by_year_2014_2024.csv'
df = pd.read_csv(filepath_or_buffer=DATA)
# The year columns are the ones whose names are all digits ('2014' ... '2024').
years = [column for column in df.columns if column.isnumeric()]
df.head()

Unnamed: 0,Country,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
0,Afghanistan,56805036,58070279,59363704,60685938,62037623,63419415,64831984,66276015,67752210,69261285,70803973
1,Albania,127571863,129443136,131341857,133268430,135223262,137206769,139219371,141261494,143333572,145436043,147569355
2,Algeria,147582820,151776865,156090098,160525905,165087770,169779276,174604106,179566049,184669002,189916972,195314080
3,Andorra,133274038,134398607,135532666,136676293,137829571,138992580,140165403,141348122,142540821,143743584,144956495
4,Angola,35838921,36323701,36815040,37313024,37817745,38329292,38847760,39373240,39905829,40445621,40992715
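The `Unnamed: 0` column looks like a row index that was saved into the CSV along with the data. A minimal sketch, assuming the file's first header cell is empty, reads two of the rows shown above from an in-memory string (`io.StringIO` stands in for the Kaggle path; `CSV_TEXT`, `raw` and `clean` are illustrative names). It shows that `index_col=0` turns that column into the index, leaving only `Country` and the year columns.

```python
import io

import pandas as pd

# Two rows copied from the df.head() output; StringIO stands in for the real file.
CSV_TEXT = """,Country,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
0,Afghanistan,56805036,58070279,59363704,60685938,62037623,63419415,64831984,66276015,67752210,69261285,70803973
1,Albania,127571863,129443136,131341857,133268430,135223262,137206769,139219371,141261494,143333572,145436043,147569355
"""

# Default read: the unnamed first column becomes an ordinary column.
raw = pd.read_csv(io.StringIO(CSV_TEXT))
# index_col=0 uses that first column as the row index instead.
clean = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)

years = [column for column in clean.columns if column.isnumeric()]
print(list(raw.columns[:2]))
print(list(clean.columns[:2]))
print(years)
```

Keeping the column, as the notebook does, is harmless for `df[years]`, but any operation that touches "all other columns" will pick it up.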


First, let's look at how the total population changes over the period of interest.

In [2]:
from plotly import express

# Sum each year column over all countries, giving one total per year.
totals = df[years].sum(axis='index').rename_axis('year').reset_index(name='population')
express.line(data_frame=totals, x='year', y='population')


What do we see? The total population grows steadily and almost perfectly linearly over the period of interest, from just below 15 billion to just below 18 billion. That is roughly twice the real world population over the same decade (about 7.3 to 8.1 billion), so this would be a very different world from the one we live in.
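One quick way to check how close to linear a series is: a perfectly linear series adds the same amount every year. A minimal sketch on only the five rows printed by `df.head()` (copied by hand; `rows` and `sample` are hypothetical stand-ins for `df`) computes the yearly total and its year-over-year increments.

```python
import pandas as pd

# The five rows shown by df.head() above (the full dataset has more countries).
rows = {
    'Afghanistan': [56805036, 58070279, 59363704, 60685938, 62037623, 63419415,
                    64831984, 66276015, 67752210, 69261285, 70803973],
    'Albania': [127571863, 129443136, 131341857, 133268430, 135223262, 137206769,
                139219371, 141261494, 143333572, 145436043, 147569355],
    'Algeria': [147582820, 151776865, 156090098, 160525905, 165087770, 169779276,
                174604106, 179566049, 184669002, 189916972, 195314080],
    'Andorra': [133274038, 134398607, 135532666, 136676293, 137829571, 138992580,
                140165403, 141348122, 142540821, 143743584, 144956495],
    'Angola': [35838921, 36323701, 36815040, 37313024, 37817745, 38329292,
               38847760, 39373240, 39905829, 40445621, 40992715],
}
years = [str(year) for year in range(2014, 2025)]
sample = pd.DataFrame.from_dict(rows, orient='index', columns=years)

# Total per year, then the year-over-year increment of that total.
totals = sample.sum(axis='index')
steps = totals.diff().dropna()

# A perfectly linear series would have constant increments.
print(int(steps.iloc[0]), int(steps.iloc[-1]))
print(steps.is_monotonic_increasing)
```

On these five rows the yearly increment rises from about 8.9 million to about 10.8 million, so their combined total curves slightly upward even though it looks straight on a chart. Running `df[years].sum(axis='index').diff()` on the real frame would show whether the full total behaves the same way.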

Let's look at the same data broken out by country. Since the total grows smoothly year over year, we might expect every country's synthetic population to grow smoothly as well.
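Plotting one line per country needs the data in long format, with one row per country and year. A small sketch of `DataFrame.melt` on a hypothetical two-row, two-year frame that mimics `df` (including its `Unnamed: 0` column; `wide`, `leaky` and `long_df` are illustrative names) shows why it helps to restrict `value_vars` to the year columns: otherwise the saved index column is melted too and would appear as an extra "year".

```python
import pandas as pd

# A tiny wide frame mimicking df, including the stray 'Unnamed: 0' index column.
wide = pd.DataFrame({
    'Unnamed: 0': [0, 1],
    'Country': ['Afghanistan', 'Albania'],
    '2014': [56805036, 127571863],
    '2015': [58070279, 129443136],
})
years = [column for column in wide.columns if column.isnumeric()]

# Without value_vars, every non-id column is melted, including 'Unnamed: 0'.
leaky = wide.melt(id_vars=['Country'])
# Restricting value_vars to the year columns keeps only real observations.
long_df = wide.melt(id_vars=['Country'], value_vars=years,
                    var_name='year', value_name='population')

print(sorted(leaky['variable'].unique()))
print(long_df)
```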

In [3]:
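# Reshape from wide (one column per year) to long (one row per country and year).
# Restricting value_vars to the year columns keeps 'Unnamed: 0' out of the plot.
long_df = df.melt(id_vars=['Country'], value_vars=years, var_name='year', value_name='population')
express.line(data_frame=long_df, x='year', y='population', color='Country', height=800, log_y=False)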

Surprisingly, some of the lines cross, which means that not all countries grow at the same rate.
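The crossing lines suggest looking at each country's yearly growth factor, that is, each year's population divided by the previous year's. A minimal sketch, again using only the five `df.head()` rows copied by hand (`rows`, `sample`, `factors` and `ahead` are illustrative names, not part of the notebook):

```python
import pandas as pd

# The five rows shown by df.head() above, as a stand-in for the full df.
rows = {
    'Afghanistan': [56805036, 58070279, 59363704, 60685938, 62037623, 63419415,
                    64831984, 66276015, 67752210, 69261285, 70803973],
    'Albania': [127571863, 129443136, 131341857, 133268430, 135223262, 137206769,
                139219371, 141261494, 143333572, 145436043, 147569355],
    'Algeria': [147582820, 151776865, 156090098, 160525905, 165087770, 169779276,
                174604106, 179566049, 184669002, 189916972, 195314080],
    'Andorra': [133274038, 134398607, 135532666, 136676293, 137829571, 138992580,
                140165403, 141348122, 142540821, 143743584, 144956495],
    'Angola': [35838921, 36323701, 36815040, 37313024, 37817745, 38329292,
               38847760, 39373240, 39905829, 40445621, 40992715],
}
years = [str(year) for year in range(2014, 2025)]
sample = pd.DataFrame.from_dict(rows, orient='index', columns=years)

# Growth factor for 2015..2024: each year's value over the previous year's value.
values = sample.to_numpy(dtype=float)
factors = pd.DataFrame(values[:, 1:] / values[:, :-1],
                       index=sample.index, columns=years[1:])

# Mean annual growth rate per country, and how much the factor varies over time.
annual_rate = factors.mean(axis='columns') - 1
spread = factors.max(axis='columns') - factors.min(axis='columns')
print(annual_rate.round(4))
print(spread.max())

# First year in which Albania's population exceeds Andorra's.
ahead = sample.loc['Albania'] > sample.loc['Andorra']
print(ahead.idxmax())
```

In these five rows each country's factor is essentially constant from year to year (fixed-percentage growth), but the rate differs by country, from roughly 0.8% a year for Andorra to roughly 2.8% for Algeria. That alone is enough for lines to cross: Albania starts below Andorra in 2014 and overtakes it in 2022.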

In [4]:
# Ratio of each country's 2024 population to its 2014 population.
t_df = df[['Country']].assign(change=df['2024'] / df['2014'])
# Countries are categories rather than a sequence, so bars read better than a line.
express.bar(data_frame=t_df, x='Country', y='change')

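Interestingly, the change per country (the ratio of its 2024 population to its 2014 population) appears to be randomly distributed, with no obvious pattern from one country to the next.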
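To read the spread of the ratios without relying on the chart, one option is to sort and summarize them. A minimal sketch on the five `df.head()` rows (a hypothetical stand-in for the full `t_df`, with values copied by hand): because 2014 and 2024 are ten yearly steps apart, each change also converts to an implied constant annual rate, `change ** (1 / 10) - 1`.

```python
import pandas as pd

# 2014 and 2024 values for the five rows shown by df.head() above.
t_df = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    '2014': [56805036, 127571863, 147582820, 133274038, 35838921],
    '2024': [70803973, 147569355, 195314080, 144956495, 40992715],
})
t_df['change'] = t_df['2024'] / t_df['2014']
# Ten yearly steps separate 2014 and 2024, so the implied constant annual
# rate is the 10th root of the change, minus one.
t_df['annual_rate'] = t_df['change'] ** (1 / 10) - 1

ranked = t_df.sort_values('change', ascending=False)
print(ranked[['Country', 'change', 'annual_rate']].round(4).to_string(index=False))
print(t_df['change'].describe())
```

Sorting by `change` rather than plotting in alphabetical order makes it easier to see which countries grow fastest and how wide the range is.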