<a href="https://www.kaggle.com/code/mikedelong/scatter-plot-eda?scriptVersionId=162326617" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
import pandas as pd

WORLD = '/kaggle/input/world-population-growth/World Population Growth.csv'

df = pd.read_csv(filepath_or_buffer=WORLD, thousands=',')
df['growth pct'] = df['Yearly Growth %'].str[:-1].astype(float)
df.head()

Unnamed: 0,Year,Population,Yearly Growth %,Number,Density (Pop/km2),growth pct
0,1951,2543130380,1.75%,43808223,17,1.75
1,1952,2590270899,1.85%,47140519,17,1.85
2,1953,2640278797,1.93%,50007898,18,1.93
3,1954,2691979339,1.96%,51700542,18,1.96
4,1955,2746072141,2.01%,54092802,18,2.01
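
The `.str[:-1]` slice above assumes every value in `Yearly Growth %` ends in exactly one `%` character. A slightly more defensive variant is sketched below. It uses the five values from the output above plus one made-up malformed entry (`'n/a'`, added purely as an assumption to show the failure mode); the names `raw` and `parsed` are illustrative only.

```python
import pandas as pd

# The first five values come from the df.head() output; 'n/a' is a
# hypothetical malformed entry added only for illustration.
raw = pd.Series(['1.75%', '1.85%', '1.93%', '1.96%', '2.01%', 'n/a'])

# Strip surrounding whitespace and a trailing percent sign, then convert.
# Anything that still is not a number becomes NaN instead of raising.
parsed = pd.to_numeric(raw.str.strip().str.rstrip('%'), errors='coerce')

print(parsed.tolist())
```

In the notebook the same expression could be applied to `df['Yearly Growth %']`, and `parsed.isna().sum()` would then report how many values failed to convert.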


In [2]:
from plotly.express import scatter
scatter(data_frame=df, x='Year', y='Population', trendline='ols', size='growth pct', height=900)

We might expect population growth to look exponential, but over the last seventy years it looks very nearly linear. Sizing the markers by growth percentage gives the impression that the annual cohorts (the number of people added each year) are getting smaller. Let's look again with a different variable for the size.
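
One rough way to put a number on "looks very linear" is to compare a straight-line fit of the raw population with a straight-line fit of its logarithm, which is what an exponential trend would produce. The sketch below is not part of the original analysis: `r_squared` and `compare_fits` are hypothetical helpers, and the demo runs only on the five rows shown by `df.head()`, which are far too few to settle the question.

```python
import numpy as np


def r_squared(x, y):
    """R^2 of an ordinary least-squares straight line fitted to y against x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean()  # centering keeps the fit well conditioned
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


def compare_fits(years, population):
    """Compare a linear trend with an exponential one (linear in log space)."""
    log_population = np.log(np.asarray(population, dtype=float))
    return {
        'linear': r_squared(years, population),
        'exponential': r_squared(years, log_population),
    }


# Demo on the five rows copied from the df.head() output.
years = [1951, 1952, 1953, 1954, 1955]
population = [2543130380, 2590270899, 2640278797, 2691979339, 2746072141]
fits = compare_fits(years, population)
print(fits)
```

In the notebook this could be called as `compare_fits(df['Year'], df['Population'])`. The two scores are computed on different scales (raw versus log), so they are a rough guide rather than a formal model comparison.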

In [3]:
scatter(data_frame=df, x='Year', y='Population', trendline='ols', size='Number', height=900)

This shows that while the growth percentage has fallen, the cohorts themselves have come down from their peak but are still substantially larger than their 1950s cousins.
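
The two size variables are closely related. On the five rows shown earlier, `Number` matches the year-over-year difference in `Population`, and `growth pct` matches that difference as a percentage of the previous year's population. Assuming the same holds for the rest of the file (not checked here), the percentage can fall while the absolute number stays large simply because the denominator keeps growing. A minimal sketch of the check, with `abs change` and `rel change` as illustrative column names:

```python
import pandas as pd

# First five rows, copied from the df.head() output above.
sample = pd.DataFrame({
    'Year': [1951, 1952, 1953, 1954, 1955],
    'Population': [2543130380, 2590270899, 2640278797, 2691979339, 2746072141],
    'Number': [43808223, 47140519, 50007898, 51700542, 54092802],
    'growth pct': [1.75, 1.85, 1.93, 1.96, 2.01],
})

# Absolute change: difference from the previous year's population.
sample['abs change'] = sample['Population'].diff()

# Relative change: that difference as a percentage of the previous year's
# population, rounded to match the two decimals in the source column.
sample['rel change'] = (sample['Population'].pct_change() * 100).round(2)

print(sample[['Year', 'Number', 'abs change', 'growth pct', 'rel change']])
```

The first row has no predecessor in this sample, so its computed changes are `NaN`.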

In [4]:
# https://stackoverflow.com/a/60491177

from plotly.graph_objects import Figure, Layout, Scatter

# Absolute growth goes on the left axis (y1), percentage growth on the right axis (y2).
data = [
    Scatter(x=df['Year'], y=df['Number'], name='Growth', mode='lines+markers', yaxis='y1'),
    Scatter(x=df['Year'], y=df['growth pct'], name='Growth (%)', mode='lines+markers', yaxis='y2'),
]

layout = Layout(
    title='Annual growth - absolute and relative',
    yaxis=dict(title='Net change'),
    yaxis2=dict(title='Y/Y Change (%)', overlaying='y', side='right'),
)
Figure(data=data, layout=layout).show()

This is probably the nut graf because it captures the annual population change in both absolute and relative terms, showing the dip around the Great Chinese Famine in 1959-1960 and the gradual decline in growth during the birth control era. For the most part this graph briefs well because it shows detail that is hard to grasp just by looking at the annual population numbers above.

In [5]:
scatter(data_frame=df, x='Number', y='growth pct', color='Year')

This attempt to capture both growth measures in a single plot is a bit abstract and therefore difficult to explain, but it conveys that the current data is part of a trend that began in 2013, and that growth is decelerating even while the population keeps growing.

In [6]:
scatter(data_frame=df, x='Number', y='growth pct', color='Density (Pop/km2)')

This is even more difficult to explain, but it suggests that the deceleration in population growth might be a function of density.
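
One caveat may be worth checking before reading too much into density. If `Density (Pop/km2)` is simply population divided by a roughly constant land area, then coloring by density largely restates population, and therefore year. The sketch below tests that assumption on the five rows from `df.head()` only; `implied area (km2)` is an illustrative column name, and the rounding of the density column to whole numbers limits how precise the check can be.

```python
import pandas as pd

# First five rows, copied from the df.head() output above.
sample = pd.DataFrame({
    'Year': [1951, 1952, 1953, 1954, 1955],
    'Population': [2543130380, 2590270899, 2640278797, 2691979339, 2746072141],
    'Density (Pop/km2)': [17, 17, 18, 18, 18],
})

# If density = population / area, then area = population / density.
sample['implied area (km2)'] = sample['Population'] / sample['Density (Pop/km2)']

# A narrow spread here would be consistent with a fixed land area.
print(sample[['Year', 'implied area (km2)']])
print(sample['implied area (km2)'].agg(['min', 'max']))
```

Running the same calculation on the full `df` would show whether the implied area stays in a narrow band across all years.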

In [7]:
scatter(data_frame=df, x='Density (Pop/km2)', y='growth pct', color='Year', trendline='lowess')

This is even more suggestive, especially with the trendline and year both indicating motion into the future.