Population Lecture I
====================

**Author:** Ethan Ligon

**Date:** January 29, 2020



## Introduction



Today we'll introduce some key "stylized facts" about human
population and its growth.  None of these are "causal" statements,
just observations about relationships.

-   **Fact I:** Population growth is fundamentally exponential, but the
    rate of growth has fallen over time.
-   **Fact II:** Population growth rates are generally higher in places
    where people are poorer.
-   **Fact III:** Variation in growth rates across countries is
    accounted for more by variation in fertility than by mortality.



## Getting Data



### The World Development Indicators & `wbdata`



The World Bank maintains a large set of "World Development Indicators" (WDI),
including information on population.

-   API for WDI is available at [https://datahelpdesk.worldbank.org/knowledgebase/articles/889392-about-the-indicators-api-documentation](https://datahelpdesk.worldbank.org/knowledgebase/articles/889392-about-the-indicators-api-documentation)

-   A `python` module that uses the API is `wbdata`, written by Oliver Sherouse.

-   Available at [http://github.com/OliverSherouse/wbdata](http://github.com/OliverSherouse/wbdata).

-   Documented at [https://wbdata.readthedocs.io](https://wbdata.readthedocs.io).



### Getting Population Data Using wbdata



#### Goals



We want to devise ways to visualize the following:

-   Global population growth from 1960 to the present;
-   Population growth rates versus GDP per capita;
-   Age-sex population pyramids.



#### Methods (using wbdata)



We walk through the process of getting data from the WDI into a
`pandas` DataFrame.

The `wbdata` module has several key functions we'll want to use:

-   **`get_countries()`:** Returns codes for the available countries and
    regions.
-   **`get_sources()`:** Lists the data sources that can be accessed
    using the module, each identified by a numeric key.
-   **`get_indicators()`:** Given a source, returns a list of
    available variables (indicators).
-   **`get_dataframe()`:** Given a dictionary of indicators (and,
    optionally, a source), returns a dataframe populated with the requested data.

Begin by importing the module:



In [7]:
%pip install plotly



In [1]:
## Install wbdata if it is not already available.
## Once it is installed, this line can be commented out.
%pip install wbdata

import wbdata



##### `wbdata.get_countries()`



What countries and regions are available?  We can list all of the country
codes, or search for particular strings:



In [2]:
import wbdata

# Return list of all country/region codes:
wbdata.get_countries()

# Return list matching a query term:
#wbdata.get_countries(query="World")
#wbdata.get_countries(query="United")

## Try your own search!
# wbdata.get_countries(query="")

id    name
----  --------------------------------------------------------------------------------
ABW   Aruba
AFE   Africa Eastern and Southern
AFG   Afghanistan
AFR   Africa
AFW   Africa Western and Central
AGO   Angola
ALB   Albania
AND   Andorra
ARB   Arab World
ARE   United Arab Emirates
ARG   Argentina
ARM   Armenia
ASM   American Samoa
ATG   Antigua and Barbuda
AUS   Australia
AUT   Austria
AZE   Azerbaijan
BDI   Burundi
BEA   East Asia & Pacific (IBRD-only countries)
BEC   Europe & Central Asia (IBRD-only countries)
BEL   Belgium
BEN   Benin
BFA   Burkina Faso
BGD   Bangladesh
BGR   Bulgaria
BHI   IBRD countries classified as high income
BHR   Bahrain
BHS   Bahamas, The
BIH   Bosnia and Herzegovina
BLA   Latin America & the Caribbean (IBRD-only countries)
BLR   Belarus
BLZ   Belize
BMN   Middle East, North Africa, Afghanistan & Pakistan (IBRD only)
BMU   Bermuda
BOL   Bolivia
BRA   Brazil
BRB   Barbados
BRN   Brunei Darussalam
BSS   Sub-Saharan Africa (IBRD-only countries)
BTN  

##### `wbdata.get_sources()`



To see which datasets we can access via the API, use `get_sources()`.



In [3]:
wbdata.get_sources()

  id  name
----  --------------------------------------------------------------------
   1  Doing Business
   2  World Development Indicators
   3  Worldwide Governance Indicators
   5  Subnational Malnutrition Database
   6  International Debt Statistics
  11  Africa Development Indicators
  12  Education Statistics
  13  Enterprise Surveys
  14  Gender Statistics
  15  Global Economic Monitor
  16  Health Nutrition and Population Statistics
  18  IDA Results Measurement System
  19  Millennium Development Goals
  20  Quarterly Public Sector Debt
  22  Quarterly External Debt Statistics SDDS
  23  Quarterly External Debt Statistics GDDS
  25  Jobs
  27  Global Economic Prospects
  28  Global Findex database
  29  The Atlas of Social Protection: Indicators of Resilience and Equity
  30  Exporter Dynamics Database – Indicators at Country-Year Level
  31  Country Policy and Institutional Assessment
  32  Global Financial Development
  33  G20 Financial Inclusion Indicators
  34  Global P

##### `wbdata.get_indicators()`



"Population estimates and projections" looks promising.
What indicators/variables are available?



In [4]:
SOURCE = 40 # "Population estimates and projections"

indicators = wbdata.get_indicators(source=SOURCE)
indicators

id                 name
-----------------  -------------------------------------------------------------------
SH.DTH.0509        Number of deaths ages 5-9 years
SH.DTH.0514        Number of deaths ages 5-14 years
SH.DTH.1014        Number of deaths ages 10-14 years
SH.DTH.1019        Number of deaths ages 10-19 years
SH.DTH.1519        Number of deaths ages 15-19 years
SH.DTH.2024        Number of deaths ages 20-24 years
SH.DTH.IMRT        Number of infant deaths
SH.DTH.IMRT.FE     Number of infant deaths, female
SH.DTH.IMRT.MA     Number of infant deaths, male
SH.DTH.MORT        Number of under-five deaths
SH.DTH.MORT.FE     Number of under-five deaths, female
SH.DTH.MORT.MA     Number of under-five deaths, male
SH.DTH.NMRT        Number of neonatal deaths
SH.DYN.0509        Probability of dying among children ages 5-9 years (per 1,000)
SH.DYN.0514        Probability of dying at age 5-14 years (per 1,000 children age 5)
SH.DYN.1014        Probability of dying among adolescents ages 1

#### Getting Population Over Time



Let's get data on the global population and see how it has changed over
time. The variable `SP.POP.TOTL` seems like a reasonable place to
start.

We want to get a `pandas.DataFrame` of total population:



In [5]:
# Give the variable a readable label for clarity
variable_labels = {"SP.POP.TOTL":"World Population"}

world = wbdata.get_dataframe(variable_labels, country="WLD",parse_dates=True)

# Print a few years' data
world.head()

date        World Population
----------  ----------------
2024-01-01      8141809000.0
2023-01-01      8064058000.0
2022-01-01      7989545000.0
2021-01-01      7920515000.0
2020-01-01      7854748000.0


## Plotting Data



### Plotting data from pandas.DataFrame



Let's make a time-series plot of global population.  We'll use `plotly` as a backend for plotting data in a `pandas.DataFrame`.
Here are a couple of lines to set up the plotting environment:



In [8]:
import pandas as pd
pd.options.plotting.backend = 'plotly'

### Plotting Global Population Over Time



With that done, once we have a DataFrame, making a plot takes just one
line of code:



In [10]:
# Useful arguments to pass include title and labels
world.plot(title="Fact I: Growth Rates Falling over Time",
            labels=dict(date='Year',value='Population'))

### Plotting Different Countries' Population Growth Rates



Globally, population growth has been basically linear over the last 60
years.

-   Increases by 1 billion about every 12 years.
-   Implies *rate* of growth falling over time.
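
The second bullet can be checked with a little arithmetic. The sketch below uses hypothetical round numbers (3 billion people in 1960, plus 1 billion every 12 years) rather than WDI data, and computes the average annual growth rate over each 12-year step from differences in logs.

```python
import math

# Hypothetical round numbers, chosen only to mimic the pattern described above:
# start at 3 billion in 1960 and add 1 billion every 12 years.
START_YEAR = 1960
START_POP = 3.0   # billions (assumed, not taken from the WDI)
STEP_YEARS = 12
STEP_POP = 1.0    # billions added per step (assumed)

def linear_path(n_steps):
    """Return (year, population) pairs for a population growing linearly."""
    return [(START_YEAR + i * STEP_YEARS, START_POP + i * STEP_POP)
            for i in range(n_steps + 1)]

def annualized_growth_rates(path):
    """Average annual growth rate over each step, from differences in logs."""
    rates = []
    for (y0, n0), (y1, n1) in zip(path, path[1:]):
        rates.append((math.log(n1) - math.log(n0)) / (y1 - y0))
    return rates

path = linear_path(5)
rates = annualized_growth_rates(path)
for (year, pop), rate in zip(path[1:], rates):
    print(f"{year}: {pop:.0f} billion, average annual growth {100 * rate:.2f}%")
```

Because each step adds the same absolute amount to an ever larger base, the computed rate is smaller in every successive period.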

How do population growth rates vary by country?



In [1]:
import numpy as np

variable_labels = {"SP.POP.TOTL":"Population"}

# Three letter codes come from wbdata.get_countries()
countries = {"WLD":"World",
             "LIC":"Low income",
             "LMC":"Lower middle income",
             "UMC":"Upper middle income",
             "HIC":"High income",
            }

df = wbdata.get_dataframe(variable_labels, country = countries,parse_dates=True).squeeze()

df = df.unstack('country')
df = df.sort_index()

# Differences (over time) in logs give us growth rates
np.log(df).diff().plot(title="Fact II: Poorer places have higher growth rates",
                       labels=dict(value="Growth Rate",date='Year'))
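
To see why differences in logs can be read as growth rates, here is a small sketch using only the standard library. It reuses the five world population figures printed earlier (2020 to 2024), sorts them by year as the cell above does, and compares the log difference with the exact percentage change. The variable names are illustrative only.

```python
import math

# World population figures as printed by world.head() earlier in these notes.
world_population = {
    2024: 8141809000.0,
    2023: 8064058000.0,
    2022: 7989545000.0,
    2021: 7920515000.0,
    2020: 7854748000.0,
}

# Sort by year (oldest first), mirroring df.sort_index() above.
years = sorted(world_population)

log_growth = {}   # difference in logs between consecutive years
pct_growth = {}   # exact proportional change, N_t / N_{t-1} - 1
for prev, curr in zip(years, years[1:]):
    n_prev, n_curr = world_population[prev], world_population[curr]
    log_growth[curr] = math.log(n_curr) - math.log(n_prev)
    pct_growth[curr] = n_curr / n_prev - 1

for year in years[1:]:
    print(f"{year}: log difference {log_growth[year]:.5f}, "
          f"exact change {pct_growth[year]:.5f}")
```

For growth rates this small (under 1% per year) the two measures nearly coincide, which is why the log difference is a convenient stand-in for the growth rate.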

### Population Growth vs Per capita GDP



Our second stylized fact was that there's an inverse association between
income and population growth.  We'll investigate this fact here,
constructing a scatter plot relating population growth rates to (log) GDP per capita.



In [1]:
import numpy as np
# wbdata.get_indicators(query="GDP per capita")

indicators = {"NY.GDP.PCAP.CD":"GDP per capita",
              "SP.DYN.TFRT.IN":"Total Fertility Rate",
              "SP.POP.GROW":"Population Growth Rate",
              "SP.DYN.AMRT.MA":"Male Mortality",
              "SP.DYN.AMRT.FE":"Female Mortality",
              "SP.POP.1564.FE.ZS":"% Adult Female",
              "SP.POP.TOTL.FE.ZS":"% Female"}

data = wbdata.get_dataframe(indicators,parse_dates=True)

# Just grab data from one year
df = data.xs("2022-01-01",level='date').copy() # Latest year is missing some data; copy so we can add columns

df['Log GDP per capita'] = np.log(df['GDP per capita'])

df.plot.scatter(title="Fact II: Population growth is lower in higher-income countries",
         x="Log GDP per capita",y="Population Growth Rate",
         hover_name=df.reset_index('country')['country'].values.tolist())

### Decomposing Population Growth



Consider the human population at a particular time $t$, and let
$N_t$ denote its size.  Also, let
$\phi_t$ be the *share* of the population at time $t$ that are women
of child-bearing age (e.g., 15–49).

Now, as a matter of accounting, population in the next period $t+1$ will be given by
$$
    N_{t+1} = (1-\mbox{mortality rate})N_t + \mbox{TFR}\cdot\phi_t N_t.
 $$

Thus, we can think of population growth as depending on mortality, fertility, and the share of the population that can bear children.
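
A minimal numerical sketch of this accounting, taking the identity at face value: dividing both sides by $N_t$ gives a growth rate of $\mbox{TFR}\cdot\phi_t$ minus the mortality rate. The function names and input values below are hypothetical, and the fertility term is treated as a per-period rate so that the units match the equation (the WDI total fertility rate is a lifetime measure).

```python
def next_population(n_t, mortality_rate, tfr, phi_t):
    """One step of the accounting identity: survivors plus births."""
    survivors = (1 - mortality_rate) * n_t
    births = tfr * phi_t * n_t
    return survivors + births

def growth_rate(mortality_rate, tfr, phi_t):
    """Dividing the identity by N_t: N_{t+1}/N_t - 1 = TFR*phi_t - mortality."""
    return tfr * phi_t - mortality_rate

# Hypothetical inputs, for illustration only (not WDI values).
n_t = 1000.0
mortality_rate = 0.01   # share of the population dying each period
tfr = 0.08              # births per woman of child-bearing age, per period
phi_t = 0.25            # share of population who are women of child-bearing age

n_next = next_population(n_t, mortality_rate, tfr, phi_t)
g = growth_rate(mortality_rate, tfr, phi_t)
print(f"N_t+1 = {n_next:.1f}, growth rate = {g:.3f}")
```

With these inputs, births add 2% and deaths remove 1%, so the population grows by 1%. A fall in the growth rate must therefore come from a lower fertility term, a lower $\phi_t$, or higher mortality.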

We've seen that population growth is falling over time.  Is the fall due to changes in mortality, fertility, or $\phi_t$?



### Mortality Over Time



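Can changes in mortality account for declining population growth?  Look at
adult mortality rates (deaths per 1,000 adults).



In [1]:
# Copy so that columns can be added to `world` later
world = data.xs("World",level='country').copy()

world[["Male Mortality","Female Mortality"]].plot(title="Adult Mortality Rate (per 1,000 adults)")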

### Adult female share of population over time



Decreases in population growth could also be due to a decreasing share of adult women, perhaps due to gender selection at birth.  How does this share ($\phi_t$) vary over time?



In [1]:
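# "% Adult Female" is the % of females who are ages 15-64
# (a broader range than the 15-49 used to define phi_t above).
# To get a share of total population, multiply by the % of the population that is female.
world["Adult Female Share (%)"] = world["% Adult Female"]*world["% Female"]/100

world["Adult Female Share (%)"].plot(title="Adult Females as % of World Population")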
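
The product taken in the cell above can be checked by hand. The sketch below uses hypothetical percentages (not WDI values) to show how a share of females and a share of total population combine.

```python
def adult_female_share(pct_females_adult, pct_population_female):
    """Adult females as a % of total population.

    Both inputs are percentages, so their product is divided by 100
    to express the result as a percentage again.
    """
    return pct_females_adult * pct_population_female / 100

# Hypothetical inputs, for illustration only.
pct_females_adult = 65.0       # % of females who are adult
pct_population_female = 49.7   # % of total population that is female

share = adult_female_share(pct_females_adult, pct_population_female)
print(f"Adult females are about {share:.1f}% of the total population")
```

Dividing by 100 once keeps the result on the same percentage scale as the two inputs.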

### Fertility over time



Finally, decreases in population growth could be due to reduced fertility.  How does global fertility vary over time?



In [1]:
world["Total Fertility Rate"].plot()

### Relation between income and fertility



In [1]:
# Axis labels default to the column names passed as x and y
df.plot.scatter(x="Log GDP per capita",y="Total Fertility Rate",
         hover_name=df.reset_index('country')['country'].values.tolist(),
         title="Fact II: Women in Poorer Countries Have Higher Fertility")