## Introduction



Today we'll introduce some key "stylized facts" about human
population and its growth.  None of these are "causal" statements,
just observations about relationships.

-   **Fact I:** Population growth is fundamentally exponential, but the
    rate of growth has fallen over time.
-   **Fact II:** Population growth rates are generally higher in places
    where people are poorer.
-   **Fact III:** Variation in growth rates across countries is due more
    to variation in fertility than to variation in mortality.
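One way to make Fact I precise (a sketch using the standard definition, not notation taken from these notes): write population in year $t$ as the previous year's population scaled by a growth factor $e^{g_t}$. If $g_t$ were constant, population would grow exponentially; Fact I says $g_t$ has stayed positive but declined. Taking logs also shows why the code later in this notebook computes growth rates as differences of logs.

```latex
P_t = P_{t-1}\, e^{g_t}
\quad\Longrightarrow\quad
g_t = \ln P_t - \ln P_{t-1},
\qquad
g_t = g \ \text{for all } t \;\Longrightarrow\; P_t = P_0\, e^{g t}.
```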



## Getting Data



### The World Development Indicators & `wbdata`



The World Bank maintains a large set of "World Development Indicators" (WDI),
including information on population.  

-   API for WDI is available at [https://datahelpdesk.worldbank.org/knowledgebase/articles/889392-about-the-indicators-api-documentation](https://datahelpdesk.worldbank.org/knowledgebase/articles/889392-about-the-indicators-api-documentation)

-   A `python` module that uses the API is `wbdata`, written by Oliver Sherouse.

-   Available at [http://github.com/OliverSherouse/wbdata](http://github.com/OliverSherouse/wbdata).

-   Documented at [https://wbdata.readthedocs.io](https://wbdata.readthedocs.io).
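`wbdata` is a wrapper around the REST API linked above. As a rough sketch of what a query looks like underneath, the code below only builds a URL and fetches nothing. It assumes the v2 endpoint layout `https://api.worldbank.org/v2/country/{codes}/indicator/{indicator}`, with several country codes separated by semicolons; the helper `wdi_url` is hypothetical and not part of `wbdata`.

```python
from urllib.parse import urlencode


# Hypothetical helper (not part of wbdata): builds a query URL, fetches nothing
def wdi_url(countries, indicator, start=None, end=None, per_page=1000):
    """Build a World Bank Indicators API (v2) URL for one indicator."""
    if isinstance(countries, str):
        countries = [countries]
    # Several country codes are joined with semicolons in the path
    country_part = ";".join(countries)
    params = {"format": "json", "per_page": per_page}
    if start is not None and end is not None:
        # A range of years is written "start:end"
        params["date"] = f"{start}:{end}"
    query = urlencode(params, safe=":")
    return f"https://api.worldbank.org/v2/country/{country_part}/indicator/{indicator}?{query}"


print(wdi_url(["WLD", "LIC"], "SP.POP.TOTL", 1960, 2017))
```

Pasting the printed URL into a browser is a quick way to see the raw JSON that `wbdata` turns into a DataFrame.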



### Getting Population Data Using wbdata



#### Goals



We want to devise ways to visualize the following:

-   Global population growth from 1960 to the present;
-   Population growth rates versus GDP per capita;
-   Age-sex population pyramids.



#### Methods (using wbdata)



We walk through the process of getting data from the WDI into a
`pandas` DataFrame. 

The `wbdata` module has several key functions we'll want to use:

-   **search\_countries():** Returns codes for countries or
    regions whose names match a search string.
-   **get\_source():** Lists the data sources that can be accessed
    using the module, each identified by a numeric key.
-   **get\_indicator():** Given a source, returns a list of
    available variables (indicators).
-   **get\_dataframe():** Given a dictionary mapping indicator codes
    to column labels, returns a `pandas` DataFrame populated with the
    requested data for whatever countries (or regions) are specified.
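When several countries are requested, the DataFrame that `get_dataframe()` returns (at least in the `wbdata` version used in these notes) is indexed by a (`country`, `date`) MultiIndex with dates stored as strings; the cells below rely on this when they call `unstack('country')` and `droplevel('date')`. The toy frame here mimics only that shape, with made-up numbers, so the reshaping can be tried offline.

```python
import pandas as pd

# Made-up values that only mimic the *shape* of wbdata's output
index = pd.MultiIndex.from_product(
    [["High income", "Low income"], ["2017", "2016"]],
    names=["country", "date"],
)
toy = pd.DataFrame({"Population": [10.0, 9.9, 5.0, 4.8]}, index=index)

# Long -> wide: one column per country, one row per date
wide = toy["Population"].unstack("country")

# Dates arrive as strings; convert to integers and sort chronologically
wide.index = wide.index.astype(int)
wide = wide.sort_index()
print(wide)
```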

Begin by importing the module:



```python
# If the import fails with "ModuleNotFoundError",
# uncomment the line below and try again
# !pip install wbdata

import wbdata
```

##### `wbdata.search_countries()`



Which countries and regions are available? We can list all the country
and region codes, or search for a particular string:



```python
import wbdata

# Return list of all country/region codes:
# wbdata.get_country()

# Return list matching a query term:
wbdata.search_countries("World")

# Try your own search!
# wbdata.search_countries("")
```

##### `wbdata.get_source()`



To list the datasets (sources) we can access via the API, use `get_source()`:



```python
wbdata.get_source()
```

##### `wbdata.get_indicator()`



"Population estimates and projections" looks promising.
Let's see which indicators (variables) it provides:



```python
SOURCE = 40  # "Population estimates and projections"

indicators = wbdata.get_indicator(source=SOURCE)
indicators  # display the list in the notebook
```

#### Getting Population Over Time



Let's get data on how the global population has changed over
time. The variable `SP.POP.TOTL` (total population) seems like a
reasonable place to start.

We want to get a `pandas.DataFrame` of total population:



```python
# Map the indicator code to a readable column label
variable_labels = {"SP.POP.TOTL": "World Population"}

world = wbdata.get_dataframe(variable_labels, country="WLD")

# Date index is of type string; change to integers
world.index = world.index.astype(int)

print(world.head())
```

```
      World Population
date
2018               NaN
2017      7.530360e+09
2016      7.444157e+09
2015      7.357559e+09
2014      7.271323e+09
```
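The output shows two things worth handling before further work: the years come back newest first, and the latest year (2018 here) has no estimate yet. A minimal pandas sketch for tidying such a frame, using the numbers copied from the output above in a stand-in frame called `world_demo` (a name chosen for illustration):

```python
import numpy as np
import pandas as pd

# Stand-in for `world`, using values copied from the printed output above
world_demo = pd.DataFrame(
    {"World Population": [np.nan, 7.530360e9, 7.444157e9, 7.357559e9]},
    index=pd.Index([2018, 2017, 2016, 2015], name="date"),
)

# Drop years with no estimate yet, then put the years in chronological order
clean = world_demo.dropna().sort_index()
print(clean)
```

Sorting matters for anything that compares a year with the one before it, such as `diff()`.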

## Plotting Data



#### Plotting data from pandas.DataFrame



Let's make a time-series plot of global population. We'll use the
`cufflinks` module, which connects `plot.ly` to `pandas`. Here are two lines to set up the plotting environment:



```python
# !pip install cufflinks  # IF NECESSARY
import cufflinks as cf
cf.go_offline()
```

#### Plotting Global Population Over time



With that done, once we have a DataFrame, making a plot takes just one
line of code:



```python
# Useful arguments to pass include title, xTitle, yTitle
world.iplot(title="Fact I: Growth Rates Falling over Time",
            xTitle='Year', yTitle='Population')
```

#### Plotting Population Growth Rates by Income Group



Globally, population growth has been roughly linear over the last 60
years.

-   Population increases by about 1 billion every 12 years.
-   A constant absolute increase on an ever-larger base implies that
    the *rate* of growth has been falling over time.
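A quick check of the arithmetic behind the second bullet: if population rises by a fixed 1 billion every 12 years, the annual increase is constant, so the growth rate (increase divided by current population) must shrink as the base grows. The starting value of 3 billion is a round number assumed for illustration, not a WDI figure.

```python
# Constant absolute increase: about 1 billion every 12 years
annual_increase = 1e9 / 12
start = 3e9  # illustrative starting population (an assumption, not WDI data)

years = [0, 12, 24, 36, 48]
populations = [start + annual_increase * t for t in years]
# Growth rate = this year's increase relative to the current population
rates = [annual_increase / p for p in populations]

for t, p, r in zip(years, populations, rates):
    print(f"after {t:2d} years: {p / 1e9:.0f}B people, growth rate {r:.2%}")
```

The rate falls from roughly 2.8% to roughly 1.2% even though the yearly increase never changes.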

How do population growth rates vary by country?



```python
import numpy as np

variable_labels = {"SP.POP.TOTL": "Population"}

# Three-letter codes come from wbdata.get_country(); the labels are for
# reference only, since the columns take the names returned by the World Bank
countries = {"WLD": "World",
             "LIC": "Low income",
             "LMC": "Lower middle income",
             "UMC": "Upper middle income",
             "HIC": "High income",
            }

df = wbdata.get_dataframe(variable_labels, country=list(countries)).squeeze()

df = df.unstack('country')
# Date index is of type string; change to integers and sort chronologically
df.index = df.index.astype(int)
df = df.sort_index()

# Differences (over time) in logs give us growth rates
np.log(df).diff().iplot(title="Fact II: Poorer places have higher growth rates",
                        yTitle="Growth Rate", xTitle='Year')
```
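Why differences in logs work: for a series growing at rate $g$ per year, $\ln P_t - \ln P_{t-1} = \ln(1+g)$, which is close to $g$ when $g$ is small. The sketch below checks this on a synthetic series growing exactly 2% per year (not WDI data). It also shows why sorting matters: `diff()` subtracts the previous *row*, so on a newest-first index, like the `world` output above, every growth rate would come out with the wrong sign.

```python
import numpy as np
import pandas as pd

# Synthetic population growing exactly 2% per year (not WDI data),
# deliberately stored newest-first
years = list(range(2000, 2011))
pop = pd.Series([100 * 1.02 ** (y - 2000) for y in years], index=years).iloc[::-1]

log_growth = np.log(pop.sort_index()).diff()  # ln P_t - ln P_{t-1}
pct_growth = pop.sort_index().pct_change()    # (P_t - P_{t-1}) / P_{t-1}

print(pd.DataFrame({"log diff": log_growth, "pct change": pct_growth}).head())
```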

#### Population growth vs Per capita GDP



```python
# Uncomment to search for other GDP-per-capita indicators:
# wbdata.search_indicators("GDP per capita")

indicators = {"NY.GDP.PCAP.CD": "GDP per capita",
              "SP.DYN.TFRT.IN": "Total Fertility Rate",
              "SP.POP.GROW": "Population Growth Rate"}

df = wbdata.get_dataframe(indicators)

df = df.query("date=='2016'")  # 2017 not available yet

# All dates now the same; not a useful index
df.index = df.index.droplevel('date')

df.iplot(kind='scatter', mode='markers+text',
         x="GDP per capita", y="Total Fertility Rate",
         title="Fact II: Women in Poorer Countries Have Higher Fertility")
```
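The scatter plot shows the relationship visually; one way to summarize it in a single number is the correlation between log GDP per capita and the total fertility rate (logs because GDP per capita spans several orders of magnitude). The helper `income_fertility_corr` is hypothetical, and the toy frame exists only to exercise it; on the real data you would pass the `df` built above, whose column names the defaults assume.

```python
import numpy as np
import pandas as pd


# Hypothetical helper (not part of wbdata or cufflinks)
def income_fertility_corr(data, income="GDP per capita", fertility="Total Fertility Rate"):
    """Pearson correlation of log income with fertility, skipping rows with missing values."""
    subset = data[[income, fertility]].dropna()
    return np.log(subset[income]).corr(subset[fertility])


# Toy numbers for illustration only (not WDI data)
toy = pd.DataFrame({
    "GDP per capita": [500, 2000, 8000, 40000, np.nan],
    "Total Fertility Rate": [5.5, 4.0, 2.5, 1.6, 3.0],
})
print(income_fertility_corr(toy))
```

A strongly negative value is what Fact II would lead us to expect; the toy data were built to show that, so the printed number says nothing about the real WDI figures.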


### Getting Population Data by Age and Sex



```python
age_ranges = []

for i in range(0, 80, 5):
    age_ranges.append(f"{i:02d}" + f"{i+4:02d}")

age_ranges.append("80UP")

print(age_ranges)
```
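Each code packs an age range into four characters ("0509" is ages 5 to 9, "80UP" is 80 and over). The pyramid plots further down place each bar at `int(s[:2])+1`. If you would rather put bars at the middle of each range, one way to parse the codes is sketched here. The helper names are hypothetical, and treating "80UP" as five years wide is an arbitrary assumption.

```python
# Hypothetical helpers (not part of wbdata) for working with the age codes
def age_bounds(code):
    """Split a code like "0509" into (5, 9); the open-ended "80UP" gives (80, None)."""
    lower = int(code[:2])
    upper = None if code[2:] == "UP" else int(code[2:])
    return lower, upper


def age_midpoint(code, open_width=5):
    """Midpoint of the range; the open-ended bucket is assumed `open_width` years wide."""
    lower, upper = age_bounds(code)
    if upper is None:
        upper = lower + open_width - 1
    # Whole ages lower..upper cover the continuous interval [lower, upper + 1)
    return (lower + upper + 1) / 2


codes = [f"{i:02d}{i + 4:02d}" for i in range(0, 80, 5)] + ["80UP"]
print([age_midpoint(c) for c in codes])
```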

```python
male_variables = {"SP.POP." + age_range + ".MA": "Males " + age_range for age_range in age_ranges}
female_variables = {"SP.POP." + age_range + ".FE": "Females " + age_range for age_range in age_ranges}

# Merge into a new dict (assigning and then calling update() would also modify male_variables)
variables = {**male_variables, **female_variables}

print(variables)
```

```python
df = wbdata.get_dataframe(variables, country="WLD")
print(df.tail())
```
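Building a pyramid needs, for a single year, the male and female counts in matching age order. A hypothetical helper `pyramid_counts` that pulls these out of a frame shaped like `df` (string dates as the index, columns labelled "Males 0004", "Females 0004", and so on) might look like this. The anchored patterns `^Males ` and `^Females ` make explicit that "Females" columns never match the male selection, rather than relying on the case of the letter M.

```python
import pandas as pd


# Hypothetical helper (not part of wbdata)
def pyramid_counts(frame, year, age_codes):
    """Return a DataFrame of male and female counts, one row per age code, for one year."""
    row = frame.loc[str(year)]
    males = row.filter(regex=r"^Males ")
    females = row.filter(regex=r"^Females ")
    # Strip the prefixes so both series are keyed by age code, then align them
    males.index = males.index.str.replace("Males ", "", regex=False)
    females.index = females.index.str.replace("Females ", "", regex=False)
    return pd.DataFrame({"males": males, "females": females}).reindex(age_codes)


# Toy frame with made-up counts, laid out like the WDI age-sex data
codes = ["0004", "0509", "80UP"]
columns = [f"Males {c}" for c in codes] + [f"Females {c}" for c in codes]
toy = pd.DataFrame([[30, 28, 5, 29, 27, 8]],
                   index=pd.Index(["2017"], name="date"), columns=columns)
print(pyramid_counts(toy, 2017, codes))
```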

```python
import plotly.offline as py
import plotly.graph_objs as go

py.init_notebook_mode(connected=True)

layout = go.Layout(barmode='overlay',
                   yaxis=go.layout.YAxis(range=[0, 90], title='Age'),
                   xaxis=go.layout.XAxis(title='Number'))

year = 2017

# Men to the right of zero, women to the left (counts negated)
bins = [go.Bar(x=df.loc[str(year), :].filter(regex="^Males").values,
               y=[int(s[:2]) + 1 for s in age_ranges],
               orientation='h',
               name='Men',
               marker=dict(color='purple'),
               hoverinfo='skip'
               ),

        go.Bar(x=-df.loc[str(year), :].filter(regex="^Females").values,
               y=[int(s[:2]) + 1 for s in age_ranges],
               orientation='h',
               name='Women',
               marker=dict(color='pink'),
               hoverinfo='skip',
               )
        ]

# iplot renders inline in the notebook (plot would write and open an HTML file)
py.iplot(dict(data=bins, layout=layout))
```

```python
years = range(2017, 1960, -10)

# One pair of bars per decade; the year goes in each trace name so the
# overlaid pyramids can be told apart (and toggled) in the legend
bins = [go.Bar(x=df.loc[str(year), :].filter(regex="^Males").values,
               y=[int(s[:2]) + 1 for s in age_ranges],
               orientation='h',
               name=f'Men {year}',
               marker=dict(color='purple'),
               hoverinfo='skip'
               )
        for year in years]

bins += [go.Bar(x=-df.loc[str(year), :].filter(regex="^Females").values,
                y=[int(s[:2]) + 1 for s in age_ranges],
                orientation='h',
                name=f'Women {year}',
                marker=dict(color='pink'),
                hoverinfo='skip',
                )
         for year in years]

py.iplot(dict(data=bins, layout=layout))
```