World population.pandoc.md

---
jupyter:
  jupytext:
    cell_markers: region,endregion
    formats: ipynb,.pct.py:percent,.lgt.py:light,.spx.py:sphinx,md,Rmd,.pandoc.md:pandoc
    text_representation:
      extension: .md
      format_name: pandoc
      format_version: 2.7.2
      jupytext_version: 1.1.0
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
  nbformat: 4
  nbformat_minor: 2
---

::: {.cell .markdown}

A quick look at world population

Collecting population data

Below, we retrieve population data from the World Bank using the wbdata Python package.

:::

::: {.cell .code}

import pandas as pd
import wbdata as wb

pd.options.display.max_rows = 6
pd.options.display.max_columns = 20

:::

::: {.cell .markdown}

The corresponding indicator can be found with the search_indicators function or, more directly, on the World Bank site.

:::

::: {.cell .code}

wb.search_indicators('Population, total')  # SP.POP.TOTL
# wb.search_indicators('area')
# => https://data.worldbank.org/indicator is easier to use

:::

::: {.cell .markdown}

Now we download the population data, together with a few area indicators.

:::

::: {.cell .code}

indicators = {'SP.POP.TOTL': 'Population, total',
              'AG.SRF.TOTL.K2': 'Surface area (sq. km)',
              'AG.LND.TOTL.K2': 'Land area (sq. km)',
              'AG.LND.ARBL.ZS': 'Arable land (% of land area)'}
data = wb.get_dataframe(indicators, convert_date=True).sort_index()
data

:::

::: {.cell .markdown}

'World' appears as one of the entries in the country index.

:::

::: {.cell .code}

data.loc['World']

:::

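::: {.cell .markdown}

Can we group the data by continent? The index mixes countries with aggregate zones, so we list the 60 most populous entries for 2017 to spot them.

:::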

::: {.cell .code}

data.loc[(slice(None), '2017-01-01'), :]['Population, total'].dropna(
).sort_values().tail(60).index.get_level_values('country')

:::
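
::: {.cell .markdown}

The selection above relies on the frame being indexed by (country, date). The following is a minimal sketch on made-up numbers (the countries `A`, `B`, `C` and their values are hypothetical, not World Bank data) of how `slice(None)` picks every country for a single date. It uses an explicit `pd.Timestamp` rather than a date string, which is an assumption made here to keep the lookup unambiguous.

```python
import pandas as pd

# Hypothetical toy data, not real World Bank figures
index = pd.MultiIndex.from_product(
    [['A', 'B', 'C'], pd.to_datetime(['2016-01-01', '2017-01-01'])],
    names=['country', 'date'])
toy = pd.DataFrame(
    {'Population, total': [10., 11., 30., 33., 20., float('nan')]},
    index=index).sort_index()

# slice(None) keeps every country; the second key fixes the date
year = toy.loc[(slice(None), pd.Timestamp('2017-01-01')), :]['Population, total']

# Drop missing values, sort by population, and read back the country names
ranked = year.dropna().sort_values().index.get_level_values('country')
ranked_countries = list(ranked)
```

Here `C` has no value for 2017, so it is dropped before sorting and only `A` and `B` remain, in order of increasing population.

:::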

::: {.cell .markdown}

Extract the zones manually. They are typed in order of increasing population, then reversed so that the most populous zone comes first.

:::

::: {.cell .code}

zones = ['North America', 'Middle East & North Africa',
         'Latin America & Caribbean', 'Europe & Central Asia',
         'Sub-Saharan Africa', 'South Asia',
         'East Asia & Pacific'][::-1]

:::

::: {.cell .markdown}

Then extract the population of each zone, with one row per date and one column per zone, and check that the zones add up to the 'World' total.

:::

::: {.cell .code}

population = data.loc[zones]['Population, total'].swaplevel().unstack()
population = population[zones]
assert all(data.loc['World']['Population, total'] == population.sum(axis=1))

:::
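
::: {.cell .markdown}

The reshape above can be hard to picture, so here is a small sketch of the same `swaplevel().unstack()` steps on a toy frame. The zone names `Zone X` and `Zone Y`, the integer years, and all values are hypothetical and chosen only so the totals are easy to verify by hand.

```python
import pandas as pd

# Hypothetical zones and values, chosen only to illustrate the reshape
zones = ['Zone X', 'Zone Y']
index = pd.MultiIndex.from_product(
    [zones + ['World'], [2016, 2017]], names=['country', 'date'])
toy = pd.DataFrame({'Population, total': [1., 2., 3., 4., 4., 6.]},
                   index=index).sort_index()

# Select the zones, put dates first, then pivot the zone level into columns
population = toy.loc[zones]['Population, total'].swaplevel().unstack()
population = population[zones]  # enforce the column order given in `zones`

# The zone columns should add up to the 'World' value for every date
world = toy.loc['World']['Population, total']
totals_match = bool((world == population.sum(axis=1)).all())
```

The result has one row per date and one column per zone, which is the layout the plots below expect. With real data, an exact equality check can fail on missing values or rounding, so a tolerance-based comparison may be preferable.

:::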

::: {.cell .markdown}

Stacked area plot with matplotlib

:::

::: {.cell .code}

import matplotlib.pyplot as plt

:::

::: {.cell .code}

plt.clf()
plt.figure(figsize=(10, 5), dpi=100)
plt.stackplot(population.index, population.values.T / 1e9)
plt.legend(population.columns, loc='upper left')
plt.ylabel('Population count (B)')
plt.show()

:::
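
::: {.cell .markdown}

`plt.stackplot` expects one row per layer, which is why the population values are transposed above. As a rough sketch of what the stacking amounts to with the default zero baseline, the upper edge of each layer is the running total of the layers below it. The array below is hypothetical illustration data, not population figures.

```python
import numpy as np

# Hypothetical layer values (rows = layers, columns = x positions)
layers = np.array([[1.0, 2.0, 3.0],
                   [2.0, 2.0, 2.0],
                   [0.5, 1.0, 1.5]])

# Upper edge of each layer: cumulative sum down the layer axis
upper = np.cumsum(layers, axis=0)

# Lower edge of each layer: the upper edge of the layer beneath it
lower = upper - layers
```

The last row of `upper` is the overall total at each x position, which corresponds to the world population curve in the plot above.

:::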

::: {.cell .markdown}

Stacked bar plot with plotly

:::

::: {.cell .code}

import plotly.offline as offline
import plotly.graph_objs as go

offline.init_notebook_mode()

:::

::: {.cell .code}

# One stacked trace per zone; named `traces` to avoid overwriting the `data` DataFrame
traces = [go.Scatter(x=population.index, y=population[zone], name=zone, stackgroup='World')
          for zone in zones]
fig = go.Figure(data=traces,
                layout=go.Layout(title='World population'))
offline.iplot(fig)

:::