# An Infographic of World Population Through the Years

This project is a data visualisation demonstration that shows some interesting facts about the population of the world and of individual countries over the years. It uses two datasets: the world population by country, year, age, and gender from the United Nations, and land area by country from the World Bank. The datasets are available at:

https://population.un.org/wpp/Download/Standard/CSV/

https://data.worldbank.org/indicator/ag.lnd.totl.k2 

And here we go! First, suppress the warnings in the notebook and import the modules we need to work with the datasets.

In [1]:
import pandas as pd
import numpy as np
import pickle
import warnings
import country_converter as coco
warnings.filterwarnings('ignore')

Load the United Nations world population dataset and print a few rows to see what we have, then drop the columns we do not need.
* The dataset is too large to store on GitHub as a single file, so it was split into twelve pickle files.

In [2]:
# Load the twelve parts in order and concatenate them
parts = []
for i in range(1, 13):
    with open(f'WPP2017_{i}.p', 'rb') as f:
        parts.append(pickle.load(f))
df = pd.concat(parts)
df.head()

Unnamed: 0,LocID,Location,VarID,Variant,Time,MidPeriod,AgeGrp,AgeGrpStart,AgeGrpSpan,PopMale,PopFemale,PopTotal
0,4,Afghanistan,2,Medium,1950,1950.5,0,0,1,139.669,154.913,294.581
1,4,Afghanistan,2,Medium,1950,1950.5,1,1,1,131.916,141.851,273.767
2,4,Afghanistan,2,Medium,1950,1950.5,2,2,1,125.127,130.632,255.759
3,4,Afghanistan,2,Medium,1950,1950.5,3,3,1,119.22,121.097,240.317
4,4,Afghanistan,2,Medium,1950,1950.5,4,4,1,114.112,113.085,227.198
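
For reference, a split like the one loaded above could be produced and reassembled along the following lines. This is only a sketch under assumptions: the helper names `split_to_pickles` and `load_pickles`, the file prefix, and the toy data are hypothetical and are not the code that was actually used to prepare the WPP2017 files.

```python
import os
import pickle
import tempfile

import pandas as pd


def split_to_pickles(frame, n_parts, prefix):
    """Write `frame` as n_parts pickle files named <prefix>_<i>.p (hypothetical helper)."""
    size = -(-len(frame) // n_parts)  # ceiling division: rows per part
    paths = []
    for i in range(n_parts):
        path = f"{prefix}_{i + 1}.p"
        with open(path, "wb") as f:
            pickle.dump(frame.iloc[i * size:(i + 1) * size], f)
        paths.append(path)
    return paths


def load_pickles(paths):
    """Read the parts back and concatenate them in order (hypothetical helper)."""
    parts = []
    for path in paths:
        with open(path, "rb") as f:
            parts.append(pickle.load(f))
    return pd.concat(parts)


# Toy stand-in for the population table
toy = pd.DataFrame({"LocID": [4] * 10, "Time": range(1950, 1960)})
with tempfile.TemporaryDirectory() as tmp:
    paths = split_to_pickles(toy, 3, os.path.join(tmp, "toy"))
    restored = load_pickles(paths)
```

Opening each file in a `with` block closes the handle as soon as the part has been read or written.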


In [3]:
df.drop(['VarID', 'Variant', 'MidPeriod','AgeGrp','AgeGrpSpan','PopMale','PopFemale'],axis=1,inplace=True)

We want to show population figures for 1950 to 2019, so we filter out all later years.

In [4]:
df = df[df.Time<2020]

A separate dataframe for the world as a whole can be extracted to draw a bar plot of population growth over the years.

In [5]:
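world_df = df[df.Location=='World'].copy()  # copy, so the assignment below does not write to a slice of df
world_df['PopTotal'] = world_df['PopTotal']*1000  # source values are in thousands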
world = pd.DataFrame({'Year':world_df['Time'].unique()})
world['Population'] = world['Year'].apply(lambda year: world_df[world_df['Time']==year].PopTotal.sum())
world['Young'] = world['Year'].apply(lambda year: world_df[(world_df['Time']==year) & (world_df['AgeGrpStart']<25)].PopTotal.sum())
world['Middle'] = world['Year'].apply(lambda year: world_df[(world_df['Time']==year) & 
                                                           (world_df['AgeGrpStart']>=25) & (world_df['AgeGrpStart']<=65)].PopTotal.sum())
world['Old'] = world['Year'].apply(lambda year: world_df[(world_df['Time']==year) & 
                                                           (world_df['AgeGrpStart']>65)].PopTotal.sum())
world['Text'] = 'Total: ' + (world['Population']*1e-9).round(6).astype(str) + 'B'
world.head()

Unnamed: 0,Year,Population,Young,Middle,Old,Text
0,1950,2536275000.0,1330979000.0,1089017000.0,116278323.0,Total: 2.536275B
1,1951,2583817000.0,1360800000.0,1103984000.0,119032633.0,Total: 2.583817B
2,1952,2630584000.0,1389830000.0,1119342000.0,121411585.0,Total: 2.630584B
3,1953,2677230000.0,1417964000.0,1135799000.0,123467441.0,Total: 2.67723B
4,1954,2724302000.0,1445323000.0,1153769000.0,125211152.0,Total: 2.724302B
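
The per-year sums above can also be computed in a single grouped pass. The sketch below is an alternative, not the notebook's method: it uses `pd.cut` with the same age boundaries (under 25, 25 to 65 inclusive, over 65) on a small made-up table, and the names `toy`, `Band`, and `summary` are illustrative.

```python
import pandas as pd

# Toy stand-in for `world_df`: one row per (year, single-year age)
toy = pd.DataFrame({
    "Time":        [1950, 1950, 1950, 1950, 1951, 1951, 1951, 1951],
    "AgeGrpStart": [0,    25,   65,   66,   24,   30,   65,   80],
    "PopTotal":    [10.0, 20.0, 5.0,  1.0,  11.0, 21.0, 6.0,  2.0],
})

# Intervals are right-closed: (-1, 24], (24, 65], (65, inf)
toy["Band"] = pd.cut(toy["AgeGrpStart"],
                     bins=[-1, 24, 65, float("inf")],
                     labels=["Young", "Middle", "Old"]).astype(str)

# One row per year, one column per age band
summary = (toy.groupby(["Time", "Band"])["PopTotal"]
              .sum()
              .unstack(fill_value=0.0)
              .reset_index()
              .rename(columns={"Time": "Year"}))
summary["Population"] = summary[["Young", "Middle", "Old"]].sum(axis=1)
print(summary)
```

Grouping once avoids re-filtering the whole table for every year and every band, which matters more as the table grows.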


The world dataframe above feeds the world population bar plot. Next, we prepare the dataframe for the country population maps.

First, identify the entries in the dataframe that are not countries (regions and aggregates); most of them have large populations.

In [6]:
# Take unique (LocID, Location) pairs together so the two columns cannot get out of step
places = df[['LocID','Location']].drop_duplicates().reset_index(drop=True)
places.head()

Unnamed: 0,LocID,Location
0,4,Afghanistan
1,903,Africa
2,8,Albania
3,12,Algeria
4,24,Angola


Convert 'LocID' into ISO3 codes, which Plotly's map plots accept as locations. Because the conversion generates a lot of warning messages, its result was pickled and is reloaded in the next cell, so there is no need to run the following cell again unless necessary.

In [7]:
# places['ISO3'] = places['LocID'].apply(lambda loc_id: coco.convert(names=loc_id, to='ISO3'))
# pickle.dump(places,open('places.p','wb'))

Instead, if you are just reading along, run this cell.

In [8]:
places = pickle.load(open('places.p','rb'))
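
The compute-once-then-reload pattern used here can be wrapped in a small helper so that no cell has to be commented out. This is a sketch only: `load_or_compute`, `slow_step`, and the demo file name are hypothetical, and `slow_step` merely stands in for the slow conversion.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute):
    """Return the pickled object at `path`, computing and saving it first if absent."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def slow_step():
    calls.append(1)  # count how often the expensive step really runs
    return {"AFG": 4}


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "places_demo.p")
    first = load_or_compute(cache, slow_step)   # computes and writes the file
    second = load_or_compute(cache, slow_step)  # reads the file instead
```

Deleting the cached file forces the expensive step to run again on the next call.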

In [9]:
places.head()

Unnamed: 0,LocID,Location,ISO3
0,4,Afghanistan,AFG
1,903,Africa,not found
2,8,Albania,ALB
3,12,Algeria,DZA
4,24,Angola,AGO


In [10]:
places[places['ISO3']=='not found']

Unnamed: 0,LocID,Location,ISO3
1,903,Africa,not found
9,935,Asia,not found
11,927,Australia/New Zealand,not found
35,915,Caribbean,not found
37,916,Central America,not found
38,5500,Central Asia,not found
40,830,Channel Islands,not found
61,910,Eastern Africa,not found
62,906,Eastern Asia,not found
63,923,Eastern Europe,not found


We can confirm that all these places are not countries! We can now convert places to a dictionary for later use. 

In [11]:
places_dict = pd.Series(places['ISO3'].values,index=places['LocID']).to_dict()

In [12]:
df['ISO3'] = df['LocID'].apply(lambda locid: places_dict[locid])

In [13]:
df = df[df['ISO3']!='not found'].copy()  # copy, so the next assignment does not write to a slice
df['PopTotal'] = df['PopTotal']*1000
df.head()

Unnamed: 0,LocID,Location,Time,AgeGrpStart,PopTotal,ISO3
0,4,Afghanistan,1950,0,294581.0,AFG
1,4,Afghanistan,1950,1,273767.0,AFG
2,4,Afghanistan,1950,2,255759.0,AFG
3,4,Afghanistan,1950,3,240317.0,AFG
4,4,Afghanistan,1950,4,227198.0,AFG
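
As a side note, the dictionary lookup and the filter can be written with `Series.map`. The sketch below uses a made-up three-entry lookup and toy rows (the names `toy_places`, `toy`, and `countries` are illustrative). One difference is worth knowing: `map` yields NaN for a key that is missing from the dictionary, whereas the `apply` version raises a `KeyError`.

```python
import pandas as pd

# Toy lookup in the same shape as `places_dict` (LocID -> ISO3 or 'not found')
toy_places = {4: "AFG", 8: "ALB", 903: "not found"}

toy = pd.DataFrame({"LocID": [4, 903, 8, 4],
                    "PopTotal": [294.581, 1000.0, 50.0, 273.767]})

# Series.map accepts a dict directly
toy["ISO3"] = toy["LocID"].map(toy_places)

# Keep countries only, then convert thousands to persons on a copy,
# which avoids pandas' SettingWithCopyWarning
countries = toy[toy["ISO3"] != "not found"].copy()
countries["PopTotal"] = countries["PopTotal"] * 1000
print(countries)
```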


Combine the country area dataset with the dataframe. Some countries are missing from the area dataset, so their areas (in km², found through a Google search) are added to the lookup manually.

In [14]:
area_info = pd.read_csv('API_AG.LND.TOTL.K2_DS2_en_csv_v2_10578310.csv')
country_area = pd.Series(area_info['2018'].values, index=area_info['Country Code'])
country_area['TWN']=36193
country_area['MYT']=374
country_area['GUF']=83534
country_area['GLP']=1628
country_area['MTQ']=1128
country_area['REU']=2512
country_area['ESH']=266000
country_area['CUW']=444
country_area['SSD']=619745
country_area['SDN']=1886068
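
One way to find out which codes need a manual entry is to align the area lookup with the codes that are actually needed and look for gaps. The sketch below runs on toy data; `toy_area` and `needed_codes` are illustrative names and the figures are for demonstration only.

```python
import pandas as pd

# Toy stand-ins: areas in km^2 keyed by ISO3 code, and the codes we need
toy_area = pd.Series({"AFG": 652860.0, "ALB": 27400.0, "SSD": float("nan")})
needed_codes = ["AFG", "ALB", "SSD", "TWN"]

# Reindexing introduces NaN for absent codes, so one isna() check
# catches both missing rows and rows with no recorded value
aligned = toy_area.reindex(needed_codes)
missing = aligned[aligned.isna()].index.tolist()
print(missing)
```

Running a check like this before the density calculation surfaces gaps up front instead of as a `KeyError` or a NaN density later.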

Create a function that calculates the population and the (log2) population density of every country for a given year.

In [15]:
def YearPopulation(year = 2018):
    
    year_data = df[df["Time"] == year]
    data = pd.DataFrame({'LocID': year_data['LocID'].unique(),
                         'Code': year_data['ISO3'].unique(),
                         'Location': year_data['Location'].unique(),
                         'Year':[str(year)]*len(year_data['LocID'].unique())})

    data.index = year_data['Location'].unique()
    data['Population'] = data['Code'].apply(lambda code: year_data[year_data['ISO3']==code]['PopTotal'].sum())
    data['Area'] = data['Code'].apply(lambda code: country_area[code])
    data['Density'] = data['Population'] / data['Area']
    data['Density'] = data['Density'].apply(np.log2)
    return data
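
Because the density is stored on a log2 scale, it helps to see how a value maps back to people per square kilometre. The sketch below uses a hypothetical country; `log2_density` and `density_from_log2` are illustrative helpers, not part of the notebook.

```python
import math


def log2_density(population, area_km2):
    """log2 of people per square kilometre, as computed in YearPopulation."""
    return math.log2(population / area_km2)


def density_from_log2(value):
    """Invert the transform: people per square kilometre."""
    return 2 ** value


# Hypothetical country: 1,000,000 people on 500 km^2 -> 2,000 people/km^2
example = log2_density(1_000_000, 500)
print(round(example, 2))                  # 10.97
print(round(density_from_log2(example)))  # 2000
```

Each step of 1 on this scale corresponds to a doubling of density, which is why the colour range from -5 to 15 can cover both nearly empty and extremely crowded places.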

In [16]:
density = pd.DataFrame(YearPopulation(1950)['Density'])
density.rename(index=str, columns={'Density':'1950'},inplace=True)
for year in range(1951,2020):
    density[str(year)]=YearPopulation(year)['Density']
density.head()
pickle.dump(density,open('density.p','wb'))

Import Plotly and define the function that builds the two choropleth traces (population and density) for a given year.

In [17]:
import plotly.offline as ply
import plotly.graph_objs as go
%matplotlib inline
ply.init_notebook_mode(connected=True)

In [18]:
def gen_choro(year):
    pop = YearPopulation(year)
    out = [go.Choropleth(
        visible=False,
        locations=pop['Code'],
        z=pop['Population'],
        text=pop['Location'],
        zmin=0,
        zmax = 1500000000,
        colorscale=[
            [0, "rgb(255, 245, 240)"],
            [0.2, "rgb(175, 95, 85)"],
            [1, "rgb(60, 40, 20)"]
        ],
        hoverlabel=dict(namelength=0),
        autocolorscale=False,
        reversescale=False,
        marker=go.choropleth.Marker(
            line=go.choropleth.marker.Line(
                color='rgb(200,200,200)',
                width=0.5
            )),
        geo='geo',
        colorbar=go.choropleth.ColorBar(
            thicknessmode= 'fraction',
            thickness=0.02,
            lenmode = 'fraction',
            len = 0.37,
            x = 1.02,
            xanchor = 'left',
            xpad = 0,
            ypad=0,
            outlinewidth=0.5,
            y = 0.5,
            tickprefix='',
            title='Population')
    ),go.Choropleth(
        visible=False,
        locations=pop['Code'],
        z=pop['Density'],
        text=pop['Location'],
        zmin=-5,
        zmax=15,
        colorscale=[
            [0, "rgb(235, 255, 250)"],
            [0.17229314496802367, "rgb(130, 215, 199)"],
            [0.29107161961173356, "rgb(50, 190, 180)"],
            [0.4964636140318991, "rgb(40, 110, 140)"],
            [1, "rgb(0, 15, 51)"]
        ],
        hoverlabel=dict(namelength=0),
        autocolorscale=False,
        reversescale=False,
        geo='geo2',
        marker=go.choropleth.Marker(
            line=go.choropleth.marker.Line(
                color='rgb(200,200,200)',
                width=0.5
            )),
        colorbar=go.choropleth.ColorBar(
            thicknessmode='fraction',
            thickness=0.02,
            lenmode='fraction',
            len=0.4,
            x=-0.01,
            xanchor='right',
            xpad=0,
            ypad=0,
            outlinewidth=0.5,
            y=0.5,
            tickprefix='',
            title='Population<br>Log2 Density<br>(log(km^-2))')
    )]
    return out
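
Plotly colorscale positions are normalised to the range 0 to 1 and are mapped linearly onto the interval between `zmin` and `zmax`. The sketch below translates the positions of the population colorscale back into population values; `stop_values` is a hypothetical helper written for this illustration.

```python
def stop_values(colorscale, zmin, zmax):
    """Translate normalised colorscale positions into z values (hypothetical helper)."""
    return [(zmin + frac * (zmax - zmin), colour) for frac, colour in colorscale]


population_scale = [
    [0, "rgb(255, 245, 240)"],
    [0.2, "rgb(175, 95, 85)"],
    [1, "rgb(60, 40, 20)"],
]

population_stops = stop_values(population_scale, zmin=0, zmax=1_500_000_000)
for z, colour in population_stops:
    print(f"{z:>15,.0f}  {colour}")
```

The middle stop lands at 300 million people, so only the most populous countries reach the darker end of the scale.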

Create the maps for each year and switch between them with the slider. Building them takes a bit of time, so a pickled copy of the data is provided for readers.

In [19]:
# Year range variables
year_start = 1950
year_stop =2020

Uncomment this cell and run it only if necessary.

In [20]:
# # Creating map objects
# d1 = []
# d2 = []
# for step in range(year_start,year_stop):
#     coro = gen_choro(step)
#     d1.append(coro[0])
#     d2.append(coro[1])
# d1[year_stop-year_start-1]['visible'] = True
# d2[year_stop-year_start-1]['visible'] = True
# data = d1+d2
# pickle.dump(data,open('slider_data.p','wb'))

In [21]:
data = pickle.load(open('slider_data.p','rb'))

In [22]:
# Defining configurations for the slider
steps = []
for i in range(year_stop-year_start):
    step = dict(
        method = 'restyle',
        args = ['visible', [False] * len(data)],
        label = str(year_start+i)
    )
    step['args'][1][i] = True
    step['args'][1][i+year_stop-year_start] = True
    steps.append(step)
    
sliders = [dict(
    active = year_stop-year_start-1,
    currentvalue = {"prefix": "Year: "},
    pad = {"t": 10, "b":0},
    steps = steps,
    len = 0.85,
    bgcolor = 'rgb(240,240,240)',
    bordercolor = 'rgb(200,200,200)',
    y=0.2,
    xanchor = 'center',
    x=0.5
)]
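
Each slider step switches on exactly two traces, the population map and the density map for the same year, and hides all the others. The sketch below builds those visibility masks for a three-year toy case without any Plotly dependency; `visibility_masks` is a hypothetical helper, and it assumes the traces are ordered as in `data = d1 + d2`.

```python
def visibility_masks(n_years, n_maps=2):
    """One boolean mask per slider step (hypothetical helper).

    Traces are assumed to be ordered as in `data = d1 + d2`: all years of the
    first map, then all years of the second map.
    """
    masks = []
    for i in range(n_years):
        mask = [False] * (n_years * n_maps)
        for m in range(n_maps):
            mask[i + m * n_years] = True
        masks.append(mask)
    return masks


masks = visibility_masks(n_years=3)
for mask in masks:
    print(mask)
```

In the notebook, `n_years` corresponds to `year_stop - year_start`, and each mask is what a step passes as the second element of `args`.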

Set up the layout of the maps plot.

In [23]:
layout = go.Layout(

    margin = dict(t=0,b=20,l=0),
    geo = go.layout.Geo(
        domain=dict(x=[0.5,1],
                    y=[0.15,0.95]),
        showframe = True,
        framecolor = 'rgb(220,220,220)',
        framewidth = 0.2,
        showcoastlines = False,
        projection = go.layout.geo.Projection(
            type = "miller"
        ),
        showocean = True,
        oceancolor = 'rgb(240,253,255)',
        resolution = 50,
        showcountries = True,
        countrywidth = 0.5,
        countrycolor = 'rgb(200,200,200)',
        bgcolor = 'rgb(255,255,255)'
    ),
    geo2 = go.layout.Geo(
        domain = dict(x=[0,0.5],
                      y=[0.15,0.95]),
        showframe = True,
        framecolor = 'rgb(220,220,220)',
        framewidth = 0.2,
        showcoastlines = False,
        projection = go.layout.geo.Projection(
            type = "miller"
        ),
        showocean = True,
        oceancolor = 'rgb(240,253,255)',
        resolution = 50,
        showcountries = True,
        countrywidth = 0.5,
        countrycolor = 'rgb(200,200,200)',
        bgcolor = 'rgb(255,255,255)'
    ),
    annotations=[dict(font=dict(family = 'Impact',
            size = 20,
            color = 'rgb(90, 45, 45)'),
            showarrow=False,
            text='Country Population',
            x=0.85,
            y=0.975
            ),
            dict(font=dict(family = 'Impact',
            size = 20,
            color = 'rgb(0,50,80)'),
            showarrow=False,
            text='Country Population Density',
            x=0.10,
            y=0.975
            )
            ],
    sliders=sliders

)

Create world population barplot traces.

In [24]:
trace0 = go.Bar(
    x=world["Year"],
    x0=1950,
    y=world['Young'],
    name = '<25',
    marker = dict(
        color = 'rgba(180, 220, 235,0.8)',
        line = dict(
            width = 0)),
    hoverlabel=dict(bgcolor='rgba(100, 100, 100,0.8)',
                    bordercolor='rgba(100, 100, 100,0.8)',
                    font=dict(color='rgb(255,255,255)'))
)

trace1 = go.Bar(
    x=world["Year"],
    x0=1950,
    y=world['Middle'],
    name='25-65',
    marker = dict(
        color = 'rgba(120, 170, 200, 0.8)',
        line = dict(
            width = 0)),
    hoverlabel=dict(bgcolor='rgba(100, 100, 100,0.8)',
                    bordercolor='rgba(100, 100, 100,0.8)',
                    font=dict(color='rgb(255,255,255)'))
)

trace2 = go.Bar(
    x=world["Year"],
    x0=1950,
    y=world['Old'],
    name = '>65',
    marker = dict(
        color = 'rgba(60, 105, 150, 0.8)',
        line = dict(
            width = 0)),
    hoverinfo='x+y+z+text+name',
    hovertext=world['Text'],
    hoverlabel=dict(bgcolor='rgba(100, 100, 100,0.8)',
                    bordercolor='rgba(100, 100, 100,0.8)',
                    font=dict(color='rgb(255,255,255)'))
)

data_world = [trace0,trace1,trace2]

Set up layout of the barplot.

In [25]:
layout_world = go.Layout(legend=dict(x=1,y=0.9),
            margin=dict(t=100,b=70,l=100,r=40),
            barmode='stack',
            xaxis = dict(title = 'Year',
                         range=(1949.5,2019.5),
                         fixedrange=False),
            yaxis = dict(title = 'Population',
                         range=(0,8100000000),
                         fixedrange=True),
            annotations = [dict(font=dict(family='Arial',
                              size=12,
                              color='rgb(15,15,15)'),
                    showarrow=False,
                    text='Age Group',
                    xref='paper',
                    yref='paper',
                    xanchor='left',
                    x=1.01,
                    y=0.93
                    ),
               dict(font=dict(family='Impact',
                              size=30,
                              color='rgb(0,50,80)'),
                    showarrow=False,
                    text='World Population',
                    xref='paper',
                    x=0.5,
                    yref='paper',
                    y=1.11
                    )
               ]
            )

Create the figures.

In [26]:
world_fig = go.Figure(data = data_world, layout = layout_world)
country_fig = go.Figure(data = data, layout = layout)

In [27]:
ply.iplot(world_fig,filename = 'world_population')

As expected, the world population keeps increasing. It is also worth noting that the changing proportions of the age groups show the world population is ageing!
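
To put numbers on the age structure, each group's share of the total can be computed per year. The sketch below does this for 1950 only, using the rounded values from the first row of the `world` table shown earlier; applying the same calculation to later rows would quantify the trend. The names `row_1950` and `shares` are illustrative.

```python
# Values copied from the 1950 row of the `world` table (rounded as displayed)
row_1950 = {"Young": 1330979000.0, "Middle": 1089017000.0, "Old": 116278323.0}

total = sum(row_1950.values())
shares = {band: count / total for band, count in row_1950.items()}

for band, share in shares.items():
    print(f"{band}: {share:.1%}")
```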

In [28]:
ply.iplot(country_fig,filename = 'country_population')

And finally we have our population and population density comparison. India and China are the countries with the largest populations, at 1.369 and 1.420 BILLION respectively! The third most populous country is the USA, though it is far behind China and India with 'only' 0.329 billion people.

For density, the top three are all small, highly urbanised places in Asia: Macao with a logged density of 14.37, Singapore with 13.01, and Hong Kong with 12.80. A little history: Singapore and Hong Kong were both British colonies, and Macao was a Portuguese colony. Singapore eventually gained its independence, while Hong Kong and Macao are now Special Administrative Regions of China.