# World Development Indicators from the World Bank

In [1]:
import pandas as pd
import ipywidgets

## Import the Dataset

The [World Development Indicators](https://databank.worldbank.org/source/world-development-indicators) database includes 1599 indicators for over 200 countries from 1960 to 2018. We've selected 55 indicators and saved the data in the file `world-development-indicators.csv` in the `data` folder. Let's import the data with [pandas](https://pandas.pydata.org).

In [2]:
data = pd.read_csv('./data/world-development-indicators.csv')

The data is imported as a DataFrame object, which is similar to a spreadsheet: a table with labeled rows and columns. Let's look at the first 5 rows of the DataFrame.

In [3]:
data.head()

,Country Name,Series Name,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
0,Argentina,"Adolescent fertility rate (births per 1,000 wo...",63.8308,63.8792,63.9276,63.976,63.7388,63.5016,63.2644,63.0272,62.79,
1,Argentina,"Agriculture, forestry, and fishing, value adde...",5.273623,7.132167,6.998734,5.781744,6.052918,6.712704,5.156686,6.264566,5.478382,6.140206
2,Argentina,"Annual freshwater withdrawals, total (% of int...",,,,12.907534,,12.907534,,,,
3,Argentina,Births attended by skilled health staff (% of ...,97.9,94.97,97.14,98.2,97.04,99.6,99.6,,,
4,Argentina,CO2 emissions (metric tons per capita),4.445388,4.607164,4.644373,4.60918,4.49854,4.781508,,,,


Use the `.info()` method to get an overview of the DataFrame.

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11935 entries, 0 to 11934
Data columns (total 12 columns):
Country Name    11935 non-null object
Series Name     11935 non-null object
2009            8587 non-null float64
2010            8647 non-null float64
2011            8618 non-null float64
2012            8914 non-null float64
2013            8580 non-null float64
2014            8695 non-null float64
2015            8083 non-null float64
2016            8115 non-null float64
2017            7657 non-null float64
2018            4380 non-null float64
dtypes: float64(10), object(2)
memory usage: 1.1+ MB


There are 11,935 rows of data, but many values are missing: every year column has fewer non-null entries than there are rows, and 2018 has only 4,380.
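The non-null counts reported by `.info()` can also be turned into explicit missing-value counts. Here is a minimal sketch of one way to do that, using a small made-up DataFrame named `toy` (an assumption for illustration, not the real data) with the same layout as `data`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset: same layout, made-up values.
toy = pd.DataFrame({
    'Country Name': ['A', 'A', 'B', 'B'],
    'Series Name': ['X', 'Y', 'X', 'Y'],
    '2017': [1.0, np.nan, 3.0, 4.0],
    '2018': [np.nan, np.nan, 3.5, np.nan],
})

year_columns = ['2017', '2018']

# Number of missing values in each year column.
missing_per_year = toy[year_columns].isna().sum()

# Fraction of missing values in each year column.
missing_share = toy[year_columns].isna().mean()

print(missing_per_year)
print(missing_share)
```

On the real DataFrame, the same calls would presumably be applied to the columns `'2009'` through `'2018'`.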

## Indicators and Countries

Let's verify that there are 55 distinct entries in the `Series Name` column.

In [5]:
indicators = data['Series Name'].unique().tolist()

In [6]:
len(indicators)

55

Let's look at the first 10 in the list of indicators.

In [7]:
indicators[:10]

['Adolescent fertility rate (births per 1,000 women ages 15-19)',
 'Agriculture, forestry, and fishing, value added (% of GDP)',
 'Annual freshwater withdrawals, total (% of internal resources)',
 'Births attended by skilled health staff (% of total)',
 'CO2 emissions (metric tons per capita)',
 'Contraceptive prevalence, any methods (% of women ages 15-49)',
 'Domestic credit provided by financial sector (% of GDP)',
 'Electric power consumption (kWh per capita)',
 'Energy use (kg of oil equivalent per capita)',
 'Exports of goods and services (% of GDP)']

Let's see how many different countries are in the dataset.

In [8]:
countries = data['Country Name'].unique().tolist()

In [9]:
len(countries)

217

In [10]:
countries[:10]

['Argentina',
 'Australia',
 'Brazil',
 'China',
 'France',
 'Germany',
 'India',
 'Indonesia',
 'Italy',
 'Japan']
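Note that 217 countries times 55 indicators is 11,935, which matches the number of rows reported by `.info()`. That suggests one row per country and indicator pair. A minimal sketch of how such a check might be written, using a made-up DataFrame named `toy` (an assumption for illustration, not the real data):

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: same layout, made-up values.
toy = pd.DataFrame({
    'Country Name': ['A', 'A', 'B', 'B'],
    'Series Name': ['X', 'Y', 'X', 'Y'],
    '2018': [1.0, 2.0, 3.0, 4.0],
})

n_countries = toy['Country Name'].nunique()
n_indicators = toy['Series Name'].nunique()

# True when the row count equals (number of countries) x (number of indicators).
is_complete_grid = len(toy) == n_countries * n_indicators

# True when some (country, indicator) pair appears more than once.
has_duplicates = toy.duplicated(subset=['Country Name', 'Series Name']).any()

print(n_countries, n_indicators, is_complete_grid, has_duplicates)
```

If the row count matches the product and no pair is duplicated, every country has exactly one row for every indicator.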

## Country Comparisons

Let's use [ipywidgets](https://ipywidgets.readthedocs.io) to create an interactive graphic which compares two countries on a given indicator over time.

In [13]:
dropdown1 = ipywidgets.Dropdown(options=countries,description='Country 1:',value='India')
dropdown2 = ipywidgets.Dropdown(options=countries,description='Country 2:',value='China')
dropdown3 = ipywidgets.Dropdown(options=indicators,description='Indicator:',value='Population, total')

@ipywidgets.interact(country1=dropdown1,country2=dropdown2,indicator=dropdown3)
def compare(country1,country2,indicator):
    # Select each country's row for the indicator and transpose it so the years become the index
    df1 = data[(data['Country Name'] == country1) & (data['Series Name'] == indicator)].loc[:,'2009':'2018'].transpose()
    df2 = data[(data['Country Name'] == country2) & (data['Series Name'] == indicator)].loc[:,'2009':'2018'].transpose()
    # Align the two series on year and label the columns by country
    df = df1.merge(df2,left_index=True,right_index=True)
    df.columns = [country1,country2]
    df.plot(marker='o',figsize=(12,8))

interactive(children=(Dropdown(description='Country 1:', index=6, options=('Argentina', 'Australia', 'Brazil',…
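The reshaping inside `compare` can be easier to follow without the widgets. The sketch below repeats the same filter, transpose and merge steps on a small made-up DataFrame named `toy`; the helper `select_series` and the toy values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: same layout, made-up values.
toy = pd.DataFrame({
    'Country Name': ['A', 'A', 'B', 'B'],
    'Series Name': ['Population, total', 'Other', 'Population, total', 'Other'],
    '2017': [10.0, 1.0, 20.0, 2.0],
    '2018': [11.0, 1.5, 21.0, 2.5],
})

def select_series(frame, country, indicator):
    """Return one country's values for one indicator, with years as the index."""
    rows = frame[(frame['Country Name'] == country) & (frame['Series Name'] == indicator)]
    return rows.loc[:, '2017':'2018'].transpose()

df1 = select_series(toy, 'A', 'Population, total')
df2 = select_series(toy, 'B', 'Population, total')

# Merge on the index (the years) so both countries share one row per year.
combined = df1.merge(df2, left_index=True, right_index=True)
combined.columns = ['A', 'B']

print(combined)
```

After the transpose, each selection is a single column indexed by year, so merging on the index lines up the two countries year by year before plotting.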

## Correlations among Indicators

In [12]:
dropdown1 = ipywidgets.Dropdown(options=indicators,description='Indicator 1:',value='Energy use (kg of oil equivalent per capita)')
dropdown2 = ipywidgets.Dropdown(options=indicators,description='Indicator 2:',value='CO2 emissions (metric tons per capita)')
dropdown3 = ipywidgets.Dropdown(options=list(range(2009,2019)),description='Year:',value=2014)

@ipywidgets.interact(indicator1=dropdown1,indicator2=dropdown2,year=dropdown3)
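def compare(indicator1,indicator2,year):
    # Build a table with one row per country and one column per indicator for the selected year
    df = data.groupby(['Country Name','Series Name']).first().loc[:,str(year)].unstack()
    # Each point in the scatter plot is one country
    df.plot(kind='scatter',x=indicator1,y=indicator2,figsize=(12,8),grid=True)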

interactive(children=(Dropdown(description='Indicator 1:', index=8, options=('Adolescent fertility rate (birth…
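The `groupby(...).first()...unstack()` chain turns the long table into a wide one, with one row per country and one column per indicator. Here is a minimal sketch of that step on a made-up DataFrame named `toy` (the data and the short indicator names are assumptions for illustration). It also computes a Pearson correlation coefficient with `Series.corr`, which is one possible numeric complement to the scatter plot and is not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: same layout, made-up values.
toy = pd.DataFrame({
    'Country Name': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Series Name': ['Energy', 'CO2', 'Energy', 'CO2', 'Energy', 'CO2'],
    '2014': [100.0, 1.0, 200.0, 2.0, 300.0, 3.0],
})

# Index by (country, indicator), keep one year, then move indicators into columns.
wide = toy.groupby(['Country Name', 'Series Name']).first().loc[:, '2014'].unstack()

# Pearson correlation between the two indicator columns across countries.
r = wide['Energy'].corr(wide['CO2'])

print(wide)
print(r)
```

In this toy data the two indicators are exactly proportional, so the coefficient is 1 up to floating-point rounding. With real data, countries missing either value for the chosen year are left out of the calculation.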