[This post by Ramiro Gomez](http://exploringdata.github.io/vis/evolution-internet-users/) is nice: it plots the number of internet users against GDP for each country and lets you step the graph through time. Since this animation looks a lot like this [gapminder animation](https://notebooks.anaconda.org/bokeh/gapminder), I would like to try to replicate the plot using [Bokeh](http://bokeh.pydata.org/en/latest/). One of my incentives for doing so is to add a slider in place of the implicit bottom-right interaction.

# Getting the data and cleaning it

Ramiro Gomez has [created a repository for the data he is using](https://github.com/yaph/evolution-internet-users), but the data points themselves are not directly downloadable in text form. I therefore decided to get the raw data from the [World Bank website](http://data.worldbank.org/data-catalog/world-development-indicators) and export it as an Excel file. Let's load the data and look at it:

In [1]:
import numpy as np

In [2]:
import pandas as pd

In [3]:
df = pd.read_excel('files/InternetUsers_GDP.xlsx')

Let's have a quick look at the head of the data:

In [4]:
df.head()

Unnamed: 0,Series Name,Series Code,Country Name,Country Code,1991 [YR1991],1992 [YR1992],1993 [YR1993],1994 [YR1994],1995 [YR1995],1996 [YR1996],...,2006 [YR2006],2007 [YR2007],2008 [YR2008],2009 [YR2009],2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015]
0,Internet users (per 100 people),IT.NET.USER.P2,Afghanistan,AFG,..,..,..,..,..,..,...,2.10712,1.9,1.84,3.55,4,5,5.45455,5.9,6.39,..
1,Internet users (per 100 people),IT.NET.USER.P2,Albania,ALB,..,..,..,..,0.0111687,0.0321968,...,9.60999,15.0361,23.86,41.2,45,49,54.656,57.2,60.1,..
2,Internet users (per 100 people),IT.NET.USER.P2,Algeria,DZA,..,..,..,0.000360674,0.00176895,0.00173853,...,7.37598,9.45119,10.18,11.23,12.5,14,15.228,16.5,18.09,..
3,Internet users (per 100 people),IT.NET.USER.P2,American Samoa,ASM,..,..,..,..,..,..,...,..,..,..,..,..,..,..,..,..,..
4,Internet users (per 100 people),IT.NET.USER.P2,Andorra,ADO,..,..,..,..,..,1.5266,...,48.9368,70.87,70.04,78.53,81,81,86.4344,94,95.9,..


The columns feature the years, while the countries represented are in the rows.

In [5]:
df.columns

Index(['Series Name', 'Series Code', 'Country Name', 'Country Code',
       '1991 [YR1991]', '1992 [YR1992]', '1993 [YR1993]', '1994 [YR1994]',
       '1995 [YR1995]', '1996 [YR1996]', '1997 [YR1997]', '1998 [YR1998]',
       '1999 [YR1999]', '2000 [YR2000]', '2001 [YR2001]', '2002 [YR2002]',
       '2003 [YR2003]', '2004 [YR2004]', '2005 [YR2005]', '2006 [YR2006]',
       '2007 [YR2007]', '2008 [YR2008]', '2009 [YR2009]', '2010 [YR2010]',
       '2011 [YR2011]', '2012 [YR2012]', '2013 [YR2013]', '2014 [YR2014]',
       '2015 [YR2015]'],
      dtype='object')

We're now going to build three dataframes with this data:

- population 
- internet use
- gdp

Let's build the dataframe for internet use.

In [6]:
df_internet = df[df['Series Name'] == 'Internet users (per 100 people)'][['Country Name', '1991 [YR1991]', '1992 [YR1992]', '1993 [YR1993]', '1994 [YR1994]',
       '1995 [YR1995]', '1996 [YR1996]', '1997 [YR1997]', '1998 [YR1998]',
       '1999 [YR1999]', '2000 [YR2000]', '2001 [YR2001]', '2002 [YR2002]',
       '2003 [YR2003]', '2004 [YR2004]', '2005 [YR2005]', '2006 [YR2006]',
       '2007 [YR2007]', '2008 [YR2008]', '2009 [YR2009]', '2010 [YR2010]',
       '2011 [YR2011]', '2012 [YR2012]', '2013 [YR2013]', '2014 [YR2014]',
       '2015 [YR2015]']]

In [7]:
df_internet.replace(to_replace='..', value=np.nan, inplace=True)

In [8]:
s = df_internet.pop('Country Name')
df_internet = df_internet.set_index(s)
df_internet

Unnamed: 0_level_0,1991 [YR1991],1992 [YR1992],1993 [YR1993],1994 [YR1994],1995 [YR1995],1996 [YR1996],1997 [YR1997],1998 [YR1998],1999 [YR1999],2000 [YR2000],...,2006 [YR2006],2007 [YR2007],2008 [YR2008],2009 [YR2009],2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015]
Country Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,2.107124,1.900000,1.840000,3.550000,4.00,5.000000,5.454545,5.90000,6.39000,
Albania,,,,,0.011169,0.032197,0.048594,0.065027,0.081437,0.114097,...,9.609991,15.036115,23.860000,41.200000,45.00,49.000000,54.655959,57.20000,60.10000,
Algeria,,,,0.000361,0.001769,0.001739,0.010268,0.020239,0.199524,0.491706,...,7.375985,9.451191,10.180000,11.230000,12.50,14.000000,15.228027,16.50000,18.09000,
American Samoa,,,,,,,,,,,...,,,,,,,,,,
Andorra,,,,,,1.526601,3.050175,6.886209,7.635686,10.538836,...,48.936847,70.870000,70.040000,78.530000,81.00,81.000000,86.434425,94.00000,95.90000,
Angola,,,,,,0.000776,0.005674,0.018454,0.071964,0.105046,...,1.907648,3.200000,4.600000,6.000000,10.00,14.776000,16.937210,19.10000,21.26000,
Antigua and Barbuda,,,,,2.200769,2.858450,3.480537,4.071716,5.300681,6.482226,...,30.000000,34.000000,38.000000,42.000000,47.00,52.000000,58.000000,63.40000,64.00000,
Argentina,,0.002993,0.029527,0.043706,0.086277,0.141955,0.280340,0.830767,3.284482,7.038683,...,20.927202,25.946633,28.112623,34.000000,45.00,51.000000,55.800000,59.90000,64.70000,
Armenia,,,,0.009117,0.052743,0.094573,0.111651,0.128659,0.970738,1.300470,...,5.631788,6.021253,6.210000,15.300000,25.00,32.000000,37.500000,41.90000,46.30000,
Aruba,,,,,,2.768383,,,4.506179,15.442823,...,28.000000,30.900000,52.000000,58.000000,62.00,69.000000,74.000000,78.90000,83.78000,
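
The same three steps (filter on the series name, replace the `..` placeholders, index by country) are repeated below for GDP and population. As a rough sketch, they could be wrapped in a small helper. The function name `extract_series` and the toy frame are made up for illustration, and `pd.to_numeric(errors='coerce')` is used here instead of the `replace` call above, assuming every non-numeric entry should become NaN.

```python
import pandas as pd


def extract_series(df, series_name):
    """Return one indicator as a country-indexed table of floats."""
    # Keep only the rows belonging to the requested indicator.
    rows = df[df['Series Name'] == series_name]
    # Year columns look like '1991 [YR1991]'; pick them out by the '[YR' marker.
    year_columns = [col for col in rows.columns if '[YR' in col]
    table = rows.set_index('Country Name')[year_columns]
    # Missing values are exported as '..'; coerce anything non-numeric to NaN.
    return table.apply(pd.to_numeric, errors='coerce')


# Tiny made-up frame mimicking the layout of the export (illustrative values).
toy = pd.DataFrame({
    'Series Name': ['Internet users (per 100 people)', 'Population, total'],
    'Country Name': ['Albania', 'Albania'],
    '1991 [YR1991]': ['..', 3266790],
    '1992 [YR1992]': [0.5, 3247039],
})

toy_internet = extract_series(toy, 'Internet users (per 100 people)')
print(toy_internet)
```

With such a helper, each of the three dataframes would come from a single call that differs only in the series name.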


Let's now build the dataframe for GDP.

In [9]:
df_gdp = df[df['Series Name'] == 'GDP per capita (constant 2005 US$)'][['Country Name', '1991 [YR1991]', '1992 [YR1992]', '1993 [YR1993]', '1994 [YR1994]',
       '1995 [YR1995]', '1996 [YR1996]', '1997 [YR1997]', '1998 [YR1998]',
       '1999 [YR1999]', '2000 [YR2000]', '2001 [YR2001]', '2002 [YR2002]',
       '2003 [YR2003]', '2004 [YR2004]', '2005 [YR2005]', '2006 [YR2006]',
       '2007 [YR2007]', '2008 [YR2008]', '2009 [YR2009]', '2010 [YR2010]',
       '2011 [YR2011]', '2012 [YR2012]', '2013 [YR2013]', '2014 [YR2014]',
       '2015 [YR2015]']]

In [10]:
df_gdp.replace(to_replace='..', value=np.nan, inplace=True)

In [11]:
s = df_gdp.pop('Country Name')
df_gdp = df_gdp.set_index(s)
df_gdp

Unnamed: 0_level_0,1991 [YR1991],1992 [YR1992],1993 [YR1993],1994 [YR1994],1995 [YR1995],1996 [YR1996],1997 [YR1997],1998 [YR1998],1999 [YR1999],2000 [YR2000],...,2006 [YR2006],2007 [YR2007],2008 [YR2008],2009 [YR2009],2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015]
Country Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,263.012374,291.128823,294.238183,347.208097,366.324813,377.292766,418.426197,413.233959,408.898697,
Albania,1211.431789,1131.047006,1247.214425,1359.050613,1549.345209,1700.873966,1536.967481,1743.097819,1931.344256,2085.582721,...,2939.136002,3136.159294,3398.487894,3536.231650,3685.568522,3790.101771,3857.339741,3916.231237,3994.625479,
Algeria,2484.153186,2470.566078,2366.015901,2297.100086,2339.655291,2393.552046,2381.351303,2465.744244,2509.110754,2530.011442,...,3109.768204,3167.388563,3179.776708,3176.744294,3233.176770,3262.063307,3304.701427,3330.802120,3400.732802,
American Samoa,,,,,,,,,,,...,,,,,,,,,,
Andorra,29833.095771,28970.387067,27685.024766,27574.571029,27825.961843,28921.863630,31615.216190,32757.517100,33955.180327,33700.762129,...,40745.465162,40054.227486,36296.274603,34968.485540,33512.015532,32713.610776,33357.462074,34835.709370,,
Angola,1376.983953,1241.205271,904.267859,906.193896,970.202243,1048.154225,1100.061612,1142.965689,1146.517824,1145.236277,...,1838.473307,2178.362871,2397.088332,2373.831980,2373.765225,2385.576823,2426.366161,2507.085626,2521.102581,
Antigua and Barbuda,9793.500455,9717.444050,10014.827235,10379.394318,9684.082744,10057.745742,10258.959051,10426.206800,10593.961477,10897.778167,...,13547.711833,14671.133807,14517.635341,12629.713786,11602.142227,11275.280156,11607.745344,11481.385188,11731.963766,
Argentina,4395.007076,4852.381210,5070.344112,5296.817066,5081.892652,5298.425979,5661.959372,5813.822438,5554.697946,5449.989077,...,6108.427140,6527.158500,6659.230509,6594.499948,7143.504413,7662.156733,7642.930027,7781.549510,7737.715767,
Armenia,1021.638633,605.352200,565.159435,610.113998,665.722822,715.818418,748.023547,808.996242,840.862615,895.603701,...,1847.746891,2111.675685,2267.312170,1952.342100,1997.052261,2087.751968,2230.288855,2297.661964,2364.748214,
Aruba,,,,23086.937951,22319.245642,23233.536705,24129.274973,23896.337025,24490.145798,23902.920892,...,23662.635648,22710.463505,21121.812032,19913.149353,,,,,,


Finally, let's build the dataframe for population:

In [12]:
df_pop = df[df['Series Name'] == 'Population, total'][['Country Name', '1991 [YR1991]', '1992 [YR1992]', '1993 [YR1993]', '1994 [YR1994]',
       '1995 [YR1995]', '1996 [YR1996]', '1997 [YR1997]', '1998 [YR1998]',
       '1999 [YR1999]', '2000 [YR2000]', '2001 [YR2001]', '2002 [YR2002]',
       '2003 [YR2003]', '2004 [YR2004]', '2005 [YR2005]', '2006 [YR2006]',
       '2007 [YR2007]', '2008 [YR2008]', '2009 [YR2009]', '2010 [YR2010]',
       '2011 [YR2011]', '2012 [YR2012]', '2013 [YR2013]', '2014 [YR2014]',
       '2015 [YR2015]']]

In [13]:
df_pop.replace(to_replace='..', value=np.nan, inplace=True)

In [14]:
s = df_pop.pop('Country Name')
df_pop = df_pop.set_index(s)
df_pop

Unnamed: 0_level_0,1991 [YR1991],1992 [YR1992],1993 [YR1993],1994 [YR1994],1995 [YR1995],1996 [YR1996],1997 [YR1997],1998 [YR1998],1999 [YR1999],2000 [YR2000],...,2006 [YR2006],2007 [YR2007],2008 [YR2008],2009 [YR2009],2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015]
Country Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,12789374.0,13745630.0,14824371.0,15869967.0,16772522.0,17481800.0,18034130.0,18511480,19038420,19701940,...,25183615,25877544,26528741,27207291,27962207,28809167,29726803,30682500,31627506,
Albania,3266790.0,3247039.0,3227287.0,3207536.0,3187784.0,3168033.0,3148281.0,3128530,3108778,3089027,...,2992547,2970017,2947314,2927519,2913021,2904780,2900489,2897366,2894475,
Algeria,26554277.0,27180921.0,27785977.0,28362015.0,28904300.0,29411839.0,29887717.0,30336880,30766551,31183658,...,33749328,34261971,34811059,35401790,36036159,36717132,37439427,38186135,38934334,
American Samoa,48379.0,49597.0,50725.0,51807.0,52874.0,53926.0,54942.0,55899,56768,57522,...,58648,57904,57031,56226,55636,55316,55227,55302,55434,
Andorra,56674.0,58904.0,61003.0,62707.0,63854.0,64291.0,64147.0,63888,64161,65399,...,83373,84878,85616,85474,84419,82326,79316,75902,72786,
Angola,11472173.0,11848971.0,12246786.0,12648483.0,13042666.0,13424813.0,13801868.0,14187710,14601983,15058638,...,18541467,19183907,19842251,20520103,21219954,21942296,22685632,23448202,24227524,
Antigua and Barbuda,62412.0,63434.0,64868.0,66550.0,68349.0,70245.0,72232.0,74206,76041,77648,...,83467,84397,85350,86300,87233,88152,89069,89985,90900,
Argentina,33193920.0,33655149.0,34110912.0,34558114.0,34994818.0,35419683.0,35833965.0,36241578,36648054,37057453,...,39558750,39969903,40381860,40798641,41222875,41655616,42095224,42538304,42980026,
Armenia,3511912.0,3449497.0,3369673.0,3289943.0,3223173.0,3173425.0,3137652.0,3112958,3093820,3076098,...,3002161,2988117,2975029,2966108,2963496,2967984,2978339,2992192,3006154,
Aruba,64623.0,68235.0,72498.0,76700.0,80326.0,83195.0,85447.0,87276,89004,90858,...,100830,101218,101342,101416,101597,101936,102393,102921,103441,


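As a final step, we restrict all three dataframes to the same selection of countries. Here the selection is simply every country in the population dataframe, so nothing is dropped, but the list could be narrowed: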

In [15]:
df_pop

Unnamed: 0_level_0,1991 [YR1991],1992 [YR1992],1993 [YR1993],1994 [YR1994],1995 [YR1995],1996 [YR1996],1997 [YR1997],1998 [YR1998],1999 [YR1999],2000 [YR2000],...,2006 [YR2006],2007 [YR2007],2008 [YR2008],2009 [YR2009],2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015]
Country Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,12789374.0,13745630.0,14824371.0,15869967.0,16772522.0,17481800.0,18034130.0,18511480,19038420,19701940,...,25183615,25877544,26528741,27207291,27962207,28809167,29726803,30682500,31627506,
Albania,3266790.0,3247039.0,3227287.0,3207536.0,3187784.0,3168033.0,3148281.0,3128530,3108778,3089027,...,2992547,2970017,2947314,2927519,2913021,2904780,2900489,2897366,2894475,
Algeria,26554277.0,27180921.0,27785977.0,28362015.0,28904300.0,29411839.0,29887717.0,30336880,30766551,31183658,...,33749328,34261971,34811059,35401790,36036159,36717132,37439427,38186135,38934334,
American Samoa,48379.0,49597.0,50725.0,51807.0,52874.0,53926.0,54942.0,55899,56768,57522,...,58648,57904,57031,56226,55636,55316,55227,55302,55434,
Andorra,56674.0,58904.0,61003.0,62707.0,63854.0,64291.0,64147.0,63888,64161,65399,...,83373,84878,85616,85474,84419,82326,79316,75902,72786,
Angola,11472173.0,11848971.0,12246786.0,12648483.0,13042666.0,13424813.0,13801868.0,14187710,14601983,15058638,...,18541467,19183907,19842251,20520103,21219954,21942296,22685632,23448202,24227524,
Antigua and Barbuda,62412.0,63434.0,64868.0,66550.0,68349.0,70245.0,72232.0,74206,76041,77648,...,83467,84397,85350,86300,87233,88152,89069,89985,90900,
Argentina,33193920.0,33655149.0,34110912.0,34558114.0,34994818.0,35419683.0,35833965.0,36241578,36648054,37057453,...,39558750,39969903,40381860,40798641,41222875,41655616,42095224,42538304,42980026,
Armenia,3511912.0,3449497.0,3369673.0,3289943.0,3223173.0,3173425.0,3137652.0,3112958,3093820,3076098,...,3002161,2988117,2975029,2966108,2963496,2967984,2978339,2992192,3006154,
Aruba,64623.0,68235.0,72498.0,76700.0,80326.0,83195.0,85447.0,87276,89004,90858,...,100830,101218,101342,101416,101597,101936,102393,102921,103441,


In [16]:
selection = df_pop.index

df_pop = df_pop.loc[selection]
df_gdp = df_gdp.loc[selection]
df_internet = df_internet.loc[selection]
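
To illustrate what a narrower selection would do, here is a minimal sketch on two toy frames. The frames, the rounded values, and the two-country list are assumptions made for the example; only the `.loc[selection]` pattern comes from the cell above.

```python
import pandas as pd

countries = ['Albania', 'Algeria', 'Andorra']
toy_pop = pd.DataFrame({'2014 [YR2014]': [2894475, 38934334, 72786]},
                       index=countries)
toy_gdp = pd.DataFrame({'2014 [YR2014]': [3994.6, 3400.7, None]},
                       index=countries)

# Hypothetical selection: any list of names present in the index.
selection = ['Albania', 'Algeria']

# Applying the same selection to every frame keeps their rows aligned.
toy_pop_sel = toy_pop.loc[selection]
toy_gdp_sel = toy_gdp.loc[selection]
print(toy_pop_sel)
print(toy_gdp_sel)
```

Because every frame is indexed with the same list, the rows come back in the same order, which matters later when the columns are concatenated year by year.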

In [17]:
df_pop

Unnamed: 0_level_0,1991 [YR1991],1992 [YR1992],1993 [YR1993],1994 [YR1994],1995 [YR1995],1996 [YR1996],1997 [YR1997],1998 [YR1998],1999 [YR1999],2000 [YR2000],...,2006 [YR2006],2007 [YR2007],2008 [YR2008],2009 [YR2009],2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015]
Country Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,12789374.0,13745630.0,14824371.0,15869967.0,16772522.0,17481800.0,18034130.0,18511480,19038420,19701940,...,25183615,25877544,26528741,27207291,27962207,28809167,29726803,30682500,31627506,
Albania,3266790.0,3247039.0,3227287.0,3207536.0,3187784.0,3168033.0,3148281.0,3128530,3108778,3089027,...,2992547,2970017,2947314,2927519,2913021,2904780,2900489,2897366,2894475,
Algeria,26554277.0,27180921.0,27785977.0,28362015.0,28904300.0,29411839.0,29887717.0,30336880,30766551,31183658,...,33749328,34261971,34811059,35401790,36036159,36717132,37439427,38186135,38934334,
American Samoa,48379.0,49597.0,50725.0,51807.0,52874.0,53926.0,54942.0,55899,56768,57522,...,58648,57904,57031,56226,55636,55316,55227,55302,55434,
Andorra,56674.0,58904.0,61003.0,62707.0,63854.0,64291.0,64147.0,63888,64161,65399,...,83373,84878,85616,85474,84419,82326,79316,75902,72786,
Angola,11472173.0,11848971.0,12246786.0,12648483.0,13042666.0,13424813.0,13801868.0,14187710,14601983,15058638,...,18541467,19183907,19842251,20520103,21219954,21942296,22685632,23448202,24227524,
Antigua and Barbuda,62412.0,63434.0,64868.0,66550.0,68349.0,70245.0,72232.0,74206,76041,77648,...,83467,84397,85350,86300,87233,88152,89069,89985,90900,
Argentina,33193920.0,33655149.0,34110912.0,34558114.0,34994818.0,35419683.0,35833965.0,36241578,36648054,37057453,...,39558750,39969903,40381860,40798641,41222875,41655616,42095224,42538304,42980026,
Armenia,3511912.0,3449497.0,3369673.0,3289943.0,3223173.0,3173425.0,3137652.0,3112958,3093820,3076098,...,3002161,2988117,2975029,2966108,2963496,2967984,2978339,2992192,3006154,
Aruba,64623.0,68235.0,72498.0,76700.0,80326.0,83195.0,85447.0,87276,89004,90858,...,100830,101218,101342,101416,101597,101936,102393,102921,103441,


In [18]:
df_gdp

Unnamed: 0_level_0,1991 [YR1991],1992 [YR1992],1993 [YR1993],1994 [YR1994],1995 [YR1995],1996 [YR1996],1997 [YR1997],1998 [YR1998],1999 [YR1999],2000 [YR2000],...,2006 [YR2006],2007 [YR2007],2008 [YR2008],2009 [YR2009],2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015]
Country Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,263.012374,291.128823,294.238183,347.208097,366.324813,377.292766,418.426197,413.233959,408.898697,
Albania,1211.431789,1131.047006,1247.214425,1359.050613,1549.345209,1700.873966,1536.967481,1743.097819,1931.344256,2085.582721,...,2939.136002,3136.159294,3398.487894,3536.231650,3685.568522,3790.101771,3857.339741,3916.231237,3994.625479,
Algeria,2484.153186,2470.566078,2366.015901,2297.100086,2339.655291,2393.552046,2381.351303,2465.744244,2509.110754,2530.011442,...,3109.768204,3167.388563,3179.776708,3176.744294,3233.176770,3262.063307,3304.701427,3330.802120,3400.732802,
American Samoa,,,,,,,,,,,...,,,,,,,,,,
Andorra,29833.095771,28970.387067,27685.024766,27574.571029,27825.961843,28921.863630,31615.216190,32757.517100,33955.180327,33700.762129,...,40745.465162,40054.227486,36296.274603,34968.485540,33512.015532,32713.610776,33357.462074,34835.709370,,
Angola,1376.983953,1241.205271,904.267859,906.193896,970.202243,1048.154225,1100.061612,1142.965689,1146.517824,1145.236277,...,1838.473307,2178.362871,2397.088332,2373.831980,2373.765225,2385.576823,2426.366161,2507.085626,2521.102581,
Antigua and Barbuda,9793.500455,9717.444050,10014.827235,10379.394318,9684.082744,10057.745742,10258.959051,10426.206800,10593.961477,10897.778167,...,13547.711833,14671.133807,14517.635341,12629.713786,11602.142227,11275.280156,11607.745344,11481.385188,11731.963766,
Argentina,4395.007076,4852.381210,5070.344112,5296.817066,5081.892652,5298.425979,5661.959372,5813.822438,5554.697946,5449.989077,...,6108.427140,6527.158500,6659.230509,6594.499948,7143.504413,7662.156733,7642.930027,7781.549510,7737.715767,
Armenia,1021.638633,605.352200,565.159435,610.113998,665.722822,715.818418,748.023547,808.996242,840.862615,895.603701,...,1847.746891,2111.675685,2267.312170,1952.342100,1997.052261,2087.751968,2230.288855,2297.661964,2364.748214,
Aruba,,,,23086.937951,22319.245642,23233.536705,24129.274973,23896.337025,24490.145798,23902.920892,...,23662.635648,22710.463505,21121.812032,19913.149353,,,,,,


In [19]:
df_internet

Unnamed: 0_level_0,1991 [YR1991],1992 [YR1992],1993 [YR1993],1994 [YR1994],1995 [YR1995],1996 [YR1996],1997 [YR1997],1998 [YR1998],1999 [YR1999],2000 [YR2000],...,2006 [YR2006],2007 [YR2007],2008 [YR2008],2009 [YR2009],2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015]
Country Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,,,,,,,,,,,...,2.107124,1.900000,1.840000,3.550000,4.00,5.000000,5.454545,5.90000,6.39000,
Albania,,,,,0.011169,0.032197,0.048594,0.065027,0.081437,0.114097,...,9.609991,15.036115,23.860000,41.200000,45.00,49.000000,54.655959,57.20000,60.10000,
Algeria,,,,0.000361,0.001769,0.001739,0.010268,0.020239,0.199524,0.491706,...,7.375985,9.451191,10.180000,11.230000,12.50,14.000000,15.228027,16.50000,18.09000,
American Samoa,,,,,,,,,,,...,,,,,,,,,,
Andorra,,,,,,1.526601,3.050175,6.886209,7.635686,10.538836,...,48.936847,70.870000,70.040000,78.530000,81.00,81.000000,86.434425,94.00000,95.90000,
Angola,,,,,,0.000776,0.005674,0.018454,0.071964,0.105046,...,1.907648,3.200000,4.600000,6.000000,10.00,14.776000,16.937210,19.10000,21.26000,
Antigua and Barbuda,,,,,2.200769,2.858450,3.480537,4.071716,5.300681,6.482226,...,30.000000,34.000000,38.000000,42.000000,47.00,52.000000,58.000000,63.40000,64.00000,
Argentina,,0.002993,0.029527,0.043706,0.086277,0.141955,0.280340,0.830767,3.284482,7.038683,...,20.927202,25.946633,28.112623,34.000000,45.00,51.000000,55.800000,59.90000,64.70000,
Armenia,,,,0.009117,0.052743,0.094573,0.111651,0.128659,0.970738,1.300470,...,5.631788,6.021253,6.210000,15.300000,25.00,32.000000,37.500000,41.90000,46.30000,
Aruba,,,,,,2.768383,,,4.506179,15.442823,...,28.000000,30.900000,52.000000,58.000000,62.00,69.000000,74.000000,78.90000,83.78000,


# Plotting this with Bokeh 

We follow the [gapminder](https://anaconda.org/bokeh/gapminder/notebook) example. We need to set up the data sources for the Bokeh plot and then draw it.

##  Setting up the data sources

In [20]:
from bokeh.models import ColumnDataSource

In [21]:
POP_SCALING = lambda x:  np.sqrt(x / np.pi) / 200

years = df_pop.columns

year_label = lambda year: "_" + year.split()[0]

sources = {}

for year in years:
    population = POP_SCALING(df_pop[year])
    population.name = 'population' 
    
    internet = df_internet[year]
    internet.name = 'internet users'

    gdp = df_gdp[year]
    gdp.name = 'gdp (2005 $)'
    
    new_df = pd.concat([internet, gdp, population], axis=1)
    new_df.index.name = 'country'
    sources[year_label(year)] = ColumnDataSource(new_df)
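
A note on `POP_SCALING`: it makes the bubble size proportional to the square root of the population, so the bubble's area grows linearly with population. The sketch below restates the same formula in pure Python; the name `bubble_size` is made up here, and the divisor of 200 is taken from the cell above, presumably chosen to give reasonable sizes on screen.

```python
import math


def bubble_size(population, divisor=200):
    """Square-root scaling: radius of a disc of area `population`, shrunk by `divisor`."""
    return math.sqrt(population / math.pi) / divisor


small = bubble_size(1_000_000)
large = bubble_size(4_000_000)

# Four times the population gives twice the size, hence four times the area.
print(small, large, large / small)
```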

In [22]:
sources

{'_1991': <bokeh.models.sources.ColumnDataSource at 0x8ec5860>,
 '_1992': <bokeh.models.sources.ColumnDataSource at 0x8ec5cf8>,
 '_1993': <bokeh.models.sources.ColumnDataSource at 0x8ec5eb8>,
 '_1994': <bokeh.models.sources.ColumnDataSource at 0x8ec5e80>,
 '_1995': <bokeh.models.sources.ColumnDataSource at 0x8ec5be0>,
 '_1996': <bokeh.models.sources.ColumnDataSource at 0x8ec5e48>,
 '_1997': <bokeh.models.sources.ColumnDataSource at 0x8ec5a90>,
 '_1998': <bokeh.models.sources.ColumnDataSource at 0x8ed4978>,
 '_1999': <bokeh.models.sources.ColumnDataSource at 0x8ed4ba8>,
 '_2000': <bokeh.models.sources.ColumnDataSource at 0x8ed4dd8>,
 '_2001': <bokeh.models.sources.ColumnDataSource at 0x8ed4a20>,
 '_2002': <bokeh.models.sources.ColumnDataSource at 0x8ec5dd8>,
 '_2003': <bokeh.models.sources.ColumnDataSource at 0x8ec5da0>,
 '_2004': <bokeh.models.sources.ColumnDataSource at 0x8ec58d0>,
 '_2005': <bokeh.models.sources.ColumnDataSource at 0x8ed45c0>,
 '_2006': <bokeh.models.sources.ColumnDa

Let's inspect one of these sources:

In [23]:
src = sources['_1991']

In [24]:
src.column_names

['gdp (2005 $)', 'population', 'internet users', 'country']

Let's build the dict that maps each year to the name of its data source:

In [25]:
year_int = lambda year: int(year.split()[0])
dictionary_of_sources = dict(zip([year_int(year) for year in years], [year_label(year) for year in years]))
js_source_array = str(dictionary_of_sources).replace("'", "")

In [26]:
js_source_array

'{1991: _1991, 1992: _1992, 1993: _1993, 1994: _1994, 1995: _1995, 1996: _1996, 1997: _1997, 1998: _1998, 1999: _1999, 2000: _2000, 2001: _2001, 2002: _2002, 2003: _2003, 2004: _2004, 2005: _2005, 2006: _2006, 2007: _2007, 2008: _2008, 2009: _2009, 2010: _2010, 2011: _2011, 2012: _2012, 2013: _2013, 2014: _2014, 2015: _2015}'
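
The quotes are stripped on purpose: the string is later pasted into JavaScript, where `_1991` must be a bare identifier referring to a data source rather than a string literal. The underscore prefix is there because an identifier cannot start with a digit. A small standalone sketch with three made-up column labels:

```python
year_columns = ['1991 [YR1991]', '1992 [YR1992]', '1993 [YR1993]']


def year_int(column):
    # '1991 [YR1991]' -> 1991
    return int(column.split()[0])


def year_label(column):
    # '1991 [YR1991]' -> '_1991'
    return "_" + column.split()[0]


mapping = {year_int(col): year_label(col) for col in year_columns}

# Removing the quotes turns the Python repr into a JavaScript object literal
# whose values are variable names instead of strings.
js_object = str(mapping).replace("'", "")
print(js_object)  # {1991: _1991, 1992: _1992, 1993: _1993}
```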

## Setting up the plot 

In [27]:
import bokeh.plotting as bp
bp.output_notebook()

### Axes 

First, we set up the axes.

In [28]:
from bokeh.models import Range1d, Plot, LinearAxis, SingleIntervalTicker

In [44]:
# Set up the plot
xdr = Range1d(1, 60000)
ydr = Range1d(-10, 110)
plot = Plot(
    x_range=xdr,
    y_range=ydr,
    title="",
    plot_width=800,
    plot_height=400,
    outline_line_color=None,
    toolbar_location=None,    
)
AXIS_FORMATS = dict(
    minor_tick_in=None,
    minor_tick_out=None,
    major_tick_in=None,
    major_label_text_font_size="10pt",
    major_label_text_font_style="normal",
    axis_label_text_font_size="10pt",

    axis_line_color='#AAAAAA',
    major_tick_line_color='#AAAAAA',
    major_label_text_color='#666666',

    major_tick_line_cap="round",
    axis_line_cap="round",
    axis_line_width=1,
    major_tick_line_width=1,
)

xaxis = LinearAxis(ticker=SingleIntervalTicker(interval=10000), axis_label="Gdp (2005 $)", **AXIS_FORMATS)
yaxis = LinearAxis(ticker=SingleIntervalTicker(interval=10), axis_label="Internet usage (per 100 people)", **AXIS_FORMATS)   
plot.add_layout(xaxis, 'below')
plot.add_layout(yaxis, 'left')

Let's have a look at the current format of the plot we are building:

In [45]:
bp.show(plot)

### Background text with the year 

Let's now add the background text:

In [46]:
from bokeh.models import Text

In [48]:
# Add the year in background (add before circle)
text_source = ColumnDataSource({'year': [str(year_int(year)) for year in years[:1]]})
text = Text(x=(xdr.end + xdr.start) / 10, y=20, text='year', text_font_size='150pt', text_color='#EEEEEE')
plot.add_glyph(text_source, text)

<bokeh.models.renderers.GlyphRenderer at 0x61a7b38>

Let's look at our plot in its current form:

In [49]:
bp.show(plot)

### Bubbles and hover

Our plot is now missing two things: the renderer for the bubbles, whose size represents each country's population, and the hover tool that identifies the country when the mouse hovers over a bubble.

In [61]:
from bokeh.models import Circle, HoverTool

In [62]:
# Add the circle glyphs
renderer_source = sources[year_label(years[0])]
circle_glyph = Circle(
    x='gdp (2005 $)', y='internet users', size='population',
    fill_color='#7c7e71', fill_alpha=0.8, 
    line_color='#7c7e71', line_width=0.5, line_alpha=0.5)
circle_renderer = plot.add_glyph(renderer_source, circle_glyph)

# Add the hover (only against the circle and not other plot elements)
tooltips = "@country"
plot.add_tools(HoverTool(tooltips=tooltips, renderers=[circle_renderer]))

Let's inspect the columns of the renderer's data source:

In [52]:
renderer_source.column_names

['gdp (2005 $)', 'population', 'internet users', 'country']

Again, let's check our plot:

In [53]:
bp.show(plot)

### Making it interactive using a slider and final plot

This is the tricky part of this plot: it uses JavaScript to change the data source of the plot dynamically.

In [54]:
from bokeh.models import CustomJS, Slider

In [55]:
years_label = [year_int(year) for year in years]

In [56]:
# Add the slider
code = """
    var year = slider.get('value'),
        sources = %s,
        new_source_data = sources[year].get('data');
    renderer_source.set('data', new_source_data);
    text_source.set('data', {'year': [String(year)]});
""" % js_source_array

callback = CustomJS(args=sources, code=code)
slider = Slider(start=years_label[0], end=years_label[-1], value=years_label[0], step=1, title="Year", callback=callback, name='testy')
callback.args["renderer_source"] = renderer_source
callback.args["slider"] = slider
callback.args["text_source"] = text_source
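
To make the JavaScript easier to follow, here is a plain-Python stand-in for what the callback does each time the slider moves. It does not use Bokeh at all: ordinary dictionaries play the role of the data sources, the values are made up, and the function name `on_slider_change` is invented for this sketch.

```python
def on_slider_change(year, sources, renderer_source, text_source):
    """Plain-Python imitation of the JavaScript callback (illustration only)."""
    # Look up the precomputed data for the selected year ...
    new_data = sources[year]['data']
    # ... hand it to the source the circles are drawn from ...
    renderer_source['data'] = new_data
    # ... and update the large year label in the background.
    text_source['data'] = {'year': [str(year)]}


# Dictionaries standing in for the per-year data sources (made-up values).
sources = {
    1991: {'data': {'country': ['A'], 'internet users': [0.0]}},
    1992: {'data': {'country': ['A'], 'internet users': [0.1]}},
}
renderer_source = {'data': sources[1991]['data']}
text_source = {'data': {'year': ['1991']}}

on_slider_change(1992, sources, renderer_source, text_source)
print(renderer_source['data'], text_source['data'])
```

The point of the design is that all per-year data is shipped to the browser once; moving the slider only swaps which precomputed table the circle renderer reads from.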

Finally, let's show the result, with the slider placed below the plot.

In [57]:
from bokeh.plotting import vplot
from bokeh.resources import JSResources

In [60]:
# Stick the plot and the slider together
layout = vplot(plot, slider, )
bp.show(layout)

# Discussion 

In the end, I've managed to replicate the interactive plot that can be found on Ramiro's website. I've also succeeded in integrating a slider that, in my opinion, gives a better view of the passing of time. However, I'm not sure what insight this plot leads to. I guess it shows the dramatic spread of internet usage since the 1990s.

This post was entirely written using the IPython notebook. Its content is BSD-licensed. You can see a static view or download this notebook with the help of nbviewer at [20160526_InternetUsersBokeh.ipynb](http://nbviewer.ipython.org/urls/raw.github.com/flothesof/posts/master/20160526_InternetUsersBokeh.ipynb).