# Visualizing Gapminder Data Using Bokeh

We will create an interactive data visualization in Python that shows how different countries progress over time, with a play button, a year slider, and selectable X and Y axes.

Bokeh is a flexible, open-source tool for building interactive, shareable plots, which makes it preferable when we want an interactive plot instead of a static one.


<a href="https://www.internetgeography.net/wp-content/uploads/2018/09/GAPMINDER-1030x575.jpg">Gapminder</a>

![image info](https://www.internetgeography.net/wp-content/uploads/2018/09/GAPMINDER-1030x575.jpg)

## Let's begin

Make sure Bokeh is installed on your computer. If not, you can install it using pip:

In [1]:
#pip install bokeh

**Step 1: First we import the packages that we need from Bokeh and pandas**

In [2]:
# Import pandas
import pandas as pd


# Import bokeh
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource, CategoricalColorMapper
from bokeh.palettes import Spectral6

# import HoverTool
from bokeh.models import HoverTool

# import Slider
from bokeh.models import Slider, Select
from bokeh.layouts import column, row

# Import curdoc
from bokeh.io import curdoc

#import the Button
from bokeh.models import Button

* We use `ColumnDataSource` to provide the **data source** for the Bokeh plots and `CategoricalColorMapper` to **color code** the different geographical regions using the `Spectral6` palette.
* `HoverTool` adds a **hover tooltip** that shows extra information when we hover over a data point.
* `Slider` adds a **slider** with the parameters we specify.
* `Select` adds a **dropdown menu**.
* `curdoc` lets us run the code on a **Bokeh server** to make it interactive. A Bokeh server uses application code written in Python to create Bokeh documents. Every new connection from a client browser results in the Bokeh server creating a new document, just for that session.
* `Button` adds a play-pause button.

**Step 2: Import the Gapminder dataset `gapminder.csv`, which we have uploaded here, into a pandas `DataFrame`.**

In [3]:
# Import data
gm_data = pd.read_csv("gapminder.csv", header=0, index_col="Year")
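Because `Year` is the index and every year appears once per country, `gm_data.loc[year, column]` returns one value per country for that year. The sketch below illustrates this on a made-up four-row table (the countries and numbers are hypothetical, not taken from `gapminder.csv`):

```python
import pandas as pd

# Hypothetical miniature of the dataset: two countries, two years.
toy = pd.DataFrame(
    {
        "Year": [1964, 1964, 1965, 1965],
        "Country": ["A", "B", "A", "B"],
        "fertility": [7.0, 3.0, 6.9, 2.9],
        "life": [33.0, 70.0, 33.5, 70.2],
    }
).set_index("Year")

# The index is not unique, so .loc[year, column] returns a Series
# with one entry per country for that year.
fertility_1964 = toy.loc[1964, "fertility"]
print(fertility_1964.tolist())

# Axis limits are computed over all years, not just the selected one.
xmin, xmax = toy["fertility"].min(), toy["fertility"].max()
print(xmin, xmax)
```

This is the same selection pattern the later cells use to fill the plot's data source for a single year.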

**Step 3: We now define the plot axes and axis labels. 
        <br> We will use a `dictionary` for the axis labels so that the plot shows clear labels when we update the axes.**

In [4]:
# Define initial plot axes
# Fertility on X axis and Life span on Y axis

x = 'fertility'
xmin, xmax = min(gm_data[x]), max(gm_data[x])

y = 'life'
ymin, ymax = min(gm_data[y]), max(gm_data[y])

year = min(gm_data.index)


# Define a dictionary of labels to make axis labels user friendly

label= {
    'fertility' : 'Fertility (children per woman)',
    'life' : 'Life Expectancy (years)',
    'child_mortality' : 'Child mortality rate',
    'gdp': 'Country GDP'
        }


# Define bokeh data source
source = ColumnDataSource(data={
    'x'       : gm_data.loc[year, x],
    'y'       : gm_data.loc[year, y],
    'country' : gm_data.loc[year, 'Country'],
    'pop'     : gm_data.loc[year, 'population'],
    'pop_size': gm_data.loc[year, 'population'] / 20000000 + 2,
    'region'  : gm_data.loc[year, 'region'],
})


# Make a color mapper for each region in dataset
regions_list = gm_data['region'].unique().tolist()
color_mapper = CategoricalColorMapper(factors=regions_list, palette=Spectral6)
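A `ColumnDataSource` is essentially a mapping from column names to equal-length columns, and its whole `data` mapping can be replaced in one assignment. A minimal sketch with made-up values (not from the dataset) looks like this:

```python
from bokeh.models import ColumnDataSource

# Hypothetical values for two countries in one year.
source = ColumnDataSource(data={
    "x": [7.0, 3.0],
    "y": [33.0, 70.0],
    "country": ["A", "B"],
})

# Replace every column at once, keeping the same column names
# so that glyphs and tooltips that refer to them keep working.
source.data = {
    "x": [6.9, 2.9],
    "y": [33.5, 70.2],
    "country": ["A", "B"],
}

print(sorted(source.column_names))
```

Assigning a complete dictionary keeps all columns the same length, which is what the callback in Step 5 relies on.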



**Step 4: Create the figure.**
   <br>
  * Here we draw the scatter plot using `plot.circle()`. 
  * We use fertility as the x axis, life expectancy as the y axis, and population as the dot size.
  * We use `pop_size`, a scaled-down version of the population, so that the dots fit in the plot.
  * `output_file()` can save the plot as an .html file,
  * and `show(plot)` displays it. The cell below leaves both out because the final application is served through `curdoc` (Step 7).
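To see why the raw population cannot be used as the dot size, the scaling from the data source (`population / 20000000 + 2`) can be tried on a few made-up populations. The helper name `marker_size` and the sample values are assumptions for illustration only:

```python
def marker_size(population, scale=20_000_000, base=2):
    """Scale a population to a dot size, as done for the 'pop_size' column."""
    return population / scale + base

# Hypothetical populations: small, medium, and very large countries.
for name, pop in [("small", 500_000), ("medium", 60_000_000), ("large", 1_300_000_000)]:
    print(name, marker_size(pop))
```

The `+ 2` keeps even the smallest countries visible, while the division keeps the largest ones from covering the plot.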
  
**To add a hover tooltip:**
  * Initialize `HoverTool` with the information we want it to show. 
  * The `tooltips` parameter accepts a list of tuples, with the label on the left and the data on the right. 
  * As with the figure, the data comes from the `ColumnDataSource`: a value such as `@country` refers to the source column named `country`. 
  * Finally, use `plot.add_tools()` to add the hover tooltip you just made to the figure. 
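Since the same tooltip list is needed again whenever the axes change, it can help to build it in one place. The helper `make_tooltips` below is a hypothetical convenience function, not part of Bokeh or of the original notebook; it only assembles the list of tuples:

```python
# Same label dictionary as in Step 3.
label = {
    'fertility': 'Fertility (children per woman)',
    'life': 'Life Expectancy (years)',
    'child_mortality': 'Child mortality rate',
    'gdp': 'Country GDP',
}

def make_tooltips(x, y):
    """Return (label, value) tuples for the columns currently on the axes."""
    return [
        ('Country', '@country'),
        ('Region', '@region'),
        ('Population', '@pop{0,0}'),
        (label[x], '@x{0,0.00}'),
        (label[y], '@y{0,0.00}'),
    ]

tooltips = make_tooltips('gdp', 'life')
print(tooltips[3])
```

The resulting list could then be passed as `HoverTool(tooltips=make_tooltips(x, y))`.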

In [5]:
# Create the figure
# (height/width replace plot_height/plot_width, which Bokeh 3.0 removed)
plot = figure(title='Gapminder data, plotting {} vs {} for year {}'.format(label[y], label[x], year), height=700, width=1000,
              x_range=(xmin, xmax), y_range=(ymin, ymax))

plot.circle(x='x', y='y', fill_alpha=0.6, source=source,
            color=dict(field='region', transform=color_mapper), size='pop_size', legend_group='region')

plot.legend.location = 'top_right'

plot.xaxis.axis_label = label[x]

plot.yaxis.axis_label = label[y]

# create a HoverTool
hover = HoverTool(tooltips=[
    ('Country', '@country'),
    ('Region', '@region'),
    ('Population', '@pop{0,0}'),
    (label[x], '@x{0,0.00}'),
    (label[y], '@y{0,0.00}')
])

plot.add_tools(hover)



**Step 5: Function update_plot()**

Here we define the `update_plot()` function that the widgets use as their callback. 
This function:
* Fetches the current values of the slider and dropdown menus and uses them to build the new data for the plot's data source. 
* Updates the axis labels, axis ranges, and plot title to match. 

**We also need to update the `hover tool` by removing it and adding a newly created one, so that its labels match the selected axes. 
It is important to declare the hover tool as `global` for the function to work, since the original hover tool was defined outside of this function.**

In [6]:
# Define the callback function for interacting with the widgets
def update_plot(attr, old, new):
    #Use global hover to update HoverTool    
    global hover
    
    #Get updated values of current widgets (slider and dropdown menu)
    year = slider.value
    x = x_select.value
    y = y_select.value
    
    # Update axis Labels
    plot.xaxis.axis_label = label[x]
    plot.yaxis.axis_label = label[y]
    
    # Set new_data dictionary for the updated data
    new_data = {
        'x'       : gm_data.loc[year, x],
        'y'       : gm_data.loc[year, y],
        'country' : gm_data.loc[year, 'Country'],
        'pop'     : gm_data.loc[year, 'population'],
        'pop_size': gm_data.loc[year, 'population'] / 20000000 + 2,
        'region'  : gm_data.loc[year, 'region']
    }
    
    # Assign new_data to source.data
    source.data = new_data

    # Set the range of all axes
    plot.x_range.start = min(gm_data[x])
    plot.x_range.end = max(gm_data[x])
    plot.y_range.start = min(gm_data[y])
    plot.y_range.end = max(gm_data[y])

    # Add title to plot
    plot.title.text = 'Gapminder data, plotting {} vs {} for year {}'.format(label[y], label[x], year)
    
    # Updating the hover tools
    plot.tools.remove(hover)
    
    hover = HoverTool(tooltips=[
        ('Country', '@country'),
        ('Region', '@region'),
        ('Population', '@pop{0,0}'),
        (label[x], '@x{0,0.00}'),
        (label[y], '@y{0,0.00}')
    ])

    plot.add_tools(hover)
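The need for `global hover` is plain Python behavior rather than anything specific to Bokeh: a name that is assigned anywhere inside a function is treated as local, so reading it before that assignment fails. A small sketch with a string standing in for the hover tool (all names here are illustrative):

```python
current_tool = "hover for fertility"  # stands in for the module-level `hover`

def swap_without_global(name):
    old = current_tool  # fails: the name is local because it is assigned below
    current_tool = "hover for " + name
    return old

def swap_with_global(name):
    global current_tool
    old = current_tool  # reads the module-level value
    current_tool = "hover for " + name  # rebinds the module-level name
    return old

try:
    swap_without_global("gdp")
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

swap_with_global("gdp")
print(error_name, current_tool)
```

In `update_plot()`, the call `plot.tools.remove(hover)` plays the role of the read, and `hover = HoverTool(...)` plays the role of the assignment.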

**Step 6: Add a slider and dropdown menus**

To make the plot interactive, Bokeh lets us add a slider and dropdown menus as well. 

To add the slider and dropdown menus:
* To add the slider, we use `Slider` and give it the start, end, step, initial value, and title. 
* To add a dropdown menu, we use `Select` and give it a list of possible options, a default value, and a title. 
* To refresh the plot when we move the slider or use a dropdown menu, we use the `.on_change()` method.
* It registers a callback, here the user-defined function `update_plot()`, that runs whenever the watched property changes. 

Create plot layout:
* Using column and row, we design the layout of our plot. 

Finally, **we show the layout (not the plot) using `show(layout)`, which replaces show(plot).**

In [7]:
# Create a year slider
slider = Slider(start=min(gm_data.index), end=max(gm_data.index), step=1, value=min(gm_data.index), title='Year')

# Attach the update_plot callback to slider
slider.on_change('value', update_plot)

# Create the list of the dropdown menus
dropdown = ['fertility', 'life', 'child_mortality', 'gdp']

# Create a dropdown Select widget for the x data
x_select = Select(
    options=dropdown,
    value=dropdown[0],
    title='x-axis data'
)

# Attach the update_plot callback to x dropdown
x_select.on_change('value', update_plot)

# Create a dropdown Select widget for the y data
y_select = Select(
    options=dropdown,
    value=dropdown[1],
    title='y-axis data'
)

# Attach the update_plot callback to y dropdown
y_select.on_change('value', update_plot)
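Callbacks passed to `.on_change()` take three arguments: the name of the changed property, its old value, and its new value. The sketch below records those arguments for a slider; the year range is made up, and it assumes that assigning to `slider.value` from Python also fires the callback outside a server, which is our understanding of current Bokeh behavior:

```python
from bokeh.models import Slider

seen = []

def record_change(attr, old, new):
    # Same signature as update_plot(attr, old, new).
    seen.append((attr, old, new))

# Hypothetical year range, for illustration only.
slider = Slider(start=1964, end=2013, step=1, value=1964, title='Year')
slider.on_change('value', record_change)

# Changing the watched property triggers the callback.
slider.value = 1965
print(seen)
```

`update_plot()` ignores these three arguments and reads the widget values directly, which is why one function can serve the slider and both dropdown menus.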


**Step 7: Bokeh server application**

 With `show(layout)` the widgets appear, but the plot stays static because the Python callbacks never run. 
 To use Python callbacks, we need a **Bokeh server application**.
 
 We need to run the code on a Bokeh server to make it interactive. 
 To do so:
  * we add the layout to `curdoc()` instead of calling `show(layout)`, as in the code below.


In [8]:
# Create play-pause button

#define the callback function to animate the slider when clicking the button
def animate_update():
    year = slider.value + 1
    if year > max(gm_data.index):
        year = min(gm_data.index)
    slider.value = year

#create global variable for callback
callback_animate = None

#define the callback function for clicking the button
def animate():
    global callback_animate
    if button.label == '► Play':
        button.label = '❚❚ Pause'
        callback_animate = curdoc().add_periodic_callback(animate_update, 200)
    else:
        button.label = '► Play'
        curdoc().remove_periodic_callback(callback_animate)

#create the button and set on-click callback
button = Button(label='► Play', width=60)
button.on_click(animate)

In [9]:
# Create layout and add to current document
layout = row(column(slider, button, x_select, y_select), plot)
curdoc().add_root(layout)
curdoc().title = 'Gapminder'
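The wrap-around in `animate_update()` can be checked on its own, without a server. The function `next_year` below is a hypothetical standalone version of that logic, and the year range is made up for the example:

```python
def next_year(current, first, last):
    """Advance one year, jumping back to the first year after the last."""
    year = current + 1
    if year > last:
        year = first
    return year

# Step through the end of a hypothetical 1964-2013 range.
year = 2011
visited = []
for _ in range(4):
    year = next_year(year, first=1964, last=2013)
    visited.append(year)

print(visited)
```

With the periodic callback set to 200 ms, the animation advances about five years per second and restarts from the first year after reaching the last.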

**Step 8: Run the Show**
 The code can be run from the notebook as well as from the terminal.

In [None]:
# Run on Bokeh server with bash code
!bokeh serve --show --port 5002 gapminder_bokehserver.ipynb

# Conclusion:
    
   Using Bokeh is more challenging than using Matplotlib or Seaborn, but the interactivity that Bokeh offers really makes the plot stand out compared to static plots.