<a href="https://colab.research.google.com/github/sandeep92134/The-Data-Visualization-Workshop-by-packt/blob/master/module%206/Activity6.02%3A%20Extending%20Plots%20with%20Widgets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Activity 6.02: Extending Plots with Widgets

This activity combines most of what you have already learned about Bokeh. You will also need the skills you have acquired while working with pandas for additional DataFrame handling.

We will create an interactive visualization that lets us explore the results of the 2016 Olympics in Rio. Our visualization will display each participating country in a coordinate system where the x-axis represents the number of medals won and the y-axis the number of athletes.

Using interactive widgets, we will be able to filter the displayed countries by both the maximum number of medals won and the maximum number of athletes.

#### Loading our dataset

In [1]:
# importing the necessary dependencies
import pandas as pd

In [2]:
# make bokeh display figures inside the notebook
from bokeh.io import output_notebook

output_notebook()

In [3]:
# loading the dataset with pandas
dataset = pd.read_csv('https://raw.githubusercontent.com/sandeep92134/The-Data-Visualization-Workshop-by-packt/master/module%206/datasets/olympia2016_athletes.csv')

In [4]:
# looking at the dataset
dataset.head()

Unnamed: 0,id,name,nationality,sex,dob,height,weight,sport,gold,silver,bronze
0,736041664,A Jesus Garcia,ESP,male,10/17/69,1.72,64.0,athletics,0,0,0
1,532037425,A Lam Shin,KOR,female,9/23/86,1.68,56.0,fencing,0,0,0
2,435962603,Aaron Brown,CAN,male,5/27/92,1.98,79.0,athletics,0,0,1
3,521041435,Aaron Cook,MDA,male,1/2/91,1.83,80.0,taekwondo,0,0,0
4,33922579,Aaron Gate,NZL,male,11/26/90,1.81,71.0,cycling,0,0,0


Our dataset contains the following columns: 

- `id`: unique id of the athlete
- `name`: name of the athlete
- `nationality`: nationality of the athlete
- `sex`: male or female
- `dob`: date of birth of the athlete
- `height`: height of the athlete
- `weight`: weight of the athlete
- `sport`: sport the athlete competes in
- `gold`: number of gold medals the athlete won
- `silver`: number of silver medals the athlete won
- `bronze`: number of bronze medals the athlete won

We want to use the `nationality`, `gold`, `silver`, and `bronze` columns to create a custom visualization that lets us dig through the Olympians.

---

#### Building an interactive visualization

There are many options when it comes to choosing which kind of interactivity to use.  
Since the goal of this activity is to give you a better understanding of configuring widgets and adding tooltips, we will focus on having only two widgets.

In the end, we will have a visualization that lets us filter countries by the number of medals they won and the number of athletes they sent to the Olympics and that, upon hovering over a single data point, shows more information about that country.

<img src="assets/plot.png" width="500" align="left"/>


In [5]:
# importing the necessary dependencies 
from bokeh.plotting import figure, show, ColumnDataSource
from ipywidgets import interact, widgets

As in the previous exercises, we need to do some data extraction first.  
In this activity, we will need:
- a list of unique countries from the dataset
- the number of athletes for each country
- the number of medals won by each country, split into gold, silver, and bronze

In [6]:
# extract countries and group olympians by country
# and the number of medals per country
countries = dataset['nationality'].unique()
athletes_per_country = dataset.groupby('nationality').size()
# select several columns with a list; the bare-tuple form fails on current pandas
medals_per_country = dataset.groupby('nationality')[['gold', 'silver', 'bronze']].sum()
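
To see what the two `groupby` calls produce, here is a minimal sketch on a hand-made frame. The `toy` frame, its country codes, and its values are made up for illustration and are not taken from the Olympics dataset.

```python
import pandas as pd

# Hypothetical toy data, not taken from the real dataset
toy = pd.DataFrame({
    'nationality': ['AAA', 'AAA', 'BBB'],
    'gold':   [1, 0, 0],
    'silver': [0, 2, 0],
    'bronze': [0, 0, 1],
})

# size() counts the rows (athletes) in each group and returns a Series
toy_athletes = toy.groupby('nationality').size()

# selecting several columns takes a list inside the brackets
toy_medals = toy.groupby('nationality')[['gold', 'silver', 'bronze']].sum()

print(toy_athletes['AAA'])          # 2 athletes for AAA
print(toy_medals.loc['AAA'].sum())  # 3 medals in total for AAA
```

Both results are indexed by the country code, which is why the later cells can look up a country with `[...]` or `.loc[...]`.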


Before we implement the plotting for this visualization, we want to set up our widgets and the `@interact` function that will later display the plot upon execution.

Execute the cell containing the empty `get_plot()` function and then move on to the widget creation. We will implement the function later.

`get_plot()` receives two arguments, `max_athletes` and `max_medals`. Both are integer values.  
First, we filter the list of countries that sent athletes to the Olympic Games.  
A country is kept only if both its number of athletes and its total number of medals are less than or equal to the maximum values passed as arguments.

Once we have the filtered list, we can create our data source. This data source is used both for the tooltips and for rendering the circle glyphs.

> **Note:**   
The Bokeh documentation covers how to set up and use tooltips in detail; try to make use of it: https://bokeh.pydata.org/en/latest/docs/user_guide/tools.html

Create a new plot using the `figure` method that has the following attributes:  
- title of 'Rio Olympics 2016 - Medal comparison'
- x_axis_label of 'Number of Medals'
- y_axis_label of 'Num of Athletes'

In [7]:
# creating the scatter plot
def get_plot(max_athletes, max_medals):
    filtered_countries=[]
    
    for country in countries:
        if (athletes_per_country[country] <= max_athletes and 
            medals_per_country.loc[country].sum() <= max_medals):
            filtered_countries.append(country)
        
    data_source=get_datasource(filtered_countries)
    TOOLTIPS=[
        ('Country', '@countries'),
        ('Num of Athletes', '@y'),
        ('Gold', '@gold'),
        ('Silver', '@silver'),
        ('Bronze', '@bronze')
    ]
    
    plot=figure(title='Rio Olympics 2016 - Medal comparison', 
                x_axis_label='Number of Medals',  
                y_axis_label='Num of Athletes',
                width=800,   # plot_width was removed in Bokeh 3.0
                height=500,  # plot_height was removed in Bokeh 3.0
                tooltips=TOOLTIPS)
    
    plot.circle('x', 'y', source=data_source, size=20, color='color', alpha=0.5)
    
    
    return plot  
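
The filtering loop in `get_plot` can also be expressed as a boolean mask. The following is a sketch of that alternative on made-up numbers, not the notebook's implementation; `toy_athletes`, `toy_medals`, and `filter_countries` are hypothetical names introduced for this example.

```python
import pandas as pd

# Hypothetical per-country counts, indexed by country code
toy_athletes = pd.Series({'AAA': 10, 'BBB': 200, 'CCC': 50})
toy_medals = pd.DataFrame(
    {'gold': [0, 20, 1], 'silver': [1, 15, 0], 'bronze': [0, 10, 2]},
    index=['AAA', 'BBB', 'CCC'],
)

def filter_countries(max_athletes, max_medals):
    # total medals per country; aligned with toy_athletes through the index
    total_medals = toy_medals.sum(axis=1)
    # keep a country only if it satisfies both limits
    mask = (toy_athletes <= max_athletes) & (total_medals <= max_medals)
    return toy_athletes[mask].index.tolist()

print(filter_countries(max_athletes=100, max_medals=5))  # ['AAA', 'CCC']
```

Both conditions have to hold at the same time, which is what the `and` in the loop version expresses as well.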

In order to display every country with a different color, we want to randomly create the colors with a six-digit hex code.  
The function below does exactly this.

In [8]:
# get a 6 digit random hex color to differentiate the countries better
import random

def get_random_color():
    return '#%06x' % random.randint(0, 0xFFFFFF)
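
A quick way to convince yourself that the format string always yields a valid color: `%06x` pads the hexadecimal number with leading zeros to six digits. In the sketch below, the seed is set only to make the run repeatable, and the regular-expression check is an addition for illustration.

```python
import random
import re

def get_random_color():
    # '%06x' zero-pads to six lowercase hex digits, e.g. 255 -> '0000ff'
    return '#%06x' % random.randint(0, 0xFFFFFF)

random.seed(0)  # optional; used here only for repeatable output
colors = [get_random_color() for _ in range(5)]

# every generated string is '#' followed by exactly six hex digits
assert all(re.fullmatch(r'#[0-9a-f]{6}', c) for c in colors)

print('#%06x' % 255)  # prints #0000ff
```

Because the colors are drawn independently, two countries can occasionally end up with the same or a very similar color.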

We will use a Bokeh `ColumnDataSource` to handle our data and make it easily accessible for our tooltips and glyphs.  
Since we want to display additional information in a tooltip, we need our data source to have:
- color field that holds one random color per filtered country
- countries field that holds the list of filtered down countries
- gold field that holds the number of gold medals for each country
- silver field that holds the number of silver medals for each country
- bronze field that holds the number of bronze medals for each country
- x field that holds the summed number of medals for each country
- y field that holds the number of athletes for each country

In [9]:
# build the datasource
def get_datasource(filtered_countries):
    return ColumnDataSource(data=dict(
        color=[get_random_color() for _ in filtered_countries],
        countries=filtered_countries,
        gold=[medals_per_country.loc[country]['gold'] for country in filtered_countries],
        silver=[medals_per_country.loc[country]['silver'] for country in filtered_countries],
        bronze=[medals_per_country.loc[country]['bronze'] for country in filtered_countries],
        x=[medals_per_country.loc[country].sum() for country in filtered_countries],
        y=[athletes_per_country.loc[country] for country in filtered_countries]
    ))
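
Bokeh expects every column of a `ColumnDataSource` to have the same length, since each index position describes one glyph. The sketch below builds the same fields (minus the random colors) as a plain dictionary and checks that property without needing Bokeh; `toy_medals`, `toy_athletes`, and `build_columns` are hypothetical stand-ins with made-up numbers.

```python
# Hypothetical per-country data standing in for the pandas objects
toy_medals = {
    'AAA': {'gold': 1, 'silver': 0, 'bronze': 2},
    'BBB': {'gold': 0, 'silver': 3, 'bronze': 0},
}
toy_athletes = {'AAA': 12, 'BBB': 7}

def build_columns(filtered_countries):
    # one list per field, each with one entry per filtered country
    return dict(
        countries=list(filtered_countries),
        gold=[toy_medals[c]['gold'] for c in filtered_countries],
        silver=[toy_medals[c]['silver'] for c in filtered_countries],
        bronze=[toy_medals[c]['bronze'] for c in filtered_countries],
        x=[sum(toy_medals[c].values()) for c in filtered_countries],
        y=[toy_athletes[c] for c in filtered_countries],
    )

columns = build_columns(['AAA', 'BBB'])
lengths = {name: len(values) for name, values in columns.items()}

# all columns must agree in length before they go into a data source
assert len(set(lengths.values())) == 1
print(columns['x'], columns['y'])  # [3, 3] [12, 7]
```

The field names are what the tooltip definitions refer to: `'@gold'` reads the `gold` column, `'@y'` the `y` column, and so on.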

Before we start to implement the plot with Bokeh, we want to set up our widgets.  
In this activity, we will use two `IntSlider` widgets that control the maximum number of athletes and medals a country may have in order to be displayed in the visualization.

We need two values in order to set up the widgets:
- the maximum number of medals of all the countries
- the maximum number of athletes of all the countries

In [10]:
# getting the max amount of medals and athletes of all countries
max_medals = medals_per_country.sum(axis=1).max()
max_athletes = athletes_per_country.max()
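
The `axis=1` argument matters here: it sums across the three medal columns for each country before taking the maximum. A small sketch on made-up numbers shows the difference from the default column-wise sum; `toy_medals` is a hypothetical frame.

```python
import pandas as pd

# Hypothetical medal counts, indexed by country code
toy_medals = pd.DataFrame(
    {'gold': [0, 20, 1], 'silver': [1, 15, 0], 'bronze': [0, 10, 2]},
    index=['AAA', 'BBB', 'CCC'],
)

# axis=1 adds gold + silver + bronze per country (one value per row)
row_totals = toy_medals.sum(axis=1)
print(row_totals.max())        # 45, the most medals won by a single country

# the default axis=0 adds up each column over all countries instead
print(toy_medals.sum().max())  # 21, the gold total over all countries
```

Only the first value is a sensible upper bound for a slider that limits the medals per country.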

Using those maximum numbers as the maximum for both widgets gives us reasonable slider values that adjust automatically if the number of athletes or medals in the dataset increases.

We need two `IntSlider` objects that handle the input for our `max_athletes` and `max_medals`.  
To match the target visualization, the `max_athletes_slider` should be displayed in a vertical orientation and the `max_medals_slider` in a horizontal orientation.  
In the visualization, they should be displayed as "Max. Athletes" and "Max. Medals".

In [11]:
# setting up the interaction elements
max_athletes_slider=widgets.IntSlider(
    value=max_athletes,
    min=0,
    max=max_athletes,
    step=1,
    description='Max. Athletes:',
    continuous_update=False,
    orientation='vertical',
    layout={'width': '100px'}
)

max_medals_slider=widgets.IntSlider(
    value=max_medals,
    min=0,
    max=max_medals,
    step=1,
    description='Max. Medals:',
    continuous_update=False,
    orientation='horizontal'
)

After setting up the widgets, we can define the function that will be called with each update of the interaction widgets.  
As seen in the previous exercise, we will use the `@interact` decorator for this.

Instead of value ranges or lists, we will provide the variable names of our already created widgets in the decorator.  
Since we have already set up the empty function that will return a plot above, we can call `show()` with the function call inside to show the result once it is returned from `get_plot`.

Once you've built the widgets, upon execution, you will see them displayed below the cell.  
We are now ready to **scroll up and implement the plotting** with Bokeh.

In [14]:
# creating the interact method 
@interact(max_athletes=max_athletes_slider, max_medals=max_medals_slider)
def get_olympia_stats(max_athletes, max_medals):
    show(get_plot(max_athletes, max_medals))


This is a nice example of how easily we can add widgets that help us explore our data.  
Tooltips are also a very useful way to make visualizations more interactive and especially more understandable by providing additional information for each data point.

**Note:**   
Think about what else you could add/change for this visualization. Maybe we also want to display information about how many male vs. female athletes there are for each country.
