<a href="https://colab.research.google.com/github/jotaborrajo/jota/blob/master/bokeh_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hands-on exercise interactive data visualization

By Jan Aerts, Visual Data Analysis Lab, Data Science Institute, UHasselt, http://vda-lab.be

![alt text](https://drive.google.com/uc?id=1nXc5zgZsQUAZnSNq4dvTp_gF4R0L-2j8)

In this tutorial, we will use the python library Bokeh (http://bokeh.pydata.org) to generate visualizations based on a flights dataset. This tutorial holds numerous code snippets that can be copy/pasted and modified for your own purpose. It is also available as an IPython notebook so you can run it interactively.
The contents of this tutorial are available under the **CC-BY license**. The tutorial is written in a very incremental way: we start with something simple, and gradually add little bits and pieces that allow us to make more complex visualizations. So make sure not to skip parts of the tutorial: everything depends on everything that precedes it.

As this is a Jupyter notebook, you can "run" each of the code blocks using **Shift-Enter**, and you should see the output appear underneath the code block. However, we will need to have some things installed (e.g. the bokeh library itself, the dataset, etc). For this we can use the `!command` approach. Don't worry about this too much, as it is not the point of this tutorial.

## Data visualisation and python
Many libraries exist to create data visualisations, in different programming languages. In the data visualisation community, javascript and [D3](http://d3js.org) are the most prominent. Since much data analysis is done in python, however, we will use a python library here, called [bokeh](http://bokeh.pydata.org).

## Plotting vs charting
Bokeh provides both a high-level language to create pre-defined charts (scatterplot, barchart, ...) and a low-level API which gives you more flexibility (lines, points, areas). In this tutorial, we will jump into the lower-level API so that you will be able to implement custom visuals as well. Moreover, the charts library has been deprecated and many simpler charts are now possible through the `bokeh.plotting` library.

## Installation of Bokeh

See http://bokeh.pydata.org/en/latest/docs/installation.html for dependencies and installation instructions.

To use bokeh, add `import bokeh` at the top of your script. Add `output_notebook()` if you want to see your plots inline with the rest of your notebook.
 
But first, we'll need to install the library on your system. You only have to do this once:

In [0]:
!pip install bokeh

## IMPORTANT - Bokeh API documentation
For documentation on arguments to use in any of the coding examples below, check the bokeh API:
* for `figure`: [https://bokeh.pydata.org/en/latest/docs/reference/plotting.html](https://bokeh.pydata.org/en/latest/docs/reference/plotting.html)
* for `plot`: [https://bokeh.pydata.org/en/latest/docs/reference/models/plots.html#bokeh.models.plots.Plot](https://bokeh.pydata.org/en/latest/docs/reference/models/plots.html#bokeh.models.plots.Plot)
* for `grid`: [https://bokeh.pydata.org/en/latest/docs/reference/models/grids.html](https://bokeh.pydata.org/en/latest/docs/reference/models/grids.html)
* for `circle` and other marks: [https://bokeh.pydata.org/en/latest/docs/reference/models/markers.html](https://bokeh.pydata.org/en/latest/docs/reference/models/markers.html) and [https://bokeh.pydata.org/en/latest/docs/reference/models/glyphs.html](https://bokeh.pydata.org/en/latest/docs/reference/models/glyphs.html)

**Make sure to have these pages open while you're learning bokeh**

## A minimal script

A minimal script is provided below. *In all scripts, the line numbering includes the first line with the comment, as well as any empty lines*

In [0]:
# script 1
from bokeh.plotting import *
output_notebook()

p = figure(plot_width=500, plot_height=500)
p.grid.grid_line_color=None
p.circle(x=100,y=150,size=20,fill_color="red")
p.rect(x=200,y=200,width=50,height=60,fill_color="green", line_color="red")
p.line(x=[10,280],y=[5,250],line_color="blue")
show(p)

Let’s walk through each line. The script is made up of a list of statements. The first line in the script is a comment: anything after a `#` will not be read and interpreted. Line 2 loads the bokeh plotting library. The third line makes sure that we will see the actual plots inline in our notebook. The magic starts to happen at line 5, where we **create a new empty figure** and set its dimensions. In this case, we'll generate a picture of 500x500 pixels (see [here](https://bokeh.pydata.org/en/latest/docs/reference/models/plots.html#bokeh.models.plots.Plot) for information on arguments used). On line 6, we remove the gridlines that are shown by default (see [here](https://bokeh.pydata.org/en/latest/docs/user_guide/styling.html#grids) for info on grids). See what happens if you remove this line... Next, we add a **circle** to the picture with the (horizontal) x-position set to 100 and the (vertical) y-position set to 150. Its `size` (the diameter, in pixels) is set to 20, and we colour it red. There are 2 types of colour commands: **`fill_color`** defines the colour of the mass of a marker, whereas **`line_color`** refers to the colour of the line around the object. For a circle, therefore, the `fill_color` sets the colour of the *disk* whereas `line_color` defines the colour of the circle around the disk.
Next, we draw a **rectangle** with its center position set at (200,200), a width of 50 and a height of 60 (both in data units, like the position), and make it green with a red outline. We also draw a **line** from the point (10,5) to the point (280,250). The first parameter of the `p.line` command should be a list of the x-positions; the second parameter a list of the y-positions. Finally, in line 10, we show the actual picture.
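
Because `p.line` wants one list of x-positions and one list of y-positions, points that you have as (x, y) pairs need to be split up first. The snippet below is a small sketch of that step in plain python; the `points` list is made up for illustration and bokeh itself is not needed to run it.

```python
# Hypothetical corner points of a line, written as (x, y) pairs.
points = [(10, 5), (150, 120), (280, 250)]

# Split the pairs into a list of x-positions and a list of y-positions,
# which is the shape that p.line(x=..., y=...) expects.
xs = [x for x, y in points]
ys = [y for x, y in points]

print(xs)  # [10, 150, 280]
print(ys)  # [5, 120, 250]
```

With these two lists, the call would look like `p.line(x=xs, y=ys, line_color="blue")`.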

A full list of all parameters you can use for each of these marks (circle, rectangle, line, ...) is available at **http://bokeh.pydata.org/en/latest/docs/reference/plotting.html**.

## Variables, loops and conditionals

What if we want to do something multiple times? Suppose we want to draw 10 lines underneath each other. We could do that like this:

In [0]:
# script 2
from bokeh.plotting import *
output_notebook()

p = figure(plot_width=500,plot_height=300,x_range=[0,500],y_range=[0,200])
p.grid.grid_line_color=None
p.line(x=[100,400],y=[0,0])
p.line(x=[100,400],y=[10,10])
p.line(x=[100,400],y=[20,20])
p.line(x=[100,400],y=[30,30])
p.line(x=[100,400],y=[40,40])
p.line(x=[100,400],y=[50,50])
p.line(x=[100,400],y=[60,60])
p.line(x=[100,400],y=[70,70])
p.line(x=[100,400],y=[80,80])
p.line(x=[100,400],y=[90,90])
show(p)

In this plot, we added `x_range=[0,500],y_range=[0,200]`. Why is this? By default, bokeh sets the axis ranges so that the plot zooms in on the glyphs it draws. For this exercise, however, we want to see the whole plot: its full width and height.

Of course this code is not optimal: what if we have 5,000 datapoints to plot? To handle real data, we will need variables, loops, and conditionals.
In the code block above, we can replace the hard-coded numbers with variables. These can be integers, floats, strings, arrays, etc. The statement `start_x = 100` below means “create a variable called `start_x`, and give it the value of 100”.

In [0]:
# script 3
from bokeh.plotting import *
output_notebook()

start_x = 100
stop_x = 400

p = figure(plot_width=500,plot_height=300,x_range=[0,500],y_range=[0,300])
p.grid.grid_line_color=None
for i in range(0,10):
    p.line(x=[start_x,stop_x],y=[i*10,i*10])
show(p)

In script 3, we added 2 new variables (called `start_x` and `stop_x`) to illustrate the use of variables. Also, we replaced all the lines from script 2 that look the same with a single `for`-loop (lines 10 and 11). In this loop, we go through the numbers 0 to 9, put these in a variable called `i`, and use that variable in line 11. The first time the loop is run, `i` will be 0, so that line will effectively be `p.line([start_x,stop_x],[0,0])`; in the second loop, `i` will be 1, so line 11 will become `p.line([start_x,stop_x],[10,10])`.
**Whitespace** is very important in python: it defines code blocks. In the `for`-loop above, line 11 is indented relative to line 10. This means that line 11 is *part of* the block started at line 10. In other languages (java, C, perl, ruby, ...), you will for example see curly brackets (`{}`) used instead of whitespace to define blocks. A `for`-loop in java for example can look like this:
``` {.java}
for ( int i = 0; i < 10; i++ ) { System.out.println("Variable i is now " + i); }
```
As you can see, in java curly brackets have the same role as whitespace in python. This also means that the following code would not work, and will give an `IndentationError`:
``` {.python}
for i in range(0,9):
print(i)
```
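
To check what a loop like the one in script 3 actually produces, it can help to print the values before drawing anything. The following is a minimal sketch in plain python (no bokeh needed). Note that `range(a, b)` stops just before `b`, so `range(0, 10)` gives the ten numbers 0 to 9.

```python
start_x = 100
stop_x = 400

# range(0, 10) yields 0, 1, ..., 9: the stop value itself is excluded.
for i in range(0, 10):
    y = i * 10
    # the indented lines form the body of the loop
    print("line", i, "runs from", (start_x, y), "to", (stop_x, y))

# The same y-positions collected in a list.
ys = [i * 10 for i in range(0, 10)]
print(ys)
```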

We can use **conditionals** to, for example, distinguish odd and even lines by colour.

In [0]:
# script 4
from bokeh.plotting import *
output_notebook()

start_x = 100
stop_x = 400

p = figure(plot_width=500,plot_height=300,x_range=[0,500],y_range=[0,300])
p.grid.grid_line_color=None
for i in range(0,9):
    if i%2 == 0:
        p.line(x=[start_x,stop_x],y=[i*10,i*10],line_color="red")
    else:
        p.line(x=[start_x,stop_x],y=[i*10,i*10],line_color="blue")
show(p)

In this code snippet, we check in each loop if `i` is even or odd, and let the line colour depend on that result. An if-clause has the following form:
``` {.python}
if *condition*:
    # do something
else:
    # do something else
```

The condition `i%2 == 0` means: does dividing the number `i` by 2 leave a remainder of zero? Note that we have to use 2 equal-signs here (`==`) instead of just one (`=`). This distinguishes a **test for equality** from an **assignment**. Mixing the two up is a common mistake...
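
As a small illustration of the remainder test and of the difference between `=` and `==`, the sketch below prints the remainder and the chosen colour for the first few values of `i`. The variable names `remainder`, `colour` and `colours` are only used here for illustration.

```python
colours = []
for i in range(0, 6):
    remainder = i % 2        # '=' assigns: 0 for even numbers, 1 for odd numbers
    if remainder == 0:       # '==' compares and gives True or False
        colour = "red"
    else:
        colour = "blue"
    colours.append(colour)
    print(i, remainder, colour)
```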

## Exercise data

The data for this exercise concerns flight information between different cities. Each entry in the dataset contains the following fields:
* from_airport
* from_city
* from_country
* from_long
* from_lat
* to_airport
* to_city
* to_country
* to_long
* to_lat
* airline
* airline_country
* distance

### Getting the data

You can download the data from `http://vda-lab.github.io/assets/flights.csv`. As this is a comma-separated file, we will import the `csv` library. Now how do we go through such a file? Here's a typical code snippet:
``` {.python}
import csv
f = open('flights.csv')
reader = csv.reader(f)
for line in reader:
    pass  # do something with the line
```
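
To try this pattern without the real file, the sketch below reads two made-up rows from a string instead. The rows only mimic the 13-column layout listed above; the values are placeholders and do not come from `flights.csv`. It also shows two details that matter later on: `next(reader)` consumes the header row, and every field comes back as a string, so numbers need to be converted with `float`.

```python
import csv
import io

# Two made-up rows in the same 13-column layout as flights.csv.
# The values are placeholders for illustration, not real records.
sample = io.StringIO(
    "from_airport,from_city,from_country,from_long,from_lat,"
    "to_airport,to_city,to_country,to_long,to_lat,"
    "airline,airline_country,distance\n"
    "Airport A,City A,Country X,4.5,50.9,"
    "Airport B,City B,Country Y,-73.8,40.6,Airline 1,Country X,5900\n"
    "Airport C,City C,Country X,5.4,50.6,"
    "Airport A,City A,Country X,4.5,50.9,Airline 2,Country X,70\n"
)

reader = csv.reader(sample)
header = next(reader)   # read the header row so the loop starts at the data

rows = []
for line in reader:
    # each line is a list of strings; convert the coordinates to numbers
    from_airport = line[0]
    from_long = float(line[3])
    from_lat = float(line[4])
    rows.append((from_airport, from_long, from_lat))
    print(from_airport, from_long, from_lat)
```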

But first let's download the data locally.

In [0]:
!wget http://vda-lab.github.io/assets/flights.csv
!head flights.csv
!tail flights.csv

### Accessing the data from Python

Let’s write a small script to visualize this data. The visual encoding that we’ll use for each flight will be the following:

* x position is defined by longitude of departure airport
* y position is defined by latitude of departure airport

In [0]:
# script 5
import csv
from bokeh.plotting import *
output_notebook()

def normalize(input, domain_start, domain_stop, range_start, range_stop):
    return (range_stop - range_start)*((input-domain_start)/(domain_stop-domain_start))+range_start

f = open('flights.csv')
reader = csv.reader(f)
next(reader)

xs = []
ys = []

for line in reader:
    [from_airport, from_city, from_country, from_long, from_lat,
     to_airport, to_city, to_country, to_long, to_lat,
     airline, airline_country, distance] = line
    xs.append(normalize(float(from_long),-180,180,0,800))
    ys.append(normalize(float(from_lat),-180,180,0,400))
p = figure(plot_width=800,plot_height=400,title="Departure airports")
p.grid.grid_line_color=None
p.circle(x=xs,y=ys,line_color=None,fill_color="blue",fill_alpha=0.1)
show(p)

You can see that the resulting image shows a map of the world, with areas with high airport density clearly visible. Notice that the data itself does not contain any information where the continents, oceans and coasts are; still, these are clearly visible in the image.

Let's go through the code:

* Line 2: we load the `csv` module.
* Lines 6-7: we define a `normalize` function. More about that a bit later, when we discuss lines 20 and 21.
* Lines 9-11: we open the file, create a csv reader for it, and skip the header line.
* Lines 13-14: Just like `p.line` in script 1, the `circle` command takes a list of x-positions and a list of y-positions. Here, we define these 2 lists.
* Line 16: Here's where we start looping over the lines in the file. For each line, the code block from line 17 to line 21 will be executed.
* Lines 17-19: We put the value of each column for that particular row in a separate variable.
* Lines 20-21: We append the value from `from_long` to the list of xs, and the value of `from_lat` to the list of ys. However, we need to normalize these: longitudes range from -180 to 180, but we need pixel positions between 0 and 800. That's where the `normalize` function comes in. The values -180 and 180 define the *domain* of the data, whereas 0 and 800 define the *range*. (Latitudes only span -90 to 90, so with the domain of -180 to 180 used here they end up between 100 and 300 rather than filling 0 to 400.) Other plotting libraries (e.g. processing.org (java) or p5js.org (javascript)) provide a function (called `map`) that does this as well. See the figure below.
* Line 24: For `p.circle`, we provide several arguments. The first two should each be a list, holding the x and y coordinates respectively. This is similar to the behaviour in R, where these lists are basically independent.

![alt text](https://drive.google.com/uc?id=1ohfelY9Fh2p3jamNmQe-oDPUCSujQ-9R)
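
A few worked values make the mapping from domain to range concrete. The sketch below repeats the `normalize` formula from script 5 (with the first argument renamed to `value`, and an intermediate variable `fraction` added for readability) and assumes python 3 division.

```python
def normalize(value, domain_start, domain_stop, range_start, range_stop):
    # how far along the domain the value lies, as a fraction between 0 and 1
    fraction = (value - domain_start) / (domain_stop - domain_start)
    # the same fraction of the way along the range
    return (range_stop - range_start) * fraction + range_start

west = normalize(-180, -180, 180, 0, 800)     # start of the domain
middle = normalize(0, -180, 180, 0, 800)      # halfway through the domain
east = normalize(180, -180, 180, 0, 800)      # end of the domain
north = normalize(90, -180, 180, 0, 400)      # a latitude of 90 degrees

print(west, middle, east, north)  # 0.0 400.0 800.0 300.0
```

The start of the domain lands on the start of the range, the end on the end, and everything in between is placed proportionally.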

In the plot above, we get a lot of overplotting: there are often many flights leaving from the same `from_airport`, and we draw a circle for each of them. It is better to keep track of the airports that we have already drawn.

In [0]:
from bokeh.plotting import *
output_notebook()
import csv

xs = []
ys = []
from_airports = []

f = open('flights.csv')
reader = csv.reader(f)
next(reader)

for line in reader:
    [from_airport, from_city, from_country, from_long, from_lat,
     to_airport, to_city, to_country, to_long, to_lat,
     airline, airline_country, distance] = line
    if from_airport not in from_airports:
        from_airports.append(from_airport)
        xs.append(normalize(float(from_long),-180,180,0,800))
        ys.append(normalize(float(from_lat),-180,180,0,400))
script5b_plot = figure(plot_width=800,plot_height=400,title="Departure airports")
script5b_plot.grid.grid_line_color=None
script5b_plot.scatter(xs,ys,line_color=None,fill_color="blue",fill_alpha=0.3)
show(script5b_plot)

The relevant lines here are line 7 (`from_airports = []`) and lines 17 to 20, where we only add a value to the xs and ys if that airport has not been seen yet.
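
Checking `not in` against a list means python has to walk through the list on every test, which gets slower as the list grows. As an alternative sketch (not what the script above does), the membership test can be done against a `set`, while a separate list keeps the airports in their original order. The airport codes below are made up.

```python
# Made-up sequence of departure airports, one entry per flight.
flights = ["AAA", "BBB", "AAA", "CCC", "BBB", "AAA"]

seen = set()           # only used for the fast "have we seen this?" test
unique_in_order = []   # first occurrence of each airport, in file order

for from_airport in flights:
    if from_airport not in seen:
        seen.add(from_airport)
        unique_in_order.append(from_airport)

print(unique_in_order)  # ['AAA', 'BBB', 'CCC']
```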

Let's do something more advanced now. In the next plot, we want to:
1. have a **different colour for domestic and international flights**
1. have the **size of the circle depend on the distance of the flight**: the longer the distance, the larger the radius

In [0]:
from bokeh.plotting import *
output_notebook()
import csv

f = open('flights.csv')
reader = csv.reader(f)
next(reader)

xs_dom = []
ys_dom = []
xs_int = []
ys_int = []
rs_dom = []
rs_int = []

for line in reader:
    [from_airport, from_city, from_country, from_long, from_lat,
     to_airport, to_city, to_country, to_long, to_lat,
     airline, airline_country, distance] = line
    if from_country == to_country:
        xs_dom.append(normalize(float(from_long),-180,180,0,800))
        ys_dom.append(normalize(float(from_lat),-180,180,0,400))
        rs_dom.append(normalize(float(distance),1,15406,3,15))
    else:
        xs_int.append(normalize(float(from_long),-180,180,0,800))
        ys_int.append(normalize(float(from_lat),-180,180,0,400))
        rs_int.append(normalize(float(distance),1,15406,3,15))
script6_plot = figure(plot_width=800,plot_height=400)
script6_plot.grid.grid_line_color=None
script6_plot.circle(xs_dom,ys_dom,size=rs_dom,line_color=None,fill_color="red",fill_alpha=0.1)
script6_plot.circle(xs_int,ys_int,size=rs_int,line_color=None,fill_color="blue",fill_alpha=0.1)
show(script6_plot)

We basically create separate lists of `x` and `y` values: one pair for domestic flights (`xs_dom` and `ys_dom`) and one for international flights (`xs_int` and `ys_int`). The same goes for the new lists `rs_dom` and `rs_int`, which contain the size of each point. Eventually, we just plot 2 sets of circles:
```
script6_plot.circle(xs_dom,ys_dom,size=rs_dom,...)
script6_plot.circle(xs_int,ys_int,size=rs_int,...)
```
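
The splitting logic can be tried out on its own. The sketch below uses three made-up flights, given as `(from_country, to_country, distance)`, and the same `normalize` formula and distance domain (1 to 15406) as the script above; the names `sizes_dom` and `sizes_int` are only used in this illustration.

```python
def normalize(value, domain_start, domain_stop, range_start, range_stop):
    fraction = (value - domain_start) / (domain_stop - domain_start)
    return (range_stop - range_start) * fraction + range_start

# Made-up flights: (from_country, to_country, distance).
flights = [
    ("X", "X", "1"),
    ("X", "Y", "15406"),
    ("Y", "Y", "7703.5"),
]

sizes_dom = []
sizes_int = []
for from_country, to_country, distance in flights:
    # distances arrive as strings, just like fields read from a csv file
    size = normalize(float(distance), 1, 15406, 3, 15)
    if from_country == to_country:
        sizes_dom.append(size)    # domestic flight
    else:
        sizes_int.append(size)    # international flight

print(sizes_dom)  # [3.0, 9.0]
print(sizes_int)  # [15.0]
```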


Let's add some **hovering**...

To use hovering, we need to look into a different way of storing our data internally: the **ColumnDataSource**. Here's a proof-of-concept. When you hover over a datapoint, you should get a tooltip with additional information. Later, we'll put something useful in there.

In [0]:
from bokeh.plotting import *
output_notebook()

from bokeh.models import HoverTool
TOOLS = [HoverTool()]

p = figure(plot_width=400, plot_height=400, title=None, tools=TOOLS)

p.circle([1, 2, 3, 4, 5], [2, 5, 8, 2, 7], size=10)

show(p)

In lines 4 and 5, we define the `TOOLS` variable, which contains a list of tools (in this case only for hovering). We then have to add `tools=TOOLS` to the definition of the figure in line 7. The result is the small hover icon you see in the toolbar on the right of the image.

Now merge this with the airports script:

In [0]:
from bokeh.plotting import *
output_notebook()
import csv
from bokeh.models import HoverTool

f = open('flights.csv')
reader = csv.reader(f)
next(reader)

xs_dom = []
ys_dom = []
xs_int = []
ys_int = []
rs_dom = []
rs_int = []
from_airports_dom = []
from_airports_int = []
distances_dom = []
distances_int = []

for line in reader:
    [from_airport, from_city, from_country, from_long, from_lat,
     to_airport, to_city, to_country, to_long, to_lat,
     airline, airline_country, distance] = line
    if from_country == to_country:
        if from_airport not in from_airports_dom:
            xs_dom.append(normalize(float(from_long),-180,180,0,800))
            ys_dom.append(normalize(float(from_lat),-180,180,0,400))
            rs_dom.append(normalize(float(distance),1,15406,3,15))
            from_airports_dom.append(from_airport)
            distances_dom.append(distance)
    else:
        if from_airport not in from_airports_int:
            xs_int.append(normalize(float(from_long),-180,180,0,800))
            ys_int.append(normalize(float(from_lat),-180,180,0,400))
            rs_int.append(normalize(float(distance),1,15406,3,15))
            from_airports_int.append(from_airport)
            distances_int.append(distance)

TOOLS="pan,box_zoom,lasso_select,reset,hover"

source_dom = ColumnDataSource(             # <=====
    data = dict(                           # <=====
        x = xs_dom,                        # <=====
        y = ys_dom,                        # <=====
        radius = rs_dom,                   # <=====
        from_airport = from_airports_dom,  # <=====
        distance = distances_dom           # <=====
    )
)
source_int = ColumnDataSource(             # <=====
    data = dict(                           # <=====
        x = xs_int,                        # <=====
        y = ys_int,                        # <=====
        radius = rs_int,                   # <=====
        from_airport = from_airports_int,  # <=====
        distance = distances_int           # <=====
    )
)

script7_plot = figure(plot_width=800,plot_height=400,tools=TOOLS)
script7_plot.grid.grid_line_color=None
# was: script6_plot.circle(xs_dom,ys_dom,size=rs_dom,line_color=None,fill_color="red",fill_alpha=0.1)
script7_plot.circle(source=source_dom, x="x", y="y",size="radius",line_color=None,fill_color="red",fill_alpha=0.3)  # <=====
script7_plot.circle(source=source_int, x="x",y="y", size="radius",line_color=None,fill_color="blue",fill_alpha=0.3)  # <=====

hover = script7_plot.select(dict(type=HoverTool))                            # <=====
hover.tooltips = [                                                           # <=====
    ("index, from_airport, distance", "$index, @from_airport, @distance"),   # <=====
]

show(script7_plot)

Now you can:
* get more information by hovering over an airport
* lasso-select a region with airports
* zoom by drawing a box
* reset back to the original state

Bokeh automatically draws non-selected datapoints as light-blue/greyish circles. Now also try selecting a group of red airports in Alaska without including a blue one. Only the red airports outside the selection are marked as "not selected"; nothing happens to the blue airports. This is because we actually draw 2 sets of circles on top of each other (i.e. you see a `script7_plot.circle` twice), each with its own data source. So let's do the same thing again, but now with a single set. The only reason we drew 2 sets before is that we wanted to have 2 colours. Another way of doing this is to give each datapoint its own colour: we store the colours in an extra column and pass the name of that column to `fill_color` instead of a single value.

In [0]:
from bokeh.plotting import *
output_notebook()
import csv
from bokeh.models import HoverTool

f = open('flights.csv')
reader = csv.reader(f)
next(reader)

xs = []
ys = []
rs = []
from_airports = []
distances = []
colours = []   # <=======

for line in reader:
    [from_airport, from_city, from_country, from_long, from_lat,
     to_airport, to_city, to_country, to_long, to_lat,
     airline, airline_country, distance] = line
    if from_airport not in from_airports:
        xs.append(normalize(float(from_long),-180,180,0,800))
        ys.append(normalize(float(from_lat),-180,180,0,400))
        rs.append(normalize(float(distance),1,15406,3,15))
        from_airports.append(from_airport)
        distances.append(distance)
        if from_country == to_country:   # <=======
            colours.append("red")        # <=======
        else:                            # <=======
            colours.append("blue")       # <=======

TOOLS="pan,box_zoom,lasso_select,reset,hover"

source = ColumnDataSource(
    data = dict(
        x = xs,
        y = ys,
        radius = rs,
        from_airport = from_airports,
        distance = distances,
        colour = colours            # <=======
    )
)

script7_plot = figure(plot_width=800,plot_height=400,tools=TOOLS)
script7_plot.grid.grid_line_color=None
script7_plot.circle(source=source,x="x",y="y",size="radius",line_color=None,fill_color="colour",fill_alpha=0.3)

hover = script7_plot.select(dict(type=HoverTool))
hover.tooltips = [
    ("index, from_airport, distance", "$index, @from_airport, @distance"),
]

show(script7_plot)

Instead of making two `ColumnDataSources` as before (one for domestic flights, and one for international flights), we now just add an additional key which holds the colour for that particular flight.
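
It helps to think of the `data` dictionary as a table: each key is a column, and position `i` in every column describes the same point. That only works if all columns have the same length. The sketch below builds such a dictionary from made-up values and checks the lengths in plain python; the names `lengths` and `all_same_length` are only used in this illustration.

```python
# Made-up positions and (from_country, to_country) pairs for three points.
xs = [10.0, 20.0, 30.0]
ys = [5.0, 15.0, 25.0]
countries = [("X", "X"), ("X", "Y"), ("Y", "Y")]

# One colour per point: red for domestic, blue for international.
colours = ["red" if a == b else "blue" for a, b in countries]

# The same structure that is passed as `data` to ColumnDataSource.
data = dict(x=xs, y=ys, colour=colours)

# Every column should have one entry per point.
lengths = {name: len(column) for name, column in data.items()}
all_same_length = len(set(lengths.values())) == 1

print(lengths)
print(all_same_length)
```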

## Interactivity

Now let's add some **interactivity**: we'll add a slider that sets the size of the circles.

In [0]:
from bokeh.plotting import *
output_notebook()
import csv
from bokeh.models import HoverTool, CustomJS, Slider
from bokeh.layouts import row, widgetbox

f = open('flights.csv')
reader = csv.reader(f)
next(reader)

xs = []
ys = []
rs = []
from_airports = []
distances = []
colours = []

for line in reader:
    [from_airport, from_city, from_country, from_long, from_lat,
     to_airport, to_city, to_country, to_long, to_lat,
     airline, airline_country, distance] = line
    if from_airport not in from_airports:
        xs.append(normalize(float(from_long),-180,180,0,600))
        ys.append(normalize(float(from_lat),-180,180,0,300))
        rs.append(normalize(float(distance),1,15406,3,15))
        from_airports.append(from_airport)
        distances.append(distance)
        if from_country == to_country:
            colours.append("red")
        else:
            colours.append("blue")

TOOLS="pan,box_zoom,lasso_select,reset,hover"

source = ColumnDataSource(
    data = dict(
        x = xs,
        y = ys,
        radius = rs,
        adjustedRadius = rs,
        from_airport = from_airports,
        distance = distances,
        colour = colours
    )
)

script8_plot = figure(plot_width=600,plot_height=300,tools=TOOLS)
script8_plot.grid.grid_line_color=None
script8_plot.circle("x","y",source=source,size="adjustedRadius",line_color=None,fill_color="colour",fill_alpha=0.3)

hover = script8_plot.select(dict(type=HoverTool))
hover.tooltips = [
    ("index, from, distance", "$index, @from_airport, @distance"),
]

# >>>>>>>>
# Code that has to do with the slider
callbackFunction = CustomJS(args=dict(source=source), code="""
    var data = source.data;
    var z = zoom.value;
    var radius = data['radius'];
    var adjustedRadius = data['adjustedRadius'];
    for ( var i = 0; i < radius.length; i++ ) {
        adjustedRadius[i] = radius[i]*z;
    }
    source.change.emit();
""")

zoom_slider = Slider(start=0.1, end=10, value=1, step=.1,
                    title="zoom", callback=callbackFunction)
callbackFunction.args["zoom"] = zoom_slider

layout = row(
    script8_plot,
    widgetbox(zoom_slider),
)
# <<<<<<<<<<<

show(layout)

This is getting trickier. For more information on using something like a Slider (i.e. a widget), see http://bokeh.pydata.org/en/latest/docs/user_guide/interaction/widgets.html.
To get this working, we need several things:
* the relevant imports: `from bokeh.models import HoverTool, CustomJS, Slider` and `from bokeh.layouts import row, widgetbox`
* a `Slider` object (here called `zoom_slider`), which also specifies which callback to run. This means that every time the value of that slider is changed, that particular callback is called.
* the `Slider` object itself, added as an argument to that callback (`callbackFunction.args["zoom"] = zoom_slider`) so that the javascript code can read its value
* the callback itself (`callbackFunction`): this is a piece of javascript that gets called every time the value of the slider changes. In essence, each `adjustedRadius` in the ColumnDataSource is set to the original `radius` multiplied by the zooming factor. On line 49, the `adjustedRadius` was used to set the size of the dots.
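
The javascript in the callback does very little. As an illustration only, here is the same computation written in python; the function name `apply_zoom` is made up, and this code is not something bokeh runs. It also shows why the script keeps two columns: the original `radius` values are never changed, so every slider position is computed from the same starting point.

```python
def apply_zoom(radius, zoom):
    # python mirror of the loop in the javascript callback:
    # adjustedRadius[i] = radius[i] * z
    return [r * zoom for r in radius]

radius = [3.0, 9.0, 15.0]          # original sizes, left untouched

adjusted = apply_zoom(radius, 2.0)
print(adjusted)                    # [6.0, 18.0, 30.0]

# Moving the slider again starts from the original radius,
# not from the previously adjusted values.
adjusted = apply_zoom(radius, 0.5)
print(adjusted)                    # [1.5, 4.5, 7.5]
```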

A more useful interaction would be to use this slider for **filtering flights**. Let's draw a plot with all the airports, in which the ones whose flight distance is close to the slider value are drawn bigger. When the slider is on the left, airports with short-distance flights stand out; when it is on the right, those with long-distance flights.

In [0]:
from bokeh.models import HoverTool, CustomJS, Slider
from bokeh.layouts import row, widgetbox
from bokeh.plotting import *
import csv

def normalize(input, domain_start, domain_stop, range_start, range_stop):
    return (range_stop - range_start)*((input-domain_start)/(domain_stop-domain_start))+range_start

reset_output()
output_notebook()

f = open('flights.csv')
reader = csv.reader(f)
next(reader)

xs = []
ys = []
rs = []
from_airports = []
distances = []
colours = []

for line in reader:
    [from_airport, from_city, from_country, from_long, from_lat,
     to_airport, to_city, to_country, to_long, to_lat,
     airline, airline_country, distance] = line
    if from_airport not in from_airports:
        xs.append(normalize(float(from_long),-180,180,0,600))
        ys.append(normalize(float(from_lat),-180,180,0,300))
        rs.append(normalize(float(distance),1,15406,3,15))
        from_airports.append(from_airport)
        distances.append(int(distance))
        if from_country == to_country:
            colours.append("red")
        else:
            colours.append("blue")

source = ColumnDataSource(
    data = dict(
        x = xs,
        y = ys,
        radius = rs,
        adjustedRadius = rs,
        from_airport = from_airports,
        distance = distances,
        colour = colours
    )
)

script9_plot = figure(plot_width=600,plot_height=300, tools="hover")
script9_plot.grid.grid_line_color=None
script9_plot.circle("x","y",source=source,size="adjustedRadius",line_color=None,fill_color="colour",fill_alpha=0.3)


callback = CustomJS(args=dict(source=source), code="""
    var data = source.data;
    var s = select.value;
    var distance = data['distance'];
    var radius = data['radius'];
    var adjustedRadius = data['adjustedRadius'];
    for ( var i = 0; i < distance.length; i++ ) {
        if ( s - 200 < distance[i] && distance[i] < s + 200 ) {
            adjustedRadius[i] = 5;
        } else {
            adjustedRadius[i] = 1;
        }
    }
    source.change.emit();
""")

select_slider = Slider(start=0, end=15406, value=1000, step=1,
                    title="Select", callback=callback)
callback.args["select"] = select_slider

layout = row(
    script9_plot,
    widgetbox(select_slider),
)

hover = script9_plot.select(dict(type=HoverTool))
hover.tooltips = [
    ("index, from, distance", "$index, @from_airport, @distance"),
]

show(layout)
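
The selection rule inside the javascript callback can be read more easily in python. The sketch below mirrors it for illustration; the function name `highlight_sizes` and its parameters are made up, and the distances are not taken from the dataset. A point gets size 5 when its distance lies strictly within 200 of the slider value, and size 1 otherwise.

```python
def highlight_sizes(distances, selected, window=200, big=5, small=1):
    # python mirror of: s - 200 < distance[i] && distance[i] < s + 200
    return [big if selected - window < d < selected + window else small
            for d in distances]

distances = [100, 900, 1150, 1200, 5000]   # made-up flight distances
sizes = highlight_sizes(distances, 1000)   # slider set to 1000

print(sizes)  # [1, 5, 5, 1, 1]
```

Note that 1200 is not highlighted: the comparison is strict, so values exactly 200 away from the slider value fall outside the window.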

## Brushing and linking

A very powerful technique in interactive data visualization is to link two different plots: *linking* and *brushing*. By selecting a set of datapoints in one plot, the corresponding points in the other plot are highlighted. Here, we'll build two plots: one of all airports where flights leave (on the left) and one of all airports where flights arrive (on the right). By selecting departure airports in the left plot, the corresponding arrival airports in the right plot will be highlighted.
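
The idea behind linking can be sketched without bokeh. In the script that follows, both plots read their columns from one shared `ColumnDataSource`, so a row (one flight) is the same row in both plots. The plain-python illustration below uses made-up airport codes and assumes that a selection is simply a set of row numbers.

```python
# Made-up flights; row i holds the departure and arrival airport of flight i.
from_airports = ["AAA", "AAA", "BBB", "CCC"]
to_airports = ["BBB", "CCC", "CCC", "AAA"]

# Suppose the user brushes airport "AAA" in the departure plot:
# that selects every row whose departure airport is "AAA".
selected_rows = [i for i, airport in enumerate(from_airports)
                 if airport == "AAA"]

# The same row numbers are then highlighted in the arrival plot.
highlighted_arrivals = [to_airports[i] for i in selected_rows]

print(selected_rows)          # [0, 1]
print(highlighted_arrivals)   # ['BBB', 'CCC']
```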

In [0]:
from bokeh.plotting import *
output_notebook()
import csv

f = open('flights.csv')
reader = csv.reader(f)
next(reader)

xs_from = []
ys_from = []
xs_to = []
ys_to = []

for line in reader:
    [from_airport, from_city, from_country, from_long, from_lat,
     to_airport, to_city, to_country, to_long, to_lat,
     airline, airline_country, distance] = line
    xs_from.append(normalize(float(from_long),-180,180,0,500))
    ys_from.append(normalize(float(from_lat),-180,180,0,300))
    xs_to.append(normalize(float(to_long),-180,180,0,500))
    ys_to.append(normalize(float(to_lat),-180,180,0,300))

source = ColumnDataSource({
    "xs_from": xs_from,
    "ys_from": ys_from,
    "xs_to": xs_to,
    "ys_to": ys_to
})

TOOLS = "pan,box_zoom,box_select,lasso_select,reset"
plot1 = figure(plot_width=500,plot_height=300,tools=TOOLS,title="Departure airports")
plot1.grid.grid_line_color=None
plot1.circle("xs_from","ys_from",source=source,line_color=None,fill_color="blue",fill_alpha=0.05)
plot2 = figure(plot_width=500,plot_height=300,tools=TOOLS,title="Arrival airports")
plot2.grid.grid_line_color=None
plot2.circle("xs_to","ys_to",source=source,line_color=None,fill_color="blue",fill_alpha=0.05)
show(gridplot([[plot1,plot2]]))

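This plot could be improved a bit, because both selected and unselected airports are displayed in blue (albeit light and dark blue). We can set the appearance of selected and non-selected points explicitly through the `selection_glyph` and `nonselection_glyph` of each set of circles: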

In [0]:
from bokeh.plotting import *
output_notebook()
import csv
from bokeh.models.glyphs import Circle
f = open('flights.csv')
reader = csv.reader(f)
next(reader)

xs_from = []
ys_from = []
xs_to = []
ys_to = []

for line in reader:
    [from_airport, from_city, from_country, from_long, from_lat,
     to_airport, to_city, to_country, to_long, to_lat,
     airline, airline_country, distance] = line
    xs_from.append(normalize(float(from_long),-180,180,0,500))
    ys_from.append(normalize(float(from_lat),-180,180,0,300))
    xs_to.append(normalize(float(to_long),-180,180,0,500))
    ys_to.append(normalize(float(to_lat),-180,180,0,300))

source = ColumnDataSource({
    "xs_from": xs_from,
    "ys_from": ys_from,
    "xs_to": xs_to,
    "ys_to": ys_to
})

TOOLS = "pan,box_zoom,box_select,lasso_select,reset"
plot1 = figure(plot_width=500,plot_height=300,tools=TOOLS,title="Departure airports")
plot1.grid.grid_line_color=None
b = plot1.circle("xs_from","ys_from",source=source,line_color=None,fill_color="blue",fill_alpha=0.05)
b.nonselection_glyph = Circle(fill_color='black', fill_alpha=0.01, line_color=None)     # <======
b.selection_glyph = Circle(fill_color='red', fill_alpha=0.5, line_color=None)           # <======
plot2 = figure(plot_width=500,plot_height=300,tools=TOOLS,title="Arrival airports")
plot2.grid.grid_line_color=None
c = plot2.circle("xs_to","ys_to",source=source,line_color=None,fill_color="blue",fill_alpha=0.05)
c.nonselection_glyph = Circle(fill_color='black', fill_alpha=0.01, line_color=None)     # <======
c.selection_glyph = Circle(fill_color='red', fill_alpha=0.5, line_color=None)           # <======
show(gridplot([[plot1,plot2]]))

## Speeding things up
As you will have noticed, the interactive plots above respond slowly. That is because all points are redrawn with every interaction. Luckily, there is a very easy way to speed things up: use webgl to render the images. The only change to make is to add `output_backend="webgl"` to the figure definitions.

In [0]:
from bokeh.plotting import *
output_notebook()
import csv
from bokeh.models.glyphs import Circle
f = open('flights.csv')
reader = csv.reader(f)
next(reader)

xs_from = []
ys_from = []
xs_to = []
ys_to = []

for line in reader:
    [from_airport, from_city, from_country, from_long, from_lat,
     to_airport, to_city, to_country, to_long, to_lat,
     airline, airline_country, distance] = line
    xs_from.append(normalize(float(from_long),-180,180,0,500))
    ys_from.append(normalize(float(from_lat),-180,180,0,300))
    xs_to.append(normalize(float(to_long),-180,180,0,500))
    ys_to.append(normalize(float(to_lat),-180,180,0,300))

source = ColumnDataSource({
    "xs_from": xs_from,
    "ys_from": ys_from,
    "xs_to": xs_to,
    "ys_to": ys_to
})

TOOLS = "pan,box_zoom,box_select,lasso_select,reset"
plot1 = figure(plot_width=500,plot_height=300,tools=TOOLS,title="Departure airports",output_backend="webgl")   # <=====
plot1.grid.grid_line_color=None
b = plot1.circle("xs_from","ys_from",source=source,line_color=None,fill_color="blue",fill_alpha=0.05)
b.nonselection_glyph = Circle(fill_color='black', fill_alpha=0.01, line_color=None)
b.selection_glyph = Circle(fill_color='red', fill_alpha=0.5, line_color=None)
plot2 = figure(plot_width=500,plot_height=300,tools=TOOLS,title="Arrival airports", output_backend="webgl")
plot2.grid.grid_line_color=None
c = plot2.circle("xs_to","ys_to",source=source,line_color=None,fill_color="blue",fill_alpha=0.05)
c.nonselection_glyph = Circle(fill_color='black', fill_alpha=0.01, line_color=None)
c.selection_glyph = Circle(fill_color='red', fill_alpha=0.5, line_color=None)
show(gridplot([[plot1,plot2]]))