# Bokeh

This notebook provides an introduction to the basic features of _Bokeh_, which is a _Python_ plotting package that can be used as an alternative to _Matplotlib_.

Just like _Matplotlib_, _Bokeh_ can be used to produce graphical renderings of your data. However, _Bokeh_ does have certain advantages over _Matplotlib_, particularly if you want to interact with the plot.

_Bokeh_ makes it straightforward to make interactive plots for subsets of data. For example, you might want to plot and examine light curve data from separate days on successive iterations of a loop.

Trying to achieve this using _Matplotlib_ in Jupyter notebooks can be a little frustrating. Within a notebook, _Matplotlib_ will either wait until the loop exits and then render all the plots in one go, or overplot the output of each iteration on the same axes.

You may find the example notebooks hosted on [this GitHub repository](https://nbviewer.jupyter.org/github/bokeh/bokeh-notebooks/blob/master/index.ipynb) useful.

Before you can start using _Bokeh_ you'll need to import some of the modules and functions that the package provides. For example, in this notebook, we'll be using the **`figure`**, **`output_notebook`** and **`show`** functions:

```python
from bokeh.plotting import figure, output_notebook, show
```

In this notebook, we will use data stored in _Pandas_ `DataFrames` (see the `UsingPandas` notebook) as inputs for the _Bokeh_ functions, but you could just as easily use _NumPy_ `arrays` or plain _Python_ `list`s instead.
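
As a minimal sketch of that last point, the following block passes first plain lists and then _NumPy_ arrays to the same glyph method. The values are made up for illustration and are not taken from the data file; the **`figure()`** and **`line()`** calls themselves are explained in Section 2.

```python
import numpy as np
from bokeh.plotting import figure

# Plain Python lists (illustrative values, not taken from the data file)
x_list = [0, 1, 2, 3, 4]
y_list = [0, 1, 4, 9, 16]

# NumPy arrays are passed in exactly the same way
x_arr = np.linspace(0, 4, 50)
y_arr = np.sqrt(x_arr)

p = figure(title="Lists and arrays as inputs",
           x_axis_label="x", y_axis_label="y")
list_glyph = p.line(x_list, y_list)
array_glyph = p.line(x_arr, y_arr, color="red")
```

Passing `p` to **`show()`** in a notebook should render both curves on the same axes.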

As well as the functions we imported from the _Bokeh_ package, we'll need to import **`display`** and **`clear_output`** from the **`IPython.display`** module (one of the core modules that runs the Jupyter notebook in the background). 

```python
from IPython.display import display, clear_output
```

These two functions give us the ability to render and clear plots directly within the notebook interface. This is really useful if you want to repeatedly refresh a plot on successive iterations of a loop.

Finally, we'll import and configure the **`InteractiveShell`** class from the **`IPython.core.interactiveshell`** module:

```python
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
```

This step is not essential, but it will allow us to render a 'pretty' _Pandas_ `DataFrame` using just the name of a `DataFrame` or by calling one of its methods on **any line** in a code cell. This means that just typing **`df.head()`** **anywhere** within a code cell will produce a nicely tabulated display of the first 5 lines of the `DataFrame`. Normally, in Jupyter notebooks this would only work if **`df.head()`** was the **last line** of the cell. Without using these two lines, you could just use **`print(df.head())`** to see a textual display.

In [1]:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
#from bokeh.plotting import reset_output

from IPython.display import display, clear_output

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## 1. Loading the data for plotting 
By now, you're probably pretty confident when it comes to reading data from CSV files. In this notebook we'll use some spectral data from the ARROW radio telescope, which you can find in the `Archive_Spectra.csv` file. The file contains spectra that correspond to observations along the Galactic plane at Galactic longitudes between 0 and 90 degrees, in 10 degree intervals.

The first column in `Archive_Spectra.csv` contains the radial velocities measured at each Galactic longitude (already corrected to the Local Standard of Rest, LSR); the subsequent columns contain the intensity data for each observed longitude.

After reading the CSV file into a `DataFrame`, we use the **`dropna()`** method to clean the data by removing any rows that contain `NaN` (not-a-number) values. Then we display the first and last five rows of the cleaned data.

In [2]:
df = pd.read_csv('Archive_Spectra.csv', header=1, skip_blank_lines=True)
# Any line that is empty of values (just a row of commas in the CSV) produces a row
# of NaN. We get rid of these by using the dropna() method.
df = df.dropna()
df.head()
df.tail()

Unnamed: 0,km per sec,l = 0 degrees,l = 10 degrees,l = 20 degrees,l = 30 degrees,l = 40 degrees,l = 50 degrees,l = 60 degrees,l = 70 degrees,l = 80 degrees,l = 90 degrees
1,-396.74,0.38,-0.32,0.48,0.16,-0.18,0.48,0.21,0.1,0.28,0.03
2,-395.71,0.21,-0.11,-0.41,0.35,-0.11,0.24,0.16,-0.2,0.46,-0.3
3,-394.68,0.29,0.22,-0.14,-0.14,-0.07,0.36,0.34,-0.18,-0.12,0.08
4,-393.65,0.3,-0.11,0.34,0.35,0.12,0.25,-0.36,0.44,-0.26,-0.19
5,-392.62,-0.45,0.57,0.35,-0.36,0.39,-0.14,0.24,-0.31,-0.06,0.44


Unnamed: 0,km per sec,l = 0 degrees,l = 10 degrees,l = 20 degrees,l = 30 degrees,l = 40 degrees,l = 50 degrees,l = 60 degrees,l = 70 degrees,l = 80 degrees,l = 90 degrees
770,395.71,0.27,-0.05,-0.33,-0.47,0.08,0.26,-0.28,-0.22,0.3,-0.22
771,396.74,-0.42,-0.53,-0.13,-0.19,0.18,-0.1,-0.36,0.03,0.32,-0.34
772,397.77,0.24,0.0,-0.42,-0.17,0.13,0.19,0.16,0.01,0.11,-0.32
773,398.8,0.37,-0.05,-0.18,-0.09,-0.02,-0.22,-0.01,-0.41,-0.05,0.0
774,399.83,-0.39,-0.22,0.08,-0.07,-0.03,-0.39,-0.07,-0.39,0.27,0.21
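
To see the cleaning step in isolation, here is a small sketch that reads a made-up three-row CSV from a string instead of from `Archive_Spectra.csv`. The `csv_text` contents are hypothetical; only the column headings are borrowed from the real file.

```python
import io
import pandas as pd

# Hypothetical miniature CSV: the second data row contains only commas
csv_text = """km per sec,l = 0 degrees,l = 10 degrees
-396.74,0.38,-0.32
,,
-395.71,0.21,-0.11
"""

raw = pd.read_csv(io.StringIO(csv_text))
clean = raw.dropna()

# The row of commas is read as a row of NaN and then removed by dropna()
print(len(raw), len(clean))

# dropna() keeps the original index labels, so they are no longer consecutive
print(list(clean.index))
```

Because **`dropna()`** keeps the original index labels, a cleaned `DataFrame` can have gaps in its index wherever rows were removed.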


## 2.  Plotting the data

Now that we have the data, producing a plot in _Bokeh_ is very straightforward. There are three simple steps.

1. Set up a figure. In contrast with _Matplotlib_ we create the figure and assign a title and some axis labels in the same call to **`figure()`**.
2. Use the `figure` object to plot our data. We use a specific method of the `figure` object (in this case **`line()`**) to tell _Bokeh_ how it should render the data and we pass the data themselves as arguments to that method. In _Bokeh_ the generic name for some data plotted in a certain way is "_Glyph_".
3. Finally, we call a separate **`show()`** function to render the figure we just created and the line plot Glyph within it.

> **Note:** we need to include the line **`output_notebook()`** at the start to ensure it works OK in our notebook.

> **Note:** We select the columns of data to plot from the _Pandas_ `DataFrame` by using their heading labels. These are shown in bold text in the display above.


In [10]:
output_notebook()
p1 = figure(title = "Spectral data from Galactic longitude 30 degrees", 
          x_axis_label='Velocity (kms^-1)', 
          y_axis_label='Intensity')
p1.line(df['km per sec'],df['l = 30 degrees'])
show(p1)


In the top right corner of the plot, there is a series of icons.
![Bokeh Tools](../images/bokeh-glyph-1.png)

These allow you to interact with the plot within the notebook interface. Two of them have light blue lines on their left-hand sides. This indicates that the actions represented by these icons have been selected (by a left mouse click). Try clicking and dragging or scrolling the mouse wheel when the pointer is over the plot canvas. In this case, the highlighted icons correspond to the _pan_ and _mouse wheel zoom_ actions.

This default set of icons can be customised to augment or restrict the interactive features. We'll show how to do this later in the notebook.

### 2.1 Error Bars

_Bokeh_ also has a few slight disadvantages in comparison with _Matplotlib_. For example, adding error bars to your plots is quite a convoluted process. To do so, you would need to import two classes from the `bokeh.models` module. The **`ColumnDataSource`** class is used to group all of your data and errors into a single object. You can then pass this object as an argument to the **`Whisker`** class. Finally, you can use the **`add_layout`** method of a figure object to add the **`Whisker`** object to the plot. Our data file doesn't contain any columns that we can use for error bars, but the following snippet gives you an idea of the required code. If you want to use _Bokeh_ to plot data with error bars then please consult the online documentation for more details and a more complete explanation.

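```python
import numpy as np
from bokeh.models import ColumnDataSource, Whisker
from bokeh.plotting import figure, show

# Illustrative values only. NumPy arrays or pandas.Series objects both work here;
# plain lists would need converting first, because they cannot be subtracted.
x_vals = np.array([1.0, 2.0, 3.0])
y_vals = np.array([2.0, 4.0, 3.0])
y_error_vals = np.array([0.3, 0.5, 0.2])

p = figure()
p.scatter(x_vals, y_vals)  # the data points themselves

# Group the positions and the error-bar limits into a single object
src = ColumnDataSource(data=dict(
    base = x_vals,
    lower = y_vals - y_error_vals,
    upper = y_vals + y_error_vals))

# With dimension='height' the whiskers are vertical, so 'base' holds the x positions
w = Whisker(base='base', 
          lower='lower',
          upper='upper', 
          line_color='black', 
          dimension='height', 
          source=src)

p.add_layout(w)
show(p)
```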

### 2.2 Plotting multiple datasets on the same axes - and some interactivity

Plotting multiple datasets on the same figure in _Bokeh_ is simple. Just issue multiple calls to glyph-generating methods (e.g. **`line()`**) of the figure object, passing different data each time, before issuing the final call to **`show()`**. You can modify the way that each dataset is rendered using optional arguments to the glyph-generating method.

In the following example we've also added a legend and used it to demonstrate some of the interactive features of _Bokeh_. If you include the line:

```python
p1.legend.click_policy="hide"
```

then you can click on an item in the legend and hide or show the corresponding data!

In [4]:
p1 = figure(title = "Spectral data from Galactic observations", 
          x_axis_label='Velocity (kms^-1)', 
          y_axis_label='Intensity')
p1.line(df['km per sec'],df['l = 30 degrees'], legend_label='l=30')
p1.line(df['km per sec'],df['l = 90 degrees'], color='red', line_dash="dashed", legend_label='l=90')
p1.legend.location = "top_left"
p1.legend.click_policy="hide"
show(p1)
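
The optional styling arguments mentioned above can also be applied in a loop. The following is a sketch only: it uses synthetic Gaussian curves rather than the ARROW spectra, and it assumes a recent _Bokeh_ release in which the legend keyword is `legend_label`.

```python
import numpy as np
from bokeh.plotting import figure

# Synthetic stand-ins for three spectra (not the ARROW data)
velocity = np.linspace(-400, 400, 200)
spectra = {
    "narrow": np.exp(-(velocity / 20.0) ** 2),
    "medium": np.exp(-(velocity / 60.0) ** 2),
    "broad": np.exp(-(velocity / 150.0) ** 2),
}

# One (color, line_dash, line_width) combination per dataset
styles = [("navy", "solid", 2), ("red", "dashed", 1), ("green", "dotted", 3)]

p = figure(title="Styling several lines",
           x_axis_label="Velocity (km/s)", y_axis_label="Intensity")
for (label, intensity), (color, dash, width) in zip(spectra.items(), styles):
    p.line(velocity, intensity, color=color, line_dash=dash,
           line_width=width, legend_label=label)

p.legend.location = "top_left"
p.legend.click_policy = "hide"
```

Each call to **`line()`** adds one glyph and one legend entry, so clicking an entry should hide only the matching curve once `p` is passed to **`show()`**.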



## 3. Adding tools to the icon bar

As we mentioned above, _Bokeh_ provides a lot of interactive tools in addition to pan and zoom, and you can add extra icons to your plots to access their functionality. Here we'll look at just one of them - the **`HoverTool`**, which allows you to inspect the data values for points under the mouse cursor.

To use the **`HoverTool`** class, you must first import it from the **`bokeh.models.tools`** module:

```python
from bokeh.models.tools import HoverTool
```

Then, just use the **`add_tools()`** method of any figure object to add the extra icon. The detailed functionality of the **`HoverTool`** can be specified and augmented using optional arguments to the class constructor. In this example we pass **`mode='vline'`** to draw a moving vertical cursor at the mouse pointer position. This cursor is useful if you want to more easily read off the values on the horizontal axis - in this case the _velocity offset_. Note that a new icon has been added to the panel and automatically enabled.


In [5]:
from bokeh.models.tools import HoverTool

p1 = figure(title = "Spectral data from Galactic longitude at 90 degrees", 
          x_axis_label='Velocity (kms^-1)', 
          y_axis_label='Intensity')
p1.line(df['km per sec'],df['l = 90 degrees'])
p1.add_tools(HoverTool(mode='vline'))
show(p1)
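
As a sketch of both kinds of customisation, the block below restricts the toolbar to three named tools through the `tools` argument of **`figure()`** and then adds a **`HoverTool`** with custom tooltips. The data are synthetic, and the tooltip labels and number formats are illustrative choices rather than anything required by _Bokeh_.

```python
import numpy as np
from bokeh.models.tools import HoverTool
from bokeh.plotting import figure

# Synthetic spectrum standing in for one column of the DataFrame
velocity = np.linspace(-400, 400, 200)
intensity = np.exp(-(velocity / 50.0) ** 2)

# Restrict the toolbar to three named tools instead of the default set
p = figure(title="Custom toolbar", tools="pan,wheel_zoom,reset",
           x_axis_label="Velocity (km/s)", y_axis_label="Intensity")
p.line(velocity, intensity)

# When sequences are passed directly to line(), Bokeh stores them in columns
# named "x" and "y", which the tooltips can reference using "@"
hover = HoverTool(mode="vline",
                  tooltips=[("Velocity", "@x{0.0}"), ("Intensity", "@y{0.00}")])
p.add_tools(hover)
```

With this configuration, hovering over the plot should show the velocity and intensity of the nearest data point in place of the default tooltip.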

## 4. Rendering sub-plots

Like _Matplotlib_, _Bokeh_ allows you to render multiple subplots on the same canvas. To do so, we need to import the **`gridplot`** function from the **`bokeh.layouts`** module. Once you have done that, there are three general steps involved in rendering the subplots.

1. Create multiple figures, which will be rendered as subplots in the final output.
2. Create a grid and add the figures, specifying their placement within it.
3. Instead of showing each of the figures individually, show the entire grid at once.

In this example, we create four figures and configure them independently. We create a **`grid`** object and add the figures to it as elements of a 2x2, two-dimensional array. The layout of the subplots in the final output mimics the position of the **`figure`** objects in the array.

In [11]:
from bokeh.layouts import gridplot
from bokeh.models import Range1d

# Let's just set up an 'x' value once here.
xvals = df['km per sec']

s1 = figure(width=250, height=175, title='S1: l = 20',
            x_axis_label='Velocity (kms^-1)', 
            y_axis_label='Intensity')
s1.line(xvals,df['l = 20 degrees'], color='red')
s2 = figure(width=250, height=175, title='S2: l = 30',
            x_axis_label='Velocity (kms^-1)', 
            y_axis_label='Intensity')
s2.line(xvals,df['l = 30 degrees'], color='green')
s2.y_range = Range1d(0,100)  # You can use this to match the scales
s3 = figure(width=250, height=175, title='S3: l = 50',
            x_axis_label='Velocity (kms^-1)', 
            y_axis_label='Intensity')
s3.line(xvals,df['l = 50 degrees'], color='blue')
s4 = figure(width=250, height=175, title='S4: l = 90',
            x_axis_label='Velocity (kms^-1)', 
            y_axis_label='Intensity')
s4.line(xvals,df['l = 90 degrees'], color='purple')

grid = gridplot([[s1,s2],[s3,s4]])
show(grid)
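
When the subplots differ only in their data, the three steps can be written more compactly. This sketch uses synthetic curves, builds the figures in a loop, reuses the first figure's ranges so that all the panels share the same scales, and passes a flat list to **`gridplot`** with the `ncols` argument. It assumes a recent _Bokeh_ release, in which figure sizes are set with `width` and `height`.

```python
import numpy as np
from bokeh.layouts import gridplot
from bokeh.plotting import figure

velocity = np.linspace(-400, 400, 200)
widths = [20.0, 40.0, 80.0, 160.0]  # hypothetical line widths for fake spectra

figs = []
for w in widths:
    kwargs = {}
    if figs:
        # Reuse the first figure's ranges so that all panels pan and zoom together
        kwargs = dict(x_range=figs[0].x_range, y_range=figs[0].y_range)
    fig = figure(width=250, height=175, title=f"width = {w:g}", **kwargs)
    fig.line(velocity, np.exp(-(velocity / w) ** 2))
    figs.append(fig)

# A flat list plus ncols is laid out row by row: here, two rows of two
grid = gridplot(figs, ncols=2)
```

Sharing range objects is an alternative to setting each `y_range` by hand: panning or zooming in one panel should then move all of them together once `grid` is passed to **`show()`**.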



### EXERCISE 4.1

Choose 3 intensity columns, corresponding to different Galactic longitudes, and render the data as subplots on a 3x1 (3 rows, 1 column) grid.


In [12]:
# Write your code here...

In [7]:
from bokeh.layouts import gridplot
from bokeh.models import Range1d

xvals = df['km per sec']

s1 = figure(width=250, height=175, title='l = 20',
            x_axis_label='Velocity (kms^-1)', 
            y_axis_label='Intensity')
s1.line(xvals,df['l = 20 degrees'], color='red')
s2 = figure(width=250, height=175, title='l = 30',
            x_axis_label='Velocity (kms^-1)', 
            y_axis_label='Intensity')
s2.line(xvals,df['l = 30 degrees'], color='green')
s2.y_range = Range1d(0,100)  # You can use this to match the scales
s3 = figure(width=250, height=175, title='l = 50',
            x_axis_label='Velocity (kms^-1)', 
            y_axis_label='Intensity')
s3.line(xvals,df['l = 50 degrees'], color='blue')

grid = gridplot([[s1],[s2],[s3]])
show(grid)


### EXERCISE 4.2

This time pick three intensity columns and generate a 1x3 (1 row, 3 columns) grid.

In [12]:
# Write your code here...

In [8]:
from bokeh.layouts import gridplot
from bokeh.models import Range1d

xvals = df['km per sec']

s1 = figure(width=250, height=175, title='l = 20',
            x_axis_label='Velocity (kms^-1)', 
            y_axis_label='Intensity')
s1.line(xvals,df['l = 20 degrees'], color='red')
s2 = figure(width=250, height=175, title='l = 30',
            x_axis_label='Velocity (kms^-1)', 
            y_axis_label='Intensity')
s2.line(xvals,df['l = 30 degrees'], color='green')
s2.y_range = Range1d(0,100)  # You can use this to match the scales
s3 = figure(width=250, height=175, title='l = 50',
            x_axis_label='Velocity (kms^-1)', 
            y_axis_label='Intensity')
s3.line(xvals,df['l = 50 degrees'], color='blue')

grid = gridplot([[s1,s2,s3]])
show(grid)


## 5. Displaying multiple data sets, one by one, in a loop

In this section we finally show how to achieve the functionality we promised in the introductory cells.
We're going to cycle through **all** of the intensity columns and generate an interactive plot for each one. After generating each plot, we'll wait for user input before proceeding to the next iteration of the loop. This allows the user to interact with the data that are currently plotted before moving on.

The example in the next cell shows how this behaviour can be achieved using _Bokeh_. Using what you've learned so far in this notebook and the rest of the course, see if you can work out what the code is doing. Feel free to experiment using different plotting options and interactivity tools.

In [13]:
# How many spectra have we got? It's one less than the total number of columns.
spec_no = len(df.columns)-1

# Cycle through this number of columns, starting at column 1 (remember the index
# starts at 0, which is the velocity column)
for idx in range(1,spec_no+1):
    # Get the name of the column
    colname=df.columns[idx]
    p1 = figure(title = "Spectral data from spectrum column "+colname, 
          x_axis_label='Velocity (kms^-1)', 
          y_axis_label='Intensity')
    p1.line(df['km per sec'],df[colname])
    p1.add_tools(HoverTool(mode='vline'))
    show(p1)
    # Wait between plots - we'll ask the user whether they want to continue
    showNext = input('Show another plot? (y/n)')
    # For simplicity abort if anything other than "y" is entered.
    if showNext != 'y': 
        break
    # Clear the display before starting again - otherwise we get multiple plots
    clear_output(wait=True)


Show another plot? (y/n) n
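
One possible variation on the loop above is to move the plotting code into a small helper function and to iterate over the column names directly. The helper name `make_spectrum_figure` and the synthetic `demo` DataFrame are hypothetical; they only mimic the layout of the real data, with the velocity column first.

```python
import numpy as np
import pandas as pd
from bokeh.models.tools import HoverTool
from bokeh.plotting import figure

def make_spectrum_figure(df, colname, xcol="km per sec"):
    """Build (but do not show) a line plot of one spectrum column."""
    fig = figure(title=f"Spectral data from column {colname}",
                 x_axis_label="Velocity (km/s)", y_axis_label="Intensity")
    fig.line(df[xcol], df[colname])
    fig.add_tools(HoverTool(mode="vline"))
    return fig

# Tiny synthetic DataFrame with the same layout as the real one
velocity = np.linspace(-400, 400, 50)
demo = pd.DataFrame({
    "km per sec": velocity,
    "l = 0 degrees": np.exp(-(velocity / 30.0) ** 2),
    "l = 10 degrees": np.exp(-(velocity / 90.0) ** 2),
})

# Slicing the column index skips the velocity column without index arithmetic
figures = [make_spectrum_figure(demo, name) for name in demo.columns[1:]]
```

Inside the interactive loop, each iteration would then reduce to calling **`show()`** on the figure returned by the helper, followed by the same **`input()`** and **`clear_output(wait=True)`** calls as before.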
