# Models and Primitives

# Overview

Although typically referred to by a single name, Bokeh comprises two separate libraries. 

The first component is a JavaScript library, BokehJS, that runs in the browser. This library is responsible for all of the rendering and user interaction. Its input is a collection of declarative JSON objects that comprise a “scenegraph”. The objects in this scenegraph describe everything that BokehJS should handle: what plots and widgets are present and in what arrangement, what tools and renderers and axes the plots will have, etc. These JSON objects are converted into JavaScript objects in the browser.

The second component is a library in Python (or other languages) that can generate the JSON described above. In the Python Bokeh library, this is accomplished at the lowest level by exposing a set of `Model` subclasses that exactly mirror the set of BokehJS Models that are used in the browser. Most of the models are very simple, usually consisting of a few property attributes and no methods. Model attributes can either be configured when the model is created, or later by setting attribute values on the model object:

#### properties can be configured when a model object is initialized
```python
from bokeh.models import Rect
glyph = Rect(x="x", y="y2", width=10, height=20, line_color=None)
```

#### properties can be configured by assigning values to attributes on existing models
```python
glyph.fill_alpha = 0.5
glyph.fill_color = "navy"
```

These methods of configuration work in general for all Bokeh models. 

Bokeh models are eventually collected into a Bokeh `Document` for serialization between BokehJS and the Bokeh Python bindings. The following image shows an example document for a simple plot:



Such a document could be created at a high level using the `bokeh.plotting` API, which automatically assembles any models such as axes and grids and toolbars in a reasonable way. However it is also possible to assemble all the components "by hand" using the low-level `bokeh.models` API. Using the `bokeh.models` interface provides complete control over how Bokeh plots and Bokeh widgets are put together and configured. However, it provides no help with assembling the models in meaningful or correct ways. It is entirely up to developers to build the scenegraph “by hand”. 
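
As a rough illustration of that by-hand assembly, the sketch below builds a plot with a single axis from `bokeh.models`, adds it to a `Document`, and serializes it. The ranges are arbitrary choices for illustration, and the exact layout of the resulting JSON varies between Bokeh versions, so only its top-level keys are printed.

```python
from bokeh.document import Document
from bokeh.models import LinearAxis, Plot, Range1d

# Assemble a tiny scenegraph "by hand": one plot with one axis.
plot = Plot(x_range=Range1d(start=0, end=1), y_range=Range1d(start=0, end=1))
plot.add_layout(LinearAxis(), "below")

# Collect the models into a Document and serialize it.
doc = Document()
doc.add_root(plot)
doc_json = doc.to_json()

print(sorted(doc_json.keys()))
```

Nothing here checks that the plot is meaningful; a plot with no glyphs serializes just as readily as a complete one.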


# Example Walkthrough

Let's try to reproduce this NYTimes interactive chart [Usain Bolt vs. 116 years of Olympic sprinters](http://www.nytimes.com/interactive/2012/08/05/sports/olympics/the-100-meter-dash-one-race-every-medalist-ever.html) using the `bokeh.models` interface.

The first thing we need is to get the data. The data for this chart is located in the ``bokeh.sampledata`` module as a Pandas DataFrame. You can see the first ten rows below:

In [1]:
from bokeh.sampledata.sprint import sprint
sprint[:10]

Unnamed: 0,Name,Country,Medal,Time,Year
0,Usain Bolt,JAM,GOLD,9.63,2012
1,Yohan Blake,JAM,SILVER,9.75,2012
2,Justin Gatlin,USA,BRONZE,9.79,2012
3,Usain Bolt,JAM,GOLD,9.69,2008
4,Richard Thompson,TRI,SILVER,9.89,2008
5,Walter Dix,USA,BRONZE,9.91,2008
6,Justin Gatlin,USA,GOLD,9.85,2004
7,Francis Obikwelu,POR,SILVER,9.86,2004
8,Maurice Greene,USA,BRONZE,9.87,2004
9,Maurice Greene,USA,GOLD,9.87,2000


Next we import some of the Bokeh models that need to be assembled to make a plot. At a minimum, we need to start with ``Plot``, the glyphs (``Circle`` and ``Text``) we want to display, as well as ``ColumnDataSource`` to hold the data and range objects to set the plot bounds.

In [2]:
from bokeh.io import output_notebook, show
from bokeh.models.glyphs import Circle, Text
from bokeh.models import ColumnDataSource, Range1d, DataRange1d, Plot

In [3]:
output_notebook()

## Setting up Data

Next we need to set up all the columns we want in our column data source. Here we add a few extra columns like `MetersBack` and `SelectedName` that we will use for a `HoverTool` later.

In [4]:
abbrev_to_country = {
    "USA": "United States",
    "GBR": "Britain",
    "JAM": "Jamaica",
    "CAN": "Canada",
    "TRI": "Trinidad and Tobago",
    "AUS": "Australia",
    "GER": "Germany",
    "CUB": "Cuba",
    "NAM": "Namibia",
    "URS": "Soviet Union",
    "BAR": "Barbados",
    "BUL": "Bulgaria",
    "HUN": "Hungary",
    "NED": "Netherlands",
    "NZL": "New Zealand",
    "PAN": "Panama",
    "POR": "Portugal",
    "RSA": "South Africa",
    "EUA": "United Team of Germany",
}

fill_color = { "gold": "#efcf6d", "silver": "#cccccc", "bronze": "#c59e8a" }
line_color = { "gold": "#c8a850", "silver": "#b0b0b1", "bronze": "#98715d" }

def selected_name(name, medal, year):
    return name if medal == "gold" and year in [1988, 1968, 1936, 1896] else ""

t0 = sprint.Time[0]

sprint["Abbrev"]       = sprint.Country
sprint["Country"]      = sprint.Abbrev.map(lambda abbr: abbrev_to_country[abbr])
sprint["Medal"]        = sprint.Medal.map(lambda medal: medal.lower())
sprint["Speed"]        = 100.0/sprint.Time
sprint["MetersBack"]   = 100.0*(1.0 - t0/sprint.Time)
sprint["MedalFill"]    = sprint.Medal.map(lambda medal: fill_color[medal])
sprint["MedalLine"]    = sprint.Medal.map(lambda medal: line_color[medal])
sprint["SelectedName"] = sprint[["Name", "Medal", "Year"]].apply(tuple, axis=1).map(lambda args: selected_name(*args))

source = ColumnDataSource(sprint)
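
The `MetersBack` column follows from treating each runner as moving at a constant average speed of `100/Time` meters per second: when the reference runner finishes at time `t0`, a runner with finishing time `Time` has covered `100*t0/Time` meters and so is `100*(1 - t0/Time)` meters short of the line. A standalone check of that formula, using two times from the table above (the helper name `meters_back` is ours, introduced only for this illustration):

```python
def meters_back(time, reference_time, distance=100.0):
    """Distance still to run when the reference runner finishes,
    assuming a constant average speed of distance/time."""
    return distance * (1.0 - reference_time / time)

bolt_2012 = 9.63    # reference time t0 (first row of the table)
blake_2012 = 9.75   # silver medalist in the same race

print(round(meters_back(bolt_2012, bolt_2012), 2))    # reference runner: 0.0
print(round(meters_back(blake_2012, bolt_2012), 2))   # about 1.23 meters behind
```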

## Building in stages

Let's build up our plot in stages, stopping to check the output along the way to see how things look.

As we go through, note the three methods that `Plot`, `Chart`, and `Figure` all have:

* `p.add_glyph`
* `p.add_tools`
* `p.add_layout`

These are actually small convenience methods that help us add models to `Plot` objects in the correct way.
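
To see what these helpers do, here is a small sketch on a throwaway plot (the ranges, data, and `radius` value are arbitrary choices for illustration). It checks that `add_glyph` returns a `GlyphRenderer` that has been appended to `plot.renderers`, and that `add_tools` places the tool on the plot's toolbar.

```python
from bokeh.models import (Circle, ColumnDataSource, GlyphRenderer,
                          PanTool, Plot, Range1d)

plot = Plot(x_range=Range1d(start=0, end=3), y_range=Range1d(start=0, end=3))
source = ColumnDataSource(data=dict(x=[1, 2], y=[1, 2]))

# add_glyph wraps the glyph and its data source in a GlyphRenderer,
# appends that renderer to plot.renderers, and returns it.
renderer = plot.add_glyph(source, Circle(x="x", y="y", radius=0.1))
print(isinstance(renderer, GlyphRenderer), renderer in plot.renderers)

# add_tools appends the tool to the plot's toolbar.
pan = PanTool()
plot.add_tools(pan)
print(pan in plot.toolbar.tools)
```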

### Basic Plot with Just Glyphs

First we create just the `Plot` with a title and some basic styling applied, as well as add a few `Circle` glyphs for the actual race data. To manually configure glyphs, we first create a glyph object (e.g., `Text` or `Circle`) that is configured with the visual properties we want as well as the data columns to use for coordinates, etc. Then we call `plot.add_glyph` with the glyph, and the data source that the glyph should use.

In [5]:
plot_options = dict(plot_width=800, plot_height=480, toolbar_location=None, outline_line_color=None)

In [6]:
medal_glyph = Circle(x="MetersBack", y="Year", size=10, fill_color="MedalFill", 
                     line_color="MedalLine", fill_alpha=0.5)

athlete_glyph = Text(x="MetersBack", y="Year", x_offset=10, text="SelectedName",
                     text_align="left", text_baseline="middle", text_font_size="9pt")

no_olympics_glyph = Text(x=7.5, y=1942, text=["No Olympics in 1940 or 1944"],
                         text_align="center", text_baseline="middle",
                         text_font_size="9pt", text_font_style="italic", text_color="silver")

In [7]:
xdr = Range1d(start=sprint.MetersBack.max()+2, end=0)  # +2 is for padding
ydr = DataRange1d(range_padding=0.05)  

plot = Plot(x_range=xdr, y_range=ydr, **plot_options)
plot.title.text = "Usain Bolt vs. 116 years of Olympic sprinters"
plot.add_glyph(source, medal_glyph)
plot.add_glyph(source, athlete_glyph)
plot.add_glyph(no_olympics_glyph)

In [8]:
show(plot)

### Adding Axes and Grids

Next we add in models for the `Axis` and `Grid` objects that we would like to see. Since we want to exert more control over the appearance, we can choose specific tickers for the axis models to use (`SingleIntervalTicker` in this case). We add these guides to the plot using the `plot.add_layout` method.

In [9]:
from bokeh.models import Grid, LinearAxis, SingleIntervalTicker

In [10]:
xdr = Range1d(start=sprint.MetersBack.max()+2, end=0)  # +2 is for padding
ydr = DataRange1d(range_padding=0.05)  

plot = Plot(x_range=xdr, y_range=ydr, **plot_options)
plot.title.text = "Usain Bolt vs. 116 years of Olympic sprinters"
plot.add_glyph(source, medal_glyph)
plot.add_glyph(source, athlete_glyph)
plot.add_glyph(no_olympics_glyph)

In [11]:
xticker = SingleIntervalTicker(interval=5, num_minor_ticks=0)
xaxis = LinearAxis(ticker=xticker, axis_line_color=None, major_tick_line_color=None,
                   axis_label="Meters behind 2012 Bolt", axis_label_text_font_size="10pt", 
                   axis_label_text_font_style="bold")
plot.add_layout(xaxis, "below")

xgrid = Grid(dimension=0, ticker=xaxis.ticker, grid_line_dash="dashed")
plot.add_layout(xgrid)

yticker = SingleIntervalTicker(interval=12, num_minor_ticks=0)
yaxis = LinearAxis(ticker=yticker, major_tick_in=-5, major_tick_out=10)
plot.add_layout(yaxis, "right")

In [12]:
show(plot)

### Adding a Hover Tool

Finally we add a hover tool to display those extra columns that we put into our column data source. We use the template syntax for the tooltips, to have more control over the appearance. Tools can be added using the `plot.add_tools` method.

In [13]:
from bokeh.models import HoverTool

In [14]:
tooltips = """
<div>
    <span style="font-size: 15px;">@Name</span>&nbsp;
    <span style="font-size: 10px; color: #666;">(@Abbrev)</span>
</div>
<div>
    <span style="font-size: 17px; font-weight: bold;">@Time{0.00}</span>&nbsp;
    <span style="font-size: 10px; color: #666;">@Year</span>
</div>
<div style="font-size: 11px; color: #666;">@{MetersBack}{0.00} meters behind</div>
"""

In [15]:
xdr = Range1d(start=sprint.MetersBack.max()+2, end=0)  # +2 is for padding
ydr = DataRange1d(range_padding=0.05)  

plot = Plot(x_range=xdr, y_range=ydr, **plot_options)
plot.title.text = "Usain Bolt vs. 116 years of Olympic sprinters"
medal = plot.add_glyph(source, medal_glyph)  # we need this renderer to configure the hover tool
plot.add_glyph(source, athlete_glyph)
plot.add_glyph(no_olympics_glyph)

xticker = SingleIntervalTicker(interval=5, num_minor_ticks=0)
xaxis = LinearAxis(ticker=xticker, axis_line_color=None, major_tick_line_color=None,
                   axis_label="Meters behind 2012 Bolt", axis_label_text_font_size="10pt", 
                   axis_label_text_font_style="bold")
plot.add_layout(xaxis, "below")

xgrid = Grid(dimension=0, ticker=xaxis.ticker, grid_line_dash="dashed")
plot.add_layout(xgrid)

yticker = SingleIntervalTicker(interval=12, num_minor_ticks=0)
yaxis = LinearAxis(ticker=yticker, major_tick_in=-5, major_tick_out=10)
plot.add_layout(yaxis, "right")

In [16]:
hover = HoverTool(tooltips=tooltips, renderers=[medal])
plot.add_tools(hover)

In [17]:
show(plot)

# Exercises

# Custom User Models

It is possible to extend the set of built-in Bokeh models with your own custom user models. This capability opens up some valuable use cases:
* customizing existing Bokeh model behaviour
* wrapping and connecting other JS libraries to Bokeh

With this capability, advanced users can try out new features or techniques easily, without having to set up a full Bokeh development environment. 

This section gives a basic outline. A custom model starts with a JavaScript implementation, which subclasses an existing BokehJS model.

### Implement the JavaScript Model

In [18]:
CODE ="""
import {HTMLBox, HTMLBoxView} from "models/layouts/html_box"
import {Slider} from "models/widgets/slider"
import {div} from "core/dom"
import * as p from "core/properties"
export class CustomView extends HTMLBoxView {
  model: Custom
  private content_el: HTMLElement
  connect_signals(): void {
    super.connect_signals()
    this.connect(this.model.slider.change, () => this._update_text())
  }
  render(): void {
    // BokehJS Views create <div> elements by default, accessible as ``this.el``.
    // Many Bokeh views extend this default <div> with additional elements
    // (e.g. <canvas>), and instead do things like paint on the HTML canvas.
    // In this case though, we change the contents of the <div>, based on the
    // current slider value.
    super.render()
    this.content_el = div({style: {
      textAlign: "center",
      fontSize: "1.2em",
      padding: "2px",
      color: "#b88d8e",
      backgroundColor: "#2a3153",
    }})
    this.el.appendChild(this.content_el)
    this._update_text()
  }
  private _update_text(): void {
    this.content_el.textContent = `${this.model.text}: ${this.model.slider.value}`
  }
}
export namespace Custom {
  export type Attrs = p.AttrsOf<Props>
  export type Props = HTMLBox.Props & {
    text: p.Property<string>
    slider: p.Property<Slider>
  }
}
export interface Custom extends Custom.Attrs {}
export class Custom extends HTMLBox {
  properties: Custom.Props
  constructor(attrs?: Partial<Custom.Attrs>) {
    super(attrs)
  }
  static initClass(): void {
    // The ``type`` class attribute should generally match exactly the name
    // of the corresponding Python class.
    this.prototype.type = "Custom"
    // If there is an associated view, this is typically boilerplate.
    this.prototype.default_view = CustomView
    // The define block adds corresponding "properties" to the JS model. These
    // should normally line up 1-1 with the Python model class. Most property
    // types have counterparts, e.g. bokeh.core.properties.String will be
    // ``p.String`` in the JS implementation. Any time the JS type system is not
    // yet as complete, you can use ``p.Any`` as a "wildcard" property type.
    this.define<Custom.Props>({
      text:   [ p.String ],
      slider: [ p.Any    ],
    })
    this.override({
      margin: 5,
    })
  }
}
Custom.initClass()
"""

### Define the Python Model

This JavaScript implementation is then attached to a corresponding Python Bokeh model:

In [19]:
from bokeh.core.properties import String, Instance
from bokeh.util.compiler import TypeScript
from bokeh.models import HTMLBox, Slider

class Custom(HTMLBox):

    __implementation__ = TypeScript(CODE)

    text = String(default="Custom text")

    slider = Instance(Slider)

### Use the Python Model

Then the new model can be used seamlessly in the same way as any built-in Bokeh model:

In [20]:
from bokeh.io import show, output_file
from bokeh.layouts import column
from bokeh.models import Slider

slider = Slider(start=0, end=10, step=0.1, value=0, title="value")

custom = Custom(text="Special Slider Display", slider=slider)

layout = column(slider, custom)

# Necessary to explicitly reload BokehJS to pick up new extension code
output_notebook()

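Note: compiling a custom model requires node.js. If it is missing or too old, the cell above fails with `RuntimeError: node.js v14.0.0 or higher is needed to allow compilation of custom models ("conda install nodejs" or follow https://nodejs.org/en/download/)`.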

In [None]:
show(layout)

# Visualizing Big Data with Datashader

Bokeh gets its power by mirroring data from Python (or R) into the web browser.  This approach provides full flexibility and interactivity, but because of the way web browsers are designed and built, there are limitations to how much data can be shown in this way.  Most web browsers can handle up to about 100,000 or 200,000 datapoints in a Bokeh plot before they will slow down or have memory issues.  What do you do when you have larger datasets than that?

The [`datashader`](https://github.com/holoviz/datashader) library is designed to complement Bokeh by providing visualizations for very large datasets, focusing on faithfully revealing the overall distribution, not just individual data points.  datashader installs separately from bokeh, e.g. using `conda install datashader`.

<img src="assets/datashader_examples.png">

## When *not* to use datashader

* Plotting fewer than 1e5 or 1e6 data points
* When *every* datapoint matters; standard Bokeh will render all of them
* For full interactivity (hover tools) with *every* datapoint

## When *to* use datashader

* Actual *big data*; when Bokeh/Matplotlib have trouble
* When the *distribution* matters more than individual points
* When you find yourself sampling or binning to better understand the *distribution*


# How does datashader work?

* Tools like Bokeh map Data directly into an HTML/JavaScript Plot
* datashader renders Data into a screen-sized Aggregate array, from which an Image can be constructed then embedded into a Bokeh Plot 
* Only the fixed-sized Image needs to be sent to the browser, allowing millions or billions of datapoints to be used
* Every step automatically adjusts to the data, but can be customized
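
The key point in the list above is that the browser only ever receives a fixed-size array. A minimal NumPy sketch of that idea, using `np.histogram2d` as a stand-in for datashader's aggregation step (datashader's own implementation is different; the grid size and ranges here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n_points = 1_000_000
x = rng.normal(0.0, 1.0, n_points)
y = rng.normal(0.0, 1.0, n_points)

# Bin every point into a fixed 250x250 grid covering [-4, 4] in x and y.
# histogram2d returns counts indexed [x_bin, y_bin]; transpose so rows are y.
counts, _, _ = np.histogram2d(x, y, bins=250, range=[(-4, 4), (-4, 4)])
aggregate = counts.T

print(aggregate.shape)        # depends on the grid, not on n_points
print(int(aggregate.sum()))   # number of points that fell inside the ranges
```

Increasing `n_points` changes the values stored in `aggregate` but not its shape, which is why the amount of data sent to the browser stays constant.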

# Visualizations supported by datashader

Datashader currently supports:

* Scatterplots/heatmaps
* Time series
* Connected points (trajectories)
* Rasters

In each case, the output is easily embedded into Bokeh plots, with interactive resampling on pan and zoom, in notebooks or apps. Legends/hover information can be generated from the aggregate arrays, helping provide interactivity.

# Faithfully visualizing big data

Once data is large enough that individual points are not easily discerned, it is crucial that the visualization be constructed in a principled way, faithfully revealing the underlying distribution for your visual system to process.  For instance, all of these plots show the same data -- is any of them the real distribution?

Let's find out!  The data in the above images was created by summing five normal (Gaussian) distributions as follows:

In [None]:
import pandas as pd
import numpy as np

np.random.seed(1)
num=10000

dists = {cat: pd.DataFrame(dict(x=np.random.normal(x,s,num),
                                y=np.random.normal(y,s,num),
                                val=val,cat=cat))
         for x,y,s,val,cat in 
         [(2,2,0.01,10,"d1"), (2,-2,0.1,20,"d2"), (-2,-2,0.5,30,"d3"), (-2,2,1.0,40,"d4"), (0,0,3,50,"d5")]}

df = pd.concat(dists,ignore_index=True)
df["cat"]=df["cat"].astype("category")
df.tail()

Here we have 50000 points, 10000 in each of five categories with associated numerical values.  This amount of data will be slow to plot directly with Bokeh or any similar libraries that copy the full data into the web browser.  Moreover, plotting data of this size with standard approaches has fatal flaws that make the above plots misrepresent the data:

* Plot A suffers from _overplotting_, with the distribution obscured by later-plotted datapoints.  
* Plot B uses smaller dots to avoid overplotting, but suffers from _oversaturation_, with differences in datapoint density not visible because all densities above a certain value show up as the same pure black color.
* Plot C uses transparency to avoid oversaturation, but then suffers from _undersaturation_, with the 10,000 datapoints in the largest Gaussian (at 0,0) not visible at all.
* Bokeh can handle 50,000 points, but if the data were larger then these plots would suffer from *undersampling*, with the distribution not visible or misleading due to too few data points in sparse or zoomed-in regions.
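
Oversaturation and undersaturation can both be understood from how transparency stacks. Assuming standard "over" compositing, `n` overlapping marks that each have opacity `alpha` produce a combined opacity of `1 - (1 - alpha)**n`. The short sketch below (the helper name is ours) tabulates that value for a few settings:

```python
def stacked_opacity(alpha, n):
    """Combined opacity of n overlapping marks, each drawn with the given
    alpha, under standard "over" compositing."""
    return 1.0 - (1.0 - alpha) ** n

for alpha in (1.0, 0.1, 0.01):
    row = [round(stacked_opacity(alpha, n), 3) for n in (1, 10, 100, 1000)]
    print(alpha, row)
```

With a high `alpha`, regions with very different point counts all end up close to fully opaque (oversaturation). With a low `alpha`, sparse regions stay nearly invisible (undersaturation). No single value works across a wide range of densities.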

Plots A-B also required time-consuming and error-prone manual tweaking of parameters, which is problematic if the data is large enough that the visualization is the main way for us to understand the data.

Using datashader, we can avoid all of these problems by rendering the data to an intermediate array that allows automatic ranging in all dimensions, revealing the true distribution with no parameter tweaking and very little code:

In [None]:
import datashader as ds
import datashader.transfer_functions as tf

%time tf.shade(ds.Canvas().points(df,'x','y'))

This plot reveals the structure we already know was in this data, i.e. 5 separate 2D Gaussian distributions.

Let's look at each of the stages in the datashader pipeline in turn, to see how images like this are constructed and how they can be controlled and embedded into Bokeh plots.

# Projection and Aggregation

The first stages of the datashader pipeline are to choose:

* which variables you want to plot on the x and y axes,
* what size array you want to aggregate the values into,
* what range of data values should that array cover, and
* what "reduction" function you want to use for aggregating:

In [None]:
canvas = ds.Canvas(plot_width=250, plot_height=250, x_range=(-4,4), y_range=(-4,4))
agg = canvas.points(df, 'x', 'y', agg=ds.count())
agg

**Here we have chosen to plot the 'x' and 'y' columns of the dataframe on the x and y axes (unsurprisingly!), and to aggregate them by `count`.** 

Available reduction functions that you could use for aggregating include:

**`count()`**: integer count of datapoints for each pixel (the default reduction).

**`any()`**: each pixel 1 if any datapoint maps to it; 0 otherwise.
  
**`sum(column)`**: total value of the given column for all datapoints in this pixel.

**`count_cat(column)`**: count datapoints _per category_ using the given categorical column (which must be declared as a Pandas categorical, as done with `astype("category")` for the `cat` column above).
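
To make the first three reductions concrete, here is a tiny NumPy sketch on a 2x2 grid with three datapoints whose pixel positions are given directly. It only illustrates what each reduction computes per pixel and is not how datashader implements them:

```python
import numpy as np

# Three datapoints: two land in the same pixel of a 2x2 grid.
rows = np.array([0, 0, 1])          # pixel row of each datapoint
cols = np.array([1, 1, 0])          # pixel column of each datapoint
val = np.array([10.0, 20.0, 5.0])   # a numeric column to aggregate

count = np.zeros((2, 2), dtype=int)
np.add.at(count, (rows, cols), 1)      # count(): datapoints per pixel

any_ = count > 0                       # any(): True where a pixel was hit

total = np.zeros((2, 2))
np.add.at(total, (rows, cols), val)    # sum(column): column total per pixel

print(count)
print(any_)
print(total)
```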

# Transformation

Once data is in the xarray aggregate form, it can be processed in a variety of ways that provide flexibility and power.  For instance, instead of plotting all the data, we can easily plot only those bins in the 99th percentile by count:

In [None]:
tf.shade(agg.where(agg>=np.percentile(agg,99)))

In [None]:
tf.shade(np.sin(agg))

Multiple aggregates can be made for the same plot range, allowing quite complicated queries to be expressed easily (e.g. `agg1.where(agg2>2*agg1)`).
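
The same kinds of queries can be sketched with plain NumPy arrays. The example below uses random Poisson counts as a stand-in for real aggregates and `np.where` with NaN as the masked value, which mirrors how xarray's `.where()` marks excluded bins:

```python
import numpy as np

rng = np.random.default_rng(0)
agg = rng.poisson(3.0, size=(50, 50)).astype(float)    # stand-in count aggregate
agg2 = rng.poisson(3.0, size=(50, 50)).astype(float)   # second aggregate, same grid

# Keep only bins at or above the 99th percentile; mask the rest with NaN.
threshold = np.percentile(agg, 99)
top_bins = np.where(agg >= threshold, agg, np.nan)

# Combine two aggregates in one query: keep agg where agg2 is more than twice agg.
query = np.where(agg2 > 2 * agg, agg, np.nan)

print(np.count_nonzero(~np.isnan(top_bins)), "bins kept out of", agg.size)
```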

# Colormapping

Once you have an aggregate array you want to visualize, you need to translate the values in that array into pixel colors.  datashader supports any Bokeh palette or list of colors:

In [None]:
tf.shade(agg, cmap=["darkred", "yellow"])

In [None]:
tf.shade(agg,cmap=["darkred", "yellow"],how='linear')

In [None]:
tf.shade(agg,cmap=["darkred", "yellow"],how='log')

In [None]:
tf.shade(agg,cmap=["darkred", "yellow"],how='eq_hist')

Notice how little of the color range is being used for the linear case, because the high end (yellow) is used only for the single pixel with the highest density, whereas a linear mapping results in all the rest having values near the low end of the colormap.  The log mapping has similar issues, though less severe because it maps a wide range of data values into a smaller range for plotting. The `eq_hist` (default) setting correctly conveys the differences in density between the various distributions, by equalizing the histogram of pixel values such that every pixel color is used equally often.
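
The difference between the three mappings can be seen on a handful of numbers. The sketch below normalizes a small set of counts to the range [0, 1] before a colormap lookup. Note that the `eq_hist` branch is a simplified rank-based stand-in that gives each distinct value an equal share of the range; it is an assumption for illustration, not datashader's actual algorithm.

```python
import numpy as np

def normalize(values, how):
    """Map aggregate values into [0, 1] before a colormap lookup."""
    values = np.asarray(values, dtype=float)
    if how == "linear":
        scaled = values
    elif how == "log":
        scaled = np.log1p(values)
    elif how == "eq_hist":
        # Rank of each value among the distinct values present.
        scaled = np.searchsorted(np.unique(values), values).astype(float)
    else:
        raise ValueError(f"unknown mapping: {how}")
    lo, hi = scaled.min(), scaled.max()
    return (scaled - lo) / (hi - lo)

counts = np.array([1, 2, 4, 8, 10_000])   # one pixel far denser than the rest
for how in ("linear", "log", "eq_hist"):
    print(how, np.round(normalize(counts, how), 3))
```

In the linear case the four smaller counts are all squeezed into the bottom 0.1% of the range, the log mapping spreads them somewhat, and the rank-based mapping spaces them evenly.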


If you have a categorical aggregate (from `count_cat`), you can now colorize the results:

In [None]:
color_key = dict(d1='blue', d2='green', d3='red', d4='orange', d5='purple')
aggc = canvas.points(df, 'x', 'y', ds.count_cat('cat'))
tf.shade(aggc, color_key=color_key)

Here the color of each pixel is computed as a weighted average of the colors for those datapoints falling into this pixel.
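
For a single pixel, that weighted average can be written out directly. The sketch below uses two hypothetical RGB colors and per-category counts chosen for illustration, and ignores details such as alpha handling:

```python
import numpy as np

# RGB colors for two categories (illustrative values).
colors = np.array([[0.0, 0.0, 1.0],    # "blue"
                   [1.0, 0.0, 0.0]])   # "red"

# Per-category counts for one pixel: 3 blue datapoints and 1 red.
counts = np.array([3, 1])

# Average of the category colors, weighted by how many datapoints of
# each category fell into the pixel.
pixel_rgb = (counts[:, None] * colors).sum(axis=0) / counts.sum()
print(pixel_rgb)   # [0.25 0.   0.75]
```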

If you have trouble seeing the dots, you can increase their size by "spreading" them in the final image:

In [None]:
tf.spread(tf.shade(aggc, color_key=color_key))

`tf.spread` uses a fixed (though configurable) spreading size, while a similar command `tf.dynspread` will spread different amounts depending on the density of points in this particular view.
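
Conceptually, fixed-size spreading grows every drawn pixel into a small neighborhood. The sketch below does this for a boolean image with a square neighborhood. The function `spread_square` is a hypothetical helper written for this illustration and is not datashader's implementation:

```python
import numpy as np

def spread_square(mask, px=1):
    """Grow every set pixel into a (2*px+1)-wide square."""
    padded = np.pad(mask, px)            # pad with False on every side
    out = np.zeros_like(mask)
    height, width = mask.shape
    # OR together all shifted copies of the image within the neighborhood.
    for dy in range(2 * px + 1):
        for dx in range(2 * px + 1):
            out |= padded[dy:dy + height, dx:dx + width]
    return out

image = np.zeros((5, 5), dtype=bool)
image[2, 2] = True                       # a single isolated dot
print(spread_square(image).astype(int))
```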


# Embedding

The images produced by datashader can be used with any plotting or display program. HoloViews, however, provides deep integration with datashader, allowing large datasets to be rendered quickly and interactively using its Bokeh integration:

In [None]:
import holoviews as hv

from holoviews.operation.datashader import datashade, dynspread

hv.extension('bokeh')

dynspread(datashade(hv.Points(df, ['x', 'y'], 'cat'), color_key=color_key, aggregator=ds.count_cat('cat')))

You can now see the axis values (not visible in the bare images).  If you enable wheel zoom, you should be able to zoom into any area of the plot, at which point a new datashader image will be rendered by HoloViews and shown in the plot.  E.g. if you zoom into the blue dot, you can see that it does contain 10,000 points, they are just so close together that they show up as only a single tiny blue spot here. Such exploration is crucial for understanding datasets with rich structure across different scales, as in most real-world data.

You can now easily overlay any other Bokeh data onto the same plot, or put map tiles in the background for geographic data in Web Mercator format (see tutorial 10).

Datashader works similarly for line plots (e.g. time series and trajectories), allowing you to use *all* the data points without needing to subsample them, even for millions or billions of points, and faithfully overlaying tens or thousands or millions of individual curves without overplotting or oversaturation problems.  It can also use raster data (such as satellite weather data), re-rasterizing it to a requested grid that can then be analyzed or colorized, and combined with other non-raster data.  For instance, if you have elevation data in raster form, and income data as individual points, you can easily make a plot of all pixels where the average income is above a certain threshold and elevation is below a certain value, a visualization that would be very difficult to express using a traditional workflow.

Hopefully it's now clear how you can use datashader to work with your large datasets.  For more information, see the [extensive notebooks](https://anaconda.org/jbednar/notebooks) online for datashader, which include examples of many different real-world datasets.

# High-Level Charting with HoloViews

Bokeh is designed to make it possible to construct rich, deeply interactive browser-based visualizations from Python source code.  It has a syntax more compact and natural than older libraries like Matplotlib, but still requires a good bit of code to do relatively common data-science tasks like complex multi-figure layouts, animations, and widgets for parameter space exploration.

To make it feasible to generate complex interactive visualizations "on the fly" in Jupyter notebooks while exploring data, we have created the [HoloViews](http://holoviews.org) library built on top of Bokeh.  

HoloViews allows you to annotate your data with a small amount of metadata that makes it instantly visualizable, usually without writing any plotting code.  HoloViews makes it practical to explore datasets and visualize them from every angle interactively, wrapping up Bokeh code for common tasks into a set of configurable and composable components.  HoloViews installs separately from Bokeh, e.g. using `conda install holoviews`, and also works with matplotlib.

In [None]:
import holoviews as hv
import numpy as np

from holoviews import opts

hv.extension('bokeh')

# A simple function

First, let us define a mathematical function to explore, using the NumPy array library:

In [None]:
def sine(x, phase=0, freq=100):
    return np.sin((freq * x + phase))

We will examine the effect of varying phase and frequency:


In [None]:
phases = np.linspace(0,2*np.pi,7)  # Explored phases
freqs = np.linspace(50,150,5)      # Explored frequencies

Over a specific spatial area, sampled on a grid:


In [None]:
dist = np.linspace(-0.5,0.5,81)   # Linear spatial sampling
x,y = np.meshgrid(dist, dist)
grid = (x**2+y**2)                 # 2D spatial sampling


# Succinct data visualization

**With HoloViews, we can immediately view our simple function as an image in a Bokeh plot in the Jupyter notebook, without any coding:**

In [None]:
hv.__version__

In [None]:
hv.Image(sine(grid, freq=20))

But we can just as easily use ``+`` to combine ``Image`` and ``Curve`` objects, visualizing both the 2D array (with associated histogram) and a 1D cross-section:

In [None]:
grating = hv.Image(sine(grid, freq=20), label="Sine Grating")

((grating * hv.HLine(y=0)).hist() + grating.sample(y=0).relabel("Sine Wave"))

Here you can see that a HoloViews object isn't really a plot (though it generates a Bokeh Plot when requested for display by the Jupyter notebook); it is just a wrapper around your data, and the data can be processed directly (as when taking the cross-section using `sample()` here).  In fact, your raw data is *always* still available, allowing you to go back and forth between visualizations and numerical analysis easily and flexibly:

In [None]:
grating[0,0]

In [None]:
type(grating.data)

Here the underlying data is the original NumPy array, but Python dictionaries as well as Pandas and other data formats can also be supplied.

The underlying objects and data can always be retrieved, even in complex multi-figure objects, if you look at the `repr` of the object to find the indexes needed to address that data:

In [None]:
layout = ((grating * hv.HLine(y=0)) + grating.sample(y=0))
print(repr(layout))
layout.Overlay.Sine_Grating.Image.Sine_Grating[0,0]

Here `layout` is the name of the full complex object, and `Overlay.Sine_Grating` selects the first item (an HLine overlaid on a grating), and `Image.Sine_Grating` selects the grating within the overlay.  The grating itself is then indexed by 'x' and 'y' as shown in the repr, and the return value from such indexing is 'z' (nearly zero in this case, which you can also see by examining the curve plot above).

# Interactive exploration

HoloViews is designed to explore complicated datasets, where there can often be much more data than can be shown on screen at once.  If there are dimensions to your data that have not been laid out as adjacent plots or overlaid plots, then HoloViews will automatically generate sliders covering the remaining range of the data.  For instance, if we add an additional dimension `Y` indicating the location of the cross-section, we'll get a slider for `Y`:

In [None]:
positions = np.linspace(-0.3, 0.3, 17)

hv.HoloMap({y: (grating * hv.HLine(y)) for y in positions}, kdims='Y') + \
hv.HoloMap({y: (grating.sample(y=y))   for y in positions}, kdims='Y')

By default the data will be embedded fully into the output, allowing export to static HTML/JavaScript for distribution, but for parameter spaces that are too large, or for dynamic data, a callback that generates the data on the fly can be used instead via a [DynamicMap](http://holoviews.org/Tutorials/Dynamic_Map.html).


# Setting display options

HoloViews objects like `grating` above directly contain only your data and associated metadata, not any plotting details.  Metadata like titles and units can be set on the objects either when created or subsequently, as shown using `label` and `relabel` above.  

Other properties of the visualization that are just about the view of it, not the actual data, are not stored on the HoloViews objects, but in a separate data structure.  To make it easy to control such options, HoloViews provides the `.opts` method together with the `opts.<Element>` builders:

In [None]:
((grating * hv.HLine(y=0)).hist() + grating.sample(y=0)).opts(
    opts.HLine(line_color='white', line_width=9),
    opts.Image(cmap='RdYlGn'),
    opts.Curve(color='b', line_dash="dotted"),
)

Options may also be set globally using the `hv.opts.defaults` function, which alters the defaults for all subsequent plots. The `opts.<Element>` accessors will tab-complete all the available options in the IPython notebook. Try adding an additional option to change the `opts.Points` default style:

In [None]:
hv.opts.defaults(
    opts.Points(size=3)
)


# Normalizing your data

**HoloViews is designed to make it easy to understand your data. For instance, consider two circular waves with very different amplitudes:**

In [None]:
comparison = hv.Image(sine(grid)) + hv.Image(sine(grid, phase=np.pi)*0.02)

HoloViews ensures that these differences are visible by default, by normalizing across any elements of the same type that are displayed together, and even across the frames of an animation:

In [None]:
comparison = hv.Image(sine(grid)) + hv.Image(sine(grid, phase=np.pi)*0.02)
comparison.opts('Image', cmap='gray')

This default visualization makes it clear that the two patterns differ greatly in amplitude. However, it is difficult to see the structure of the low-amplitude wave in **B**.  If you wish to focus on the spatial structure rather than the amplitude, you can instruct HoloViews to normalize data in different axes separately:

In [None]:
comparison.opts(shared_axes=False)

Similarly, you could supply `framewise=True` to tell it to normalize data per frame of an animation, not across all frames as it does by default.  As with any other customization, you can always specify which specific element you want the customization to apply to, even in a complex multiple-subfigure layout.
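
The effect of shared versus separate normalization can be reproduced with plain NumPy. The sketch below maps a low-amplitude wave to gray levels twice, once using the value range shared with a high-amplitude wave and once using its own range. The helper `to_gray` and the 1D waves are illustrative assumptions, not HoloViews internals:

```python
import numpy as np

def to_gray(image, lo, hi):
    """Linearly map values in [lo, hi] to gray levels in [0, 1]."""
    return (image - lo) / (hi - lo)

x = np.linspace(0, 2 * np.pi, 200)
big = np.sin(x)
small = 0.02 * np.sin(x + np.pi)

# Shared normalization: one value range covers both waves.
lo = min(big.min(), small.min())
hi = max(big.max(), small.max())
shared = to_gray(small, lo, hi)

# Separate normalization: the small wave uses its own range.
separate = to_gray(small, small.min(), small.max())

# Width of the gray-level range each version actually uses.
print(round(float(shared.max() - shared.min()), 3))
print(round(float(separate.max() - separate.min()), 3))
```

Under the shared range the small wave occupies only about 2% of the available gray levels, which conveys its amplitude but hides its structure; with its own range it uses all of them.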

# External data sources

To work conveniently with external data sources such as Pandas dataframes, HoloViews provides a `.plot`-like API for DataFrames using the `hvPlot` library. Let us load the iris dataset and explore how to use it.

In [None]:
from bokeh.sampledata.iris import flowers
import hvplot.pandas

flowers.head()

hvPlot gives you a fully interactive Bokeh plot in just a single line of code:

In [None]:
flowers.hvplot.scatter('petal_length', 'petal_width')

It can also produce more complicated plots, breaking out each 'species':

In [None]:
flowers.hvplot.scatter('petal_length', 'petal_width', by='species', legend='top_left')

It also lets you quickly explore complex datasets by grouping over one or more dimensions and generating widgets to explore the resulting parameter space:

In [None]:
flowers.hvplot.scatter('petal_length', 'petal_width', groupby='species')

Lastly, we can easily create multiple subplots to produce a "small multiple" plot:

In [None]:
flowers.hvplot.scatter('petal_length', 'petal_width', by='species', subplots=True, width=400)

As you can see, HoloViews and hvPlot provide a high-level API to work with complex datasets using interactive Bokeh plots.

# Interactive features

HoloViews also makes it simpler to use deeply interactive features such as linked brushing. To demonstrate, we will load the `autompg` dataset, which contains various statistics for a number of car models. `hvplot.scatter_matrix` will generate a linked matrix of plots to explore all dimensions:

In [None]:
from bokeh.sampledata.autompg import autompg

hvplot.scatter_matrix(autompg, c='origin').opts(
    opts.Histogram(alpha=0.4)
)

HoloViews provides a range of options to provide deep interactivity including more configurable [linked brushing features](http://holoviews.org/user_guide/Linked_Brushing.html), interactive [data transformation](http://holoviews.org/user_guide/Transforming_Elements.html) and [data processing pipelines](http://holoviews.org/user_guide/Data_Pipelines.html).

# Learning more

If you are interested in using HoloViews in your workflow, check out the extensive tutorials at [holoviz.org](https://holoviz.org), the [hvPlot documentation](https://hvplot.holoviz.org/) and the user guides at [holoviews.org](https://holoviews.org/user_guide/) and [geoviews.org](http://geoviews.org/) for geographic applications.