Box whisker improvements #2187

Merged
merged 2 commits into from Dec 11, 2017
2 changes: 1 addition & 1 deletion examples/reference/elements/bokeh/BoxWhisker.ipynb
@@ -28,7 +28,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"A ``BoxWhisker`` Element is a quick way of visually summarizing one or more groups of numerical data through their quartiles. \n",
+"A ``BoxWhisker`` Element is a quick way of visually summarizing one or more groups of numerical data through their quartiles. The boxes of a ``BoxWhisker`` element represent the first, second and third quartiles. The whiskers follow the Tukey boxplot definition representing the lowest datum still within 1.5 IQR of the lower quartile, and the highest datum still within 1.5 IQR of the upper quartile. Any points falling outside this range are shown as distinct outlier points.\n",
 "\n",
 "The data of a ``BoxWhisker`` Element may have any number of key dimensions representing the grouping of the value dimension and a single value dimensions representing the distribution of values within each group. See the [Tabular Datasets](../../../user_guide/07-Tabular_Datasets.ipynb) user guide for supported data formats, which include arrays, pandas dataframes and dictionaries of arrays."
 ]
2 changes: 1 addition & 1 deletion examples/reference/elements/matplotlib/BoxWhisker.ipynb
@@ -28,7 +28,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"A ``BoxWhisker`` Element is a quick way of visually summarizing one or more groups of numerical data through their quartiles. \n",
+"A ``BoxWhisker`` Element is a quick way of visually summarizing one or more groups of numerical data through their quartiles. The boxes of a ``BoxWhisker`` element represent the first, second and third quartiles. The whiskers follow the Tukey boxplot definition representing the lowest datum still within 1.5 IQR of the lower quartile, and the highest datum still within 1.5 IQR of the upper quartile. Any points falling outside this range are shown as distinct outlier points.\n",
 "\n",
 "The data of a ``BoxWhisker`` Element may have any number of key dimensions representing the grouping of the value dimension and a single value dimensions representing the distribution of values within each group. See the [Tabular Datasets](../../../user_guide/07-Tabular_Datasets.ipynb) user guide for supported data formats, which include arrays, pandas dataframes and dictionaries of arrays."
 ]
10 changes: 8 additions & 2 deletions holoviews/element/chart.py
@@ -137,8 +137,14 @@ class Bars(Chart):

 class BoxWhisker(Chart):
     """
-    BoxWhisker represent data as a distributions highlighting
-    the median, mean and various percentiles.
+    BoxWhisker allows representing the distribution of data grouped
+    into one or more groups by summarizing the data using quartiles.
+    The boxes of a BoxWhisker element represent the first, second and
+    third quartiles. The whiskers follow the Tukey boxplot definition
+    representing the lowest datum still within 1.5 IQR of the lower
+    quartile, and the highest datum still within 1.5 IQR of the upper
+    quartile. Any points falling outside this range are shown as
+    distinct outlier points.
Contributor

Maybe 'standard Tukey boxplot definition', if it is standard? Otherwise it sounds like it is just a definition for boxplots...

Contributor

Might be worth expanding the name to 'John W. Tukey'...

Member Author

'Tukey box plot' is the common way to refer to this style of plot; I don't think the full name adds anything.

Member Author

To quote Wikipedia:

But the ends of the whiskers can represent several possible alternative values, among them:

the lowest datum still within 1.5 IQR of the lower quartile, and the highest datum still within 1.5 IQR of the upper quartile (often called the Tukey boxplot)
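
As a rough illustration of that whisker rule (not the code in this PR), here is a minimal sketch in plain Python. The helper name `tukey_summary` is hypothetical, and the inclusive quartile interpolation is an assumption; other quartile conventions give slightly different fences.

```python
from statistics import quantiles


def tukey_summary(values):
    """Return quartiles, whisker ends and outliers under the Tukey rule."""
    data = sorted(values)
    # Assumption: 'inclusive' interpolates linearly between data points.
    # quantiles() needs at least two values on older Python versions.
    q1, q2, q3 = quantiles(data, n=4, method='inclusive')
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    # Everything beyond the fences is drawn as a separate outlier point.
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        'q1': q1, 'q2': q2, 'q3': q3,
        'whisker_low': inside[0],
        'whisker_high': inside[-1],
        'outliers': outliers,
    }


summary = tukey_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

For this sample the value 30 lies beyond the upper fence, so it is reported as an outlier and the upper whisker stops at 9.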

"""

group = param.String(default='BoxWhisker', constant=True)
20 changes: 17 additions & 3 deletions holoviews/plotting/bokeh/chart.py
@@ -3,7 +3,8 @@
 import numpy as np
 import param
 from bokeh.models import (CategoricalColorMapper, CustomJS, HoverTool,
-                          FactorRange, Whisker, Band, Range1d)
+                          FactorRange, Whisker, Band, Range1d, Circle,
+                          VBar, HBar)
 from bokeh.models.tools import BoxSelectTool
 from bokeh.transform import jitter

@@ -988,6 +989,11 @@ def _get_factors(self, element):
             xfactors, yfactors = factors, []
         return (yfactors, xfactors) if self.invert_axes else (xfactors, yfactors)

+    def _postprocess_hover(self, renderer, source):
+        if not isinstance(renderer.glyph, (Circle, VBar, HBar)):
+            return
+        super(BoxWhiskerPlot, self)._postprocess_hover(renderer, source)
+
     def get_data(self, element, ranges, style):
         if element.kdims:
             groups = element.groupby(element.kdims).data
@@ -996,7 +1002,8 @@ def get_data(self, element, ranges, style):
         vdim = dimension_sanitizer(element.vdims[0].name)

         # Define CDS data
-        r1_data, r2_data = ({'index': [], 'top': [], 'bottom': []} for i in range(2))
+        r1_data, r2_data = (defaultdict(list, {'index': [], 'top': [], 'bottom': []})
+                            for i in range(2))
         s1_data, s2_data = ({'x0': [], 'y0': [], 'x1': [], 'y1': []} for i in range(2))
         w1_data, w2_data = ({'index': [], vdim: []} for i in range(2))
         out_data = defaultdict(list, {'index': [], vdim: []})
@@ -1021,6 +1028,7 @@ def get_data(self, element, ranges, style):
             cidx = element.get_dimension_index(self.color_index)
         else:
             cdim, cidx = None, None
+        hover = any(isinstance(t, HoverTool) for t in self.state.tools)

         factors = []
         for key, g in groups.items():
@@ -1068,9 +1076,15 @@ def get_data(self, element, ranges, style):
             if len(outliers):
                 out_data['index'] += [label]*len(outliers)
                 out_data[vdim] += list(outliers)
-                if any(isinstance(t, HoverTool) for t in self.state.tools):
+                if hover:
                     for kd, k in zip(element.kdims, wrap_tuple(key)):
                         out_data[dimension_sanitizer(kd.name)] += [k]*len(outliers)
+            if hover:
+                for kd, k in zip(element.kdims, wrap_tuple(key)):
+                    r1_data[dimension_sanitizer(kd.name)].append(k)
+                    r2_data[dimension_sanitizer(kd.name)].append(k)
+                r1_data[vdim].append(q2)
+                r2_data[vdim].append(q2)

         # Define combined data and mappings
         bar_glyph = 'hbar' if self.invert_axes else 'vbar'
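
The hover change in this file switches the box data to `defaultdict(list, ...)` so that key-dimension columns can be appended without being declared up front; with a plain `dict`, the first `append` to an undeclared column would raise `KeyError`. A minimal sketch of that pattern in isolation follows. The `groups` mapping, the column names `'A'` and `'y'`, and the split of the box into a lower (`r1`) and upper (`r2`) half are assumptions made for illustration, not values taken from the PR.

```python
from collections import defaultdict

# Hypothetical quartiles (q1, q2, q3) per group of a key dimension 'A'.
groups = {'a': (1.0, 2.0, 3.0), 'b': (4.0, 5.0, 6.0)}
hover = True  # stands in for the HoverTool check done in the plot class

# Two independent column stores that create missing columns on demand.
r1_data, r2_data = (defaultdict(list, {'index': [], 'top': [], 'bottom': []})
                    for _ in range(2))

for key, (q1, q2, q3) in groups.items():
    # Assumed layout: r1 spans q1..q2, r2 spans q2..q3.
    r1_data['index'].append(key)
    r1_data['bottom'].append(q1)
    r1_data['top'].append(q2)
    r2_data['index'].append(key)
    r2_data['bottom'].append(q2)
    r2_data['top'].append(q3)
    if hover:
        # 'A' and 'y' were never declared; they start as empty lists.
        for data in (r1_data, r2_data):
            data['A'].append(key)
            data['y'].append(q2)

# Every column ends up with one entry per group.
lengths = {len(col) for data in (r1_data, r2_data) for col in data.values()}
```

Keeping one entry per group in every column matters because a column data source expects all of its columns to have the same length.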
14 changes: 12 additions & 2 deletions tests/testplotinstantiation.py
@@ -1314,12 +1314,22 @@ def test_box_whisker_datetime(self):
         box = BoxWhisker((times, np.random.rand(len(times))), kdims=['Date'])
         plot = bokeh_renderer.get_plot(box)
         formatted = [box.kdims[0].pprint_value(t) for t in times]
-        if bokeh_version < str('0.12.7'):
-            formatted = [f.replace(':', ';') for f in formatted]
         self.assertTrue(all(cds.data['index'][0] in formatted for cds in
                             plot.state.select(ColumnDataSource)
                             if len(cds.data.get('index', []))))

+    def test_box_whisker_hover(self):
+        xs, ys = np.random.randint(0, 5, 100), np.random.randn(100)
+        box = BoxWhisker((xs, ys), 'A').sort().opts(plot=dict(tools=['hover']))
+        plot = bokeh_renderer.get_plot(box)
+        src = plot.handles['vbar_1_source']
+        ys = box.aggregate(function=np.median).dimension_values('y')
+        hover_tool = plot.handles['hover']
+        self.assertEqual(src.data['y'], ys)
+        self.assertIn(plot.handles['vbar_1glyph_renderer'], hover_tool.renderers)
+        self.assertIn(plot.handles['vbar_2glyph_renderer'], hover_tool.renderers)
+        self.assertIn(plot.handles['circle_1glyph_renderer'], hover_tool.renderers)
+
     def test_curve_datetime64(self):
         dates = [np.datetime64(dt.datetime(2016,1,i)) for i in range(1, 11)]
         curve = Curve((dates, np.random.rand(10)))