New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Histogram events & bin hover label improvements #2113

Merged
merged 18 commits into from Oct 24, 2017

Conversation

Projects
None yet
2 participants
@alexcjohnson
Contributor

alexcjohnson commented Oct 23, 2017

Implements #2071 (histogram events get the input points selected) and #2086 (better display of data ranges for histogram bins) for both histogram and the 2d histogram types (histogram2d and histogram2dcontour)

// don't trust floating point equality - fraction of bin size to call
// "on the line" and ensure that they go the right way specified by
// linelow
var roundingError = 1e-9;

This comment has been minimized.

@alexcjohnson

alexcjohnson Oct 23, 2017

Contributor

I'm a little surprised that this hadn't come up before (other than the test below that was actually wrong before), and obviously it's a bit questionable exactly what small fraction to put here... but if we just use straight < vs <= etc it's pretty easy to trick findBin on the edges.

@alexcjohnson

alexcjohnson Oct 23, 2017

Contributor

I'm a little surprised that this hadn't come up before (other than the test below that was actually wrong before), and obviously it's a bit questionable exactly what small fraction to put here... but if we just use straight < vs <= etc it's pretty easy to trick findBin on the edges.

// too small of a right gap: stop disambiguating data at the edge
_test(0, 0.0009, [0, 1, 2], false, [0, 1, 1, 2]);
_test(0.1, 0.009, [115, 125, 135], false, [115, 125, 125, 135]);

This comment has been minimized.

@alexcjohnson

alexcjohnson Oct 23, 2017

Contributor

@etpinard @chriddyp as discussed offline this morning - compare to lines 699 and 701 above that are just on the other side of the cutoff I implemented in ad0e08f - seem reasonable?

@alexcjohnson

alexcjohnson Oct 23, 2017

Contributor

@etpinard @chriddyp as discussed offline this morning - compare to lines 699 and 701 above that are just on the other side of the cutoff I implemented in ad0e08f - seem reasonable?

This comment has been minimized.

@alexcjohnson

alexcjohnson Oct 23, 2017

Contributor

to be clear, what these tests are indicating is that the hover labels for these bins will be 115 - 124.99 and 125 - 134.99 on the disambiguated side of the cutoff, and 115 - 125 and 125 - 135 on the too-many-digits-to-disambiguate side.

@alexcjohnson

alexcjohnson Oct 23, 2017

Contributor

to be clear, what these tests are indicating is that the hover labels for these bins will be 115 - 124.99 and 125 - 134.99 on the disambiguated side of the cutoff, and 115 - 125 and 125 - 135 on the too-many-digits-to-disambiguate side.

This comment has been minimized.

@etpinard

etpinard Oct 24, 2017

Member

Looks good 👍

@etpinard

etpinard Oct 24, 2017

Member

Looks good 👍

[cn1_00, Lib.dateTime2ms('2009-01-01', cn), cn1_10, Lib.dateTime2ms('2019-01-01', cn)]);
// occasionally we give extra precision for world dates (month when it should be year
// or day when it should be month). That's better than doing the opposite... not going
// to fix now, too many edge cases, better not to complicate the logic for them all.

This comment has been minimized.

@alexcjohnson

alexcjohnson Oct 23, 2017

Contributor

The algorithm I came up with for figuring out how many digits to show is basically:

  • find the minimum distance from data points to bin edges from either side - this is just two numbers for the whole distribution
  • for two consecutive bins, find the largest digit that changes in both the left and right gaps (with a little shifting to account for rounding errors)

This algorithm is a little complicated to state, but it's quick to calculate and it's robust as far as I can see, on numeric, category, and gregorian date axes (though even in that case there is a weird edge case that confuses it). However, the larger variability of month and year lengths in world calendars causes it to sometimes think a digit doesn't change when in the actual data it does, and increasing the tolerance to catch this breaks some much more common cases.

I suspect we could fix this by looking at all bin edges independent of each other, but that's far more computationally intensive, and has some other potential weird pitfalls. I don't think providing a bit of extra precision in the label for a few world calendar cases justifies all that extra complexity.

@alexcjohnson

alexcjohnson Oct 23, 2017

Contributor

The algorithm I came up with for figuring out how many digits to show is basically:

  • find the minimum distance from data points to bin edges from either side - this is just two numbers for the whole distribution
  • for two consecutive bins, find the largest digit that changes in both the left and right gaps (with a little shifting to account for rounding errors)

This algorithm is a little complicated to state, but it's quick to calculate and it's robust as far as I can see, on numeric, category, and gregorian date axes (though even in that case there is a weird edge case that confuses it). However, the larger variability of month and year lengths in world calendars causes it to sometimes think a digit doesn't change when in the actual data it does, and increasing the tolerance to catch this breaks some much more common cases.

I suspect we could fix this by looking at all bin edges independent of each other, but that's far more computationally intensive, and has some other potential weird pitfalls. I don't think providing a bit of extra precision in the label for a few world calendar cases justifies all that extra complexity.

This comment has been minimized.

@etpinard

etpinard Oct 24, 2017

Member

algo. Thanks for stating it in details.

Can you comment on what modifications (if any) your algo will need when we implement variable-sized bins #360?

@etpinard

etpinard Oct 24, 2017

Member

algo. Thanks for stating it in details.

Can you comment on what modifications (if any) your algo will need when we implement variable-sized bins #360?

This comment has been minimized.

@alexcjohnson

alexcjohnson Oct 24, 2017

Contributor

The issue with dates is that the actual time increments - the "digits" we're looking to see if they have changed - have different sizes in terms of milliseconds. The bin sizes being different should not be a problem, though people could do things to thwart the algorithm, like having the first two bin edges on fairly round values but some subsequent edges at not-so-round values. The idea that two edges are enough to determine the biggest digit that always changes comes from the observation that if one bin is extra-round, its neighbor will not be. So to do this robustly with nonuniform bins, we could either:

  • Do this same algorithm on all bin edges. That's different from looking at all bin edges independently as I suggested above for world calendars, though that would solve this problem too - what I'm suggesting here is to calculate just one left-edge-gap and one right-edge-gap as we do now, but then test those gaps against all bin edges instead of the first two.
  • As a last resort, make a new attribute for folks to set their own rounding.

One common case mentioned in #360 is equally-sized bins on a log axis. There, the whole assumption of a single digit to round to is wrong, and we'll have to do something completely different or just give up on the idea of disambiguating the edges, and let the regular axis-formatting-for-hover system handle it unchanged.

@alexcjohnson

alexcjohnson Oct 24, 2017

Contributor

The issue with dates is that the actual time increments - the "digits" we're looking to see if they have changed - have different sizes in terms of milliseconds. The bin sizes being different should not be a problem, though people could do things to thwart the algorithm, like having the first two bin edges on fairly round values but some subsequent edges at not-so-round values. The idea that two edges are enough to determine the biggest digit that always changes comes from the observation that if one bin is extra-round, its neighbor will not be. So to do this robustly with nonuniform bins, we could either:

  • Do this same algorithm on all bin edges. That's different from looking at all bin edges independently as I suggested above for world calendars, though that would solve this problem too - what I'm suggesting here is to calculate just one left-edge-gap and one right-edge-gap as we do now, but then test those gaps against all bin edges instead of the first two.
  • As a last resort, make a new attribute for folks to set their own rounding.

One common case mentioned in #360 is equally-sized bins on a log axis. There, the whole assumption of a single digit to round to is wrong, and we'll have to do something completely different or just give up on the idea of disambiguating the edges, and let the regular axis-formatting-for-hover system handle it unchanged.

@@ -1221,7 +1221,7 @@ axes.hoverLabelText = function(ax, val) {
var tx = axes.tickText(ax, ax.c2l(logOffScale ? -val : val), 'hover').text;
if(logOffScale) {
return val === 0 ? '0' : '-' + tx;
return val === 0 ? '0' : MINUS_SIGN + tx;

This comment has been minimized.

@alexcjohnson

alexcjohnson Oct 23, 2017

Contributor

been around for a while, using regular dash instead of unicode minus. tested in 3aa03f7#diff-846cf5b534aa0d22bdd1da2b43ac3cbaR517

@alexcjohnson

alexcjohnson Oct 23, 2017

Contributor

been around for a while, using regular dash instead of unicode minus. tested in 3aa03f7#diff-846cf5b534aa0d22bdd1da2b43ac3cbaR517

This comment has been minimized.

@etpinard

etpinard Oct 24, 2017

Member

I was wondering about that. Thanks 👌

@etpinard

etpinard Oct 24, 2017

Member

I was wondering about that. Thanks 👌

_hover(gd, 250, 200);
assertHoverLabelContent({
nums: '3',
axis: '3.3'

This comment has been minimized.

@alexcjohnson

alexcjohnson Oct 23, 2017

Contributor

⚠️ another potentially controversial decision: whenever every bin contains a single unique value, label exactly that value, not a range, even if those single values are not at the same places within the bins. But as soon as you have any bin with two distinct values in it, we use the range logic.

Note: for histogram2d I applied this independently to each axis, not to each brick. So if one x brick has only the point (2, 3) and one above it has only the point (2.1, 4), this will make the x bins show ranges, because that one x bin has multiple values, even though each brick has unique values in it.

@alexcjohnson

alexcjohnson Oct 23, 2017

Contributor

⚠️ another potentially controversial decision: whenever every bin contains a single unique value, label exactly that value, not a range, even if those single values are not at the same places within the bins. But as soon as you have any bin with two distinct values in it, we use the range logic.

Note: for histogram2d I applied this independently to each axis, not to each brick. So if one x brick has only the point (2, 3) and one above it has only the point (2.1, 4), this will make the x bins show ranges, because that one x bin has multiple values, even though each brick has unique values in it.

This comment has been minimized.

@etpinard

etpinard Oct 24, 2017

Member

Nice touch here. I was worried that category histograms would get e.g, apples - lemons bin hover labels. 👌

@etpinard

etpinard Oct 24, 2017

Member

Nice touch here. I was worried that category histograms would get e.g, apples - lemons bin hover labels. 👌

This comment has been minimized.

@alexcjohnson

alexcjohnson Oct 24, 2017

Contributor

That can happen - but only if the bins actually contain apples and lemons.

Plotly.newPlot(gd, [{
  x: ['apples','apples','apples','lemons','lemons','peaches','peaches','pears'],
  type: 'histogram',
  xbins: {start: -0.5, end: 3.5, size: 2}
}],{
  width: 400,
  height: 400
})

screen shot 2017-10-24 at 11 43 19 am

It's possible to put as many categories as you want into one bin, and as with numbers we'll only show the first and last ones in the label. Never happens automatically since it's in general confusing, but if you want to do it we won't stop you!

@alexcjohnson

alexcjohnson Oct 24, 2017

Contributor

That can happen - but only if the bins actually contain apples and lemons.

Plotly.newPlot(gd, [{
  x: ['apples','apples','apples','lemons','lemons','peaches','peaches','pears'],
  type: 'histogram',
  xbins: {start: -0.5, end: 3.5, size: 2}
}],{
  width: 400,
  height: 400
})

screen shot 2017-10-24 at 11 43 19 am

It's possible to put as many categories as you want into one bin, and as with numbers we'll only show the first and last ones in the label. Never happens automatically since it's in general confusing, but if you want to do it we won't stop you!

This comment has been minimized.

@etpinard

etpinard Oct 24, 2017

Member

but if you want to do it we won't stop you!

Cool. Would you mind adding a test case for this situation?

@etpinard

etpinard Oct 24, 2017

Member

but if you want to do it we won't stop you!

Cool. Would you mind adding a test case for this situation?

This comment has been minimized.

@alexcjohnson

alexcjohnson Oct 24, 2017

Contributor

Cool. Would you mind adding a test case for this situation?

0331a16

@alexcjohnson

alexcjohnson Oct 24, 2017

Contributor

Cool. Would you mind adding a test case for this situation?

0331a16

@etpinard

Looks great. I made one blocking comment (test plotly_hover event data).

Once merged, we should open a new issue or leave #2086 open to discuss what to do with cumulative histograms and variable-sized bins #360

Show outdated Hide outdated src/components/fx/hover.js
Show outdated Hide outdated src/traces/histogram/hover.js
@@ -1221,7 +1221,7 @@ axes.hoverLabelText = function(ax, val) {
var tx = axes.tickText(ax, ax.c2l(logOffScale ? -val : val), 'hover').text;
if(logOffScale) {
return val === 0 ? '0' : '-' + tx;
return val === 0 ? '0' : MINUS_SIGN + tx;

This comment has been minimized.

@etpinard

etpinard Oct 24, 2017

Member

I was wondering about that. Thanks 👌

@etpinard

etpinard Oct 24, 2017

Member

I was wondering about that. Thanks 👌

.then(function() {
_hover(gd, 250, 200);
assertHoverLabelContent({
nums: '\u22125', // unicode minus

This comment has been minimized.

@etpinard

etpinard Oct 24, 2017

Member

Thanks for 🔒 ing this down!

@etpinard

etpinard Oct 24, 2017

Member

Thanks for 🔒 ing this down!

}
else {
cdi.p0 = roundFn(binEdges[i]);
cdi.p1 = roundFn(binEdges[i + 1], true);

This comment has been minimized.

@etpinard

etpinard Oct 24, 2017

Member

❤️ ing that you were able to put all thing new rounding logic in calc.

@etpinard

etpinard Oct 24, 2017

Member

❤️ ing that you were able to put all thing new rounding logic in calc.

// 15-day gap - still days
_test(0, 15 * day, [jan17, jan18, jan19], 'gregorian',
[jan17, jan18 - day, jan18, jan19 - day]);
// 28-day gap STILL gets days - in principle this might happen with data

This comment has been minimized.

@etpinard

etpinard Oct 24, 2017

Member

thanks for writing this down

@etpinard

etpinard Oct 24, 2017

Member

thanks for writing this down

[cn1_00, Lib.dateTime2ms('2009-01-01', cn), cn1_10, Lib.dateTime2ms('2019-01-01', cn)]);
// occasionally we give extra precision for world dates (month when it should be year
// or day when it should be month). That's better than doing the opposite... not going
// to fix now, too many edge cases, better not to complicate the logic for them all.

This comment has been minimized.

@etpinard

etpinard Oct 24, 2017

Member

algo. Thanks for stating it in details.

Can you comment on what modifications (if any) your algo will need when we implement variable-sized bins #360?

@etpinard

etpinard Oct 24, 2017

Member

algo. Thanks for stating it in details.

Can you comment on what modifications (if any) your algo will need when we implement variable-sized bins #360?

_hover(gd, 250, 200);
assertHoverLabelContent({
nums: '3',
axis: '3.3'

This comment has been minimized.

@etpinard

etpinard Oct 24, 2017

Member

Nice touch here. I was worried that category histograms would get e.g, apples - lemons bin hover labels. 👌

@etpinard

etpinard Oct 24, 2017

Member

Nice touch here. I was worried that category histograms would get e.g, apples - lemons bin hover labels. 👌

})
.then(function() {
_hover(gd, 250, 200);
assertHoverLabelContent({

This comment has been minimized.

@etpinard

etpinard Oct 24, 2017

Member

Would mind checking that plotly_hover does emit the correct event data?

@etpinard

etpinard Oct 24, 2017

Member

Would mind checking that plotly_hover does emit the correct event data?

This comment has been minimized.

@alexcjohnson

alexcjohnson Oct 24, 2017

Contributor

Good catch -> tested in 2696c1c

@alexcjohnson

alexcjohnson Oct 24, 2017

Contributor

Good catch -> tested in 2696c1c

Show outdated Hide outdated src/traces/histogram/calc.js
@etpinard

This comment has been minimized.

Show comment
Hide comment
@etpinard

etpinard Oct 24, 2017

Member

Nicely done 💃

Member

etpinard commented Oct 24, 2017

Nicely done 💃

@alexcjohnson alexcjohnson merged commit 9bfbabf into master Oct 24, 2017

1 check passed

ci/circleci Your tests passed on CircleCI!
Details

@alexcjohnson alexcjohnson deleted the histogram-events branch Oct 24, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment