Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default bins too narrow using Plot.binX #417

Closed
mkfreeman opened this issue May 25, 2021 · 11 comments
Closed

Default bins too narrow using Plot.binX #417

mkfreeman opened this issue May 25, 2021 · 11 comments
Labels
bug Something isn’t working

Comments

@mkfreeman
Copy link
Contributor

The default binning in Plot can create rects with a width that is too small to see (making a user -- namely, me -- believe the code is broken). As an example (also see this notebook):

Screen Shot 2021-05-25 at 6 12 33 PM

Compared to the same plot made using ggplot2:

ggplot_example

@mbostock
Copy link
Member

mbostock commented May 25, 2021

Good find. I think we should consider not applying an inset ({inset: 0}) if there are a lot of bins:

untitled-82

We could also cap the number of bins returned by the default d3.thresholdScott based on the width of the chart:

untitled-81

@Fil
Copy link
Contributor

Fil commented May 26, 2021

Doesn't this mean we have two issues?

a) in some cases the default binning strategy creates too many bins

this might be addressed at the level of d3.thresholdScott, not based on the chart's width; or on the level of Plot.bin, based on the chart's width?

b) insets can "reverse" a rect and make it invisible

the formula is (in rect.js):
.attr("width", i => Math.max(0, Math.abs(X2[i] - X1[i]) - this.insetLeft - this.insetRight))

here maybe the minimum width should be more than 0, perhaps 0.5? It would not eliminate all cases, in particular if you have a white stroke and the default fill-then-stroke paint order, but it would mean that a mark however small is never "zero-width".

This is a more general problem, for example when stacking values, the values that generate a rect that smaller than 1px can disappear from view if they have a white stroke. Should we decide we want all marks' geometries to be visible as described above, the rendering might still sometimes make them invisible, sometimes deliberately (fillOpacity: 0), sometimes unwittingly (stroke: white). I'm not sure it's possible to fix that, tbh, since we can't make a pixel be at the same time white and black.

It's not only rects, in practice we often have to add a half-pixel to a point's radius so that the colored surface area (after the inner part of the 1px white stroke has been deducted) is proportional to the value. The default r = sqrt(value) is not 100% correct and should be r = sqrt(value) + 1/2 strokeWidth if the stroke color is "substracting matter", eg white on a white background will make points smaller that .5px radius invisible. This is fixable by the user by setting r: {range: [.5, max] }, so that a value of 0 shows 0 color, but a value immediately > 0 shows a bit of color.

@Fil
Copy link
Contributor

Fil commented May 26, 2021

@severo has found a different avatar of this issue, when you bin on a single value:
https://talk.observablehq.com/t/histogram-with-plot-does-not-show-a-bar-for-a-single-value-array/5111/2

Plot.rectY([{ weight: 3 }], Plot.binX({ y: "count" }, { x: "weight" })).plot()

@Fil
Copy link
Contributor

Fil commented May 27, 2021

Another difference is that Plot's (and D3's) bins conflate nulls and zeros, resulting in about 35k members in the first bin, whereas ggplot2 counts about 20k members (there are 15061 nulls in the data).

@Fil
Copy link
Contributor

Fil commented May 27, 2021

Capture d’écran 2021-05-27 à 10 19 23

(For this extrememly skewed distribution, d3.thresholdFreedmanDiaconis returns almost 3 times as many bins as there are values!)

@mbostock
Copy link
Member

mbostock commented May 27, 2021

Filed d3/d3-array#203 for the null conflation issue.

Fil added a commit that referenced this issue May 29, 2021
…px), unless the original width or height is 0.

See discussion in #417
@Fil
Copy link
Contributor

Fil commented May 29, 2021

That was a productive issue with 3 [Edit: 4!] pull-requests :)

I don't know how to address @severo's example though:
Plot.rectY([{ weight: 3 }], Plot.binX({ y: "count" }, { x: "weight" })).plot() creates one bin on the [3, 3] range, and it's hard to give it a width.

@severo
Copy link

severo commented May 29, 2021

We might look at how other libraries handle this case

@Fil
Copy link
Contributor

Fil commented May 29, 2021

In RStudio it creates a single bar taking the whole width, and with a total extent of 1 (from 2.5 to 3.5).
Capture d’écran 2021-05-29 à 19 26 10

Fil added a commit that referenced this issue Jun 18, 2021
(e.g. binning of a one-element data)

degenerate case example given by @severo in #417
@Fil
Copy link
Contributor

Fil commented Jun 18, 2021

A fix to @severo's issue is implemented in #438

mbostock pushed a commit that referenced this issue Jul 31, 2021
(e.g. binning of a one-element data)

degenerate case example given by @severo in #417
mbostock added a commit that referenced this issue Aug 2, 2021
* cap the default number of bins at 200 (see #417)

* no closure; explicit args

* default argument

* explicit "auto" thresholds

* rename to thresholdAuto

* update README

Co-authored-by: Mike Bostock <mbostock@gmail.com>
@mbostock
Copy link
Member

Calling this fixed by #421 (and #470), though see #422 (comment).

@mbostock mbostock added the bug Something isn’t working label Aug 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn’t working
Projects
None yet
Development

No branches or pull requests

4 participants