faster bin transform #1225

Merged
mbostock merged 9 commits into main from mbostock/faster-bin on Jan 17, 2023

Conversation

mbostock (Member) commented Jan 15, 2023

Fixes #454. The main ideas are:

  • Instead of binning everything, then separately grouping (and faceting), and lastly intersecting the bins with the groups (i.e., binfilter), bin each group separately after grouping. I chose to “eject” from d3.bin for flexibility.
  • Coerce dates to numbers (and use a typed array). This makes a dramatic improvement to the speed of bisection.

I haven’t implemented two-dimensional binning yet, but it should be possible without impacting the performance of one-dimensional binning. I also haven’t implemented cumulative binning, but I don’t anticipate any major challenges doing so. Update: these have now been implemented.

Notes here: https://observablehq.com/d/0fb511ca14875b15
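To illustrate the second idea, here is a minimal sketch of the coercion, assuming hypothetical `dates` and numeric `thresholds` arrays (not the actual Plot internals):

```js
import {bisectRight} from "d3-array";

// Coercing Date objects to numbers in a typed array once, up front, means
// each bisection compares floats instead of repeatedly calling Date#valueOf.
const values = Float64Array.from(dates, Number);
const bins = Array.from({length: thresholds.length - 1}, () => []);
for (let i = 0; i < values.length; ++i) {
  const j = bisectRight(thresholds, values[i]) - 1;
  if (j >= 0 && j < bins.length) bins[j].push(i); // out-of-domain values are dropped
}
```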

mbostock (Member Author)

Looks like I’ve introduced a regression or two (probably the first and last bin). Getting close though!

@mbostock mbostock requested a review from Fil January 16, 2023 05:01
@mbostock mbostock marked this pull request as ready for review January 16, 2023 05:02
mbostock (Member Author)

There’s still some polishing we could do to maybeBin—for example, we could re-implement quantization binning for numeric data instead of using bisection, and maybe we could change the default reducer for data to be a no-op when we detect that there’s no other channel defined in options to reference it. And we should review the changes to maybeBin closely, since it’s really easy to introduce errors in the edge cases. (Fortunately we have a lot of tests!)

But, pretty excited about this! The 1M test now renders in ~250 ms, down from ~15 s, a 60× improvement. 🚀

mbostock (Member Author) commented Jan 16, 2023

Ah, this breaks one-dimensional cumulative binning (e.g., x: {value: "carat", cumulative: true}). I’ll need to fix that, and we should have a test since it isn’t currently tested. Update: fixed! 👍
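For reference, the one-dimensional cumulative case looks like this in Plot’s bin options (assuming a diamonds dataset, as in the standard carat example):

```js
Plot.rectY(diamonds, Plot.binX({y: "count"}, {x: "carat", cumulative: true}))
```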

```diff
@@ -74,7 +83,7 @@ function binn(
   gx, // optionally group on x (exclusive with bx and gy)
   gy, // optionally group on y (exclusive with by and gx)
   {
-    data: reduceData = reduceIdentity,
+    data: reduceData = reduceIdentity, // TODO avoid materializing when unused?
```
Fil (Contributor)

Not a single one of the tests seems to be using the return value of reduceIdentity.reduce.

mbostock (Member Author)

Yes, it’s rare, but changing it would break backwards compatibility. You can do it like this:

```js
Plot.rectY(data, {...Plot.binX(), title: (D) => D.length})
```

You could detect these by looking at the passed-in options, or maybe this.channels, and seeing if any of the channel definitions there do not correspond to channels produced by the bin transform. If any such channel is found, then reduceData needs to default to reduceIdentity instead of “reduceNone” (i.e., produce undefined).
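A rough sketch of that detection, with hypothetical names rather than Plot’s actual internals:

```js
// Channels produced by the bin transform itself (illustrative, not exhaustive).
const produced = new Set(["x", "x1", "x2", "y", "y1", "y2", "fill", "stroke"]);

// If any requested channel is not produced by the transform, it may reference
// the binned data (like title above), so default to materializing the data.
function defaultDataReducer(options) {
  const extra = Object.keys(options).some((key) => !produced.has(key));
  return extra ? reduceIdentity : reduceNone; // reduceNone produces undefined
}
```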

@mbostock mbostock force-pushed the mbostock/faster-bin branch from 1d2243f to 8634456 Compare January 17, 2023 17:56

```js
export async function bin1m() {
  return Plot.plot({
    marks: [Plot.rectY(dates, Plot.binX({y: "count", data: "first"}))]
  });
}
```
Fil (Contributor)

Reading this, it feels like a hack. In the future we might want a "null" data reducer to convey the meaning explicitly (with no added performance benefit, since it would just call a no-op reducer, {reduce: () => {}}).

mbostock (Member Author)

Yes, we should add the null and/or "none" reducer in the future.
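A minimal sketch of such a reducer, following the shape suggested above (hypothetical, not an implemented Plot API):

```js
// A "none" data reducer: yields undefined for every bin, so no per-bin
// subset of the input data is ever materialized.
const reduceNone = {reduce: () => undefined};
```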

```js
function binfilter([{x0, x1}, set]) {
  return [x0, x1, set.size ? (I) => I.filter(set.has, set) : binempty];
}

// non-cumulative distribution
function bin1(E, T, V) {
```
Fil (Contributor) commented Jan 17, 2023

Here's a bin quantizer that can be used in lieu of bin1 if we use the "number" ticks. However, due to floating-point rounding, we need to undershoot and then correct course… this is really not the most beautiful piece of code, and I fear it might create more problems.

```js
function binq(E, T, V) {
  if (T.length < 2) return bin1(E, T, V); // degenerate case: fall back to bisection
  const a = T[0]; // first threshold
  const b = (1 + 1e-12) / (T[1] - T[0]); // inverse tick increment, nudged for rounding
  return (I) => {
    const B = E.map(() => []); // one bucket of indices per bin
    for (const i of I) {
      let j = Math.floor(b * (V[i] - a)); // quantize the value to a bin index
      if (T[j] > V[i]) j++; // correct course after the floating-point undershoot
      B[j]?.push(i); // values outside the bin domain are dropped
    }
    return B;
  };
}
```

In my tests it's about 30% faster, so maybe worth a shot (later).

mbostock (Member Author)

I think we’ll want to use the value returned by tickIncrement directly here (like we do in d3.bin) rather than “rediscovering” it as T[1] - T[0]. But yes, I suggest we defer this optimization to later.
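A sketch of that suggestion, assuming the extent and bin count are at hand (hypothetical lo, hi, and count):

```js
import {tickIncrement} from "d3-array";

// Use the exact increment rather than rediscovering it as T[1] - T[0],
// which is subject to floating-point error. (For fractional steps,
// tickIncrement returns a negative inverse, elided here.)
const step = tickIncrement(lo, hi, count);
const b = (1 + 1e-12) / step; // inverse increment for the quantizer above
```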

Fil (Contributor) left a comment

Not only faster, but also much easier to read than the previous approach (binfilter).

This also sidesteps a subtle issue we had with d3-bin (evidenced by the modification of the shorthandBinRectY test). Although the default bin domain is "extent", when we passed d3.extent as the domain it was not recognized as being the default (https://github.com/d3/d3-array/blob/191aa03f0519593e938f5a0cae545617866103e2/src/bin.js#L37), and nice was not applied. Which is why we had only 3 bins instead of the now-correct 6. (Update: wrong analysis, too subtle ;-))

mbostock (Member Author) commented Jan 17, 2023

> Although the default bin domain is "extent", when we passed d3.extent as the domain it was not recognized as being the default

I investigated this; it was recognizing the domain function correctly as extent. Instead, the change in behavior is because we’ve changed the logic (in a way that I think is still valid).

Under the old logic, this happened:

  1. The default thresholdAuto (capped thresholdScott) suggests 4 bins.
  2. The nice domain of [154.83, 179.37] is extended to [150, 180] (d3.nice(154.83, 179.37, 4)).
  3. The subsequent ticks are [150, 160, 170, 180] (d3.ticks(150, 180, 4)).

Whereas under the new logic:

  1. The default thresholdAuto (capped thresholdScott) suggests 4 bins.
  2. The tick increment is computed as 5 (d3.tickIncrement(154.83, 179.37, 4)).
  3. The subsequent extended ticks are [150, 155, 160, 165, 170, 175, 180].

In other words, under the new logic we compute the tick increment before we extend the domain. That is possible because we are computing the extended (niced) ticks directly, rather than first nicing the domain and then recomputing the tick increment.
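Spelled out with d3, the new computation amounts to this (a sketch using the values above, not the exact code in the PR):

```js
import {tickIncrement} from "d3-array";

const step = tickIncrement(154.83, 179.37, 4); // 5
const x0 = Math.floor(154.83 / step) * step; // 150
const x1 = Math.ceil(179.37 / step) * step; // 180
const ticks = [];
for (let x = x0; x <= x1; x += step) ticks.push(x);
// ticks = [150, 155, 160, 165, 170, 175, 180]
```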

@mbostock mbostock merged commit b773d87 into main Jan 17, 2023
@mbostock mbostock deleted the mbostock/faster-bin branch January 17, 2023 22:25
@Fil Fil mentioned this pull request Jan 18, 2023
chaichontat pushed a commit to chaichontat/plot that referenced this pull request Jan 14, 2024
* bin 1m test

* faster binning

* fix first and last bin

* fix first and last bin, again

* fix last bin, again

* bypass slow data reducer

* data reducer is required

* fix single-value bin

* fix 1d cumulative
Linked issue: Could binning millions of values be faster? #454