Auto mark: never render rect; move zero-ness to autoSpec #1368

Merged: 15 commits merged into main from toph/never-rect on Mar 23, 2023


@tophtucker (Contributor) commented Mar 21, 2023

Still pretty rough… head's spinning a bit from all the combinations.

Fixes #1340 by preferring rectY over rect… but that messes with our zero-ness heuristic, which depended on the inferred mark implementation. Moved that logic to autoSpec but haven't fixed it in the heatmap case yet.

Fixes #1365 by partially reverting 6eab4f2's changes to how we decide the mark type.

To-do:

  • Fix heatmap issue: only default to zero-ness if the bar is "standing on" the baseline, i.e. doesn't have both x1 & x2 or y1 & y2 set
  • Materialize less often?
  • Fix tests
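The heatmap rule in the first to-do item can be stated as a small predicate. This is a sketch under the assumption that the resolved channels are available as a plain object with optional x1/x2/y1/y2 keys; standsOnBaseline is a hypothetical helper name, not part of Plot.

```javascript
// Hypothetical helper (not Plot's API): a bar "stands on" the baseline along
// a dimension only if the user has not already pinned both of its ends,
// e.g. y1 and y2. Channels are assumed to be a plain object of options.
function standsOnBaseline(channels, dim) {
  const has = (key) => channels[key] !== undefined && channels[key] !== null;
  return !(has(`${dim}1`) && has(`${dim}2`));
}
```

A bar with only x and y stands on the baseline along y, while a heatmap cell with all four endpoints pinned stands on neither, so zero-ness would not default to true for it.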

Questions:

  • Should setting zero change which mark renders?
    • In the autoBarZero test, zero was being used as a cue that we could draw bars instead of dots. I wonder if that's the right heuristic. The more salient thing about that dataset might be that there's exactly one data point per domain value?
    • Should setting zero change a line to an area? Areas' guarantee of a meaningful zero does distinguish them from lines, but lines could also have a meaningful zero. Area implies zero, but zero doesn't imply area.
    • If someone has a temperature chart, they probably only expect setting zero to add a line. But maybe you shouldn't use zero for that?
    • Overall, it feels too "clever" and too surprising for setting zero to change the mark. But doesn't it feel wrong that you can't automatically get a bar chart from the alphabet dataset?

Current broken tests

Definitely gotta fix the heatmaps. The alphabet example I'm not sure about; maybe people should just have to specify bar there. The mean zero going back to being a line feels OK.

Before → after (snapshot screenshots):
  • autoHeatmap → autoHeatmap-changed
  • autoHeatmapOrdCont → autoHeatmapOrdCont-changed
  • autoBarZero → autoBarZero-changed
  • autoBarMeanZero → autoBarMeanZero-changed

src/marks/auto.js Outdated Show resolved Hide resolved
@@ -75,19 +75,43 @@ export function autoSpec(data, options) {
: null;
}

// TODO: should we just always materialize in here?
Member:

No, we should materialize only when necessary; this should be moved into conditional checks below. (But there’s no rush to do that now while we’re still figuring out the heuristic.)

@tophtucker (Contributor, Author):

The heatmap examples are now fixed by checking !colorReduce. Would that ever fail? I mean, culmen length and body mass both have meaningful zeros, but I still don't expect or want the baselines there, because the values aren't encoded by the length of the bars, only their position.

Binning / grouping should never change the zero-ness of anything, right? In a histogram, the binning is on the other dimension from the one with the meaningful zero.

In this PR we now have two places where we assert zero-ness: before inferring the mark type, and after. I'm trying to think through whether that makes sense. Before inferring the mark type, we check only whether a zero reducer has been applied. After, we check whether we've picked a mark that has to "stand on" a baseline. That feels sorta reasonable?

For the autoBarZero test (the simple bar chart of alphabet), here's a demo of an alternate heuristic that prefers bars when there's one data point for each value in an ordinal domain. (But really it should probably depend on that and zero-ness.) https://observablehq.com/d/ac53225f9c7e967b
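The alternate heuristic from the notebook can be sketched as follows. This assumes rows are plain objects and the ordinal channel is a field name; oneRowPerCategory and preferBars are hypothetical names, and combining the check with zero-ness follows the parenthetical above rather than anything implemented in this PR.

```javascript
// Hypothetical sketch: true when each value of the ordinal (categorical)
// field occurs exactly once in the data.
function oneRowPerCategory(data, key) {
  const seen = new Set();
  for (const d of data) {
    const k = d[key];
    if (seen.has(k)) return false; // repeated category: several rows per value
    seen.add(k);
  }
  return true;
}

// Hypothetical: prefer bars only when zero-ness is asserted AND there is one
// row per category, rather than on either signal alone.
function preferBars(data, key, zero) {
  return zero === true && oneRowPerCategory(data, key);
}
```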

@tophtucker (Contributor, Author) commented Mar 22, 2023

Paired with Mike. To move the zero-ness determination into autoSpec (which would move it above the determination of mark and transform implementations), we started down the road of mirroring the logic for deciding the mark and transform implementations, since zero-ness depends on that. But that started to feel like a boondoggle; we're going to refactor a bit, but the sequence of the logic will stay more similar to how it is now.

Here's my current understanding of how it should work:

  1. The mark type (line, bar, area…) depends only on the chosen fields and reducers, never on zero-ness. Picking a zero reducer (count, sum…) may affect the mark type, but passing in zero: true won't.
  2. The mark implementation (barX, barY, ...) may depend on zero-ness: when there are three channels to fill (either x, y1, and y2, or y, x1, and x2) but only two are specified in options (x and y), we need to decide whether x1 or y1 should be inferred to be 0.
    • TODO: Should this really depend on the zero option the user passes in, or does it only depend on if you've specified a zero reducer on that dimension?
  3. The transform implementation does not depend on zero-ness.
  4. The zero-ness, if still undefined, is chosen based on the mark and transform implementations. A dimension is zero-ful if it's defined, we're not binning along that dimension, and the mark is a bar, area, rect, or rule extending continuously along that dimension. At that point it affects only whether a baseline is drawn.
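Rules 2 and 4 above can be modeled in a few lines. This is a toy sketch: mark and transform implementations are represented as strings rather than Plot's actual functions, and chooseBarImpl, inferZero, EXTENDS_ALONG, and BINS_ALONG are hypothetical names. The x-side mark set mirrors the xZero condition quoted later in this PR (barX, areaX, rectX, ruleY); the y side is assumed to be symmetric. Neither function changes the mark type, consistent with rule 1.

```javascript
// Rule 2 (sketch): with only x and y given, zero-ness decides which dimension
// gets the implied 0 endpoint, i.e. barY (y1 = 0) versus barX (x1 = 0).
function chooseBarImpl({xZero, yZero}) {
  if (yZero && !xZero) return "barY";
  if (xZero && !yZero) return "barX";
  return undefined; // ambiguous: other heuristics would decide (not modeled)
}

// Marks that extend continuously along each dimension (strings as stand-ins).
const EXTENDS_ALONG = {
  x: new Set(["barX", "areaX", "rectX", "ruleY"]),
  y: new Set(["barY", "areaY", "rectY", "ruleX"])
};
// Transforms that bin along each dimension.
const BINS_ALONG = {x: new Set(["bin", "binX"]), y: new Set(["bin", "binY"])};

// Rule 4 (sketch): if zero-ness is still undefined, derive it from whether the
// dimension is defined, not binned, and spanned by the chosen mark.
function inferZero(dim, {defined, zero, markImpl, transformImpl}) {
  if (zero !== undefined) return !!zero; // an explicit option wins
  return !!defined && !BINS_ALONG[dim].has(transformImpl) && EXTENDS_ALONG[dim].has(markImpl);
}
```

In a histogram binned on x, y still gets a zero baseline because the binning is on the other dimension; in a heatmap binned on both, neither does.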

And here's a sketch of the refactoring. The public interfaces for autoSpec and auto won't change, but the bulk of the logic — formerly all held in auto, then split into auto and autoSpec — will be re-consolidated in a new private function, autoImpl.

// expose information about what would be done; don't instantiate
export function autoSpec(data, options) {
    const {x, y, fx, fy, color, size, mark} = autoImpl(data, options);
    return {x, y, fx, fy, color, size, mark};
}

// decide the mark and transform implementations and anything that was
// undefined in options; everything but instantiating
function autoImpl(data, options) {
    // most of the auto logic gets re-consolidated here
    return {...options, markImpl, markOptions, transformImpl, transformOptions};
}

// actually instantiate; just the last few lines of today's auto
export function auto(data, options) {
    const {markImpl, markOptions, transformImpl, transformOptions, xZero, yZero, fx, fy, colorMode} =
      autoImpl(data, options);
    return marks(/* ... */);
}
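The three-layer split can be exercised end to end with stand-ins. In this toy version, autoImpl does no real inference (it always picks "barY"), and the object returned by auto is a placeholder for a real mark instance; only the layering mirrors the sketch above.

```javascript
// Toy autoImpl: a placeholder for the real inference. It fills in the
// implementation details that autoSpec must not expose.
function autoImpl(data, options) {
  const mark = options.mark ?? "bar"; // placeholder mark type
  const markImpl = "barY"; // placeholder mark implementation
  const markOptions = {x: options.x, y: options.y};
  return {...options, mark, markImpl, markOptions, transformImpl: null, transformOptions: null};
}

// autoSpec: report the decisions as plain data, without instantiating.
function autoSpec(data, options) {
  const {x, y, fx, fy, color, size, mark} = autoImpl(data, options);
  return {x, y, fx, fy, color, size, mark};
}

// auto: apply the transform (if any) and "instantiate"; the returned object
// stands in for a real mark.
function auto(data, options) {
  const {markImpl, markOptions, transformImpl, transformOptions} = autoImpl(data, options);
  const finalOptions = transformImpl ? transformImpl(transformOptions, markOptions) : markOptions;
  return {type: markImpl, data, options: finalOptions};
}
```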

@tophtucker (Contributor, Author):

OK, the only two remaining snapshot test failures are deliberate consequences of the change in philosophy to say that mark type should never depend on zero-ness. I've updated the tests so they won't fail any more, and updated the names: autoBarZero → autoDotZero; autoBarMeanZero → autoLineMeanZero.

Code, before → after:
  • alphabet, {x: {value: "frequency", zero: true}, y: "letter"}: autoBarZero → autoBarZero-changed
  • weather, {x: "date", y: {value: "temp_max", reduce: "mean", zero: true}}: autoBarMeanZero → autoBarMeanZero-changed

For the second one, it was already a bit of a toss-up which way we'd want to see. The first one's a bit of a shame; it feels a little weird that you can never get a simple bar chart from auto except by asking for it. 🤷

(Philosophical aside! I think part of the auto philosophy is that we should be able to infer mark types from information about the data. It feels nice to me to describe that property of the data that makes it deserve the mark, rather than just asking for the mark; it moves the location of human input upstream, so that we could potentially make other decisions based on it. The human annotation of zero-ness could also inform color scales, whereas the human annotation of bar-ness doesn't generalize so well. Better yet, if zero-ness were part of the table schema rather than part of the chart config, it could produce better defaults for all charts of that data. That feels like a good academic topic: what sorts of stronger types could we annotate numbers with to produce better default displays?)


I had my li’l auto matrix test, which works nicely in the Vite live preview:

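import * as htl from "htl";

// autoHistogram, autoDotZero, autoLineZero, …: the individual auto snapshot
// tests, assumed to be defined alongside this function.
export async function autoMatrix() {
  return htl.html`<div style="display: flex; flex-wrap: wrap;">
    <style>svg { width: 320px; }</style>
    ${await autoHistogram()}
    ${await autoDotZero()}
    ${await autoLineZero()}
    <!-- etc. -->
  </div>`;
}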

But when I run yarn test I get:

  1) plot autoMatrix:
     TypeError: Cannot read properties of null (reading 'replace')
      at reindexStyle (file:///Users/toph/Development/plot/test/plot.js:74:62)
      at file:///Users/toph/Development/plot/test/plot.js:16:5
      at async Context.<anonymous> (file:///Users/toph/Development/plot/test/jsdom.js:29:14)

This happens because test/plot.js expects every top-level node to be a plot. If you think it'd be valuable to have, I can try to make that work, but for now I took it out.

@tophtucker marked this pull request as ready for review March 23, 2023 04:33
Comment on lines +23 to +30
// Greedily materialize columns for type inference; we’ll need them anyway to
// plot! Note that we don’t apply any type inference to the fx and fy
// channels, if present; these are always ordinal (at least for now).
const {x, y, color, size} = options;
const X = materializeValue(data, x);
const Y = materializeValue(data, y);
const C = materializeValue(data, color);
const S = materializeValue(data, size);
Contributor (Author):

Moved here from the top of auto, which was passing these into autoSpec.

Member:

I think we still want to do this materialization (also) inside of auto, so that it doesn’t need to be done twice (the second time being when we call auto.plot).

Contributor (Author):

Ooh right, we're not passing these materialized values back out of autoImpl… and we don't want to, because autoSpec shouldn't return the materialized values. So I guess auto should greedily materialize and autoImpl shouldn't bc autoSpec should return something cleaner if possible?

Member:

I was wrong; it doesn’t materialize twice because autoImpl returns the full markOptions with the already-materialized values. I think we’re good as-is. Nice!
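Why passing markOptions back out avoids materializing twice: once a channel has been turned into an array, later calls can pass it through unchanged. The materializeValue below is a hypothetical simplification (field names, accessor functions, and arrays only), not Plot's internal helper of the same name.

```javascript
// Hypothetical simplification of materialization (not Plot's implementation):
// a field name or accessor becomes an array of values once; an array that is
// already materialized passes through untouched, so no second pass is needed.
function materializeValue(data, value) {
  if (value == null) return value;
  if (Array.isArray(value)) return value; // already materialized: reuse as-is
  if (typeof value === "string") return Array.from(data, (d) => d[value]);
  if (typeof value === "function") return Array.from(data, value);
  return value;
}
```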

Comment on lines 171 to 176
 if (xZero === undefined)
-  xZero = X && transform !== binX && (mark === barX || mark === areaX || mark === rectX || mark === ruleY);
+  xZero =
+    X &&
+    transform !== bin &&
+    transform !== binX &&
+    (markImpl === barX || markImpl === areaX || markImpl === rectX || markImpl === ruleY);
Contributor (Author):

I added the transform !== bin check for both xZero and yZero, which I think achieves the same thing as checking !colorReduce && !sizeReduce in the earlier implementation, i.e. it fixes the heatmap case.

@@ -211,23 +164,78 @@ export function auto(data, options) {
if (transform) {
if (transform === bin || transform === binX) markOptions.x = {value: X, ...xOptions};
if (transform === bin || transform === binY) markOptions.y = {value: Y, ...yOptions};
markOptions = transform(transformOptions, markOptions);
Contributor (Author):

Moved down into auto bc it's instantiating

x: {
value: xValue ?? null,
reduce: xReduce ?? null,
zero: xZero ?? false,
Member:

Might as well coerce the input to a boolean here, in case the user passed in something like 0?

Suggested change
- zero: xZero ?? false,
+ zero: !!xZero,
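The difference between the two spellings shows up only for inputs other than null, undefined, and booleans: ?? replaces just null and undefined, whereas !! coerces every input to a boolean.

```javascript
// How the two spellings treat a few possible values of the zero option.
const inputs = [undefined, null, 0, 1, true, false, ""];
const viaNullish = inputs.map((z) => z ?? false); // only null/undefined are replaced
const viaBoolean = inputs.map((z) => !!z); // every input becomes a boolean
console.log(viaNullish, viaBoolean);
```

So with ?? false, a user-supplied 0 or "" would survive as a non-boolean zero value.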

y: {
value: yValue ?? null,
reduce: yReduce ?? null,
zero: yZero ?? false,
Member:

Same.

Suggested change
- zero: yZero ?? false,
+ zero: !!yZero,

colorMode
} = autoImpl(data, options);

if (transform) markOptions = transform(transformOptions, markOptions);
Member:

We could fold this into autoImpl and then autoImpl still only needs to return markImpl and markOptions—not also transformImpl and transformOptions. Not necessary, though… 🤔

Contributor (Author):

yeah i wasn't sure if there was some explicit benefit to not instantiating — like, if it could potentially have side effects, or be slower. also might depend on whether we'd want autoSpec to be able to report any info on transforms?

@mbostock (Member) left a comment:

Nice work! 👏👏

@Fil (Contributor) commented Mar 23, 2023

To fix the autoMatrix test we just need this tiny change:

--- a/test/plot.js
+++ b/test/plot.js
@@ -71,7 +71,7 @@ function reindexStyle(root) {
     const parent = style.parentNode;
     const uid = parent.getAttribute("class");
     for (const child of [parent, ...parent.querySelectorAll("[class]")]) {
-      child.setAttribute("class", child.getAttribute("class").replace(new RegExp(`\\b${uid}\\b`, "g"), name));
+      child.setAttribute("class", child.getAttribute("class")?.replace(new RegExp(`\\b${uid}\\b`, "g"), name));
     }
     style.textContent = style.textContent.replace(new RegExp(`[.]${uid}`, "g"), `.${name}`);
   }
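The renaming step inside reindexStyle can be pulled out into a pure function, which makes the null case easy to test. This is a sketch, not the test helper itself; renameClass is a hypothetical name. Returning null lets a caller skip the setAttribute call for elements without a class, since setAttribute stringifies its value and undefined would be stored as the string "undefined".

```javascript
// Hypothetical helper: rename a generated class (uid) to a stable name within
// a class attribute string. Returns null when there was no class attribute.
function renameClass(className, uid, name) {
  if (className === null) return null;
  // \b also matches before a hyphen, so a suffixed class such as
  // `${uid}-swatch` has its prefix renamed as well.
  return className.replace(new RegExp(`\\b${uid}\\b`, "g"), name);
}
```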

-    transform !== bin &&
-    transform !== binX &&
+    !(transformImpl === bin || transformImpl === binX) &&
Contributor (Author):

Is the point of this change that they're mutually exclusive and !(a || b) can short-circuit faster than !a && !b or something?

Member:

I just found it more readable by grouping the related checks.
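The two spellings are equivalent by De Morgan's law, and they also evaluate the comparisons in the same order and stop at the same point (as soon as one comparison matches), so the choice really is about readability. A quick check, with the transforms represented by hypothetical string stand-ins:

```javascript
// Stand-ins for the bin transforms (hypothetical strings, not Plot's functions).
const bin = "bin", binX = "binX", binY = "binY";

// Grouped form: "not (bin or binX)".
function notBinnedAlongX(transformImpl) {
  return !(transformImpl === bin || transformImpl === binX);
}

// Expanded form: "not bin and not binX"; equivalent by De Morgan's law.
function notBinnedAlongXExpanded(transformImpl) {
  return transformImpl !== bin && transformImpl !== binX;
}
```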

@tophtucker (Contributor, Author) commented Mar 23, 2023

> To fix the autoMatrix test we just need this tiny change

Oooh, easy, thanks Fil! I'm gonna make that a separate PR because I also wanna think about which plots should actually be in the matrix and whether it makes sense to have that kind of redundancy. (Feels a little weird that the matrix is useful for looking at the live results, but doesn't add any coverage for the automated testing.)

And thanks Mike for lots of good lil edits! 🙏

@tophtucker merged commit 640e3f9 into main Mar 23, 2023
@tophtucker deleted the toph/never-rect branch March 23, 2023 14:00
@mbostock (Member):

I don’t think we should add the redundant matrix test. That was just an idea for debugging this problem.

chaichontat pushed a commit to chaichontat/plot that referenced this pull request Jan 14, 2024
…q#1368)

* Auto mark: never render rect; move zero-ness to autoSpec

* update test artifacts

* fix some tests; only set zero on a dimension if that dimension is defined

* Update src/marks/auto.js

Co-authored-by: Mike Bostock <mbostock@gmail.com>

* dont set zero-ness if colorReduce

* fix some tests

* prettier

* just committing the state after pairing so i have it

* revert auto file

* re-fix the original motivating bugs, i think

* autoImpl

* rm autoplot matrix test bc it didnt work with test runner

* transformImpl; coerce zero; sort imports; const

* normalize mark option

---------

Co-authored-by: Mike Bostock <mbostock@gmail.com>
Successfully merging this pull request may close these issues:
  • Auto mark: no display with zero: true
  • Auto mark: no display with non-zero reducer and bar mark