
Boxplot produces invalid Vega-Lite: type encoding channel leaks into output #448

@cpsievert


Boxplot queries produce Vega-Lite JSON that fails validation against the Vega-Lite v6 schema. I ran into this in posit-dev/querychat#232, and it still reproduces on main.

BTW, I plan to stop validating JSON against the schema by default in a future version of the Python package. That alone would make this error go away, but the spec would still be invalid, so it seemed worth reporting.

Reproduce with the CLI

ggsql exec \
  "SELECT grp, value FROM (VALUES ('A', 1), ('A', 2), ('A', 3), ('B', 4), ('B', 5), ('B', 6)) AS t(grp, value) VISUALISE grp AS x, value AS y DRAW boxplot" \
  --reader duckdb://memory

Paste the output into the Vega-Lite Editor and note the schema validation warnings.

What's wrong

Two issues in the generated spec:

  1. Each sub-layer's encoding contains a "type" channel, which is not a valid Vega-Lite encoding property:

    "encoding": {
      "type": {
        "field": "__ggsql_aes_type__",
        "type": "nominal",
        ...
      }
    }

    The boxplot stat's internal "type" aesthetic (used to tag rows as box/median/whisker/outlier) is correctly dropped from the data values but still emitted as an encoding.

  2. The y2 encoding on the whisker and box layers carries axis, scale, type, and stack, none of which y2 (a secondary position channel) accepts.
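The real fix belongs in ggsql's Vega-Lite output, but until then a consumer could post-process the spec. Below is a minimal sketch, assuming the spec is already parsed into a Python dict shaped like the excerpt above. It walks every "encoding" object, drops the leaked "type" channel, and strips the four properties listed above from y2. The names sanitize_boxplot_spec, _clean_encoding and _walk are hypothetical, as are the "lower"/"upper" field names in the example. Applying the same cleanup to x2 is also an assumption: it presumes a horizontal boxplot would hit the same problem.

```python
import copy
from typing import Any

# Properties the issue reports on y2 that a secondary position channel rejects.
SECONDARY_DISALLOWED = ("axis", "scale", "type", "stack")
# y2 is named in the issue; x2 is included on the ASSUMPTION that a
# horizontal boxplot would emit the same invalid properties.
SECONDARY_CHANNELS = ("x2", "y2")
# The boxplot stat's internal aesthetic that leaks out as an encoding channel.
LEAKED_CHANNEL = "type"


def _clean_encoding(encoding: dict) -> None:
    """Hypothetical helper: fix one encoding object in place."""
    # Drop the channel named "type" (not the "type" key inside field defs).
    encoding.pop(LEAKED_CHANNEL, None)
    for channel in SECONDARY_CHANNELS:
        field_def = encoding.get(channel)
        if isinstance(field_def, dict):
            for key in SECONDARY_DISALLOWED:
                field_def.pop(key, None)


def _walk(node: Any) -> None:
    """Recurse through dicts and lists, cleaning every 'encoding' object."""
    if isinstance(node, dict):
        enc = node.get("encoding")
        if isinstance(enc, dict):
            _clean_encoding(enc)
        for value in node.values():
            _walk(value)
    elif isinstance(node, list):
        for item in node:
            _walk(item)


def sanitize_boxplot_spec(spec: dict) -> dict:
    """Hypothetical workaround: return a cleaned copy, leaving `spec` intact."""
    cleaned = copy.deepcopy(spec)
    _walk(cleaned)
    return cleaned


# Example layer modeled on the excerpt above (field names are hypothetical).
spec = {
    "layer": [
        {
            "mark": "rule",
            "encoding": {
                "x": {"field": "grp", "type": "nominal"},
                "y": {"field": "lower", "type": "quantitative"},
                "y2": {"field": "upper", "type": "quantitative",
                       "axis": None, "stack": None},
                "type": {"field": "__ggsql_aes_type__", "type": "nominal"},
            },
        }
    ]
}
clean = sanitize_boxplot_spec(spec)
```

The sketch copies the spec rather than mutating it, so the original output can still be compared against the cleaned version when checking the result in the Vega-Lite Editor.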
