how to: deal with explicitly setting null values in data? #7

DougBurke · 2019-07-27T13:23:47Z

This is from #6, and is perhaps less important than the others from that ticket, since I think you can work around this by providing data directly in JSON.

*) Line Chart with Markers and Invalid Values

To be able to represent the following data, I experimentally added a NullValue constructor to DataValue, but I am not 100% happy with this choice:

"values": [
  {
    "x": 1,
    "y": 10
  },
  {
    "x": 2,
    "y": 30
  },
  {
    "x": 3,
    "y": null
  },

...


jwoLondon · 2019-07-27T14:44:35Z

I guess we have several choices here in addition to direct JSON encoding.

In Elm at least, if you supply something that evaluates to an invalid number, this will be handled in the same way as null. For example, 0 / 0:

data =
    dataFromColumns []
        << dataColumn "x" (nums [ 1, 2, 3 ])
        << dataColumn "y" (nums [ 10, 30, 0 / 0 ])

While this works, it feels ugly to me, and I'm not sure how Haskell would handle it. Also, litvis doesn't handle expressions that generate invalid numbers like this. So I agree we probably need something else.
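A minimal sketch of why the 0 / 0 trick behaves like null, assuming Elm 0.19 with the elm/json package. It uses only Json.Encode and none of the vega-lite API; the module and value names are made up for illustration.

```elm
module NaNAsNull exposing (encodedY)

import Json.Encode as Encode


-- 0 / 0 evaluates to NaN in Elm, and Json.Encode writes NaN as JSON null.
encodedY : String
encodedY =
    [ 10, 30, 0 / 0 ]
        |> Encode.list Encode.float
        |> Encode.encode 0
```

Here encodedY should come out as "[10,30,null]", which matches the "y": null entry in the original specification.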

We could hide the ugliness of the above (but still use it) with additional functions such as

nullNum : Float
nullNum =
    0 / 0

Again I don't like this solution much, but it is at least an option with the current implementation.

We could create a parallel series of maybeNums, maybeStrs etc. with type signatures of List (Maybe Float), List (Maybe String) etc. This would allow something like

data =
    dataFromColumns []
        << dataColumn "x" (nums [ 1, 2, 3 ])
        << dataColumn "y" (maybeNums [ Just 10, Just 30, Nothing ])

This would be the most idiomatic Elm (and presumably easily translatable into Haskell), but is rather verbose. The question is whether the benefits of this make it worthwhile given we can always replicate with direct JSON encoding.
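A rough sketch of what such helpers might look like, assuming Elm 0.19 and elm/json. The names maybeNums and maybeStrs are the hypothetical ones proposed above, and for simplicity they return a raw Json.Encode.Value rather than whatever type the library's nums and strs actually return.

```elm
module MaybeNums exposing (maybeNums, maybeStrs)

import Json.Encode as Encode


-- Hypothetical helper: Nothing becomes JSON null, Just n becomes a number.
maybeNums : List (Maybe Float) -> Encode.Value
maybeNums =
    Encode.list (Maybe.map Encode.float >> Maybe.withDefault Encode.null)


-- The same idea for strings, where the NaN trick is not available.
maybeStrs : List (Maybe String) -> Encode.Value
maybeStrs =
    Encode.list (Maybe.map Encode.string >> Maybe.withDefault Encode.null)
```

With these definitions, Encode.encode 0 (maybeNums [ Just 10, Just 30, Nothing ]) should give "[10,30,null]". The cost is one such function per column type, which is where the verbosity comes from.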

DougBurke · 2019-07-29T12:02:55Z

In Haskell 0 / 0 becomes NaN, but I don't really like using this as an out-of-band value since it doesn't work for non-numeric types (so strings, date times, and bools in Vega-Lite land), and idiomatic Haskell would be to use the Maybe type.

For DataValue, adding an explicit NullValue (or some other appropriate name) seems to work, but I haven't checked against the spec to see if it makes sense in all cases, or only with dataRow.

For dataColumn I think the maybeXXX style approach looks cleanest, but - as you say - is rather verbose.

It may well be an issue that's not worth addressing at this time. I don't have any feel for how people are using the API when they are creating the data: are they just using dataFromJson or using the dataFromRows/Columns approach? The one "advantage" I can see to the second option (apart from making my life simpler when comparing the output to the specifications ;-) is that it is a bit more "type safe", in the sense that you know you are giving the API something with the correct structure, rather than an opaque JSON value. However, I'm not sure that it's really that much of an advantage (as I can still create data which doesn't make sense to plot with the dataFromRows/Columns options).

DougBurke · 2019-08-02T12:18:49Z

Here's another example that I don't think we can encode with the current DataValue/dataRow setup: https://vega.github.io/vega-lite/examples/facet_bullet.html - namely rows like

      {"title":"Profit", "subtitle":"%", "ranges":[20,25,30],"measures":[21,23],"markers":[26]},

I ended up using dataFromJson to encode this.

While you could extend DataValue with a constructor that takes DataValues, I'm not sure that's a sensible change.

So this does suggest that it's not worth changing the current setup, and just point users to dataFromJson for complex scenarios.
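For illustration, the row above can be built directly with Json.Encode. This is a sketch assuming Elm 0.19 and elm/json; the module and value names are made up, and only the field names and numbers come from the example row.

```elm
module BulletRow exposing (profitRow)

import Json.Encode as Encode


-- Nested arrays are straightforward to express as raw JSON values.
profitRow : Encode.Value
profitRow =
    Encode.object
        [ ( "title", Encode.string "Profit" )
        , ( "subtitle", Encode.string "%" )
        , ( "ranges", Encode.list Encode.int [ 20, 25, 30 ] )
        , ( "measures", Encode.list Encode.int [ 21, 23 ] )
        , ( "markers", Encode.list Encode.int [ 26 ] )
        ]
```

A list of such rows, for example Encode.list identity [ profitRow ], is the kind of value that would then be passed to dataFromJson.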

jwoLondon · 2019-08-21T14:26:46Z

I've added a nullValue function (backed by a NullValue constructor) as a convenience for dataFromRows specification. For more complex cases, I've updated the API docs to suggest using dataFromJson. While this does violate the 'make impossible states impossible' principle, it probably makes things a little easier overall when null data values are required in data sources.

This also now allows impute, via imNewValue, to convert any arbitrary data value to an explicit nullValue.
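A simplified sketch of the idea behind a NullValue constructor, assuming Elm 0.19 and elm/json. The DataValue type below is a stand-in written for this example, with illustrative constructor names; it is not the library's actual definition.

```elm
module NullValueSketch exposing (DataValue(..), encodeRow)

import Json.Encode as Encode


-- Simplified stand-in for the library's DataValue type.
type DataValue
    = Number Float
    | Str String
    | Boolean Bool
    | NullValue


encodeValue : DataValue -> Encode.Value
encodeValue value =
    case value of
        Number n ->
            Encode.float n

        Str s ->
            Encode.string s

        Boolean b ->
            Encode.bool b

        NullValue ->
            Encode.null


-- One row of a data source, given as (field name, value) pairs.
encodeRow : List ( String, DataValue ) -> Encode.Value
encodeRow fields =
    fields
        |> List.map (Tuple.mapSecond encodeValue)
        |> Encode.object
```

For example, Encode.encode 0 (encodeRow [ ( "x", Number 3 ), ( "y", NullValue ) ]) should give {"x":3,"y":null}. Unlike the NaN approach, the same constructor works for fields that would otherwise hold strings or booleans.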

DougBurke mentioned this issue Aug 14, 2019

how to: set Init to multiple values, not a scalar #16

Closed

jwoLondon closed this as completed Aug 21, 2019
