how to: deal with explicitly setting null values in data? #7

Closed
DougBurke opened this issue Jul 27, 2019 · 4 comments

@DougBurke
Contributor

This is from #6, and is perhaps less important than the other items from that ticket, since I think you can work around it by providing the data directly as JSON.

*) Line Chart with Markers and Invalid Values

To be able to represent the following data, I experimentally added a NullValue constructor to DataValue, but I am not 100% happy with this choice:

```json
"values": [
  {
    "x": 1,
    "y": 10
  },
  {
    "x": 2,
    "y": 30
  },
  {
    "x": 3,
    "y": null
  },

...
```

@jwoLondon
Member

jwoLondon commented Jul 27, 2019

I guess we have several choices here in addition to direct JSON encoding.

  1. In Elm at least, if you supply something that evaluates to an invalid number, such as 0 / 0, it will be handled in the same way as null:

```elm
data =
    dataFromColumns []
        << dataColumn "x" (nums [ 1, 2, 3 ])
        << dataColumn "y" (nums [ 10, 30, 0 / 0 ])
```

While this works, it feels ugly to me, and I'm not sure how Haskell would handle it. Also, litvis doesn't handle expressions that generate invalid numbers like this. So I agree we probably need something else.

  2. We could hide the ugliness of the above (while still using it) with additional functions such as:

```elm
nullNum : Float
nullNum =
    0 / 0
```

Again, I don't like this solution much, but it is at least an option with the current implementation.

  3. We could create a parallel series of functions, maybeNums, maybeStrs etc., taking arguments of type List (Maybe Float), List (Maybe String) etc. This would allow something like:

```elm
data =
    dataFromColumns []
        << dataColumn "x" (nums [ 1, 2, 3 ])
        << dataColumn "y" (maybeNums [ Just 10, Just 30, Nothing ])
```

This would be the most idiomatic Elm (and presumably easily translatable into Haskell), but it is rather verbose. The question is whether the benefits make it worthwhile, given that we can always achieve the same result with direct JSON encoding.
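For comparison, here is a minimal sketch in plain Elm (elm/json only, independent of elm-vegalite) of the JSON that the first and third options boil down to. The helper names encodeMaybeNums and encodeNaNColumn are hypothetical. The 0 / 0 route works because Elm's Json.Encode hands values to JavaScript's JSON.stringify, which writes NaN as null; the Maybe route gets the same output without relying on that quirk.

```elm
module MaybeNums exposing (encodeMaybeNums, encodeNaNColumn)

import Json.Encode as Encode


-- Hypothetical helper (not part of elm-vegalite):
-- Just x becomes the JSON number x, Nothing becomes JSON null.
encodeMaybeNums : List (Maybe Float) -> Encode.Value
encodeMaybeNums =
    Encode.list (Maybe.map Encode.float >> Maybe.withDefault Encode.null)


-- The 0 / 0 trick: NaN is an ordinary Float in Elm, and it only turns
-- into null when JSON.stringify serializes it.
encodeNaNColumn : List Float -> Encode.Value
encodeNaNColumn =
    Encode.list Encode.float
```

Both `Encode.encode 0 (encodeMaybeNums [ Just 10, Just 30, Nothing ])` and `Encode.encode 0 (encodeNaNColumn [ 10, 30, 0 / 0 ])` give `[10,30,null]`, but only the Maybe version says so in its type.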

@DougBurke
Contributor Author

In Haskell, 0 / 0 evaluates to NaN, but I don't really like using it as an out-of-band value: it doesn't work for non-numeric types (strings, date-times, and booleans in Vega-Lite land), and the idiomatic Haskell approach would be to use the Maybe type.

For DataValue, adding an explicit NullValue (or some other appropriate name) seems to work, but I haven't checked against the spec to see if it makes sense in all cases, or only with dataRow.
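A minimal sketch of that idea as a standalone Elm module using elm/json. The DataValue type and its constructors here are a hypothetical stand-in, not elm-vegalite's actual definition; the point is that an explicit NullValue constructor covers string and boolean fields too, which the 0 / 0 trick cannot.

```elm
module NullableRow exposing (DataValue(..), encodeRow)

import Json.Encode as Encode


-- Hypothetical stand-in for a DataValue type with an explicit
-- NullValue constructor (a sketch, not elm-vegalite's definition).
type DataValue
    = Number Float
    | Str String
    | Boolean Bool
    | NullValue


-- Each constructor maps to the matching JSON value; NullValue maps to null.
encodeDataValue : DataValue -> Encode.Value
encodeDataValue dv =
    case dv of
        Number x ->
            Encode.float x

        Str s ->
            Encode.string s

        Boolean b ->
            Encode.bool b

        NullValue ->
            Encode.null


-- Encode one data row as a JSON object, keeping the field order given.
encodeRow : List ( String, DataValue ) -> Encode.Value
encodeRow fields =
    Encode.object (List.map (Tuple.mapSecond encodeDataValue) fields)
```

For example, `Encode.encode 0 (encodeRow [ ( "x", Number 3 ), ( "y", NullValue ) ])` gives `{"x":3,"y":null}`, and a `( "label", NullValue )` field works the same way for what would otherwise be a string column.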

For dataColumn, I think the maybeXXX-style approach looks cleanest, but, as you say, it is rather verbose.

It may well be an issue that's not worth addressing at this time. I don't have any feel for how people are using the API when they are creating the data: are they just using dataFromJson, or the dataFromRows/dataFromColumns approach? Apart from making my life simpler when comparing the output to the specifications ;-), the one "advantage" I can see to the second option is that it is a bit more "type safe", in the sense that you know you are giving the API something with the correct structure, rather than an opaque JSON value. However, I'm not sure it's really that much of an advantage, as I can still create data that doesn't make sense to plot with the dataFromRows/dataFromColumns options.

@DougBurke
Contributor Author

Here's another example that I don't think we can encode with the current DataValue/dataRow setup: https://vega.github.io/vega-lite/examples/facet_bullet.html, namely rows like:

```json
{"title":"Profit", "subtitle":"%", "ranges":[20,25,30],"measures":[21,23],"markers":[26]},
```

I ended up using dataFromJson to encode this.

While you could extend DataValue with a constructor that takes DataValues, I'm not sure that's a sensible change.

So this does suggest that it's not worth changing the current setup; instead, we could just point users to dataFromJson for complex scenarios.
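For reference, a sketch of building that row directly with elm/json, producing a value that could then be handed to dataFromJson (check the elm-vegalite docs for its exact signature). The bulletRow helper is hypothetical; only the field names and numbers come from the Vega-Lite example.

```elm
module BulletData exposing (bulletRows)

import Json.Encode as Encode


-- Hypothetical helper: one row of the facet_bullet data. Three of the
-- fields hold lists of numbers, which a single-valued DataValue cannot express.
bulletRow : String -> String -> List Float -> List Float -> List Float -> Encode.Value
bulletRow title subtitle ranges measures markers =
    Encode.object
        [ ( "title", Encode.string title )
        , ( "subtitle", Encode.string subtitle )
        , ( "ranges", Encode.list Encode.float ranges )
        , ( "measures", Encode.list Encode.float measures )
        , ( "markers", Encode.list Encode.float markers )
        ]


-- The rows as a JSON array (only the "Profit" row is shown here).
bulletRows : Encode.Value
bulletRows =
    Encode.list identity
        [ bulletRow "Profit" "%" [ 20, 25, 30 ] [ 21, 23 ] [ 26 ] ]
```

`Encode.encode 0 bulletRows` gives `[{"title":"Profit","subtitle":"%","ranges":[20,25,30],"measures":[21,23],"markers":[26]}]`, matching the row from the example.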

@jwoLondon
Member

I've added a nullValue function (backed by a NullValue constructor) as a convenience when specifying data with dataFromRows. For more complex cases, I've updated the API docs to suggest using dataFromJson. While this does violate the 'make impossible states impossible' principle, it probably makes things a little easier overall when null data values are required in data sources.

This also now allows impute, via imNewValue, to convert any arbitrary data value to an explicit nullValue.
