
Allow inserting named dataset so it could be referenced by name #106

Open
lucywang000 opened this issue Apr 3, 2020 · 2 comments

lucywang000 commented Apr 3, 2020

problem

When I plot a fairly large dataset with oz, the delay from sending the data to the browser with oz/view! until it is rendered can sometimes be 3~5 seconds, which is not good.

analysis

The time includes:

  • json/transit serialization on the server
  • network latency
  • json/transit deserialization on the client
  • canvas rendering

If we could cache the data on the client side, the first three costs could be avoided entirely.

possible solution

According to the vega docs, vega supports three types of data:

  1. inline JSON objects
  2. URLs pointing at data in a supported format, e.g. csv/json/etc.
  3. named data
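For reference, the three forms look roughly like this as spec fragments (a minimal sketch; the field names follow the Vega/Vega-Lite data docs, but the concrete file name and values are made up):

```javascript
// 1. Inline JSON values
const inlineSpec = {
  data: { values: [{ x: 1, y: 2 }, { x: 2, y: 3 }] },
};

// 2. A URL pointing at data in a supported format (csv/json/...)
const urlSpec = {
  data: { url: "data/points.csv", format: { type: "csv" } },
};

// 3. A named data source, to be filled in later through the view API
const namedSpec = {
  data: { name: "myData" },
};
```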

The first two work out of the box; the last one needs to be added through the vega view instance. For instance, the doc recommends doing this:

vegaEmbed('#vis', spec).then(res =>
  res.view
    .insert('myData', [
      /* some data array */
    ])
    .run()
);

I think it'd be great if oz could support named data sources. That would require:

  1. The vega view instance must be exposed somehow (this is also required by Ability to stream or update data in a viz #95)
  2. Websocket commands must be added to the server/client so the server can send data+name to the client.
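The second step could be sketched as a client-side message handler that feeds a named dataset into the current view. This is purely hypothetical: the command name ("insert-data") and message shape are not an existing Oz protocol, though `view.insert(name, tuples)` and `view.run()` are part of the real Vega View API:

```javascript
// Hypothetical websocket command handler (sketch, not Oz's actual protocol).
// Expects messages like { command: "insert-data", name: "myData", values: [...] }.
function handleMessage(view, message) {
  if (message.command === "insert-data") {
    // Vega View API: insert tuples into the named data source, then re-run
    // the dataflow so the change becomes visible.
    view.insert(message.name, message.values).run();
  }
  return view;
}
```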

@metasoarous Does this sound good to you? If so I can start work on a pr for it.

@lucywang000 (Author)

I played with it a bit, but it looks like the view instance is recreated each time the spec is changed, so the solution I proposed above may not work.

@metasoarous (Owner)

Hi @lucywang000. Thanks so much for submitting this issue, and sorry for taking a while to get back to you.

I'm very happy that you've been thinking about this problem! I have as well, though not in any focused way (occasional musing is perhaps a better description).

I think the really slick thing to do here would be to traverse the specs, find where the data elements are (keeping in mind that layers might have their own data, and that hiccup docs might have multiple specs), check whether they've changed since the last time the spec was updated, and do something smart that avoids resending the data (since what we're really talking about here is how to tweak the visualization without having to resend all the data). This may be super challenging to orchestrate, but perhaps not impossible, and it would make usage with larger datasets much nicer, as you suggest.
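The traversal idea above could be sketched roughly like this: walk a Vega-Lite-style spec, collect every data entry (top level and per-layer), and report which entries differ from the previous spec. The `data`/`layer` property names follow Vega-Lite, but the structural-diff approach itself is just one possible take, not Oz's implementation:

```javascript
// Collect all data entries in a spec, including those nested in layers.
function collectData(spec, found = []) {
  if (!spec || typeof spec !== "object") return found;
  if (spec.data) found.push(spec.data);
  for (const sub of spec.layer || []) collectData(sub, found);
  return found;
}

// Return the data entries of nextSpec that are not structurally identical
// to any data entry of prevSpec; only these would need to be resent.
function changedData(prevSpec, nextSpec) {
  const prev = collectData(prevSpec).map((d) => JSON.stringify(d));
  return collectData(nextSpec).filter((d) => !prev.includes(JSON.stringify(d)));
}
```

A real version would also have to handle hiccup docs containing multiple specs and the other nesting forms (facet, concat, etc.), but the shape of the problem is the same.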

Something a little less automated, and maybe closer to what you had in mind, would be to more explicitly allow datasets to be specified separately from the rest of the specification, in such a way that the data doesn't have to be resent when the visualizations update. This is closely related to what I had in mind for #9, but could maybe look different as well (as you mention, using a separate command for updating the data from the one for updating the view(s) of it).

A few things to suss out, though, before we spend too much time on this:

  • In these cases, is sending the data the bulk of the time or is it rerendering?
  • If the problem is render-time, would something like the web-gl renderer help us?

I think you're also right that using the view object won't work as that gets recreated. But if sending the data is the problem, not re-rendering, then I think it might be possible to store datasets in a separate r/atom, and feed them into the visualizations from there.
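The "separate store" idea might look something like the following in plain JS (the real Oz client would presumably use a reagent atom instead of a mutable object; the function names here are made up): datasets live in a cache keyed by name, and are spliced into named data slots just before (re)embedding, so a spec update never has to carry the data itself.

```javascript
// Hypothetical client-side dataset cache (sketch; Oz would use an r/atom).
const datasetCache = {};

// Called when the server pushes a dataset over the websocket.
function registerDataset(name, values) {
  datasetCache[name] = values;
}

// Before embedding, fill any named data slot from the cache, leaving the
// incoming spec untouched.
function resolveSpec(spec) {
  if (spec.data && spec.data.name && datasetCache[spec.data.name]) {
    return {
      ...spec,
      data: { name: spec.data.name, values: datasetCache[spec.data.name] },
    };
  }
  return spec;
}
```

Because the cache outlives any individual view instance, it would sidestep the problem that the view gets recreated on every spec change.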

For posterity's sake, this also seems to relate to #26 somewhat.

Please let me know if you have additional thoughts on this. Happy to bounce ideas around further. This isn't super high priority for me at the moment, but if you're still keen to crack this nut, I'd love the help!

Either way, thanks again for thinking about this!
