table view struggles with too many rows #1489

kragol · 2020-07-15T22:46:46Z

The table view does not seem to open when trying to view a large table (e.g. millions of rows with a few dozen columns). It seems progressively slower to open when increasing the number of rows until roughly 1 million where it seems to hang indefinitely (or maybe I am just not patient enough).

The reason is likely that the extension is trying to load the table as a whole when a somewhat lazy solution should be preferred for very large arrays/tables.

pfitzseb · 2020-07-15T23:22:25Z

See https://github.com/JuliaComputing/TableView.jl for how that could look (modulo all the WebIO weirdness, of course). I'm not at all familiar with this side of the tables ecosystem though, unfortunately. A Tables.jl based solution seems like it would be the most generic, but that would probably pull in too many dependencies.

davidanthoff · 2020-07-15T23:29:54Z

Yes, that is exactly so :) We need to change the implementation to a lazy one. The key there will be to not have a direct communication channel between the webview and the Julia process, but instead have any communication hop via the extension, otherwise things won't work in the remote scenarios.
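
To make the "hop via the extension" idea concrete, here is a minimal TypeScript sketch, assuming the extension holds one link to the webview and one to the Julia process and simply forwards messages between them. Every name here (`ExtensionRelay`, `Message`, `Send`, `fromWebview`, `fromJulia`) is hypothetical and only illustrates the shape of the idea; none of it is the extension's actual API.

```typescript
// Hypothetical message shape: a row request from the webview, or rows coming back.
type Message = { id: number; kind: "getRows" | "rows"; payload: unknown };
type Send = (msg: Message) => void;

// The extension sits in the middle. The webview never talks to Julia directly,
// so the same path works when the Julia process runs on a remote machine.
class ExtensionRelay {
    private readonly toJulia: Send;
    private readonly toWebview: Send;
    // Ids of requests that were forwarded to Julia and not yet answered.
    private readonly pending = new Set<number>();

    constructor(toJulia: Send, toWebview: Send) {
        this.toJulia = toJulia;
        this.toWebview = toWebview;
    }

    // Called when the webview posts a message to the extension.
    fromWebview(msg: Message): void {
        this.pending.add(msg.id);
        this.toJulia(msg);
    }

    // Called when the Julia process answers; replies nobody asked for are dropped.
    fromJulia(msg: Message): void {
        if (!this.pending.delete(msg.id)) {
            return;
        }
        this.toWebview(msg);
    }
}
```

In a real extension the two `Send` functions would presumably wrap whatever transports are already in place for the webview and for the Julia process; the sketch leaves them abstract on purpose.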

One other thing I've been thinking about for these lazy scenarios: I think we should probably make a complete copy of the data that we want to view in-memory in the Julia process and then have the web view poll that copy for the lazy updates. Otherwise we would have to deal with a situation where someone opens say a DataFrame, and then edits the content of the data structure, while a grid of that data structure is visible, which would be a complete nightmare to handle in terms of race conditions. So the idea would be that if one calls vscodedisplay(x) on something, it will display a snapshot of x taken at that moment, always.
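
A rough sketch of the snapshot-plus-polling idea, written in TypeScript purely for illustration (the real copy would live in the Julia process). It assumes rows are arrays of primitive cells; `TableSnapshot`, `PageRequest`, `PageResponse` and `getPage` are made-up names, not anything that exists in the extension.

```typescript
type Row = ReadonlyArray<unknown>;

// Hypothetical request/response shapes; endRow is exclusive.
interface PageRequest { startRow: number; endRow: number; }
interface PageResponse { rows: Row[]; totalRows: number; }

class TableSnapshot {
    private readonly rows: Row[];

    constructor(source: ReadonlyArray<Row>) {
        // Copy each row array at display time so later edits to `source`
        // do not show up in the grid. Cells are assumed to be primitives;
        // nested mutable objects would need a deeper copy.
        this.rows = source.map((row) => [...row]);
    }

    // Return only the requested window, clamped to the table bounds.
    getPage(req: PageRequest): PageResponse {
        const start = Math.max(0, req.startRow);
        const end = Math.min(this.rows.length, req.endRow);
        return { rows: this.rows.slice(start, end), totalRows: this.rows.length };
    }
}
```

The grid would then ask for a window such as `getPage({ startRow: 0, endRow: 100 })` as the user scrolls, so the cost of opening the view no longer depends on the total row count, only on the copy itself.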

It would of course be even nicer if we found some way that such a lazy view would even work if the Julia REPL process is killed or blocked... I think medium to long term one solution might be that we serialize the whole table into an arrow buffer in the Julia process, get that somehow into the extension process (I think that could be made semi fast, even for very large tables) and then have the lazy part just operate between the webview and the full data copy in the extension. But that would require a lot more stuff than we have, for example a low dependency arrow writer, which is not on the horizon, as far as I can tell...

> I'm not at all familiar with this side of the tables ecosystem though, unfortunately. A Tables.jl based solution seems like it would be the most generic, but that would probably pull in too many dependencies.

The current table viewer is based on the https://github.com/queryverse/TableTraits.jl interface, which brings in almost no dependencies, and I think most Tables.jl sources should fulfill that interface as well, so that seems the easiest way here, I think.

kragol · 2020-07-16T00:37:30Z

Nice! I see you are already on top of that issue.

There might be another difficulty in store about data integrity: tables could represent data stored on disk (or network) and lazily loaded into memory. In that case, you probably should not take a snapshot of the full table, so I don't know how you could guarantee that what is displayed is consistent with the state of the (full) table when the display command was issued. Besides, it seems like a typical problem that is faced by multi-user database software. Maybe there are ideas to take from there?

Personally, I'd be happy with not lazy loading any data while the REPL process is busy and a basic refresh button that is only available when the REPL is idle. The idea would be that unless you just pressed that button (and you know that no one else is playing with the data source if it is stored on disk/network), you can't guarantee that what you see is the current content. Cherry on the cake would be some visual indication whenever the displayed table is known to be dirty.
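
Roughly what I have in mind, as a small TypeScript sketch. The names (`TableViewState`, `setReplBusy`, `refresh`) are invented for illustration, and treating "the REPL ran something" as "the table may be dirty" is my own simplifying assumption, not something the extension does.

```typescript
class TableViewState {
    private replBusy = false;
    private dirty = false;

    // Called whenever the REPL starts or finishes evaluating code.
    setReplBusy(busy: boolean): void {
        this.replBusy = busy;
        // Assumption: any code run in the REPL might have changed the data,
        // so the displayed table is flagged as possibly stale.
        if (busy) {
            this.dirty = true;
        }
    }

    // The refresh button is only enabled while the REPL is idle.
    get canRefresh(): boolean {
        return !this.replBusy;
    }

    // Drives the visual "this view may be out of date" indicator.
    get isDirty(): boolean {
        return this.dirty;
    }

    // Reload the data if allowed; returns whether the reload happened.
    refresh(reload: () => void): boolean {
        if (this.replBusy) {
            return false;
        }
        reload();
        this.dirty = false;
        return true;
    }
}
```

This does not cover changes made by someone else to a disk or network source; for those the view can only be trusted right after a refresh.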

Anyway, thanks for the great work, whatever time it takes!

pfitzseb · 2021-11-04T18:49:26Z

Should be fixed in the next version of the extension.

davidanthoff added the enhancement label Jul 15, 2020

davidanthoff added this to the Backlog milestone Jul 15, 2020

pfitzseb closed this as completed Nov 4, 2021
