table view struggles with too many rows #1489
See https://github.com/JuliaComputing/TableView.jl for how that could look (modulo all the WebIO weirdness, of course). Unfortunately, I'm not at all familiar with this side of the tables ecosystem. A Tables.jl-based solution seems like it would be the most generic, but it would probably pull in too many dependencies.
Yes, that is exactly so :) We need to change the implementation to a lazy one. The key there will be to not have a direct communication channel between the webview and the Julia process, but instead to route all communication through the extension; otherwise things won't work in remote scenarios.

One other thing I've been thinking about for these lazy scenarios: I think we should probably make a complete in-memory copy, inside the Julia process, of the data we want to view, and then have the webview poll that copy for the lazy updates. Otherwise we would have to deal with a situation where someone opens, say, a

It would of course be even nicer if we found some way for such a lazy view to keep working even when the Julia REPL process is killed or blocked... I think in the medium to long term one solution might be to serialize the whole table into an Arrow buffer in the Julia process, get that buffer into the extension process somehow (I think that could be made reasonably fast, even for very large tables), and then have the lazy part operate only between the webview and the full data copy in the extension. But that would require a lot more than we have, for example a low-dependency Arrow writer, which is not on the horizon as far as I can tell...
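The snapshot-and-poll idea above can be sketched roughly as follows. This is only an illustration in Python, not the extension's or the Julia side's actual code: `TableSnapshot`, `handle_request`, and the message fields (`id`, `start`, `count`) are hypothetical names. The sketch assumes the Julia process keeps a full copy of the table and answers small "give me rows `start` to `start + count`" requests that the extension relays from the webview, so the webview never needs the whole table at once.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence


@dataclass
class TableSnapshot:
    """Hypothetical full in-memory copy of a table, served in row windows."""
    columns: List[str]
    rows: List[tuple] = field(default_factory=list)

    @classmethod
    def from_rows(cls, columns: Sequence[str], rows) -> "TableSnapshot":
        # Copy eagerly so that later mutations of the source do not leak
        # into what the viewer shows.
        return cls(list(columns), [tuple(r) for r in rows])

    def window(self, start: int, count: int) -> Dict[str, Any]:
        # Clamp the requested range to the table bounds.
        start = max(0, min(start, len(self.rows)))
        end = min(len(self.rows), start + max(0, count))
        return {
            "total_rows": len(self.rows),
            "start": start,
            "columns": self.columns,
            "rows": self.rows[start:end],
        }


def handle_request(snapshots: Dict[str, TableSnapshot],
                   msg: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical message relayed by the extension from the webview:
    # {"id": <snapshot id>, "start": <first row>, "count": <rows wanted>}
    return snapshots[msg["id"]].window(msg["start"], msg["count"])


# Small demo: the viewer only ever receives one visible page of rows.
source = [[i, i * i] for i in range(10_000)]
snapshots = {"t1": TableSnapshot.from_rows(["x", "x_squared"], source)}
page = handle_request(snapshots, {"id": "t1", "start": 9_990, "count": 50})
print(page["total_rows"], len(page["rows"]))
```

Note the trade-off this implies: the copy doubles memory for the viewed table, but it decouples what is displayed from later changes to the original object.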
The current table viewer is based on the https://github.com/queryverse/TableTraits.jl interface, which pulls in almost no dependencies, and I think most Tables.jl sources should fulfill that interface as well, so that seems like the easiest way here.
Nice! I see you are already on top of that issue. There might be another difficulty in store regarding data integrity: tables could represent data stored on disk (or on the network) and loaded lazily into memory. In that case you probably should not take a snapshot of the full table, so I don't know how you could guarantee that what is displayed is consistent with the state of the (full) table when the display command was issued. This seems like a typical problem faced by multi-user database software, though; maybe there are ideas to take from there?

Personally, I'd be happy with no lazy loading at all while the REPL process is busy, plus a basic refresh button that is only available when the REPL is idle. The idea is that unless you have just pressed that button (and you know that no one else is modifying the data source, if it is stored on disk or on the network), you can't be sure that what you see is the current content. The cherry on top would be some visual indication whenever the displayed table is known to be dirty.

Anyway, thanks for the great work, however long it takes!
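The refresh-only-when-idle behaviour with a dirty indicator suggested above could look roughly like this. It is a hedged Python sketch, not anything in the extension: `TrackedSource`, `TableView`, and the `repl_is_idle` callback are hypothetical, and the staleness check assumes the source bumps a version counter on every change. As noted above, data modified elsewhere on disk or over the network would not be detected this way.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TrackedSource:
    """Hypothetical data source that bumps a version counter on each mutation."""
    rows: List[tuple]
    version: int = 0

    def append(self, row: tuple) -> None:
        self.rows.append(row)
        self.version += 1


class TableView:
    """Shows a snapshot, reloads only when the REPL is idle, flags staleness."""

    def __init__(self, source: TrackedSource, repl_is_idle: Callable[[], bool]):
        self.source = source
        self.repl_is_idle = repl_is_idle  # hypothetical callback
        self.rows: List[tuple] = []
        self.seen_version = -1  # -1 means "never loaded", so the view starts dirty
        self.refresh()

    def refresh_enabled(self) -> bool:
        # The refresh button is only available while the REPL is idle.
        return self.repl_is_idle()

    def refresh(self) -> bool:
        # Never read the source while the REPL is busy; report whether we reloaded.
        if not self.refresh_enabled():
            return False
        self.rows = list(self.source.rows)
        self.seen_version = self.source.version
        return True

    def is_dirty(self) -> bool:
        # Known-dirty only for changes the source itself reports.
        return self.source.version != self.seen_version
```

The design choice here is to never load anything while the REPL is busy, trading freshness for never showing a half-updated table; the dirty flag is what a UI would use to show a visual staleness indicator.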
Should be fixed in the next version of the extension.
The table view does not seem to open when trying to view a large table (e.g. millions of rows and a few dozen columns). It opens progressively more slowly as the number of rows increases, until at roughly 1 million rows it seems to hang indefinitely (or maybe I am just not patient enough).
The reason is likely that the extension tries to load the whole table at once, when a lazier solution would be preferable for very large arrays/tables.