report not showing when using a bit more data #47
Comments
Hey @rmminusrslash, The report is large because the tool stores all the data necessary to generate interactive plots directly inside the HTML. We plan to fix it when we create a service version of the tool (where we decouple the data storage and the browser-based web service). For now there are two workarounds:
We understand this limits how you can use the tool now, and are working hard to get to the more full-featured version!
Hey @emeli-dral, ah, I probably should have been clearer about what I was asking. I tried sampling once I figured out the root cause, and up to 10K datapoints worked. Would it make sense to
The current behavior of failing silently might not be ideal until you release the full version (unless you expect people to try the tool mostly with toy data).
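For readers hitting the same problem, the sampling workaround mentioned above can be sketched with plain pandas. This is a minimal sketch, not part of the tool's API: the helper name `sample_for_report`, the `MAX_ROWS` cap, and the toy dataframes are hypothetical, and the 10K figure is only the size reported to work in this thread, so the right cap for your machine may differ.

```python
import pandas as pd

# Assumption: 10K rows is the size reported to work in this thread,
# not a limit defined by the tool itself.
MAX_ROWS = 10_000


def sample_for_report(df: pd.DataFrame, max_rows: int = MAX_ROWS, seed: int = 0) -> pd.DataFrame:
    """Return df unchanged if it is small enough, else a reproducible random sample."""
    if len(df) <= max_rows:
        return df
    # random_state makes the sample repeatable between runs
    return df.sample(n=max_rows, random_state=seed)


# Toy stand-ins for the two dataframes being compared (hypothetical data).
reference = pd.DataFrame({"feature": range(150_000)})
current = pd.DataFrame({"feature": range(150_000)})

reference_sample = sample_for_report(reference)
current_sample = sample_for_report(current)
print(len(reference_sample), len(current_sample))
```

The sampled dataframes would then be passed to the report in place of the full ones. Sampling each dataframe independently keeps the sketch simple; if rows in the two dataframes must stay aligned, sample by a shared index instead.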
Hey @rmminusrslash, we thought about adding an error message based on data size. But the limit would depend on the user's infrastructure, especially if the tool is used locally, so it would be hard to set a universal threshold for when sampling should be applied. As a priority, we are also working right now to speed up the UI, which should solve some of the cases where reports are too large to display. Hopefully, it will help a lot 🤞 We are thinking about adding a flag later that the user can set on their own ("large dataset"), which would then generate a variation of the report that is best suited for larger datasets. It would include not only sampling but also different aggregated views for some parts of the report. We agree with your comment about making the limitation for large datasets and the sampling option even clearer for Jupyter notebook users: we have already added this to the Quick-start part of the docs.
Reports now do not use any raw data plots by default, which reduces report size significantly.
Hey,
I wanted to run against a production dataset of small-mid size:
65 columns, 150K points in each dataframe.
If I reduce the dataset to one feature, the report shows. If I use all features, the report grows from 16 MB to 600 MB and does not display (neither saved nor in Jupyter).