Exported HTML file is too large #15
Comments
Hi @emeli-dral, I tested the package on one of the datasets that I am using for a tutorial I am preparing for my blog. The training set has a shape of (243728, 20) and the validation set has a shape of (91493, 20). The generated HTML file could take up to 4 minutes to open in the browser when the report contained only DriftTab, and it crashed when I added another tab to the report. I am using a MacBook Pro with 16 GB of RAM, and the browser is Chrome. Overall, I think the package will be very useful once optimised; this is something I have been looking for. by @gakuba |
Thanks @gakuba, we are aware of this issue! It happens because the dev version stores all the data inside the HTML. Unfortunately, since we have to prioritize our resources, it will not be fixed in the dev version, but we will take care of it in the production release. There we will have a service version of our dashboard, which will drastically reduce the opening time. For now, I suggest applying a sampling strategy to your dataset, for instance random sampling or stratified sampling. Reducing the size of your dataset will reduce both the size of the final HTML and the time it takes the dashboard to open in the browser. We will also release a JSON version of the dashboard soon. It will be significantly smaller than the HTML one, so maybe you could use it as well. We understand this limits how you can use the tool now, and we are working hard to get to the full-scope version! Hope the suggested steps will help you test it. |
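The sampling suggestion above can be sketched as follows. This is a minimal, standard-library-only illustration and not part of Evidently: the helper names `random_sample` and `stratified_sample`, the toy rows, and the `target` column are all assumptions made for the example.

```python
import random
from collections import defaultdict


def random_sample(rows, fraction, seed=0):
    """Return a simple random sample holding about `fraction` of the rows."""
    rng = random.Random(seed)
    k = min(len(rows), max(1, round(len(rows) * fraction)))
    return rng.sample(rows, k)


def stratified_sample(rows, key, fraction, seed=0):
    """Sample `fraction` of the rows inside each group defined by `key`."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sampled = []
    for group_rows in groups.values():
        # Keep at least one row per group so rare classes are not dropped.
        k = min(len(group_rows), max(1, round(len(group_rows) * fraction)))
        sampled.extend(rng.sample(group_rows, k))
    return sampled


# Hypothetical toy data: 1,000 rows with an imbalanced "target" column.
rows = [{"feature": i, "target": int(i % 10 == 0)} for i in range(1000)]

small_random = random_sample(rows, fraction=0.1)
small_stratified = stratified_sample(rows, key="target", fraction=0.1)
```

Stratified sampling keeps the class proportions of the full dataset, which matters when the report compares target or prediction distributions. With pandas DataFrames, `DataFrame.sample(frac=...)` and `DataFrame.groupby(...).sample(frac=...)` should give the same effect in one call each.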
Thanks, @emeli-dral. The idea of having a dashboard will definitely be the ultimate solution that will see many people adopting this library. With a dashboard, I am thinking of something like Shapash: https://shapash.readthedocs.io/en/latest/index.html |
https://explainerdashboard.readthedocs.io/en/latest/ is a similar library as well. I would be keen to see the file size drop, as currently I can't really use it on the big datasets that I have. |
Thank you for pointing at Shapash! We share the approach of having an application and are going to implement something like |
Hi @yanhong-zhao-ef, thank you for the example! The second step will be the implementation of the Dashboard application, so that you can work with reports in the browser. |
hey, @emeli-dral has this been fixed yet? Just checking back to see if I can use evidently on my datasets now :) |
Hi @yanhong-zhao-ef ! Thanks for checking in. We released two features you might like:
Let me know if you try, happy to help. |
Hello everyone, I'm a little late to the party. @emeli-dral, my aim is to use Evidently to calculate Drift, Performance, etc. If I understood correctly, you are suggesting avoiding HTML generation, since it stores all the data inside the report (which is also not completely secure) and will break the HTML visualization. Any other advice? |
Having a flag to not store the data within the report would be great:) |
Let me share the current state of this problem. |
Would you ever plan to support an alternative plotting backend such as matplotlib instead of Plotly? We have training sets with 1M+ rows and for this scale of data interactivity is not worth the performance hit and size of files... |
Hi @SamRodkey, Right now, we are working on an alternative visualization option for large datasets: it will still use Plotly, but generate aggregated plots without retaining the data inside them. This will make the HTML more lightweight. The implementation is already in progress, but it will take some time to roll it out for all Metrics. For now, there are a few workarounds available:
Hope any of these works for you! |
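To illustrate why aggregated plots make the HTML lighter, here is a minimal sketch comparing the size of a payload that embeds every raw point with one that embeds only histogram counts. It is not Evidently's implementation: the `histogram` helper, the bin count, and the synthetic column are assumptions chosen for the example.

```python
import json
import random


def histogram(values, bins=20):
    """Aggregate values into `bins` equal-width buckets; return edges and counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant column
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return {"edges": edges, "counts": counts}


rng = random.Random(0)
values = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # hypothetical numeric column

raw_payload = json.dumps({"x": values})             # every point is embedded
aggregated_payload = json.dumps(histogram(values))  # only 20 counts and 21 edges

print(len(raw_payload), len(aggregated_payload))
```

The aggregated payload has a fixed size regardless of the number of rows, while the raw payload grows linearly with the dataset. The trade-off is that individual points can no longer be inspected interactively.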
Hi @gakuba, @yanhong-zhao-ef, @MattiaGallegati, @dvirginz, @SamRodkey, we just released the lightweight Evidently reports with aggregated visuals: https://github.com/evidentlyai/evidently/releases/tag/v0.3.2 By default, plots are now aggregated, which makes the resulting HTML smaller. If you want to turn the old version on (with non-aggregated visuals), you can set the render option "raw data" to True. Docs: https://docs.evidentlyai.com/user-guide/customization/report-data-aggregation Let us know if this helps to address the issue! |
Thank you for the update. I will have a look at it.
Cheers,
|