Exported HTML file is too large #15

emeli-dral · 2021-04-06T08:11:25Z

No description provided.

emeli-dral · 2021-04-06T08:12:17Z

Hi @emeli-dral, I tested the package on one of the datasets that I am using for a tutorial that I am preparing for my blog. The training set has a shape of (243728, 20) and the validation set has a shape of (91493, 20). I found that the generated HTML file could take up to 4 min to open in the browser when the report contains DriftTab only, and it crashed when I added another tab to the report. I am using a MacBook Pro with 16GB of RAM. The browser is Chrome. Overall I think the package will be very useful once optimised. This is something I have been looking for.

by @gakuba

emeli-dral · 2021-04-06T08:30:02Z

Thanks @gakuba, we are aware of this issue!

It happens because in the dev version we store all the data inside the HTML. Unfortunately, since we have to prioritize our resources, it will not be fixed in the dev version. But we will take care of it in the production release. There we will have the service version of our dashboard, and it will drastically reduce the time it takes to open.

For now I can suggest using a sampling strategy for your dataset, for instance random sampling or stratified sampling. Reducing the size of your dataset will reduce the size of the final HTML and the time it takes the dashboard to open in the browser. Also, we will release a JSON version of the dashboard soon; it will be significantly smaller than the HTML one, so maybe you could use it as well.
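The sampling workaround can be sketched roughly as follows. This is a minimal, standard-library illustration and not part of the package: the helper names `random_sample` and `stratified_sample`, the toy rows, and the `target` column are all assumptions made for the example.

```python
import random
from collections import defaultdict


def random_sample(rows, fraction, seed=0):
    """Return a simple random sample with roughly `fraction` of the rows."""
    if not rows:
        return []
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


def stratified_sample(rows, key, fraction, seed=0):
    """Sample the same fraction from every group so group proportions are kept."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Toy data: 1,000 rows with an imbalanced (90/10) hypothetical "target" column.
rows = [{"id": i, "target": int(i % 10 == 0)} for i in range(1000)]
small_random = random_sample(rows, fraction=0.1)
small_stratified = stratified_sample(rows, key="target", fraction=0.1)
```

Stratified sampling keeps the 90/10 split of the toy `target` column exactly, while a plain random sample only keeps it approximately. With a pandas DataFrame, `DataFrame.sample` covers the random case.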

We understand this limits how you can use the tool now, and are working hard to get to the full-scope version! Hope the suggested steps will help to test it.

gakuba · 2021-04-08T15:25:21Z

Thanks, @emeli-dral. The idea of having a dashboard will definitely be the ultimate solution that will see many people adopting this library. With a dashboard, I am thinking of something like Shapash. https://shapash.readthedocs.io/en/latest/index.html.
It makes it easier to run. Of course Shapash is for a different purpose, but having the ability to spin up a web server using the library, as Shapash does, will be the way to go in my view.

yanhong-zhao-ef · 2021-04-13T13:31:10Z

a similar library as well https://explainerdashboard.readthedocs.io/en/latest/

would be keen to see the file size drop down as currently I can't really use it on the big datasets that I have

emeli-dral · 2021-04-13T14:01:36Z

Thank you for pointing at Shapash! We share the approach of having an application and are going to implement something like Dashboard.run() for our reports.

emeli-dral · 2021-04-13T14:14:35Z

Hi @yanhong-zhao-ef , thank you for the example!
We are solving this right now. First, we will implement a short metrics summary in JSON format. Hopefully, it will make it easier to use our tool with larger datasets, automate report generation, and integrate the tool with others for visualisation or monitoring.

The second step will be implementation of the Dashboard application, so that you could work with reports in browser.

yanhong-zhao-ef · 2021-06-23T13:34:16Z

hey, @emeli-dral has this been fixed yet? Just checking back to see if I can use evidently on my datasets now :)

emeli-dral · 2021-06-24T11:17:50Z

Hi @yanhong-zhao-ef !

Thanks for checking in.

We released two features you might like:

JSON profiles. You can now generate the text summary of the report (e.g. metrics and statistical test results). This is a lightweight option and you can send the results to other dashboarding tools if you prefer. You can see how it looks for each report in the docs: https://docs.evidentlyai.com
Sampling. If you have a large dataset, you can configure random sampling or choose the n-th row. This will reduce the report size!
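To illustrate the "n-th row" idea independently of the package's own sampling configuration, here is a small standard-library sketch that keeps every n-th row while streaming a CSV, so the full file never has to sit in memory. The function name `every_nth_row` and the `id`/`value` columns are made up for the example.

```python
import csv
import io
from itertools import islice


def every_nth_row(lines, n):
    """Yield rows 0, n, 2n, ... from a CSV stream without loading it all."""
    reader = csv.DictReader(lines)
    yield from islice(reader, 0, None, n)


# Stand-in for a large CSV file on disk (hypothetical columns).
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(100))
sampled = list(every_nth_row(io.StringIO(csv_text), n=10))
```

In real use, an open file handle would replace the `io.StringIO` object. Note that systematic sampling like this can be biased if the rows have a periodic order.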

Let me know if you try, happy to help.

MattiaGallegati · 2022-05-28T14:15:19Z

Hello everyone, a little bit late to the party.

@emeli-dral
First of all, nice job, this tool looks very promising for the future.

My aim is to use Evidently to calculate Drift, Performance, etc.
The thing is, since I will try it in a "production-like" environment where the dataset can possibly be "large" (up to 10-20GB for training, with 20-30 columns, and maybe 5% of this amount for comparison), I would like to know if there are any "known" performance limits on the features provided by Evidently.

If I understood correctly, you are suggesting avoiding HTML generation since it stores all the data inside the report (which is also not completely secure) and breaks HTML visualization.
What about JSON profiles? Will they break on this amount of data?
Do you believe it will compute the drift in an "acceptable" time? (up to 1-2 hours is acceptable).

Any other advice?
Thank you.

dvirginz · 2022-06-23T06:13:17Z

Having a flag to not store the data within the report would be great:)

emeli-dral · 2023-02-08T13:33:14Z

Let me share the current state of this problem.
1/ We have split analyzers into individual metrics. Now we have metrics that need to store the raw data to be visualized in the HTML, and metrics that use aggregated data only. In practice, this means that if you create a report from metrics that use only aggregated data, the resulting file will be comparatively small even for large datasets.
2/ We will update most metrics that store the raw data with an alternative visualisation on top of the aggregated data. Users will be able to choose which type of visualization to use. And we will update all presets to use only metrics with the option to use aggregated data only. This is still a work in progress.
3/ Test Suites already have lighter HTML since no raw data is stored there.
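To make the raw-versus-aggregated difference concrete, here is a rough standard-library sketch. It is not Evidently's implementation: the `histogram` helper and the payload layout are assumptions for illustration. It compares the size of a JSON payload that embeds every value with one that embeds only bin edges and counts.

```python
import json
import random


def histogram(values, bins=20):
    """Aggregate raw values into fixed-width bin edges and counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return {"edges": edges, "counts": counts}


rng = random.Random(0)
raw = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

raw_payload = json.dumps({"values": raw})   # grows with the number of rows
agg_payload = json.dumps(histogram(raw))    # depends only on the number of bins

print(len(raw_payload), len(agg_payload))
```

The aggregated payload stays the same size no matter how many rows go in, which is why a report built only from aggregated data can stay small for large datasets.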

SamRodkey · 2023-04-03T15:07:22Z

Would you ever plan to support an alternative plotting backend such as matplotlib instead of Plotly? We have training sets with 1M+ rows and for this scale of data interactivity is not worth the performance hit and size of files...

elenasamuylova · 2023-05-08T11:45:06Z

Hi @SamRodkey,

Right now, we are working on an alternative visualization option for large datasets: it will still use Plotly, but generate aggregated plots without retaining the data inside them. This will make the HTML more lightweight.

The implementation is already in progress, but it will take some time to roll it out for all Metrics.

For now, there are a few workarounds available:

Sampling data before passing it to Reports.
Using Test Suites instead of Reports (they have more lightweight visualizations).
Generating the Evidently output as JSON / Python dictionary and visualizing it externally.

Hope any of these works for you!

elenasamuylova · 2023-05-19T16:36:55Z

Hi @gakuba, @yanhong-zhao-ef, @MattiaGallegati, @dvirginz, @SamRodkey, we just released the lightweight Evidently reports with aggregated visuals: https://github.com/evidentlyai/evidently/releases/tag/v0.3.2

By default, plots are now aggregated, which makes the resulting HTML smaller.

If you want to turn the old version on (with non-aggregated visuals), you can set the render option "raw data" as True.
Docs: https://docs.evidentlyai.com/user-guide/customization/report-data-aggregation

Let us know if this helps to address the issue!

gakuba · 2023-05-19T22:45:13Z

Thank you for the update. I will have a look at it. Cheers,


emeli-dral mentioned this issue Apr 6, 2021

A friendly comment #13

Closed

emeli-dral added the bug label (Something isn't working) Apr 6, 2021

emeli-dral added the backlog label (This request or bug is added to the backlog) Feb 8, 2023

emeli-dral added the enhancement label (New feature or request) and removed the bug label (Something isn't working) Feb 8, 2023

emeli-dral closed this as completed Sep 20, 2023
