h2o.pd_plot efficiency improvements #7020

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 4 comments

@exalate-issue-sync

I was trying to run `h2o.pd_plot` on some fairly big data (`newdata` was about 45 million rows) and was encountering long runtimes and very large ggplot objects. After reviewing the code, I have a few suggestions:

It appears that the `rug_data[["text"]]` variable is not used, so maybe that step could be removed: https://github.com/h2oai/h2o-3/blob/5cd419cd7d3fd55c2828693639f64dce396713d2/h2o-r/h2o-package/R/explain.R#L1487

This line brings the data into R using `as.data.frame`: https://github.com/h2oai/h2o-3/blob/5cd419cd7d3fd55c2828693639f64dce396713d2/h2o-r/h2o-package/R/explain.R#L1486 but the data is brought in again on this line: https://github.com/h2oai/h2o-3/blob/5cd419cd7d3fd55c2828693639f64dce396713d2/h2o-r/h2o-package/R/explain.R#L1497 I think the first object could be reused.

The rug part of the plot seems to be very expensive to produce. Could it be made optional, or perhaps only displayed under certain circumstances, for example when `row_index > -1`? I found that the rug component alone increased my plot size from 179 MB to 712 MB, and the saved .pdf version of the plot increased from 7 KB to 59,700 KB. With millions of rows, the rug ends up being a virtually solid black bar.
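To illustrate the scaling problem, here is a minimal sketch in plain `ggplot2` of capping the rug at a fixed number of rows. It is not the `h2o.pd_plot` implementation: the data is simulated, and `max_rug_rows`, `rug_df`, and the made-up curve in `pd` are hypothetical names introduced only for this example.

```r
library(ggplot2)

set.seed(42)

# Simulated stand-in for one feature column of a large `newdata`
# (1e6 rows keeps the demo quick; the report had about 45 million).
x <- rnorm(1e6)

# Made-up partial dependence curve, only so there is something to draw.
pd <- data.frame(x = seq(-3, 3, length.out = 20))
pd$mean_response <- plogis(pd$x)

# Hypothetical cap: the rug layer stores one row per tick mark,
# so sampling bounds the size of that layer regardless of input size.
max_rug_rows <- 1000
keep <- sample.int(length(x), min(length(x), max_rug_rows))
rug_df <- data.frame(x = x[keep])

p <- ggplot(pd, aes(x = x, y = mean_response)) +
  geom_line() +
  geom_rug(data = rug_df, aes(x = x), inherit.aes = FALSE,
           sides = "b", alpha = 0.2)
```

A sampled rug still shows where the data is dense, while the layer holds at most `max_rug_rows` rows instead of one row per observation.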

Thanks!

@exalate-issue-sync
Author

Tomas Fryda commented: Thank you for the investigation and this report!

In the meantime, you might want to use https://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.partialPlot.html which doesn’t produce the rug part of the plot and is more customizable (but doesn’t use `ggplot2`).
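A minimal sketch of that workaround, assuming a local H2O cluster can be started and using the built-in `iris` data as a small stand-in for the large frame in the report. The model, the column choice, and the variable names are illustrative only.

```r
library(h2o)

h2o.init()
h2o.no_progress()

# Small stand-in dataset; the original report used about 45 million rows.
train <- as.h2o(iris)

# Any supported model works here; a small GBM keeps the example fast.
model <- h2o.gbm(
  x = c("Sepal.Length", "Sepal.Width", "Petal.Width"),
  y = "Petal.Length",
  training_frame = train,
  ntrees = 10,
  seed = 1
)

# plot = FALSE returns the partial dependence table without drawing it,
# so no plot object (and no rug) is built on the R side.
pdp <- h2o.partialPlot(model, train, cols = "Sepal.Width",
                       nbins = 20, plot = FALSE)
print(pdp)
```

The model and frame are passed positionally because the name of the second argument has differed between H2O releases.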

@exalate-issue-sync
Author

Tomas Fryda commented: I forgot to mention this earlier about `rug_data[["text"]]`: it can be used, depending on how you decide to plot the resulting object. For example, with small data and `plotly` you can do something like `plotly::ggplotly(h2o.pd_plot(...))`, which makes the plot interactive and uses the `text` on hover. So `rug_data[["text"]]` will still be generated in the new version, but only if you don’t turn off the rug.

This change should end up in the next fix release - 3.36.1.2.
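The two usages discussed above can be sketched as follows, assuming a local H2O cluster, the `plotly` package, and a small stand-in dataset. The thread itself does not name the argument that turns the rug off; `show_rug` is an assumption based on current `h2o.pd_plot` documentation and may not exist in older releases.

```r
library(h2o)

h2o.init()
h2o.no_progress()

# Small data, where the rug and its hover text are cheap to build.
train <- as.h2o(iris)
model <- h2o.gbm(
  x = c("Sepal.Length", "Sepal.Width", "Petal.Width"),
  y = "Petal.Length",
  training_frame = train,
  ntrees = 10,
  seed = 1
)

# Default plot, including the rug; converting it with plotly makes it
# interactive, which is where the rug's `text` is shown on hover.
p <- h2o.pd_plot(model, train, column = "Sepal.Width")
interactive_plot <- plotly::ggplotly(p)

# Assumption: `show_rug` is the switch added by the fix. With the rug
# off, the per-row rug data should not be needed at all.
p_no_rug <- h2o.pd_plot(model, train, column = "Sepal.Width",
                        show_rug = FALSE)
```

For data on the scale of the original report, the second form is the relevant one; the interactive form is only practical on small frames.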

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Details

Jira Issue: PUBDEV-8673
Assignee: Tomas Fryda
Reporter: Paul Donnelly
State: Resolved
Fix Version: 3.36.1.2
Attachments: N/A
Development PRs: Available

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

Linked PRs from JIRA

#6181

@h2o-ops h2o-ops closed this as completed May 14, 2023