I was trying to run {{h2o.pd_plot}} on some pretty big data ({{newdata}} was about 45 million rows) and was running into long runtimes and very large ggplot objects. After reviewing the code, I have a few suggestions:
1. It appears that the {{rug_data[["text"]]}} variable is not used, so that step could perhaps be removed: https://github.com/h2oai/h2o-3/blob/5cd419cd7d3fd55c2828693639f64dce396713d2/h2o-r/h2o-package/R/explain.R#L1487
2. This line brings the data into R using {{as.data.frame}}: https://github.com/h2oai/h2o-3/blob/5cd419cd7d3fd55c2828693639f64dce396713d2/h2o-r/h2o-package/R/explain.R#L1486 but the same data is converted again in this line: https://github.com/h2oai/h2o-3/blob/5cd419cd7d3fd55c2828693639f64dce396713d2/h2o-r/h2o-package/R/explain.R#L1497 I think the first object could be reused.
3. The rug part of the plot seems to be very expensive to produce. Could it be made optional, or perhaps displayed only under certain circumstances, for example when {{row_index > -1}}? The rug component alone increased my plot object size from 179MB to 712MB, and the saved .pdf version of the plot grew from 7KB to 59,700KB. With millions of rows, the rug ends up being a virtually solid black bar anyway.
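Until the rug can be turned off, one possible user-side workaround is to strip the rug layer from the ggplot object that is returned. The following is a minimal sketch, assuming the rug is drawn with {{ggplot2::geom_rug()}} and relying on ggplot2 3.x internals ({{p$layers}}). {{drop_rug_layers()}} is a hypothetical helper, not part of h2o, and the plot below is a stand-in built from synthetic data rather than an actual {{h2o.pd_plot}} result.

```r
library(ggplot2)

# Hypothetical helper (not part of h2o): remove every rug layer from a ggplot
# object. Assumes the rug comes from geom_rug(), whose layer geom inherits
# from "GeomRug", and relies on ggplot2 3.x internals (p$layers).
drop_rug_layers <- function(p) {
  is_rug <- vapply(p$layers, function(l) inherits(l$geom, "GeomRug"), logical(1))
  p$layers <- p$layers[!is_rug]
  p
}

# Stand-in for a partial dependence plot: a short PD curve plus a rug
# built from many rows of synthetic data.
set.seed(1)
grid <- seq(0, 1, length.out = 20)
pd <- data.frame(x = grid, y = sin(grid))
rug_df <- data.frame(x = runif(1e5))

p <- ggplot(pd, aes(x = x, y = y)) +
  geom_line() +
  geom_rug(data = rug_df, aes(x = x), inherit.aes = FALSE,
           sides = "b", alpha = 0.1)

# Dropping the rug layer also drops the layer's reference to rug_df.
p_small <- drop_rug_layers(p)
print(length(p$layers))
print(length(p_small$layers))
```

This only helps after the plot object has already been built, so it does not avoid the cost of converting the data in the first place; it mainly keeps the object and any saved .pdf small.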
Thanks!
Tomas Fryda commented: [~accountid:5de85a8c7095a40d1253d65c] Thank you for the investigation and this report!
In the meantime, you might want to use {{h2o.partialPlot}} (https://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.partialPlot.html), which doesn’t produce the rug part of the plot and is more customizable (but doesn’t use {{ggplot2}}).
Tomas Fryda commented: [~accountid:5de85a8c7095a40d1253d65c] One thing I forgot to mention earlier about {{rug_data[["text"]]}}: it can be used, depending on how you decide to plot the resulting object. For example, with small data and {{plotly}} you can do something like {{plotly::ggplotly(h2o.pd_plot(...))}}, which makes the plot interactive and uses the {{text}} on hover. So in the new version {{rug_data[["text"]]}} will still be generated, but only if you don’t turn off the rug.
This change should end up in the next fix release - 3.36.1.2.
Jira Issue: PUBDEV-8673
Assignee: Tomas Fryda
Reporter: Paul Donnelly
State: Resolved
Fix Version: 3.36.1.2
Attachments: N/A
Development PRs: Available