Add Rule of Thumb for Data Conversion #5509

exalate-issue-sync · 2023-05-22T20:06:43Z

We should add the following rule of thumb to the Data Sharing section of the Sparkling Water documentation:
http://docs.h2o.ai/sparkling-water/2.3/latest-stable/doc/design/data_sharing.html

h3. Memory Considerations When Converting Between Data Frame Types

When Using Sparkling Water External Backend:

If you have allocated the recommended amount of memory to your H2O cluster (4x the size of your dataset), you don't need to worry about memory constraints when converting between a Spark DataFrame and an H2OFrame. H2O runs outside the Spark executors in this mode, so there is no collision with Spark storage.

Note: the 4x guideline assumes your dataset is represented as a CSV. If your dataset is represented as JSON, XML, or Parquet, the requirements may differ significantly.
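
The external backend arithmetic can be sketched as follows. This is only an illustration of the rule of thumb, assuming the dataset size is measured as CSV; the function name and the `factor` parameter are hypothetical and are not part of Sparkling Water.

```python
def external_backend_h2o_memory_gb(csv_size_gb: float, factor: float = 4.0) -> float:
    """Estimate the total memory (GB) to allocate to the H2O cluster.

    Assumption: csv_size_gb is the size of the dataset as CSV.
    The default factor of 4 is the rule of thumb from this issue.
    """
    if csv_size_gb < 0:
        raise ValueError("dataset size must be non-negative")
    return factor * csv_size_gb


# A 10 GB CSV dataset suggests roughly 40 GB for the H2O cluster.
print(external_backend_h2o_memory_gb(10))
```

For other formats (JSON, XML, Parquet), the input to this estimate would first need to be converted to an approximate CSV-equivalent size.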

When Using Sparkling Water Internal Backend:

In internal backend mode, H2O-3 shares the JVM with the Spark executors. In this case, allocate enough memory to run Spark transformations on your DataFrame (at minimum, the size of your dataset plus the memory those transformations need), plus an additional 4x the size of your dataset for H2O.
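
A minimal sketch of the internal backend estimate, under stated assumptions: the memory needed by Spark transformations is something you estimate yourself, and the total is spread evenly across executors. The function and parameter names are hypothetical and are not part of Sparkling Water.

```python
def internal_backend_executor_memory_gb(
    csv_size_gb: float,
    spark_transform_gb: float,
    num_executors: int,
    factor: float = 4.0,
) -> float:
    """Estimate memory (GB) per Spark executor for the internal backend.

    Assumptions (not from the Sparkling Water docs):
    - csv_size_gb is the size of the dataset as CSV,
    - spark_transform_gb is your own estimate of transformation overhead,
    - memory is divided evenly across executors.
    """
    if num_executors <= 0:
        raise ValueError("num_executors must be positive")
    spark_total = csv_size_gb + spark_transform_gb  # Spark side: data plus transformations
    h2o_total = factor * csv_size_gb                # H2O side: 4x rule of thumb
    return (spark_total + h2o_total) / num_executors


# 10 GB dataset, 5 GB for transformations, 5 executors: (15 + 40) / 5
print(internal_backend_executor_memory_gb(10, 5, 5))
```

The result is a starting point for sizing executor memory, not a guarantee; actual usage depends on the transformations and on how well the data compresses in H2O.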

Note: data is duplicated when you convert from a Spark DataFrame to an H2OFrame (though H2O uses compression to reduce the memory required for this conversion). No data is duplicated when you convert from an H2OFrame to a Spark DataFrame, because Sparkling Water uses a wrapper around the H2OFrame that exposes the RDD/DataFrame API.

DinukaH2O · 2023-05-23T13:07:58Z

JIRA Issue Migration Info

Jira Issue: SW-1581
Assignee: Jakub Hava
Reporter: Lauren DiPerna
State: Resolved
Fix Version: 3.26.5
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#1517

hasithjp · 2023-05-29T15:51:21Z

JIRA Issue Migration Info Cont'd

Jira Issue Created Date: 2019-08-29T12:20:26.521-0700

DinukaH2O assigned jakubhava May 23, 2023

DinukaH2O closed this as completed May 23, 2023

DinukaH2O added the fixVersion/3.26.5 label May 23, 2023
