Add alternative to convert h2oframe to pandas frame using datatable for speedup. #15614

Closed
wendycwong opened this issue Jun 28, 2023 · 2 comments · Fixed by #15621

wendycwong commented Jun 28, 2023

If datatable is installed in the user's Python environment (tempfile is already part of the standard library), converting an H2OFrame to a pandas DataFrame can be at least twice as fast. This was measured on a 2 GB dataset using the workaround Megan Kurka suggested. Here are the code snippet from Bernard Ong and the results:

[Image: screenshot of the code snippet and timing results]
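
Since the original snippet only survives as a screenshot, here is a rough sketch of the general idea as described in this thread: write the frame to a temporary CSV, read it back with a faster reader, then clean up the temp file and directory. This is not Bernard Ong's code. The function name `frame_to_pandas_via_csv` and the two callables `export_csv` and `read_csv` are hypothetical hooks introduced only for illustration.

```python
import os
import shutil
import tempfile


def frame_to_pandas_via_csv(export_csv, read_csv):
    """Round-trip a frame through a temporary CSV file.

    export_csv(path): writes the frame to `path` as CSV.
    read_csv(path):   reads `path` and returns the in-memory result.
    Both callables are hypothetical hooks, not part of the h2o API.
    """
    tmp_dir = tempfile.mkdtemp()
    try:
        csv_path = os.path.join(tmp_dir, "h2o_frame.csv")
        export_csv(csv_path)
        return read_csv(csv_path)
    finally:
        # Remove the temp file and its directory even if the read fails.
        shutil.rmtree(tmp_dir, ignore_errors=True)


# Possible hooks, assuming h2o and datatable are installed and the H2O
# cluster writes to a path the Python client can read (e.g. a local cluster):
#   export_csv = lambda path: h2o.export_file(frame, path)
#   read_csv   = lambda path: datatable.fread(path).to_pandas()
```

Passing the reader in as a callable keeps the sketch independent of which library does the fast read, so the same wrapper would apply if a different reader were swapped in.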

wendycwong added the feature and Major labels Jun 28, 2023

wendycwong commented Jun 28, 2023

customer support ticket: https://support.h2o.ai/helpdesk/tickets/105694

wendycwong self-assigned this Jun 30, 2023
wendycwong added this to the 3.42.0.2 milestone Jun 30, 2023
wendycwong added a commit that referenced this issue Jun 30, 2023
…Bernard Ong. Added Python test to verify h2o frame to pandas transformation speedup.
wendycwong added a commit that referenced this issue Jul 5, 2023
…Bernard Ong. Added Python test to verify h2o frame to pandas transformation speedup.

fixed typos found by Marek and Megan.
added polars as an alternative multi-thread conversion of h2o frame to pandas per Tomas Fryda suggestion.
Remove conversion using datatable.  It changes the integer type to int32 instead of supporting int64.
wendycwong added a commit that referenced this issue Jul 7, 2023
…Bernard Ong. Added Python test to verify h2o frame to pandas transformation speedup.

fixed typos found by Marek and Megan.
added polars as an alternative multi-thread conversion of h2o frame to pandas per Tomas Fryda suggestion.
Remove conversion using datatable.  It changes the integer type to int32 instead of supporting int64.
Clean up unused code relating to datatable.
add removal of temp file and temp directory for clean up.
Update h2o-py/tests/testdir_misc/pyunit_gh_15614_polars_2_pandas.py
Co-authored-by: Tomáš Frýda <tomas.fryda@h2o.ai>
add two dataset to test.  Remove test using datatable.
wendycwong added a commit that referenced this issue Jul 13, 2023
…Bernard Ong. Added Python test to verify h2o frame to pandas transformation speedup.

fixed typos found by Marek and Megan.
added polars as an alternative multi-thread conversion of h2o frame to pandas per Tomas Fryda suggestion.
Remove conversion using datatable.  It changes the integer type to int32 instead of supporting int64.
Clean up unused code relating to datatable.
add removal of temp file and temp directory for clean up.
Update h2o-py/tests/testdir_misc/pyunit_gh_15614_polars_2_pandas.py
Co-authored-by: Tomáš Frýda <tomas.fryda@h2o.ai>
add two dataset to test.  Remove test using datatable.
Add pyarrow module import.
quit if pandas version is too old
wendycwong added a commit that referenced this issue Jul 14, 2023
…Bernard Ong. Added Python test to verify h2o frame to pandas transformation speedup. (#15621)

fixed typos found by Marek and Megan.
added polars as an alternative multi-thread conversion of h2o frame to pandas per Tomas Fryda suggestion.
Remove conversion using datatable.  It changes the integer type to int32 instead of supporting int64.
Clean up unused code relating to datatable.
add removal of temp file and temp directory for clean up.
Update h2o-py/tests/testdir_misc/pyunit_gh_15614_polars_2_pandas.py
Co-authored-by: Tomáš Frýda <tomas.fryda@h2o.ai>
add two dataset to test.  Remove test using datatable.
Add pyarrow module import.
quit if pandas version is too old
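
The commit notes above mention polars as an optional alternative and an added pyarrow import. One way such an optional fast path could be gated is sketched below: use it only when every optional module is importable, and otherwise fall back to the existing conversion. This is an assumption about the general pattern, not the code merged in #15621. The function name `pick_conversion_path` and its return labels are hypothetical.

```python
import importlib.util


def pick_conversion_path(optional_modules=("polars", "pyarrow")):
    """Return "polars" if every optional module is importable, else "default".

    The labels are illustrative only; they do not correspond to any
    h2o setting or parameter.
    """
    for name in optional_modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            return "default"  # fall back to the existing conversion
    return "polars"
```

Checking with `find_spec` avoids importing the heavy modules just to learn whether they are available.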
@wendycwong
Contributor Author

Merged.

maurever pushed a commit that referenced this issue Jul 24, 2023
…Bernard Ong. Added Python test to verify h2o frame to pandas transformation speedup. (#15621)