Add alternative to convert h2oframe to pandas frame using datatable for speedup. #15614
Comments
Customer support ticket: https://support.h2o.ai/helpdesk/tickets/105694
wendycwong added a commit that referenced this issue on Jun 30, 2023
…Bernard Ong. Added Python test to verify h2o frame to pandas transformation speedup.
wendycwong added a commit that referenced this issue on Jul 5, 2023
…Bernard Ong. Added Python test to verify h2o frame to pandas transformation speedup. fixed typos found by Marek and Megan. added polars as an alternative multi-thread conversion of h2o frame to pandas per Tomas Fryda suggestion. Remove conversion using datatable. It changes the integer type to int32 instead of supporting int64.
wendycwong added a commit that referenced this issue on Jul 7, 2023
…Bernard Ong. Added Python test to verify h2o frame to pandas transformation speedup. fixed typos found by Marek and Megan. added polars as an alternative multi-thread conversion of h2o frame to pandas per Tomas Fryda suggestion. Remove conversion using datatable. It changes the integer type to int32 instead of supporting int64. Clean up unused code relating to datatable. add removal of temp file and temp directory for clean up. Update h2o-py/tests/testdir_misc/pyunit_gh_15614_polars_2_pandas.py Co-authored-by: Tomáš Frýda <tomas.fryda@h2o.ai> add two dataset to test. Remove test using datatable.
wendycwong added a commit that referenced this issue on Jul 13, 2023
…Bernard Ong. Added Python test to verify h2o frame to pandas transformation speedup. fixed typos found by Marek and Megan. added polars as an alternative multi-thread conversion of h2o frame to pandas per Tomas Fryda suggestion. Remove conversion using datatable. It changes the integer type to int32 instead of supporting int64. Clean up unused code relating to datatable. add removal of temp file and temp directory for clean up. Update h2o-py/tests/testdir_misc/pyunit_gh_15614_polars_2_pandas.py Co-authored-by: Tomáš Frýda <tomas.fryda@h2o.ai> add two dataset to test. Remove test using datatable. Add pyarrow module import. quit if pandas version is too old
wendycwong added a commit that referenced this issue on Jul 14, 2023
…Bernard Ong. Added Python test to verify h2o frame to pandas transformation speedup. (#15621) fixed typos found by Marek and Megan. added polars as an alternative multi-thread conversion of h2o frame to pandas per Tomas Fryda suggestion. Remove conversion using datatable. It changes the integer type to int32 instead of supporting int64. Clean up unused code relating to datatable. add removal of temp file and temp directory for clean up. Update h2o-py/tests/testdir_misc/pyunit_gh_15614_polars_2_pandas.py Co-authored-by: Tomáš Frýda <tomas.fryda@h2o.ai> add two dataset to test. Remove test using datatable. Add pyarrow module import. quit if pandas version is too old
Merged.
maurever pushed a commit that referenced this issue on Jul 24, 2023
…Bernard Ong. Added Python test to verify h2o frame to pandas transformation speedup. (#15621) fixed typos found by Marek and Megan. added polars as an alternative multi-thread conversion of h2o frame to pandas per Tomas Fryda suggestion. Remove conversion using datatable. It changes the integer type to int32 instead of supporting int64. Clean up unused code relating to datatable. add removal of temp file and temp directory for clean up. Update h2o-py/tests/testdir_misc/pyunit_gh_15614_polars_2_pandas.py Co-authored-by: Tomáš Frýda <tomas.fryda@h2o.ai> add two dataset to test. Remove test using datatable. Add pyarrow module import. quit if pandas version is too old
If the user has datatable installed (tempfile is already part of the Python standard library), converting an H2O frame to a pandas frame can be at least two times faster, as tested on a 2GB dataset, using Megan Kurka's workaround suggestion. Here is the code snippet from Bernard Ong and the results:
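The original snippet and timings are not reproduced on this page. As a rough illustration only, the workaround can be sketched as: export the frame to a temporary CSV file, read that file back with a multi-threaded reader, and convert the result to pandas. The helper names `convert_via_temp_csv` and `h2o_frame_to_pandas` are hypothetical and not part of the h2o API, and the sketch assumes the H2O cluster runs on the same machine as the Python client so that the exported file is visible locally. Note that the commits referencing this issue report that the datatable route yields int32 instead of int64 integers and that polars was adopted instead.

```python
import os
import shutil
import tempfile


def convert_via_temp_csv(export_fn, read_fn):
    """Call export_fn(path) to write a temporary CSV, load it with
    read_fn(path), and always remove the temporary directory afterwards."""
    tmp_dir = tempfile.mkdtemp()
    csv_path = os.path.join(tmp_dir, "h2o_frame.csv")
    try:
        export_fn(csv_path)
        return read_fn(csv_path)
    finally:
        # Clean up the temporary file and directory even if a step fails.
        shutil.rmtree(tmp_dir, ignore_errors=True)


def h2o_frame_to_pandas(frame):
    """Hypothetical helper (assumption): needs a running local H2O cluster
    and the optional datatable package."""
    import h2o
    import datatable as dt

    return convert_via_temp_csv(
        lambda path: h2o.export_file(frame, path),   # H2O writes the CSV
        lambda path: dt.fread(path).to_pandas(),     # multi-threaded read
    )
```

Passing the export and read steps as callables keeps the temporary-file handling separate from the choice of reader, so a different multi-threaded CSV reader could be swapped in without touching the cleanup logic.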