Accept H2OFrame as input to H2OFrame itself #15887

Closed
bkowshik opened this issue Oct 30, 2023 · 2 comments · Fixed by #15898

@bkowshik

I came across this by accident: I assumed a frame was a pandas DataFrame and tried to convert it to an H2OFrame. Since it was already an H2OFrame, the constructor raised an H2OTypeError instead.

H2O 3.44.0.1 on Kaggle notebooks

Actual behavior

---------------------------------------------------------------------------
H2OTypeError                              Traceback (most recent call last)
Cell In[9], line 2
      1 train_df = h2o.import_file('/kaggle/input/playground-series-s3e24/train.csv')
----> 2 h2o.H2OFrame(train_df)

File /opt/conda/lib/python3.10/site-packages/h2o/frame.py:97, in H2OFrame.__init__(self, python_obj, destination_frame, header, separator, column_names, column_types, na_strings, skipped_columns, force_col_types)
     92 def __init__(self, python_obj=None, destination_frame=None, header=0, separator=",",
     93              column_names=None, column_types=None, na_strings=None, skipped_columns=None, force_col_types=False):
     95     coltype = U(None, "unknown", "uuid", "string", "float", "real", "double", "int", "long", "numeric",
     96                 "categorical", "factor", "enum", "time")
---> 97     assert_is_type(python_obj, None, list, tuple, dict, numpy_ndarray, pandas_dataframe, scipy_sparse)
     98     assert_is_type(destination_frame, None, str)
     99     assert_is_type(header, -1, 0, 1)

File /opt/conda/lib/python3.10/site-packages/h2o/utils/typechecks.py:444, in assert_is_type(var, *types, **kwargs)
    442 etn = _get_type_name(expected_type, dump=", ".join(args[1:]))
    443 vtn = _get_type_name(type(var))
--> 444 raise H2OTypeError(var_name=vname, var_value=var, var_type_name=vtn, exp_type_name=etn, message=message,
    445                    skip_frames=skip_frames)

H2OTypeError: Argument `python_obj` should be a None | list | tuple | dict | numpy.ndarray | pandas.DataFrame | scipy.sparse.issparse, got H2OFrame

Expected behavior

Converting a pandas DataFrame to an H2OFrame works:

train_df = pd.read_csv('/kaggle/input/playground-series-s3e24/train.csv')
h2o.H2OFrame(train_df)

The following code snippet should work too without throwing an error.

train_df = h2o.import_file('/kaggle/input/playground-series-s3e24/train.csv')
h2o.H2OFrame(train_df)
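
Until the constructor accepts an H2OFrame directly, one possible workaround is to check the type before wrapping. The sketch below is a minimal illustration and not part of H2O: `ensure_frame` is a hypothetical helper, and `FakeFrame` is a stand-in class that rejects its own type (as 3.44.0.1 does) so the example runs without an H2O cluster.

```python
def ensure_frame(obj, frame_type, convert):
    """Return obj as-is if it is already a frame_type; otherwise convert it."""
    if isinstance(obj, frame_type):
        return obj  # already the target type: skip the constructor entirely
    return convert(obj)


# Hypothetical stand-in for a frame class that refuses its own type as input.
class FakeFrame:
    def __init__(self, data):
        if isinstance(data, FakeFrame):
            raise TypeError("got FakeFrame")  # mimics the rejection reported above
        self.data = data


wrapped = ensure_frame([1, 2, 3], FakeFrame, FakeFrame)  # converted
same = ensure_frame(wrapped, FakeFrame, FakeFrame)       # passed through untouched
assert same is wrapped
```

With H2O itself this would presumably be called as `ensure_frame(train_df, h2o.H2OFrame, h2o.H2OFrame)`. Note that the helper returns the same object when the input is already a frame, so it does not create a new frame on the backend.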
@bkowshik bkowshik added the bug label Oct 30, 2023
@wendycwong wendycwong assigned wendycwong and sebhrusen and unassigned wendycwong Oct 30, 2023
@sebhrusen
Contributor

sebhrusen commented Oct 31, 2023

To make the expected behaviour precise: the wrapped frame should share the same data as the original one but have a different id. This means the two frames are represented by different objects on the backend (we explicitly created a new frame using the Py client, after all), even if they may share the data content (no deep cloning).

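import h2o
import pandas as pd

h2o.init()

fr_ori = h2o.import_file('h2o://iris')
fr_new = h2o.H2OFrame(fr_ori)

assert id(fr_ori) != id(fr_new)  # distinct Python objects
assert fr_ori.key != fr_new.key  # distinct frame ids on the backend
pd.testing.assert_frame_equal(fr_ori.as_data_frame(), fr_new.as_data_frame())  # same content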
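
The "different id, shared data" semantics described above can be illustrated without a cluster. The following is a toy model only, not H2O's implementation: `ToyFrame`, its `key` numbering, and the `_columns` attribute are all hypothetical names chosen for this sketch.

```python
import itertools

_key_counter = itertools.count(1)


class ToyFrame:
    """Hypothetical stand-in for a frame handle; not H2O's implementation."""

    def __init__(self, source):
        # Every construction gets a fresh key, even when wrapping another frame.
        self.key = f"toy_frame_{next(_key_counter)}"
        if isinstance(source, ToyFrame):
            # Shallow copy: reuse the same column storage, no deep cloning.
            self._columns = source._columns
        else:
            self._columns = {name: list(values) for name, values in source.items()}

    def as_dict(self):
        # Return a detached copy of the content for comparison.
        return {name: list(values) for name, values in self._columns.items()}


fr_ori = ToyFrame({"sepal_len": [5.1, 4.9], "class": ["setosa", "setosa"]})
fr_new = ToyFrame(fr_ori)

assert fr_ori is not fr_new                  # distinct objects
assert fr_ori.key != fr_new.key              # distinct ids
assert fr_ori.as_dict() == fr_new.as_dict()  # same content
assert fr_ori._columns is fr_new._columns    # storage is shared, not cloned
```

The last assertion is the part a deep copy would fail: both handles point at the same underlying storage, which is what keeps wrapping an existing frame cheap.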

@sebhrusen sebhrusen added this to the 3.44.0.2 milestone Oct 31, 2023
@sebhrusen sebhrusen modified the milestones: 3.44.0.2, 3.44.0.3 Nov 3, 2023
sebhrusen added a commit that referenced this issue Nov 22, 2023
#15898)

* generates a shallow copy when Py H2OFrame constructor is called with an existing H2OFrame

* add support for columns selection params

* added proper tests for H2OFrame constructor and discovered a bunch of bugs…

* cosmetics

* added references to new issues discovered when writing H2OFrame tests
@sebhrusen
Contributor

This will be in the next release, 3.44.0.3.
