Row sliced frame (AstRowSlice) shared domain with original vectors - results in unique() error #7916

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 2 comments

Comments

@exalate-issue-sync

H2O computes unique() on a categorical vector simply by copying the vector's domain (plus adding NA, which is currently WIP). If the column is the result of an AstRowSlice operation, that domain is wrong for this purpose: it is still the domain of the original vector, so it can contain levels that no longer occur in the sliced rows.

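{code:python}
import h2o

h2o.init()  # assumes a local H2O cluster can be started or reached

# make sure domains are recalculated with each temp assign
df_example = h2o.H2OFrame({'time': ['M', 'M', 'M', 'D', 'D', 'M', 'M', 'D'],
                           'amount': [1, 4, 5, 0, 0, 1, 3, 0]})

df_example['amount'] = df_example['amount'].asfactor()
# keep only the rows where time == 'D'; every remaining amount is 0
filtered = df_example[df_example['time'] == 'D', 'amount']
uniques = filtered['amount'].unique()
# expected: a single unique level, 0
assert len(uniques) == 1
assert uniques.as_data_frame().iat[0, 0] == 0
{code}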
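
To make the failure mode concrete without a running cluster, here is a minimal pure-Python sketch of the mechanism described above. `ToyCategoricalVec`, `from_values`, `row_slice`, and `unique_by_domain_copy` are hypothetical names invented for illustration; they are not H2O's internal classes or methods, and the sketch assumes the domain is the sorted list of level strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyCategoricalVec:
    """Toy stand-in for a categorical vector (hypothetical, not H2O code)."""
    domain: List[str]            # every level the vector knows about
    codes: List[Optional[int]]   # per-row index into domain; None means NA

    def row_slice(self, mask: List[bool]) -> "ToyCategoricalVec":
        # The slice keeps the SAME domain as the original vector.
        kept = [c for c, keep in zip(self.codes, mask) if keep]
        return ToyCategoricalVec(self.domain, kept)

    def unique_by_domain_copy(self) -> List[str]:
        # Mimics the reported behaviour: unique() just copies the domain.
        return list(self.domain)


def from_values(values) -> ToyCategoricalVec:
    # Assumption: the domain is the sorted set of level strings.
    domain = sorted({str(v) for v in values})
    index = {level: i for i, level in enumerate(domain)}
    return ToyCategoricalVec(domain, [index[str(v)] for v in values])


time = ['M', 'M', 'M', 'D', 'D', 'M', 'M', 'D']
amount = from_values([1, 4, 5, 0, 0, 1, 3, 0])

filtered = amount.row_slice([t == 'D' for t in time])
wrong_uniques = filtered.unique_by_domain_copy()
print(filtered.codes)   # only level index 0 occurs in the slice
print(wrong_uniques)    # but all five original levels are reported
```

In this toy model the slice contains only the level `'0'`, yet the domain-copy approach reports every level of the original column, which is the mismatch the assertions in the reproduction are meant to catch.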

One solution is to reduce the domain when the rows are sliced (discussion: #4848 (comment)), which is a controversial step. A second solution is to adjust the domain in place after the unique() operation is done.
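
The two options can be contrasted with a small sketch. This is only an illustration under assumptions: a categorical column is modelled as a plain `domain` list plus integer `codes` (with `None` for NA), and `reduce_domain` and `unique_collected` are hypothetical helper names, not functions from H2O.

```python
from typing import List, Optional, Tuple


def reduce_domain(domain: List[str],
                  codes: List[Optional[int]]) -> Tuple[List[str], List[Optional[int]]]:
    """Option 1: shrink the domain to the levels that occur and remap the codes."""
    used = sorted({c for c in codes if c is not None})
    remap = {old: new for new, old in enumerate(used)}
    new_domain = [domain[c] for c in used]
    new_codes = [None if c is None else remap[c] for c in codes]
    return new_domain, new_codes


def unique_collected(domain: List[str],
                     codes: List[Optional[int]]) -> List[str]:
    """Option 2: leave the domain untouched and collect only the levels seen."""
    seen = [False] * len(domain)
    for c in codes:
        if c is not None:
            seen[c] = True
    # Report levels in domain order; NA handling is left out of this sketch.
    return [level for level, hit in zip(domain, seen) if hit]


# Domain of the original 'amount' column and the codes left after slicing.
domain = ['0', '1', '3', '4', '5']
sliced_codes = [0, 0, 0]

reduced_domain, reduced_codes = reduce_domain(domain, sliced_codes)
collected = unique_collected(domain, sliced_codes)
print(reduced_domain, reduced_codes)
print(collected, domain)
```

Both options yield a single unique level for the sliced rows. The difference is that the first changes the sliced column's domain, while the second leaves the shared domain as it was and only affects what unique() reports.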

@exalate-issue-sync

Pavel Pscheidl commented: The domain should remain as-is, as this is the desired behavior. The domain has to be collected in place when doing the unique() operation. Being resolved as part of https://0xdata.atlassian.net/browse/HEXDEV-762.

@h2o-ops

h2o-ops commented May 14, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-7723
Assignee: Pavel Pscheidl
Reporter: Pavel Pscheidl
State: Resolved
Fix Version: 3.30.1.2
Attachments: N/A
Development PRs: N/A
