Clarify that .asnumeric() converts 'enum' columns to their underlying codes and show the right approach #7377

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 2 comments

@exalate-issue-sync

h1. +Add to Docs:+

To convert an “enum” (categorical) column to “numeric”, convert it to “character” first and then to “numeric”. Otherwise, the result contains the underlying integer codes of the factor levels, not the values that the level labels represent.

Python Example:

{noformat}prostate["column"] = prostate["column"].ascharacter().asnumeric(){noformat}

R Example:

{noformat}prostate[, 2] <- as.character(prostate[, 2])
prostate[, 2] <- as.numeric(prostate[, 2]){noformat}

This behavior mirrors R, where as.numeric() applied to a factor returns the integer level codes rather than the labels.
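The pitfall comes from how a categorical column is stored: a list of level labels (the domain) plus one integer code per row that points into it. The following is a minimal, stdlib-only Python sketch of that model, not H2O's implementation; `EnumColumn`, `as_numeric_codes`, and `as_character_then_numeric` are hypothetical names, and sorting the levels lexicographically as strings is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EnumColumn:
    """Hypothetical model of a categorical column: level labels plus per-row codes."""
    domain: List[str]  # level labels, e.g. ["10", "5", "7.5"]
    codes: List[int]   # one index into `domain` per row

    @classmethod
    def from_strings(cls, values: List[str]) -> "EnumColumn":
        # Assumption: levels are sorted lexicographically as strings.
        domain = sorted(set(values))
        index = {label: i for i, label in enumerate(domain)}
        return cls(domain, [index[v] for v in values])

    def as_character(self) -> List[str]:
        # Map each row's code back to its label.
        return [self.domain[c] for c in self.codes]


def as_numeric_codes(col: EnumColumn) -> List[float]:
    # The pitfall: the integer codes themselves become the numbers.
    return [float(c) for c in col.codes]


def as_character_then_numeric(col: EnumColumn) -> List[float]:
    # The recommended path: recover the labels first, then parse them.
    return [float(label) for label in col.as_character()]


col = EnumColumn.from_strings(["5", "10", "7.5", "5"])
print(col.domain)                      # ['10', '5', '7.5']
print(as_numeric_codes(col))           # [1.0, 0.0, 2.0, 1.0]
print(as_character_then_numeric(col))  # [5.0, 10.0, 7.5, 5.0]
```

Because "10" sorts before "5" as a string, converting the codes directly does not even preserve the numeric order of the labels, which is why the character step matters.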

Sections to consider adding to:

* [https://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/frame.html?highlight=asnumeric#h2o.H2OFrame.asnumeric]
* [https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-munging/change-column-type.html?highlight=asnumeric]
* [https://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/as.numeric.html]
* [https://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.asnumeric.html]

h1. Background

A user found that .asnumeric() behaves unexpectedly on enum columns: the resulting “numeric” values are the underlying integer level codes, not the values of the level labels.

{noformat}import h2o
import pandas as pd

h2o.init()
df1 = pd.DataFrame([[1.1, '1.1%'], [2, '2'], [3, '3']])
hframe1 = h2o.H2OFrame(df1, column_types=['real', 'enum'])
hframe1{noformat}

Output:

||0||1||
|1.1|1.1%|
|2|2|
|3|3|

{noformat}hframe1["1"] = hframe1["1"].gsub(pattern="%", replacement="")
hframe1["1"] = hframe1["1"].trim()
hframe1{noformat}

Output:

||0||1||
|1.1|1.1|
|2|2|
|3|3|

{noformat}hframe1.asnumeric(){noformat}

Output:

||0||1||
|1.1|{color:#6554c0}0{color}|
|2|1|
|3|2|

The value 1.1 was converted to 0, and the other values were likewise replaced by their underlying level codes (2 became 1, 3 became 2). This is similar to how R treats factors.
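The result above can be reproduced with a small stdlib-only Python model. It assumes that the gsub/trim steps rewrite the level labels while leaving each row's integer code unchanged, and that levels are kept in sorted string order; this is an illustrative sketch, not H2O code, and `labels`, `domain`, and `codes` are hypothetical names.

```python
# Hypothetical model of column "1" from the example above.
labels = ["1.1%", "2", "3"]                 # values parsed into the enum column
domain = sorted(set(labels))                # assumed sorted level labels: ['1.1%', '2', '3']
codes = [domain.index(v) for v in labels]   # per-row integer codes: [0, 1, 2]

# gsub("%", "") followed by trim(): assumed to rewrite the labels, not the codes.
domain = [level.replace("%", "").strip() for level in domain]

# What asnumeric() returned in the example: the codes themselves.
as_numeric = [float(c) for c in codes]

# The ascharacter().asnumeric() path: map codes to labels, then parse the labels.
as_character = [domain[c] for c in codes]
char_then_numeric = [float(s) for s in as_character]

print(as_numeric)         # [0.0, 1.0, 2.0]
print(char_then_numeric)  # [1.1, 2.0, 3.0]
```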

Related issue: [https://h2oai.atlassian.net/browse/PUBDEV-8272]

@h2o-ops-ro
Collaborator

JIRA Issue Details

Jira Issue: PUBDEV-8278
Assignee: hannah.tillman
Reporter: Neema Mashayekhi
State: Resolved
Fix Version: 3.34.0.4
Attachments: N/A
Development PRs: Available

@h2o-ops-ro
Collaborator

Linked PRs from JIRA

#5822
