Update as factor procedure documentation to show multiple column use #7309

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 2 comments
@exalate-issue-sync

Update “asfactor” function header comments to:

{noformat}"""
Convert column/columns in the current frame to categoricals.

:returns: new H2OFrame with columns of the "enum" type.

#Single column
df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
df['cylinders'] = df['cylinders'].asfactor()
df['cylinders'].describe()

#Multiple columns
df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
df[['cylinders','economy_20mpg']] = df[['cylinders','economy_20mpg']].asfactor()
df[['cylinders','economy_20mpg']].describe()
"""
{noformat}

Modify the documentation at https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-munging/change-column-type.html as follows:

 

{noformat}import h2o
h2o.init()

# Import the cars dataset
cars_df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")

#df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
#df[['cylinders','economy_20mpg']] = df[['cylinders','economy_20mpg']].asfactor()
#df[['cylinders','economy_20mpg']].describe()
#boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv")

# Check the column type of the cylinders column
print(cars_df["cylinders"].isnumeric())
#[True]

# Change the column type to a factor
cars_df['cylinders'] = cars_df['cylinders'].asfactor()

# Verify that the column is now a factor
print(cars_df["cylinders"].isfactor())
#[True]

# Change the column type back to numeric
cars_df["cylinders"] = cars_df["cylinders"].asnumeric()

# Verify that the column is numeric and not a factor
print(cars_df["cylinders"].isfactor())
#[False]
print(cars_df["cylinders"].isnumeric())
#[True]

# Change multiple columns to factors
cars_df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
cars_df[['cylinders','economy_20mpg']] = cars_df[['cylinders','economy_20mpg']].asfactor()

# Verify that both columns are now factors
print(cars_df["cylinders"].isfactor())
print(cars_df["economy_20mpg"].isfactor()){noformat}
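
When several columns are converted at once, checking them one `isfactor()` call at a time gets repetitive. The sketch below is one possible way to check them together, assuming a dictionary that maps column names to type strings such as "enum" (the shape that `H2OFrame.types` returns). The helper name `non_factor_columns` and the sample dictionary are hypothetical and used only for illustration; they are not part of the proposed documentation change.

```python
def non_factor_columns(types, columns):
    """Return the requested columns whose type is not "enum".

    `types` is assumed to map column names to type strings,
    in the shape of the dictionary returned by H2OFrame.types.
    """
    return [col for col in columns if types.get(col) != "enum"]


# Illustrative stand-in for cars_df.types after the multi-column
# asfactor() call; the values are made up for this sketch.
example_types = {
    "name": "string",
    "cylinders": "enum",
    "economy_20mpg": "enum",
    "weight": "int",
}

# An empty list means every requested column is a factor.
print(non_factor_columns(example_types, ["cylinders", "economy_20mpg"]))
```

With a running H2O cluster, the same helper could presumably be called as `non_factor_columns(cars_df.types, ['cylinders', 'economy_20mpg'])`.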

@h2o-ops-ro
Collaborator

JIRA Issue Details

Jira Issue: PUBDEV-8348
Assignee: hannah.tillman
Reporter: Uri Smashnov
State: Resolved
Fix Version: 3.34.0.3
Attachments: N/A
Development PRs: Available

@h2o-ops-ro
Collaborator

Linked PRs from JIRA

#5786
