Screeplot fails for PCA in R and Python #7483

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 3 comments
@exalate-issue-sync

Python fails:

{code:python}
import h2o
from h2o.estimators import H2OPrincipalComponentAnalysisEstimator

h2o.init()

# Import the birds dataset into H2O:
birds = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/birds.csv")

# Split the dataset into a train and valid set:
train, valid = birds.split_frame(ratios=[.8], seed=1234)

# Build and train the model:
birds_pca = H2OPrincipalComponentAnalysisEstimator(k=5,
                                                   use_all_factor_levels=True,
                                                   pca_method="glrm",
                                                   transform="standardize",
                                                   impute_missing=True)
birds_pca.train(training_frame=train)

# This call raises the KeyError shown below:
birds_pca.screeplot()
{code}


KeyError Traceback (most recent call last)
in
----> 1 birds_pca.screeplot()

~/anaconda3/envs/py_36/lib/python3.6/site-packages/h2o/model/dim_reduction.py in screeplot(self, type, **kwargs)
104 """
105 # check for matplotlib. exit if absent.
--> 106 is_server = kwargs.pop("server")
107 if kwargs:
108 raise ValueError("Unknown arguments %s to screeplot()" % ", ".join(kwargs.keys()))

KeyError: 'server'

R fails:

{code:r}
h2o.varimp(birds_pca)

Warning message:
“This model doesn't have variable importances”
{code}

@exalate-issue-sync
Author

Tomas Fryda commented: The Python problem is easy to solve: {{screeplot()}} currently expects the user to always provide the {{server}} keyword, which should be optional.
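To illustrate the point about the optional keyword, here is a minimal sketch of the kwargs handling seen in the traceback, rewritten so that a missing {{server}} no longer raises. The function name {{parse_screeplot_kwargs}} is hypothetical, and the default of {{False}} is an assumption; this is not necessarily how the linked PR fixed it.

```python
def parse_screeplot_kwargs(**kwargs):
    """Hypothetical stand-in for the keyword handling in screeplot()."""
    # dict.pop(key, default) returns the default when the key is absent,
    # whereas dict.pop(key) raises KeyError (the failure in the traceback).
    # The default value False is an assumption made for this sketch.
    is_server = kwargs.pop("server", False)
    if kwargs:
        raise ValueError("Unknown arguments %s to screeplot()" % ", ".join(kwargs.keys()))
    return is_server


print(parse_screeplot_kwargs())             # no KeyError when server is omitted
print(parse_screeplot_kwargs(server=True))  # an explicit value is still honored
```

Unknown keywords are still rejected with the same {{ValueError}} as before, so only the missing-key case changes.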

The R behavior seems correct to me: I didn’t find any mention in the documentation that {{varimp}} should be supported in R. You can use {{birds_pca@model$importance}} or {{summary(birds_pca)}} to display the “varimp”.

Maybe I am missing something. I only looked at [https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/pca.html#faq], where {{varimp}} is mentioned only for Python, as a way to get the same result as R’s {{@model$importance}}.

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Details

Jira Issue: PUBDEV-8167
Assignee: Tomas Fryda
Reporter: Neema Mashayekhi
State: Resolved
Fix Version: 3.32.1.4
Attachments: N/A
Development PRs: Available

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

Linked PRs from JIRA

#5517

@h2o-ops h2o-ops closed this as completed May 14, 2023