GAM not working with weight column #7252

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 6 comments

Comments

@exalate-issue-sync

Some dumb ass (me) forgot to check GAM with a weights column.

@exalate-issue-sync
Author

Wendy commented: Code from Michalk:

{noformat}
h2o_model = H2OGeneralizedAdditiveEstimator(family="tweedie",
                                            gam_columns=["X3"],
                                            weights_column="W",
                                            lambda_=0,
                                            tweedie_variance_power=1.5,
                                            tweedie_link_power=0)

h2o_model.train(x=["X1", "X2"], y="Y", training_frame=train)

train["W"] = train_w_clone
h2o_model.predict(train)
{noformat}

@exalate-issue-sync
Author

Wendy commented: Check offset columns as well, since you are already at it.

@exalate-issue-sync
Author

Wendy commented: Here is the complete code from Arun:

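{noformat}
import h2o
import numpy as np
import pandas as pd
from h2o.estimators.gam import H2OGeneralizedAdditiveEstimator

h2o.init()  # connect to (or start) a local H2O cluster

np.random.seed(1234)
n_rows = 10

# Small synthetic frame: predictors X1/X2, GAM column X3, weights W, response Y
data = {
    "X1": np.random.randn(n_rows),
    "X2": np.random.randn(n_rows),
    "X3": np.random.randn(n_rows),
    "W": np.random.choice([10, 20], size=n_rows),
    "Y": np.random.choice([0, 0, 0, 0, 0, 10, 20, 30], size=n_rows)
}

train = h2o.H2OFrame(pd.DataFrame(data))
print(train)
h2o_model = H2OGeneralizedAdditiveEstimator(family="tweedie",
                                            gam_columns=["X3"],
                                            weights_column="W",
                                            lambda_=0,
                                            tweedie_variance_power=1.5,
                                            tweedie_link_power=0)
h2o_model.train(x=["X1", "X2"], y="Y", training_frame=train)

# Fails with the "Missing chunk" error below: the W column is gone after train()
h2o_model.predict(train)
{noformat}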

The strange part is that after h2o_model.train() runs, the train frame is missing chunks in the weight column:

  • {H2OServerError}HTTP 500 Server Error:
  • Server error water.util.DistributedException:
  • Error: DistributedException from /192.168.86.20:54321: 'Missing chunk 0 for vector $04ff04000000ffffffff$_be690d82bb89f0d48e91987ff4014394; Vec info: is not in DKV; home=/192.168.86.20:54321; self=/192.168.86.20:54321'
  • Request: None
  • Stacktrace: DistributedException from /192.168.86.20:54321: 'Missing chunk 0 for vector $04ff04000000ffffffff$_be690d82bb89f0d48e91987ff4014394; Vec info: is not in DKV; home=/192.168.86.20:54321; self=/192.168.86.20:54321', caused by java.lang.IllegalStateException: Missing chunk 0 for vector $04ff04000000ffffffff$_be690d82bb89f0d48e91987ff4014394; Vec info: is not in DKV; home=/192.168.86.20:54321; self=/192.168.86.20:54321
  • water.MRTask.getResult(MRTask.java:655)
  • water.MRTask.getResult(MRTask.java:665)
  • water.MRTask.doAll(MRTask.java:525)
  • water.MRTask.doAll(MRTask.java:492)
  • water.fvec.Frame.deepSlice(Frame.java:1249)
  • water.rapids.ast.prims.mungers...

@exalate-issue-sync
Author

Wendy commented: Looks like the weight vector is deleted. Went back and realized that I added Scope.track(weight) by mistake. This is only necessary if I am generating a weight vector of ones!
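
The ownership mistake described above can be illustrated with a toy model in plain Python. This is only a sketch: `ToyStore`, `ToyScope`, and `toy_train` are hypothetical stand-ins, not H2O's DKV or `Scope` API, and it assumes that a tracked vector is removed when the scope exits. It shows why tracking a user-supplied weight vector makes it disappear after training, while tracking a temporary vector of ones is harmless.

```python
class ToyStore:
    """Hypothetical stand-in for a key/value store holding vectors."""

    def __init__(self):
        self._data = {}

    def put(self, key, values):
        self._data[key] = values

    def get(self, key):
        if key not in self._data:
            raise KeyError(f"Missing chunk for vector {key}")
        return self._data[key]

    def remove(self, key):
        self._data.pop(key, None)


class ToyScope:
    """Hypothetical scope: every tracked key is removed from the store on exit."""

    def __init__(self, store):
        self.store = store
        self.tracked = []

    def track(self, key):
        self.tracked.append(key)
        return key

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        for key in self.tracked:
            self.store.remove(key)
        return False


def toy_train(store, n_rows, weights_key=None, buggy=False):
    """Pretend training step that only sums the weights."""
    with ToyScope(store) as scope:
        if weights_key is None:
            # No weights supplied: create a temporary vector of ones.
            # Tracking it is correct, because training owns it.
            weights_key = "tmp_ones"
            store.put(weights_key, [1.0] * n_rows)
            scope.track(weights_key)
        elif buggy:
            # The mistake: tracking a vector that belongs to the caller's frame.
            scope.track(weights_key)
        return sum(store.get(weights_key))


store = ToyStore()
store.put("W", [10.0, 20.0, 10.0])
toy_train(store, 3, weights_key="W")              # "W" is still in the store
toy_train(store, 3, weights_key="W", buggy=True)  # "W" is removed on scope exit
```

After the buggy call, `store.get("W")` raises `KeyError`, which mirrors the "Missing chunk" error that `predict` hit once the caller's weight column had been deleted.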

@h2o-ops-ro
Collaborator

JIRA Issue Details

Jira Issue: PUBDEV-8407
Assignee: Wendy
Reporter: Wendy
State: Resolved
Fix Version: 3.34.0.4
Attachments: N/A
Development PRs: Available

@h2o-ops-ro
Collaborator

Linked PRs from JIRA

#5882
#5889
