GLM Beta Constraints not consistent and not working sometimes as well #7681

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 13 comments

Comments

@exalate-issue-sync

import h2o
from h2o.estimators import H2OGeneralizedLinearEstimator

h2o.init()  # connect to (or start) a local H2O cluster

df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/kaggle/CreditCard/creditcard_train_cat.csv",
                     col_types={"DEFAULT_PAYMENT_NEXT_MONTH": "enum"})
# lower bound only: the LIMIT_BAL coefficient is expected to be >= 0.0001
constraints = h2o.H2OFrame({'names': ["LIMIT_BAL"],
                            'lower_bounds': [0.0001]})
glm_beta = H2OGeneralizedLinearEstimator(model_id="beta_glm",
                                         beta_constraints=constraints)
glm_beta.train(y="DEFAULT_PAYMENT_NEXT_MONTH", training_frame=df)

@exalate-issue-sync
Author

Megan Kurka commented: It would also be nice to be able to pass in the beta_constraints as an R list or Python dictionary rather than an H2OFrame. Building the H2OFrame is an extra step for the user.

Here is a Python example:

{code:python}beta_constraints = {'LIMIT_BAL': {'upper_bound': 1e6, 'lower_bound': 0.0001}}{code}
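
To make the "extra step" concrete, here is a minimal sketch of how a dictionary in the shape proposed above could be turned into the column layout that the H2OFrame in the original report uses. The helper name `constraints_dict_to_columns` is hypothetical, and filling a missing bound with an infinite value is an assumption of this sketch, not documented H2O behavior.

```python
def constraints_dict_to_columns(beta_constraints):
    """Convert {'col': {'lower_bound': x, 'upper_bound': y}} into column lists.

    Hypothetical helper. A missing bound is filled with -inf / +inf as an
    "unbounded" placeholder; how the backend treats infinite bounds is not
    verified here.
    """
    names, lower, upper = [], [], []
    for name, bounds in beta_constraints.items():
        names.append(name)
        lower.append(bounds.get("lower_bound", float("-inf")))
        upper.append(bounds.get("upper_bound", float("inf")))
    return {"names": names, "lower_bounds": lower, "upper_bounds": upper}


columns = constraints_dict_to_columns(
    {"LIMIT_BAL": {"upper_bound": 1e6, "lower_bound": 0.0001}}
)
```

The resulting `columns` dictionary has the same keys as the one passed to `h2o.H2OFrame(...)` in the original report, so the conversion to a frame would be the only remaining step.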

@exalate-issue-sync
Author

Wendy commented: Very clear.

@exalate-issue-sync
Author

Wendy commented: Per michalk, we need to add a check in the code to make sure the constraints are satisfied.
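
As an illustration of what such a check could look like on the user side, here is a small sketch that compares a coefficient dictionary (the kind of mapping `coef()` is indexed as in this thread) against per-column bounds. The function name `find_violations`, the `tol` parameter, and the example coefficient values are all made up for illustration; this is not the check that was added to H2O.

```python
def find_violations(coefs, lower_bounds=None, upper_bounds=None, tol=0.0):
    """Return {name: (value, lower, upper)} for coefficients outside their bounds.

    Hypothetical helper: `coefs`, `lower_bounds` and `upper_bounds` are plain
    dicts keyed by coefficient name. Unlisted names are treated as unbounded.
    """
    lower_bounds = lower_bounds or {}
    upper_bounds = upper_bounds or {}
    violations = {}
    for name, value in coefs.items():
        lo = lower_bounds.get(name, float("-inf"))
        hi = upper_bounds.get(name, float("inf"))
        if value < lo - tol or value > hi + tol:
            violations[name] = (value, lo, hi)
    return violations


# Made-up coefficient values, only to exercise the helper.
example_coefs = {"Intercept": -1.2, "LIMIT_BAL": -3.0e-06}
violations = find_violations(example_coefs, lower_bounds={"LIMIT_BAL": 0.0001})
```

With a trained model, one could pass `glm_beta.coef()` as `coefs` and assert that the returned dictionary is empty.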

@exalate-issue-sync
Author

Wendy commented: Michalk has reported this in https://github.com//pull/5252/files .

{noformat}#!/usr/bin/env python
# -*- encoding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals
import h2o
from h2o.estimators import H2OGeneralizedLinearEstimator
from tests import pyunit_utils


def test_glm_beta_constraints():
    df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/kaggle/CreditCard/creditcard_train_cat.csv",
                         col_types={"DEFAULT_PAYMENT_NEXT_MONTH": "enum"})
    lb_limit_bal = 0.0001
    constraints = h2o.H2OFrame({'names': ["LIMIT_BAL"],
                                'lower_bounds': [lb_limit_bal],
                                'upper_bounds': [1e6]})
    # make sure we have the column names in expected order, the backend does weird things when the order is different
    constraints = constraints[["names", "lower_bounds", "upper_bounds"]]
    glm_beta = H2OGeneralizedLinearEstimator(model_id="beta_glm", beta_constraints=constraints, seed=42)
    glm_beta.train(y="DEFAULT_PAYMENT_NEXT_MONTH", training_frame=df)
    assert glm_beta.coef()["LIMIT_BAL"] >= lb_limit_bal


if __name__ == "__main__":
    pyunit_utils.standalone_test(test_glm_beta_constraints)
else:
    test_glm_beta_constraints(){noformat}

@exalate-issue-sync
Author

Wendy commented: It looks like beta constraints are applied only with the COD solver. After discussion with Michalk, I am going to issue a warning when people enable beta constraints with a different solver.

@exalate-issue-sync
Author

Wendy commented: Actually, the constraints are standardized before they are applied. However, the line search then changes the beta values as well. This is why MK’s test is failing.
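
For readers unfamiliar with why constraints need rescaling at all, here is a minimal sketch of the arithmetic. It assumes the usual standardization x_std = (x - mean) / sd, under which a coefficient on the standardized column equals the original-scale coefficient times sd, so bounds scale by the same factor. The function names and the example sd value are hypothetical, and the thread does not show how H2O performs this step internally.

```python
def standardize_bounds(lower, upper, sd):
    """Map bounds on an original-scale coefficient to the standardized scale.

    Assumes x_std = (x - mean) / sd, hence beta_std = beta * sd. Because sd is
    positive, the order of the bounds is preserved.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return lower * sd, upper * sd


def destandardize_coef(beta_std, sd):
    """Map a standardized-scale coefficient back to the original scale."""
    return beta_std / sd


# Hypothetical standard deviation for a column like LIMIT_BAL.
sd_limit_bal = 130000.0
lower_std, upper_std = standardize_bounds(0.0001, 1e6, sd_limit_bal)
```

A coefficient that sits exactly on `lower_std` maps back to the original lower bound through `destandardize_coef`, so any later step that moves the standardized coefficient also moves the original-scale one.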

@exalate-issue-sync
Author

Wendy commented: Make sure to check with glmnet implementation.

@exalate-issue-sync
Author

Wendy commented: The current implementation and the two alternatives I considered are:

1. Current implementation: apply beta constraints, then apply line search. Hence, the final coefficients may not satisfy the beta constraints.

2. My first change: disable line search and just use beta constraints. However, I don’t think this is good practice. I believe performance will suffer, even though the coefficients will satisfy the beta constraints for sure.

3. Apply line search first and then apply beta constraints at the end. I believe this is the best approach. It finds the best descent direction, then uses line search to find the best step size. Finally, applying the beta constraints makes sure the final coefficients satisfy them.
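
The effect of the ordering can be seen in a one-dimensional toy example. This is only a sketch under stated assumptions: a made-up quadratic objective, a grid-based line search, and a starting point outside the bounds. All names (`clip`, `line_search`, `objective`) are hypothetical, and the thread does not establish that the H2O failure arises in exactly this way.

```python
def clip(b, lower, upper):
    """Project a coefficient onto the interval [lower, upper]."""
    return max(lower, min(upper, b))


def line_search(f, b_old, b_target, steps=(1.0, 0.5, 0.25, 0.125)):
    """Pick the point on the segment from b_old toward b_target with the lowest f."""
    direction = b_target - b_old
    return min((b_old + s * direction for s in steps), key=f)


def objective(b):
    """Toy objective whose unconstrained minimum is at b = -1."""
    return (b + 1.0) ** 2


LOWER, UPPER = 0.5, float("inf")
b_old = 0.0        # starting coefficient, outside the bounds
b_proposed = -1.0  # unconstrained update proposed by the solver

# Constraints first, then line search: the search may step back out of bounds.
constraints_then_ls = line_search(objective, b_old, clip(b_proposed, LOWER, UPPER))

# Constraints only, no line search.
constraints_no_ls = clip(b_proposed, LOWER, UPPER)

# Line search first, constraints last: the result is always inside the bounds.
ls_then_constraints = clip(line_search(objective, b_old, b_proposed), LOWER, UPPER)
```

In this toy setup, `constraints_then_ls` ends up below `LOWER`, while the other two variables land on the bound. Whichever strategy is used, applying the projection as the last step is what guarantees feasibility.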

Lacking any theoretical derivation beyond my gut feeling, I turned to experiments. I ran them on binomial and gaussian datasets generated to satisfy the GLM model assumptions. Experiments were run with all three suggestions above: the current implementation (beta constraints and then line search), beta constraints with no LS (line search), and finally LS and then beta constraints. Here is a table showing the performance for solver coordinate_descent:

!image-20211027-223642.png|width=863,height=603!

I also tested solver IRLSM:

!image-20211027-223711.png|width=1240,height=409!

For both COD and IRLSM, the performance of the following two cases is very similar:

Beta constraints after LS;

Beta constraints with no LS.

The current implementation is incorrect because the GLM coefficients are not constrained correctly, and hence it is no good.

Based on the test results, I will apply beta constraints after LS for both the IRLSM and COD solvers.

@exalate-issue-sync
Author

Wendy commented: Two more tasks to go:

Check R glmnet implementation

Measure performance numbers for GLM toolbox

@exalate-issue-sync
Author

Wendy commented: From the experiments, it looks like the setup with beta constraints without LS performs best. That said, IRLSM provides better results than COD with beta constraints.

@exalate-issue-sync
Author

Wendy commented: Completed the implementation that allows a user to specify beta constraints with a Python dict.

@exalate-issue-sync
Author

Wendy commented: I did a performance sweep, and it does not look like the beta constraint fixes affected the training time:

!image-20211103-235408.png|width=1058,height=174!

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-7973
Assignee: Wendy
Reporter: Megan Kurka
State: Resolved
Fix Version: 3.34.0.4
Attachments: Available (Count: 3)
Development PRs: Available

Linked PRs from JIRA

#5252
#5842

Attachments From Jira

Attachment Name: image-20211027-223642.png
Attached By: Wendy
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-7973/image-20211027-223642.png

Attachment Name: image-20211027-223711.png
Attached By: Wendy
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-7973/image-20211027-223711.png

Attachment Name: image-20211103-235408.png
Attached By: Wendy
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-7973/image-20211103-235408.png
