GLM Beta Constraints not consistent and not working sometimes as well #7681

exalate-issue-sync · 2023-05-11T17:27:32Z

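{code:python}
import h2o
from h2o.estimators import H2OGeneralizedLinearEstimator

h2o.init()

df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/kaggle/CreditCard/creditcard_train_cat.csv",
                     col_types={"DEFAULT_PAYMENT_NEXT_MONTH": "enum"})
# Require the LIMIT_BAL coefficient to be at least 0.0001
constraints = h2o.H2OFrame({'names': ["LIMIT_BAL"],
                            'lower_bounds': [0.0001]})
glm_beta = H2OGeneralizedLinearEstimator(model_id="beta_glm",
                                         beta_constraints=constraints)
glm_beta.train(y="DEFAULT_PAYMENT_NEXT_MONTH", training_frame=df)
# Inspect whether the fitted coefficient respects the lower bound
print(glm_beta.coef()["LIMIT_BAL"])
{code}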

exalate-issue-sync · 2023-05-11T17:27:34Z

Megan Kurka commented: It would also be nice to be able to pass in beta_constraints as an R list or a Python dictionary rather than an H2OFrame, since building the frame is an extra step for the user.

Here is a Python example:

{code:python}beta_constraints = {'LIMIT_BAL': {'upper_bound': 1e6, 'lower_bound': 0.0001}}{code}
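A dictionary like the one above could be flattened into the `names` / `lower_bounds` / `upper_bounds` columns that the H2OFrame in the original report uses. The helper below is a hypothetical sketch (`dict_to_beta_constraint_columns` is not an h2o function); it assumes the `lower_bound` / `upper_bound` key names from the example and fills a missing side with an infinite bound, which is an assumption to check against what the backend accepts.

```python
def dict_to_beta_constraint_columns(constraints):
    """Hypothetical helper: flatten {name: {"lower_bound": lb, "upper_bound": ub}}
    into parallel lists keyed like the columns of a beta_constraints frame.

    A missing bound becomes -inf / +inf (assumed to mean "unconstrained on that side").
    """
    allowed = {"lower_bound", "upper_bound"}
    names, lowers, uppers = [], [], []
    for name, bounds in constraints.items():
        unknown = set(bounds) - allowed
        if unknown:
            raise ValueError("unknown keys for %s: %s" % (name, sorted(unknown)))
        names.append(name)
        lowers.append(float(bounds.get("lower_bound", float("-inf"))))
        uppers.append(float(bounds.get("upper_bound", float("inf"))))
    return {"names": names, "lower_bounds": lowers, "upper_bounds": uppers}


cols = dict_to_beta_constraint_columns({'LIMIT_BAL': {'upper_bound': 1e6, 'lower_bound': 0.0001}})
print(cols)
```

The result could then be wrapped as `h2o.H2OFrame(cols)[["names", "lower_bounds", "upper_bounds"]]`, reordering the columns the same way the later test in this thread does.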

exalate-issue-sync · 2023-05-11T17:27:36Z

Wendy commented: Very clear.

exalate-issue-sync · 2023-05-11T17:27:37Z

Wendy commented: Need to add a check in the code, per michalk, to make sure the constraints are satisfied.
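One way to express such a check after training is to compare every fitted coefficient with its bounds. The sketch below assumes the coefficients come as a name-to-value dict, which is what `coef()` on a trained H2O GLM returns; the function name `check_beta_constraints` and its `tol` argument are hypothetical.

```python
def check_beta_constraints(coefs, bounds, tol=0.0):
    """Return (name, value, lower, upper) for every coefficient outside its bounds.

    coefs:  dict of coefficient name -> fitted value (e.g. model.coef())
    bounds: dict of coefficient name -> (lower, upper); use float("-inf") or
            float("inf") for an open side
    tol:    slack allowed for floating-point round-off
    A name missing from coefs raises KeyError, which flags a naming mismatch.
    """
    violations = []
    for name, (lower, upper) in bounds.items():
        value = coefs[name]
        if value < lower - tol or value > upper + tol:
            violations.append((name, value, lower, upper))
    return violations


# Hypothetical usage after training, reusing the bounds from the report:
# violations = check_beta_constraints(glm_beta.coef(), {"LIMIT_BAL": (0.0001, 1e6)})
# assert not violations, violations
```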

exalate-issue-sync · 2023-05-11T17:27:39Z

Wendy commented: Michalk has reported this in https://github.com//pull/5252/files.

{noformat}#!/usr/bin/env python
# -*- encoding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals
import h2o
from h2o.estimators import H2OGeneralizedLinearEstimator
from tests import pyunit_utils


def test_glm_beta_constraints():
    df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/kaggle/CreditCard/creditcard_train_cat.csv",
                         col_types={"DEFAULT_PAYMENT_NEXT_MONTH": "enum"})
    lb_limit_bal = 0.0001
    constraints = h2o.H2OFrame({'names': ["LIMIT_BAL"],
                                'lower_bounds': [lb_limit_bal],
                                'upper_bounds': [1e6]})
    # make sure we have the column names in expected order, the backend does weird things when the order is different
    constraints = constraints[["names", "lower_bounds", "upper_bounds"]]
    glm_beta = H2OGeneralizedLinearEstimator(model_id="beta_glm", beta_constraints=constraints, seed=42)
    glm_beta.train(y="DEFAULT_PAYMENT_NEXT_MONTH", training_frame=df)
    assert glm_beta.coef()["LIMIT_BAL"] >= lb_limit_bal


if __name__ == "__main__":
    pyunit_utils.standalone_test(test_glm_beta_constraints)
else:
    test_glm_beta_constraints(){noformat}

exalate-issue-sync · 2023-05-11T17:27:41Z

Wendy commented: It looks like beta constraints are applied only with the COD solver. After discussing this with Michalk, I am going to emit a warning when users enable beta constraints with a different solver.
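A client-side version of the planned warning might look like the sketch below. It is hypothetical: `warn_if_constraints_ignored` is not part of h2o, and the real warning was to be added inside H2O itself. The set of constraint-aware solvers is a parameter, seeded with COORDINATE_DESCENT to match the comment above.

```python
import warnings

# Assumption based on the comment above: only COD honoured beta constraints at the time.
CONSTRAINT_AWARE_SOLVERS = frozenset({"COORDINATE_DESCENT"})


def warn_if_constraints_ignored(solver, beta_constraints, supported=CONSTRAINT_AWARE_SOLVERS):
    """Warn (and return True) when beta constraints are set but the solver may ignore them."""
    if beta_constraints is None:
        return False
    if solver.upper() not in supported:
        warnings.warn(
            "beta_constraints are only enforced by %s; solver %s may ignore them"
            % (", ".join(sorted(supported)), solver),
            UserWarning,
        )
        return True
    return False
```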

exalate-issue-sync · 2023-05-11T17:27:43Z

Wendy commented: Actually, the constraints are standardized before they are applied. However, the line search then changes the beta values as well, which is why MK's test is failing.
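For reference, standardizing a predictor as x_std = (x - mu) / sigma leaves the linear predictor unchanged when beta_std = beta * sigma, so a bound lb <= beta becomes lb * sigma <= beta_std (for sigma > 0), and the intercept absorbs the centering term sum(beta * mu). The plain-Python sketch below shows that mapping; the function names are illustrative and this is not H2O's code.

```python
def standardize_bounds(lower, upper, sigma):
    """Map bounds on original-scale coefficients to the standardized scale.

    With x_std = (x - mu) / sigma the linear predictor is unchanged when
    beta_std = beta * sigma, so each bound is multiplied by sigma (> 0).
    """
    if any(s <= 0 for s in sigma):
        raise ValueError("every sigma must be positive")
    return ([lo * s for lo, s in zip(lower, sigma)],
            [hi * s for hi, s in zip(upper, sigma)])


def destandardize(beta_std, intercept_std, mu, sigma):
    """Recover original-scale coefficients and intercept from standardized ones."""
    beta = [b / s for b, s in zip(beta_std, sigma)]
    # eta = b0_std + sum(b_std * (x - mu) / sigma) = (b0_std - sum(beta * mu)) + sum(beta * x)
    intercept = intercept_std - sum(b * m for b, m in zip(beta, mu))
    return beta, intercept
```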

exalate-issue-sync · 2023-05-11T17:27:44Z

Wendy commented: Make sure to check against the glmnet implementation.

exalate-issue-sync · 2023-05-11T17:27:46Z

Wendy commented: The current implementation is this:

Apply beta constraints, then apply line search. Hence, the final coefficients may not satisfy the beta constraints.

My first change was to disable line search and just apply the beta constraints. However, I don't think this is good practice: I believe performance will suffer, even though the coefficients will certainly satisfy the beta constraints.

Apply line search first and then apply the beta constraints at the end. I believe this is the best approach. The solver finds the best descent direction, line search then finds the best step size, and finally the beta constraints make sure the final coefficients satisfy the bounds.
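The two alternatives can be sketched on a toy loss. This is a simplified stand-in, not H2O's solver: the search direction is given, the line search is plain step halving until the loss decreases, and "applying beta constraints" is modelled as clipping each coefficient into its box. The current implementation's ordering is not modelled, because the thread does not describe exactly how its line search interacts with the clipped coefficients.

```python
def project_to_box(beta, lower, upper):
    """'Apply beta constraints': clip each coefficient into [lower, upper]."""
    return [min(max(b, lo), hi) for b, lo, hi in zip(beta, lower, upper)]


def backtracking_line_search(loss, beta, direction, step=1.0, shrink=0.5, max_halvings=30):
    """Halve the step until the loss decreases; keep beta if it never does."""
    base = loss(beta)
    for _ in range(max_halvings):
        candidate = [b + step * d for b, d in zip(beta, direction)]
        if loss(candidate) < base:
            return candidate
        step *= shrink
    return list(beta)


def update_ls_then_constraints(loss, beta, direction, lower, upper):
    """Line search along the unconstrained direction, then project."""
    return project_to_box(backtracking_line_search(loss, beta, direction), lower, upper)


def update_constraints_no_ls(beta, direction, lower, upper):
    """Take the full step and project, with no line search."""
    return project_to_box([b + d for b, d in zip(beta, direction)], lower, upper)


# Toy example: quadratic loss whose unconstrained minimum lies outside the box.
target = [2.0, -3.0]


def loss(beta):
    return 0.5 * sum((b - t) ** 2 for b, t in zip(beta, target))


beta0 = [0.5, 0.0]
lower, upper = [0.0001, -1.0], [1e6, 1.0]
direction = [2.0 * (t - b) for t, b in zip(target, beta0)]  # deliberately overshoots
print(update_ls_then_constraints(loss, beta0, direction, lower, upper))  # [2.0, -1.0]
print(update_constraints_no_ls(beta0, direction, lower, upper))          # [3.5, -1.0]
```

Because the projection is the last step in both variants, the returned coefficients are always inside the box; which variant fits better is an empirical question, which is what the experiments below address.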

Lacking any theoretical derivation beyond my gut feeling, I turned to experiments. I ran them on binomial and Gaussian datasets generated to satisfy the GLM model assumptions. The experiments covered all three approaches above: the current implementation (beta constraints, then line search), beta constraints with no LS (line search), and LS followed by beta constraints. Here is a table showing the performance for the coordinate_descent solver:

!image-20211027-223642.png|width=863,height=603!

I also tested the IRLSM solver:

!image-20211027-223711.png|width=1240,height=409!
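The thread does not say how the synthetic datasets were generated. A common way to produce data that satisfies the GLM assumptions is to draw predictors, form the linear predictor from known coefficients, and sample the response through the inverse link. The standard-library sketch below does this; `make_glm_data` is a hypothetical name, and the sketch is an assumption about the setup, not the generator used for the tables above.

```python
import math
import random


def make_glm_data(n, beta, intercept, family, noise_sd=1.0, seed=None):
    """Draw rows (x, y) from a GLM with known coefficients.

    gaussian: y = intercept + x . beta + N(0, noise_sd^2)   (identity link)
    binomial: y ~ Bernoulli(sigmoid(intercept + x . beta))  (logit link)
    Predictors are independent standard normals.
    """
    if family not in ("gaussian", "binomial"):
        raise ValueError("unsupported family: %r" % family)
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        eta = intercept + sum(b * xi for b, xi in zip(beta, x))
        if family == "gaussian":
            y.append(eta + rng.gauss(0.0, noise_sd))
        else:
            p = 1.0 / (1.0 + math.exp(-eta))
            y.append(1 if rng.random() < p else 0)
        X.append(x)
    return X, y
```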

For both COD and IRLSM, the two setups "beta constraints after LS" and "beta constraints with no LS" perform very similarly. The current implementation is incorrect because the magnitudes of the GLM coefficients are not constrained correctly, so it is no good.

Based on the test results, I will apply the beta constraints after LS for both the IRLSM and COD solvers.

exalate-issue-sync · 2023-05-11T17:27:48Z

Wendy commented: Two more tasks to go:

Check the R glmnet implementation

Measure performance numbers for the GLM toolbox

exalate-issue-sync · 2023-05-11T17:27:49Z

Wendy commented: To summarize the experiments: it looks like the setup with beta constraints and no LS performs best. That said, IRLSM gives better results than COD with beta constraints.

exalate-issue-sync · 2023-05-11T17:27:51Z

Wendy commented: Completed the implementation that lets a user specify beta constraints with a Python dict.

exalate-issue-sync · 2023-05-11T17:27:53Z

Wendy commented: I did a performance sweep, and the beta constraint fixes do not appear to have affected the training time:

!image-20211103-235408.png|width=1058,height=174!

h2o-ops · 2023-05-14T20:48:16Z

JIRA Issue Migration Info

Jira Issue: PUBDEV-7973
Assignee: Wendy
Reporter: Megan Kurka
State: Resolved
Fix Version: 3.34.0.4
Attachments: Available (Count: 3)
Development PRs: Available

Linked PRs from JIRA

#5252
#5842

Attachments From Jira

Attachment Name: image-20211027-223642.png
Attached By: Wendy
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-7973/image-20211027-223642.png

Attachment Name: image-20211027-223711.png
Attached By: Wendy
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-7973/image-20211027-223711.png

Attachment Name: image-20211103-235408.png
Attached By: Wendy
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-7973/image-20211103-235408.png

h2o-ops closed this as completed May 14, 2023

h2o-ops added the fixVersion/3.34.0.4 label May 14, 2023
