GLM Beta Constraints not consistent and not working sometimes as well #7681
Comments
Megan Kurka commented: It would also be nice to be able to pass in the beta_constraints as an R list or Python dictionary rather than an H2OFrame. Building the H2OFrame is an extra step for the user. Here is a Python example: {code:python}beta_constraints = {'LIMIT_BAL': {'upper_bound': 1e6, 'lower_bound': 0.0001}}{code} |
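For illustration, the dictionary form above could be mapped onto the column layout the H2OFrame form uses (`names`, `lower_bounds`, `upper_bounds`). This is only a sketch: `constraints_dict_to_columns` is a hypothetical helper, not part of h2o, and filling a missing bound with an infinite value is an assumption rather than documented h2o behavior.

```python
import math


def constraints_dict_to_columns(beta_constraints):
    """Turn {name: {'lower_bound': x, 'upper_bound': y}} into column lists.

    Hypothetical helper: the returned dict has the same column names as the
    H2OFrame in the issue body. Missing bounds become -inf/+inf (an assumption).
    """
    columns = {"names": [], "lower_bounds": [], "upper_bounds": []}
    for name, bounds in beta_constraints.items():
        lower = bounds.get("lower_bound", -math.inf)
        upper = bounds.get("upper_bound", math.inf)
        if lower > upper:
            raise ValueError("lower bound exceeds upper bound for " + name)
        columns["names"].append(name)
        columns["lower_bounds"].append(lower)
        columns["upper_bounds"].append(upper)
    return columns


beta_constraints = {'LIMIT_BAL': {'upper_bound': 1e6, 'lower_bound': 0.0001}}
columns = constraints_dict_to_columns(beta_constraints)
print(columns)
```

The resulting column dict has the shape passed to `h2o.H2OFrame(...)` in the reproduction script of this issue.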
Wendy commented: Very clear. |
Wendy commented: From michalk: we need to add a check in the code to make sure the constraints are satisfied. |
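A check of this kind could look roughly like the sketch below. It works on plain Python dicts; `find_violations`, the tolerance, and the coefficient values are all made up for illustration and are not taken from the h2o code base or from a real model.

```python
def find_violations(coefficients, bounds, tol=1e-8):
    """Return the names whose coefficient lies outside [lower, upper].

    coefficients: {name: value}; bounds: {name: (lower, upper)}.
    A small tolerance absorbs floating-point noise at the boundary.
    """
    violations = []
    for name, (lower, upper) in bounds.items():
        value = coefficients[name]
        if value < lower - tol or value > upper + tol:
            violations.append(name)
    return violations


# Illustrative numbers only, not output from a real model.
coefs = {"LIMIT_BAL": -0.002, "AGE": 0.5}
bounds = {"LIMIT_BAL": (0.0001, 1e6), "AGE": (0.0, 1.0)}
bad = find_violations(coefs, bounds)
print(bad)
```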
Wendy commented: Michalk has reported this in [https://github.com//pull/5252/files|https://github.com//pull/5252/files|smart-link] .
{noformat}
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

def test_glm_beta_constraints():
    ...
    # make sure we have the column names in expected order, the backend does weird things when the order is different
    constraints = constraints[["names", "lower_bounds", "upper_bounds"]]
    ...

if __name__ == "__main__":
    ...
{noformat} |
Wendy commented: It looks like beta constraints are only applied with the COD solver. After discussion with Michalk, I am going to emit a warning when people enable beta constraints with a different solver. |
Wendy commented: Actually, the constraints are standardized before they are applied. However, the line search that follows changes the beta values as well. This is why MK’s test is failing. |
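To make the standardization point concrete: if a predictor is rescaled as x' = (x - mean) / sd, then x * beta = x' * (sd * beta) + mean * beta, so the coefficient on the standardized scale is sd * beta and a bound on beta scales by the same factor. The sketch below shows only this arithmetic; the function names are hypothetical and it is not claimed to match how h2o implements the conversion.

```python
def standardize_bounds(lower, upper, sd):
    """Map bounds on a raw-scale coefficient to the standardized scale.

    Assumes the predictor was standardized as (x - mean) / sd with sd > 0,
    so the standardized coefficient equals sd * raw coefficient.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return lower * sd, upper * sd


def destandardize_coefficient(beta_std, sd):
    """Map a standardized-scale coefficient back to the raw scale."""
    return beta_std / sd


std_lower, std_upper = standardize_bounds(0.0, 2.0, 4.0)
raw_beta = destandardize_coefficient(std_upper, 4.0)
print(std_lower, std_upper, raw_beta)
```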
Wendy commented: Make sure to check against the glmnet implementation. |
Wendy commented: The current implementation applies the beta constraints and then applies line search. Hence, the final coefficients may not satisfy the beta constraints. I considered two alternatives:

1. Disable line search and just use beta constraints. This was my first change. However, I don’t think this is good practice. I believe performance will suffer, even though the coefficients will satisfy the beta constraints for sure.
2. Apply line search first and then apply beta constraints at the end. I believe this is the best approach. It finds the best descent direction, then uses line search to find the best step size. Finally, the beta constraints make sure the final coefficients stay within their bounds.

Lacking any theoretical derivation beyond my gut feeling, I turned to experiments. I ran them on binomial and gaussian datasets generated to satisfy the GLM model assumptions. Experiments were run with all three setups: the current implementation (beta constraints and then line search), beta constraints with no LS (line search), and LS followed by beta constraints.

Here is a table showing the performance for solver coordinate_descent: !image-20211027-223642.png|width=863,height=603!

I also tested solver IRLSM: !image-20211027-223711.png|width=1240,height=409!

For both COD and IRLSM, performance is very similar between the two cases "beta constraints after LS" and "beta constraints with no LS". The current implementation is incorrect because the GLM coefficients are not constrained correctly. Based on the test results, I will apply beta constraints after LS for both the IRLSM and COD solvers. |
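The "line search first, then beta constraints" ordering can be sketched in a few lines of plain Python. This is a toy illustration under simplifying assumptions, not the h2o implementation: the objective is a made-up quadratic, the line search is a bare step-halving rule that accepts the first step lowering the objective, and all function names are hypothetical.

```python
def backtracking_step(objective, beta, direction, shrink=0.5, max_halvings=30):
    """Return the first step size in 1, 1/2, 1/4, ... that lowers the objective."""
    base = objective(beta)
    step = 1.0
    for _ in range(max_halvings):
        candidate = [b + step * d for b, d in zip(beta, direction)]
        if objective(candidate) < base:
            return step
        step *= shrink
    return 0.0  # no improving step found; stay put


def project_onto_bounds(beta, lower, upper):
    """Clip every coefficient into its [lower, upper] interval."""
    return [min(max(b, lo), hi) for b, lo, hi in zip(beta, lower, upper)]


def constrained_update(objective, beta, direction, lower, upper):
    """Line search along the direction first, then apply the bounds last."""
    step = backtracking_step(objective, beta, direction)
    moved = [b + step * d for b, d in zip(beta, direction)]
    return project_onto_bounds(moved, lower, upper)


# Toy objective with its unconstrained minimum at (3, -2).
def objective(b):
    return (b[0] - 3.0) ** 2 + (b[1] + 2.0) ** 2


new_beta = constrained_update(
    objective,
    beta=[0.0, 0.0],
    direction=[6.0, -4.0],  # negative gradient at the origin
    lower=[0.0, -1.0],
    upper=[2.0, 1.0],
)
print(new_beta)
```

Because the clipping is the last operation, the returned coefficients lie inside the bounds regardless of the step size the line search picked.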
Wendy commented: Two more tasks to go:
1. Check the R glmnet implementation.
2. Measure performance numbers for the GLM toolbox. |
Wendy commented: Reading the experiment results, it looks like the setup with beta constraints and no LS performs best. That said, IRLSM gives better results than COD with beta constraints. |
Wendy commented: Completed the implementation that lets a user specify beta constraints with a Python dict. |
Wendy commented: I did a performance sweep and it does not look like beta constraint fixes affected the training time: !image-20211103-235408.png|width=1058,height=174! |
JIRA Issue Migration Info
Jira Issue: PUBDEV-7973
Linked PRs from JIRA
Attachments From Jira
Attachment Name: image-20211027-223642.png
Attachment Name: image-20211027-223711.png
Attachment Name: image-20211103-235408.png |
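import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

h2o.init()

# Load the credit card data and treat the response as categorical.
df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/kaggle/CreditCard/creditcard_train_cat.csv",
                     col_types={"DEFAULT_PAYMENT_NEXT_MONTH": "enum"})
# Constrain only the LIMIT_BAL coefficient, with a lower bound.
constraints = h2o.H2OFrame({'names': ["LIMIT_BAL"],
                            'lower_bounds': [0.0001]})
glm_beta = H2OGeneralizedLinearEstimator(model_id="beta_glm",
                                         beta_constraints=constraints)
glm_beta.train(y="DEFAULT_PAYMENT_NEXT_MONTH", training_frame=df)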