Implement constrained GLM #6722
Comments
wendycwong
added a commit
that referenced
this issue
Jul 17, 2023
…waiting for complete model building process. User just need to set max_iterations=0 and then call the model.coef_names() in python or h2o.coef_names(model) in R.
wendycwong
added a commit
that referenced
this issue
Aug 2, 2023
…waiting for complete model building process. User just need to set max_iterations=0 and then call the model.coef_names() in python or h2o.coef_names(model) in R. GH-6722: extract constraints from betaConstraints and from linear constraints with and without standardization. GH-6722: complete tests to make sure constraint extraction with or without standardization is correct.
wendycwong
added a commit
that referenced
this issue
Oct 10, 2023
…waiting for complete model building process. User just need to set max_iterations=0 and then call the model.coef_names() in python or h2o.coef_names(model) in R. GH-6722: extract constraints from betaConstraints and from linear constraints with and without standardization. GH-6722:complete tests to make sure constraint extraction with or without standardization is correct. GH-6722: Streamline GLMConstrainedTest, build matrix representing beta and linear constraints. GH-6722: adding redundant constraint matrix check GH-6722: add QR to check for rank of constraint matrix and added test GH-6722: adding error message about redundant constraints so that user will know which one to remove.
wendycwong
added a commit
that referenced
this issue
Dec 20, 2023
…waiting for complete model building process. User just need to set max_iterations=0 and then call the model.coef_names() in python or h2o.coef_names(model) in R. GH-6722: extract constraints from betaConstraints and from linear constraints with and without standardization. GH-6722:complete tests to make sure constraint extraction with or without standardization is correct. GH-6722: Streamline GLMConstrainedTest, build matrix representing beta and linear constraints. GH-6722: adding redundant constraint matrix check GH-6722: add QR to check for rank of constraint matrix and added test GH-6722: adding error message about redundant constraints so that user will know which one to remove. GH-6722: added contributinos from constraints for objective function. GH-6722: adding constraint contribution to gram and gradient GH-6722: added test to make sure constraints contribution to gram and gradient is correct. GH-6722: Added constraint parameters update and constraint stopping conditions. GH-6722: finished taken care of gram zero cols and added tests.
wendycwong
added a commit
that referenced
this issue
Dec 28, 2023
…waiting for complete model building process. User just need to set max_iterations=0 and then call the model.coef_names() in python or h2o.coef_names(model) in R. GH-6722: extract constraints from betaConstraints and from linear constraints with and without standardization. GH-6722:complete tests to make sure constraint extraction with or without standardization is correct. GH-6722: Streamline GLMConstrainedTest, build matrix representing beta and linear constraints. GH-6722: adding redundant constraint matrix check GH-6722: add QR to check for rank of constraint matrix and added test GH-6722: adding error message about redundant constraints so that user will know which one to remove. GH-6722: added contributinos from constraints for objective function. GH-6722: adding constraint contribution to gram and gradient GH-6722: added test to make sure constraints contribution to gram and gradient is correct. GH-6722: Added constraint parameters update and constraint stopping conditions. GH-6722: finished taken care of gram zero cols and added tests. adding support to remove collinear columns. fixed collinear column test.
wendycwong
added a commit
that referenced
this issue
Jan 23, 2024
…waiting for complete model building process. User just need to set max_iterations=0 and then call the model.coef_names() in python or h2o.coef_names(model) in R. GH-6722: extract constraints from betaConstraints and from linear constraints with and without standardization. GH-6722:complete tests to make sure constraint extraction with or without standardization is correct. GH-6722: Streamline GLMConstrainedTest, build matrix representing beta and linear constraints. GH-6722: adding redundant constraint matrix check GH-6722: add QR to check for rank of constraint matrix and added test GH-6722: adding error message about redundant constraints so that user will know which one to remove. GH-6722: added contributinos from constraints for objective function. GH-6722: adding constraint contribution to gram and gradient GH-6722: added test to make sure constraints contribution to gram and gradient is correct. GH-6722: Added constraint parameters update and constraint stopping conditions. GH-6722: finished taken care of gram zero cols and added tests. adding support to remove collinear columns. fixed collinear column test. GH-6722: Add exact line search.
I have based my implementation on the following doc: |
wendycwong
pushed a commit
that referenced
this issue
Feb 28, 2024
…parameters to set the number of inner iterations. Fixed bug with state storing the wrong ginfo.
wendycwong
added a commit
that referenced
this issue
Mar 14, 2024
…waiting for complete model building process. User just need to set max_iterations=0 and then call the model.coef_names() in python or h2o.coef_names(model) in R. GH-6722: extract constraints from betaConstraints and from linear constraints with and without standardization. GH-6722:complete tests to make sure constraint extraction with or without standardization is correct. GH-6722: Streamline GLMConstrainedTest, build matrix representing beta and linear constraints. GH-6722: adding redundant constraint matrix check GH-6722: add QR to check for rank of constraint matrix and added test GH-6722: adding error message about redundant constraints so that user will know which one to remove. GH-6722: added contributinos from constraints for objective function. GH-6722: adding constraint contribution to gram and gradient GH-6722: added test to make sure constraints contribution to gram and gradient is correct. GH-6722: Added constraint parameters update and constraint stopping conditions. GH-6722: finished taken care of gram zero cols and added tests. GH-6722: Add exact line search. GH-6722: add python tests. GH-6722: Complete constraints/derivative/gram contribution update from beta change. Add short circuit test.
Here is the document describing my constrained GLM implementation: |
wendycwong
added a commit
that referenced
this issue
Apr 5, 2024
…waiting for complete model building process. User just need to set max_iterations=0 and then call the model.coef_names() in python or h2o.coef_names(model) in R. GH-6722: extract constraints from betaConstraints and from linear constraints with and without standardization. GH-6722: complete tests to make sure constraint extraction with or without standardization is correct. GH-6722: Streamline GLMConstrainedTest, build matrix representing beta and linear constraints. GH-6722: adding redundant constraint matrix check GH-6722: add QR to check for rank of constraint matrix and added test GH-6722: adding error message about redundant constraints so that user will know which one to remove. GH-6722: added contributinos from constraints for objective function. GH-6722: adding constraint contribution to gram and gradient GH-6722: added test to make sure constraints contribution to gram and gradient is correct. GH-6722: Added constraint parameters update and constraint stopping conditions. GH-6722: finished taken care of gram zero cols and added tests. GH-6722: Add exact line search. GH-6722: Complete constraints/derivative/gram contribution update from beta change. Add short circuit test. GH-6722: Allow users to set parameters so that they can control how the constraint parameters change. GH-6722 moved some linear constraint checks before expensive=true GH-6722 make objective() the function call to get training objective results with linear constraints. incorporate Tomas F comments
wendycwong
added a commit
that referenced
this issue
Apr 12, 2024
…waiting for complete model building process. User just need to set max_iterations=0 and then call the model.coef_names() in python or h2o.coef_names(model) in R. GH-6722: extract constraints from betaConstraints and from linear constraints with and without standardization. GH-6722: complete tests to make sure constraint extraction with or without standardization is correct. GH-6722: Streamline GLMConstrainedTest, build matrix representing beta and linear constraints. GH-6722: adding redundant constraint matrix check GH-6722: add QR to check for rank of constraint matrix and added test GH-6722: adding error message about redundant constraints so that user will know which one to remove. GH-6722: added contributinos from constraints for objective function. GH-6722: adding constraint contribution to gram and gradient GH-6722: added test to make sure constraints contribution to gram and gradient is correct. GH-6722: Added constraint parameters update and constraint stopping conditions. GH-6722: finished taken care of gram zero cols and added tests. GH-6722: Add exact line search. GH-6722: Complete constraints/derivative/gram contribution update from beta change. Add short circuit test. GH-6722: Allow users to set parameters so that they can control how the constraint parameters change. GH-6722 moved some linear constraint checks before expensive=true GH-6722 make objective() the function call to get training objective results with linear constraints. incorporate Tomas F comments Incorporate Veronika Maurever comments.
wendycwong
added a commit
that referenced
this issue
Apr 26, 2024
* GH-6722: add ability to have user find glm coefficient names without waiting for complete model building process. Users just need to set max_iterations=0 and then call model.coef_names() in python or h2o.coef_names(model) in R.
* GH-6722: extract constraints from betaConstraints and from linear constraints with and without standardization.
* GH-6722: complete tests to make sure constraint extraction with or without standardization is correct.
* GH-6722: Streamline GLMConstrainedTest, build matrix representing beta and linear constraints.
* GH-6722: adding redundant constraint matrix check.
* GH-6722: add QR to check for rank of constraint matrix and added test.
* GH-6722: adding error message about redundant constraints so that user will know which one to remove.
* GH-6722: added contributions from constraints for objective function.
* GH-6722: adding constraint contribution to gram and gradient.
* GH-6722: added test to make sure constraints contribution to gram and gradient is correct.
* GH-6722: Added constraint parameters update and constraint stopping conditions.
* GH-6722: finished taking care of gram zero cols and added tests.
* GH-6722: Add exact line search.
* GH-6722: Complete constraints/derivative/gram contribution update from beta change. Add short circuit test.
* GH-6722: Allow users to set parameters so that they can control how the constraint parameters change.
* GH-6722: moved some linear constraint checks before expensive=true.
* GH-6722: make objective() the function call to get training objective results with linear constraints.
* Incorporate Tomas F comments.
* Incorporate Veronika Maurever comments.
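The commit notes above mention a rank check on the constraint matrix so that redundant constraints can be reported to the user. As a rough illustration of the idea only (this is not the H2O code, and the function name `redundant_rows` is made up here), a Gram-Schmidt pass over the constraint rows, which is closely related to a QR factorization of the transposed matrix, can flag rows that are linear combinations of earlier ones:

```python
def redundant_rows(rows, tol=1e-10):
    """Return indices of rows that are linear combinations of earlier rows.

    Hypothetical helper: orthogonalizes each constraint row against the
    rows kept so far (modified Gram-Schmidt) and flags rows whose
    remainder is numerically zero.
    """
    basis = []       # orthonormal vectors spanning the rows kept so far
    redundant = []
    for i, row in enumerate(rows):
        v = [float(x) for x in row]
        scale = sum(x * x for x in v) ** 0.5
        for q in basis:
            proj = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm <= tol * max(1.0, scale):
            redundant.append(i)      # adds nothing to the rank
        else:
            basis.append([vi / norm for vi in v])
    return redundant


# Third row is the sum of the first two, so it is redundant.
example = [[2, 0, 2], [0, 1, 0], [2, 1, 2]]
flagged = redundant_rows(example)
```

The rank of the matrix is the number of rows minus the number of flagged rows, and the flagged indices tell the user which constraints could be dropped.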
Support ticket: https://support.h2o.ai/a/tickets/104118
We would like to request additional constraint options for fitting GLMs with H2OGeneralizedLinearEstimator. We saw that H2OGeneralizedLinearEstimator already has some constraint options, such as beta_constraints, which applies upper/lower bounds to the coefficients, and non_negative, which restricts the coefficients to be greater than or equal to 0. However, some of our GLM models require other types of constraints, such as requiring the coefficient of one predictor to be larger than that of another, based on our intuition or understanding of the relationship between the predictors.
To help clarify our request discussed in the email below, we’ll use this model as an example:
Y = intercept + aX1 + bX2 + cZ1 + dZ2 + eX3 + fX4
In this example, this is a GLM model where
Y is the target.
X1 & X2 represent two separate continuous predictors
a is the coefficient for X1
b is the coefficient for X2
Z represents a categorical predictor with several levels
Z1 represents the 1st level of predictor Z
Z2 represents the 2nd level of predictor Z
c is the coefficient for Z1 (effect of level 1 of predictor Z)
d is the coefficient for Z2 (effect of level 2 of predictor Z)
e & f and X3 & X4 just represent some additional predictors/coefficients in the model
We would like to make a request for the following constraint options:
Both linear equality constraints and linear inequality constraints
Example of a linear equality constraint here would be 2a + 2c = 1, constraining on the coefficients for X1 & Z1
Example of a linear inequality constraint here would be 2a + 2c > 1
Be able to specify one or more predictors in the same constraint. For example:
2a = 1 (equality constraint on just one predictor)
2a > 1 (inequality constraint on just one predictor)
2a + 3b - 2c = 1 (equality constraint on multiple predictors)
2a + 3b - 2c > 1 (inequality constraint on multiple predictors)
Have the flexibility to specify constraints on only certain levels of a categorical predictor in a constraint. For example:
2a + 2c = 1, this puts a constraint on Z1, which is the first level of categorical predictor Z. The coefficient for Z2, which is the second level of predictor Z, is not mentioned in the constraints and thus is not constrained
2a + 2c > 1
Be able to specify both categorical and continuous predictors in the same constraint. For example:
2a + 2c = 1, this puts a constraint on X1, which is a continuous predictor and Z1, which is a level in a categorical predictor
2a + 2c > 1
For inequality constraints, if we could have options to do both
strictly greater than/less than (e.g. 2a + 3c > 1) and
greater than or equal to/less than or equal to (e.g. 2a + 3c >= 1)
that would be great. If only greater than or equal to/less than or equal to is possible (and not strictly greater than/less than), that’s fine too. (Greater than or equal to/less than or equal to is what I saw most often in the examples I found while researching optimization with constraints.)
Be able to specify multiple equality and/or inequality constraints (including combinations of both) at the same time for a model
In the example above, this could be having the following 4 constraints for the same model at the same time:
2a + 2c = 1
2a + 3d > 1
3b + 2e + 2f = 2
2d - a > 2
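To make the four example constraints concrete, here is a small sketch of how they could be written down as rows over the coefficient vector (a, b, c, d, e, f) and checked against a candidate set of coefficients. The data layout and the names `CONSTRAINTS`, `to_row` and `violated` are only assumptions for illustration, not an existing H2O interface:

```python
COEF_NAMES = ["a", "b", "c", "d", "e", "f"]

# Each constraint: (weights keyed by coefficient name, relation, right-hand side).
CONSTRAINTS = [
    ({"a": 2, "c": 2}, "==", 1),          # 2a + 2c = 1
    ({"a": 2, "d": 3}, ">", 1),           # 2a + 3d > 1
    ({"b": 3, "e": 2, "f": 2}, "==", 2),  # 3b + 2e + 2f = 2
    ({"d": 2, "a": -1}, ">", 2),          # 2d - a > 2
]


def to_row(weights):
    """Expand a sparse {name: weight} dict into a dense row over COEF_NAMES."""
    return [float(weights.get(name, 0)) for name in COEF_NAMES]


def violated(beta, tol=1e-9):
    """Return the indices of the constraints that beta does not satisfy."""
    bad = []
    for i, (weights, relation, rhs) in enumerate(CONSTRAINTS):
        lhs = sum(w * x for w, x in zip(to_row(weights), beta))
        ok = abs(lhs - rhs) <= tol if relation == "==" else lhs > rhs
        if not ok:
            bad.append(i)
    return bad


# Dense constraint matrix, one row per constraint.
A = [to_row(weights) for weights, _, _ in CONSTRAINTS]

# A coefficient vector (a, b, c, d, e, f) that satisfies all four constraints.
feasible_beta = [0.0, 0.0, 0.5, 2.0, 1.0, 0.0]
```

Coefficients that are not mentioned in a constraint (for example d in the first row) simply get a zero weight, which matches the request that unmentioned levels stay unconstrained.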
P-values: I saw a note in the beta_constraints page that p-values are currently not calculated for constrained problems. If possible, could we also make a request for p-values to still be calculated with models with constraints? The reason we would like to request this is due to p-value being an important metric to us when we are selecting predictors to keep in the model.
In doing some research on this, my understanding is that this is a constrained optimization problem in which the log-likelihood is the objective function to be maximized subject to the equality and inequality constraints. In looking at different resources, I found a book called “Constrained Optimization and Lagrange Multiplier Methods” by Dimitri P. Bertsekas that discusses some examples similar to the type of constraints we are requesting. It also provides many potential approaches to solving the optimization problem with these types of constraints.
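Written out, the problem described in this paragraph might look like the following. The symbols u_i, r_i, v_j, s_j are notation introduced here for illustration, β = (a, b, c, d, e, f) collects the coefficients from the example model (intercept left out for brevity), and ℓ is the log-likelihood:

```latex
\begin{aligned}
\min_{\beta}\quad & -\ell(\beta) \\
\text{subject to}\quad & h_i(\beta) = u_i^{\top}\beta - r_i = 0, \quad i = 1,\dots,m, \\
& g_j(\beta) = s_j - v_j^{\top}\beta \le 0, \quad j = 1,\dots,p.
\end{aligned}
```

For instance, 2a + 2c = 1 corresponds to u = (2, 0, 2, 0, 0, 0) and r = 1, while 2a + 3d >= 1 corresponds to v = (2, 0, 0, 3, 0, 0) and s = 1.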
I’ll reference a few notes in the book in case these would be helpful:
Section 1.4 Constrained Minimization: This section provides an introduction to both equality constrained & inequality constrained problems in general. The type of constraints mentioned in this section (h(x) as equality constraints and g(x) as inequality constraints) are similar to the ones we are requesting.
Section 4.4 Lagrangian Methods – Local Convergence: This section discusses the method of solving a system of Lagrangian equations, which I saw referenced quite often as a method for optimization with constraints.
The book also mentions a couple of other optimization methods, such as the
penalty function method (section 2.1),
method of multipliers (section 2.2),
and exact penalty functions (section 4.1-4.3).
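As a toy illustration of the method of multipliers for a single linear equality constraint, here is a minimal sketch. It uses a simple quadratic objective instead of a GLM likelihood, plain gradient descent for the inner minimization, and a fixed penalty parameter; the function name and all parameter values are assumptions chosen for this example, not taken from the book or from H2O:

```python
def method_of_multipliers(grad_f, a, b, x0, c=10.0, outer=20, inner=300, step=0.04):
    """Minimize f(x) subject to a.x = b using an augmented Lagrangian.

    L_c(x, lam) = f(x) + lam * h(x) + (c / 2) * h(x)**2, with h(x) = a.x - b.
    Each outer iteration minimizes L_c over x by gradient descent and then
    updates the multiplier with lam <- lam + c * h(x).
    """
    x = list(x0)
    lam = 0.0
    for _ in range(outer):
        for _ in range(inner):
            h = sum(ai * xi for ai, xi in zip(a, x)) - b
            g = grad_f(x)
            # Gradient of L_c with respect to x: grad f + (lam + c * h) * a
            x = [xi - step * (gi + (lam + c * h) * ai)
                 for xi, gi, ai in zip(x, g, a)]
        h = sum(ai * xi for ai, xi in zip(a, x)) - b
        lam += c * h
    return x, lam


def grad_f(x):
    """Gradient of the toy objective f(x) = (x0 - 1)^2 + (x1 - 2)^2."""
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] - 2.0)]


# Constraint x0 + x1 = 1; the exact solution is x = (0, 1) with multiplier 2.
solution, multiplier = method_of_multipliers(grad_f, a=[1.0, 1.0], b=1.0, x0=[0.0, 0.0])
```

The step size here is tuned to this particular objective and penalty value; a real solver would need a line search or a second-order inner step.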
Some of the methods for inequalities are discussed in section 3. Section 3.1 discusses solving problems with inequality constraints by turning them into equality-constrained problems through additional variables added to the constraints.
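One standard way to do this conversion, shown here as a sketch rather than as a quotation from the book, is to add a squared slack variable z_j to each inequality constraint g_j(β) ≤ 0:

```latex
g_j(\beta) \le 0
\quad\Longleftrightarrow\quad
\exists\, z_j \in \mathbb{R} :\; g_j(\beta) + z_j^{2} = 0 .
```

With the example constraint 2a + 3d >= 1, that is g(β) = 1 - 2a - 3d ≤ 0, which becomes the equality 1 - 2a - 3d + z^2 = 0 in the enlarged variable set (a, d, z).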
Section 4.4.3 also talks about two ways of solving for inequality constraints:
using the active set method (splitting the inequality constraints into active vs. inactive constraints)
consider the inequality constraints directly and solve using quadratic programming subproblems
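To illustrate what splitting into active and inactive constraints means, here is a minimal sketch for linear inequality constraints of the form w.beta >= rhs. The function name `split_active` and the tolerance are assumptions for this example only:

```python
def split_active(constraints, beta, tol=1e-8):
    """Split inequality constraints w.beta >= rhs into active and inactive.

    A constraint is active at beta when it holds with equality (within tol),
    and inactive when it holds with room to spare. A violated constraint
    raises an error, because the split is only defined at feasible points.
    """
    active, inactive = [], []
    for j, (weights, rhs) in enumerate(constraints):
        slack = sum(w * x for w, x in zip(weights, beta)) - rhs
        if slack < -tol:
            raise ValueError(f"constraint {j} is violated (slack {slack})")
        (active if slack <= tol else inactive).append(j)
    return active, inactive


# Coefficients ordered as (a, d): 2a + 3d >= 1 and 2d - a >= 2.
example_constraints = [([2.0, 3.0], 1.0), ([-1.0, 2.0], 2.0)]
active, inactive = split_active(example_constraints, beta=[0.0, 1.0])
```

At (a, d) = (0, 1) the second constraint holds with equality and is active, while the first has slack 2 and is inactive. An active set method would treat the active ones as equalities and ignore the rest until the set changes.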
I also found this paper called “Sequential Quadratic Programming” by Paul T Boggs and Jon W Tolle that discusses solving for constraints similar to the ones we are requesting in section 1 Introduction of the paper. The rest of the sections in the paper also discuss various methods in solving the relevant constrained optimization problem.