-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PUBDEV-7366: GAM cross-validation #4869
Conversation
3b622d8
to
e1eed2e
Compare
h2o-algos/src/main/java/hex/gam/MatrixFrameUtils/GenerateGamMatrixOneColumn.java
Outdated
Show resolved
Hide resolved
h2o-algos/src/main/java/hex/gam/MatrixFrameUtils/GenerateGamMatrixOneColumn.java
Show resolved
Hide resolved
e1eed2e
to
930e5e0
Compare
18a3b4f
to
578911e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wendycwong, thank you for this PR. It looks good to me. I have just a note about using CamelCase style as much as possible to keep our Java code clear and readable.
@@ -31,7 +35,15 @@ | |||
|
|||
|
|||
public class GAM extends ModelBuilder<GAMModel, GAMModel.GAMParameters, GAMModel.GAMModelOutput> { | |||
|
|||
double[][] _knots; | |||
boolean _cv_on = false; // set to true if cross-validation is enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any other internal rule for using an underscore in Java code than at the beginning of a parameter? Because the standard is to use CamelCase. It is not clear to have two different styles in our Java code.
boolean _cv_on = false; // set to true if cross-validation is enabled | |
boolean _cvOn = false; // set to true if cross-validation is enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think _cvOn is a good idea. I changed it.
3a4ebe9
to
45c41b3
Compare
I'll be finishing this PR as Wendy is on vacation. |
45c41b3
to
58fb008
Compare
PUBDEV-7366: Added processing to choose best lambda/alpha from cross-validation results PUBDEV-7366: Split init into two parts to accomodate cross-validation. PUBDEV-7366: Incorporate Pavel code review comments. incorporate veronika comments
GamMojoModelTest fixed. One failure was resolved by rebasing (the NaN problem that's already fixed on master). I'll go through the code one more time tomorrow and then merge or ask for further reviews eventually. |
Added two more tests for multinomial and regression, did some minor code polishing (was not really required) and did one micro-optimization - validation keys were copied to the model for no reason. Just moved the pointer there 😉 . Waiting for tests. Feel free to review @michalkurka and others. No big changes since the recent reviews. |
…ing validation keys to model - small optimization.
best_lambda = g._output._best_lambda; | ||
} | ||
|
||
g.write_lock(_job); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like a copy&paste from GLM; it doesn't seem needed here because the GAM models are not modified here
init(true); //this can change the seed if it was set to -1 | ||
if (_doInit) // disable when in CV and building the main model | ||
init(true); //this can change the seed if it was set to -1 | ||
validateGamParameters(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why do we need this, however, if we keep _doInit
then we should do
if (_doInit) {
init(true)
} else {
validateGamParameters();
}
however, skipping init is suspicious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for catching this. I though this is there because of the way GAM is instantiated, but it seems to be false theory. I'll look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only things that seems to be really effectively required from init is the validateGamParameters
anyway.
1e005ce
to
c40a93b
Compare
h2o-algos/src/main/java/hex/gam/MatrixFrameUtils/GenerateGamMatrixOneColumn.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Pscheidl
There are 4 other key management issues, solved 3 of them, working on the last one. Those were shadowed by the former problems. |
Resolved. One validation frame lifecycle was to blame - it was not properly deleted, as the IcedHashSet was not initialized if validation frame was declared by the user. It was only initialized when cross validation was done. This is no longer true and all the tests pass locally. An important issue the newly added tests caught. One more round on Jenkins and we're good to go. Minor change really in terms of code. |
This one dataset is reused and it's deleted in |
Oh, now I see what you're asking about. Let me fix that really quickly. |
* PUBDEV-7366: Enabled and support GAM cross-validation on backend. PUBDEV-7366: Added processing to choose best lambda/alpha from cross-validation results PUBDEV-7366: Split init into two parts to accomodate cross-validation. PUBDEV-7366: Incorporate Pavel code review comments. incorporate veronika comments * JUnit tests for multinom/regression. Small code style fixes. Not copying validation keys to model - small optimization. * Always evaluate init on xval. Test key retention/removal. * Produce Frame with even chunk layout for spline calculations. * Assign validKeys and update job in one block. * Remove commented unused code * Proper cleanup of xval frames. * Remove unused Scope.exit(). Simplify model locking. Co-authored-by: Pavel Pscheidl <pavel@h2o.ai>
This reverts commit dcfc82f.
* PUBDEV-7366: Enabled and support GAM cross-validation on backend. PUBDEV-7366: Added processing to choose best lambda/alpha from cross-validation results PUBDEV-7366: Split init into two parts to accomodate cross-validation. PUBDEV-7366: Incorporate Pavel code review comments. incorporate veronika comments * JUnit tests for multinom/regression. Small code style fixes. Not copying validation keys to model - small optimization. * Always evaluate init on xval. Test key retention/removal. * Produce Frame with even chunk layout for spline calculations. * Assign validKeys and update job in one block. * Remove commented unused code * Proper cleanup of xval frames. * Remove unused Scope.exit(). Simplify model locking. Co-authored-by: Pavel Pscheidl <pavel@h2o.ai> (cherry picked from commit dcfc82f)
This PR completes the work required in JIRA: https://0xdata.atlassian.net/browse/PUBDEV-7366
There are only 7 files to review. The other four are tests.
Users can now use cross-validation to find the best alpha/lambda values when building a GAM model. In the near future, we will be adding more hyper-parameters search capability. Stay tuned...
I have added to Java backend to ensure that cross-validation works for GAM. I added tests in R, Java and Python to make sure everything checks out.