
PUBDEV-7366: GAM cross-validation #4869

Merged (8 commits), Sep 23, 2020

@wendycwong (Contributor) commented on Aug 22, 2020

This PR completes the work required in JIRA: https://0xdata.atlassian.net/browse/PUBDEV-7366

There are only 7 files to review; the other four files are tests.

Users can now use cross-validation to find the best alpha/lambda values when building a GAM model. In the near future, we will be adding more hyper-parameter search capabilities. Stay tuned...

I added support to the Java backend so that cross-validation works for GAM, and I added tests in R, Java, and Python to make sure everything checks out.
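To illustrate the selection step, here is a minimal sketch that assumes each (alpha, lambda) candidate is scored by its mean validation deviance across the CV folds and that the lowest mean wins. `CvCandidate` and `CvSelection.selectBest` are hypothetical names for illustration only; this is not the GAM backend's actual code or API, and the real scoring metric may differ.

```java
import java.util.List;

/** Hypothetical container for one (alpha, lambda) candidate and its per-fold CV scores. */
final class CvCandidate {
    final double alpha;
    final double lambda;
    final double[] foldDeviances; // validation deviance from each CV fold (assumed metric)

    CvCandidate(double alpha, double lambda, double[] foldDeviances) {
        this.alpha = alpha;
        this.lambda = lambda;
        this.foldDeviances = foldDeviances;
    }

    /** Mean validation deviance across all folds. */
    double meanDeviance() {
        double sum = 0;
        for (double d : foldDeviances) sum += d;
        return sum / foldDeviances.length;
    }
}

final class CvSelection {
    /** Returns the candidate with the lowest mean CV deviance; ties keep the first one seen. */
    static CvCandidate selectBest(List<CvCandidate> candidates) {
        if (candidates.isEmpty()) throw new IllegalArgumentException("no candidates");
        CvCandidate best = candidates.get(0);
        for (CvCandidate c : candidates) {
            if (c.meanDeviance() < best.meanDeviance()) best = c;
        }
        return best;
    }
}
```

Averaging over folds, rather than taking any single fold's score, is what makes the choice less sensitive to one lucky or unlucky split.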

@wendycwong force-pushed the PUBDEV_7366_gam_cross_validation branch 5 times, most recently from 3b622d8 to e1eed2e (August 25, 2020 18:13)
@wendycwong force-pushed the PUBDEV_7366_gam_cross_validation branch from e1eed2e to 930e5e0 (August 27, 2020 16:30)
@wendycwong force-pushed the PUBDEV_7366_gam_cross_validation branch 4 times, most recently from 18a3b4f to 578911e (September 1, 2020 04:09)
@maurever (Contributor) left a comment

@wendycwong, thank you for this PR. It looks good to me. I just have one note: please use CamelCase as much as possible to keep our Java code clear and readable.

@@ -31,7 +35,15 @@


public class GAM extends ModelBuilder<GAMModel, GAMModel.GAMParameters, GAMModel.GAMModelOutput> {

double[][] _knots;
boolean _cv_on = false; // set to true if cross-validation is enabled
Contributor

Do we have any internal rule for using underscores in Java code other than at the beginning of a parameter name? The standard is to use CamelCase, and having two different styles in our Java code is confusing.

Suggested change
boolean _cv_on = false; // set to true if cross-validation is enabled
boolean _cvOn = false; // set to true if cross-validation is enabled

Contributor Author

I think _cvOn is a good idea. I changed it.

h2o-algos/src/main/java/hex/gam/GAM.java (resolved)
@wendycwong force-pushed the PUBDEV_7366_gam_cross_validation branch 2 times, most recently from 3a4ebe9 to 45c41b3 (September 4, 2020 00:43)
@Pscheidl self-assigned this Sep 8, 2020

@Pscheidl (Contributor) commented on Sep 8, 2020

I'll be finishing this PR as Wendy is on vacation.

@Pscheidl force-pushed the PUBDEV_7366_gam_cross_validation branch from 45c41b3 to 58fb008 (September 10, 2020 12:51)
@Pscheidl requested review from Pscheidl and removed the request for Pscheidl (September 10, 2020 12:52)
PUBDEV-7366: Added processing to choose best lambda/alpha from cross-validation results
PUBDEV-7366: Split init into two parts to accommodate cross-validation.
PUBDEV-7366: Incorporate Pavel code review comments.
incorporate veronika comments
@Pscheidl (Contributor)

Fixed GamMojoModelTest. One failure was resolved by rebasing (the NaN problem that is already fixed on master). I'll go through the code one more time tomorrow and then either merge or ask for further reviews.

@Pscheidl (Contributor)

Added two more tests, for multinomial and regression; did some minor code polishing (not strictly required); and made one micro-optimization: validation keys were being copied to the model for no reason, so now the model just takes the pointer 😉.

Waiting for tests. @michalkurka and others, feel free to review. There are no big changes since the recent reviews.
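The micro-optimization is the difference between allocating a copy of the key array and handing the model a reference to the existing one. A minimal sketch, assuming the keys live in an array; `ModelStub`, `KeyHandoff`, and the `validKeys` field are hypothetical, and plain `String`s stand in for H2O `Key` objects.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a model that tracks its validation keys. */
final class ModelStub {
    String[] validKeys;
}

final class KeyHandoff {
    /** Before: allocates a fresh array and copies every key into it. */
    static void copyKeys(ModelStub model, String[] keys) {
        model.validKeys = Arrays.copyOf(keys, keys.length);
    }

    /** After: the model simply takes the reference; no allocation or copying. */
    static void moveKeys(ModelStub model, String[] keys) {
        model.validKeys = keys;
    }
}
```

Moving the reference is only safe if the builder no longer mutates the array afterwards, since both sides now share the same object.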

…ing validation keys to model - small optimization.
best_lambda = g._output._best_lambda;
}

g.write_lock(_job);
Contributor

This looks like a copy & paste from GLM; it doesn't seem to be needed, because the GAM models are not modified here.

init(true); //this can change the seed if it was set to -1
if (_doInit) // disable when in CV and building the main model
init(true); //this can change the seed if it was set to -1
validateGamParameters();
Contributor

I wonder why we need this. However, if we keep _doInit, then we should do:

if (_doInit) {
  init(true);
} else {
  validateGamParameters();
}

That said, skipping init is suspicious.
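The difference between the two control flows can be sketched as follows, assuming that init() itself already runs parameter validation. Under that assumption, the diffed code validates twice when _doInit is true, while the suggested if/else validates exactly once on every path. `BuilderSketch` and its members are hypothetical and only mimic the shape of the code under review.

```java
/** Hypothetical builder mimicking the _doInit control flow under review. */
final class BuilderSketch {
    final boolean doInit;    // mirrors the _doInit flag
    int validationRuns = 0;  // counts how often parameters were validated

    BuilderSketch(boolean doInit) { this.doInit = doInit; }

    void validateParameters() { validationRuns++; }

    /** Assumption: init() performs parameter validation among its other setup work. */
    void init(boolean expensive) { validateParameters(); }

    /** Diffed pattern: guarded init followed by an unconditional validation call. */
    void computeAsDiffed() {
        if (doInit) init(true);
        validateParameters();
    }

    /** Suggested pattern: validate explicitly only when init() is skipped. */
    void computeWithElse() {
        if (doInit) {
            init(true);
        } else {
            validateParameters();
        }
    }
}
```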

Contributor

Thank you for catching this. I thought it was there because of the way GAM is instantiated, but that theory seems to be false. I'll look into it.

Contributor

The only part of init that really seems to be required is validateGamParameters anyway.

@Pscheidl force-pushed the PUBDEV_7366_gam_cross_validation branch 10 times, most recently from 1e005ce to c40a93b (September 20, 2020 17:56)

@michalkurka (Contributor) left a comment

Thank you @Pscheidl

@Pscheidl (Contributor)

There are 4 other key-management issues; I've solved 3 of them and am working on the last one. They were masked by the earlier problems.

@Pscheidl (Contributor) commented on Sep 22, 2020

Resolved. The lifecycle of one validation frame was to blame: the frame was not properly deleted because the IcedHashSet was not initialized when the validation frame was provided by the user; it was only initialized when cross-validation was performed. That is no longer the case, and all tests pass locally. The newly added tests caught an important issue.

One more round on Jenkins and we're good to go. In terms of code, it's really a minor change.

@Pscheidl (Contributor)

This one dataset is reused, and it is deleted in remove_impl of GamModel; that's the original design. I just made sure it's deleted in both cases (including user-provided validation data), not just for xval.
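The lifecycle fix boils down to creating the cleanup set on every path rather than only when cross-validation runs. A minimal sketch of that pattern, using `java.util.HashSet` as a stand-in for H2O's `IcedHashSet` and a plain set of strings as a stand-in for the key-value store; `GamModelSketch`, `trackValidationFrame`, and `removeImpl` are hypothetical names, with `removeImpl` only playing the role that remove_impl plays in the discussion.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model that owns temporary validation frames and deletes them on removal. */
final class GamModelSketch {
    // Fix pattern: the set is created eagerly, so it exists whether the validation
    // frame came from the user or from cross-validation.
    private final Set<String> validKeys = new HashSet<>();
    private final Set<String> store; // stand-in for the distributed key-value store

    GamModelSketch(Set<String> store) { this.store = store; }

    /** Registers a frame that this model must delete later. */
    void trackValidationFrame(String key) {
        store.add(key);
        validKeys.add(key);
    }

    /** Plays the role of remove_impl: delete every tracked frame, then forget them. */
    void removeImpl() {
        for (String key : validKeys) store.remove(key);
        validKeys.clear();
    }
}
```

Had the set been created only on the cross-validation path, a user-provided validation frame would never be tracked and would outlive the model, which is the leak the new tests caught.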

@Pscheidl (Contributor)

Oh, now I see what you're asking about. Let me fix that really quickly.

@Pscheidl merged commit dcfc82f into rel-zeno Sep 23, 2020
@Pscheidl deleted the PUBDEV_7366_gam_cross_validation branch September 23, 2020 07:02
sebhrusen pushed a commit that referenced this pull request Sep 24, 2020
* PUBDEV-7366: Enabled and support GAM cross-validation on backend.
PUBDEV-7366: Added processing to choose best lambda/alpha from cross-validation results
PUBDEV-7366: Split init into two parts to accommodate cross-validation.
PUBDEV-7366: Incorporate Pavel code review comments.
incorporate veronika comments

* JUnit tests for multinom/regression. Small code style fixes. Not copying validation keys to model - small optimization.

* Always evaluate init on xval. Test key retention/removal.

* Produce Frame with even chunk layout for spline calculations.

* Assign validKeys and update job in one block.

* Remove commented unused code

* Proper cleanup of xval frames.

* Remove unused Scope.exit(). Simplify model locking.

Co-authored-by: Pavel Pscheidl <pavel@h2o.ai>
michalkurka pushed a commit that referenced this pull request Sep 28, 2020
flaviusburca pushed a commit to mware-solutions/h2o-3 that referenced this pull request Apr 21, 2021
(cherry picked from commit dcfc82f)
5 participants